# Writing a TableGen Backend in Python

This tutorial walks through creating a TableGen backend using Python.

We are using Python to better fit into a notebook, but backends in LLVM are written in C++. The principles you learn here still apply, and you could port this tutorial to any language that has a JSON parser.

This is the process in LLVM, using a C++ backend:

    TableGen source -> llvm-tblgen -> backend (within llvm-tblgen) -> results

This is what we will be doing:

    TableGen source -> llvm-tblgen -> JSON -> Python -> results

The backend here is ported from one of several in "SQLGen", which was written by Min-Yih Hsu.
* SQLGen C++ sources - https://github.com/mshockwave/SQLGen
* LLVM dev presentation - https://www.youtube.com/watch?v=UP-LBRbvI_U

I encourage you to use those resources to supplement this notebook.

Unlike the other tutorial notebooks we are not using the TableGen kernel. This is an IPython notebook and we're going to run `llvm-tblgen` as a subprocess.

First let's find it, in the same way the TableGen kernel does.

    import os
    import shutil

    def find_tblgen():
        path = os.environ.get("LLVM_TBLGEN_EXECUTABLE")
        if path is not None and os.path.isfile(path) and os.access(path, os.X_OK):
            return path
        path = shutil.which("llvm-tblgen")
        if path is None:
            raise OSError("llvm-tblgen not found")
        return path

    _ = find_tblgen()

If the above cell raises an exception, either put `llvm-tblgen` on your `PATH` or point to it using the `LLVM_TBLGEN_EXECUTABLE` environment variable. Alternatively, edit the code to use whatever path you want.

Then we need to compile some TableGen by passing it to `llvm-tblgen`'s stdin. We use the `--dump-json` option and, if the compilation succeeds, return the JSON as a Python dictionary. If it fails, we raise an exception.

    import json
    import subprocess
    import tempfile

    def run_tblgen(src):
        # Passing to stdin requires a file like object.
        with tempfile.TemporaryFile("w+") as f:
            f.write(src)
            f.seek(0)
            got = subprocess.run(
                [find_tblgen(), "--dump-json"],
                stdin=f,
                stderr=subprocess.PIPE,
                stdout=subprocess.PIPE,
                universal_newlines=True,
            )

        if got.stderr:
            raise RuntimeError("llvm-tblgen failed with stderr: " + got.stderr)

        return json.loads(got.stdout)

    print(json.dumps(run_tblgen("class Foo {}"), indent=4))

        "!tablegen_json_version": 1

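As a variation on `run_tblgen`, `subprocess.run` can also feed the source through its `input` parameter, which avoids the temporary file, and the exit code can be checked instead of stderr. The sketch below is an alternative under stated assumptions, not the notebook's code: `run_tool` is a hypothetical helper name, and it takes the command as a parameter so that it can be tried without `llvm-tblgen` installed (a Python one-liner stands in for it here).

```python
import json
import subprocess
import sys


def run_tool(cmd, src):
    """Run cmd with src on stdin and decode its stdout as JSON.

    Hypothetical helper for illustration. In the notebook, cmd would be
    something like [find_tblgen(), "--dump-json"].
    """
    got = subprocess.run(
        cmd,
        input=src,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # A non-zero exit code signals failure even if stderr is empty.
    if got.returncode != 0:
        raise RuntimeError("command failed with stderr: " + got.stderr)
    return json.loads(got.stdout)


# Stand-in for llvm-tblgen: echoes whatever arrives on stdin as JSON.
echo_cmd = [
    sys.executable,
    "-c",
    "import json, sys; print(json.dumps({'source': sys.stdin.read()}))",
]
result = run_tool(echo_cmd, "class Foo {}")
print(result)
```

Checking the exit code means that warnings printed to stderr would not be treated as failures, which may or may not be what you want.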
## Structure of a SQL Query

This backend is going to generate SQL queries. The general form of the queries we will generate is:

    SELECT <some field names> FROM <table name>
     WHERE <conditions>
     ORDER BY <field tags>;

Normally you'd write this to a `.td` file, but here we have it in a Python string to fit into this notebook. We will add to this string to produce the final source.

This section defines some constants. First are the fields we want to get back from the query:
* `all` - Return all fields.
* `fields` - We will provide a list of the fields we are interested in.

The second set are the comparison and logical operators for what will become the `WHERE` clause (called `condition` in the TableGen). These are string versions of various symbols. For example `ne` means `!=`, which in SQL is `<>`.

Finally `none` is used to mean there is no condition to the query (no `WHERE`).

    class Query <string table, dag query_fields = (all), dag condition = (none)> {
      string TableName = table;
      dag Fields = query_fields;
      dag WhereClause = condition;
      list<string> OrderedBy = [];
    }

Then the `Query` class. Its arguments are:
* `table` - The name of the table to query (`FROM <table>`).
* `query_fields` - The fields you want returned (`SELECT <fields>`).
  * Defaults to `all`, meaning return all fields.
* `condition` - Logic to select entries (`WHERE <conditions>`).
  * Defaults to `none`, meaning there is no condition, or in other words select all entries in the table.

## Using The Query Class

    full_tblgen = query_tblgen + """\
    def : Query<"Customer">;

    def : Query<"Orders", (fields "Person", "Amount")>;

    def : Query<"Customer", (fields "Affiliation"),
                (eq "Name", "Mary Blackburn":$str)>;

    def : Query<"Orders", (fields "ProductName"),
                (gt "Amount", 8)>;

    def : Query<"Orders", (fields "ProductName":$name, "Person"),
                (and (gt "Amount", 8), (ne "Person", 1))> {
      let OrderedBy = ["$name"];
    }
    """

Now we can define some queries. Let's go over the last one in detail.

    def : Query<"Orders", (fields "ProductName":$name, "Person"),
                (and (gt "Amount", 8), (ne "Person", 1))> {
      let OrderedBy = ["$name"];
    }

* It will run on a table called `Orders`.
* We want to see the fields `ProductName` and `Person`.
* We have tagged `ProductName` with `$name`.
* The condition is that `Amount` must be greater than `8` and
  `Person` must not be equal to `1`.
* The results of this query should be ordered by the field
  tagged `$name`, which is `ProductName`.
The condition being of DAG type (Directed Acyclic Graph) allows us to describe nested conditions. You might write this condition in Python as:

    if (Amount > 8) and (Person != 1):

Putting that into a graph form:

            |------|and|------|
            |                 |
    | Amount > 8 |       | Person != 1 |

Which is what we're describing with the DAG in TableGen.
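To make the graph concrete, here is a small sketch that evaluates such a condition tree against one row of data. It assumes the dictionary shape that `--dump-json` produces for DAG values (an `operator` with a `def` name, and `args` as `[value, tag]` pairs). The names `evaluate` and `COMPARE`, and the handling of `or`, are illustrative additions and not part of the backend in this notebook.

```python
import operator

# Comparison operators from the TableGen constants, as Python functions.
# Only the ones used by this notebook's queries are listed.
COMPARE = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
}


def evaluate(cond, row):
    """Evaluate a condition tree against one row (a dict of field values)."""
    op = cond["operator"]["def"]
    values = [arg for arg, _tag in cond["args"]]
    if op == "and":
        return all(evaluate(v, row) for v in values)
    if op == "or":  # Assumed to exist alongside `and`.
        return any(evaluate(v, row) for v in values)
    # Leaf comparison: the first arg is the field name, the second the value.
    field, expected = values
    return COMPARE[op](row[field], expected)


# Hand-written equivalent of (and (gt "Amount", 8), (ne "Person", 1)).
condition = {
    "operator": {"def": "and"},
    "args": [
        [{"operator": {"def": "gt"}, "args": [["Amount", None], [8, None]]}, None],
        [{"operator": {"def": "ne"}, "args": [["Person", None], [1, None]]}, None],
    ],
}

print(evaluate(condition, {"Amount": 10, "Person": 2}))  # True
print(evaluate(condition, {"Amount": 10, "Person": 1}))  # False
```

The recursion mirrors the graph: inner nodes combine their children, and leaves compare a field with a value.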

    full_json = run_tblgen(full_tblgen)
    print(json.dumps(full_json, indent=4))

    "!tablegen_json_version": 1,
    "!name": "anonymous_0",
    "TableName": "Customer",
    "printable": "(none)"
    "!name": "anonymous_1",
    "printable": "fields"
    "printable": "(fields \"Person\", \"Amount\")"
    "TableName": "Orders",
    "printable": "(none)"
    "!name": "anonymous_2",
    "printable": "fields"
    "printable": "(fields \"Affiliation\")"
    "TableName": "Customer",
    "printable": "(eq \"Name\", \"Mary Blackburn\":$str)"
    "!name": "anonymous_3",
    "printable": "fields"
    "printable": "(fields \"ProductName\")"
    "TableName": "Orders",
    "printable": "(gt \"Amount\", 8)"
    "!name": "anonymous_4",
    "printable": "fields"
    "printable": "(fields \"ProductName\":$name, \"Person\")"
    "TableName": "Orders",
    "printable": "(gt \"Amount\", 8)"
    "printable": "(ne \"Person\", 1)"
    "printable": "(and (gt \"Amount\", 8), (ne \"Person\", 1))"

The backend is going to walk the JSON we decoded. The full output is above in case you want to browse it, but you don't need to read the whole thing for now. We will highlight the key aspects of it as we go along.

    print(full_json["!instanceof"])

    {'Query': ['anonymous_0', 'anonymous_1', 'anonymous_2', 'anonymous_3', 'anonymous_4']}

Any key beginning with `!` is some sort of metadata about the other keys. Here `!instanceof` lists all instances of each class. We only have `Query`, which lists all the queries we defined.

    print(full_json["anonymous_0"]["!superclasses"])

On each def there is also a `!superclasses` key that gives you the same information. This means you could use `!instanceof` to get a list of keys to look up, or you could walk all keys and check `!superclasses`.
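The two lookup strategies can be compared side by side on a small, hand-written stand-in for the decoded JSON. This is a sketch: the `sample` dictionary is trimmed to the keys that matter here, and the function names are made up for this example.

```python
# Hand-written stand-in for the decoded JSON, trimmed to the relevant keys.
sample = {
    "!instanceof": {"Query": ["anonymous_0", "anonymous_1"]},
    "!tablegen_json_version": 1,
    "anonymous_0": {"!name": "anonymous_0", "!superclasses": ["Query"]},
    "anonymous_1": {"!name": "anonymous_1", "!superclasses": ["Query"]},
    "all": {"!name": "all", "!superclasses": []},
}


def defs_via_instanceof(j, class_name):
    """Look up def names in the !instanceof index, then fetch each def."""
    return [j[name] for name in j["!instanceof"].get(class_name, [])]


def defs_via_superclasses(j, class_name):
    """Walk every non-metadata key and check its !superclasses list."""
    return [
        value
        for key, value in j.items()
        if not key.startswith("!") and class_name in value["!superclasses"]
    ]


a = defs_via_instanceof(sample, "Query")
b = defs_via_superclasses(sample, "Query")
print([d["!name"] for d in a])
print(a == b)
```

Both approaches find the same defs. The index lookup avoids visiting unrelated keys, while the walk works even if you only have the defs themselves to hand.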

    print(full_json["anonymous_0"]["Fields"])

    {'args': [], 'kind': 'dag', 'operator': {'def': 'all', 'kind': 'def', 'printable': 'all'}, 'printable': '(all)'}

From a def object you can find its attributes. Here we have the fields we want the query to show, which is all of them.

The core of a backend is looping over all defs of a certain class and outputting some text based on their properties.

Here we're going to loop over all defs of type `Query` and emit SQL queries for them.

    def find_all_queries(j):
        queries = []
        for key in j:
            # ! means it is some metadata, not a def.
            if not key.startswith("!"):
                value = j[key]
                # If we inherit from Query.
                if "Query" in value["!superclasses"]:
                    queries.append(value)
        return queries

    queries = find_all_queries(full_json)

    print([q["!name"] for q in queries])

    ['anonymous_0', 'anonymous_1', 'anonymous_2', 'anonymous_3', 'anonymous_4']

Why are the names `anonymous_...`? When we defined them we wrote `def :` and left out the name. This is allowed and `llvm-tblgen` just came up with a name for us. For this purpose the names are irrelevant.

Now that we have the relevant defs we need to "emit" them, meaning produce something from them, in this case a SQL query.
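
    def emit_operator(operator):
        # Only the operators used by the queries in this notebook.
        return {'gt': ' > ', 'ne': ' <> ', 'eq': ' = ', 'and': ' AND '}[operator]

    print(emit_operator('and'))

This maps our TableGen constants to the equivalent SQL operator.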
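One way to write such a mapping is a dictionary lookup that fails loudly on an unknown operator. The following is only a sketch: `SQL_OPERATORS` and `operator_to_sql` are hypothetical names, and only the spellings for `eq`, `ne`, `gt` and `and` are confirmed by the output in this notebook. The remaining entries are assumed additions.

```python
# Assumed mapping from TableGen constant names to SQL text. The spaces
# are part of each entry so that callers can simply concatenate.
SQL_OPERATORS = {
    "eq": " = ",
    "ne": " <> ",
    "gt": " > ",
    "ge": " >= ",  # assumption
    "lt": " < ",   # assumption
    "le": " <= ",  # assumption
    "and": " AND ",
    "or": " OR ",  # assumption
}


def operator_to_sql(name):
    """Return the SQL text for a TableGen operator name."""
    try:
        return SQL_OPERATORS[name]
    except KeyError:
        raise ValueError("Unknown operator " + repr(name)) from None


print(repr(operator_to_sql("and")))  # ' AND '
print(repr(operator_to_sql("ne")))   # ' <> '
```

Raising on an unknown name means a typo in the TableGen source becomes an error in the backend instead of silently producing a malformed query.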

    def emit_fields(args):
        # Return a comma separated list of arg names.
        return ", ".join([arg[0] for arg in args])

    print(emit_fields([["Abc", None], ["Def", None]]))

This emits the fields we are selecting. Each field has a name (`arg[0]`) and an optional tag that we will use later.

    from collections.abc import Mapping

    def emit_where_clause(where_clause):
        output = ""
        num_args = len(where_clause["args"])

        for idx, arg in enumerate(where_clause["args"]):
            arg_name, arg_type = arg

            if isinstance(arg_name, Mapping):
                # This is a nested where clause.
                output += emit_where_clause(arg_name)
            else:
                # This is some condition.
                if arg_type == "str":
                    # String types must be emitted with "" around them.
                    output += '"' + arg_name + '"'
                else:
                    output += str(arg_name)

            # If this is not the last arg, emit the operator.
            if idx != (num_args-1):
                output += emit_operator(where_clause["operator"]["def"])

        return output

    print(emit_where_clause({
        "args": [["Name",None],
                 ["Mary Blackburn", "str"]],
        "operator": {"def": "eq"},
    }))

    Name = "Mary Blackburn"

This emits the condition that goes with the `WHERE`. The condition is a DAG, which means that we may find a mix of conditions and other DAGs. We recurse to handle the latter case.

For each part of the condition we print the name of the field being checked, then the operator (`=`, `<>`, etc.), and finally the value to check against.
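Because nested clauses are emitted inline, a condition such as `(and (or ...), ...)` relies on SQL operator precedence to mean what the DAG says. The sketch below is a variation on the idea rather than the notebook's function. It wraps every nested clause in parentheses so the output keeps the DAG's grouping. `where_with_parens`, `dag` and the `OPS` table are names made up for this example.

```python
from collections.abc import Mapping

# Minimal operator table for this sketch (assumed spellings).
OPS = {"eq": " = ", "gt": " > ", "ne": " <> ", "and": " AND ", "or": " OR "}


def where_with_parens(clause, nested=False):
    """Emit a WHERE expression, wrapping nested clauses in parentheses."""
    parts = []
    for value, tag in clause["args"]:
        if isinstance(value, Mapping):
            # Nested DAG: recurse and mark it as nested.
            parts.append(where_with_parens(value, nested=True))
        elif tag == "str":
            # Values tagged $str are string literals.
            parts.append('"' + value + '"')
        else:
            parts.append(str(value))
    # The operator goes between each pair of neighbouring args.
    text = OPS[clause["operator"]["def"]].join(parts)
    return "(" + text + ")" if nested else text


def dag(op, *args):
    """Build a DAG dict shaped like the JSON, with no tags on the args."""
    return {"operator": {"def": op}, "args": [[a, None] for a in args]}


# (and (or (gt "Amount", 8), (eq "Person", 1)), (ne "Person", 2))
clause = dag(
    "and",
    dag("or", dag("gt", "Amount", 8), dag("eq", "Person", 1)),
    dag("ne", "Person", 2),
)
print(where_with_parens(clause))
# ((Amount > 8) OR (Person = 1)) AND (Person <> 2)
```

The extra parentheses around simple comparisons are redundant but harmless. A more selective version could add them only around `and` and `or` nodes.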

    def emit_ordered_by(ordered_by, field_tag_map):
        # No ORDER BY statement to emit.
        if not ordered_by:
            return ""

        output = "\n ORDER BY "

        for idx, field_name in enumerate(ordered_by):
            # If it is a tag.
            if field_name.startswith('$'):
                # Find the corresponding field name
                tag_name = field_name[1:]
                field_name = field_tag_map.get(tag_name)
                if field_name is None:
                    raise RuntimeError('Unrecognized tag "{}"'.format(
                        tag_name))

            # Separate each tag after the first with ", ".
            if idx != 0:
                output += ", "
            output += field_name

        return output

    print(emit_ordered_by(["$abc", "$def"], {'abc':"ABC", 'def':"DEF"}))

`emit_ordered_by` produces the `ORDER BY` text. If there is no ordering it returns an empty string, otherwise it loops over all the fields we want to order by and emits their names.

If the name is a tag, we look that up in a map to get the real field name. Here's how we build that map:

    def build_tag_map(arguments):
        # Args are [Name, Tag]. Reverse this so we have [Tag, Name].
        # Add each one to a dictionary where Tag is the key and Name is the value.
        return dict([reversed(a) for a in arguments])

    print(build_tag_map([["ABC", "abc"], ["DEF", "def"]]))

    {'abc': 'ABC', 'def': 'DEF'}

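Untagged fields arrive as `[name, None]`, so reversing every pair also stores an entry under the key `None`. That entry is never looked up, since tags in `OrderedBy` always have a name, but a variant that skips untagged fields keeps the map tidy. This is a sketch with a made-up function name:

```python
def build_tag_map_skipping_untagged(arguments):
    """Map tag -> field name, ignoring [name, None] pairs."""
    return {tag: name for name, tag in arguments if tag is not None}


args = [["ProductName", "name"], ["Person", None]]

# Reversing every pair keeps the untagged field under the key None.
print(dict(reversed(a) for a in args))
# {'name': 'ProductName', None: 'Person'}

print(build_tag_map_skipping_untagged(args))
# {'name': 'ProductName'}
```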

    def emit_query(q):
        fields_init = q["Fields"]
        field_op_name = fields_init["operator"]["def"]
        if field_op_name not in ["all", "fields"]:
            raise RuntimeError("Invalid dag operator " + field_op_name)

        field_tag_map = build_tag_map(fields_init["args"])

        where_clause = q["WhereClause"]
        has_where = where_clause["operator"]["def"] != "none"

        ret = "SELECT "
        if field_op_name == "all":
            ret += "*"
        ret += emit_fields(fields_init["args"])
        ret += " FROM " + q["TableName"]
        if has_where:
            ret += "\n WHERE " + emit_where_clause(where_clause)
        ret += emit_ordered_by(q["OrderedBy"], field_tag_map)
        ret += ";"

        return ret

Finally the main function. It emits the skeleton of the query and calls the helpers we defined earlier to fill in the gaps.

    for q in queries:
        print(emit_query(q) + "\n")

    SELECT * FROM Customer;

    SELECT Person, Amount FROM Orders;

    SELECT Affiliation FROM Customer
     WHERE Name = "Mary Blackburn";

    SELECT ProductName FROM Orders
     WHERE Amount > 8;

    SELECT ProductName, Person FROM Orders
     WHERE Amount > 8 AND Person <> 1
     ORDER BY ProductName;

Now we run `emit_query` and print out the results. There you have it, that's a TableGen backend!

You've seen the core concepts: loop over all the defs of a certain class, then emit some other structure based on the fields of each one. In this case it was SQL queries. In LLVM it's most often C++ code, but it can be anything you want.

If you want to see the same thing done with a C++ backend (one written in C++ that is, not producing it), check out the links at the start of this notebook.