llvm/utils/TableGen/jupyter/sql_query_backend.md

   1 # Writing a TableGen Backend in Python
   2
   3 This tutorial is going to walk through creating a TableGen backend using Python.
   4
   5 We are using Python to better fit into a notebook, but backends in LLVM are written in C++. The principles you learn here will still apply and you could port this tutorial to any language that has a JSON parser.
   6
   7 This is the process in LLVM, using a C++ backend:
   8 ```
   9 TableGen source -> llvm-tblgen -> backend (within llvm-tblgen) -> results
  10 ```
  11 This is what we will be doing:
  12 ```
  13 TableGen source -> llvm-tblgen -> JSON -> Python -> results
  14 ```
  15
  16 The backend here is ported from one of several in "SQLGen" which was written by Min-Yih Hsu.
  17 * SQLGen C++ sources - https://github.com/mshockwave/SQLGen
  18 * LLVM dev presentation - https://www.youtube.com/watch?v=UP-LBRbvI_U
  19
  20 I encourage you to use those resources to supplement this notebook.
  21
  22 ## Compiling TableGen
  23
  24 Unlike the other tutorial notebooks we are not using the TableGen kernel. This is an iPython notebook and we're going to run `llvm-tblgen` as a subprocess.
  25
  26 First let's find it, in the same way the TableGen kernel does.
  27
  28
  29 ```python
  30 import os
  31 import shutil
  32
  33 def find_tblgen():
  34     path = os.environ.get("LLVM_TBLGEN_EXECUTABLE")
  35     if path is not None and os.path.isfile(path) and os.access(path, os.X_OK):
  36         return path
  37     else:
  38         path = shutil.which("llvm-tblgen")
  39         if path is None:
  40             raise OSError("llvm-tblgen not found")
  41         return path
  42
  43 _ = find_tblgen()
  44 ```
  45
  46 If the above cell raises an exception, either put `llvm-tblgen` on your `PATH` or point to it using the `LLVM_TBLGEN_EXECUTABLE` environment variable. Alternatively, edit the code to use whatever path you want.
  47
  48 Then we need to compile some TableGen by passing it to `llvm-tblgen`'s stdin. We will be using the option `--dump-json` and returning the JSON as a Python dictionary if the compilation succeeds. If it fails, we raise an exception.
  49
  50
  51 ```python
  52 import subprocess
  53 import tempfile
  54 import json
  55
  56 def run_tblgen(src):
  57     # Passing to stdin requires a file like object.
  58     with tempfile.TemporaryFile("w+") as f:
  59         f.write(src)
  60         f.seek(0)
  61         got = subprocess.run(
  62             [find_tblgen(), "--dump-json"],
  63             stdin=f,
  64             stderr=subprocess.PIPE,
  65             stdout=subprocess.PIPE,
  66             universal_newlines=True,
  67         )
  68
  69     if got.stderr:
  70         raise RuntimeError("llvm-tblgen failed with stderr: " + got.stderr)
  71
  72     return json.loads(got.stdout)
  73
  74 print(json.dumps(run_tblgen("class Foo {}"), indent=4))
  75 ```
  76
  77     {
  78         "!instanceof": {
  79             "Foo": []
  80         },
  81         "!tablegen_json_version": 1
  82     }
  83
  84
  85 ## Structure of a SQL Query
  86
  87 This backend is going to generate SQL queries. The general form of a SQL query is:
  88 ```
  89 SELECT <some field names> FROM <table name>
  90  WHERE <conditions>
  91  ORDER BY <field tags>;
  92 ```
  93
  94 ## SQL Query TableGen
  95
  96
  97 ```python
  98 query_tblgen = """\
  99 def all;
 100 def fields;
 101 def none;
 102
 103 def eq;
 104 def ne;
 105 def gt;
 106 def ge;
 107 def and;
 108 def or;
 109 """
 110 ```
 111
 112 Normally you'd write this to a `.td` file but here we have it in a Python string to fit into this notebook. We will add to this string to produce the final source.
 113
 114 This section defines some constants. First are the fields we want to get back from the query:
 115 * `all` - Return all fields.
 116 * `fields` - Means that we will provide a list of fields we are interested in.
 117
 118 The second set are the logical operators for what will become the `WHERE` clause (called `condition` in the TableGen). These are string versions of various symbols. For example `ne` means `!=`, which in SQL is `<>`.
 119
 120 Finally `none` is used to mean there is no condition to the query (no `WHERE`).
 121
 122
 123 ```python
 124 query_tblgen += """\
 125 class Query <string table, dag query_fields = (all), dag condition = (none)> {
 126   string TableName = table;
 127   dag Fields = query_fields;
 128   dag WhereClause = condition;
 129   list<string> OrderedBy = [];
 130 }
 131 """
 132 ```
 133
 134 Then the Query class. Its arguments are:
 135 * `table` - The name of the table to query (`FROM <table>`).
 136 * `query_fields` - The fields you want returned (`SELECT <fields>`).
 137     * Defaults to `all` meaning return all fields.
 138 * `condition` - Logic to select entries (`WHERE <conditions>`).
 139     * Defaults to `none` meaning there is no condition, or in other words select all entries in the table.
 140
 141 ## Using The Query Class
 142
 143
 144 ```python
 145 full_tblgen = query_tblgen + """\
 146 def : Query<"Customer">;
 147
 148 def : Query<"Orders", (fields "Person", "Amount")>;
 149
 150 def : Query<"Customer", (fields "Affiliation"),
 151             (eq "Name", "Mary Blackburn":$str)>;
 152
 153 def : Query<"Orders", (fields "ProductName"),
 154             (gt "Amount", 8)>;
 155
 156 def : Query<"Orders", (fields "ProductName":$name, "Person"),
 157             (and (gt "Amount", 8), (ne "Person", 1))> {
 158   let OrderedBy = ["$name"];
 159 }
 160 """
 161 ```
 162
 163 Now we can define some queries. Let's go go over the last one in detail.
 164
 165 ```
 166 def : Query<"Orders", (fields "ProductName":$name, "Person"),
 167             (and (gt "Amount", 8), (ne "Person", 1))> {
 168   let OrderedBy = ["$name"];
 169 }
 170 ```
 171
 172 * It will run on a table called `Orders`.
 173 * We want to see the fields `ProductName` and `Person`.
 174 * We have tagged `ProductName` with `$name`.
 175 * The condition is that `Amount` must be greater than `8` and
 176   `Person` must not be equal to `1`.
 177 * The results of this query should be ordered by the field
 178   tagged `$name`, which is `ProductName`.
 179
 180 The condition being of DAG type (Directed Acyclic Graph) allows us to describe nested conditions. You might write this condition in Python as:
 181 ```
 182 if (Amount > 8) and (Person != 1):
 183 ```
 184 Putting that into a graph form:
 185 ```
 186         |------|and|------|
 187         |                 |
 188 | Amount > 8 |       | Person != 1 |
 189 ```
 190 Which is what we're describing with the DAG in TableGen.
 191
 192 ## The JSON format
 193
 194
 195 ```python
 196 full_json = run_tblgen(full_tblgen)
 197 print(json.dumps(full_json, indent=4))
 198 ```
 199
 200     {
 201         "!instanceof": {
 202             "Query": [
 203                 "anonymous_0",
 204                 "anonymous_1",
 205                 "anonymous_2",
 206                 "anonymous_3",
 207                 "anonymous_4"
 208             ]
 209         },
 210         "!tablegen_json_version": 1,
 211         "all": {
 212             "!anonymous": false,
 213             "!fields": [],
 214             "!name": "all",
 215             "!superclasses": []
 216         },
 217         "and": {
 218             "!anonymous": false,
 219             "!fields": [],
 220             "!name": "and",
 221             "!superclasses": []
 222         },
 223         "anonymous_0": {
 224             "!anonymous": true,
 225             "!fields": [],
 226             "!name": "anonymous_0",
 227             "!superclasses": [
 228                 "Query"
 229             ],
 230             "Fields": {
 231                 "args": [],
 232                 "kind": "dag",
 233                 "operator": {
 234                     "def": "all",
 235                     "kind": "def",
 236                     "printable": "all"
 237                 },
 238                 "printable": "(all)"
 239             },
 240             "OrderedBy": [],
 241             "TableName": "Customer",
 242             "WhereClause": {
 243                 "args": [],
 244                 "kind": "dag",
 245                 "operator": {
 246                     "def": "none",
 247                     "kind": "def",
 248                     "printable": "none"
 249                 },
 250                 "printable": "(none)"
 251             }
 252         },
 253         "anonymous_1": {
 254             "!anonymous": true,
 255             "!fields": [],
 256             "!name": "anonymous_1",
 257             "!superclasses": [
 258                 "Query"
 259             ],
 260             "Fields": {
 261                 "args": [
 262                     [
 263                         "Person",
 264                         null
 265                     ],
 266                     [
 267                         "Amount",
 268                         null
 269                     ]
 270                 ],
 271                 "kind": "dag",
 272                 "operator": {
 273                     "def": "fields",
 274                     "kind": "def",
 275                     "printable": "fields"
 276                 },
 277                 "printable": "(fields \"Person\", \"Amount\")"
 278             },
 279             "OrderedBy": [],
 280             "TableName": "Orders",
 281             "WhereClause": {
 282                 "args": [],
 283                 "kind": "dag",
 284                 "operator": {
 285                     "def": "none",
 286                     "kind": "def",
 287                     "printable": "none"
 288                 },
 289                 "printable": "(none)"
 290             }
 291         },
 292         "anonymous_2": {
 293             "!anonymous": true,
 294             "!fields": [],
 295             "!name": "anonymous_2",
 296             "!superclasses": [
 297                 "Query"
 298             ],
 299             "Fields": {
 300                 "args": [
 301                     [
 302                         "Affiliation",
 303                         null
 304                     ]
 305                 ],
 306                 "kind": "dag",
 307                 "operator": {
 308                     "def": "fields",
 309                     "kind": "def",
 310                     "printable": "fields"
 311                 },
 312                 "printable": "(fields \"Affiliation\")"
 313             },
 314             "OrderedBy": [],
 315             "TableName": "Customer",
 316             "WhereClause": {
 317                 "args": [
 318                     [
 319                         "Name",
 320                         null
 321                     ],
 322                     [
 323                         "Mary Blackburn",
 324                         "str"
 325                     ]
 326                 ],
 327                 "kind": "dag",
 328                 "operator": {
 329                     "def": "eq",
 330                     "kind": "def",
 331                     "printable": "eq"
 332                 },
 333                 "printable": "(eq \"Name\", \"Mary Blackburn\":$str)"
 334             }
 335         },
 336         "anonymous_3": {
 337             "!anonymous": true,
 338             "!fields": [],
 339             "!name": "anonymous_3",
 340             "!superclasses": [
 341                 "Query"
 342             ],
 343             "Fields": {
 344                 "args": [
 345                     [
 346                         "ProductName",
 347                         null
 348                     ]
 349                 ],
 350                 "kind": "dag",
 351                 "operator": {
 352                     "def": "fields",
 353                     "kind": "def",
 354                     "printable": "fields"
 355                 },
 356                 "printable": "(fields \"ProductName\")"
 357             },
 358             "OrderedBy": [],
 359             "TableName": "Orders",
 360             "WhereClause": {
 361                 "args": [
 362                     [
 363                         "Amount",
 364                         null
 365                     ],
 366                     [
 367                         8,
 368                         null
 369                     ]
 370                 ],
 371                 "kind": "dag",
 372                 "operator": {
 373                     "def": "gt",
 374                     "kind": "def",
 375                     "printable": "gt"
 376                 },
 377                 "printable": "(gt \"Amount\", 8)"
 378             }
 379         },
 380         "anonymous_4": {
 381             "!anonymous": true,
 382             "!fields": [],
 383             "!name": "anonymous_4",
 384             "!superclasses": [
 385                 "Query"
 386             ],
 387             "Fields": {
 388                 "args": [
 389                     [
 390                         "ProductName",
 391                         "name"
 392                     ],
 393                     [
 394                         "Person",
 395                         null
 396                     ]
 397                 ],
 398                 "kind": "dag",
 399                 "operator": {
 400                     "def": "fields",
 401                     "kind": "def",
 402                     "printable": "fields"
 403                 },
 404                 "printable": "(fields \"ProductName\":$name, \"Person\")"
 405             },
 406             "OrderedBy": [
 407                 "$name"
 408             ],
 409             "TableName": "Orders",
 410             "WhereClause": {
 411                 "args": [
 412                     [
 413                         {
 414                             "args": [
 415                                 [
 416                                     "Amount",
 417                                     null
 418                                 ],
 419                                 [
 420                                     8,
 421                                     null
 422                                 ]
 423                             ],
 424                             "kind": "dag",
 425                             "operator": {
 426                                 "def": "gt",
 427                                 "kind": "def",
 428                                 "printable": "gt"
 429                             },
 430                             "printable": "(gt \"Amount\", 8)"
 431                         },
 432                         null
 433                     ],
 434                     [
 435                         {
 436                             "args": [
 437                                 [
 438                                     "Person",
 439                                     null
 440                                 ],
 441                                 [
 442                                     1,
 443                                     null
 444                                 ]
 445                             ],
 446                             "kind": "dag",
 447                             "operator": {
 448                                 "def": "ne",
 449                                 "kind": "def",
 450                                 "printable": "ne"
 451                             },
 452                             "printable": "(ne \"Person\", 1)"
 453                         },
 454                         null
 455                     ]
 456                 ],
 457                 "kind": "dag",
 458                 "operator": {
 459                     "def": "and",
 460                     "kind": "def",
 461                     "printable": "and"
 462                 },
 463                 "printable": "(and (gt \"Amount\", 8), (ne \"Person\", 1))"
 464             }
 465         },
 466         "eq": {
 467             "!anonymous": false,
 468             "!fields": [],
 469             "!name": "eq",
 470             "!superclasses": []
 471         },
 472         "fields": {
 473             "!anonymous": false,
 474             "!fields": [],
 475             "!name": "fields",
 476             "!superclasses": []
 477         },
 478         "ge": {
 479             "!anonymous": false,
 480             "!fields": [],
 481             "!name": "ge",
 482             "!superclasses": []
 483         },
 484         "gt": {
 485             "!anonymous": false,
 486             "!fields": [],
 487             "!name": "gt",
 488             "!superclasses": []
 489         },
 490         "ne": {
 491             "!anonymous": false,
 492             "!fields": [],
 493             "!name": "ne",
 494             "!superclasses": []
 495         },
 496         "none": {
 497             "!anonymous": false,
 498             "!fields": [],
 499             "!name": "none",
 500             "!superclasses": []
 501         },
 502         "or": {
 503             "!anonymous": false,
 504             "!fields": [],
 505             "!name": "or",
 506             "!superclasses": []
 507         }
 508     }
 509
 510
 511 The backend is going to walk the JSON we decoded. You can see the full output above in case you want to browse but for now don't read the whole thing. We will highlight the key aspects of it as we go along.
 512
 513
 514 ```python
 515 print(full_json["!instanceof"])
 516 ```
 517
 518     {'Query': ['anonymous_0', 'anonymous_1', 'anonymous_2', 'anonymous_3', 'anonymous_4']}
 519
 520
 521 Any key beginning with `!` is some sort of metadata about the other keys. Here this is a list of all instances of certain classes. We just have `Query` which lists all the queries we defined.
 522
 523
 524 ```python
 525 print(full_json["anonymous_0"]["!superclasses"])
 526 ```
 527
 528     ['Query']
 529
 530
 531 On each def there is also a `!superclasses` that gives you the same information. Meaning you could use `!instanceof` to get a list of keys to lookup, or you could walk all keys and check `!superclasses`.
 532
 533
 534 ```python
 535 print(full_json["anonymous_0"]["Fields"])
 536 ```
 537
 538     {'args': [], 'kind': 'dag', 'operator': {'def': 'all', 'kind': 'def', 'printable': 'all'}, 'printable': '(all)'}
 539
 540
 541 From a def object you can find its attributes. Here we have the fields we want the query to show, which is all of them.
 542
 543 # The Backend
 544
 545 The core of a backend is looping over all defs of a certain class and outputting some text based on their properties.
 546
 547 Here we're going to loop over all defs of type `Query` and emit SQL queries for them.
 548
 549
 550 ```python
 551 def find_all_queries(j):
 552     queries = []
 553     for key in j:
 554         # ! means it is some metadata, not a def.
 555         if not key.startswith("!"):
 556             value = full_json[key]
 557             # If we inherit from Query.
 558             if "Query" in value["!superclasses"]:
 559                 queries.append(value)
 560     return queries
 561
 562 queries = find_all_queries(full_json)
 563
 564 print([q["!name"] for q in queries])
 565 ```
 566
 567     ['anonymous_0', 'anonymous_1', 'anonymous_2', 'anonymous_3', 'anonymous_4']
 568
 569
 570 Why are the names `anonymous_...`? When we defined them we did `def :` and missed out the name. This is allowed and `llvm-tblgen` just came up with a name for us. For this purpose the names are irrelevant.
 571
 572 Now we have the relevant classes we need to "emit" them. Meaning produce something from them, in this case a SQL query.
 573
 574
 575 ```python
 576 def emit_operator(operator):
 577     return {
 578             'gt': ' > ',
 579             'ge': ' >= ',
 580             'lt': ' < ',
 581             'le': ' <= ',
 582             'ne': ' <> ',
 583             'eq': ' = ',
 584             'or': ' OR ',
 585             'and': ' AND '
 586             }[operator]
 587
 588 print(emit_operator('and'))
 589 ```
 590
 591      AND
 592
 593
 594 The maps our TableGen constants to the equivalent SQL logical operation.
 595
 596
 597 ```python
 598 def emit_fields(args):
 599     # Return a comma separated list of arg names.
 600     return ", ".join([arg[0] for arg in args])
 601
 602 print(emit_fields([["Abc", None], ["Def", None]]))
 603 ```
 604
 605     Abc, Def
 606
 607
 608 This emits the the fields we are selecting. Each field has a name (`arg[0]`) and an optional tag that we will use later.
 609
 610
 611 ```python
 612 from collections.abc import Mapping
 613
 614 def emit_where_clause(where_clause):
 615     output = ""
 616     num_args = len(where_clause["args"])
 617
 618     for idx, arg in enumerate(where_clause["args"]):
 619         arg_name, arg_type = arg
 620
 621         if isinstance(arg_name, Mapping):
 622             # This is a nested where clause.
 623             output += emit_where_clause(arg_name)
 624         else:
 625             # This is some condition.
 626             if arg_type == "str":
 627                 # String types must be emitted with "" around them.
 628                 output += '"' + arg_name + '"'
 629             else:
 630                 output += str(arg_name)
 631
 632         # If this is not the last arg, emit the condition.
 633         if idx != (num_args-1):
 634             output += emit_operator(where_clause["operator"]["def"])
 635
 636     return output
 637
 638 print(emit_where_clause({
 639 "args": [["Name",None],
 640         ["Mary Blackburn", "str"]],
 641 "kind": "dag",
 642 "operator": {
 643     "def": "eq",
 644     "kind": "def",
 645     "printable": "eq"
 646 }}))
 647 ```
 648
 649     Name = "Mary Blackburn"
 650
 651
 652 This emits the condition that goes with the `WHERE`. The condition is a DAG, which means that we will find a possible mix of conditions and other DAGS. We recurse to handle the latter case.
 653
 654 For each part of the condition we print the name of the thing you're checking, then the condition (`=`, `<>`, etc.). The value to check against is last and that goes on the end.
 655
 656
 657 ```python
 658 def emit_ordered_by(ordered_by, field_tag_map):
 659     # No ORDER BY statement to emit.
 660     if not ordered_by:
 661         return ""
 662
 663     output = "\n ORDER BY "
 664     num_ordered_by = len(ordered_by)
 665
 666     for idx, field_name in enumerate(ordered_by):
 667         # If it is a tag
 668         if field_name.startswith('$'):
 669             # Find the corresponding field name
 670             tag_name = field_name[1:]
 671             field_name = field_tag_map.get(tag_name)
 672             if field_name is None:
 673                 raise RuntimeError('Unrecognized tag "{}"'.format(
 674                     tag_name))
 675
 676         # Separate each tag after the first with ", ".
 677         if idx != 0:
 678             output += ", "
 679         output += field_name
 680
 681     return output
 682
 683 print(emit_ordered_by(["$abc", "$def"], {'abc':"ABC", 'def':"DEF"}))
 684 ```
 685
 686
 687      ORDER BY ABC, DEF
 688
 689
 690 `emit_ordered_by` produces the `ORDER BY` text. If there is no ordering return nothing, otherwise loop over all the fields we want to order by and emit their names.
 691
 692 If the name is a tag, we look that up in a map to get the real field name. Here's how we build that map:
 693
 694
 695 ```python
 696 def build_tag_map(arguments):
 697     # Args are [Name, Tag]. Reverse this so we have [Tag, Name].
 698     # Add each one to a dictionary where Tag is the key and Name is the value.
 699     return dict([reversed(a) for a in arguments])
 700
 701 print(build_tag_map([["ABC", "abc"], ["DEF", "def"]]))
 702 ```
 703
 704     {'abc': 'ABC', 'def': 'DEF'}
 705
 706
 707
 708 ```python
 709 def emit_query(q):
 710     fields_init = q["Fields"]
 711     field_op_name = fields_init["operator"]["def"]
 712     if not field_op_name in ["all", "fields"]:
 713         raise RuntimeError("Invalid dag operator " + field_op_name)
 714
 715     field_tag_map = build_tag_map(fields_init["args"])
 716
 717     where_clause = q["WhereClause"]
 718     has_where = where_clause["operator"]["def"] != "none"
 719
 720     ret = "SELECT "
 721     if field_op_name == "all":
 722         ret += "*"
 723     ret += emit_fields(fields_init["args"])
 724     ret += " FROM " + q["TableName"]
 725     if has_where:
 726         ret += "\n WHERE " + emit_where_clause(where_clause)
 727     ret += emit_ordered_by(q["OrderedBy"], field_tag_map)
 728     ret += ";"
 729
 730     return ret
 731 ```
 732
 733 Finally the main function. It emits the skeleton of the query and calls the helpers we defined earlier to fill in the gaps.
 734
 735 ## The Result
 736
 737
 738 ```python
 739 for q in queries:
 740     print(emit_query(q) + "\n")
 741 ```
 742
 743     SELECT * FROM Customer;
 744
 745     SELECT Person, Amount FROM Orders;
 746
 747     SELECT Affiliation FROM Customer
 748      WHERE Name = "Mary Blackburn";
 749
 750     SELECT ProductName FROM Orders
 751      WHERE Amount > 8;
 752
 753     SELECT ProductName, Person FROM Orders
 754      WHERE Amount > 8 AND Person <> 1
 755      ORDER BY ProductName;
 756
 757
 758
 759 Now we run `emit_query` and print out the results. There you have it, that's a TableGen backend!
 760
 761 You've seen the core concepts. Loop over all the defs of a certain class and then emit some other structure based on the fields of each one. In this case it was SQL queries. In LLVM it's most often C++ code but it can be anything you want.
 762
 763 If you want to see the same thing done with a C++ backend (one written in C++ that is, not producing it), check out the links at the start of this notebook.