2023/day15.eli5m4

   1 dnl -*- m4 -*-
   2 dnl This is my attempt to let you peer in on the internal workings of
   3 dnl my golfed O(n^4) m4 solution for day 15.
   4 dnl Name your input file "I" (as in the Roman numeral one), or pass
   5 dnl the "-DI=path/to/yourinput" command line argument to m4.
   6 dnl
   7 dnl First things first: m4 is a macro processing language, from 1977.
   8 dnl It is still a major player in building a Linux distro (any software
   9 dnl that provides a configure script from Autoconf or which describes
  10 dnl its own parse grammar using Bison used m4 under the hood on the
  11 dnl developer's machine), but few people code natively in m4 these days.
  12 dnl See https://pubs.opengroup.org/onlinepubs/9699919799/utilities/m4.html
  13 dnl for the POSIX requirements on what an m4 implementation must
  14 dnl provide; this particular solution does not require any extensions,
  15 dnl although I only tested with GNU m4 1.4.19.
  16 dnl
  17 dnl In m4, you can define any macro name to another blob of text, then call
  18 dnl name(param, param...), where the parameters will be substituted into
  19 dnl the $1, $2, ... placeholders in the substitution text; $@ expands into
  20 dnl all the parameters as a quoted list.  m4 also recognizes () grouping
  21 dnl (a comma separates parameters except when embedded in (), as well as
  22 dnl quoted strings `like this', which intentionally use mismatched delimiters
  23 dnl to allow nesting of quotes.  More on that below.  m4 does not include
  24 dnl any built-in looping operators; instead, you can build your own by
  25 dnl writing a macro whose expansion invokes the same macro, but with
  26 dnl altered arguments.  In this manner, m4 behaves as a Turing complete
  27 dnl language.
  28 dnl
  29 dnl I'm going to solve the problem with just a single macro definition,
  30 dnl although it will recursively invoke itself as many times as needed to
  31 dnl conditionally perform various sub-tasks used in computing the final
  32 dnl result.  It is also possible to use macros as a hashtable (mapping
  33 dnl arbitrary text blobs to a macro name for recall later, rather than
  34 dnl invoking those names as macros); but this solution does not use
  35 dnl variables.  Without variables, the only thing left is a functional
  36 dnl approach, rather than an imperative one.
  37 dnl
  38 dnl The dnl macro is special, in that it eats text until the next
  39 dnl newline.  My golfed solution does not use dnl, but this one does,
  40 dnl as my way of explaining the bits and pieces of my _ macro.  m4
  41 dnl also includes # comments, but those cannot be used as nicely inside
  42 dnl a macro definition (the advantage of dnl is that it is elided
  43 dnl completely; while a # comment remains behind and would cause the
  44 dnl expansion text to also contain the comment).  When building up a macro
  45 dnl definition, m4 concatenates adjacent quoted strings; and when collecting
  46 dnl arguments to a macro, m4 elides whitespace between the comma and the
  47 dnl first non-whitespace byte (although even the dnl macro counts as such
  48 dnl non-whitespace, despite expanding to nothing).  The golfed
  49 dnl version does not take advantage of joining `' strings, but this
  50 dnl version does in order to explain the pieces.  Thus, the following two
  51 dnl contructs are identical:
  52 dnl   define(`mymacro', `ifelse(cond1a,cond1b,`expansion1',
  53 dnl   cond2a,cond2b,`expansion2')')
  54 dnl and the version I use below:
  55 dnl   define(`mymacro', `ifelse('dnl
  56 dnl   dnl filler
  57 dnl   `cond1a, cond1b, `expansion1','dnl
  58 dnl   dnl more filler
  59 dnl   `cond2a, cond2b, `expansion2''dnl
  60 dnl   `)'
  61 dnl
  62 dnl Now it's time to dive into creating the definition for the one macro
  63 dnl that we will be using; to minimize my use of letters, I named it _.
  64 dnl
  65 define(_,
  66 dnl
  67 dnl We want this macro to conditionally dispatch to a number of various
  68 dnl subtasks, based on the first parameter.  The ifelse builtin in m4
  69 dnl splits its input into triplets; once it finds two strings that match,
  70 dnl the expansion of ifelse is the third argument.  All remaining parameters
  71 dnl are still parsed, and $@ parameters expanded during that parse, but
  72 dnl those clauses are ignored as long as they do not inject any unquoted
  73 dnl commas into the stream to change the number of parameters seen by
  74 dnl ifelse.  Because the _ macro uses so many triples in its lone ifelse,
  75 dnl we are making the m4 parser do a LOT of throwaway work; but such is
  76 dnl life when golfing a solution.
  77 dnl
  78 `ifelse('dnl
  79 dnl
  80 dnl For ease of exposition, I am going to comment each triple.  First up,
  81 dnl my shift wrapper.  As I mentioned above, a common idiom in m4 is to do
  82 dnl processing on the first (few) arguments of an arbitrary-length
  83 dnl parameter list, then recurse back into the same macro with the remaining
  84 dnl parameters (tail-recursion).  But duplicating the text shift($@)
  85 dnl everywhere takes up valuable golfing space, so I compress it to _(^$@).
  86 dnl Additionally, since some of my uses of _ take a (packed,list) as a single
  87 dnl argument, while others need to get at the individual elements of that
  88 dnl list as separate arguments, I use the idiom that every packed list starts
  89 dnl out as (^,), so that I can then use _$2 to get at the elements inside
  90 dnl the packed list passed in as parameter 2.
  91 dnl
  92 dnl The leading ^ is my one magic character, where it can be concatenated
  93 dnl with arbitrary other text.  That is, _(^,a,b...) and _(^junk,a,b...) both
  94 dnl expand to the sequence a,b....  All other calls to _() pass at most a
  95 dnl single character for the first parameter.  (Having more than one magic
  96 dnl parameter would require additional invocations of index; but unlike
  97 dnl macros in the quoted third part of a triple, macros in the first or
  98 dnl second position of a triple are unconditionally invoked, and must not
  99 dnl call back into _() as that would cause infinite recursion).
 100 dnl
 101 dnl Note that m4 shift-recursion is INHERENTLY O(n^2) - if you want to process
 102 dnl a list of n elements one at a time, you will tail-call the macro n times;
 103 dnl each of those n macro calls will contain an average of n/2 parameters.
 104 dnl This is on top of any algorithmic cost of what is being done in that
 105 dnl recursion.  Hence, when you combine m4's O(n^2) tail-recursion with my
 106 dnl O(n^2) list addition in _(^) below, you get O(n^4) algorithmic effects.
 107 dnl While the example input with 11 elements completes in less than 50ms,
 108 dnl the puzzle input with 4000 elements takes more than 26 minutes, more than
 109 dnl four magnitudes of order slower.
 110 dnl
 111 `index($1,^),0,`shift($@)','dnl
 112 dnl
 113 dnl I've got two more builtins that I need to use frequently.  Time to
 114 dnl write compressions for those, as well.  First is eval, for performing
 115 dnl signed 32-bit math (thankfully, this puzzle does not exceed that),
 116 dnl under the shortcut _($,expr)->eval(expr)->int:
 117 dnl
 118 `$1,$,`eval($2)','dnl
 119 dnl
 120 dnl Next, I need a way to split an input list element into single bytes for
 121 dnl the HASH algorithm.  In GNU m4, substr(string,1) behaves differently
 122 dnl than substr(string,1,) (the trailing blank causes a warning, and acts
 123 dnl like a zero-length slice, rather than the rest of the string), so here,
 124 dnl I need a packed argument _(~,(str,start)) or _(~,(str,start,end))
 125 dnl ->substr(str,start...)->str
 126 dnl
 127 `$1,~,`substr$2','dnl
 128 dnl
 129 dnl With builtin wrappers out of the way, I found myself frequently doing
 130 dnl so many shifts that I could still benefit from even more compression.
 131 dnl This is a shift-4 macro: _(@,1,2,3,4,5,6...)->5,6,...
 132 dnl There are 5 calls to _(^) because I also have to elide the @ in $1.
 133 dnl
 134 `$1,@,`_(^_(^_(^_(^_(^$@)))))','dnl
 135 dnl
 136 dnl The HASH algorithm takes the current hash value (always an integer) and
 137 dnl the ASCII value of the next character to hash (possibly empty, when
 138 dnl called to prime the hash and the first character is not yet known).
 139 dnl The output remains raw text for now; but will eventually be passed to
 140 dnl eval at later points in the expansion.
 141 dnl _(&,cur,val)->int_expr
 142 dnl This is one of the few triples with an unquoted third parameter, because
 143 dnl there are no harmful side effects even when this branch is not selected.
 144 dnl
 145 `$1,&,($3+$2)*17%256,'dnl
 146 dnl
 147 dnl Next up, my scoring helper.  _(?,cnt,cond)1 expands to cnt+1 if cond
 148 dnl is 0 (shown here), or to bare 1 if cond is nonzero (see below; I split
 149 dnl the two halves of this helper solely to get nicer placement of line
 150 dnl breaks in my golfed solution)...
 151 dnl
 152 `$1$3,?0,$2+,'dnl
 153 dnl
 154 dnl The actual scoring function.  Tail recursive: computes the score for
 155 dnl the first triple in the list, then recurses on the rest of the list.
 156 dnl The recursion ends when label1 is empty.  Note that the call to the
 157 dnl scoring helper for computing nextslot  is written so that it forms a
 158 dnl valid eval expression even when box2 is not available.
 159 dnl _(>,slot)-> end of recursion
 160 dnl _(>,slot,label1,box1,focal1,label2,box2,focal2...)->
 161 dnl   output "+slot*box*focal", tail-call _(>,nextslot,label2,box2,focal2...)
 162 dnl
 163 `$1$3,>,,'dnl
 164 `$1,>,`+($2)*(1+$4)*$5_(>,_(?,$2,_($,$4-$7+0))1,_(^_(@,$@)))','dnl
 165 dnl
 166 dnl List removal. Checking for one label to remove is O(n) on failure,
 167 dnl although on success it can shortcut by using the rest of the list
 168 dnl unchanged.  First, the two special cases: end of list, and name match
 169 dnl _(-,name)-> end of recursion
 170 dnl _(-,name,name,box,focal,name2,box2,focal2..)-> output "name2,box2,focal2.."
 171 dnl
 172 `$1$3,-,,'dnl
 173 `$1$2,-$3,`,_(^_(@,$@))','dnl
 174 dnl
 175 dnl ...and here is the rest of the score helper, mentioned above
 176 dnl
 177 `?,$1,,'dnl
 178 dnl
 179 dnl Back to the recursive workhorse of the removal function, on name mismatch:
 180 dnl _(-,name,name1,box1,focal1,name2,box2,focal2...)->
 181 dnl  output "name1,box1,focal1,", tail-call _(-,name,name2,box2,focal2...)
 182 dnl
 183 dnl The net result is that (^,_(-,name,_LIST)) produces an updated LIST with
 184 dnl the triple associated with name, if any, removed
 185 dnl
 186 `$1,-,`,$3,$4,$5_(-,$2,_(^_(@,$@)))','dnl
 187 dnl
 188 dnl Now a pair of mutually recursive helpers used for list addition.
 189 dnl Ultimately, we want (^,_(+,name,box,focal,_LIST)) to produce an updated
 190 dnl LIST with name amended in-place or added in the correct sort order (after
 191 dnl all other triples with the same box, but before any triple with a larger
 192 dnl box). To do this, I split up the output into two halves: the left half
 193 dnl produced by _(*,...) contains any list elements known to be correct, plus
 194 dnl the opening ( and possible first arguments of a final macro call, while
 195 dnl the right half produced by _(*,...) contains the tail of the LIST
 196 dnl unaffected by the left half, as well as the closing ) to perform that
 197 dnl final macro call.
 198 dnl
 199 dnl The general form _(*,cond+,name,box,focal,name2,box2,focal2) has three
 200 dnl subcases:
 201 dnl _(*,1+,name,box,focal,name2,box2,focal)->
 202 dnl   box sorts before box2, output ",name,box,focal,name2,box2,focal2" and
 203 dnl   prepare to tail-call _(^,name3,box3,focal3...)
 204 dnl _(*,0+,name,box,newfocal,name,box,oldfocal)->
 205 dnl   updating box in-place, output ",name,box,newfocal" and prepare to
 206 dnl   tail-call _(^,name3,box3,focal3...)
 207 dnl _(*,0+,name,box,focal,name1,box1,focal1)->
 208 dnl   no match yet, output ",name1,box1,focal1" and prepare to tail-call
 209 dnl   _(+,name,box,focal,name3,box3,focal3...)
 210 dnl
 211 `$1$2,*1+,`,$3,$4,$5,$6,$7,$8,_(^','dnl
 212 `$1$3,*$6,`,$3,$4,$5,_(^','dnl
 213 `$1,*,`,$6,$7,$8_(+,$3,$4,$5','dnl
 214 dnl
 215 dnl List addition.  Uses _(*) to generate earlier elements, then proceeds to
 216 dnl tail-call either _(^) or _(+) if there are more elements in list.
 217 dnl _(+,name,box,focal)-> end of recursion, output ",name,box,focal"
 218 dnl _(+,name,box,focal,name2,box2,focal2,name3,box3,focal3...)->
 219 dnl   provides output ",name3,box3,focal3...)" to couple with tail-call started
 220 dnl   in the _(*) helper.  We only need to shift away 7 elements, but since
 221 dnl   we only have shift-1 _(^) or shift-4 _(@), this creates an empty 8th
 222 dnl   element and uses shift-4 twice.
 223 dnl Like deletion, this is worst-case O(n) for one addition (the triple needs
 224 dnl to be added at the end), but short-circuits if it finds the right box or
 225 dnl updates a name earlier in the list, or O(n^2) for addition of n elements.
 226 dnl
 227 `$1$5,+,`,$2,$3,$4','dnl
 228 `$1,+,`_(*,_($,$3<$6)$@),_(@,_(@,,$@)))','dnl
 229 dnl
 230 dnl Element processing. Given an element label- or label=digit, we kick off
 231 dnl the HASH processing for that element.
 232 dnl _(!,prevhash,(^,LIST))-> end of recursion. Output "prevhash)" to finish
 233 dnl   the open-ended _($ that collected all element hashes for part 1, then
 234 dnl   output "_(>,1,name1,box1,focal1,..." to start an open-ended collection
 235 dnl   of all scores for part 2.
 236 dnl _(!,prevhash,(^,LIST),element1,element2...)->
 237 dnl   output "+prevhash" to the ongoing collection of part1, then tail-call
 238 dnl   _(!,_(%...),element2...) where _(%...) supplies both the hash and
 239 dnl   updated LIST based on element1
 240 `$1$4,!,`$2) _($,_(>,1,_$3)','dnl
 241 `$1,!,`$2_(!,+_(%,0,,,$4,$3),_(@,$@))','dnl
 242 dnl
 243 dnl HASH processing.  Updates the hash by the value of the current character
 244 dnl extracted from element.  Prime with _(%,0,,,(^,LIST),element), then
 245 dnl subsequent calls are _(%,value,prefix,curchar,tail).  Recursion will
 246 dnl always end when curchar is = (addition, tail is a single digit) or -
 247 dnl (deletion, tail is empty); this works even when elementtail includes a
 248 dnl trailing newline supplied with the final element of the input file.
 249 dnl _(%,box,name,-,,(^,LIST))-> end recursion: output the result of
 250 dnl   HASH(label-) for part 1, then ",(^,LIST)" with LIST modified to drop
 251 dnl   name using _(-)
 252 dnl _(%,box,name,=,focal,(^,LIST))-> end recursion: output the result of
 253 dnl   HASH(label=focal) for part 1, then ",(^,LIST)" with LIST modified to
 254 dnl   add or update label hashed to box with the value focal
 255 dnl _(%,partial,oldprefix,oldchar,oldtail)-> tail-call _(%,newvalue,newprefix,
 256 dnl   nextchar,newtail) where newvalue is determined by hashing in oldchar to
 257 dnl   partial, newprefix is oldprefix with curchar appended, nextchar is the
 258 dnl   first byte of oldtail, and newtail is everything else in oldtail
 259 dnl
 260 `$1$4,%-,`_(&,$2,45),(^_(-,$3,_$6))','dnl
 261 `$1$4,%=,`_(&,_(&,$2,61),$5+48),(^_(+,$3,$2,$5,_$6))','dnl
 262 `$1,%,`_(%,_(&,$2,_($4)),$3$4,_(~,($5,0,1)),_(~,($5,1)),$6)','dnl
 263 dnl
 264 dnl Special case for no input; useful for priming _(&) when the first
 265 dnl character is not yet known
 266 dnl
 267 `$1,,,'dnl
 268 dnl
 269 dnl Final catchall clause, used for mapping bytes to ASCII. This conditional
 270 dnl will only be reached if $1 is a letter, since _(%) takes care of =, -, and
 271 dnl digits.  Only 19 letters appear in the actual puzzles (no [aeiouwy]),
 272 dnl but the example puzzle includes a and o.  This could be made one byte
 273 dnl shorter as index(cdef...,$1)+99 and still get the right result for _(b),
 274 dnl but it would miscalculate the hash of _(a) in the example puzzle.
 275 dnl
 276 dnl It is also possible to avoid the second index(), by open-coding a series
 277 dnl of triples, such as $1,b,98,$1,c,99,...,$1,z,122, although that takes
 278 dnl more bytes and no longer golfs to less than 2 punchcards.
 279 dnl
 280 `index(bcdefghijklmnopqrstuvwxyz,$1)+98'dnl
 281 dnl
 282 dnl With that, the ifelse is complete, and our definition of _ is finished.
 283 dnl
 284 `)')dnl
 285 dnl
 286 dnl Now to put it all to work. Start an open-ended _($) that will collect
 287 dnl all the output for the part 1 HASH values; when the open-ended list of
 288 dnl elements provided by include(I) reaches the end of recursion, _(!) will
 289 dnl then close that macro, and open another _($) to collect the output of
 290 dnl the scoring function applied to the final value of the (^,LIST) parameter.
 291 dnl
 292 _($,_(!,,,include(I)))