1 divert(-1)dnl -*- m4 -*-
2 # Usage: m4 [-Dfile=day05.input] day05.m4
4 # Welcome to my ELI5 submission.
6 # I suspect you've never coded in m4 before, even though the language
7 # has been around since 1977. So this particular solution is
8 # extra-verbose compared to what I usually do.
10 # m4 is a macro processing language: you can define(`name', `value')
11 # to create a macro, which you later call by name(arg1, arg2...) which
12 # expands into `value' modified by inlining argN into $N substrings.
13 # $@ is shorthand for all the args at once. M4 uses asymmetric
14 # quoting `', so that quoting can nest. When a macro is expanded, the
15 # resulting text is scanned for further macros, unless the output was
16 # quoted. In this manner, it is possible to write recursive macros.
17 # When you see call(bare), you are asking to expand the 'bare' code
18 # prior to calling 'call' (and sometimes, bare can create more than
19 # one argument, if its expansion includes an unquoted comma); while
20 # call(`quoted') is asking to use the literal string, but no longer
21 # quoted (unless the use of $1 in the expansion was itself quoted).
22 # This lets you write both stack recursion (a macro does not produce
23 # output until helper macros called within its parameter list expand
24 # first) and tail recursion (a macro produces output which will then
25 # trigger another round of macro expansion at the same stack level).
27 # The only data structure in m4 is macros, which is basically a
28 # hashtable between names and values; but it allows you to create a
29 # stack of definitions sharing the same name. This comes in handy: I
30 # store the list of seeds in a variable stack name l0; each
31 # pushdef(`l0', seed) pushes another seed into the stack, and each
32 # popdef(`l0') retrieves the most-recently-pushed seed. This puzzle
33 # does not care about ordering (seeds can be processed in any order),
34 # so rather than trying to sort things or preserve input order, I just
35 # destructively consume a stack, meaning my list of seed ranges ends
36 # up reversing itself across iterations. This works because the
37 # target is a minimum result from the list, regardless of what
40 # That was a lot to cover at once; but hopefully it helps you
41 # understand the gist of the code below, I'll use the notation:
42 # macro(args) - a macro that runs for its side-effects with no output,
43 # such as altering the definition of another macro
44 # macro(args)->return - a macro called to produce output
45 # With that introduction, here we go!
47 # Section 1: general setup
48 # Boilerplate that helps me reuse framework code for all of my m4
49 # solutions. For example, m4 does NOT have a native loop operand;
50 # instead, it lets you write your own macros to achieve looping. I
51 # use forloop below, so this saves me the effort of open-coding it yet
52 # another time in this file.
53 include(`common.m4')ifelse(common(05), `ok', `',
54 `errprint(`Missing common initialization
57 # The input file uses unsigned 32-bit integers, but m4 eval() only
58 # does 32-bit signed math; input values overflow to negative. But
59 # rather than pull out arbitrary precision math64.m4 library that I
60 # wrote in previous years, I can skew all inputs into the signed
61 # range, then skew the answer back; it is extremely likely that the
62 # final answer will not be more than 2G.
65 # Slurp in the input, but remap it to avoid whitespace, since that
66 # messes up the eat macro below. The translit here remaps newline to
67 # `;', space to ` ', and discards letters, colon, and dash, since they
68 # don't affect anything we need to do later.
69 define(`input', translit(include(defn(`file')), nl` 'defn(`alpha'):-, `;.'))
71 # offset()->int - the separation between the seeds line and the maps data.
72 define(`offset', index(defn(`input'), `;'))
74 # skew(n)->`int' - adjust a number between signed and unsigned ranges.
75 # This is a bi-directional remap, thanks to the properties of
76 # twos-complement. Note that this macro produces quoted output -
77 # that's because most seed values will be 8 or more bytes (thanks to
78 # the skewing), and m4 is faster at parsing `12345678' than it is
79 # 12345678 (for a string, it can search ahead to the close quote; but
80 # for an integer, it is parsing one byte at a time while computing the
82 define(`skew', `dquote(eval(`$1-0x80000000'))')
84 # prep(n1, n2, ...) - parse of pairs of seed to populate pushdef
85 # stacks l0 (for part 1) and L0 (for part 2). Each stack element of
86 # [lL]N contains start,len,`l/L'. Here is an example of performing
87 # iterative work: you can pass any number of arguments, a single call
88 # of prep processes just the first two list elements then tail
89 # recurses to call $0 (that is shorthand for the current macro, prep)
90 # with a shorter input list (shifting away the first two arguments),
91 # and it uses ifelse() to end recursion when the input list is empty.
93 # M4's main conditional operation is ifelse, which works 3 arguments
94 # at a time: if the first two arguments expand to the same text, then
95 # unquote the third argument as the expansion of ifelse, otherwise
96 # proceed to the next 3 arguments. If no conditions ever match, a
97 # fourth argument provides the final default expansion. This type of
98 # recursion is inherently quadratic in m4 (when passing in N arguments
99 # to the original call, you end up making N calls that process an
100 # average of N/2 arguments); thankfully, the input file list of seeds
101 # is only 20 elements long.
102 define(`prep', `ifelse(`$1', `', `', `pushdef(`l0', skew(`$1')`,1,`l'')pushdef(
103 `l0', skew(`$2')`,1,`l'')pushdef(`L0', skew(`$1')`,`$2',`L'')$0(shift(
105 prep(shift(translit(substr(defn(`input'), 0, offset), `.', `,')))
107 # Section 2: map parsing. with the macos defined in this section,
108 # expanding eat will define p1 through pN, with the macro p storing
109 # the current map (it so happens that both the example and input
110 # puzzles each have 7 maps, but this way I can handle an arbitrary
113 define(`maps', substr(defn(`input'), incr(incr(offset)));)
115 # append(N, text) - append text to the current definition of pN.
116 # Yes, this is a case of writing one macro which in turn rewrites
117 # another macro. Part of the power of m4 comes from dynamically
118 # writing macros whose content depends on runtime input.
119 define(`append', `define(`p$1', defn(`p$1')`$2')')
121 # emit(start, len, `l/L', N) - store an entry into [lL]N
122 define(`emit', `pushdef(`$3$4', `$1,$2,`$3'')')
124 # short-circuit. Once pN has called emit(), all remaining code in the
125 # expansion of pN has nothing to do. These gracefully skip that work.
126 define(`xrange', `x')define(`xpushdef')
128 # range(curstart, curlen, `l/L', p, dest, orig, len) - the real
129 # workhorse of this code. Checks a given seed range against one line
130 # of a mapping, possibly splitting the range in 2 if it exceeds
131 # bounds. The code is careful to avoid signed integer overflow (note
132 # that eval(a<b+c) is NOT the same as eval(a=b+c-1) when b+c wraps
133 # around to INT_MIN). Also, GNU m4 complains about 1--1 (it
134 # recognizes that C has a -- operator even though m4 doesn't, and
135 # complains rather than using my preferred parse of 1-(-1)), so I have
138 # More specifically, this code handles the following cases:
139 # No overlap: expands to nothing, proceeds to next term in pN chain
141 # v--------------v---^-------------^--v---------------v
142 # curstart.+curlen curstart..+curlen
144 # Completely contained: calls emit(dest, len, `l/L', N) to remap the
145 # input into the correct spot for the next iteration, and outputs x to
146 # short-circuit the rest of the pN chain.
147 # orig..........orig+len
148 # ^---v-------------v--^
151 # Start contained: calls both emit(dest, low, `l/L', N) and
152 # pN(orig+len, high, `l/L', N) to recurse to see if the rest of the
153 # input range matches anywhere else in the map. Outputs x.
154 # orig.......orig+len
155 # ^--v--------------|-------v
156 # curstart....+low...+high (where low+high=curlen)
158 # End contained: calls both pN(curstart, low, `l/L', N) and
159 # pN(curstart+low, len-low, `l/L, N), then outputs x. Works
160 # whether curstart+curlen occurs before or after orig+len.
161 # orig.......orig+len
162 # v------------|-------v---------^
163 # curstart..+low...+high (where low+high=curlen)
164 define(`_range', `ifelse(eval(`$1<$3'), 1, `$1p$5(eval(`$2+$1'), eval(`$3- $1'),
165 `$4', $5)', `$3'), `$4', $5')
166 define(`range', `ifelse(eval(`$1>=$6 && $1<=$6+$7-1'), 1, `emit(eval(
167 `$1+$5- $6'),_$0(eval(`$7- $1+$6'), $@))x', eval(`$1<$6 && $1+$2-1>=$6'),
168 1, `p$4($1, eval(`$6- $1'), `$3', $4)p$4($6, eval($2+$1- $6), `$3', $4)x')')
170 # term(@, N, deststart, origstart, len) - appends `range($@, int, int,
171 # int)' to pN. Note that term() requires a literal @ for its first
172 # parameter which is paired with $$1 in the macro value; that is the
173 # only way in m4 to produce a literal $@ in the expansion text which
174 # will itself be assigned to another macro definition (we want $@
175 # expanded into the arguments at the time pN() is called, and not at
176 # the time term() is called). On the other hand, this definition
177 # intentionally stops quoting around the skew() calls - this is a case
178 # where we want the numbers parsed off the line of text to be munged
179 # up front and a single integer stored in the pN macro, rather than
180 # each call to pN calling skew(N) for duplicated work. Knowing when
181 # to quote for late expansion, vs. intentionally underquoting for
182 # early expansion, is the mark of a true m4 expert. But thankfully
183 # you are just reading this code, rather than writing it, so hopefully
184 # you get the gist even if you don't understand the nuance.
185 define(`term', `append($2, `range($$1, 'skew(`$3')`, 'skew(`$4')`, `$5')')')
187 # do(line, p) - process another line of maps while building up pN
188 # macros. When a range of lines is complete, pN will be defined as
189 # range(...)range(...)pushdef(...)
190 define(`do', `ifelse(`$1', `.', `define(`p', incr($2))', `$1', `', `append($2,
191 defn(`emit'))', `term(`@', $2, translit(`$1', `.', `,'))')')
193 # eat(data) - My parsing workhorse. I copy this segment from puzzle to
194 # puzzle, but fine-tune it to the specific data at hand. In this
195 # case, after the translit done in creating input above, a given map
197 # .;50.98.2;52.50.48;;
198 # This splits on each ; to pass a line to do(); the first variant is
199 # O(N) but relies on regex via patsubst, a GNU extension; the second
200 # variant is O(N log N) by dividing the problem into smaller halves
201 # until each half is below 70 characters (roughly double the max line
202 # length of the input file, excluding the seed line) before resorting
203 # to actual hunting for the line delimiter of semicolon.
205 define(`eat', `patsubst(`$1', `\([^;]*\);', `do(`\1', p)')')
207 define(`_chew', `do(substr(`$1', 0, index(`$1', `;')), p)define(
208 `tail', substr(`$1', incr(index(`$1', `;'))))ifelse(index(defn(`tail'),
209 `;'), -1, `', `$0(defn(`tail'))')')
210 define(`chew', `ifelse(eval($1 < 70), 1, `_$0(`$2')', `$0(eval($1/2),
211 substr(`$2', 0, eval($1/2)))$0(eval(len(defn(`tail')) + $1 - $1/2),
212 defn(`tail')substr(`$2', eval($1/2)))')')
213 define(`eat', `chew(len(`$1'), `$1')')
217 # Section 3: computation
218 # Now that all the data is parsed, we need to execute it.
219 # run(`l/L', N) - calls pN() on each element in the [lL]N-1 stack.
221 # Contrast this to the recursion done in prep() above - even though
222 # both macros expand to $0(...$@) except when the end of recursion is
223 # detected, this one is O(N) rather than O(n^2) because it only parses
224 # 2 arguments on each of N iterations, rather than an average of N/2
225 # arguments. ifdef is m4's other conditional operator; the second
226 # argument is only expanded when the first names a defined macro.
227 define(`_run', `ifdef(`$1', `p$2($1, $2)popdef(`$1')$0($@)')')
228 define(`run', `_$0(`$1'decr($2), $2)')
230 # min(a, b)->int - m4's version of a very common idiom
231 define(`min', `ifelse(eval(`$1 < $2'), 1, `$1', `$2')')
233 # _minstack(`[lL]N', best, ignored)->int
234 # minstack(`[lL]N')->int - determines the minimum value in the [lL]N
235 # stack. Very often in m4 code, you'll see a friendly public macro
236 # which then calls out to a companion helper macro to do actual
237 # recursion with an additional accumulator parameter.
238 define(`_minstack', `ifdef(`$1', `$0(`$1', min($2, $1)popdef(`$1'))', `$2')')
239 define(`minstack', `skew(_$0(`$1', $1`'popdef(`$1')))')
241 # Time to run it all! forloop() is one of my own macros from
242 # common.m4 (if you want to study that); calling forloop(start, end,
243 # `prefix', `suffix') expands to "prefixNsuffix" for each N in the
244 # closed range [start-end]; in this case, "run(`l', `1')" through
246 define(`part1', forloop(1, p, `run(`l', ', `)')minstack(`l'p))
247 define(`part2', forloop(1, p, `run(`L', ', `)')minstack(`L'p))
249 # Now display the results.
251 # I hope this tutorial was helpful. Thanks for reading through this.
252 # On my machine, if I stick the input in a file named day05.input,
253 # then "m4 day05.m4" spits out answers for both parts of the question
254 # in about 50ms, using GNU m4 1.4.19. You can use "m4
255 # -Dfile=othername day05.m4" if your input lives elsewhere. The code
256 # should work with any POSIX implementation of m4:
257 # https://pubs.opengroup.org/onlinepubs/9699919799/utilities/m4.html
258 # although I haven't tested it on BSD m4, so it may fail if I
259 # triggered a subtle implementation difference there. With GNU m4,
260 # you can use "m4 -G day05.m4" to skip all GNU extensions and force
261 # execution of the O(N log N) parse; execution time was not noticeably
262 # different, so I'm spending more time in run() than in eat().