linux/subx_debugging.md

   1 ## A few hints for debugging SubX programs
   2
   3 Writing programs in SubX is surprisingly pleasant and addictive. Reading
   4 programs is a work in progress, and hopefully the extensive unit tests help.
   5 However, _debugging_ programs is where one really faces up to the low-level
   6 nature of SubX. Even the smallest modifications need testing to make sure they
   7 work. In my experience, there is no modification so small that I get it working
   8 on the first attempt. And when it doesn't work, there are no clear error
   9 messages. Machine code is too simple-minded for that. You can't use a debugger,
  10 since SubX's simplistic ELF binaries contain no debugging information. So
  11 debugging requires returning to basics and practicing with a new, more
  12 rudimentary but hopefully still workable toolkit:
  13
  14 - Start by nailing down a concrete set of steps for reproducibly obtaining the
  15   error or erroneous behavior.
  16
  17 - If possible, turn the steps into a failing test. It's not always possible,
  18   but SubX's primary goal is to keep improving the variety of tests one can
  19   write.
  20
  21 - Start running the single failing test alone. This involves modifying the top
  22   of the program (or the final `.subx` file passed in to `bootstrap translate`) by
  23   replacing the call to `run-tests` with a call to the appropriate `test-`
  24   function.
  25
  26 - Generate a trace for the failing test while running your program in emulated
  27   mode (`bootstrap run`):
  28
  29   ```
  30   $ cd linux
  31   $ ./translate_subx_debug file1.subx file2.subx ...  # generating a.elf
  32   $ bootstrap/bootstrap --trace run a.elf arg1 arg2
  33   saving trace to 'last_run'
  34   ```
  35
  36   The ability to generate a trace is the essential reason for the existence of
  37   `bootstrap run` mode. It gives far better visibility into program internals than
  38   running natively.
  39
  40   Here's a sample of the contents of `last_run`, with a few boxes highlighted:
  41
  42   <img alt='trace example' src='html/trace.png'>
  43
  44   Each of the green boxes shows the trace emitted for a single instruction.
  45   It starts with a line of the form `run: inst: ___` followed by the opcode
  46   for the instruction, the state of registers before the instruction executes,
  47   and various other facts deduced during execution. Some instructions first
  48   print a matching label. In the above screenshot, the red boxes show that
  49   address `0x0900005e` maps to label `$loop` and presumably marks the start of
  50   some loop. Function names get similar `run: == label` lines.
  51
  52 - One quick trick when scanning a trace for the first time:
  53
  54   ```
  55   $ grep label last_run
  56   ```
  57
  58   This is useful for quickly showing you the control flow for the run, and the
  59   function executing when the error occurred. I find it useful to start with
  60   this information, only looking at the complete trace after I've gotten
  61   oriented on the control flow. Did it get to the loop I just modified? How
  62   many times did it go through the loop?
  63
  64 - Once you have SubX displaying labels in traces, it's a short step to modify
  65   the program to insert more labels just to gain more insight. For example,
  66   consider the following function:
  67
  68   <img alt='control example -- before' src='html/control0.png'>
  69
  70   This function contains a series of jump instructions. If a trace shows
  71   `is-hex-lowercase-byte?` being encountered, and then `$is-hex-lowercase-byte?:end`
  72   being encountered, it's still ambiguous what happened. Did we hit an early
  73   exit, or did we execute all the way through? To clarify this, add temporary
  74   labels after each jump:
  75
  76   <img alt='control example -- after' src='html/control1.png'>
  77
  78   Now the trace should have a lot more detail on which of these labels was
  79   reached, and precisely when the exit was taken.
  80
  81 - If you find yourself wondering, "when did the contents of this memory
  82   address change?", `bootstrap run` has some rudimentary support for _watch
  83   points_. Just insert a label starting with `$watch-` before an instruction
  84   that writes to the address, and its value will start getting dumped to the
  85   trace after every instruction thereafter.
  86
  87 - Once we have a sense for precisely which instructions we want to look at,
  88   it's time to look at the trace as a whole. Key is the state of registers
  89   before each instruction. If a function is receiving bad arguments it becomes
  90   natural to inspect what values were pushed on the stack before calling it,
  91   tracing back further from there, and so on.
  92
  93   I occasionally want to see the precise state of the stack segment, in which
  94   case I uncomment a commented-out call to `dump_stack()` in the `vm.cc`
  95   layer. It makes the trace a lot more verbose and a lot less dense, necessitating
  96   a lot more scrolling around, so I keep it turned off most of the time.
  97
  98 - If the trace seems overwhelming, try [browsing it](https://github.com/akkartik/mu/blob/master/tools/browse_trace.readme.md)
  99   in the 'time-travel debugger'.
 100
 101 - Don't be afraid to slice and dice the trace using Unix tools. For example,
 102   say you have a SubX binary that dies while running tests. You can see what
 103   test it's segfaulting at by compiling it with debug information using
 104   `./translate_subx_debug`, and then running:
 105
 106   ```
 107   grep 'label test-' |tail
 108   ```
 109
 110   Just read out the last test printed out before the segfault.
 111
 112   Even outside of tests, I can often quickly debug an error just by scanning
 113   the end of a trace for labels:
 114
 115   ```
 116   $ grep label last_run |tail
 117   ```
 118
 119   Knowing _where_ the error occurred is often enough to put me on the right
 120   track to debugging an error.
 121
 122 Hopefully these hints are enough to get you started. The main thing to
 123 remember is to not be afraid of modifying the sources. A good debugging
 124 session gets into a nice rhythm of generating a trace, staring at it for a
 125 while, modifying the sources, regenerating the trace, and so on. Email
 126 [me](mailto:mu@akkartik.com) if you'd like another pair of eyes to stare at a
 127 trace, or if you have questions or complaints.