3 Bytecode (BC, also *P-code*, "portable code") is a type of [binary](binary.md) format for executable programs usually intended to be [interpreted](interpreter.md) or to serve as an [intermediate representation](intermediate_representation.md) in [compilers](compiler.md) (i.e. meant to be translated to some other language); it is quite similar to [machine code](machine_code.md), however machine code is meant to be directly run by some physical [hardware](hardware.md) while bytecode is more of a [virtual](virtual.md), machine independent code preferring things like [portability](portability.md), speed of interpretation, retaining meta information or being easy to translate.
9 Let's consider a simple algorithm that tests the [Collatz conjecture](collatz_conjecture.md) (which says that applying a simple operation from any starting number over and over will always lead to number 1). The program reads a number (one digit for simplicity) and then prints the sequence until reaching the final number 1. The algorithm in [C](c.md) would look as follows:
17 return n % 2 ? // is odd?
24 int n = getchar() - '0'; // read input ASCII digit
40 C will be normally compiled to [machine code](machine_code.md), however we can take a look at some immediate representation bytecode that compilers internally use to generate the machine code. The following is [LLVM](llvm.md), a widely used bytecode that can be produced from the above C code with [clang](clang.md) compiler (e.g. as `clang -cc1 tmp.c -S -emit-llvm -o -`):
43 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
44 target triple = "x86_64-pc-linux-gnu"
46 @.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1
48 ; Function Attrs: noinline nounwind optnone
49 define i32 @next(i32 %n) #0 {
51 %n.addr = alloca i32, align 4
52 store i32 %n, i32* %n.addr, align 4
53 %0 = load i32, i32* %n.addr, align 4
55 %tobool = icmp ne i32 %rem, 0
56 br i1 %tobool, label %cond.true, label %cond.false
58 cond.true: ; preds = %entry
59 %1 = load i32, i32* %n.addr, align 4
60 %mul = mul nsw i32 3, %1
61 %add = add nsw i32 %mul, 1
64 cond.false: ; preds = %entry
65 %2 = load i32, i32* %n.addr, align 4
69 cond.end: ; preds = %cond.false, %cond.true
70 %cond = phi i32 [ %add, %cond.true ], [ %div, %cond.false ]
74 ; Function Attrs: noinline nounwind optnone
75 define i32 @main() #0 {
77 %retval = alloca i32, align 4
78 %n = alloca i32, align 4
79 store i32 0, i32* %retval, align 4
80 %call = call i32 (...) @getchar()
81 %sub = sub nsw i32 %call, 48
82 store i32 %sub, i32* %n, align 4
85 while.body: ; preds = %entry, %if.end
86 %0 = load i32, i32* %n, align 4
87 %call1 = call i32 (i8*, ...) @printf(i8* ... )
88 %1 = load i32, i32* %n, align 4
89 %cmp = icmp eq i32 %1, 1
90 br i1 %cmp, label %if.then, label %if.end
92 if.then: ; preds = %while.body
95 if.end: ; preds = %while.body
96 %2 = load i32, i32* %n, align 4
97 %call2 = call i32 @next(i32 %2)
98 store i32 %call2, i32* %n, align 4
101 while.end: ; preds = %if.then
105 declare i32 @getchar(...) #1
107 declare i32 @printf(i8*, ...) #1
109 attributes #0 = { ... }
110 attributes #1 = { ... }
112 !llvm.module.flags = !{!0}
115 !0 = !{i32 1, !"wchar_size", i32 4}
116 !1 = !{!"clang version 7.0.1-8+deb10u2 (tags/RELEASE_701/final)"}
119 TODO: analyze the above
121 Now let's rewrite the same algorithm in [comun](comun.md), a different language which will allow us to produce another kind of bytecode (obtained with `comun -T program.cmn`):
134 <- # read input ASCII digit
135 "0" - # convert it to number
151 Here is annotated comun bytecode this compiles to:
154 000000: DES 00 0111 # func \ next:
155 000001: JMA 00 0100... # 20 (#14) |
156 000002: COC 00 0001 |
157 000003: MGE 00 0000 | $0
158 000004: CON' 00 0010 # 2 (#2) | 2
159 000005: MOX 00 0000 | %
160 000006: DES 00 0001 # if | \ ?
161 000007: JNA 00 0000... # 16 (#10) | |
162 000008: COC 00 0001 | |
163 000009: CON' 00 0011 # 3 (#3) | | 3
164 00000a: MUX 00 0000 | | *
165 00000b: CON' 00 0001 # 1 (#1) | | 1
166 00000c: ADX 00 0000 | | +
167 00000d: DES 00 0010 # else | < ;
168 00000e: JMA 00 0011... # 19 (#13) | |
169 00000f: COC 00 0001 | |
170 000010: CON' 00 0010 # 2 (#2) | | 2
171 000011: DIX 00 0000 | | /
172 000012: DES 00 0011 # end if | / .
173 000013: RET 00 0000 / .
175 000015: INP 00 0000 <-
176 000016: CON' 00 0000... # 48 (#30) "0"
178 000018: SUX 00 0000 -
179 000019: DES 00 0100 # loop \ @@
180 00001a: MGE 00 0000 | $0
181 00001b: CON' 00 1010 # 10 (#a) | 10
182 00001c: DIX 00 0000 | /
183 00001d: CON' 00 0000... # 48 (#30) | "0"
184 00001e: COC 00 0011 |
185 00001f: ADX 00 0000 | +
186 000020: OUT 00 0000 | ->
187 000021: MGE 00 0000 | $0
188 000022: CON' 00 1010 # 10 (#a) | 10
189 000023: MOX 00 0000 | %
190 000024: CON' 00 0000... # 48 (#30) | "0"
191 000025: COC 00 0011 |
192 000026: ADX 00 0000 | +
193 000027: OUT 00 0000 | ->
194 000028: CON' 00 1010 # 10 (#a) | 10
195 000029: OUT 00 0000 | ->
196 00002a: MGE 00 0000 | $0
197 00002b: CON' 00 0001 # 1 (#1) | 1
198 00002c: EQX 00 0000 | =
199 00002d: DES 00 0001 # if | \ ?
200 00002e: JNA 00 0100... # 52 (#34) | |
201 00002f: COC 00 0011 | |
202 000030: DES 00 0101 # break | | !@
203 000031: JMA 00 1000... # 56 (#38) | |
204 000032: COC 00 0011 | |
205 000033: DES 00 0011 # end if | / .
206 000034: CAL 00 0011 # 3 (#3) | next
207 000035: DES 00 0110 # end loop / .
208 000036: JMA 00 1010... # 26 (#1a)
213 TODO: analyze the above, show other bytecodes (python, java, ...)
215 Let's try the same in [Python](python.md). The code we'll examine will look like this:
221 return 3 * n + 1 if n % 2 != 0 else n / 2
223 n = ord(raw_input()[0]) - ord('0')
234 And the bytecode we get (e.g. with `python -m dis program.py`):
237 3 0 LOAD_CONST 0 (<code object next at ...)
239 6 STORE_NAME 0 (next)
241 6 9 LOAD_NAME 1 (ord)
242 12 LOAD_NAME 2 (raw_input)
248 28 LOAD_CONST 2 ('0')
253 8 38 SETUP_LOOP 43 (to 84)
254 >> 41 LOAD_NAME 4 (True)
255 44 POP_JUMP_IF_FALSE 83
261 11 52 LOAD_NAME 3 (n)
264 61 POP_JUMP_IF_FALSE 68
267 65 JUMP_FORWARD 0 (to 68)
269 14 >> 68 LOAD_NAME 0 (next)
275 >> 84 LOAD_CONST 4 (None)
279 TODO: make sense of it and analyze it