check that illegal insns on all targets don't cause the _toIR.c's to
assert.  [DONE: amd64 x86 ppc32 ppc64 arm s390]

check also with --vex-guest-chase-cond=yes

check that all targets can run their insn set tests with
--vex-guest-max-insns=1.

all targets: run some tests using --profile-flags=... to exercise
function patchProfInc_<arch>  [DONE: amd64 x86 ppc32 ppc64 arm s390]

figure out if there is a way to write a test program that checks
that event checks are actually getting triggered
host_arm_isel.c and host_arm_defs.c: get rid of global var arm_hwcaps.

host_x86_defs.c, host_amd64_defs.c: return proper VexInvalRange
records from the patchers, instead of {0,0}, so that transparent
self hosting works properly.

host_ppc_defs.h: is RdWrLR still needed?  If not, delete.
Comments that used to be in m_scheduler.c:

- extensive spinrounds
- with sched quantum = 1 -- check that handle_noredir_jump
  doesn't return with INNER_COUNTERZERO
- out of date comment w.r.t. bit 0 set in libvex_trc_values.h
- can VG_TRC_BORING still happen?  if not, rm
- memory leaks in m_transtab (InEdgeArr/OutEdgeArr leaking?)
- move do_cacheflush out of m_transtab
- more economical unchaining when nuking an entire sector
- ditto w.r.t. cache flushes
- verify case of 2 paths from A to B
- check -- is IP_AT_SYSCALL still right?
ppc: chain_XDirect: generate short form jumps when possible

ppc64: immediate generation is terrible .. should be able
to do better

arm codegen: Generate ORRS for CmpwNEZ32(Or32(x,y))
all targets: when nuking an entire sector, don't bother to undo the
patching for any translations within the sector (nor with their
in/out edge bookkeeping), since the whole sector is being discarded
anyway.
(somewhat implausible) for jumps to disp_cp_indir, have multiple
copies of disp_cp_indir, one for each of the possible registers that
could have held the target guest address before jumping to the stub.
Then disp_cp_indir wouldn't have to reload it from memory each time.
Might also have the effect of spreading out the indirect mispredict
burden somewhat (across the multiple copies).
T-chaining changes -- summary
* The code generators (host_blah_isel.c, host_blah_defs.[ch]) interact
  more closely with Valgrind than before.  In particular the
  instruction selectors must use one of 3 different kinds of
  control-transfer instructions: XDirect, XIndir and XAssisted.
  All archs must use these in the same way; there are no more ad-hoc
  control-transfer instructions.
* With T-chaining, translations can jump between each other without
  going through the dispatcher loop every time.  This means that the
  event check (counter decrement, and exit if negative) that the
  dispatcher loop previously performed now needs to be compiled into
  each translation.
* The assembly dispatcher code (dispatch-arch-os.S) is still
  present.  It still provides table lookup services for
  indirect branches, but it also provides a new feature:
  dispatch points, to which the generated code jumps.  There
  are five of them:

  VG_(disp_cp_chain_me_to_slowEP):
  VG_(disp_cp_chain_me_to_fastEP):
     These are chain-me requests, used for Boring conditional and
     unconditional jumps to destinations known at JIT time.  The
     generated code calls these (doesn't jump to them) and the
     stub recovers the return address.  These calls never return;
     instead the call is done so that the stub knows where the
     calling point is.  It needs to know this so it can patch
     the calling point to the requested destination.  (The
     return-address trick is illustrated in the sketch after this
     list.)

  VG_(disp_cp_xindir):
     Old-style table lookup and go; used for indirect jumps.

  VG_(disp_cp_xassisted):
     Most general and slowest kind.  Can transfer to anywhere, but
     first returns to the scheduler to do some other event (eg a
     syscall).

  VG_(disp_cp_evcheck_fail):
     Code jumps here when the event check fails.
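  The "call, don't jump" convention is what lets the chain-me stubs
  find the patch site.  A minimal, compilable illustration of the idea
  (not Valgrind code; unlike the real stubs, this one returns -- it
  only shows that the return address left by the call identifies the
  end of the calling sequence, which the real stubs hand to the
  scheduler so chainXDirect_<arch> can patch it):

     #include <stdio.h>

     /* Stand-in for a chain-me stub: the return address pushed by the
        CALL tells us where the patchable XDirect sequence ends.  The
        real stubs never return; they pass this address on instead. */
     __attribute__((noinline))
     static void chain_me_stub_sketch(void)
     {
        void* end_of_call_site = __builtin_return_address(0);
        printf("patchable call sequence ends at %p\n", end_of_call_site);
     }

     int main(void)
     {
        chain_me_stub_sketch();  /* generated code would CALL this */
        return 0;
     }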
* New instructions in backends: XDirect, XIndir and XAssisted.
  XDirect is used for chainable jumps.  It is compiled into a
  call to VG_(disp_cp_chain_me_to_slowEP) or
  VG_(disp_cp_chain_me_to_fastEP).

  XIndir is used for indirect jumps.  It is compiled into a jump
  to VG_(disp_cp_xindir).

  XAssisted is used for "assisted" (do something first, then jump)
  transfers.  It is compiled into a jump to VG_(disp_cp_xassisted).

  All 3 of these may be conditional.

  More complexity: in some circumstances (no-redir translations)
  all transfers must be done with XAssisted.  In such cases the
  instruction selector will be told this.
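  As a rough sketch of the rule the selectors now follow (the enum and
  the emit_* helpers are hypothetical, not the real backend API; they
  only stand in for emitting the corresponding X{Direct,Indir,Assisted}
  instruction):

     #include <stdio.h>

     typedef enum { Xfer_Boring, Xfer_Indirect, Xfer_Assisted } XferKind;

     static void emit_XDirect(void)
        { printf("XDirect   -> call chain_me_to_{slow,fast}EP\n"); }
     static void emit_XIndir(void)
        { printf("XIndir    -> jmp  disp_cp_xindir\n"); }
     static void emit_XAssisted(void)
        { printf("XAssisted -> jmp  disp_cp_xassisted\n"); }

     static void lower_block_end(XferKind kind, int is_noredir)
     {
        /* no-redir translations must route every transfer through
           XAssisted */
        if (is_noredir || kind == Xfer_Assisted) {
           emit_XAssisted();
           return;
        }
        if (kind == Xfer_Boring)
           emit_XDirect();  /* destination known at JIT time: chainable */
        else
           emit_XIndir();   /* destination computed at run time */
     }

     int main(void)
     {
        lower_block_end(Xfer_Boring,   0);
        lower_block_end(Xfer_Indirect, 0);
        lower_block_end(Xfer_Boring,   1);  /* forced to XAssisted */
        return 0;
     }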
* Patching: XDirect is compiled basically into
     %r11 = &VG_(disp_cp_chain_me_to_{slow,fast}EP)
     call *%r11
  Backends must provide a function (eg) chainXDirect_AMD64
  which converts it into a jump to a specified destination: either
     jmp <32-bit pc-relative displacement>
  or
     %r11 = 64-bit immediate
     jmp *%r11
  depending on branch distance.

  Backends must provide a function (eg) unchainXDirect_AMD64
  which restores the original call-to-the-stub version.
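  For concreteness, a compilable sketch of the amd64 chaining step
  follows.  It is illustrative only, not the real chainXDirect_AMD64:
  the struct is a stand-in for VexInvalRange, and the short-form rel32
  case is omitted.

     #include <stdint.h>
     #include <string.h>

     /* Unchained form emitted for XDirect (13 bytes):
           49 BB <8-byte stub addr>   movabsq $chain_me_stub, %r11
           41 FF D3                   call    *%r11
        Chaining rewrites it in place to:
           49 BB <8-byte dest addr>   movabsq $destination, %r11
           41 FF E3                   jmp     *%r11                  */

     typedef struct { uintptr_t start; uintptr_t len; } InvalRangeSketch;

     static InvalRangeSketch chain_xdirect_sketch(uint8_t* place_to_chain,
                                                  uint64_t place_to_jump_to)
     {
        /* overwrite the movabs immediate with the new destination */
        memcpy(place_to_chain + 2, &place_to_jump_to, 8);
        /* flip call *%r11 (FF D3) into jmp *%r11 (FF E3) */
        place_to_chain[12] = 0xE3;
        InvalRangeSketch vir = { (uintptr_t)place_to_chain, 13 };
        return vir;  /* caller must invalidate this icache range */
     }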
* Event checks.  Each translation now has two entry points,
  the slow one (slowEP) and fast one (fastEP).  Like this:

     slowEP:
        counter--
        if (counter < 0) goto VG_(disp_cp_evcheck_fail)
     fastEP:
        (rest of the translation)

  slowEP is used for control flow transfers that are or might be
  a back edge in the control flow graph.  Insn selectors are
  given the address of the highest guest byte in the block so
  they can determine which edges are definitely not back edges.

  The counter is placed in the first 8 bytes of the guest state,
  and the address of VG_(disp_cp_evcheck_fail) is placed in
  the next 8 bytes.  This allows very compact checks on all
  targets, since no immediates need to be synthesised, eg:

     decq 0(%baseblock-pointer)
     jns  nofail
     jmpq *8(%baseblock-pointer)
    nofail:

  On amd64 a non-failing check is therefore 2 insns; all 3 together
  occupy only a few bytes.

  On amd64 the event check is created by a special single
  pseudo-instruction, AMD64_EvCheck.
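  In C terms the check performed at slowEP amounts to the following
  (the field names are made up; the point is only that both the
  counter and the failure address sit at small fixed offsets from the
  baseblock pointer, so no immediates are needed):

     #include <stdint.h>

     typedef struct {
        int64_t  evc_counter;   /* decq 0(%baseblock-pointer)  */
        uint64_t evc_failaddr;  /* jmpq *8(%baseblock-pointer) */
        /* ... rest of the guest state follows ... */
     } GuestStateSketch;

     typedef void (*NoReturnFn)(void);

     static void slowEP_check_sketch(GuestStateSketch* gst)
     {
        if (--gst->evc_counter < 0)
           /* hand control to VG_(disp_cp_evcheck_fail) */
           ((NoReturnFn)(uintptr_t)gst->evc_failaddr)();
        /* fastEP: fall through into the rest of the translation */
     }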
* BB profiling (for --profile-flags=).  The assembly dispatcher
  (dispatch-arch-os.S) no longer deals with this and so is much
  simplified.  Instead the profile inc is compiled into each
  translation, as the insn immediately following the event
  check.  Again, on amd64 a pseudo-insn AMD64_ProfInc is used.
  Counters are now 64 bit even on 32 bit hosts, to avoid overflow.

  One complexity is that at JIT time the address of the counter
  is not known.  To solve this, VexTranslateResult now returns
  the offset of the profile inc in the generated code.  When the
  counter address is known, VEX can be called again to patch it
  in.  Backends must supply (eg) patchProfInc_AMD64 to make this
  happen.
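  A sketch of the two-phase hookup, with all names and the placeholder
  layout assumed rather than taken from the real VEX/Valgrind API:
  phase 1 translates and records where the profile inc sits in the
  output buffer; phase 2, once the 64-bit counter exists, splices its
  address into that spot.

     #include <stdint.h>
     #include <string.h>

     typedef struct {
        uint8_t* code;           /* generated code for the translation */
        uint32_t offs_profInc;   /* offset of the profile inc (phase 1) */
     } TranslationSketch;

     /* Assumed placeholder shape: an 8-byte immediate two bytes into
        the profile-inc sequence (eg a movabs-to-register on amd64). */
     static void patch_profinc_sketch(TranslationSketch* t,
                                      uint64_t* counter)
     {
        uint64_t addr = (uint64_t)(uintptr_t)counter;
        memcpy(t->code + t->offs_profInc + 2, &addr, 8);
     }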
* Front end changes (guest_blah_toIR.c)

  The way the guest program counter is handled has changed
  significantly.  Previously, the guest PC was updated (in IR)
  at the start of each instruction, except for the first insn
  in an IRSB.  This is inconsistent and doesn't work with the
  new chaining scheme.

  Now, each instruction must update the guest PC as its last
  IR statement -- not its first -- and there is no special
  exemption for the first insn in the block.  As before, most of
  these updates are optimised away by ir_opt, so there are no
  concerns about efficiency.
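  For example, the IR for a single guest instruction now ends with a
  Put of the next instruction's address to the guest PC.  A minimal
  helper in terms of the public IR constructors, assuming the VEX
  headers and a 64-bit guest; offB_PC is whatever guest-state offset
  the front end uses for the PC (eg OFFB_RIP on amd64):

     #include "libvex_ir.h"

     /* Append the end-of-instruction PC write.  In the new scheme this
        must be the LAST statement the front end adds for the insn (the
        earlier statements do the instruction's actual work). */
     static void put_PC_last(IRSB* irsb, Int offB_PC, ULong next_insn_addr)
     {
        addStmtToIRSB(irsb,
                      IRStmt_Put(offB_PC,
                                 IRExpr_Const(IRConst_U64(next_insn_addr))));
     }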
  As a logical side effect of this, exits (IRStmt_Exit) and the
  block-end transfer are both considered to write to the guest state
  (the guest PC) and so need to be told the offset of it.
  IR generators (eg disInstr_AMD64) are no longer allowed to set
  IRSB::next to specify the block-end transfer address.  Instead they
  now indicate, to the generic steering logic that drives them (iow,
  guest_generic_bb_to_IR.c), that the block has ended.  This then
  generates effectively "goto GET(PC)" (which, again, is optimised
  away).  What this does mean is that if the IR generator function
  ends the IR of the last instruction in the block with an incorrect
  assignment to the guest PC, execution will transfer to an incorrect
  destination -- making the error obvious quickly.