//===---------------------------------------------------------------------===//
// Random ideas for the ARM backend (Thumb specific).
//===---------------------------------------------------------------------===//
* Add support for compiling functions in both ARM and Thumb mode, then taking
* Add support for compiling individual basic blocks in Thumb mode, when in a
  larger ARM function. This can be used for presumed cold code, like paths
  to abort (failure path of asserts), EH handling code, etc.
* Thumb doesn't have normal pre/post increment addressing modes, but you can
  load/store 32-bit integers with pre/postinc by using load/store multiple
  instructions with a single register.
* Make better use of high registers r8, r10, r11, r12 (ip). Some variants of add
  and cmp instructions can use high registers. Also, we can use them as
  temporaries to spill values into.
* In Thumb mode, short, byte, and bool preferred alignments are currently set
  to 4 to accommodate an ISA restriction (i.e. add sp, #imm, imm must be a
  multiple of 4).
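The load/store multiple idea above can be modeled in a few lines of C++. This
is only a sketch of the semantics, not backend code: the helper names
loadPostInc and storePostInc are hypothetical, and the register is modeled as a
word pointer that is advanced by one 32-bit word after the access, the way
"ldmia rN!, {rD}" and "stmia rN!, {rS}" behave with a single-register list.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of "ldmia rBase!, {rDst}" with one register in the list:
// load one 32-bit word, then write back base + 4.
inline uint32_t loadPostInc(const uint32_t *&base) {
  uint32_t value = *base; // access uses the address before the increment
  ++base;                 // writeback: the base moves forward by one word
  return value;
}

// Hypothetical model of "stmia rBase!, {rSrc}" with one register in the list:
// store one 32-bit word, then write back base + 4.
inline void storePostInc(uint32_t *&base, uint32_t value) {
  *base = value;
  ++base;
}
```

A word copy loop written as storePostInc(dst, loadPostInc(src)) is the shape of
code that would benefit from this selection.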
//===---------------------------------------------------------------------===//
Potential jumptable improvements:
* If we know function size is less than (1 << 16) * 2 bytes, we can use 16-bit
  jumptable entries (e.g. (L1 - L2) >> 1). Or even smaller entries if the
  function is even smaller. This also applies to ARM.
* Thumb jumptable codegen can improve given some help from the assembler. This
  is what we generate right now:
    .set PCRELV0, (LJTI1_0_0-(LPCRELL0+4))
  Note there is another pc relative add that we can take advantage of.
    add r1, pc, #imm_8 * 4
  We should be able to generate:
  if the assembler can translate the add to:
    add r1, pc, #((LJTI1_0_0-(LPCRELL0+4))&0xfffffffc)
  Note the assembler also does something similar to constpool load:
    ldr r0, pc, #((LCPI1_0-(LPCRELL0+4))&0xfffffffc)
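The entry-size arithmetic in the first item can be sketched as follows. The
function names are hypothetical, and the sketch assumes every target lies at or
after the base label, so the halved offsets are unsigned; signed offsets would
lose one bit of range.

```cpp
#include <cassert>
#include <cstdint>

// Smallest entry width in bytes that can hold (target - base) >> 1 for any
// target inside a function of the given size. Assumes forward offsets only.
constexpr unsigned jumpTableEntryBytes(uint64_t functionSizeBytes) {
  if (functionSizeBytes < (uint64_t{1} << 8) * 2)
    return 1; // halved offsets fit in 8 bits
  if (functionSizeBytes < (uint64_t{1} << 16) * 2)
    return 2; // halved offsets fit in 16 bits
  return 4;   // fall back to full 32-bit entries
}

// Encode one entry as the halved byte distance from the base label.
constexpr uint32_t encodeEntry(uint32_t targetAddr, uint32_t baseAddr) {
  return (targetAddr - baseAddr) >> 1;
}

// Recover the target address from an entry at dispatch time.
constexpr uint32_t decodeEntry(uint32_t entry, uint32_t baseAddr) {
  return baseAddr + (entry << 1);
}

static_assert(jumpTableEntryBytes(131071) == 2, "just under (1 << 16) * 2");
```

Halving is lossless here because Thumb instructions are at least 2-byte
aligned, so the low bit of every offset is zero.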
//===---------------------------------------------------------------------===//
We compile the following:
define i16 @func_entry_2E_ce(i32 %i) {
    switch i32 %i, label %bb12.exitStub [
        i32 0, label %bb4.exitStub
        i32 1, label %bb9.exitStub
        i32 2, label %bb4.exitStub
        i32 3, label %bb4.exitStub
        i32 7, label %bb9.exitStub
        i32 8, label %bb.exitStub
        i32 9, label %bb9.exitStub
    bhi LBB1_4 @bb12.exitStub
    bne LBB1_5 @bb4.exitStub
    bne LBB1_6 @bb9.exitStub
    bne LBB1_7 @bb.exitStub
LBB1_4: @bb12.exitStub
LBB1_5: @bb4.exitStub
LBB1_6: @bb9.exitStub
    @ lr needed for prologue
    ands r0, r3, r2, asl r0
GCC is doing a couple of clever things here:
  1. It is predicating one of the returns. This isn't a clear win though: in
     cases where that return isn't taken, it is replacing one condbranch with
     two 'ne' predicated instructions.
  2. It is sinking the shift of "1 << i" into the tst, and using ands instead of
     tst. This will probably require whole function isel.
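The bit-test lowering that item 2 relies on can be written out in C++ as a
rough illustration. It assumes the cases visible in the IR above are the whole
switch, and the Dest enum and classify function are hypothetical names; only
the destination of each case is modeled, not the returned values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical labels for the four destinations named in the switch.
enum class Dest { Bb, Bb4, Bb9, Bb12 };

// One bit per case value, grouped by destination.
constexpr uint32_t kBb4Mask = (1u << 0) | (1u << 2) | (1u << 3); // cases 0, 2, 3
constexpr uint32_t kBb9Mask = (1u << 1) | (1u << 7) | (1u << 9); // cases 1, 7, 9
constexpr uint32_t kBbMask = (1u << 8);                          // case 8

constexpr Dest classify(uint32_t i) {
  if (i > 9)
    return Dest::Bb12; // range check, like the initial cmp / bhi
  uint32_t bit = 1u << i; // the "1 << i" that gets folded into the test
  if (bit & kBb9Mask)
    return Dest::Bb9;
  if (bit & kBb4Mask)
    return Dest::Bb4;
  if (bit & kBbMask)
    return Dest::Bb;
  return Dest::Bb12; // 4, 5, 6 are not listed and take the default
}

static_assert(kBb4Mask == 13 && kBb9Mask == 642, "mask constants");
```

Each destination costs one mask test, so three tests replace the chain of
compares that a naive lowering of the seven cases would need.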
//===---------------------------------------------------------------------===//
When spilling in Thumb mode and the sp offset is too large to fit in the ldr /
str offset field, we load the offset from a constpool entry and add it to sp:
These instructions preserve the condition code, which is important if the spill
is between a cmp and a bcc instruction. However, we can use the (potentially)
cheaper sequence if we know it's ok to clobber the condition register.
This is especially bad when dynamic alloca is used. All the fixed-size stack
objects are then referenced off the frame pointer with negative offsets. See
oggenc for an example.
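The decision described in this note can be sketched as a small predicate. The
names below are hypothetical and do not come from the backend. The range check
relies on one piece of background that the note does not spell out: Thumb-1
sp-relative ldr / str encode an 8-bit immediate scaled by 4, so only byte
offsets 0, 4, ..., 1020 are directly reachable.

```cpp
#include <cassert>
#include <cstdint>

// True if a byte offset fits the sp-relative ldr/str offset field
// (8-bit immediate scaled by 4, i.e. 0..1020 in steps of 4).
constexpr bool fitsSPOffsetField(uint32_t byteOffset) {
  return byteOffset % 4 == 0 && byteOffset / 4 <= 255;
}

// Hypothetical names for the three situations the note distinguishes.
enum class SpillSequence {
  DirectSPRelative, // offset fits: plain ldr/str off sp
  PreserveFlags,    // offset too large and condition codes are live
  MayClobberFlags   // offset too large, flags dead: cheaper sequence is ok
};

constexpr SpillSequence chooseSpillSequence(uint32_t byteOffset,
                                            bool flagsLive) {
  if (fitsSPOffsetField(byteOffset))
    return SpillSequence::DirectSPRelative;
  return flagsLive ? SpillSequence::PreserveFlags
                   : SpillSequence::MayClobberFlags;
}
```

The flagsLive input stands in for whatever liveness query tells us the spill
sits between a cmp and its bcc.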
//===---------------------------------------------------------------------===//
We are reserving R3 as a scratch register in Thumb mode. So if it is live-in
to the function, we save / restore R3 to / from R12. Until register scavenging
is done, we should save R3 to a high callee-saved reg at emitPrologue time
(when hasFP is true or stack size is large) and restore R3 from that register
instead. This allows us to at least get rid of the save to r12 every time it is
//===---------------------------------------------------------------------===//
Poor codegen test/CodeGen/ARM/select.ll f7:
//===---------------------------------------------------------------------===//
Make the register allocator / spiller smarter so we can re-materialize
"mov r, imm", etc. Almost all Thumb instructions clobber the condition code.
//===---------------------------------------------------------------------===//
Add ldmia, stmia support.