README.aarch64

Status
~~~~~~

As of Jan 2014 the trunk contains a port to AArch64 ARMv8 -- loosely,
the 64-bit ARM architecture.  Currently it supports integer and FP
instructions and can run anything generated by gcc-4.8.2 -O3.  The
port is under active development.

Current limitations, as of mid-May 2014.

* limited support of vector (SIMD) instructions.  Initial target is
  support for instructions created by gcc-4.8.2 -O3
  (via autovectorisation).  This is complete.

* Integration with the built-in GDB server:
   - works ok (breakpoint, attach to a process blocked in a syscall, ...)
   - still to do:
      arm64 xml register description files (allowing shadow registers
                                            to be looked at).
      cpsr transfer to/from gdb to be looked at (see also arm equivalent code)

* limited syscall support

There has been extensive testing of the baseline simulation of integer
and FP instructions.  Memcheck is also believed to work, at least for
small examples.  Other tools appear to at least not crash when running
/bin/date.

Enough syscalls and instructions are supported for substantial
programs to work.  Firefox 26 is able to start up and quit.  The noise
level from Memcheck is low enough to make it practical to use for real
debugging.


Building
~~~~~~~~

You could probably build it directly on a target OS, using the normal
non-cross scheme:

  ./autogen.sh ; ./configure --prefix=.. ; make ; make install

Development so far has, however, been done by cross-compiling, viz:

  export CC=aarch64-linux-gnu-gcc
  export LD=aarch64-linux-gnu-ld
  export AR=aarch64-linux-gnu-ar

  ./autogen.sh
  ./configure --prefix=`pwd`/Inst --host=aarch64-unknown-linux \
              --enable-only64bit
  make -j4
  make -j4 install

Doing this assumes that the install path (`pwd`/Inst) is valid on
both host and target, which isn't normally the case.  To avoid
this limitation, do instead:

  ./configure --prefix=/install/path/on/target \
              --host=aarch64-unknown-linux \
              --enable-only64bit
  make -j4
  make -j4 install DESTDIR=/a/temp/dir/on/host
  # and then copy the contents of DESTDIR to the target.

See README.android for more examples of cross-compile building.


Implementation tidying-up/TODO notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

UnwindStartRegs -- what should that contain?


vki-arm64-linux.h: vki_sigaction_base
I really don't think that __vki_sigrestore_t sa_restorer
should be present.  Adding it surely puts sa_mask at a wrong
offset compared to (kernel) reality.  But not having it causes
compilation of m_signals.c to fail in hard to understand ways,
so adding it temporarily.


m_trampoline.S: what's the unexecutable-insn value? 0xFFFFFFFF
is there at the moment, but 0x00000000 is probably what it should be.
Also, fix indentation/tab-vs-space stuff


./include/vki/vki-arm64-linux.h: uses __uint128_t.  Should change
it to __vki_uint128_t, but what's the defn of that?


m_debuginfo/priv_storage.h: need proper defn of DiCfSI


readdwarf.c: is this correct?
#elif defined(VGP_arm64_linux)
#  define FP_REG         29    //???
#  define SP_REG         31    //???
#  define RA_REG_DEFAULT 30    //???
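
As a cross-check for the question above, here is a minimal sketch of
the numbering those defines appear to rely on.  It assumes the DWARF
register mapping from the AArch64 ABI documents (X0..X30 are DWARF
registers 0..30 and SP is 31) and the AAPCS64 roles of X29 (frame
pointer) and X30 (link register).  The enum and function names are
made up for illustration and are not Valgrind identifiers.

```c
/* DWARF register numbers for the AArch64 core registers, assuming the
   AArch64 DWARF ABI mapping: X0..X30 -> 0..30, SP -> 31. */
enum {
   DW_ARM64_X29 = 29,  /* frame pointer under AAPCS64 */
   DW_ARM64_X30 = 30,  /* link register: the usual return-address column */
   DW_ARM64_SP  = 31   /* stack pointer */
};

/* Returns 1 if dwarf_regno names one of X0..X30 or SP, else 0. */
int is_arm64_core_dwarf_reg(int dwarf_regno)
{
   return dwarf_regno >= 0 && dwarf_regno <= DW_ARM64_SP;
}
```

Under those assumptions the three values 29, 31 and 30 line up with
FP_REG, SP_REG and RA_REG_DEFAULT respectively.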


vki-arm64-linux.h:
re linux-3.10.5/include/uapi/asm-generic/sembuf.h
I'd say the amd64 version has padding it shouldn't have.  Check?


syswrap-linux.c run_a_thread_NORETURN assembly sections
seems like tst->os_state.exitcode has word type,
in which case the ppc64_linux use of lwz to read it is wrong


syswrap-linux.c ML_(do_fork_clone)
assuming that VGP_arm64_linux is the same as VGP_arm_linux here


dispatch-arm64-linux.S: FIXME: set up FP control state before
entering generated code.  Also fix screwy indentation.


dispatcher-ery general: what's a good (predictor-friendly) way to
branch to a register?


in vki-arm64-scnums.h
//#if __BITS_PER_LONG == 64 && !defined(__SYSCALL_COMPAT)
Probably want to reenable that and clean up accordingly


putIRegXXorZR: figure out a way that the computed value is actually
used, so as to keep any memory reads that might generate it alive
(else the simulation can lose exceptions).  At least, for writes to
the zero register generated by loads .. or .. can any other
integer instructions that write to a register cause exceptions?


loads/stores: generate stack alignment checks as necessary


fix barrier insns: ISB, DMB


fix atomic loads/stores


FMADD/FMSUB/FNMADD/FNMSUB: generate and use the relevant fused
IROps so as to avoid double rounding


ARM64Instr_Call getRegUsage: re-check relative to what
getAllocableRegs_ARM64 makes available


Make dispatch-arm64-linux.S save any callee-saved Q regs
I think what is required is to save D8-D15 and nothing more than that.


wrapper for __NR3264_fstat -- correct?


PRE(sys_clone): get rid of references to vki_modify_ldt_t and the
definition of it in vki-arm64-linux.h.  Ditto for 32 bit arm.


sigframe-arm64-linux.c: build_sigframe: references to nonexistent
siguc->uc_mcontext.trap_no, siguc->uc_mcontext.error_code have been
replaced by zero.  Also in synth_ucontext.


m_debugger.c:
uregs.pstate   = LibVEX_GuestARM64_get_nzcv(vex); /* is this correct? */
Is that remotely correct?


host_arm64_defs.c: emit_ARM64Instr:
ARM64in_VDfromX and ARM64in_VQfromXX: use simple top-half zeroing
MOVs to vector registers instead of INS Vd.D[0], Xreg, to avoid false
dependencies on the top half of the register.  (Or at least check
the semantics of INS Vd.D[0] to see if it zeroes out the top.)


preferredVectorSubTypeFromSize: review perf effects and decide
on a types-for-subparts policy


fold_IRExpr_Unop: add a reduction rule for this
1Sto64(CmpNEZ64( Or64(GET:I64(1192),GET:I64(1184)) ))
viz 1Sto64(CmpNEZ64(x)) --> CmpwNEZ64(x)
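
A small model of why that rewrite is safe, under the usual reading of
the three primops: CmpNEZ64 yields a 1-bit "is nonzero" value, 1Sto64
sign-extends one bit to 64, and CmpwNEZ64 yields all ones for a
nonzero input and zero otherwise.  The op_* functions below are
stand-ins written for this note, not VEX code.

```c
#include <stdint.h>

/* CmpNEZ64: 1 if x is nonzero, else 0 (a 1-bit value). */
int op_CmpNEZ64(uint64_t x)
{
   return x != 0;
}

/* 1Sto64: sign-extend a 1-bit value to 64 bits. */
uint64_t op_1Sto64(int bit)
{
   return bit ? ~(uint64_t)0 : (uint64_t)0;
}

/* CmpwNEZ64: all ones if x is nonzero, else zero.  Computed without a
   comparison: (x | -x) has its top bit set exactly when x != 0. */
uint64_t op_CmpwNEZ64(uint64_t x)
{
   uint64_t top = (x | ((uint64_t)0 - x)) >> 63;
   return (uint64_t)0 - top;
}
```

For every x, op_1Sto64(op_CmpNEZ64(x)) and op_CmpwNEZ64(x) give the
same 64-bit result, so the two-op form can be folded into the single
op.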


check insn selection for memcheck-only primops:
Left64 CmpwNEZ64 V128to64 V128HIto64 1Sto64 CmpNEZ64 CmpNEZ32
widen_z_8_to_64 1Sto32 Left32 32HLto64 CmpwNEZ32 CmpNEZ8


isel: get rid of various cases where zero is put into a register
and just use xzr instead.  Especially for CmpNEZ64/32.  And for
writing zeroes into the CC thunk fields.


/* Keep this list in sync with that in iselNext below */
/* Keep this list in sync with that for Ist_Exit above */
uh .. they are not in sync


very stupid:
imm64  x23, 0xFFFFFFFFFFFFFFA0
17 F4 9F D2 F7 FF BF F2 F7 FF DF F2 F7 FF FF F2
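
The sixteen bytes above decode to MOVZ x23, #0xFFA0 followed by three
MOVK #0xFFFF instructions, one per remaining 16-bit chunk.  Since only
the low chunk differs from all-ones, a single MOVN would do.  The
sketch below counts instructions for either starting point; the
function names are hypothetical, and it ignores the logical-immediate
(ORR) form, which could shorten some other constants further.

```c
#include <stdint.h>

/* Number of 16-bit chunks of v that differ from fill (0x0000 or
   0xFFFF).  Each such chunk costs one MOVZ/MOVN or MOVK. */
static int chunks_not_equal(uint64_t v, uint16_t fill)
{
   int n = 0;
   for (int i = 0; i < 4; i++) {
      if (((v >> (16 * i)) & 0xFFFF) != fill)
         n++;
   }
   return n;
}

/* Instruction count when starting from MOVZ (all-zero background) or
   MOVN (all-ones background), whichever is shorter.  At least one
   instruction is always needed. */
int imm64_insn_count(uint64_t v)
{
   int via_movz = chunks_not_equal(v, 0x0000);
   int via_movn = chunks_not_equal(v, 0xFFFF);
   int best = via_movz < via_movn ? via_movz : via_movn;
   return best == 0 ? 1 : best;
}

/* Encoding of MOVN Xd, #imm16, LSL #(16*hw); MOVN writes the bitwise
   inverse of the shifted immediate. */
uint32_t enc_movn_x(unsigned rd, uint16_t imm16, unsigned hw)
{
   return 0x92800000u | ((hw & 3u) << 21)
          | ((uint32_t)imm16 << 5) | (rd & 31u);
}
```

For 0xFFFFFFFFFFFFFFA0 the count is 1, and MOVN x23, #0x5F encodes as
0x92800BF7, since the inverse of the constant is 0x5F.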


valgrind.h: fix VALGRIND_ALIGN_STACK/VALGRIND_RESTORE_STACK,
also add CFI annotations


could possibly bring r29 into use, which would be useful as it is
callee saved


ubfm/sbfm etc: special-case the cases that are simple shifts, as iropt
can't always simplify the general-case IR to a shift in such cases.
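
A sketch of how the simple-shift cases of the 64-bit UBFM could be
recognised from its immr/imms fields.  It relies on the architectural
aliases: LSR Xd, Xn, #sh is UBFM Xd, Xn, #sh, #63, and LSL Xd, Xn, #sh
is UBFM Xd, Xn, #((64 - sh) % 64), #(63 - sh).  The function names are
invented for this note and the front end may well structure the check
differently.

```c
/* If the fields encode LSR Xd, Xn, #sh, return sh (0..63), else -1. */
int ubfm64_as_lsr(unsigned immr, unsigned imms)
{
   return (immr < 64 && imms == 63) ? (int)immr : -1;
}

/* If the fields encode LSL Xd, Xn, #sh, return sh (1..63), else -1.
   The LSL alias applies when imms != 63 and imms + 1 == immr. */
int ubfm64_as_lsl(unsigned immr, unsigned imms)
{
   return (imms < 63 && immr == imms + 1) ? (int)(63 - imms) : -1;
}
```

For example, immr=4, imms=63 is LSR #4 and immr=60, imms=59 is LSL #4,
while immr=8, imms=15 is a genuine bitfield extract and matches
neither.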


LDP,STP (immediate, simm7) (FP&VEC)
should zero out hi parts of dst registers in the LDP case


DUP insns: use Iop_Dup8x16, Iop_Dup16x8, Iop_Dup32x4
rather than doing it "by hand"


Any place where ZeroHI64ofV128 is used in conjunction with
FP vector IROps: find a way to make sure that arithmetic on
the upper half of the values is "harmless."


math_MINMAXV: use real Iop_Cat{Odd,Even}Lanes ops rather than
inline scalar code


chainXDirect_ARM64: use direct jump forms when possible


Raspberry Pi
~~~~~~~~~~~~

The Raspberry Pi since version 3 has had 64-bit hardware (aarch64). However,
Raspberry Pi OS (formerly Raspbian) has a 32-bit userland. You can check
this using commands like file, ldd or readelf. For instance,

$ file -L `which gcc`
/usr/bin/gcc: ELF 32-bit LSB executable, ARM, EABI5 version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, BuildID[sha1]=6cfb4b75e1e265eb5a05ef0a1915bca9bae34674, for GNU/Linux 3.2.0, stripped

As a consequence, if you try to run just "configure" it will detect aarch64 and
select the "arm64" target, which is incorrect for the 32-bit userland.

Instead you should run

configure --host=armv8-unknown-linux

That will override the aarch64 detection and result in a 32-bit build of
Valgrind for the "arm" target.