README.aarch64

Status
~~~~~~

As of Jan 2014 the trunk contains a port to AArch64 ARMv8 -- loosely,
the 64-bit ARM architecture.  Currently it supports integer and FP
instructions and can run almost anything generated by gcc-4.8.2 -O2.
The port is under active development.

Current limitations, as of mid-Feb 2014.

* limited support of vector (SIMD) instructions.  Initial target is
  support for instructions created by gcc-4.8.2 -O3 (via vectorisation).
  This is mostly complete.

* Integration with the built-in GDB server:
   - basically works but breakpoints may be problematic (unclear).
     Use --vgdb=full to bypass the problem.
   - still to do:
      arm64 xml register description files (allowing shadow registers
                                            to be looked at).
      ptrace invoker : currently disabled for both arm and arm64
      cpsr transfer to/from gdb to be looked at (see also arm equivalent code)

* limited syscall support

There has been extensive testing of the baseline simulation of integer
and FP instructions.  Memcheck is also believed to work, at least for
small examples.  Other tools appear to at least not crash when running
/bin/date.

Enough syscalls are supported for /bin/ssh and /bin/bash to work.  In
particular that means that programs that create and use TCP sockets
are likely to work.



Building
~~~~~~~~

You could probably build it directly on a target OS, using the normal
non-cross scheme

  ./autogen.sh ; ./configure --prefix=.. ; make ; make install

Development so far was however done by cross compiling, viz:

  export CC=aarch64-linux-gnu-gcc
  export LD=aarch64-linux-gnu-ld
  export AR=aarch64-linux-gnu-ar

  ./autogen.sh
  ./configure --prefix=`pwd`/Inst --host=aarch64-unknown-linux \
              --enable-only64bit
  make -j4
  make -j4 install

Doing this assumes that the install path (`pwd`/Inst) is valid on
both host and target, which isn't normally the case.  To avoid
this limitation, do instead:

  ./configure --prefix=/install/path/on/target \
              --host=aarch64-unknown-linux \
              --enable-only64bit
  make -j4
  make -j4 install DESTDIR=/a/temp/dir/on/host
  # and then copy the contents of DESTDIR to the target.

See README.android for more examples of cross-compile building.


Implementation tidying-up/TODO notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

UnwindStartRegs -- what should that contain?


vki-arm64-linux.h: vki_sigaction_base
I really don't think that __vki_sigrestore_t sa_restorer
should be present.  Adding it surely puts sa_mask at a wrong
offset compared to (kernel) reality.  But not having it causes
compilation of m_signals.c to fail in hard-to-understand ways,
so adding it temporarily.

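The offset concern in the vki_sigaction_base note can be checked
mechanically.  The following is only a sketch: the two structs are
hypothetical stand-ins (they are not the real vki or kernel
definitions, and the one-word sa_mask is an assumption made for
illustration).  It shows how far an extra sa_restorer pointer moves
sa_mask.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the real vki or kernel types. */
typedef void (*demo_handler_t)(int);
typedef void (*demo_restorer_t)(void);
typedef struct { uint64_t sig[1]; } demo_sigset_t;

struct demo_sigaction_without_restorer {
   demo_handler_t  handler;
   unsigned long   sa_flags;
   demo_sigset_t   sa_mask;
};

struct demo_sigaction_with_restorer {
   demo_handler_t  handler;
   unsigned long   sa_flags;
   demo_restorer_t sa_restorer;   /* the field in question */
   demo_sigset_t   sa_mask;
};

/* How many bytes later sa_mask starts once sa_restorer is present. */
static size_t demo_mask_shift(void)
{
   return offsetof(struct demo_sigaction_with_restorer, sa_mask)
        - offsetof(struct demo_sigaction_without_restorer, sa_mask);
}
```

On an LP64 target this shift is expected to be one pointer (8 bytes).
Once the kernel layout has been confirmed, a compile-time assertion on
the real struct's offsets would be one way to keep it from drifting.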

m_trampoline.S: what's the unexecutable-insn value? 0xFFFFFFFF
is there at the moment, but 0x00000000 is probably what it should be.
Also, fix indentation/tab-vs-space stuff


./include/vki/vki-arm64-linux.h: uses __uint128_t.  Should change
it to __vki_uint128_t, but what's the defn of that?


m_debuginfo/priv_storage.h: need proper defn of DiCfSI


readdwarf.c: is this correct?
#elif defined(VGP_arm64_linux)
#  define FP_REG         29    //???
#  define SP_REG         31    //???
#  define RA_REG_DEFAULT 30    //???


vki-arm64-linux.h:
re linux-3.10.5/include/uapi/asm-generic/sembuf.h
I'd say the amd64 version has padding it shouldn't have.  Check?


syswrap-linux.c run_a_thread_NORETURN assembly sections
seems like tst->os_state.exitcode has word type
in which case the ppc64_linux use of lwz to read it is wrong


syswrap-linux.c ML_(do_fork_clone)
assuming that VGP_arm64_linux is the same as VGP_arm_linux here


dispatch-arm64-linux.S: FIXME: set up FP control state before
entering generated code.  Also fix screwy indentation.


dispatcher-ery general: what's a good (predictor-friendly) way to
branch to a register?


in vki-arm64-scnums.h
//#if __BITS_PER_LONG == 64 && !defined(__SYSCALL_COMPAT)
Probably want to re-enable that and clean up accordingly


putIRegXXorZR: figure out a way that the computed value is actually
used, so as to keep any memory reads that might generate it, alive.
(else the simulation can lose exceptions).  At least, for writes to
the zero register generated by loads .. or .. can any other integer
instructions that write to a register cause exceptions?


loads/stores: generate stack alignment checks as necessary


fix barrier insns: ISB, DMB


fix atomic loads/stores


FMADD/FMSUB/FNMADD/FNMSUB: generate and use the relevant fused
IROps so as to avoid double rounding

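To illustrate the double-rounding concern in the FMADD note, here is
a small sketch in plain C.  It is not VEX code, and the demo_* names
are made up for this example.  It compares a multiply followed by a
separate add, rounding to float after each step, against a result
that is rounded only once.  Double precision stands in for the "exact"
intermediate, which is valid for the inputs shown because the product
of two floats fits in a double without rounding.

```c
/* Multiply, round to float, then add and round again.  The volatile
   store forces the intermediate rounding and stops the compiler from
   fusing the two operations by itself. */
static float demo_mul_then_add(float a, float b, float c)
{
   volatile float product = a * b;   /* first rounding  */
   return product + c;               /* second rounding */
}

/* Reference with a single rounding at the end.  The product of two
   floats is exact in double; for the inputs used in the note below
   the double add is exact as well, so only the final conversion to
   float can round. */
static float demo_single_rounding(float a, float b, float c)
{
   double exact = (double)a * (double)b + (double)c;
   return (float)exact;
}
```

With a = 1 + 2^-13, b = 1 - 2^-13 and c = -1, the exact product is
1 - 2^-26, which rounds to 1.0 in single precision.  The two-step
version therefore returns 0, while the single-rounding version returns
-2^-26.  A fused operation is expected to match the latter.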

ARM64Instr_Call getRegUsage: re-check relative to what
getAllocableRegs_ARM64 makes available


Make dispatch-arm64-linux.S save any callee-saved Q regs
I think what is required is to save D8-D15 and nothing more than that.


wrapper for __NR3264_fstat -- correct?


PRE(sys_clone): get rid of references to vki_modify_ldt_t and the
definition of it in vki-arm64-linux.h.  Ditto for 32 bit arm.


sigframe-arm64-linux.c: build_sigframe: references to nonexistent
siguc->uc_mcontext.trap_no, siguc->uc_mcontext.error_code have been
replaced by zero.  Also in synth_ucontext.


m_debugger.c:
uregs.pstate   = LibVEX_GuestARM64_get_nzcv(vex); /* is this correct? */
Is that remotely correct?


host_arm64_defs.c: emit_ARM64Instr:
ARM64in_VDfromX and ARM64in_VQfromXX: use simple top-half zeroing
MOVs to vector registers instead of INS Vd.D[0], Xreg, to avoid false
dependencies on the top half of the register.  (Or at least check
the semantics of INS Vd.D[0] to see if it zeroes out the top.)


preferredVectorSubTypeFromSize: review perf effects and decide
on a types-for-subparts policy


fold_IRExpr_Unop: add a reduction rule for this
1Sto64(CmpNEZ64( Or64(GET:I64(1192),GET:I64(1184)) ))
viz 1Sto64(CmpNEZ64(x)) --> CmpwNEZ64(x)

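The proposed reduction rule rests on both sides producing the same
value for every input.  A minimal sketch of that equivalence follows,
modelling the three ops on plain integers.  The demo_* functions are
illustrative stand-ins, and the meanings given to the ops are stated
here as assumptions rather than taken from the IR definitions.

```c
#include <stdint.h>

/* Assumed meaning: 1 if x is nonzero, else 0. */
static int demo_CmpNEZ64(uint64_t x)
{
   return x != 0;
}

/* Assumed meaning: sign-extend a single bit to 64 bits. */
static uint64_t demo_1Sto64(int bit)
{
   return bit ? ~UINT64_C(0) : UINT64_C(0);
}

/* Assumed meaning: all ones if x is nonzero, else all zeroes. */
static uint64_t demo_CmpwNEZ64(uint64_t x)
{
   return x != 0 ? ~UINT64_C(0) : UINT64_C(0);
}

/* Nonzero iff the two sides of the rule agree for this x. */
static int demo_rule_holds(uint64_t x)
{
   return demo_1Sto64(demo_CmpNEZ64(x)) == demo_CmpwNEZ64(x);
}
```

Under these assumptions the rule replaces two ops with one without
changing the result, whatever expression x stands for.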

check insn selection for memcheck-only primops:
Left64 CmpwNEZ64 V128to64 V128HIto64 1Sto64 CmpNEZ64 CmpNEZ32
widen_z_8_to_64 1Sto32 Left32 32HLto64 CmpwNEZ32 CmpNEZ8


isel: get rid of various cases where zero is put into a register
and just use xzr instead.  Especially for CmpNEZ64/32.  And for
writing zeroes into the CC thunk fields.


/* Keep this list in sync with that in iselNext below */
/* Keep this list in sync with that for Ist_Exit above */
uh .. they are not in sync


very stupid:
imm64  x23, 0xFFFFFFFFFFFFFFA0
17 F4 9F D2 F7 FF BF F2 F7 FF DF F2 F7 FF FF F2

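The four words above decode as MOVZ x23, #0xFFA0 followed by three
MOVKs of #0xFFFF.  Since the value is the bitwise NOT of 0x5F, a
single MOVN could load it.  The sketch below shows one way to count
the cost of each approach.  The demo_* helpers are hypothetical, not
existing VEX functions, and it assumes nothing depends on the sequence
being a fixed four instructions long.

```c
#include <stdint.h>

/* Count the 16-bit chunks of imm that differ from fill (0x0000 for a
   MOVZ-based sequence, 0xFFFF for a MOVN-based one). */
static int demo_chunks_differing(uint64_t imm, uint16_t fill)
{
   int n = 0;
   for (int i = 0; i < 4; i++) {
      if (((imm >> (16 * i)) & 0xFFFF) != fill)
         n++;
   }
   return n;
}

/* Instructions needed: one MOVZ (or MOVN) covers the first differing
   chunk, plus one MOVK per further chunk; never fewer than one. */
static int demo_insns_via_movz(uint64_t imm)
{
   int n = demo_chunks_differing(imm, 0x0000);
   return n == 0 ? 1 : n;
}

static int demo_insns_via_movn(uint64_t imm)
{
   int n = demo_chunks_differing(imm, 0xFFFF);
   return n == 0 ? 1 : n;
}

/* 64-bit (X register) move-wide encodings.  hw selects the chunk,
   so the immediate is shifted left by 16 * hw. */
static uint32_t demo_enc_wide(uint32_t base, unsigned rd,
                              uint16_t imm16, unsigned hw)
{
   return base | ((uint32_t)hw << 21) | ((uint32_t)imm16 << 5) | rd;
}

static uint32_t demo_enc_movz(unsigned rd, uint16_t imm16, unsigned hw)
{
   return demo_enc_wide(0xD2800000u, rd, imm16, hw);
}

static uint32_t demo_enc_movk(unsigned rd, uint16_t imm16, unsigned hw)
{
   return demo_enc_wide(0xF2800000u, rd, imm16, hw);
}

static uint32_t demo_enc_movn(unsigned rd, uint16_t imm16, unsigned hw)
{
   return demo_enc_wide(0x92800000u, rd, imm16, hw);
}
```

For 0xFFFFFFFFFFFFFFA0 the MOVZ route needs four instructions and the
MOVN route needs one, so picking the cheaper of the two per immediate
would shrink this case to a single word.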

valgrind.h: fix VALGRIND_ALIGN_STACK/VALGRIND_RESTORE_STACK,
also add CFI annotations


could possibly bring r29 into use, which would be useful as it is
callee saved


ubfm/sbfm etc: special-case the cases that are simple shifts, as iropt
can't always simplify the general-case IR to a shift in such cases.

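As a sketch of what such special-casing might look like for the
64-bit UBFM form, the helpers below recognise the immr/imms
combinations that are plain LSR or LSL.  The demo_* names are
hypothetical, and demo_ubfm64 is a small reference model written for
this example, not the decoder's own code.

```c
#include <stdint.h>

/* Reference model of 64-bit UBFM for immr, imms in 0..63. */
static uint64_t demo_ubfm64(uint64_t x, unsigned immr, unsigned imms)
{
   unsigned width = imms + 1;
   uint64_t low   = width == 64 ? ~UINT64_C(0)
                                : ((UINT64_C(1) << width) - 1);
   if (imms >= immr)
      return (x >> immr) & (low >> immr);   /* extract bits immr..imms */
   return (x & low) << (64 - immr);         /* move bits 0..imms up    */
}

/* Returns the shift amount if UBFM #immr, #imms is a plain LSR,
   or -1 if it is not. */
static int demo_ubfm64_as_lsr(unsigned immr, unsigned imms)
{
   return imms == 63 ? (int)immr : -1;
}

/* Returns the shift amount if UBFM #immr, #imms is a plain LSL,
   or -1 if it is not. */
static int demo_ubfm64_as_lsl(unsigned immr, unsigned imms)
{
   return (imms != 63 && imms + 1 == immr) ? (int)(63 - imms) : -1;
}
```

For example, immr=60, imms=59 is a left shift by 4 and immr=4,
imms=63 is a right shift by 4.  The signed form (SBFM with imms=63)
would presumably map to an arithmetic right shift in the same way.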

LDP,STP (immediate, simm7) (FP&VEC)
should zero out hi parts of dst registers in the LDP case


DUP insns: use Iop_Dup8x16, Iop_Dup16x8, Iop_Dup32x4
rather than doing it "by hand"


Any place where ZeroHI64ofV128 is used in conjunction with
FP vector IROps: find a way to make sure that arithmetic on
the upper half of the values is "harmless."


math_MINMAXV: use real Iop_Cat{Odd,Even}Lanes ops rather than
inline scalar code


chainXDirect_ARM64: use direct jump forms when possible