================================
NaCl SFI model on x86-64 systems
================================

This document addresses the details of the Software Fault Isolation
(SFI) model for executable code that can be run in Native Client on an
x86-64 system. An overview of this model can be found in the paper:
`Adapting Software Fault Isolation to Contemporary CPU Architectures
<https://research.google.com/pubs/archive/35649.pdf>`_.
The primary focus of the SFI model is a Windows x86-64 system, but the
same techniques can be applied to run identical x86-64 binaries on
other x86-64 systems such as Linux, Mac, FreeBSD, etc. The
description of the SFI model therefore tries to abstract away system
dependencies when possible.

Please note: throughout this document we use the AT&T notation for
assembler syntax, in which the target operand appears last, e.g. ``mov

The format of Native Client executable binaries is identical to the
x86-64 ELF binary format (`[0]
<http://en.wikipedia.org/wiki/Executable_and_Linkable_Format>`_, `[1]
<http://www.sco.com/developers/devspecs/gabi41.pdf>`_, `[2]
<http://www.sco.com/developers/gabi/latest/contents.html>`_, `[3]
<http://downloads.openwatcom.org/ftp/devel/docs/elf-64-gen.pdf>`_) for
Linux or BSD with a few extra requirements. The additional rules that
a Native Client ELF binary must follow are:

* The ELF magic OS ABI field must be 123.
* The ELF magic OS ABI VERSION field must be 5.
* The ELF e_flags field must be 0x200000 (32-byte alignment).
* There must be exactly one PT_LOAD text segment. It must begin at
  0x20000 (128 KiB) and be marked RX (no W). The contents of the text
  segment must follow :ref:`Text Segment Rules <x86-64-text-segment-rules>`.
* There can be at most one PT_LOAD data segment marked R.
* There can be at most one PT_LOAD data segment marked RW.
* There can be at most one PT_GNU_STACK segment. It must be marked RW.
* All segments must end before the limit address (4 GiB).

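
As a rough illustration of the header-level rules above, the following
Python sketch checks the three ELF header fields using only the standard
``struct`` module. The names ``check_nacl_elf_header`` and ``make_header``
are assumptions made for this example and are not part of Native Client.
The sketch assumes a little-endian ELF64 header and does not check the
segment rules.

```python
import struct

# Values taken from the rules listed above.
NACL_OSABI = 123
NACL_ABIVERSION = 5
NACL_E_FLAGS = 0x200000

# Standard ELF64 header layout.
EI_CLASS = 4
EI_OSABI = 7
EI_ABIVERSION = 8
ELFCLASS64 = 2
E_FLAGS_OFFSET = 48  # 16 (e_ident) + 2 + 2 + 4 + 8 + 8 + 8


def check_nacl_elf_header(header: bytes) -> list:
    """Return the rule violations found in a 64-byte ELF64 header."""
    if (len(header) < 64 or header[:4] != b"\x7fELF"
            or header[EI_CLASS] != ELFCLASS64):
        return ["not an ELF64 header"]
    problems = []
    if header[EI_OSABI] != NACL_OSABI:
        problems.append("OS ABI must be 123")
    if header[EI_ABIVERSION] != NACL_ABIVERSION:
        problems.append("OS ABI VERSION must be 5")
    (e_flags,) = struct.unpack_from("<I", header, E_FLAGS_OFFSET)
    if e_flags != NACL_E_FLAGS:
        problems.append("e_flags must be 0x200000")
    return problems


def make_header(osabi: int, abiversion: int, e_flags: int) -> bytes:
    """Build a synthetic ELF64 header (hypothetical helper for the demo)."""
    e_ident = b"\x7fELF" + bytes([ELFCLASS64, 1, 1, osabi, abiversion]) + bytes(7)
    rest = struct.pack("<HHIQQQIHHHHHH", 2, 62, 1, 0x20000, 64, 0,
                       e_flags, 64, 56, 0, 64, 0, 0)
    return e_ident + rest


if __name__ == "__main__":
    print(check_nacl_elf_header(make_header(123, 5, 0x200000)))
    print(check_nacl_elf_header(make_header(0, 0, 0)))
```

A real loader would go on to walk the program headers and apply the
segment rules; that part is deliberately left out here.
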
To ensure fault isolation at runtime, the system must maintain a
number of runtime *invariants* across the lifetime of the running
program. Both the *Validator* and the *Service Runtime* are
responsible for maintaining the invariants. See the paper for the
rationale for the invariants:

* ``RIP`` always points to a valid instruction boundary (the validator must
  ensure this with direct jumps and direct calls).
* ``R15`` (aka ``RBASE`` and ``RZP``) is never modified by code (the
  validator must ensure this). The low 32 bits of ``RZP`` are all zero
  (the loader must ensure this).
* ``RIP``, ``RBP`` and ``RSP`` are always in the **safe zone**: between
  ``R15`` and ``R15+4GiB``.

  * Exception: ``RSP`` and ``RBP`` are allowed to be in the range of
    ``0..4GiB`` inside *pseudo-instructions*: ``naclrestbp``,
    ``naclrestsp``, ``naclspadj``, ``naclasp``, ``naclssp``.

* 84GiB are allocated for the NaCl module (i.e. the **untrusted region**):

  * ``R15-40GiB..R15`` and ``R15+4GiB..R15+44GiB`` are buffer zones with
  * The 4GiB *safe zone* has pages with either PROT_WRITE or PROT_EXEC
    but must not have PROT_WRITE+PROT_EXEC pages.
  * All executable code in PROT_EXEC pages is validatable and
    guaranteed to obey the invariants.

* Trampoline/springboard code is mapped to a non-writable region in
  the *untrusted 84GiB region*; each trampoline/springboard is 32-byte
  aligned and fits within a single *bundle*.
* The OS must not put any internal structures/code into the untrusted
  region at any time (not using the OS dynamic linker, etc.).

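
The address-range invariants above can be written down as a few
predicates. This is a minimal Python sketch, not the Service Runtime's
code: the function names are hypothetical, and treating the safe zone as
the half-open interval ``[R15, R15+4GiB)`` is an assumption of the
example.

```python
GIB = 1 << 30
SAFE_ZONE_SIZE = 4 * GIB     # the safe zone described above
GUARD_ZONE_SIZE = 40 * GIB   # each of the two buffer zones


def is_valid_base(r15: int) -> bool:
    """The low 32 bits of R15 (RZP) must all be zero."""
    return r15 & 0xFFFFFFFF == 0


def in_safe_zone(r15: int, addr: int) -> bool:
    """True if addr lies in the 4 GiB safe zone that starts at R15."""
    return r15 <= addr < r15 + SAFE_ZONE_SIZE


def untrusted_region(r15: int) -> tuple:
    """Return (start, end) of the untrusted region around R15."""
    return (r15 - GUARD_ZONE_SIZE, r15 + SAFE_ZONE_SIZE + GUARD_ZONE_SIZE)


if __name__ == "__main__":
    base = 0x7F0000000000  # example value; the low 32 bits are zero
    start, end = untrusted_region(base)
    print(is_valid_base(base), (end - start) // GIB)
```

The two buffer zones plus the safe zone add up to 40 + 4 + 40 = 84 GiB,
which matches the size of the untrusted region given in the list.
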
.. _x86-64-text-segment-rules:

* The validation process must ensure that the text segment complies
  with the following rules. The validation process must complete
  successfully strictly before executing any instruction of the
* The following instructions are illegal and must be rejected by the
  validator (the list is not exhaustive as the validator uses a
  whitelist, not a blacklist; this means there is a large but finite
  list of instructions the validator allows, not a small list of
  instructions the validator rejects):

  * any privileged instructions
  * ``mov`` to/from segment registers
  * ``pusha``/``popa`` (not dangerous but not needed for GCC)

* There must be space for at least 32 bytes after the text segment and
  before the next segment in ELF (towards higher addresses) that ends
  strictly at a 64 KiB boundary (a minimum page size for untrusted
  code). This space will be padded with HLT instructions as part of
  the validation process, along with the optional 64 KiB page.
* Neither instructions nor *pseudo-instructions* are permitted to span
* The ELF entry address must be 32-byte aligned.
* Direct ``CALL``/``JMP`` targets:

  * must point to a valid instruction boundary
  * must not point into a *pseudo-instruction*
  * must not point between a *restricted register* (see below for
    definition) producer instruction and its corresponding restricted
    register consumer instruction.

* ``CALL`` instructions must end exactly at a 32-byte boundary (a direct
  ``CALL`` is 5 bytes long, so it starts 5 bytes before the boundary), so
  that the return address will be 32-byte aligned.
* Indirect call targets must be 32-byte aligned. Instead of indirect
  ``CALL``/``JMP`` x, use ``nacljmp`` and ``naclcall`` (see below for
  definitions of these *pseudo-instructions*).
* All instructions that **read** or **write** from/to memory must use
  one of the four registers ``RZP``, ``RIP``, ``RBP`` or ``RSP`` as the
  base, a restricted register (see below) as the index (multiplied by
  0, 1, 2, 4 or 8) and an optional constant displacement.

  * Exception to this rule: string instructions are allowed if used in
    the following sequences (the sequences should not cross *bundle*
    boundaries; segment overrides are disallowed):

       [rep] stos ; other string instructions can be used here

    Note: this is identical to the *pseudo-instruction*: ``[rep] stos
    %?ax, %nacl:(%rdi),%rZP``

* An operand of an instruction is said to be a **restricted register** iff
  it is a register that is the target of a 32-bit move in the
  immediately preceding instruction in the same *bundle* (consider the
  previous instruction as an additional sandboxing prefix):

     ; any 32-bit register can be used here; the first operand is
     ; unrestricted but often is the same register

* Instructions capable of changing ``%RBP`` and ``%RSP`` are
  forbidden, except the instruction sequences in the whitelist below,
  which must not cross *bundle* boundaries:

     ; restoration of %RBP from memory, register or stack - keeps the
     ; restoration of %RSP from memory, register or stack - keeps the
     add %rZP, %rsp ; restoration of %RSP from %RBP with adjust
     add %rZP, %rsp ; stack space allocation
     add %rZP, %rsp ; stack space deallocation
     and $XX, %rsp ; alignment; XX must be between -128 and -1
     popq ... ; except pop %RSP, pop %RBP

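
The alignment and addressing rules above reduce to simple integer
arithmetic. The Python sketch below is only an illustration with
hypothetical function names, not the validator's implementation. It
models a *restricted register* as a value truncated to 32 bits by the
preceding 32-bit move, and uses a 32-byte *bundle* size as stated in the
rules. The observation that the farthest reachable address stays inside
the 40 GiB buffer zones is a calculation made for this example.

```python
BUNDLE_SIZE = 32
GIB = 1 << 30


def spans_bundle_boundary(offset: int, length: int) -> bool:
    """True if an instruction of `length` bytes at `offset` crosses a bundle."""
    return offset // BUNDLE_SIZE != (offset + length - 1) // BUNDLE_SIZE


def call_ends_at_bundle_boundary(offset: int, length: int = 5) -> bool:
    """True if a CALL ends at a 32-byte boundary (aligned return address)."""
    return (offset + length) % BUNDLE_SIZE == 0


def mask_indirect_target(target: int) -> int:
    """Effect of `and $-32, %eXX`: a 32-bit, 32-byte-aligned target."""
    return target & -BUNDLE_SIZE & 0xFFFFFFFF


def sandboxed_address(base: int, index: int, scale: int = 1,
                      displacement: int = 0) -> int:
    """Effective address of displacement(base, index, scale) when `index`
    is a restricted register, i.e. only its low 32 bits survive."""
    restricted = index & 0xFFFFFFFF
    return base + restricted * scale + displacement


if __name__ == "__main__":
    # Farthest reach above the base: maximum index, scale and displacement.
    reach = sandboxed_address(0, 0xFFFFFFFF, 8, 2**31 - 1)
    print(reach < 40 * GIB)
```

With a base anywhere in the 4 GiB safe zone, the largest offset
(just under 34 GiB) and the smallest one (-2 GiB) both land inside the
untrusted region, so an access through a restricted index cannot reach
memory outside it.
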
List of Pseudo-instructions
===========================

Pseudo-instructions were introduced to let the compiler maintain the
invariants without needing to know the code alignment rules. The
assembler guarantees 32-byte alignment for all *pseudo-instructions* in
the table below. In addition to the pseudo-instructions, one
pseudo-operand prefix is introduced: ``%nacl``. Presence of the
``%nacl`` operand prefix ensures that:

* The instruction ``mov %eXX, %eXX`` is added immediately before the
  actual instruction that uses the ``%nacl`` prefix (where ``%eXX`` is the
  32-bit part of the index register of the actual instruction, for example: in
  operand ``%nacl:(,%r11)``, the notation ``%eXX`` is referring to
* The resulting sequence of two instructions does not cross the

For example, the instruction:

   mov %eax,%nacl:(%r15,%rdi,2)

is translated by the assembler to:

   mov %eax,(%r15,%rdi,2)

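
The effect of the ``%nacl`` prefix can be mimicked with a small
text-rewriting function. This Python sketch is a toy model, not the
assembler's implementation: ``expand_nacl_prefix`` and ``low32`` are
hypothetical names. It only handles operands that have an explicit index
register and it ignores the *bundle* constraint.

```python
import re


def low32(reg: str) -> str:
    """Name of the 32-bit part of a 64-bit register, e.g. rdi -> edi."""
    if reg[1:].isdigit():      # r8..r15 -> r8d..r15d
        return reg + "d"
    return "e" + reg[1:]       # rax, rdi, ... -> eax, edi, ...


# Matches "%nacl:(<base>,%<index>" and captures the index register name.
_NACL_OPERAND = re.compile(r"%nacl:\(([^,)]*),%(\w+)")


def expand_nacl_prefix(instruction: str) -> list:
    """Return the instruction sequence produced for one source instruction."""
    match = _NACL_OPERAND.search(instruction)
    if match is None:
        return [instruction]   # no indexed %nacl operand: left unchanged
    index32 = low32(match.group(2))
    return [f"mov %{index32},%{index32}",
            instruction.replace("%nacl:", "", 1)]


if __name__ == "__main__":
    for line in expand_nacl_prefix("mov %eax,%nacl:(%r15,%rdi,2)"):
        print(line)
```

The inserted 32-bit move is what turns the index into a *restricted
register*: writing a 32-bit register on x86-64 clears the upper 32 bits
of the corresponding 64-bit register.
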
The complete list of introduced *pseudo-instructions* is as follows:

.. TODO(hamaji): Use rst's table instead of the raw HTML below.

<td>Pseudo-instruction</td>
<td>Is translated to<br/>
<td>[rep] cmps %nacl:(%rsi),%nacl:(%rdi),%rZP<br/>
<i>(sandboxed cmps)</i><br/>
<td>mov %esi,%esi<br/>
lea (%rZP,%rsi,1),%rsi<br/>
lea (%rZP,%rdi,1),%rdi<br/>
[rep] cmps (%rsi),(%rdi)<i><br/>
<td>[rep] movs %nacl:(%rsi),%nacl:(%rdi),%rZP<br/>
<i>(sandboxed movs)</i><br/>
<td>mov %esi,%esi<br/>
lea (%rZP,%rsi,1),%rsi<br/>
lea (%rZP,%rdi,1),%rdi<br/>
[rep] movs (%rsi),(%rdi)<i><br/>
<td>naclasp ...,%rZP<br/>
<i>(sandboxed stack increment)</i></td>
<td>add ...,%esp<br/>
<td>naclcall %eXX,%rZP<br/>
<i>(sandboxed indirect call)</i></td>
<td>and $-32,%eXX<br/>
<i>Note: the assembler ensures all calls (including
naclcall) will end at the bundle boundary.</i></td>
<td>nacljmp %eXX,%rZP<br/>
<i>(sandboxed indirect jump)</i></td>
<td>and $-32,%eXX<br/>
<td>naclrestbp ...,%rZP<br/>
<i>(sandboxed %ebp/rbp restore)</i></td>
<td>mov ...,%ebp<br/>
<td>naclrestsp ...,%rZP<br/>
<i>(sandboxed %esp/rsp restore)</i></td>
<td>mov ...,%esp<br/>
<td>naclrestsp_noflags ...,%rZP<br/>
<i>(sandboxed %esp/rsp restore)</i></td>
<td>mov ...,%esp<br/>
lea (%rsp,%rZP,1),%rsp</td>
<td>naclspadj $N,%rZP<br/>
<i>(sandboxed %esp/rsp restore from %rbp; includes $N offset)</i></td>
<td>lea N(%rbp),%esp<br/>
<td>naclssp ...,%rZP<br/>
<i>(sandboxed stack decrement)</i></td>
<td>sub ...,%esp<br/>
<td>[rep] scas %nacl:(%rdi),%?ax,%rZP<br/>
<i>(sandboxed scas)</i></td>
<td>mov %edi,%edi<br/>
lea (%rZP,%rdi,1),%rdi<br/>
[rep] scas (%rdi),%?ax<br/>
<td>[rep] stos %?ax,%nacl:(%rdi),%rZP<br/>
<i>(sandboxed stos)</i></td>
<td>mov %edi,%edi<br/>
lea (%rZP,%rdi,1),%rdi<br/>
[rep] stos %?ax,(%rdi)<br/>