README for JITCS 0.01
---------------------
JITCS, a.k.a. Just-In-Time-Compiler-System, is intended to become a just-in-time
assembler for all platforms where this service is feasible (e.g. Windows,
MacOS X, Linux, and Android, but NOT iOS or Windows Phone). It targets
applications where compilation speed and code quality are both important, e.g.
emulation.
JITCS is Copyright (C) 2013-2014 Dirk Steinke.
JITCS is free software, released under the MIT license.
See full Copyright Notice in the COPYRIGHT file or in include/jitcs.h.
WARNING: The library is currently not fit for ANY purpose, and is severely
lacking documentation. You MAY study its general design principles, but
unfortunately, the most interesting parts are still to come.
The goals of JITCS are simple:
- require minimal overhead
- provide fast compilation time
- provide good code quality
- generate as much target dependent source code as possible from data
descriptions
- be available on all platforms supporting JIT compiling
Concerning multithreading: JITCS is reentrant, so different compiler objects
can be used by different threads, BUT JITCS does not use mutexes for anything,
so DO NOT share a compiler object across threads unless you do the
synchronization yourself.
At the moment, the scope of JITCS is limited to JIT assembly. There are three
subproblems concerning assembly: instruction selection, instruction scheduling
and register selection.
- Instruction selection: in most cases the code developer has already
determined the type of operation (e.g. addition or multiplication) and, most
of the time, also the type of data it operates on (e.g. int, float or vector
of bytes).
Using an IR to abstract from the actual hardware hardly makes sense in a
situation where the target hardware is mostly already determined (e.g. x86 on
Windows/Linux, ARM on Android/iOS). Due to its abstract nature, IR is better
suited for instruction combining, especially on 32-bit ARM, where many shift
instructions can be combined with arithmetic or logical operations.
The JITCS assembly stage assumes that the best possible instruction for the
current operation has already been selected.
- Instruction scheduling: while out-of-order architectures like x86 and modern
ARM do not benefit as greatly as older architectures from latency-based
scheduling, there are cases where they do. On the other hand, latency and
port usage differ greatly between different versions of x86 for the same
instruction, so a JIT assembler should MEASURE the current architecture's
instruction latencies and port usage, and NOT rely on built-in tables.
While there are ideas on how to achieve that, JITCS just assumes that the
out-of-order target architecture will do the job well enough.
- Register selection: register allocation is basically a caching scheme.
Given the latencies of memory accesses (especially if the memory is
NOT in L1 cache), it is good practice to avoid reloading registers
from memory more often than necessary. If the exact size of the available
register file is not known to the code developer, or if they DO NOT WANT to
worry about it, a good assembler should provide a reasonably good
allocation scheme. Of course, the better the scheme, the longer the required
compilation time.
JITCS will provide several levels of allocation quality. NONE of them
will beat the allocation quality of gcc or LLVM, but the best level should be
at least close, while still taking less time.
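
To make the speed/quality trade-off concrete, here is a minimal sketch of
linear-scan allocation, a well-known fast scheme. It is NOT the JITCS
allocator; the names LiveInterval, kSpilled and linearScan are made up for
this illustration, and virtual registers are assumed to be numbered
0..n-1 with one live interval each.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Live range of one virtual register, in instruction indices (inclusive).
// Hypothetical type for illustration; not part of JITCS.
struct LiveInterval {
  int vreg;
  int start;
  int end;
};

// Marker for "lives in memory instead of a physical register".
constexpr int kSpilled = -1;

// Returns, for each virtual register, a physical register index in
// [0, numRegs) or kSpilled. Assumes vreg numbers are 0..intervals.size()-1.
std::vector<int> linearScan(std::vector<LiveInterval> intervals, int numRegs) {
  std::vector<int> assignment(intervals.size(), kSpilled);
  std::sort(intervals.begin(), intervals.end(),
            [](const LiveInterval& a, const LiveInterval& b) {
              return a.start < b.start;
            });

  std::vector<int> freeRegs;
  for (int r = numRegs - 1; r >= 0; --r) freeRegs.push_back(r);

  std::vector<LiveInterval> active;  // intervals currently holding a register
  for (const LiveInterval& cur : intervals) {
    // Release registers of intervals that ended before cur starts.
    for (std::size_t i = 0; i < active.size();) {
      if (active[i].end < cur.start) {
        freeRegs.push_back(assignment[active[i].vreg]);
        active.erase(active.begin() + static_cast<std::ptrdiff_t>(i));
      } else {
        ++i;
      }
    }

    if (!freeRegs.empty()) {
      assignment[cur.vreg] = freeRegs.back();
      freeRegs.pop_back();
      active.push_back(cur);
      continue;
    }

    // No register free: keep whichever interval ends sooner in a register
    // and spill the one that ends last.
    auto victim = std::max_element(
        active.begin(), active.end(),
        [](const LiveInterval& a, const LiveInterval& b) {
          return a.end < b.end;
        });
    if (victim != active.end() && victim->end > cur.end) {
      assignment[cur.vreg] = assignment[victim->vreg];
      assignment[victim->vreg] = kSpilled;
      *victim = cur;
    }
    // Otherwise cur itself stays spilled.
  }
  return assignment;
}
```

The sketch spills whole intervals and never splits them, which is what keeps
it fast; better allocators spend extra time on splitting live ranges and on
choosing spill candidates by cost.
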
To do a good job on register allocation for native instructions, JITCS provides
data files describing all the information needed to handle each native
instruction. During the build process, these data files
are transformed into C++ source code meant to handle the native instructions.
The required information for each instruction comprises:
- type and names of parameters (for instruction constructors)
- usage of register parameters: used and/or defined (for dataflow analysis)
- usage of implicit registers (e.g. EFLAGS on x86, for dataflow analysis)
- ISA requirements (to test if a certain instruction can be run on the current
host cpu)
- description of instruction encoding (to turn the instruction object into
machine code)
- operand folding (e.g. folding a memory load operation into a
register-register operation on x86)
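
As an illustration of how such per-instruction information can be consumed,
the following sketch runs a backward liveness pass over a straight-line block
using only use/def register masks, and checks an ISA requirement mask against
the host. The struct InsnInfo, its fields and both functions are hypothetical
and are not taken from the JITCS data files or generated code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-instruction record (illustration only, not JITCS's format).
// Registers and ISA features are encoded as one bit each.
struct InsnInfo {
  const char* name;
  std::uint32_t useMask;  // registers read, including implicit ones
  std::uint32_t defMask;  // registers written, including implicit ones
  std::uint32_t isaMask;  // ISA features the instruction requires
};

// Backward dataflow over a block without branches:
//   liveIn(i) = use(i) | (liveOut(i) & ~def(i))
// Returns the set of registers live immediately before each instruction.
std::vector<std::uint32_t> liveBefore(const std::vector<InsnInfo>& block,
                                      std::uint32_t liveAtExit) {
  std::vector<std::uint32_t> liveIn(block.size());
  std::uint32_t live = liveAtExit;
  for (std::size_t i = block.size(); i-- > 0;) {
    live = block[i].useMask | (live & ~block[i].defMask);
    liveIn[i] = live;
  }
  return liveIn;
}

// An instruction can run on the host if every required feature is present.
bool isSupported(const InsnInfo& insn, std::uint32_t hostIsa) {
  return (insn.isaMask & ~hostIsa) == 0;
}
```

A register that is not live before an instruction is free to be reused at that
point, which is the information a register allocator needs from the use/def
entries.
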
Alternatives:
-------------
Concerning just-in-time compilation, there are several alternatives available.
Examples are:
- xbyak:
It is a very fast, directly encoding, just-in-time assembler for x86 with
minimal overhead. The user is responsible for choosing instructions,
scheduling and registers.
The library is header-only.
- LLVM/MCJIT:
A just-in-time compiler based on the popular LLVM library. It provides target
independence, full-fledged optimization support, superior code generation
and a vast developer community.
Disadvantages are a rather low compilation speed and large library
dependencies (several megabytes).
- AsmJit:
Targets x86. Instructions are temporarily stored with their operands before
the machine code is generated in a second phase. It does not provide help with
instruction selection or scheduling, but offers help with register selection
(a.k.a. allocation). The allocator uses a rather naive scheme, and needs help
for good code generation.
The library size is small (maybe 100-200k?).
In design, JITCS resembles AsmJit. It is slower than xbyak, and produces worse
code than LLVM. JITCS will employ a much more sophisticated register allocator
than AsmJit.