assets/developer-notes/stephanie-gawroriski/2016/07/06.mkd

   1 # 2016/07/06
   2
   3 ## 07:39
   4
   5 Today is my birthday. Yay.
   6
   7 ## 07:46
   8
   9 The immutable configuration should match getters with the mutable one for
  10 simplicity.
  11
  12 ## 07:53
  13
  14 The simulations can act as a group. Instead of having individual simulations
  15 there would just be a system simulation. If a system instance does not exist
  16 then it will be created. When a program is requested to run on a given system
  17 then it will be passed to it. So this way, multiple programs can run on any
  18 given system but they would essentially be very standalone in their own
  19 execution.
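
A minimal sketch of the get-or-create behaviour described above. The names `SystemSimulation`, `SimulationGroup`, and `runProgram` are hypothetical and only illustrate the idea; they are not taken from existing code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical: a single simulated system which may host many programs. */
final class SystemSimulation
{
    final String systemName;

    /** Programs passed to this system; each runs standalone. */
    final List<String> programs = new ArrayList<>();

    SystemSimulation(String systemName)
    {
        this.systemName = systemName;
    }
}

/** Hypothetical: owns every system simulation and creates them on demand. */
final class SimulationGroup
{
    private final Map<String, SystemSimulation> systems =
        new LinkedHashMap<>();

    /** Passes a program to the given system, creating the system if needed. */
    SystemSimulation runProgram(String systemName, String program)
    {
        SystemSimulation sim = systems.computeIfAbsent(systemName,
            SystemSimulation::new);
        sim.programs.add(program);
        return sim;
    }

    int systemCount()
    {
        return systems.size();
    }
}
```

Requesting two programs on the same system name would reuse the same `SystemSimulation` instance rather than creating a second one.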
  20
  21 ## 07:58
  22
  23 One thing to consider however is that the simulation could launch programs
  24 that exist in the simulated filesystem or from the real filesystem. I should
  25 likely just support it only from the simulated filesystem. With my current plan
  26 there would be a system filesystem and a home one. However, some operating
  27 systems may require a combined user based filesystem on top of a system based
  28 one. Also, some operating systems do not have a filesystem at all. For example
  29 Palm OS has a database filesystem while expansion cards act as traditional
  30 filesystems. The Nintendo 64 has only ROM and block based storage (although if
  31 a 64drive is used, the main cart can contain a filesystem). So having an
  32 external filesystem support could be slightly odd. However, for a Linux
  33 based system I should be able to run `fossil`. This way I can have a bootstrap
  34 build environment (assuming I can also get the Java compiler and interpreter
  35 simulated). I will bump into a chicken-and-egg problem however. Right
  36 now I have no class library and no graphical interfaces. For the simulator to
  37 work better, I need a graphical interface since for some systems such as
  38 Palm OS, everything uses graphics. However for testing, I can have a virtual
  39 serial port for the user which the operating system can output to,
  40 perhaps for a given process. So on Linux, the test program would pipe its
  41 output to a virtual serial device, on Windows it would output via COM1 or
  42 such. On the Nintendo 64's 64drive, it would output to its USB connection.
  43 This way the output of the test system, while being simulated, can be parsed
  44 for errors and such. If a test fails on a real system then the code is incorrect,
  45 while if a test fails in just the simulator then the simulator is incorrect.
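
A rough sketch of how parsed serial output could be triaged. The line format (`PASS name` / `FAIL name`) is purely an assumption for illustration, as are the names `Blame` and `SerialTestTriage`. Reading "fails on a real system" as "the code under test is wrong" is also my interpretation.

```java
/** Hypothetical: which side is considered at fault for a test result. */
enum Blame
{
    NOTHING,
    CODE_UNDER_TEST,
    SIMULATOR
}

/** Hypothetical helper for test output read from the virtual serial port. */
final class SerialTestTriage
{
    /** Parses one line, assuming a made-up "PASS name" / "FAIL name" format. */
    static boolean linePassed(String line)
    {
        String trimmed = line.trim();
        if (trimmed.startsWith("PASS "))
            return true;
        if (trimmed.startsWith("FAIL "))
            return false;
        throw new IllegalArgumentException("Unrecognized line: " + line);
    }

    /** A failure on real hardware blames the code, otherwise the simulator. */
    static Blame triage(boolean passedOnReal, boolean passedInSimulator)
    {
        if (!passedOnReal)
            return Blame.CODE_UNDER_TEST;
        if (!passedInSimulator)
            return Blame.SIMULATOR;
        return Blame.NOTHING;
    }
}
```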
  46
  47 ## 11:10
  48
  49 Well, Google never wished me a Happy Birthday, but Microsoft did.
  50
  51 ## 11:28
  52
  53 When it comes to the new `JITLogicAcceptor`, I should only create it when it
  54 is actually needed, that is when a method needs its logic handled. Before I was
  55 definitely going to clutter `JITOutput` with logical operations, but that
  56 would be nasty. However with the `JITLogicAcceptor` I can essentially have a
  57 multiplexer which can output to multiple logical acceptors with the same
  58 information. Then this way, I can handle caching and runtime JIT at the same
  59 time. Thinking about other things, I wonder if in the game of life it would
  60 be possible to create standalone simulations (not using the super cells that
  61 run the same simulation) but a way where I can create a glider and a few other
  62 objects at specific positions within the world. If that could be done then a
  63 JIT would be possible in the game of life.
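
A minimal sketch of the multiplexer idea. The notes do not say what methods `JITLogicAcceptor` has, so the single `operation(String)` method here is a placeholder assumption, and `MultiplexLogicAcceptor` and `RecordingAcceptor` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Placeholder shape; the real methods of this interface are not given. */
interface JITLogicAcceptor
{
    void operation(String op);
}

/** Hypothetical: forwards the same information to several acceptors. */
final class MultiplexLogicAcceptor implements JITLogicAcceptor
{
    private final JITLogicAcceptor[] targets;

    MultiplexLogicAcceptor(JITLogicAcceptor... targets)
    {
        // Defensive copy so later changes to the caller's array are ignored
        this.targets = targets.clone();
    }

    @Override
    public void operation(String op)
    {
        for (JITLogicAcceptor target : targets)
            target.operation(op);
    }
}

/** Hypothetical stand-in for a cache writer or a run-time JIT. */
final class RecordingAcceptor implements JITLogicAcceptor
{
    final List<String> seen = new ArrayList<>();

    @Override
    public void operation(String op)
    {
        seen.add(op);
    }
}
```

With two `RecordingAcceptor`s behind one multiplexer, one could stand in for the cache and the other for the run-time JIT, and both would see identical operations.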
  64
  65 ## 11:44
  66
  67 Actually what I need is some kind of cache creator callback that could be
  68 specified in the output. So basically, there would be a (`Auto`)`Closeable`
  69 class writer of sorts. `JITOutput` would return this. It would essentially be
  70 a `beginClass` with the namespace and the name of the class. If the output is
  71 to be written to a cache then that can be handled by `JITOutput`. Otherwise
  72 if it is a binary that could also be handled too. So then `JITOutput` gets a
  73 producing class. Finishing the class can be handled by whatever implements the
  74 interface accordingly.
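
A sketch of what the closeable class writer could look like. `JITClassWriter` and `LoggingJITOutput` are hypothetical names, and the exact signature of `beginClass` is an assumption based only on "the namespace and the name of the class".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical: closing the writer finishes the class. */
interface JITClassWriter extends AutoCloseable
{
    @Override
    void close();
}

/** Assumed shape of the output; only beginClass is sketched. */
interface JITOutput
{
    JITClassWriter beginClass(String namespace, String className);
}

/** Hypothetical stand-in which only logs when classes begin and end. */
final class LoggingJITOutput implements JITOutput
{
    final List<String> log = new ArrayList<>();

    @Override
    public JITClassWriter beginClass(String namespace, String className)
    {
        log.add("begin " + namespace + ":" + className);

        // Whatever implements the writer decides how the class is finished
        return () -> log.add("end " + namespace + ":" + className);
    }
}
```

Since the writer is `AutoCloseable`, the caller can use try-with-resources (`try (JITClassWriter w = out.beginClass(...)) { ... }`) so that the class is always finished.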
  75
  76 ## 12:05
  77
  78 I am going to have to have a creator interface for cached forms. The config
  79 will be given an interface which would create `OutputStream`s for writing to
  80 the disk. One thing that I could do on Linux is create actual object files
  81 and then create a linker similar to `gcc` which can combine all of the object
  82 files together with a set entry point which performs the work. If I match the
  83 native format I could manually link all the classes together myself and set
  84 an entry point so to speak. Although this is not required at all. It would
  85 be interesting for hybrid programs however.
  86
  87 ## 12:10
  88
  89 However, `JITCacheCreator` can sort of be hidden in a way by the
  90 `JITOutput`. The `JITCacheCreator` would still be sent to the configuration. If
  91 it is sent then that means `JITOutput` should output to a cache. I just need
  92 the creator because the `JITOutput` has no means of determining exactly where
  93 to place the `OutputStream` or whether it should even be stored on the disk.
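
A sketch of the creator being handed to the configuration. The method name `createCache`, the class `JITOutputConfig`, and the in-memory `MemoryCacheCreator` are all assumptions for illustration; the only parts from the notes are that the creator produces `OutputStream`s and that its presence means the output should be cached.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

/** Decides where cached output goes (disk, memory, ...). */
interface JITCacheCreator
{
    /** Hypothetical method name; returns a stream to write the cache to. */
    OutputStream createCache(String namespace)
        throws IOException;
}

/** Hypothetical configuration; a non-null creator means "write a cache". */
final class JITOutputConfig
{
    private final JITCacheCreator cacheCreator;

    JITOutputConfig(JITCacheCreator cacheCreator)
    {
        this.cacheCreator = cacheCreator;
    }

    boolean shouldCache()
    {
        return cacheCreator != null;
    }

    JITCacheCreator cacheCreator()
    {
        return cacheCreator;
    }
}

/** Hypothetical creator which keeps caches in memory instead of on disk. */
final class MemoryCacheCreator implements JITCacheCreator
{
    final Map<String, ByteArrayOutputStream> caches = new LinkedHashMap<>();

    @Override
    public OutputStream createCache(String namespace)
    {
        ByteArrayOutputStream rv = new ByteArrayOutputStream();
        caches.put(namespace, rv);
        return rv;
    }
}
```

This keeps the decision of where the bytes end up entirely outside of `JITOutput`.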
  94
  95 ## 13:21
  96
  97 So I need to copy over the class flag code and use that in the decoder since
  98 that will be very important.
  99
 100 ## 13:28
 101
 102 What I need though is a name. The CI code would basically be imported as-is
 103 for the most part, except with some slight changes. Well, perhaps not changed
 104 at all. The CI family will basically be going away with the new JIT since there
 105 is no need to really keep it around. A future Java compiler could just write
 106 the class data directly anyway.
 107
 108 ## 13:40
 109
 110 JIT would not really work out well. `Class` would not work.
 111
 112 ## 19:55
 113
 114 So basically what I need to determine now is how `beginClass` is to work. I
 115 have thought of it before. Currently I just have caching used. However with a
 116 multiplexing output for classes I would not need to worry about duplicating or
 117 having lots of branches. However what would essentially happen is that when
 118 class code is generated, the work would essentially be performed twice: once
 119 for being cached and once for being directly executed. Since that would be a bit
 120 of a waste I would suppose that there would be branched handlers for output.
 121 Generally when directly executed just some basic class details are needed.
 122 However the cached form can also be directly executed and placed in memory and
 123 initialized. So what I really could just do is have just a cached form writer
 124 for now. Also, with the generic operating system handling and such, it is
 125 very possible to compile for multiple systems at the same time although
 126 linking may be complicated by that fact. One consideration with the simulator
 127 is that I could directly feed it blobs. However it should be able to run actual
 128 OS code because that would need to be tested more than just the native machine
 129 code that is being executed. So one thing to consider at this point, for
 130 systems such as Linux, is if as noted before blobs should match the standard
 131 compilable object format. Then the blobs could be linked into a static binary
 132 and initialized with basic C code for now. However in my case, it would
 133 essentially just be a blob wrapped in an ELF file.
 134
 135 ## 20:05
 136
 137 One thing to determine is if the ELF symbol table is unique to a given OS or
 138 if it is standard for the format. I believe symbol tables are non-standard.
 139 It appears that the symbol table is standard. Another question is whether I can get
 140 away without needing a symbol table at all. However, going without one would
 141 be a bit ugly. At least with ELFs and symbol tables I can kind
 142 of easily generate a binary even without linking. When it comes to system
 143 specific virtual machine calls (`unsafe`) then those calls can just be bound
 144 to special symbol names rather than their normally intended method. So
 145 instead of calling a mangled name for the method, it is instead just a call
 146 to say `__squirreljme_foo`. However likely it would be best to have an
 147 alternative static OS implementation somewhere where methods are bound to
 148 instead. So basically the JIT would turn `unsafe` calls into
 149 `foo.bar.ImplementingClass` with compatible signatures. Then this implementing
 150 class would have the magical code such as direct assembly access and such.
 151 This way assembly can stay out of the main libraries and exist only in a small
 152 portion of the code. One thing I need though is a kind of ELF output library
 153 and a common name mangling scheme which would work in C for the most part.
 154 Technically I could just store everything in an ELF and then link the end
 155 result anyway. Alternatively, I can always have a blob to ELF converter for
 156 that. If all blobs have their size information, I could essentially just
 157 concatenate every single blob, set up a base linker script, and create a small
 158 wrapper program which initializes data from the blob.
 159
 160 ## 20:14
 161
 162 So I would say then that blobs would be best which include their detailed
 163 information. With a backwards linking chain of sizes, pointers to classes
 164 could be created. Finding a class definition would be a bit slow, but there
 165 could be a generic large table at the end which points to every blob that
 166 exists similar to what I have thought of before.
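
A sketch of the backwards chain of sizes, under an assumed layout which is not specified anywhere in these notes: each blob is followed by its length as a 4-byte big-endian integer. `BlobChain` and its methods are hypothetical. The large table at the end is not sketched here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical: concatenated blobs, each followed by its own size. */
final class BlobChain
{
    /** Writes every blob followed by its length (assumed 4 bytes). */
    static byte[] concatenate(List<byte[]> blobs)
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] blob : blobs)
        {
            out.write(blob, 0, blob.length);
            byte[] size = ByteBuffer.allocate(4).putInt(blob.length).array();
            out.write(size, 0, size.length);
        }
        return out.toByteArray();
    }

    /** Walks backwards from the end, returning blob offsets in file order. */
    static List<Integer> findBlobOffsets(byte[] image)
    {
        Deque<Integer> offsets = new ArrayDeque<>();
        ByteBuffer buf = ByteBuffer.wrap(image);
        int end = image.length;
        while (end > 0)
        {
            if (end < 4)
                throw new IllegalArgumentException("Truncated size field");
            int size = buf.getInt(end - 4);
            int start = end - 4 - size;
            if (size < 0 || start < 0)
                throw new IllegalArgumentException("Corrupt blob chain");

            // Found last-to-first, so push to the front to keep file order
            offsets.addFirst(start);
            end = start;
        }
        return new ArrayList<>(offsets);
    }
}
```

The walk is linear in the number of blobs, which is why a lookup table at the end would still be worthwhile for finding a specific class quickly.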
 167
 168 ## 20:32
 169
 170 So I will need a good way to have an output class format. However, since
 171 testing things will be a bit unknown while I have code generation, I should
 172 switch to working on an interpretive handling of classes. I would like the
 173 runtime classes to be purely interpreted (the JVM classes) in a kind of
 174 bootstrap state, however that would be very complex. However, an alternative
 175 would be to kind of target an abstract machine of sorts that can easily handle
 176 the logic. I could for example output to plain C code. This would in a way
 177 test how standalone the code is. An interpreter in Java, meanwhile, will essentially
 178 be a `JVM` instance. However since `JVM` is essentially the kernel, it would
 179 need a way to set up sub-objects and processes. However the major issue is that
 180 the interpreter would be completely unable to call into the normal code and
 181 access kernel space objects as if they were user-space since the objects
 182 would be in completely different domains. However it could still work if the
 183 interpreter based machine is simple enough. The only initial problem is that
 184 the run-time kind of will expect all of the classes to exist and be built
 185 into the executable. Any dependencies would also be included too. There would
 186 just be enough classes to have a JIT and to run the launcher. I just hope that
 187 including all of them does not complicate and bloat it for some targets that
 188 are extremely limited in size. However, all my JARs are essentially just
 189 476KiB right now. For a wide range of systems that is enough. The actual size
 190 is 386KiB however (due to sector round up). Since the blobs would essentially
 191 have no debugging information, that would be a bit lighter. When running,
 192 each class in its compilation state will need to access a lookup table which
 193 refers to other classes and methods. One thing that would be very efficient
 194 would be for there to be a completely merged table. Otherwise there are going to
 195 be many references to `java.lang.Object` for example. So I will say that when
 196 it comes to `JITOutput` that it always merges all possible classes and places
 197 them into a single index. Since `JITOutput` is initialized by something, the act
 198 of calling `close` on the output could finish off anything it needs to perform.
 199 Then this way to the JIT, all classes in a single namespace can be compiled at
 200 once and essentially a JITted JAR would be loaded 100% at run-time or at least
 201 memory mapped. Entire programs would be precompiled and resources included
 202 also. This would probably be the best result. Since this is the
 203 more efficient route and has much constant deduplication I must take it.
 204 Storing many copies of the same string would be a waste of space.
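
A minimal sketch of a merged table with constant deduplication. `MergedConstantTable` and `intern` are hypothetical names, and treating every constant as a plain string is a simplification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical: one table shared by every class compiled together. */
final class MergedConstantTable
{
    private final Map<String, Integer> indices = new HashMap<>();
    private final List<String> entries = new ArrayList<>();

    /** Returns the index of the constant, adding it only if it is new. */
    int intern(String constant)
    {
        Integer existing = indices.get(constant);
        if (existing != null)
            return existing;

        int index = entries.size();
        entries.add(constant);
        indices.put(constant, index);
        return index;
    }

    String get(int index)
    {
        return entries.get(index);
    }

    int size()
    {
        return entries.size();
    }
}
```

Every class which refers to `java.lang.Object` would then share a single entry instead of each carrying its own copy.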
 205
 206 ## 20:47
 207
 208 So considering this, I should probably split the class and global tables to
 209 namespaces. This way, single blobs get their own namespace. Doing it this way
 210 would mean that `JITOutput` would not need to be closed. Any and all classes
 211 in the same namespace would be compiled together and use the same constant
 212 data. At least with it this way, when programs are running as user-space they
 213 do not need to have all the namespaces the kernel uses visible to the process.
 214 Only the `JVM` needs access to all of the default namespace areas. So
 215 basically then a `JITNamespaceWriter` is created which then has a `close` and
 216 a `beginClass` without a namespace. Since everything within a single JAR is
 217 visible to each other this namespace split makes sense. It would also make it
 218 easier for the launcher to show the JARs in the default namespaces.
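
A sketch of the namespace writer with a `close` and a `beginClass` that takes no namespace. `JITClassWriter` and `LoggingNamespaceWriter` are hypothetical, and how `JITOutput` hands out a `JITNamespaceWriter` is left out because it is not decided here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical: closing the writer finishes the class. */
interface JITClassWriter extends AutoCloseable
{
    @Override
    void close();
}

/** One writer per namespace (for example, one JAR). */
interface JITNamespaceWriter extends AutoCloseable
{
    /** The namespace is implied by the writer, so only the name is given. */
    JITClassWriter beginClass(String className);

    @Override
    void close();
}

/** Hypothetical stand-in which only logs the order of events. */
final class LoggingNamespaceWriter implements JITNamespaceWriter
{
    final List<String> log = new ArrayList<>();
    private final String namespace;
    private boolean closed;

    LoggingNamespaceWriter(String namespace)
    {
        this.namespace = namespace;
        log.add("namespace " + namespace);
    }

    @Override
    public JITClassWriter beginClass(String className)
    {
        if (closed)
            throw new IllegalStateException("Namespace closed");
        log.add("begin " + className);
        return () -> log.add("end " + className);
    }

    @Override
    public void close()
    {
        // Shared constant data for the namespace would be finished here
        closed = true;
        log.add("close " + namespace);
    }
}
```

Closing the namespace writer, rather than `JITOutput` itself, is the point at which the shared data for that namespace would be written out.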
 219
 220 ## 20:55
 221
 222 Then this also means that `JIT` gets `JITNamespaceWriter` to target namespaces.
 223
 224 ## 21:02
 225
 226 Then for the builder, since the `JVM` and the builder will generally take an
 227 input JAR and recompile all of it, I can instead have a generic class in `JIT`
 228 which has a basic common interface to all of the JAR handling so that
 229 namespace recompilation is handled using the same code. Because as it stands
 230 right now, I would need to duplicate that work in the `JVM` and any changes to
 231 one copy may diverge from the other and potentially break things.
 232