assets/developer-notes/stephanie-gawroriski/2016/06/09.mkd

   1 # 2016/06/09
   2
   3 ## 09:33
   4
   5 Ok so, what I need to do is create actual threads and work on the structure
   6 manager.
   7
   8 ## 10:48
   9
  10 An idea that I have for the structure manager is that partitioning free space
  11 that is available would cost a bunch of memory. So instead of having an
  12 active partition table, I will have two regions. The region at the start of the
  13 lowest available memory would be the allocation table. This region specifies
  14 basic object information and the regions of memory which are actually used. So
  15 basically there would be a handle table at the start for every allocation. At
  16 the start, the memory would be initialized to contain an intial table of a
  17 given size and a large contigious free chunk. Each allocation chunk would then
  18 have information such as its type, where its pointer is allocated, some GC
  19 information, and basically that would be it. It would have to be small so that
  20 lower memory systems are not insanely pressured into it. There would also need
  21 to be size information also. So right now if a 32-bit field is used for type
  22 and GC flags, then that means two pointer sized fields are used for the
  23 data pointer and the size.
  24
  25  * 16-bit: 4 + 2 + 2 = 8
  26  * 32-bit: 4 + 4 + 4 = 12
  27  * 64-bit: 4 + 8 + 8 = 20
  28
  29 I could actually split the start 32-bit field into other fields which may have
  30 atomic operations performed on it. When it comes to the class type that an
  31 object is, or the length of the array that can be placed in the data pointer
  32 information.
  33
  34 ## 11:42
  35
  36 For arrays the type and length could be determined by the table index
  37 information. Then with this, I can have byte arrays which are mapped to
  38 specific regions of memory that are outside of the allocated memory zone.
  39
  40 ## 12:07
  41
  42 For simplicity, the memory table contents will only pertain to their own
  43 pool region. This way I can have multiple pools where their table data is not
  44 placed in another pool.
  45
  46 ## 12:23
  47
  48 I suppose the given allocation strategy shall be used:
  49
  50  1. Go through all memory pools
  51     1. Try to allocate a given amount of bytes.
  52     2. Check to see if compacting all of the data within a single pool would be
  53        enough to allocate the given number of bytes (without performing the
  54        compaction).
  55        * If there is enough room, then everything in the pool is compacted
  56          until a large enough area of free space is available.
  57  2. Try to allocate another memory pool, if it works run the steps for _1_ for
  58     that given pool.
  59  3. Garbage collect all memory pools and clear weak/unreferenced objects.
  60  4. Run step _1_ again.
  61  5. To the running program, an `OutOfMemoryError` is thrown.
  62
  63 The compaction would have to consider also objects it cannot move (because they
  64 may be locked by a thread).
  65
  66 ## 12:34
  67
  68 I suppose for simplicity and some debugging, the identity hash code of an
  69 object will be the table reference index. However a random value would likely
  70 be a better choice. Many of the references would have very close values with
  71 the same higher bits which would make it harder on hash maps. Random would
  72 be the best choice. However, the identity hash code could be a 16-bit value
  73 which is duplicated in the high and low places. This may be enough for Java ME
  74 since there is hopefully not going to be more than 65,000 objects at any one
  75 time.
  76
  77 ## 12:41
  78
  79 If multiple pools are available then pools which cannot be locked are ignored
  80 for allocation attempts. So if multiple threads are allocating at the same time
  81 in a multi-pool scenario they can freely manage their own pools so to speak.
  82 One thing to consider however is that there needs to be a latch for the garbage
  83 collector (which needs to lock all pools). There would be a lock collision if
  84 the GC attempts to lock all pools. So I suppose that the first memory pool
  85 should have a special flag be used that is compared and swap to indicate that
  86 a garbage collector is running. An allocation would first compare and swap the
  87 GC lock to prevent a GC from being run, then it would lock the current pool,
  88 if that fails it would unlock the GC lock and try another pool. If the pool
  89 lock succeeds then the pool remains locked and the GC lock is cleared. In the
  90 event that two threads are allocating in a pool and they both determine that
  91 the GC has to run, they will then both CAS the GC lock. Whichever one gets the
  92 true condition performs the garbage collection. The one which fails getting the
  93 garbage collector will yield-spin until the GC lock becomes available again.
  94 Since another thread performed garbage collection it would start the allocation
  95 cycle all over again.
  96
  97 ## 12:50
  98
  99 So the allocator would keep a reference to the first pool for the GC lock and
 100 then use the other pools accordingly. Actually if two threads were to hit the
 101 GC at the same time.
 102
 103 ## 12:56
 104
 105 Actually I can compare and swap the same value. Use for example
 106 `compareAndSwapInt(a, 0, 0)`. Although thinking about it, since the flags are
 107 quite simple it might be best if it were a pointer instead. However that would
 108 complicate the math because it would need to be stored the positions of the
 109 fields rather than being pure constants.
 110
 111 ## 12:59
 112
 113 Actually if two threads are running and both hit the garbage collector lock
 114 then flagging the other to determine if a GC ran would be simply just comparing
 115 the value against the GC is running value. If that value was detected then the
 116 allocation algorithm restarts at the beginning.
 117
 118 ## 13:10
 119
 120 One thing to consider however is cache coherency when it comes to data
 121 allocation.
 122
 123 ## 13:19
 124
 125 One major problem though is false sharing. Since all of the table data is at
 126 the start and very close to each other, when one thread allocates data the
 127 other threads's CPUs will have to throw out their object information caches.
 128
 129 ## 13:24
 130
 131 The current model I plan would work a bit better on older CPUs and such
 132 however. Also thinking about it, the memory pool information would be falsely
 133 shared also due to the locks so to speak.
 134
 135 ## 13:29
 136
 137 One way would be to add a get and set operation in the memory pool manager.
 138 There would be a configuration pool so to speak which would have a pointer to
 139 the last allocated object. When a new object is allocated it will atomically be
 140 compared and set to the new allocated object. Then the current object's back
 141 chain will point to the object that was allocated before it.
 142
 143         .       1 ( --- 2 ( --- 3
 144
 145 Then the garbage collector would perform a global lock and then go through
 146 all threads and would mark any objects which can be reached. Then once that
 147 stage is done the object chain is walked down which destroys any objects
 148 which were not marked as visited and their chains reset accordingly.
 149
 150 ## 13:39
 151
 152 However that would remove the capability of having compaction and would make
 153 it more difficult on lower memory systems. Also, when ordering fields, I need
 154 to have finals first with their alignment which is then followed by any
 155 volatiles or writable fields. This way there may potentially be a reduction in
 156 the amount of false sharing when reading field values that will never change
 157 in value.
 158
 159 ## 13:48
 160
 161 When it comes to the memory allocator with this scheme, it just would not scale
 162 well.
 163
 164 ## 14:00
 165
 166 Hypothetically, the table indexes would have to be at the minimum the size of
 167 a cache line. So depending on the architecture that is running, there would
 168 potentially be a bunch of wasted space in the starting table data.
 169
 170 ## 14:06
 171
 172 So for the JIT to possibly generate better code for a target system, will need
 173 to have CPU metrics and such that can vary at run-time to adjust for how a
 174 system runs. So ideally, precompiled code will be generated exactly the same
 175 for the most part, however the layout of actual objects would vary depending
 176 on the host system. So despite having the same code, the cache line size could
 177 be 32 on one system and 64 on another, so table entries would be rounded up to
 178 this given size. This would also need to be the barrier between final and
 179 fields which are mutable and then ones which are volatile.
 180
 181 ## 14:24
 182
 183 Currently reading an academic paper on Hoard. Technically I could use it since
 184 it is GPLv2+, however it is either C or C++ which when translated to Java does
 185 not really work well at all. So thinking about it, the current memory pool as
 186 is is a bad idea and will not work out at all. It should instead be a class
 187 which acts as a memory allocator. Then the structure manager could be managed
 188 on top of base memory manager implementations.
 189
 190 ## 14:45
 191
 192 Ok so, currently my memory manager is a bit too high level. The kernel and
 193 drivers may need to actually access the address space of the host on a fashion
 194 that the infamous `Unsafe` would do. So to the kernel, this would provide
 195 basically access to the entire address space using a common interface and
 196 such. I can also provide a similar interface for the MMUs that CPUs use also.
 197 So if a MMU is available, when code is generated I can instead catch the
 198 exception potentially rather than checking against a null pointer so to speak.
 199 Although that would be a bit ugly personally.
 200
 201 ## 14:51
 202
 203 Something that is interesting is that there is a reflection of a reflection
 204 of my eye and my iris/pupils in my glasses. That is light comes in, reflects
 205 off my eye, and then reflects off my glasses. The whites of the eye is a very
 206 dark purple while where the pupil and iris are completely black.
 207
 208 ## 14:53
 209
 210 Due to the varying memory models on CPUs, there are not completely linear
 211 memory models. Some portions of memory could also be banked switched, there
 212 could be paging also. Then for certain architectures, code and data is kept
 213 separate (such as in the JVM) where code might not even be readable.
 214
 215 ## 16:48
 216
 217 Since doubles and floats can be accessed via int/long I will not have them.
 218
 219 ## 17:29
 220
 221 Actually having the memory access having a block would be very ugly. It would
 222 be best if the memory space were kept as flat as possible.
 223
 224 ## 17:39
 225
 226 When it comes to memory management, some of it could be managed by the OS (if
 227 there is one). Hypothetically, `valgrind` could be used to determine if there
 228 are any memory leaks.
 229
 230 ## 18:22
 231
 232 I can keep the structure manager which can use an allocator for a native system
 233 and such.
 234
 235 ## 18:27
 236
 237 Actually thinking about it, instead of the interpreter creating the memory pool
 238 and the required information from it, it can be assigned a memory pool by the
 239 kernel instead. This would make it possible to run the interpreter to run some
 240 code on the real kernel where it could flow into natively compiled code for
 241 example. Although it would be slow, there could be a practical application
 242 for this.
 243
 244 ## 18:33
 245
 246 To handle the rerecording interpreter however I would need a way to determine
 247 the type and size of the allocated memory space. Memory allocation would also
 248 need to be deterministic, but in these cases it would be deterministic anyway.
 249
 250 ## 20:16
 251
 252 One thing I must rethink is how to integrate the interpreter memory with the
 253 kernel memory. At the start, the interpreter would have used the kernel
 254 memory but that was switched to the kernel uses the interpreter memory. The
 255 current method makes it easier for the re-recording interpreter to handle
 256 re-recording. Since both states can potentially be very complex, I suppose
 257 what I can do instead of a `Interpreter` is have an object which has
 258 interpreter configuration details. The configuration details would have the
 259 ability to create a structure manager and memory accessor.
 260
 261 ## 20:20
 262
 263 The `JVMKernel` can also use the configuration system. Actually, instead the
 264 stuff can be kept in the interpreter. However the kernel itself it initialized
 265 with settings that the interpreter was initialized with. So basically the
 266 interpreter would take the argument list and it could potentially modify them
 267 before they are passed to the kernel. Then this way the kernel does not have to
 268 use the argument rewriting (although it could still exist, just the `JVMKernel`
 269 can do nothing with it now).
 270
 271 ## 20:24
 272
 273 A difficulty with that is the constructor, since I can only use a default
 274 no-arg constructor. This means a factory must be used instead.
 275
 276 ## 20:27
 277
 278 The interpreter would handle the arguments and such, there would be an early
 279 argument handling method which could potentially adjust the remaining set of
 280 arguments (or pass them through untouched). Then the arguments that the
 281 interpreter gives (which may be modified if rerecording is enabled) will be
 282 passed to the kernel. Alone, the interpreter would do nothing and as such
 283 anything that wants to use it must then take its initialization arguments and
 284 do what it needs to with it.
 285
 286 ## 22:18
 287
 288 When it comes to the memory regions, perhaps instead everything should be
 289 hidden behind the structure manager. Instead of the kernel directly accessing
 290 the `MemoryAccessor` interface, it can take it from the `StructureManager`.
 291
 292 ## 22:26
 293
 294 Also the pointer type can be decoupled from the structure manager and placed in
 295 memory instead.
 296
 297 ## 22:29
 298
 299 Then for memory regions (code, data, etc.) it can be utilized via the access
 300 manager by having it indicating the type of information it provides. So for
 301 example if the memory accessor has combined data and code it would indicate
 302 as such. If it had non-executable data and non-readable code then it could
 303 require two memory accessors to be used for each.
 304