So I need an efficient means of reading streamed ZIP files. It is possible,
though, that the actual entries are spaced out and not directly adjacent to
each other as they would be in a normal ZIP file.
I suppose that for stream based reading of ZIPs I will need a way to locate
local file headers. So I need an adjustable buffered input stream where I can
directly access the bytes in the buffer and know their actual positions. Once
a local file header is reached, I then just need to switch to a different read
size and detect the data descriptor if an uncompressed size was not specified.
Generally, something that could be a slight issue is ZIPs nested within ZIPs
if they happen to be placed just right. However, I can calculate the CRC,
compressed size, and uncompressed size. Since the descriptor header is
optional, I essentially have to check every byte ahead of the current read
position to determine whether the compressed size, uncompressed size, and CRC
match the given file.
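
As a rough sketch, locating a header just means scanning the raw bytes for the
local file header magic (`PK\x03\x04`, which is `0x04034B50` little-endian)
while keeping track of the absolute position; the class and method names here
are placeholders rather than the real code:

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Scans an input stream for the next ZIP local file header signature
 * (the bytes "PK\x03\x04") while tracking the absolute stream position.
 */
public class LocalHeaderScanner
{
	/** Local file header magic number, little-endian on disk. */
	private static final int LOCAL_HEADER_MAGIC = 0x04034B50;
	
	/**
	 * Reads bytes until the local file header magic is seen.
	 *
	 * @param __in The stream to scan.
	 * @return The absolute position of the signature's first byte, or
	 * {@code -1} if end of stream was reached first.
	 * @throws IOException On read errors.
	 */
	public static long findNextHeader(InputStream __in)
		throws IOException
	{
		long pos = 0;
		int window = 0;
		
		for (int b; (b = __in.read()) >= 0; pos++)
		{
			// Shift the new byte into a rolling 32-bit window; the magic is
			// stored little-endian, so the newest byte is the high octet
			window = (window >>> 8) | (b << 24);
			
			// The window ends at pos, so the magic started three bytes back
			if (window == LOCAL_HEADER_MAGIC && pos >= 3)
				return pos - 3;
		}
		
		return -1;
	}
}
```

A match here is only a candidate, which is why the CRC and size checks above
are still needed to rule out stray `PK\x03\x04` sequences inside entry data.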
It appears that the standard ZIP utility included with my system does not
support reading ZIPs from a pipe and working with their data.
Basically, for every byte that is read, checks will have to be made to
determine whether the end of the single entry has been reached. This would not
be very efficient since there would be a large number of tests. If the data
descriptor were not optional, this task would be a bit easier. Something that
would probably be a bit more efficient is a double queue on the input bytes.
Essentially, the dynamic history stream would read from the last set of
history, and then there can be a peek method which reads ahead from the source
stream. There can then be a get which returns the current history, or at least
a part of that history. This would be the most efficient means of writing the
data. I can use the dynamic buffer code I previously wrote to manage the
buffered bytes.
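
As a sketch of that double-queue idea, with placeholder names rather than
whatever the real classes end up being, peeked bytes sit in a read-ahead queue
until an actual read consumes them:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Bytes peeked from the source are parked in a read-ahead queue and only
 * handed to the caller once they are actually read.
 */
public class HistoryInputStream
	extends InputStream
{
	/** The wrapped source stream. */
	protected final InputStream source;
	
	/** Bytes read ahead of the current position but not yet consumed. */
	private final Deque<Integer> _ahead = new ArrayDeque<>();
	
	public HistoryInputStream(InputStream __src)
	{
		this.source = __src;
	}
	
	/**
	 * Peeks the byte {@code __i} positions ahead of the read position,
	 * pulling more bytes from the source if needed.
	 */
	public int peek(int __i)
		throws IOException
	{
		// Fill the read-ahead queue until the wanted index is available
		while (this._ahead.size() <= __i)
		{
			int b = this.source.read();
			if (b < 0)
				return -1;
			this._ahead.addLast(b);
		}
		
		// Walk to the requested position without consuming anything
		int at = 0;
		for (int v : this._ahead)
			if (at++ == __i)
				return v;
		return -1;
	}
	
	@Override
	public int read()
		throws IOException
	{
		// Consume previously peeked bytes first, then fall back to source
		if (!this._ahead.isEmpty())
			return this._ahead.removeFirst();
		return this.source.read();
	}
}
```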
When `nextEntry` is called, it searches for the next entry based on the header
and other file information, and then an entry is set up. For this to work,
calling close on the entry will basically read every byte until EOF is reached
for that specific entry before it is marked as closed/finished.
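
A sketch of that close-drains-to-EOF behavior might look something like this;
the class name and fields are just illustrative:

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Closing an entry reads and discards whatever the caller did not consume,
 * so the underlying reader ends up positioned just past this entry's data.
 */
public class StreamedEntry
	extends InputStream
{
	/** Bounded stream over just this entry's stored or compressed bytes. */
	protected final InputStream data;
	
	/** Has this entry been finished? */
	private volatile boolean _closed;
	
	public StreamedEntry(InputStream __data)
	{
		this.data = __data;
	}
	
	@Override
	public int read()
		throws IOException
	{
		return this.data.read();
	}
	
	@Override
	public void close()
		throws IOException
	{
		// Only drain once
		if (this._closed)
			return;
		this._closed = true;
		
		// Read every remaining byte so that the next call to nextEntry()
		// starts scanning right after this entry
		byte[] junk = new byte[512];
		while (this.data.read(junk) >= 0)
			;
	}
}
```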
Instead of `peek`, I suppose there could just be `readAhead`, although that
could be confused with `read`. Instead, I suppose that `peek` will just return
the requested bytes that lie ahead, and then there can be another method,
`grab`, which just loads the given number of bytes.
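
So the read-ahead side of the API might be shaped something like this, with
`peek` and `grab` as working names only:

```java
import java.io.IOException;

/**
 * Possible shape of the read-ahead API being considered.
 */
public interface ReadAhead
{
	/**
	 * Copies up to {@code __l} bytes that lie ahead of the current read
	 * position into {@code __b} without consuming them, returning however
	 * many were actually available.
	 */
	int peek(byte[] __b, int __o, int __l)
		throws IOException;
	
	/**
	 * Forces up to {@code __n} bytes to be loaded from the source into the
	 * read-ahead buffer, returning how many were actually loaded.
	 */
	int grab(int __n)
		throws IOException;
}
```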
`DynamicByteBuffer` could probably use a refactor to be much more efficient.
Now that my code detects the local file header, I must read it.
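
The fixed portion of the local file header, after the four signature bytes,
decodes roughly like this (field layout per the ZIP specification; the class
itself is just a sketch with names of my own choosing):

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Decodes the fixed portion of a local file header, assuming the four
 * signature bytes ("PK\x03\x04") were already consumed by the scanner.
 */
public class LocalFileHeader
{
	public final int generalFlags, method, nameLength, extraLength;
	public final long crc, compressedSize, uncompressedSize;
	
	public LocalFileHeader(InputStream __in)
		throws IOException
	{
		DataInputStream in = new DataInputStream(__in);
		
		__readU16(in);                         // version needed to extract
		this.generalFlags = __readU16(in);     // general purpose bit flags
		this.method = __readU16(in);           // compression method
		__readU16(in);                         // last modified time
		__readU16(in);                         // last modified date
		
		// When bit 3 of the flags is set, these three are zero and the
		// real values only appear in the trailing data descriptor
		this.crc = __readU32(in);              // CRC-32 of uncompressed data
		this.compressedSize = __readU32(in);   // compressed size
		this.uncompressedSize = __readU32(in); // uncompressed size
		
		this.nameLength = __readU16(in);       // file name length
		this.extraLength = __readU16(in);      // extra field length
	}
	
	/** Reads a little-endian unsigned 16-bit value. */
	private static int __readU16(DataInputStream __in)
		throws IOException
	{
		int lo = __in.readUnsignedByte(),
			hi = __in.readUnsignedByte();
		return lo | (hi << 8);
	}
	
	/** Reads a little-endian unsigned 32-bit value. */
	private static long __readU32(DataInputStream __in)
		throws IOException
	{
		return (long)__readU16(__in) | ((long)__readU16(__in) << 16);
	}
}
```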
When it comes to uncompressed data, there is the potential that the
decompressor could read a bit ahead. One thing I considered is that it would
be possible for the decompressor to use the historical stream to read in the
byte sequences it needs, so to speak. However, I need a lower level reader
which is associated with the compressed size, and that one performs the
detection of the end of the entry if the size is undefined.
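
For the simple case where the compressed size is known, that lower level
reader could be little more than a byte counter; the undefined-size case would
swap the counter for the end-of-entry scan described earlier. A placeholder
sketch:

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Returns only the bytes belonging to a single entry when the compressed
 * size was specified in the local file header.
 */
public class BoundedCompressedInput
	extends InputStream
{
	/** Underlying stream positioned at the start of the entry data. */
	protected final InputStream source;
	
	/** Number of compressed bytes remaining in this entry. */
	private long _left;
	
	public BoundedCompressedInput(InputStream __src, long __compressedSize)
	{
		this.source = __src;
		this._left = __compressedSize;
	}
	
	@Override
	public int read()
		throws IOException
	{
		// No bytes of this entry remain
		if (this._left <= 0)
			return -1;
		
		int b = this.source.read();
		if (b >= 0)
			this._left--;
		return b;
	}
}
```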
One issue is that, to detect the end of the data, I need to know the CRC of
the uncompressed output and potentially the uncompressed size of the stream
before that information is known. This would be a somewhat complex endeavor,
especially given the fact that the descriptor is optional. Of course, this
would not be as much of an issue if the CRC were associated with the
compressed data instead of the uncompressed data. I can always test this,
however.
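
A sketch of that check, keeping a running CRC-32 and count of the uncompressed
output so a candidate descriptor can be compared against it (the names here
are illustrative only):

```java
import java.util.zip.CRC32;

/**
 * Tracks the CRC-32 and length of the uncompressed output as it is produced,
 * so a possible data descriptor can be checked against what was actually
 * decompressed.
 */
public class DescriptorCheck
{
	/** Running checksum of all uncompressed bytes produced so far. */
	private final CRC32 _crc = new CRC32();
	
	/** Count of uncompressed bytes produced so far. */
	private long _uncompressed;
	
	/** Called whenever the decompressor outputs more bytes. */
	public void output(byte[] __b, int __o, int __l)
	{
		this._crc.update(__b, __o, __l);
		this._uncompressed += __l;
	}
	
	/**
	 * Returns {@code true} if a candidate descriptor's fields agree with
	 * the data decompressed so far, meaning the entry has likely ended.
	 */
	public boolean matches(long __crc, long __compressedSize,
		long __uncompressedSize, long __actualCompressed)
	{
		return __crc == this._crc.getValue()
			&& __uncompressedSize == this._uncompressed
			&& __compressedSize == __actualCompressed;
	}
}
```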
Saw a meteor in the sky outside my window; I wonder how common that is.