***DISCLAIMER***: _These notes are from the defunct k8 project which_
_precedes SquirrelJME. The notes for SquirrelJME start on 2016/02/26!_
_The k8 project was effectively a Java SE 8 operating system and as such_
_all of the notes are in the context of that scope. That project is no_
_longer my goal as SquirrelJME is the spiritual successor to it._

Work begins; first I need to handle class files and such.

I believe my file system design will be tag based, allow for multiple root
filesystems, and some other things. Since ! is really the only safe URI
character that does not change much across UNIX systems and their shells
(except for the broken bash), that will work. The first path component will
always be ! which designates the tag file pool to use. So for example, there
might be a tag pool. Perhaps @ for the pool might be better.

I know a better scheme: the selected tag pool is an absolute (mostly) path
which starts with "@/", while the normal UNIX-compatible root is just "/". In
the global tag directory of "@/", the potential "directories" will be all of
the tag pools visible to the current user, as determined by ACLs. UNIX
compatibility has to be important since so many things just depend on it to
work. Anyway, "@/" contains all the major visible directories of tags.
Anything that starts with a + after that point is a query, so there may be two
ways to interact with the tag based filesystem. It should be noted that the
tag pool uses the filesystem as a whole rather than individual paths.

I personally dislike everything being exposed through the filesystem, but it
must be done for simplicity (sort of) and URI work. I personally find the
single root a flawed design. However, POSIX does define a double slash at the
start, but I need to determine if URL correcting (making it absolute) trashes
the starting double slashes. I will have to figure that one out. One major
thing is that there are different case sensitivity domains: the tag domain
will be insensitive while the UNIX domain will be sensitive. It also looks
like any extra slashes in the URI path component of a file will be trashed.

Well, clearing all of that up, I will note some points.

* **Tag Domain** The tag domain shall start with "@/" and anything on the
  system that goes into "@/" (say cd "@/") will enter the tag domain. The
  POSIX special "//" is not used because it will be destroyed by URI/URL
* **POSIX Domain** Provides a usual flat view of the file system where the
  root begins with a slash.
* **getRootDirectories()** Will list both the tag domain "@/" and the normal
  UNIX domain, where the tag domain uses special stuff to access the
* **Directory case InSeNsItIvE bit** A directory could have an extended case
  insensitive bit set to handle that case.

But for the tag interface, the root "@/" will list all of the tag pools. A tag
pool could be defined or separate.

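
As a rough sketch of the two domains noted above, the following Java snippet classifies a path string by its prefix. The names `PathDomains`, `domainOf`, and `comparisonKey` are made up for illustration and are not k8 code. The only rules assumed are the ones from these notes: "@/" enters the tag domain, a leading "/" is the POSIX domain, and only the tag domain ignores case.

```java
import java.util.Locale;

final class PathDomains {
    /** Which naming domain a path belongs to (hypothetical enum). */
    enum Domain { TAG, POSIX, OTHER }

    private PathDomains() {
    }

    /** Classifies a path purely by its leading characters. */
    static Domain domainOf(String path) {
        if (path.startsWith("@/"))
            return Domain.TAG;
        if (path.startsWith("/"))
            return Domain.POSIX;
        return Domain.OTHER;
    }

    /**
     * Returns a key usable for name comparison: lower-cased in the
     * case insensitive tag domain, untouched everywhere else.
     */
    static String comparisonKey(String path) {
        if (domainOf(path) == Domain.TAG)
            return path.toLowerCase(Locale.ROOT);
        return path;
    }
}
```

With this, "@/Music" and "@/music" compare as equal while "/Etc" and "/etc" stay distinct.
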
Actually, I do not need POSIX compliance so I can just have all the root
directories be named pools like "@foo/" which interact with all the tag stuff
as needed (so individual files can appear in many places). If a program
requires POSIX compatibility, say for a shell and whatnot, then it can be
enabled per process and a simulated root for a set of tags could be managed in
a tree where file views are part of a tree. The tag based filesystem could
consist of both directories and subdirectories while being splayed apart. When
I get to the filesystem part I will figure that out, since I do need to get
back to byte code parsing.

Some more filesystem notes. Have volumes and sub volume pools which are all

    @L=LabelOfPartitionOrStorage
    @U=751be413-118f-a3aa-a0f8-0964095f076a

The volume description identifies a unique volume using either the disk label,
the UUID, or a user defined alias (which could link to a subpool which I will
describe later). Perhaps for ambiguity, I will need an alternative name system

    @X=BlahBlah+there+are+too+many+subpools

Assume an alias named Foo which points to a subpool, so:

is the same as requesting

    @L=MyFlashDrive+games+nethack

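
To make the root naming concrete, here is a minimal parsing sketch in Java. It assumes only what the examples show: an `@`, a single kind letter, `=`, an identifier, and then `+`-separated subpool names. The class `VolumeSpec` and its fields are hypothetical, and the kind letter is kept as an opaque character rather than interpreted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class VolumeSpec {
    /** Kind letter taken verbatim from the root, such as 'L' or 'U'. */
    final char kind;

    /** Label, UUID, or other identifier that follows the equals sign. */
    final String id;

    /** Subpool names in order, possibly empty. */
    final List<String> subPools;

    private VolumeSpec(char kind, String id, List<String> subPools) {
        this.kind = kind;
        this.id = id;
        this.subPools = subPools;
    }

    /** Parses a root such as "@L=MyFlashDrive+games+nethack". */
    static VolumeSpec parse(String root) {
        if (root.length() < 4 || root.charAt(0) != '@'
            || root.charAt(2) != '=')
            throw new IllegalArgumentException("Not a volume root: " + root);

        // First piece is the identifier, the rest are subpools
        String[] parts = root.substring(3).split("\\+");
        if (parts.length == 0 || parts[0].isEmpty())
            throw new IllegalArgumentException("Missing identifier: " + root);

        List<String> subs = new ArrayList<>(
            Arrays.asList(parts).subList(1, parts.length));
        return new VolumeSpec(root.charAt(1), parts[0],
            Collections.unmodifiableList(subs));
    }
}
```

Parsing "@L=MyFlashDrive+games+nethack" this way yields kind `L`, identifier `MyFlashDrive`, and the subpools `games` and `nethack`.
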
The subpool is optional, but it is a place where a semi-combined state within
a storage domain can be achieved. What this means is that if the sub volume
shares the same data as another volume it will be reduced (less duplicate
info) but the file will be invisible. Sub pools can have sub pools which
further divide things. There can also be a named alias to a sub pool. When
root directories are requested in the virtual machine it will list all the
known pools the user can actually list (there may be hidden root tag pools).
Some of these might lead to the same point despite having different names
(imagine if C: has the same exact contents as D: in DOS). However, correctly
written code could detect this as both would have the same exact pools. Also,
sub pools would be able to inherit from other pools to create layers and views
of other data. So say there is a read-only pool on a Live CD that provides the
base pool. A read/write pool could image that existing pool and provide
changes while existing on another storage medium that can be written to.

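
The layering idea at the end of that paragraph can be sketched as a lookup chain. This is only an illustration under my own assumptions: pools are modelled as in-memory maps from file name to content, and the names `LayeredPool`, `readOnly`, and `overlay` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class LayeredPool {
    /** Pool this one inherits from, or null for a base pool. */
    private final LayeredPool base;

    /** Whether writes are accepted by this layer. */
    private final boolean writable;

    /** Files stored directly in this layer. */
    private final Map<String, String> files;

    private LayeredPool(LayeredPool base, boolean writable,
        Map<String, String> files) {
        this.base = base;
        this.writable = writable;
        this.files = files;
    }

    /** A base pool that cannot be modified, like one on a Live CD. */
    static LayeredPool readOnly(Map<String, String> contents) {
        return new LayeredPool(null, false, new HashMap<>(contents));
    }

    /** A writable pool that images an existing pool. */
    static LayeredPool overlay(LayeredPool base) {
        return new LayeredPool(base, true, new HashMap<>());
    }

    /** Reads from the topmost layer that has the file, else null. */
    String read(String name) {
        for (LayeredPool p = this; p != null; p = p.base) {
            String data = p.files.get(name);
            if (data != null)
                return data;
        }
        return null;
    }

    /** Stores a change in this layer only, the base is never touched. */
    void write(String name, String data) {
        if (!writable)
            throw new UnsupportedOperationException("Read-only pool");
        files.put(name, data);
    }
}
```

Changes land in the overlay, so the base pool keeps returning its original content while readers of the overlay see the modified file.
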
In the tag field form, the file system is organized by meta tag type. There
are major categories which describe the intended usage of the content and then
there are fields in the categories which are rather specific. A hash for a
file might be called "hash:" while a specific hash algorithm would be
"hash:crc32". If you were to list the contents of the directory "hash:" you
would get all the field types that are defined for that filesystem for the
specified major group. As an extended example, there is the "owner:" group and
let's say that there are three users: "lex", "link", and "luigi". If you were
to cd into the "owner:" directory you would see three subdirectories named
":lex", ":link", and ":luigi". If you were to cd into one of those then you
would see all of the files associated with the specified user. In the main
pool view, such sublinks are not visible although they may be used. So you
would not see "owner:link" but could still cd into it. If you were to ask
whether "owner:link/" and "owner:/:link" were the same directory then the
result would be true. Note that the sameness is only for the pool, so if
another pool which has no borrowing from another pool performs the same check
it would fail. Note that multiple directories can contain the same files based
on their name.

However, thinking ahead, this will not work out, so it would be better to
visualize something like "acl:" and "acl:owner" which could have a value of
one of those names for example. Such ACLs would match the Java ACL view and
support multiple users and such. However, reflecting on this, perhaps
"acl:name" would be much better because ACLs would be attached either to
single users or to groups of users.

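
The equivalence of "owner:link/" and "owner:/:link" can be expressed as a small normalization step. The sketch below is an assumption about how such a comparison might be done within a single pool; `TagDirs`, `canonical`, and `sameDirectory` are hypothetical names.

```java
final class TagDirs {
    private TagDirs() {
    }

    /**
     * Reduces both spellings of a major:field directory to one form,
     * so "owner:/:link" and "owner:link/" both become "owner:link".
     */
    static String canonical(String path) {
        String p = path;

        // A trailing slash does not change which directory is meant
        if (p.endsWith("/"))
            p = p.substring(0, p.length() - 1);

        // "major:/:field" is the long spelling of "major:field"
        return p.replace(":/:", ":");
    }

    /** Compares two tag directories of the same pool. */
    static boolean sameDirectory(String a, String b) {
        return canonical(a).equals(canonical(b));
    }
}
```
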
Examples for the hypothetical acl: major.

    acl:owner -- Owner access of file (individual user)
    acl:!group -- Group access of file (bunch of users together)

ACLs for the users from before:

And an example group:

Then there would be sub values for the ACL but that will be described

Group resolution would be performed and could encapsulate a ton of users as

Example ls of certain directories:

Now the one major important thing is the value of tags which associate
directly with metadata. If you were to ls the contents of "acl:lex" you would
end up getting every type of value that is associated with the "acl:lex" key.
Inside of those value directories would be files that match the specified
values. Metadata can be very different for varying types. An ACL would have a
flag set of permission maps such as read, write, list contents, and execute.
The tag handler would be built into the system which provides a file based
context on how to represent the system. So ACLs will end up being a bit field
where various flags are possible.

    a = APPEND_DATA (files, ADD_SUBDIRECTORY for dirs)
    r = READ_DATA (files, LIST_DIRECTORY for dirs)
    w = WRITE_DATA (files, ADD_FILE for dirs)
    N = WRITE_NAMED_ATTRS

When listing, if this information is known, the bit flags will only print
individual flag types. Otherwise, for the number of flags here there are about
16K combinations; if a flag set were larger, say a 64-bit value, then the
directory would be trashed with so many entries that the system would run out.
So the represented data is context sensitive. There will be APIs to access the
internal tagging system to determine how data is being shown and how to browse
it. Value directories would start with the equals sign. In the sense of bit
flags, if the contents of "acl:lex/=w" were requested, it would treat it as a
binary AND which would mean that any file with the WRITE_DATA bit set would be
With such a system it would not be possible to represent files in directories,
but that can be simulated in a tag based system anyway (say a "unix:path"
component where the value is the directory it is contained in). Browsing
through this would be fun, however, and complex. Say in Swing when there is an
open file dialog, you would be splashed with all these tags, so there would
have to be a path specified view of a root. So while the root will end up
being "@X=BlahBlah+poola" and will be listed in the list of roots as such, to
access the files in a path-like manner where there are no tag lookups (since a
purely tag based system would be rather explosive) there will be a directory
called "$" in the root directory that just goes by the path form. So your
classic subdirectory view will be seen as something like
"@X=BlahBlah+poola/$/etc/somefiles". The dollar segment would just be doing
special tag based translation to have something familiar. There could also be
extended segments that just end in the dollar sign which have special meaning,
so those will be reserved for later. So in short, ones ending in ':' are major
tag indexes while those ending in '$' have special path meanings after the
specified string. I believe the special names that should be reserved would be
anything that does not start with "x-". Then I can use a special thing like
"stat$" which contains information on the current pool, and the unique volume
IDs to determine if a pool is in the same volume or not.

In POSIX mode, to run normal POSIX programs, the default view will be the
default pool root directory. POSIX does not define mounts and such because
they are very system specific; in the POSIX sense, it seems the root of the
filesystem will be the bland '$' handler it is bound to.

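
The segment naming rules above (a trailing ':' for major tag indexes, a trailing '$' for special handlers, and the bare "$" directory for the classic path view) could be checked with something like the following Java sketch. `Segments`, `kindOf`, and `classicPath` are illustrative names only, and the sketch does not model the "x-" reservation.

```java
final class Segments {
    /** What a single path segment under a pool root means. */
    enum Kind { MAJOR_INDEX, SPECIAL_HANDLER, PLAIN }

    private Segments() {
    }

    /** Classifies one path segment by its last character. */
    static Kind kindOf(String segment) {
        if (segment.endsWith(":"))
            return Kind.MAJOR_INDEX;
        if (segment.endsWith("$"))
            return Kind.SPECIAL_HANDLER;
        return Kind.PLAIN;
    }

    /**
     * Extracts the classic path that follows the bare "$" directory,
     * or returns null if the path does not go through it.
     */
    static String classicPath(String full) {
        if (full.endsWith("/$"))
            return "/";

        int at = full.indexOf("/$/");
        if (at < 0)
            return null;

        // Skip the slash and the dollar sign, keep the following slash
        return full.substring(at + 2);
    }
}
```

For "@X=BlahBlah+poola/$/etc/somefiles" the classic path comes out as "/etc/somefiles", which is what a POSIX-mode process would see as its view of the root.
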
However, despite this file-like view of everything I will not go crazy and
have a dev filesystem or a proc filesystem; you can use system APIs to access
that stuff. Special character devices and block devices are ugly because you
either make a protocol 1000 times over or use 1000 ioctls and fcntls. But to
describe the bit flag value type, flags could be combined. This would mean
that doing "=rw" will only list files that are both readable and writeable.
Now the main problem is that this is currently at a single level, so you could
not add more tag types to refine searches (say you want all images that are
mostly red and were created on a Tuesday). You would then need to perform an
advanced query of tag data.

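
The combined flag matching ("=rw" lists only files that are both readable and writeable) is a plain bitwise AND test. A minimal sketch follows, using only the four flag letters that appear in these notes. The bit positions and the class name `AclFlags` are my own assumptions, not a defined layout.

```java
final class AclFlags {
    // Bit positions are arbitrary and chosen for this sketch only
    static final int APPEND_DATA = 1 << 0;
    static final int READ_DATA = 1 << 1;
    static final int WRITE_DATA = 1 << 2;
    static final int WRITE_NAMED_ATTRS = 1 << 3;

    private AclFlags() {
    }

    /** Turns flag letters such as "rw" into a bit mask. */
    static int parse(String letters) {
        int mask = 0;
        for (int i = 0; i < letters.length(); i++)
            switch (letters.charAt(i)) {
                case 'a': mask |= APPEND_DATA; break;
                case 'r': mask |= READ_DATA; break;
                case 'w': mask |= WRITE_DATA; break;
                case 'N': mask |= WRITE_NAMED_ATTRS; break;
                default:
                    throw new IllegalArgumentException(
                        "Unknown flag: " + letters.charAt(i));
            }
        return mask;
    }

    /** Checks a file's flags against a value directory like "=rw". */
    static boolean matches(int fileFlags, String valueDir) {
        if (!valueDir.startsWith("="))
            throw new IllegalArgumentException("Not a value: " + valueDir);

        // Every requested bit must be set, extra bits are allowed
        int wanted = parse(valueDir.substring(1));
        return (fileFlags & wanted) == wanted;
    }
}
```

A file with both read and write set shows up under "=w" and "=rw", while a read-only file appears under "=r" but not under "=rw".
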
Thinking about all of that, I know something better. The major:field stuff
stays as before; all of the equals stuff will be literal tag values which are
set, and those directories will contain every file that has the specified tag
set, based on their directory locality. So the "unix:path" will be used in
that, where it would be a simple system.

    @X=BlahBlah+poola/acl:/:link/=rwx/my/program.sh
    @X=BlahBlah+poola/acl:/:link/=rwx/some/other/script.sh
    @X=BlahBlah+poola/acl:/:link/=rwx/yay.sh
    @X=BlahBlah+poola/acl:/:link/=rw/important_text.txt
    @X=BlahBlah+poola/acl:/:link/=/cannot_touch_this
    @X=BlahBlah+poola/acl:link/=rwx/yay.sh
    @X=BlahBlah+poola/acl:link/=/cannot_touch_this

This provides a simple one dimensional view and could be useful for bulk
operations to see everything. However, the one major thing that would be used
now is a query system to do advanced searches. And that will be the special
"lex$" specifier. This still has to make valid URI path components so the set
of useful characters is limited, and the POSIX shell has special characters
already. Although : is used in the path component and = is used in variables,
they can just be escaped. A single dollar sign is also never expanded if it is
followed by a / so that is a non-issue. In the query form, each directory will
consist of a single path component after the point to specify a lexer pattern
for searching. One helpful thing is that "q$" should be an alias as it is
shorter and probably easier to remember. When a query is terminated it
provides the search results that match the "unix:path" so they are similar to
the global forms used. Queries in the special handler will always start with
+. If a file or directory in the result starts with a + then it will be
escaped so you will see an ugly "\\\\+" in the name of the file; that way you
do not end up starting a new search at all. So anything not starting with + is
not a tag search request. All search forms must then consist of normal URI
characters and other characters that would not cause handling to explode. They
will still take full forms but escaped forms would be completely acceptable.
The special handling character in this case will be +. It should be noted that
something along the lines of "+acl:link=rwx/+image:color=red" means to use
only files that are read/write/executable and are red images.

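
Since each query directory is a single path component starting with +, splitting a query into its terms is straightforward. The Java sketch below assumes that terms are separated by "/" and that the first component without a leading + ends the query. The class `QueryPath` and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class QueryPath {
    private QueryPath() {
    }

    /**
     * Splits a path below the query handler into search terms, with
     * the leading plus removed. All terms must match for a file to be
     * part of the result.
     */
    static List<String> terms(String query) {
        List<String> out = new ArrayList<>();
        for (String segment : query.split("/")) {
            if (segment.isEmpty())
                continue;

            // Anything not starting with + is not a tag search request
            if (!segment.startsWith("+"))
                break;

            out.add(segment.substring(1));
        }
        return out;
    }
}
```

Given "+acl:link=rwx/+image:color=red" this produces the two terms "acl:link=rwx" and "image:color=red", which are then combined as an AND.
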
    "+(mod)(major):(field)=(value)"

      This handles major:field specific handler data in plain English so that
      it is easier to determine what the typed information means.

    Literal exact value match, no special translation:
    "+(mod)(major):(field)==(value)"

      This is a literal match, all forms must exactly match the data in the
      specified way. This would be the same as if you read the attribute in
      (the eventual) Java code and requested the raw information.

    Result modifiers (always at the start), these may be stacked
      "~" : NOT match, anything that would not match becomes matched.
      "gleh!" : Force numerical value match, where value is a number or a type

        h = Use "human-readable" amounts (B K M G etc.)

            In the sense of a size

              K = 1024 bytes (8192 bits)
              M = 1048576 bytes (8388608 bits)

              k = 1000 bits (125 bytes)
              m = 1000000 bits (125000 bytes)

            Since these are extrapolated readable values, there may be

        So "lex$/+ge!file:size=16K" would match all files which are
        greater than or equal to 16 * 1024 bytes.

        And "lex$/+l!sound:length=3m" would match all sound files which
        are shorter than 3 minutes in length.

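
The size suffixes listed above are enough to sketch how a value like "16K" might be turned into a byte count for a "ge!" comparison. This is only an illustration: `HumanSize` is a made-up name, a value without a suffix is assumed to be in bytes, and only the K, M, k, and m suffixes given in these notes are handled. Other fields, such as sound length, would read the same letters differently.

```java
final class HumanSize {
    private HumanSize() {
    }

    /** Converts a size value such as "16K" or "3m" to bytes. */
    static long toBytes(String value) {
        if (value.isEmpty())
            throw new IllegalArgumentException("Empty size");

        // Assumption: a plain number is already a byte count
        char suffix = value.charAt(value.length() - 1);
        if (Character.isDigit(suffix))
            return Long.parseLong(value);

        long n = Long.parseLong(value.substring(0, value.length() - 1));
        switch (suffix) {
            case 'K': return n * 1024L;      // 1024 bytes
            case 'M': return n * 1048576L;   // 1048576 bytes
            case 'k': return n * 125L;       // 1000 bits
            case 'm': return n * 125000L;    // 1000000 bits
            default:
                throw new IllegalArgumentException(
                    "Unknown suffix: " + suffix);
        }
    }

    /** The "ge!" test: is the file at least as large as the value? */
    static boolean greaterOrEqual(long fileSize, String value) {
        return fileSize >= toBytes(value);
    }
}
```

So a 20000 byte file passes "+ge!file:size=16K" because 16K works out to 16384 bytes.
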
Otherwise, (major), (field), and (value) have the following potential.

      Literal single byte hex value, "+x00" would be the NUL byte.
      16-bit Unicode literal, same as \u in Java.
      24-bit Unicode literal, for higher numerical pairs.
      A literal plus sign (becomes +).
      A literal forward slash (becomes /).
      A literal backwards slash (becomes \).
      A literal dollar sign (becomes $).

      Start matching group set, ends at "+e".
      Internal OR to group match set (akin to perl: "(foo|bar)").
      End matching group set, starts at "+g".
      Match character/group exactly ## times (count).
      Match character/group exactly ## times or more (count).
      Match character/group exactly ## times or less (count).
      Match character/group exactly either ## or in between those values (count).
      Match character/group outside of the specified ranges (count).
      Wildcard, can be any character.

Major and field would be best forced to lower case always, but value could be