=============================
JBLite Design Documentation
=============================
\r
1. __init__(filename, init_from_file=None, init_method="etree")

   - Encapsulates an SQLite 3 database.
   - Default: specify the SQLite 3 DB file name.
   - Alternative: specify init_from_file to create a new SQLite
     database based upon a source file. (The file must be in Jim Breen's
     JMdict XML format, in its default UTF-8 encoding. Either the
     gzipped or the uncompressed version may be used.)

   - Extra arg: init_method. Default is "etree", which uses
     cElementTree to quickly import and create a database.

     A low-memory alternative implementation using SAX or similar
     may be provided, although this is known to be painfully slow...
     If this is done, it may be better to make a C extension for
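A minimal sketch of the constructor described above. The class name `Database`, the helper `open_source`, and the single `entry` table keyed on JMdict's `ent_seq` are assumptions made for illustration; the real table layout is not specified here. The sketch uses the standard `xml.etree.ElementTree` module (the separate cElementTree module was removed in Python 3.9) and detects a gzipped source by its magic bytes:

```python
import gzip
import sqlite3
import xml.etree.ElementTree as ET


def open_source(path):
    """Open a source file as a binary stream, whether gzipped or not."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rb")
    return open(path, "rb")


class Database:
    """Hypothetical wrapper around an SQLite 3 database file."""

    def __init__(self, filename, init_from_file=None, init_method="etree"):
        self.conn = sqlite3.connect(filename)
        if init_from_file is not None:
            if init_method != "etree":
                raise ValueError("unsupported init_method: %r" % (init_method,))
            self._init_with_etree(init_from_file)

    def _init_with_etree(self, path):
        # Parse the whole document in memory, then bulk-insert the entries.
        with open_source(path) as source:
            root = ET.parse(source).getroot()
        # Assumed, illustrative schema: one row per <entry>.
        self.conn.execute("CREATE TABLE entry (ent_seq INTEGER PRIMARY KEY)")
        self.conn.executemany(
            "INSERT INTO entry (ent_seq) VALUES (?)",
            ((int(e.findtext("ent_seq")),) for e in root.iter("entry")),
        )
        self.conn.commit()
```

Parsing the whole tree at once is the fast but memory-hungry path; a SAX-style importer would trade speed for a smaller footprint.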
\r
2. search(query, pref_lang=None)

   - Single API to handle searches of both Japanese and foreign
   - pref_lang determines the "foreign" language to search. None
     means search all. Known values will be "en" and "fr". Maybe
     "es(?)" (Spanish) and "??" (German) as well...?
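The pref_lang behaviour could look roughly like the sketch below. It covers only the foreign-language side of the search, and the function name `search_glosses` and the `gloss(entry_id, lang, text)` table are assumptions made for illustration, not the actual schema:

```python
import sqlite3


def search_glosses(conn, query, pref_lang=None):
    """Return (entry_id, lang, text) rows whose gloss text contains query.

    pref_lang=None searches every language; otherwise only glosses
    tagged with that language code are considered.
    """
    sql = "SELECT entry_id, lang, text FROM gloss WHERE instr(text, ?) > 0"
    params = [query]
    if pref_lang is not None:
        sql += " AND lang = ?"
        params.append(pref_lang)
    sql += " ORDER BY entry_id, lang"
    return conn.execute(sql, params).fetchall()
```

Building the language filter as an optional clause keeps a single code path for both the "search all" and the single-language case.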
\r
What do we want to query as-needed?

- keb/reb/glosses as main

+- EntityTable (XML entity lookup, to save space)
+- 1-M mapping tables
+- Misc. tables... generalized if possible, specialized if we must
\r
Database design ideas:

- Database creates all needed tables from an XML file.
- Search function knows which tables to query to find entries.
- On a search match, the code will find the root node which owns the
  gloss in question. (This means code specific to each match, since
  we have to walk back through the tables to find the original
- Optimization: For any tables we want to be "searchable", add an
  extra column with the entry ID. It's data duplication, but it keeps
  us from having to read 5+ tables to find the entry key.
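The entry ID optimization can be illustrated with a deliberately small schema. The three tables and their column names below are assumptions for the sake of the example; the real design has more levels. The searchable `gloss` table carries a duplicated `entry_id`, so a match can be resolved without walking back through `sense`:

```python
import sqlite3

# Assumed, simplified schema: entry -> sense -> gloss.
SCHEMA = """
CREATE TABLE entry (id INTEGER PRIMARY KEY);
CREATE TABLE sense (
    id INTEGER PRIMARY KEY,
    entry_id INTEGER REFERENCES entry(id)
);
CREATE TABLE gloss (
    id INTEGER PRIMARY KEY,
    sense_id INTEGER REFERENCES sense(id),
    entry_id INTEGER REFERENCES entry(id),  -- duplicated for direct lookup
    text TEXT
);
"""


def entry_ids_via_walk(conn, text):
    """Find owning entries by joining back through the parent table."""
    rows = conn.execute(
        "SELECT DISTINCT sense.entry_id FROM gloss "
        "JOIN sense ON sense.id = gloss.sense_id "
        "WHERE gloss.text = ? ORDER BY sense.entry_id",
        (text,),
    )
    return [row[0] for row in rows]


def entry_ids_direct(conn, text):
    """Find owning entries using only the duplicated entry_id column."""
    rows = conn.execute(
        "SELECT DISTINCT entry_id FROM gloss WHERE text = ? ORDER BY entry_id",
        (text,),
    )
    return [row[0] for row in rows]
```

Both functions return the same IDs as long as the duplicated column is kept consistent at import time; the direct version touches one table regardless of how deep the searchable table sits.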
\r
Database object ideas:

- Optimization: For any given attribute, the first access reads it
  from the DB and subsequent accesses use the cached value. This
  assumes the DB does not change in real time, a fair constraint for
  a single-user study application.

  - More than one value may be read at a time in some cases... maybe?
  - Premature optimization? Standard use may be to grab all data
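The read-once caching idea might be sketched as follows. The `Entry` class, its `glosses` attribute, and the `gloss(id, entry_id, text)` table are hypothetical names chosen for the example:

```python
import sqlite3


class Entry:
    """Hypothetical entry object that reads each attribute from the DB once."""

    def __init__(self, conn, entry_id):
        self._conn = conn
        self._id = entry_id
        self._cache = {}

    @property
    def glosses(self):
        # The first access queries the database; later accesses reuse the
        # cached list. This is only safe if the DB does not change underneath.
        if "glosses" not in self._cache:
            rows = self._conn.execute(
                "SELECT text FROM gloss WHERE entry_id = ? ORDER BY id",
                (self._id,),
            )
            self._cache["glosses"] = [row[0] for row in rows]
        return self._cache["glosses"]
```

Because the cache is never invalidated, a later change to the underlying rows is not seen by an existing object, which is exactly the constraint stated above.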
\r
1. __init__(filename, init_from_file=None, init_method="etree")

   - Encapsulates an SQLite 3 database.
   - Default: specify the SQLite 3 DB file name.
   - Alternative: specify init_from_file to create a new SQLite
     database based upon a source file. (The file must be in Jim Breen's
     JMdict XML format, in its default UTF-8 encoding. Either the
     gzipped or the uncompressed version may be used.)

   - Extra arg: init_method. Default is "etree", which uses
     cElementTree to quickly import and create a database.

     A low-memory alternative implementation using SAX or similar
     may be provided, although this is known to be painfully slow...
     If this is done, it may be better to make a C extension for
\r
   - query is a Japanese string containing one or more kanji.

3. query_code_search(query_type, query)

   - Allows use of SKIP, De Roo, Four Corners and S&H query code
     systems to look up kanji.
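A rough sketch of a query code lookup, written as a free function that takes a connection. The `query_code(literal, qc_type, code)` table and the accepted query_type names are assumptions; the mapping targets are the `qc_type` attribute values used in the KANJIDIC2 XML format, on the assumption that this is the source data:

```python
import sqlite3

# Assumed user-facing names mapped to KANJIDIC2 qc_type attribute values.
QUERY_TYPES = {
    "skip": "skip",
    "deroo": "deroo",
    "four_corner": "four_corner",
    "sh": "sh_desc",
}


def query_code_search(conn, query_type, query):
    """Return the kanji whose code in the given system equals query."""
    try:
        qc_type = QUERY_TYPES[query_type]
    except KeyError:
        raise ValueError("unknown query_type: %r" % (query_type,)) from None
    rows = conn.execute(
        "SELECT literal FROM query_code "
        "WHERE qc_type = ? AND code = ? ORDER BY literal",
        (qc_type, query),
    )
    return [row[0] for row in rows]
```

Keeping all four systems in one table with a type column means a new code system needs only a new dictionary entry, not a new table.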
\r
4. stroke_count_search(count, allow_miscounts=False, error_margin=0,
                       error_margin_type="plusminus")

   - Query by stroke count.
   - If allow_miscounts is set, include common miscounts as candidates.
   - error_margin allows minor miscounts on all candidates.
   - error_margin_type selects the type of margin: "plus", "minus", or
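One way to read the margin arguments is as an inclusive range of acceptable stroke counts. The helper name `stroke_count_range` is hypothetical, and the interpretation is an assumption: "plus" admits candidates with up to error_margin more strokes than count, "minus" admits up to error_margin fewer, and "plusminus" (the default in the signature above) admits both:

```python
def stroke_count_range(count, error_margin=0, error_margin_type="plusminus"):
    """Return the inclusive (low, high) stroke counts to accept."""
    if error_margin < 0:
        raise ValueError("error_margin must be non-negative")
    if error_margin_type == "plus":
        low, high = count, count + error_margin
    elif error_margin_type == "minus":
        low, high = count - error_margin, count
    elif error_margin_type == "plusminus":
        low, high = count - error_margin, count + error_margin
    else:
        raise ValueError("unknown error_margin_type: %r" % (error_margin_type,))
    # No kanji has fewer than one stroke, so clamp the lower bound.
    return max(1, low), high
```

The resulting pair maps directly onto an SQL `BETWEEN ? AND ?` condition on a stroke count column.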
\r
5. stroke_count_filter(candidates, count, allow_miscounts=False,
                       error_margin=0, error_margin_type="plusminus")

   - Takes a list of candidates and filters them by count. The
     database is only hit if necessary.
   - All other args are like stroke_count_search.
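The "only hit if necessary" behaviour can be sketched with a cache of already-known stroke counts and a fallback lookup. This is a simplified variant, not the signature listed above: the name `filter_by_stroke_count` and the `known_counts` and `fetch_count` parameters are hypothetical, and only a symmetric margin is handled to keep the example short:

```python
def filter_by_stroke_count(candidates, count, known_counts, fetch_count,
                           error_margin=0):
    """Keep the candidates whose stroke count is within count +/- error_margin.

    known_counts is a dict of kanji -> stroke count that is filled in as we
    go; fetch_count(kanji) stands in for the database query and is called
    only for kanji that are not in the cache yet.
    """
    low, high = count - error_margin, count + error_margin
    kept = []
    for kanji in candidates:
        if kanji not in known_counts:
            known_counts[kanji] = fetch_count(kanji)  # the only DB access
        if low <= known_counts[kanji] <= high:
            kept.append(kanji)
    return kept
```

Passing the same `known_counts` dict to repeated calls means that a second filter over the same candidates triggers no lookups at all.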
\r
6. dict_code_lookup(dict_name, dict_code)

   - Takes a dictionary ID code and a dictionary code, returns a
   - Really limited use case... probably won't implement this.
\r
What do we want to query as-needed?

- readings (on/kun)
- meanings (en/es/fr/etc.)
- lots of misc. info
\r