=============================
 JBLite Design Documentation
=============================

JMdict
======

Database object API
-------------------

1. __init__(filename, init_from_file=None, init_method="etree")

   - Encapsulates an SQLite 3 database.
   - Default: specify the SQLite 3 DB file name.
   - Alternative: specify init_from_file to create a new SQLite
     database from a source file.  (The file must be in Jim Breen's
     JMdict XML format, in its default UTF-8 encoding.  Either the
     gzipped or the uncompressed version may be used.)

     - Extra arg: init_method.  Default is "etree", which uses
       cElementTree to quickly import and create a database.

       A low-memory alternative implementation using SAX or similar
       may be provided, although this is known to be painfully slow.
       If this is done, it may be better to write a C extension for
       this logic.

2. search(query, pref_lang=None)

   - Single API to handle searches of both Japanese text and
     foreign-language glosses.
   - pref_lang determines the "foreign" language to search.  None
     means search all.  Known values will be "en" and "fr".  Spanish
     ("es"?) and German (code not yet decided) may be added as well.


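A minimal sketch of how the two calls above could fit together.  It
assumes Python 3's xml.etree.ElementTree in place of cElementTree
(which was removed in Python 3.9).  The three-table schema, the
open_source helper and the exact-match search are hypothetical, chosen
only to keep the example short; language codes are stored exactly as
they appear in the source file's xml:lang attribute.

.. code-block:: python

   import gzip
   import sqlite3
   import xml.etree.ElementTree as ET

   # ElementTree exposes the xml:lang attribute under this qualified name.
   XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

   # Hypothetical, heavily simplified schema (not the planned table layout).
   SCHEMA = """
   CREATE TABLE keb (entry_id INTEGER, value TEXT);
   CREATE TABLE reb (entry_id INTEGER, value TEXT);
   CREATE TABLE gloss (entry_id INTEGER, lang TEXT, value TEXT);
   """


   def open_source(path):
       """Open a source file, transparently handling gzip compression."""
       with open(path, "rb") as f:
           is_gzip = f.read(2) == b"\x1f\x8b"  # gzip magic number
       return gzip.open(path, "rb") if is_gzip else open(path, "rb")


   class Database:
       """Illustrative stand-in for the JMdict database object."""

       def __init__(self, filename, init_from_file=None, init_method="etree"):
           self.conn = sqlite3.connect(filename)
           if init_from_file is not None:
               if init_method != "etree":
                   raise ValueError("only init_method='etree' is sketched here")
               self._import_etree(init_from_file)

       def _import_etree(self, path):
           self.conn.executescript(SCHEMA)
           with open_source(path) as f:
               # iterparse yields each element once its closing tag is seen.
               for _event, elem in ET.iterparse(f):
                   if elem.tag != "entry":
                       continue
                   entry_id = int(elem.findtext("ent_seq"))
                   for tag in ("keb", "reb"):
                       self.conn.executemany(
                           "INSERT INTO %s VALUES (?, ?)" % tag,
                           [(entry_id, e.text) for e in elem.iter(tag)],
                       )
                   self.conn.executemany(
                       "INSERT INTO gloss VALUES (?, ?, ?)",
                       [(entry_id, g.get(XML_LANG), g.text)
                        for g in elem.iter("gloss")],
                   )
                   elem.clear()  # release the finished entry's subtree
           self.conn.commit()

       def search(self, query, pref_lang=None):
           """Return sorted entry IDs whose keb, reb, or gloss equals query."""
           rows = self.conn.execute(
               "SELECT entry_id FROM keb WHERE value = ? "
               "UNION SELECT entry_id FROM reb WHERE value = ? "
               "UNION SELECT entry_id FROM gloss "
               "WHERE value = ? AND (? IS NULL OR lang = ?)",
               (query, query, query, pref_lang, pref_lang),
           )
           return sorted(row[0] for row in rows)

Here pref_lang only restricts the gloss part of the query; Japanese
matches on keb and reb are returned regardless of its value.
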
Entry API
---------

What do we want to query as-needed?

- keb/reb/glosses as main
- other...


Object design
-------------

::

  Database
   |
   +- Tables
       +- EntityTable (XML entity lookup, to save space)
       +- 1-M mapping tables
       +- Misc. tables.... generalized if possible, specialized if must

Database design ideas:

- The database creates all needed tables from an XML file.
- The search function knows which tables to query to find entries.
- On a search match, the code finds the root node which owns the
  gloss in question.  (This means code specific to each kind of
  match, since we have to walk back through the tables to find the
  original entry.)
- Optimization: for any table we want to be "searchable", add an
  extra column with the entry ID.  It's data duplication, but it keeps
  us from having to read 5+ tables to find the entry key.

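To make the entry ID optimization concrete, here is a small sketch
comparing the two lookup styles.  The entry/sense/gloss tables and both
helper functions are hypothetical, and the chain is only one table deep;
the real schema would have more levels to walk back through.

.. code-block:: python

   import sqlite3

   conn = sqlite3.connect(":memory:")
   conn.executescript("""
       CREATE TABLE entry (id INTEGER PRIMARY KEY);
       CREATE TABLE sense (id INTEGER PRIMARY KEY, entry_id INTEGER);
       -- gloss.entry_id duplicates what sense.entry_id already records.
       CREATE TABLE gloss (id INTEGER PRIMARY KEY, sense_id INTEGER,
                           entry_id INTEGER, value TEXT);
       INSERT INTO entry VALUES (1), (2);
       INSERT INTO sense VALUES (10, 1), (20, 2);
       INSERT INTO gloss VALUES (100, 10, 1, 'cat'), (200, 20, 2, 'dog');
   """)


   def entry_by_walking(value):
       """Find the owning entry by joining back through the parent table."""
       row = conn.execute(
           "SELECT sense.entry_id FROM gloss "
           "JOIN sense ON sense.id = gloss.sense_id "
           "WHERE gloss.value = ?", (value,)).fetchone()
       return row[0] if row else None


   def entry_by_column(value):
       """Find the owning entry from the duplicated entry_id column."""
       row = conn.execute(
           "SELECT entry_id FROM gloss WHERE value = ?", (value,)).fetchone()
       return row[0] if row else None

Both functions return the same entry ID; the second simply never
touches the intermediate table, at the cost of one extra column per
searchable table.
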
Database object ideas:

- Optimization: for any given attribute, the first access reads it
  from the DB and later accesses use the cached value.  This assumes
  the DB does not change in real time, a fair constraint for a
  single-user study application.

  - More than one value may be read at a time in some cases... maybe?
  - Premature optimization?  Standard use may be to grab all data
    regardless...


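The read-once caching idea could look roughly like the sketch below.
The Entry class, its property names, the keb/gloss table layout and the
db_reads counter are all assumptions made for illustration; the counter
exists only to make the caching visible.

.. code-block:: python

   import sqlite3


   class Entry:
       """Hypothetical entry object that reads each attribute at most once."""

       def __init__(self, conn, entry_id):
           self._conn = conn
           self._id = entry_id
           self._cache = {}
           self.db_reads = 0  # counts queries, only to make caching visible

       def _values(self, table):
           # table comes from the fixed property names below, never from users.
           if table not in self._cache:
               self.db_reads += 1
               rows = self._conn.execute(
                   "SELECT value FROM %s WHERE entry_id = ?" % table,
                   (self._id,))
               self._cache[table] = [row[0] for row in rows]
           return self._cache[table]

       @property
       def kebs(self):
           return self._values("keb")

       @property
       def glosses(self):
           return self._values("gloss")

Because the cache is never invalidated, this only works under the
constraint stated above: the DB must not change while an Entry is alive.
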
KANJIDIC2
=========

Database object API
-------------------

1. __init__(filename, init_from_file=None, init_method="etree")

   - Encapsulates an SQLite 3 database.
   - Default: specify the SQLite 3 DB file name.
   - Alternative: specify init_from_file to create a new SQLite
     database from a source file.  (The file must be in Jim Breen's
     KANJIDIC2 XML format, in its default UTF-8 encoding.  Either the
     gzipped or the uncompressed version may be used.)

     - Extra arg: init_method.  Default is "etree", which uses
       cElementTree to quickly import and create a database.

       A low-memory alternative implementation using SAX or similar
       may be provided, although this is known to be painfully slow.
       If this is done, it may be better to write a C extension for
       this logic.

2. search(query)

   - query is a Japanese string containing one or more kanji.

3. query_code_search(query_type, query)

   - Allows use of the SKIP, De Roo, Four Corners and S&H (Spahn &
     Hadamitzky) query code systems to look up kanji.

4. stroke_count_search(count, allow_miscounts=False, error_margin=0,
   error_margin_type="plusminus")

   - Queries by stroke count.
   - allow_miscounts: include common miscounts as candidates.
   - error_margin: tolerates a miscount of up to this many strokes on
     all candidates.
   - error_margin_type selects the direction of the margin: "plus",
     "minus", or "plusminus".

5. stroke_count_filter(candidates, count, allow_miscounts=False,
   error_margin=0, error_margin_type="plusminus")

   - Takes a list of candidates and filters them by count.  The
     database is only hit if necessary.
   - All other args behave as in stroke_count_search.

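The stroke count arguments in items 4 and 5 leave some room for
interpretation, so here is one possible reading as a sketch.
Assumptions: "plus" accepts counts from count up to count + error_margin
and "minus" the mirror image; the margin is applied to every count
considered, including miscounts; and a small in-memory dict (STROKES)
stands in for the database.  The count_range helper is hypothetical.

.. code-block:: python

   # Stand-in for the database: kanji -> stroke counts, accepted count
   # first, then any common miscounts.  The miscount listed for the last
   # kanji is an illustrative value.
   STROKES = {"口": [3], "日": [4], "比": [4, 5]}


   def count_range(count, error_margin=0, error_margin_type="plusminus"):
       """Return the inclusive (low, high) range of stroke counts to accept."""
       if error_margin_type not in ("plus", "minus", "plusminus"):
           raise ValueError("unknown error_margin_type: %r" % error_margin_type)
       low = high = count
       if error_margin_type in ("minus", "plusminus"):
           low -= error_margin
       if error_margin_type in ("plus", "plusminus"):
           high += error_margin
       return low, high


   def stroke_count_filter(candidates, count, allow_miscounts=False,
                           error_margin=0, error_margin_type="plusminus"):
       """Keep the candidates whose stroke count falls in the accepted range."""
       low, high = count_range(count, error_margin, error_margin_type)
       matches = []
       for kanji in candidates:
           counts = STROKES.get(kanji, [])
           if not allow_miscounts:
               counts = counts[:1]  # only the accepted count
           if any(low <= c <= high for c in counts):
               matches.append(kanji)
       return matches


   def stroke_count_search(count, **kwargs):
       """Search every known kanji by running the filter over all of them."""
       return stroke_count_filter(sorted(STROKES), count, **kwargs)

Writing the search in terms of the filter keeps the margin logic in one
place; a real implementation would more likely push the range into an
SQL WHERE clause instead of scanning every kanji.
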
Low priority:

6. dict_code_lookup(dict_name, dict_code)

   - Takes a dictionary ID (dict_name) and a code within that
     dictionary (dict_code), and returns the matching kanji.
   - Really limited use case... probably won't implement this.


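Going back to search(query) in item 2: since the query is a string that
may mix kanji with kana and other characters, the first step is
presumably to pull out the kanji.  A rough sketch follows; the kanji_in
helper is hypothetical, and it only checks the main CJK Unified
Ideographs block plus Extension A, which is an assumption and may not
cover every character in KANJIDIC2.

.. code-block:: python

   def kanji_in(query):
       """Return the distinct kanji in query, in order of first appearance."""
       seen = []
       for ch in query:
           cp = ord(ch)
           # U+4E00..U+9FFF: CJK Unified Ideographs
           # U+3400..U+4DBF: CJK Unified Ideographs Extension A
           is_kanji = 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF
           if is_kanji and ch not in seen:
               seen.append(ch)
       return seen

Each character returned could then be looked up individually, with
kana and punctuation silently ignored.
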
Entry API
---------

What do we want to query as-needed?

- readings (on/kun)
- nanori
- meanings (en/es/fr/etc.)
- stroke count
- dict codes
- query codes
- lots of misc. info