positron/doc/database.html

   1 <html>
   2 <head>
   3   <link rel="stylesheet" type="text/css" href="style.css" />
   4   <title>Positron Developer's Guide: Working with the Database</title>
   5 </head>
   6
   7 <body>
   8 <h1>Positron Developer's Guide: Working with the Database</h1>
   9
  10 <p>
  11 The Neuros has several on-disk databases that are used by both the
  12 device during normal operation and the host computer during
  13 synchronization.  The primary databases are:
  14 </p>
  15
  16 <dl>
  17   <dt><a href="#audio">audio</a></dt><dd>Audio files stored on the
  18   Neuros</dd>
  19
  20   <dt><a href="#pcaudio">pcaudio</a></dt><dd>Audio files stored on the
  21   host computer</dd>
  22
  23   <dt><a href="#unidedhisi">unidedhisi</a></dt><dd>HiSi clips that
  24   have not been identified yet.  These clips should be fingerprinted
  25   and looked up on the HiSi server during synchronization.</dd>
  26
  27   <dt><a href="#idedhisi">idedhisi</a></dt><dd>HiSi clips that have
  28   been identified.  If the fingerprint is successfully located on the
  29   HiSi server, the database record corresponding to the clip should be
  30   removed from unidedhisi and put into this database along with the
  31   metadata returned from the server.</dd>
  32
  33   <dt><a href="#failedhisi">failedhisi</a></dt><dd>HiSi clips that
  34   could not be identified.  If the lookup fails, the record from the
  35   HiSi clip should be moved from the unidedhisi database to this
  36   database.</dd>
  37
  38 </dl>
  39
  40 <p>
Each database can be thought of as a collection of records with a fixed
number (at least one) of fields.  The first field is the
primary field.  Next are zero or more fields called <em>access
keys</em>, whose contents are indexed so that all the
records containing a particular value in an access key field can be
looked up quickly (for example, finding all the songs with the genre
"Rock").  Finally, the access keys are followed by zero or more
<em>extra info</em> fields.  These fields contain data that does
not need to be indexed, like filenames or file sizes.  Some fields may
also contain a collection of values, called a "bag."  This is used in
the audio database to allow one file to be in multiple playlists at
once (i.e. have multiple values in its playlist field).  Every
database is required to have a special null record.
  54 </p>
  55
  56 <h2>Design</h2>
  57
  58 <p>
  59 The database structure described above is implemented as a tree of
  60 databases using a root database (audio, unidedhisi, etc.) with a child
  61 database (artist, genre, etc.)  for each access key.  When a record is
  62 added to the root, the actual contents of each access key field are
  63 replaced with a pointer to the record in the child db containing the
  64 value in its primary field.  Of course, if the value doesn't already
  65 exist in the child database, it must be added.  Null values are
  66 possible for access keys; just use a pointer to the null record in the
  67 child database.  The following diagram shows how this works:
  68 </p>
  69
<div class="figure"><img src="sample-record.png" /></div>
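<p>
As a rough illustration of this design, the following Python sketch
models a root database and its child databases in memory.  The class
names <code>ChildDB</code> and <code>RootDB</code> are hypothetical, and
list indexes stand in for pointers; on the device, pointers are word
offsets into the database files.  Bag fields are not modeled.
</p>

```python
class ChildDB:
    """In-memory stand-in for a child database (artist, genre, ...)."""

    def __init__(self, name):
        self.name = name
        # Index 0 plays the role of the required null record.
        self.values = [None]

    def intern(self, value):
        """Return the 'pointer' (list index) for value, adding it if needed."""
        if value is None or value == "":
            return 0  # empty access keys point at the null record
        if value in self.values:
            return self.values.index(value)
        self.values.append(value)
        return len(self.values) - 1


class RootDB:
    """In-memory stand-in for a root database such as audio."""

    def __init__(self, access_keys):
        self.children = {key: ChildDB(key) for key in access_keys}
        self.records = []

    def add(self, primary, keys, extra):
        # Replace each access key value with a pointer into its child db.
        pointers = {k: self.children[k].intern(keys.get(k)) for k in self.children}
        self.records.append((primary, pointers, extra))
        return pointers


audio = RootDB(["artist", "genre"])
first = audio.add("Song A", {"artist": "Band", "genre": "Rock"}, {"time": 180})
second = audio.add("Song B", {"genre": "Rock"}, {"time": 200})
```

<p>
Both records share the same "Rock" entry in the genre child, and the
second record's missing artist resolves to the null record.
</p>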
  71
  72 <h3>File Layout</h3>
  73
  74 <p>
Each root database is stored in a directory whose name is the same as
the database name.  Inside that directory are two files holding the
contents of the root database, an <em>MDB</em> file and an
<em>SAI</em> file.  Child databases have an MDB and an SAI file as
well as a <em>PAI</em> file used for reverse lookups.  The name of
each of these files is the name of the database with a
".mdb", ".sai", or ".pai" extension.  All the files for the root and
child databases are stored in the same directory.  The file layout for
the audio database is:
  84 </p>
  85
  86 <pre>
  87 audio/albums.mdb - Album child database
  88 audio/albums.pai
  89 audio/albums.sai
  90 audio/artist.mdb - Artist child database
  91 audio/artist.pai
  92 audio/artist.sai
  93 audio/audio.mdb - Root database
  94 audio/audio.sai
  95 audio/genre.mdb - Genre child database
  96 audio/genre.pai
  97 audio/genre.sai
  98 audio/playlist.mdb - Playlist child database
  99 audio/playlist.pai
 100 audio/playlist.sai
 101 audio/recordings.mdb - Recordings child database
 102 audio/recordings.pai
 103 audio/recordings.sai
 104 </pre>
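<p>
The naming rule can be expressed as a small helper.  This is only a
sketch: the function name <code>database_files</code> is hypothetical,
and the child names are taken from the listing above.
</p>

```python
import posixpath


def database_files(root_name, child_names):
    """List the files expected for a root database and its children.

    The root has .mdb and .sai files; each child also has a .pai file.
    Everything lives in one directory named after the root database.
    """
    files = [posixpath.join(root_name, root_name + ext) for ext in (".mdb", ".sai")]
    for child in child_names:
        for ext in (".mdb", ".pai", ".sai"):
            files.append(posixpath.join(root_name, child + ext))
    return sorted(files)


audio_files = database_files(
    "audio", ["albums", "artist", "genre", "playlist", "recordings"]
)
```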
 105
 106
 107 <h3 id="dataformat">Data Packing</h3>
 108
<p>
Because of the nature of the DSP used in the Neuros, all database
files are treated as a sequence of 16-bit words.  This creates some
packing issues that need to be considered when storing data in
database files.  The five major types of data are handled as follows:
</p>

<dl>

  <dt>Bit-fields</dt><dd>Bit-fields are usually 16 bits and are stored
  in big-endian byte order.</dd>

  <dt>Integers</dt><dd>Integers are 16 or 32 bits and are also stored in
  big-endian byte order.</dd>

  <dt>Pointers</dt><dd>Pointers are 32-bit integers that point to
  offsets in a file, rather than memory locations.  However, unlike
  the pointers most programmers are used to, these pointers count
  16-bit words rather than bytes.  So a pointer with value 0 refers to
  bytes 0 and 1 in a file, and a pointer with value 22 refers to
  bytes 44 and 45.</dd>

  <dt>Null-terminated strings</dt><dd>These are similar to C-style strings.
  First, the string is null-padded at the end to make it end on a word
  boundary.  Then it is terminated with a null word, 0x0000.
  Example: "foo" (0x66,0x6f,0x6f) would be coded as 0x66,0x6f,0x6f,0x00,0x00,0x00.
  Data of this type is referred to as <em>sz</em> in the tables.</dd>

  <dt>Display data</dt><dd>This appears to be similar in use to a
  string, but with the ability to include some sort of binary data.
  For this reason, display data is made by again padding the string
  (or whatever data) out to a word boundary with nulls, but then
  prepending a word with the data length in words, excluding the
  length word itself.  Example: "foo" would be coded as 0x00,0x02,0x66,0x6f,0x6f,0x00.
  Data of this type is referred to as <em>dd</em> in the tables.</dd>
</dl>
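<p>
The packing rules above can be sketched in Python with the standard
<code>struct</code> module.  The function names are hypothetical, and
the sketch assumes strings are plain ASCII, which this guide does not
state.
</p>

```python
import struct


def pack_pointer(word_offset):
    """Pack a 32-bit big-endian pointer (counted in 16-bit words)."""
    return struct.pack(">I", word_offset)


def pointer_to_byte_offset(word_offset):
    """Convert a word pointer to the byte offset of its first byte."""
    return word_offset * 2


def pad_to_word(data):
    """Null-pad data so its length is a whole number of 16-bit words."""
    return data + b"\x00" * (len(data) % 2)


def pack_sz(text):
    """Encode an sz value: pad to a word boundary, then add a null word."""
    return pad_to_word(text.encode("ascii")) + b"\x00\x00"


def unpack_sz(data):
    """Decode an sz value, reading words until the null word."""
    out = bytearray()
    for i in range(0, len(data), 2):
        word = data[i:i + 2]
        if word == b"\x00\x00":
            break
        out += word
    # Drop the single padding byte that odd-length strings carry.
    return bytes(out).rstrip(b"\x00").decode("ascii")


def pack_dd(data):
    """Encode a dd value: length in words, then the padded data."""
    padded = pad_to_word(data)
    return struct.pack(">H", len(padded) // 2) + padded
```

<p>
With these helpers, <code>pack_sz("foo")</code> and
<code>pack_dd(b"foo")</code> reproduce the two byte sequences given in
the examples above, and <code>pointer_to_byte_offset(22)</code> gives 44.
</p>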
 146
 147 <h3 id="dbfiles">File Formats</h3>
 148 <ul>
 149   <li><a href="mdb.html">MDB File Format</a></li>
 150   <li><a href="sai.html">SAI File Format</a></li>
 151   <li><a href="pai.html">PAI File Format</a></li>
 152 </ul>
 153
 154
 155 <h3>Standard Database Field Definitions</h3>
 156
 157 <p>
 158 In the following sections, the fields for each of the standard
 159 databases are defined.  The following types are used to describe the
 160 extra info fields:
 161 </p>
 162 <ul>
 163   <li>sz - Null-terminated string</li>
 164   <li>uint32 - Unsigned 32-bit integer</li>
 165 </ul>
 166 <p>
 167 The primary field is always a null-terminated string, and the access
 168 keys are 32-bit pointers to records in child databases which have only a
 169 primary field (also a null-terminated string).
 170 </p>
 171
 172 <h4 id="audio">audio</h4>
 173
 174 <table class="fielddef">
 175   <tr><th>#</th><th>Name</th><th>Type</th><th width="70%">Description</th></tr>
 176   <tr><td>0</td><td>Title</td><td>Primary</td><td>Title of track</td></tr>
 177   <tr><td>1</td><td>Playlist</td><td>Access Key</td><td>Name of Playlist(s) containing this track</td></tr>
 178   <tr><td>2</td><td>Artist</td><td>Access Key</td><td></td></tr>
 179   <tr><td>3</td><td>Album</td><td>Access Key</td><td></td></tr>
 180   <tr><td>4</td><td>Genre</td><td>Access Key</td><td></td></tr>
 181   <tr><td>5</td><td>Recordings</td><td>Access Key</td><td>Set to "FM Radio" if the track was
 182      recorded from the radio and "Microphone" if it was recorded from the microphone.</td></tr>
 183   <tr><td>6</td><td>Time</td><td>uint32</td><td>Length of track in seconds</td></tr>
 184   <tr><td>7</td><td>Size</td><td>uint32</td><td>Size of track in kilobytes</td></tr>
 185   <tr><td>8</td><td>Path</td><td>sz</td><td>Path to track on Neuros filesystem.
 186      Follows path conventions specified in the <a href="neuros.html#pathformat">Overview</a>. </td></tr>
 187 </table>
 188
 189 <h4 id="pcaudio">pcaudio</h4>
 190
 191 <table class="fielddef">
 192   <tr><th>#</th><th>Name</th><th>Type</th><th width="70%">Description</th></tr>
 193   <tr><td>0</td><td>Title</td><td>Primary</td><td>Title of track</td></tr>
 194   <tr><td>1</td><td>Playlist</td><td>Access Key</td><td>Name of Playlist(s) containing this track</td></tr>
 195   <tr><td>2</td><td>Artist</td><td>Access Key</td><td></td></tr>
 196   <tr><td>3</td><td>Album</td><td>Access Key</td><td></td></tr>
 197   <tr><td>4</td><td>Genre</td><td>Access Key</td><td></td></tr>
 198   <tr><td>5</td><td>Recordings</td><td>Access Key</td><td>Set to "FM Radio" if the track was
 199      recorded from the radio and "Microphone" if it was recorded from the microphone.</td></tr>
 200   <tr><td>6</td><td>Time</td><td>uint32</td><td>Length of track in seconds</td></tr>
 201   <tr><td>7</td><td>Size</td><td>uint32</td><td>Size of track in kilobytes</td></tr>
 202   <tr><td>8</td><td>Path</td><td>sz</td><td>Path to track on host PC filesystem.</td></tr>
 203 </table>
 204
 205 <h4 id="unidedhisi">unidedhisi</h4>
 206
 207 <table class="fielddef">
 208   <tr><th>#</th><th>Name</th><th>Type</th><th width="70%">Description</th></tr>
 209   <tr><td>0</td><td>Title</td><td>Primary</td><td>Usually the name of the file </td></tr>
 210   <tr><td>1</td><td>Source</td><td>sz</td><td>Source of track (Ex: "FM 100.7")</td></tr>
 211   <tr><td>2</td><td>Path</td><td>sz</td><td>Path to track on Neuros filesystem.  Follows path
 212       conventions specified in the <a href="neuros.html#pathformat">Overview</a>. </td></tr>
 213 </table>
 214
 215 <h4 id="idedhisi">idedhisi</h4>
 216
 217 <table class="fielddef">
 218   <tr><th>#</th><th>Name</th><th>Type</th><th width="70%">Description</th></tr>
 219   <tr><td>0</td><td>Title</td><td>Primary</td><td>Title of HiSi clip (not title of actual track)</td></tr>
 220   <tr><td>1</td><td>Source</td><td>sz</td><td>Source of clip (see
 221       <a href="#unidedhisi">unidedhisi</a>)</td></tr>
 222   <tr><td>2</td><td>Artist</td><td>sz</td><td></td></tr>
 223   <tr><td>3</td><td>Album</td><td>sz</td><td></td></tr>
 224   <tr><td>4</td><td>Genre</td><td>sz</td><td></td></tr>
 225   <tr><td>5</td><td>Track Name</td><td>sz</td><td>Title of actual track.
 226       (Found during song identification)</td></tr>
 227   <tr><td>6</td><td>Time</td><td>uint32</td><td>Length of track in seconds</td></tr>
 228   <tr><td>7</td><td>Size</td><td>uint32</td><td>Size of track in kilobytes</td></tr>
 229   <tr><td>8</td><td>Path</td><td>sz</td><td>Path to track on Neuros filesystem.  Follows path
 230       conventions specified in the <a href="neuros.html#pathformat">Overview</a>. </td></tr>
 231 </table>
 232
 233 <h4 id="failedhisi">failedhisi</h4>
 234
 235 <p>Field definitions are the same as <a href="#unidedhisi">unidedhisi</a>.</p>
 236
 237 <table class="fielddef">
 238   <tr><th>#</th><th>Name</th><th>Type</th><th width="70%">Description</th></tr>
 239   <tr><td>0</td><td>Title</td><td>Primary</td><td></td></tr>
 240   <tr><td>1</td><td>Source</td><td>sz</td><td></td></tr>
 241   <tr><td>2</td><td>Path</td><td>sz</td><td></td></tr>
 242 </table>
 243
 244 <h2>Maintaining Database Consistency</h2>
 245
 246 <p>
 247 Reading the database is fairly straightforward; the tricky part is
 248 maintaining consistency between all of the parts of the database when
 249 modifying it.  Failure to do this correctly often leads to
 250 unpredictable behavior in the Neuros, and sometimes even causes it to
freeze.  The following sections explain these consistency conditions
 252 for various common operations on the database.  They assume you are
 253 familiar with the format of the various <a href="#dbfiles">database
 254 files</a>.
 255 </p>
 256
 257 <h3>Adding a Record</h3>
 258
 259 <p>To add a new record to a database:</p>
 260 <ol>
 261
 262   <li>Locate pointers to all of the access keys by searching the
 263   appropriate child database for each one.  Note that empty access keys
 264   should point at the null record in the associated child database.</li>
 265
 266   <li>Any access keys that cannot be found need to be added to the
 267   child databases (by recursively following these steps), and these
 268   new pointers gathered.</li>
 269
 270   <li>Using these pointers to the access keys, a new record needs to
 271   be added to the MDB file.</li>
 272
 273   <li>If this is a child database, a new module in the PAI file should
 274   be created.</li>
 275
 276   <li>A new SAI record should be created with pointers to the MDB
 277   record created in step 3, and the PAI module in step 4.
 278   (Remember the <a href="sai.html#caveat">caveat</a> about pointers to
 279   PAI modules.)</li>
 280
  <li>A pointer to the MDB record created in step 3 needs to be added
  to the PAI module associated with each access key in the child
  databases.  If there is no more room for an additional entry in a
  PAI module, it may be necessary to <a href="#extendPAI">extend
  it</a>.</li>
 286
 287 </ol>
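<p>
The steps above can be sketched with a toy in-memory model.  Everything
here is an assumption made for illustration: the <code>Database</code>
class is hypothetical, Python lists and dictionaries stand in for the
MDB, SAI and PAI files, and list indexes stand in for pointers.  The
sketch also skips backlinks to the null record, which this guide does
not cover.
</p>

```python
class Database:
    """Toy database: containers stand in for the MDB, SAI and PAI files."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or {}   # access key name -> child Database
        self.is_child = False
        for child in self.children.values():
            child.is_child = True
        self.mdb = [("", {})]            # index 0 is the null record
        self.sai = []                    # (mdb index, pai key or None)
        self.pai = {}                    # mdb index -> list of parent mdb indexes

    def find(self, primary):
        """Return the index of the record with this primary value, or None."""
        for index, (value, _keys) in enumerate(self.mdb):
            if index != 0 and value == primary:
                return index
        return None

    def add(self, primary, keys=None):
        keys = keys or {}
        pointers = {}
        for key, child in self.children.items():
            value = keys.get(key)
            if not value:
                pointers[key] = 0                  # step 1: null record
                continue
            found = child.find(value)              # step 1: search the child
            if found is None:
                found = child.add(value)           # step 2: recurse
            pointers[key] = found
        self.mdb.append((primary, pointers))       # step 3: new MDB record
        index = len(self.mdb) - 1
        if self.is_child:
            self.pai[index] = []                   # step 4: new PAI module
        self.sai.append((index, index if self.is_child else None))  # step 5
        for key, child in self.children.items():   # step 6: backlinks
            if pointers[key] != 0:
                child.pai[pointers[key]].append(index)
        return index


genre = Database("genre")
audio = Database("audio", {"genre": genre})
a = audio.add("Song A", {"genre": "Rock"})
b = audio.add("Song B", {"genre": "Rock"})
```

<p>
After the two calls, the genre child holds a single "Rock" record whose
PAI module lists both audio records.
</p>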
 288
<a name="extendPAI"></a>
 290 <h3>Extending a PAI Module</h3>
 291 <p>Extending a PAI module makes it larger so that it can hold more
 292 pointers to parent records:</p>
 293
 294 <ol>
  <li>Move all of the modules after the module being extended down by
  the size of the extension.  Remember that the size of a PAI module
  must be a multiple of the <a href="pai.html#minlength">minimum
  module length</a>.</li>
 299
 300   <li>Fill in the extra space with zeros.</li>
 301
 302   <li>All of the PAI pointers in the associated SAI file will be wrong
 303   for modules that are located after this one in the PAI file.  Update
 304   them to reflect the new locations.</li>
 305
 306 </ol>
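<p>
A minimal sketch of these three steps follows.  It assumes a simplified
model in which the PAI file is a flat list of 16-bit words and the SAI
file is reduced to a list of PAI word offsets; the function name and
parameters are hypothetical.  Any length or state fields inside the
module itself, which are described in the PAI format page, are not
modeled and would also need updating.
</p>

```python
def extend_module(pai_words, sai_pointers, start, length, extra_words):
    """Grow the module at pai_words[start:start + length] by extra_words.

    pai_words    -- list of 16-bit words standing in for the PAI file
    sai_pointers -- list of PAI word offsets standing in for the SAI file
    extra_words  -- size of the extension (a multiple of the minimum
                    module length in a real file)
    Returns new (pai_words, sai_pointers); the inputs are not modified.
    """
    end = start + length
    # Steps 1 and 2: shift later modules down and zero-fill the gap.
    new_words = pai_words[:end] + [0] * extra_words + pai_words[end:]
    # Step 3: pointers at or beyond the old end of this module move down.
    new_pointers = [p + extra_words if p >= end else p for p in sai_pointers]
    return new_words, new_pointers


# Two 4-word modules; the first one is full and gets 4 more words.
pai = [11, 12, 13, 14, 21, 0, 0, 0]
sai = [0, 4]
new_pai, new_sai = extend_module(pai, sai, start=0, length=4, extra_words=4)
```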
 307
 308
 309 <h3>Deleting a Record</h3>
<p>There are two ways to delete a record.  It can be "marked" as
deleted, in which case it will be ignored by the Neuros (this is what the
firmware does when the user requests that a record be deleted):</p>
 313
 314 <ol>
 315   <li>Set the "isDeleted" flag on the MDB record.</li>
 316
 317   <li>Remove the entire SAI record associated with this record.  This
 318   must be done by moving the rest of the file up by the size of one
 319   record.</li>
 320
 321   <li>If this is a child database, clear the associated PAI module and
 322   mark it as empty.  The module does not actually have to be
 323   removed.</li>
 324
  <li>If this database has access keys, then the backlinking pointers
  in each of the associated PAI modules in the child databases need
  to be removed.  Removing a pointer from a PAI module means erasing
  it by sliding all of the other entries in that module up.
  (Not the whole file!  The module itself should not change size.  Put
  zeros in the extra spot in the module this creates.)</li>
 331
 332   <li>If an access key in a child database no longer has any entries
 333   in its PAI module, that means it is no longer used by any records in
 334   the parent database and should be deleted.  Follow these same steps
 335   again for it.</li>
 336
 337 </ol>
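<p>
The pointer removal in step 4 and the emptiness check in step 5 can be
sketched as follows.  The sketch models only the entry slots of a PAI
module as a Python list and assumes that a zero marks an unused slot;
the function names are hypothetical.
</p>

```python
def remove_backlink(entries, pointer):
    """Remove pointer from a module's entry slots, keeping the slot count.

    The remaining entries slide up by one and a zero fills the freed
    slot at the end, so the module does not change size.
    """
    kept = list(entries)
    if pointer not in kept:
        return kept
    kept.remove(pointer)   # later entries slide up by one position
    kept.append(0)         # zero-fill the slot this frees
    return kept


def is_unused(entries):
    """True when no parent record refers to this access key any more."""
    return all(entry == 0 for entry in entries)


module = [10, 20, 30, 0]
module = remove_backlink(module, 20)
```

<p>
When <code>is_unused</code> becomes true for a module, the child record
it belongs to is a candidate for deletion by the same procedure.
</p>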
 338
 339 <p>Complete deletion of a record is much more difficult.  It is
 340 identical to the above process, but also requires the MDB records and
 341 PAI modules to be removed from their respective files.  This will
 342 invalidate many pointers in the whole database, so it is perhaps
 343 easier to rebuild the database from scratch than to attempt to update
 344 the contents of the database in place.</p>
 345
 346 </body>
 347 </html>