src/docs/contributor/database.diviner

   1 @title Database Schema
   2 @group developer
   3
   4 This document describes key components of the database schema and should answer
   5 questions like how to store new types of data.
   6
   7 Database System
   8 ===============
   9
  10 Phabricator uses MySQL or another MySQL-compatible database (like MariaDB
  11 or Amazon RDS).
  12
  13 Phabricator uses the InnoDB table engine. The only exception is the
  14 `search_documentfield` table which uses MyISAM because MySQL doesn't support
  15 fulltext search in InnoDB (recent versions do, but we haven't added support
  16 yet).
  17
  18 We are unlikely to ever support other incompatible databases like PostgreSQL or
  19 SQLite.
  20
  21 PHP Drivers
  22 ===========
  23
  24 Phabricator supports [[ http://www.php.net/book.mysql | MySQL ]] and
  25 [[ http://www.php.net/book.mysqli | MySQLi ]] PHP extensions.
  26
  27 Databases
  28 =========
  29
  30 Each Phabricator application has its own database. The names are prefixed by
  31 `phabricator_` (this is configurable).
  32
  33 Phabricator uses a separate database for each application. To understand why,
  34 see @{article:Why does Phabricator need so many databases?}.
  35
  36 Connections
  37 ===========
  38
  39 Phabricator specifies if it will use any opened connection just for reading or
  40 also for writing. This allows opening write connections to a primary and read
  41 connections to a replica in primary/replica setups (which are not actually
  42 supported yet).
  43
  44 Tables
  45 ======
  46
  47 Most table names are prefixed by their application names. For example,
  48 Differential revisions are stored in database `phabricator_differential` and
  49 table `differential_revision`. This generally makes queries easier to recognize
  50 and understand.
  51
  52 The exception is a few tables which share the same schema over different
  53 databases such as `edge`.
  54
  55 We use lower-case table names with words separated by underscores.
  56
  57 Column Names
  58 ============
  59
  60 Phabricator uses `camelCase` names for columns. The main advantage is that they
  61 directly map to properties in PHP classes.
  62
  63 Don't use MySQL reserved words (such as `order`) for column names.
  64
  65 Data Types
  66 ==========
  67
  68 Phabricator defines a set of abstract data types (like `uint32`, `epoch`, and
  69 `phid`) which map to MySQL column types. The mapping depends on the MySQL
  70 version.
  71
  72 Phabricator uses `utf8mb4` character sets where available (MySQL 5.5 or newer),
  73 and `binary` character sets in most other cases. The primary motivation is to
  74 allow 4-byte unicode characters to be stored (the `utf8` character set, which
  75 is more widely available, does not support them). On newer MySQL, we use
  76 `utf8mb4` to take advantage of improved collation rules.
  77
  78 Phabricator stores dates with an `epoch` abstract data type, which maps to
  79 `int unsigned`. Although this makes dates less readable when browsing the
  80 database, it makes date and time manipulation more consistent and
  81 straightforward in the application.
  82
  83 We don't use the `enum` data type because each change to the list of possible
  84 values requires altering the table (which is slow with big tables). We use
  85 numbers (or short strings in some cases) mapped to PHP constants instead.
  86
  87 JSON and Other Serialized Data
  88 ==============================
  89
  90 Some data don't require structured access -- we don't need to filter or order by
  91 them. We store these data as text fields in JSON format. This approach has
  92 several advantages:
  93
  94   - If we decide to add another unstructured field then we don't need to alter
  95     the table (which is slow for big tables in MySQL).
  96   - Table structure is not cluttered by fields which could be unused most of the
  97     time.
  98
  99 An example of such usage can be found in column
 100 `differential_diffproperty.data`.
 101
 102 Primary Keys
 103 ============
 104
 105 Most tables have an auto-increment column named `id`. Adding an ID column is
 106 appropriate for most tables (even tables that have another natural unique key),
 107 as it improves consistency and makes it easier to perform generic operations
 108 on objects.
 109
 110 For example, @{class:LiskMigrationIterator} allows you to very easily apply a
 111 migration to a table using a constant amount of memory provided the table has
 112 an `id` column.
 113
 114 Indexes
 115 ======
 116
 117 Create all indexes necessary for fast query execution in most cases. Don't
 118 create indexes which are not used. You can analyze queries @{article:Using
 119 DarkConsole}.
 120
 121 Older MySQL versions are not able to use indexes for tuple search:
 122 `(a, b) IN ((%s, %d), (%s, %d))`. Use `AND` and `OR` instead:
 123 `((a = %s AND b = %d) OR (a = %s AND b = %d))`.
 124
 125 Foreign Keys
 126 ============
 127
 128 We don't use foreign keys because they're complicated and we haven't experienced
 129 significant issues with data inconsistency that foreign keys could help prevent.
 130 Empirically, we have witnessed first hand as `ON DELETE CASCADE` relationships
 131 accidentally destroy huge amounts of data. We may pursue foreign keys
 132 eventually, but there isn't a strong case for them at the present time.
 133
 134 PHIDs
 135 =====
 136
 137 Each globally referencable object in Phabricator has an associated PHID
 138 ("Phabricator ID") which serves as a global identifier, similar to a GUID.
 139 We use PHIDs for referencing data in different databases.
 140
 141 We use both auto-incrementing IDs and global PHIDs because each is useful in
 142 different contexts. Auto-incrementing IDs are meaningfully ordered and allow
 143 us to construct short, human-readable object names (like `D2258`) and URIs.
 144 Global PHIDs allow us to represent relationships between different types of
 145 objects in a homogeneous way.
 146
 147 For example, infrastructure like "subscribers" can be implemented easily with
 148 PHID relationships: different types of objects (users, projects, mailing lists)
 149 are permitted to subscribe to different types of objects (revisions, tasks,
 150 etc). Without PHIDs, we would need to add a "type" column to avoid ID collision;
 151 using PHIDs makes implementing features like this simpler.
 152
 153 Transactions
 154 ============
 155
 156 Transactional code should be written using transactions. Example of such code is
 157 inserting multiple records where one doesn't make sense without the other, or
 158 selecting data later used for update. See chapter in @{class:LiskDAO}.
 159
 160 Advanced Features
 161 =================
 162
 163 We don't use MySQL advanced features such as triggers, stored procedures or
 164 events because we like expressing the application logic in PHP more than in SQL.
 165 Some of these features (especially triggers) can also cause a great deal of
 166 confusion, and are generally more difficult to debug, profile, version control,
 167 update, and understand than application code.
 168
 169 Schema Denormalization
 170 ======================
 171
 172 Phabricator uses schema denormalization sparingly. Avoid denormalization unless
 173 there is a compelling reason (usually, performance) to denormalize.
 174
 175 Schema Changes and Migrations
 176 =============================
 177
 178 To create a new schema change or migration:
 179
 180 **Create a database patch**. Database patches go in
 181 `resources/sql/autopatches/`. To change a schema, use a `.sql` file and write
 182 in SQL. To perform a migration, use a `.php` file and write in PHP. Name your
 183 file `YYYYMMDD.patchname.ext`. For example, `20141225.christmas.sql`.
 184
 185 **Keep patches small**. Most schema change statements are not transactional. If
 186 a patch contains several SQL statements and fails partway through, it normally
 187 can not be rolled back. When a user tries to apply the patch again later, the
 188 first statement (which, for example, adds a column) may fail (because the column
 189 already exists). This can be avoided by keeping patches small (generally, one
 190 statement per patch).
 191
 192 **Use namespace and character set variables**. When defining a `.sql` patch,
 193 you should use these variables instead of hard-coding namespaces or character
 194 set names:
 195
 196 | Variable | Meaning | Notes |
 197 |---|---|---|
 198 | `{$NAMESPACE}` | Storage Namespace | Defaults to `phabricator` |
 199 | `{$CHARSET}` | Default Charset | Mostly used to specify table charset |
 200 | `{$COLLATE_TEXT}` | Text Collation | For most text (case-sensitive) |
 201 | `{$COLLATE_SORT}` | Sort Collation | For sortable text (case-insensitive) |
 202 | `{$CHARSET_FULLTEXT}` | Fulltext Charset | Specify explicitly for fulltext |
 203 | `{$COLLATE_FULLTEXT}` | Fulltext Collate | Specify explicitly for fulltext |
 204
 205
 206 **Test your patch**. Run `bin/storage upgrade` to test your patch.
 207
 208 See Also
 209 ========
 210
 211   - @{class:LiskDAO}