src/docs/contributor/database.diviner at upstream/main

@recaptime-dev's working patches + fork for Phorge, a community fork of Phabricator. (Upstream dev and stable branches are at upstream/main and upstream/stable respectively.) hq.recaptime.dev/wiki/Phorge
phorge phabricator
fork atom
phorge / src / docs / contributor / database.diviner
at upstream/main 222 lines 8.4 kB view raw
wrap content
Andre Klapper Diviner: Remove line about MySQL issues with indexes for tuple search 5mo ago
7ee80e43
  1@title Database Schema
  2@group developer
  3
  4This document describes key components of the database schema and should answer
  5questions like how to store new types of data.
  6
  7Database System
  8===============
  9
 10Phorge uses MySQL or another MySQL-compatible database (like MariaDB
 11or Amazon RDS).
 12
 13Phorge uses the InnoDB table engine.
 14
 15We are unlikely to ever support other incompatible databases like PostgreSQL or
 16SQLite.
 17
 18PHP Drivers
 19===========
 20
 21Phorge supports the [[ https://www.php.net/book.mysqli | MySQLi ]] PHP
 22extension.
 23
 24Databases
 25=========
 26
 27Each Phorge application has its own database. The names are prefixed by
 28`phabricator_` (this is configurable with `storage.default-namespace`).
 29
 30Phorge uses a separate database for each application. To understand why,
 31see @{article:Why does Phorge need so many databases?}.
 32
 33Connections
 34===========
 35
 36Phorge specifies if it will use any opened connection just for reading or
 37also for writing. This allows opening write connections to a primary and read
 38connections to a replica in primary/replica setups (which are not actually
 39supported yet).
 40
 41Tables
 42======
 43
 44Most table names are prefixed by their application names. For example,
 45Differential revisions are stored in database `phorge_differential` and
 46table `differential_revision`. This generally makes queries easier to recognize
 47and understand.
 48
 49The exception is a few tables which share the same schema over different
 50databases such as `edge`.
 51
 52We use lower-case table names with words separated by underscores.
 53
 54Column Names
 55============
 56
 57Phorge uses `camelCase` names for columns. The main advantage is that they
 58directly map to properties in PHP classes.
 59
 60Don't use MySQL reserved words (such as `order`) for column names.
 61
 62Data Types
 63==========
 64
 65Phorge defines a set of abstract data types (like `uint32`, `epoch`, and
 66`phid`) which map to MySQL column types. The mapping depends on the MySQL
 67version.
 68
 69Phorge uses `utf8mb4` character sets where available (MySQL 5.5 or newer),
 70and `binary` character sets in most other cases. The primary motivation is to
 71allow 4-byte unicode characters to be stored (the `utf8` character set, which
 72is more widely available, does not support them). On newer MySQL, we use
 73`utf8mb4` to take advantage of improved collation rules.
 74
 75Phorge stores dates with an `epoch` abstract data type, which maps to
 76`int unsigned`. Although this makes dates less readable when browsing the
 77database, it makes date and time manipulation more consistent and
 78straightforward in the application.
 79
 80We don't use the `enum` data type because each change to the list of possible
 81values requires altering the table (which is slow with big tables). We use
 82numbers (or short strings in some cases) mapped to PHP constants instead.
 83
 84JSON and Other Serialized Data
 85==============================
 86
 87Some data don't require structured access -- we don't need to filter or order by
 88them. We store these data as text fields in JSON format. This approach has
 89several advantages:
 90
 91  - If we decide to add another unstructured field then we don't need to alter
 92    the table (which is slow for big tables in MySQL).
 93  - Table structure is not cluttered by fields which could be unused most of the
 94    time.
 95
 96An example of such usage can be found in column
 97`differential_diffproperty.data`.
 98
 99Primary Keys
100============
101
102Most tables have an auto-increment column named `id`. Adding an ID column is
103appropriate for most tables (even tables that have another natural unique key),
104as it improves consistency and makes it easier to perform generic operations
105on objects.
106
107For example, @{class:LiskMigrationIterator} allows you to very easily apply a
108migration to a table using a constant amount of memory provided the table has
109an `id` column.
110
111Indexes
112======
113
114Create all indexes necessary for fast query execution in most cases. Don't
115create indexes which are not used. You can analyze queries @{article:Using
116DarkConsole}.
117
118Foreign Keys
119============
120
121We don't use foreign keys because they're complicated and we haven't experienced
122significant issues with data inconsistency that foreign keys could help prevent.
123Empirically, we have witnessed first hand as `ON DELETE CASCADE` relationships
124accidentally destroy huge amounts of data. We may pursue foreign keys
125eventually, but there isn't a strong case for them at the present time.
126
127PHIDs
128=====
129
130Each globally referenceable object in Phorge has an associated PHID
131("Phorge ID") which serves as a global identifier, similar to a GUID.
132We use PHIDs for referencing data in different databases.
133
134We use both auto-incrementing IDs and global PHIDs because each is useful in
135different contexts. Auto-incrementing IDs are meaningfully ordered and allow
136us to construct short, human-readable object names (like `D2258`) and URIs.
137Global PHIDs allow us to represent relationships between different types of
138objects in a homogeneous way.
139
140For example, infrastructure like "subscribers" can be implemented easily with
141PHID relationships: different types of objects (users, projects, mailing lists)
142are permitted to subscribe to different types of objects (revisions, tasks,
143etc). Without PHIDs, we would need to add a "type" column to avoid ID collision;
144using PHIDs makes implementing features like this simpler.
145
146For more information, see @{article:Handles Technical Documentation}
147
148Transactions
149============
150
151Transactional code should be written using transactions. Example of such code is
152inserting multiple records where one doesn't make sense without the other, or
153selecting data later used for update. See chapter in @{class:LiskDAO}.
154
155Advanced Features
156=================
157
158We don't use MySQL advanced features such as triggers, stored procedures or
159events because we like expressing the application logic in PHP more than in SQL.
160Some of these features (especially triggers) can also cause a great deal of
161confusion, and are generally more difficult to debug, profile, version control,
162update, and understand than application code.
163
164Schema Denormalization
165======================
166
167Phorge uses schema denormalization sparingly. Avoid denormalization unless
168there is a compelling reason (usually, performance) to denormalize.
169
170Schema Changes and Migrations
171=============================
172
173To create a new schema change or migration:
174
175**Create a database patch**. Database patches go in
176`resources/sql/autopatches/`. To change a schema, use a `.sql` file and write
177in SQL. To perform a migration, use a `.php` file and write in PHP. Name your
178file `YYYYMMDD.patchname.ext`. For example, `20141225.christmas.sql`.
179
180**Keep patches small**. Most schema change statements are not transactional. If
181a patch contains several SQL statements and fails partway through, it normally
182can not be rolled back. When a user tries to apply the patch again later, the
183first statement (which, for example, adds a column) may fail (because the column
184already exists). This can be avoided by keeping patches small (generally, one
185statement per patch).
186
187**Use namespace and character set variables**. When defining a `.sql` patch,
188you should use these variables instead of hard-coding namespaces or character
189set names:
190
191| Variable | Meaning | Notes |
192|---|---|---|
193| `{$NAMESPACE}` | Storage Namespace | Defaults to `phabricator` |
194| `{$CHARSET}` | Default Charset | Mostly used to specify table charset |
195| `{$COLLATE_TEXT}` | Text Collation | For most text (case-sensitive) |
196| `{$COLLATE_SORT}` | Sort Collation | For sortable text (case-insensitive) |
197| `{$CHARSET_FULLTEXT}` | Fulltext Charset | Specify explicitly for fulltext |
198| `{$COLLATE_FULLTEXT}` | Fulltext Collate | Specify explicitly for fulltext |
199
200
201**Test your patch**. Run `bin/storage upgrade` to test your patch.
202
203Advanced Testing
204================
205
206If you are developing a database patch and you want to apply your patch more
207than once, remember that the command `bin/storage upgrade` is smart enough not
208to execute patches twice, but it does not allow a rollback, and it does
209not allow to apply a specific patch by name anyway.
210You may want to know that successfully applied patches are stored in this
211database table:
212
213    {$NAMESPACE}_meta_data.patch_status
214
215If you need to test a patch more than once, you can remove the line which lists
216your patch from this database table, and test your patch again by running
217`bin/storage upgrade`.
218
219See Also
220========
221
222  - @{class:LiskDAO}