%ExternalStore {#externalstorearch}
========================

%ExternalStore is an optional feature that enables persistent object storage
outside the main database, primarily for revision text (also known as a "blob").

The main public interface for interacting with %ExternalStore is ExternalStoreAccess.
Note, however, that higher-level concepts such as {@link MediaWiki\Revision\RevisionRecord} and
text blobs have their own dedicated interfaces: {@link MediaWiki\Revision\RevisionStore} and
{@link MediaWiki\Storage\BlobStore}, respectively.

Objects in external stores are internally identified by a special URL.
The URL is of the form `<store protocol>://<location>/<object name>`.

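The URL form above can be illustrated with a small parser. This is a minimal sketch in Python rather than MediaWiki's own code; the names `StoreUrl` and `parse_store_url` are hypothetical and exist only for this example.

```python
from typing import NamedTuple


class StoreUrl(NamedTuple):
    """The three parts of '<store protocol>://<location>/<object name>'."""
    protocol: str
    location: str
    object_name: str


def parse_store_url(url: str) -> StoreUrl:
    # Hypothetical helper: split on the first '://' and then on the first '/'.
    protocol, sep, rest = url.partition("://")
    if not sep or not protocol:
        raise ValueError(f"not an external store URL: {url!r}")
    # Anything after the first '/' is treated as the object name.
    location, sep, object_name = rest.partition("/")
    if not sep or not location or not object_name:
        raise ValueError(f"missing location or object name: {url!r}")
    return StoreUrl(protocol, location, object_name)
```

For instance, under these assumptions `parse_store_url("DB://cluster1/42")` yields the protocol `DB`, the location `cluster1`, and the object name `42`.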
The protocol represents which ExternalStoreMedium class is used. The following protocols are
supported:

- `DB`: ExternalStoreDB
- `http`: ExternalStoreHttp
- `mwstore`: ExternalStoreMwstore

Multiple protocols may be enabled at the same time, for example to support reading older data
while using a different protocol for new data.

Protocols are configured via {@link $wgExternalStores}. The ExternalStoreMedium class name is
derived by appending the value from $wgExternalStores to the string `ExternalStore`, with a
ucfirst transformation applied as needed.

For example, a custom protocol called "foobar" could be configured by implementing
ExternalStoreMedium in a subclass called `ExternalStoreFoobar`.

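The naming rule can be sketched in a few lines. The snippet below is an illustrative Python model of the rule as described above, not the actual lookup code; `medium_class_name` is a hypothetical name.

```python
def medium_class_name(protocol: str) -> str:
    """Model the rule: 'ExternalStore' + ucfirst(protocol)."""
    # ucfirst upper-cases only the first character and leaves the rest as-is,
    # so 'DB' stays 'DB' while 'http' becomes 'Http'.
    return "ExternalStore" + protocol[:1].upper() + protocol[1:]


# Class names for the protocols listed above, plus the custom "foobar" one.
names = {p: medium_class_name(p) for p in ("DB", "http", "mwstore", "foobar")}
```

Here `names` maps `DB` to `ExternalStoreDB`, `http` to `ExternalStoreHttp`, `mwstore` to `ExternalStoreMwstore`, and `foobar` to `ExternalStoreFoobar`, matching the class names given in the text.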
The location identifies a particular instance of a given store protocol.

In the case of ExternalStoreDB, the location represents a database cluster (one or more database
servers that hold the same data).

When using the default of {@link Wikimedia::Rdbms::LBFactorySimple LBFactorySimple}, these
clusters can be configured via {@link $wgExternalServers}. Otherwise, external clusters must be
configured via {@link $wgLBFactoryConf}.

The destination of newly stored text blobs is configured via {@link $wgDefaultExternalStore}.
To enable use of %ExternalStore for new blobs, this must be set to a non-empty array. It can
be disabled to store new blobs in the main database instead; doing so does not affect how
existing blobs are read.

Each destination uses a partial URL of the form `<store protocol>://<location>`.
When a blob is inserted, an available protocol/location pair is picked at random from this list.
Insertions fail over to another default destination if the chosen one is unavailable.

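The random pick with fail-over can be sketched as follows. This is a simplified Python model under stated assumptions, not MediaWiki's implementation: `insert_with_failover` and the `try_insert` callback are hypothetical, and an unavailable destination is assumed to raise `ConnectionError`.

```python
import random


def insert_with_failover(destinations, try_insert, rng=None):
    """Try default destinations in random order until one accepts the blob.

    destinations: partial URLs such as 'DB://cluster1'.
    try_insert:   hypothetical callback that stores the blob at a destination
                  and returns the new object name, or raises ConnectionError.
    Returns the full URL '<destination>/<object name>'.
    """
    if rng is None:
        rng = random.Random()
    candidates = list(destinations)
    rng.shuffle(candidates)  # the first candidate is the random pick
    for destination in candidates:
        try:
            object_name = try_insert(destination)
        except ConnectionError:
            continue  # fail over to the next default destination
        return f"{destination}/{object_name}"
    raise RuntimeError("no default destination was available")
```

Shuffling the whole list once gives both the random first choice and a fail-over order; the real selection logic may differ in detail.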
## Append-only {#externalstore-appendonly}

%ExternalStore is designed as an append-only system, to persist data in a way that is highly
reliable and immutable. As such, the interface is restricted to fetch and insert operations,
and specifically does not permit modification or deletion once data is stored.

This design benefits MediaWiki in a number of ways:

* The limited interface provides flexibility to each protocol implementation.
* Caching is trivial and safe.
* Stable references to external store can be kept outside of it, in the core database and anywhere
  else in caching or other storage layers, without needing to track or propagate changes.
* Historical data can be stored with high reliability guarantees and operational safety:
  * External database clusters may be operated in read-only mode, directly through MySQL.
  * Each replica within the cluster may operate as an independent static backup.
  * Database replication between hosts may be turned off.
  * Even command-line access from outside MediaWiki can't accidentally affect historical data.

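To make the append-only contract and its caching benefit concrete, here is a minimal in-memory sketch in Python. The classes `AppendOnlyStore` and `CachedReader` are hypothetical illustrations and do not correspond to MediaWiki classes.

```python
class AppendOnlyStore:
    """Toy store exposing only insert and fetch, mirroring the restricted interface."""

    def __init__(self):
        self._blobs = []

    def insert(self, data: bytes) -> int:
        # New data always gets a new identifier; nothing is overwritten.
        self._blobs.append(data)
        return len(self._blobs) - 1

    def fetch(self, object_id: int) -> bytes:
        if not 0 <= object_id < len(self._blobs):
            raise KeyError(object_id)
        return self._blobs[object_id]


class CachedReader:
    """Cache with no invalidation logic: stored blobs can never change."""

    def __init__(self, store: AppendOnlyStore):
        self._store = store
        self._cache = {}
        self.backend_reads = 0  # counts fetches that reached the store

    def fetch(self, object_id: int) -> bytes:
        if object_id not in self._cache:
            self.backend_reads += 1
            self._cache[object_id] = self._store.fetch(object_id)
        return self._cache[object_id]
```

Because the toy store has no update or delete method, a cached entry can be kept indefinitely, which is the sense in which caching is safe in this model.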
For maintenance tasks such as recompression, we generally iterate through known blobs,
write new blobs as needed, and gracefully update pointers accordingly. Once an entire cluster
has been copied or recompressed to a new location, it can be taken out of rotation, with any
storage space freed at that time. Note that multiple locations may be physically colocated
on the same hardware, e.g. by running multiple instances of MySQL. It may, however, be simpler
to free space by doing recompression during other routine maintenance, such as when migrating
data from old to new hardware.
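
The maintenance flow described above (iterate over known blobs, write new blobs, update pointers) can be sketched as below. This is a simplified Python illustration, not one of MediaWiki's maintenance scripts; `migrate_location`, `fetch`, `insert_new`, and `recompress` are hypothetical names, and pointers are modeled as a plain dictionary.

```python
def migrate_location(pointers, old_location, fetch, insert_new, recompress=None):
    """Copy every blob stored under old_location to a new destination.

    pointers:     dict of row id -> store URL; updated in place.
    old_location: partial URL such as 'DB://cluster1'.
    fetch:        callback returning the blob bytes for a URL.
    insert_new:   callback storing bytes and returning the new URL.
    recompress:   optional callback transforming the bytes before storing.
    Returns the number of pointers rewritten.
    """
    moved = 0
    for row_id, url in list(pointers.items()):
        if not url.startswith(old_location + "/"):
            continue  # blob lives elsewhere; leave its pointer alone
        data = fetch(url)
        if recompress is not None:
            data = recompress(data)
        # The old blob is never modified; only the pointer changes.
        pointers[row_id] = insert_new(data)
        moved += 1
    return moved
```

In this model, once every pointer into the old location has been rewritten, nothing references it any more and it can be taken out of rotation.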