1 Elliptics network project
2 http://www.ioremap.net/projects/elliptics
3 Evgeniy Polyakov <zbr@ioremap.net>
5 This document describes high-level design of the elliptics distributed network storage,
6 its behaviour in different conditions and main states of the nodes and its relations
9 First, some exaplination on what it is and what problems it is supposed to fix.
11 Elliptics network is a parallel distributed network storage. It allows to group a
12 number of servers into single object storage with flat namespace indexed by the unified
13 IDs. Elliptics network provides an open protocol to store and retrieve the data and
14 perform administrative steps. It can also be infinitely extended to support private
15 extensions needed for the particular solutions. One of the implemented examples is
16 a remote command execution, which can be used as a load balancing job manager.
18 Elliptics network does not contain dedicated metadata servers, all nodes play exactly
19 the same single role - data storage and request forwarding to another nodes if needed.
21 Nodes can be organized into groups, which may represent physically dedicated datacenters
27 The network uses unified ring of the IDs where each new node will get a random set of IDs
28 according to free space (or one can generate ID file by hands and specify IDs he likes).
29 After selecting IDs and joining the network (saying its neighbours in the
30 ring that they have a new member) node will store data which has IDs more or equal
31 to the one associated with the node. Routing protocol manages request forwarding,
32 so each node should store data with greater or equal than own ID and less than
33 the next one in the ring.
35 ID in the elliptics network can be created from the data content, names, database
36 indexes or whatever you want. By default elliptics uses sha512 to generate ID for
38 ID can also be created as some internal to application object like database index
39 or row entry in the table.
44 Each IO operation can be wrapped into transaction (this is done automatically in
45 all exported functions). Transaction is a low-level entity which contains set of
46 commands processed atomically. When transaction receives a positive (zero status)
47 acknowledge (can be turned off for some requests if needed), it means that all
48 specified operations were completed successfully. Otherwise (usually negative error
49 status corresponding to the errno or other low-level error) means that one or another
50 operation failed and the whole transaction failed.
51 In this case client may try different recovery strategies described below.
53 Each object contains a history of updates, where each update is a separate transaction
54 stored somewhere in the network. By using transaction history it is possible to
55 recreate any object in its previous state providing snapshotting mechanism. One can
56 also fetch multiple transactions from the history log in parallel increasing IO
60 No need for metadata servers.
61 -----------------------------
62 Elliptics network does not need special metadata servers to manage ID to object
63 mapping, this is done automatically in applications by using name or content hash
64 to determine node which should manage given object.
67 Maintaining the data redundancy.
68 --------------------------------
69 Elliptics network uses server groups to maintain data redundancy.
71 Client can specify which groups to use for data processing, for example which groups
72 should host to be written data or which groups can be used to fetch it.
74 It is also possible to specify automatic group selection during write. For example
75 we can connect to 10 server groups and only want to write 3 copies. Automatic selection
76 will find 3 groups and return them to the client. Elliptics sorts groups according
79 Write functions either return number of transactions written or invoke clien't callback
80 with appropriate completion status. Reading will automatically switch between groups
81 when object is not present in one of them.
83 It is also possible in fcgi frontend to sort groups to read data according to LA,
84 i.e. client will fetch data from the node with smallest average load.
87 Frontend and backend modular approach.
88 --------------------------------------
89 Since protocol is opened it is possible to implement different frontends for the
90 storage, namely POHMELFS (http://www.ioremap.net/projects/pohmelfs) is a developed
91 POSIX parallel distributed filesystem. It is possible to use elliptics features with
92 the existing applications which use POSIX system calls like open(), read(), write()
95 Data is not stored by the elliptics network as is, instead it is provided to the
96 lowest layer where IO storage backends live. Backend is responsible for processing real
97 IO requests like data read and write. It can store data wherever it wants and perform
98 additional steps if needed. For example it can store transaction in the database
99 and update exported index by which it can be fetched by the elliptics-unaware
102 Multiple IO storage backends are supported by the example application server.
104 File backend, where each transaction is stored as separated file in the root
105 directory (actually in subdir, indexed by the first by of the command ID).
107 Eblob backend, which stores data in huge data blobs with index locked in memory.
108 This allows to perform O(1) lookup and use only single seek to get data from disk.
109 One can find more details on eblob homepage: http://www.ioremap.net/projects/eblob