Practical elliptics network in a nutshell.

This library provides a way to create a distributed hash table storage
with versioned updates.

To create the storage one has to implement an application which creates
a node with a given ID, joins the network, adds transformation functions
used to create IDs from data objects, and starts doing IO using the
existing exported API functions.

The example application in the archive performs all those operations and
exposes them via command-line options.

Let's see how to create a simple distributed storage with redundant
writes, i.e. with an additional copy of each object.

0. Compiling the sources.

1. Creating the first node with zero ID (the default), listening on the
127.0.0.1:1025 IPv4 address and using /tmp/root0 as its root directory.

$ dnet_ioserv -a 127.0.0.1:1025:2 -d /tmp/root0 -j

Here the -a switch provides the local listening address in the
addr:port:family format (family 2 means IPv4, 10 means IPv6), and -d
specifies the root directory used to store objects. The example
application uses TCP, but one can use any other network protocol
exported via sockets; the protocol is specified in the configuration
request and is returned in lookup replies.
The -j switch tells the node to join the network, so that it also
stores data from other clients.

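For reference, the family numbers used in the addr:port:family string
are the usual Linux socket address family constants; this tiny
standalone program (not part of the library) simply prints them:

#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
	/* On Linux AF_INET is 2 and AF_INET6 is 10, which is what the
	 * addr:port:family strings above refer to. */
	printf("AF_INET  = %d\n", AF_INET);
	printf("AF_INET6 = %d\n", AF_INET6);
	return 0;
}
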
2. Adding the second node into the storage. It will have the hex
ID 1234567890abcde0000000000000000000000000, listen on the 127.0.0.1:1234
IPv4 address and use /tmp/root1 as its root directory. It will join via
the first node at the 127.0.0.1:1025 IPv4 address.

$ dnet_ioserv -i 1234567890abcde0000 -a 127.0.0.1:1234:2 -r 127.0.0.1:1025:2 -d /tmp/root1 -j

The -r switch specifies the node used to join the network through or to
send commands to. The -i switch specifies the node's ID.

The order of the options does not matter.

Each node stores data with IDs which are less than or equal to its own
ID, so the second node in the example will store objects whose IDs fall
into the (0x0, 0x1234567890abcde0000000000000000000000000] range. The
first node will store everything "less than" zero: since the addressing
forms a ring, this means the first node stores anything with an ID
higher than the last node's ID, i.e. higher than
0x1234567890abcde0000000000000000000000000.

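The range rule above can be sketched as a small standalone helper. This
is only an illustration of the rule, not the library's routing code, and
it assumes IDs compare as big-endian byte strings:

#include <string.h>

#define ID_SIZE 20	/* 20-byte IDs, matching the 40-hex-digit IDs above */

/*
 * A node stores object IDs in the (prev_node_id, node_id] range, where
 * prev_node_id is the previous node on the ring.  The node with the
 * smallest ID wraps around and also stores everything greater than the
 * largest node ID.
 */
static int node_stores_id(const unsigned char *node_id,
			  const unsigned char *prev_node_id,
			  const unsigned char *obj_id)
{
	int gt_prev = memcmp(obj_id, prev_node_id, ID_SIZE) > 0;
	int le_node = memcmp(obj_id, node_id, ID_SIZE) <= 0;

	if (memcmp(prev_node_id, node_id, ID_SIZE) < 0)
		return gt_prev && le_node;	/* ordinary (prev, node] interval */

	return gt_prev || le_node;		/* wrap-around interval on the ring */
}
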
3. Writing a local file (/tmp/some_file) into the storage and creating
a redundant copy. We will use two transformation functions for that
(sha1 and md5 from OpenSSL in the example; they can be your own as long
as they implement the init-update-final sequence). We still have to
create a node which connects to the network and thus has to have an ID,
but it will not join the network, i.e. it will not store others' data.
Our ID will be the hex 2222222200000000000000000000000000000000, and the
node will bind to the 127.0.0.1:1111 IPv4 address and request data via
the node with the 127.0.0.1:1025 IPv4 address. The address to send
commands to can be any known server address: requests will be properly
forwarded between the nodes if you connect to a node which does not
contain or will not store your data.

$ dnet_ioserv -i 22222222 -a 127.0.0.1:1111:2 -r 127.0.0.1:1025:2 -T sha1 -T md5 -W /tmp/some_file

The -T switch is used to specify multiple transformation functions,
which are used to create the ID of the file and of its content. Having
multiple transformation functions means that each update is repeated
with each transformation, so we will hash /tmp/some_file (and its
content) and put it into the network under different IDs, which
potentially means the object will be placed on different nodes (this
depends on the number of nodes and their IDs, as the example ID ranges
above describe).
The -W option specifies a local file to write into the network.

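To show what init-update-final means in practice, here is a minimal
standalone sketch of how the sha1 transform can turn a name into a
20-byte ID using OpenSSL directly. It is not the library's transform
interface, only the hashing pattern:

#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

/* Derive a 20-byte ID from an object name with the init-update-final
 * sequence.  The md5 transform follows the same pattern with
 * MD5_Init/MD5_Update/MD5_Final and a 16-byte digest. */
static void sha1_name_to_id(const char *name, unsigned char id[SHA_DIGEST_LENGTH])
{
	SHA_CTX ctx;

	SHA1_Init(&ctx);
	SHA1_Update(&ctx, name, strlen(name));
	SHA1_Final(id, &ctx);
}

int main(void)
{
	unsigned char id[SHA_DIGEST_LENGTH];
	int i;

	sha1_name_to_id("/tmp/some_file", id);
	for (i = 0; i < SHA_DIGEST_LENGTH; ++i)
		printf("%02x", id[i]);
	printf("\n");
	return 0;
}

Such standalone examples compile with -lcrypto to link against OpenSSL.
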
4. Reading a local file from the network. We will use the same
transformation functions as for writing, so if one of them fails, we can
switch to the second one and receive the object from a potentially
different node.

$ dnet_ioserv -i 22222222 -a 127.0.0.1:1111:2 -r 127.0.0.1:1025:2 -T sha1 -T md5 -R /tmp/some_file

The -R option specifies a local file we want to read from the network.
Its name will be transformed via the provided function(s), and the
object will be fetched from the network and written locally into the
provided file. Reading callbacks may be invoked multiple times with
different data which should be placed at different offsets; a special
flag is set while more data is expected and is cleared when reading is
completed. It is actually a transaction completion flag which, when set,
means the transaction is not yet fully completed and there will be
further invocations of the completion callback.

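The shape of such a completion callback is roughly the following. The
structure and field names here are purely illustrative, not the
library's actual definitions; only the offset and more-data pattern
matters:

#include <stdio.h>
#include <unistd.h>

/* Illustrative reply descriptor: hypothetical names, not the library's
 * structures.  Each callback invocation carries a piece of data, the
 * offset it belongs to, and a flag saying whether more pieces follow. */
struct example_read_reply {
	unsigned long long	offset;
	unsigned long long	size;
	void			*data;
	int			more;	/* set while the transaction is not yet complete */
};

static int example_read_complete(struct example_read_reply *r, int fd)
{
	if (pwrite(fd, r->data, r->size, r->offset) < 0)
		return -1;

	if (!r->more)
		printf("transaction completed, local file is fully written\n");

	return 0;
}
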
As with writing, this hashed name is used as a global object ID, so if
multiple clients write into a file with the same name (i.e.
/tmp/some_file), the object's content will be overwritten.

The library also provides low-level interfaces where the ID is specified
by the caller, so if your application does not work with files, IDs can
be private indexes or anything you like. Those interfaces only work with
a single object, i.e. they send/receive data to/from a single node in
the network, and redundancy steps have to be done by the caller.

Using those interfaces it is possible to spread a big file over multiple
nodes. To implement this, the writer should split the object into
multiple chunks and assign them different IDs (for example an extended
filename like /tmp/some_file[null byte]offset, or the content checksum),
so the chunks will be sent to the nodes responsible for storing those
IDs. The reader should implement the reverse operation: derive the same
chunk IDs, fetch the pieces and reassemble them locally.

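A sketch of how the writer (and the reader) could derive per-chunk IDs
from the extended name mentioned above, again with OpenSSL SHA1 standing
in for any transformation function (illustration only):

#include <string.h>
#include <openssl/sha.h>

/* Chunk ID = hash of "name\0offset": different chunks of the same file
 * get different IDs and thus potentially land on different nodes.  The
 * reader repeats the same computation to fetch and reassemble them. */
static void chunk_id(const char *name, unsigned long long offset,
		     unsigned char id[SHA_DIGEST_LENGTH])
{
	SHA_CTX ctx;

	SHA1_Init(&ctx);
	SHA1_Update(&ctx, name, strlen(name) + 1);	/* name including the null byte */
	SHA1_Update(&ctx, &offset, sizeof(offset));
	SHA1_Final(id, &ctx);
}
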
Following the same idea, the simplest solution for multiple datacenter
locations, geographical bundling and load balancing is to assign nodes
in the same datacenter IDs whose highest byte equals the datacenter
number, and to use different transformation functions which hash the
content and set the highest ID byte to the datacenter number. Reading
from a given datacenter can then be achieved by using the appropriate
transformation function, and if reading fails (when the datacenter is
disconnected), the other functions take over and data is fetched from
the different locations.

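Such a datacenter-aware transformation function could look like the
sketch below (standalone illustration, not the library's transform
interface, assuming IDs compare as big-endian byte strings so byte 0 is
the highest byte):

#include <stddef.h>
#include <openssl/sha.h>

/* Hash the content, then force the highest ID byte to the datacenter
 * number, so the object is routed to nodes whose IDs start with that
 * byte. */
static void dc_transform(unsigned char dc_num, const void *data, size_t size,
			 unsigned char id[SHA_DIGEST_LENGTH])
{
	SHA_CTX ctx;

	SHA1_Init(&ctx);
	SHA1_Update(&ctx, data, size);
	SHA1_Final(id, &ctx);

	id[0] = dc_num;
}
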
Originally an object is placed into the network according either to the
ID the user provided (if the low-level interfaces are used) or to its
hashed name. Each write into that object creates additional transactions
which store the content being written. Their IDs are equal to the data
checksums (created by the provided transformation functions), and thus
they are potentially stored on different nodes. Each write also updates
the local history of the object, so it is possible to recreate the file
as it existed in the past. One can also recreate the object locally by
fetching its updates from the different nodes if the main storing node
is not accessible. This requires knowing the history and having the
original transaction though, which are stored on the node which hosts
the object.

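Conceptually, each history record only has to say which transaction
touched which part of the object; a purely illustrative layout (not the
actual on-disk format) could be:

/* Purely illustrative, not the actual on-disk format.  One record is
 * appended per write; replaying the listed transactions in order up to
 * some point recreates the object as it existed at that time. */
struct example_history_entry {
	unsigned char		id[20];		/* transaction ID = checksum of the written data */
	unsigned long long	offset;		/* where in the object the data was written */
	unsigned long long	size;		/* how much data was written */
};
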
There is an idea of providing not only file-based backends for the
nodes, but also stackable solutions like the transformation functions,
where the server provides a callback to store data and places it either
as a file in some directory or as a database update.