The Network Block Device is a Linux-originated lightweight block access
protocol that allows one to export a block device to a client. While the
name of the protocol specifically references the concept of block
devices, there is nothing inherent in the *protocol* which requires that
exports are, in fact, block devices; the protocol only concerns itself
with a range of bytes, and several operations of particular lengths at
particular offsets within that range of bytes.

For matters of clarity, in this document we will refer to an export from
a server as a block device, even though the actual backing on the server
need not be an actual block device; it may be a block device, a regular
file, or a more complex configuration involving several files. That is
an implementation detail of the server.

## Conventions

In the protocol descriptions below, the label 'C:' is used for
messages sent by the client, whereas 'S:' is used for messages sent by
the server. `monotype text` is for literal character data or (when
used in comments) constant names, `0xdeadbeef` is used for literal hex
numbers (which are always sent in big-endian network byte order), and
(brackets) are used for comments. Anything else is a description of
the data that is sent.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as
described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
The same words in lower case carry their natural meaning.

Where this document refers to a string, then unless otherwise stated,
that string is a sequence of UTF-8 code points, which is not `NUL`
terminated, MUST NOT contain `NUL` characters, SHOULD be no longer than
256 bytes and MUST be no longer than 4096 bytes. This applies
to export names and error messages (amongst others). The length of a
string is always available through information sent earlier in the same
message, although it may require some computation based on the size of
other data also present in the same message.
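
As a minimal sketch, assuming the caller has already sliced out the string's bytes using the length information in the enclosing message, a receiver could enforce these rules as follows. The function and constant names are illustrative only, not part of the protocol.

```python
from __future__ import annotations

MAX_STRING_BYTES = 4096          # MUST NOT be exceeded
RECOMMENDED_STRING_BYTES = 256   # SHOULD NOT be exceeded


def decode_nbd_string(raw: bytes) -> tuple[str, bool]:
    """Decode an NBD string (not NUL terminated; length known from context).

    Returns (text, within_recommended_length). Raises ValueError when a
    MUST-level rule is broken.
    """
    if len(raw) > MAX_STRING_BYTES:
        raise ValueError("string longer than 4096 bytes")
    if b"\x00" in raw:
        raise ValueError("string contains a NUL character")
    # UnicodeDecodeError is a subclass of ValueError, so invalid UTF-8
    # surfaces the same way as the checks above.
    text = raw.decode("utf-8")
    return text, len(raw) <= RECOMMENDED_STRING_BYTES
```

A 300-byte string is accepted but flagged as exceeding the recommended length; whether to warn about or reject such a string is left to the implementation.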

## Protocol phases

The NBD protocol has two phases: the handshake and the transmission. During the
handshake, a connection is established and an exported NBD device, along with other
protocol parameters, is negotiated between the client and the server. After a
successful handshake, the client and the server proceed to the transmission
phase in which the export is read from and written to.

On the client side under Linux, the handshake is implemented in
userspace, while the transmission phase is implemented in kernel space.
To get from the handshake to the transmission phase, the client performs

    ioctl(nbd, NBD_SET_SOCK, sock)
    ioctl(nbd, NBD_DO_IT)

with `nbd` in the above being a file descriptor for an open `/dev/nbdX`
device node, and `sock` being the socket to the server. The second of
the above two calls does not return until the client disconnects.

Note that other `ioctl` calls are available, which the client uses to
communicate to the kernel the options that were negotiated with the
server during the handshake. This document does not describe those.

When handling the client-side transmission phase with the Linux
kernel, the socket between the client and server can use either Unix
or TCP sockets. For other implementations, the client and server can
use any agreeable communication channel (a socket is typical, but it
is also possible to implement the NBD protocol over a pair of
unidirectional pipes). If TCP sockets are used, both the client and
server SHOULD disable Nagle's algorithm (that is, use `setsockopt` to
set the `TCP_NODELAY` option to non-zero), to eliminate artificial
delays caused by waiting for an ACK response when a large message
payload spans multiple network packets.
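
In Python, for example, disabling Nagle's algorithm is a single `setsockopt` call. The sketch below (the helper name is hypothetical) applies it only to TCP sockets, since Unix sockets and pipes have no Nagle behaviour to disable.

```python
import socket


def configure_nbd_socket(sock: socket.socket) -> None:
    """Set TCP_NODELAY on TCP sockets used for NBD; leave other sockets alone."""
    is_tcp = (sock.family in (socket.AF_INET, socket.AF_INET6)
              and sock.type == socket.SOCK_STREAM)
    if is_tcp:
        # A non-zero TCP_NODELAY value disables Nagle's algorithm.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```

Both ends would call this: the client on its connecting socket and the server on each accepted connection.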

### Handshake

The handshake is the first phase of the protocol. Its main purpose is to
provide a means for both the client and the server to negotiate which
export they are going to use and how.

There are three versions of the negotiation. They are referred to as
"oldstyle", "newstyle", and "fixed newstyle" negotiation. Oldstyle was
the only version of the negotiation until nbd 2.9.16; newstyle was
introduced for nbd 2.9.17. A short while later, it was discovered that
newstyle was insufficiently structured to allow protocol options to be
added while retaining backwards compatibility. The minor changes
introduced to fix this problem are, where necessary, referred to as
"fixed newstyle" to differentiate from the original version of the
newstyle negotiation.

#### Oldstyle negotiation

S: 64 bits, `0x4e42444d41474943` (ASCII '`NBDMAGIC`') (also known as
   the `INIT_PASSWD`)  
S: 64 bits, `0x00420281861253` (`cliserv_magic`, a magic number)  
S: 64 bits, size of the export in bytes (unsigned)  
S: 32 bits, flags  
S: 124 bytes, zeroes (reserved).
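
Since the client sends nothing, it can treat this greeting as a fixed 152-byte record. The sketch below assumes the 32-bit flags field sits between the export size and the reserved bytes (the flags that the next paragraph refers to); the helper name is hypothetical.

```python
from __future__ import annotations

import struct

NBDMAGIC = 0x4E42444D41474943       # ASCII "NBDMAGIC"
CLISERV_MAGIC = 0x00420281861253
# Big-endian: magic, cliserv_magic, export size, flags, 124 reserved bytes.
OLDSTYLE_GREETING = struct.Struct(">QQQI124s")


def parse_oldstyle_greeting(buf: bytes) -> tuple[int, int]:
    """Return (export_size, flags) from a complete oldstyle greeting."""
    if len(buf) != OLDSTYLE_GREETING.size:
        raise ValueError(f"expected {OLDSTYLE_GREETING.size} bytes, got {len(buf)}")
    magic, cliserv, size, flags, _reserved = OLDSTYLE_GREETING.unpack(buf)
    if magic != NBDMAGIC or cliserv != CLISERV_MAGIC:
        raise ValueError("not an oldstyle NBD greeting")
    return size, flags
```

After parsing, the client's only decision is whether to accept these values or to disconnect.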

As can be seen, this isn't exactly a negotiation; it's just a matter of
the server sending a bunch of data to the client. If the client is
unhappy with what it receives, it should disconnect and not look back.

The fact that the size of the export was specified before the flags were
sent made it impossible for the protocol to be changed in a
backwards-compatible manner to allow for named exports without ugliness.
As a result, the old style negotiation is now no longer developed;
starting with version 3.10 of the reference implementation, it is also
no longer supported.

#### Newstyle negotiation

A client that wants to use the new style negotiation SHOULD connect on
the IANA-reserved port for NBD, 10809. The server MAY listen on other
ports as well, but it SHOULD use the old style handshake on those. The
server SHOULD refuse to allow oldstyle negotiations on the newstyle
port. For debugging purposes, the server MAY change the port on which to
listen for newstyle negotiation, but this SHOULD NOT happen for
production purposes.

The initial few exchanges in newstyle negotiation look as follows:

S: 64 bits, `0x4e42444d41474943` (ASCII '`NBDMAGIC`') (as in the old
   style handshake)  
S: 64 bits, `0x49484156454F5054` (ASCII '`IHAVEOPT`') (note different
   magic number)  
S: 16 bits, handshake flags  
C: 32 bits, client flags
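
This fixed-size opening exchange maps directly onto Python's `struct` module. In the sketch below (helper names are hypothetical), the server packs its 18-byte greeting, the client validates it, and the server then checks the client's 32-bit flags against the set it recognizes, closing the connection on anything unknown. Concrete flag bit values are not defined in this section, so the usage uses arbitrary masks.

```python
from __future__ import annotations

import struct

NBDMAGIC = 0x4E42444D41474943   # ASCII "NBDMAGIC"
IHAVEOPT = 0x49484156454F5054   # ASCII "IHAVEOPT"
GREETING = struct.Struct(">QQH")    # server: two magics + handshake flags (18 bytes)
CLIENT_FLAGS = struct.Struct(">I")  # client: 32-bit client flags


def build_greeting(handshake_flags: int) -> bytes:
    return GREETING.pack(NBDMAGIC, IHAVEOPT, handshake_flags)


def parse_greeting(buf: bytes) -> int:
    """Client side: validate both magics and return the handshake flags."""
    magic, opt_magic, handshake_flags = GREETING.unpack(buf)
    if magic != NBDMAGIC or opt_magic != IHAVEOPT:
        raise ValueError("not a newstyle NBD greeting")
    # Unknown handshake flag bits are simply ignored by the client.
    return handshake_flags


def server_accepts(client_flags: bytes, known_client_flags: int) -> bool:
    """Server side: False means the server must close the connection."""
    (flags,) = CLIENT_FLAGS.unpack(client_flags)
    return flags & ~known_client_flags == 0
```

The asymmetry mirrors the rule in the following paragraph: the client tolerates unknown handshake flags, but the server rejects unknown client flags.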

This completes the initial phase of negotiation; the client and server
now both know they understand the first version of the newstyle
handshake, with no options. The client SHOULD ignore any handshake flags
it does not recognize, while the server MUST close the TCP connection if
it does not recognize the client's flags. What follows is a repeating
group of options. In non-fixed newstyle only one option can be set
(`NBD_OPT_EXPORT_NAME`), and it is not optional.

At this point, we move on to option haggling, during which the
client can send one or (in fixed newstyle) more options to the server.
The generic format of setting an option is as follows:

C: 64 bits, `0x49484156454F5054` (ASCII '`IHAVEOPT`') (note: same as the
   newstyle handshake's magic number)  
C: 32 bits, option  
C: 32 bits, length of option data (unsigned)  
C: any data needed for the chosen option, of length as specified above.

The presence of the option length in every option allows the server
to skip any options presented by the client that it does not
understand.
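
Because each option carries its own length, a receiver can frame options without understanding them. The following sketch (hypothetical names; the option numbers in the usage are arbitrary placeholders, not assigned values) encodes options and walks a buffer of them.

```python
from __future__ import annotations

import struct
from typing import Iterator

IHAVEOPT = 0x49484156454F5054
OPTION_HEADER = struct.Struct(">QII")  # magic, option, data length (16 bytes)


def encode_option(option: int, data: bytes = b"") -> bytes:
    return OPTION_HEADER.pack(IHAVEOPT, option, len(data)) + data


def iter_options(buf: bytes) -> Iterator[tuple[int, bytes]]:
    """Yield (option, data) for each complete option in buf.

    An option the server does not understand can be skipped by jumping
    over `length` bytes, without parsing its data.
    """
    pos = 0
    while pos < len(buf):
        magic, option, length = OPTION_HEADER.unpack_from(buf, pos)
        if magic != IHAVEOPT:
            raise ValueError("bad option magic")
        start = pos + OPTION_HEADER.size
        data = buf[start:start + length]
        if len(data) != length:
            raise ValueError("truncated option data")
        yield option, data
        pos = start + length
```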

If the value of the option field is `NBD_OPT_EXPORT_NAME` and the server
is willing to allow the export, the server replies with information
about the chosen export:

S: 64 bits, size of the export in bytes (unsigned)  
S: 16 bits, transmission flags  
S: 124 bytes, zeroes (reserved) (unless `NBD_FLAG_C_NO_ZEROES` was
   negotiated by the client)
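
The length of this reply depends on whether `NBD_FLAG_C_NO_ZEROES` was negotiated, so the client has to remember that choice to know how many bytes to read. A hypothetical parsing helper, as a sketch:

```python
from __future__ import annotations

import struct

EXPORT_REPLY = struct.Struct(">QH")  # export size, transmission flags (10 bytes)
RESERVED_ZEROES = 124


def export_reply_length(no_zeroes: bool) -> int:
    """Bytes to read after NBD_OPT_EXPORT_NAME succeeds."""
    return EXPORT_REPLY.size + (0 if no_zeroes else RESERVED_ZEROES)


def parse_export_reply(buf: bytes, no_zeroes: bool) -> tuple[int, int]:
    """Return (export_size, transmission_flags)."""
    if len(buf) != export_reply_length(no_zeroes):
        raise ValueError("unexpected export reply length")
    size, transmission_flags = EXPORT_REPLY.unpack_from(buf)
    return size, transmission_flags
```

Nothing in the reply itself says which length applies, which is why a disagreement about `NBD_FLAG_C_NO_ZEROES` leaves the two sides out of step.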

If the server is unwilling to allow the export, it MUST terminate
the session.

The reason that the flags field is 16 bits wide and not 32 as in the
oldstyle negotiation is that there are now 16 bits of transmission flags,
and 16 bits of handshake flags. Concatenated together, this results in
32 bits, which allows for using a common set of macros for both. If we
ever run out of flags, the server will set the most significant flag
bit, signalling that an extra flag field will follow, to which the
client will have to reply with a flag field of its own before the extra
flags are sent. This is not yet implemented.

#### Fixed newstyle negotiation

Unfortunately, due to a mistake, the server would immediately close the
connection when it saw an option it did not understand, rather than
signalling this fact to the client, which would have allowed it to retry;
and replies from the server were not structured either, which meant that
if the server were to send something the client did not understand, it
would have to abort negotiation as well.

To fix these two issues, the following changes were implemented:

- The server will set the handshake flag `NBD_FLAG_FIXED_NEWSTYLE`, to
  signal that it supports fixed newstyle negotiation.
- The client SHOULD reply with `NBD_FLAG_C_FIXED_NEWSTYLE` set in its flags
  field too, though its side of the protocol does not change incompatibly.
- The client MAY now send other options to the server as appropriate, in
  the generic format for sending an option as described above.
- The server will reply to any option apart from `NBD_OPT_EXPORT_NAME`
  with reply packets in the following format:

S: 64 bits, `0x3e889045565a9` (magic number for replies)  
S: 32 bits, the option as sent by the client to which this is a reply  
S: 32 bits, reply type (e.g., `NBD_REP_ACK` for successful completion,
   or `NBD_REP_ERR_UNSUP` to mark use of an option not known by this
   server)  
S: 32 bits, length of the reply. This MAY be zero for some replies, in
   which case the next field is not sent  
S: any data as required by the reply (e.g., an export name in the case
   of `NBD_REP_SERVER`)
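
Unlike the `NBD_OPT_EXPORT_NAME` reply, every option reply starts with a self-describing 20-byte header, so a client can always frame it, even for a reply type it does not understand. A minimal parsing sketch with illustrative names; numeric option and reply-type values are not given in this section, so the usage relies on placeholders.

```python
from __future__ import annotations

import struct
from typing import NamedTuple

REPLY_MAGIC = 0x3E889045565A9
OPTION_REPLY = struct.Struct(">QIII")  # magic, option, reply type, length (20 bytes)


class OptionReply(NamedTuple):
    option: int
    reply_type: int
    data: bytes


def parse_option_reply(buf: bytes) -> OptionReply:
    magic, option, reply_type, length = OPTION_REPLY.unpack_from(buf)
    if magic != REPLY_MAGIC:
        raise ValueError("bad option reply magic")
    # A zero length means no data field follows the header.
    data = buf[OPTION_REPLY.size:OPTION_REPLY.size + length]
    if len(data) != length:
        raise ValueError("truncated option reply")
    return OptionReply(option, reply_type, data)
```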

The client MUST NOT send any option until it has received a final
reply to any option it has sent (note that some options, e.g.
`NBD_OPT_LIST`, have multiple replies, and the final reply is
the last of those).

Some messages the client sends instruct the server to change some of
its internal state. The client SHOULD NOT send such messages more
than once; if it does, the server MAY fail the repeated message with
`NBD_REP_ERR_INVALID`.

#### Termination of the session during option haggling

There are three possible mechanisms to end option haggling:

* Transmission mode can be entered (by the client sending
  `NBD_OPT_EXPORT_NAME` or by the server responding to an
  `NBD_OPT_GO` with `NBD_REP_ACK`). This is documented
  elsewhere.

* The client can send (and the server can reply to) an
  `NBD_OPT_ABORT`. This MUST be followed by the client
  shutting down TLS (if it is running), and the client
  dropping the connection. This is referred to as
  'initiating a soft disconnect'; soft disconnects can
  only be initiated by the client.

* The client or the server can disconnect the TCP session
  without activity at the NBD protocol level. If TLS is
  negotiated, the party initiating the disconnect SHOULD
  shut down TLS first if it is running. This is referred
  to as 'initiating a hard disconnect'.

This section concerns the second and third of these, together
called 'terminating the session', and under which circumstances
they are initiated.

If either the client or the server detects a violation of a
mandatory condition ('MUST' etc.) by the other party, it MAY
initiate a hard disconnect.

A client MAY use a soft disconnect to terminate the session
whenever it wishes.

A party that is mandated by this document to terminate the
session MUST initiate a hard disconnect if it is not possible
to use a soft disconnect. Such circumstances include: where
that party is the server and it cannot return an error
(e.g. after an `NBD_OPT_EXPORT_NAME` it cannot satisfy),
and where that party is the client following a failed TLS
negotiation.

A party MUST NOT initiate a hard disconnect save where set out
in this section. Therefore, unless a client's situation falls
within the provisions of the previous paragraph or the
client detects a breach of a mandatory condition, it MUST NOT
use a hard disconnect, and hence its only option to terminate
the session is via a soft disconnect.

There is no requirement for the client or server to complete a
negotiation if it does not wish to do so. Either end MAY simply
terminate the session. In the client's case, if it wishes to
do so it MUST use soft disconnect.

In the server's case it MUST (save where set out above) simply
error inbound options until the client gets the hint that it is
unwelcome, except that if a server believes a client's behaviour
constitutes a denial of service, it MAY initiate a hard disconnect.
If the server is in the process of being shut down it MAY
error any inflight option and SHOULD error further options received
(other than an `NBD_OPT_ABORT`) with `NBD_REP_ERR_SHUTDOWN`.

If the client receives `NBD_REP_ERR_SHUTDOWN` it MUST initiate
a soft disconnect.

### Transmission

There are three message types in the transmission phase: the request,
the simple reply, and the structured reply chunk. The
transmission phase consists of a series of transactions, where the
client submits requests and the server sends corresponding replies
with either a single simple reply or a series of one or more
structured reply chunks per request. The phase continues until
either side terminates transmission; this can be performed cleanly
only by the client.

Note that without client negotiation, the server MUST use only simple
replies, and that it is impossible to tell by reading the server
traffic in isolation whether a data field will be present; the simple
reply is also problematic for error handling of the `NBD_CMD_READ`
request. Therefore, structured replies can be used to create a
context-free server stream; see below.

Replies need not be sent in the same order as requests (i.e., requests
may be handled by the server asynchronously), and structured reply
chunks from one request may be interleaved with reply messages from
other requests; however, there may be constraints that prevent
arbitrary reordering of structured reply chunks within a given reply.
Clients SHOULD use a cookie that is distinct from all other currently
pending transactions, but MAY reuse cookies that are no longer in
flight; cookies need not be consecutive. In each reply message
(whether simple or structured), the server MUST use the same value for
cookie as was sent by the client in the corresponding request,
treating the cookie as an opaque field. In this way, the client can
tell which request a given reply belongs to. Note that earlier
versions of this specification referred to a client's cookie as a
handle.
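
A client therefore needs a table of in-flight requests keyed by cookie. The class below is a hypothetical sketch: it hands out cookies from a counter (convenient, though cookies need not be consecutive), keeps them distinct while in flight, and matches each reply to its request regardless of arrival order.

```python
from __future__ import annotations

import itertools
from typing import Any

COOKIE_MASK = (1 << 64) - 1  # cookies are 64-bit fields


class PendingRequests:
    """Hypothetical client-side table matching replies to requests by cookie."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._inflight: dict[int, Any] = {}

    def submit(self, request: Any) -> int:
        cookie = next(self._counter) & COOKIE_MASK
        while cookie in self._inflight:  # never reuse a cookie still in flight
            cookie = next(self._counter) & COOKIE_MASK
        self._inflight[cookie] = request
        return cookie

    def complete(self, cookie: int) -> Any:
        """Pop the request for a reply; KeyError means an unknown cookie."""
        return self._inflight.pop(cookie)

    def __len__(self) -> int:
        return len(self._inflight)
```

Because the server treats the cookie as opaque, any allocation scheme works as long as in-flight cookies stay unique.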

#### Ordering of messages and writes

The server MAY process commands out of order, and MAY reply out of
order, except that:

* All write commands (that includes `NBD_CMD_WRITE`,
  `NBD_CMD_WRITE_ZEROES` and `NBD_CMD_TRIM`) that the server
  completes (i.e. replies to) prior to processing an
  `NBD_CMD_FLUSH` MUST be written to non-volatile
  storage prior to replying to that `NBD_CMD_FLUSH`. This
  paragraph only applies if `NBD_FLAG_SEND_FLUSH` is set within
  the transmission flags, as otherwise `NBD_CMD_FLUSH` will never
  be sent by the client to the server.

* A client which uses multiple connections to a server to parallelize
  commands MUST NOT issue an `NBD_CMD_FLUSH` request until it has
  received the reply for all write commands which it expects to be
  covered by the flush.

* A server MUST NOT reply to a command that has `NBD_CMD_FLAG_FUA` set
  in its command flags until the data (if any) written by that command
  is persisted to non-volatile storage. This only applies if
  `NBD_FLAG_SEND_FUA` is set within the transmission flags, as otherwise
  `NBD_CMD_FLAG_FUA` will not be set on any commands sent to the server
  by the client.

`NBD_CMD_FLUSH` is modelled on the Linux kernel empty bio with
`REQ_PREFLUSH` set. `NBD_CMD_FLAG_FUA` is modelled on the Linux
kernel bio with `REQ_FUA` set. In case of ambiguity in this
specification, the
[kernel documentation](https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt)
may be useful.

#### Request message

The request message, sent by the client, looks as follows:

C: 32 bits, 0x25609513, magic (`NBD_REQUEST_MAGIC`)  
C: 16 bits, command flags  
C: 16 bits, type  
C: 64 bits, cookie  
C: 64 bits, offset (unsigned)  
C: 32 bits, length (unsigned)  
C: (*length* bytes of data if the request is of type `NBD_CMD_WRITE`)
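
The request header is a fixed 28 bytes, followed by a payload for writes only. This sketch assumes the field order magic, command flags, type, cookie, offset, length; `NBD_CMD_WRITE = 1` is taken from the command list elsewhere in the full specification and should be treated as an assumption when reading this section alone. The helper name is hypothetical.

```python
from __future__ import annotations

import struct

NBD_REQUEST_MAGIC = 0x25609513
# magic, command flags, type, cookie, offset, length (28 bytes, big-endian)
REQUEST_HEADER = struct.Struct(">IHHQQI")
NBD_CMD_WRITE = 1  # assumption: value from the spec's command list


def encode_request(cmd_type: int, cookie: int, offset: int, length: int,
                   flags: int = 0, data: bytes = b"") -> bytes:
    if cmd_type == NBD_CMD_WRITE:
        if len(data) != length:
            raise ValueError("write payload must be exactly `length` bytes")
    elif data:
        raise ValueError("only NBD_CMD_WRITE carries a payload")
    header = REQUEST_HEADER.pack(NBD_REQUEST_MAGIC, flags, cmd_type,
                                 cookie, offset, length)
    return header + data
```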

#### Simple reply message

The simple reply message MUST be sent by the server in response to all
requests if structured replies have not been negotiated using
`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been negotiated, a simple
reply MAY be used as a reply to any request other than `NBD_CMD_READ`,
but only if the reply has no data payload. The message looks as
follows:

S: 32 bits, 0x67446698, magic (`NBD_SIMPLE_REPLY_MAGIC`; used to be
   `NBD_REPLY_MAGIC`)  
S: 32 bits, error (MAY be zero)  
S: 64 bits, cookie  
S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
   *error* is zero)
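
The simple reply header is 16 bytes (magic, error, cookie), but it does not say whether data follows. As the sketch below illustrates (helper names are hypothetical), the client has to look up the request it sent under that cookie to know how many payload bytes to read.

```python
from __future__ import annotations

import struct

NBD_SIMPLE_REPLY_MAGIC = 0x67446698
SIMPLE_REPLY = struct.Struct(">IIQ")  # magic, error, cookie (16 bytes)


def parse_simple_reply(header: bytes) -> tuple[int, int]:
    """Return (error, cookie) from a 16-byte simple reply header."""
    magic, error, cookie = SIMPLE_REPLY.unpack(header)
    if magic != NBD_SIMPLE_REPLY_MAGIC:
        raise ValueError("not a simple reply")
    return error, cookie


def simple_reply_payload_length(error: int, request_is_read: bool,
                                request_length: int) -> int:
    """Payload bytes that follow the header, decided from the original request."""
    return request_length if request_is_read and error == 0 else 0
```

This dependence on client-side context is the problem that structured replies address.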

#### Structured reply chunk message

Some of the major downsides of the default simple reply to
`NBD_CMD_READ` are as follows. First, it is not possible to support
partial reads or early errors (the command must succeed or fail as a
whole; no payload is sent if *error* was set, but if *error* is zero
and a later error is detected before *length* bytes are returned, the
server must initiate a hard disconnect). Second, there is no way to
efficiently skip over portions of a sparse export that is known to
contain all zeroes. Finally, it is not possible to reliably decode
the server traffic without also having context of what pending read
requests were sent by the client, to see which *cookie* values will
have accompanying payload on success. Therefore, structured replies
are also permitted if negotiated.

A structured reply in the transmission phase consists of one or
more structured reply chunk messages. The server MUST NOT send
this reply type unless the client has successfully negotiated
structured replies via `NBD_OPT_STRUCTURED_REPLY`. Conversely, if
structured replies are negotiated, the server MUST use a
structured reply for any response with a payload, and MUST NOT use
a simple reply for `NBD_CMD_READ` (even for the case of an early
`NBD_EINVAL` due to bad flags), but MAY use either a simple reply or a
structured reply to all other requests. The server SHOULD prefer
sending errors via a structured reply, as the error can then be
accompanied by a string payload to present to a human user.

A structured reply MAY occupy multiple structured chunk messages
(all with the same value for "cookie"), and the
`NBD_REPLY_FLAG_DONE` reply flag is used to identify the final
chunk. Unless further documented by individual requests below,
the chunks MAY be sent in any order, except that the chunk with
the flag `NBD_REPLY_FLAG_DONE` MUST be sent last. Even when a
command documents further constraints between chunks of one reply,
it is always safe to interleave chunks of that reply with messages
related to other requests. A server SHOULD try to minimize the
number of chunks sent in a reply, but MUST NOT mark a chunk as
final if there is still a possibility of detecting an error before
transmission of that chunk completes. A structured reply is
considered successful only if it did not contain any error chunks,
although the client MAY be able to determine partial success based
on the chunks received.

A structured reply chunk message looks as follows:

S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)  
S: 16 bits, flags  
S: 16 bits, type  
S: 64 bits, cookie  
S: 32 bits, length of payload (unsigned)  
S: *length* bytes of payload data (if *length* is nonzero)

The use of *length* in the reply allows context-free division of
the overall server traffic into individual reply messages; the
*type* field describes how to further interpret the payload.
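
To illustrate the context-free framing, the sketch below splits a byte stream into chunks using only each 20-byte header (magic, flags, type, cookie, length), then groups interleaved chunks by cookie until each reply's final chunk arrives. The value `NBD_REPLY_FLAG_DONE = 1` (bit 0) comes from the flag list elsewhere in the full specification and is an assumption here; the chunk types in the usage are arbitrary placeholders, and the helper names are hypothetical.

```python
from __future__ import annotations

import struct
from typing import Iterable, Iterator, NamedTuple

NBD_STRUCTURED_REPLY_MAGIC = 0x668E33EF
# magic, flags, type, cookie, payload length (20 bytes, big-endian)
CHUNK_HEADER = struct.Struct(">IHHQI")
NBD_REPLY_FLAG_DONE = 1 << 0  # assumption: bit 0, per the spec's flag list


class Chunk(NamedTuple):
    flags: int
    chunk_type: int
    cookie: int
    payload: bytes


def split_chunks(stream: bytes) -> Iterator[Chunk]:
    """Frame chunks using only each header's length; stop at a partial chunk."""
    pos = 0
    while pos + CHUNK_HEADER.size <= len(stream):
        magic, flags, chunk_type, cookie, length = CHUNK_HEADER.unpack_from(stream, pos)
        if magic != NBD_STRUCTURED_REPLY_MAGIC:
            raise ValueError("not a structured reply chunk")
        start = pos + CHUNK_HEADER.size
        if start + length > len(stream):
            break  # payload not fully received yet
        yield Chunk(flags, chunk_type, cookie, stream[start:start + length])
        pos = start + length


def group_replies(chunks: Iterable[Chunk]):
    """Group interleaved chunks by cookie; a reply is complete at its DONE chunk.

    Returns (complete, still_pending), both dicts of cookie -> list of chunks.
    """
    partial: dict[int, list[Chunk]] = {}
    complete: dict[int, list[Chunk]] = {}
    for chunk in chunks:
        partial.setdefault(chunk.cookie, []).append(chunk)
        if chunk.flags & NBD_REPLY_FLAG_DONE:
            complete[chunk.cookie] = partial.pop(chunk.cookie)
    return complete, partial
```

No record of pending requests is needed to find chunk boundaries; the cookie lookup only matters afterwards, when interpreting each chunk.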

#### Terminating the transmission phase

There are two methods of terminating the transmission phase:

* The client sends `NBD_CMD_DISC`, whereupon the server MUST
  close down the TLS session (if one is running) and then
  close the TCP connection. This is referred to as 'initiating
  a soft disconnect'. Soft disconnects can only be
  initiated by the client.

* The client or the server drops the TCP session (in which
  case it SHOULD shut down the TLS session first). This is
  referred to as 'initiating a hard disconnect'.

Together these are referred to as 'terminating transmission'.

Either side MAY initiate a hard disconnect if it detects
a violation by the other party of a mandatory condition
within this document.

On a server shutdown, the server SHOULD wait for inflight
requests to be serviced prior to initiating a hard disconnect.
A server MAY speed this process up by issuing error replies.
The error value issued in respect of these requests and
any subsequently received requests SHOULD be `NBD_ESHUTDOWN`.

If the client receives an `NBD_ESHUTDOWN` error it MUST initiate
a soft disconnect.

The client MAY issue a soft disconnect at any time, but
SHOULD wait until there are no inflight requests first.

The client and the server MUST NOT initiate any form
of disconnect other than in one of the above circumstances.

#### Reserved Magic values

The following magic values are reserved and must not be used
for future protocol extensions:

0x12560953 - Historic value for NBD_REQUEST_MAGIC, used
until Linux 2.1.116pre2.

0x96744668 - Historic value for NBD_REPLY_MAGIC, used
until Linux 2.1.116pre2.

0x25609514 - Used by nbd-server to store data log flags in the
transaction log. Never sent to or from a client.

The following magic values are reserved and must be used only as
described in the corresponding protocol extensions:

0x21e41c71 - `NBD_EXTENDED_REQUEST_MAGIC`
Defined by the experimental `EXTENDED_HEADERS`
[extension](https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md).

0x6e8a278c - `NBD_EXTENDED_REPLY_MAGIC`
Defined by the experimental `EXTENDED_HEADERS`
[extension](https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md).

## TLS support

The NBD protocol supports Transport Layer Security (TLS) (see
[RFC5246](https://tools.ietf.org/html/rfc5246) as updated by
[RFC6176](https://tools.ietf.org/html/rfc6176)).

TLS is negotiated with the `NBD_OPT_STARTTLS`
option. This is performed as an in-session upgrade. Below, the term
'negotiation' is used to refer to the sending and receiving of
NBD options and option replies, and the term 'initiation' of TLS
is used to refer to the actual upgrade to TLS.

### Certificates, authentication and authorisation

This standard does not specify what encryption, certification
and signature algorithms are used. This standard does not
specify authentication and authorisation (for instance
whether client and/or server certificates are required and
what they should contain); this is implementation dependent.

TLS requires fixed newstyle negotiation to have completed.

### Server-side requirements

There are three modes of operation for a server. The
server MUST support one of these modes.

* The server operates entirely without TLS ('NOTLS'); OR

* The server insists upon TLS, and forces the client to
  upgrade by erroring any NBD options other than `NBD_OPT_STARTTLS`
  or `NBD_OPT_ABORT` with `NBD_REP_ERR_TLS_REQD` ('FORCEDTLS'); this
  in practice means that all option negotiation (apart from the
  `NBD_OPT_STARTTLS` itself) is carried out with TLS; OR

* The server provides TLS, and it is mandatory on zero or more
  exports, and is available at the client's option on all
  other exports ('SELECTIVETLS'). The server does not force
  the client to upgrade to TLS during option haggling (because
  if the client ultimately were to choose a non-TLS-only export,
  stopping TLS would not be possible). Instead it permits the client
  to upgrade as and when it chooses, but unless an upgrade to
  TLS has already taken place, the server errors attempts
  to enter transmission mode on TLS-only exports, MAY
  refuse to provide information about TLS-only exports
  via `NBD_OPT_INFO`, MAY refuse to provide information
  about non-existent exports via `NBD_OPT_INFO`, and MAY omit
  exports that are TLS-only from `NBD_OPT_LIST`.

The server MAY determine the mode in which it operates
dependent upon the session (for instance it might be
more liberal with TCP connections made over the loopback
interface) but it MUST be consistent in its mode
of operation across the lifespan of a single TCP connection
to the server. A client MUST NOT assume indications from
a prior TCP session to a given server will be relevant
to a subsequent session.

The server MUST operate in NOTLS mode unless the server
set flag `NBD_FLAG_FIXED_NEWSTYLE` and the client replied
with `NBD_FLAG_C_FIXED_NEWSTYLE` in the fixed newstyle
negotiation.

These modes of operation are described in detail below.

#### NOTLS mode

If the server receives `NBD_OPT_STARTTLS` it MUST respond with
`NBD_REP_ERR_POLICY` (if it does not support TLS for
policy reasons), `NBD_REP_ERR_UNSUP` (if it does not
support the `NBD_OPT_STARTTLS` option at all) or another
error explicitly permitted by this document. The server MUST NOT
respond to any option request with `NBD_REP_ERR_TLS_REQD`.

#### FORCEDTLS mode

If the server receives `NBD_OPT_STARTTLS` prior to negotiating
TLS, it MUST reply with `NBD_REP_ACK`. If the server receives
`NBD_OPT_STARTTLS` when TLS has already been negotiated, it
MUST reply with `NBD_REP_ERR_INVALID`.

After an `NBD_REP_ACK` reply has been sent, the server MUST be
prepared for a TLS handshake, and all further data MUST be sent
and received over TLS. There is no downgrade to a non-TLS session.

As per the TLS standard, the handshake MAY be initiated either
by the server (having sent the `NBD_REP_ACK`) or by the client.
If the handshake is unsuccessful (for instance the client's
certificate does not match) the server MUST terminate the
session: the acknowledgement has already been sent, so by this
stage it is too late to continue without TLS.

If the server receives any other option, including `NBD_OPT_INFO`
and unsupported options, it MUST reply with `NBD_REP_ERR_TLS_REQD`
if TLS has not been initiated; `NBD_OPT_INFO` is included because in
this mode all exports are TLS-only. If the server receives a request to
enter transmission mode via `NBD_OPT_EXPORT_NAME` when TLS has not
been initiated, then as this request cannot error, it MUST
terminate the session. If the server receives a request to
enter transmission mode via `NBD_OPT_GO` when TLS has not been
initiated, it MUST error with `NBD_REP_ERR_TLS_REQD`.
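
The FORCEDTLS rules reduce to a small decision table. The sketch below is hypothetical: it uses option and reply names as strings rather than their numeric values, with "process" meaning normal option handling and "terminate" meaning ending the session. It lets `NBD_OPT_ABORT` through, as the FORCEDTLS mode description permits.

```python
def forcedtls_reply(option: str, tls_initiated: bool) -> str:
    """What a FORCEDTLS server does with `option`: a reply name,
    "process" (handle normally) or "terminate" (end the session)."""
    if option == "NBD_OPT_STARTTLS":
        return "NBD_REP_ERR_INVALID" if tls_initiated else "NBD_REP_ACK"
    if tls_initiated or option == "NBD_OPT_ABORT":
        return "process"
    if option == "NBD_OPT_EXPORT_NAME":
        return "terminate"  # this option has no way to return an error
    # Everything else, including NBD_OPT_INFO, NBD_OPT_GO and unknown options.
    return "NBD_REP_ERR_TLS_REQD"
```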

The server MUST NOT send `NBD_REP_ERR_TLS_REQD` in reply to
any option if TLS has already been initiated.

The FORCEDTLS mode of operation has an implementation problem in
that the client MAY legally simply send an `NBD_OPT_EXPORT_NAME`
to enter transmission mode without previously sending any options.
This is avoided by use of `NBD_OPT_INFO` and `NBD_OPT_GO`.

#### SELECTIVETLS mode

If the server receives `NBD_OPT_STARTTLS` prior to negotiating
TLS, it MUST reply with `NBD_REP_ACK` and initiate TLS as set
out under 'FORCEDTLS' above. If the server receives
`NBD_OPT_STARTTLS` when TLS has already been negotiated, it
MUST reply with `NBD_REP_ERR_INVALID`.

If the server receives `NBD_OPT_INFO` or `NBD_OPT_GO` and TLS
has not been initiated, it MAY reply with `NBD_REP_ERR_TLS_REQD`
if that export is non-existent, and MUST reply with
`NBD_REP_ERR_TLS_REQD` if that export is TLS-only.

If the server receives a request to enter transmission mode
via `NBD_OPT_EXPORT_NAME` on a TLS-only export when TLS has not
been initiated, then as this request cannot error, it MUST
terminate the session.

The server MUST NOT send `NBD_REP_ERR_TLS_REQD` in reply to
any option if TLS has already been negotiated. The server
MUST NOT send `NBD_REP_ERR_TLS_REQD` in response to any
option other than `NBD_OPT_INFO`, `NBD_OPT_GO` and
`NBD_OPT_EXPORT_NAME`, and only in those cases in respect of
a TLS-only or non-existent export.
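
The SELECTIVETLS rules can be sketched the same way. In this hypothetical helper, `export` is one of the illustrative labels "tls-only", "open" or "missing", and option and reply names are strings rather than numeric values. For a non-existent export queried before TLS, the sketch takes the permitted `NBD_REP_ERR_TLS_REQD` reply so that the export's absence is not revealed; a server could equally choose not to.

```python
def selectivetls_reply(option: str, tls_initiated: bool, export: str) -> str:
    """What a SELECTIVETLS server does with `option` for the named export:
    a reply name, "process" (handle normally) or "terminate"."""
    if option == "NBD_OPT_STARTTLS":
        return "NBD_REP_ERR_INVALID" if tls_initiated else "NBD_REP_ACK"
    selects_export = option in ("NBD_OPT_INFO", "NBD_OPT_GO", "NBD_OPT_EXPORT_NAME")
    if tls_initiated or not selects_export or export == "open":
        return "process"
    if option == "NBD_OPT_EXPORT_NAME":
        # Cannot error: a TLS-only export ends the session; a missing export
        # is handled like any other unknown export name.
        return "terminate" if export == "tls-only" else "process"
    # NBD_OPT_INFO / NBD_OPT_GO: MUST for TLS-only exports, MAY for missing ones.
    return "NBD_REP_ERR_TLS_REQD"
```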

There is a degenerate case of SELECTIVETLS where all
exports are TLS-only. This is permitted in part to make programming
of servers easier. Operation is a little different from FORCEDTLS,
as the client is not forced to upgrade to TLS prior to any options
being processed, and the server MAY choose to give information on
non-existent exports via `NBD_OPT_INFO` responses prior to an upgrade
to TLS.

### Client-side requirements

If the client supports TLS at all, it MUST be prepared
to deal with servers operating in any of the above modes.
Notwithstanding this, a client MAY always terminate the session or
refuse to connect to a particular export if TLS is
not available and the user requires TLS.

The client MUST NOT issue `NBD_OPT_STARTTLS` unless the server
set flag `NBD_FLAG_FIXED_NEWSTYLE` and the client replied
with `NBD_FLAG_C_FIXED_NEWSTYLE` in the fixed newstyle
negotiation.

The client MUST NOT issue `NBD_OPT_STARTTLS` if TLS has already
been initiated.

Subject to the above two limitations, the client MAY send
`NBD_OPT_STARTTLS` at any time to initiate a TLS session. If the
client receives `NBD_REP_ACK` in response, it MUST immediately
upgrade the session to TLS. A reply of `NBD_REP_ERR_UNSUP`,
`NBD_REP_ERR_POLICY` or any other error indicates
that the server cannot or will not upgrade the session to TLS,
and therefore the client MUST either continue the session
without TLS, or terminate the session.
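
The client-side rules above can be summarised in two hypothetical helpers: one checks the preconditions for sending `NBD_OPT_STARTTLS`, the other maps the server's reply to an action. Reply names are strings rather than numeric values, and `require_tls` stands for the user's policy, which is an assumption of this sketch.

```python
def may_send_starttls(server_fixed_newstyle: bool, client_fixed_newstyle: bool,
                      tls_initiated: bool) -> bool:
    """Both fixed-newstyle flags must have been exchanged, and TLS not yet started."""
    return server_fixed_newstyle and client_fixed_newstyle and not tls_initiated


def on_starttls_reply(reply: str, require_tls: bool) -> str:
    """Client action after the reply to NBD_OPT_STARTTLS."""
    if reply == "NBD_REP_ACK":
        return "upgrade"  # begin the TLS handshake immediately
    # NBD_REP_ERR_UNSUP, NBD_REP_ERR_POLICY or any other error: no TLS is coming.
    return "terminate" if require_tls else "continue without TLS"
```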
661 A client that prefers to use TLS irrespective of whether
662 the server makes TLS mandatory SHOULD send `NBD_OPT_STARTTLS`
663 as the first option. This will ensure option haggling is subject
664 to TLS, and will thus prevent the possibility of options being
665 compromised by a Man-in-the-Middle attack. Note that the
666 `NBD_OPT_STARTTLS` itself may be compromised - see 'downgrade
667 attacks' for more details. For this reason, a client which only
668 wishes to use TLS SHOULD terminate the session if the
669 `NBD_OPT_STARTTLS` replies with an error.
671 If the TLS handshake is unsuccessful (for instance the server's
672 certificate does not validate) the client MUST terminate the
673 session as by this stage it is too late to continue without TLS.
675 If the client receives an `NBD_REP_ERR_TLS_REQD` in response
676 to any option, it implies that this option cannot be executed
677 unless a TLS upgrade is performed. If the option is any
678 option other than `NBD_OPT_INFO` or `NBD_OPT_GO`, this
679 indicates that no option will succeed unless a TLS upgrade
680 is performed; the client MAY therefore choose to issue
681 an `NBD_OPT_STARTTLS`, or MAY terminate the session (if
682 for instance it does not support TLS or does not have
683 appropriate credentials for this server). If the client
684 receives `NBD_REP_ERR_TLS_REQD` in response to
685 `NBD_OPT_INFO` or `NBD_OPT_GO` this indicates that the
686 export referred to within the option is either non-existent
687 or requires TLS; the client MAY therefore choose to issue
688 an `NBD_OPT_STARTTLS`, MAY terminate the session (if
689 for instance it does not support TLS or does not have
690 appropriate credentials for this server), or MAY continue
691 in another manner without TLS, for instance by querying
692 or using other exports.
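The decision rules above can be summarised as a small lookup. This is a hypothetical client-side helper, not part of any real NBD library; the option numbers (`NBD_OPT_INFO` = 6, `NBD_OPT_GO` = 7) are those listed under the option types later in this document, and the action names are invented for illustration.

```python
# Hypothetical helper: the follow-up actions this document permits a client
# after it receives NBD_REP_ERR_TLS_REQD in reply to a given option.
NBD_OPT_INFO = 6
NBD_OPT_GO = 7


def actions_after_tls_reqd(option: int) -> set:
    """Return the set of permitted client actions (names are illustrative)."""
    # For any option, the client MAY upgrade to TLS or MAY give up.
    actions = {"starttls", "terminate"}
    if option in (NBD_OPT_INFO, NBD_OPT_GO):
        # Only the named export is affected (it needs TLS or does not exist),
        # so the client MAY also carry on without TLS, e.g. with other exports.
        actions.add("continue_without_tls")
    return actions
```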
If a client supports TLS, it SHOULD use `NBD_OPT_GO`
(if the server supports it) in place of
`NBD_OPT_EXPORT_NAME`. One reason for this is set out in
the final paragraphs of the sections under 'FORCEDTLS'
and 'SELECTIVETLS': `NBD_OPT_GO` gives the server an
opportunity to report that a failure to enter transmission
mode is due to the client's failure to initiate TLS, and
the client may obtain information about which exports are
TLS-only through `NBD_OPT_INFO`. Another reason is that the
`NBD_FLAG_C_NO_ZEROES` flag exchanged during the handshake
can be altered by a MitM downgrade attack, which can cause a
protocol mismatch with `NBD_OPT_EXPORT_NAME` but not with
`NBD_OPT_GO`.
707 ### Security considerations
NBD implementations supporting TLS MUST support TLS version 1.2, and
SHOULD support any later versions. NBD implementations
713 MAY support older versions but SHOULD NOT do so by default
714 (i.e. they SHOULD only be available by a configuration change).
715 Older versions SHOULD NOT be used where there is a risk of security
716 problems with those older versions or of a downgrade attack
717 against TLS versions.
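As one illustration of the version rule, Python's standard `ssl` module can pin a lower bound of TLS 1.2 while leaving later versions enabled. This is a sketch assuming a Python client; the function name is hypothetical, and this document does not mandate any particular TLS library.

```python
import ssl
from typing import Optional


def make_nbd_client_tls_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Hypothetical helper: client-side TLS context for use after STARTTLS."""
    # create_default_context() verifies the server certificate and hostname.
    ctx = ssl.create_default_context(cafile=cafile)
    # Refuse anything older than TLS 1.2; later versions stay enabled.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

Once the server has acknowledged `NBD_OPT_STARTTLS` with `NBD_REP_ACK`, a client would wrap its existing socket, for example with `ctx.wrap_socket(sock, server_hostname=host)`, and continue option haggling over the encrypted channel.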
719 #### Protocol downgrade attacks
A danger inherent in any scheme relying on the negotiation
of whether TLS should be employed is the possibility of
downgrade attacks within the NBD protocol.
725 There are two main dangers:
727 * A Man-in-the-Middle (MitM) hijacks a session and impersonates the
728 server (possibly by proxying it) claiming not to support TLS (for
729 example, by omitting `NBD_FLAG_FIXED_NEWSTYLE` or changing a
730 response to `NBD_OPT_STARTTLS`). In this manner, the client is
confused into operating in a plain-text manner with the MitM (with
the session possibly being proxied in plain text to the server using
the method below).
735 * The MitM hijacks a session and impersonates the client (possibly by
736 proxying it) claiming not to support TLS (for example, by omitting
737 `NBD_FLAG_C_FIXED_NEWSTYLE` or eliding a request for
738 `NBD_OPT_STARTTLS`). In this manner the server is confused into
739 operating in a plain-text manner with the MitM (with the session
740 being possibly proxied to the client with the method above).
With regard to the first, any client that does not wish
to be subject to a potential downgrade attack SHOULD, if a
TLS endpoint is specified by the client, ensure that TLS is
negotiated prior to sending or requesting sensitive data.
To recap, the client MAY send `NBD_OPT_STARTTLS` at any
point during option haggling, and MAY terminate the session
if `NBD_REP_ACK` is not provided.
With regard to the second, any server that does not wish
to be subject to a potential downgrade attack SHOULD either
use FORCEDTLS mode, or force TLS on those exports it is
concerned about by using SELECTIVETLS mode and TLS-only
exports. It is not possible to avoid downgrade attacks
on exports which may be served either via TLS or in plain
text unless the client insists on TLS.
761 During transmission phase, several operations are constrained by the
762 export size sent by the final `NBD_OPT_EXPORT_NAME` or `NBD_OPT_GO`,
763 as well as by three size constraints defined here (minimum block,
764 preferred block, and maximum payload).
766 If a client can honour server size constraints (as set out below and
767 under `NBD_INFO_BLOCK_SIZE`), it SHOULD announce this during the
768 handshake phase by using `NBD_OPT_GO` (and `NBD_OPT_INFO` if used)
769 with an `NBD_INFO_BLOCK_SIZE` information request, and MUST use
770 `NBD_OPT_GO` rather than `NBD_OPT_EXPORT_NAME` (except in the case of
a fallback where the server did not support `NBD_OPT_INFO` or
`NBD_OPT_GO`).
774 A server with size constraints other than the default SHOULD advertise
775 the size constraints during handshake phase via `NBD_INFO_BLOCK_SIZE`
776 in response to `NBD_OPT_INFO` or `NBD_OPT_GO`, and MUST do so unless
777 it has agreed on size constraints via out of band means.
779 Some servers are able to make optimizations, such as opening files
780 with `O_DIRECT`, if they know that the client will obey a particular
minimum block size, whereas they must fall back to safer but slower code
782 if the client might send unaligned requests. For that reason, if a
783 client issues an `NBD_OPT_GO` including an `NBD_INFO_BLOCK_SIZE`
784 information request, it MUST abide by the size constraints it
785 receives. Clients MAY issue `NBD_OPT_INFO` with `NBD_INFO_BLOCK_SIZE`
786 to learn the server's constraints without committing to them.
788 If size constraints have not been advertised or agreed on externally,
789 then a server SHOULD support a default minimum block size of 1, a
790 preferred block size of 2^12 (4,096), and a maximum payload size that
791 is at least 2^25 (33,554,432) (even if the export size is smaller);
792 while a client desiring maximum interoperability SHOULD constrain its
793 requests to a minimum block size of 2^9 (512), and limit
794 `NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum payload size
795 of 2^25 (33,554,432). A server that wants to enforce size constraints
796 other than the defaults specified here MAY refuse to go into
797 transmission phase with a client that uses `NBD_OPT_EXPORT_NAME` (via
798 a hard disconnect) or which uses `NBD_OPT_GO` without requesting
799 `NBD_INFO_BLOCK_SIZE` (via an error reply of
800 `NBD_REP_ERR_BLOCK_SIZE_REQD`); but servers SHOULD NOT refuse clients
801 that do not request sizing information when the server supports
802 default sizing or where sizing constraints can be agreed on
803 externally. When allowing clients that did not negotiate sizing via
804 NBD, a server that enforces stricter size constraints than the
805 defaults MUST cleanly error commands that fall outside the constraints
806 without corrupting data; even so, enforcing constraints in this manner
807 may limit interoperability.
809 A client MAY choose to operate as if tighter size constraints had been
810 specified (for example, even when the server advertises the default
minimum block size of 1, a client may safely use a minimum block size
of 2^9 (512)).
814 The minimum block size represents the smallest addressable length and
815 alignment within the export, although writing to an area that small
816 may require the server to use a less-efficient read-modify-write
817 action. If advertised, this value MUST be a power of 2, MUST NOT be
818 larger than 2^16 (65,536), and MAY be as small as 1 for an export
819 backed by a regular file, although the values of 2^9 (512) or 2^12
820 (4,096) are more typical for an export backed by a block device. If a
821 server advertises a minimum block size, the advertised export size
822 SHOULD be an integer multiple of that block size, since otherwise, the
823 client would be unable to access the final few bytes of the export.
825 The preferred block size represents the minimum size at which aligned
826 requests will have efficient I/O, avoiding behaviour such as
827 read-modify-write. If advertised, this MUST be a power of 2 at least
828 as large as the maximum of the minimum block size and 2^9 (512),
829 although larger values (such as 4,096, or even the minimum granularity
830 of a hole) are more typical. The preferred block size MAY be larger
831 than the export size, in which case the client is unable to utilize
832 the preferred block size for that export. The server MAY advertise an
export size that is not an integer multiple of the preferred block
size.
836 The maximum payload size represents the maximum payload length that
837 the server is willing to handle in one request from the client. If
838 advertised, it MAY be something other than a power of 2, but MUST be
839 at least as large as the preferred block size, and SHOULD be at least
840 2^20 (1,048,576) if the export is that large. Advertising a maximum
841 payload size of 0xffffffff is permitted when the server does not have
842 a fixed limit on client request payloads. Typically, the advertised
843 maximum payload length is independent of the export size, even though
844 the actual payloads for read and write cannot successfully exceed the
845 constraints given by the export size and offset of a request.
846 Notwithstanding any maximum payload size advertised, either the server
847 or the client MAY initiate a hard disconnect if a payload length of
848 either a request or a reply would be large enough to be deemed a
849 denial of service attack; however, for maximum portability, any
850 payload not exceeding 2^25 (33,554,432) bytes SHOULD NOT be considered
851 a denial of service attack, even if that length is larger than the
852 advertised maximum payload size.
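The rules for the three advertised values can be checked mechanically. The sketch below is a hypothetical validator (the class and function names are invented for illustration); it separates MUST-level violations from SHOULD-level concerns, and its `SERVER_DEFAULTS` mirror the defaults a server SHOULD support when nothing has been advertised.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizeConstraints:
    minimum: int    # minimum block size
    preferred: int  # preferred block size
    maximum: int    # maximum payload size


# Hypothetical constant: values a server SHOULD support when none are advertised.
SERVER_DEFAULTS = SizeConstraints(minimum=1, preferred=2**12, maximum=2**25)


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def check_advertised(c: SizeConstraints, export_size: int):
    """Hypothetical check: return (errors, warnings) for MUST / SHOULD rules."""
    errors, warnings = [], []
    if not is_power_of_two(c.minimum) or c.minimum > 2**16:
        errors.append("minimum block size must be a power of 2 <= 65536")
    if not is_power_of_two(c.preferred) or c.preferred < max(c.minimum, 2**9):
        errors.append("preferred block size must be a power of 2 >= max(minimum, 512)")
    if c.maximum < c.preferred:
        errors.append("maximum payload must be >= preferred block size")
    # Guard against a zero minimum before taking the remainder.
    if c.minimum > 0 and export_size % c.minimum:
        warnings.append("export size is not a multiple of the minimum block size")
    if export_size >= 2**20 and c.maximum < 2**20:
        warnings.append("maximum payload should be >= 2^20 for an export that large")
    return errors, warnings
```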
854 For commands that require a payload in either direction and where the
855 client controls the payload length (`NBD_CMD_WRITE`, or `NBD_CMD_READ`
856 with simple replies), the client MUST NOT request a length larger than
857 the maximum payload size. For replies where the payload length is
858 controlled by the server (`NBD_CMD_BLOCK_STATUS` without the flag
859 `NBD_CMD_FLAG_REQ_ONE`, or `NBD_CMD_READ` when structured replies are
860 negotiated), the server MAY exceed the maximum payload by the fixed
861 amount of overhead required in the structured reply (for example, a
862 server that advertises a maximum payload of 2^25 bytes may return
863 2^25+8 payload bytes in a single `NBD_REPLY_TYPE_OFFSET_DATA` chunk,
864 rather than splitting the reply across two chunks), although it MUST
865 honor any additional payload constraints documented for a particular
866 command. For commands that do not require a payload in either
867 direction (such as `NBD_CMD_TRIM` or `NBD_CMD_WRITE_ZEROES`), the
868 client MAY request an effect length larger than the maximum payload
869 size; the server SHOULD NOT disconnect, but MAY reply with an
870 `NBD_EOVERFLOW` or `NBD_EINVAL` error if the oversize request would
871 require too many server resources when compared to the same command
872 with an effect length limited to the maximum payload size (such as an
implementation of `NBD_CMD_WRITE_ZEROES` that utilizes a scratch
buffer).
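For a client-controlled payload (`NBD_CMD_WRITE`, or `NBD_CMD_READ` with simple replies), an oversized transfer has to be broken up by the client. A minimal sketch, assuming the negotiated minimum block size and maximum payload size are already known; the helper name is hypothetical.

```python
def split_transfer(offset: int, length: int, minimum: int, maximum: int):
    """Hypothetical helper: yield (offset, length) requests that each fit the
    maximum payload size and stay aligned to the minimum block size."""
    if offset % minimum or length % minimum:
        raise ValueError("transfer is not aligned to the minimum block size")
    # Largest chunk that is a multiple of `minimum` and no larger than
    # `maximum` (the maximum payload need not be a power of 2).
    step = maximum - maximum % minimum
    if step == 0:
        raise ValueError("maximum payload is smaller than the minimum block size")
    end = offset + length
    while offset < end:
        n = min(step, end - offset)
        yield offset, n
        offset += n
```

Commands without a payload, such as `NBD_CMD_TRIM`, are not bound by the maximum payload size, so a client need not split them, although the server MAY still reject an oversize request as described above.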
876 Where a transmission request can have a nonzero *offset* and/or
877 *length* (such as `NBD_CMD_READ`, `NBD_CMD_WRITE`, or `NBD_CMD_TRIM`),
878 the client MUST ensure that *offset* and *length* are integer
879 multiples of any advertised minimum block size, and SHOULD use integer
880 multiples of any advertised preferred block size where possible. For
881 those requests, the client MUST NOT use a *length* which, when added to
882 *offset*, would exceed the export size. The server SHOULD report an
883 `NBD_EINVAL` error if the client's request is not aligned to advertised
884 minimum block size boundaries or would exceed the export size.
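Server-side, the alignment and bounds checks above reduce to a few comparisons. This sketch assumes the `NBD_EINVAL` value of 22 given with the error codes elsewhere in this document; the function name is hypothetical.

```python
NBD_EINVAL = 22  # error value as listed among this document's error codes


def check_request(offset: int, length: int, export_size: int,
                  minimum: int = 1) -> int:
    """Hypothetical helper: 0 if the request is acceptable, else NBD_EINVAL."""
    # offset and length must both be multiples of the minimum block size.
    if offset % minimum or length % minimum:
        return NBD_EINVAL
    # The request must not run past the end of the export.
    if offset + length > export_size:
        return NBD_EINVAL
    return 0
```

Python integers cannot overflow; an implementation using fixed-width 64-bit arithmetic would instead test `length > export_size - offset` (after checking `offset <= export_size`) so that a huge *offset* cannot wrap around.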
888 It is often helpful for the client to be able to query the status of a
889 range of blocks. The nature of the status that can be queried is in
part implementation-dependent. For instance, the status might
indicate:
893 * in a sparse storage format, whether the relevant blocks are actually
894 present on the backing device for the export; or
896 * whether the relevant blocks are 'dirty'; some storage formats and
897 operations over such formats express a concept of data dirtiness.
898 Whether the operation is block device mirroring, incremental block
899 device backup or any other operation with a concept of data
900 dirtiness, they all share a need to provide a list of ranges that
901 this particular operation treats as dirty.
903 To provide such classes of information, the NBD protocol has a generic
904 framework for querying metadata; however, its use must first be
905 negotiated, and one or more metadata contexts must be selected.
907 The procedure works as follows:
909 - First, during negotiation, if the client wishes to query metadata
910 during transmission, the client MUST select one or more metadata
911 contexts with the `NBD_OPT_SET_META_CONTEXT` command. If needed, the
client can use `NBD_OPT_LIST_META_CONTEXT` to list contexts that the
server supports.
914 - During transmission, a client can then indicate interest in metadata
915 for a given region by way of the `NBD_CMD_BLOCK_STATUS` command,
916 where *offset* and *length* indicate the area of interest. On
917 success, the server MUST respond with one structured reply chunk of
918 type `NBD_REPLY_TYPE_BLOCK_STATUS` per metadata context selected
919 during negotiation, where each reply chunk is a list of one or more
920 consecutive extents for that context. Each extent comes with a
*flags* field, the semantics of which are defined by the metadata
context.
924 The client's requested *length* is only a hint to the server, so the
925 cumulative extent length contained in a chunk of the server's reply
may be shorter or longer than the original request. When more than one
927 metadata context was negotiated, the reply chunks for the different
928 contexts of a single block status request need not have the same
929 number of extents or cumulative extent length.
931 In the request, the client may use the `NBD_CMD_FLAG_REQ_ONE` command
932 flag to further constrain the server's reply so that each chunk
contains exactly one extent whose length does not exceed the client's
original *length*.
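To make the extent lists concrete, the sketch below parses one `NBD_REPLY_TYPE_BLOCK_STATUS` chunk payload and checks the `NBD_CMD_FLAG_REQ_ONE` rule. It assumes the payload layout given with that reply type's definition elsewhere in this document (a 32-bit metadata context ID followed by descriptors of a 32-bit extent length and 32-bit status flags), without extended headers; the function names are hypothetical.

```python
import struct


def parse_block_status(payload: bytes):
    """Hypothetical helper: (context_id, [(length, flags), ...]) from a chunk."""
    # 4-byte context ID plus at least one 8-byte extent descriptor.
    if len(payload) < 12 or (len(payload) - 4) % 8:
        raise ValueError("malformed NBD_REPLY_TYPE_BLOCK_STATUS payload")
    (context_id,) = struct.unpack_from(">I", payload, 0)
    extents = [struct.unpack_from(">II", payload, off)
               for off in range(4, len(payload), 8)]
    return context_id, extents


def satisfies_req_one(extents, requested_length: int) -> bool:
    """With NBD_CMD_FLAG_REQ_ONE: exactly one extent, no longer than asked."""
    return len(extents) == 1 and extents[0][0] <= requested_length
```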
936 A client MUST NOT use `NBD_CMD_BLOCK_STATUS` unless it selected a
937 nonzero number of metadata contexts during negotiation, and used the
938 same export name for the subsequent `NBD_OPT_GO` (or
939 `NBD_OPT_EXPORT_NAME`). Servers SHOULD reply with `NBD_EINVAL` to clients
sending `NBD_CMD_BLOCK_STATUS` without selecting at least one metadata
context.
943 The reply to the `NBD_CMD_BLOCK_STATUS` request MUST be sent as a
944 structured reply; this implies that in order to use metadata querying,
945 structured replies MUST be negotiated first.
947 Metadata contexts are identified by their names. The name MUST consist
948 of a namespace, followed by a colon, followed by a leaf-name. The
949 namespace must consist entirely of printable non-whitespace UTF-8
950 characters other than colons, and be non-empty. The entire name
951 (namespace, colon, and leaf-name) MUST follow the restrictions for
952 strings as laid out earlier in this document.
Namespaces MUST consist of one of the following:
955 - `base`, for metadata contexts defined by this document;
956 - `nbd-server`, for metadata contexts defined by the implementation
957 that accompanies this document (none currently);
958 - `x-*`, where `*` can be replaced by an arbitrary string not
959 containing colons, for local experiments. This SHOULD NOT be used
960 by metadata contexts that are expected to be widely used.
961 - A third-party namespace from the list below.
963 Third-party implementations can register additional namespaces by
964 simple request to the mailing-list. The following additional
965 third-party namespaces are currently registered:
966 * `qemu`, maintained by [qemu.org](https://www.qemu.org/docs/master/interop/nbd.html)
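The naming rules can be expressed as a validator. This is a hypothetical helper; `REGISTERED` reflects only the namespaces named above, and the 4096-byte limit and `NUL` ban come from this document's general rules for strings. A leaf-name may itself contain colons, so the name is split at the first colon only.

```python
# Hypothetical constant: namespaces named above; x-* prefixes are handled below.
REGISTERED = {"base", "nbd-server", "qemu"}


def valid_context_name(name: str) -> bool:
    namespace, colon, _leaf = name.partition(":")
    if not colon or not namespace:
        return False
    # Namespace: printable, non-whitespace characters (colons were split off).
    if not all(ch.isprintable() and not ch.isspace() for ch in namespace):
        return False
    # General string rules: no NUL, at most 4096 bytes of UTF-8.
    if "\0" in name or len(name.encode("utf-8")) > 4096:
        return False
    return namespace in REGISTERED or namespace.startswith("x-")
```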
968 Save in respect of the `base:` namespace described below, this specification
969 requires no specific semantics of metadata contexts, except that all the
970 information they provide MUST be representable within the flags field as
971 defined for `NBD_REPLY_TYPE_BLOCK_STATUS`. Likewise, save in respect of
972 the `base:` namespace, the syntax of query strings is not specified by this
973 document, other than the recommendation that the empty leaf-name makes
974 sense as a wildcard for a client query during `NBD_OPT_LIST_META_CONTEXT`,
975 but SHOULD NOT select any contexts during `NBD_OPT_SET_META_CONTEXT`.
Server implementations SHOULD ensure that the syntax of the query strings
they support and the semantics of the resulting metadata contexts are
documented similarly to this document.
981 ### The `base:` metadata namespace
983 This standard defines exactly one metadata context; it is called
984 `base:allocation`, and it provides information on the basic allocation
985 status of extents (that is, whether they are allocated at all in a
986 sparse file context).
The query string within the `base:` metadata context can take one of
the following forms:
991 * `base:` - the server MUST ignore this form during
992 `NBD_OPT_SET_META_CONTEXT`, and MUST support this as a wildcard
993 during `NBD_OPT_LIST_META_CONTEXT`, in which case the server's reply
994 will contain a response for each supported metadata context within
995 the `base:` namespace (currently just `base:allocation`, although a
996 future revision of the standard might return multiple contexts); or
997 * `base:[leaf-name]` to select `[leaf-name]` as a context leaf-name
998 that might exist within the `base` namespace. If a `[leaf-name]`
999 requested by the client is not recognized, the server MUST ignore it
1000 rather than report an error.
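A server's handling of the two query forms might look like the following sketch. It assumes the query has already been identified as belonging to the `base:` namespace; `resolve_base_query` and `BASE_CONTEXTS` are hypothetical names.

```python
# Hypothetical constant: contexts this server supports within base:.
BASE_CONTEXTS = ["base:allocation"]


def resolve_base_query(query: str, listing: bool) -> list:
    """Map one base: query to contexts.

    listing=True for NBD_OPT_LIST_META_CONTEXT,
    listing=False for NBD_OPT_SET_META_CONTEXT.
    """
    if query == "base:":
        # Wildcard when listing; ignored when selecting.
        return list(BASE_CONTEXTS) if listing else []
    # base:[leaf-name]: unknown leaf-names are ignored, not errors.
    return [query] if query in BASE_CONTEXTS else []
```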
1002 #### `base:allocation` metadata context
1004 The `base:allocation` metadata context is the basic "allocated at all"
1005 metadata context. If an extent is marked with `NBD_STATE_HOLE` at that
1006 context, this means that the given extent is not allocated in the
1007 backend storage, and that writing to the extent MAY result in the
1008 `NBD_ENOSPC` error. This supports sparse file semantics on the server
1009 side. If a server supports the `base:allocation` metadata context,
1010 then writing to an extent which has `NBD_STATE_HOLE` clear MUST NOT
fail with `NBD_ENOSPC` unless for reasons specified in the definition of
another context.
1014 It defines the following flags for the flags field:
1016 - `NBD_STATE_HOLE` (bit 0): if set, the block represents a hole (and
1017 future writes to that area may cause fragmentation or encounter an
1018 `NBD_ENOSPC` error); if clear, the block is allocated or the server
1019 could not otherwise determine its status. Note that the use of
1020 `NBD_CMD_TRIM` is related to this status, but that the server MAY
1021 report a hole even where `NBD_CMD_TRIM` has not been requested, and
1022 also that a server MAY report that the block is allocated even where
1023 `NBD_CMD_TRIM` has been requested.
1024 - `NBD_STATE_ZERO` (bit 1): if set, the block contents read as all
1025 zeroes; if clear, the block contents are not known. Note that the
1026 use of `NBD_CMD_WRITE_ZEROES` is related to this status, but that
1027 the server MAY report zeroes even where `NBD_CMD_WRITE_ZEROES` has
1028 not been requested, and also that a server MAY report unknown
1029 content even where `NBD_CMD_WRITE_ZEROES` has been requested.
1031 It is not an error for a server to report that a region of the export
1032 has both `NBD_STATE_HOLE` set and `NBD_STATE_ZERO` clear. The contents
1033 of such an area are undefined, and a client reading such an area
1034 should make no assumption as to its contents or stability.
1036 For the `base:allocation` context, the remainder of the flags field is
reserved. Servers SHOULD set it to all-zero; clients MUST ignore
unknown flags.
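The two defined bits combine into four cases, one of which (`NBD_STATE_HOLE` set with `NBD_STATE_ZERO` clear) leaves the contents undefined. A hypothetical client-side helper that masks off the reserved bits, as clients are required to ignore them:

```python
NBD_STATE_HOLE = 1 << 0
NBD_STATE_ZERO = 1 << 1


def describe_allocation(flags: int) -> str:
    """Hypothetical helper: summarise one base:allocation extent."""
    # Drop reserved bits before interpreting the field.
    flags &= NBD_STATE_HOLE | NBD_STATE_ZERO
    hole = bool(flags & NBD_STATE_HOLE)
    zero = bool(flags & NBD_STATE_ZERO)
    if hole and zero:
        return "unallocated, reads as zeroes"
    if hole:
        return "unallocated, contents undefined"
    if zero:
        return "allocated (or unknown), reads as zeroes"
    return "allocated (or unknown), contents unknown"
```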
1042 This section describes the value and meaning of constants (other than
1043 magic numbers) in the protocol.
When flags fields are specified, they are numbered in network byte
order.
1052 ##### Handshake flags
1054 This field of 16 bits is sent by the server after the `INIT_PASSWD` and
1055 the first magic number.
1057 - bit 0, `NBD_FLAG_FIXED_NEWSTYLE`; MUST be set by servers that
1058 support the fixed newstyle protocol
1059 - bit 1, `NBD_FLAG_NO_ZEROES`; if set, and if the client replies with
1060 `NBD_FLAG_C_NO_ZEROES` in the client flags field, the server MUST NOT
1061 send the 124 bytes of zero when the client ends negotiation with
1062 `NBD_OPT_EXPORT_NAME`.
1064 The server MUST NOT set any other flags, and SHOULD NOT change behaviour
1065 unless the client responds with a corresponding flag. The server MUST
1066 NOT set any of these flags during oldstyle negotiation.
1068 It is unlikely that additional capability flags will be defined in the
1069 NBD protocol since this phase is susceptible to MitM downgrade attacks
when using TLS. Rather, additional features are best negotiated using
protocol options.

##### Client flags

1075 This field of 32 bits is sent after initial connection and after
1076 receiving the handshake flags from the server.
1078 - bit 0, `NBD_FLAG_C_FIXED_NEWSTYLE`; SHOULD be set by clients that
1079 support the fixed newstyle protocol. Servers MAY choose to honour
1080 fixed newstyle from clients that didn't set this bit, but relying on
1081 this isn't recommended.
1082 - bit 1, `NBD_FLAG_C_NO_ZEROES`; MUST NOT be set if the server did not
1083 set `NBD_FLAG_NO_ZEROES`. If set, the server MUST NOT send the 124
1084 bytes of zeroes when the client ends negotiation with
1085 `NBD_OPT_EXPORT_NAME`.
1087 Clients MUST NOT set any other flags; the server MUST drop the TCP
1088 connection if the client sets an unknown flag, or a flag that does
1089 not match something advertised by the server.
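The two flag fields mirror each other bit for bit, which makes both sides of the exchange easy to sketch. These helpers are hypothetical; treating an unadvertised `NBD_FLAG_C_FIXED_NEWSTYLE` as a mismatch is one reading of the rule above rather than something the text spells out separately.

```python
# Handshake flags (server -> client, 16 bits)
NBD_FLAG_FIXED_NEWSTYLE = 1 << 0
NBD_FLAG_NO_ZEROES = 1 << 1
# Client flags (client -> server, 32 bits)
NBD_FLAG_C_FIXED_NEWSTYLE = 1 << 0
NBD_FLAG_C_NO_ZEROES = 1 << 1


def client_flags_for(handshake_flags: int) -> int:
    """Hypothetical client side: request only features the server advertised."""
    flags = 0
    if handshake_flags & NBD_FLAG_FIXED_NEWSTYLE:
        flags |= NBD_FLAG_C_FIXED_NEWSTYLE
    if handshake_flags & NBD_FLAG_NO_ZEROES:
        flags |= NBD_FLAG_C_NO_ZEROES
    return flags


def server_accepts(client_flags: int, handshake_flags: int) -> bool:
    """Hypothetical server side: False means the TCP connection is dropped."""
    known = NBD_FLAG_C_FIXED_NEWSTYLE | NBD_FLAG_C_NO_ZEROES
    # Each client bit sits at the same position as its handshake counterpart,
    # so any bit outside (known & advertised) is unknown or unadvertised.
    return (client_flags & ~(known & handshake_flags)) == 0


def sends_zero_padding(client_flags: int) -> bool:
    """Whether 124 zero bytes follow the reply to NBD_OPT_EXPORT_NAME."""
    return not (client_flags & NBD_FLAG_C_NO_ZEROES)
```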
1091 ##### Transmission flags
1093 This field of 16 bits is sent by the server after option haggling, or
1094 immediately after the handshake flags field in oldstyle negotiation.
1096 Many of these flags allow the server to expose to the client which
1097 features it understands (in which case they are documented below
1098 as "`NBD_FLAG_XXX` exposes feature `YYY`"). In each case, the server
1099 MAY set the flag for features it supports. The server MUST NOT set the
1100 flag for features it does not support. The client MUST NOT use a feature
1101 documented as 'exposed' by a flag unless that flag was set.
1103 The field has the following format:
1105 - bit 0, `NBD_FLAG_HAS_FLAGS`: MUST always be 1.
1106 - bit 1, `NBD_FLAG_READ_ONLY`: The server MAY set this flag to indicate
1107 to the client that the export is read-only (exports might be read-only
1108 in a manner undetectable to the server, for instance because of
1109 permissions). If this flag is set, the server MUST error subsequent
1110 write operations to the export.
1111 - bit 2, `NBD_FLAG_SEND_FLUSH`: exposes support for `NBD_CMD_FLUSH`.
1112 - bit 3, `NBD_FLAG_SEND_FUA`: exposes support for `NBD_CMD_FLAG_FUA`.
1113 - bit 4, `NBD_FLAG_ROTATIONAL`: the server MAY set this flag to 1 to
1114 inform the client that the export has the characteristics of a rotational
1115 medium, and the client MAY schedule I/O accesses in a manner corresponding
1116 to the setting of this flag.
1117 - bit 5, `NBD_FLAG_SEND_TRIM`: exposes support for `NBD_CMD_TRIM`.
1118 - bit 6, `NBD_FLAG_SEND_WRITE_ZEROES`: exposes support for
1119 `NBD_CMD_WRITE_ZEROES` and `NBD_CMD_FLAG_NO_HOLE`.
1120 - bit 7, `NBD_FLAG_SEND_DF`: do not fragment a structured reply. The
1121 server MUST set this transmission flag to 1 if the
1122 `NBD_CMD_READ` request supports the `NBD_CMD_FLAG_DF` flag, and
1123 MUST leave this flag clear if structured replies have not been
1124 negotiated. Clients MUST NOT set the `NBD_CMD_FLAG_DF` request
1125 flag unless this transmission flag is set.
1126 - bit 8, `NBD_FLAG_CAN_MULTI_CONN`: Indicates that the server operates
1127 entirely without cache, or that the cache it uses is shared among all
1128 connections to the given device. In particular, if this flag is
1129 present, then the effects of `NBD_CMD_FLUSH` and `NBD_CMD_FLAG_FUA`
1130 MUST be visible across all connections when the server sends its reply
1131 to that command to the client. In the absence of this flag, clients
SHOULD NOT multiplex their commands over more than one connection to
the export.
1134 - bit 9, `NBD_FLAG_SEND_RESIZE`: defined by the experimental `RESIZE`
1135 [extension](https://github.com/NetworkBlockDevice/nbd/blob/extension-resize/doc/proto.md).
1136 - bit 10, `NBD_FLAG_SEND_CACHE`: documents that the server understands
1137 `NBD_CMD_CACHE`; however, note that server implementations exist
1138 which support the command without advertising this bit, and
1139 conversely that this bit does not guarantee that the command will
1140 succeed or have an impact.
1141 - bit 11, `NBD_FLAG_SEND_FAST_ZERO`: allow clients to detect whether
1142 `NBD_CMD_WRITE_ZEROES` is faster than a corresponding write. The
1143 server MUST set this transmission flag to 1 if the
1144 `NBD_CMD_WRITE_ZEROES` request supports the `NBD_CMD_FLAG_FAST_ZERO`
1145 flag, and MUST set this transmission flag to 0 if
`NBD_FLAG_SEND_WRITE_ZEROES` is not set. Servers MAY set this
transmission flag even if they will always use `NBD_ENOTSUP` failures for
requests with `NBD_CMD_FLAG_FAST_ZERO` set (such as if the server
cannot quickly determine whether a particular write zeroes request
will be faster than a regular write). Clients MUST NOT set the
`NBD_CMD_FLAG_FAST_ZERO` request flag unless this transmission flag
is set.
- bit 12, `NBD_FLAG_BLOCK_STATUS_PAYLOAD`: defined by the experimental
[extension](https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md).
1157 Clients SHOULD ignore unknown flags.
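A client might decode the field as sketched below. The bit-to-name table is copied from the list above; the helper itself is hypothetical. It insists on `NBD_FLAG_HAS_FLAGS` and skips unknown bits, as recommended.

```python
TRANSMISSION_FLAGS = {
    0: "NBD_FLAG_HAS_FLAGS",
    1: "NBD_FLAG_READ_ONLY",
    2: "NBD_FLAG_SEND_FLUSH",
    3: "NBD_FLAG_SEND_FUA",
    4: "NBD_FLAG_ROTATIONAL",
    5: "NBD_FLAG_SEND_TRIM",
    6: "NBD_FLAG_SEND_WRITE_ZEROES",
    7: "NBD_FLAG_SEND_DF",
    8: "NBD_FLAG_CAN_MULTI_CONN",
    9: "NBD_FLAG_SEND_RESIZE",
    10: "NBD_FLAG_SEND_CACHE",
    11: "NBD_FLAG_SEND_FAST_ZERO",
    12: "NBD_FLAG_BLOCK_STATUS_PAYLOAD",
}


def decode_transmission_flags(flags: int) -> list:
    """Hypothetical helper: names of the known flags set in the 16-bit field."""
    if not (flags & 1):
        raise ValueError("NBD_FLAG_HAS_FLAGS must always be set")
    # Bits with no entry in the table are skipped, as clients SHOULD ignore them.
    return [name for bit, name in TRANSMISSION_FLAGS.items()
            if (flags >> bit) & 1]
```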
1161 These values are used in the "option" field during the option haggling
1162 of the newstyle negotiation.
1164 * `NBD_OPT_EXPORT_NAME` (1)
1166 Choose the export which the client would like to use, end option
1167 haggling, and proceed to the transmission phase.
1169 Data: String, name of the export, as free-form text.
1170 The length of the name is determined from the option header. If the
1171 chosen export does not exist or requirements for the chosen export
1172 are not met (e.g., the client did not initiate TLS for an export
where the server requires it), the server MUST terminate the
session.
A special, "empty" name (i.e., the length field is zero and no name
is specified) is reserved for a "default" export, to be used in cases
1178 where explicitly specifying an export name makes no sense.
1180 This is the only valid option in nonfixed newstyle negotiation. A
server which wishes to use any other option MUST support fixed
newstyle.
1184 A major problem of this option is that it does not support the
1185 return of error messages to the client in case of problems. To
1186 remedy this, `NBD_OPT_GO` has been introduced (see below).
1187 A client thus SHOULD use `NBD_OPT_GO` in preference to
1188 `NBD_OPT_EXPORT_NAME` but SHOULD fall back to `NBD_OPT_EXPORT_NAME`
1189 if `NBD_OPT_GO` is not supported (not falling back will prevent
1190 it from connecting to old servers).
1192 * `NBD_OPT_ABORT` (2)
1194 The client desires to abort the negotiation and terminate the
1195 session. The server MUST reply with `NBD_REP_ACK`.
1197 The client SHOULD NOT send any additional data with the option;
1198 however, a server SHOULD ignore any data sent by the client rather
1199 than rejecting the request as invalid.
1201 Previous versions of this document were unclear on whether
1202 the server should send a reply to `NBD_OPT_ABORT`. Therefore
1203 the client SHOULD gracefully handle the server closing the
1204 connection after receiving an `NBD_OPT_ABORT` without it
1205 sending a reply. Similarly the server SHOULD gracefully handle
1206 the client sending an `NBD_OPT_ABORT` and closing the connection
1207 without waiting for a reply.
1209 * `NBD_OPT_LIST` (3)
1211 Return zero or more `NBD_REP_SERVER` replies, one for each export,
1212 followed by `NBD_REP_ACK` or an error (such as
1213 `NBD_REP_ERR_SHUTDOWN`). The server MAY omit entries from this
1214 list if TLS has not been negotiated, the server is operating in
1215 SELECTIVETLS mode, and the entry concerned is a TLS-only export.
1217 The client MUST NOT send any additional data with the option, and
1218 the server SHOULD reject a request that includes data with
1219 `NBD_REP_ERR_INVALID`.
1221 * `NBD_OPT_PEEK_EXPORT` (4)
Was defined by the (withdrawn) experimental `PEEK_EXPORT` extension;
not in use.
1226 * `NBD_OPT_STARTTLS` (5)
1228 The client wishes to initiate TLS.
1230 The client MUST NOT send any additional data with the option. The
server MUST either reply with `NBD_REP_ACK`, after which point the
connection is upgraded to TLS, or with an error reply explicitly
1233 permitted by this document (for example, `NBD_REP_ERR_INVALID` if
1234 the client included data).
1236 When this command succeeds, the server MUST NOT preserve any
1237 negotiation state (such as a request for
1238 `NBD_OPT_STRUCTURED_REPLY`, or metadata contexts from
1239 `NBD_OPT_SET_META_CONTEXT`) issued before this command. A client
1240 SHOULD defer all stateful option requests until after it
1241 determines whether encryption is available.
1243 See the section on TLS above for further details.
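As a sketch of the exchange, assuming the option request and reply header layouts defined in the option haggling section of this document (a request starts with the magic `0x49484156454F5054`, ASCII `IHAVEOPT`, and a reply with `0x3e889045565a9`), a client could frame `NBD_OPT_STARTTLS` and check for `NBD_REP_ACK` as follows; the helper names are hypothetical.

```python
import struct

IHAVEOPT = 0x49484156454F5054      # option request magic, b"IHAVEOPT"
REPLY_MAGIC = 0x3E889045565A9      # option reply magic
NBD_OPT_STARTTLS = 5
NBD_REP_ACK = 1


def encode_option(option: int, data: bytes = b"") -> bytes:
    """Hypothetical helper: magic, 32-bit option, 32-bit length, then data."""
    return struct.pack(">QII", IHAVEOPT, option, len(data)) + data


def parse_reply_header(header: bytes):
    """Hypothetical helper: (option, reply_type, length) from 20 bytes."""
    magic, option, reply_type, length = struct.unpack(">QIII", header)
    if magic != REPLY_MAGIC:
        raise ValueError("bad option reply magic")
    return option, reply_type, length


def starttls_accepted(header: bytes) -> bool:
    """True if the server acknowledged NBD_OPT_STARTTLS (TLS starts now)."""
    option, reply_type, _length = parse_reply_header(header)
    return option == NBD_OPT_STARTTLS and reply_type == NBD_REP_ACK
```

`NBD_OPT_STARTTLS` carries no data, so the request is just `encode_option(NBD_OPT_STARTTLS)`. After an acknowledgement the client performs the TLS handshake on the same socket and then repeats any stateful option requests, since earlier negotiation state is discarded.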
1245 * `NBD_OPT_INFO` (6) and `NBD_OPT_GO` (7)
1247 Both options have identical formats for requests and replies. The only
1248 difference is that after a successful reply to `NBD_OPT_GO` (i.e. one
1249 or more `NBD_REP_INFO` then an `NBD_REP_ACK`), transmission mode is
entered immediately. Therefore these commands share common
documentation.
1253 `NBD_OPT_INFO`: The client wishes to get details about an export
1254 with the given name for use in the transmission phase, but does
1255 not yet want to move to the transmission phase. When successful,
1256 this option provides more details than `NBD_OPT_LIST`, but only
1257 for a single export name.
1259 `NBD_OPT_GO`: The client wishes to terminate the handshake phase
and progress to the transmission phase. The client MAY issue this
command after an `NBD_OPT_INFO`, or MAY issue it without a
previous `NBD_OPT_INFO`. `NBD_OPT_GO` can thus be used as an
improved version of `NBD_OPT_EXPORT_NAME` that is capable of
returning errors.
1266 Data (both commands):
1268 - 32 bits, length of name (unsigned); MUST be no larger than the
1269 option data length - 6
1270 - String: name of the export
1271 - 16 bits, number of information requests
1272 - 16 bits x n - list of `NBD_INFO` information requests
1274 The client MAY list one or more items of specific information it
1275 is seeking in the list of information requests, or it MAY specify
1276 an empty list. The client MUST NOT include any information request
1277 in the list more than once. The server MUST ignore any information
1278 requests it does not understand. The server MAY reply to the
1279 information requests in any order. The server MAY ignore information
1280 requests that it does not wish to supply for policy reasons (other
1281 than `NBD_INFO_EXPORT`). Equally the client MAY refuse to negotiate
1282 if not supplied information it has requested. The server MAY send
1283 information requests back which are not explicitly requested, but
1284 the server MUST NOT assume that such information requests are
1285 understood and respected by the client unless the client explicitly
1286 asked for them. The client MUST ignore information replies it
1287 does not understand.
1289 If no name is specified (i.e. a zero length string is provided),
1290 this specifies the default export (if any), as with
1291 `NBD_OPT_EXPORT_NAME`.
1293 The server replies with a number of `NBD_REP_INFO` replies (as few
1294 as zero if an error is reported, at least one on success), then
1295 concludes the list of information with a final error reply or with
1296 a declaration of success, as follows:
1298 - `NBD_REP_ACK`: The server accepts the chosen export, and has
1299 completed providing information. In this case, the server MUST
send at least one `NBD_REP_INFO`, with an `NBD_INFO_EXPORT`
information type.
1302 - `NBD_REP_ERR_UNKNOWN`: The chosen export does not exist on this
server. In this case, the server SHOULD NOT send `NBD_REP_INFO`
replies.
1305 - `NBD_REP_ERR_TLS_REQD`: The server requires the client to
initiate TLS before revealing any further details about this
1307 export. In this case, a FORCEDTLS server MUST NOT send
1308 `NBD_REP_INFO` replies, but a SELECTIVETLS server MAY do so if
1309 this is a TLS-only export.
1310 - `NBD_REP_ERR_BLOCK_SIZE_REQD`: The server requires the client to
1311 request size constraints using `NBD_INFO_BLOCK_SIZE` prior to
1312 entering transmission phase, because the server will be using
1313 non-default size constraints. The server MUST NOT send this
1314 error if size constraints were requested with
1315 `NBD_INFO_BLOCK_SIZE` with the `NBD_OPT_INFO` or `NBD_OPT_GO`
1316 request. The server SHOULD NOT send this error if it is using
1317 default size constraints or size constraints negotiated out of
1318 band. A server sending an `NBD_REP_ERR_BLOCK_SIZE_REQD` error
1319 SHOULD ensure it first sends an `NBD_INFO_BLOCK_SIZE`
1320 information reply in order to help avoid a potentially
1321 unnecessary round trip.
1323 Additionally, if TLS has not been initiated, the server MAY reply
1324 with `NBD_REP_ERR_TLS_REQD` (instead of `NBD_REP_ERR_UNKNOWN`) to
1325 requests for exports that are unknown. This is so that clients
1326 that have not initiated TLS cannot enumerate exports. A
1327 SELECTIVETLS server that chooses to hide unknown exports in this
manner SHOULD NOT send `NBD_REP_INFO` replies for a TLS-only
export.
1331 For backwards compatibility, clients SHOULD be prepared to also
1332 handle `NBD_REP_ERR_UNSUP` by falling back to using `NBD_OPT_EXPORT_NAME`.
1334 Other errors (such as `NBD_REP_ERR_SHUTDOWN`) are also possible,
1335 as permitted elsewhere in this document, with no constraints on
1336 the number of preceding `NBD_REP_INFO`.
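A client typically reads the reply sequence for `NBD_OPT_INFO` or `NBD_OPT_GO` until it sees `NBD_REP_ACK` or an error reply. The Python sketch below classifies an already-received sequence under the rules above; socket I/O and the option reply header are left out, and the function name, the constant `NBD_REP_ERR_BIT`, and the return convention are hypothetical.

```python
import struct

# Values from the option reply type and NBD_INFO_* lists in this document.
NBD_REP_ACK = 1
NBD_REP_INFO = 3
NBD_REP_ERR_BIT = 1 << 31          # every error reply type has bit 31 set
NBD_REP_ERR_UNSUP = NBD_REP_ERR_BIT + 1
NBD_INFO_EXPORT = 0
KNOWN_INFO_TYPES = {0, 1, 2, 3}    # EXPORT, NAME, DESCRIPTION, BLOCK_SIZE


def classify_info_replies(replies):
    """Classify a complete NBD_OPT_INFO / NBD_OPT_GO reply sequence.

    `replies` is a list of (reply_type, data) pairs in arrival order.
    Returns ("ok", infos), ("fallback", None) or ("error", reply_type).
    """
    infos = {}
    for reply_type, data in replies:
        if reply_type == NBD_REP_INFO:
            if len(data) < 2:
                raise ValueError("NBD_REP_INFO too short for its type field")
            (info_type,) = struct.unpack(">H", data[:2])
            if info_type in KNOWN_INFO_TYPES:  # unknown types are ignored
                infos[info_type] = data
        elif reply_type == NBD_REP_ACK:
            if NBD_INFO_EXPORT not in infos:
                raise ValueError("success reported without NBD_INFO_EXPORT")
            return "ok", infos
        elif reply_type & NBD_REP_ERR_BIT:
            if reply_type == NBD_REP_ERR_UNSUP:
                return "fallback", None  # old server: use NBD_OPT_EXPORT_NAME
            return "error", reply_type
        else:
            raise ValueError(f"unexpected reply type {reply_type}")
    raise ValueError("sequence ended without NBD_REP_ACK or an error")
```

Raising on a success without `NBD_INFO_EXPORT` reflects the server's obligation to send that item before `NBD_REP_ACK`; a real client would treat it as a protocol violation.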
1338 If there are no intervening option requests between a successful
1339 `NBD_OPT_INFO` (that is, one where the reply ended with a final
1340 `NBD_REP_ACK`) and an `NBD_OPT_GO` with the same parameters
1341 (including the list of information items requested), then
1342 the server MUST reply with the same set of information, such as
1343 transmission flags in the `NBD_INFO_EXPORT` reply, although the
1344 ordering of the intermediate `NBD_REP_INFO` messages MAY differ.
1345 Otherwise, due to the intervening option requests or the use of
1346 different parameters, the server MAY send different data in the
1347 successful response, and/or MAY fail the second request.
1349 The reply to an `NBD_OPT_GO` is identical to the reply to
1350 `NBD_OPT_INFO` save that if the reply indicates success (i.e. ends
1351 with `NBD_REP_ACK`), the client and the server both immediately
1352 enter the transmission phase. The server MUST NOT send any zero
1353 padding bytes after the `NBD_REP_ACK` data, whether or not the
1354 client negotiated the `NBD_FLAG_C_NO_ZEROES` flag. The client MUST
1355 NOT send further option requests unless the final reply from the
1356 server indicates an error.
* `NBD_OPT_GO` (7)
See above under `NBD_OPT_INFO`.
1362 * `NBD_OPT_STRUCTURED_REPLY` (8)
1364 The client wishes to use structured replies during the
1365 transmission phase. The client MUST NOT send any additional data
1366 with the option, and the server SHOULD reject a request that
1367 includes data with `NBD_REP_ERR_INVALID`.
1369 The server replies with the following, or with an error permitted
1370 elsewhere in this document:
1372 - `NBD_REP_ACK`: Structured replies have been negotiated; the
1373 server MUST use structured replies to the `NBD_CMD_READ`
1374 transmission request. Other extensions that require structured
1375 replies may now be negotiated.
1376 - For backwards compatibility, clients SHOULD be prepared to also
handle `NBD_REP_ERR_UNSUP`; in this case, no structured replies
will be sent.
1380 It is envisioned that future extensions will add other new
1381 requests that may require a data payload in the reply. A server
1382 that supports such extensions SHOULD NOT advertise those
1383 extensions until the client negotiates structured replies; and a
1384 client MUST NOT make use of those extensions without first
1385 enabling the `NBD_OPT_STRUCTURED_REPLY` extension.
1387 If the client requests `NBD_OPT_STARTTLS` after this option, it
1388 MUST renegotiate structured replies and any other dependent
1389 extensions that it desires to use.
1391 * `NBD_OPT_LIST_META_CONTEXT` (9)
1393 Return a list of `NBD_REP_META_CONTEXT` replies, one per context,
1394 followed by an `NBD_REP_ACK` or an error.
1396 This option SHOULD NOT be requested unless structured replies have
1397 been negotiated first. If a client attempts to do so, a server
1398 MAY send `NBD_REP_ERR_INVALID`.
Data:
- 32 bits, length of export name.
- String, name of export for which we wish to list metadata
contexts.
1404 - 32 bits, number of queries
1405 - Zero or more queries, each being:
1406 - 32 bits, length of query.
1407 - String, query to list a subset of the available metadata
1408 contexts. The syntax of this query is
1409 implementation-defined, except that it MUST start with a
1410 namespace and a colon.
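The option data above is a sequence of big-endian length-prefixed strings. A minimal Python encoder might look like the following; the function name is hypothetical, and `base:allocation` in the usage check is only sample input. The same layout is used by `NBD_OPT_SET_META_CONTEXT`.

```python
import struct


def encode_meta_context_option(export_name: str, queries: list[str]) -> bytes:
    """Build the data for NBD_OPT_LIST_META_CONTEXT (same layout as
    NBD_OPT_SET_META_CONTEXT). All integers are big-endian."""
    name = export_name.encode("utf-8")
    out = struct.pack(">I", len(name)) + name      # name length, name
    out += struct.pack(">I", len(queries))         # number of queries
    for query in queries:
        namespace, colon, _ = query.partition(":")
        if not namespace or not colon:
            raise ValueError(f"query {query!r} must start with 'namespace:'")
        raw = query.encode("utf-8")
        out += struct.pack(">I", len(raw)) + raw   # query length, query
    return out
```

For example, `encode_meta_context_option("", ["base:allocation"])` asks about the default export and yields 4 zero bytes, a query count of 1, a query length of 15, and then the query text.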
1412 For details on the query string, see the "Metadata querying"
1413 section; note that a namespace may document that a different set
1414 of queries are valid for `NBD_OPT_LIST_META_CONTEXT` than for
`NBD_OPT_SET_META_CONTEXT`, such as when using an empty leaf-name
for wildcarding.
1418 If the option request is syntactically invalid (such as a query
1419 length that would require reading beyond the original length given
1420 in the option header), the server MUST fail the request with
1421 `NBD_REP_ERR_INVALID`. For requests that are semantically invalid
1422 (such as lacking the required colon that delimits the namespace,
1423 or using a leaf name that is invalid for a known namespace), the
1424 server MAY fail the request with `NBD_REP_ERR_INVALID`. However,
1425 the server MUST ignore query strings belonging to an unknown
1426 namespace. If none of the query strings find any metadata
contexts, the server MUST send a single reply of type
`NBD_REP_ACK`.
1430 The server MUST reply with a list of zero or more
1431 `NBD_REP_META_CONTEXT` replies, followed by either a final
1432 `NBD_REP_ACK` on success or by an error (for instance
1433 `NBD_REP_ERR_UNSUP` if the option is not supported). If an error
is returned, the client MUST disregard any context replies that
may have been sent.
1437 If zero queries are sent, then the server MUST return all the
1438 metadata contexts that are available to the client to select on
1439 the given export. However, this list may include wildcards that
1440 require a further `NBD_OPT_LIST_META_CONTEXT` with the wildcard as
1441 a query, rather than an actual context that is appropriate as a
1442 query to `NBD_OPT_SET_META_CONTEXT`, as set out below. In this
1443 case, the server SHOULD NOT fail with `NBD_REP_ERR_TOO_BIG`.
1445 If one or more queries are sent, then the server MUST return those
1446 metadata contexts that are available to the client to select on
1447 the given export with `NBD_OPT_SET_META_CONTEXT`, and which match
1448 one or more of the queries given. The support of wildcarding
1449 within the leaf-name portion of the query string is dependent upon
1450 the namespace. The server MAY send contexts in a different order
1451 than in the client's query. In this case, the server MAY fail
1452 with `NBD_REP_ERR_TOO_BIG` if too many queries are requested.
1454 In either case, however, for any given namespace the server MAY,
1455 instead of exhaustively listing every matching context available
1456 to select (or every context available to select where no query is
1457 given), send sufficient context records back to allow a client
1458 with knowledge of the namespace to select any context. This may
1459 be helpful where a client can construct algorithmic queries. For
instance, a server might reply simply with the namespace with no
leaf-name (e.g. 'x-FooBar:') or with a range of values (e.g.
'x-ModifiedDate:20160310-20161214'). The semantics of such a reply
are a matter for the definition of the namespace. However, each
context name returned MUST begin with the relevant namespace,
1465 followed by a colon, and then other UTF-8 characters, with the
1466 entire string following the restrictions for strings set out
1467 earlier in this document.
1469 The metadata context ID in these replies is reserved and SHOULD be
1470 set to zero; clients MUST disregard it.
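On the server side, the distinction between a syntactically invalid request (which MUST fail with `NBD_REP_ERR_INVALID`) and a query for an unknown namespace (which MUST be ignored) can be sketched in Python as follows. The names `InvalidOption` and `parse_meta_context_option` are hypothetical; rejecting trailing bytes and silently dropping queries without a colon are choices of this sketch, not requirements stated here.

```python
import struct


class InvalidOption(Exception):
    """Raised where the server would reply NBD_REP_ERR_INVALID."""


def _take(data: bytes, pos: int, n: int) -> tuple[bytes, int]:
    # Refuse to read past the length given in the option header.
    if pos + n > len(data):
        raise InvalidOption("field runs past the end of the option data")
    return data[pos:pos + n], pos + n


def _u32(data: bytes, pos: int) -> tuple[int, int]:
    raw, pos = _take(data, pos, 4)
    return struct.unpack(">I", raw)[0], pos


def parse_meta_context_option(data: bytes, known_namespaces: set[str]):
    """Return (export name, queries in namespaces this server knows)."""
    name_len, pos = _u32(data, 0)
    name, pos = _take(data, pos, name_len)
    count, pos = _u32(data, pos)
    queries = []
    for _ in range(count):
        qlen, pos = _u32(data, pos)
        query, pos = _take(data, pos, qlen)
        queries.append(query.decode("utf-8", errors="replace"))
    if pos != len(data):
        # Assumption: this sketch also treats trailing bytes as invalid.
        raise InvalidOption("trailing bytes after the last query")
    kept = []
    for query in queries:
        namespace, colon, _ = query.partition(":")
        if colon and namespace in known_namespaces:
            kept.append(query)  # unknown namespaces are ignored, not rejected
    return name.decode("utf-8", errors="replace"), kept
```

Because every loop iteration consumes at least four bytes, a bogus query count cannot make the parser spin; it runs out of data and raises instead.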
1472 * `NBD_OPT_SET_META_CONTEXT` (10)
1474 Change the set of active metadata contexts. Issuing this command
1475 replaces all previously-set metadata contexts (including when this
1476 command fails); clients must ensure that all metadata contexts
they are interested in are selected with the final query that they
send.
1480 This option MUST NOT be requested unless structured replies have
1481 been negotiated first. If a client attempts to do so, a server
1482 SHOULD send `NBD_REP_ERR_INVALID`.
1484 A client MUST NOT send `NBD_CMD_BLOCK_STATUS` unless within the
1485 negotiation phase it sent `NBD_OPT_SET_META_CONTEXT` at least
1486 once, and where the final time it was sent, it referred to the
1487 same export name that was ultimately selected for transmission
1488 phase with no intervening `NBD_OPT_STARTTLS`, and where the server
responded by returning at least one metadata context without error.
Data:
- 32 bits, length of export name.
- String, name of export for which we wish to select metadata
contexts.
1495 - 32 bits, number of queries
1496 - Zero or more queries, each being:
1497 - 32 bits, length of query
1498 - String, query to select metadata contexts. The syntax of this
1499 query is implementation-defined, except that it MUST start with a
1500 namespace and a colon.
If zero queries are sent, the server MUST select no metadata
contexts.
1505 The server MAY return `NBD_REP_ERR_TOO_BIG` if a request seeks to
1506 select too many contexts. Otherwise the server MUST reply with a
1507 number of `NBD_REP_META_CONTEXT` replies, one for each selected
1508 metadata context, each with a unique metadata context ID, followed
1509 by `NBD_REP_ACK`. The server MAY ignore queries that do not select
1510 a single metadata context, and MAY return selected contexts in a
1511 different order than in the client's request. The metadata
1512 context ID is transient and may vary across calls to
1513 `NBD_OPT_SET_META_CONTEXT`; clients MUST therefore treat the ID as
1514 an opaque value and not (for instance) cache it between
1515 connections. It is not an error if a `NBD_OPT_SET_META_CONTEXT`
1516 option does not select any metadata context, provided the client
1517 then does not attempt to issue `NBD_CMD_BLOCK_STATUS` commands.
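The precondition for `NBD_CMD_BLOCK_STATUS` depends only on the last `NBD_OPT_SET_META_CONTEXT` exchange and on `NBD_OPT_STARTTLS`. A minimal client-side tracker in Python might look like this; the class and method names are hypothetical, and the ID-to-name map is kept per connection because context IDs are transient.

```python
class MetaContextState:
    """Tracks whether NBD_CMD_BLOCK_STATUS may be sent (client side)."""

    def __init__(self) -> None:
        self.export = None    # export named by the last NBD_OPT_SET_META_CONTEXT
        self.contexts = {}    # context ID -> context name from that reply

    def on_set_meta_context(self, export: str, replies, failed: bool) -> None:
        # Each SET replaces the earlier selection, even when it fails.
        self.export = export
        self.contexts = {} if failed else {cid: name for cid, name in replies}

    def on_starttls(self) -> None:
        # Selections made before NBD_OPT_STARTTLS no longer count.
        self.export = None
        self.contexts = {}

    def may_send_block_status(self, selected_export: str) -> bool:
        # Same export as the final SET, and at least one context selected.
        return self.export == selected_export and bool(self.contexts)
```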
1519 * `NBD_OPT_EXTENDED_HEADERS` (11)
1521 Defined by the experimental `EXTENDED_HEADERS`
1522 [extension](https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md).
1524 #### Option reply types
1526 These values are used in the "reply type" field, sent by the server
1527 during option haggling in the fixed newstyle negotiation.
* `NBD_REP_ACK` (1)
Will be sent by the server when it accepts the option and no further
1532 information is available, or when sending data related to the option
1533 (in the case of `NBD_OPT_LIST`) has finished. No data.
1535 * `NBD_REP_SERVER` (2)
1537 A description of an export. Data:
1539 - 32 bits, length of name (unsigned); MUST be no larger than the
1540 reply packet header length - 4
1541 - String, name of the export, as expected by `NBD_OPT_EXPORT_NAME`,
1542 `NBD_OPT_INFO`, or `NBD_OPT_GO`
1543 - If length of name < (reply packet header length - 4), then the
1544 rest of the data contains some implementation-specific details
1545 about the export. This is not currently implemented, but future
1546 versions of nbd-server may send along some details about the
1547 export. Therefore, unless explicitly documented otherwise by a
1548 particular client request, this field is defined to be a string
1549 suitable for direct display to a human being.
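As a sketch of the `NBD_REP_SERVER` layout, the following Python function (a hypothetical name) splits the reply data into the export name and the optional human-readable details, enforcing the "no larger than reply length - 4" rule.

```python
import struct


def parse_rep_server(data: bytes) -> tuple[str, str]:
    """Split NBD_REP_SERVER data into (export name, details string)."""
    if len(data) < 4:
        raise ValueError("NBD_REP_SERVER data shorter than its length field")
    (name_len,) = struct.unpack(">I", data[:4])
    if name_len > len(data) - 4:
        raise ValueError("name length exceeds reply length - 4")
    name = data[4:4 + name_len].decode("utf-8")
    details = data[4 + name_len:].decode("utf-8")  # empty if none were sent
    return name, details
```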
1551 * `NBD_REP_INFO` (3)
1553 A detailed description about an aspect of an export. The response
1554 to `NBD_OPT_INFO` and `NBD_OPT_GO` includes zero or more of these
1555 messages prior to a final error reply, or at least one before an
1556 `NBD_REP_ACK` reply indicating success. The server MUST send an
1557 `NBD_INFO_EXPORT` information type at some point before sending an
1558 `NBD_REP_ACK`, so that `NBD_OPT_GO` can provide a superset of the
1559 information given in response to `NBD_OPT_EXPORT_NAME`; all other
1560 information types are optional. A particular information type
SHOULD only appear once for a given export unless documented
otherwise.
A client MUST NOT rely on any particular ordering amongst the
`NBD_REP_INFO` replies, and MUST ignore information types that it
does not understand.
The acceptable values for the header *length* field are determined
by the information type; the length includes the 2 bytes for the
type designator, in the following general layout:
1572 - 16 bits, information type (e.g. `NBD_INFO_EXPORT`)
1573 - *length - 2* bytes, information payload
1575 The following information types are defined:
1577 * `NBD_INFO_EXPORT` (0)
1579 Mandatory information before a successful completion of
1580 `NBD_OPT_INFO` or `NBD_OPT_GO`. Describes the same information
1581 that is sent in response to the older `NBD_OPT_EXPORT_NAME`,
1582 except that there are no trailing zeroes whether or not
1583 `NBD_FLAG_C_NO_ZEROES` was negotiated. *length* MUST be 12, and
1584 the reply payload is interpreted as follows:
1586 - 16 bits, `NBD_INFO_EXPORT`
1587 - 64 bits, size of the export in bytes (unsigned)
1588 - 16 bits, transmission flags
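The fixed 12-byte `NBD_INFO_EXPORT` payload maps directly onto a big-endian `struct` format. A minimal Python decoder, with a hypothetical function name, might be:

```python
import struct

NBD_INFO_EXPORT = 0
INFO_EXPORT = struct.Struct(">HQH")   # type, export size (u64), transmission flags


def decode_info_export(payload: bytes) -> tuple[int, int]:
    """Return (export size in bytes, transmission flags)."""
    if len(payload) != INFO_EXPORT.size:   # *length* MUST be 12
        raise ValueError(f"NBD_INFO_EXPORT length {len(payload)} != 12")
    info_type, size, flags = INFO_EXPORT.unpack(payload)
    if info_type != NBD_INFO_EXPORT:
        raise ValueError(f"unexpected information type {info_type}")
    return size, flags
```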
1590 * `NBD_INFO_NAME` (1)
1592 Represents the server's canonical name of the export. The name
1593 MAY differ from the name presented in the client's option
1594 request, and the information item MAY be omitted if the client
1595 option request already used the canonical name. This
1596 information type represents the same name that would appear in
1597 the name portion of an `NBD_REP_SERVER` in response to
1598 `NBD_OPT_LIST`. The *length* MUST be at least 2, and the reply
1599 payload is interpreted as:
1601 - 16 bits, `NBD_INFO_NAME`
1602 - String: name of the export, *length - 2* bytes
1604 * `NBD_INFO_DESCRIPTION` (2)
A description of the export, suitable for direct display to a
1607 human being. This information type represents the same optional
1608 description that may appear after the name portion of an
1609 `NBD_REP_SERVER` in response to `NBD_OPT_LIST`. The *length*
1610 MUST be at least 2, and the reply payload is interpreted as:
1612 - 16 bits, `NBD_INFO_DESCRIPTION`
1613 - String: description of the export, *length - 2* bytes
1615 * `NBD_INFO_BLOCK_SIZE` (3)
1617 Represents the server's advertised size constraints; see the
1618 "Size constraints" section for more details on what these values
1619 represent, and on constraints on their values. The server MUST
1620 send this info if it is requested and it intends to enforce size
1621 constraints other than the defaults. After sending this
1622 information in response to an `NBD_OPT_GO` in which the client
1623 specifically requested `NBD_INFO_BLOCK_SIZE`, the server can
1624 legitimately assume that any client that continues the session
1625 will support the size constraints supplied (note that this
1626 assumption cannot be made solely on the basis of an
1627 `NBD_OPT_INFO` with an `NBD_INFO_BLOCK_SIZE` request, or an
1628 `NBD_OPT_GO` without an explicit `NBD_INFO_BLOCK_SIZE`
request). The *length* MUST be 14, and the reply payload is
interpreted as:
1632 - 16 bits, `NBD_INFO_BLOCK_SIZE`
1633 - 32 bits, minimum block size
1634 - 32 bits, preferred block size
1635 - 32 bits, maximum payload size
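Decoding the 14-byte `NBD_INFO_BLOCK_SIZE` payload follows the same pattern. This Python sketch (hypothetical function name) only checks the length and type; validating the values themselves is left to the rules in the "Size constraints" section.

```python
import struct

NBD_INFO_BLOCK_SIZE = 3
INFO_BLOCK_SIZE = struct.Struct(">HIII")  # type, minimum, preferred, max payload


def decode_info_block_size(payload: bytes) -> tuple[int, int, int]:
    """Return (minimum block size, preferred block size, maximum payload)."""
    if len(payload) != INFO_BLOCK_SIZE.size:  # *length* MUST be 14
        raise ValueError(f"NBD_INFO_BLOCK_SIZE length {len(payload)} != 14")
    info_type, minimum, preferred, maximum = INFO_BLOCK_SIZE.unpack(payload)
    if info_type != NBD_INFO_BLOCK_SIZE:
        raise ValueError(f"unexpected information type {info_type}")
    # Range checks belong to the "Size constraints" section; not done here.
    return minimum, preferred, maximum
```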
1637 * `NBD_REP_META_CONTEXT` (4)
1639 A description of a metadata context. Data:
1641 - 32 bits, NBD metadata context ID.
1642 - String, name of the metadata context. This is not required to be
1643 a human-readable string, but it MUST be valid UTF-8 data.
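An `NBD_REP_META_CONTEXT` payload is a 32-bit ID followed by the name, which runs to the end of the reply. A minimal Python parser (hypothetical name) might be the following; a client would discard the ID for `NBD_OPT_LIST_META_CONTEXT` replies, where it is reserved.

```python
import struct


def parse_rep_meta_context(data: bytes) -> tuple[int, str]:
    """Return (context ID, context name) from NBD_REP_META_CONTEXT data."""
    if len(data) < 4:
        raise ValueError("NBD_REP_META_CONTEXT data shorter than 4 bytes")
    (context_id,) = struct.unpack(">I", data[:4])
    name = data[4:].decode("utf-8")   # raises UnicodeDecodeError if not UTF-8
    return context_id, name
```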
1645 There are a number of error reply types, all of which are denoted by
1646 having bit 31 set. All error replies MAY have some data set, in which
1647 case that data is an error message string suitable for display to the user.
1649 * `NBD_REP_ERR_UNSUP` (2^31 + 1)
1651 The option sent by the client is unknown by this server
implementation (e.g., because the server is too old, or from another
source).
1655 * `NBD_REP_ERR_POLICY` (2^31 + 2)
1657 The option sent by the client is known by this server and
1658 syntactically valid, but server-side policy forbids the server to
1659 allow the option (e.g., the client sent `NBD_OPT_LIST` but server
configuration has that disabled).
1662 * `NBD_REP_ERR_INVALID` (2^31 + 3)
1664 The option sent by the client is known by this server, but was
1665 determined by the server to be syntactically or semantically
1666 invalid. For instance, the client sent an `NBD_OPT_LIST` with
1667 nonzero data length, or the client sent a second
1668 `NBD_OPT_STARTTLS` after TLS was already negotiated.
1670 * `NBD_REP_ERR_PLATFORM` (2^31 + 4)
1672 The option sent by the client is not supported on the platform on
1673 which the server is running, or requires compile-time options that
1674 were disabled, e.g., upon trying to use TLS.
1676 * `NBD_REP_ERR_TLS_REQD` (2^31 + 5)
1678 The server is unwilling to continue negotiation unless TLS is
1679 initiated first. In the case of `NBD_OPT_INFO` and `NBD_OPT_GO`
1680 this unwillingness MAY (depending on the TLS mode) be limited
to the export in question. See the section on TLS above for
further details.
1684 * `NBD_REP_ERR_UNKNOWN` (2^31 + 6)
1686 The requested export is not available.
1688 * `NBD_REP_ERR_SHUTDOWN` (2^31 + 7)
1690 The server is unwilling to continue negotiation as it is in the
1691 process of being shut down.
1693 * `NBD_REP_ERR_BLOCK_SIZE_REQD` (2^31 + 8)
1695 The server is unwilling to enter transmission phase for a given
1696 export unless the client first acknowledges (via
`NBD_INFO_BLOCK_SIZE`) that it will obey non-default block sizing
requirements.
1700 * `NBD_REP_ERR_TOO_BIG` (2^31 + 9)
1702 The request or the reply is too large to process.
1704 * `NBD_REP_ERR_EXT_HEADER_REQD` (2^31 + 10)
1706 Defined by the experimental `EXTENDED_HEADERS`
1707 [extension](https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md).
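Since every error reply type has bit 31 set, a client can recognise errors it does not know by name. The Python sketch below (hypothetical names) renders an error reply with its optional message for logging, using the values listed above.

```python
NBD_REP_ERR_BIT = 1 << 31

NBD_REP_ERRORS = {
    NBD_REP_ERR_BIT + 1: "NBD_REP_ERR_UNSUP",
    NBD_REP_ERR_BIT + 2: "NBD_REP_ERR_POLICY",
    NBD_REP_ERR_BIT + 3: "NBD_REP_ERR_INVALID",
    NBD_REP_ERR_BIT + 4: "NBD_REP_ERR_PLATFORM",
    NBD_REP_ERR_BIT + 5: "NBD_REP_ERR_TLS_REQD",
    NBD_REP_ERR_BIT + 6: "NBD_REP_ERR_UNKNOWN",
    NBD_REP_ERR_BIT + 7: "NBD_REP_ERR_SHUTDOWN",
    NBD_REP_ERR_BIT + 8: "NBD_REP_ERR_BLOCK_SIZE_REQD",
    NBD_REP_ERR_BIT + 9: "NBD_REP_ERR_TOO_BIG",
    NBD_REP_ERR_BIT + 10: "NBD_REP_ERR_EXT_HEADER_REQD",
}


def describe_option_reply_error(reply_type: int, data: bytes) -> str:
    """Render an error reply for logging; returns "" for non-error replies."""
    if not reply_type & NBD_REP_ERR_BIT:
        return ""
    name = NBD_REP_ERRORS.get(reply_type, f"unknown error {reply_type:#x}")
    message = data.decode("utf-8", errors="replace")  # optional message string
    return f"{name}: {message}" if message else name
```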
1709 ### Transmission phase
##### Command flags
This field of 16 bits is sent by the client with every request and provides
1716 additional information to the server to execute the command. Refer to
1717 the "Request types" section below for more details about how a given flag
1718 affects a particular command. Clients MUST NOT set a command flag bit
1719 that is not documented for the particular command; and whether a flag is
1720 valid may depend on negotiation during the handshake phase.
- bit 0, `NBD_CMD_FLAG_FUA`; this flag is valid for all commands, provided
`NBD_FLAG_SEND_FUA` has been negotiated, in which case the server MUST
accept all commands with this bit set (even by ignoring the bit). The
client SHOULD NOT set this bit unless the command has the potential of
writing data (current commands are `NBD_CMD_WRITE`, `NBD_CMD_WRITE_ZEROES`
and `NBD_CMD_TRIM`); however, note that existing clients are known to set this
1728 bit on other commands. Subject to that, and provided `NBD_FLAG_SEND_FUA`
1729 is negotiated, the client MAY set this bit on all, no or some commands
1730 as it wishes (see the section on Ordering of messages and writes for
1731 details). If the server receives a command with `NBD_CMD_FLAG_FUA`
1732 set it MUST NOT send its reply to that command until all write
1733 operations (if any) associated with that command have been
1734 completed and persisted to non-volatile storage. If the command does
1735 not in fact write data (for instance on an `NBD_CMD_TRIM` in a situation
1736 where the command as a whole is ignored), the server MAY ignore this bit
1737 being set on such a command.
1738 - bit 1, `NBD_CMD_FLAG_NO_HOLE`; valid during `NBD_CMD_WRITE_ZEROES`.
1739 SHOULD be set to 1 if the client wants to ensure that the server does
1740 not create a hole. The client MAY send `NBD_CMD_FLAG_NO_HOLE` even
1741 if `NBD_FLAG_SEND_TRIM` was not set in the transmission flags field.
1742 The server MUST support the use of this flag if it advertises
1743 `NBD_FLAG_SEND_WRITE_ZEROES`.
1744 - bit 2, `NBD_CMD_FLAG_DF`; the "don't fragment" flag, valid during
1745 `NBD_CMD_READ`. SHOULD be set to 1 if the client requires the
1746 server to send at most one content chunk in reply. MUST NOT be set
1747 unless the transmission flags include `NBD_FLAG_SEND_DF`. Use of
1748 this flag MAY trigger an `NBD_EOVERFLOW` error chunk, if the request
1749 length is too large.
1750 - bit 3, `NBD_CMD_FLAG_REQ_ONE`; valid during
1751 `NBD_CMD_BLOCK_STATUS`. If set, the client is interested in only one
1752 extent per metadata context. If this flag is present, the server
1753 MUST NOT send metadata on more than one extent in the reply. Client
1754 implementers should note that using this flag on multiple contiguous
1755 requests is likely to be inefficient.
1756 - bit 4, `NBD_CMD_FLAG_FAST_ZERO`; valid during
1757 `NBD_CMD_WRITE_ZEROES`. If set, but the server cannot perform the
1758 write zeroes any faster than it would for an equivalent
1759 `NBD_CMD_WRITE`, then the server MUST fail quickly with an error of
1760 `NBD_ENOTSUP`. The client MUST NOT set this unless the server advertised
1761 `NBD_FLAG_SEND_FAST_ZERO`.
- bit 5, `NBD_CMD_FLAG_PAYLOAD_LEN`; defined by the experimental
`EXTENDED_HEADERS`
[extension](https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md).
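The per-flag rules above can be collected into a table that a client checks before sending a request. In this Python sketch, commands and transmission flags are passed by name (so no numeric values beyond the command flag bits above are assumed), the function name is hypothetical, and `NBD_CMD_FLAG_PAYLOAD_LEN` is treated as unsupported because it belongs to an experimental extension.

```python
NBD_CMD_FLAG_FUA = 1 << 0
NBD_CMD_FLAG_NO_HOLE = 1 << 1
NBD_CMD_FLAG_DF = 1 << 2
NBD_CMD_FLAG_REQ_ONE = 1 << 3
NBD_CMD_FLAG_FAST_ZERO = 1 << 4

# flag -> (commands it is valid for, transmission flag it requires, if any)
FLAG_RULES = {
    NBD_CMD_FLAG_NO_HOLE: ({"NBD_CMD_WRITE_ZEROES"}, None),
    NBD_CMD_FLAG_DF: ({"NBD_CMD_READ"}, "NBD_FLAG_SEND_DF"),
    NBD_CMD_FLAG_REQ_ONE: ({"NBD_CMD_BLOCK_STATUS"}, None),
    NBD_CMD_FLAG_FAST_ZERO: ({"NBD_CMD_WRITE_ZEROES"}, "NBD_FLAG_SEND_FAST_ZERO"),
}


def client_may_set(command: str, flags: int, advertised: set[str]) -> bool:
    """True if `flags` is a legal command flags field for `command`."""
    if flags >> 16:
        return False                      # the field is only 16 bits wide
    for bit in range(16):
        mask = 1 << bit
        if not flags & mask:
            continue
        if mask == NBD_CMD_FLAG_FUA:
            # Valid on any command once NBD_FLAG_SEND_FUA is advertised.
            if "NBD_FLAG_SEND_FUA" not in advertised:
                return False
            continue
        rule = FLAG_RULES.get(mask)
        if rule is None:
            # Undocumented bits, plus PAYLOAD_LEN, which is not modelled here.
            return False
        commands, required = rule
        if command not in commands:
            return False
        if required is not None and required not in advertised:
            return False
    return True
```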
1766 ##### Structured reply flags
This field of 16 bits is sent by the server as part of every
structured reply.
1771 - bit 0, `NBD_REPLY_FLAG_DONE`; the server MUST clear this bit if
1772 more structured reply chunks will be sent for the same client
1773 request, and MUST set this bit if this is the final reply. This
1774 bit MUST always be set for the `NBD_REPLY_TYPE_NONE` chunk,
although any other chunk type can also be used as the final
chunk.
1778 The server MUST NOT set any other flags without first negotiating
1779 the extension with the client, unless the client can usefully
1780 react to the response without interpreting the flag (for instance
if the flag is some form of hint). Clients MUST ignore
unrecognized flags.
1784 #### Structured reply types
1786 These values are used in the "type" field of a structured reply. Some
1787 chunk types can additionally be categorized by role, such as *error
1788 chunks* or *content chunks*. Each type determines how to interpret
1789 the "length" bytes of payload. If the client receives an unknown or
1790 unexpected type, other than an *error chunk*, it MUST initiate a hard
1791 disconnect. A server MUST NOT send a chunk where any variable-length
1792 portion of the chunk is larger than any advertised maximum payload
1793 size (however, the overall chunk may exceed the maximum payload by the
1794 small amount of fixed-length overhead inherent in the chunk type).
1796 * `NBD_REPLY_TYPE_NONE` (0)
1798 *length* MUST be 0 (and the payload field omitted). This chunk
1799 type MUST always be used with the `NBD_REPLY_FLAG_DONE` bit set
1800 (that is, it may appear at most once in a structured reply, and
1801 is only useful as the final reply chunk). If no earlier error
1802 chunks were sent, then this type implies that the overall client
1803 request is successful. Valid as a reply to any request.
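A client decides whether more chunks follow by testing `NBD_REPLY_FLAG_DONE` in each chunk header. The Python sketch below assumes the 20-byte structured reply chunk header (32-bit magic, 16-bit flags, 16-bit type, 64-bit request identifier, 32-bit payload length) described with the structured reply message earlier in this document; the function name and the `request_id` label are hypothetical.

```python
import struct

NBD_STRUCTURED_REPLY_MAGIC = 0x668E33EF
NBD_REPLY_FLAG_DONE = 1 << 0
NBD_REPLY_TYPE_NONE = 0
# Assumed layout: magic, flags, type, request identifier, length (20 bytes).
CHUNK_HEADER = struct.Struct(">IHHQI")


def parse_chunk_header(raw: bytes) -> tuple[bool, int, int, int]:
    """Return (done, chunk type, request identifier, payload length)."""
    magic, flags, chunk_type, request_id, length = CHUNK_HEADER.unpack(raw)
    if magic != NBD_STRUCTURED_REPLY_MAGIC:
        raise ValueError(f"bad structured reply magic {magic:#x}")
    done = bool(flags & NBD_REPLY_FLAG_DONE)  # unrecognized flag bits ignored
    if chunk_type == NBD_REPLY_TYPE_NONE and (not done or length != 0):
        raise ValueError("NBD_REPLY_TYPE_NONE must be final and empty")
    return done, chunk_type, request_id, length
```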
1805 * `NBD_REPLY_TYPE_OFFSET_DATA` (1)
1807 This chunk type is in the content chunk category. *length* MUST be
1808 at least 9. It represents the contents of *length - 8* bytes of the
1809 export, starting at the absolute *offset* from the start of the
1810 export. The data MUST lie within the bounds of the original offset
1811 and length of the client's request, and MUST NOT overlap with the
1812 bounds of any earlier content chunk or error chunk in the same
1813 reply. This chunk MAY be used more than once in a reply, unless the
1814 `NBD_CMD_FLAG_DF` flag was set. Valid as a reply to `NBD_CMD_READ`.
1816 The payload is structured as:
1818 64 bits: offset (unsigned)
1819 *length - 8* bytes: data
1821 * `NBD_REPLY_TYPE_OFFSET_HOLE` (2)
1823 This chunk type is in the content chunk category. *length* MUST be
1824 exactly 12. It represents that the contents of *hole size* bytes,
1825 starting at the absolute *offset* from the start of the export, read
1826 as all zeroes. The hole MUST lie within the bounds of the original
1827 offset and length of the client's request, and MUST NOT overlap with
1828 the bounds of any earlier content chunk or error chunk in the same
1829 reply. This chunk MAY be used more than once in a reply, unless the
1830 `NBD_CMD_FLAG_DF` flag was set. Valid as a reply to `NBD_CMD_READ`.
1832 The payload is structured as:
1834 64 bits: offset (unsigned)
1835 32 bits: hole size (unsigned, MUST be nonzero)
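The two content chunk types differ only in whether data bytes or a hole size follow the offset. A Python sketch of decoding either one and checking that it lies within the request might be as follows (hypothetical function name). Overlap between chunks is checked across the whole reply, not here.

```python
import struct

NBD_REPLY_TYPE_OFFSET_DATA = 1
NBD_REPLY_TYPE_OFFSET_HOLE = 2


def parse_content_chunk(chunk_type, payload, req_offset, req_length):
    """Return (offset, nbytes, data); data is None for a hole."""
    if chunk_type == NBD_REPLY_TYPE_OFFSET_DATA:
        if len(payload) < 9:
            raise ValueError("OFFSET_DATA length MUST be at least 9")
        (offset,) = struct.unpack(">Q", payload[:8])
        data = payload[8:]
        nbytes = len(data)
    elif chunk_type == NBD_REPLY_TYPE_OFFSET_HOLE:
        if len(payload) != 12:
            raise ValueError("OFFSET_HOLE length MUST be exactly 12")
        offset, nbytes = struct.unpack(">QI", payload)
        if nbytes == 0:
            raise ValueError("hole size MUST be nonzero")
        data = None
    else:
        raise ValueError(f"not a content chunk: {chunk_type}")
    if offset < req_offset or offset + nbytes > req_offset + req_length:
        raise ValueError("chunk lies outside the requested range")
    return offset, nbytes, data
```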
1837 * `NBD_REPLY_TYPE_BLOCK_STATUS` (5)
1839 *length* MUST be 4 + (a positive integer multiple of 8). This reply
1840 represents a series of consecutive block descriptors where the sum
1841 of the length fields within the descriptors is subject to further
1842 constraints documented below. A successful block status request MUST
1843 have exactly one status chunk per negotiated metadata context ID.
1845 The payload starts with:
1847 32 bits, metadata context ID
and is followed by a list of one or more descriptors, each with this
layout:
1852 32 bits, length of the extent to which the status below
1853 applies (unsigned, MUST be nonzero)
1854 32 bits, status flags
1856 If the client used the `NBD_CMD_FLAG_REQ_ONE` flag in the request,
1857 then every reply chunk MUST contain exactly one descriptor, and that
1858 descriptor MUST NOT exceed the *length* of the original request. If
1859 the client did not use the flag, and the server replies with N
1860 extents, then the sum of the *length* fields of the first N-1
1861 extents (if any) MUST be less than the requested length, while the
1862 *length* of the final extent MAY result in a sum larger than the
1863 original requested length, if the server has that information anyway
1864 as a side effect of reporting the status of the requested region.
1865 When multiple metadata contexts are negotiated, the reply chunks for
1866 the different contexts need not have the same number of extents or
1867 cumulative extent length.
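The length and extent-sum rules for `NBD_REPLY_TYPE_BLOCK_STATUS` can be checked as the chunk is decoded. In this Python sketch (hypothetical function name), only the final extent may reach past the requested length, and `NBD_CMD_FLAG_REQ_ONE` forces a single extent no longer than the request.

```python
import struct

EXTENT = struct.Struct(">II")   # extent length, status flags


def parse_block_status(payload: bytes, req_length: int, req_one: bool):
    """Return (context ID, [(extent length, status flags), ...])."""
    if len(payload) < 12 or (len(payload) - 4) % EXTENT.size:
        raise ValueError("length MUST be 4 + a positive multiple of 8")
    (context_id,) = struct.unpack(">I", payload[:4])
    extents = [EXTENT.unpack_from(payload, off)
               for off in range(4, len(payload), EXTENT.size)]
    if any(length == 0 for length, _ in extents):
        raise ValueError("extent length MUST be nonzero")
    if req_one:
        if len(extents) != 1 or extents[0][0] > req_length:
            raise ValueError("REQ_ONE: exactly one extent within the request")
    elif sum(length for length, _ in extents[:-1]) >= req_length:
        raise ValueError("only the final extent may reach past the request")
    return context_id, extents
```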
1869 Servers SHOULD NOT send more than 2^20 extents in a single reply
1870 chunk; in other words, the size of
1871 `NBD_REPLY_TYPE_BLOCK_STATUS` should not be more than 4 + 8*2^20
1872 (8,388,612 bytes), even if this requires that the server truncate
1873 the response in relation to the *length* requested by the client.
1875 Even if the client did not use the `NBD_CMD_FLAG_REQ_ONE` flag in
1876 its request, the server MAY return fewer descriptors in the reply
1877 than would be required to fully specify the whole range of requested
1878 information to the client, if looking up the information would be
1879 too resource-intensive for the server, so long as at least one
1880 extent is returned. Servers should however be aware that most
1881 client implementations will likely follow up with a request for
1882 extent information at the first offset not covered by a
1883 reduced-length reply.
1885 * `NBD_REPLY_TYPE_BLOCK_STATUS_EXT` (6)
1887 Defined by the experimental `EXTENDED_HEADERS`
1888 [extension](https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md).
1890 All error chunk types have bit 15 set, and begin with the same
1891 *error*, *message length*, and optional *message* fields as
1892 `NBD_REPLY_TYPE_ERROR`. If nonzero, *message length* indicates
1893 that an optional error string message appears next, suitable for
1894 display to a human user. The header *length* then covers any
1895 remaining structured fields at the end.
1897 * `NBD_REPLY_TYPE_ERROR` (2^15 + 1)
1899 This chunk type is in the error chunk category. *length* MUST
1900 be at least 6. This chunk represents that an error occurred,
1901 and the client MAY NOT make any assumptions about partial
1902 success. This type SHOULD NOT be used more than once in a
1903 structured reply. Valid as a reply to any request. Note that
*message length* MUST NOT exceed the 4096-byte string length
limit.
1907 The payload is structured as:
1909 32 bits: error (MUST be nonzero)
1910 16 bits: message length (no more than header *length* - 6)
1911 *message length* bytes: optional string suitable for
1912 direct display to a human being
1914 * `NBD_REPLY_TYPE_ERROR_OFFSET` (2^15 + 2)
1916 This chunk type is in the error chunk category. *length* MUST
1917 be at least 14. This reply represents that an error occurred at
1918 a given offset, which MUST lie within the original offset and
1919 length of the request; the client can use this offset to
determine whether the request had any partial success. This chunk type
1921 MAY appear multiple times in a structured reply, although the
1922 same offset SHOULD NOT be repeated. Likewise, if content chunks
1923 were sent earlier in the structured reply, the server SHOULD NOT
1924 send multiple distinct offsets that lie within the bounds of a
1925 single content chunk. Valid as a reply to `NBD_CMD_READ`,
1926 `NBD_CMD_WRITE`, `NBD_CMD_TRIM`, `NBD_CMD_CACHE`,
1927 `NBD_CMD_WRITE_ZEROES`, and `NBD_CMD_BLOCK_STATUS`.
1929 The payload is structured as:
1931 32 bits: error (MUST be nonzero)
1932 16 bits: message length (no more than header *length* - 14)
1933 *message length* bytes: optional string suitable for
1934 direct display to a human being
1935 64 bits: offset (unsigned)
1937 If the client receives an unknown or unexpected type with bit 15
1938 set, it MUST consider the current reply as errored, but MAY
1939 continue transmission unless it detects that *message length* is
1940 too large to fit within the *length* specified by the header. For
1941 all other messages with unknown or unexpected type or inconsistent
1942 contents, the client MUST initiate a hard disconnect.
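Because all error chunks share the same leading fields, one parser can handle both known types and unknown types with bit 15 set. In this Python sketch (hypothetical names), a `ValueError` stands for inconsistent contents that call for a hard disconnect, while any successful return just marks the current reply as errored.

```python
import struct

NBD_REPLY_TYPE_ERROR = (1 << 15) + 1
NBD_REPLY_TYPE_ERROR_OFFSET = (1 << 15) + 2


def is_error_chunk(chunk_type: int) -> bool:
    return bool(chunk_type & (1 << 15))


def parse_error_chunk(chunk_type: int, payload: bytes):
    """Return (error, message, offset); offset is None unless ERROR_OFFSET."""
    if len(payload) < 6:
        raise ValueError("error chunk shorter than 6 bytes")
    error, msg_len = struct.unpack(">IH", payload[:6])
    if error == 0 and chunk_type in (NBD_REPLY_TYPE_ERROR,
                                     NBD_REPLY_TYPE_ERROR_OFFSET):
        raise ValueError("error MUST be nonzero")
    if 6 + msg_len > len(payload):
        raise ValueError("message length does not fit in the chunk")
    message = payload[6:6 + msg_len].decode("utf-8", errors="replace")
    offset = None
    if chunk_type == NBD_REPLY_TYPE_ERROR_OFFSET:
        if len(payload) != 6 + msg_len + 8:
            raise ValueError("ERROR_OFFSET needs exactly one 64-bit offset")
        (offset,) = struct.unpack(">Q", payload[6 + msg_len:])
    return error, message, offset
```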
#### Request types
The following request types exist:
1948 * `NBD_CMD_READ` (0)
1950 A read request. Length and offset define the data to be read. The
1951 server MUST reply with either a simple reply or a structured
reply, according to whether structured replies have been
1953 negotiated using `NBD_OPT_STRUCTURED_REPLY`. The client SHOULD NOT
1954 request a read length of 0; the behavior of a server on such a
1955 request is unspecified although the server SHOULD NOT disconnect.
*Simple replies*
If structured replies were not negotiated, then a read request
1960 MUST always be answered by a simple reply, as documented above
1961 (using magic 0x67446698 `NBD_SIMPLE_REPLY_MAGIC`, and containing
1962 length bytes of data according to the client's request), which in
1963 turn means any client request with a length larger than the
1964 maximum payload size will fail.
1966 If an error occurs, the server SHOULD set the appropriate error code
1967 in the error field. The server MAY then initiate a hard disconnect.
1968 If it chooses not to, it MUST NOT send any payload for this request.
1970 If an error occurs while reading after the server has already sent
1971 out the reply header with an error field set to zero (i.e.,
1972 signalling no error), the server MUST immediately initiate a
1973 hard disconnect; it MUST NOT send any further data to the client.
1975 *Structured replies*
1977 If structured replies are negotiated, then a read request MUST
1978 result in a structured reply with one or more chunks (each using
1979 magic 0x668e33ef `NBD_STRUCTURED_REPLY_MAGIC`), where the final
1980 chunk has the flag `NBD_REPLY_FLAG_DONE`, and with the following
1981 additional constraints.
1983 The server MAY split the reply into any number of content chunks;
1984 each chunk MUST describe at least one byte, although to minimize
1985 overhead, the server SHOULD use chunks with lengths and offsets as
1986 an integer multiple of 512 bytes, where possible (the first and
1987 last chunk of an unaligned read being the most obvious places for
1988 an exception). The server MUST NOT send content chunks that
1989 overlap with any earlier content or error chunk, and MUST NOT send
1990 chunks that describe data outside the offset and length of the
1991 request, but MAY send the content chunks in any order (the client
1992 MUST reassemble content chunks into the correct order), and MAY
1993 send additional content chunks even after reporting an error
1994 chunk. A server MAY support read requests larger than the maximum
1995 payload size by splitting the response across multiple chunks (in
particular, a request for 2^32 - 8 or more bytes containing data
1997 rather than holes MUST be split to avoid overflowing the 32-bit
1998 `NBD_REPLY_TYPE_OFFSET_DATA` length field); however, the server is
1999 also permitted to reject large read requests up front with
2000 `NBD_EOVERFLOW`, so a client should be prepared to retry with
2001 smaller requests if a large request fails.
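
Here is a server-side sketch of these splitting rules in Python. It emits `NBD_REPLY_TYPE_OFFSET_DATA` chunks whose boundaries fall on multiples of a chosen chunk size, so only the first and last chunk of an unaligned read are unaligned. It then closes the reply with an `NBD_REPLY_TYPE_NONE` chunk carrying `NBD_REPLY_FLAG_DONE`.

The sketch assumes the chunk header layout defined elsewhere in this specification: a 32-bit magic, 16-bit flags, 16-bit type, 64-bit handle and 32-bit payload length. It also assumes the type values `NBD_REPLY_TYPE_NONE` (0) and `NBD_REPLY_TYPE_OFFSET_DATA` (1). The function names and the 64 KiB default are illustrative choices.

```python
import struct

NBD_STRUCTURED_REPLY_MAGIC = 0x668E33EF
NBD_REPLY_FLAG_DONE = 1 << 0
NBD_REPLY_TYPE_NONE = 0
NBD_REPLY_TYPE_OFFSET_DATA = 1
# Chunk header: 32-bit magic, 16-bit flags, 16-bit type, 64-bit handle, 32-bit length.
CHUNK_HEADER = struct.Struct(">IHHQI")
# The 32-bit length covers the 8-byte offset plus the data that follows it.
MAX_DATA_PER_CHUNK = 0xFFFFFFFF - 8


def make_chunk(handle: int, ctype: int, payload: bytes = b"", flags: int = 0) -> bytes:
    header = CHUNK_HEADER.pack(NBD_STRUCTURED_REPLY_MAGIC, flags, ctype, handle, len(payload))
    return header + payload


def read_reply_chunks(handle: int, offset: int, data: bytes, chunk_size: int = 64 * 1024):
    """Split a successful read into aligned OFFSET_DATA chunks plus a final NONE chunk."""
    if chunk_size % 512 or not 0 < chunk_size <= MAX_DATA_PER_CHUNK:
        raise ValueError("chunk_size must be a positive multiple of 512 that fits the length field")
    chunks, pos, end = [], offset, offset + len(data)
    while pos < end:
        # Cut at the next absolute multiple of chunk_size, so only the first
        # and last chunk of an unaligned read end up unaligned.
        stop = min(end, (pos // chunk_size + 1) * chunk_size)
        payload = struct.pack(">Q", pos) + data[pos - offset:stop - offset]
        chunks.append(make_chunk(handle, NBD_REPLY_TYPE_OFFSET_DATA, payload))
        pos = stop
    # NBD_REPLY_TYPE_NONE is always acceptable as the final chunk.
    chunks.append(make_chunk(handle, NBD_REPLY_TYPE_NONE, flags=NBD_REPLY_FLAG_DONE))
    return chunks
```

For example, a 3000-byte read at offset 1000 with `chunk_size=1024` yields data chunks for [1000, 1024), [1024, 2048), [2048, 3072) and [3072, 4000), followed by the final `NBD_REPLY_TYPE_NONE` chunk.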

When no error is detected, the server MUST send enough data chunks
to cover the entire region described by the offset and length of
the client's request.
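
On the client side, the overlap, bounds and coverage rules combine into a simple check during reassembly. The sketch below is a minimal Python illustration. It assumes content chunks have already been decoded into `(offset, data)` pairs in arrival order. Error chunks are outside its scope, and the name `reassemble` is hypothetical.

```python
def reassemble(req_offset: int, req_length: int, chunks) -> bytes:
    """Validate and reassemble (offset, data) content chunks for one read request."""
    buf = bytearray(req_length)
    seen = []  # (start, end) byte ranges received so far
    for off, data in chunks:
        start, end = off, off + len(data)
        if start == end:
            raise ValueError("each chunk must describe at least one byte")
        if start < req_offset or end > req_offset + req_length:
            raise ValueError("chunk lies outside the requested range")
        if any(start < e and s < end for s, e in seen):
            raise ValueError("chunk overlaps an earlier chunk")
        seen.append((start, end))
        buf[start - req_offset:end - req_offset] = data  # chunks may arrive in any order
    if sum(e - s for s, e in seen) != req_length:
        raise ValueError("success claimed without covering the whole request")
    return bytes(buf)
```

Chunks may neither overlap nor leave the requested range. The received lengths therefore add up to the request length exactly when every byte has been covered, so no separate gap scan is needed.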

To minimize traffic, the server MAY use a content or error chunk
as the final chunk by setting the `NBD_REPLY_FLAG_DONE` flag, but
MUST NOT do so for a content chunk if it would still be possible
to detect an error while transmitting the chunk. The
`NBD_REPLY_TYPE_NONE` chunk is always acceptable as the final
chunk.

If an error is detected, the server MUST still complete the
transmission of any current chunk (it MUST use padding bytes, which
SHOULD be zero, for any remaining data portion of a chunk with
type `NBD_REPLY_TYPE_OFFSET_DATA`), but MAY omit further content
chunks. The server MUST include an error chunk as one of the
subsequent chunks, but MAY defer the error reporting behind other
queued chunks. An error chunk of type `NBD_REPLY_TYPE_ERROR`
implies that the client MUST NOT make any assumptions about the
validity of data chunks (whether sent before or after the error
chunk), and if used, SHOULD be the only error chunk in the reply.
On the other hand, an error chunk of type
`NBD_REPLY_TYPE_ERROR_OFFSET` gives fine-grained information about
which earlier data chunk(s) encountered a failure; as such, a
server MAY still usefully follow it with further non-overlapping
content chunks or with error offsets for other content chunks.
The server MAY send an error chunk with no corresponding content
chunk, but MUST ensure that the content chunk is sent first if a
content and error chunk cover the same offset. Generally, a
server SHOULD NOT mix errors that carry offsets with a generic
error. As long as all errors are accompanied by offsets, the
client MAY assume that any data chunks with no subsequent error
offset are valid, that chunks with an overlapping error offset are
valid up until the reported offset, and that portions of the read
that do not have a corresponding content chunk are not valid.
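
The client-side rules for `NBD_REPLY_TYPE_ERROR_OFFSET` can be read as a small interval computation. This Python sketch assumes the reply has already been decoded into a hypothetical, arrival-ordered list of events:

- `("data", offset, length)` for content chunks (data or hole);
- `("error_offset", offset)` for offset errors;
- `("error",)` for a generic `NBD_REPLY_TYPE_ERROR`.

It returns the byte ranges the client may treat as valid. Everything else in the request must be considered unread.

```python
def valid_ranges(events):
    """Return the (start, end) byte ranges a client may trust after a read reply.

    events: arrival-ordered ("data", offset, length), ("error_offset", offset)
    or ("error",) tuples decoded from the structured reply.
    """
    data, error_offsets = [], []
    for ev in events:
        if ev[0] == "error":
            return []  # generic NBD_REPLY_TYPE_ERROR: trust no data chunk at all
        if ev[0] == "data":
            data.append((ev[1], ev[1] + ev[2]))
        elif ev[0] == "error_offset":
            error_offsets.append(ev[1])
    valid = []
    for start, end in sorted(data):
        # A content chunk always precedes an error at the same offset, so any
        # error offset inside a chunk is a subsequent one: trust up to it only.
        cut = min((o for o in error_offsets if start <= o < end), default=end)
        if cut > start:
            valid.append((start, cut))
    return valid
```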

A client MAY initiate a hard disconnect if it detects that the server
has sent invalid chunks (such as overlapping data, or not enough
data before claiming success).

In order to avoid the burden of reassembly, the client MAY set the
`NBD_CMD_FLAG_DF` flag ("don't fragment"). If this flag is set,
the server MUST send at most one content chunk, although it MAY
still send multiple chunks (the remaining chunks would be error
chunks or a final type of `NBD_REPLY_TYPE_NONE`). If the area
being read contains both data and a hole, the server MUST use
`NBD_REPLY_TYPE_OFFSET_DATA` with the zeroes explicitly present.
A server MAY reject a client's request with the error `NBD_EOVERFLOW`
if the length is too large to send without fragmentation, in which
case it MUST NOT send a content chunk; however, the server MUST
support unfragmented reads in which the client's request length
does not exceed 65,536 bytes.
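
A server's handling of `NBD_CMD_FLAG_DF` reduces to one decision per request. In this Python sketch, the server's own unfragmented limit (`max_unfragmented`) and the `all_hole` input are hypothetical parameters. The only fixed values are the 65,536-byte floor from the text and `NBD_EOVERFLOW` (75) from the error list. Choosing `NBD_REPLY_TYPE_OFFSET_HOLE` for a region that is entirely a hole is an assumption about a reasonable server, not a requirement stated here.

```python
NBD_EOVERFLOW = 75
MIN_UNFRAGMENTED = 65536  # every server must handle DF reads up to this length


def plan_df_read(length: int, all_hole: bool, max_unfragmented: int = MIN_UNFRAGMENTED):
    """Decide the reply to an NBD_CMD_READ that carries NBD_CMD_FLAG_DF."""
    if max_unfragmented < MIN_UNFRAGMENTED:
        raise ValueError("unfragmented reads of up to 65,536 bytes must be supported")
    if length > max_unfragmented:
        # Reject up front; no content chunk may be sent in this case.
        return ("error", NBD_EOVERFLOW)
    if all_hole:
        # Assumption: a region that is entirely a hole may be sent as one hole chunk.
        return ("chunk", "NBD_REPLY_TYPE_OFFSET_HOLE")
    # Data, or data mixed with holes: one OFFSET_DATA chunk with the zeroes filled in.
    return ("chunk", "NBD_REPLY_TYPE_OFFSET_DATA")
```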

* `NBD_CMD_WRITE` (1)

A write request. Length and offset define the location and amount of
data to be written. The client MUST follow the request header with
*length* number of bytes to be written to the device. The client
SHOULD NOT request a write length of 0; the behavior of a server on
such a request is unspecified although the server SHOULD NOT
disconnect.

The server MUST write the data to disk, and then send the reply
message. The server MAY send the reply message before the data has
reached permanent storage, unless `NBD_CMD_FLAG_FUA` is in use.

If an error occurs, the server MUST set the appropriate error code
in the error field.
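
The following Python sketch shows how a write request is laid out on the wire. It builds the 28-byte request header and appends the *length* bytes of payload. The header fields follow the request message format defined elsewhere in this specification: a 32-bit magic `0x25609513`, 16-bit command flags, 16-bit type, 64-bit handle, 64-bit offset and 32-bit length. `encode_write` is a hypothetical helper, and a real client must only set `NBD_CMD_FLAG_FUA` after the server advertised `NBD_FLAG_SEND_FUA`.

```python
import struct

NBD_REQUEST_MAGIC = 0x25609513
NBD_CMD_WRITE = 1
NBD_CMD_FLAG_FUA = 1 << 0
# Request header: magic, command flags, type, handle, offset, length (28 bytes).
REQUEST = struct.Struct(">IHHQQI")


def encode_write(handle: int, offset: int, data: bytes, fua: bool = False) -> bytes:
    """Build an NBD_CMD_WRITE request: the header followed by *length* payload bytes."""
    if not data:
        raise ValueError("clients SHOULD NOT send zero-length writes")
    # Only valid if the server advertised NBD_FLAG_SEND_FUA during negotiation.
    flags = NBD_CMD_FLAG_FUA if fua else 0
    header = REQUEST.pack(NBD_REQUEST_MAGIC, flags, NBD_CMD_WRITE, handle, offset, len(data))
    return header + data
```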

* `NBD_CMD_DISC` (2)

A disconnect request. The server MUST handle all outstanding
requests, shut down the TLS session (if one is running), and
close the TCP session. A client MUST NOT send
anything to the server after sending an `NBD_CMD_DISC` command.

The values of the length and offset fields in a disconnect request
MUST be zero.

There is no reply to an `NBD_CMD_DISC`.

* `NBD_CMD_FLUSH` (3)

A flush request. The server MUST NOT send a
successful reply header for this request before all write requests
for which a reply has already been sent to the client have reached
permanent storage (using fsync() or similar).

A client MUST NOT send a flush request unless `NBD_FLAG_SEND_FLUSH`
was set in the transmission flags field.

For a flush request, *length* and *offset* are reserved, and MUST be
set to all-zero.
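
Here is one way a server backed by a regular file could honour the write and flush durability rules, sketched in Python for a POSIX host. Writes go through `os.pwrite()` and may sit in the page cache until a later `os.fsync()`. That `fsync()` is issued immediately for `NBD_CMD_FLAG_FUA` and on every `NBD_CMD_FLUSH`. The `FileExport` class is hypothetical. Calling `fsync()` on the whole file is a deliberately coarse choice that covers every acknowledged write at once.

```python
import os


class FileExport:
    """Hypothetical export backed by a regular file (POSIX only)."""

    def __init__(self, path: str):
        self.fd = os.open(path, os.O_RDWR)

    def write(self, offset: int, data: bytes, fua: bool = False) -> None:
        view = memoryview(data)
        while view:  # pwrite may in principle write fewer bytes than asked
            n = os.pwrite(self.fd, view, offset)
            view, offset = view[n:], offset + n
        if fua:
            os.fsync(self.fd)  # FUA: the data must be stable before replying
        # Without FUA the reply may be sent now, before the data is durable.

    def flush(self) -> None:
        # Every write already acknowledged must reach permanent storage
        # before the flush reply is sent; fsync() covers them all.
        os.fsync(self.fd)

    def close(self) -> None:
        os.close(self.fd)
```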

* `NBD_CMD_TRIM` (4)

A hint to the server that the data defined by length and offset is
no longer needed. A server MAY discard *length* bytes starting at
offset, but is not required to; and MAY round *offset* up and
*length* down to meet internal alignment constraints so that only
a portion of the client's request is actually discarded. The
client SHOULD NOT request a trim length of 0; the behavior of a
server on such a request is unspecified although the server SHOULD
NOT disconnect.

After issuing this command, a client MUST NOT make any assumptions
about the contents of the export affected by this command, until
overwriting it again with `NBD_CMD_WRITE` or `NBD_CMD_WRITE_ZEROES`.

A client MUST NOT send a trim request unless `NBD_FLAG_SEND_TRIM`
was set in the transmission flags field.
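
The rounding a server MAY apply to a trim request can be sketched in a few lines of Python. Rounding *offset* up and *length* down is interpreted here as shrinking the request to the largest aligned sub-range it contains. `alignment` stands for the server's hypothetical internal granularity, for example the backing store's discard block size.

```python
def trim_subrange(offset: int, length: int, alignment: int):
    """Return the aligned (offset, length) a server could discard, or None."""
    start = -(-offset // alignment) * alignment       # round the start up
    end = (offset + length) // alignment * alignment  # round the end down
    if end <= start:
        return None  # no whole aligned block inside: the hint can be ignored
    return start, end - start
```

For instance, with a 4096-byte alignment, a request for 10,000 bytes at offset 1000 shrinks to 4096 bytes at offset 4096. Because trimming is only a hint, ignoring the unaligned head and tail is always allowed.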

* `NBD_CMD_CACHE` (5)

A cache request. The client is informing the server that it plans
to access the area specified by *offset* and *length*. The server
MAY use this information to speed up further access to that area
(for example, by performing the actions of `NBD_CMD_READ` but
replying with just status instead of a payload, by using
posix_fadvise(), or by retrieving remote data into a local cache
so that future reads and unaligned writes to that region are
faster). However, it is unspecified what the server's actual
caching mechanism is (if any), whether there is a limit on how
much can be cached at once, and whether writes to a cached region
have write-through or write-back semantics. Thus, even when this
command reports success, there is no guarantee of an actual
performance gain. A future version of this standard may add
command flags to request particular caching behaviors, where a
server would reply with an error if that behavior cannot be
accomplished.

If an error occurs, the server MUST set the appropriate error code
in the error field. However, failure on this operation does not
imply that further read and write requests on this area will fail,
and, other than any difference in performance, there MUST NOT be
any difference in semantics compared to the client not having used
this command. When no command flags are in use, the server MAY
send a reply prior to the requested area being fully cached.

Note that client implementations exist which attempt to send a
cache request even when `NBD_FLAG_SEND_CACHE` was not set in the
transmission flags field; however, these implementations do not
use any command flags. A server MAY advertise
`NBD_FLAG_SEND_CACHE` even if the command has no effect or always
fails with `NBD_EINVAL`; however, if it advertised the command, the
server MUST reject any command flags it does not recognize.
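
As a sketch of the `posix_fadvise()` approach mentioned above, this Python handler passes the range to `os.posix_fadvise()` with `POSIX_FADV_WILLNEED` where the platform provides it. Otherwise it succeeds without doing anything, which is permitted because the command has no semantic effect.

Several parts of the sketch are assumptions:

- `handle_cache` is a hypothetical name;
- the server is assumed to recognize no `NBD_CMD_CACHE` flags;
- mapping an `OSError` from the hint to `NBD_EIO` is an illustrative choice.

```python
import os

NBD_EIO, NBD_EINVAL = 5, 22
KNOWN_CACHE_FLAGS = 0  # assumption: this server recognizes no flags for NBD_CMD_CACHE


def handle_cache(fd: int, offset: int, length: int, flags: int) -> int:
    """Return the error field value (0 on success) for an NBD_CMD_CACHE request."""
    if flags & ~KNOWN_CACHE_FLAGS:
        return NBD_EINVAL  # unrecognized command flags MUST be rejected
    if hasattr(os, "posix_fadvise"):
        try:
            # Ask the kernel to start reading the range into the page cache.
            os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)
        except OSError:
            return NBD_EIO  # reporting the failure changes nothing semantically
    return 0
```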

* `NBD_CMD_WRITE_ZEROES` (6)

A write request with no payload. *Offset* and *length* define the
location and amount of data to be zeroed. The client SHOULD NOT
request a write length of 0; the behavior of a server on such a
request is unspecified although the server SHOULD NOT disconnect.

The server MUST zero out the data on disk, and then send the reply
message. The server MAY send the reply message before the data has
reached permanent storage, unless `NBD_CMD_FLAG_FUA` is in use.

A client MUST NOT send a write zeroes request unless
`NBD_FLAG_SEND_WRITE_ZEROES` was set in the transmission flags
field. Additionally, a client MUST NOT send the
`NBD_CMD_FLAG_FAST_ZERO` flag unless `NBD_FLAG_SEND_FAST_ZERO` was
set in the transmission flags field.

By default, the server MAY use trimming to zero out the area, even
if it did not advertise `NBD_FLAG_SEND_TRIM`; but it MUST ensure
that the data reads back as zero. However, the client MAY set the
command flag `NBD_CMD_FLAG_NO_HOLE` to inform the server that the
area MUST be fully provisioned, ensuring that future writes to the
same area will not cause fragmentation or cause failure due to
insufficient space.

If the server advertised `NBD_FLAG_SEND_FAST_ZERO` but
`NBD_CMD_FLAG_FAST_ZERO` is not set, then the server MUST NOT fail
with `NBD_ENOTSUP`, even if the operation is no faster than a
corresponding `NBD_CMD_WRITE`. Conversely, if `NBD_CMD_FLAG_FAST_ZERO`
is set, the server SHOULD NOT fail with `NBD_EOVERFLOW` regardless of
the client length, MUST fail quickly with `NBD_ENOTSUP` unless the
request can be serviced in less time than a corresponding
`NBD_CMD_WRITE`, and SHOULD NOT alter the contents of the export when
returning an `NBD_ENOTSUP` failure. The server's
determination of whether to fail a fast request MAY depend on a
number of factors, such as whether the request was suitably
aligned, whether the `NBD_CMD_FLAG_NO_HOLE` flag was present,
or even whether a previous `NBD_CMD_TRIM` had been performed on
the region. If the server did not advertise
`NBD_FLAG_SEND_FAST_ZERO`, then it SHOULD NOT fail with
`NBD_ENOTSUP`, regardless of the speed of servicing a request, and
SHOULD fail with `NBD_EINVAL` if the `NBD_CMD_FLAG_FAST_ZERO` flag
was set. A server MAY advertise `NBD_FLAG_SEND_FAST_ZERO` whether
or not it will actually succeed on a fast zero request (a fast
failure of `NBD_ENOTSUP` still counts as a fast response);
similarly, a server SHOULD fail a fast zero request with
`NBD_ENOTSUP` if the server cannot quickly determine in advance
whether proceeding with the request would be fast, even if it
turns out that the same request without the flag would be fast
after all.
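
The fast-zero decision can be condensed into a short Python function that runs before any data is touched. `can_zero_quickly` is a hypothetical input standing for the server's up-front judgment (alignment, `NBD_CMD_FLAG_NO_HOLE`, earlier trims, and so on). It should be `False` whenever that judgment cannot be made cheaply. The flag value `NBD_CMD_FLAG_FAST_ZERO` (bit 4) comes from the command flag definitions elsewhere in this specification, and the function name is illustrative.

```python
NBD_EINVAL, NBD_ENOTSUP = 22, 95
NBD_CMD_FLAG_FAST_ZERO = 1 << 4


def precheck_write_zeroes(flags: int, advertised_fast_zero: bool, can_zero_quickly: bool):
    """Return an error to send at once, or None to go ahead and zero the range."""
    fast = bool(flags & NBD_CMD_FLAG_FAST_ZERO)
    if not advertised_fast_zero:
        # Never fail with NBD_ENOTSUP here; an unnegotiated flag is invalid.
        return NBD_EINVAL if fast else None
    if fast and not can_zero_quickly:
        # Fail before altering the export; a fast failure is still a fast response.
        return NBD_ENOTSUP
    # Without the flag, slow zeroing is fine and NBD_ENOTSUP is not allowed.
    return None
```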

One intended use of a fast zero request is optimizing the copying
of a sparse image source into the export: a client can request
fast zeroing of the entire export, and if it succeeds, follow that
with write requests to just the data portions before a single
flush of the entire image, for fewer transactions overall. On the
other hand, if the fast zero request fails, the fast failure lets
the client know that it must manually write zeroes corresponding
to the holes of the source image before a final flush, for more
transactions but with no time lost to duplicated I/O to the data
portions. Knowing this usage pattern can help decide whether a
server's implementation for writing zeroes counts as fast (for
example, a successful fast zero request may start a background
operation that would cause the next flush request to take longer,
but that is okay as long as intermediate writes before that flush
do not further lengthen the time spent on the overall sequence of
operations).
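
The copying pattern just described can be written as a client-side Python sketch. `FastZeroUnsupported` and the `write_zeroes`/`write`/`flush` methods are a hypothetical client interface, not a real library API. `FakeExport` is an in-memory stand-in included only so the example can run. The source image is described as `(offset, length, data)` extents, with `data` set to `None` for holes.

```python
class FastZeroUnsupported(Exception):
    """Hypothetical: raised when the server answers a fast zero with NBD_ENOTSUP."""


def copy_sparse(client, size, extents):
    """Copy extents (offset, length, data-or-None) into an export of `size` bytes."""
    try:
        # One cheap request zeroes the whole export, holes included.
        client.write_zeroes(0, size, fast=True)
        zeroed = True
    except FastZeroUnsupported:
        zeroed = False  # failed quickly, so no time was lost
    for offset, length, data in extents:
        if data is not None:
            client.write(offset, data)
        elif not zeroed:
            client.write_zeroes(offset, length)  # zero each hole explicitly
    client.flush()  # a single flush at the end


class FakeExport:
    """In-memory stand-in for a connected client, for illustration only."""

    def __init__(self, size, fast_zero_ok):
        self.buf = bytearray(b"\xff" * size)  # start with non-zero contents
        self.fast_zero_ok = fast_zero_ok
        self.requests = []

    def write_zeroes(self, offset, length, fast=False):
        if fast and not self.fast_zero_ok:
            self.requests.append("zero-fail")
            raise FastZeroUnsupported()
        self.buf[offset:offset + length] = bytes(length)
        self.requests.append("zero")

    def write(self, offset, data):
        self.buf[offset:offset + len(data)] = data
        self.requests.append("write")

    def flush(self):
        self.requests.append("flush")
```

Both paths leave the export with identical contents. The fast path touches each data extent exactly once, while the fallback path issues one extra zeroing request per hole.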

If an error occurs, the server MUST set the appropriate error code
in the error field.

The server SHOULD return `NBD_ENOSPC` if it receives a write zeroes request
including one or more sectors beyond the size of the device. It SHOULD
return `NBD_EPERM` if it receives a write zeroes request on a read-only export.

* `NBD_CMD_BLOCK_STATUS` (7)

A block status query request. Length and offset define the range
of interest. The client SHOULD NOT request a status length of 0;
the behavior of a server on such a request is unspecified although
the server SHOULD NOT disconnect.

A client MUST NOT send `NBD_CMD_BLOCK_STATUS` unless, during the
negotiation phase, it sent `NBD_OPT_SET_META_CONTEXT` at least
once, the last such request referred to the same export name used
to enter the transmission phase, and the server returned at least
one metadata context without an error.
This in turn requires the client to first negotiate structured
replies. For a successful return, the server MUST use a structured
reply, containing exactly one chunk of type
`NBD_REPLY_TYPE_BLOCK_STATUS` per selected context id, where the
status field of each descriptor is determined by the flags field
as defined by the metadata context. The server MAY send chunks in
a different order than the context ids were assigned in reply to
`NBD_OPT_SET_META_CONTEXT`.

The list of block status descriptors within the
`NBD_REPLY_TYPE_BLOCK_STATUS` chunk represents consecutive portions
of the export starting from the specified *offset*. If the client used
the `NBD_CMD_FLAG_REQ_ONE` flag, each chunk contains exactly one
descriptor where the *length* of the descriptor MUST NOT be
greater than the *length* of the request; otherwise, a chunk MAY
contain multiple descriptors, and the final descriptor MAY extend
beyond the original requested size if the server can determine a
larger length without additional effort. On the other hand, the
server MAY return less data than requested. In particular, a
server SHOULD NOT send more than 2^20 status descriptors in a
single chunk. However, the server MUST return at least one status
descriptor, and since each status descriptor has a non-zero
length, a client can always make progress on a successful return.
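
To make the progress argument concrete, this Python sketch decodes an `NBD_REPLY_TYPE_BLOCK_STATUS` payload and computes where a follow-up query would start. It assumes the payload layout defined elsewhere in this specification: a 32-bit context id followed by descriptors, each made of a 32-bit length and 32-bit status flags. The function names are hypothetical. `next_offset` deliberately clamps to the end of the request, because the final descriptor MAY extend past it.

```python
import struct


def parse_block_status(payload: bytes):
    """Decode a BLOCK_STATUS chunk payload into (context_id, [(length, status), ...])."""
    if len(payload) < 12 or (len(payload) - 4) % 8:
        raise ValueError("expected a context id plus at least one 8-byte descriptor")
    (context_id,) = struct.unpack_from(">I", payload)
    descs = [struct.unpack_from(">II", payload, pos) for pos in range(4, len(payload), 8)]
    if any(length == 0 for length, _ in descs):
        raise ValueError("every descriptor must have a non-zero length")
    return context_id, descs


def next_offset(req_offset: int, req_length: int, descs) -> int:
    """Where a follow-up query would start; the last descriptor may overshoot."""
    covered = sum(length for length, _ in descs)
    return min(req_offset + req_length, req_offset + covered)
```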

The server SHOULD use different *status* values between
consecutive descriptors where feasible, although the client SHOULD
be prepared to handle consecutive descriptors with the same
*status* value. The server SHOULD use descriptor lengths that are
an integer multiple of 512 bytes where possible (the first and
last descriptor of an unaligned query being the most obvious
places for an exception), in part to avoid an amplification effect
where a series of smaller descriptors can cause the server's reply
to occupy more bytes than the *length* of the client's request.
The server MUST use descriptor lengths that are an integer
multiple of any advertised minimum block size. The status flags
are intentionally defined so that a server MAY always safely
report a status of 0 for any block, although the server SHOULD
return additional status values when they can be easily detected.

If an error occurs, the server SHOULD set the appropriate error
code in the error field of an error chunk. However, if the error
does not involve invalid usage (such as a request beyond the
bounds of the export), a server MAY reply with a single block
status descriptor with *length* matching the requested length,
rather than reporting the error; in this case the context MAY
mandate the status returned.

A client MAY initiate a hard disconnect if it detects that the
server has sent an invalid chunk. The server SHOULD return
`NBD_EINVAL` if it receives an `NBD_CMD_BLOCK_STATUS` request including
one or more sectors beyond the size of the device.

* `NBD_CMD_RESIZE` (8)

Defined by the experimental `RESIZE`
[extension](https://github.com/NetworkBlockDevice/nbd/blob/extension-resize/doc/proto.md).

Some third-party implementations may require additional protocol
messages which are not described in this document. In the interest of
interoperability, authors of such implementations SHOULD contact the
maintainer of this document, so that these messages can be listed here
to avoid conflicting implementations.

The error values are used for the error field in the reply message.
Originally, error values were defined as the value of `errno` on the
system running the server; however, although they happen to have similar
values on most systems, these values are in fact not well-defined, and
therefore not entirely portable.

Therefore, the allowed values for the error field have been restricted
to a set of possibilities. To remain intelligible with older clients, the
most common value of `errno` for each particular error has been chosen
as the value for that error.

The following error values are defined:

* `NBD_EPERM` (1), Operation not permitted.
* `NBD_EIO` (5), Input/output error.
* `NBD_ENOMEM` (12), Cannot allocate memory.
* `NBD_EINVAL` (22), Invalid argument.
* `NBD_ENOSPC` (28), No space left on device.
* `NBD_EOVERFLOW` (75), Value too large.
* `NBD_ENOTSUP` (95), Operation not supported.
* `NBD_ESHUTDOWN` (108), Server is in the process of being shut down.

The server SHOULD return `NBD_ENOSPC` if it receives a write request
including one or more sectors beyond the size of the device. It also
SHOULD map the `EDQUOT` and `EFBIG` errors to `NBD_ENOSPC`. It SHOULD
return `NBD_EINVAL` if it receives a read or trim request including one or
more sectors beyond the size of the device, or if a read or write
request is not aligned to advertised minimum block sizes. Finally, it
SHOULD return `NBD_EPERM` if it receives a write or trim request on a
read-only export.
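
A server usually learns about failures as host `errno` values, so it has to translate them into the fixed values listed above. The following is a minimal Python sketch, assuming the backend raises `OSError`. It keys the table by the host's `errno` constants, whose numbers can differ between platforms, and maps `EDQUOT` and `EFBIG` to `NBD_ENOSPC` as recommended. Treating `EROFS` as `NBD_EPERM` and falling back to `NBD_EIO` for anything else are assumptions of this sketch, since the choice in other cases is left open.

```python
import errno

NBD_EPERM, NBD_EIO, NBD_ENOMEM, NBD_EINVAL = 1, 5, 12, 22
NBD_ENOSPC, NBD_EOVERFLOW, NBD_ENOTSUP, NBD_ESHUTDOWN = 28, 75, 95, 108

# Host errno constant -> NBD wire value. Keys come from the host's errno
# module, so the table stays correct where the host numbers differ.
ERRNO_TO_NBD = {
    errno.EPERM: NBD_EPERM,
    errno.EIO: NBD_EIO,
    errno.ENOMEM: NBD_ENOMEM,
    errno.EINVAL: NBD_EINVAL,
    errno.ENOSPC: NBD_ENOSPC,
    errno.EDQUOT: NBD_ENOSPC,   # quota exceeded counts as "no space"
    errno.EFBIG: NBD_ENOSPC,    # so does "file too large"
    errno.EOVERFLOW: NBD_EOVERFLOW,
    errno.ENOTSUP: NBD_ENOTSUP,
    errno.ESHUTDOWN: NBD_ESHUTDOWN,
    errno.EROFS: NBD_EPERM,     # assumption: read-only backing store
}


def nbd_error_from_oserror(exc: OSError) -> int:
    """Translate a backend OSError into an NBD error value."""
    return ERRNO_TO_NBD.get(exc.errno, NBD_EIO)  # assumption: EIO for anything else
```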

The server SHOULD NOT return `NBD_EOVERFLOW` except as documented in
response to `NBD_CMD_READ` when `NBD_CMD_FLAG_DF` is supported, or when
a command without payload requests a length larger than an advertised
maximum payload length.

The server SHOULD NOT return `NBD_ENOTSUP` except as documented in
response to `NBD_CMD_WRITE_ZEROES` when `NBD_CMD_FLAG_FAST_ZERO` is
supported.

The server SHOULD return `NBD_EINVAL` if it receives an unknown command.

The server SHOULD return `NBD_EINVAL` if it receives an unknown
command flag. It also SHOULD return `NBD_EINVAL` if it receives a
request with a flag not explicitly documented as applicable to the
given request.
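
Several of the error recommendations in this section can be checked before a request is executed. This Python sketch covers only `NBD_CMD_READ`, `NBD_CMD_WRITE` and `NBD_CMD_TRIM`. The `ALLOWED_FLAGS` table, the function name and the order in which the checks run are hypothetical choices, not something this document prescribes.

```python
NBD_EPERM, NBD_EINVAL, NBD_ENOSPC = 1, 22, 28
NBD_CMD_READ, NBD_CMD_WRITE, NBD_CMD_TRIM = 0, 1, 4
NBD_CMD_FLAG_FUA = 1 << 0

# Hypothetical table: flags this server treats as applicable to each command.
ALLOWED_FLAGS = {
    NBD_CMD_READ: 0,
    NBD_CMD_WRITE: NBD_CMD_FLAG_FUA,
    NBD_CMD_TRIM: NBD_CMD_FLAG_FUA,
}


def validate_request(cmd, flags, offset, length, export_size, read_only, min_block=1):
    """Return the error value to reply with, or 0 if the request may proceed."""
    if cmd not in ALLOWED_FLAGS:
        return NBD_EINVAL  # unknown command
    if flags & ~ALLOWED_FLAGS[cmd]:
        return NBD_EINVAL  # unknown flag, or one not applicable to this command
    if read_only and cmd in (NBD_CMD_WRITE, NBD_CMD_TRIM):
        return NBD_EPERM
    if offset + length > export_size:
        # Writes past the end report "no space"; reads and trims are invalid.
        return NBD_ENOSPC if cmd == NBD_CMD_WRITE else NBD_EINVAL
    if cmd in (NBD_CMD_READ, NBD_CMD_WRITE) and (offset % min_block or length % min_block):
        return NBD_EINVAL  # not aligned to the advertised minimum block size
    return 0
```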

Which error to return in any other case is not specified by the NBD
protocol.

The server SHOULD NOT return `NBD_ENOMEM` if at all possible.

The client SHOULD treat an unexpected error value as if it had been
`NBD_EINVAL`, rather than disconnecting from the server.
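
On the client side, the `NBD_EINVAL` fallback for unexpected error values is a single dictionary lookup. Here is a small Python sketch with a hypothetical `describe_error` helper. It returns `None` for success, and the connection stays open in every case.

```python
NBD_ERROR_NAMES = {
    1: "NBD_EPERM", 5: "NBD_EIO", 12: "NBD_ENOMEM", 22: "NBD_EINVAL",
    28: "NBD_ENOSPC", 75: "NBD_EOVERFLOW", 95: "NBD_ENOTSUP", 108: "NBD_ESHUTDOWN",
}


def describe_error(value: int):
    """Name the error field of a reply; unknown values are treated as NBD_EINVAL."""
    if value == 0:
        return None  # success
    return NBD_ERROR_NAMES.get(value, "NBD_EINVAL")  # no disconnect on unknown values
```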

## Experimental extensions

In addition to the normative elements of the specification set out
herein, various experimental non-normative extensions have been
proposed. These may not be implemented in any known server or client,
and are subject to change at any point. A full implementation may
require changes to the specifications, or cause the specifications to
be withdrawn altogether.

These experimental extensions are set out in git branches whose
names start with the word 'extension'.

Currently known are:

* The `EXTENDED_HEADER` [extension](https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md).

* The `RESIZE` [extension](https://github.com/NetworkBlockDevice/nbd/blob/extension-resize/doc/proto.md).

Implementers of these extensions are strongly encouraged to contact the
[mailing list](mailto:nbd@other.debian.org) in order to help
fine-tune the specifications before committing to a particular
implementation.

Those proposing further extensions should also contact the
[mailing list](mailto:nbd@other.debian.org). It is
possible to reserve command codes, etc. within this document
for such proposed extensions. Aside from that, extensions are
written as branches which can be merged into master if and
when those extensions are promoted to the normative version
of the document in the master branch.

## Compatibility and interoperability

Originally, the NBD protocol was a fairly simple protocol with few
options. While the basic protocol is still reasonably simple, a growing
number of extensions has been implemented that may make the protocol
description seem overwhelming at first.

In an effort to not overwhelm first-time implementers with various
options and features that may or may not be important for their use
case, while at the same time desiring maximum interoperability, this
section tries to clarify what is optional and what is expected to be
available in all implementations.

All protocol options and messages not explicitly mentioned below should
be considered optional features that MAY be negotiated between client
and server, but are not required to be available.

### Baseline

The following MUST be implemented by all implementations, and should be
considered a baseline:

- The fixed newstyle handshake
- During the handshake:
  - the `NBD_OPT_INFO` and `NBD_OPT_GO` messages, with the
    `NBD_INFO_EXPORT` response.
  - Servers that receive messages which they do not implement MUST
    reply to them with `NBD_REP_ERR_UNSUP`, and MUST NOT fail to parse
    the next message received.
  - the `NBD_OPT_ABORT` message, and its response.
  - the `NBD_OPT_LIST` message and its response.
- During the transmission phase:
  - the `NBD_CMD_READ` message (and its response)
  - the `NBD_CMD_WRITE` message (and its response), unless the
    implementation is a client that does not wish to write
  - the `NBD_CMD_DISC` message (and its resulting effects, although
    no response is involved)

Clients that wish to use more messages MUST first negotiate them during
the handshake phase.

### Maximum interoperability

Clients and servers that desire maximum interoperability SHOULD
implement the following features:

- TLS-encrypted communication, which may be required by some
  implementations or configurations;
- Servers that implement size constraints through
  `NBD_INFO_BLOCK_SIZE` and desire maximum interoperability SHOULD NOT
  require them. Similarly, clients that desire maximum
  interoperability SHOULD implement querying for size
  constraints. Since some clients default to a block size of 512
  bytes, implementations desiring maximum interoperability MAY default
  to that size. Clients that do not implement querying for size
  constraints SHOULD abide by the rules laid out in the section "Size
  constraints", above.
- Clients or servers that desire interoperability with older
  implementations SHOULD implement the `NBD_OPT_EXPORT_NAME` message in
  addition to `NBD_OPT_INFO` and `NBD_OPT_GO`.
- For data safety, implementing `NBD_CMD_FLUSH` and the
  `NBD_CMD_FLAG_FUA` flag to `NBD_CMD_WRITE` is strongly recommended.

### Future considerations

The following may be moved to the "Maximum interoperability" or
"Baseline" sections at some point in the future, but some significant
implementations are not yet ready to support them:

- Structured replies; the Linux kernel does not yet implement them.

This file tries to document the NBD protocol as it is currently
implemented in the Linux kernel and in the reference implementation. The
purpose of this file is to allow people to understand the protocol
without having to read the code. However, the description above does not
come with any form of warranty; while every effort has been made to
avoid mistakes, some may remain.

In contrast to the other files in this repository, this file is not
licensed under the GPLv2. To the extent possible by applicable law, I
hereby waive all copyright and related or neighboring rights to this
file and release it into the public domain.

The purpose of releasing this into the public domain is to allow
competing implementations of the NBD protocol without those
implementations being considered derivative implementations; but please
note that changing this document, while allowed by its public domain
status, does not make an incompatible implementation suddenly speak the