7 DNSOP Working Group Paul Vixie, ISC
8 INTERNET-DRAFT Akira Kato, WIDE
9 <draft-ietf-dnsop-respsize-06.txt> August 2006
11 DNS Referral Response Size Issues
14 By submitting this Internet-Draft, each author represents that any
15 applicable patent or other IPR claims of which he or she is aware
16 have been or will be disclosed, and any of which he or she becomes
17 aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
24 Internet-Drafts are draft documents valid for a maximum of six months
25 and may be updated, replaced, or obsoleted by other documents at any
26 time. It is inappropriate to use Internet-Drafts as reference
27 material or to cite them other than as "work in progress."
29 The list of current Internet-Drafts can be accessed at
30 http://www.ietf.org/ietf/1id-abstracts.txt
32 The list of Internet-Draft Shadow Directories can be accessed at
33 http://www.ietf.org/shadow.html.
37 Copyright (C) The Internet Society (2006). All Rights Reserved.
With a mandated default minimum maximum message size of 512 octets,
the DNS protocol presents some special problems for zones wishing to
expose a moderate or high number of authority servers (NS RRs). This
document explains the operational issues caused by, or related to,
this response size limit, and suggests ways to optimize the use of
this limited space. Guidance is offered to DNS server implementors
and to DNS zone operators.
55 Expires January 2007 [Page 1]
57 INTERNET-DRAFT August 2006 RESPSIZE
60 1 - Introduction and Overview
62 1.1. The DNS standard (see [RFC1035 4.2.1]) limits message size to 512
63 octets. Even though this limitation was due to the required minimum IP
64 reassembly limit for IPv4, it became a hard DNS protocol limit and is
65 not implicitly relaxed by changes in transport, for example to IPv6.
1.2. The EDNS0 protocol extension (see [RFC2671 2.3, 4.5]) permits
larger responses by mutual agreement of the requester and responder.
The 512 octet message size limit will remain in practical effect until
there is widespread deployment of EDNS0 in DNS resolvers on the
Internet.
73 1.3. Since DNS responses include a copy of the request, the space
74 available for response data is somewhat less than the full 512 octets.
75 Negative responses are quite small, but for positive and delegation
76 responses, every octet must be carefully and sparingly allocated. This
77 document specifically addresses delegation response sizes.
79 2 - Delegation Details
81 2.1. RELEVANT PROTOCOL ELEMENTS
83 2.1.1. A delegation response will include the following elements:
85 Header Section: fixed length (12 octets)
86 Question Section: original query (name, class, type)
87 Answer Section: empty, or a CNAME/DNAME chain
88 Authority Section: NS RRset (nameserver names)
89 Additional Section: A and AAAA RRsets (nameserver addresses)
91 2.1.2. If the total response size exceeds 512 octets, and if the data
92 that does not fit was "required", then the TC bit will be set
93 (indicating truncation). This will usually cause the requester to retry
94 using TCP, depending on what information was desired and what
95 information was omitted. For example, truncation in the authority
96 section is of no interest to a stub resolver who only plans to consume
97 the answer section. If a retry using TCP is needed, the total cost of
98 the transaction is much higher. See [RFC1123 6.1.3.2] for details on
99 the requirement that UDP be attempted before falling back to TCP.
2.1.3. RRsets are never sent partially unless the TC bit is set to
indicate truncation. When the TC bit is set, the final apparent RRset
in the final non-empty section must be considered "possibly damaged"
(see [RFC1035]).
2.1.4. With or without truncation, the glue present in the additional
data section should be considered "possibly incomplete", and requesters
should be prepared to re-query for any damaged or missing RRsets. Note
that truncation of the additional data section might not be signalled
via the TC bit, since additional data is often optional (see the
discussion of silent truncation in 2.3.4).
2.1.5. DNS label compression allows a domain name to be instantiated
only once per DNS message, and then referenced with a two-octet
"pointer" from other locations in that same DNS message (see [RFC1035
4.1.4]). If all nameserver names in a message share a common parent
(for example, all ending in ".ROOT-SERVERS.NET"), then more space will
be available for incompressible data (such as nameserver addresses).
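The saving can be illustrated with a minimal sketch of the name
compression described in [RFC1035 4.1.4]. The function name encode_name
and the offsets table are hypothetical names chosen for this
illustration, and the sketch assumes ASCII labels and names given as
dotted strings; it is not taken from any particular server.

```python
def encode_name(name, msg, offsets):
    """Append `name` to the bytearray `msg` in DNS wire format.

    `offsets` maps lower-cased name suffixes to the message offset where
    they were first written, so that later names can point back to them.
    Returns the number of octets appended.
    """
    start = len(msg)
    labels = [label for label in name.rstrip(".").split(".") if label]
    for i, label in enumerate(labels):
        suffix = ".".join(labels[i:]).lower()
        if suffix in offsets:
            # Two-octet pointer: top two bits set, then a 14-bit offset.
            msg += (0xC000 | offsets[suffix]).to_bytes(2, "big")
            return len(msg) - start
        if len(msg) < 0x4000:          # only 14 bits of offset are available
            offsets[suffix] = len(msg)
        raw = label.encode("ascii")
        msg.append(len(raw))           # length octet, then the label itself
        msg += raw
    msg.append(0)                      # root label terminates the name
    return len(msg) - start


msg = bytearray(12)                    # placeholder for the fixed header
offsets = {}
first = encode_name("A.GTLD-SERVERS.NET.", msg, offsets)
second = encode_name("B.GTLD-SERVERS.NET.", msg, offsets)
print(first, second)                   # 20 4
```

The first name costs its full 20 octets, while the second costs only a
one-character label plus a two-octet pointer to the shared parent.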
2.1.6. The query name can be as long as 255 octets of network data. In
this worst case scenario, the question section will be 259 octets in
size, which would leave only 241 octets for the authority and additional
sections (after deducting 12 octets for the fixed length header).
132 2.2. ADVICE TO ZONE OWNERS
2.2.1. Average and maximum question section sizes can be predicted by
the zone owner, since they will know what names actually exist, and can
measure which ones are queried for most often. Note that if the zone
contains any wildcards, it is possible for maximum length queries to
require positive responses, but that it is reasonable to expect
truncation and TCP retry in that case. For cost and performance
reasons, the majority of requests should be satisfied without
truncation.
143 2.2.2. Some queries to non-existing names can be large, but this is not
144 a problem because negative responses need not contain any answer,
145 authority or additional records. See [RFC2308 2.1] for more information
146 about the format of negative responses.
148 2.2.3. The minimum useful number of name servers is two, for redundancy
149 (see [RFC1034 4.1]). A zone's name servers should be reachable by all
150 IP transport protocols (e.g., IPv4 and IPv6) in common use.
152 2.2.4. The best case is no truncation at all. This is because many
153 requesters will retry using TCP immediately, or will automatically re-
154 query for RRsets that are possibly truncated, without considering
155 whether the omitted data was actually necessary.
166 2.3. ADVICE TO SERVER IMPLEMENTORS
168 2.3.1. In case of multi-homed name servers, it is advantageous to
169 include an address record from each of several name servers before
170 including several address records for any one name server. If address
171 records for more than one transport (for example, A and AAAA) are
172 available, then it is advantageous to include records of both types
173 early on, before the message is full.
2.3.2. Each added NS RR for a zone will add 12 fixed octets (name, type,
class, ttl, and rdlen) plus 2 to 255 variable octets (for the NSDNAME).
Each A RR will require 16 octets, and each AAAA RR will require 28
octets.
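These figures can be reproduced with a few lines of arithmetic. The
sketch below assumes that each owner name is encoded as a two-octet
compression pointer, which is what the 12-octet fixed cost implies; the
constant and function names are hypothetical.

```python
# Field sizes in octets. POINTER assumes the owner name is compressed.
POINTER, TYPE, CLASS, TTL, RDLEN = 2, 2, 2, 4, 2
FIXED = POINTER + TYPE + CLASS + TTL + RDLEN      # 12 octets per RR

A_RR_SIZE = FIXED + 4          # RDATA is a 4-octet IPv4 address
AAAA_RR_SIZE = FIXED + 16      # RDATA is a 16-octet IPv6 address


def ns_rr_size(nsdname_octets):
    """Size of one NS RR whose NSDNAME occupies `nsdname_octets` on the wire."""
    return FIXED + nsdname_octets


print(FIXED, A_RR_SIZE, AAAA_RR_SIZE)   # 12 16 28
print(ns_rr_size(20), ns_rr_size(4))    # 32 16
```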
2.3.3. While DNS distinguishes between necessary and optional resource
records, this distinction is according to protocol elements necessary to
signify facts, and takes no official notice of protocol content
necessary to ensure correct operation. For example, a nameserver name
that is in or below the zone cut being described by a delegation is
"necessary content," since there is no way to reach that zone unless the
parent zone's delegation includes "glue records" describing that name
server's addresses.
189 2.3.4. It is also necessary to distinguish between "explicit truncation"
190 where a message could not contain enough records to convey its intended
191 meaning, and so the TC bit has been set, and "silent truncation", where
192 the message was not large enough to contain some records which were "not
193 required", and so the TC bit was not set.
2.3.5. A delegation response should prioritize glue records as follows.
first
   All glue RRsets for one name server whose name is in or below the
   zone being delegated, or which has multiple address RRsets (currently
   A and AAAA), or preferably both;
next
   Alternate between adding all glue RRsets for any name servers whose
   names are in or below the zone being delegated, and all glue RRsets
   for any name servers who have multiple address RRsets (currently A
   and AAAA);
last
   All other glue RRsets, in any order.
219 Whenever there are multiple candidates for a position in this priority
220 scheme, one should be chosen on a round-robin or fully random basis.
222 The goal of this priority scheme is to offer "necessary" glue first,
223 avoiding silent truncation for this glue if possible.
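One possible reading of this priority scheme is sketched below. The
function names, the input shape (a mapping from name server name to the
set of address types it has), and the choice to prefer an in-zone server
over a multi-address server when no server is both are assumptions made
for illustration; the text above does not rank those two cases.

```python
import random


def in_or_below(name, zone):
    """True if `name` equals `zone` or is a descendant of it."""
    name, zone = name.lower().rstrip("."), zone.lower().rstrip(".")
    return name == zone or name.endswith("." + zone)


def order_glue(servers, zone, rng=None):
    """Return name server names in the order their glue should be added.

    `servers` maps each name server name to its address types, for
    example {"A", "AAAA"}. All glue RRsets of a server are added together.
    """
    rng = rng or random.Random()
    in_zone = [n for n in servers if in_or_below(n, zone)]
    multi = [n for n in servers if len(servers[n]) > 1]
    both = [n for n in in_zone if n in multi]
    chosen = []

    def take(pool):
        # Pick one not-yet-chosen candidate at random, if any remain.
        candidates = [n for n in pool if n not in chosen]
        if not candidates:
            return False
        chosen.append(rng.choice(candidates))
        return True

    # First: one server, preferably both in-zone and multi-address.
    take(both) or take(in_zone) or take(multi)
    # Next: alternate between the two "necessary" categories.
    while True:
        took_in_zone = take(in_zone)
        took_multi = take(multi)
        if not (took_in_zone or took_multi):
            break
    # Last: everything else, in any order.
    rest = [n for n in servers if n not in chosen]
    rng.shuffle(rest)
    return chosen + rest


servers = {
    "ns1.example.com": {"A", "AAAA"},
    "ns2.example.com": {"A"},
    "ns.other.net": {"A", "AAAA"},
    "ns.third.org": {"A"},
}
order = order_glue(servers, "example.com")
print(order)
```

With this input each category holds a single candidate at every step,
so the random choice has no effect and the in-zone, dual-stack server is
placed first.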
225 2.3.6. If any "necessary content" is silently truncated, then it is
226 advisable that the TC bit be set in order to force a TCP retry, rather
227 than have the zone be unreachable. Note that a parent server's proper
228 response to a query for in-child glue or below-child glue is a referral
229 rather than an answer, and that this referral MUST be able to contain
230 the in-child or below-child glue, and that in outlying cases, only EDNS
231 or TCP will be large enough to contain that data.
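The distinction between explicit and silent truncation, and the advice
to set TC when necessary glue is lost, can be sketched as follows. The
function fit_glue and its input tuples are hypothetical; the sketch
assumes the caller has already ordered the RRsets by priority and
computed the space left after the authority section.

```python
def fit_glue(space, rrsets):
    """Fit whole glue RRsets into `space` octets.

    `rrsets` is a priority-ordered list of (label, size, necessary).
    Returns (labels of included RRsets, TC bit).
    """
    included = []
    tc = False
    for label, size, necessary in rrsets:
        if size <= space:
            included.append(label)     # RRsets are added whole or not at all
            space -= size
        elif necessary:
            tc = True                  # explicit truncation: force a retry
        # Otherwise the omission is silent: optional glue, TC stays clear.
    return included, tc


glue = [("ns1 A", 16, True), ("ns1 AAAA", 28, True), ("ns2 A", 16, False)]
print(fit_glue(44, glue))   # (['ns1 A', 'ns1 AAAA'], False)
print(fit_glue(40, glue))   # (['ns1 A', 'ns2 A'], True)
```

In the first call only optional glue is dropped, so the truncation is
silent. In the second call a necessary RRset does not fit, so TC is set.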
3.1. An instrumented protocol trace of a best case delegation response
follows. Note that 13 servers are named, and 13 addresses are given.
This query was artificially designed to exactly reach the 512 octet
limit.
;; flags: qr rd; QUERY: 1, ANS: 0, AUTH: 13, ADDIT: 13

;; [23456789.123456789.123456789.\
    123456789.123456789.123456789.com A IN]         ;; @80

;; AUTHORITY SECTION:
com.                 86400 NS  E.GTLD-SERVERS.NET.  ;; @112
com.                 86400 NS  F.GTLD-SERVERS.NET.  ;; @128
com.                 86400 NS  G.GTLD-SERVERS.NET.  ;; @144
com.                 86400 NS  H.GTLD-SERVERS.NET.  ;; @160
com.                 86400 NS  I.GTLD-SERVERS.NET.  ;; @176
com.                 86400 NS  J.GTLD-SERVERS.NET.  ;; @192
com.                 86400 NS  K.GTLD-SERVERS.NET.  ;; @208
com.                 86400 NS  L.GTLD-SERVERS.NET.  ;; @224
com.                 86400 NS  M.GTLD-SERVERS.NET.  ;; @240
com.                 86400 NS  A.GTLD-SERVERS.NET.  ;; @256
com.                 86400 NS  B.GTLD-SERVERS.NET.  ;; @272
com.                 86400 NS  C.GTLD-SERVERS.NET.  ;; @288
com.                 86400 NS  D.GTLD-SERVERS.NET.  ;; @304

;; ADDITIONAL SECTION:
A.GTLD-SERVERS.NET.  86400 A   192.5.6.30           ;; @320
B.GTLD-SERVERS.NET.  86400 A   192.33.14.30         ;; @336
C.GTLD-SERVERS.NET.  86400 A   192.26.92.30         ;; @352
D.GTLD-SERVERS.NET.  86400 A   192.31.80.30         ;; @368
E.GTLD-SERVERS.NET.  86400 A   192.12.94.30         ;; @384
F.GTLD-SERVERS.NET.  86400 A   192.35.51.30         ;; @400
G.GTLD-SERVERS.NET.  86400 A   192.42.93.30         ;; @416
H.GTLD-SERVERS.NET.  86400 A   192.54.112.30        ;; @432
I.GTLD-SERVERS.NET.  86400 A   192.43.172.30        ;; @448
J.GTLD-SERVERS.NET.  86400 A   192.48.79.30         ;; @464
K.GTLD-SERVERS.NET.  86400 A   192.52.178.30        ;; @480
L.GTLD-SERVERS.NET.  86400 A   192.41.162.30        ;; @496
M.GTLD-SERVERS.NET.  86400 A   192.55.83.30         ;; @512

;; MSG SIZE  sent: 80  rcvd: 512
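The "@" offsets in this trace follow from the record sizes given in
2.3.2. As a check, the sketch below recomputes them, assuming a 64-octet
encoded query name, a first NSDNAME written in full (20 octets), and
later NSDNAMEs written as a one-character label plus a pointer (4
octets). The function name is hypothetical.

```python
HEADER = 12
FIXED = 12            # compressed owner name, type, class, ttl, rdlen
A_RR = FIXED + 4


def trace_offsets(qname_octets, first_nsdname, later_nsdname, n_servers):
    """Return the running message size after the question and each RR."""
    pos = HEADER + qname_octets + 4          # QTYPE and QCLASS add 4 octets
    offsets = [pos]
    for i in range(n_servers):               # authority section
        pos += FIXED + (first_nsdname if i == 0 else later_nsdname)
        offsets.append(pos)
    for _ in range(n_servers):               # additional section, A only
        pos += A_RR
        offsets.append(pos)
    return offsets


offsets = trace_offsets(64, 20, 4, 13)
print(offsets[0], offsets[1], offsets[13], offsets[-1])   # 80 112 304 512
```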
3.2. For longer query names, the number of address records supplied will
be lower. Furthermore, it is only by using a common parent name (which
is GTLD-SERVERS.NET in this example) that all 13 addresses are able to
fit, due to the use of DNS compression pointers in the last 12
occurrences of the parent domain name. The following output from a
response simulator demonstrates these properties.
% perl respsize.pl a.dns.br b.dns.br c.dns.br d.dns.br
a.dns.br requires 10 bytes
b.dns.br requires 4 bytes
c.dns.br requires 4 bytes
d.dns.br requires 4 bytes
# of NS: 4
For maximum size query (255 byte):
    only A is considered:        # of A is 4 (green)
    A and AAAA are considered:   # of A+AAAA is 3 (yellow)
    preferred-glue A is assumed: # of A is 4, # of AAAA is 3 (yellow)
For average size query (64 byte):
    only A is considered:        # of A is 4 (green)
    A and AAAA are considered:   # of A+AAAA is 4 (green)
    preferred-glue A is assumed: # of A is 4, # of AAAA is 4 (green)

% perl respsize.pl ns-ext.isc.org ns.psg.com ns.ripe.net ns.eu.int
ns-ext.isc.org requires 16 bytes
ns.psg.com requires 12 bytes
ns.ripe.net requires 13 bytes
ns.eu.int requires 11 bytes
# of NS: 4
For maximum size query (255 byte):
    only A is considered:        # of A is 4 (green)
    A and AAAA are considered:   # of A+AAAA is 3 (yellow)
    preferred-glue A is assumed: # of A is 4, # of AAAA is 2 (yellow)
For average size query (64 byte):
    only A is considered:        # of A is 4 (green)
    A and AAAA are considered:   # of A+AAAA is 4 (green)
    preferred-glue A is assumed: # of A is 4, # of AAAA is 4 (green)
340 (Note: The response simulator program is shown in Section 5.)
342 Here we use the term "green" if all address records could fit, or
343 "yellow" if two or more could fit, or "orange" if only one could fit, or
344 "red" if no address record could fit. It's clear that without a common
345 parent for nameserver names, much space would be lost. For these
346 examples we use an average/common name size of 15 octets, befitting our
347 assumption of GTLD-SERVERS.NET as our common parent name.
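The counts in the simulator output can be rechecked by hand. The sketch
below is a hypothetical restatement of the simulator's arithmetic, not a
replacement for it: it follows the simulator's convention of charging
the query size plus one octet plus QTYPE and QCLASS for the question,
and 12 fixed octets plus the reported name size for each NS RR.

```python
def glue_capacity(query_octets, nsdname_octets, msg_size=512):
    """Return (A only, A+AAAA pairs, AAAA after all A) record counts."""
    n_ns = len(nsdname_octets)
    ns_section = sum(12 + n for n in nsdname_octets)
    question = query_octets + 1 + 2 + 2
    space = msg_size - 12 - question - ns_section

    def clamp(n):
        # Never report fewer than zero or more than one per name server.
        return max(0, min(n, n_ns))

    return (clamp(space // 16),
            clamp(space // (16 + 28)),
            clamp((space - 16 * n_ns) // 28))


print(glue_capacity(255, [10, 4, 4, 4]))      # (4, 3, 3)
print(glue_capacity(255, [16, 12, 13, 11]))   # (4, 3, 2)
print(glue_capacity(64, [10, 4, 4, 4]))       # (4, 4, 4)
```

The only difference between the first two results is 30 octets of
uncompressed name data, which costs one AAAA record in the worst case.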
We're assuming a medium query name size of 64 since that is the typical
size seen in trace data at the time of this writing. If
Internationalized Domain Names (IDN) or any other technology that
results in larger query names is deployed significantly in advance of
EDNS, then new measurements and new estimates will have to be made.
4.1. The current practice of giving all nameserver names a common parent
(such as GTLD-SERVERS.NET or ROOT-SERVERS.NET) saves space in DNS
responses and allows for more nameservers to be enumerated than would
otherwise be possible, since the common parent domain name only appears
once in a DNS message and is referred to via "compression pointers"
thereafter.
4.2. If all nameserver names for a zone share a common parent, then it
is operationally advisable to make all servers for the zone thus served
also be authoritative for the zone of that common parent. For example,
the root name servers (?.ROOT-SERVERS.NET) can answer authoritatively
for the ROOT-SERVERS.NET zone. This is to ensure that the zone's servers
always have the zone's nameservers' glue available when delegating, and
will be able to respond with answers rather than referrals if a
requester who wants that glue comes back asking for it. In this case
the name server will likely be a "stealth server" -- authoritative but
unadvertised in the glue zone's NS RRset. See [RFC1996 2] for more
information about stealth servers.
4.3. Thirteen (13) is the effective maximum number of nameserver names
usable in traditional (non-extended) DNS, assuming a common parent
domain name, and given that implicit referral response truncation is
undesirable in the average case.
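The figure of thirteen depends on the query name length. As a rough
check, the sketch below counts how many name servers fit when each one
contributes an NS RR and a single A RR, assuming the name sizes of the
Section 3.1 trace (20 octets for the first NSDNAME, 4 for each later
one). The function name and defaults are hypothetical.

```python
def max_nameservers(qname_octets, first_nsdname=20, later_nsdname=4,
                    msg_size=512):
    """Count name servers that fit with one NS RR and one A RR each."""
    space = msg_size - 12 - (qname_octets + 4)   # header and question
    used = 0
    count = 0
    while True:
        nsdname = first_nsdname if count == 0 else later_nsdname
        cost = (12 + nsdname) + 16               # NS RR plus A RR
        if used + cost > space:
            return count
        used += cost
        count += 1


print(max_nameservers(64))    # 13
print(max_nameservers(255))   # 7
```

With the 64-octet query name of the trace the limit is thirteen; a
maximum-length query name leaves room for far fewer.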
389 4.4. Multi-homing of name servers within a protocol family is
390 inadvisable since the necessary glue RRsets (A or AAAA) are atomically
391 indivisible, and will be larger than a single resource record. Larger
392 RRsets are more likely to lead to or encounter truncation.
394 4.5. Multi-homing of name servers across protocol families is less
395 likely to lead to or encounter truncation, partly because multiprotocol
396 clients are more likely to speak EDNS which can use a larger response
397 size limit, and partly because the resource records (A and AAAA) are in
398 different RRsets and are therefore divisible from each other.
400 4.6. Name server names which are at or below the zone they serve are
401 more sensitive to referral response truncation, and glue records for
402 them should be considered "less optional" than other glue records, in
403 the assembly of referral responses.
405 4.7. If a zone is served by thirteen (13) name servers having a common
406 parent name (such as ?.ROOT-SERVERS.NET) and each such name server has a
407 single address record in some protocol family (e.g., an A RR), then all
408 thirteen name servers or any subset thereof could multi-home in a second
409 protocol family by adding a second address record (e.g., an AAAA RR)
410 without reducing the reachability of the zone thus served.
#!/usr/bin/perl
#
# SYNOPSIS
#   respsize.pl [ -z zone ] fqdn_ns1 fqdn_ns2 ...
#       if all queries are assumed to have the same zone suffix,
#       such as "jp" in JP TLD servers, specify it in -z option
#
use strict;
use Getopt::Std;

# All sizes are in octets.
my ($sz_msg) = (512);
my ($sz_header, $sz_ptr, $sz_rr_a, $sz_rr_aaaa) = (12, 2, 16, 28);
my ($sz_type, $sz_class, $sz_ttl, $sz_rdlen) = (2, 2, 4, 2);
my (%namedb, $name, $nssect, %opts, $optz);
my $n_ns = 0;
$nssect = 0;

getopts('z:', \%opts);
if (defined($opts{'z'})) {
    server_name_len($opts{'z'}); # just register it
}

# Accumulate the size of the authority (NS) section.
foreach $name (@ARGV) {
    my $len;
    $n_ns++;
    $len = server_name_len($name);
    print "$name requires $len bytes\n";
    $nssect += $sz_ptr + $sz_type + $sz_class + $sz_ttl
            + $sz_rdlen + $len;
}
print "# of NS: $n_ns\n";
arsect(255, $nssect, $n_ns, "maximum");
arsect(64, $nssect, $n_ns, "average");

# Encoded size of a name, reusing any suffix already seen in %namedb.
sub server_name_len {
    my ($name) = @_;
    my (@labels, $len, $n, $suffix);

    $name =~ tr/A-Z/a-z/;
    @labels = split(/\./, $name);
    $len = length(join('.', @labels)) + 2;
    for ($n = 0; $#labels >= 0; $n++, shift @labels) {
        $suffix = join('.', @labels);
        return length($name) - length($suffix) + $sz_ptr
            if (defined($namedb{$suffix}));
        $namedb{$suffix} = 1;
    }
    return $len;
}

# Report how many address records fit after the question and NS section.
sub arsect {
    my ($sz_query, $nssect, $n_ns, $cond) = @_;
    my ($space, $n_a, $n_a_aaaa, $n_p_aaaa, $ansect);
    $ansect = $sz_query + 1 + $sz_type + $sz_class;
    $space = $sz_msg - $sz_header - $ansect - $nssect;
    $n_a = atmost(int($space / $sz_rr_a), $n_ns);
    $n_a_aaaa = atmost(int($space
                           / ($sz_rr_a + $sz_rr_aaaa)), $n_ns);
    $n_p_aaaa = atmost(int(($space - $sz_rr_a * $n_ns)
                           / $sz_rr_aaaa), $n_ns);
    printf "For %s size query (%d byte):\n", $cond, $sz_query;
    printf "    only A is considered:        ";
    printf "# of A is %d (%s)\n", $n_a, &judge($n_a, $n_ns);
    printf "    A and AAAA are considered:   ";
    printf "# of A+AAAA is %d (%s)\n",
           $n_a_aaaa, &judge($n_a_aaaa, $n_ns);
    printf "    preferred-glue A is assumed: ";
    printf "# of A is %d, # of AAAA is %d (%s)\n",
           $n_a, $n_p_aaaa, &judge($n_p_aaaa, $n_ns);
}

# Classify a record count against the number of name servers.
sub judge {
    my ($n, $n_ns) = @_;
    return "green" if ($n >= $n_ns);
    return "yellow" if ($n >= 2);
    return "orange" if ($n == 1);
    return "red";
}

# Clamp $n to the range 0 .. $max.
sub atmost {
    my ($n, $max) = @_;
    return 0 if ($n < 0);
    return $max if ($n > $max);
    return $n;
}
514 6 - Security Considerations
The recommendations contained in this document have no known security
implications.
519 7 - IANA Considerations
This document does not call for changes or additions to any IANA
registries.
526 The authors thank Peter Koch, Rob Austein, Joe Abley, and Mark Andrews
527 for their valuable comments and suggestions.
537 This work was supported by the US National Science Foundation (research
538 grant SCI-0427144) and DNS-OARC.
542 [RFC1034] Mockapetris, P.V., "Domain names - Concepts and Facilities",
543 RFC1034, November 1987.
545 [RFC1035] Mockapetris, P.V., "Domain names - Implementation and
546 Specification", RFC1035, November 1987.
548 [RFC1123] Braden, R., Ed., "Requirements for Internet Hosts -
549 Application and Support", RFC1123, October 1989.
551 [RFC1996] Vixie, P., "A Mechanism for Prompt Notification of Zone
552 Changes (DNS NOTIFY)", RFC1996, August 1996.
[RFC2181] Elz, R., Bush, R., "Clarifications to the DNS Specification",
RFC2181, July 1997.
[RFC2308] Andrews, M., "Negative Caching of DNS Queries (DNS NCACHE)",
RFC2308, March 1998.
[RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)", RFC2671,
August 1999.
[RFC4472] Durand, A., Ihren, J., Savola, P., "Operational Considerations
and Issues with IPv6 DNS", RFC4472, April 2006.
566 10 - Authors' Addresses
569 Internet Systems Consortium, Inc.
571 Redwood City, CA 94063
576 University of Tokyo, Information Technology Center
578 Tokyo 113-8658, JAPAN
590 Full Copyright Statement
592 Copyright (C) The Internet Society (2006).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors retain
all their rights.
598 This document and the information contained herein are provided on an
599 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR
600 IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
601 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
602 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
603 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
604 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
606 Intellectual Property
608 The IETF takes no position regarding the validity or scope of any
609 Intellectual Property Rights or other rights that might be claimed to
610 pertain to the implementation or use of the technology described in this
611 document or the extent to which any license under such rights might or
612 might not be available; nor does it represent that it has made any
independent effort to identify any such rights. Information on the
procedures with respect to rights in RFC documents can be found in BCP
78 and BCP 79.
617 Copies of IPR disclosures made to the IETF Secretariat and any
618 assurances of licenses to be made available, or the result of an attempt
619 made to obtain a general license or permission for the use of such
620 proprietary rights by implementers or users of this specification can be
621 obtained from the IETF on-line IPR repository at
622 http://www.ietf.org/ipr.
624 The IETF invites any interested party to bring to its attention any
625 copyrights, patents or patent applications, or other proprietary rights
626 that may cover technology that may be required to implement this
627 standard. Please address the information to the IETF at
632 Funding for the RFC Editor function is provided by the IETF
633 Administrative Support Activity (IASA).