7 Network Working Group J. Klensin
8 Request for Comments: 3467 February 2003
9 Category: Informational
12 Role of the Domain Name System (DNS)
16 This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.
22 Copyright (C) The Internet Society (2003). All Rights Reserved.
26 This document reviews the original function and purpose of the domain
27 name system (DNS). It contrasts that history with some of the
28 purposes for which the DNS has recently been applied and some of the
29 newer demands being placed upon it or suggested for it. A framework
30 for an alternative to placing these additional stresses on the DNS is
31 then outlined. This document and that framework are not a proposed
32 solution, only a strong suggestion that the time has come to begin
33 thinking more broadly about the problems we are encountering and
34 possible approaches to solving them.
38 1. Introduction and History ..................................... 2
39 1.1 Context for DNS Development ............................... 3
40 1.2 Review of the DNS and Its Role as Designed ................ 4
41 1.3 The Web and User-visible Domain Names ..................... 6
42 1.4 Internet Applications Protocols and Their Evolution ....... 7
43 2. Signs of DNS Overloading ..................................... 8
44 3. Searching, Directories, and the DNS .......................... 12
45 3.1 Overview ................................................. 12
46 3.2 Some Details and Comments ................................. 14
47 4. Internationalization ......................................... 15
48 4.1 ASCII Isn't Just Because of English ....................... 16
49 4.2 The "ASCII Encoding" Approaches ........................... 17
50 4.3 "Stringprep" and Its Complexities ......................... 17
51 4.4 The Unicode Stability Problem ............................. 19
52 4.5 Audiences, End Users, and the User Interface Problem ...... 20
53 4.6 Business Cards and Other Natural Uses of Natural Languages. 22
54 4.7 ASCII Encodings and the Roman Keyboard Assumption ......... 22
58 Klensin Informational [Page 1]
60 RFC 3467 Role of the Domain Name System (DNS) February 2003
63 4.8 Intra-DNS Approaches for "Multilingual Names" ............. 23
64 5. Search-based Systems: The Key Controversies .................. 23
65 6. Security Considerations ...................................... 24
66 7. References ................................................... 25
67 7.1 Normative References ...................................... 25
68 7.2 Explanatory and Informative References .................... 25
69 8. Acknowledgements ............................................. 30
70 9. Author's Address ............................................. 30
71 10. Full Copyright Statement ..................................... 31
73 1. Introduction and History
75 The DNS was designed as a replacement for the older "host table"
76 system. Both were intended to provide names for network resources at
77 a more abstract level than network (IP) addresses (see, e.g.,
78 [RFC625], [RFC811], [RFC819], [RFC830], [RFC882]). In recent years,
79 the DNS has become a database of convenience for the Internet, with
80 many proposals to add new features. Only some of these proposals
81 have been successful. Often the main (or only) motivation for using
82 the DNS is because it exists and is widely deployed, not because its
83 existing structure, facilities, and content are appropriate for the
84 particular application of data involved. This document reviews the
85 history of the DNS, including examination of some of those newer
86 applications. It then argues that the overloading process is often
87 inappropriate. Instead, it suggests that the DNS should be
88 supplemented by systems better matched to the intended applications
89 and outlines a framework and rationale for one such system.
91 Several of the comments that follow are somewhat revisionist. Good
design and engineering often require a level of intuition by the
93 designers about things that will be necessary in the future; the
94 reasons for some of these design decisions are not made explicit at
95 the time because no one is able to articulate them. The discussion
96 below reconstructs some of the decisions about the Internet's primary
97 namespace (the "Class=IN" DNS) in the light of subsequent development
98 and experience. In addition, the historical reasons for particular
99 decisions about the Internet were often severely underdocumented
100 contemporaneously and, not surprisingly, different participants have
101 different recollections about what happened and what was considered
102 important. Consequently, the quasi-historical story below is just
103 one story. There may be (indeed, almost certainly are) other stories
104 about how the DNS evolved to its present state, but those variants do
105 not invalidate the inferences and conclusions.
107 This document presumes a general understanding of the terminology of
108 RFC 1034 [RFC1034] or of any good DNS tutorial (see, e.g., [Albitz]).
119 1.1 Context for DNS Development
121 During the entire post-startup-period life of the ARPANET and nearly
122 the first decade or so of operation of the Internet, the list of host
123 names and their mapping to and from addresses was maintained in a
124 frequently-updated "host table" [RFC625], [RFC811], [RFC952]. The
125 names themselves were restricted to a subset of ASCII [ASCII] chosen
126 to avoid ambiguities in printed form, to permit interoperation with
127 systems using other character codings (notably EBCDIC), and to avoid
128 the "national use" code positions of ISO 646 [IS646]. These
129 restrictions later became collectively known as the "LDH" rules for
130 "letter-digit-hyphen", the permitted characters. The table was just
131 a list with a common format that was eventually agreed upon; sites
132 were expected to frequently obtain copies of, and install, new
133 versions. The host tables themselves were introduced to:
135 o Eliminate the requirement for people to remember host numbers
136 (addresses). Despite apparent experience to the contrary in the
137 conventional telephone system, numeric numbering systems,
138 including the numeric host number strategy, did not (and do not)
139 work well for more than a (large) handful of hosts.
141 o Provide stability when addresses changed. Since addresses -- to
142 some degree in the ARPANET and more importantly in the
143 contemporary Internet -- are a function of network topology and
144 routing, they often had to be changed when connectivity or
topology changed. The names could be kept stable even as
addresses changed.
148 o Provide the capability to have multiple addresses associated with
149 a given host to reflect different types of connectivity and
150 topology. Use of names, rather than explicit addresses, avoided
151 the requirement that would otherwise exist for users and other
152 hosts to track these multiple host numbers and addresses and the
153 topological considerations for selecting one over others.
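
The three purposes listed above can be sketched in a few lines of
Python. This is only an illustration under simplifying assumptions: it
uses an in-memory mapping rather than the actual host table file
format, and the class name `HostTable`, its methods, the host name, and
the addresses (taken from documentation ranges) are all hypothetical.

```python
from typing import Dict, List


class HostTable:
    """Toy host table: one name maps to one or more addresses."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[str]] = {}

    def set_addresses(self, name: str, addresses: List[str]) -> None:
        # Host names are compared without regard to case.
        self._entries[name.upper()] = list(addresses)

    def lookup(self, name: str) -> List[str]:
        # Return a copy so callers cannot alter the table.
        return list(self._entries.get(name.upper(), []))


table = HostTable()

# A host with two kinds of connectivity: one name, two addresses.
table.set_addresses("EXAMPLE-HOST", ["192.0.2.10", "198.51.100.10"])
first = table.lookup("example-host")

# Renumbering after a topology change: users keep the same name.
table.set_addresses("EXAMPLE-HOST", ["203.0.113.7"])
second = table.lookup("Example-Host")
```

Users and programs only ever handle the name; which address is current,
and how many there are, is the table's concern.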
155 After several years of using the host table approach, the community
concluded that the model did not scale adequately and that it would not
157 adequately support new service variations. A number of discussions
158 and meetings were held which drew several ideas and incomplete
159 proposals together. The DNS was the result of that effort. It
160 continued to evolve during the design and initial implementation
161 period, with a number of documents recording the changes (see
162 [RFC819], [RFC830], and [RFC1034]).
175 The goals for the DNS included:
177 o Preservation of the capabilities of the host table arrangements
178 (especially unique, unambiguous, host names),
o Provision for adding further services (e.g., the special
181 record types for electronic mail routing which quickly followed
182 introduction of the DNS), and
184 o Creation of a robust, hierarchical, distributed, name lookup
185 system to accomplish the other goals.
187 The DNS design also permitted distribution of name administration,
188 rather than requiring that each host be entered into a single,
189 central, table by a central administration.
191 1.2 Review of the DNS and Its Role as Designed
193 The DNS was designed to identify network resources. Although there
194 was speculation about including, e.g., personal names and email
195 addresses, it was not designed primarily to identify people, brands,
196 etc. At the same time, the system was designed with the flexibility
197 to accommodate new data types and structures, both through the
198 addition of new record types to the initial "INternet" class, and,
199 potentially, through the introduction of new classes. Since the
200 appropriate identifiers and content of those future extensions could
201 not be anticipated, the design provided that these fields could
contain any (binary) information, not just the restricted text forms
of the host table.
205 However, the DNS, as it is actually used, is intimately tied to the
applications and application protocols that utilize it, often at a
fairly low level.
209 In particular, despite the ability of the protocols and data
210 structures themselves to accommodate any binary representation, DNS
211 names as used were historically not even unrestricted ASCII, but a
212 very restricted subset of it, a subset that derives from the original
213 host table naming rules. Selection of that subset was driven in part
214 by human factors considerations, including a desire to eliminate
215 possible ambiguities in an international context. Hence character
216 codes that had international variations in interpretation were
217 excluded, the underscore character and case distinctions were
218 eliminated as being confusing (in the underscore's case, with the
219 hyphen character) when written or read by people, and so on. These
220 considerations appear to be very similar to those that resulted in
221 similarly restricted character sets being used as protocol elements
222 in many ITU and ISO protocols (cf. [X29]).
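
The restricted subset described above can be checked mechanically. The
following is a minimal sketch, assuming the later, relaxed form of the
rules that permits a label to begin with a digit, and adding the
63-octet per-label limit of the DNS. The names `is_ldh_label` and
`same_name` are illustrative and not taken from any specification.

```python
import re

# Letters, digits, and hyphen only; no leading or trailing hyphen.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")

MAX_LABEL_OCTETS = 63  # per-label length limit in the DNS


def is_ldh_label(label: str) -> bool:
    """Return True if the label follows the letter-digit-hyphen rules."""
    if len(label) > MAX_LABEL_OCTETS:
        return False
    return _LDH_LABEL.fullmatch(label) is not None


def same_name(a: str, b: str) -> bool:
    """Compare two LDH names; case distinctions are not significant."""
    return a.lower() == b.lower()
```

Under these rules "host-1" is acceptable, while "my_host" (underscore)
and "joe's" (apostrophe) are not, and "Example.COM" and "example.com"
name the same thing.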
231 Another assumption was that there would be a high ratio of physical
232 hosts to second level domains and, more generally, that the system
233 would be deeply hierarchical, with most systems (and names) at the
234 third level or below and a very large percentage of the total names
235 representing physical hosts. There are domains that follow this
236 model: many university and corporate domains use fairly deep
237 hierarchies, as do a few country-oriented top level domains
238 ("ccTLDs"). Historically, the "US." domain has been an excellent
239 example of the deeply hierarchical approach. However, by 1998,
240 comparison of several efforts to survey the DNS showed a count of SOA
241 records that approached (and may have passed) the number of distinct
242 hosts. Looked at differently, we appear to be moving toward a
243 situation in which the number of delegated domains on the Internet is
244 approaching or exceeding the number of hosts, or at least the number
245 of hosts able to provide services to others on the network. This
246 presumably results from synonyms or aliases that map a great many
247 names onto a smaller number of hosts. While experience up to this
248 time has shown that the DNS is robust enough -- given contemporary
249 machines as servers and current bandwidth norms -- to be able to
250 continue to operate reasonably well when those historical assumptions
are not met (e.g., with a flat structure under ".COM" containing
252 well over ten million delegated subdomains [COMSIZE]), it is still
253 useful to remember that the system could have been designed to work
254 optimally with a flat structure (and very large zones) rather than a
255 deeply hierarchical one, and was not.
257 Similarly, despite some early speculation about entering people's
258 names and email addresses into the DNS directly (e.g., see
259 [RFC1034]), electronic mail addresses in the Internet have preserved
260 the original, pre-DNS, "user (or mailbox) at location" conceptual
261 format rather than a flatter or strictly dot-separated one.
262 Location, in that instance, is a reference to a host. The sole
exception, at least in the "IN" class, has been one field of the SOA
record.
266 Both the DNS architecture itself and the two-level (host name and
267 mailbox name) provisions for email and similar functions (e.g., see
268 the finger protocol [FINGER]), also anticipated a relatively high
269 ratio of users to actual hosts. Despite the observation in RFC 1034
270 that the DNS was expected to grow to be proportional to the number of
271 users (section 2.3), it has never been clear that the DNS was
272 seriously designed for, or could, scale to the order of magnitude of
273 number of users (or, more recently, products or document objects),
274 rather than that of physical hosts.
276 Just as was the case for the host table before it, the DNS provided
277 critical uniqueness for names, and universal accessibility to them,
278 as part of overall "single internet" and "end to end" models (cf.
287 [RFC2826]). However, there are many signs that, as new uses evolved
288 and original assumptions were abused (if not violated outright), the
289 system was being stretched to, or beyond, its practical limits.
291 The original design effort that led to the DNS included examination
292 of the directory technologies available at the time. The design
293 group concluded that the DNS design, with its simplifying assumptions
294 and restricted capabilities, would be feasible to deploy and make
295 adequately robust, which the more comprehensive directory approaches
296 were not. At the same time, some of the participants feared that the
297 limitations might cause future problems; this document essentially
298 takes the position that they were probably correct. On the other
299 hand, directory technology and implementations have evolved
300 significantly in the ensuing years: it may be time to revisit the
301 assumptions, either in the context of the two- (or more) level
302 mechanism contemplated by the rest of this document or, even more
303 radically, as a path toward a DNS replacement.
305 1.3 The Web and User-visible Domain Names
307 From the standpoint of the integrity of the domain name system -- and
308 scaling of the Internet, including optimal accessibility to content
309 -- the web design decision to use "A record" domain names directly in
310 URLs, rather than some system of indirection, has proven to be a
311 serious mistake in several respects. Convenience of typing, and the
312 desire to make domain names out of easily-remembered product names,
313 has led to a flattening of the DNS, with many people now perceiving
314 that second-level names under COM (or in some countries, second- or
315 third-level names under the relevant ccTLD) are all that is
316 meaningful. This perception has been reinforced by some domain name
317 registrars [REGISTRAR] who have been anxious to "sell" additional
318 names. And, of course, the perception that one needed a second-level
319 (or even top-level) domain per product, rather than having names
320 associated with a (usually organizational) collection of network
321 resources, has led to a rapid acceleration in the number of names
322 being registered. That acceleration has, in turn, clearly benefited
323 registrars charging on a per-name basis, "cybersquatters", and others
324 in the business of "selling" names, but it has not obviously
325 benefited the Internet as a whole.
327 This emphasis on second-level domain names has also created a problem
328 for the trademark community. Since the Internet is international,
329 and names are being populated in a flat and unqualified space,
330 similarly-named entities are in conflict even if there would
331 ordinarily be no chance of confusing them in the marketplace. The
332 problem appears to be unsolvable except by a choice between draconian
333 measures. These might include significant changes to the legislation
334 and conventions that govern disputes over "names" and "marks". Or
343 they might result in a situation in which the "rights" to a name are
344 typically not settled using the subtle and traditional product (or
345 industry) type and geopolitical scope rules of the trademark system.
Instead, they would depend largely on political or economic power,
347 e.g., the organization with the greatest resources to invest in
348 defending (or attacking) names will ultimately win out. The latter
349 raises not only important issues of equity, but also the risk of
350 backlash as the numerous small players are forced to relinquish names
351 they find attractive and to adopt less-desirable naming conventions.
353 Independent of these sociopolitical problems, content distribution
354 issues have made it clear that it should be possible for an
355 organization to have copies of data it wishes to make available
356 distributed around the network, with a user who asks for the
357 information by name getting the topologically-closest copy. This is
358 not possible with simple, as-designed, use of the DNS: DNS names
359 identify target resources or, in the case of email "MX" records, a
360 preferentially-ordered list of resources "closest" to a target (not
361 to the source/user). Several technologies (and, in some cases,
362 corresponding business models) have arisen to work around these
363 problems, including intercepting and altering DNS requests so as to
364 point to other locations.
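
The point about "MX" records can be made concrete with a short sketch.
It assumes the usual convention that lower preference values are tried
first; the `MX` type, the function `delivery_order`, and the record
data are hypothetical.

```python
from typing import List, NamedTuple


class MX(NamedTuple):
    preference: int  # lower values are tried first
    exchange: str    # host that accepts mail for the domain


def delivery_order(records: List[MX]) -> List[str]:
    """Order mail exchangers by preference, lowest value first."""
    ordered = sorted(records, key=lambda mx: mx.preference)
    return [mx.exchange for mx in ordered]


# Hypothetical record set published by the owner of example.com.
records = [
    MX(20, "backup.example.net"),
    MX(10, "mail.example.com"),
]

# The querier's location is not an input: every client receives the
# same ordering, which was chosen by the target's administrator.
order = delivery_order(records)
```

Nothing in the function takes the source of the query into account,
which is why the ordering expresses closeness to the target rather than
to the user.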
366 Additional implications are still being discovered and evaluated.
368 Approaches that involve interception of DNS queries and rewriting of
369 DNS names (or otherwise altering the resolution process based on the
370 topological location of the user) seem, however, to risk disrupting
371 end-to-end applications in the general case and raise many of the
372 issues discussed by the IAB in [IAB-OPES]. These problems occur even
373 if the rewriting machinery is accompanied by additional workarounds
374 for particular applications. For example, security associations and
375 applications that need to identify "the same host" often run into
376 problems if DNS names or other references are changed in the network
377 without participation of the applications that are trying to invoke
378 the associated services.
380 1.4 Internet Applications Protocols and Their Evolution
382 At the applications level, few of the protocols in active,
383 widespread, use on the Internet reflect either contemporary knowledge
384 in computer science or human factors or experience accumulated
385 through deployment and use. Instead, protocols tend to be deployed
at a just-past-prototype level, usually including the kinds of
expedient compromises typical of prototypes. If they prove useful,
388 the nature of the network permits very rapid dissemination (i.e.,
389 they fill a vacuum, even if a vacuum that no one previously knew
390 existed). But, once the vacuum is filled, the installed base
399 provides its own inertia: unless the design is so seriously faulty as
400 to prevent effective use (or there is a widely-perceived sense of
401 impending disaster unless the protocol is replaced), future
402 developments must maintain backward compatibility and workarounds for
403 problematic characteristics rather than benefiting from redesign in
404 the light of experience. Applications that are "almost good enough"
405 prevent development and deployment of high-quality replacements.
407 The DNS is both an illustration of, and an exception to, parts of
408 this pessimistic interpretation. It was a second-generation
409 development, with the host table system being seen as at the end of
410 its useful life. There was a serious attempt made to reflect the
411 computing state of the art at the time. However, deployment was much
412 slower than expected (and very painful for many sites) and some fixed
413 (although relaxed several times) deadlines from a central network
414 administration were necessary for deployment to occur at all.
415 Replacing it now, in order to add functionality, while it continues
416 to perform its core functions at least reasonably well, would
417 presumably be extremely difficult.
419 There are many, perhaps obvious, examples of this. Despite many
420 known deficiencies and weaknesses of definition, the "finger" and
421 "whois" [WHOIS] protocols have not been replaced (despite many
422 efforts to update or replace the latter [WHOIS-UPDATE]). The Telnet
423 protocol and its many options drove out the SUPDUP [RFC734] one,
424 which was arguably much better designed for a diverse collection of
425 network hosts. A number of efforts to replace the email or file
426 transfer protocols with models which their advocates considered much
427 better have failed. And, more recently and below the applications
428 level, there is some reason to believe that this resistance to change
429 has been one of the factors impeding IPv6 deployment.
431 2. Signs of DNS Overloading
433 Parts of the historical discussion above identify areas in which the
434 DNS has become overloaded (semantically if not in the mechanical
435 ability to resolve names). Despite this overloading, it appears that
436 DNS performance and reliability are still within an acceptable range:
437 there is little evidence of serious performance degradation. Recent
438 proposals and mechanisms to better respond to overloading and scaling
439 issues have all focused on patching or working around limitations
440 that develop when the DNS is utilized for out-of-design functions,
441 rather than on dramatic rethinking of either DNS design or those
442 uses. The number of these issues that have arisen at much the same
443 time may argue for just that type of rethinking, and not just for
444 adding complexity and attempting to incrementally alter the design
445 (see, for example, the discussion of simplicity in section 2 of
457 o While technical approaches such as larger and higher-powered
458 servers and more bandwidth, and legal/political mechanisms such as
459 dispute resolution policies, have arguably kept the problems from
460 becoming critical, the DNS has not proven adequately responsive to
461 business and individual needs to describe or identify things (such
as product names and names of individuals) other than strict
network resources.
465 o While stacks have been modified to better handle multiple
466 addresses on a physical interface and some protocols have been
467 extended to include DNS names for determining context, the DNS
468 does not deal especially well with many names associated with a
given host (e.g., web hosting facilities with multiple domains on
a server).
472 o Efforts to add names deriving from languages or character sets
473 based on other than simple ASCII and English-like names (see
474 below), or even to utilize complex company or product names
475 without the use of hierarchy, have created apparent requirements
476 for names (labels) that are over 63 octets long. This requirement
477 will undoubtedly increase over time; while there are workarounds
478 to accommodate longer names, they impose their own restrictions
479 and cause their own problems.
481 o Increasing commercialization of the Internet, and visibility of
482 domain names that are assumed to match names of companies or
483 products, has turned the DNS and DNS names into a trademark
484 battleground. The traditional trademark system in (at least) most
485 countries makes careful distinctions about fields of
486 applicability. When the space is flattened, without
487 differentiation by either geography or industry sector, not only
488 are there likely conflicts between "Joe's Pizza" (of Boston) and
489 "Joe's Pizza" (of San Francisco) but between both and "Joe's Auto
490 Repair" (of Los Angeles). All three would like to control
491 "Joes.com" (and would prefer, if it were permitted by DNS naming
492 rules, to also spell it as "Joe's.com" and have both resolve the
493 same way) and may claim trademark rights to do so, even though
conflict or confusion would not occur with traditional trademark
principles.
497 o Many organizations wish to have different web sites under the same
498 URL and domain name. Sometimes this is to create local variations
499 -- the Widget Company might want to present different material to
500 a UK user relative to a US one -- and sometimes it is to provide
501 higher performance by supplying information from the server
502 topologically closest to the user. If the name resolution
511 mechanism is expected to provide this functionality, there are
512 three possible models (which might be combined):
514 - supply information about multiple sites (or locations or
515 references). Those sites would, in turn, provide information
516 associated with the name and sufficient site-specific
attributes to permit the application to make a sensible choice
of destination, or
- accept client-site attributes and utilize them in the search
process, or
- return different answers based on the location or identity of
the requestor.
While there are some tricks that can provide partial simulations of
these types of function, DNS responses cannot be reliably conditioned
in this way.
530 These, and similar, issues of performance or content choices can, of
531 course, be thought of as not involving the DNS at all. For example,
532 the commonly-cited alternate approach of coupling these issues to
533 HTTP content negotiation (cf. [RFC2295]), requires that an HTTP
534 connection first be opened to some "common" or "primary" host so that
535 preferences can be negotiated and then the client redirected or sent
536 alternate data. At least from the standpoint of improving
537 performance by accessing a "closer" location, both initially and
538 thereafter, this approach sacrifices the desired result before the
539 client initiates any action. It could even be argued that some of
540 the characteristics of common content negotiation approaches are
541 workarounds for the non-optimal use of the DNS in web URLs.
543 o Many existing and proposed systems for "finding things on the
544 Internet" require a true search capability in which near matches
545 can be reported to the user (or to some user agent with an
appropriate rule-set) and in which queries may be ambiguous or
547 fuzzy. The DNS, by contrast, can accommodate only one set of
548 (quite rigid) matching rules. Proposals to permit different rules
549 in different localities (e.g., matching rules that are TLD- or
550 zone-specific) help to identify the problem. But they cannot be
551 applied directly to the DNS without either abandoning the desired
552 level of flexibility or isolating different parts of the Internet
553 from each other (or both). Fuzzy or ambiguous searches are
554 desirable for resolution of names that might have spelling
555 variations and for names that can be resolved into different sets
556 of glyphs depending on context. Especially when
557 internationalization is considered, variant name problems go
558 beyond simple differences in representation of a character or
567 ordering of a string. Instead, avoiding user astonishment and
568 confusion requires consideration of relationships such as
569 languages that can be written with different alphabets, Kanji-
570 Hiragana relationships, Simplified and Traditional Chinese, etc.
571 See [Seng] for a discussion and suggestions for addressing a
572 subset of these issues in the context of characters based on
573 Chinese ones. But that document essentially illustrates the
574 difficulty of providing the type of flexible matching that would
575 be anticipated by users; instead, it tries to protect against the
576 worst types of confusion (and opportunities for fraud).
578 o The historical DNS, and applications that make assumptions about
how it works, impose significant risk (or force technical kludges
580 and consequent odd restrictions), when one considers adding
581 mechanisms for use with various multi-character-set and
582 multilingual "internationalization" systems. See the IAB's
583 discussion of some of these issues [RFC2825] for more information.
585 o In order to provide proper functionality to the Internet, the DNS
586 must have a single unique root (the IAB provides more discussion
587 of this issue [RFC2826]). There are many desires for local
588 treatment of names or character sets that cannot be accommodated
589 without either multiple roots (e.g., a separate root for
590 multilingual names, proposed at various times by MINC [MINC] and
591 others), or mechanisms that would have similar effects in terms of
592 Internet fragmentation and isolation.
594 o For some purposes, it is desirable to be able to search not only
595 an index entry (labels or fully-qualified names in the DNS case),
but also its values or targets (DNS data). One might, for example,
597 want to locate all of the host (and virtual host) names which
598 cause mail to be directed to a given server via MX records. The
599 DNS does not support this capability (see the discussion in
600 [IQUERY]) and it can be simulated only by extracting all of the
601 relevant records (perhaps by zone transfer if the source permits
602 doing so, but that permission is becoming less frequently
603 available) and then searching a file built from those records.
605 o Finally, as additional types of personal or identifying
606 information are added to the DNS, issues arise with protection of
607 that information. There are increasing calls to make different
608 information available based on the credentials and authorization
609 of the source of the inquiry. As with information keyed to site
610 locations or proximity (as discussed above), the DNS protocols
make providing these differentiated services quite difficult if
not impossible.
623 In each of these cases, it is, or might be, possible to devise ways
to trick the DNS into supporting mechanisms that were not
625 designed into it. Several ingenious solutions have been proposed in
626 many of these areas already, and some have been deployed into the
627 marketplace with some success. But the price of each of these
628 changes is added complexity and, with it, added risk of unexpected
629 and destabilizing problems.
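
One of the simulations mentioned in the list above, finding every name
whose MX records point at a given server, might look like the following
sketch. It assumes the relevant records have already been extracted
into a local list (for instance by zone transfer, where the source
permits one); the function `index_by_exchange` and the record data are
hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (owner name, mail exchanger) pairs, as if extracted from zone data.
MxRecord = Tuple[str, str]


def index_by_exchange(records: Iterable[MxRecord]) -> Dict[str, List[str]]:
    """Build an inverted index: mail exchanger -> sorted owner names."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for owner, exchange in records:
        # Names are compared without regard to case.
        index[exchange.lower()].add(owner.lower())
    return {exchange: sorted(owners) for exchange, owners in index.items()}


extracted = [
    ("example.com", "mail.example.net"),
    ("example.org", "MAIL.example.net"),
    ("example.edu", "mx.example.edu"),
]
by_exchange = index_by_exchange(extracted)
served = by_exchange.get("mail.example.net", [])
```

The search runs entirely against the extracted copy, not against the
DNS, so the answer is only as complete and as current as that copy.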
631 Several of the above problems are addressed well by a good directory
632 system (supported by the LDAP protocol or some protocol more
633 precisely suited to these specific applications) or searching
634 environment (such as common web search engines) although not by the
635 DNS. Given the difficulty of deploying new applications discussed
636 above, an important question is whether the tricks and kludges are
637 bad enough, or will become bad enough as usage grows, that new
638 solutions are needed and can be deployed.
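
The difference between DNS-style matching and the near-match searching
that a directory or search environment can offer is sketched below.
This is an illustration only: the directory contents are hypothetical,
and the similarity measure is the one provided by Python's standard
`difflib` module, chosen for convenience rather than because any
directory system uses it.

```python
import difflib
from typing import Dict, List

# Hypothetical directory: display name -> DNS name.
DIRECTORY: Dict[str, str] = {
    "joe's pizza": "joespizza.example",
    "joe's auto repair": "joesauto.example",
    "widget company": "widget.example",
}


def exact_lookup(query: str) -> List[str]:
    """DNS-style matching: the whole string matches or nothing does."""
    key = query.lower()
    return [key] if key in DIRECTORY else []


def near_matches(query: str, limit: int = 3, cutoff: float = 0.6) -> List[str]:
    """Search-style matching: report the closest entries, best first."""
    return difflib.get_close_matches(
        query.lower(), list(DIRECTORY), n=limit, cutoff=cutoff
    )
```

A query for "joes pizza" finds nothing under exact matching, while the
near-match search reports "joe's pizza" as its best candidate and
leaves the final choice to the user or a user agent.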
640 3. Searching, Directories, and the DNS
3.1 Overview
644 The constraints of the DNS and the discussion above suggest the
645 introduction of an intermediate protocol mechanism, referred to below
646 as a "search layer" or "searchable system". The terms "directory"
647 and "directory system" are used interchangeably with "searchable
648 system" in this document, although the latter is far more precise.
649 Search layer proposals would use a two (or more) stage lookup, not
650 unlike several of the proposals for internationalized names in the
651 DNS (see section 4), but all operations but the final one would
652 involve searching other systems, rather than looking up identifiers
653 in the DNS itself. As explained below, this would permit relaxation
654 of several constraints, leading to a more capable and comprehensive
655 overall system.
657 Ultimately, many of the issues with domain names arise as the result
658 of efforts to use the DNS as a directory. While, at the time this
659 document was written, sufficient pressure or demand had not occurred
660 to justify a change, it was already quite clear that, as a directory
661 system, the DNS is a good deal less than ideal. This document
662 suggests that there actually is a requirement for a directory system,
663 and that the right solution to a searchable system requirement is a
664 searchable system, not a series of DNS patches, kludges, or
665 workarounds.
679 The following points illustrate particular aspects of this
680 conclusion.
682 o A directory system would not require imposition of particular
683 length limits on names.
685 o A directory system could permit explicit association of
686 attributes, e.g., language and country, with a name, without
687 having to utilize trick encodings to incorporate that information
688 in DNS labels (or creating artificial hierarchy for doing so).
690 o There is considerable experience (albeit not much of it very
691 successful) in doing fuzzy and "soundex" (similar-sounding) matching
692 in directory systems. Moreover, it is plausible to think about
693 different matching rules for different areas and sets of names so
694 that these can be adapted to local cultural requirements.
695 Specifically, it might be possible to have a single form of a name
696 in a directory, but to have great flexibility about what queries
697 matched that name (and even have different variations in different
698 areas). Of course, the more flexibility that a system provides,
699 the greater the possibility of real or imagined trademark
700 conflicts. But the opportunity would exist to design a directory
701 structure that dealt with those issues in an intelligent way,
702 while DNS constraints almost certainly make a general and
703 equitable DNS-only solution impossible.
705 o If a directory system is used to translate to DNS names, and then
706 DNS names are looked up in the normal fashion, it may be possible
707 to relax several of the constraints that have been traditional
708 (and perhaps necessary) with the DNS. For example, reverse-
709 mapping of addresses to directory names may not be a requirement
710 even if mapping of addresses to DNS names continues to be, since
711 the DNS name(s) would (continue to) uniquely identify the host.
713 o Solutions to multilingual transcription problems that are common
714 in "normal life" (e.g., two-sided business cards to be sure that
715 recipients trying to contact a person can access romanized
716 spellings and numbers if the original language is not
717 comprehensible to them) can be easily handled in a directory
718 system by inserting both sets of entries.
720 o A directory system could be designed that would return, not a
721 single name, but a set of names paired with network-locational
722 information or other context-establishing attributes. This type
723 of information might be of considerable use in resolving the
724 "nearest (or best) server for a particular named resource"
735 problems that are a significant concern for organizations hosting
736 web and other sites that are accessed from a wide range of
737 locations and subnets.
739 o Names bound to countries and languages might help to manage
740 trademark realities, while, as discussed in section 1.3 above, use
741 of the DNS in trademark-significant contexts tends to require
742 worldwide "flattening" of the trademark system.
744 Many of these issues are a consequence of another property of the
745 DNS: names must be unique across the Internet. The need to have a
746 system of unique identifiers is fairly obvious (see [RFC2826]).
747 However, if that requirement were to be eliminated in a search or
748 directory system that was visible to users instead of the DNS, many
749 difficult problems -- of both an engineering and a policy nature --
750 would be likely to vanish.
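The first stage of such a lookup might be sketched as below. This is
a minimal illustration under stated assumptions, not a proposed
design: the Entry record, its attribute names, the sample data, and
the search function are all hypothetical. The point is only that
user-visible names need not be unique, because attributes narrow the
result set and the DNS name remains the unique identifier.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    name: str        # user-visible name; need not be unique
    country: str     # attribute, e.g. "CA"
    language: str    # attribute, e.g. "fr"
    dns_name: str    # unique identifier looked up in the DNS

# Hypothetical directory contents.
DIRECTORY = [
    Entry("Acme", "US", "en", "acme.example.com"),
    Entry("Acme", "FR", "fr", "acme.example.net"),
    Entry("Montr\u00e9al Tours", "CA", "fr", "tours.example.org"),
]

def search(query, country=None, language=None):
    """Stage 1: return every entry matching the query and attributes."""
    wanted = query.casefold()
    return [
        e for e in DIRECTORY
        if e.name.casefold() == wanted
        and (country is None or e.country == country)
        and (language is None or e.language == language)
    ]

# Two entries share a name; an attribute selects between them.
print([e.dns_name for e in search("acme")])
print([e.dns_name for e in search("ACME", country="FR")])
```

The second stage, an ordinary DNS lookup of each returned dns_name,
is unchanged from current practice.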
752 3.2 Some Details and Comments
754 Almost any internationalization proposal for names that are in, or
755 map into, the DNS will require changing DNS resolver API calls
756 ("gethostbyname" or equivalent), or adding some pre-resolution
757 preparation mechanism, in almost all Internet applications -- whether
758 to cause the API to take a different character set (no matter how it
759 is then mapped into the bits used in the DNS or another system), to
760 accept or return more arguments with qualifying or identifying
761 information, or otherwise. Once applications must be opened to make
762 such changes, it is a relatively small matter to switch from calling
763 into the DNS to calling a directory service and then the DNS (in many
764 situations, both actions could be accomplished in a single API call).
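A single call that performs both actions could look roughly like the
following sketch. The lookup function and its directory_search
parameter are hypothetical; only socket.gethostbyname is an existing
resolver call, and it is passed in as a default so that it can be
replaced.

```python
import socket

def lookup(query, directory_search, resolve=socket.gethostbyname):
    """Directory search followed by DNS resolution, in one call.

    directory_search is a hypothetical callable mapping a user-level
    query to a list of DNS names; resolve maps a DNS name to an address.
    """
    results = []
    for dns_name in directory_search(query):
        try:
            results.append((dns_name, resolve(dns_name)))
        except OSError:
            # Name did not resolve; skip it and keep other candidates.
            continue
    return results
```

An application that already had to be opened up to replace its
resolver call would call lookup() at the same place, which is why the
incremental cost is described as small.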
766 A directory approach can be consistent both with "flat" models and
767 multi-attribute ones. The DNS requires strict hierarchies, limiting
768 its ability to differentiate among names by their properties. By
769 contrast, modern directories can utilize independently-searched
770 attributes and other structured schema to provide flexibilities not
771 present in a strictly hierarchical system.
773 There is a strong historical argument for a single directory
774 structure (implying a need for mechanisms for registration,
775 delegation, etc.). But a single structure is not a strict
776 requirement, especially if in-depth case analysis and design work
777 leads to the conclusion that reverse-mapping to directory names is
778 not a requirement (see section 5). If a single structure is not
779 needed, then, unlike the DNS, there would be no requirement for a
780 global organization to authorize or delegate operation of portions of
781 the structure.
791 The "no single structure" concept could be taken further by moving
792 away from simple "names" in favor of, e.g., multiattribute,
793 multihierarchical, faceted systems in which most of the facets use
794 restricted vocabularies. (These terms are fairly standard in the
795 information retrieval and classification system literature, see,
796 e.g., [IS5127].) Such systems could be designed to avoid the need
797 for procedures to ensure uniqueness across, or even within, providers
798 and databases of the faceted entities for which the search is to be
799 performed. (See [DNS-Search] for further discussion.)
801 While the discussion above includes very general comments about
802 attributes, it appears that only a very small number of attributes
803 would be needed. The list would almost certainly include country and
804 language for internationalization purposes. It might require
805 "charset" if we cannot agree on a character set and encoding,
806 although there are strong arguments for simply using ISO 10646 (also
807 known as Unicode or "UCS", for Universal Character Set) [UNICODE],
808 [IS10646] coding in interchange. Trademark issues might motivate
809 "commercial" and "non-commercial" (or other) attributes if they would
810 be helpful in bypassing trademark problems. And applications to
811 resource location, such as those contemplated for Uniform Resource
812 Identifiers (URIs) [RFC2396, RFC3305] or the Service Location
813 Protocol [RFC2608], might argue for a few other attributes (as
814 outlined above).
816 4. Internationalization
818 Much of the thinking underlying this document was driven by
819 considerations of internationalizing the DNS or, more specifically,
820 providing access to the functions of the DNS from languages and
821 naming systems that cannot be accurately expressed in the traditional
822 DNS subset of ASCII. Much of the relevant work was done in the
823 IETF's "Internationalized Domain Names" Working Group (IDN-WG),
824 although this document also draws on extensive parallel discussions
825 in other forums. This section contains an evaluation of what was
826 learned as an "internationalized DNS" or "multilingual DNS" was
827 explored and suggests future steps based on that evaluation.
829 When the IDN-WG was initiated, it was obvious to several of the
830 participants that its first important task was an undocumented one:
831 to increase the understanding of the complexities of the problem
832 sufficiently that naive solutions could be rejected and people could
833 go to work on the harder problems. The IDN-WG clearly accomplished
834 that task. The belief that the problems were simple, and faith in
835 the corresponding simplistic approaches and their promises of quick and
836 painless deployment, effectively disappeared as the WG's efforts
837 matured.
847 Some of the lessons learned from increased understanding and the
848 dissipation of naive beliefs should be taken as cautions by the wider
849 community: the problems are not simple. Specifically, extracting
850 small elements for solution, rather than looking at whole systems, may
851 result in obscuring the problems but not solving any problem that is
852 worth the trouble.
854 4.1 ASCII Isn't Just Because of English
856 The hostname rules chosen in the mid-1970s weren't just "ASCII because
857 English uses ASCII", although that was a starting point. We have
858 discovered that almost every other script (and even ASCII if we
859 permit the rest of the characters specified in the ISO 646
860 International Reference Version) is more complex than hostname-
861 restricted-ASCII (the "LDH" form, see section 1.1). And ASCII isn't
862 sufficient to completely represent English -- there are several words
863 in the language that are correctly spelled only with characters or
864 diacritical marks that do not appear in ASCII. With a broader
865 selection of scripts, in some examples, case mapping works from one
866 case to the other but is not reversible. In others, there are
867 conventions about alternate ways to represent characters (in the
868 language, not [only] in character coding) that work most of the time,
869 but not always. And there are issues in coding, with Unicode/10646
870 providing different ways to represent the same character
871 ("character", rather than "glyph", is used deliberately here). And,
872 in still others, there are questions as to whether two glyphs
873 "match", which may be a distance-function question, not one with a
874 binary answer. The IETF approach to these problems is to require
875 pre-matching canonicalization (see the "stringprep" discussion
876 below).
878 The IETF has resisted the temptations to either try to specify an
879 entirely new coded character set, or to pick and choose Unicode/10646
880 characters on a per-character basis rather than by using well-defined
881 blocks. While it may appear that a character set designed to meet
882 Internet-specific needs would be very attractive, the IETF has never
883 had the expertise, resources, and representation from critically-
884 important communities to actually take on that job. Perhaps more
885 important, a new effort might have chosen to make some of the many
886 complex tradeoffs differently than the Unicode committee did,
887 producing a code with somewhat different characteristics. But there
888 is no evidence that doing so would produce a code with fewer problems
889 and side-effects. It is much more likely that making tradeoffs
890 differently would simply result in a different set of problems, which
891 would be equally or more difficult.
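Two of the complications described in this section, case mapping that
is not reversible and multiple codings of the same character, can be
observed directly. The sketch below uses Python's standard
unicodedata module; the choice of the German sharp s and of
"Montreal" with an acute accent as test strings is an illustration
only.

```python
import unicodedata

# Case mapping that is not reversible: the German sharp s upper-cases
# to two letters, and lower-casing those does not restore it.
sharp_s = "\u00df"
upper = sharp_s.upper()          # "SS"
round_trip = upper.lower()       # "ss", not the original character

# Two codings of the same character: precomposed e-acute versus
# "e" followed by a combining acute accent.
precomposed = "Montr\u00e9al"
decomposed = "Montre\u0301al"
same_raw = precomposed == decomposed
same_nfc = (unicodedata.normalize("NFC", precomposed)
            == unicodedata.normalize("NFC", decomposed))

print(upper, round_trip, same_raw, same_nfc)   # SS ss False True
```

Neither effect arises with hostname-restricted ASCII, which is part of
why the LDH rules were simpler than they may have appeared.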
903 4.2 The "ASCII Encoding" Approaches
905 While the DNS can handle arbitrary binary strings without known
906 internal problems (see [RFC2181]), some restrictions are imposed by
907 the requirement that text be interpreted in a case-independent way
908 ([RFC1034], [RFC1035]). More important, most Internet applications
909 assume the hostname-restricted "LDH" syntax that is specified in the
910 host table RFCs and as "prudent" in RFC 1035. If those assumptions
911 are not met, many conforming implementations of those applications
912 may exhibit behavior that would surprise implementors and users. To
913 avoid these potential problems, IETF internationalization work has
914 focused on "ASCII-Compatible Encodings" (ACE). These encodings
915 preserve the LDH conventions in the DNS itself. Implementations of
916 applications that have not been upgraded utilize the encoded forms,
917 while newer ones can be written to recognize the special codings and
918 map them into non-ASCII characters. These approaches are, however,
919 not problem-free even if human interface issues are ignored. Among
920 other issues, they rely on what is ultimately a heuristic to
921 determine whether a DNS label is to be considered as an
922 internationalized name (i.e., encoded Unicode) or interpreted as an
923 actual LDH name in its own right. And, while all determinations of
924 whether a particular query matches a stored object are traditionally
925 made by DNS servers, the ACE systems, when combined with the
926 complexities of international scripts and names, require that much of
927 the matching work be separated into a separate, client-side,
928 canonicalization or "preparation" process before the DNS matching
929 mechanisms are invoked [STRINGPREP].
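The ACE idea, and the heuristic it depends on, can be sketched as
follows. Two assumptions are made that are not stated in this
document: the encoding is Punycode as provided by Python's built-in
"punycode" codec, and the marker is the "xn--" prefix chosen in the
later IDNA specification. The preparation step is omitted here, and
the function names to_ace and from_ace are hypothetical.

```python
ACE_PREFIX = "xn--"   # prefix from the later IDNA work; an assumption here

def to_ace(label):
    """Encode one label in an ASCII-compatible form if it needs it."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def from_ace(label):
    """Heuristic decode: any label with the prefix is treated as encoded."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label

print(to_ace("b\u00fccher"))   # xn--bcher-kva
```

An application that has not been upgraded sees only the LDH string
"xn--bcher-kva". The from_ace test is the heuristic in question: a
label that merely happens to begin with the prefix would be treated as
encoded Unicode, whether or not that was intended.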
931 4.3 "Stringprep" and Its Complexities
933 As outlined above, the model for avoiding problems associated with
934 putting non-ASCII names in the DNS and elsewhere evolved into the
935 principle that strings are to be placed into the DNS only after being
936 passed through a string preparation function that eliminates or
937 rejects spurious character codes, maps some characters onto others,
938 performs some sequence canonicalization, and generally creates forms
939 that can be accurately compared. The impact of this process on
940 hostname-restricted ASCII (i.e., "LDH") strings is trivial and
941 essentially adds only overhead. For other scripts, the impact is, of
942 necessity, quite significant.
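The shape of such a preparation function is sketched below using the
table lookups in Python's standard stringprep module. It is a
simplified illustration, not the nameprep profile itself: only two of
the prohibition tables are checked, bidirectional rules are ignored,
and the function name prepare is hypothetical.

```python
import stringprep
import unicodedata

def prepare(label):
    """Nameprep-like preparation: map, normalize, then reject."""
    mapped = []
    for ch in label:
        if stringprep.in_table_b1(ch):
            continue                      # characters mapped to nothing
        mapped.append(stringprep.map_table_b2(ch))  # case folding
    normalized = unicodedata.normalize("NFKC", "".join(mapped))
    for ch in normalized:
        # Only two of the prohibition tables are checked in this sketch.
        if stringprep.in_table_c12(ch) or stringprep.in_table_c22(ch):
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
    return normalized

print(prepare("www-1"))            # LDH input: only case folding applies
print(prepare("Stra\u00dfe"))      # sharp s is folded to "ss"
```

For an LDH string the function does nothing beyond case folding, which
matches the observation that the process is essentially overhead for
hostname-restricted ASCII.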
944 Although the general notion underlying stringprep is simple, the many
945 details are quite subtle and the associated tradeoffs are complex. A
946 design team worked on it for months, with considerable effort placed
947 into clarifying and fine-tuning the protocol and tables. Despite
948 general agreement that the IETF would avoid getting into the business
949 of defining character sets, character codings, and the associated
950 conventions, the group several times considered and rejected special
959 treatment of code positions to more nearly match the distinctions
960 made by Unicode with user perceptions about similarities and
961 differences between characters. But there were intense temptations
962 (and pressures) to incorporate language-specific or country-specific
963 rules. Those temptations, even when resisted, were indicative of
964 parts of the ongoing controversy or of the basic unsuitability of the
965 DNS for fully internationalized names that are visible,
966 comprehensible, and predictable for end users.
968 There have also been controversies about how far one should go in
969 these processes of preparation and transformation and, ultimately,
970 about the validity of various analogies. For example, each of the
971 following operations has been claimed to be similar to case-mapping
972 in ASCII:
974 o stripping of vowels in Arabic or Hebrew
976 o matching of "look-alike" characters such as upper-case Alpha in
977 Greek and upper-case A in Roman-based alphabets
979 o matching of Traditional and Simplified Chinese characters that
980 represent the same words
982 o matching of Serbo-Croatian words whether written in Roman-derived
983 or Cyrillic characters
985 A decision to support any of these operations would have implications
986 for other scripts or languages and would increase the overall
987 complexity of the process. For example, unless language-specific
988 information is somehow available, performing matching between
989 Traditional and Simplified Chinese has impacts on Japanese and Korean
990 uses of the same "traditional" characters (e.g., it would not be
991 appropriate to map Kanji into Simplified Chinese).
993 Even were the IDN-WG's other work to have been abandoned completely
994 or if it were to fail in the marketplace, the stringprep and nameprep
995 work will continue to be extremely useful, both in identifying issues
996 and problem code points and in providing a reasonable set of basic
997 rules. Where problems remain, they are arguably not with nameprep,
998 but with the DNS-imposed requirement that its results, as with all
999 other parts of the matching and comparison process, yield a binary
1000 "match or no match" answer, rather than, e.g., a value on a
1001 similarity scale that can be evaluated by the user or by user-driven
1002 heuristic functions.
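The contrast between a binary answer and a similarity scale can be
made concrete with a small sketch. The scoring function used here,
difflib.SequenceMatcher from the Python standard library, is chosen
only because it is readily available; it is an assumption of this
example and knows nothing about scripts or languages. The function
names are hypothetical.

```python
from difflib import SequenceMatcher

def binary_match(a, b):
    """Binary answer, in the style of the DNS: equal or not equal."""
    return a.lower() == b.lower()

def similarity(a, b):
    """Score in [0.0, 1.0]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()

def ranked(query, names):
    """Candidates ordered from most to least similar to the query."""
    return sorted(names, key=lambda n: similarity(query, n), reverse=True)

print(binary_match("colour", "color"))
print(ranked("colour", ["zebra", "color"]))
```

A search layer could present the ranked candidates to the user, or
apply a threshold, while the DNS can only report that "colour" and
"color" do not match.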
1015 4.4 The Unicode Stability Problem
1017 ISO 10646 basically defines only code points, and not rules for using
1018 or comparing the characters. This is part of a long-standing
1019 tradition with the work of what is now ISO/IEC JTC1/SC2: they have
1020 performed code point assignments and have typically treated the ways
1021 in which characters are used as beyond their scope. Consequently,
1022 they have not dealt effectively with the broader range of
1023 internationalization issues. By contrast, the Unicode Technical
1024 Committee (UTC) has defined, in annexes and technical reports (see,
1025 e.g., [UTR15]), some additional rules for canonicalization and
1026 comparison. Many of those rules and conventions have been factored
1027 into the "stringprep" and "nameprep" work, but it is not
1028 straightforward to make or define them in a fashion that is
1029 sufficiently precise and permanent to be relied on by the DNS.
1031 Perhaps more important, the discussions leading to nameprep also
1032 identified several areas in which the UTC definitions are inadequate,
1033 at least without additional information, to make matching precise and
1034 unambiguous. In some of these cases, the Unicode Standard permits
1035 several alternate approaches, none of which is an exact and obvious
1036 match to DNS needs. That has left these sensitive choices up to the
1037 IETF, which lacks sufficient in-depth expertise, much less any
1038 mechanism for deciding to optimize one language at the expense of
1039 another.
1041 For example, it is tempting to define some rules on the basis of
1042 membership in particular scripts, or for punctuation characters, but
1043 there is no precise definition of what characters belong to which
1044 script or which ones are, or are not, punctuation. The existence of
1045 these areas of vagueness raises two issues: whether trying to do
1046 precise matching at the character set level is actually possible
1047 (addressed below) and whether driving toward more precision could
1048 create issues that cause instability in the implementation and
1049 resolution models for the DNS.
1051 The Unicode definition also evolves. Version 3.2 appeared shortly
1052 after work on this document was initiated. It added some characters
1053 and functionality and included a few minor incompatible code point
1054 changes. The IETF has secured an agreement about constraints on future
1055 changes, but it remains to be seen how that agreement will work out
1056 in practice. The prognosis actually appears poor at this stage,
1057 since UTC chose to ballot a recent possible change which should have
1058 been prohibited by the agreement (the outcome of the ballot is not
1059 relevant, only that the ballot was issued rather than having the
1060 result be a foregone conclusion). However, some members of the
1061 community consider some of the changes between Unicode 3.0 and 3.1
1062 and between 3.1 and 3.2, as well as this recent ballot, to be
1071 evidence of instability, and believe that these instabilities are better
1072 handled in a system that can be more flexible about handling of
1073 characters, scripts, and ancillary information than the DNS.
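That the Unicode definition evolves can be seen by comparing a frozen
Unicode 3.2 database with a current one. The sketch below relies on
Python's unicodedata module, which ships both; the code point chosen,
U+1E9E, is merely one example of a character assigned after version
3.2. It illustrates an addition only, not the incompatible changes
discussed above.

```python
import unicodedata

# U+1E9E (LATIN CAPITAL LETTER SHARP S) was added after Unicode 3.2.
ch = "\u1e9e"

frozen = unicodedata.ucd_3_2_0      # Unicode 3.2.0 database
current = unicodedata               # database shipped with this Python

# "Cn" means unassigned; "Lu" means uppercase letter.
print(frozen.unidata_version, frozen.category(ch))
print(current.unidata_version, current.category(ch))
```

A matching rule fixed against one version must decide what to do with
code points that are unassigned in that version but meaningful to
users of a later one.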
1075 In addition, because the systems implications of internationalization
1076 are considered out of scope in SC2, ISO/IEC JTC1 has assigned some of
1077 those issues to its SC22/WG20 (the Internationalization working group
1078 within the subcommittee that deals with programming languages,
1079 systems, and environments). WG20 has historically dealt with
1080 internationalization issues thoughtfully and in depth, but its status
1081 has several times been in doubt in recent years. However, assignment
1082 of these matters to WG20 increases the risk of eventual ISO
1083 internationalization standards that specify different behavior than
1084 the UTC specifications.
1086 4.5 Audiences, End Users, and the User Interface Problem
1088 Part of what has "caused" the DNS internationalization problem, as
1089 well as the DNS trademark problem and several others, is that we have
1090 stopped thinking about "identifiers for objects" -- which normal
1091 people are not expected to see -- and started thinking about "names"
1092 -- strings that are expected not only to be readable, but to have
1093 linguistically-sensible and culturally-dependent meaning to non-
1094 specialist users.
1096 Within the IETF, the IDN-WG, and sometimes other groups, avoided
1097 addressing the implications of that transition by taking "outside our
1098 scope -- someone else's problem" approaches or by suggesting that
1099 people will just become accustomed to whatever conventions are
1100 adopted. The realities of user and vendor behavior suggest that
1101 these approaches will not serve the Internet community well in the
1102 long term:
1104 o If we want to make it a problem in a different part of the user
1105 interface structure, we need to figure out where it goes in order
1106 to have proof of concept of our solution. Unlike vendors whose
1107 sole [business] model is the selling or registering of names, the
1108 IETF must produce solutions that actually work, in the
1109 applications context as seen by the end user.
1111 o The principle that "they will get used to our conventions and
1112 adapt" is fine if we are writing rules for programming languages
1113 or an API. But the conventions under discussion are not part of a
1114 semi-mathematical system, they are deeply ingrained in culture.
1115 No matter how often an English-speaking American is told that the
1116 Internet requires that the correct spelling of "colour" be used,
1117 he or she isn't going to be convinced. Getting a French-speaker in
1118 Lyon to use exactly the same lexical conventions as a French-
1127 speaker in Quebec in order to accommodate the decisions of the
1128 IETF or of a registrar or registry is just not likely. "Montreal"
1129 is either a misspelling or an anglicization of a similar word with
1130 an acute accent mark over the "e" (i.e., using the Unicode
1131 character U+00E9 or one of its equivalents). But global agreement
1132 on a rule that will determine whether the two forms should match
1133 -- and that won't astonish end users and speakers of one language
1134 or the other -- is as unlikely as agreement on whether
1135 "misspelling" or "anglicization" is the greater travesty.
1137 More generally, it is not clear that the outcome of any conceivable
1138 nameprep-like process is going to be good enough for practical,
1139 user-level, use. In the use of human languages by humans, there are
1140 many cases in which things that do not match are nonetheless
1141 interpreted as matching. The Norwegian/Danish character that appears
1142 in U+00F8 (visually, a lower case 'o' overstruck with a forward
1143 slash) and the "o-umlaut" German character that appears in U+00F6
1144 (visually, a lower case 'o' with diaeresis (or umlaut)) are clearly
1145 different and no matching program should yield an "equal" comparison.
1146 But they are more similar to each other than either of them is to,
1147 e.g., "e". Humans are able to mentally make the correction in
1148 context, and do so easily, and they can be surprised if computers
1149 cannot do so. Worse, there is a Swedish character whose appearance
1150 is identical to the German o-umlaut, and which shares code point
1151 U+00F6, but that, if the languages are known and the sounds of the
1152 letters or meanings of words including the character are considered,
1153 actually should match the Norwegian/Danish use of U+00F8.
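The U+00F8 and U+00F6 example can be sketched in code. The
language-pair table and the matches function below are hypothetical
and not taken from any standard; they only illustrate that the
"right" answer depends on language information that a character-level
comparison does not have.

```python
import unicodedata

O_SLASH = "\u00f8"     # Norwegian/Danish letter
O_UMLAUT = "\u00f6"    # German and Swedish letter

# Hypothetical per-language equivalences; not taken from any standard.
LANGUAGE_EQUIVALENTS = {
    ("sv", "da"): {O_UMLAUT: O_SLASH},
    ("sv", "no"): {O_UMLAUT: O_SLASH},
}

def matches(a, lang_a, b, lang_b):
    """Compare two strings, applying language-pair equivalences if known."""
    table = LANGUAGE_EQUIVALENTS.get((lang_a, lang_b), {})
    mapped = "".join(table.get(ch, ch) for ch in a)
    return mapped == b

print(O_SLASH == O_UMLAUT)                       # False
print(unicodedata.decomposition(O_UMLAUT))       # 006F 0308
print(repr(unicodedata.decomposition(O_SLASH)))  # ''
```

With the languages supplied, matches("\u00f6l", "sv", "\u00f8l", "da")
is true while the same strings labeled as German and Danish do not
match. Without that context, the two code points are simply unequal,
and even decomposition does not relate them.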
1155 This text uses examples in Roman scripts because it is being written
1156 in English and those examples are relatively easy to render. But one
1157 of the important lessons of the discussions about domain name
1158 internationalization in recent years is that problems similar to
1159 those described above exist in almost every language and script.
1160 Each one has its idiosyncrasies, and each set of idiosyncrasies is
1161 tied to common usage and cultural issues that are very familiar in
1162 the relevant group, and often deeply held as cultural values. As
1163 long as a schoolchild in the US can get a bad grade on a spelling
1164 test for using a perfectly valid British spelling, or one in France
1165 or Germany can get a poor grade for leaving off a diacritical mark,
1166 there are issues with the relevant language. Similarly, if children
1167 in Egypt or Israel are taught that it is acceptable to write a word
1168 with or without vowels or stress marks, but that, if those marks are
1169 included, they must be the correct ones, or a user in Korea is
1170 potentially offended or astonished by out-of-order sequences of Jamo,
1171 systems based on character-at-a-time processing and simplistic
1172 matching, with no contextual information, are not going to satisfy
1173 user needs.
1183 Users are demanding solutions that deal with language and culture.
1184 Systems of identifier symbol-strings that serve specialists or
1185 computers are, at best, a solution to a rather different (and, at the
1186 time this document was written, somewhat ill-defined), problem. The
1187 recent efforts have made it ever more clear that, if we ignore the
1188 distinction between the user requirements and narrowly-defined
1189 identifiers, we are solving an insufficient problem. And,
1190 conversely, the approaches that have been proposed to approximate
1191 solutions to the user requirement may be far more complex than simple
1192 identifiers require.
1194 4.6 Business Cards and Other Natural Uses of Natural Languages
1196 Over the last few centuries, local conventions have been established
1197 in various parts of the world for dealing with multilingual
1198 situations. It may be helpful to examine some of these. For
1199 example, if one visits a country where the language is different from
1200 one's own, business cards are often printed on two sides, one side in
1201 each language. The conventions are not completely consistent and the
1202 technique assumes that recipients will be tolerant. Translations of
1203 names or places are attempted in some situations and transliterations
1204 in others. Since it is widely understood that exact translations or
1205 transliterations are often not possible, people typically smile at
1206 errors, appreciate the effort, and move on.
1208 The DNS situation differs from these practices in at least two ways.
1209 Since a global solution is required, the business card would need a
1210 number of sides approximating the number of languages in the world,
1211 which is probably impossible without violating laws of physics. More
1212 important, the opportunities for tolerance don't exist: the DNS
1213 requires an exact match or the lookup fails.
1215 4.7 ASCII Encodings and the Roman Keyboard Assumption
1217 Part of the argument for ACE-based solutions is that they provide an
1218 escape for multilingual environments when applications have not been
1219 upgraded. When an older application encounters an ACE-based name,
1220 the assumption is that the (admittedly ugly) ASCII-coded string will
1221 be displayed and can be typed in. This argument is reasonable from
1222 the standpoint of mixtures of Roman-based alphabets, but may not be
1223 relevant if user-level systems and devices are involved that do not
1224 support the entry of Roman-based characters or which cannot
1225 conveniently render such characters. Such systems are few in the
1226 world today, but the number can reasonably be expected to rise as the
1227 Internet is increasingly used by populations whose primary concern is
1228 with local issues, local information, and local languages. It is,
1239 for example, fairly easy to imagine populations who use Arabic or
1240 Thai scripts and who do not have routine access to scripts or input
1241 devices based on Roman-derived alphabets.
1243 4.8 Intra-DNS Approaches for "Multilingual Names"
1245 It appears, from the cases above and others, that none of the intra-
1246 DNS-based solutions for "multilingual names" are workable. They rest
1247 on too many assumptions that do not appear to be feasible -- that
1248 people will adapt deeply-entrenched language habits to conventions
1249 laid down to make the lives of computers easy; that we can make
1250 "freeze it now, no need for changes in these areas" decisions about
1251 Unicode and nameprep; that ACE will smooth over applications
1252 problems, even in environments without the ability to key or render
1253 Roman-based glyphs (or where user experience is such that such glyphs
1254 cannot easily be distinguished from each other); that the Unicode
1255 Consortium will never decide to repair an error in a way that creates
1256 a risk of DNS incompatibility; that either we can deploy EDNS
1257 [RFC2671] or long names are not really important; that Japanese
1258 and Chinese computer users (and others) will either give up their
1259 local or IS 2022-based character coding solutions (for which addition
1260 of a large fraction of a million new code points to Unicode is almost
1261 certainly a necessary, but probably not sufficient, condition) or
1262 build leakproof and completely accurate boundary conversion
1263 mechanisms; that out of band or contextual information will always be
1264 sufficient for the "map glyph onto script" problem; and so on. In
1265 each case, it is likely that about 80% or 90% of cases will work
1266 satisfactorily, but it is unlikely that such partial solutions will
1267 be good enough. For example, suppose someone can spell her name 90%
1268 correctly, or a company name is matched correctly 80% of the time but
1269 the other 20% of attempts identify a competitor: is either likely to
1270 be considered adequate?
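The ACE and long-name assumptions listed above can be made concrete with a small sketch. It is an illustration only: it assumes Python's built-in "punycode" codec as a stand-in for an ACE, the "xn--" prefix, and the 63-octet label limit of [RFC1035], and it applies no nameprep step. The helper names ace_label and fits_in_dns_label are hypothetical.

```python
MAX_LABEL_OCTETS = 63  # per-label limit from RFC 1035
ACE_PREFIX = "xn--"    # assumption: prefix used to mark an ACE-encoded label


def ace_label(label: str) -> str:
    """Return an ASCII-compatible form of one label (no nameprep applied)."""
    if label.isascii():
        return label  # plain ASCII labels need no encoding
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def fits_in_dns_label(label: str) -> bool:
    """True if the ACE form still fits in a single DNS label."""
    return len(ace_label(label)) <= MAX_LABEL_OCTETS


if __name__ == "__main__":
    # One ASCII label, one Latin label with a diacritical, one long Thai label.
    for label in ("example", "b\u00fccher", "\u0e01" * 70):
        ace = ace_label(label)
        print(len(label), len(ace), fits_in_dns_label(label), ace[:24])
```

The encoded form is never shorter than the number of non-ASCII characters it carries, and it is opaque to a user who cannot read or key Roman-derived characters, so a name that looks reasonable in its own script may either exceed the label limit or be unrecognizable when an application falls back to displaying the ACE string.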
1272 5. Search-based Systems: The Key Controversies
1274 For many years, a common response to requirements to locate people or
1275 resources on the Internet has been to invoke the term "directory".
1276 While an in-depth analysis of the reasons would require a separate
1277 document, the history of failure of these invocations has given
1278 "directory" efforts a bad reputation. The effort proposed here is
1279 different from those predecessors for several reasons, perhaps the
1280 most important of which is that it focuses on a fairly-well-
1281 understood set of problems and needs, rather than on finding uses for
1282 a particular technology.
1284 As suggested in some of the text above, it is an open question as to
1285 whether the needs of the community would be best served by a single
1286 (even if functionally, and perhaps administratively, distributed)
1295 directory with universal applicability, a single directory that
1296 supports locally-tailored search (and, most important, matching)
1297 functions, or multiple, locally-determined, directories. Each has
1298 its attractions. Any but the first would essentially prevent
1299 reverse-mapping (determination of the user-visible name of the host
1300 or resource from target information such as an address or DNS name).
1301 But reverse mapping has become less useful over the years -- at least
1302 to users -- as more and more names have been associated with many
1303 host addresses and as CIDR [CIDR] has proven problematic for mapping
1304 smaller address blocks to meaningful names.
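The difficulty with smaller address blocks can be sketched as follows. This is a minimal illustration, assuming IPv4 and the conventional octet-by-octet layout of IN-ADDR.ARPA; the function names are hypothetical, and the address block is one reserved for documentation.

```python
import ipaddress


def reverse_name(address: str) -> str:
    """Name under which a PTR record for this address would be looked up."""
    return ipaddress.ip_address(address).reverse_pointer


def enclosing_reverse_zone(block: str) -> str:
    """Smallest octet-aligned in-addr.arpa zone that contains an IPv4 block."""
    net = ipaddress.IPv4Network(block)
    whole_octets = net.prefixlen // 8
    labels = str(net.network_address).split(".")[:whole_octets]
    return ".".join(list(reversed(labels)) + ["in-addr", "arpa"])


def has_own_reverse_zone(block: str) -> bool:
    """True only when the block ends on an octet boundary."""
    return ipaddress.IPv4Network(block).prefixlen % 8 == 0


if __name__ == "__main__":
    for block in ("192.0.2.0/24", "192.0.2.64/26"):
        print(block, enclosing_reverse_zone(block), has_own_reverse_zone(block))
    print(reverse_name("192.0.2.70"))
```

Both blocks fall in the same reverse zone, so the holder of the /26 has no zone of its own in which to publish names unless additional conventions, such as those of RFC 2317 (listed under [CIDR]), are applied.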
1306 Locally-tailored searches and mappings would permit national
1307 variations on interpretation of which strings matched which other
1308 ones, an arrangement that is especially important when different
1309 localities apply different rules to, e.g., matching of characters
1310 with and without diacriticals. But, of course, this implies that a
1311 URL may evaluate properly or not depending on either settings on a
1312 client machine or the network connectivity of the user. That is not,
1313 in general, a desirable situation, since it implies that users could
1314 not, in the general case, share URLs (or other host references) and
1315 that a particular user might not be able to carry references from one
1316 host or location to another.
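The effect of locally-tailored matching on shared references can be sketched as follows. This is an illustration under stated assumptions, not a proposed matching algorithm: the two localities and their policies are hypothetical, and the diacritical handling simply removes Unicode combining marks after canonical decomposition.

```python
import unicodedata

# Hypothetical per-locality policy: does matching ignore diacritical marks?
LOCALITY_IGNORES_DIACRITICS = {"locality-a": True, "locality-b": False}


def strip_diacritics(text: str) -> str:
    """Remove combining marks, e.g. map e-acute to plain e."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = (ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", "".join(kept))


def names_match(query: str, stored: str, locality: str) -> bool:
    """Compare two names under the matching rules of one locality."""
    query = unicodedata.normalize("NFC", query.casefold())
    stored = unicodedata.normalize("NFC", stored.casefold())
    if LOCALITY_IGNORES_DIACRITICS[locality]:
        query, stored = strip_diacritics(query), strip_diacritics(stored)
    return query == stored


if __name__ == "__main__":
    for locality in LOCALITY_IGNORES_DIACRITICS:
        print(locality, names_match("resume", "r\u00e9sum\u00e9", locality))
```

The same typed string finds the stored name under one locality's rules and fails under the other's, which is the situation in which a reference copied from one user or location to another may stop working.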
1318 And, of course, completely separate directories would permit
1319 translation and transliteration functions to be embedded in the
1320 directory, giving much of the Internet a different appearance
1321 depending on which directory was chosen. The attractions of this are
1322 obvious, but, unless things were very carefully designed to preserve
1323 uniqueness and precise identities at the right points (which may or
1324 may not be possible), such a system would have many of the
1325 difficulties associated with multiple DNS roots.
1327 Finally, a system of separate directories and databases, if coupled
1328 with removal of the DNS-imposed requirement for unique names, would
1329 largely eliminate the need for a single worldwide authority to manage
1330 the top of the naming hierarchy.
1332 6. Security Considerations
1334 The set of proposals implied by this document suggests an interesting
1335 set of security issues (i.e., nothing important is ever easy). A
1336 directory system used for locating network resources would presumably
1337 need to be as carefully protected against unauthorized changes as the
1338 DNS itself. There also might be new opportunities for problems in an
1339 arrangement involving two or more (sub)layers, especially if such a
1340 system were designed without central authority or uniqueness of
1341 names. It is uncertain how much greater those risks would be as
1342 compared to a DNS lookup sequence that involved looking up one name,
1351 getting back information, and then doing additional lookups
1352 potentially in different subtrees. That multistage lookup will often
1353 be the case with, e.g., NAPTR records [NAPTR] unless additional
1354 restrictions are imposed. But additional steps, systems, and
1355 databases almost certainly involve some additional risks of
1360 7.1 Normative References
1364 7.2 Explanatory and Informative References
1366 [Albitz] Any of the editions of Albitz, P. and C. Liu, DNS and
1367 BIND, O'Reilly and Associates, 1992, 1997, 1998, 2001.
1369 [ASCII] American National Standards Institute (formerly United
1370 States of America Standards Institute), X3.4, 1968,
1371 "USA Code for Information Interchange". ANSI X3.4-1968
1372 has been replaced by newer versions with slight
1373 modifications, but the 1968 version remains definitive
1374 for the Internet. Some time after ASCII was first
1375 formulated as a standard, ISO adopted international
1376 standard 646, which uses ASCII as a base. IS 646
1377 actually contained two code tables: an "International
1378 Reference Version" (often referenced as ISO 646-IRV)
1379 which was essentially identical to the ASCII of the
1380 time, and a "Basic Version" (ISO 646-BV), which
1381 designates a number of character positions for
1384 [CIDR] Fuller, V., Li, T., Yu, J. and K. Varadhan, "Classless
1385 Inter-Domain Routing (CIDR): an Address Assignment and
1386 Aggregation Strategy", RFC 1519, September 1993.
1388 Eidnes, H., de Groot, G. and P. Vixie, "Classless IN-
1389 ADDR.ARPA delegation", RFC 2317, March 1998.
1391 [COM-SIZE] Size information supplied by Verisign Global Registry
1392 Services (the zone administrator, or "registry
1393 operator", for COM, see [REGISTRAR], below) to ICANN,
1396 [DNS-Search] Klensin, J., "A Search-based access model for the
1397 DNS", Work in Progress.
1407 [FINGER] Zimmerman, D., "The Finger User Information Protocol",
1408 RFC 1288, December 1991.
1410 Harrenstien, K., "NAME/FINGER Protocol", RFC 742,
1413 [IAB-OPES] Floyd, S. and L. Daigle, "IAB Architectural and Policy
1414 Considerations for Open Pluggable Edge Services", RFC
1417 [IQUERY] Lawrence, D., "Obsoleting IQUERY", RFC 3425, November
1420 [IS646] ISO/IEC 646:1991 Information technology -- ISO 7-bit
1421 coded character set for information interchange
1423 [IS10646] ISO/IEC 10646-1:2000 Information technology --
1424 Universal Multiple-Octet Coded Character Set (UCS) --
1425 Part 1: Architecture and Basic Multilingual Plane and
1426 ISO/IEC 10646-2:2001 Information technology --
1427 Universal Multiple-Octet Coded Character Set (UCS) --
1428 Part 2: Supplementary Planes
1430 [MINC] The Multilingual Internet Names Consortium,
1431 http://www.minc.org/ has been an early advocate for
1432 the importance of expansion of DNS names to
1433 accommodate non-ASCII characters. Some of their
1434 specific proposals, while helping people to understand
1435 the problems better, were not compatible with the
1438 [NAPTR] Mealling, M. and R. Daniel, "The Naming Authority
1439 Pointer (NAPTR) DNS Resource Record", RFC 2915,
1442 Mealling, M., "Dynamic Delegation Discovery System
1443 (DDDS) Part One: The Comprehensive DDDS", RFC 3401,
1446 Mealling, M., "Dynamic Delegation Discovery System
1447 (DDDS) Part Two: The Algorithm", RFC 3402, October
1450 Mealling, M., "Dynamic Delegation Discovery System
1451 (DDDS) Part Three: The Domain Name System (DNS)
1452 Database", RFC 3403, October 2002.
1463 [REGISTRAR] In an early stage of the process that created the
1464 Internet Corporation for Assigned Names and Numbers
1465 (ICANN), a "Green Paper" was released by the US
1466 Government. That paper introduced new terminology
1467 and some concepts not needed by traditional DNS
1468 operations. The term "registry" was applied to the
1469 actual operator and database holder of a domain
1470 (typically at the top level, since the Green Paper was
1471 little concerned with anything else), while
1472 organizations that marketed names and made them
1473 available to "registrants" were known as "registrars".
1474 In the classic DNS model, the function of "zone
1475 administrator" encompassed both registry and registrar
1476 roles, although that model did not anticipate a
1477 commercial market in names.
1479 [RFC625] Kudlick, M. and E. Feinler, "On-line hostnames
1480 service", RFC 625, March 1974.
1482 [RFC734] Crispin, M., "SUPDUP Protocol", RFC 734, October 1977.
1484 [RFC811] Harrenstien, K., White, V. and E. Feinler, "Hostnames
1485 Server", RFC 811, March 1982.
1487 [RFC819] Su, Z. and J. Postel, "Domain naming convention for
1488 Internet user applications", RFC 819, August 1982.
1490 [RFC830] Su, Z., "Distributed system for Internet name
1491 service", RFC 830, October 1982.
1493 [RFC882] Mockapetris, P., "Domain names: Concepts and
1494 facilities", RFC 882, November 1983.
1496 [RFC883] Mockapetris, P., "Domain names: Implementation
1497 specification", RFC 883, November 1983.
1499 [RFC952] Harrenstien, K., Stahl, M. and E. Feinler, "DoD
1500 Internet host table specification", RFC 952, October
1503 [RFC953] Harrenstien, K., Stahl, M. and E. Feinler, "HOSTNAME
1504 SERVER", RFC 953, October 1985.
1506 [RFC1034] Mockapetris, P., "Domain names, Concepts and
1507 facilities", STD 13, RFC 1034, November 1987.
1519 [RFC1035] Mockapetris, P., "Domain names - implementation and
1520 specification", STD 13, RFC 1035, November 1987.
1522 [RFC1591] Postel, J., "Domain Name System Structure and
1523 Delegation", RFC 1591, March 1994.
1525 [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS
1526 Specification", RFC 2181, July 1997.
1528 [RFC2295] Holtman, K. and A. Mutz, "Transparent Content
1529 Negotiation in HTTP", RFC 2295, March 1998.
1531 [RFC2396] Berners-Lee, T., Fielding, R. and L. Masinter,
1532 "Uniform Resource Identifiers (URI): Generic Syntax",
1533 RFC 2396, August 1998.
1535 [RFC2608] Guttman, E., Perkins, C., Veizades, J. and M. Day,
1536 "Service Location Protocol, Version 2", RFC 2608, June
1539 [RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)", RFC
1542 [RFC2825] IAB, Daigle, L., Ed., "A Tangled Web: Issues of I18N,
1543 Domain Names, and the Other Internet protocols", RFC
1546 [RFC2826] IAB, "IAB Technical Comment on the Unique DNS Root",
1549 [RFC2972] Popp, N., Mealling, M., Masinter, L. and K. Sollins,
1550 "Context and Goals for Common Name Resolution", RFC
1553 [RFC3305] Mealling, M. and R. Denenberg, Eds., "Report from the
1554 Joint W3C/IETF URI Planning Interest Group: Uniform
1555 Resource Identifiers (URIs), URLs, and Uniform
1556 Resource Names (URNs): Clarifications and
1557 Recommendations", RFC 3305, August 2002.
1559 [RFC3439] Bush, R. and D. Meyer, "Some Internet Architectural
1560 Guidelines and Philosophy", RFC 3439, December 2002.
1562 [Seng] Seng, J., et al., Eds., "Internationalized Domain
1563 Names: Registration and Administration Guideline for
1564 Chinese, Japanese, and Korean", Work in Progress.
1575 [STRINGPREP] Hoffman, P. and M. Blanchet, "Preparation of
1576 Internationalized Strings (stringprep)", RFC 3454,
1579 The particular profile used for placing
1580 internationalized strings in the DNS is called
1581 "nameprep", described in Hoffman, P. and M. Blanchet,
1582 "Nameprep: A Stringprep Profile for Internationalized
1583 Domain Names", Work in Progress.
1585 [TELNET] Postel, J. and J. Reynolds, "Telnet Protocol
1586 Specification", STD 8, RFC 854, May 1983.
1588 Postel, J. and J. Reynolds, "Telnet Option
1589 Specifications", STD 8, RFC 855, May 1983.
1591 [UNICODE] The Unicode Consortium, The Unicode Standard, Version
1592 3.0, Addison-Wesley: Reading, MA, 2000. Update to
1593 version 3.1, 2001. Update to version 3.2, 2002.
1595 [UTR15] Davis, M. and M. Duerst, "Unicode Standard Annex #15:
1596 Unicode Normalization Forms", Unicode Consortium,
1597 March 2002. An integral part of The Unicode Standard,
1598 Version 3.1.1. Available at
1599 (http://www.unicode.org/reports/tr15/tr15-21.html).
1601 [WHOIS] Harrenstien, K., Stahl, M. and E. Feinler,
1602 "NICNAME/WHOIS", RFC 954, October 1985.
1604 [WHOIS-UPDATE] Gargano, J. and K. Weiss, "Whois and Network
1605 Information Lookup Service, Whois++", RFC 1834, August
1608 Weider, C., Fullton, J. and S. Spero, "Architecture of
1609 the Whois++ Index Service", RFC 1913, February 1996.
1611 Williamson, S., Kosters, M., Blacka, D., Singh, J. and
1612 K. Zeilstra, "Referral Whois (RWhois) Protocol V1.5",
1613 RFC 2167, June 1997.
1615 Daigle, L. and P. Faltstrom, "The
1616 application/whoispp-query Content-Type", RFC 2957,
1619 Daigle, L. and P. Faltstrom, "The application/whoispp-
1620 response Content-type", RFC 2958, October 2000.
1631 [X29] International Telecommunications Union, "Recommendation
1632 X.29: Procedures for the exchange of control
1633 information and user data between a Packet
1634 Assembly/Disassembly (PAD) facility and a packet mode
1635 DTE or another PAD", December 1997.
1639 Many people have contributed to versions of this document or the
1640 thinking that went into it. The author would particularly like to
1641 thank Harald Alvestrand, Rob Austein, Bob Braden, Vinton Cerf, Matt
1642 Crawford, Leslie Daigle, Patrik Faltstrom, Eric A. Hall, Ted Hardie,
1643 Paul Hoffman, Erik Nordmark, and Zita Wenzel for making specific
1644 suggestions and/or challenging the assumptions and presentation of
1645 earlier versions and suggesting ways to improve them.
1650 1770 Massachusetts Ave, #322
1653 EMail: klensin+srch@jck.com
1655 A mailing list has been initiated for discussion of the topics
1656 covered in this document, and closely-related issues, at
1657 ietf-irnss@lists.elistx.com. See http://lists.elistx.com/archives/
1658 for subscription and archival information.
1687 10. Full Copyright Statement
1689 Copyright (C) The Internet Society (2003). All Rights Reserved.
1691 This document and translations of it may be copied and furnished to
1692 others, and derivative works that comment on or otherwise explain it
1693 or assist in its implementation may be prepared, copied, published
1694 and distributed, in whole or in part, without restriction of any
1695 kind, provided that the above copyright notice and this paragraph are
1696 included on all such copies and derivative works. However, this
1697 document itself may not be modified in any way, such as by removing
1698 the copyright notice or references to the Internet Society or other
1699 Internet organizations, except as needed for the purpose of
1700 developing Internet standards in which case the procedures for
1701 copyrights defined in the Internet Standards process must be
1702 followed, or as required to translate it into languages other than
1705 The limited permissions granted above are perpetual and will not be
1706 revoked by the Internet Society or its successors or assigns.
1708 This document and the information contained herein is provided on an
1709 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
1710 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
1711 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
1712 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
1713 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
1717 Funding for the RFC Editor function is currently provided by the