Network Working Group                                      R. Daniel
Request for Comments: 2168            Los Alamos National Laboratory
Category: Experimental                                   M. Mealling
                                             Network Solutions, Inc.

             Resolution of Uniform Resource Identifiers
                    using the Domain Name System

Status of this Memo
===================

This memo defines an Experimental Protocol for the Internet
community. This memo does not specify an Internet standard of any
kind. Discussion and suggestions for improvement are requested.
Distribution of this memo is unlimited.

Abstract:
=========

Uniform Resource Locators (URLs) are the foundation of the World Wide
Web, and are a vital Internet technology. However, they have proven
to be brittle in practice. The basic problem is that URLs typically
identify a particular path to a file on a particular host. There is
no graceful way of changing the path or host once the URL has been
assigned. Neither is there a graceful way of replicating the resource
located by the URL to achieve better network utilization and/or fault
tolerance. Uniform Resource Names (URNs) have been hypothesized as an
adjunct to URLs that would overcome such problems. URNs and URLs are
both instances of a broader class of identifiers known as Uniform
Resource Identifiers (URIs).

The requirements document for URN resolution systems [15] defines the
concept of a "resolver discovery service". This document describes
the first, experimental, RDS. It is implemented by a new DNS Resource
Record, NAPTR (Naming Authority PoinTeR), that provides rules for
mapping parts of URIs to domain names. By changing the mapping
rules, we can change the host that is contacted to resolve a URI.
This will allow a more graceful handling of URLs over long time
periods, and forms the foundation for a new proposal for Uniform
Resource Names.

Daniel & Mealling             Experimental                     [Page 1]

RFC 2168          Resolution of URIs Using the DNS           June 1997

In addition to locating resolvers, the NAPTR provides for other
naming systems to be grandfathered into the URN world, provides
independence between the name assignment system and the resolution
protocol system, and allows multiple services (Name to Location, Name
to Description, Name to Resource, ...) to be offered. In conjunction
with the SRV RR, the NAPTR record allows those services to be
replicated for the purposes of fault tolerance and load balancing.

Introduction:
=============

Uniform Resource Locators have been a significant advance in
retrieving Internet-accessible resources. However, their brittle
nature over time has been recognized for several years. The Uniform
Resource Identifier working group proposed the development of Uniform
Resource Names to serve as persistent, location-independent
identifiers for Internet resources in order to overcome most of the
problems with URLs. RFC-1737 [1] sets forth requirements on URNs.

During the lifetime of the URI-WG, a number of URN proposals were
generated. The developers of several of those proposals met in a
series of meetings, resulting in a compromise known as the Knoxville
framework. The major principle behind the Knoxville framework is
that the resolution system must be separate from the way names are
assigned. This is in marked contrast to most URLs, which identify the
host to contact and the protocol to use. Readers are referred to [2]
for background on the Knoxville framework and for additional
information on the context and purpose of this proposal.

Separating the way names are resolved from the way they are
constructed provides several benefits. It allows multiple naming
approaches and resolution approaches to compete, as it allows
different protocols and resolvers to be used. There is just one
problem with such a separation - how do we resolve a name when it
can't give us directions to its resolver?

For the short term, DNS is the obvious candidate for the resolution
framework, since it is widely deployed and understood. However, it is
not appropriate to use DNS to maintain information on a per-resource
basis. First of all, DNS was never intended to handle that many
records. Second, the limited record size is inappropriate for catalog
information. Third, domain names are not appropriate as URNs.

Therefore our approach is to use DNS to locate "resolvers" that can
provide information on individual resources, potentially including
the resource itself. To accomplish this, we "rewrite" the URI into a
domain name following the rules provided in NAPTR records. Rewrite
rules provide considerable power, which is important when trying to
meet the goals listed above. However, collections of rules can become
difficult to understand. To lessen this problem, the NAPTR rules are
*always* applied to the original URI, *never* to the output of
previous rules.

Locating a resolver through the rewrite procedure may take multiple
steps, but the beginning is always the same. The start of the URI is
scanned to extract its colon-delimited prefix. (For URNs, the prefix
is always "urn:" and we extract the following colon-delimited
namespace identifier [3]). NAPTR resolution begins by taking the
extracted string, appending the well-known suffix ".urn.net", and
querying the DNS for NAPTR records at that domain name. Based on the
results of this query, zero or more additional DNS queries may be
needed to locate resolvers for the URI. The details of the
conversation between the client and the resolver thus located are
outside the bounds of this draft. Three brief examples of this
procedure are given in the next section.
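
The first step described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
function name first_lookup_key is hypothetical, and lower-casing the
extracted prefix is a choice made here because DNS names compare
case-insensitively, not something this document requires.

```python
def first_lookup_key(uri: str, suffix: str = "urn.net") -> str:
    """Return the first domain name to query for NAPTR records.

    Hypothetical helper: extracts the colon-delimited prefix of the URI
    (or, for URNs, the namespace identifier that follows "urn:") and
    appends the well-known suffix.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or not scheme:
        raise ValueError("URI has no colon-delimited prefix")
    if scheme.lower() == "urn":
        # For URNs the interesting part is the namespace identifier.
        nid, sep, _ = rest.partition(":")
        if not sep or not nid:
            raise ValueError("URN has no namespace identifier")
        prefix = nid
    else:
        prefix = scheme
    # Assumption: lower-case the prefix, since DNS is case-insensitive.
    return f"{prefix.lower()}.{suffix}"


key = first_lookup_key("urn:duns:002372413:annual-report-1997")
```

The returned name is only where resolution starts; everything after
that depends on the NAPTR records found there.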

The NAPTR RR provides the level of indirection needed to keep the
naming system independent of the resolution system, its protocols,
and services. Coupled with the new SRV resource record proposal [4]
there is also the potential for replicating the resolver on multiple
hosts, overcoming some of the most significant problems of URLs. This
is an important and subtle point. Not only do the NAPTR and SRV
records allow us to replicate the resource, we can replicate the
resolvers that know about the replicated resource. Preventing a
single point of failure at the resolver level is a significant
benefit. Separating the resolution procedure from the way names are
constructed has additional benefits. Different resolution procedures
can be used over time, and resolution procedures that are determined
to be useful can be extended to deal with additional namespaces.

The NAPTR proposal is the first resolution procedure to be considered
by the URN-WG. There are several concerns about the proposal which
have motivated the group to recommend it for publication as an
Experimental rather than a standards-track RFC.

First, URN resolution is new to the IETF and we wish to gain
operational experience before recommending any procedure for the
standards track. Second, the NAPTR proposal is based on DNS and
consequently inherits concerns about security and administration. The
recent advancement of the DNSSEC and secure update drafts to Proposed
Standard reduces these concerns, but we wish to experiment with those
new capabilities in the context of URN administration. A third area
of concern is the potential for a noticeable impact on the DNS. We
believe that the proposal makes appropriate use of caching and
additional information, but it is best to go slow where the potential
for impact on a core system like the DNS is concerned. Fourth, the
rewrite rules in the NAPTR proposal are based on regular expressions.
Since regular expressions are difficult for humans to construct
correctly, concerns exist about the usability and maintainability of
the rules. This is especially true where international character sets
are concerned. Finally, the URN-WG is developing a requirements
document for URN Resolution Services [15], but that document is not
complete. That document needs to precede any resolution service
proposals on the standards track.

"Must" or "Shall" - Software that does not behave in the manner that
              this document says it must is not conformant to this
              document.
"Should"    - Software that does not follow the behavior that this
              document says it should may still be conformant, but is
              probably broken in some fundamental way.
"May"       - Implementations may or may not provide the described
              behavior, while still remaining conformant to this
              document.

Brief overview and examples of the NAPTR RR:
============================================

A detailed description of the NAPTR RR will be given later, but to
give a flavor for the proposal we first give a simple description of
the record and three examples of its use.

The key fields in the NAPTR RR are order, preference, service, flags,
regexp, and replacement:

* The order field specifies the order in which records MUST be
  processed when multiple NAPTR records are returned in response to a
  single query. A naming authority may have delegated a portion of
  its namespace to another agency. Evaluating the NAPTR records in
  the correct order is necessary for delegation to work properly.

* The preference field specifies the order in which records SHOULD be
  processed when multiple NAPTR records have the same value of
  "order". This field lets a service provider specify the order in
  which resolvers are contacted, so that more capable machines are
  contacted in preference to less capable ones.

* The service field specifies the resolution protocol and resolution
  service(s) that will be available if the rewrite specified by the
  regexp or replacement fields is applied. Resolution protocols are
  the protocols used to talk with a resolver. They will be specified
  in other documents, such as [5]. Resolution services are operations
  such as N2R (URN to Resource), N2L (URN to URL), N2C (URN to URC),
  etc. These will be discussed in the URN Resolution Services
  document [6], and their behavior in a particular resolution protocol
  will be given in the specification for that protocol (see [5] for a
  concrete example).

* The flags field contains modifiers that affect what happens in the
  next DNS lookup, typically for optimizing the process. Flags may
  also affect the interpretation of the other fields in the record;
  therefore, clients MUST skip NAPTR records which contain an unknown
  flag value.

* The regexp field is one of two fields used for the rewrite rules,
  and is the core concept of the NAPTR record. The regexp field is a
  String containing a sed-like substitution expression. (The actual
  grammar for the substitution expressions is given later in this
  draft). The substitution expression is applied to the original URN
  to determine the next domain name to be queried. The regexp field
  should be used when the domain name to be generated is conditional
  on information in the URI. If the next domain name is always known,
  which is anticipated to be a common occurrence, the replacement
  field should be used instead.

* The replacement field is the other field that may be used for the
  rewrite rule. It is an optimization of the rewrite process for the
  case where the next domain name is fixed instead of being
  conditional on the content of the URI. The replacement field is a
  domain name (subject to compression if a DNS sender knows that a
  given recipient is able to decompress names in this RR type's RDATA
  field). If the rewrite is more complex than a simple substitution
  of a domain name, the replacement field should be set to . and the
  regexp field used.

Note that the client applies all the substitutions and performs all
lookups; they are not performed in the DNS servers. Note also that it
is the belief of the developers of this document that regexps should
rarely be used. The replacement field seems adequate for the vast
majority of situations. Regexps are only necessary when portions of a
namespace are to be delegated to different resolvers. Finally, note
that the regexp and replacement fields are, at present, mutually
exclusive. However, developers of client software should be aware
that a new flag might be defined which requires values in both
fields.
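
As a rough illustration of how a client might hold these six fields
and honor the order and preference rules, consider the Python sketch
below. The Naptr class and the processing_order function are
hypothetical names introduced for this example; the sample values are
the DUNS records used in the first example of this document.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Naptr:
    """Hypothetical container for the six key NAPTR fields."""
    order: int
    preference: int
    flags: str
    service: str
    regexp: str
    replacement: str


def processing_order(records: Iterable[Naptr]) -> List[Naptr]:
    """Sort with "order" most significant, then "preference".

    Low numbers come first in both fields.
    """
    return sorted(records, key=lambda r: (r.order, r.preference))


records = [
    Naptr(100, 30, "s", "http+N2L+N2C+N2R", "", "http.tcp.isi.dandb.com"),
    Naptr(100, 10, "s", "dunslink+N2L+N2C", "", "dunslink.udp.isi.dandb.com"),
    Naptr(100, 20, "s", "rcds+N2C", "", "rcds.udp.isi.dandb.com"),
]
ordered = processing_order(records)
```

Sorting on the (order, preference) pair keeps the MUST-level ordering
and the SHOULD-level preference in a single pass; a client remains
free to skip preferred records whose protocol it does not speak.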

Consider a URN that uses the hypothetical DUNS namespace. DUNS
numbers are identifiers for approximately 30 million registered
businesses around the world, assigned and maintained by Dun &
Bradstreet. The URN might look like:

  urn:duns:002372413:annual-report-1997

The first step in the resolution process is to find out about the
DUNS namespace. The namespace identifier, "duns", is extracted from
the URN, prepended to urn.net, and the NAPTRs for duns.urn.net are
looked up. It might return records of the form:

  duns.urn.net
  ;;       order pref flags service            regexp replacement
  IN NAPTR  100   10   "s"  "dunslink+N2L+N2C"  ""    dunslink.udp.isi.dandb.com
  IN NAPTR  100   20   "s"  "rcds+N2C"          ""    rcds.udp.isi.dandb.com
  IN NAPTR  100   30   "s"  "http+N2L+N2C+N2R"  ""    http.tcp.isi.dandb.com

The order field contains equal values, indicating that no name
delegation order has to be followed. The preference field indicates
that the provider would like clients to use the special dunslink
protocol, followed by the RCDS protocol, and that HTTP is offered as
a last resort. All the records specify the "s" flag, which will be
explained momentarily. The service fields say that if we speak
dunslink, we will be able to issue either the N2L or N2C requests to
obtain a URL or a URC (description) of the resource. The Resource
Cataloging and Distribution Service (RCDS) [7] could be used to get a
URC for the resource, while HTTP could be used to get a URL, URC, or
the resource itself. All the records supply the next domain name to
query; none of them needs to be rewritten with the aid of regular
expressions.

The general case might require multiple NAPTR rewrites to locate a
resolver, but eventually we will come to the "terminal NAPTR". Once
we have the terminal NAPTR, our next probe into the DNS will be for a
SRV or A record instead of another NAPTR. Rather than probing for a
non-existent NAPTR record to terminate the loop, the flags field is
used to indicate a terminal lookup. If it has a value of "s", the
next lookup should be for SRV RRs; "a" denotes that A records should
be sought. A "p" flag is also provided to indicate that the next
action is Protocol-specific, but that looking up another NAPTR will
not be part of it.
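
The effect of the flags field on the next DNS probe can be summarized
in a small Python function. This is a sketch, not a normative
algorithm: the name next_query_type is hypothetical, and it assumes a
flags field holding at most one of the three flags described above.

```python
from typing import Optional


def next_query_type(flags: str) -> Optional[str]:
    """Return the RR type for the next DNS lookup.

    Returns None when no further DNS query is needed ("p" flag).
    Raises ValueError for anything this sketch does not understand;
    a real client would skip such a record instead.
    """
    flag = flags.lower()  # the case of flag characters is not significant
    if flag == "":
        return "NAPTR"    # not terminal: keep following rewrite rules
    if flag == "s":
        return "SRV"      # terminal: look up SRV records next
    if flag == "a":
        return "A"        # terminal: look up address records next
    if flag == "p":
        return None       # terminal: the rest is protocol-specific
    raise ValueError(f"unknown flags value: {flags!r}")


dunslink_next = next_query_type("s")
```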

Since our example RR specified the "s" flag, it was terminal.
Assuming our client does not know the dunslink protocol, our next
action is to look up SRV RRs for rcds.udp.isi.dandb.com, which will
tell us hosts that can provide the necessary resolution service. That
lookup might return:

  ;;                             Pref Weight Port Target
  rcds.udp.isi.dandb.com  IN SRV   0     0   1000 defduns.isi.dandb.com
                          IN SRV   0     0   1000 dbmirror.com.au
                          IN SRV   0     0   1000 ukmirror.com.uk

telling us three hosts that could actually do the resolution, and
giving us the port we should use to talk to their RCDS server. (The
reader is referred to the SRV proposal [4] for the interpretation of
the fields above).

There is opportunity for significant optimization here. We can return
the SRV records as additional information for terminal NAPTRs (and
the A records as additional information for those SRVs). While this
recursive provision of additional information is not explicitly
blessed in the DNS specifications, it is not forbidden, and BIND does
take advantage of it [8]. This is a significant optimization. In
conjunction with a long TTL for *.urn.net records, the average number
of probes to DNS for resolving DUNS URNs would approach one.
Therefore, DNS server implementors SHOULD provide additional
information with NAPTR responses. The additional information will be
either SRV or A records. If SRV records are available, their A
records should be provided as recursive additional information.

Note that the example NAPTR records above are intended to represent
the reply the client will see. They are not quite identical to what
the domain administrator would put into the zone files. For one
thing, the administrator should supply the trailing '.' character on
any FQDNs.

Consider a URN namespace based on MIME Content-Ids. The URN might
look like this:

  urn:cid:199606121851.1@mordred.gatech.edu

(Note that this example is chosen for pedagogical purposes, and does
not conform to the recently-approved CID URL scheme.)

The first step in the resolution process is to find out about the CID
namespace. The namespace identifier, cid, is extracted from the URN,
prepended to urn.net, and the NAPTR for cid.urn.net is looked up. It
might return records of the form:

  cid.urn.net
  ;;       order pref flags service regexp                              replacement
  IN NAPTR  100   10   ""    ""      "/urn:cid:.+@([^\.]+\.)(.*)$/\2/i"  .

We have only one NAPTR response, so ordering the responses is not a
problem. The replacement field is empty, so we check the regexp
field and use the pattern provided there. We apply that regexp to the
entire URN to see if it matches, which it does. The \2 part of the
substitution expression returns the string "gatech.edu". Since the
flags field does not contain "s" or "a", the lookup is not terminal
and our next probe to DNS is for more NAPTR records:

  lookup(query=NAPTR, "gatech.edu").
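
The rewrite step for the CID example can be reproduced with a few
lines of Python. Note the assumptions: Python's re module is used
here as a stand-in for a POSIX Extended Regular Expression engine
(the two agree on this particular pattern), and the function name
cid_next_domain is hypothetical.

```python
import re
from typing import Optional

# ERE portion of the cid.urn.net substitution expression; the "i" flag
# in the record corresponds to re.IGNORECASE here.
CID_ERE = re.compile(r"urn:cid:.+@([^\.]+\.)(.*)$", re.IGNORECASE)


def cid_next_domain(urn: str) -> Optional[str]:
    """Return the expansion of \\2 for a matching URN, else None."""
    match = CID_ERE.search(urn)
    if match is None:
        return None
    # Group 1 is the host label ("mordred."); group 2 is the domain.
    return match.group(2)


next_name = cid_next_domain("urn:cid:199606121851.1@mordred.gatech.edu")
```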

Note that the rule does not extract the full domain name from the
CID; instead it assumes the CID comes from a host and extracts its
domain. While all hosts, such as mordred, could have their very own
NAPTR, maintaining those records for all the machines at a site as
large as Georgia Tech would be an intolerable burden. Wildcards are
not appropriate here since they only return results when there are no
exactly matching names already in the system.

The record returned from the query on "gatech.edu" might look like:

  gatech.edu
  ;;       order pref flags service            regexp replacement
  IN NAPTR  100   50   "s"  "z3950+N2L+N2C"     ""    z3950.tcp.gatech.edu
  IN NAPTR  100   50   "s"  "rcds+N2C"          ""    rcds.udp.gatech.edu
  IN NAPTR  100   50   "s"  "http+N2L+N2C+N2R"  ""    http.tcp.gatech.edu

Continuing with our example, we note that the values of the order and
preference fields are equal in all records, so the client is free to
pick any record. The flags field tells us that these are the last
NAPTR patterns we should see, and after the rewrite (a simple
replacement in this case) we should look up SRV records to get
information on the hosts that can provide the necessary service.

Assuming we prefer the Z39.50 protocol, our lookup might return:

  ;;                           Pref Weight Port Target
  z3950.tcp.gatech.edu  IN SRV   0     0   1000 z3950.gatech.edu
                        IN SRV   0     0   1000 z3950.cc.gatech.edu
                        IN SRV   0     0   1000 z3950.uga.edu

telling us three hosts that could actually do the resolution, and
giving us the port we should use to talk to their Z39.50 server.

Recall that the regular expression used \2 to extract a domain name
from the CID, and \. for matching the literal '.' characters
separating the domain name components. Since '\' is the escape
character, literal occurrences of a backslash must be escaped by
another backslash. For the case of the cid.urn.net record above, the
regular expression entered into the zone file should be
"/urn:cid:.+@([^\\.]+\\.)(.*)$/\\2/i". When the client code actually
receives the record, the pattern will have been converted to
"/urn:cid:.+@([^.]+\.)(.*)$/\2/i".
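
The backslash doubling can be mechanized. The Python sketch below
shows the idea; the function names are hypothetical, and the decoder
is deliberately simplified (it handles "backslash followed by one
character" only, not the decimal \DDD escapes that zone files also
permit).

```python
import re


def to_zone_file(regexp: str) -> str:
    """Double every backslash so it survives zone-file parsing."""
    return regexp.replace("\\", "\\\\")


def from_zone_file(text: str) -> str:
    """Simplified inverse: drop one level of backslash escaping."""
    return re.sub(r"\\(.)", r"\1", text)


# The pattern as the client should see it, and as typed in the zone file.
wire = r"/urn:cid:.+@([^\.]+\.)(.*)$/\2/i"
zone = to_zone_file(wire)
```

Generating the zone-file form from the intended pattern, rather than
typing it by hand, avoids one easy source of mistakes.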

Even if URN systems were in place now, there would still be a
tremendous number of URLs. It should be possible to develop a URN
resolution system that can also provide location independence for
those URLs. This is related to the requirement in [1] to be able to
grandfather in names from other naming systems, such as ISO Formal
Public Identifiers, Library of Congress Call Numbers, ISBNs, ISSNs,
etc.

The NAPTR RR could also be used for URLs that have already been
assigned. Assume we have the URL for a very popular piece of
software that the publisher wishes to mirror at multiple sites around
the world:

  http://www.foo.com/software/latest-beta.exe

We extract the prefix, "http", and look up NAPTR records for
http.urn.net. This might return a record of the form

  http.urn.net IN NAPTR
  ;;  order pref flags service regexp                   replacement
       100   90   ""    ""      "!http://([^/:]+)!\1!i"  .

This expression returns everything after the first double slash and
before the next slash or colon. (We use the '!' character to delimit
the parts of the substitution expression. Otherwise we would have to
use backslashes to escape the forward slashes, and would have a
regexp in the zone file that looked like
"/http:\\/\\/([^\\/:]+)/\\1/i".)

Applying this pattern to the URL extracts "www.foo.com". Looking up
NAPTR records for that might return:

  www.foo.com
  ;;       order pref flags service    regexp replacement
  IN NAPTR  100  100   "s"  "http+L2R"  ""    http.tcp.foo.com
  IN NAPTR  100  100   "s"  "ftp+L2R"   ""    ftp.tcp.foo.com

Looking up SRV records for http.tcp.foo.com would return information
on the hosts that foo.com has designated to be its mirror sites. The
client can then pick one for the user.

The format of the NAPTR RR is given below. The DNS type code for
NAPTR is 35.

  Domain TTL Class Order Preference Flags Service Regexp Replacement

where:

Domain
   The domain name this resource record refers to.
TTL
   Standard DNS Time To Live field
Order
   A 16-bit integer specifying the order in which the NAPTR
   records MUST be processed to ensure correct delegation of
   portions of the namespace over time. Low numbers are processed
   before high numbers, and once a NAPTR is found that "matches"
   a URN, the client MUST NOT consider any NAPTRs with a higher
   value for order.
Preference
   A 16-bit integer which specifies the order in which NAPTR
   records with equal "order" values SHOULD be processed, low
   numbers being processed before high numbers. This is similar
   to the preference field in an MX record, and is used so domain
   administrators can direct clients towards more capable hosts
   or lighter weight protocols.
Flags
   A String giving flags to control aspects of the rewriting and
   interpretation of the fields in the record. Flags are single
   characters from the set [A-Z0-9]. The case of the alphabetic
   characters is not significant.

   At this time only three flags, "S", "A", and "P", are defined.
   "S" means that the next lookup should be for SRV records
   instead of NAPTR records. "A" means that the next lookup
   should be for A records. The "P" flag says that the remainder
   of the resolution shall be carried out in a Protocol-specific
   fashion, and we should not do any more DNS queries.

   The remaining alphabetic flags are reserved. The numeric flags
   may be used for local experimentation. The S, A, and P flags
   are all mutually exclusive, and resolution libraries MAY
   signal an error if more than one is given. (Experimental code
   and code for assisting in the creation of NAPTRs would be more
   likely to signal such an error than a client such as a
   browser). We anticipate that multiple flags will be allowed in
   the future, so implementers MUST NOT assume that the flags
   field can only contain 0 or 1 characters. Finally, if a client
   encounters a record with an unknown flag, it MUST ignore it
   and move to the next record. This test takes precedence even
   over the "order" field. Since flags can control the
   interpretation placed on fields, a novel flag might change the
   interpretation of the regexp and/or replacement fields such
   that it is impossible to determine if a record matched a URN.
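
A client-side check of the flags field along these lines might look
like the following Python sketch. The function name usable_flags and
the strict parameter are hypothetical; strict models the "MAY signal
an error" behavior expected of authoring tools, while the default
models a tolerant client such as a browser.

```python
import re

DEFINED_FLAGS = frozenset("SAP")           # the three flags defined so far
_FLAG_FIELD = re.compile(r"[A-Za-z0-9]*\Z")  # legal flag characters only


def usable_flags(flags: str, strict: bool = False) -> bool:
    """Return True if a record with this flags field may be processed.

    Returns False when the record must be skipped because it carries a
    flag this client does not know (reserved letters, or digits used
    for local experimentation elsewhere).
    """
    if not _FLAG_FIELD.match(flags):
        raise ValueError(f"illegal character in flags field: {flags!r}")
    present = set(flags.upper())           # case is not significant
    if present - DEFINED_FLAGS:
        return False                       # unknown flag: skip the record
    if strict and len(present) > 1:
        raise ValueError("S, A, and P are mutually exclusive")
    return True


keep = [f for f in ["s", "", "x", "A", "7"] if usable_flags(f)]
```

Because the test works on the set of characters present, it makes no
assumption that the field holds at most one character.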
Service
   Specifies the resolution service(s) available down this
   rewrite path. It may also specify the particular protocol that
   is used to talk with a resolver. A protocol MUST be specified
   if the flags field states that the NAPTR is terminal. If a
   protocol is specified, but the flags field does not state that
   the NAPTR is terminal, the next lookup MUST be for a NAPTR.
   The client MAY choose not to perform the next lookup if the
   protocol is unknown, but that behavior MUST NOT be relied
   upon.

   The service field may take any of the values below (using the
   Augmented BNF of RFC 822 [9]):

    service_field = [ [protocol] *("+" rs)]
    protocol      = ALPHA *31ALPHANUM
    rs            = ALPHA *31ALPHANUM
    // The protocol and rs fields are limited to 32
    // characters and must start with an alphabetic.
    // The current set of "known" strings are:
    // protocol   = "rcds" / "thttp" / "hdl" / "rwhois" / "z3950"
    // rs         = "N2L" / "N2Ls" / "N2R" / "N2Rs" / "N2C"
    //            / "N2Ns" / "L2R" / "L2Ns" / "L2Ls" / "L2C"

   i.e. an optional protocol specification followed by 0 or more
   resolution services. Each resolution service is indicated by
   an initial '+' character.

   Note that the empty string is also a valid service field. This
   will typically be seen at the top levels of a namespace, when
   it is impossible to know what services and protocols will be
   offered by a particular publisher within that name space.
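
To make the service field grammar concrete, here is a small Python
parser. It is a sketch only: parse_service_field is a hypothetical
name, the tokens are returned exactly as written (callers should
compare them case-insensitively), and no check is made against the
list of currently "known" strings.

```python
import re
from typing import List, Optional, Tuple

# ALPHA followed by up to 31 ALPHANUM, i.e. at most 32 characters.
_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9]{0,31}\Z")


def parse_service_field(field: str) -> Tuple[Optional[str], List[str]]:
    """Split a service field into (protocol, [resolution services])."""
    if field == "":
        return None, []                    # the empty string is valid
    protocol, *services = field.split("+")
    tokens = services if protocol == "" else [protocol] + services
    for token in tokens:
        if not _TOKEN.match(token):
            raise ValueError(
                f"bad token {token!r} in service field {field!r}")
    return (protocol or None), services


parsed = parse_service_field("http+N2L+N2C+N2R")
```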

   At this time the known protocols are rcds [7], hdl [10] (binary,
   UDP-based protocols), thttp [5] (a textual, TCP-based
   protocol), rwhois [11] (textual, UDP or TCP based), and
   Z39.50 [12] (binary, TCP-based). More will be allowed later.
   The names of the protocols must be formed from the characters
   [a-zA-Z0-9]. Case of the characters is not significant.

   The service requests currently allowed will be described in
   more detail in [6], but in brief they are:

      N2L  - Given a URN, return a URL
      N2Ls - Given a URN, return a set of URLs
      N2R  - Given a URN, return an instance of the resource.
      N2Rs - Given a URN, return multiple instances of the
             resource, typically encoded using
             multipart/alternative.
      N2C  - Given a URN, return a collection of meta-
             information on the named resource. The format of
             this response is the subject of another document.
      N2Ns - Given a URN, return all URNs that are also
             identifiers for the resource.
      L2R  - Given a URL, return the resource.
      L2Ns - Given a URL, return all the URNs that are
             identifiers for the resource.
      L2Ls - Given a URL, return all the URLs for instances of
             the same resource.
      L2C  - Given a URL, return a description of the
             resource.

   The actual format of the service request and response will be
   determined by the resolution protocol, and is the subject for
   other documents (e.g. [5]). Protocols need not offer all
   services. The labels for service requests shall be formed from
   the set of characters [A-Z0-9]. The case of the alphabetic
   characters is not significant.
Regexp
   A STRING containing a substitution expression that is applied
   to the original URI in order to construct the next domain name
   to look up. The grammar of the substitution expression is given
   in the next section.
Replacement
   The next NAME to query for NAPTR, SRV, or A records depending
   on the value of the flags field. As mentioned above, this may
   be compressed.

Substitution Expression Grammar:
================================

The content of the regexp field is a substitution expression. True
sed(1) substitution expressions are not appropriate for use in this
application for a variety of reasons, therefore the contents of the
regexp field MUST follow the grammar below:

  subst_expr = delim-char ere delim-char repl delim-char *flags
  delim-char = "/" / "!" / ... (Any non-digit or non-flag character other
               than backslash '\'. All occurrences of a delim-char in a
               subst_expr must be the same character.)
  ere        = POSIX Extended Regular Expression (see [13], section
               2.8.4)
  repl       = dns_str / backref / repl dns_str / repl backref
  dns_str    = 1*DNS_CHAR
  backref    = "\" 1POS_DIGIT
  flags      = "i"
  DNS_CHAR   = "-" / "0" / ... / "9" / "a" / ... / "z" / "A" / ... / "Z"
  POS_DIGIT  = "1" / "2" / ... / "9" ; 0 is not an allowed backref
               value domain name (see RFC-1123 [14]).

The result of applying the substitution expression to the original
URI MUST result in a string that obeys the syntax for DNS host names
[14]. Since it is possible for the regexp field to be improperly
specified, such that a non-conforming host name can be constructed,
client software SHOULD verify that the result is a legal host name
before making queries on it.
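
One way a client could perform that verification is sketched below in
Python. The function name is_legal_hostname is hypothetical. The
limits used (labels of 1 to 63 letters, digits, or hyphens that
neither start nor end with a hyphen, and at most 253 characters in
total, not counting a trailing dot) are the usual DNS host name rules
and are stated here as this sketch's reading of them.

```python
import re

# One label: starts and ends with a letter or digit, hyphens allowed
# in between, 1 to 63 characters long.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\Z")


def is_legal_hostname(name: str) -> bool:
    """Return True if name looks like a legal DNS host name."""
    if name.endswith("."):
        name = name[:-1]                   # tolerate a fully qualified name
    if not name or len(name) > 253:
        return False
    return all(_LABEL.match(label) for label in name.split("."))


ok = is_legal_hostname("gatech.edu")
```

A rewrite result that fails this check should be treated as a
misconfigured record rather than sent to the DNS.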

Backref expressions in the repl portion of the substitution
expression are replaced by the (possibly empty) string of characters
enclosed by '(' and ')' in the ERE portion of the substitution
expression. N is a single digit from 1 through 9, inclusive. It
specifies the N'th backref expression, the one that begins with the
N'th '(' and continues to the matching ')'. For example, the ERE

  (A(B(C)DE)(F)G)

has backref expressions:

  \1      = ABCDEFG
  \2      = BCDE
  \3      = C
  \4      = F
  \5..\9  = error - no matching subexpression
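
The numbering rule (count opening parentheses from the left) can be
checked quickly in Python, whose re module numbers groups the same
way. The ERE used below, (A(B(C)DE)(F)G) matched against the string
ABCDEFG, has four nested groups; the variable names are arbitrary.

```python
import re

match = re.fullmatch(r"(A(B(C)DE)(F)G)", "ABCDEFG")

# Collect every backref the pattern defines, keyed by its number N.
backrefs = {n: match.group(n) for n in range(1, match.re.groups + 1)}

# Asking for a group the pattern does not define is an error; Python
# reports it as IndexError.
try:
    match.group(5)
    fifth_is_error = False
except IndexError:
    fifth_is_error = True
```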

The "i" flag indicates that the ERE matching SHALL be performed in a
case-insensitive fashion. Furthermore, any backref replacements MAY
be normalized to lower case when the "i" flag is given.

The first character in the substitution expression shall be used as
the character that delimits the components of the substitution
expression. There must be exactly three non-escaped occurrences of
the delimiter character in a substitution expression. Since escaped
occurrences of the delimiter character will be interpreted as
occurrences of that character, digits MUST NOT be used as delimiters.
Backrefs would be confused with literal digits were this allowed.
Similarly, if flags are specified in the substitution expression, the
delimiter character must not also be a flag character.
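
Putting the grammar and the delimiter rules together, a client could
split and apply a substitution expression roughly as in the Python
sketch below. Several assumptions are baked in and should be read as
such: the function names are hypothetical, Python's re module stands
in for a POSIX ERE engine, and the result is taken to be the expanded
repl string alone, which is what the examples in this document
produce.

```python
import re
from typing import Optional, Tuple


def parse_subst_expr(expr: str) -> Tuple[str, str, str]:
    """Split a substitution expression into (ere, repl, flags)."""
    if not expr:
        raise ValueError("empty substitution expression")
    delim = expr[0]
    if delim == "\\" or delim.isdigit() or delim in "iI":
        raise ValueError(f"illegal delimiter {delim!r}")
    parts, current, i = [], [], 1
    while i < len(expr):
        ch = expr[i]
        if ch == "\\" and i + 1 < len(expr):
            nxt = expr[i + 1]
            # An escaped delimiter stands for the delimiter itself;
            # every other escape sequence is kept untouched.
            current.append(nxt if nxt == delim else ch + nxt)
            i += 2
        elif ch == delim:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    if len(parts) != 3:
        raise ValueError("need exactly three unescaped delimiters")
    ere, repl, flags = parts
    if flags.lower() not in ("", "i"):
        raise ValueError(f"unknown flags {flags!r}")
    return ere, repl, flags


def apply_subst_expr(expr: str, uri: str) -> Optional[str]:
    """Return the rewritten name, or None if the ERE does not match."""
    ere, repl, flags = parse_subst_expr(expr)
    m = re.search(ere, uri, re.IGNORECASE if flags else 0)
    if m is None:
        return None

    def expand(ref: "re.Match") -> str:
        n = int(ref.group(1))
        if n > m.re.groups:
            raise ValueError(f"\\{n} has no matching subexpression")
        return m.group(n) or ""

    return re.sub(r"\\([1-9])", expand, repl)


host = apply_subst_expr("!http://([^/:]+)!\\1!i",
                        "http://www.foo.com/software/latest-beta.exe")
```

The same two functions handle the '/'-delimited form, where the
forward slashes inside the ERE have to be escaped.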

Advice to domain administrators:
================================

Beware of regular expressions. Not only are they a pain to get
correct on their own, but there is the previously mentioned
interaction with DNS. Any backslashes in a regexp must be entered
twice in a zone file in order to appear once in a query response.
More seriously, the need for double backslashes has probably not been
tested by all implementors of DNS servers. We anticipate that urn.net
will be the heaviest user of regexps. Only when delegating portions
of namespaces should the typical domain administrator need to use
regexps.

On a related note, beware of interactions with the shell when
manipulating regexps from the command line. Since '\' is a common
escape character in shells, there is a good chance that when you
think you are saying "\\" you are actually saying "\". Similar
caveats apply to characters such as the wildcard character '*'.
The "A" flag allows the next lookup to be for A records rather than
SRV records. Since there is no place for a port specification in the
NAPTR record, when the "A" flag is used the specified protocol must
be running on its default port.
The URN Syntax draft defines a canonical form for each URN, which
816 requires %encoding characters outside a limited repertoire. The
817 regular expressions MUST be written to operate on that canonical
818 form. Since international character sets will end up with extensive
819 use of %encoded characters, regular expressions operating on them
820 will be essentially impossible to read or write by hand.
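
To make the readability concern concrete, here is a small Python sketch. It is only an approximation: urllib.parse.quote is used as a stand-in for the %encoding step, its set of unencoded characters is not identical to the URN syntax rules, and the "example" namespace is a placeholder.

```python
import re
from urllib.parse import quote

def percent_encode(text):
    """UTF-8 percent-encode everything except ASCII letters, digits and "_.-~"."""
    return quote(text, safe="")

nss = percent_encode("café")   # "caf%C3%A9"
urn = "urn:example:" + nss     # "example" is a placeholder namespace

# The rule has to be written against the encoded bytes, not the readable text.
pattern = re.compile(r"^urn:example:caf%C3%A9$", re.IGNORECASE)
print(urn, bool(pattern.match(urn)))
```

A single accented letter already becomes six characters in the pattern. Text in a script with no ASCII letters would consist entirely of such sequences.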
For the edification of implementers, pseudocode for a client routine
using NAPTRs is given below. This code is provided merely as a
convenience; it does not have any weight as a standard way to process
NAPTR records. Also, as is the case with pseudocode, it has never
been executed and may contain logical errors. You have been warned.
// Given a URN, find a host that can resolve it.
findResolver(string URN) {
  // prepend prefix to urn.net
  sprintf(key, "%s.urn.net", extractNS(URN));
  do {
    rewrite_flag = false;
    terminal = false;
    if (key has been seen) {
      quit with a loop detected error
    }
    add key to list of "seens"
    records = lookup(type=NAPTR, key); // get all NAPTR RRs for 'key'

    discard any records with an unknown value in the "flags" field.
    sort NAPTR records by "order" field and "preference" field
        (with "order" being more significant than "preference").
    n_naptrs = number of NAPTR records in response.
    curr_order = records[0].order;
    max_order = records[n_naptrs-1].order;

    // Process current batch of NAPTRs according to "order" field.
    for (j=0; j < n_naptrs && records[j].order <= max_order; j++) {
      if (unknown_flag) // skip this record and go to next one
        continue;
      newkey = rewrite(URN, records[j].replacement, records[j].regexp);
      if (!newkey) // Skip to next record if the rewrite didn't match
        continue;
      // We did do a rewrite, shrink max_order to current value
      // so that delegation works properly
      max_order = records[j].order;
      // Will we know what to do with the protocol and services
      // specified in the NAPTR? If not, try next record.
      if (!isKnownProto(records[j].services)) {
        continue;
      }
      if (!isKnownService(records[j].services)) {
        continue;
      }

      // At this point we have a successful rewrite and we will
      // know how to speak the protocol and request a known
      // resolution service. Before we do the next lookup, check
      // some optimization possibilities.
      flags = records[j].flags;
      // strcasecmp() returns 0 when the two strings are equal
      if (strcasecmp(flags, "S") == 0
       || strcasecmp(flags, "P") == 0
       || strcasecmp(flags, "A") == 0) {
        terminal = true;
        services = records[j].services;
        addnl = any SRV and/or A records returned as additional
                info for records[j].
      }
      key = newkey;
      rewrite_flag = true;
      break;
    }
  } while (rewrite_flag && !terminal);

  // Did we not find our way to a resolver?
  if (!rewrite_flag) {
    report an error
    return NULL;
  }

  // Leave rest to another protocol?
  if (strcasecmp(flags, "P") == 0) {
    return key as host to talk to;
  }

  // If not, keep plugging
  if (!addnl) { // No SRVs came in as additional info, look them up
    srvs = lookup(type=SRV, key);
  }

  sort SRV records by preference, weight, ...
  foreach (SRV record) { // in order of preference
    try contacting srv[j].target using the protocol and one of the
        resolution service requests from the "services" field of the
        last NAPTR record.
    if (successful)
      return (target, protocol, service);
      // Actually we would probably return a result, but this
      // code was supposed to just tell us a good host to talk to.
  }
  die with an "unable to find a host" error;
}
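
For readers who prefer something executable, the NAPTR-following part of the pseudocode can be approximated in Python as below. This is a sketch under several assumptions, not a reference implementation. The Naptr tuple, the lookup callable, the zone data and the example.com names are all made up for illustration. The substitution expression is split naively (no escaped delimiters), Python's re module stands in for a POSIX ERE engine, an empty services field is assumed to place no constraint on the client, and the final SRV or A lookup is left out.

```python
import re
from collections import namedtuple

Naptr = namedtuple("Naptr", "order preference flags services regexp replacement")
KNOWN_FLAGS = {"", "S", "A", "P"}
TERMINAL_FLAGS = {"S", "A", "P"}

def rewrite(uri, rec):
    """Return the next key for this record, or None if it does not match."""
    if not rec.regexp:
        return rec.replacement or None
    delim = rec.regexp[0]
    # Simplification: assumes no escaped delimiters inside the expression.
    ere, repl, flags = rec.regexp[1:].split(delim)
    m = re.search(ere, uri, re.IGNORECASE if "i" in flags.lower() else 0)
    if m is None:
        return None
    # Fill in \1..\9 from the match; the rewrite always starts from the URI.
    return re.sub(r"\\([1-9])", lambda b: m.group(int(b.group(1))) or "", repl)

def find_resolver(uri, first_key, lookup, known_protocols):
    """Follow NAPTR rewrites from first_key until a terminal record is chosen."""
    key, seen = first_key, set()
    while True:
        if key in seen:
            raise RuntimeError("loop detected at " + key)
        seen.add(key)
        records = [r for r in lookup(key) if r.flags.upper() in KNOWN_FLAGS]
        records.sort(key=lambda r: (r.order, r.preference))
        max_order, chosen = None, None
        for rec in records:
            if max_order is not None and rec.order > max_order:
                break  # a lower order already matched; do not look further
            new_key = rewrite(uri, rec)
            if new_key is None:
                continue
            max_order = rec.order
            protocol = rec.services.split("+")[0].lower()
            # Assumption: an empty services field places no constraint.
            if rec.services and protocol not in known_protocols:
                continue
            chosen = (rec, new_key)
            break
        if chosen is None:
            raise LookupError("no usable NAPTR record at " + key)
        rec, key = chosen
        if rec.flags.upper() in TERMINAL_FLAGS:
            return key, rec.flags.upper(), rec.services

# Made-up zone contents, used only to exercise the sketch.
ZONE = {
    "cid.urn.net": [
        Naptr(100, 10, "", "", r"/urn:cid:.+@([^.]+\.)(.*)$/\2/i", ""),
    ],
    "example.com": [
        Naptr(100, 50, "s", "z3950+N2L+N2C", "", "z3950.tcp.example.com"),
        Naptr(100, 20, "s", "http+N2L+N2C+N2R", "", "http.tcp.example.com"),
    ],
}

result = find_resolver("urn:cid:1234@host.example.com", "cid.urn.net",
                       lambda k: ZONE.get(k, []), {"http"})
print(result)
```

The returned key would then be looked up as SRV or A records, or handed to another protocol, depending on the returned flag.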
- A client MUST process multiple NAPTR records in the order
specified by the "order" field; it MUST NOT simply use the first
record that provides a known protocol and service combination.
959 - If a record at a particular order matches the URI, but the
960 client doesn't know the specified protocol and service, the
961 client SHOULD continue to examine records that have the same
962 order. The client MUST NOT consider records with a higher value
963 of order. This is necessary to make delegation of portions of
964 the namespace work. The order field is what lets site
965 administrators say "all requests for URIs matching pattern x go
966 to server 1, all others go to server 2".
967 (A match is defined as:
968 1) The NAPTR provides a replacement domain name
970 2) The regular expression matches the URN
973 - When multiple RRs have the same "order", the client should use
974 the value of the preference field to select the next NAPTR to
975 consider. However, because of preferred protocols or services,
estimates of network distance and bandwidth, etc., clients may
977 use different criteria to sort the records.
978 - If the lookup after a rewrite fails, clients are strongly
979 encouraged to report a failure, rather than backing up to pursue
981 - When a namespace is to be delegated among a set of resolvers,
982 regexps must be used. Each regexp appears in a separate NAPTR
983 RR. Administrators should do as little delegation as possible,
984 because of limitations on the size of DNS responses.
985 - Note that SRV RRs impose additional requirements on clients.
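
The effect of the "order" field on delegation can be shown with a deliberately small Python sketch. It is illustrative only: the name eligible, the tuple layout (order, preference, pattern, target), the urn:foo namespace and the server names are hypothetical, and each record carries a bare regular expression rather than a full substitution expression.

```python
import re

def eligible(uri, records):
    """Return the records a client may consider for uri, best first.

    Only records at the lowest order that matches are eligible. Records
    with a higher order are never considered once a match is found.
    """
    matching = [r for r in records if re.search(r[2], uri)]
    if not matching:
        return []
    lowest = min(r[0] for r in matching)
    same_order = [r for r in matching if r[0] == lowest]
    return sorted(same_order, key=lambda r: r[1])  # then by preference

RECORDS = [
    (200, 10, r"^urn:foo:", "server2.example.com"),            # all others
    (100, 20, r"^urn:foo:reports:", "server1b.example.com"),   # pattern x
    (100, 10, r"^urn:foo:reports:", "server1a.example.com"),   # pattern x
]

for uri in ("urn:foo:reports:42", "urn:foo:memo:7"):
    print(uri, [r[3] for r in eligible(uri, RECORDS)])
```

A URI matching the order-100 pattern never falls through to the order-200 record, even if the client cannot speak the protocols offered at order 100.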
990 The editors would like to thank Keith Moore for all his consultations
991 during the development of this draft. We would also like to thank
992 Paul Vixie for his assistance in debugging our implementation, and
his answers to our questions. Finally, we would like to acknowledge
994 our enormous intellectual debt to the participants in the Knoxville
995 series of meetings, as well as to the participants in the URI and URN
1001 [1] Sollins, Karen and Larry Masinter, "Functional Requirements
1002 for Uniform Resource Names", RFC-1737, Dec. 1994.
1004 [2] The URN Implementors, Uniform Resource Names: A Progress Report,
1005 http://www.dlib.org/dlib/february96/02arms.html, D-Lib Magazine,
1015 [3] Moats, Ryan, "URN Syntax", RFC-2141, May 1997.
1017 [4] Gulbrandsen, A. and P. Vixie, "A DNS RR for specifying
1018 the location of services (DNS SRV)", RFC-2052, October 1996.
1020 [5] Daniel, Jr., Ron, "A Trivial Convention for using HTTP in URN
1021 Resolution", RFC-2169, June 1997.
1023 [6] URN-WG, "URN Resolution Services", Work in Progress.
1025 [7] Moore, Keith, Shirley Browne, Jason Cox, and Jonathan Gettler,
1026 Resource Cataloging and Distribution System, Technical Report
1027 CS-97-346, University of Tennessee, Knoxville, December 1996
1029 [8] Paul Vixie, personal communication.
1031 [9] Crocker, Dave H. "Standard for the Format of ARPA Internet Text
1032 Messages", RFC-822, August 1982.
1034 [10] Orth, Charles and Bill Arms; Handle Resolution Protocol
1035 Specification, http://www.handle.net/docs/client_spec.html
1037 [11] Williamson, S., M. Kosters, D. Blacka, J. Singh, K. Zeilstra,
1038 "Referral Whois Protocol (RWhois)", RFC-2167, June 1997.
1040 [12] Information Retrieval (Z39.50): Application Service Definition
1041 and Protocol Specification, ANSI/NISO Z39.50-1995, July 1995.
1043 [13] IEEE Standard for Information Technology - Portable Operating
1044 System Interface (POSIX) - Part 2: Shell and Utilities (Vol. 1);
1045 IEEE Std 1003.2-1992; The Institute of Electrical and
1046 Electronics Engineers; New York; 1993. ISBN:1-55937-255-9
[14] Braden, R., "Requirements for Internet Hosts - Application and
Support", RFC-1123, Oct. 1989.
1051 [15] Sollins, Karen, "Requirements and a Framework for URN Resolution
1052 Systems", November 1996, Work in Progress.
1071 Security Considerations
1072 =======================
1074 The use of "urn.net" as the registry for URN namespaces is subject to
1075 denial of service attacks, as well as other DNS spoofing attacks. The
1076 interactions with DNSSEC are currently being studied. It is expected
1077 that NAPTR records will be signed with SIG records once the DNSSEC
1080 The rewrite rules make identifiers from other namespaces subject to
1081 the same attacks as normal domain names. Since they have not been
1082 easily resolvable before, this may or may not be considered a
1085 Regular expressions should be checked for sanity, not blindly passed
1086 to something like PERL.
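
What counts as a sanity check is not defined here. As one possible reading, the Python sketch below rejects over-long expressions and control characters and confirms that the expression compiles before it is used. The length limit, the function name compile_checked and the choice of checks are assumptions made for illustration. They do not protect against every pathological pattern.

```python
import re

MAX_REGEX_LENGTH = 255  # assumed limit, not taken from this document

def compile_checked(ere, ignore_case=False):
    """Compile ere only after a few basic checks; raise ValueError otherwise."""
    if len(ere) > MAX_REGEX_LENGTH:
        raise ValueError("regular expression too long")
    if any(ord(c) < 0x20 or ord(c) == 0x7F for c in ere):
        raise ValueError("control character in regular expression")
    try:
        return re.compile(ere, re.IGNORECASE if ignore_case else 0)
    except re.error as exc:
        raise ValueError("malformed regular expression: %s" % exc) from exc

print(bool(compile_checked(r"^urn:(.*)$").match("urn:x")))
```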
1088 This document has discussed a way of locating a resolver, but has not
1089 discussed any detail of how the communication with the resolver takes
1090 place. There are significant security considerations attached to the
1091 communication with a resolver. Those considerations are outside the
1092 scope of this document, and must be addressed by the specifications
1093 for particular resolver communication protocols.
1095 Author Contact Information:
1096 ===========================
1099 Los Alamos National Laboratory
1101 Los Alamos, NM, USA, 87545
1102 voice: +1 505 665 0597
1103 fax: +1 505 665 4939
1104 email: rdaniel@lanl.gov
1109 505 Huntmar Park Drive
1111 voice: (703) 742-0400
1113 email: michaelm@internic.net
1114 URL: http://www.netsol.com/