Network Working Group                                           E. Wilde
Request for Comments: 5147                                   UC Berkeley
Updates: 2046                                                  M. Duerst
Category: Standards Track                       Aoyama Gakuin University
                                                              April 2008


         URI Fragment Identifiers for the text/plain Media Type

Status of This Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Abstract

   This memo defines URI fragment identifiers for text/plain MIME
   entities.  These fragment identifiers make it possible to refer to
   parts of a text/plain MIME entity, either identified by character
   position or range, or by line position or range.  Fragment
   identifiers may also contain information for integrity checks to make
   them more robust.

Wilde & Duerst              Standards Track                     [Page 1]

RFC 5147            text/plain Fragment Identifiers           April 2008

Table of Contents

   1.  Introduction
     1.1.  What Is text/plain?
     1.2.  What Is a URI Fragment Identifier?
     1.3.  Why text/plain Fragment Identifiers?
     1.4.  Incremental Deployment
     1.5.  Notation Used in This Memo
   2.  Fragment Identification Methods
     2.1.  Fragment Identification Principles
       2.1.1.  Positions and Ranges
       2.1.2.  Characters and Lines
     2.2.  Combining the Principles
       2.2.1.  Character Position
       2.2.2.  Character Range
       2.2.3.  Line Position
       2.2.4.  Line Range
     2.3.  Fragment Identifier Robustness
   3.  Fragment Identification Syntax
     3.1.  Integrity Checks
   4.  Fragment Identifier Processing
     4.1.  Handling of Line Endings in text/plain MIME Entities
     4.2.  Handling of Position Values
     4.3.  Handling of Integrity Checks
     4.4.  Syntax Errors in Fragment Identifiers
   5.  Examples
   6.  IANA Considerations
   7.  Security Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Appendix A.  Acknowledgements

1.  Introduction

   This memo updates the text/plain media type defined in RFC 2046 [3]
   by defining URI fragment identifiers for text/plain MIME entities.
   This makes it possible to refer to parts of a text/plain MIME entity.
   Such parts can be identified by either character position or range,
   or by line position or range.  Integrity checking information can be
   added to a fragment identifier to make it more robust, enabling
   applications to detect changes to the entity.

   This section gives an introduction to the general concepts of text/
   plain MIME entities and URI fragment identifiers, and it discusses
   the need for fragment identifiers for text/plain and deployment
   issues.  Section 2 discusses the principles and methods on which this
   memo is based.  Section 3 defines the syntax, and Section 4 discusses
   processing of text/plain fragment identifiers.  Section 5 shows some
   examples.

1.1.  What Is text/plain?

   Internet Media Types (often referred to as "MIME types"), as defined
   in RFC 2045 [2] and RFC 2046 [3], are used to identify different
   types and sub-types of media.  RFC 2046 [3] and RFC 3676 [6] specify
   the text/plain media type, which is used for simple, unformatted
   text.  Quoting from RFC 2046 [3]: "Plain text does not provide for or
   allow formatting commands, font attribute specifications, processing
   instructions, interpretation directives, or content markup.  Plain
   text is seen simply as a linear sequence of characters, possibly
   interrupted by line breaks or page breaks".

   The text/plain media type does not restrict the character encoding;
   any character encoding may be used.  In the absence of an explicit
   character encoding declaration, US-ASCII [13] is assumed as the
   default character encoding.  This variability of the character
   encoding makes it impossible to count characters in a text/plain MIME
   entity without taking the character encoding into account, because
   there are many character encodings using more than one octet per
   character.

   The biggest advantage of text/plain MIME entities is their ease of
   use and their portability among different platforms.  As long as they
   use popular character encodings (such as US-ASCII or UTF-8 [12]),
   they can be displayed and processed on virtually every computer
   system.  The only remaining interoperability issue is the
   representation of line endings, which is discussed in Section 4.1.

1.2.  What Is a URI Fragment Identifier?

   URIs are the identification mechanism for resources on the Web.  The
   URI syntax specified in RFC 3986 [7] optionally includes a so-called
   "fragment identifier", separated by a number sign ('#').  The
   fragment identifier consists of additional reference information to
   be interpreted by the user agent after the retrieval action has been
   successfully completed.  The semantics of a fragment identifier are a
   property of the data resulting from a retrieval action, regardless of
   the type of URI used in the reference.  Therefore, the format and
   interpretation of fragment identifiers are dependent on the media
   type of the retrieval result.
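
   As an illustration that is not part of this memo, the standard
   Python function urllib.parse.urldefrag() splits a URI reference at
   the number sign.  Only the part before '#' is used for retrieval;
   the fragment identifier stays with the client, to be interpreted
   once the media type of the retrieved entity is known.  The URI is
   one of the examples from Section 5.

```python
from urllib.parse import urldefrag

# Illustration only: separate a URI reference into the part used for
# retrieval and the fragment identifier, which is left for the client.
uri = "http://example.com/text.txt#line=10,20"
resource, fragment = urldefrag(uri)

print(resource)  # http://example.com/text.txt
print(fragment)  # line=10,20
```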

   The most popular fragment identifier is defined for text/html
   (defined in RFC 2854 [10]) and makes it possible to refer to a
   specific element (identified by the value of a 'name' or 'id'
   attribute) of an HTML document.  This makes it possible to reference
   a specific part of a Web page, rather than a Web page as a whole.

1.3.  Why text/plain Fragment Identifiers?

   Referring to specific parts of a resource can be very useful because
   it enables users and applications to create more specific references.
   Users can create references to the part they really are interested in
   or want to talk about, rather than always pointing to a complete
   resource.  Even though it is suggested that fragment identification
   methods are specified in a media type's MIME registration (see [15]),
   many media types do not have fragment identification methods
   associated with them.

   Fragment identifiers are only useful if supported by the client,
   because they are only interpreted by the client.  Therefore, a new
   fragment identification method will require some time to be adopted
   by clients, and older clients will not support it.  However, because
   the URI still works even if the fragment identifier is not supported
   (the resource is retrieved, but the fragment identifier is not
   interpreted), rapid adoption is not critical to the success of a new
   fragment identification method.

   Fragment identifiers for text/plain, as defined in this memo, make it
   possible to refer to specific parts of a text/plain MIME entity,
   using concepts of positions and ranges, which may be applied to
   characters and lines.  Thus, text/plain fragment identifiers enable
   users to exchange information more specifically, thereby reducing the
   time and effort that is necessary to manually search for the relevant
   part of a text/plain MIME entity.

   The text/plain format does not support the embedding of links, so in
   most environments, text/plain resources can only serve as targets for
   links, and not as sources.  However, when combining the text/plain
   fragment identifiers specified in this memo with out-of-line linking
   mechanisms such as XLink [14], it becomes possible to "bind" link
   resources to text/plain resources and thereby "embed" links into
   text/plain resources.  Thus, the text/plain fragment identifiers
   specified in this memo open a path for text/plain files to become
   bidirectionally navigable resources in hypermedia systems such as the
   Web.

1.4.  Incremental Deployment

   As long as text/plain fragment identifiers are not supported
   universally, it is important to consider the implications of
   incremental deployment.  Clients (for example, Web browsers) not
   supporting the text/plain fragment identifier described in this memo
   will work with URI references to text/plain MIME entities, but they
   will fail to locate the sub-resource identified by the fragment
   identifier.  This is a reasonable fallback behavior, and in general,
   users should take into account the possibility that a program
   interpreting a given URI will fail to interpret the fragment
   identifier part.  Since fragment identifier evaluation is local to
   the client (and happens after retrieving the MIME entity), there is
   no reliable way for a server to determine whether a requesting client
   is using a URI containing a fragment identifier.

1.5.  Notation Used in This Memo

   The capitalized key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
   "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in RFC
   2119 [4].

2.  Fragment Identification Methods

   The identification of fragments of text/plain MIME entities can be
   based on different foundations.  Since it is not possible to insert
   explicit, invisible identifiers into a text/plain MIME entity (for
   example, as used in HTML documents, implemented through dedicated
   attributes), fragment identification has to rely on certain inherent
   properties of the MIME entity.  This memo specifies fragment
   identification using four methods (character positions, character
   ranges, line positions, and line ranges), augmented by an integrity
   check mechanism for improving the robustness of fragment identifiers.

   When interpreting character or line numbers, implementations MUST
   take the character encoding of the MIME entity into account, because
   character count and octet count may differ for the character encoding
   being used.  For example, a MIME entity using the UTF-16 encoding (as
   specified in RFC 2781 [11]) uses two octets per character in most
   cases, and sometimes four octets per character.  It can also have a
   leading BOM (Byte-Order Mark), which does not count as a character
   and thus also affects the mapping from a simple octet count to a
   character count.
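
   The following Python sketch illustrates the difference between
   octet count and character count; it is not a normative algorithm.
   It relies on Python's "utf-16" codec, which writes a BOM when
   encoding and consumes it when decoding.  The sample text, which
   contains one character outside the Basic Multilingual Plane, is
   arbitrary.

```python
# Three characters: 'A', one non-BMP character (U+10348), and 'B'.
text = "A\U00010348B"

# Python's "utf-16" codec writes a 2-octet BOM first.
octets = text.encode("utf-16")
# BOM (2) + 'A' (2) + surrogate pair (4) + 'B' (2) = 10 octets
print(len(octets))   # 10

# Decoding with the same codec consumes the BOM, so it is not counted.
decoded = octets.decode("utf-16")
print(len(decoded))  # 3
```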

2.1.  Fragment Identification Principles

   Fragment identification can be done by combining two orthogonal
   principles, which are positions and ranges, and characters and lines.
   This section describes the principles themselves, while Section 2.2
   describes the combination of the principles.

2.1.1.  Positions and Ranges

   A position does not identify an actual fragment of the MIME entity,
   but a position inside the MIME entity, which can be regarded as a
   fragment of length zero.  The use case for positions is to provide
   pointers for applications that may use them to implement
   functionalities such as "insert some text here", which needs a
   position rather than a fragment.  Positions are counted from zero,
   with position zero being before the first character or line of a
   text/plain MIME entity.  Thus, a text/plain MIME entity having one
   character has two positions, one before the first character (position
   zero), and one after the first character (position one).

   Since positions are fragments of length zero, applications SHOULD use
   methods other than highlighting to indicate positions, the most
   obvious way being the positioning of a cursor (if the application
   supports the concept of a cursor).

   Ranges, on the other hand, identify fragments of a MIME entity that
   have a length that may be greater than zero.  As a general principle
   for ranges, they specify both a lower and an upper bound.  The start
   or the end of a range specification may be omitted, defaulting to the
   first or last position of the MIME entity, respectively.  The end of
   a range must have a value greater than or equal to the start.  A
   range with identical start and end is legal and identifies a range of
   length zero, which is equivalent to a position.

   Applications that support a concept such as highlighting SHOULD use
   such a concept to indicate fragments of lengths greater than zero to
   the user.

   For positions and ranges, it is implicitly assumed that if a number
   is greater than the actual number of elements in the MIME entity,
   then it refers to the last position of the MIME entity (see
   Section 4.2 for details).

2.1.2.  Characters and Lines

   The concept of positions and ranges can be applied to characters or
   lines.  In both cases, positions indicate points between these
   elements, while ranges identify zero or more of these elements by
   indicating positions.

   Character positions are numbered starting with zero (ignoring an
   initial BOM or similar concepts that are not part of the actual
   textual content of a text/plain MIME entity), and counting each
   character separately, with the exception of line endings, which are
   always counted as one character (see Section 4.1 for details).

   Line positions are numbered starting with zero (with line position
   zero always being identical with character position zero);
   Section 4.1 describes how line endings are identified.  Fragments
   identified by lines include the line endings, so applications
   identifying line-based fragments MUST include the line endings in the
   fragment identification they are using (e.g., the highlighted
   selection).  If a MIME entity does not contain any line endings, then
   it consists of a single (the first) line.
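
   A minimal Python sketch of splitting a MIME entity into lines so
   that every line keeps its line ending.  It assumes the text has
   already been decoded, stripped of any BOM, and normalized so that
   each line ending is a single LF (see Section 4.1).  The function
   name split_lines is hypothetical, and treating an empty entity as
   one empty line is an interpretation of the single-line rule above.

```python
def split_lines(text: str) -> list[str]:
    """Split LF-normalized text into lines, each keeping its line ending."""
    lines = []
    start = 0
    while True:
        end = text.find("\n", start)
        if end == -1:
            break
        lines.append(text[start:end + 1])  # line fragments include the LF
        start = end + 1
    # Text after the last LF (or the whole entity if there is no LF) forms
    # a final line. Assumption: an empty entity is one empty line.
    if start < len(text) or not lines:
        lines.append(text[start:])
    return lines


print(split_lines("one\ntwo\nthree"))  # ['one\n', 'two\n', 'three']
```

   Python's str.splitlines() is avoided in this sketch because it also
   breaks at characters such as form feed and U+2028, which this memo
   does not define as line endings.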

2.2.  Combining the Principles

   In the following sections, the principles described in the preceding
   section (positions/ranges and characters/lines) are combined,
   resulting in four use cases.  The schemes mentioned below refer to
   the fragment identifier syntax, described in detail in Section 3.

2.2.1.  Character Position

   To identify a character position (i.e., a fragment of length zero
   between two characters), the 'char' scheme followed by a single
   number is used.  This method identifies a position between two
   characters (or before the first or after the last character), rather
   than identifying a fragment consisting of a number of characters.
   Character position counting starts with zero, so the character
   position before the first character of a text/plain MIME entity has
   the character position zero, and a MIME entity containing n distinct
   characters has n+1 distinct character positions, the last one having
   the character position n.

2.2.2.  Character Range

   To identify a fragment of one or more characters (a character range),
   the 'char' scheme followed by a range specification is used.  A
   character range is a consecutive region of the MIME entity that
   extends from the starting character position of the range to the
   ending character position of the range.
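
   Because positions lie between characters, a 'char' position maps
   onto a split point and a 'char' range onto a half-open slice.  The
   Python sketch below assumes decoded text without a BOM, with each
   line ending already reduced to one character (see Section 4.1);
   Python strings are indexed by code point, matching the counting
   described in Section 3.1.  The helper names are hypothetical.

```python
def split_at_position(text: str, pos: int) -> tuple[str, str]:
    """char=<pos>: the text before and after the position."""
    return text[:pos], text[pos:]


def char_range(text: str, start: int, end: int) -> str:
    """char=<start>,<end>: positions lie between characters, so the range
    is exactly the half-open slice text[start:end]."""
    return text[start:end]


text = "Hello, world\n"
print(split_at_position(text, 5))  # ('Hello', ', world\n')
print(char_range(text, 7, 12))     # world
print(len(text) + 1)               # 14 positions for 13 characters
```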

2.2.3.  Line Position

   To identify a line position (i.e., a fragment of length zero between
   two lines), the 'line' scheme followed by a single number is used.
   This method identifies a position between two lines (or before the
   first or after the last line), rather than identifying a fragment
   consisting of a number of lines.  Line position counting starts with
   zero, so the line position before the first line of a text/plain MIME
   entity has the line position zero, and a MIME entity containing n
   distinct lines has n+1 distinct line positions, the last one having
   the line position n.

2.2.4.  Line Range

   To identify a fragment of one or more lines (a line range), the
   'line' scheme followed by a range specification is used.  A line
   range is a consecutive region of the MIME entity that extends from
   the starting line position of the range to the ending line position
   of the range.
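
   Line positions can be translated into character positions, after
   which a line range becomes a character slice that includes the line
   endings.  This Python sketch assumes LF-normalized text and position
   values already resolved as described in Section 4.2.  The helper
   names are hypothetical, and an empty entity is treated as one empty
   line, an interpretation of the single-line rule in Section 2.1.2.

```python
def line_offsets(text: str) -> list[int]:
    """Character offsets of line positions 0..n in LF-normalized text."""
    offsets = [0]
    for i, ch in enumerate(text):
        if ch == "\n":
            offsets.append(i + 1)  # a new line starts after each LF
    # A last line without LF (or an entity without any LF, including an
    # empty one) still counts as a line ending at the end of the text.
    if offsets[-1] != len(text) or len(offsets) == 1:
        offsets.append(len(text))
    return offsets


def line_range(text: str, start: int, end: int) -> str:
    """line=<start>,<end>: whole lines, including their line endings."""
    offsets = line_offsets(text)
    return text[offsets[start]:offsets[end]]


text = "one\ntwo\nthree\n"
print(line_offsets(text))            # [0, 4, 8, 14]
print(repr(line_range(text, 1, 2)))  # 'two\n'
```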

2.3.  Fragment Identifier Robustness

   A modification of the referenced resource can easily break a fragment
   identifier.  If applications want to create more robust fragment
   identifiers, they may do so by adding integrity-check information to
   fragment identifiers.  Such information is used to detect changes in
   the resource.  Applications can then warn users about the possibility
   that a fragment identifier might have been broken by a modification
   of the resource.

   Fragment identifiers are interpreted by clients, and therefore
   integrity-check information is defined on MIME entities rather than
   on the resource itself.  This means that the integrity-check
   information is specific to a certain entity.  Specifically, content
   encodings and/or content transfer encodings must be removed before
   using integrity-check information.

   Integrity-check information may specify the character encoding that
   has been used when creating the information, and if such a
   specification is present, clients MUST check whether the character
   encoding specified and the character encoding of the retrieved MIME
   entity are equal, and clients MUST NOT use the integrity check
   information if these values differ.  However, clients MAY choose to
   transcode the retrieved MIME entity in the case of differing
   character encodings, and after doing so, apply integrity checks.
   Please note that this method is inherently unreliable because certain
   characters or character sequences may have been lost or normalized
   due to restrictions in one of the character encodings used.

3.  Fragment Identification Syntax

   The syntax for the text/plain fragment identifiers is
   straightforward.  The syntax defines four schemes: 'char', 'line',
   and the two integrity-check schemes 'length' and 'md5'.  The 'char'
   and 'line' schemes can be used in two different variants, either the
   position variant (with a single number), or the range variant (with
   two comma-separated numbers, one of which may be omitted).  An
   integrity check can either use the 'length' or the 'md5' scheme to
   specify a value.  'length' in this case serves as a very weak but
   easy-to-calculate integrity check.

   The following syntax definition uses ABNF as defined in RFC 5234 [9],
   including the rules DIGIT and HEXDIG.  The mime-charset rule is
   defined in RFC 2978 [5].

   NOTE:  In the descriptions that follow, specified text values MUST be
      used exactly as given, using exactly the indicated lower-case
      letters.  In this respect, the ABNF usage differs from [9].

   text-fragment   =  text-scheme 0*( ";" integrity-check )
   text-scheme     =  ( char-scheme / line-scheme )
   char-scheme     =  "char=" ( position / range )
   line-scheme     =  "line=" ( position / range )
   integrity-check =  ( length-scheme / md5-scheme )
                        [ "," mime-charset ]
   position        =  number
   range           =  ( position "," [ position ] ) / ( "," position )
   number          =  1*( DIGIT )
   length-scheme   =  "length=" number
   md5-scheme      =  "md5=" md5-value
   md5-value       =  32HEXDIG
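
   The grammar can be checked with a small recognizer.  The following
   Python sketch is one possible reading of the ABNF, not a reference
   implementation: the names parse_text_fragment and TextFragment are
   hypothetical, the character class for mime-charset is a
   transcription of the rule in RFC 2978 [5], and the fragment is
   assumed to be percent-decoded already.  It returns None on a syntax
   error so that the caller can ignore the fragment (Section 4.4).
   Integrity checks named neither 'length' nor 'md5' are skipped rather
   than rejected, which is how this sketch interprets the requirement
   in Section 3.1 to ignore new types of integrity checks.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# char-scheme / line-scheme with either a position or a range.
# Matching is case-sensitive, as the NOTE above requires.
_SCHEME_RE = re.compile(r"(char|line)=(?:([0-9]+)|([0-9]+)?,([0-9]+)?)\Z")
# mime-charset characters, transcribed from RFC 2978 (an assumption).
_CHARSET = r"[A-Za-z0-9!#$%&'+\-^_`{}~]+"
_CHECK_RE = re.compile(
    r"(?:length=([0-9]+)|md5=([0-9A-Fa-f]{32}))(?:,(" + _CHARSET + r"))?\Z"
)


@dataclass
class TextFragment:
    scheme: str            # "char" or "line"
    start: Optional[int]   # None: the range starts at the beginning
    end: Optional[int]     # None: the range extends to the end
    is_range: bool
    # (kind, value, charset or None) for each recognized integrity check
    checks: List[Tuple[str, str, Optional[str]]] = field(default_factory=list)


def parse_text_fragment(frag: str) -> Optional[TextFragment]:
    """Parse a text/plain fragment; return None on any syntax error."""
    scheme_part, *check_parts = frag.split(";")
    m = _SCHEME_RE.match(scheme_part)
    if m is None:
        return None
    scheme, pos, lo, hi = m.groups()
    if pos is not None:
        result = TextFragment(scheme, int(pos), int(pos), is_range=False)
    elif lo is None and hi is None:
        return None  # "," alone is not a valid range
    else:
        result = TextFragment(
            scheme,
            int(lo) if lo is not None else None,
            int(hi) if hi is not None else None,
            is_range=True,
        )
    for part in check_parts:
        name, sep, _ = part.partition("=")
        if not sep:
            return None  # not of the form name=value
        if name not in ("length", "md5"):
            continue  # unknown integrity-check type: ignored (Section 3.1)
        c = _CHECK_RE.match(part)
        if c is None:
            return None
        length, md5, charset = c.groups()
        if length is not None:
            result.checks.append(("length", length, charset))
        else:
            result.checks.append(("md5", md5.lower(), charset))
    return result


print(parse_text_fragment("line=10,20;length=9876,UTF-8"))
# TextFragment(scheme='line', start=10, end=20, is_range=True,
#              checks=[('length', '9876', 'UTF-8')])
```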

3.1.  Integrity Checks

   An integrity check can either specify a MIME entity's length, or its
   MD5 fingerprint.  In both cases, it can optionally specify the
   character encoding that has been used when calculating the integrity
   check, so that clients interpreting the fragment identifier may check
   whether they are using the same character encoding for their
   calculations.  For lengths, the character encoding can be necessary
   because it can influence the character count.  As an example, Unicode
   includes precomposed characters for writing Vietnamese, but in the
   windows-1258 encoding, also used for writing Vietnamese, some
   characters have to be encoded with separate diacritics, which means
   that two characters will be counted.  Applying Unicode terminology,
   this means that the length of a text/plain MIME entity is computed
   based on its "code points".  For MD5 fingerprints, the character
   encoding is necessary because the MD5 algorithm works on the binary
   representation of the text/plain resource.
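
   The Vietnamese example can be reproduced with Python's standard
   unicodedata module and its "cp1258" codec (Python's name for
   windows-1258).  The sketch only shows that the letter U+1EBF (LATIN
   SMALL LETTER E WITH CIRCUMFLEX AND ACUTE) counts as one character
   when precomposed but as two in the form windows-1258 requires.

```python
import unicodedata

# The letter U+1EBF as a single precomposed Unicode code point ...
precomposed = "\u1ebf"
# ... and in the form windows-1258 needs: U+00EA followed by a
# separate combining acute accent, U+0301.
separated = "\u00ea\u0301"

print(len(precomposed), len(separated))  # 1 2
# Both spellings are canonically equivalent text.
print(unicodedata.normalize("NFC", separated) == precomposed)  # True

# windows-1258 encodes the two-character form with one octet each ...
print(len(separated.encode("cp1258")))  # 2
# ... but has no single code for the precomposed letter.
try:
    precomposed.encode("cp1258")
except UnicodeEncodeError:
    print("U+1EBF has no windows-1258 code")
```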

   To allow future changes to this specification to address developments
   in cryptography, implementations MUST ignore new types of integrity
   checks, with names other than 'length' and 'md5'.  If several
   integrity checks are present, an application can use whatever
   integrity checks it understands, and among these, those integrity
   checks that provide an appropriate trade-off between performance and
   the need for integrity checking.  Please see Section 4.3 for further
   details.

   The length of a text/plain MIME entity is calculated by using the
   principles defined in Section 2.1.2.  The MD5 fingerprint of a text/
   plain MIME entity is calculated by using the algorithm presented in
   [1], encoding the result in 32 hexadecimal digits (using uppercase or
   lowercase letters) as a representation of the 128 bits that are the
   result of the MD5 algorithm.  Calculation of integrity checks is done
   after stripping any potential content-encodings or content-transfer-
   encodings of the transport mechanism.
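
   A rough Python sketch of computing both values for an entity body.
   It assumes that content encodings and content transfer encodings
   have already been removed and that the charset label is a name known
   to Python's codecs.  The function name integrity_values is
   hypothetical, and line-ending handling is simplified to CR+LF, CR,
   and LF; Section 4.1 lists further conventions.

```python
import hashlib


def integrity_values(entity: bytes, charset: str) -> tuple[int, str]:
    """Return (length, md5) for a text/plain body, transport decoded."""
    # MD5 works on the binary representation of the entity.
    # hexdigest() uses lowercase letters, which this memo permits.
    md5 = hashlib.md5(entity).hexdigest()

    # The length counts characters as described in Section 2.1.2:
    # decode, ignore a leading BOM, and count each line ending once.
    text = entity.decode(charset)
    if text.startswith("\ufeff"):
        text = text[1:]
    # Simplification: only CR+LF, CR, and LF are treated as line endings.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return len(text), md5


length, md5 = integrity_values(b"one\r\ntwo\r\n", "UTF-8")
print(length)    # 8
print(len(md5))  # 32
```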

4.  Fragment Identifier Processing

   Applications implementing support for the mechanism described in this
   memo MUST behave as described in the following sections.

4.1.  Handling of Line Endings in text/plain MIME Entities

   In Internet messages, line endings in text/plain MIME entities are
   represented by CR+LF character sequences (see RFC 2046 [3] and RFC
   3676 [6]).  However, some protocols (such as HTTP) additionally allow
   other conventions for line endings.  Also, some operating systems
   store text/plain entities locally with different line endings (in
   most cases, Unix uses LF, MacOS traditionally uses CR, and Windows
   uses CR+LF).

   Independent of the number of bytes or characters used to represent a
   line ending, each line ending MUST be counted as a single character.
   Implementations interpreting text/plain fragment identifiers MUST
   take into account the line ending conventions of the protocols and
   other contexts that they work in.

   As an example, an implementation working in the context of a Web
   browser supporting http: URIs has to support the various line ending
   conventions permitted by HTTP.  As another example, an implementation
   used on local files (e.g., with the file: URI scheme) has to support
   the conventions used for local storage.  All implementations SHOULD
   support the Internet-wide CR+LF line ending convention, and MAY
   support additional conventions not related to the protocols or
   systems they work with.

   Implementers should be aware of the fact that line endings in plain
   text entities can be represented by characters or character
   sequences other than CR+LF.  Besides the above-mentioned CR and LF,
   there are also NEL and CR+NEL.  In general, the encoding of line
   endings can also depend on the character encoding of the MIME entity,
   and implementations have to take this into account where necessary.
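
   One way to count each line ending as exactly one character is to
   map every recognized line ending to a single LF before counting.
   The following Python sketch works on decoded text and recognizes
   all the conventions named in this section (CR+LF, CR+NEL, CR, LF,
   and NEL).  Which set an implementation should accept depends on the
   protocols and contexts it works in, so this choice is an assumption;
   the function name is hypothetical.

```python
import re

# Two-character sequences are listed first so that CR+LF and CR+NEL are
# each replaced as one line ending rather than as two.
_LINE_ENDING = re.compile("\r\n|\r\u0085|\r|\n|\u0085")


def normalize_line_endings(text: str) -> str:
    """Replace each recognized line ending with a single LF."""
    return _LINE_ENDING.sub("\n", text)


mixed = "a\r\nb\rc\nd\u0085e\r\u0085f"
print(repr(normalize_line_endings(mixed)))  # 'a\nb\nc\nd\ne\nf'
```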

4.2.  Handling of Position Values

   If any position value (as a position or as part of a range) is
   greater than the length of the actual MIME entity, then it identifies
   the last character position or line position of the MIME entity.  If
   the first position value in a range is not present, then the range
   extends from the start of the MIME entity.  If the second position
   value in a range is not present, then the range extends to the end of
   the MIME entity.  If a range scheme's positions are not properly
   ordered (i.e., the first number is greater than the second), then the
   fragment identifier MUST be ignored.
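
   The rules of this section can be combined into a single resolution
   step.  In this Python sketch (the function name is hypothetical), a
   position is passed as a range with equal start and end, None stands
   for an omitted value, and count is the number of characters or lines
   in the MIME entity.  The ordering check is applied to the values as
   written, before they are limited to the last position, so a reversed
   range such as line=30,20 is ignored even for a short entity; this
   sequencing is an interpretation.

```python
from typing import Optional, Tuple


def resolve(start: Optional[int], end: Optional[int],
            count: int) -> Optional[Tuple[int, int]]:
    """Return the (start, end) positions identified in an entity with
    `count` characters or lines, or None if the fragment is ignored."""
    if start is not None and end is not None and start > end:
        return None  # positions not properly ordered
    # An omitted start means the beginning; an omitted end means the end.
    # Values beyond the entity identify its last position.
    s = 0 if start is None else min(start, count)
    e = count if end is None else min(end, count)
    return s, e


print(resolve(10, 20, 15))  # (10, 15)
print(resolve(10, 20, 5))   # (5, 5)
print(resolve(20, 10, 50))  # None
```

   For the line=10,20 example in Section 5, an entity with 15 lines
   yields positions 10 to 15 (lines 11 to 15), and an entity with 5
   lines yields the position after its last line.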

4.3.  Handling of Integrity Checks

   Clients are not required to implement the handling of integrity
   checks, so they MAY choose to ignore integrity check information
   altogether.  However, if they do implement integrity checking, the
   following applies:

   If a fragment identifier contains one or more integrity checks, and a
   client retrieves a MIME entity and, using some integrity check(s),
   detects that the entity has changed (observing the character encoding
   specification as described in Section 3.1, if present), then the
   client SHOULD NOT interpret the text/plain fragment identifier.  A
   client MAY signal this situation to the user.
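
   A client that implements integrity checking might make this decision
   along the lines of the following Python sketch.  The tuple format of
   the checks, the function names, and the use of Python's codec
   registry to decide whether two charset labels name the same encoding
   are assumptions for illustration.  A check labeled with a different
   encoding is simply skipped here, although Section 2.3 also allows
   transcoding the entity first.

```python
import codecs
from typing import Iterable, Optional, Tuple


def same_charset(a: str, b: str) -> bool:
    """Compare charset labels via Python's codec aliases (approximation)."""
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        return a.lower() == b.lower()


def should_interpret(checks: Iterable[Tuple[str, str, Optional[str]]],
                     entity_charset: str, length: int, md5: str) -> bool:
    """Return False if a usable integrity check detects a change, in
    which case the client SHOULD NOT interpret the fragment."""
    for kind, value, charset in checks:
        if charset is not None and not same_charset(charset, entity_charset):
            continue  # MUST NOT use a check made for another encoding
        if kind == "length" and int(value) != length:
            return False
        if kind == "md5" and value.lower() != md5.lower():
            return False
    return True


checks = [("length", "9876", "UTF-8")]
print(should_interpret(checks, "utf-8", 9876, ""))       # True
print(should_interpret(checks, "utf-8", 9875, ""))       # False
print(should_interpret(checks, "ISO-8859-1", 9875, ""))  # True (skipped)
```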
 610    client SHOULD NOT interpret the text/plain fragment identifier.  A
 611    client MAY signal this situation to the user.
 612
 613
 614
 615
 616
 617
 618 Wilde & Duerst              Standards Track                    [Page 11]
 619 \f
 620 RFC 5147            text/plain Fragment Identifiers           April 2008
 621
 622
 623 4.4.  Syntax Errors in Fragment Identifiers
 624
 625    If a fragment identifier contains a syntax error (i.e., does not
 626    conform to the syntax specified in Section 3), then it MUST be
 627    ignored by clients.  Clients MUST NOT make any attempt to correct or
 628    guess fragment identifiers.  Syntax errors MAY be reported by
 629    clients.
 630
 631 5.  Examples
 632
 633    The following examples show some usages for the fragment identifiers
 634    defined in this memo.
 635
 636    http://example.com/text.txt#char=100
 637
 638    This URI identifies the position after the 100th character of the
 639    text.txt MIME entity.  It should be noted that it is not clear which
 640    octet(s) of the MIME entity this will be without retrieving the MIME
 641    entity and thus knowing which character encoding it is using (in case
 642    of HTTP, this information will be given in the Content-Type header of
 643    the response).  If the MIME entity has fewer than 100 characters, the
 644    URI identifies the position after the MIME entity's last character.
 645
 646    http://example.com/text.txt#line=10,20
 647
 648    This URI identifies lines 11 to 20 of the text.txt MIME entity.  If
 649    the MIME entity has fewer than 11 lines, it identifies the position
 650    after the last line.  If the MIME entity has less than 20 but at
 651    least 11 lines, it identifies the range from line 11 to the last line
 652    of the MIME entity.
 653
 654    https://example.com/text.txt#line=,1
 655
 656    This URI identifies the first line.  Please note that the URI scheme
 657    has been changed to https.
 658
 659    ftp://example.com/text.txt#line=10,20;length=9876,UTF-8
 660
 661    As in the second example, this URI identifies lines 11 to 20 of the
 662    text.txt MIME entity.  The additional length integrity check
 663    specifies that the MIME entity has a length of 9876 characters when
 664    encoded in UTF-8.  If the client supports the length scheme, it may
 665    test the retrieved MIME entity for its length, but only if the
 666    retrieved MIME entity uses the UTF-8 encoding or has been locally
 667    transcoded into this encoding.
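   A client that supports the length scheme might apply the check along
   the lines of the following non-normative Python sketch.  Following
   the wording of the example above, it counts the length in characters
   of the decoded entity, and it skips the check (returning None) when
   the entity's charset differs from the one named in the check instead
   of transcoding.  The function name and the skip policy are
   assumptions of this sketch.

```python
import codecs
from typing import Optional


def check_length(body: bytes, entity_charset: str, expected_length: int,
                 check_charset: str = "UTF-8") -> Optional[bool]:
    """Apply a length=N,charset integrity check to a retrieved entity.

    Hypothetical helper.  Returns True or False if the check could be
    applied, or None if the entity uses a different charset (this sketch
    does not transcode).  Length is counted in decoded characters.
    """
    # codecs.lookup normalizes aliases such as "UTF-8", "utf8", "utf_8".
    if codecs.lookup(entity_charset).name != codecs.lookup(check_charset).name:
        return None
    text = body.decode(check_charset)
    return len(text) == expected_length
```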
 668
 679    Please note that FTP, like the protocols underlying some other URI
 680    schemes, does not provide explicit information about the media type
 681    of the resource being retrieved.  Using fragment identifiers with
 682    such URI schemes is therefore inherently unreliable.  Current user
 683    agents use various heuristics to infer a media type for further
 684    processing.  Processing of the fragment identifier according to this
 685    memo is only appropriate if the inferred media type is text/plain.
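   The rule that the fragment identifier is processed only when the
   inferred media type is text/plain can be sketched in Python as
   follows (non-normative).  The sketch prefers an explicit Content-Type
   value when the protocol supplies one and otherwise guesses from the
   file name with the standard mimetypes module, which stands in here
   for whatever heuristic a real user agent uses.  The function name is
   hypothetical.

```python
import mimetypes
from typing import Optional
from urllib.parse import urldefrag


def text_plain_fragment_applies(uri: str, content_type: Optional[str]) -> bool:
    """Decide whether a text/plain fragment identifier should be processed."""
    base, _fragment = urldefrag(uri)
    if content_type is not None:
        # Explicit metadata (e.g. an HTTP Content-Type header) wins;
        # parameters such as charset are ignored here.
        media_type = content_type.split(";", 1)[0].strip().lower()
    else:
        # No explicit metadata (e.g. FTP): fall back to a heuristic guess.
        media_type, _encoding = mimetypes.guess_type(base)
    return media_type == "text/plain"
```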
 687
 688 6.  IANA Considerations
 689
 690    IANA has added a reference to this specification in the text/plain
 691    Media Type registration.
 692
 693 7.  Security Considerations
 694
 695    Software that implements fragment identifiers for plain text behaves
 696    differently from software that does not, and different software may
 697    show documents or fragments to users in different ways.  Both facts
 698    can lead to misunderstandings on the part of users.  Such
 699    misunderstandings might be exploited in a way similar to spoofing or
 700    phishing.
 701
 702    In particular, care has to be taken if fragment identifiers are used
 703    together with a mechanism that allows showing only the part of a
 704    document identified by a fragment.  One scenario may be the use of a
 705    fragment identifier to hide small-print legal text.  Another scenario
 706    may be the inclusion of site-key-like material, which may give the
 707    user the impression of using the real site rather than a fake site;
 708    other scenarios are also possible.  Countermeasures may include, but
 709    are not limited to, displaying the included content within clearly
 710    visible boundaries and limiting inclusion to material from the same
 711    security realm or from realms that give explicit permission to be
 712    included in another realm.
 713
 714    Please note that all of the above issues arise on the client side:
 715    fragment identifiers are not used when resolving a URI to retrieve
 716    the representation of a resource; they are applied only by the
 717    client, after retrieval.
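   The separation between retrieval and fragment processing can be seen
   with Python's standard urllib.parse.urldefrag function, used here as
   a non-normative illustration: only the part before "#" is used to
   retrieve the representation, and the fragment stays on the client for
   later processing.  The helper name split_for_retrieval is
   hypothetical.

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_for_retrieval(uri: str) -> Tuple[str, str]:
    """Return (URI to retrieve, fragment kept on the client side)."""
    result = urldefrag(uri)
    return result.url, result.fragment


if __name__ == "__main__":
    target, fragment = split_for_retrieval(
        "ftp://example.com/text.txt#line=10,20;length=9876,UTF-8")
    print(target)    # ftp://example.com/text.txt
    print(fragment)  # line=10,20;length=9876,UTF-8
```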
 718
 719    Implementers and users of fragment identifiers for plain text should
 720    also be aware of the security considerations in RFC 3986 [7] and RFC
 721    3987 [8].
 722
 735 8.  References
 736
 737 8.1.  Normative References
 738
 739    [1]   Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321,
 740          April 1992.
 741
 742    [2]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
 743          Extensions (MIME) Part One: Format of Internet Message Bodies",
 744          RFC 2045, November 1996.
 745
 746    [3]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
 747          Extensions (MIME) Part Two: Media Types", RFC 2046,
 748          November 1996.
 749
 750    [4]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
 751          Levels", BCP 14, RFC 2119, March 1997.
 752
 753    [5]   Freed, N. and J. Postel, "IANA Charset Registration
 754          Procedures", BCP 19, RFC 2978, October 2000.
 755
 756    [6]   Gellens, R., "The Text/Plain Format and DelSp Parameters",
 757          RFC 3676, February 2004.
 758
 759    [7]   Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
 760          Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986,
 761          January 2005.
 762
 763    [8]   Duerst, M. and M. Suignard, "Internationalized Resource
 764          Identifiers (IRI)", RFC 3987, January 2005.
 765
 766    [9]   Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
 767          Specifications: ABNF", STD 68, RFC 5234, January 2008.
 768
 769 8.2.  Informative References
 770
 771    [10]  Connolly, D. and L. Masinter, "The 'text/html' Media Type",
 772          RFC 2854, June 2000.
 773
 774    [11]  Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 10646",
 775          RFC 2781, February 2000.
 776
 777    [12]  Yergeau, F., "UTF-8, a transformation format of ISO 10646",
 778          STD 63, RFC 3629, November 2003.
 779
 780    [13]  ANSI X3.4-1986, "Coded Character Set - 7-Bit American National
 781          Standard Code for Information Interchange", 1986.
 782
 791    [14]  DeRose, S., Maler, E., and D. Orchard, "XML Linking Language
 792          (XLink) Version 1.0", World Wide Web Consortium Recommendation,
 793          June 2001, <http://www.w3.org/TR/xlink/>.
 794
 795    [15]  Freed, N. and J. Klensin, "Media Type Specifications and
 796          Registration Procedures", BCP 13, RFC 4288, December 2005.
 797
 847 Appendix A.  Acknowledgements
 848
 849    Thanks for comments and suggestions provided by Marcel Baschnagel,
 850    Stephane Bortzmeyer, Tim Bray, Iain Calder, John Cowan, Spencer
 851    Dawkins, Lisa Dusseault, Benja Fallenstein, Ted Hardie, Sam Hartman,
 852    Sandro Hawke, Jeffrey Hutzelman, Cullen Jennings, Graham Klyne, Dan
 853    Kohn, Henrik Levkowetz, Chris Newman, Mark Nottingham, Conrad Parker,
 854    and Tim Polk.
 855
 856 Authors' Addresses
 857
 858    Erik Wilde
 859    UC Berkeley
 860    School of Information, 311 South Hall
 861    Berkeley, CA 94720-4600
 862    U.S.A.
 863
 864    Phone: +1-510-6432253
 865    EMail: dret@berkeley.edu
 866    URI:   http://dret.net/netdret/
 867
 868
 869    Martin Duerst (Note: Please write "Duerst" with u-umlaut wherever
 870                  possible, for example as "D&#252;rst" in XML and HTML.)
 871    Aoyama Gakuin University
 872    5-10-1 Fuchinobe
 873    Sagamihara, Kanagawa  229-8558
 874    Japan
 875
 876    Phone: +81 42 759 6329
 877    Fax:   +81 42 759 6495
 878    EMail: duerst@it.aoyama.ac.jp
 879    URI:   http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/
 880
 903 Full Copyright Statement
 904
 905    Copyright (C) The IETF Trust (2008).
 906
 907    This document is subject to the rights, licenses and restrictions
 908    contained in BCP 78, and except as set forth therein, the authors
 909    retain all their rights.
 910
 911    This document and the information contained herein are provided on an
 912    "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
 913    OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
 914    THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
 915    OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
 916    THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
 917    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
 918
 919 Intellectual Property
 920
 921    The IETF takes no position regarding the validity or scope of any
 922    Intellectual Property Rights or other rights that might be claimed to
 923    pertain to the implementation or use of the technology described in
 924    this document or the extent to which any license under such rights
 925    might or might not be available; nor does it represent that it has
 926    made any independent effort to identify any such rights.  Information
 927    on the procedures with respect to rights in RFC documents can be
 928    found in BCP 78 and BCP 79.
 929
 930    Copies of IPR disclosures made to the IETF Secretariat and any
 931    assurances of licenses to be made available, or the result of an
 932    attempt made to obtain a general license or permission for the use of
 933    such proprietary rights by implementers or users of this
 934    specification can be obtained from the IETF on-line IPR repository at
 935    http://www.ietf.org/ipr.
 936
 937    The IETF invites any interested party to bring to its attention any
 938    copyrights, patents or patent applications, or other proprietary
 939    rights that may cover technology that may be required to implement
 940    this standard.  Please address the information to the IETF at
 941    ietf-ipr@ietf.org.