% Doc/ref/ref2.tex

\chapter{Lexical analysis\label{lexical}}

A Python program is read by a \emph{parser}.  Input to the parser is a
stream of \emph{tokens}, generated by the \emph{lexical analyzer}.  This
chapter describes how the lexical analyzer breaks a file into tokens.
\index{lexical analysis}
\index{parser}
\index{token}

Python uses the 7-bit \ASCII{} character set for program text.
\versionadded[An encoding declaration can be used to indicate that
string literals and comments use an encoding different from \ASCII.]{2.3}
For compatibility with older versions, Python only warns if it finds
8-bit characters; code that triggers these warnings should be fixed
either by declaring an explicit encoding or, if the bytes are binary
data rather than characters, by using escape sequences.


The run-time character set depends on the I/O devices connected to the
program but is generally a superset of \ASCII.

\strong{Future compatibility note:} It may be tempting to assume that the
character set for 8-bit characters is ISO Latin-1 (an \ASCII{}
superset that covers most western languages that use the Latin
alphabet), but it is possible that in the future Unicode text editors
will become common.  These generally use the UTF-8 encoding, which is
also an \ASCII{} superset, but with a very different use of the
characters with ordinals 128--255.  While there is no consensus on this
subject yet, it is unwise to assume either Latin-1 or UTF-8, even
though the current implementation appears to favor Latin-1.  This
applies both to the source character set and the run-time character
set.


\section{Line structure\label{line-structure}}

A Python program is divided into a number of \emph{logical lines}.
\index{line structure}


\subsection{Logical lines\label{logical}}

The end of a logical line is represented by the token NEWLINE.
Statements cannot cross logical line boundaries except where NEWLINE
is allowed by the syntax (e.g., between statements in compound
statements).  A logical line is constructed from one or more
\emph{physical lines} by following the explicit or implicit
\emph{line joining} rules.
\index{logical line}
\index{physical line}
\index{line joining}
\index{NEWLINE token}


\subsection{Physical lines\label{physical}}

A physical line ends in whatever the current platform's convention is
for terminating lines.  On \UNIX, this is the \ASCII{} LF (linefeed)
character.  On DOS/Windows, it is the \ASCII{} sequence CR LF (return
followed by linefeed).  On Macintosh, it is the \ASCII{} CR (return)
character.


\subsection{Comments\label{comments}}

A comment starts with a hash character (\code{\#}) that is not part of
a string literal, and ends at the end of the physical line.  A comment
signifies the end of the logical line unless the implicit line joining
rules are invoked.  Comments are ignored by the syntax; they are not
tokens.
\index{comment}
\index{hash character}


\subsection{Encoding declarations\label{encodings}}

If a comment in the first or second line of the Python script matches
the regular expression \regexp{coding[=:]\e s*([-\e w.]+)}, this comment is
processed as an encoding declaration; the first group of this
expression names the encoding of the source code file.  The recommended
forms of the declaration are

\begin{verbatim}
# -*- coding: <encoding-name> -*-
\end{verbatim}

which is also recognized by GNU Emacs, and

\begin{verbatim}
# vim:fileencoding=<encoding-name>
\end{verbatim}

which is recognized by Bram Moolenar's VIM.  In addition, if the first
bytes of the file are the UTF-8 byte-order mark
(\code{'\e xef\e xbb\e xbf'}), the declared file encoding is UTF-8
(this is supported, among others, by Microsoft's \program{notepad}).

If an encoding is declared, the encoding name must be recognized by
Python. % XXX there should be a list of supported encodings.
The encoding is used for all lexical analysis, in particular to find
the end of a string, and to interpret the contents of Unicode literals.
String literals are converted to Unicode for syntactical analysis,
then converted back to their original encoding before interpretation
starts.  The encoding declaration must appear on a line of its own.
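
A minimal Python sketch of how a tool might look for such a declaration
follows.  It is an illustration under stated assumptions, not the
interpreter's own code: the helper \code{find_declared_encoding} is
hypothetical, the file is assumed to be already split into text lines,
and the UTF-8 byte-order mark is not handled.  The character class is
written as \code{[-\e w.]} so that the hyphen is a literal character
rather than a range.

\begin{verbatim}
import re

# The pattern from the rule above; "-" comes first in the character
# class so that it is a literal hyphen, not a range operator.
CODING_PATTERN = re.compile(r"coding[=:]\s*([-\w.]+)")


def find_declared_encoding(lines):
    """Return the encoding named on line 1 or 2, or None (hypothetical)."""
    for line in lines[:2]:
        stripped = line.lstrip()
        # The declaration must be a comment on a line of its own.
        if not stripped.startswith("#"):
            continue
        match = CODING_PATTERN.search(stripped)
        if match:
            return match.group(1)
    return None


print(find_declared_encoding(["#!/usr/bin/env python\n",
                              "# -*- coding: latin-1 -*-\n"]))  # latin-1
\end{verbatim}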

\subsection{Explicit line joining\label{explicit-joining}}

Two or more physical lines may be joined into logical lines using
backslash characters (\code{\e}), as follows: when a physical line ends
in a backslash that is not part of a string literal or comment, it is
joined with the following line, forming a single logical line; the
backslash and the following end-of-line character are deleted.  For
example:
\index{physical line}
\index{line joining}
\index{line continuation}
\index{backslash character}
%
\begin{verbatim}
if 1900 < year < 2100 and 1 <= month <= 12 \
   and 1 <= day <= 31 and 0 <= hour < 24 \
   and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
        return 1
\end{verbatim}

A line ending in a backslash cannot carry a comment.  A backslash does
not continue a comment.  A backslash does not continue a token except
for string literals (i.e., tokens other than string literals cannot be
split across physical lines using a backslash).  A backslash is
illegal elsewhere on a line outside a string literal.
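
The effect of a backslash continuation can be observed with the
standard \module{tokenize} module.  The sketch below assumes a Python 3
interpreter, whose tokenizer follows the same line joining rule but may
differ from the one described here in other details; the helper
\code{newline_count} is hypothetical.  Because NEWLINE marks the end of
each logical line, counting NEWLINE tokens counts logical lines.

\begin{verbatim}
import io
import tokenize


def newline_count(source):
    """Count NEWLINE tokens (one per logical line) in source text."""
    readline = io.StringIO(source).readline
    return sum(1 for tok in tokenize.generate_tokens(readline)
               if tok.type == tokenize.NEWLINE)


# Two physical lines joined by a backslash form one logical line.
joined = "total = 1 + \\\n        2\n"
print(newline_count(joined))   # 1
\end{verbatim}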


\subsection{Implicit line joining\label{implicit-joining}}

Expressions in parentheses, square brackets or curly braces can be
split over more than one physical line without using backslashes.
For example:

\begin{verbatim}
month_names = ['Januari', 'Februari', 'Maart',      # These are the
               'April',   'Mei',      'Juni',       # Dutch names
               'Juli',    'Augustus', 'September',  # for the months
               'Oktober', 'November', 'December']   # of the year
\end{verbatim}

Implicitly continued lines can carry comments.  The indentation of the
continuation lines is not important.  Blank continuation lines are
allowed.  There is no NEWLINE token between implicit continuation
lines.  Implicitly continued lines can also occur within triple-quoted
strings (see below); in that case they cannot carry comments.
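
Under the same assumptions (a Python 3 interpreter and its
\module{tokenize} module; the helper name is hypothetical), the sketch
below lists the line-break tokens produced for a bracketed expression
that spans two physical lines and carries a comment.  The
\module{tokenize} module reports a line break that does not end a
logical line as an NL token, so only one NEWLINE token appears.

\begin{verbatim}
import io
import tokenize


def line_break_tokens(source):
    """Return the names of the NEWLINE and NL tokens in source."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)
            if tok.type in (tokenize.NEWLINE, tokenize.NL)]


# One logical line spread over two physical lines inside brackets.
source = "days = [1, 2,   # first two\n        3]\n"
print(line_break_tokens(source))
\end{verbatim}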


\subsection{Blank lines \index{blank line}\label{blank-lines}}

A logical line that contains only spaces, tabs, formfeeds and possibly
a comment is ignored (i.e., no NEWLINE token is generated).  During
interactive input of statements, handling of a blank line may differ
depending on the implementation of the read-eval-print loop.  In the
standard implementation, an entirely blank logical line (i.e.\ one
containing not even whitespace or a comment) terminates a multi-line
statement.
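
The rule for ignored lines can be expressed as a small predicate.  This
is a sketch only, with a hypothetical function name, written for
Python 3: it inspects one physical line in isolation and therefore
cannot tell whether the line lies inside brackets or a triple-quoted
string, where it would not be a logical line of its own.

\begin{verbatim}
def is_ignored_line(line):
    """True if line holds only spaces, tabs, formfeeds and maybe a comment."""
    body = line.rstrip("\r\n")
    rest = body.lstrip(" \t\f")
    return rest == "" or rest.startswith("#")


for text in ["\n", " \t\f\n", "   # just a comment\n", "x = 1  # code\n"]:
    print(repr(text), is_ignored_line(text))
\end{verbatim}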


\subsection{Indentation\label{indentation}}

Leading whitespace (spaces and tabs) at the beginning of a logical
line is used to compute the indentation level of the line, which in
turn is used to determine the grouping of statements.
\index{indentation}
\index{whitespace}
\index{leading whitespace}
\index{space}
\index{tab}
\index{grouping}
\index{statement grouping}

First, tabs are replaced (from left to right) by one to eight spaces
such that the total number of characters up to and including the
replacement is a multiple of eight (this is intended to be the same
rule as used by \UNIX).  The total number of spaces preceding the
first non-blank character then determines the line's indentation.
Indentation cannot be split over multiple physical lines using
backslashes; the whitespace up to the first backslash determines the
indentation.
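
The tab rule amounts to advancing to the next multiple of eight.  The
following Python sketch computes a line's indentation that way; the
function name is hypothetical, and treating a formfeed as resetting the
count to zero is an assumption, chosen because it makes a formfeed at
the start of a line harmless, as described below.

\begin{verbatim}
def indentation_level(line):
    """Compute a line's indentation using the tab-to-multiple-of-8 rule."""
    column = 0
    for char in line:
        if char == " ":
            column += 1
        elif char == "\t":
            # Advance to the next multiple of eight (one to eight spaces).
            column = (column // 8 + 1) * 8
        elif char == "\f":
            # Assumption: a formfeed resets the count to zero.
            column = 0
        else:
            break
    return column


print(indentation_level("  \tif x:"))   # 8
\end{verbatim}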

\strong{Cross-platform compatibility note:} because of the nature of
text editors on non-\UNIX{} platforms, it is unwise to use a mixture of
spaces and tabs for the indentation in a single source file.

A formfeed character may be present at the start of the line; it will
be ignored for the indentation calculations above.  Formfeed
characters occurring elsewhere in the leading whitespace have an
undefined effect (for instance, they may reset the space count to
zero).

The indentation levels of consecutive lines are used to generate
INDENT and DEDENT tokens, using a stack, as follows.
\index{INDENT token}
\index{DEDENT token}

Before the first line of the file is read, a single zero is pushed on
the stack; this will never be popped off again.  The numbers pushed on
the stack will always be strictly increasing from bottom to top.  At
the beginning of each logical line, the line's indentation level is
compared to the top of the stack.  If it is equal, nothing happens.
If it is larger, it is pushed on the stack, and one INDENT token is
generated.  If it is smaller, it \emph{must} be one of the numbers
occurring on the stack; all numbers on the stack that are larger are
popped off, and for each number popped off a DEDENT token is
generated.  At the end of the file, a DEDENT token is generated for
each number remaining on the stack that is larger than zero.
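
The stack algorithm above can be sketched in Python as follows.  The
function and the \code{'LINE'} marker, which stands for the remaining
tokens of a logical line, are hypothetical; the input is assumed to be
the list of indentation levels of consecutive logical lines, and the
built-in \exception{IndentationError} is used for an inconsistent
dedent purely for illustration.

\begin{verbatim}
def indent_tokens(levels):
    """Turn per-line indentation levels into INDENT/DEDENT/LINE markers."""
    stack = [0]                  # the initial zero is never popped
    tokens = []
    for level in levels:
        if level > stack[-1]:
            stack.append(level)
            tokens.append("INDENT")
        elif level < stack[-1]:
            if level not in stack:
                raise IndentationError("inconsistent dedent to %d" % level)
            while stack[-1] > level:
                stack.pop()
                tokens.append("DEDENT")
        tokens.append("LINE")
    # At the end of the file, close every level above zero.
    while stack[-1] > 0:
        stack.pop()
        tokens.append("DEDENT")
    return tokens


print(indent_tokens([0, 4, 8, 4]))
# ['LINE', 'INDENT', 'LINE', 'INDENT', 'LINE', 'DEDENT', 'LINE', 'DEDENT']

try:
    indent_tokens([0, 4, 8, 6])  # 6 is not on the stack
except IndentationError as err:
    print("error:", err)
\end{verbatim}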

Here is an example of a correctly (though confusingly) indented piece
of Python code:

\begin{verbatim}
def perm(l):
        # Compute the list of all permutations of l
    if len(l) <= 1:
                  return [l]
    r = []
    for i in range(len(l)):
             s = l[:i] + l[i+1:]
             p = perm(s)
             for x in p:
              r.append(l[i:i+1] + x)
    return r
\end{verbatim}

The following example shows various indentation errors:

\begin{verbatim}
 def perm(l):                       # error: first line indented
for i in range(len(l)):             # error: not indented
    s = l[:i] + l[i+1:]
        p = perm(l[:i] + l[i+1:])   # error: unexpected indent
        for x in p:
                r.append(l[i:i+1] + x)
            return r                # error: inconsistent dedent
\end{verbatim}

(Actually, the first three errors are detected by the parser; only the
last error is found by the lexical analyzer: the indentation of
\code{return r} does not match a level popped off the stack.)


\subsection{Whitespace between tokens\label{whitespace}}

Except at the beginning of a logical line or in string literals, the
whitespace characters space, tab and formfeed can be used
interchangeably to separate tokens.  Whitespace is needed between two
tokens only if their concatenation could otherwise be interpreted as a
different token (e.g., \code{ab} is one token, but \code{a b} is two
tokens).
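
A quick way to see where whitespace matters is to tokenize a line.
This sketch assumes a Python 3 interpreter and its \module{tokenize}
module; the helper name is hypothetical.

\begin{verbatim}
import io
import tokenize


def token_strings(source):
    """Return the text of each token except line breaks and end marker."""
    skip = (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER)
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type not in skip]


print(token_strings("ab\n"))     # ['ab']
print(token_strings("a b\n"))    # ['a', 'b']
print(token_strings("a\tb\n"))   # ['a', 'b']
\end{verbatim}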


\section{Other tokens\label{other-tokens}}

Besides NEWLINE, INDENT and DEDENT, the following categories of tokens
exist: \emph{identifiers}, \emph{keywords}, \emph{literals},
\emph{operators}, and \emph{delimiters}.  Whitespace characters (other
than line terminators, discussed earlier) are not tokens, but serve to
delimit tokens.  Where ambiguity exists, a token comprises the longest
possible string that forms a legal token, when read from left to right.
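
The longest-match rule can be illustrated with a small scanner for runs
of operator characters.  This Python 3 sketch is not the tokenizer's
implementation: the function name is hypothetical, the table holds only
the operators and operator-like delimiters listed later in this
chapter, and identifiers, numbers and whitespace are not handled.

\begin{verbatim}
# Operators and augmented assignment delimiters from this chapter.
OPERATORS = ["**=", "//=", ">>=", "<<=", "**", "//", "<<", ">>", "<=",
             ">=", "==", "!=", "<>", "+=", "-=", "*=", "/=", "%=",
             "&=", "|=", "^=", "+", "-", "*", "/", "%", "<", ">",
             "=", "&", "|", "^", "~"]


def split_operators(text):
    """Split a run of operator characters using longest-match scanning."""
    result = []
    pos = 0
    while pos < len(text):
        # Collect every operator that matches here; keep the longest.
        candidates = [op for op in OPERATORS if text.startswith(op, pos)]
        if not candidates:
            raise ValueError("no operator at position %d" % pos)
        best = max(candidates, key=len)
        result.append(best)
        pos += len(best)
    return result


print(split_operators("<<="))   # ['<<=']
print(split_operators("**-"))   # ['**', '-']
\end{verbatim}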


\section{Identifiers and keywords\label{identifiers}}

Identifiers (also referred to as \emph{names}) are described by the following
lexical definitions:
\index{identifier}
\index{name}

\begin{productionlist}
  \production{identifier}
             {(\token{letter}|"_") (\token{letter} | \token{digit} | "_")*}
  \production{letter}
             {\token{lowercase} | \token{uppercase}}
  \production{lowercase}
             {"a"..."z"}
  \production{uppercase}
             {"A"..."Z"}
  \production{digit}
             {"0"..."9"}
\end{productionlist}

Identifiers are unlimited in length.  Case is significant.
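
The productions above translate directly into a regular expression.
The sketch below uses Python's \module{re} module; the names are
hypothetical.  Keywords such as \code{if} also match this production;
they are excluded separately, as described next.

\begin{verbatim}
import re

# (letter | "_") (letter | digit | "_")*, with letters limited to ASCII
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def is_identifier(name):
    """Return True if name matches the identifier production."""
    return IDENTIFIER.match(name) is not None


for name in ["spam", "_eggs2", "Spam", "2spam", "spam-eggs"]:
    print(name, is_identifier(name))
\end{verbatim}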


\subsection{Keywords\label{keywords}}

The following identifiers are used as reserved words, or
\emph{keywords} of the language, and cannot be used as ordinary
identifiers.  They must be spelled exactly as written here:%
\index{keyword}%
\index{reserved word}

\begin{verbatim}
and       del       for       is        raise
assert    elif      from      lambda    return
break     else      global    not       try
class     except    if        or        while
continue  exec      import    pass      yield
def       finally   in        print
\end{verbatim}

% When adding keywords, use reswords.py for reformatting

Note that although the identifier \code{as} can be used as part of the
syntax of \keyword{import} statements, it is not currently a reserved
word.

In some future version of Python, the identifiers \code{as} and
\code{None} will both become keywords.


\subsection{Reserved classes of identifiers\label{id-classes}}

Certain classes of identifiers (besides keywords) have special
meanings.  These are:

\begin{tableiii}{l|l|l}{code}{Form}{Meaning}{Notes}
\lineiii{_*}{Not imported by \samp{from \var{module} import *}}{(1)}
\lineiii{__*__}{System-defined name}{}
\lineiii{__*}{Class-private name mangling}{}
\end{tableiii}

% XXX need section references here.

Note:

\begin{description}
\item[(1)] The special identifier \samp{_} is used in the interactive
interpreter to store the result of the last evaluation; it is stored
in the \module{__builtin__} module.  When not in interactive mode,
\samp{_} has no special meaning and is not defined.
\end{description}
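
As an illustration of the last table row, an attribute whose name
starts with two underscores and is written inside a class body is
stored under a mangled name that includes the class name.  The class
and attribute names are hypothetical, and the example assumes Python 3
for \function{print()}.

\begin{verbatim}
class Account:
    def __init__(self):
        self.__balance = 0    # class-private: stored as _Account__balance


acct = Account()
print(hasattr(acct, "__balance"))   # False
print(acct._Account__balance)       # 0
\end{verbatim}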


\section{Literals\label{literals}}

Literals are notations for constant values of some built-in types.
\index{literal}
\index{constant}


\subsection{String literals\label{strings}}

String literals are described by the following lexical definitions:
\index{string literal}

\index{ASCII@\ASCII}
\begin{productionlist}
  \production{stringliteral}
             {[\token{stringprefix}](\token{shortstring} | \token{longstring})}
  \production{stringprefix}
             {"r" | "u" | "ur" | "R" | "U" | "UR" | "Ur" | "uR"}
  \production{shortstring}
             {"'" \token{shortstringitem}* "'"
              | '"' \token{shortstringitem}* '"'}
  \production{longstring}
             {"'''" \token{longstringitem}* "'''"}
  \productioncont{| '"""' \token{longstringitem}* '"""'}
  \production{shortstringitem}
             {\token{shortstringchar} | \token{escapeseq}}
  \production{longstringitem}
             {\token{longstringchar} | \token{escapeseq}}
  \production{shortstringchar}
             {<any ASCII character except "\e" or newline or the quote>}
  \production{longstringchar}
             {<any ASCII character except "\e">}
  \production{escapeseq}
             {"\e" <any ASCII character>}
\end{productionlist}

One syntactic restriction not indicated by these productions is that
whitespace is not allowed between the \grammartoken{stringprefix} and
the rest of the string literal.

\index{triple-quoted string}
\index{Unicode Consortium}
\index{string!Unicode}
In plain English: String literals can be enclosed in matching single
quotes (\code{'}) or double quotes (\code{"}).  They can also be
enclosed in matching groups of three single or double quotes (these
are generally referred to as \emph{triple-quoted strings}).  The
backslash (\code{\e}) character is used to escape characters that
otherwise have a special meaning, such as newline, backslash itself,
or the quote character.  String literals may optionally be prefixed
with a letter \character{r} or \character{R}; such strings are called
\dfn{raw strings}\index{raw string} and use different rules for interpreting
backslash escape sequences.  A prefix of \character{u} or \character{U}
makes the string a Unicode string.  Unicode strings use the Unicode character
set as defined by the Unicode Consortium and ISO~10646.  Some additional
escape sequences, described below, are available in Unicode strings.
The two prefix characters may be combined; in this case, \character{u} must
appear before \character{r}.

In triple-quoted strings, unescaped newlines and quotes are allowed
(and are retained), except that three unescaped quotes in a row
terminate the string.  (A ``quote'' is the character used to open the
string, i.e.\ either \code{'} or \code{"}.)

Unless an \character{r} or \character{R} prefix is present, escape
sequences in strings are interpreted according to rules similar
to those used by Standard C.  The recognized escape sequences are:
\index{physical line}
\index{escape sequence}
\index{Standard C}
\index{C}

\begin{tableiii}{l|l|c}{code}{Escape Sequence}{Meaning}{Notes}
\lineiii{\e\var{newline}} {Ignored}{}
\lineiii{\e\e}  {Backslash (\code{\e})}{}
\lineiii{\e'}   {Single quote (\code{'})}{}
\lineiii{\e"}   {Double quote (\code{"})}{}
\lineiii{\e a}  {\ASCII{} Bell (BEL)}{}
\lineiii{\e b}  {\ASCII{} Backspace (BS)}{}
\lineiii{\e f}  {\ASCII{} Formfeed (FF)}{}
\lineiii{\e n}  {\ASCII{} Linefeed (LF)}{}
\lineiii{\e N\{\var{name}\}}
        {Character named \var{name} in the Unicode database (Unicode only)}{}
\lineiii{\e r}  {\ASCII{} Carriage Return (CR)}{}
\lineiii{\e t}  {\ASCII{} Horizontal Tab (TAB)}{}
\lineiii{\e u\var{xxxx}}
        {Character with 16-bit hex value \var{xxxx} (Unicode only)}{(1)}
\lineiii{\e U\var{xxxxxxxx}}
        {Character with 32-bit hex value \var{xxxxxxxx} (Unicode only)}{(2)}
\lineiii{\e v}  {\ASCII{} Vertical Tab (VT)}{}
\lineiii{\e\var{ooo}} {\ASCII{} character with octal value \var{ooo}}{(3)}
\lineiii{\e x\var{hh}} {\ASCII{} character with hex value \var{hh}}{(4)}
\end{tableiii}
\index{ASCII@\ASCII}

\noindent
Notes:

\begin{itemize}
\item[(1)]
  Individual code units which form parts of a surrogate pair can be
  encoded using this escape sequence.
\item[(2)]
  Any Unicode character can be encoded this way, but characters
  outside the Basic Multilingual Plane (BMP) will be encoded using a
  surrogate pair if Python is compiled to use 16-bit code units (the
  default).  Individual code units which form parts of a surrogate
  pair can be encoded using this escape sequence.
\item[(3)]
  As in Standard C, up to three octal digits are accepted.
\item[(4)]
  Unlike in Standard C, at most two hex digits are accepted.
\end{itemize}
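
Notes (3) and (4) limit how many digits an escape consumes, so any
further digits stay in the string as ordinary characters.  The
examples below assume Python 3, where every \code{str} literal is a
Unicode string, and use only escapes that are valid in plain strings
as well.

\begin{verbatim}
# An octal escape consumes at most three digits: "\1014" is "\101" + "4".
octal = "\1014"
# A hex escape consumes two digits: "\x414" is "\x41" + "4".
hexa = "\x414"
print(octal, hexa)                 # A4 A4
# Seven single-character escapes from the table.
print(len("\a\b\f\n\r\t\v"))       # 7
\end{verbatim}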


Unlike Standard \index{unrecognized escape sequence}C,
all unrecognized escape sequences are left in the string unchanged,
i.e., \emph{the backslash is left in the string}.  (This behavior is
useful when debugging: if an escape sequence is mistyped, the
resulting output is more easily recognized as broken.)  It is also
important to note that the escape sequences marked as ``(Unicode
only)'' in the table above fall into the category of unrecognized
escapes for non-Unicode string literals.

When an \character{r} or \character{R} prefix is present, a character
following a backslash is included in the string without change, and \emph{all
backslashes are left in the string}.  For example, the string literal
\code{r"\e n"} consists of two characters: a backslash and a lowercase
\character{n}.  String quotes can be escaped with a backslash, but the
backslash remains in the string; for example, \code{r"\e""} is a valid string
literal consisting of two characters: a backslash and a double quote;
\code{r"\e"} is not a valid string literal (even a raw string cannot
end in an odd number of backslashes).  Specifically, \emph{a raw
string cannot end in a single backslash} (since the backslash would
escape the following quote character).  Note also that a single
backslash followed by a newline is interpreted as those two characters
as part of the string, \emph{not} as a line continuation.
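
The difference between plain and raw strings shows up in their
lengths.  The example assumes Python 3; \function{compile()} is used
only to show that a raw string ending in a single backslash is
rejected.

\begin{verbatim}
plain = "\n"     # one character: a linefeed
raw = r"\n"      # two characters: a backslash and "n"
quote = r"\""    # two characters: a backslash and a double quote
print(len(plain), len(raw), len(quote))    # 1 2 2

try:
    compile('r"\\"', "<example>", "eval")   # the source text r"\"
except SyntaxError:
    print('r"\\" is not a valid string literal')
\end{verbatim}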

When an \character{r} or \character{R} prefix is used in conjunction
with a \character{u} or \character{U} prefix, then the \code{\e uXXXX}
escape sequence is processed while \emph{all other backslashes are
left in the string}.  For example, the string literal
\code{ur"\e{}u0062\e n"} consists of three Unicode characters: `LATIN
SMALL LETTER B', `REVERSE SOLIDUS', and `LATIN SMALL LETTER N'.
Backslashes can be escaped with a preceding backslash; however, both
remain in the string.  As a result, \code{\e uXXXX} escape sequences
are only recognized when there is an odd number of backslashes.

\subsection{String literal concatenation\label{string-catenation}}

Multiple adjacent string literals (delimited by whitespace), possibly
using different quoting conventions, are allowed, and their meaning is
the same as their concatenation.  Thus, \code{"hello" 'world'} is
equivalent to \code{"helloworld"}.  This feature can be used to reduce
the number of backslashes needed, to split long strings conveniently
across several lines, or even to add comments to parts of strings, for
example:

\begin{verbatim}
import re

identifier = re.compile("[A-Za-z_]"       # letter or underscore
                        "[A-Za-z0-9_]*"   # letter, digit or underscore
                       )
\end{verbatim}

Note that this feature is defined at the syntactical level, but
implemented at compile time.  The `+' operator must be used to
concatenate string expressions at run time.  Also note that literal
concatenation can use different quoting styles for each component
(even mixing raw strings and triple-quoted strings).


\subsection{Numeric literals\label{numbers}}

There are four types of numeric literals: plain integers, long
integers, floating point numbers, and imaginary numbers.  There are no
complex literals (complex numbers can be formed by adding a real
number and an imaginary number).
\index{number}
\index{numeric literal}
\index{integer literal}
\index{plain integer literal}
\index{long integer literal}
\index{floating point literal}
\index{hexadecimal literal}
\index{octal literal}
\index{decimal literal}
\index{imaginary literal}
\index{complex!literal}

Note that numeric literals do not include a sign; a phrase like
\code{-1} is actually an expression composed of the unary operator
`\code{-}' and the literal \code{1}.


\subsection{Integer and long integer literals\label{integers}}

Integer and long integer literals are described by the following
lexical definitions:

\begin{productionlist}
  \production{longinteger}
             {\token{integer} ("l" | "L")}
  \production{integer}
             {\token{decimalinteger} | \token{octinteger} | \token{hexinteger}}
  \production{decimalinteger}
             {\token{nonzerodigit} \token{digit}* | "0"}
  \production{octinteger}
             {"0" \token{octdigit}+}
  \production{hexinteger}
             {"0" ("x" | "X") \token{hexdigit}+}
  \production{nonzerodigit}
             {"1"..."9"}
  \production{octdigit}
             {"0"..."7"}
  \production{hexdigit}
             {\token{digit} | "a"..."f" | "A"..."F"}
\end{productionlist}

Although both lower case \character{l} and upper case \character{L} are
allowed as a suffix for long integers, it is strongly recommended to always
use \character{L}, since the letter \character{l} looks too much like the
digit \character{1}.

Plain integer decimal literals must be at most 2147483647 (i.e., the
largest positive integer representable in 32-bit signed arithmetic).
Plain octal and hexadecimal literals may be as large as 4294967295, but
values larger than 2147483647 are converted to a negative value by
subtracting 4294967296.  There is no limit for long integer literals
apart from what can be stored in available memory.
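
The conversion of large plain octal and hexadecimal literals is 32-bit
two's-complement wraparound.  The sketch below models the rule as
stated; it is written in Python 3, which no longer applies this rule to
its own literals, so it takes the literal's text as a string.  The
function name is hypothetical, and raising \exception{OverflowError}
for out-of-range values is an assumption about how to model a literal
that the rule does not allow.

\begin{verbatim}
def plain_int_value(literal):
    """Value of a plain integer literal under the 32-bit rule above."""
    if literal[:2] in ("0x", "0X"):
        value = int(literal[2:], 16)
    elif literal.startswith("0") and len(literal) > 1:
        value = int(literal[1:], 8)
    else:
        value = int(literal, 10)
        if value > 2147483647:
            raise OverflowError("decimal literal too large: " + literal)
        return value
    # Octal and hexadecimal literals may be as large as 4294967295 ...
    if value > 4294967295:
        raise OverflowError("literal too large: " + literal)
    # ... and values above 2147483647 wrap around to negative numbers.
    if value > 2147483647:
        value -= 4294967296
    return value


for text in ["7", "0177", "0x7fffffff", "0x80000000", "0xffffffff"]:
    print(text, plain_int_value(text))
\end{verbatim}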

Some examples of plain and long integer literals:

\begin{verbatim}
7     2147483647                        0177    0x80000000
3L    79228162514264337593543950336L    0377L   0x100000000L
\end{verbatim}


\subsection{Floating point literals\label{floating}}

Floating point literals are described by the following lexical
definitions:

\begin{productionlist}
  \production{floatnumber}
             {\token{pointfloat} | \token{exponentfloat}}
  \production{pointfloat}
             {[\token{intpart}] \token{fraction} | \token{intpart} "."}
  \production{exponentfloat}
             {(\token{intpart} | \token{pointfloat})
              \token{exponent}}
  \production{intpart}
             {\token{digit}+}
  \production{fraction}
             {"." \token{digit}+}
  \production{exponent}
             {("e" | "E") ["+" | "-"] \token{digit}+}
\end{productionlist}

Note that the integer and exponent parts of floating point numbers
can look like octal integers, but are interpreted using radix 10.  For
example, \samp{077e010} is legal, and denotes the same number
as \samp{77e10}.  The allowed range of floating point literals is
implementation-dependent.  Some examples of floating point literals:

\begin{verbatim}
3.14    10.    .001    1e100    3.14e-10    0e0
\end{verbatim}
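
The floating point productions can be transcribed into a regular
expression, one Python variable per production.  This Python 3 sketch
checks only the lexical form, not the allowed range; the names are
hypothetical.

\begin{verbatim}
import re

# Each name mirrors one production above.
INTPART = r"[0-9]+"
FRACTION = r"\.[0-9]+"
EXPONENT = r"[eE][+-]?[0-9]+"
POINTFLOAT = r"(?:(?:%s)?%s|%s\.)" % (INTPART, FRACTION, INTPART)
EXPONENTFLOAT = r"(?:%s|%s)%s" % (INTPART, POINTFLOAT, EXPONENT)
FLOATNUMBER = re.compile(r"(?:%s|%s)\Z" % (POINTFLOAT, EXPONENTFLOAT))


def is_float_literal(text):
    """Return True if text matches the floatnumber production."""
    return FLOATNUMBER.match(text) is not None


# Leading zeros are read in radix 10, so 077e010 equals 77e10.
print(is_float_literal("077e010"), float("077e010") == float("77e10"))
\end{verbatim}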


\subsection{Imaginary literals\label{imaginary}}

Imaginary literals are described by the following lexical definitions:

\begin{productionlist}
  \production{imagnumber}{(\token{floatnumber} | \token{intpart}) ("j" | "J")}
\end{productionlist}

An imaginary literal yields a complex number with a real part of
0.0.  Complex numbers are represented as a pair of floating point
numbers and have the same restrictions on their range.  To create a
complex number with a nonzero real part, add a floating point number
to it, e.g., \code{(3+4j)}.  Some examples of imaginary literals:

\begin{verbatim}
3.14j   10.j    10j     .001j   1e100j  3.14e-10j
\end{verbatim}
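
A short check of these statements follows; it assumes a Python 3
interpreter, and the variable name is arbitrary.

\begin{verbatim}
z = 3.0 + 4j             # a float plus an imaginary literal
print(z.real, z.imag)    # 3.0 4.0
print((4j).real)         # 0.0
\end{verbatim}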


\section{Operators\label{operators}}

The following tokens are operators:
\index{operators}

\begin{verbatim}
+       -       *       **      /       //      %
<<      >>      &       |       ^       ~
<       >       <=      >=      ==      !=      <>
\end{verbatim}

The comparison operators \code{<>} and \code{!=} are alternate
spellings of the same operator.  \code{!=} is the preferred spelling;
\code{<>} is obsolescent.


\section{Delimiters\label{delimiters}}

The following tokens serve as delimiters in the grammar:
\index{delimiters}

\begin{verbatim}
(       )       [       ]       {       }
,       :       .       `       =       ;
+=      -=      *=      /=      //=     %=
&=      |=      ^=      >>=     <<=     **=
\end{verbatim}

The period can also occur in floating-point and imaginary literals.  A
sequence of three periods has a special meaning as an ellipsis in slices.
The augmented assignment operators in the second half of the list
serve lexically as delimiters, but also perform an operation.

The following printing \ASCII{} characters have special meaning as part
of other tokens or are otherwise significant to the lexical analyzer:

\begin{verbatim}
'       "       #       \
\end{verbatim}

The following printing \ASCII{} characters are not used in Python.  Their
occurrence outside string literals and comments is an unconditional
error:
\index{ASCII@\ASCII}

\begin{verbatim}
@       $       ?
\end{verbatim}