   1 \chapter{Lexical analysis\label{lexical}}
   2
   3 A Python program is read by a \emph{parser}.  Input to the parser is a
   4 stream of \emph{tokens}, generated by the \emph{lexical analyzer}.  This
   5 chapter describes how the lexical analyzer breaks a file into tokens.
   6 \index{lexical analysis}
   7 \index{parser}
   8 \index{token}
   9
  10 Python uses the 7-bit \ASCII{} character set for program text.
  11 \versionadded[An encoding declaration can be used to indicate that
  12 string literals and comments use an encoding different from ASCII.]{2.3}
  13 For compatibility with older versions, Python only warns if it finds
  14 8-bit characters; those warnings should be corrected by either declaring
  15 an explicit encoding, or using escape sequences if those bytes are binary
  16 data, instead of characters.
  17
  18
  19 The run-time character set depends on the I/O devices connected to the
  20 program but is generally a superset of \ASCII.
  21
  22 \strong{Future compatibility note:} It may be tempting to assume that the
  23 character set for 8-bit characters is ISO Latin-1 (an \ASCII{}
  24 superset that covers most western languages that use the Latin
  25 alphabet), but it is possible that in the future Unicode text editors
  26 will become common.  These generally use the UTF-8 encoding, which is
  27 also an \ASCII{} superset, but with very different use for the
  28 characters with ordinals 128-255.  While there is no consensus on this
  29 subject yet, it is unwise to assume either Latin-1 or UTF-8, even
  30 though the current implementation appears to favor Latin-1.  This
  31 applies both to the source character set and the run-time character
  32 set.
  33
  34
  35 \section{Line structure\label{line-structure}}
  36
  37 A Python program is divided into a number of \emph{logical lines}.
  38 \index{line structure}
  39
  40
  41 \subsection{Logical lines\label{logical}}
  42
  43 The end of
  44 a logical line is represented by the token NEWLINE.  Statements cannot
  45 cross logical line boundaries except where NEWLINE is allowed by the
  46 syntax (e.g., between statements in compound statements).
  47 A logical line is constructed from one or more \emph{physical lines}
  48 by following the explicit or implicit \emph{line joining} rules.
  49 \index{logical line}
  50 \index{physical line}
  51 \index{line joining}
  52 \index{NEWLINE token}
  53
  54
  55 \subsection{Physical lines\label{physical}}
  56
  57 A physical line ends in whatever the current platform's convention is
  58 for terminating lines.  On \UNIX, this is the \ASCII{} LF (linefeed)
  59 character.  On Windows, it is the \ASCII{} sequence CR LF (return
  60 followed by linefeed).  On Macintosh, it is the \ASCII{} CR (return)
  61 character.
  62
  63
  64 \subsection{Comments\label{comments}}
  65
  66 A comment starts with a hash character (\code{\#}) that is not part of
  67 a string literal, and ends at the end of the physical line.  A comment
  68 signifies the end of the logical line unless the implicit line joining
  69 rules are invoked.
  70 Comments are ignored by the syntax; they are not tokens.
  71 \index{comment}
  72 \index{hash character}
  73
  74
  75 \subsection{Encoding declarations\label{encodings}}
  76
  77 If a comment in the first or second line of the Python script matches
the regular expression \regexp{coding[=:]\e s*([-\e w.]+)}, this comment is
  79 processed as an encoding declaration; the first group of this
  80 expression names the encoding of the source code file. The recommended
  81 forms of this expression are
  82
\begin{verbatim}
# -*- coding: <encoding-name> -*-
\end{verbatim}
  86
  87 which is recognized also by GNU Emacs, and
  88
\begin{verbatim}
# vim:fileencoding=<encoding-name>
\end{verbatim}
  92
which is recognized by Bram Moolenaar's VIM. In addition, if the first
  94 bytes of the file are the UTF-8 byte-order mark
  95 (\code{'\e xef\e xbb\e xbf'}), the declared file encoding is UTF-8
  96 (this is supported, among others, by Microsoft's \program{notepad}).
  97
  98 If an encoding is declared, the encoding name must be recognized by
  99 Python. % XXX there should be a list of supported encodings.
 100 The encoding is used for all lexical analysis, in particular to find
 101 the end of a string, and to interpret the contents of Unicode literals.
 102 String literals are converted to Unicode for syntactical analysis,
 103 then converted back to their original encoding before interpretation
 104 starts. The encoding declaration must appear on a line of its own.
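
As a rough illustration of the rules above, the following sketch looks
for an encoding declaration in a source file given as a byte string.
The function name \code{declared_encoding} is hypothetical, and the
example is written for a current Python~3 interpreter rather than the
version described here.  It simplifies matters by treating the first
hash character on a line as the start of a comment, and it does not
check that the name is an encoding known to Python.

```python
import codecs
import re

# The pattern from the text; the hyphen is placed first in the character
# class so that it is taken literally.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")


def declared_encoding(source):
    """Return the encoding declared in the bytes `source`, or None."""
    # A UTF-8 byte-order mark declares UTF-8 by itself.
    if source.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # Only comments on the first two physical lines are considered.
    for line in source.splitlines()[:2]:
        text = line.decode("ascii", "replace")
        hash_pos = text.find("#")
        if hash_pos < 0:
            continue
        match = CODING_RE.search(text, hash_pos)
        if match:
            return match.group(1)
    return None


print(declared_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n"))
print(declared_encoding(b"#!/usr/bin/python\n# vim:fileencoding=utf-8\n"))
print(declared_encoding(b"x = 1\n"))
```

A declaration on the third line or later is deliberately ignored by
this sketch, matching the rule stated above.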
 105
 106 \subsection{Explicit line joining\label{explicit-joining}}
 107
 108 Two or more physical lines may be joined into logical lines using
 109 backslash characters (\code{\e}), as follows: when a physical line ends
 110 in a backslash that is not part of a string literal or comment, it is
 111 joined with the following forming a single logical line, deleting the
 112 backslash and the following end-of-line character.  For example:
 113 \index{physical line}
 114 \index{line joining}
 115 \index{line continuation}
 116 \index{backslash character}
 117 %
\begin{verbatim}
if 1900 < year < 2100 and 1 <= month <= 12 \
   and 1 <= day <= 31 and 0 <= hour < 24 \
   and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
        return 1
\end{verbatim}
 124
 125 A line ending in a backslash cannot carry a comment.  A backslash does
 126 not continue a comment.  A backslash does not continue a token except
 127 for string literals (i.e., tokens other than string literals cannot be
 128 split across physical lines using a backslash).  A backslash is
 129 illegal elsewhere on a line outside a string literal.
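
The effect of explicit line joining can be observed with the standard
\module{tokenize} module.  The sketch below assumes a current Python~3
interpreter, and the helper \code{count_logical_lines} is a
hypothetical name.  It counts NEWLINE tokens, which mark the ends of
logical lines.

```python
import io
import tokenize


def count_logical_lines(source):
    """Count NEWLINE tokens, i.e. ends of logical lines, in `source`."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sum(1 for tok in tokens if tok.type == tokenize.NEWLINE)


joined = "total = 1 + \\\n        2\n"   # two physical lines joined by a backslash
separate = "a = 1\nb = 2\n"              # two physical lines, not joined

print(count_logical_lines(joined))
print(count_logical_lines(separate))
```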
 130
 131
 132 \subsection{Implicit line joining\label{implicit-joining}}
 133
 134 Expressions in parentheses, square brackets or curly braces can be
 135 split over more than one physical line without using backslashes.
 136 For example:
 137
\begin{verbatim}
month_names = ['Januari', 'Februari', 'Maart',      # These are the
               'April',   'Mei',      'Juni',       # Dutch names
               'Juli',    'Augustus', 'September',  # for the months
               'Oktober', 'November', 'December']   # of the year
\end{verbatim}
 144
 145 Implicitly continued lines can carry comments.  The indentation of the
 146 continuation lines is not important.  Blank continuation lines are
 147 allowed.  There is no NEWLINE token between implicit continuation
 148 lines.  Implicitly continued lines can also occur within triple-quoted
 149 strings (see below); in that case they cannot carry comments.
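
A small experiment with the standard \module{tokenize} module, run
under a current Python~3 interpreter (an assumption; it is not the
version described here), shows a bracketed expression spread over
several physical lines producing a single NEWLINE token.  Note that
\module{tokenize} also reports \code{COMMENT} and \code{NL}
pseudo-tokens for the benefit of source-processing tools, although
comments are not tokens as far as the syntax is concerned.

```python
import io
import tokenize

source = (
    "names = ['Januari',   # first\n"
    "\n"
    "   'Februari']        # second\n"
)

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
kinds = [tokenize.tok_name[tok.type] for tok in tokens]

print(kinds.count("NEWLINE"))   # ends of logical lines
print(kinds.count("COMMENT"))   # comments reported by tokenize
print("INDENT" in kinds)        # indentation of continuation lines is ignored
```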
 150
 151
 152 \subsection{Blank lines \label{blank-lines}}
 153
 154 \index{blank line}
 155 A logical line that contains only spaces, tabs, formfeeds and possibly
 156 a comment, is ignored (i.e., no NEWLINE token is generated).  During
 157 interactive input of statements, handling of a blank line may differ
 158 depending on the implementation of the read-eval-print loop.  In the
 159 standard implementation, an entirely blank logical line (i.e.\ one
 160 containing not even whitespace or a comment) terminates a multi-line
 161 statement.
 162
 163
 164 \subsection{Indentation\label{indentation}}
 165
 166 Leading whitespace (spaces and tabs) at the beginning of a logical
 167 line is used to compute the indentation level of the line, which in
 168 turn is used to determine the grouping of statements.
 169 \index{indentation}
 170 \index{whitespace}
 171 \index{leading whitespace}
 172 \index{space}
 173 \index{tab}
 174 \index{grouping}
 175 \index{statement grouping}
 176
 177 First, tabs are replaced (from left to right) by one to eight spaces
 178 such that the total number of characters up to and including the
 179 replacement is a multiple of
 180 eight (this is intended to be the same rule as used by \UNIX).  The
 181 total number of spaces preceding the first non-blank character then
 182 determines the line's indentation.  Indentation cannot be split over
 183 multiple physical lines using backslashes; the whitespace up to the
 184 first backslash determines the indentation.
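
The tab replacement rule can be sketched as follows.  The function
\code{indentation_width} and its \code{tab_size} parameter are
hypothetical names introduced for illustration, and formfeed
characters are not handled.

```python
def indentation_width(line, tab_size=8):
    """Return the indentation of `line` after expanding leading tabs."""
    width = 0
    for char in line:
        if char == " ":
            width += 1
        elif char == "\t":
            # Advance to the next multiple of tab_size (one to eight spaces).
            width = (width // tab_size + 1) * tab_size
        else:
            break  # the first non-blank character ends the indentation
    return width


print(indentation_width("    x = 1"))
print(indentation_width("\tx = 1"))
print(indentation_width("   \tx = 1"))
print(indentation_width("        \tx = 1"))
```

A tab after three spaces and a tab on its own both give an
indentation of eight, which is one reason why mixing the two can be
confusing.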
 185
 186 \strong{Cross-platform compatibility note:} because of the nature of
text editors on non-\UNIX{} platforms, it is unwise to use a mixture of
 188 spaces and tabs for the indentation in a single source file.
 189
 190 A formfeed character may be present at the start of the line; it will
 191 be ignored for the indentation calculations above.  Formfeed
 192 characters occurring elsewhere in the leading whitespace have an
 193 undefined effect (for instance, they may reset the space count to
 194 zero).
 195
 196 The indentation levels of consecutive lines are used to generate
 197 INDENT and DEDENT tokens, using a stack, as follows.
 198 \index{INDENT token}
 199 \index{DEDENT token}
 200
 201 Before the first line of the file is read, a single zero is pushed on
 202 the stack; this will never be popped off again.  The numbers pushed on
 203 the stack will always be strictly increasing from bottom to top.  At
 204 the beginning of each logical line, the line's indentation level is
 205 compared to the top of the stack.  If it is equal, nothing happens.
 206 If it is larger, it is pushed on the stack, and one INDENT token is
 207 generated.  If it is smaller, it \emph{must} be one of the numbers
 208 occurring on the stack; all numbers on the stack that are larger are
 209 popped off, and for each number popped off a DEDENT token is
 210 generated.  At the end of the file, a DEDENT token is generated for
 211 each number remaining on the stack that is larger than zero.
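
The stack algorithm just described can be sketched in a few lines.
This is an illustration only, not the code used by the interpreter;
the function \code{indent_tokens} is a hypothetical name, and it takes
the indentation levels of the logical lines as already computed
numbers.

```python
def indent_tokens(levels):
    """Return the INDENT/DEDENT tokens for a list of indentation levels.

    `levels` holds the indentation of each logical line, in order.
    """
    stack = [0]          # the initial zero is never popped
    tokens = []
    for level in levels:
        if level > stack[-1]:
            stack.append(level)
            tokens.append("INDENT")
        elif level < stack[-1]:
            if level not in stack:
                raise IndentationError("inconsistent dedent to %d" % level)
            while stack[-1] > level:
                stack.pop()
                tokens.append("DEDENT")
        # If the level equals the top of the stack, nothing happens.
    # End of file: one DEDENT for each remaining number larger than zero.
    tokens.extend("DEDENT" for number in stack if number > 0)
    return tokens


print(indent_tokens([0, 4, 18, 4, 4, 13, 14, 4]))
```

Every INDENT is eventually matched by a DEDENT, however irregular the
individual indentation steps are.  A level that is smaller than the
top of the stack but does not occur on it raises an error.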
 212
 213 Here is an example of a correctly (though confusingly) indented piece
 214 of Python code:
 215
\begin{verbatim}
def perm(l):
        # Compute the list of all permutations of l
    if len(l) <= 1:
                  return [l]
    r = []
    for i in range(len(l)):
             s = l[:i] + l[i+1:]
             p = perm(s)
             for x in p:
              r.append(l[i:i+1] + x)
    return r
\end{verbatim}
 229
 230 The following example shows various indentation errors:
 231
\begin{verbatim}
 def perm(l):                       # error: first line indented
for i in range(len(l)):             # error: not indented
    s = l[:i] + l[i+1:]
        p = perm(l[:i] + l[i+1:])   # error: unexpected indent
        for x in p:
                r.append(l[i:i+1] + x)
            return r                # error: inconsistent dedent
\end{verbatim}
 241
 242 (Actually, the first three errors are detected by the parser; only the
 243 last error is found by the lexical analyzer --- the indentation of
 244 \code{return r} does not match a level popped off the stack.)
 245
 246
 247 \subsection{Whitespace between tokens\label{whitespace}}
 248
 249 Except at the beginning of a logical line or in string literals, the
 250 whitespace characters space, tab and formfeed can be used
 251 interchangeably to separate tokens.  Whitespace is needed between two
 252 tokens only if their concatenation could otherwise be interpreted as a
 253 different token (e.g., ab is one token, but a b is two tokens).
 254
 255
 256 \section{Other tokens\label{other-tokens}}
 257
 258 Besides NEWLINE, INDENT and DEDENT, the following categories of tokens
 259 exist: \emph{identifiers}, \emph{keywords}, \emph{literals},
 260 \emph{operators}, and \emph{delimiters}.
 261 Whitespace characters (other than line terminators, discussed earlier)
 262 are not tokens, but serve to delimit tokens.
 263 Where
 264 ambiguity exists, a token comprises the longest possible string that
 265 forms a legal token, when read from left to right.
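
The longest-match rule, and the role of whitespace in separating
tokens, can be observed with the standard \module{tokenize} module.
The sketch assumes a current Python~3 interpreter, and
\code{token_strings} is a hypothetical helper that keeps only name,
number, string and operator tokens.

```python
import io
import tokenize

SIGNIFICANT = (tokenize.NAME, tokenize.NUMBER, tokenize.STRING, tokenize.OP)


def token_strings(source):
    """Return the text of the name, number, string and operator tokens."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type in SIGNIFICANT]


print(token_strings("ab\n"))       # one identifier
print(token_strings("a b\n"))      # two identifiers
print(token_strings("x<<=1\n"))    # the longest operator is chosen
print(token_strings("x< <=1\n"))   # whitespace forces two operators
```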
 266
 267
 268 \section{Identifiers and keywords\label{identifiers}}
 269
 270 Identifiers (also referred to as \emph{names}) are described by the following
 271 lexical definitions:
 272 \index{identifier}
 273 \index{name}
 274
 275 \begin{productionlist}
 276   \production{identifier}
 277              {(\token{letter}|"_") (\token{letter} | \token{digit} | "_")*}
 278   \production{letter}
 279              {\token{lowercase} | \token{uppercase}}
 280   \production{lowercase}
 281              {"a"..."z"}
 282   \production{uppercase}
 283              {"A"..."Z"}
 284   \production{digit}
 285              {"0"..."9"}
 286 \end{productionlist}
 287
 288 Identifiers are unlimited in length.  Case is significant.
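
The productions above translate directly into a regular expression.
The following sketch, with the hypothetical helper
\code{is_identifier}, checks a string against exactly this \ASCII{}
grammar.  It does not test whether the name is a keyword.

```python
import re

# identifier ::= (letter | "_") (letter | digit | "_")*
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text):
    """Return True if `text` matches the identifier production."""
    return IDENTIFIER_RE.fullmatch(text) is not None


print(is_identifier("spam"))
print(is_identifier("_private2"))
print(is_identifier("2spam"))     # may not start with a digit
print(is_identifier("sp-am"))     # a hyphen is not allowed
```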
 289
 290
 291 \subsection{Keywords\label{keywords}}
 292
 293 The following identifiers are used as reserved words, or
 294 \emph{keywords} of the language, and cannot be used as ordinary
 295 identifiers.  They must be spelled exactly as written here:%
 296 \index{keyword}%
 297 \index{reserved word}
 298
\begin{verbatim}
and       del       for       is        raise
assert    elif      from      lambda    return
break     else      global    not       try
class     except    if        or        while
continue  exec      import    pass      yield
def       finally   in        print
\end{verbatim}
 307
 308 % When adding keywords, use reswords.py for reformatting
 309
 310 Note that although the identifier \code{as} can be used as part of the
 311 syntax of \keyword{import} statements, it is not currently a reserved
 312 word.
 313
 314 In some future version of Python, the identifiers \code{as} and
 315 \code{None} will both become keywords.
 316
 317
 318 \subsection{Reserved classes of identifiers\label{id-classes}}
 319
 320 Certain classes of identifiers (besides keywords) have special
 321 meanings.  These are:
 322
 323 \begin{tableiii}{l|l|l}{code}{Form}{Meaning}{Notes}
 324 \lineiii{_*}{Not imported by \samp{from \var{module} import *}}{(1)}
 325 \lineiii{__*__}{System-defined name}{}
 326 \lineiii{__*}{Class-private name mangling}{}
 327 \end{tableiii}
 328
 329 See sections: \ref{import}, ``The \keyword{import} statement'';
 330 \ref{specialnames}, ``Special method names'';
 331 \ref{atom-identifiers}, ``Identifiers (Names)''.
 332
 333 Note:
 334
 335 \begin{description}
 336 \item[(1)] The special identifier \samp{_} is used in the interactive
 337 interpreter to store the result of the last evaluation; it is stored
 338 in the \module{__builtin__} module.  When not in interactive mode,
 339 \samp{_} has no special meaning and is not defined.
 340 \end{description}
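
The three forms in the table can be told apart by a simple check on
leading and trailing underscores.  The function
\code{identifier_class} below is a hypothetical illustration that
returns the most specific class.  It assumes that a system-defined
name has at least one character between the double underscores, and
it ignores the fact that class-private mangling only applies inside
class definitions.

```python
def identifier_class(name):
    """Classify `name` according to the table of reserved classes."""
    if len(name) > 4 and name.startswith("__") and name.endswith("__"):
        return "system-defined name"
    if name.startswith("__"):
        return "class-private name"
    if name.startswith("_"):
        return "not imported by 'from module import *'"
    return "ordinary identifier"


for name in ("__init__", "__secret", "_helper", "value"):
    print(name, "->", identifier_class(name))
```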
 341
 342
 343 \section{Literals\label{literals}}
 344
 345 Literals are notations for constant values of some built-in types.
 346 \index{literal}
 347 \index{constant}
 348
 349
 350 \subsection{String literals\label{strings}}
 351
 352 String literals are described by the following lexical definitions:
 353 \index{string literal}
 354
 355 \index{ASCII@\ASCII}
 356 \begin{productionlist}
 357   \production{stringliteral}
 358              {[\token{stringprefix}](\token{shortstring} | \token{longstring})}
 359   \production{stringprefix}
 360              {"r" | "u" | "ur" | "R" | "U" | "UR" | "Ur" | "uR"}
 361   \production{shortstring}
 362              {"'" \token{shortstringitem}* "'"
 363               | '"' \token{shortstringitem}* '"'}
 364   \production{longstring}
 365              {"'''" \token{longstringitem}* "'''"}
 366   \productioncont{| '"""' \token{longstringitem}* '"""'}
 367   \production{shortstringitem}
 368              {\token{shortstringchar} | \token{escapeseq}}
 369   \production{longstringitem}
 370              {\token{longstringchar} | \token{escapeseq}}
 371   \production{shortstringchar}
 372              {<any ASCII character except "\e" or newline or the quote>}
 373   \production{longstringchar}
 374              {<any ASCII character except "\e">}
 375   \production{escapeseq}
 376              {"\e" <any ASCII character>}
 377 \end{productionlist}
 378
 379 One syntactic restriction not indicated by these productions is that
 380 whitespace is not allowed between the \grammartoken{stringprefix} and
 381 the rest of the string literal.
 382
 383 \index{triple-quoted string}
 384 \index{Unicode Consortium}
 385 \index{string!Unicode}
 386 In plain English: String literals can be enclosed in matching single
 387 quotes (\code{'}) or double quotes (\code{"}).  They can also be
 388 enclosed in matching groups of three single or double quotes (these
 389 are generally referred to as \emph{triple-quoted strings}).  The
 390 backslash (\code{\e}) character is used to escape characters that
 391 otherwise have a special meaning, such as newline, backslash itself,
 392 or the quote character.  String literals may optionally be prefixed
 393 with a letter \character{r} or \character{R}; such strings are called
 394 \dfn{raw strings}\index{raw string} and use different rules for interpreting
 395 backslash escape sequences.  A prefix of \character{u} or \character{U}
 396 makes the string a Unicode string.  Unicode strings use the Unicode character
 397 set as defined by the Unicode Consortium and ISO~10646.  Some additional
 398 escape sequences, described below, are available in Unicode strings.
 399 The two prefix characters may be combined; in this case, \character{u} must
 400 appear before \character{r}.
 401
 402 In triple-quoted strings,
 403 unescaped newlines and quotes are allowed (and are retained), except
 404 that three unescaped quotes in a row terminate the string.  (A
``quote'' is the character used to open the string, i.e., either
 406 \code{'} or \code{"}.)
 407
 408 Unless an \character{r} or \character{R} prefix is present, escape
 409 sequences in strings are interpreted according to rules similar
 410 to those used by Standard C.  The recognized escape sequences are:
 411 \index{physical line}
 412 \index{escape sequence}
 413 \index{Standard C}
 414 \index{C}
 415
 416 \begin{tableiii}{l|l|c}{code}{Escape Sequence}{Meaning}{Notes}
 417 \lineiii{\e\var{newline}} {Ignored}{}
 418 \lineiii{\e\e}  {Backslash (\code{\e})}{}
 419 \lineiii{\e'}   {Single quote (\code{'})}{}
 420 \lineiii{\e"}   {Double quote (\code{"})}{}
 421 \lineiii{\e a}  {\ASCII{} Bell (BEL)}{}
 422 \lineiii{\e b}  {\ASCII{} Backspace (BS)}{}
 423 \lineiii{\e f}  {\ASCII{} Formfeed (FF)}{}
 424 \lineiii{\e n}  {\ASCII{} Linefeed (LF)}{}
 425 \lineiii{\e N\{\var{name}\}}
 426         {Character named \var{name} in the Unicode database (Unicode only)}{}
 427 \lineiii{\e r}  {\ASCII{} Carriage Return (CR)}{}
 428 \lineiii{\e t}  {\ASCII{} Horizontal Tab (TAB)}{}
 429 \lineiii{\e u\var{xxxx}}
 430         {Character with 16-bit hex value \var{xxxx} (Unicode only)}{(1)}
 431 \lineiii{\e U\var{xxxxxxxx}}
 432         {Character with 32-bit hex value \var{xxxxxxxx} (Unicode only)}{(2)}
 433 \lineiii{\e v}  {\ASCII{} Vertical Tab (VT)}{}
 434 \lineiii{\e\var{ooo}} {\ASCII{} character with octal value \var{ooo}}{(3)}
 435 \lineiii{\e x\var{hh}} {\ASCII{} character with hex value \var{hh}}{(4)}
 436 \end{tableiii}
 437 \index{ASCII@\ASCII}
 438
 439 \noindent
 440 Notes:
 441
 442 \begin{itemize}
 443 \item[(1)]
 444   Individual code units which form parts of a surrogate pair can be
 445   encoded using this escape sequence.
 446 \item[(2)]
 447   Any Unicode character can be encoded this way, but characters
 448   outside the Basic Multilingual Plane (BMP) will be encoded using a
 449   surrogate pair if Python is compiled to use 16-bit code units (the
 450   default).  Individual code units which form parts of a surrogate
 451   pair can be encoded using this escape sequence.
 452 \item[(3)]
 453   As in Standard C, up to three octal digits are accepted.
 454 \item[(4)]
 455   Unlike in Standard C, at most two hex digits are accepted.
 456 \end{itemize}
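
A few of these escape sequences are checked in the sketch below.  It
assumes a current Python~3 interpreter, in which every string literal
is a Unicode string, so the escapes marked ``(Unicode only)'' work
without a prefix; in the version described here they would require a
\character{u} prefix.

```python
# Each entry compares an escape sequence with the character it denotes.
checks = {
    "linefeed":    "\n" == chr(10),
    "tab":         "\t" == chr(9),
    "octal":       "\101" == "A",
    "hex":         "\x41" == "A",
    "16-bit":      "\u0062" == "b",
    "32-bit":      "\U00000062" == "b",
    "named":       "\N{LATIN SMALL LETTER B}" == "b",
    "backslash":   len("\\") == 1,
    "line joined": "ab\
cd" == "abcd",
}

for name, result in checks.items():
    print(name, result)
```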
 457
 458
 459 Unlike Standard \index{unrecognized escape sequence}C,
 460 all unrecognized escape sequences are left in the string unchanged,
 461 i.e., \emph{the backslash is left in the string}.  (This behavior is
 462 useful when debugging: if an escape sequence is mistyped, the
 463 resulting output is more easily recognized as broken.)  It is also
 464 important to note that the escape sequences marked as ``(Unicode
 465 only)'' in the table above fall into the category of unrecognized
 466 escapes for non-Unicode string literals.
 467
 468 When an \character{r} or \character{R} prefix is present, a character
 469 following a backslash is included in the string without change, and \emph{all
 470 backslashes are left in the string}.  For example, the string literal
 471 \code{r"\e n"} consists of two characters: a backslash and a lowercase
 472 \character{n}.  String quotes can be escaped with a backslash, but the
 473 backslash remains in the string; for example, \code{r"\e""} is a valid string
 474 literal consisting of two characters: a backslash and a double quote;
 475 \code{r"\e"} is not a valid string literal (even a raw string cannot
 476 end in an odd number of backslashes).  Specifically, \emph{a raw
 477 string cannot end in a single backslash} (since the backslash would
 478 escape the following quote character).  Note also that a single
 479 backslash followed by a newline is interpreted as those two characters
 480 as part of the string, \emph{not} as a line continuation.
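
The difference between ordinary and raw string literals can be seen
by comparing lengths.  The variable names are illustrative only.

```python
plain = "\n"       # one character: a linefeed
raw = r"\n"        # two characters: a backslash and the letter n
quoted = r"\""     # two characters: a backslash and a double quote

print(len(plain), len(raw), len(quoted))
print(raw == "\\n", quoted == '\\"')
```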
 481
 482 When an \character{r} or \character{R} prefix is used in conjunction
 483 with a \character{u} or \character{U} prefix, then the \code{\e uXXXX}
 484 escape sequence is processed while \emph{all other backslashes are
 485 left in the string}.  For example, the string literal
 486 \code{ur"\e{}u0062\e n"} consists of three Unicode characters: `LATIN
 487 SMALL LETTER B', `REVERSE SOLIDUS', and `LATIN SMALL LETTER N'.
 488 Backslashes can be escaped with a preceding backslash; however, both
 489 remain in the string.  As a result, \code{\e uXXXX} escape sequences
 490 are only recognized when there are an odd number of backslashes.
 491
 492 \subsection{String literal concatenation\label{string-catenation}}
 493
 494 Multiple adjacent string literals (delimited by whitespace), possibly
 495 using different quoting conventions, are allowed, and their meaning is
 496 the same as their concatenation.  Thus, \code{"hello" 'world'} is
 497 equivalent to \code{"helloworld"}.  This feature can be used to reduce
 498 the number of backslashes needed, to split long strings conveniently
 499 across long lines, or even to add comments to parts of strings, for
 500 example:
 501
\begin{verbatim}
re.compile("[A-Za-z_]"       # letter or underscore
           "[A-Za-z0-9_]*"   # letter, digit or underscore
          )
\end{verbatim}
 507
 508 Note that this feature is defined at the syntactical level, but
 509 implemented at compile time.  The `+' operator must be used to
 510 concatenate string expressions at run time.  Also note that literal
 511 concatenation can use different quoting styles for each component
 512 (even mixing raw strings and triple quoted strings).
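
A short check of these rules, mixing quoting styles, a raw string and
a triple-quoted string (the variable names are illustrative only):

```python
greeting = "hello" 'world'

pattern = (r"\d+"          # raw string: backslash and "d+" kept as written
           """\t"""        # triple-quoted string: a single tab character
           "[A-Za-z_]")    # ordinary string: letter or underscore

print(greeting)
print(repr(pattern))
```

Each component is interpreted according to its own prefix and quoting
before the pieces are joined.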
 513
 514
 515 \subsection{Numeric literals\label{numbers}}
 516
 517 There are four types of numeric literals: plain integers, long
 518 integers, floating point numbers, and imaginary numbers.  There are no
 519 complex literals (complex numbers can be formed by adding a real
 520 number and an imaginary number).
 521 \index{number}
 522 \index{numeric literal}
 523 \index{integer literal}
 524 \index{plain integer literal}
 525 \index{long integer literal}
 526 \index{floating point literal}
 527 \index{hexadecimal literal}
 528 \index{octal literal}
 529 \index{decimal literal}
 530 \index{imaginary literal}
 531 \index{complex!literal}
 532
 533 Note that numeric literals do not include a sign; a phrase like
 534 \code{-1} is actually an expression composed of the unary operator
 535 `\code{-}' and the literal \code{1}.
 536
 537
 538 \subsection{Integer and long integer literals\label{integers}}
 539
 540 Integer and long integer literals are described by the following
 541 lexical definitions:
 542
 543 \begin{productionlist}
 544   \production{longinteger}
 545              {\token{integer} ("l" | "L")}
 546   \production{integer}
 547              {\token{decimalinteger} | \token{octinteger} | \token{hexinteger}}
 548   \production{decimalinteger}
 549              {\token{nonzerodigit} \token{digit}* | "0"}
 550   \production{octinteger}
 551              {"0" \token{octdigit}+}
 552   \production{hexinteger}
 553              {"0" ("x" | "X") \token{hexdigit}+}
 554   \production{nonzerodigit}
 555              {"1"..."9"}
 556   \production{octdigit}
 557              {"0"..."7"}
 558   \production{hexdigit}
 559              {\token{digit} | "a"..."f" | "A"..."F"}
 560 \end{productionlist}
 561
 562 Although both lower case \character{l} and upper case \character{L} are
 563 allowed as suffix for long integers, it is strongly recommended to always
 564 use \character{L}, since the letter \character{l} looks too much like the
 565 digit \character{1}.
 566
 567 Plain integer decimal literals that are above the largest representable
 568 plain integer (e.g., 2147483647 when using 32-bit arithmetic) are accepted
 569 as if they were long integers instead.  Octal and hexadecimal literals
behave similarly, except that on a machine using 32-bit arithmetic a literal
above the largest representable plain integer but below 4294967296 (one more
than the largest unsigned 32-bit number) is taken as the negative plain
integer obtained by subtracting 4294967296 from its unsigned value.  There
 574 is no limit for long integer literals apart from what can be stored in
 575 available memory.  For example, 0xdeadbeef is taken, on a 32-bit machine,
 576 as the value -559038737, while 0xdeadbeeffeed is taken as the value
 577 244837814107885L.
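
The arithmetic behind these rules can be sketched as follows.  The
function \code{literal_value_32bit} and the two constants are
hypothetical names; the code models the behavior described above for
a machine with 32-bit arithmetic and is written for a current
Python~3 interpreter, which has a single integer type.

```python
MAX_PLAIN = 2147483647      # largest plain integer with 32-bit arithmetic
WRAP = 4294967296           # 2**32


def literal_value_32bit(digits, base):
    """Return (kind, value) for an unsuffixed literal on a 32-bit machine.

    `digits` is the digit string without any "0x" or leading "0" prefix,
    and `base` is 8, 10 or 16.
    """
    value = int(digits, base)
    if value <= MAX_PLAIN:
        return ("plain", value)
    if base in (8, 16) and value < WRAP:
        return ("plain", value - WRAP)   # taken as a negative plain integer
    return ("long", value)


print(literal_value_32bit("deadbeef", 16))
print(literal_value_32bit("deadbeeffeed", 16))
print(literal_value_32bit("2147483648", 10))
```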
 578
 579 Some examples of plain integer literals (first row) and long integer
 580 literals (second and third rows):
 581
\begin{verbatim}
7     2147483647                        0177    0x80000000
3L    79228162514264337593543950336L    0377L   0x100000000L
      79228162514264337593543950336             0xdeadbeeffeed
\end{verbatim}
 587
 588
 589 \subsection{Floating point literals\label{floating}}
 590
 591 Floating point literals are described by the following lexical
 592 definitions:
 593
 594 \begin{productionlist}
 595   \production{floatnumber}
 596              {\token{pointfloat} | \token{exponentfloat}}
 597   \production{pointfloat}
 598              {[\token{intpart}] \token{fraction} | \token{intpart} "."}
 599   \production{exponentfloat}
 600              {(\token{intpart} | \token{pointfloat})
 601               \token{exponent}}
 602   \production{intpart}
 603              {\token{digit}+}
 604   \production{fraction}
 605              {"." \token{digit}+}
 606   \production{exponent}
 607              {("e" | "E") ["+" | "-"] \token{digit}+}
 608 \end{productionlist}
 609
 610 Note that the integer and exponent parts of floating point numbers
 611 can look like octal integers, but are interpreted using radix 10.  For
 612 example, \samp{077e010} is legal, and denotes the same number
 613 as \samp{77e10}.
 614 The allowed range of floating point literals is
 615 implementation-dependent.
 616 Some examples of floating point literals:
 617
\begin{verbatim}
3.14    10.    .001    1e100    3.14e-10    0e0
\end{verbatim}
 621
 622 Note that numeric literals do not include a sign; a phrase like
 623 \code{-1} is actually an expression composed of the operator
 624 \code{-} and the literal \code{1}.
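
The floating point productions can be collapsed into one regular
expression, as in the sketch below.  The names \code{FLOAT_RE} and
\code{is_float_literal} are hypothetical.  The pattern has no place
for a sign, in line with the note above, and a bare integer such as
\code{10} does not match.

```python
import re

# floatnumber ::= pointfloat | exponentfloat, written as a single pattern.
FLOAT_RE = re.compile(r"""
    (?: \d* \. \d+ | \d+ \. )  (?: [eE] [+-]? \d+ )?   # pointfloat, optional exponent
  | \d+ [eE] [+-]? \d+                                 # intpart followed by exponent
""", re.VERBOSE)


def is_float_literal(text):
    """Return True if `text` matches the floatnumber production."""
    return FLOAT_RE.fullmatch(text) is not None


for text in ("3.14", "10.", ".001", "1e100", "3.14e-10", "0e0", "10", "-1.0"):
    print(text, is_float_literal(text))

# Leading zeros do not make the parts octal: radix 10 is always used.
print(float("077e010") == 77e10)
```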
 625
 626
 627 \subsection{Imaginary literals\label{imaginary}}
 628
 629 Imaginary literals are described by the following lexical definitions:
 630
 631 \begin{productionlist}
 632   \production{imagnumber}{(\token{floatnumber} | \token{intpart}) ("j" | "J")}
 633 \end{productionlist}
 634
 635 An imaginary literal yields a complex number with a real part of
 636 0.0.  Complex numbers are represented as a pair of floating point
 637 numbers and have the same restrictions on their range.  To create a
 638 complex number with a nonzero real part, add a floating point number
 639 to it, e.g., \code{(3+4j)}.  Some examples of imaginary literals:
 640
\begin{verbatim}
3.14j   10.j    10j     .001j   1e100j  3.14e-10j
\end{verbatim}
 644
 645
 646 \section{Operators\label{operators}}
 647
 648 The following tokens are operators:
 649 \index{operators}
 650
\begin{verbatim}
+       -       *       **      /       //      %
<<      >>      &       |       ^       ~
<       >       <=      >=      ==      !=      <>
\end{verbatim}
 656
 657 The comparison operators \code{<>} and \code{!=} are alternate
 658 spellings of the same operator.  \code{!=} is the preferred spelling;
 659 \code{<>} is obsolescent.
 660
 661
 662 \section{Delimiters\label{delimiters}}
 663
 664 The following tokens serve as delimiters in the grammar:
 665 \index{delimiters}
 666
\begin{verbatim}
(       )       [       ]       {       }
,       :       .       `       =       ;
+=      -=      *=      /=      //=     %=
&=      |=      ^=      >>=     <<=     **=
\end{verbatim}
 673
 674 The period can also occur in floating-point and imaginary literals.  A
 675 sequence of three periods has a special meaning as an ellipsis in slices.
The tokens in the second half of the list, the augmented assignment
operators, serve lexically as delimiters, but also perform an operation.
 678
 679 The following printing \ASCII{} characters have special meaning as part
 680 of other tokens or are otherwise significant to the lexical analyzer:
 681
\begin{verbatim}
'       "       #       \
\end{verbatim}
 685
 686 The following printing \ASCII{} characters are not used in Python.  Their
 687 occurrence outside string literals and comments is an unconditional
 688 error:
 689 \index{ASCII@\ASCII}
 690
\begin{verbatim}
@       $       ?
\end{verbatim}