\section{\module{tokenize} ---
         Tokenizer for Python source}

\declaremodule{standard}{tokenize}
\modulesynopsis{Lexical scanner for Python source code.}
\moduleauthor{Ka Ping Yee}{}
\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
The \module{tokenize} module provides a lexical scanner for Python
source code, implemented in Python.  The scanner in this module
returns comments as tokens as well, making it useful for implementing
``pretty-printers,'' including colorizers for on-screen displays.

The scanner is exposed via a single function:
\begin{funcdesc}{tokenize}{readline\optional{, tokeneater}}
  The \function{tokenize()} function accepts two parameters: one
  representing the input stream, and one providing an output
  mechanism for \function{tokenize()}.

  The first parameter, \var{readline}, must be a callable object
  which provides the same interface as the \method{readline()}
  method of built-in file objects (see
  section~\ref{bltin-file-objects}).  Each call to the function
  should return one line of input as a string.
  The second parameter, \var{tokeneater}, must also be a callable
  object.  It is called with five parameters: the token type, the
  token string, a tuple \code{(\var{srow}, \var{scol})} specifying
  the row and column where the token begins in the source, a tuple
  \code{(\var{erow}, \var{ecol})} giving the ending position of the
  token, and the line on which the token was found.  The line passed
  is the \emph{logical} line; continuation lines are included.
\end{funcdesc}
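
For illustration, the following shows one way \function{tokenize()}
might be driven; the file name \file{example.py} and the helper
function are placeholders, not part of the module.  The
\var{tokeneater} here writes each token's position, type name, and
text to standard output:

\begin{verbatim}
import sys
import token
import tokenize

def showtoken(type, string, start, end, line):
    # start is the (srow, scol) pair and end the (erow, ecol) pair
    # described above; translate the numeric type to a readable name.
    name = token.tok_name.get(type, str(type))
    sys.stdout.write("%d,%d-%d,%d:\t%s\t%s\n"
                     % (start[0], start[1], end[0], end[1],
                        name, repr(string)))

f = open("example.py")           # any file of Python source code
tokenize.tokenize(f.readline, showtoken)
f.close()
\end{verbatim}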
All constants from the \refmodule{token} module are also exported
from \module{tokenize}, as is one additional token type value that
might be passed to the \var{tokeneater} function by
\function{tokenize()}:
\begin{datadesc}{COMMENT}
  Token value used to indicate a comment.
\end{datadesc}
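
For example, a \var{tokeneater} along these lines could be used to
report only the comments in a file (again, the file name is only
illustrative):

\begin{verbatim}
import sys
import tokenize

def comments_only(type, string, start, end, line):
    # COMMENT is defined by tokenize, not by the token module.
    if type == tokenize.COMMENT:
        sys.stdout.write("line %d: %s\n" % (start[0], string))

f = open("example.py")
tokenize.tokenize(f.readline, comments_only)
f.close()
\end{verbatim}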