\section{Standard Module \module{urllib}}
This module provides a high-level interface for fetching data across
the World-Wide Web.  In particular, the \function{urlopen()} function
is similar to the built-in function \function{open()}, but accepts
Universal Resource Locators (URLs) instead of filenames.  Some
restrictions apply: it can only open URLs for reading, and no seek
operations are available.
It defines the following public functions:
\begin{funcdesc}{urlopen}{url}
Open a network object denoted by a URL for reading.  If the URL does
not have a scheme identifier, or if it has \file{file:} as its scheme
identifier, this opens a local file; otherwise it opens a socket to a
server somewhere on the network.  If the connection cannot be made, or
if the server returns an error code, the \exception{IOError} exception
is raised.  If all went well, a file-like object is returned.  This
supports the following methods: \method{read()}, \method{readline()},
\method{readlines()}, \method{fileno()}, \method{close()} and
\method{info()}.
Except for the last one, these methods have the same interface as for
file objects; see section \ref{bltin-file-objects} in this
manual.  (It is not a built-in file object, however, so it can't be
used at those few places where a true built-in file object is
required.)

The \method{info()} method returns an instance of the class
\class{mimetools.Message} containing the headers received from the
server, if the protocol uses such headers (currently the only
supported protocol that uses this is HTTP).  See the description of
the \module{mimetools}\refstmodindex{mimetools} module.
\end{funcdesc}
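As an illustration, here is a minimal sketch of \function{urlopen()} in use.  It assumes Python 3, where the function lives in \module{urllib.request} rather than \module{urllib}; there, opening a missing \file{file:} URL raises \exception{URLError}, which is a subclass of \exception{IOError}.  The helpers \code{read_file_url()} and \code{try_open()} are hypothetical names, and a temporary local file stands in for a network resource so that no connection is needed.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlopen  # Python 3 location of urlopen()


def read_file_url(path):
    """Open a local file through a file: URL; return (data, headers)."""
    url = Path(path).resolve().as_uri()   # e.g. 'file:///tmp/tmpab12.txt'
    with urlopen(url) as response:
        data = response.read()            # same interface as a file's read()
        headers = response.info()         # message object holding the headers
    return data, headers


def try_open(url):
    """Return the body of url, or None if urlopen() raises IOError."""
    try:
        with urlopen(url) as response:
            return response.read()
    except IOError:   # in Python 3, IOError is OSError and URLError subclasses it
        return None


if __name__ == "__main__":
    fd, name = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write(b"line one\nline two\n")
    try:
        data, headers = read_file_url(name)
        print(data.splitlines())              # [b'line one', b'line two']
        print(headers.get("Content-length"))  # 18
    finally:
        os.remove(name)
    print(try_open(Path(name).resolve().as_uri()))  # None: the file is gone
```

Because the returned object is only file-like, the sketch reads everything with \method{read()} inside a \code{with} block instead of passing the object to code that expects a true built-in file.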
\begin{funcdesc}{urlretrieve}{url}
Copy a network object denoted by a URL to a local file, if necessary.
If the URL points to a local file, or a valid cached copy of the
object exists, the object is not copied.  Return a tuple
\code{(\var{filename}, \var{headers})} where \var{filename} is the
local file name under which the object can be found, and \var{headers}
is either \code{None} (for a local object) or whatever the
\method{info()} method of the object returned by \function{urlopen()}
returned (for a remote object, possibly cached).  Exceptions are the
same as for \function{urlopen()}.
\end{funcdesc}
\begin{funcdesc}{urlcleanup}{}
Clear the cache that may have been built up by previous calls to
\function{urlretrieve()}.
\end{funcdesc}
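The following sketch pairs \function{urlretrieve()} with \function{urlcleanup()}.  It assumes Python 3, where both functions live in \module{urllib.request}; in that implementation a \file{file:} URL is likewise returned without copying, although \var{headers} is a message object rather than \code{None}.  The helper \code{fetch_to_disk()} is a hypothetical name, and a temporary file again replaces a remote resource.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlcleanup, urlretrieve  # Python 3 locations


def fetch_to_disk(url):
    """Retrieve url with urlretrieve() and return (filename, file contents)."""
    filename, _headers = urlretrieve(url)
    with open(filename, "rb") as f:
        return filename, f.read()


if __name__ == "__main__":
    fd, name = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write(b"payload")
    try:
        filename, data = fetch_to_disk(Path(name).resolve().as_uri())
        print(data)  # b'payload'
        # A file: URL is not copied, so filename names the original file.
        print(os.path.samefile(filename, name))
    finally:
        urlcleanup()   # remove any temporary files urlretrieve() created
        os.remove(name)
```

For a remote URL the same call would return the name of a temporary copy, which is why the sketch always calls \function{urlcleanup()} when it is done.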
\begin{funcdesc}{quote}{string\optional{, addsafe}}
Replace special characters in \var{string} using the \samp{\%xx} escape.
Letters, digits, and the characters \character{_,.-} are never quoted.
The optional \var{addsafe} parameter specifies additional characters
that should not be quoted; its default value is \code{'/'}.

Example: \code{quote('/\~connolly/')} yields \code{'/\%7econnolly/'}.
\end{funcdesc}
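The quoting rule above can be modelled in a few lines.  This is a hypothetical re-implementation written for Python 3 (\code{quote_sketch()} is not part of \module{urllib}); it assumes text is encoded as UTF-8 before escaping and emits lowercase hex digits to match the example above.  Later library versions differ in details (the \function{quote()} in Python 3.7 and later quotes the comma, leaves the tilde alone and uses uppercase hex digits), so treat this as a model of the rule as documented here.

```python
import string

# Characters that, per the description above, are never quoted.
ALWAYS_SAFE = frozenset(string.ascii_letters + string.digits + "_,.-")


def quote_sketch(s, addsafe="/"):
    """Hypothetical re-implementation of the quoting rule described above."""
    safe = ALWAYS_SAFE | frozenset(addsafe)
    out = []
    # Assumption: text is encoded as UTF-8 first, so every escape is one byte.
    for byte in s.encode("utf-8"):
        if byte < 128 and chr(byte) in safe:
            out.append(chr(byte))         # safe ASCII character: keep as-is
        else:
            out.append("%%%02x" % byte)   # lowercase hex, as in the example
    return "".join(out)


print(quote_sketch("/~connolly/"))        # /%7econnolly/
print(quote_sketch("a b/c", addsafe=""))  # a%20b%2fc
```

The \code{byte < 128} test keeps a non-ASCII character in \var{addsafe} from leaving part of a multi-byte sequence unescaped.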
\begin{funcdesc}{quote_plus}{string\optional{, addsafe}}
Like \function{quote()}, but also replaces spaces by plus signs, as
required for quoting HTML form values.
\end{funcdesc}
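A short sketch contrasting \function{quote()} and \function{quote_plus()} when encoding form values.  It tries the Python 3 location \module{urllib.parse} first and falls back to the module documented here; \code{encode_form()} is a hypothetical helper, not a library function.

```python
try:
    from urllib.parse import quote, quote_plus   # Python 3
except ImportError:
    from urllib import quote, quote_plus         # the module documented here


def encode_form(fields):
    """Encode (name, value) pairs as an HTML form query string."""
    return "&".join(quote_plus(name) + "=" + quote_plus(value)
                    for name, value in fields)


print(quote("hello world"))                           # hello%20world
print(quote_plus("hello world"))                      # hello+world
print(encode_form([("q", "url lib"), ("x", "a&b")]))  # q=url+lib&x=a%26b
```

Each name and value is quoted separately, so an \samp{\&} or \samp{=} inside a value is escaped instead of being mistaken for a field separator.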
\begin{funcdesc}{unquote}{string}
Replace \samp{\%xx} escapes by their single-character equivalent.

Example: \code{unquote('/\%7Econnolly/')} yields \code{'/\~connolly/'}.
\end{funcdesc}
\begin{funcdesc}{unquote_plus}{string}
Like \function{unquote()}, but also replaces plus signs by spaces, as
required for unquoting HTML form values.
\end{funcdesc}
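The inverse direction can be sketched the same way.  The imports try Python 3's \module{urllib.parse} first and fall back to the module documented here; \code{decode_form()} is a hypothetical helper that splits a form query string and applies \function{unquote_plus()} to each part.

```python
try:
    from urllib.parse import quote_plus, unquote, unquote_plus   # Python 3
except ImportError:
    from urllib import quote_plus, unquote, unquote_plus         # older layout


def decode_form(query):
    """Split an HTML form query string into a list of (name, value) pairs."""
    pairs = []
    for field in query.split("&"):
        if not field:
            continue                      # tolerate empty fields as in 'a=1&&b=2'
        name, _, value = field.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


print(unquote("/%7Econnolly/"))          # /~connolly/
print(unquote_plus("url+lib%21"))        # url lib!
print(decode_form("q=url+lib&x=a%26b"))  # [('q', 'url lib'), ('x', 'a&b')]
```

Splitting on \samp{\&} before unquoting matters: unquoting first would turn an escaped \samp{\%26} into a separator.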
Currently, only the following protocols are supported: HTTP (versions
0.9 and 1.0), Gopher (but not Gopher-+), FTP, and local files.
\indexii{HTTP}{protocol}
\indexii{Gopher}{protocol}
\indexii{FTP}{protocol}
The caching feature of \function{urlretrieve()} has been disabled
until proper processing of expiration time headers is implemented.
There should be a function to query whether a particular URL is in
the cache.
For backward compatibility, if a URL appears to point to a local file
but the file can't be opened, the URL is re-interpreted using the FTP
protocol.  This can sometimes cause confusing error messages.
The \function{urlopen()} and \function{urlretrieve()} functions can
cause arbitrarily long delays while waiting for a network connection
to be set up.  This means that it is difficult to build an interactive
web client using these functions without using threads.
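One way to keep an interactive client responsive is to move the blocking call onto a worker thread.  This is a minimal sketch assuming Python 3 (\function{urlopen()} from \module{urllib.request}, \module{threading} from the standard library); \code{fetch_in_background()} and its \code{on_done} callback are hypothetical names, not part of \module{urllib}.

```python
import threading
from urllib.request import urlopen   # Python 3 location of urlopen()


def fetch_in_background(url, on_done):
    """Start a worker thread that fetches url, then calls on_done(data, error).

    data is the response body (or None) and error is the exception raised
    (or None).  The calling thread gets the Thread object back immediately.
    """
    def worker():
        try:
            with urlopen(url) as response:
                data = response.read()
        except IOError as exc:   # URLError is a subclass of IOError/OSError
            on_done(None, exc)
        else:
            on_done(data, None)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

While the worker waits for the connection, the main thread can keep handling user events and poll \code{thread.is_alive()}, or have \code{on_done} hand the result to its event loop.  Python 3's \function{urlopen()} also accepts a \var{timeout} argument, which limits how long individual blocking socket operations may wait.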
The data returned by \function{urlopen()} or \function{urlretrieve()}
is the raw data returned by the server.  This may be binary data
(such as an image), plain text or (for example) HTML.  The HTTP
protocol provides type information in the reply header, which can be
inspected by looking at the \code{content-type} header.  For the
Gopher protocol, type information is encoded in the URL; there is
currently no easy way to extract it.  If the returned data is HTML,
you can use the module \module{htmllib}\refstmodindex{htmllib} to
parse it.
\indexii{HTTP}{protocol}
\indexii{Gopher}{protocol}
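A minimal sketch of inspecting the \code{content-type} header before deciding how to treat the data.  It assumes Python 3, where \function{urlopen()} lives in \module{urllib.request} and header lookups on the object returned by \method{info()} are case-insensitive; \code{fetch_with_type()} is a hypothetical helper.  For \file{file:} URLs Python 3 guesses the type from the file name, which lets the sketch run without a network.

```python
from urllib.request import urlopen   # Python 3 location of urlopen()


def fetch_with_type(url):
    """Return (media_type, data), where media_type comes from Content-Type."""
    with urlopen(url) as response:
        data = response.read()
        content_type = response.info().get("Content-Type", "")
    # Drop parameters such as '; charset=utf-8' and normalise the case.
    media_type = content_type.split(";")[0].strip().lower()
    return media_type, data
```

When \code{media_type} is \code{'text/html'}, the data can then be handed to an HTML parser: \module{htmllib} in the library described here, or \module{html.parser} in Python 3, where \module{htmllib} no longer exists.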
Although the \module{urllib} module contains (undocumented) routines
to parse and unparse URL strings, the recommended interface for URL
manipulation is in module \module{urlparse}\refstmodindex{urlparse}.
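For comparison, a short sketch of parsing and unparsing with the recommended module.  The imports try Python 3's \module{urllib.parse} first (where \module{urlparse} moved) and fall back to \module{urlparse}; \code{strip_query()} is a hypothetical helper built from \function{urlparse()} and \function{urlunparse()}.

```python
try:
    from urllib.parse import urlparse, urlunparse   # Python 3
except ImportError:
    from urlparse import urlparse, urlunparse       # the urlparse module above


def strip_query(url):
    """Return url without its query string and fragment."""
    scheme, netloc, path, params, _query, _fragment = urlparse(url)
    return urlunparse((scheme, netloc, path, params, "", ""))


parts = urlparse("http://www.python.org/doc/lib/?q=urllib#top")
print(parts[:3])   # ('http', 'www.python.org', '/doc/lib/')
print(strip_query("http://www.python.org/doc/lib/?q=urllib#top"))
# http://www.python.org/doc/lib/
```

Working on the six-part tuple avoids hand-written string slicing, which is easy to get wrong when a URL has parameters, a query, or a fragment.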