<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Tutorial for creating custom URI filters." />
<link rel="stylesheet" type="text/css" href="style.css" />

<title>URI Filters - HTML Purifier</title>

</head><body>

<h1>URI Filters</h1>

<div id="filing">Filed under End-User</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>

<p>
  This is a quick and dirty document to get you on your way to writing
  custom URI filters for your own URL filtering needs. Why would you
  want to write a URI filter? If you need the URIs your users put into
  HTML to magically change into different URIs, this is
  exactly what you need!
</p>

<h2>Creating the class</h2>

<p>
  Any URI filter you make will be a subclass of <code>HTMLPurifier_URIFilter</code>.
  The scaffolding is thus:
</p>

<pre>class HTMLPurifier_URIFilter_<strong>NameOfFilter</strong> extends HTMLPurifier_URIFilter
{
    public $name = '<strong>NameOfFilter</strong>';
    public function prepare($config) {}
    public function filter(&amp;$uri, $config, $context) {}
}</pre>

<p>
  Fill in the variable <code>$name</code> with the name of your filter, and
  take a look at the two methods. <code>prepare()</code> is an initialization
  method that is called only once, before any of the HTML has been
  filtered. Use it to perform any costly setup work that only needs to be done
  once. <code>filter()</code> is the guts and innards of our filter:
  it takes the URI and does whatever needs to be done to it.
</p>
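
<p>
  To make the division of labor concrete, here is a minimal sketch (it is
  not one of the filters shipped with HTML Purifier). The class name
  <code>RenameHost</code> and the host names are made up for illustration:
  <code>prepare()</code> builds a lookup table once, and
  <code>filter()</code> consults it for every URI.
</p>

<pre>class HTMLPurifier_URIFilter_RenameHost extends HTMLPurifier_URIFilter
{
    public $name = 'RenameHost';

    // Hypothetical table of retired hosts and their replacements
    protected $map = array();

    public function prepare($config) {
        // Runs once: do the setup here rather than in filter()
        $this->map = array(
            'old.example.com'    => 'www.example.com',
            'legacy.example.com' => 'www.example.com',
        );
        return true;
    }

    public function filter(&amp;$uri, $config, $context) {
        // Runs for every URI: swap the host if it is in the table
        $host = (string) $uri->host;
        if (isset($this->map[$host])) {
            $uri->host = $this->map[$host];
        }
        return true;
    }
}</pre>

<p>
  Keeping the table out of <code>filter()</code> means the per-URI work
  is a single array lookup.
</p>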

<p>
  If you've worked with HTML Purifier, you'll recognize the <code>$config</code>
  and <code>$context</code> parameters. On the other hand, <code>$uri</code>
  is something unique to this section of the application: it's an
  <code>HTMLPurifier_URI</code> object. The interface is thus:
</p>

<pre>class HTMLPurifier_URI
{
    public $scheme, $userinfo, $host, $port, $path, $query, $fragment;
    public function __construct($scheme, $userinfo, $host, $port, $path, $query, $fragment);
    public function toString();
    public function copy();
    public function getSchemeObj($config, $context);
    public function validate($config, $context);
}</pre>

<p>
  The first three methods are fairly self-explanatory: you have a constructor,
  a serializer, and a cloner. Generally, you won't be using them when
  you are manipulating the URI objects themselves.
  <code>getSchemeObj()</code> is a special-purpose method that returns
  an <code>HTMLPurifier_URIScheme</code> object corresponding to the specific
  URI at hand. <code>validate()</code> performs general-purpose validation
  on the internal components of a URI. Once again, you don't need to
  worry about these: they've already been handled for you.
</p>

<h2>URI format</h2>

<p>
  When writing a URI filter, we're interested in the member variables of the URI object.
</p>

<table class="quick"><tbody>
  <tr><th>Scheme</th>   <td>The protocol for identifying (and possibly locating) a resource (http, ftp, https)</td></tr>
  <tr><th>Userinfo</th> <td>User information such as a username (bob)</td></tr>
  <tr><th>Host</th>     <td>Domain name or IP address of the server (example.com, 127.0.0.1)</td></tr>
  <tr><th>Port</th>     <td>Network port number for the server (80, 12345)</td></tr>
  <tr><th>Path</th>     <td>Data that identifies the resource, possibly hierarchical (/path/to, ed@example.com)</td></tr>
  <tr><th>Query</th>    <td>String of information to be interpreted by the resource (q=search-term)</td></tr>
  <tr><th>Fragment</th> <td>Additional information for the resource after retrieval (bookmark)</td></tr>
</tbody></table>

<p>
  Because the URI is presented to us in this form, and not
  <code>http://bob@example.com:8080/foo.php?q=string#hash</code>, it saves us
  a lot of trouble in having to parse the URI every time we want to filter
  it. For the record, the above URI has the following components:
</p>

<table class="quick"><tbody>
  <tr><th>Scheme</th>   <td>http</td></tr>
  <tr><th>Userinfo</th> <td>bob</td></tr>
  <tr><th>Host</th>     <td>example.com</td></tr>
  <tr><th>Port</th>     <td>8080</td></tr>
  <tr><th>Path</th>     <td>/foo.php</td></tr>
  <tr><th>Query</th>    <td>q=string</td></tr>
  <tr><th>Fragment</th> <td>hash</td></tr>
</tbody></table>

<p>
  Note that there is no question mark or octothorpe in the query or
  fragment: these get removed during parsing.
</p>
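
<p>
  If you'd like to see the split for yourself, PHP's built-in
  <code>parse_url()</code> breaks a URI along the same lines. HTML Purifier
  uses its own parser rather than this function, and <code>parse_url()</code>
  calls the userinfo component <code>user</code>, so treat this purely
  as an illustration:
</p>

<pre>// Illustration only: parse_url() is not what HTML Purifier uses internally
$parts = parse_url('http://bob@example.com:8080/foo.php?q=string#hash');

foreach (array('scheme', 'user', 'host', 'port', 'path', 'query', 'fragment') as $key) {
    echo $key, ': ', $parts[$key], "\n";
}
// scheme: http
// user: bob
// host: example.com
// port: 8080
// path: /foo.php
// query: q=string
// fragment: hash</pre>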

<p>
  With this information, you can get straight to implementing your
  <code>filter()</code> method. But one more thing...
</p>

<h2>Return value: Boolean, not URI</h2>

<p>
  You may have noticed that the URI is being passed in by reference.
  This means that whatever changes you make to it, those changes will
  be reflected in the URI object the caller had. <strong>Do not
  return the URI object: it is unnecessary and will cause bugs.</strong>
  Instead, return a boolean value: true if the filtering was successful,
  or false if the URI is beyond repair and needs to be axed.
</p>

<p>
  Let's suppose I wanted to write a filter that converted links with a
  custom <code>image</code> scheme to their corresponding real paths on
  our website:
</p>

<pre>class HTMLPurifier_URIFilter_TransformImageScheme extends HTMLPurifier_URIFilter
{
    public $name = 'TransformImageScheme';
    public function filter(&amp;$uri, $config, $context) {
        if ($uri->scheme !== 'image') return true;
        $img_name = $uri->path;
        // Overwrite the previous URI object
        $uri = new HTMLPurifier_URI('http', null, null, null, '/img/' . $img_name . '.png', null, null);
        return true;
    }
}</pre>

<p>
  Notice I did not <code>return $uri;</code>. This filter would turn
  <code>image:Foo</code> into <code>/img/Foo.png</code>.
</p>
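
<p>
  The example above replaces the URI wholesale. The other two outcomes,
  editing a member in place and rejecting the URI outright, might look
  like the sketch below. The filter name and its policy (no explicit
  ports, no userinfo) are hypothetical and only there to show the two
  return paths.
</p>

<pre>class HTMLPurifier_URIFilter_NoCredentials extends HTMLPurifier_URIFilter
{
    public $name = 'NoCredentials';
    public function filter(&amp;$uri, $config, $context) {
        // Hypothetical policy: a URI with an explicit port is beyond repair
        if ($uri->port !== null) return false;
        // Otherwise fix it in place; the caller sees the change
        $uri->userinfo = null;
        return true;
    }
}</pre>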

<h2>Activating your filter</h2>

<p>
  Having a filter is all well and good, but you need to tell HTML Purifier
  to use it. Fortunately, this part's simple:
</p>

<pre>$uri = $config->getDefinition('URI');
$uri->addFilter(new HTMLPurifier_URIFilter_<strong>NameOfFilter</strong>());</pre>

<p>
  If you want to be really fancy, you can define a configuration directive
  for your filter and have HTML Purifier automatically manage whether
  your filter gets loaded (this is how internal filters manage
  things):
</p>

<pre>HTMLPurifier_ConfigSchema::define(
    'URI', '<strong>NameOfFilter</strong>', false, 'bool',
    '<strong>What your filter does.</strong>'
);
$uri = $config->getDefinition('URI', true);
$uri->registerFilter(new HTMLPurifier_URIFilter_<strong>NameOfFilter</strong>());
</pre>

<p>
  Now, your filter will only be called when %URI.<strong>NameOfFilter</strong>
  is set to true.
</p>

<h2>Post-filter</h2>

<p>
  Remember our TransformImageScheme filter? That filter acted before we had
  performed scheme validation; otherwise, the URI would have been filtered
  out when it was discovered that there was no image scheme. A post-filter,
  by contrast, is run after scheme-specific validation, so it's ideal for bulk
  post-processing of URIs, including munging. To mark a filter as a post-filter,
  set its <code>$post</code> member variable to TRUE.
</p>

<pre>class HTMLPurifier_URIFilter_MyPostFilter extends HTMLPurifier_URIFilter
{
    public $name = 'MyPostFilter';
    public $post = true;
    // ... extra code here
}
</pre>
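
<p>
  As a rough idea of what the extra code could be, here is a sketch of
  a munging post-filter that routes absolute links through a redirect
  script. The class name, the host <code>www.example.com</code> and the
  <code>/redirect.php</code> script are assumptions made for this example,
  not part of HTML Purifier.
</p>

<pre>class HTMLPurifier_URIFilter_RedirectExternal extends HTMLPurifier_URIFilter
{
    public $name = 'RedirectExternal';
    public $post = true;
    public function filter(&amp;$uri, $config, $context) {
        // Relative URIs have no host; leave them alone
        if ($uri->host === null) return true;
        // Serialize the already-validated URI and pass it along as a query
        $target = rawurlencode($uri->toString());
        $uri = new HTMLPurifier_URI('http', null, 'www.example.com', null, '/redirect.php', 'url=' . $target, null);
        return true;
    }
}</pre>

<p>
  Running this as a post-filter means the redirect script only ever
  receives URIs that have already passed scheme validation.
</p>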

<h2>Examples</h2>

<p>
  Check the
  <a href="http://repo.or.cz/w/htmlpurifier.git?a=tree;hb=HEAD;f=library/HTMLPurifier/URIFilter">URIFilter</a>
  directory for more implementation examples, and see <a href="proposal-new-directives.txt">the
  new directives proposal document</a> for ideas on what could be implemented
  as a filter.
</p>

</body></html>

<!-- vim: et sw=4 sts=4 -->