<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Describes config schema framework in HTML Purifier." />
<link rel="stylesheet" type="text/css" href="./style.css" />
<title>Config Schema - HTML Purifier</title>
<h1>Config Schema</h1>
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
HTML Purifier has a fairly complex system for configuration. Users
interact with a <code>HTMLPurifier_Config</code> object to
set configuration directives. The values they set are validated according
to a configuration schema, <code>HTMLPurifier_ConfigSchema</code>.

The schema is mostly transparent to end-users, but if you're doing development
work for HTML Purifier and need to define a new configuration directive,
you'll need to interact with it. We'll also talk about how to define
userspace configuration directives at the very end.

<h2>Write a directive file</h2>
Directive files define configuration directives to be used by
HTML Purifier. They are placed in <code>library/HTMLPurifier/ConfigSchema/schema/</code>
in the form <code><em>Namespace</em>.<em>Directive</em>.txt</code> (I
couldn't think of a more descriptive file extension.)
Directive files are actually what we call <code>StringHash</code>es,
i.e. associative arrays represented in a string form reminiscent of
<a href="http://qa.php.net/write-test.php">PHPT</a> tests. Here's a
sample directive file, <code>Test.Sample.txt</code>:
VALUE-ALIASES: 'baz' => 'bar'
This is a sample configuration directive for the purposes of the
<code>dev-config-schema.html</code> documentation.
Each of these segments has a specific meaning:
<td>The name of the directive, in the form Namespace.Directive
(implicitly the first line)</td>
<td>The type of variable this directive accepts. See below for
details. You can also add <code>/null</code> to the end of
any basic type to allow null values too.</td>
<td>A parseable PHP expression of the default value.</td>
<td>An HTML description of what this directive does.</td>
<td><em>Recommended</em>. The version of HTML Purifier in which this directive was added.
Directives that have been around since 1.0.0 don't have this,
but any new ones should.</td>
<td>Test.Example</td>
<td><em>Optional</em>. A comma separated list of aliases for this directive.
This is most useful for backwards compatibility and should
not be used otherwise.</td>
<td>'foo', 'bar'</td>
<td><em>Optional</em>. Set of allowed values for a directive,
given as a comma separated list of parseable PHP expressions. This
is only allowed for string, istring, text and itext TYPEs.</td>
<td>VALUE-ALIASES</td>
<td>'baz' => 'bar'</td>
<td><em>Optional</em>. Mapping of one value to another, given as
a comma separated list of key-value pairs. This
is only allowed for string, istring, text and itext TYPEs.</td>
<td>DEPRECATED-VERSION</td>
<td><em>Not shown</em>. Indicates that the directive was
deprecated in this version.</td>
<td>DEPRECATED-USE</td>
<td>Test.NewDirective</td>
<td><em>Not shown</em>. Indicates what new directive should be
used instead. Note that the two directives will not behave
identically, although they should offer the same functionality.
If they are identical, use an alias instead.</td>
<td><em>Not shown</em>. Indicates if there is an external library
the user will need to download and install to use this configuration
directive. As of right now, this is merely a Google-able name; future
versions may also provide links and instructions.</td>
Some notes on format and style:
<ul>
<li>Each of these keys can be expressed in the short format
(<code>KEY: Value</code>) or the long format
(<code>--KEY--</code> with value beneath). You must use the
long format if multiple lines are needed, or if a long format
has been used already (that's why <code>ALIASES</code> in our
example is in the long format); otherwise, it's user preference.</li>
<li>The HTML descriptions should be wrapped at about 80 columns; do
not rely on editor word-wrapping.</li>
</ul>
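To make the short and long formats concrete, here is a minimal sketch of a parser for them, written in Python for illustration. It is not HTML Purifier's PHP parser: the function name <code>parse_string_hash</code>, the <code>ID</code> key for the implicit first line, and the sample values for TYPE and DEFAULT are assumptions made for this example. It only follows the rules stated above: the first line is the directive name, short keys are recognised until the first long key appears, and a long key's value runs until the next marker.

```python
def parse_string_hash(text, default_key="ID"):
    """Parse directive-file text into a dict mapping keys to string values."""
    lines = text.strip("\n").split("\n")
    # The first line is implicitly the directive name.
    result = {default_key: lines[0].strip()}
    long_key = None  # set once a --KEY-- marker has been seen

    for line in lines[1:]:
        stripped = line.strip()
        is_marker = (
            len(stripped) > 4
            and stripped.startswith("--")
            and stripped.endswith("--")
        )
        if is_marker:
            # Long format: the value is every line until the next marker.
            long_key = stripped[2:-2]
            result[long_key] = ""
        elif long_key is not None:
            # Once a long key has been used, short keys are no longer recognised.
            result[long_key] += line + "\n"
        elif ":" in stripped:
            # Short format: KEY: Value on a single line.
            key, value = stripped.split(":", 1)
            result[key.strip()] = value.strip()

    return {key: value.strip("\n") for key, value in result.items()}


# Hypothetical sample; only VALUE-ALIASES, ALLOWED and ALIASES reuse values
# that appear in this document.
SAMPLE = """Test.Sample
TYPE: string
DEFAULT: 'foo'
ALLOWED: 'foo', 'bar'
VALUE-ALIASES: 'baz' => 'bar'
--DESCRIPTION--
Note: this text spans two lines,
so it must use the long format.
--ALIASES--
Test.Example
"""

parsed = parse_string_hash(SAMPLE)
print(parsed["TYPE"])
print(parsed["ALIASES"])
```

Note that the description line beginning with "Note:" is not mistaken for a short key, because a long key has already been opened at that point.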
Also, as promised, here is the set of possible types:
<table class="table">
<td><a href="http://docs.php.net/manual/en/language.types.string.php">String</a> without newlines</td>
<td>Case insensitive ASCII string without newlines</td>
<td>"A<em>\n</em>b"</td>
<td>String with newlines</td>
<td>"a<em>\n</em>b"</td>
<td>Case insensitive ASCII string with newlines</td>
<td>Floating point number</td>
<td>array('key' => true)</td>
<td>Lookup array, used with <code>isset($var[$key])</code></td>
<td>array('f', 'b')</td>
<td>List array, with ordered numerical indexes</td>
<td>array('key' => 'val')</td>
<td>Associative array of keys to values</td>
<td>new stdclass</td>
<td>Any PHP variable is fine</td>
The examples represent what will be returned out of the configuration
object; users have a little bit of leeway when setting configuration
values (for example, a lookup value can be specified as a list;
HTML Purifier will flip it as necessary.) These types are defined
in <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/VarParser.php">
library/HTMLPurifier/VarParser.php</a>.
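Two of the conventions above, the <code>/null</code> suffix and the flipping of a list into a lookup, can be sketched as follows. This is an illustrative Python analogue and not the code in VarParser.php; the helper names <code>split_type</code> and <code>to_lookup</code> are hypothetical, and a Python dict stands in for a PHP array.

```python
def split_type(type_spec):
    """Split a TYPE value into (base_type, allows_null)."""
    suffix = "/null"
    if type_spec.endswith(suffix):
        return type_spec[: -len(suffix)], True
    return type_spec, False


def to_lookup(value):
    """Return a lookup (item -> True); a plain list is flipped into one."""
    if isinstance(value, dict):
        # Already a lookup: copy it so the caller's value is not modified.
        return dict(value)
    return {item: True for item in value}


print(split_type("string/null"))
print(to_lookup(["f", "b"]))
```

With a lookup, membership is a single key test, which is what the <code>isset($var[$key])</code> idiom in the table relies on.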
For more information on what values are allowed, and how they are parsed,
consult <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php">
library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php</a>, as well
as <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/Interchange/Directive.php">
library/HTMLPurifier/ConfigSchema/Interchange/Directive.php</a> for
the semantics of the parsed values.
<h2>Refreshing the cache</h2>

You may have noticed that your directive file isn't doing anything
yet. That's because it hasn't been added to the runtime
<code>HTMLPurifier_ConfigSchema</code> instance. Run
<code>maintenance/generate-schema-cache.php</code> to fix this.
If there were no errors, you're good to go! Don't forget to add
some unit tests for your functionality!

If you ever make changes to your configuration directives, you
will need to run this script again.
All directive files go through a rigorous validation process
through <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/Validator.php">
library/HTMLPurifier/ConfigSchema/Validator.php</a>, as well
as some basic checks during building. While
listing every error out here is out-of-scope for this document, we
can give some general tips for interpreting error messages.
There are two types of errors: builder errors and validation errors.
<h3>Builder errors</h3>

<strong>Exception:</strong> Expected type string, got
integer in DEFAULT in directive hash 'Ns.Dir'
You can identify a builder error by the keyword "directive hash."
These are the easiest to deal with, because they correspond directly
to your directive file. Find the offending directive file (which
is the directive hash plus the .txt extension), find the
offending index ("in DEFAULT" means the DEFAULT key) and fix the error.
This particular error would occur if your default value is not the same
type as the one declared in TYPE.
<h3>Validation errors</h3>

<strong>Exception:</strong> Alias 3 in valueAliases in directive
'Ns.Dir' must be a string
These are a little trickier, because we're not actually validating
your directive file, or even the direct string hash representation.
We're validating an Interchange object, and the error messages do
not mention any string hash keys.

Nevertheless, it's not difficult to figure out what went wrong.
Read the "context" statements in reverse:
<dt>in directive 'Ns.Dir'</dt>
<dd>This means we need to look at the directive file <code>Ns.Dir.txt</code></dd>
<dt>in valueAliases</dt>
<dd>There's no key actually called this, but there's one that's close:
VALUE-ALIASES. Indeed, that's where to look.</dd>
<dt>Alias 3</dt>
<dd>The value alias that is equal to 3 is the culprit.</dd>
In this particular case, the alias 3 is an integer, but the error message tells us that it must be a string.
The most difficult part is translating the Interchange member variable (valueAliases)
into a directive file key (VALUE-ALIASES), but there's a one-to-one
correspondence currently. If the two formats diverge, any discrepancies
will be described in <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php">
library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php</a>.
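The procedure for reading a validation error can be sketched in a few lines of Python. This is only an illustration of the steps described above: the helper names <code>read_context</code> and <code>member_to_key</code> are hypothetical, and the camelCase to upper-case-with-hyphens rule is an assumption generalised from the single valueAliases / VALUE-ALIASES pair, so check InterchangeBuilder.php before relying on it for other keys.

```python
import re


def read_context(message):
    """Return the "in ..." context statements of an error, outermost first."""
    contexts = re.findall(r" in (directive '[^']+'|\w+)", message)
    # The message lists contexts innermost first, so reverse them.
    return list(reversed(contexts))


def member_to_key(member):
    """Guess the directive file key for an Interchange member variable."""
    # Insert a hyphen at each lower-to-upper boundary, then upper-case.
    return re.sub(r"(?<=[a-z])(?=[A-Z])", "-", member).upper()


message = "Alias 3 in valueAliases in directive 'Ns.Dir' must be a string"
print(read_context(message))
print(member_to_key("valueAliases"))
```

Read in this order, the output points first at the file to open (<code>Ns.Dir.txt</code>) and then at the key to inspect within it.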
Much of the configuration schema framework's codebase deals with
shuffling data from one format to another, and doing validation on this
data.
The keystone of all of this is the <code>HTMLPurifier_ConfigSchema_Interchange</code>
class, which represents the purest, parsed representation of the schema.
Hand-writing this data is unwieldy, however, so we write directive files.
These directive files are parsed by <code>HTMLPurifier_StringHashParser</code>
into <code>HTMLPurifier_StringHash</code>es, which then
are run through <code>HTMLPurifier_ConfigSchema_InterchangeBuilder</code>
to construct the interchange object.
From the interchange object, the data can be siphoned into other forms
using <code>HTMLPurifier_ConfigSchema_Builder</code> subclasses.
For example, <code>HTMLPurifier_ConfigSchema_Builder_ConfigSchema</code>
generates a runtime <code>HTMLPurifier_ConfigSchema</code> object,
which <code>HTMLPurifier_Config</code> uses to validate its incoming
data. There is also an XML serializer, which is used to build documentation.
<!-- vim: et sw=4 sts=4 -->