1 <?xml version="1.0" encoding="UTF-8"?>
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
3 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4 <html
5 xmlns="http://www.w3.org/1999/xhtml"
6 xmlns:xi="http://www.w3.org/2001/XInclude"
7 xml:lang="en">
8 <head>
9 <title>Contribute - HTML Purifier</title>
10 <xi:include href="common-meta.xml" xpointer="xpointer(/*/node())" />
11 <meta name="description" content="How to help HTML Purifier grow through code and attention." />
12 <meta name="keywords" content="HTMLPurifier, HTML Purifier, HTML, filter, filtering, standards, compliant, contribute, contribution, open source, community, help, code, needed" />
13 </head>
14 <body>
16 <xi:include href="common-header.xml" xpointer="xpointer(/*/node())" />
18 <div id="main">
19 <h1 id="title">Contribute</h1>
21 <div id="content">
23 <p>
24 The very first question to ask yourself before reading this page is this:
25 </p>
27 <blockquote><p><em>Why contribute?</em></p></blockquote>
29 <p>
Because HTML Purifier is open-source software, you are not legally
obligated to give anything back to the community. In that sense, HTML
Purifier is our gift to you, and you are free to run away and never be
heard from again.
33 </p>
35 <p>
We hope, however, that this lack of a legal obligation doesn't prevent
you from contributing back to our project. We poured many hours into
this project, and it has doubtless saved you many hours in return.
If HTML Purifier saved you 200 hours of work
(the actual figure might be more, might be less), then even if you contribute
ten hours back to the project, you still come out 190 hours ahead.
42 </p>
44 <p>
Additionally, your use of this library required a substantial investment
on your part. You had to learn the APIs, read the
documentation, tweak things so that they worked with your application,
et cetera. Contributing back means making good use of this investment:
not only will your expertise and knowledge be fed back into
HTML Purifier, but you might learn a thing or two from the internals that
you didn't know before.
52 </p>
54 <p>
55 If I've convinced you, read on! It's quite easy to get started...
56 </p>
58 <div id="toc" />
60 <h2>What can you do?</h2>
62 <p>
63 Contributions can come in many forms. Documentation, code, even
64 evangelism, can all help a project. One of the things we've noticed,
65 however, is that many contributions come from people helping
66 themselves. They have an itch, a special requirement, and they help
67 the project out in that area.
68 </p>
70 <p>
71 What might that itch be? Over the years, we've accumulated many feature
72 requests in our <a href="dev/TODO">TODO</a> file. There are also
73 tasty tidbits in the <a href="docs">proposal section of our
74 documentation.</a> You might have an
75 idea for a new AutoFormatter, or maybe would like to implement an HTMLModule
76 for a set of elements that HTML Purifier doesn't support yet. Maybe you
77 want a demo page built-in with the library so that you can easily test
78 things out without using HTML Purifier's demo page. Code something that
79 interests you.
80 </p>
82 <h2>Coding standards</h2>
84 <p>
85 As a general rule of thumb, make sure your code looks like the code around
86 it. Probably the biggest thing is to remember four spaces, no tabs (if you
87 perpetually forget, get your text-editor to make whitespace visible). There
88 are a number of other formatting subtleties, but suffice to say
89 <em>consistency</em> is the order of the day in this project. You're not
90 going to read <acronym title="Yet Another Coding Standard">YACS</acronym> anyway.
91 </p>
93 <p>
94 The code you write must be PHP 5.0.5 compatible, so avoid later features
95 like magic methods. The code you write also must have unit tests, which
96 reside in the <em>tests/</em> directory. The workflow for your feature
97 should be along the lines of:
98 </p>
100 <ol>
101 <li>Write unit tests</li>
102 <li>Hack hack hack</li>
103 <li>Run <em>php tests/index.php</em></li>
104 <li>If failures, go back to 1 or 2</li>
105 <li>Commit and submit patch</li>
106 </ol>
<p>
HTML Purifier prides itself on having an evergreen test suite, so if your
change breaks other tests, it probably won't be accepted.
111 </p>
<h2>Getting set up</h2>
<p>
You already know how to <em>use</em> HTML Purifier. But do you know how
to develop it?
118 </p>
120 <h3>Git</h3>
<p>
HTML Purifier's repository is hosted via Git. If you've used Git before,
you can skip this section: you already know the workflow for
working with Git, so just clone from <em>git://repo.or.cz/htmlpurifier.git</em> and
get going. Otherwise, read on.
127 </p>
<p>
In order to hack on HTML Purifier's source tree, you will first need to
make sure Git is installed on your system. Type the following command
in your prompt:
133 </p>
135 <pre class="command"><a href="http://www.kernel.org/pub/software/scm/git/docs/">git</a> --version</pre>
<p>
You should get something along the lines of <q>git version 1.5.6</q>.
Otherwise:
140 </p>
142 <dl>
143 <dt>You use Linux:</dt>
144 <dd>
Grab Git from your friendly neighborhood package manager, or compile
from source with the package provided at <a href="http://git.or.cz/">git.or.cz</a>.
Either should be relatively simple.
148 </dd>
149 <dt>You use Windows:</dt>
150 <dd>
151 Download and install <a href="http://code.google.com/p/msysgit/">msysgit</a>.
152 Then, for all of the following commands
153 we discuss, enter them in the console provided by Git Bash. If you have
154 Cygwin, you can also use setup.exe to install Git.
155 </dd>
156 <dt>You use a Mac:</dt>
157 <dd>
158 There are binaries available from <a href="http://metastatic.org/text/Concern/2007/09/15/new-git-package-for-os-x/">various</a>
159 <a href="http://code.google.com/p/git-osx-installer/">sources</a>; I haven't
160 tried them so your mileage may vary. Since Mac is a BSD-like system, you
161 can also <a href="http://www.dekorte.com/blog/blog.cgi?do=item&amp;id=2539">compile
162 from source.</a>
163 </dd>
164 </dl>
<p>
Run the earlier command again to make sure the installation went
smoothly. Now run this command:
169 </p>
171 <pre class="command"><kbd><a href="http://www.kernel.org/pub/software/scm/git/docs/git-clone.html">git clone</a> git://repo.or.cz/htmlpurifier.git</kbd></pre>
<p>
This will copy the HTML Purifier codebase into the htmlpurifier folder.
175 </p>
<p>
You will want to configure the Git installation with your name and
email address. You can do this with these two commands:
180 </p>
<pre class="command"><kbd><a href="http://www.kernel.org/pub/software/scm/git/docs/git-config.html">git config</a> --global user.name "Bob Doe"
git config --global user.email bob@example.com</kbd></pre>
<p>
Let us fast forward for a moment and imagine that we have already made our changes
and would now like to send them to HTML Purifier for review. You
will need to execute these commands:
189 </p>
191 <pre class="command"><kbd><a href="http://www.kernel.org/pub/software/scm/git/docs/git-status.html">git status</a></kbd></pre>
<p>
This command will give you a quick rundown of all the files Git knows
about. If you have any <q>Untracked files</q>, you will need to add
them with:
197 </p>
199 <pre class="command"><kbd><a href="http://www.kernel.org/pub/software/scm/git/docs/git-add.html">git add</a> <em>$filename</em></kbd></pre>
201 <blockquote class="aside"><p>
202 (You can also add <q>Changed but not updated</q> files, but because we will
203 be using the <kbd>-a</kbd> option this is strictly unnecessary.)
204 </p></blockquote>
<p>
Now, you will want to commit your changes. Users of centralized version
control systems, beware: this does not push it to a remote repository,
or anything like that. It simply records the change in your local repository.
Doing so is as simple as:
211 </p>
213 <pre class="command"><kbd><a href="http://www.kernel.org/pub/software/scm/git/docs/git-commit.html">git commit</a> -as</kbd></pre>
215 <blockquote class="aside"><p>
216 The <q>a</q> flag tells Git to commit all modified files, even if you didn't
217 git add them. The <q>s</q> flag tells Git to sign off your commit message
218 with your name and email.
219 </p></blockquote>
<p>
You will then have a screen brought up to enter a commit message. If this
screen is vim (you can tell if your command line window transmuted into
something you've never seen before), type <kbd>i</kbd> (<samp>--INSERT--</samp>
mode), write your commit message, type <kbd>ESC</kbd>, and
then type <kbd>:wq ENTER</kbd> (write and quit).
227 </p>
<p>
A quick note about commit messages: there is a very specific format for them.
They should look something like this:
232 </p>
<pre><samp>Concise one-line statement describing change

Full explanation for the change. If you fixed a bug, make
sure you describe what was wrong, how you fixed it, and
what the behavior is now. If it was a feature, describe
why the feature is useful, how you use it, and any tricky
implementation details.

In short, the body of the commit message (which can span multiple
paragraphs) should, along with the code diff, be self
explanatory and not require any email introduction. At the
same time, your commit message will be immortalized and
should be in third-person and formal.

Signed-off-by: Edward Z. Yang &lt;edwardzyang@thewritingpot.com&gt;</samp></pre>
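<p>
If you would like a quick sanity check before sending a patch, the format above
can be roughly verified with a small shell function. This is only a sketch:
<code>check_commit_message</code> is a hypothetical helper, not part of HTML
Purifier or Git, and the blank second line is an assumption based on the usual
Git convention of separating the summary from the body.
</p>

```shell
# check_commit_message FILE
# Hypothetical helper: returns 0 when FILE roughly matches the format above.
#   line 1: non-empty one-line summary
#   line 2: blank (assumed separator between summary and body)
#   last non-blank line: Signed-off-by trailer
check_commit_message() {
    summary=$(sed -n '1p' "$1")
    second=$(sed -n '2p' "$1")
    last=$(grep -v '^[[:space:]]*$' "$1" | tail -n 1)

    [ -n "$summary" ] || { echo "missing summary line"; return 1; }
    [ -z "$second" ] || { echo "second line should be blank"; return 1; }
    case $last in
        "Signed-off-by: "*) return 0 ;;
        *) echo "missing Signed-off-by trailer"; return 1 ;;
    esac
}
```

<p>
It checks shape only; whether the body actually explains the change is still
up to you.
</p>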
<p>
Finally, after the commit has been recorded, you will want to make a
patch to distribute to other people to review and test. Doing so is
as simple as:
254 </p>
256 <pre class="command"><a href="http://www.kernel.org/pub/software/scm/git/docs/git-format-patch.html">git format-patch</a> -1</pre>
258 <blockquote class="aside"><p>
You can replace -1 with -#, where # is the number of commits you would
like to write patches for. You can also specify a commit hash ID.
261 </p></blockquote>
<p>
A file named roughly <em>0001-Short-description.patch</em> will be
created, with the complete contents of your change.
266 </p>
268 <p>In summary:</p>
<pre class="command"><kbd>git clone git://repo.or.cz/htmlpurifier.git
git config --global user.name "Bob Doe"
git config --global user.email bob@example.com
cd htmlpurifier</kbd>
# hack hack hack
<kbd>git status
git add newfile1.txt subdir/newfile2.txt
git commit -as
git format-patch -1
# send patch off</kbd></pre>
<p>
Two quick notes before we go on to some HTML Purifier specific instructions:
283 </p>
285 <ol>
286 <li>
<p>
If you are posting the patch on the forum, be sure to copy-paste it
in between <code>&lt;pre&gt;&lt;![CDATA[</code> and <code>]]&gt;&lt;/pre&gt;</code>.
If you are emailing the patch, we prefer that you send it inline in a text
email (be sure to configure your mail client not to wrap lines; check out the
<a href="http://repo.or.cz/w/git.git?a=blob;f=Documentation/SubmittingPatches;hb=HEAD">SubmittingPatches guidelines from the Git project</a> for more details).
293 </p></li>
294 <li>
<p>
In all probability, there have been changes to the HTML Purifier codebase
since you made your patch. As part of your duties as a patch-maker, you
should ensure that your patch remains based on the HEAD of our master branch.
You can do so with the command:
300 </p>
301 <pre class="command"><a href="http://www.kernel.org/pub/software/scm/git/docs/git-pull.html">git pull</a> --rebase</pre>
<p>
You may also find it useful to perform your development in a topic branch.
You can do this using:
305 </p>
306 <pre class="command"><a href="http://www.kernel.org/pub/software/scm/git/docs/git-checkout.html">git checkout</a> -b <em>branchname</em></pre>
<p>
The benefit of a setup like this is that you can now do a regular
<kbd>git pull</kbd> on the master branch, and then use
<kbd><a href="http://www.kernel.org/pub/software/scm/git/docs/git-rebase.html">git rebase</a> master</kbd> on your own branch to keep it up to
date. This can be useful if your patch produces a conflict.
(One quick note: you switch between branches using <kbd>git
checkout <em>branchname</em></kbd>. The -b flag creates a new branch.)
314 </p>
315 <blockquote class="aside"><p>
316 The default behavior of <kbd>git pull</kbd> in such a case is to merge
317 your branch. If you were a release maintainer, this is what you would
318 want to do, since your history was public and rewriting history
319 could be disruptive. With private, local changes, however, performing
320 the merge makes the history needlessly complicated.
321 </p></blockquote>
322 </li>
323 </ol>
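<p>
To see the topic-branch and rebase workflow in action without touching a real
checkout, you can replay it in a throwaway repository. The sketch below is an
illustration only: the file names, the branch name <em>my-feature</em> and the
commit messages are made up, a local commit on the default branch stands in for
what <kbd>git pull</kbd> would bring in, and <kbd>-m</kbd> is used so that no
editor opens.
</p>

```shell
# Throwaway repository; nothing here touches a real HTML Purifier checkout.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Bob Doe"
git config user.email bob@example.com

# First commit on the default branch (stands in for upstream history).
echo base > base.txt
git add base.txt
git commit -q -m "Add base file"
main_branch=$(git rev-parse --abbrev-ref HEAD)

# Do the actual work on a topic branch.
git checkout -q -b my-feature
echo feature > feature.txt
git add feature.txt
git commit -q -s -m "Add feature file"

# Meanwhile the default branch moves on (stands in for a git pull).
git checkout -q "$main_branch"
echo upstream > upstream.txt
git add upstream.txt
git commit -q -m "Add upstream file"

# Replay the topic branch on top of the updated default branch.
git checkout -q my-feature
git rebase -q "$main_branch"

# After the rebase, the topic commit sits directly on the default branch tip.
topic_parent=$(git rev-parse my-feature~1)
main_tip=$(git rev-parse "$main_branch")

# Write the patch for the topic commit; format-patch prints the file name.
patch_file=$(git format-patch -1)
echo "patch written to $repo/$patch_file"
```

<p>
Because the history stays linear, the resulting patch applies cleanly on top of
the current default branch, which is the point of rebasing rather than merging
private changes.
</p>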
325 <h3>SimpleTest</h3>
<p>
As mentioned before, one of the keys to successfully developing a new
feature on HTML Purifier is a comprehensive set of unit tests. However,
unit tests do you no good if you can't run them.
331 </p>
<p>
The first step in getting unit tests running on HTML Purifier is downloading
<a href="http://simpletest.org">SimpleTest</a>, the testing framework we use. However,
the public 1.0.1 release won't work with HTML Purifier, as it is still
<abbr>PHP</abbr>4 compatible and will give off spurious errors. You need to
use the trunk version of SimpleTest. This version can be checked out
using <a href="http://subversion.tigris.org/">Subversion</a> with this command:
340 </p>
342 <pre class="command"><kbd>svn co https://simpletest.svn.sourceforge.net/svnroot/simpletest/simpletest/trunk simpletest</kbd></pre>
<p>
The next step is to tell HTML Purifier about the SimpleTest installation.
You can do this by copying the <em>test-settings.sample.php</em> file
to <em>test-settings.php</em> and configuring it according to the
instructions inside. The only variable you must edit is
<var>$simpletest_location</var>.
350 </p>
352 <blockquote><p>
At the moment, it is somewhat difficult to get the optional parameters set up
properly. If you feel adventurous, try the instructions; they should work,
but might be a little complicated or sparser than usual.
356 </p></blockquote>
<p>
Now, check if everything is running by typing <kbd>php tests/index.php --flush</kbd>
from the root of your HTML Purifier working copy. You should get a full
complement of passing tests. Congratulations!
362 </p>
364 <h2>Workflow</h2>
<p>
After identifying what changes you would like to make to HTML Purifier,
you will need to code appropriate unit tests for them. (If you are of the
code first, test later mentality, that is fine too; just make sure the tests
are 1. written and 2. comprehensive.) If you modify the file
<em>library/HTMLPurifier/ConfigSchema.php</em>, chances are the corresponding
tests are in <em>tests/HTMLPurifier/ConfigSchemaTest.php</em> (i.e. substitute
<em>library</em> with <em>tests</em> and append <em>Test</em> to the filename).
374 </p>
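<p>
The naming rule above is mechanical enough to script. As a small sketch,
<code>test_path_for</code> below is a hypothetical helper (it does not ship with
HTML Purifier) that applies the rule using plain shell parameter expansion; it
does not check that the test file actually exists.
</p>

```shell
# test_path_for LIBRARY_FILE
# Hypothetical helper: prints the conventional test file for a library file,
# e.g. library/HTMLPurifier/ConfigSchema.php
#   -> tests/HTMLPurifier/ConfigSchemaTest.php
test_path_for() {
    case $1 in
        library/*.php)
            rest=${1#library/}                          # drop the leading "library/"
            printf 'tests/%sTest.php\n' "${rest%.php}"  # swap ".php" for "Test.php"
            ;;
        *)
            # Not a PHP file under library/, so the rule does not apply.
            return 1
            ;;
    esac
}

test_path_for library/HTMLPurifier/ConfigSchema.php
```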
<p>
We prefer, first and foremost, <em>unit</em> tests; that is, the test should
not have any dependencies on any other objects, and if it does, those
dependencies should be filled in using SimpleTest's excellent
<a href="http://www.lastcraft.com/mock_objects_documentation.php">mock object support</a>.
We also believe strongly in integration tests,
which take the form of htmlt files, and test HTML Purifier as a whole
with your modifications. An htmlt file looks like this:
384 </p>
<pre><samp><![CDATA[--INI--
%HTML.Allowed = "b,i,u,p"
--HTML--
<b>Foo<a id="asdf">bar</a></b>
--EXPECT--
<b>Foobar</b>
]]></samp></pre>
<p>
The <samp>--INI--</samp> section indicates the configuration directives
that should be used with this test (if you added a new feature, you will
most probably be using this section to activate it). The <samp>--HTML--</samp>
section indicates the input, and the <samp>--EXPECT--</samp> section indicates
the expected output. Be sure to include a trailing newline. You can place
these files in the <em>tests/HTMLPurifier/HTMLT</em> directory; give them
a descriptive filename.
402 </p>
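<p>
When an integration test misbehaves, it can help to look at one section of the
htmlt file in isolation. The function below is a rough sketch for that purpose:
<code>htmlt_section</code> is a hypothetical name, and the parsing is an
assumption drawn only from the layout shown above (marker lines such as
<samp>--HTML--</samp> on a line of their own); it is not how the test harness
itself reads these files.
</p>

```shell
# htmlt_section FILE NAME
# Hypothetical helper: prints the body of one section (INI, HTML or EXPECT)
# of an htmlt file, assuming each marker sits alone on its line.
htmlt_section() {
    awk -v want="--$2--" '
        /^--[A-Z]+--$/ { current = $0; next }   # a marker line starts a new section
        current == want { print }               # emit only lines of the wanted section
    ' "$1"
}
```

<p>
For example, <kbd>htmlt_section tests/HTMLPurifier/HTMLT/<em>yourtest</em>.htmlt EXPECT</kbd>
would print only the expected output of your test file.
</p>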
<p>
It is my hope that you find the HTML Purifier core code a joy (or at least,
not painful) to work with; every class and method has a docblock that doesn't
merely reiterate what you can find inside its body, but also explains how the
component fits into HTML Purifier as a whole. If you find any section of code that
is missing documentation or is poorly documented, please notify us and we will
correct it immediately. (Remember, <kbd>git pull --rebase</kbd> to
update your branch!)
412 </p>
<p>
There are, however, some architectural features that are not immediately
evident from mere source-code browsing. In this case, you are encouraged
to check out the documentation in the <em>docs/</em> folder (web
accessible at <a href="docs/">the same location</a>).
<a href="docs/dev-flush.html"><q>Flushing the Purifier</q></a>
and <a href="docs/dev-config-schema.html"><q>Config Schema</q></a> in the Development center are particularly
notable: in all likelihood you will need this knowledge in order to
get HTML Purifier working the way you want it to.
423 </p>
425 <h2>Debugging</h2>
<p>
Your debugging skills are as good as
mine, but there are a few things that are helpful to keep in mind:
430 </p>
432 <ul>
433 <li>
You can narrow the set of tests that run down to a single
test-case method. The first way is to specify the <em>f</em>
parameter with a value like <samp>HTMLPurifier/ConfigSchemaTest.php</samp>,
which will cause HTML Purifier to run only that test file. (In web URL
speak, this means <em>tests/index.php?f=HTMLPurifier/ConfigSchemaTest.php</em>;
in command line speak, this means <kbd>php tests/index.php -f HTMLPurifier/ConfigSchemaTest.php</kbd>.)
To run only a single test <em>method</em>, prefix that method's name with
<code>__only</code>. Be sure to revert this change when you're done
hammering away, and don't forget to test <em>everything</em> before committing.
443 </li>
444 <li>
445 HTML Purifier does not have a debugging/verbose mode, so any internal
446 data-checks need to be <code>var_dump</code>'ed by the user.
447 <a href="http://www.xdebug.org/">XDebug</a> makes var_dump'ing a pleasure
448 by colorizing and escaping output. (The stack traces are also quite
449 handy!) There is also a function called <code>printTokens($tokens, $index)</code> specifically
450 for outputting arrays of tokens. The <var>$index</var> variable
451 indicates a token to make bold, and can be omitted.
452 </li>
453 <li>
454 There's a Debugger class. Don't use it. It kinda sucks.
455 </li>
456 <li>
If it seems like a change you made had no effect on your tests, try
flushing with the <em>flush</em> parameter (<kbd>php tests/index.php --flush</kbd>
on the command line).
459 </li>
460 <li>
SimpleTest's error message when an <code>assertIdentical</code> assertion fails on
strings is incomprehensible, so keep your test strings small or be ready to
<code>var_dump</code> if necessary.
464 </li>
465 <li>
466 Beware whitespace. Tests should work whether or not they're Unix (LF), Windows (CRLF)
467 or Mac (CR) encoded. This usually means <em>not</em> using <code>PHP_EOL</code>
468 but rather a literal newline in the source code.
469 </li>
470 </ul>
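<p>
On the whitespace point, it is sometimes handy to check which line endings a
test file has in your working copy before blaming the code. A minimal sketch,
assuming a POSIX shell; <code>has_carriage_return</code> is a hypothetical
helper name, not something provided by HTML Purifier:
</p>

```shell
# has_carriage_return FILE
# Hypothetical helper: succeeds (exit status 0) when FILE contains at least
# one carriage return, i.e. it is not purely Unix (LF) encoded.
has_carriage_return() {
    cr=$(printf '\r')       # a literal CR character
    grep -q "$cr" "$1"
}
```

<p>
It only detects the presence of CR characters; it does not tell Windows (CRLF)
and old Mac (CR) files apart, and it does not convert anything.
</p>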
472 </div>
473 </div>
474 </body>
475 </html>