<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xc="urn:xhtml-compiler"
xml:lang="en">
<head>
<title>HTML Purifier Sucks - HTML Purifier</title>
<xi:include href="common-meta.xml" xpointer="xpointer(/*/node())" />
<meta name="keywords" content="HTMLPurifier, HTML Purifier, HTML, filter, sucks, devils advocate, evil, bad" />
</head>
<body>

<xi:include href="common-header.xml" xpointer="xpointer(/*/node())" />

<div id="main">
<h1 id="title">HTML Purifier Sucks</h1>

<div id="content">

<blockquote class="fancy">
<div class="quote">
...needless to say, I don't think I'll bother investigating further!
</div>
<div class="origin">
&mdash; Stormrider on <a href="http://www.sitepoint.com/forums/showpost.php?p=3621314&amp;postcount=119">SitePoint Forums</a>
</div>
</blockquote>

<p>
Contrary to what <a href="comparison.html">this comparison page</a>
suggests, HTML Purifier sucks. It swallows oceans, it drinks blood,
and it is more effective than your dust-busting Hoover 3000. Why does it
suck? How can we make it un-sucky?
</p>

<div class="warning">
This document is currently under construction.
</div>

<div id="toc" />

<h2>Bloat</h2>

<p>
As of version 2.1.3, HTML Purifier's library folder contains
<strong>164 files</strong> in <strong>30 folders</strong>, weighing
in at about 696 kilobytes. For comparison, the CodeIgniter
web application framework contains 147 files in 29 folders and weighs
902 kilobytes.
</p>

<p>
These back-of-a-napkin statistics say a lot about HTML Purifier's
internal architecture: object-oriented, one class per file, and small
components, taken to the extreme. That architecture also works against
HTML Purifier when it comes to performance. For most input strings, the
memory footprint of the library's source code is higher than the
memory used to actually process the HTML (four megabytes,
<a href="http://forums.devnetwork.net/viewtopic.php?p=405175#405175">last I checked</a>).
</p>

<h2>Performance</h2>

<p>
HTML Purifier is extremely slow. Various benchmarks have shown HTML
Purifier to be an order of magnitude slower than comparable solutions.
</p>

<h2>Whitespace</h2>

<p>
The <a href="http://www.sitepoint.com/forums/showpost.php?p=3621314&amp;postcount=119">Stormrider
quote</a> at the very beginning of this document concerns one very
specific problem: whitespace.
</p>

<h2>Data-loss</h2>

<p>
When the DOMLex lexer is in use, it is trivially easy to nuke the
contents of a document by inserting a stray <code>&lt;/div&gt;</code>
tag near the beginning.
</p>

</div>
</div>
</body>
</html>