<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
xmlns="http://www.w3.org/1999/xhtml"
xmlns:xi="http://www.w3.org/2001/XInclude"
<title>Shift_JIS Full Disclosure - Security - HTML Purifier</title>
<xi:include href="common-meta.xml" xpointer="xpointer(/*/node())" />
<meta name="description" content="Full disclosure security page detailing the Shift_JIS CSS backslash attack." />
<meta name="keywords" content="HTMLPurifier, HTML Purifier, HTML, filter, filtering, standards, compliant, 3.1.1, attack, full disclosure, xss, security, shift_jis, backslash, css" />
<xi:include href="common-header.xml" xpointer="xpointer(/*/node())" />
<h1 id="title">Shift_JIS Full Disclosure</h1>
A difference between the behavior of iconv (the utility HTML Purifier
uses to transform character encodings) and that of browsers allowed an
attacker to use the Yen character (<code>5C</code> in Shift_JIS) to trick
HTML Purifier into outputting a byte sequence that most browsers would
interpret as a backslash. This could then be used to execute arbitrary
JavaScript from <abbr>CSS</abbr>.
This vulnerability was reported privately to the vendor by
<a href="http://d.hatena.ne.jp/teracc/">Takeshi Terada</a>.
No active exploits are currently known.
This vulnerability was fixed in HTML Purifier 3.1.1 and 2.1.5.
<h2 id="Details">Details</h2>
The vast majority of character sets in the world are equivalent
to US-ASCII in the 7-bit range. Shift_JIS and Johab are notable
exceptions: each redefines two bytes, <code>5C</code> and
<code>7E</code>, as different characters. In Shift_JIS,
<code>5C</code> is the Yen sign (U+00A5) and <code>7E</code> is the
overline (U+203E).
This is quite exceptional, and it puts users of Shift_JIS in a hard
place because they have no legitimate way of expressing the backslash
or tilde. Consequently, browsers treat the <code>5C</code> byte as
equivalent to a backslash, even though it renders as a Yen sign.
Iconv, on the other hand, transforms the <code>5C</code> byte
to Unicode U+00A5 (in UTF-8, this is <code>C2 A5</code>), the
correct character for Yen. For HTML Purifier's purposes this is the
wrong behavior, and it leads to the security vulnerability: HTML Purifier
thinks that the backslash is actually a Yen and does not take any
security measures against it. Then, when the Yen is converted back to
<code>5C</code>, it regains backslash behavior in the browser and can be
used to break out of a quoted CSS string. Furthermore, plainly buggy
behavior will be observed if a backslash is somehow introduced into the
HTML during processing, because iconv does not know how to convert
a backslash in UTF-8 back to a backslash in Shift_JIS (hint: it's
impossible without changing the font).
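The round trip described above can be illustrated with a small model. This is only a sketch in Python, not HTML Purifier's code: <code>model_decode</code>, <code>model_encode</code>, <code>naive_filter</code> and <code>undo_transform</code> are hypothetical names, the decoder merely imitates the <code>5C</code> to U+00A5 mapping attributed to iconv, and the filter stands in for whatever backslash handling a real sanitizer performs.

```python
YEN = "\u00a5"  # what the modeled converter produces for byte 5C


def model_decode(data: bytes) -> str:
    """Imitate the conversion described above: byte 5C becomes U+00A5.

    Assumption: input is limited to 7-bit bytes to keep the model small.
    """
    return "".join(YEN if b == 0x5C else chr(b) for b in data)


def model_encode(text: str) -> bytes:
    """Inverse of model_decode: U+00A5 becomes byte 5C again."""
    out = bytearray()
    for ch in text:
        if ch == YEN:
            out.append(0x5C)
        elif ch == "\\":
            # Mirrors the text: a real backslash has no Shift_JIS target.
            raise ValueError("backslash cannot be represented")
        else:
            out.append(ord(ch))
    return bytes(out)


def naive_filter(text: str) -> str:
    """Stand-in sanitizer: neutralize backslashes by dropping them."""
    return text.replace("\\", "")


def undo_transform(text: str) -> str:
    """Sketch of the fix: restore the character the browser will act on."""
    return text.replace(YEN, "\\")


def vulnerable_round_trip(data: bytes) -> bytes:
    # The filter only ever sees a Yen sign, so it has nothing to remove.
    return model_encode(naive_filter(model_decode(data)))


def fixed_round_trip(data: bytes) -> bytes:
    # The filter now sees the backslash and can deal with it.
    return model_encode(naive_filter(undo_transform(model_decode(data))))
```

In this model, the byte <code>5C</code> passes through <code>vulnerable_round_trip</code> untouched because the filter never sees a backslash, while <code>fixed_round_trip</code> gives the filter the chance to remove it.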
The fix involves undoing the unnecessary transformation that iconv
performs. HTML Purifier generalizes the fix to all character encodings with
<code>HTMLPurifier_Encoder->testEncodingSupportsASCII()</code>,
which iterates through all printable 7-bit byte sequences and checks
whether conversion to UTF-8 causes a change, in which case appropriate
measures are taken. We do not know of any widely used character
encodings besides Shift_JIS, however, that would be affected by this
issue.
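The detection idea can be sketched as follows. This is a Python illustration under stated assumptions, not the body of <code>testEncodingSupportsASCII()</code>: <code>ascii_differences</code>, <code>apply_fixes</code> and <code>model_shift_jis_decode</code> are hypothetical names, and the Shift_JIS decoder is a hand-written model of a converter that maps <code>5C</code> and <code>7E</code> to their JIS X 0201 characters.

```python
from typing import Callable, Dict


def ascii_differences(decode: Callable[[bytes], str]) -> Dict[str, str]:
    """Return {decoded character: ASCII character} for every printable
    7-bit byte whose decoded form is not the ASCII character itself."""
    fixes: Dict[str, str] = {}
    for byte in range(0x20, 0x7F):  # printable 7-bit bytes, 20 through 7E
        ascii_char = chr(byte)
        decoded = decode(bytes([byte]))
        if decoded != ascii_char:
            fixes[decoded] = ascii_char
    return fixes


def apply_fixes(text: str, fixes: Dict[str, str]) -> str:
    """Undo the conversion for the characters found by ascii_differences."""
    return text.translate(str.maketrans(fixes))


def model_shift_jis_decode(data: bytes) -> str:
    # Assumption: a model of a strict converter, not a call into iconv.
    table = {0x5C: "\u00a5", 0x7E: "\u203e"}
    return "".join(table.get(b, chr(b)) for b in data)
```

For an ASCII-compatible decoder such as <code>lambda b: b.decode("latin-1")</code> the returned mapping is empty, so nothing needs to be undone; for the modeled Shift_JIS decoder it contains the Yen sign and the overline.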
<h2 id="History">History</h2>
The vulnerability was reported on May 24, 2008 via email, as a follow-up
to another <a href="css-backslash">unrelated vulnerability</a> in CSS handling.
A patch was committed to the public repository on
<a href="http://repo.or.cz/w/htmlpurifier.git?a=commit;h=bb16d8eae571dd4e30e3a62cce03d436d46cefaf">May 25, 2008</a>,
with the summary: <q>Fix Shift_JIS encoding wonkiness with yen symbols and whatnot.</q>
HTML Purifier 3.1.1 was released on June 19, 2008.