SRC_ZONE 0xA1-0xF9 / 0x40-0xFE / 8
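A minimal Python sketch of how the SRC_ZONE line can be read. The header does not explain the directive, so this assumes it declares lead bytes 0xA1-0xF9, trail bytes 0x40-0xFE, and 8 bits per byte unit, and that a converter stores the zone as a dense row/column array. The names `zone_index`, `ROWS`, and `ROW_WIDTH` are hypothetical.

```python
from typing import Optional

# Assumed reading of "SRC_ZONE 0xA1-0xF9 / 0x40-0xFE / 8":
# lead bytes 0xA1..0xF9, trail bytes 0x40..0xFE, 8 bits per unit.
LEAD_MIN, LEAD_MAX = 0xA1, 0xF9
TRAIL_MIN, TRAIL_MAX = 0x40, 0xFE
UNIT_BITS = 8

ROWS = LEAD_MAX - LEAD_MIN + 1         # 89 lead-byte rows
ROW_WIDTH = TRAIL_MAX - TRAIL_MIN + 1  # 191 trail-byte columns per row


def zone_index(code: int) -> Optional[int]:
    """Return the dense row/column index of a 2-byte code, or None if outside the zone."""
    lead = code >> UNIT_BITS
    trail = code & ((1 << UNIT_BITS) - 1)
    if not (LEAD_MIN <= lead <= LEAD_MAX and TRAIL_MIN <= trail <= TRAIL_MAX):
        return None
    return (lead - LEAD_MIN) * ROW_WIDTH + (trail - TRAIL_MIN)


print(ROWS * ROW_WIDTH)    # 16999 slots in the dense table
print(zone_index(0xA140))  # 0
print(zone_index(0xA240))  # 191
print(zone_index(0x8140))  # None (lead byte below 0xA1)
```

Because the zone is a rectangle, it also covers trail bytes 0x7F-0xA0, which BIG5 does not use; under this reading those slots simply stay unmapped.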
# This mapping data is made from the mapping data provided by Unicode, Inc.
#
# Name:             BIG5 to Unicode table (complete)
# Unicode version:  1.1
# Table version:    0.0d3
# Table format:     Format A
# Date:             11 February 1994
#
# Copyright (c) 1991-1994 Unicode, Inc. All Rights reserved.
#
# This file is provided as-is by Unicode, Inc. (The Unicode Consortium).
# No claims are made as to fitness for any particular purpose. No
# warranties of any kind are expressed or implied. The recipient
# agrees to determine applicability of information provided. If this
# file has been provided on magnetic media by Unicode, Inc., the sole
# remedy for any claim will be exchange of defective media within 90
# days of receipt.
#
# Recipient is granted the right to make copies in any form for
# internal distribution and to freely use the information supplied
# in the creation of products supporting Unicode. Unicode, Inc.
# specifically excludes the right to re-distribute this file directly
# to third parties or other organizations whether for profit or not.
#
# This table contains one set of mappings from BIG5 into Unicode.
# Note that these data are *possible* mappings only and may not be the
# same as those used by actual products, nor may they be the best suited
# for all uses. For more information on the mappings between various code
# pages incorporating the repertoire of BIG5 and Unicode, consult the
# VENDORS mapping data. Normative information on the mapping between
# BIG5 and Unicode may be found in the Unihan.txt file in the
# latest Unicode Character Database.
#
# If you have carefully considered the fact that the mappings in
# this table are only one possible set of mappings between BIG5 and
# Unicode and have no normative status, but still feel that you
# have located an error in the table that requires fixing, you may
# report any such error to errata@unicode.org.
#
# WARNING! It is currently impossible to provide round-trip compatibility
# between BIG5 and Unicode.
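One concrete reason for this warning is visible in the list of unmapped characters: several BIG5 codes share one Unicode target (U+FFFD in this table), so a reverse Unicode-to-BIG5 lookup cannot tell them apart. The hypothetical helper `non_invertible` is a minimal sketch that finds such collisions in any BIG5-to-Unicode dict; the toy table is illustrative, not data read from this file.

```python
from collections import defaultdict
from typing import Dict, List


def non_invertible(table: Dict[int, int]) -> Dict[int, List[int]]:
    """Return Unicode targets reached by more than one BIG5 code, with those codes."""
    by_target: Dict[int, List[int]] = defaultdict(list)
    for big5, ucs in sorted(table.items()):
        by_target[ucs].append(big5)
    return {ucs: codes for ucs, codes in by_target.items() if len(codes) > 1}


# Toy table: two unmapped codes both sent to U+FFFD, plus one ordinary entry.
toy = {0xA1C3: 0xFFFD, 0xA1C5: 0xFFFD, 0xA140: 0x3000}
print({hex(u): [hex(c) for c in cs] for u, cs in non_invertible(toy).items()})
# {'0xfffd': ['0xa1c3', '0xa1c5']}
```

Any non-empty result means the table cannot be inverted without extra tie-breaking rules.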
#
# A number of characters are not currently mapped because
# of conflicts with other mappings. They are as follows:
#
# BIG5    Description                    Comments
#
# 0xA15A  SPACING UNDERSCORE             duplicates A1C4
# 0xA1C3  SPACING HEAVY OVERSCORE        not in Unicode
# 0xA1C5  SPACING HEAVY UNDERSCORE       not in Unicode
# 0xA1FE  LT DIAG UP RIGHT TO LOW LEFT   duplicates A2AC
# 0xA240  LT DIAG UP LEFT TO LOW RIGHT   duplicates A2AD
# 0xA2CC  HANGZHOU NUMERAL TEN           conflicts with A451 mapping
# 0xA2CE  HANGZHOU NUMERAL THIRTY        conflicts with A4CA mapping
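The U+FFFD policy and the duplicate alternative for these characters can be sketched as a lookup with an optional second step: a code whose entry is U+FFFD may instead borrow the mapping of the duplicate named in the Comments column. `lookup`, `DUPLICATES`, and the `use_duplicates` flag are hypothetical names, and the toy table uses a placeholder private-use value (U+E000) rather than the real mapping of 0xA1C4.

```python
from typing import Dict

REPLACEMENT = 0xFFFD

# BIG5 code -> BIG5 code it duplicates, taken from the Comments column above.
DUPLICATES: Dict[int, int] = {0xA15A: 0xA1C4, 0xA1FE: 0xA2AC, 0xA240: 0xA2AD}


def lookup(code: int, table: Dict[int, int], use_duplicates: bool = False) -> int:
    """Map a BIG5 code to Unicode; unmapped codes yield U+FFFD.

    With use_duplicates=True, a code mapped to U+FFFD that has a listed
    duplicate borrows the duplicate's mapping instead.
    """
    ucs = table.get(code, REPLACEMENT)
    if ucs == REPLACEMENT and use_duplicates and code in DUPLICATES:
        return table.get(DUPLICATES[code], REPLACEMENT)
    return ucs


# Placeholder values for illustration only (not the real mapping of 0xA1C4).
toy = {0xA15A: REPLACEMENT, 0xA1C4: 0xE000}
print(hex(lookup(0xA15A, toy)))                       # 0xfffd
print(hex(lookup(0xA15A, toy, use_duplicates=True)))  # 0xe000
```

The "conflicts with" and "not in Unicode" rows have no duplicate, so they stay at U+FFFD under either policy.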
#
# We currently map all of these characters to U+FFFD REPLACEMENT CHARACTER.
# It is also possible to map these characters to their duplicates, or to
# the user zone.
#
# 1. In addition to the above, there is some uncertainty about the
#    mappings in the ranges C6A1 - C8FE and F9DD - F9FE. The ETEN
#    version of BIG5 organizes the former range differently, and adds
#    additional characters in the latter range. The correct mappings
#    for these ranges need to be determined.
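A converter that wants to flag output from these disputed ranges could test membership as below. This is a minimal sketch that assumes the ranges in note 1 are inclusive numeric ranges of 2-byte codes; `is_uncertain` and `UNCERTAIN_RANGES` are hypothetical names.

```python
# Ranges from note 1, read here as inclusive numeric ranges (an assumption).
UNCERTAIN_RANGES = ((0xC6A1, 0xC8FE), (0xF9DD, 0xF9FE))


def is_uncertain(code: int) -> bool:
    """True if a BIG5 code lies in a range whose mapping note 1 calls unsettled."""
    return any(lo <= code <= hi for lo, hi in UNCERTAIN_RANGES)


print(is_uncertain(0xC7A1))  # True
print(is_uncertain(0xA440))  # False
```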
#
# 2. There is an uncertainty in the mapping of the Big Five character
#    0xA3BC. This character occurs within the Big Five block of tone marks
#    for bopomofo and is intended to be the tone mark for the first tone in
#    Mandarin Chinese. We have selected the mapping U+02C9 MODIFIER LETTER
#    MACRON (Mandarin Chinese first tone) to reflect this semantic.
#    However, because bopomofo uses the absence of a tone mark to indicate
#    the first Mandarin tone, most implementations of Big Five represent
#    this character with a blank space, and so a mapping such as U+2003 EM
#    SPACE might be preferred.
#
# Format: Three tab-separated columns
#     Column #1 is the BIG5 code (in hex as 0xXXXX)
#     Column #2 is the Unicode code point (in hex as 0xXXXX)
#     Column #3 is the Unicode name (follows a comment sign, '#')
#
#     The official names for Unicode characters U+4E00
#     to U+9FA5, inclusive, are "CJK UNIFIED IDEOGRAPH-XXXX",
#     where XXXX is the code point. Including all these
#     names in this file increases its size substantially
#     and needlessly. The token "<CJK>" is used for the
#     name of these characters. If necessary, it can be
#     expanded algorithmically by a parser or editor.
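A minimal sketch of a parser for the three-column layout described above, which also expands the "<CJK>" token back to the official name for U+4E00 to U+9FA5. It assumes whitespace (tabs) between columns, skips blank and '#' comment lines, and writes the code point in uppercase hex as in the official names; `parse_line` and `Entry` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

CJK_FIRST, CJK_LAST = 0x4E00, 0x9FA5

# Column 1 (BIG5), column 2 (Unicode), then '#' and the name.
LINE_RE = re.compile(r"^(0x[0-9A-Fa-f]{4})\s+(0x[0-9A-Fa-f]{4})\s+#\s*(.*)$")


class Entry(NamedTuple):
    big5: int
    ucs: int
    name: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one data line; return None for blank lines and '#' comments."""
    text = line.strip()
    if not text or text.startswith("#"):
        return None
    m = LINE_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    big5, ucs, name = int(m.group(1), 16), int(m.group(2), 16), m.group(3).strip()
    # Expand the "<CJK>" shorthand to the official algorithmic name.
    if name == "<CJK>" and CJK_FIRST <= ucs <= CJK_LAST:
        name = f"CJK UNIFIED IDEOGRAPH-{ucs:04X}"
    return Entry(big5, ucs, name)


print(parse_line("0xA440\t0x4E00\t# <CJK>"))
# Entry(big5=42048, ucs=19968, name='CJK UNIFIED IDEOGRAPH-4E00')
```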
#
# The entries are in BIG5 order.
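If "BIG5 order" is read as ascending numeric order of the 2-byte code (lead byte first, then trail byte), a loader can verify it cheaply. This sketch also rejects repeated codes, which is an extra assumption; `is_big5_ordered` is a hypothetical name.

```python
from typing import List


def is_big5_ordered(codes: List[int]) -> bool:
    """True if codes are strictly ascending (sorted, no repeats)."""
    return all(a < b for a, b in zip(codes, codes[1:]))


print(is_big5_ordered([0xA140, 0xA141, 0xA240]))  # True
print(is_big5_ordered([0xA240, 0xA140]))          # False
```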