# $NetBSD: UCS%GB2312.src,v 1.5 2006/08/19 10:58:41 tnozaki Exp $
SRC_ZONE 0x00A4 - 0xFFE5
# This mapping data is derived from the mapping data provided by Unicode, Inc.
#
# Name: GB2312-80 to Unicode table (complete, hex format)
# Unicode version: 3.0
# Table format: Format A
# Date: 1999 October 8
# Copyright (c) 1991-1999 Unicode, Inc. All Rights reserved.
#
# This file is provided as-is by Unicode, Inc. (The Unicode Consortium).
# No claims are made as to fitness for any particular purpose. No
# warranties of any kind are expressed or implied. The recipient
# agrees to determine applicability of information provided. If this
# file has been provided on optical media by Unicode, Inc., the sole
# remedy for any claim will be exchange of defective media within 90
# days of receipt.
#
# Unicode, Inc. hereby grants the right to freely use the information
# supplied in this file in the creation of products supporting the
# Unicode Standard, and to make copies of this file in any form for
# internal or external distribution as long as this notice remains
# attached.
# This table contains one set of mappings from GB2312-80 into Unicode.
# Note that these data are *possible* mappings only and may not be the
# same as those used by actual products, nor may they be the best suited
# for all uses. For more information on the mappings between various code
# pages incorporating the repertoire of GB2312-80 and Unicode, consult the
# VENDORS mapping data. Normative information on the mapping between
# GB2312-80 and Unicode may be found in the Unihan.txt file in the
# latest Unicode Character Database.
#
# If you have carefully considered the fact that the mappings in
# this table are only one possible set of mappings between GB2312-80 and
# Unicode and have no normative status, but still feel that you
# have located an error in the table that requires fixing, you may
# report any such error to errata@unicode.org.
# Format: Three tab-separated columns
#   Column #1 is the GB2312 code (in hex as 0xXXXX)
#   Column #2 is the Unicode code point (in hex as 0xXXXX)
#   Column #3 is the Unicode name (follows a comment sign, '#')
#     The official names for Unicode characters U+4E00
#     to U+9FA5, inclusive, are "CJK UNIFIED IDEOGRAPH-XXXX",
#     where XXXX is the code point. Including all these
#     names in this file increases its size substantially
#     and needlessly. The token "<CJK>" is used for the
#     name of these characters. If necessary, it can be
#     expanded algorithmically by a parser or editor.
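#
# The "<CJK>" expansion described above could be sketched as follows.
# This is a minimal illustration only: the helper name expand_cjk_name
# and its constants are hypothetical and are not part of this table or
# of any Unicode tool. It assumes the name column has already been
# split off from the rest of the entry.

```python
# Hypothetical helper, for illustration only.
CJK_FIRST = 0x4E00  # first code point whose name is abbreviated as "<CJK>"
CJK_LAST = 0x9FA5   # last code point whose name is abbreviated as "<CJK>"


def expand_cjk_name(code_point: int, name: str) -> str:
    """Return the full character name, expanding the "<CJK>" token.

    Names other than "<CJK>", or code points outside U+4E00..U+9FA5,
    are returned unchanged.
    """
    if name == "<CJK>" and CJK_FIRST <= code_point <= CJK_LAST:
        # The official name is the fixed prefix plus the code point in hex.
        return f"CJK UNIFIED IDEOGRAPH-{code_point:04X}"
    return name
```

# Names that are already spelled out pass through untouched, so the
# helper can be applied to every entry without a separate range check.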
# The entries are in GB2312 order
#
# The following algorithms can be used to change the hex form
# of GB2312 to other standard forms:
#
#   To change hex to EUC form, add 0x8080
#   To change hex to kuten form, first subtract 0x2020. Then
#     the high and low bytes correspond to the ku and ten of
#     the kuten form. For example, 0x2121 -> 0x0101 -> 0101;
#     0x777E -> 0x575E -> 8794
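#
# The two conversions above can be sketched as follows. The function
# names to_euc and to_kuten are hypothetical, and the range check
# assumes that each byte of a valid GB2312 hex code lies in 0x21..0x7E.

```python
def _check_gb2312(code: int) -> None:
    """Raise ValueError unless both bytes lie in 0x21..0x7E (assumed range)."""
    high, low = code >> 8, code & 0xFF
    if not (0x21 <= high <= 0x7E and 0x21 <= low <= 0x7E):
        raise ValueError(f"not a GB2312 hex code: {code:#06x}")


def to_euc(code: int) -> int:
    """Convert the GB2312 hex form to EUC form by adding 0x8080."""
    _check_gb2312(code)
    return code + 0x8080


def to_kuten(code: int) -> str:
    """Convert the GB2312 hex form to a four-digit kuten string."""
    _check_gb2312(code)
    shifted = code - 0x2020                   # e.g. 0x777E -> 0x575E
    ku, ten = shifted >> 8, shifted & 0xFF    # high byte is ku, low byte is ten
    return f"{ku:02d}{ten:02d}"               # each half printed in decimal
```

# With these definitions, to_kuten(0x2121) gives "0101" and
# to_kuten(0x777E) gives "8794", matching the examples above.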
# 1.0 version updates 0.0d2 version by correcting mapping for 0x212C
# from U+2225 to U+2016.
0x2015 = 0x212A # fallback -> 0x2014
0x30FB = 0x2124 # fallback -> 0x00B7