cmake build system: visiblity support for clang
[supercollider.git] / external_libraries / icu / unicode / utf_old.h
blob9ba538c481db6764bc243cfaae9d00ae4ea11175
1 /*
2 *******************************************************************************
4 * Copyright (C) 2002-2004, International Business Machines
5 * Corporation and others. All Rights Reserved.
7 *******************************************************************************
8 * file name: utf.h
9 * encoding: US-ASCII
10 * tab size: 8 (not used)
11 * indentation:4
13 * created on: 2002sep21
14 * created by: Markus W. Scherer
17 /**
18 * \file
19 * The macros in utf_old.h are all deprecated and their use discouraged.
20 * Some of the design principles behind the set of UTF macros
21 * have changed or proved impractical.
22 * Almost all of the old "UTF macros" are at least renamed.
23 * If you are looking for a new equivalent to an old macro, please see the
24 * comment at the old one.
26 * utf_old.h is included by utf.h after unicode/umachine.h
27 * and some common definitions, to not break old code.
29 * Brief summary of reasons for deprecation:
30 * - Switch on UTF_SIZE (selection of UTF-8/16/32 default string processing)
31 * was impractical.
32 * - Switch on UTF_SAFE etc. (selection of unsafe/safe/strict default string processing)
33 * was of little use and impractical.
34 * - Whole classes of macros became obsolete outside of the UTF_SIZE/UTF_SAFE
35 * selection framework: UTF32_ macros (all trivial)
36 * and UTF_ default and intermediate macros (all aliases).
37 * - The selection framework also caused many macro aliases.
38 * - Change in Unicode standard: "irregular" sequences (3.0) became illegal (3.2).
39 * - Change of language in Unicode standard:
40 * Growing distinction between internal x-bit Unicode strings and external UTF-x
41 * forms, with the former more lenient.
42 * Suggests renaming of UTF16_ macros to U16_.
43 * - The prefix "UTF_" without a width number confused some users.
44 * - "Safe" append macros needed the addition of an error indicator output.
45 * - "Safe" UTF-8 macros used legitimate (if rarely used) code point values
46 * to indicate error conditions.
47 * - The use of the "_CHAR" infix for code point operations confused some users.
49 * More details:
51 * Until ICU 2.2, utf.h theoretically allowed to choose among UTF-8/16/32
52 * for string processing, and among unsafe/safe/strict default macros for that.
54 * It proved nearly impossible to write non-trivial, high-performance code
55 * that is UTF-generic.
56 * Unsafe default macros would be dangerous for default string processing,
57 * and the main reason for the "strict" versions disappeared:
58 * Between Unicode 3.0 and 3.2 all "irregular" UTF-8 sequences became illegal.
59 * The only other conditions that "strict" checked for were non-characters,
60 * which are valid during processing. Only during text input/output should they
61 * be checked, and at that time other well-formedness checks may be
62 * necessary or useful as well.
63 * This can still be done by using U16_NEXT and U_IS_UNICODE_NONCHAR
64 * or U_IS_UNICODE_CHAR.
66 * The old UTF8_..._SAFE macros also used some normal Unicode code points
67 * to indicate malformed sequences.
68 * The new UTF8_ macros without suffix use negative values instead.
70 * The entire contents of utf32.h was moved here without replacement
71 * because all those macros were trivial and
72 * were meaningful only in the framework of choosing the UTF size.
74 * See Jitterbug 2150 and its discussion on the ICU mailing list
75 * in September 2002.
77 * <hr>
79 * <em>Obsolete part</em> of pre-ICU 2.4 utf.h file documentation:
81 * <p>The original concept for these files was for ICU to allow
82 * in principle to set which UTF (UTF-8/16/32) is used internally
83 * by defining UTF_SIZE to either 8, 16, or 32. utf.h would then define the UChar type
84 * accordingly. UTF-16 was the default.</p>
86 * <p>This concept has been abandoned.
87 * A lot of the ICU source code &#8212; especially low-level code like
88 * conversion, normalization, and collation &#8212; assumes UTF-16,
89 * utf.h enforces the default of UTF-16.
90 * The UTF-8 and UTF-32 macros remain for now for completeness and backward compatibility.</p>
92 * <p>Accordingly, utf.h defines UChar to be an unsigned 16-bit integer. If this matches wchar_t, then
93 * UChar is defined to be exactly wchar_t, otherwise uint16_t.</p>
95 * <p>UChar32 is defined to be a signed 32-bit integer (int32_t), large enough for a 21-bit
96 * Unicode code point (Unicode scalar value, 0..0x10ffff).
97 * Before ICU 2.4, the definition of UChar32 was similarly platform-dependent as
98 * the definition of UChar. For details see the documentation for UChar32 itself.</p>
100 * <p>utf.h also defines a number of C macros for handling single Unicode code points and
101 * for using UTF Unicode strings. It includes utf8.h, utf16.h, and utf32.h for the actual
102 * implementations of those macros and then aliases one set of them (for UTF-16) for general use.
103 * The UTF-specific macros have the UTF size in the macro name prefixes (UTF16_...), while
104 * the general alias macros always begin with UTF_...</p>
106 * <p>Many string operations can be done with or without error checking.
107 * Where such a distinction is useful, there are two versions of the macros, "unsafe" and "safe"
108 * ones with ..._UNSAFE and ..._SAFE suffixes. The unsafe macros are fast but may cause
109 * program failures if the strings are not well-formed. The safe macros have an additional, boolean
110 * parameter "strict". If strict is FALSE, then only illegal sequences are detected.
111 * Otherwise, irregular sequences and non-characters are detected as well (like single surrogates).
112 * Safe macros return special error code points for illegal/irregular sequences:
113 * Typically, U+ffff, or values that would result in a code unit sequence of the same length
114 * as the erroneous input sequence.<br>
115 * Note that _UNSAFE macros have fewer parameters: They do not have the strictness parameter, and
116 * they do not have start/length parameters for boundary checking.</p>
118 * <p>Here, the macros are aliased in two steps:
119 * In the first step, the UTF-specific macros with UTF16_ prefix and _UNSAFE and _SAFE suffixes are
120 * aliased according to the UTF_SIZE to macros with UTF_ prefix and the same suffixes and signatures.
121 * Then, in a second step, the default, general alias macros are set to use either the unsafe or
122 * the safe/not strict (default) or the safe/strict macro;
123 * these general macros do not have a strictness parameter.</p>
125 * <p>It is possible to change the default choice for the general alias macros to be unsafe, safe/not strict or safe/strict.
126 * The default is safe/not strict. It is not recommended to select the unsafe macros as the basis for
127 * Unicode string handling in ICU! To select this, define UTF_SAFE, UTF_STRICT, or UTF_UNSAFE.</p>
129 * <p>For general use, one should use the default, general macros with UTF_ prefix and no _SAFE/_UNSAFE suffix.
130 * Only in some cases it may be necessary to control the choice of macro directly and use a less generic alias.
131 * For example, if it can be assumed that a string is well-formed and the index will stay within the bounds,
132 * then the _UNSAFE version may be used.
133 * If a UTF-8 string is to be processed, then the macros with UTF8_ prefixes need to be used.</p>
135 * <hr>
137 * @deprecated ICU 2.4. Use the macros in utf.h, utf16.h, utf8.h instead.
140 #ifndef __UTF_OLD_H__
141 #define __UTF_OLD_H__
143 #ifndef U_HIDE_DEPRECATED_API
145 /* utf.h must be included first. */
146 #ifndef __UTF_H__
147 # include "utf.h"
148 #endif
150 /* Formerly utf.h, part 1 --------------------------------------------------- */
152 #ifdef U_USE_UTF_DEPRECATES
154 * Unicode string and array offset and index type.
155 * ICU always counts Unicode code units (UChars) for
156 * string offsets, indexes, and lengths, not Unicode code points.
158 * @obsolete ICU 2.6. Use int32_t directly instead since this API will be removed in that release.
160 typedef int32_t UTextOffset;
161 #endif
163 /** Number of bits in a Unicode string code unit - ICU uses 16-bit Unicode. @deprecated ICU 2.4. Obsolete, see utf_old.h. */
164 #define UTF_SIZE 16
167 * The default choice for general Unicode string macros is to use the ..._SAFE macro implementations
168 * with strict=FALSE.
170 * @deprecated ICU 2.4. Obsolete, see utf_old.h.
172 #define UTF_SAFE
173 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
174 #undef UTF_UNSAFE
175 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
176 #undef UTF_STRICT
179 * <p>UTF8_ERROR_VALUE_1 and UTF8_ERROR_VALUE_2 are special error values for UTF-8,
180 * which need 1 or 2 bytes in UTF-8:<br>
181 * U+0015 = NAK = Negative Acknowledge, C0 control character<br>
182 * U+009f = highest C1 control character</p>
184 * <p>These are used by UTF8_..._SAFE macros so that they can return an error value
185 * that needs the same number of code units (bytes) as were seen by
186 * a macro. They should be tested with UTF_IS_ERROR() or UTF_IS_VALID().</p>
188 * @deprecated ICU 2.4. Obsolete, see utf_old.h.
190 #define UTF8_ERROR_VALUE_1 0x15
193 * See documentation on UTF8_ERROR_VALUE_1 for details.
195 * @deprecated ICU 2.4. Obsolete, see utf_old.h.
197 #define UTF8_ERROR_VALUE_2 0x9f
200 * Error value for all UTFs. This code point value will be set by macros with error
201 * checking if an error is detected.
203 * @deprecated ICU 2.4. Obsolete, see utf_old.h.
205 #define UTF_ERROR_VALUE 0xffff
208 * Is a given 32-bit code an error value
209 * as returned by one of the macros for any UTF?
211 * @deprecated ICU 2.4. Obsolete, see utf_old.h.
213 #define UTF_IS_ERROR(c) \
214 (((c)&0xfffe)==0xfffe || (c)==UTF8_ERROR_VALUE_1 || (c)==UTF8_ERROR_VALUE_2)
217 * This is a combined macro: Is c a valid Unicode value _and_ not an error code?
219 * @deprecated ICU 2.4. Obsolete, see utf_old.h.
221 #define UTF_IS_VALID(c) \
222 (UTF_IS_UNICODE_CHAR(c) && \
223 (c)!=UTF8_ERROR_VALUE_1 && (c)!=UTF8_ERROR_VALUE_2)
226 * Is this code unit or code point a surrogate (U+d800..U+dfff)?
227 * @deprecated ICU 2.4. Renamed to U_IS_SURROGATE and U16_IS_SURROGATE, see utf_old.h.
229 #define UTF_IS_SURROGATE(uchar) (((uchar)&0xfffff800)==0xd800)
232 * Is a given 32-bit code point a Unicode noncharacter?
234 * @deprecated ICU 2.4. Renamed to U_IS_UNICODE_NONCHAR, see utf_old.h.
236 #define UTF_IS_UNICODE_NONCHAR(c) \
237 ((c)>=0xfdd0 && \
238 ((uint32_t)(c)<=0xfdef || ((c)&0xfffe)==0xfffe) && \
239 (uint32_t)(c)<=0x10ffff)
242 * Is a given 32-bit value a Unicode code point value (0..U+10ffff)
243 * that can be assigned a character?
245 * Code points that are not characters include:
246 * - single surrogate code points (U+d800..U+dfff, 2048 code points)
247 * - the last two code points on each plane (U+__fffe and U+__ffff, 34 code points)
248 * - U+fdd0..U+fdef (new with Unicode 3.1, 32 code points)
249 * - the highest Unicode code point value is U+10ffff
251 * This means that all code points below U+d800 are character code points,
252 * and that boundary is tested first for performance.
254 * @deprecated ICU 2.4. Renamed to U_IS_UNICODE_CHAR, see utf_old.h.
256 #define UTF_IS_UNICODE_CHAR(c) \
257 ((uint32_t)(c)<0xd800 || \
258 ((uint32_t)(c)>0xdfff && \
259 (uint32_t)(c)<=0x10ffff && \
260 !UTF_IS_UNICODE_NONCHAR(c)))
262 /* Formerly utf8.h ---------------------------------------------------------- */
265 * Count the trail bytes for a UTF-8 lead byte.
266 * @deprecated ICU 2.4. Renamed to U8_COUNT_TRAIL_BYTES, see utf_old.h.
268 #define UTF8_COUNT_TRAIL_BYTES(leadByte) (utf8_countTrailBytes[(uint8_t)leadByte])
271 * Mask a UTF-8 lead byte, leave only the lower bits that form part of the code point value.
272 * @deprecated ICU 2.4. Renamed to U8_MASK_LEAD_BYTE, see utf_old.h.
274 #define UTF8_MASK_LEAD_BYTE(leadByte, countTrailBytes) ((leadByte)&=(1<<(6-(countTrailBytes)))-1)
276 /** Is this this code point a single code unit (byte)? @deprecated ICU 2.4. Renamed to U8_IS_SINGLE, see utf_old.h. */
277 #define UTF8_IS_SINGLE(uchar) (((uchar)&0x80)==0)
278 /** Is this this code unit the lead code unit (byte) of a code point? @deprecated ICU 2.4. Renamed to U8_IS_LEAD, see utf_old.h. */
279 #define UTF8_IS_LEAD(uchar) ((uint8_t)((uchar)-0xc0)<0x3e)
280 /** Is this this code unit a trailing code unit (byte) of a code point? @deprecated ICU 2.4. Renamed to U8_IS_TRAIL, see utf_old.h. */
281 #define UTF8_IS_TRAIL(uchar) (((uchar)&0xc0)==0x80)
283 /** Does this scalar Unicode value need multiple code units for storage? @deprecated ICU 2.4. Use U8_LENGTH or test ((uint32_t)(c)>0x7f) instead, see utf_old.h. */
284 #define UTF8_NEED_MULTIPLE_UCHAR(c) ((uint32_t)(c)>0x7f)
287 * Given the lead character, how many bytes are taken by this code point.
288 * ICU does not deal with code points >0x10ffff
289 * unless necessary for advancing in the byte stream.
291 * These length macros take into account that for values >0x10ffff
292 * the UTF8_APPEND_CHAR_SAFE macros would write the error code point 0xffff
293 * with 3 bytes.
294 * Code point comparisons need to be in uint32_t because UChar32
295 * may be a signed type, and negative values must be recognized.
297 * @deprecated ICU 2.4. Use U8_LENGTH instead, see utf_old.h.
299 #if 1
300 # define UTF8_CHAR_LENGTH(c) \
301 ((uint32_t)(c)<=0x7f ? 1 : \
302 ((uint32_t)(c)<=0x7ff ? 2 : \
303 ((uint32_t)((c)-0x10000)>0xfffff ? 3 : 4) \
306 #else
307 # define UTF8_CHAR_LENGTH(c) \
308 ((uint32_t)(c)<=0x7f ? 1 : \
309 ((uint32_t)(c)<=0x7ff ? 2 : \
310 ((uint32_t)(c)<=0xffff ? 3 : \
311 ((uint32_t)(c)<=0x10ffff ? 4 : \
312 ((uint32_t)(c)<=0x3ffffff ? 5 : \
313 ((uint32_t)(c)<=0x7fffffff ? 6 : 3) \
319 #endif
321 /** The maximum number of bytes per code point. @deprecated ICU 2.4. Renamed to U8_MAX_LENGTH, see utf_old.h. */
322 #define UTF8_MAX_CHAR_LENGTH 4
324 /** Average number of code units compared to UTF-16. @deprecated ICU 2.4. Obsolete, see utf_old.h. */
325 #define UTF8_ARRAY_SIZE(size) ((5*(size))/2)
327 /** @deprecated ICU 2.4. Renamed to U8_GET_UNSAFE, see utf_old.h. */
328 #define UTF8_GET_CHAR_UNSAFE(s, i, c) { \
329 int32_t _utf8_get_char_unsafe_index=(int32_t)(i); \
330 UTF8_SET_CHAR_START_UNSAFE(s, _utf8_get_char_unsafe_index); \
331 UTF8_NEXT_CHAR_UNSAFE(s, _utf8_get_char_unsafe_index, c); \
334 /** @deprecated ICU 2.4. Use U8_GET instead, see utf_old.h. */
335 #define UTF8_GET_CHAR_SAFE(s, start, i, length, c, strict) { \
336 int32_t _utf8_get_char_safe_index=(int32_t)(i); \
337 UTF8_SET_CHAR_START_SAFE(s, start, _utf8_get_char_safe_index); \
338 UTF8_NEXT_CHAR_SAFE(s, _utf8_get_char_safe_index, length, c, strict); \
341 /** @deprecated ICU 2.4. Renamed to U8_NEXT_UNSAFE, see utf_old.h. */
342 #define UTF8_NEXT_CHAR_UNSAFE(s, i, c) { \
343 (c)=(s)[(i)++]; \
344 if((uint8_t)((c)-0xc0)<0x35) { \
345 uint8_t __count=UTF8_COUNT_TRAIL_BYTES(c); \
346 UTF8_MASK_LEAD_BYTE(c, __count); \
347 switch(__count) { \
348 /* each following branch falls through to the next one */ \
349 case 3: \
350 (c)=((c)<<6)|((s)[(i)++]&0x3f); \
351 case 2: \
352 (c)=((c)<<6)|((s)[(i)++]&0x3f); \
353 case 1: \
354 (c)=((c)<<6)|((s)[(i)++]&0x3f); \
355 /* no other branches to optimize switch() */ \
356 break; \
361 /** @deprecated ICU 2.4. Renamed to U8_APPEND_UNSAFE, see utf_old.h. */
362 #define UTF8_APPEND_CHAR_UNSAFE(s, i, c) { \
363 if((uint32_t)(c)<=0x7f) { \
364 (s)[(i)++]=(uint8_t)(c); \
365 } else { \
366 if((uint32_t)(c)<=0x7ff) { \
367 (s)[(i)++]=(uint8_t)(((c)>>6)|0xc0); \
368 } else { \
369 if((uint32_t)(c)<=0xffff) { \
370 (s)[(i)++]=(uint8_t)(((c)>>12)|0xe0); \
371 } else { \
372 (s)[(i)++]=(uint8_t)(((c)>>18)|0xf0); \
373 (s)[(i)++]=(uint8_t)((((c)>>12)&0x3f)|0x80); \
375 (s)[(i)++]=(uint8_t)((((c)>>6)&0x3f)|0x80); \
377 (s)[(i)++]=(uint8_t)(((c)&0x3f)|0x80); \
381 /** @deprecated ICU 2.4. Renamed to U8_FWD_1_UNSAFE, see utf_old.h. */
382 #define UTF8_FWD_1_UNSAFE(s, i) { \
383 (i)+=1+UTF8_COUNT_TRAIL_BYTES((s)[i]); \
386 /** @deprecated ICU 2.4. Renamed to U8_FWD_N_UNSAFE, see utf_old.h. */
387 #define UTF8_FWD_N_UNSAFE(s, i, n) { \
388 int32_t __N=(n); \
389 while(__N>0) { \
390 UTF8_FWD_1_UNSAFE(s, i); \
391 --__N; \
395 /** @deprecated ICU 2.4. Renamed to U8_SET_CP_START_UNSAFE, see utf_old.h. */
396 #define UTF8_SET_CHAR_START_UNSAFE(s, i) { \
397 while(UTF8_IS_TRAIL((s)[i])) { --(i); } \
400 /** @deprecated ICU 2.4. Use U8_NEXT instead, see utf_old.h. */
401 #define UTF8_NEXT_CHAR_SAFE(s, i, length, c, strict) { \
402 (c)=(s)[(i)++]; \
403 if((c)>=0x80) { \
404 if(UTF8_IS_LEAD(c)) { \
405 (c)=utf8_nextCharSafeBody(s, &(i), (int32_t)(length), c, strict); \
406 } else { \
407 (c)=UTF8_ERROR_VALUE_1; \
412 /** @deprecated ICU 2.4. Use U8_APPEND instead, see utf_old.h. */
413 #define UTF8_APPEND_CHAR_SAFE(s, i, length, c) { \
414 if((uint32_t)(c)<=0x7f) { \
415 (s)[(i)++]=(uint8_t)(c); \
416 } else { \
417 (i)=utf8_appendCharSafeBody(s, (int32_t)(i), (int32_t)(length), c, NULL); \
421 /** @deprecated ICU 2.4. Renamed to U8_FWD_1, see utf_old.h. */
422 #define UTF8_FWD_1_SAFE(s, i, length) U8_FWD_1(s, i, length)
424 /** @deprecated ICU 2.4. Renamed to U8_FWD_N, see utf_old.h. */
425 #define UTF8_FWD_N_SAFE(s, i, length, n) U8_FWD_N(s, i, length, n)
427 /** @deprecated ICU 2.4. Renamed to U8_SET_CP_START, see utf_old.h. */
428 #define UTF8_SET_CHAR_START_SAFE(s, start, i) U8_SET_CP_START(s, start, i)
430 /** @deprecated ICU 2.4. Renamed to U8_PREV_UNSAFE, see utf_old.h. */
431 #define UTF8_PREV_CHAR_UNSAFE(s, i, c) { \
432 (c)=(s)[--(i)]; \
433 if(UTF8_IS_TRAIL(c)) { \
434 uint8_t __b, __count=1, __shift=6; \
436 /* c is a trail byte */ \
437 (c)&=0x3f; \
438 for(;;) { \
439 __b=(s)[--(i)]; \
440 if(__b>=0xc0) { \
441 UTF8_MASK_LEAD_BYTE(__b, __count); \
442 (c)|=(UChar32)__b<<__shift; \
443 break; \
444 } else { \
445 (c)|=(UChar32)(__b&0x3f)<<__shift; \
446 ++__count; \
447 __shift+=6; \
453 /** @deprecated ICU 2.4. Renamed to U8_BACK_1_UNSAFE, see utf_old.h. */
454 #define UTF8_BACK_1_UNSAFE(s, i) { \
455 while(UTF8_IS_TRAIL((s)[--(i)])) {} \
458 /** @deprecated ICU 2.4. Renamed to U8_BACK_N_UNSAFE, see utf_old.h. */
459 #define UTF8_BACK_N_UNSAFE(s, i, n) { \
460 int32_t __N=(n); \
461 while(__N>0) { \
462 UTF8_BACK_1_UNSAFE(s, i); \
463 --__N; \
467 /** @deprecated ICU 2.4. Renamed to U8_SET_CP_LIMIT_UNSAFE, see utf_old.h. */
468 #define UTF8_SET_CHAR_LIMIT_UNSAFE(s, i) { \
469 UTF8_BACK_1_UNSAFE(s, i); \
470 UTF8_FWD_1_UNSAFE(s, i); \
473 /** @deprecated ICU 2.4. Use U8_PREV instead, see utf_old.h. */
474 #define UTF8_PREV_CHAR_SAFE(s, start, i, c, strict) { \
475 (c)=(s)[--(i)]; \
476 if((c)>=0x80) { \
477 if((c)<=0xbf) { \
478 (c)=utf8_prevCharSafeBody(s, start, &(i), c, strict); \
479 } else { \
480 (c)=UTF8_ERROR_VALUE_1; \
485 /** @deprecated ICU 2.4. Renamed to U8_BACK_1, see utf_old.h. */
486 #define UTF8_BACK_1_SAFE(s, start, i) U8_BACK_1(s, start, i)
488 /** @deprecated ICU 2.4. Renamed to U8_BACK_N, see utf_old.h. */
489 #define UTF8_BACK_N_SAFE(s, start, i, n) U8_BACK_N(s, start, i, n)
491 /** @deprecated ICU 2.4. Renamed to U8_SET_CP_LIMIT, see utf_old.h. */
492 #define UTF8_SET_CHAR_LIMIT_SAFE(s, start, i, length) U8_SET_CP_LIMIT(s, start, i, length)
494 /* Formerly utf16.h --------------------------------------------------------- */
496 /** Is uchar a first/lead surrogate? @deprecated ICU 2.4. Renamed to U_IS_LEAD and U16_IS_LEAD, see utf_old.h. */
497 #define UTF_IS_FIRST_SURROGATE(uchar) (((uchar)&0xfffffc00)==0xd800)
499 /** Is uchar a second/trail surrogate? @deprecated ICU 2.4. Renamed to U_IS_TRAIL and U16_IS_TRAIL, see utf_old.h. */
500 #define UTF_IS_SECOND_SURROGATE(uchar) (((uchar)&0xfffffc00)==0xdc00)
502 /** Assuming c is a surrogate, is it a first/lead surrogate? @deprecated ICU 2.4. Renamed to U_IS_SURROGATE_LEAD and U16_IS_SURROGATE_LEAD, see utf_old.h. */
503 #define UTF_IS_SURROGATE_FIRST(c) (((c)&0x400)==0)
505 /** Helper constant for UTF16_GET_PAIR_VALUE. @deprecated ICU 2.4. Renamed to U16_SURROGATE_OFFSET, see utf_old.h. */
506 #define UTF_SURROGATE_OFFSET ((0xd800<<10UL)+0xdc00-0x10000)
508 /** Get the UTF-32 value from the surrogate code units. @deprecated ICU 2.4. Renamed to U16_GET_SUPPLEMENTARY, see utf_old.h. */
509 #define UTF16_GET_PAIR_VALUE(first, second) \
510 (((first)<<10UL)+(second)-UTF_SURROGATE_OFFSET)
512 /** @deprecated ICU 2.4. Renamed to U16_LEAD, see utf_old.h. */
513 #define UTF_FIRST_SURROGATE(supplementary) (UChar)(((supplementary)>>10)+0xd7c0)
515 /** @deprecated ICU 2.4. Renamed to U16_TRAIL, see utf_old.h. */
516 #define UTF_SECOND_SURROGATE(supplementary) (UChar)(((supplementary)&0x3ff)|0xdc00)
518 /** @deprecated ICU 2.4. Renamed to U16_LEAD, see utf_old.h. */
519 #define UTF16_LEAD(supplementary) UTF_FIRST_SURROGATE(supplementary)
521 /** @deprecated ICU 2.4. Renamed to U16_TRAIL, see utf_old.h. */
522 #define UTF16_TRAIL(supplementary) UTF_SECOND_SURROGATE(supplementary)
524 /** @deprecated ICU 2.4. Renamed to U16_IS_SINGLE, see utf_old.h. */
525 #define UTF16_IS_SINGLE(uchar) !UTF_IS_SURROGATE(uchar)
527 /** @deprecated ICU 2.4. Renamed to U16_IS_LEAD, see utf_old.h. */
528 #define UTF16_IS_LEAD(uchar) UTF_IS_FIRST_SURROGATE(uchar)
530 /** @deprecated ICU 2.4. Renamed to U16_IS_TRAIL, see utf_old.h. */
531 #define UTF16_IS_TRAIL(uchar) UTF_IS_SECOND_SURROGATE(uchar)
533 /** Does this scalar Unicode value need multiple code units for storage? @deprecated ICU 2.4. Use U16_LENGTH or test ((uint32_t)(c)>0xffff) instead, see utf_old.h. */
534 #define UTF16_NEED_MULTIPLE_UCHAR(c) ((uint32_t)(c)>0xffff)
536 /** @deprecated ICU 2.4. Renamed to U16_LENGTH, see utf_old.h. */
537 #define UTF16_CHAR_LENGTH(c) ((uint32_t)(c)<=0xffff ? 1 : 2)
539 /** @deprecated ICU 2.4. Renamed to U16_MAX_LENGTH, see utf_old.h. */
540 #define UTF16_MAX_CHAR_LENGTH 2
542 /** Average number of code units compared to UTF-16. @deprecated ICU 2.4. Obsolete, see utf_old.h. */
543 #define UTF16_ARRAY_SIZE(size) (size)
546 * Get a single code point from an offset that points to any
547 * of the code units that belong to that code point.
548 * Assume 0<=i<length.
550 * This could be used for iteration together with
551 * UTF16_CHAR_LENGTH() and UTF_IS_ERROR(),
552 * but the use of UTF16_NEXT_CHAR[_UNSAFE]() and
553 * UTF16_PREV_CHAR[_UNSAFE]() is more efficient for that.
554 * @deprecated ICU 2.4. Renamed to U16_GET_UNSAFE, see utf_old.h.
556 #define UTF16_GET_CHAR_UNSAFE(s, i, c) { \
557 (c)=(s)[i]; \
558 if(UTF_IS_SURROGATE(c)) { \
559 if(UTF_IS_SURROGATE_FIRST(c)) { \
560 (c)=UTF16_GET_PAIR_VALUE((c), (s)[(i)+1]); \
561 } else { \
562 (c)=UTF16_GET_PAIR_VALUE((s)[(i)-1], (c)); \
567 /** @deprecated ICU 2.4. Use U16_GET instead, see utf_old.h. */
568 #define UTF16_GET_CHAR_SAFE(s, start, i, length, c, strict) { \
569 (c)=(s)[i]; \
570 if(UTF_IS_SURROGATE(c)) { \
571 uint16_t __c2; \
572 if(UTF_IS_SURROGATE_FIRST(c)) { \
573 if((i)+1<(length) && UTF_IS_SECOND_SURROGATE(__c2=(s)[(i)+1])) { \
574 (c)=UTF16_GET_PAIR_VALUE((c), __c2); \
575 /* strict: ((c)&0xfffe)==0xfffe is caught by UTF_IS_ERROR() and UTF_IS_UNICODE_CHAR() */ \
576 } else if(strict) {\
577 /* unmatched first surrogate */ \
578 (c)=UTF_ERROR_VALUE; \
580 } else { \
581 if((i)-1>=(start) && UTF_IS_FIRST_SURROGATE(__c2=(s)[(i)-1])) { \
582 (c)=UTF16_GET_PAIR_VALUE(__c2, (c)); \
583 /* strict: ((c)&0xfffe)==0xfffe is caught by UTF_IS_ERROR() and UTF_IS_UNICODE_CHAR() */ \
584 } else if(strict) {\
585 /* unmatched second surrogate */ \
586 (c)=UTF_ERROR_VALUE; \
589 } else if((strict) && !UTF_IS_UNICODE_CHAR(c)) { \
590 (c)=UTF_ERROR_VALUE; \
594 /** @deprecated ICU 2.4. Renamed to U16_NEXT_UNSAFE, see utf_old.h. */
595 #define UTF16_NEXT_CHAR_UNSAFE(s, i, c) { \
596 (c)=(s)[(i)++]; \
597 if(UTF_IS_FIRST_SURROGATE(c)) { \
598 (c)=UTF16_GET_PAIR_VALUE((c), (s)[(i)++]); \
602 /** @deprecated ICU 2.4. Renamed to U16_APPEND_UNSAFE, see utf_old.h. */
603 #define UTF16_APPEND_CHAR_UNSAFE(s, i, c) { \
604 if((uint32_t)(c)<=0xffff) { \
605 (s)[(i)++]=(uint16_t)(c); \
606 } else { \
607 (s)[(i)++]=(uint16_t)(((c)>>10)+0xd7c0); \
608 (s)[(i)++]=(uint16_t)(((c)&0x3ff)|0xdc00); \
612 /** @deprecated ICU 2.4. Renamed to U16_FWD_1_UNSAFE, see utf_old.h. */
613 #define UTF16_FWD_1_UNSAFE(s, i) { \
614 if(UTF_IS_FIRST_SURROGATE((s)[(i)++])) { \
615 ++(i); \
619 /** @deprecated ICU 2.4. Renamed to U16_FWD_N_UNSAFE, see utf_old.h. */
620 #define UTF16_FWD_N_UNSAFE(s, i, n) { \
621 int32_t __N=(n); \
622 while(__N>0) { \
623 UTF16_FWD_1_UNSAFE(s, i); \
624 --__N; \
628 /** @deprecated ICU 2.4. Renamed to U16_SET_CP_START_UNSAFE, see utf_old.h. */
629 #define UTF16_SET_CHAR_START_UNSAFE(s, i) { \
630 if(UTF_IS_SECOND_SURROGATE((s)[i])) { \
631 --(i); \
635 /** @deprecated ICU 2.4. Use U16_NEXT instead, see utf_old.h. */
636 #define UTF16_NEXT_CHAR_SAFE(s, i, length, c, strict) { \
637 (c)=(s)[(i)++]; \
638 if(UTF_IS_FIRST_SURROGATE(c)) { \
639 uint16_t __c2; \
640 if((i)<(length) && UTF_IS_SECOND_SURROGATE(__c2=(s)[(i)])) { \
641 ++(i); \
642 (c)=UTF16_GET_PAIR_VALUE((c), __c2); \
643 /* strict: ((c)&0xfffe)==0xfffe is caught by UTF_IS_ERROR() and UTF_IS_UNICODE_CHAR() */ \
644 } else if(strict) {\
645 /* unmatched first surrogate */ \
646 (c)=UTF_ERROR_VALUE; \
648 } else if((strict) && !UTF_IS_UNICODE_CHAR(c)) { \
649 /* unmatched second surrogate or other non-character */ \
650 (c)=UTF_ERROR_VALUE; \
654 /** @deprecated ICU 2.4. Use U16_APPEND instead, see utf_old.h. */
655 #define UTF16_APPEND_CHAR_SAFE(s, i, length, c) { \
656 if((uint32_t)(c)<=0xffff) { \
657 (s)[(i)++]=(uint16_t)(c); \
658 } else if((uint32_t)(c)<=0x10ffff) { \
659 if((i)+1<(length)) { \
660 (s)[(i)++]=(uint16_t)(((c)>>10)+0xd7c0); \
661 (s)[(i)++]=(uint16_t)(((c)&0x3ff)|0xdc00); \
662 } else /* not enough space */ { \
663 (s)[(i)++]=UTF_ERROR_VALUE; \
665 } else /* c>0x10ffff, write error value */ { \
666 (s)[(i)++]=UTF_ERROR_VALUE; \
670 /** @deprecated ICU 2.4. Renamed to U16_FWD_1, see utf_old.h. */
671 #define UTF16_FWD_1_SAFE(s, i, length) U16_FWD_1(s, i, length)
673 /** @deprecated ICU 2.4. Renamed to U16_FWD_N, see utf_old.h. */
674 #define UTF16_FWD_N_SAFE(s, i, length, n) U16_FWD_N(s, i, length, n)
676 /** @deprecated ICU 2.4. Renamed to U16_SET_CP_START, see utf_old.h. */
677 #define UTF16_SET_CHAR_START_SAFE(s, start, i) U16_SET_CP_START(s, start, i)
679 /** @deprecated ICU 2.4. Renamed to U16_PREV_UNSAFE, see utf_old.h. */
680 #define UTF16_PREV_CHAR_UNSAFE(s, i, c) { \
681 (c)=(s)[--(i)]; \
682 if(UTF_IS_SECOND_SURROGATE(c)) { \
683 (c)=UTF16_GET_PAIR_VALUE((s)[--(i)], (c)); \
687 /** @deprecated ICU 2.4. Renamed to U16_BACK_1_UNSAFE, see utf_old.h. */
688 #define UTF16_BACK_1_UNSAFE(s, i) { \
689 if(UTF_IS_SECOND_SURROGATE((s)[--(i)])) { \
690 --(i); \
694 /** @deprecated ICU 2.4. Renamed to U16_BACK_N_UNSAFE, see utf_old.h. */
695 #define UTF16_BACK_N_UNSAFE(s, i, n) { \
696 int32_t __N=(n); \
697 while(__N>0) { \
698 UTF16_BACK_1_UNSAFE(s, i); \
699 --__N; \
703 /** @deprecated ICU 2.4. Renamed to U16_SET_CP_LIMIT_UNSAFE, see utf_old.h. */
704 #define UTF16_SET_CHAR_LIMIT_UNSAFE(s, i) { \
705 if(UTF_IS_FIRST_SURROGATE((s)[(i)-1])) { \
706 ++(i); \
710 /** @deprecated ICU 2.4. Use U16_PREV instead, see utf_old.h. */
711 #define UTF16_PREV_CHAR_SAFE(s, start, i, c, strict) { \
712 (c)=(s)[--(i)]; \
713 if(UTF_IS_SECOND_SURROGATE(c)) { \
714 uint16_t __c2; \
715 if((i)>(start) && UTF_IS_FIRST_SURROGATE(__c2=(s)[(i)-1])) { \
716 --(i); \
717 (c)=UTF16_GET_PAIR_VALUE(__c2, (c)); \
718 /* strict: ((c)&0xfffe)==0xfffe is caught by UTF_IS_ERROR() and UTF_IS_UNICODE_CHAR() */ \
719 } else if(strict) {\
720 /* unmatched second surrogate */ \
721 (c)=UTF_ERROR_VALUE; \
723 } else if((strict) && !UTF_IS_UNICODE_CHAR(c)) { \
724 /* unmatched first surrogate or other non-character */ \
725 (c)=UTF_ERROR_VALUE; \
729 /** @deprecated ICU 2.4. Renamed to U16_BACK_1, see utf_old.h. */
730 #define UTF16_BACK_1_SAFE(s, start, i) U16_BACK_1(s, start, i)
732 /** @deprecated ICU 2.4. Renamed to U16_BACK_N, see utf_old.h. */
733 #define UTF16_BACK_N_SAFE(s, start, i, n) U16_BACK_N(s, start, i, n)
735 /** @deprecated ICU 2.4. Renamed to U16_SET_CP_LIMIT, see utf_old.h. */
736 #define UTF16_SET_CHAR_LIMIT_SAFE(s, start, i, length) U16_SET_CP_LIMIT(s, start, i, length)
738 /* Formerly utf32.h --------------------------------------------------------- */
741 * Old documentation:
743 * This file defines macros to deal with UTF-32 code units and code points.
744 * Signatures and semantics are the same as for the similarly named macros
745 * in utf16.h.
746 * utf32.h is included by utf.h after unicode/umachine.h</p>
747 * and some common definitions.
748 * <p><b>Usage:</b> ICU coding guidelines for if() statements should be followed when using these macros.
749 * Compound statements (curly braces {}) must be used for if-else-while...
750 * bodies and all macro statements should be terminated with semicolon.</p>
753 /* internal definitions ----------------------------------------------------- */
755 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
756 #define UTF32_IS_SAFE(c, strict) \
757 (!(strict) ? \
758 (uint32_t)(c)<=0x10ffff : \
759 UTF_IS_UNICODE_CHAR(c))
762 * For the semantics of all of these macros, see utf16.h.
763 * The UTF-32 versions are trivial because any code point is
764 * encoded using exactly one code unit.
767 /* single-code point definitions -------------------------------------------- */
769 /* classes of code unit values */
771 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
772 #define UTF32_IS_SINGLE(uchar) 1
773 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
774 #define UTF32_IS_LEAD(uchar) 0
775 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
776 #define UTF32_IS_TRAIL(uchar) 0
778 /* number of code units per code point */
780 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
781 #define UTF32_NEED_MULTIPLE_UCHAR(c) 0
782 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
783 #define UTF32_CHAR_LENGTH(c) 1
784 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
785 #define UTF32_MAX_CHAR_LENGTH 1
787 /* average number of code units compared to UTF-16 */
789 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
790 #define UTF32_ARRAY_SIZE(size) (size)
792 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
793 #define UTF32_GET_CHAR_UNSAFE(s, i, c) { \
794 (c)=(s)[i]; \
797 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
798 #define UTF32_GET_CHAR_SAFE(s, start, i, length, c, strict) { \
799 (c)=(s)[i]; \
800 if(!UTF32_IS_SAFE(c, strict)) { \
801 (c)=UTF_ERROR_VALUE; \
805 /* definitions with forward iteration --------------------------------------- */
807 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
808 #define UTF32_NEXT_CHAR_UNSAFE(s, i, c) { \
809 (c)=(s)[(i)++]; \
812 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
813 #define UTF32_APPEND_CHAR_UNSAFE(s, i, c) { \
814 (s)[(i)++]=(c); \
817 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
818 #define UTF32_FWD_1_UNSAFE(s, i) { \
819 ++(i); \
822 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
823 #define UTF32_FWD_N_UNSAFE(s, i, n) { \
824 (i)+=(n); \
827 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
828 #define UTF32_SET_CHAR_START_UNSAFE(s, i) { \
831 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
832 #define UTF32_NEXT_CHAR_SAFE(s, i, length, c, strict) { \
833 (c)=(s)[(i)++]; \
834 if(!UTF32_IS_SAFE(c, strict)) { \
835 (c)=UTF_ERROR_VALUE; \
839 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
840 #define UTF32_APPEND_CHAR_SAFE(s, i, length, c) { \
841 if((uint32_t)(c)<=0x10ffff) { \
842 (s)[(i)++]=(c); \
843 } else /* c>0x10ffff, write 0xfffd */ { \
844 (s)[(i)++]=0xfffd; \
848 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
849 #define UTF32_FWD_1_SAFE(s, i, length) { \
850 ++(i); \
853 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
854 #define UTF32_FWD_N_SAFE(s, i, length, n) { \
855 if(((i)+=(n))>(length)) { \
856 (i)=(length); \
860 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
861 #define UTF32_SET_CHAR_START_SAFE(s, start, i) { \
864 /* definitions with backward iteration -------------------------------------- */
866 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
867 #define UTF32_PREV_CHAR_UNSAFE(s, i, c) { \
868 (c)=(s)[--(i)]; \
871 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
872 #define UTF32_BACK_1_UNSAFE(s, i) { \
873 --(i); \
876 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
877 #define UTF32_BACK_N_UNSAFE(s, i, n) { \
878 (i)-=(n); \
881 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
882 #define UTF32_SET_CHAR_LIMIT_UNSAFE(s, i) { \
885 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
886 #define UTF32_PREV_CHAR_SAFE(s, start, i, c, strict) { \
887 (c)=(s)[--(i)]; \
888 if(!UTF32_IS_SAFE(c, strict)) { \
889 (c)=UTF_ERROR_VALUE; \
893 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
894 #define UTF32_BACK_1_SAFE(s, start, i) { \
895 --(i); \
898 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
899 #define UTF32_BACK_N_SAFE(s, start, i, n) { \
900 (i)-=(n); \
901 if((i)<(start)) { \
902 (i)=(start); \
906 /** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
907 #define UTF32_SET_CHAR_LIMIT_SAFE(s, i, length) { \
910 /* Formerly utf.h, part 2 --------------------------------------------------- */
913 * Estimate the number of code units for a string based on the number of UTF-16 code units.
915 * @deprecated ICU 2.4. Obsolete, see utf_old.h.
917 #define UTF_ARRAY_SIZE(size) UTF16_ARRAY_SIZE(size)
919 /** @deprecated ICU 2.4. Renamed to U16_GET_UNSAFE, see utf_old.h. */
920 #define UTF_GET_CHAR_UNSAFE(s, i, c) UTF16_GET_CHAR_UNSAFE(s, i, c)
922 /** @deprecated ICU 2.4. Use U16_GET instead, see utf_old.h. */
923 #define UTF_GET_CHAR_SAFE(s, start, i, length, c, strict) UTF16_GET_CHAR_SAFE(s, start, i, length, c, strict)
926 /** @deprecated ICU 2.4. Renamed to U16_NEXT_UNSAFE, see utf_old.h. */
927 #define UTF_NEXT_CHAR_UNSAFE(s, i, c) UTF16_NEXT_CHAR_UNSAFE(s, i, c)
929 /** @deprecated ICU 2.4. Use U16_NEXT instead, see utf_old.h. */
930 #define UTF_NEXT_CHAR_SAFE(s, i, length, c, strict) UTF16_NEXT_CHAR_SAFE(s, i, length, c, strict)
933 /** @deprecated ICU 2.4. Renamed to U16_APPEND_UNSAFE, see utf_old.h. */
934 #define UTF_APPEND_CHAR_UNSAFE(s, i, c) UTF16_APPEND_CHAR_UNSAFE(s, i, c)
936 /** @deprecated ICU 2.4. Use U16_APPEND instead, see utf_old.h. */
937 #define UTF_APPEND_CHAR_SAFE(s, i, length, c) UTF16_APPEND_CHAR_SAFE(s, i, length, c)
940 /** @deprecated ICU 2.4. Renamed to U16_FWD_1_UNSAFE, see utf_old.h. */
941 #define UTF_FWD_1_UNSAFE(s, i) UTF16_FWD_1_UNSAFE(s, i)
943 /** @deprecated ICU 2.4. Renamed to U16_FWD_1, see utf_old.h. */
944 #define UTF_FWD_1_SAFE(s, i, length) UTF16_FWD_1_SAFE(s, i, length)
947 /** @deprecated ICU 2.4. Renamed to U16_FWD_N_UNSAFE, see utf_old.h. */
948 #define UTF_FWD_N_UNSAFE(s, i, n) UTF16_FWD_N_UNSAFE(s, i, n)
950 /** @deprecated ICU 2.4. Renamed to U16_FWD_N, see utf_old.h. */
951 #define UTF_FWD_N_SAFE(s, i, length, n) UTF16_FWD_N_SAFE(s, i, length, n)
954 /** @deprecated ICU 2.4. Renamed to U16_SET_CP_START_UNSAFE, see utf_old.h. */
955 #define UTF_SET_CHAR_START_UNSAFE(s, i) UTF16_SET_CHAR_START_UNSAFE(s, i)
957 /** @deprecated ICU 2.4. Renamed to U16_SET_CP_START, see utf_old.h. */
958 #define UTF_SET_CHAR_START_SAFE(s, start, i) UTF16_SET_CHAR_START_SAFE(s, start, i)
961 /** @deprecated ICU 2.4. Renamed to U16_PREV_UNSAFE, see utf_old.h. */
962 #define UTF_PREV_CHAR_UNSAFE(s, i, c) UTF16_PREV_CHAR_UNSAFE(s, i, c)
964 /** @deprecated ICU 2.4. Use U16_PREV instead, see utf_old.h. */
965 #define UTF_PREV_CHAR_SAFE(s, start, i, c, strict) UTF16_PREV_CHAR_SAFE(s, start, i, c, strict)
968 /** @deprecated ICU 2.4. Renamed to U16_BACK_1_UNSAFE, see utf_old.h. */
969 #define UTF_BACK_1_UNSAFE(s, i) UTF16_BACK_1_UNSAFE(s, i)
971 /** @deprecated ICU 2.4. Renamed to U16_BACK_1, see utf_old.h. */
972 #define UTF_BACK_1_SAFE(s, start, i) UTF16_BACK_1_SAFE(s, start, i)
975 /** @deprecated ICU 2.4. Renamed to U16_BACK_N_UNSAFE, see utf_old.h. */
976 #define UTF_BACK_N_UNSAFE(s, i, n) UTF16_BACK_N_UNSAFE(s, i, n)
978 /** @deprecated ICU 2.4. Renamed to U16_BACK_N, see utf_old.h. */
979 #define UTF_BACK_N_SAFE(s, start, i, n) UTF16_BACK_N_SAFE(s, start, i, n)
982 /** @deprecated ICU 2.4. Renamed to U16_SET_CP_LIMIT_UNSAFE, see utf_old.h. */
983 #define UTF_SET_CHAR_LIMIT_UNSAFE(s, i) UTF16_SET_CHAR_LIMIT_UNSAFE(s, i)
985 /** @deprecated ICU 2.4. Renamed to U16_SET_CP_LIMIT, see utf_old.h. */
986 #define UTF_SET_CHAR_LIMIT_SAFE(s, start, i, length) UTF16_SET_CHAR_LIMIT_SAFE(s, start, i, length)
988 /* Define default macros (UTF-16 "safe") ------------------------------------ */
991 * Does this code unit alone encode a code point (BMP, not a surrogate)?
992 * Same as UTF16_IS_SINGLE.
993 * @deprecated ICU 2.4. Renamed to U_IS_SINGLE and U16_IS_SINGLE, see utf_old.h.
995 #define UTF_IS_SINGLE(uchar) U16_IS_SINGLE(uchar)
998 * Is this code unit the first one of several (a lead surrogate)?
999 * Same as UTF16_IS_LEAD.
1000 * @deprecated ICU 2.4. Renamed to U_IS_LEAD and U16_IS_LEAD, see utf_old.h.
1002 #define UTF_IS_LEAD(uchar) U16_IS_LEAD(uchar)
1005 * Is this code unit one of several but not the first one (a trail surrogate)?
1006 * Same as UTF16_IS_TRAIL.
1007 * @deprecated ICU 2.4. Renamed to U_IS_TRAIL and U16_IS_TRAIL, see utf_old.h.
1009 #define UTF_IS_TRAIL(uchar) U16_IS_TRAIL(uchar)
1012 * Does this code point require multiple code units (is it a supplementary code point)?
1013 * Same as UTF16_NEED_MULTIPLE_UCHAR.
1014 * @deprecated ICU 2.4. Use U16_LENGTH or test ((uint32_t)(c)>0xffff) instead.
1016 #define UTF_NEED_MULTIPLE_UCHAR(c) UTF16_NEED_MULTIPLE_UCHAR(c)
1019 * How many code units are used to encode this code point (1 or 2)?
1020 * Same as UTF16_CHAR_LENGTH.
1021 * @deprecated ICU 2.4. Renamed to U16_LENGTH, see utf_old.h.
1023 #define UTF_CHAR_LENGTH(c) U16_LENGTH(c)
1026 * How many code units are used at most for any Unicode code point (2)?
1027 * Same as UTF16_MAX_CHAR_LENGTH.
1028 * @deprecated ICU 2.4. Renamed to U16_MAX_LENGTH, see utf_old.h.
1030 #define UTF_MAX_CHAR_LENGTH U16_MAX_LENGTH
1033 * Set c to the code point that contains the code unit i.
1034 * i could point to the lead or the trail surrogate for the code point.
1035 * i is not modified.
1036 * Same as UTF16_GET_CHAR.
1037 * \pre 0<=i<length
1039 * @deprecated ICU 2.4. Renamed to U16_GET, see utf_old.h.
1041 #define UTF_GET_CHAR(s, start, i, length, c) U16_GET(s, start, i, length, c)
1044 * Set c to the code point that starts at code unit i
1045 * and advance i to beyond the code units of this code point (post-increment).
1046 * i must point to the first code unit of a code point.
1047 * Otherwise c is set to the trail unit (surrogate) itself.
1048 * Same as UTF16_NEXT_CHAR.
1049 * \pre 0<=i<length
1050 * \post 0<i<=length
1052 * @deprecated ICU 2.4. Renamed to U16_NEXT, see utf_old.h.
1054 #define UTF_NEXT_CHAR(s, i, length, c) U16_NEXT(s, i, length, c)
1057 * Append the code units of code point c to the string at index i
1058 * and advance i to beyond the new code units (post-increment).
1059 * The code units beginning at index i will be overwritten.
1060 * Same as UTF16_APPEND_CHAR.
1061 * \pre 0<=c<=0x10ffff
1062 * \pre 0<=i<length
1063 * \post 0<i<=length
1065 * @deprecated ICU 2.4. Use U16_APPEND instead, see utf_old.h.
1067 #define UTF_APPEND_CHAR(s, i, length, c) UTF16_APPEND_CHAR_SAFE(s, i, length, c)
1070 * Advance i to beyond the code units of the code point that begins at i.
1071 * I.e., advance i by one code point.
1072 * Same as UTF16_FWD_1.
1073 * \pre 0<=i<length
1074 * \post 0<i<=length
1076 * @deprecated ICU 2.4. Renamed to U16_FWD_1, see utf_old.h.
1078 #define UTF_FWD_1(s, i, length) U16_FWD_1(s, i, length)
1081 * Advance i to beyond the code units of the n code points where the first one begins at i.
1082 * I.e., advance i by n code points.
1083 * Same as UT16_FWD_N.
1084 * \pre 0<=i<length
1085 * \post 0<i<=length
1087 * @deprecated ICU 2.4. Renamed to U16_FWD_N, see utf_old.h.
1089 #define UTF_FWD_N(s, i, length, n) U16_FWD_N(s, i, length, n)
1092 * Take the random-access index i and adjust it so that it points to the beginning
1093 * of a code point.
1094 * The input index points to any code unit of a code point and is moved to point to
1095 * the first code unit of the same code point. i is never incremented.
1096 * In other words, if i points to a trail surrogate that is preceded by a matching
1097 * lead surrogate, then i is decremented. Otherwise it is not modified.
1098 * This can be used to start an iteration with UTF_NEXT_CHAR() from a random index.
1099 * Same as UTF16_SET_CHAR_START.
1100 * \pre start<=i<length
1101 * \post start<=i<length
1103 * @deprecated ICU 2.4. Renamed to U16_SET_CP_START, see utf_old.h.
1105 #define UTF_SET_CHAR_START(s, start, i) U16_SET_CP_START(s, start, i)
1108 * Set c to the code point that has code units before i
1109 * and move i backward (towards the beginning of the string)
1110 * to the first code unit of this code point (pre-increment).
1111 * i must point to the first code unit after the last unit of a code point (i==length is allowed).
1112 * Same as UTF16_PREV_CHAR.
1113 * \pre start<i<=length
1114 * \post start<=i<length
1116 * @deprecated ICU 2.4. Renamed to U16_PREV, see utf_old.h.
1118 #define UTF_PREV_CHAR(s, start, i, c) U16_PREV(s, start, i, c)
1121 * Move i backward (towards the beginning of the string)
1122 * to the first code unit of the code point that has code units before i.
1123 * I.e., move i backward by one code point.
1124 * i must point to the first code unit after the last unit of a code point (i==length is allowed).
1125 * Same as UTF16_BACK_1.
1126 * \pre start<i<=length
1127 * \post start<=i<length
1129 * @deprecated ICU 2.4. Renamed to U16_BACK_1, see utf_old.h.
1131 #define UTF_BACK_1(s, start, i) U16_BACK_1(s, start, i)
1134 * Move i backward (towards the beginning of the string)
1135 * to the first code unit of the n code points that have code units before i.
1136 * I.e., move i backward by n code points.
1137 * i must point to the first code unit after the last unit of a code point (i==length is allowed).
1138 * Same as UTF16_BACK_N.
1139 * \pre start<i<=length
1140 * \post start<=i<length
1142 * @deprecated ICU 2.4. Renamed to U16_BACK_N, see utf_old.h.
1144 #define UTF_BACK_N(s, start, i, n) U16_BACK_N(s, start, i, n)
1147 * Take the random-access index i and adjust it so that it points beyond
1148 * a code point. The input index points beyond any code unit
1149 * of a code point and is moved to point beyond the last code unit of the same
1150 * code point. i is never decremented.
1151 * In other words, if i points to a trail surrogate that is preceded by a matching
1152 * lead surrogate, then i is incremented. Otherwise it is not modified.
1153 * This can be used to start an iteration with UTF_PREV_CHAR() from a random index.
1154 * Same as UTF16_SET_CHAR_LIMIT.
1155 * \pre start<i<=length
1156 * \post start<i<=length
1158 * @deprecated ICU 2.4. Renamed to U16_SET_CP_LIMIT, see utf_old.h.
1160 #define UTF_SET_CHAR_LIMIT(s, start, i, length) U16_SET_CP_LIMIT(s, start, i, length)
1162 #endif /* U_HIDE_DEPRECATED_API */
1164 #endif