.\" Copyright (c) 2007, Sun Microsystems Inc. All Rights Reserved.
.\" The contents of this file are subject to the terms of the Common Development and Distribution License (the "License"). You may not use this file except in compliance with the License.
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing. See the License for the specific language governing permissions and limitations under the License.
.\" When distributing Covered Code, include this CDDL HEADER in each file and include the License file at usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your own identifying information: Portions Copyright [yyyy] [name of copyright owner]
.TH U8_TEXTPREP_STR 3C "Sep 18, 2007"
u8_textprep_str \- string-based UTF-8 text preparation function
#include <sys/u8_textprep.h>
\fBsize_t\fR \fBu8_textprep_str\fR(\fBchar *\fR\fIinarray\fR, \fBsize_t *\fR\fIinlen\fR,
\fBchar *\fR\fIoutarray\fR, \fBsize_t *\fR\fIoutlen\fR, \fBint\fR \fIflag\fR,
\fBsize_t\fR \fIunicode_version\fR, \fBint *\fR\fIerrnum\fR);
A pointer to a byte array containing a sequence of UTF-8 character bytes to be
prepared.
As an input argument, the number of bytes in \fIinarray\fR to be prepared. As
an output argument, the number of bytes in \fIinarray\fR not yet consumed.
A pointer to a byte array where the prepared UTF-8 character bytes are stored.
As an input argument, the number of bytes available at \fIoutarray\fR for
storing the prepared character bytes. As an output argument, the number of
bytes still available at \fIoutarray\fR after the preparation.
The preparation options, constructed by a bitwise-inclusive-OR of the
following values:
\fB\fBU8_TEXTPREP_IGNORE_NULL\fR\fR
Normally \fBu8_textprep_str()\fR stops the preparation when it encounters a
null byte, even if the current value pointed to by \fIinlen\fR is greater than
zero.
With this option, a null byte does not stop the preparation, which continues
until all \fIinlen\fR bytes of \fIinarray\fR have been consumed or an error
occurs.
\fB\fBU8_TEXTPREP_IGNORE_INVALID\fR\fR
Normally \fBu8_textprep_str()\fR stops the preparation when it encounters an
illegal or incomplete character, and sets \fIerrnum\fR to the corresponding
value.
When this option is set, \fBu8_textprep_str()\fR does not stop the preparation
and instead treats such characters as requiring no preparation.
\fB\fBU8_TEXTPREP_TOUPPER\fR\fR
Map lowercase characters to uppercase characters if applicable.
\fB\fBU8_TEXTPREP_TOLOWER\fR\fR
Map uppercase characters to lowercase characters if applicable.
\fB\fBU8_TEXTPREP_NFD\fR\fR
Apply Unicode Normalization Form D.
\fB\fBU8_TEXTPREP_NFC\fR\fR
Apply Unicode Normalization Form C.
\fB\fBU8_TEXTPREP_NFKD\fR\fR
Apply Unicode Normalization Form KD.
\fB\fBU8_TEXTPREP_NFKC\fR\fR
Apply Unicode Normalization Form KC.
Only one case folding option is allowed. Only one Unicode Normalization option
is allowed.
When both a case folding option and a Unicode Normalization option are
specified, case folding is applied first, followed by Unicode Normalization.
If no option is specified, the bytes are simply copied from input to output
without any other processing.
\fB\fIunicode_version\fR\fR
The version of Unicode data to be used during UTF-8 text preparation. The
following values are supported:
\fB\fBU8_UNICODE_320\fR\fR
Use Unicode 3.2.0 data during text preparation.
\fB\fBU8_UNICODE_500\fR\fR
Use Unicode 5.0.0 data during text preparation.
\fB\fBU8_UNICODE_LATEST\fR\fR
Use the latest Unicode version data available, which is currently Unicode 5.0.0.
Set to an error value when the preparation is not completed or fails. The
following values are defined:
Text preparation stopped due to lack of space in the output array.
The specified option values conflict and cannot be supported.
Text preparation stopped due to an input byte that does not belong to UTF-8.
Text preparation stopped due to an incomplete UTF-8 character at the end of the
input array.
The specified Unicode version value is not a supported version.
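The restrictions on \fIflag\fR and \fIunicode_version\fR can be checked before any text is examined. The following C sketch is illustrative only. The SK_* constants and \fBsketch_check_args()\fR are hypothetical, and their values are not those defined in <sys/u8_textprep.h>. Returning \fBEBADF\fR for conflicting options and \fBERANGE\fR for an unsupported version is an assumption about how the \fIerrnum\fR values listed above are named. The order of the two checks is also an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag bits; NOT the values from <sys/u8_textprep.h>. */
#define SK_TOUPPER 0x01
#define SK_TOLOWER 0x02
#define SK_NFD     0x04
#define SK_NFC     0x08
#define SK_NFKD    0x10
#define SK_NFKC    0x20

#define SK_CASE_MASK (SK_TOUPPER | SK_TOLOWER)
#define SK_NORM_MASK (SK_NFD | SK_NFC | SK_NFKD | SK_NFKC)

/* Hypothetical stand-ins for U8_UNICODE_320, U8_UNICODE_500, U8_UNICODE_LATEST. */
#define SK_UNICODE_320    0
#define SK_UNICODE_500    1
#define SK_UNICODE_LATEST SK_UNICODE_500

/* Nonzero when more than one bit of v is set: v & (v - 1) clears the lowest set bit. */
static int more_than_one_bit(unsigned int v)
{
    return (v & (v - 1)) != 0;
}

/*
 * Returns 0 when flag and unicode_version are acceptable, otherwise the
 * errno-style value this sketch assumes for the condition.
 */
int sketch_check_args(int flag, size_t unicode_version)
{
    unsigned int f = (unsigned int)flag;

    if (more_than_one_bit(f & SK_CASE_MASK) ||
        more_than_one_bit(f & SK_NORM_MASK))
        return EBADF;   /* conflicting options */
    if (unicode_version > SK_UNICODE_LATEST)
        return ERANGE;  /* unsupported Unicode version */
    return 0;
}
```

With these definitions, SK_TOUPPER together with SK_NFD is accepted. SK_TOUPPER with SK_TOLOWER, or SK_NFC with SK_NFKC, yields EBADF. A version value above SK_UNICODE_LATEST yields ERANGE.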
The \fBu8_textprep_str()\fR function prepares the sequence of UTF-8 characters
in the array specified by \fIinarray\fR and stores the corresponding prepared
UTF-8 characters in the array specified by \fIoutarray\fR. The \fIinarray\fR
argument points to the first character of the input array, and \fIinlen\fR
gives the number of bytes from that character to the end of the array that are
to be prepared. The \fIoutarray\fR argument points to the first available byte
of the output array, and \fIoutlen\fR gives the number of available bytes from
that byte to the end of the array. Unless \fIflag\fR includes
\fBU8_TEXTPREP_IGNORE_NULL\fR, \fBu8_textprep_str()\fR stops when it
encounters a null byte in the input array, regardless of the current value
pointed to by \fIinlen\fR.
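When no case folding or normalization option is given, preparation amounts to copying. The bookkeeping on \fIinlen\fR, \fIoutlen\fR and null bytes described above can therefore be modelled with a plain copy loop. In the following C sketch, SK_IGNORE_NULL and \fBsketch_copy()\fR are hypothetical stand-ins for \fBU8_TEXTPREP_IGNORE_NULL\fR and \fBu8_textprep_str()\fR. Two behaviors are assumptions of the sketch, not documented behavior: the stopping null byte is left unconsumed, and -1 is returned when the output is full.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for U8_TEXTPREP_IGNORE_NULL; not the real value. */
#define SK_IGNORE_NULL 0x01

/*
 * Copy bytes from in to out, decrementing *inlen and *outlen for every
 * byte copied. Without SK_IGNORE_NULL, stop at the first null byte and
 * leave it unconsumed. Returns 0, or -1 if out fills up first.
 */
int sketch_copy(const char *in, size_t *inlen, char *out, size_t *outlen,
    int flag)
{
    while (*inlen > 0) {
        if (*in == 0 && !(flag & SK_IGNORE_NULL))
            break;          /* a null byte ends the preparation */
        if (*outlen == 0)
            return -1;      /* no space left in the output array */
        *out++ = *in++;
        (*inlen)--;
        (*outlen)--;
    }
    return 0;
}
```

Consider the five input bytes a, b, NUL, c, d with \fIinlen\fR set to 5. With no flag, the sketch copies two bytes and leaves \fIinlen\fR at 3. With SK_IGNORE_NULL, it copies all five bytes and leaves \fIinlen\fR at 0.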
If \fIflag\fR does not include \fBU8_TEXTPREP_IGNORE_INVALID\fR and a sequence
of input bytes does not form a valid UTF-8 character, preparation stops after
the last successfully prepared character. If \fIflag\fR does not include
\fBU8_TEXTPREP_IGNORE_INVALID\fR and the input array ends with an incomplete
UTF-8 character, preparation stops after the last successfully prepared bytes.
If the output array is not large enough to hold the entire prepared text,
preparation stops just before the input bytes that would cause the output
array to overflow. The value pointed to by \fIinlen\fR is decremented to
reflect the number of input bytes not yet prepared, and the value pointed to
by \fIoutlen\fR is decremented to reflect the number of bytes still available
in the output array.
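These stopping rules can be illustrated with a pass-through preparation that validates each UTF-8 character and copies only whole characters. The following C sketch is a model under stated assumptions, not the library's implementation:

- \fBsk_charlen()\fR and \fBsketch_prepare()\fR are hypothetical.
- Null bytes are treated as ordinary characters, as with \fBU8_TEXTPREP_IGNORE_NULL\fR.
- \fBU8_TEXTPREP_IGNORE_INVALID\fR is not modelled.
- The errno values \fBE2BIG\fR, \fBEILSEQ\fR and \fBEINVAL\fR are assumed to correspond to lack of output space, an illegal byte, and an incomplete character at the end of the input, respectively.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Length in bytes of the UTF-8 character at s, where n (at least 1) bytes
 * remain. Returns -EILSEQ for an illegal sequence and -EINVAL for a
 * sequence that is valid so far but cut off by the end of the input.
 */
static int sk_charlen(const unsigned char *s, size_t n)
{
    int len, i;
    unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of the 2nd byte */

    if (s[0] < 0x80)
        return 1;
    else if (s[0] >= 0xC2 && s[0] <= 0xDF)
        len = 2;
    else if (s[0] >= 0xE0 && s[0] <= 0xEF)
        len = 3;
    else if (s[0] >= 0xF0 && s[0] <= 0xF4)
        len = 4;
    else
        return -EILSEQ;

    /* Reject overlong forms, surrogates, and code points above U+10FFFF. */
    if (s[0] == 0xE0)
        lo = 0xA0;
    else if (s[0] == 0xED)
        hi = 0x9F;
    else if (s[0] == 0xF0)
        lo = 0x90;
    else if (s[0] == 0xF4)
        hi = 0x8F;

    for (i = 1; i < len; i++) {
        unsigned char min = (i == 1) ? lo : 0x80;
        unsigned char max = (i == 1) ? hi : 0xBF;

        if ((size_t)i >= n)
            return -EINVAL;     /* input ends inside the character */
        if (s[i] < min || s[i] > max)
            return -EILSEQ;     /* not a valid continuation byte here */
    }
    return len;
}

/*
 * Copy whole, valid UTF-8 characters from in to out. Stops on an illegal
 * or incomplete character, or just before a character that would not
 * fit; in those cases sets *errnum and returns (size_t)-1.
 */
size_t sketch_prepare(const char *in, size_t *inlen, char *out,
    size_t *outlen, int *errnum)
{
    const unsigned char *p = (const unsigned char *)in;

    while (*inlen > 0) {
        int len = sk_charlen(p, *inlen);

        if (len < 0) {
            *errnum = -len;
            return (size_t)-1;
        }
        if ((size_t)len > *outlen) {
            *errnum = E2BIG;    /* stop before overflowing out */
            return (size_t)-1;
        }
        for (int i = 0; i < len; i++)
            *out++ = (char)*p++;
        *inlen -= (size_t)len;
        *outlen -= (size_t)len;
    }
    return 0;
}
```

Consider the three input bytes a, 0xC3, 0xA9 (the letter a followed by U+00E9) with an \fIoutlen\fR of 2. The sketch copies a, then stops with E2BIG, leaving \fIinlen\fR at 2 and \fIoutlen\fR at 1.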
The \fBu8_textprep_str()\fR function updates the values pointed to by
\fIinlen\fR and \fIoutlen\fR to reflect the extent of the preparation. When
\fBU8_TEXTPREP_IGNORE_INVALID\fR is specified, \fBu8_textprep_str()\fR returns
the number of illegal or incomplete characters found during the text
preparation. When \fBU8_TEXTPREP_IGNORE_INVALID\fR is not specified and the
text preparation is entirely successful, the function returns 0. If the entire
string in the input array is prepared, the value pointed to by \fIinlen\fR is
0. If the text preparation stops because of any of the conditions described
above, the value pointed to by \fIinlen\fR is nonzero and \fIerrnum\fR is set
to indicate the error. In that case, and whenever any other error occurs,
\fBu8_textprep_str()\fR returns (\fBsize_t\fR)-1, with \fIerrnum\fR
identifying the error.
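A caller therefore inspects the return value first and the value pointed to by \fIinlen\fR second. The following C sketch encodes that order. \fBsk_classify()\fR and enum sk_outcome are hypothetical. The SK_INCOMPLETE case is an assumption: it covers a normal return with input bytes left over, for instance after stopping at a null byte without \fBU8_TEXTPREP_IGNORE_NULL\fR. This page does not say whether that situation is reported as an error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome categories for a call to u8_textprep_str(). */
enum sk_outcome {
    SK_FAILED,      /* returned (size_t)-1; errnum identifies the error */
    SK_COMPLETE,    /* returned normally and *inlen dropped to 0 */
    SK_INCOMPLETE   /* returned normally but input bytes remain */
};

/*
 * Classify a call from its return value and the value left in *inlen.
 * *invalid receives the return value when it is not (size_t)-1; with
 * U8_TEXTPREP_IGNORE_INVALID that is the number of illegal or incomplete
 * characters found, and without it a fully successful call returns 0.
 */
enum sk_outcome sk_classify(size_t ret, size_t inlen_after, size_t *invalid)
{
    if (ret == (size_t)-1) {
        *invalid = 0;
        return SK_FAILED;
    }
    *invalid = ret;
    return inlen_after == 0 ? SK_COMPLETE : SK_INCOMPLETE;
}
```

The return value is tested before \fIinlen\fR because \fIinlen\fR is also nonzero after a failure. Only the (\fBsize_t\fR)-1 return tells the caller that \fIerrnum\fR is meaningful.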
\fBExample 1 \fRSimple UTF-8 text preparation
#include <sys/param.h>
#include <string.h>
#include <sys/u8_textprep.h>
/* Hypothetical wrapper; pathname is a UTF-8 pathname from somewhere. */
int
prep_pathname(const char *pathname, char ob[MAXPATHLEN])
{
    char ib[MAXPATHLEN];
    size_t il, ol, ret;
    int err;
    /* The input length includes the terminating null byte. */
    (void) strlcpy(ib, pathname, MAXPATHLEN);
    il = strlen(ib) + 1;
    ol = MAXPATHLEN;
    /* Uppercase, apply NFD, ignore null bytes and invalid characters. */
    ret = u8_textprep_str(ib, &il, ob, &ol,
        (U8_TEXTPREP_IGNORE_NULL|U8_TEXTPREP_IGNORE_INVALID|
        U8_TEXTPREP_TOUPPER|U8_TEXTPREP_NFD), U8_UNICODE_LATEST, &err);
    return (ret == (size_t)-1 ? err : 0);
}
See \fBattributes\fR(5) for descriptions of the following attributes:
ATTRIBUTE TYPE ATTRIBUTE VALUE
Interface Stability Committed
\fBu8_strcmp\fR(3C), \fBu8_validate\fR(3C), \fBattributes\fR(5),
\fBu8_strcmp\fR(9F), \fBu8_textprep_str\fR(9F), \fBu8_validate\fR(9F)
The Unicode Standard (http://www.unicode.org)
After text preparation, the number of UTF-8 characters and the total number of
bytes in the output can be smaller or larger than in the input.
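Two well-known decompositions from the Unicode Character Database serve as a worked example of this note:

- Under NFD, U+00E9 (e with acute) decomposes canonically into U+0065 followed by U+0301 COMBINING ACUTE ACCENT.
- Under NFKD, U+FB01 LATIN SMALL LIGATURE FI has the compatibility decomposition U+0066 U+0069.

The following C sketch does not normalize anything. It only holds the UTF-8 encodings of the before and after forms and counts the characters and bytes in each. \fBsk_count_chars()\fR is a hypothetical helper that relies on the UTF-8 rule that continuation bytes have the form 10xxxxxx.

```c
#include <assert.h>
#include <stddef.h>

/* Count UTF-8 characters by counting bytes that are not continuation bytes (10xxxxxx). */
size_t sk_count_chars(const unsigned char *s, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if ((s[i] & 0xC0) != 0x80)
            count++;
    return count;
}

/* U+00E9 (precomposed e with acute) and its NFD form, U+0065 U+0301. */
const unsigned char e_acute[] = { 0xC3, 0xA9 };
const unsigned char e_acute_nfd[] = { 0x65, 0xCC, 0x81 };

/* U+FB01 LATIN SMALL LIGATURE FI and its NFKD form, U+0066 U+0069. */
const unsigned char fi_ligature[] = { 0xEF, 0xAC, 0x81 };
const unsigned char fi_nfkd[] = { 0x66, 0x69 };
```

NFD turns one character in 2 bytes into two characters in 3 bytes. NFKD turns the ligature's one character in 3 bytes into two characters in 2 bytes. The character count and the byte count therefore need not change in the same direction.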
Case conversions use the Unicode data of the version selected by
\fIunicode_version\fR. Locale-specific case conversions are not supported.
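The mapping comes from version-specific Unicode data, not from the LC_CTYPE category of the current locale (which \fBtoupper\fR(3C) and \fBtowupper\fR(3C) consult). The same input therefore always yields the same output. For example, i is uppercased to I even where a Turkish locale would produce U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE. The following C sketch illustrates such a table-driven, locale-independent mapping over code points. \fBsk_toupper_cp()\fR and its tiny table are hypothetical and cover only a handful of characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * A tiny, illustrative excerpt of simple uppercase mappings keyed by code
 * point. A real implementation would consult the full Unicode Character
 * Database for the selected Unicode version.
 */
static const struct {
    uint32_t lower;
    uint32_t upper;
} sk_upper_table[] = {
    { 0x00E9, 0x00C9 },   /* e with acute -> E with acute */
    { 0x0131, 0x0049 },   /* dotless i -> I: 2 UTF-8 bytes become 1 */
};

/* Map a code point to uppercase; code points without a mapping are returned unchanged. */
uint32_t sk_toupper_cp(uint32_t cp)
{
    /* ASCII a-z: the same result in every locale, so i -> I (never U+0130). */
    if (cp >= 0x61 && cp <= 0x7A)
        return cp - 0x20;
    for (size_t i = 0; i < sizeof sk_upper_table / sizeof sk_upper_table[0]; i++)
        if (sk_upper_table[i].lower == cp)
            return sk_upper_table[i].upper;
    return cp;
}
```

For example, sk_toupper_cp(0x0131) returns 0x0049. The 2-byte UTF-8 sequence for dotless i thus becomes the 1-byte I, so case conversion alone can also change the byte count.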