3 codesets.library/codesets.library
4 codesets.library/CodesetsSupportedA
5 codesets.library/CodesetsFindA
6 codesets.library/CodesetsFindBestA
7 codesets.library/CodesetsConvertStrA
8 codesets.library/CodesetsFreeA
9 codesets.library/CodesetsFreeVecPooledA
10 codesets.library/CodesetsSetDefaultA
11 codesets.library/CodesetsListCreateA
12 codesets.library/CodesetsListDeleteA
13 codesets.library/CodesetsListAddA
14 codesets.library/CodesetsListRemoveA
15 codesets.library/CodesetsUTF8CreateA
16 codesets.library/CodesetsUTF8ToStrA
17 codesets.library/CodesetsUTF8Len
18 codesets.library/CodesetsIsValidUTF8
19 codesets.library/CodesetsIsLegalUTF8
20 codesets.library/CodesetsIsLegalUTF8Sequence
21 codesets.library/CodesetsStrLenA
22 codesets.library/CodesetsConvertUTF16toUTF32
23 codesets.library/CodesetsConvertUTF16toUTF8
24 codesets.library/CodesetsConvertUTF32toUTF16
25 codesets.library/CodesetsConvertUTF32toUTF8
26 codesets.library/CodesetsConvertUTF8toUTF16
27 codesets.library/CodesetsConvertUTF8toUTF32
28 codesets.library/CodesetsDecodeB64A
29 codesets.library/CodesetsEncodeB64A
31 \fcodesets.library/codesets.library
33 *******************************************************************
34 Copyright (c) 2005-2008 by codesets.library Open Source Team
38 codesets.library is an AmigaOS shared library which provides
39 functions to deal with different kinds of codesets. It provides
40 general character conversion routines, e.g. for converting
41 from one charset (e.g. UTF8) into another (e.g. ISO-8859-1) or
44 codesets.library is mainly based on some code from UNICODE, some
45 code from the SimpleMail project as well as some additions done
46 by the codesets.library Open Source Team.
48 It is released and distributed under the terms of the GNU Lesser
49 General Public License (LGPL) and available free of charge.
51 Please visit http://www.sf.net/projects/codesetslib/ for
52 the very latest version and information regarding codesets.library.
53 *******************************************************************
55 For a short introduction on how to use codesets.library, the
56 following paragraph should provide a good summary. What you
57 usually want to do with codesets.library is to convert strings from
58 one so-called "Source Codeset" into another "Destination Codeset".
59 The following list covers only the main functions provided to
60 developers who want to achieve this conversion in their applications:
66 For querying codesets.library which codesets/charsets it supports
67 either by its internal available charsets or by having obtained
68 them from the operating system (e.g. AmigaOS4), this function
71 E.g. in a MUI application you would do something like:
76 if((array = CodesetsSupportedA(NULL)))
78 DoMethod(list, MUIM_List_Insert, array, -1, MUIV_List_Insert_Sorted);
79 CodesetsFreeA(array, NULL);
88 For processing/converting a specific string, you normally have to
89 specify in which codeset this string has to be interpreted. For this
90 purpose you have to pass a so-called "Source Codeset" to the main
91 function of codesets.library. With the "CodesetsFindA()" function you
92 can ask codesets.library for a pointer to the
93 corresponding codeset structure, which you then pass to
94 the main conversion routines.
96 For receiving the pointer to the Amiga-1251 codeset:
100 if((cs = CodesetsFind("Amiga-1251",
101 CSA_FallbackToDefault, FALSE,
108 For querying codesets.library for the currently used system wide
109 default of your running operating system:
111 struct codeset *defaultCS; /* 'default' is a reserved C keyword */
113 if((defaultCS = CodesetsFindA(NULL, NULL)))
121 CodesetsConvertStrA()
122 ---------------------
124 The most commonly used function in codesets.library is
125 definitely this one. It converts a string from
126 one "Source Codeset" to another "Destination Codeset". It takes
127 the source string, converts it internally into UTF8 if necessary and
128 then directly converts the UTF8 to the specified destination codeset.
130 To convert a string 'str' to a destination codeset:
134 if((destString = CodesetsConvertStr(CSA_SourceCodeset, srcCodeset,
135 CSA_DestCodeset, destCodeset,
141 CodesetsFreeA(destString, NULL);
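The first half of the conversion described above (source codeset into UTF8) can be illustrated in portable C for the simple case of ISO-8859-1, whose byte values are identical to the Unicode code points U+0000 to U+00FF. This is only a sketch of the idea; `latin1_to_utf8()` is a hypothetical helper for illustration and is not part of codesets.library, whose internal implementation may differ.

```c
#include <stddef.h>

/* Hypothetical helper, NOT a codesets.library function: converts an
 * ISO-8859-1 string to UTF8. 'dst' must provide room for up to
 * 2 * strlen(src) + 1 bytes. Returns the number of bytes written,
 * not counting the terminating '\0'. */
size_t latin1_to_utf8(const unsigned char *src, unsigned char *dst)
{
  size_t n = 0;

  while(*src != '\0')
  {
    unsigned char c = *src++;

    if(c < 0x80)
    {
      /* 7bit ASCII is encoded as itself */
      dst[n++] = c;
    }
    else
    {
      /* U+0080..U+00FF need two bytes: 110xxxxx 10xxxxxx */
      dst[n++] = (unsigned char)(0xC0 | (c >> 6));
      dst[n++] = (unsigned char)(0x80 | (c & 0x3F));
    }
  }

  dst[n] = '\0';

  return n;
}
```

With the library itself this step is hidden behind CodesetsConvertStrA(); the sketch only shows why a converted string can be longer than its source.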
145 Although the above functions should cover most of the common functionality
146 an ordinary user of codesets.library requires, the library supplies many more
147 functions. We will not go into detail about them here, but examples are given
148 in the respective documentation section of each function.
150 However, if you find the documentation is still too limited or you feel
151 some major functionality is missing regarding dealing with codesets,
152 please let us know so that we or even you can improve it.
155 Your codesets.library Open Source Team.
158 \fcodesets.library/CodesetsSupportedA
161 CodesetsSupportedA - returns names of supported codesets
164 array = CodesetsSupportedA(attrs);
167 STRPTR * CodesetsSupportedA(struct TagItem *);
169 array = CodesetsSupported(tag1, ...);
172 STRPTR * CodesetsSupported(Tag, ...);
175 Returns a NULL terminated array of the supported codeset
176 names. The array _must_ be freed with CodesetsFreeA().
179 attrs - a list of additional tag items. Valid items are:
181 CSA_CodesetList (struct codesetList *)
182 You may supply an unlimited number of additional
183 codeset lists which you have previously allocated/loaded
184 with CodesetsListCreateA(). Otherwise just the internal
185 list of available codesets will be searched.
188 CSA_AllowMultibyteCodesets (BOOL)
189 Include multibyte codesets (UTF8, UTF16, UTF32) in the
190 generated names array.
194 array - the names array or NULL on an error.
197 For printing out all supported codeset names:
202 if((array = CodesetsSupportedA(NULL)))
206 for(i=0; array[i] != NULL; i++)
207 printf("%s\n", array[i]);
209 CodesetsFreeA(array, NULL);
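Since the result is a plain NULL terminated array of strings, it can be walked with ordinary C code. The following sketch uses a locally defined array as a stand-in for the library result, so it runs without codesets.library; `count_names()` and `has_name()` are hypothetical helpers introduced only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Counts the entries of a NULL terminated string array, i.e. the
 * layout described above for the CodesetsSupportedA() result.
 * Hypothetical helper, not part of the library. */
size_t count_names(const char *const *array)
{
  size_t n = 0;

  if(array != NULL)
  {
    while(array[n] != NULL)
      n++;
  }

  return n;
}

/* Returns 1 if 'name' is contained in the array (exact, case
 * sensitive comparison), otherwise 0. */
int has_name(const char *const *array, const char *name)
{
  size_t i;

  if(array == NULL || name == NULL)
    return 0;

  for(i = 0; array[i] != NULL; i++)
  {
    if(strcmp(array[i], name) == 0)
      return 1;
  }

  return 0;
}
```

In a real application the array would come from CodesetsSupportedA() and would have to be released with CodesetsFreeA() afterwards.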
214 codesets.library/CodesetsListCreateA
216 \fcodesets.library/CodesetsFindA
219 CodesetsFindA - finds a codeset
222 codeset = CodesetsFindA(name, attrs);
225 struct codeset * CodesetsFindA(STRPTR, struct TagItem *);
227 codeset = CodesetsFind(name, tag1, ...);
230 struct codeset * CodesetsFind(STRPTR, Tag, ...);
233 Finds and returns a codeset by its name. The data behind the
234 pointer should be considered read-only and must not be altered
238 name - the codeset name (or alias) to find
239 attrs - a list of additional tag items. Valid items are:
241 CSA_FallbackToDefault (BOOL)
242 If TRUE the function never fails and returns the default
243 codeset if the supplied codeset name can't be found.
246 CSA_CodesetList (struct codesetList *)
247 You may supply an unlimited number of additional
248 codeset lists which you have previously allocated/loaded
249 with CodesetsListCreateA(). Otherwise just the internal
250 list of available codesets will be searched.
254 codeset - the codeset or NULL on an error
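The lookup semantics described above (match by name or alias, optional fallback to the default) can be sketched in portable C as follows. The structure, the table contents and the choice of default are assumptions made for this illustration only; they do not reflect the library's real `struct codeset` or its internal list.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in, NOT the library's struct codeset */
struct demoCodeset
{
  const char *name;
  const char *alias;   /* may be NULL */
};

static const struct demoCodeset demoTable[] =
{
  { "ISO-8859-1", "Latin1" },
  { "Amiga-1251", NULL     },
  { "UTF-8",      "UTF8"   },
};

/* Finds an entry by name or alias. A NULL name returns the default,
 * mirroring CodesetsFindA(NULL, NULL). If nothing matches, the
 * default is returned only if 'fallbackToDefault' is non-zero. */
const struct demoCodeset *demo_find(const char *name, int fallbackToDefault)
{
  const struct demoCodeset *def = &demoTable[0]; /* assumed default */
  size_t i;

  if(name == NULL)
    return def;

  for(i = 0; i < sizeof(demoTable) / sizeof(demoTable[0]); i++)
  {
    const struct demoCodeset *cs = &demoTable[i];

    if(strcmp(cs->name, name) == 0 ||
       (cs->alias != NULL && strcmp(cs->alias, name) == 0))
    {
      return cs;
    }
  }

  return fallbackToDefault ? def : NULL;
}
```

The sketch compares names case sensitively; whether the library does the same is not stated here.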
257 E.g. for receiving the pointer to the Amiga-1251 codeset:
262 if((cs = CodesetsFind("Amiga-1251",
263 CSA_FallbackToDefault, FALSE,
270 For querying codesets.library for the currently used system
271 wide default of your running operating system:
273 struct codeset *defaultCS; /* 'default' is a reserved C keyword */
275 if((defaultCS = CodesetsFindA(NULL, NULL)))
282 Please note for querying the system's default codeset the
283 method of finding this codeset is highly dependent on the way
284 the operating system can be queried for it. E.g. on AmigaOS4
285 the default codeset is queried with updated system functions,
286 but for AmigaOS3 a static list of language<>codeset mappings
290 codesets.library/CodesetsListCreateA
292 \fcodesets.library/CodesetsFindBestA
295 CodesetsFindBestA - finds the best codeset matching a
299 codeset = CodesetsFindBestA(attrs);
302 struct codeset * CodesetsFindBestA(struct TagItem *);
304 codeset = CodesetsFindBest(tag1, ...);
307 struct codeset * CodesetsFindBest(Tag, ...);
310 Returns the best found codeset for the given text in the supplied
311 codeset family. In case no proper codeset for the supplied source string
312 could be found, NULL is returned or the default codeset if the
313 CSA_FallbackToDefault attribute is set to TRUE. In addition, in case
314 the CSA_ErrPtr is given, the amount of failed identifications (chars)
318 attrs - a list of tag items. Valid items are:
321 The string which you want to convert. Must be supplied,
322 otherwise the function returns NULL.
324 CSA_SourceLen (ULONG)
325 Length of CSA_Source or less to check just a part
326 Default: string length of CSA_Source
329 Pointer to an integer variable which will be filled with the
330 number of found errors (not identifiable chars)
333 CSA_CodesetList (struct codesetList *)
334 You may supply an unlimited number of additional
335 codeset lists which you have previously allocated/loaded
336 with CodesetsListCreateA(). Otherwise just the internal
337 list of available codesets will be searched.
340 CSA_CodesetFamily (ULONG)
341 To narrow the analysis, a user might define the codeset family
342 of which the supplied text might be composed. The reason for
343 this is that there is no unique identification algorithm
344 which can tell the codeset out of a given text. So to narrow
345 the identification, the following values might be specified:
347 CSV_CodesetFamily_Latin - Latin codeset family (e.g. ISO-8859-X)
348 CSV_CodesetFamily_Cyrillic - Cyrillic codeset family (e.g. KOI8R)
350 Default: CSV_CodesetFamily_Latin
352 CSA_FallbackToDefault (BOOL)
353 If TRUE the function never fails and returns the default
354 codeset if the supplied text couldn't be identified
358 codeset - the best matching codeset or NULL in case a NULL pointer
359 was supplied as the source string.
362 E.g. for receiving the pointer to 'best matching' codeset matching
367 char str[] = "îÅ×ÏÚÍÏÖÎÏ ÐÅÒÅËÏÄÉÒÏ×ÁÔØ ÉÚ ËÏÄÉÒÏ×ËÉ";
370 if((cs = CodesetsFindBest(CSA_Source, str,
372 CSA_CodesetFamily, CSV_CodesetFamily_Cyrillic,
373 CSA_FallbackToDefault, FALSE,
376 ... should return the KOI8-R codeset ...
381 codesets.library/CodesetsListCreateA
383 \fcodesets.library/CodesetsConvertStrA
386 CodesetsConvertStrA - converts a string from one source codeset to
387 another destination codeset.
390 dest = CodesetsConvertStrA(attrs)
393 STRPTR CodesetsConvertStrA(struct TagItem *);
395 dest = CodesetsConvertStr(tag1, ...);
398 STRPTR CodesetsConvertStr(Tag, ...);
401 The function takes a source string which is encoded in a so-called
402 'Source codeset' and converts it immediately into an equivalent
403 string which will be encoded in the corresponding 'Destination Codeset'.
406 attrs - a list of mandatory tag items. Valid items are:
409 The string which you want to convert. Must be supplied,
410 otherwise the function returns NULL.
412 CSA_SourceLen (ULONG)
413 Length of CSA_Source or less to convert just a part
414 Default: string length of CSA_Source
416 CSA_SourceCodeset (struct codeset *)
417 The codeset in which the source string is encoded.
418 Default: the system's default codeset
420 CSA_DestCodeset (struct codeset *)
421 The codeset to which the source string should be converted.
422 Default: the system's default codeset
424 CSA_DestLenPtr (ULONG *)
425 If supplied, will contain the length of the converted string
428 CSA_MapForeignChars (BOOL)
429 If a character of the source string cannot be directly mapped
430 to the destination codeset a "?" character will normally be used
431 to signal this case. If this attribute is set, an internal
432 replacement table will be used which tries to replace these
433 "foreign" characters by "looklike" ASCII character sequences.
434 Please note, that this functionality is mostly just usable by
435 Latin users due to the straight mapping to ASCII (7bit).
438 CSA_MapForeignCharsHook (struct Hook *)
439 If a character of the source string cannot be directly mapped
440 to the destination codeset a "?" character will normally be used
441 to signal this case. By using this attribute, a hook can be
442 supplied which is called for every such foreign character.
443 Within this hook the UTF8 sequence is supplied which cannot be
444 directly mapped to the destination codeset. During the execution
445 of the hook a replacement string might be specified, which in turn
446 will be used by the internals of codesets.library to map this
447 "foreign" char to a different character or UTF8 sequence.
449 If both, CSA_MapForeignChars and CSA_MapForeignCharsHook, are
450 specified the hook will only be executed in case the internal
451 routines don't supply an own mapping for the foreign UTF8 sequence.
453 The hook function should be declared as:
455 ULONG ASM SAVEDS fun(REG(a0, struct Hook *hook),
456 REG(a2, struct replaceMsg *msg),
457 REG(a1, void *dummy))
463 place your desired replacement string here
466 the UTF8 sequence to be replaced, this string is READ-ONLY!
469 the length of the UTF8 sequence to be replaced, do NOT peek
472 The return value of this hook function is the length of the replacement
473 string. Return zero if no replacement did happen. Positive values will
474 be treated as lengths of ASCII strings. Negative values signal a
475 replacement by another UTF8 sequence. Please note, that in case you
476 supply a UTF8 sequence as a replacement for the "foreign" UTF8, your
477 hook might be called again if this sequence can still not be mapped to
478 the destination codesets, thus is again a "foreign" sequence.
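The return value convention described above can be sketched in portable C. The structure below is a hypothetical mirror of the replacement message (destination pointer, read-only source sequence and its length); its field names and the plain function signature are assumptions for this illustration. A real hook would use the ASM/SAVEDS/REG declaration shown above and the library's own `struct replaceMsg`.

```c
#include <string.h>

/* Hypothetical mirror of the replacement message; the field names
 * are assumptions and NOT the library's definition. */
struct demoReplaceMsg
{
  char **dst;                /* place the replacement string here   */
  const unsigned char *src;  /* UTF8 sequence to replace, read-only */
  int srcLen;                /* length of that sequence in bytes    */
};

/* Returns >0 for an ASCII replacement of that length, <0 for a
 * UTF8 replacement of that (negated) length, 0 for no replacement. */
long demo_replace(struct demoReplaceMsg *msg)
{
  /* U+20AC EURO SIGN (E2 82 AC): replace by the ASCII string "EUR" */
  if(msg->srcLen == 3 && memcmp(msg->src, "\xE2\x82\xAC", 3) == 0)
  {
    *msg->dst = "EUR";
    return 3;
  }

  /* U+2019 RIGHT SINGLE QUOTATION MARK (E2 80 99): replace by the
   * UTF8 sequence of U+00B4 ACUTE ACCENT (C2 B4), hence negative */
  if(msg->srcLen == 3 && memcmp(msg->src, "\xE2\x80\x99", 3) == 0)
  {
    *msg->dst = "\xC2\xB4";
    return -2;
  }

  /* everything else is left to the default handling */
  return 0;
}
```

Note that the source sequence is compared using the supplied length only, since the text above states that the sequence must not be read beyond it.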
482 either a pointer to the generated destination string or NULL
486 To convert an ISO-8859-1 encoded string 'src' into an Amiga-1251
487 equivalent 'dst' string:
490 struct codeset *srcCodeset, *dstCodeset;
492 srcCodeset = CodesetsFindA("ISO-8859-1", NULL);
493 dstCodeset = CodesetsFindA("Amiga-1251", NULL);
495 if((dst = CodesetsConvertStr(CSA_SourceCodeset, srcCodeset,
496 CSA_DestCodeset, dstCodeset,
502 CodesetsFreeA(dst, NULL);
507 codesets.library/CodesetsFreeA
509 \fcodesets.library/CodesetsFreeA
512 CodesetsFreeA - frees objects previously internally allocated
516 CodesetsFreeA(obj, attrs)
518 void CodesetsFreeA(APTR, struct TagItem *);
520 CodesetsFree(obj, tag1, ...);
522 void CodesetsFree(APTR, Tag, ...);
525 Frees object previously allocated by codesets.library. E.g. using
526 functions like CodesetsSupportedA() or CodesetsConvertStrA().
529 obj - the object to free
530 attrs - a list of additional tag items. Currently no items.
540 if((array = CodesetsSupportedA(NULL)))
544 CodesetsFreeA(array, NULL);
549 codesets.library/CodesetsSupportedA
550 codesets.library/CodesetsConvertStrA
552 \fcodesets.library/CodesetsFreeVecPooledA
555 CodesetsFreeVecPooledA - frees objects previously allocated
556 by methods supporting CSA_Pool
559 CodesetsFreeVecPooledA(pool, obj, attrs)
561 void CodesetsFreeVecPooledA(APTR, APTR, struct TagItem *);
563 CodesetsFreeVecPooled(pool, obj, tag1, ...);
565 void CodesetsFreeVecPooled(APTR, APTR, Tag, ...);
568 Frees object previously allocated by codesets.library via a
569 private memory pool which was previously used on codesets
570 functions via the CSA_Pool tag.
573 pool - pointer to the private memory pool
574 obj - the object to free
575 attrs - a list of additional tag items. Valid tags are:
577 CSA_PoolSem (struct SignalSemaphore *)
578 A semaphore to lock when using CSA_Pool
590 if((utf8 = CodesetsUTF8Create(CSA_Source, str,
596 CodesetsFreeVecPooledA(pool,utf8,NULL);
601 codesets.library/CodesetsUTF8CreateA
602 codesets.library/CodesetsUTF8ToStrA
604 \fcodesets.library/CodesetsSetDefaultA
607 CodesetsSetDefaultA - sets the default codeset, overwriting
608 the system default if necessary.
611 codeset = CodesetsSetDefaultA(name, attrs);
614 struct codeset * CodesetsSetDefaultA(STRPTR, struct TagItem *);
616 codeset = CodesetsSetDefault(name, tag1, ...);
619 struct codeset * CodesetsSetDefault(STRPTR, Tag, ...);
622 Sets the default codeset to name. The codeset will be stored in
623 the environment variable 'codeset_default'.
626 name - the name of the codeset to set as default
627 attrs - a list of additional tag items. Valid items are:
630 If TRUE the codeset will be permanently saved and survives
631 a reset. Otherwise the default setting will just last until
636 codeset - the codeset or NULL
639 In case the operating system supports the direct query of the
640 currently active system's default codeset, this function will
641 still override this setting. So by using this method a user may
642 override all system settings and set a global default codeset
643 for his machine no matter what the OS suggests. However, in case
644 your operating system perfectly supports the querying of the
645 system's default codeset (e.g. AmigaOS4) you are advised to use
646 this function with care - or even avoid using it at all.
649 codesets.library/CodesetsFindA
651 \fcodesets.library/CodesetsListCreateA
654 CodesetsListCreateA - creates a private, task-wise codeset list
655 and returns it to the user for further reference.
658 list = CodesetsListCreateA(attrs);
661 struct codesetList * CodesetsListCreateA(struct TagItem *);
663 list = CodesetsListCreate(tag1, ...);
666 struct codesetList * CodesetsListCreate(Tag, ...);
669 This function creates a private, task-wise codeset list by
670 loading charset files from either a whole directory tree, a specific
671 charset file or even by using an existing codeset structure.
672 By using this function, an application might load and carry its very
673 own private charsets in parallel to the internal charsets of
674 codesets.library. This way each application can provide a different
675 codeset list to the user without having to load and manage these
679 attrs - a list of additional tag items. Valid items are:
681 CSA_CodesetDir (STRPTR)
682 The path to a whole directory which codesets.library will
683 walk through for searching for proper charset files.
686 CSA_CodesetFile (STRPTR)
687 The path to a specific file which codesets.library will try
688 to load as a standard charset translation file.
691 CSA_SourceCodeset (struct codeset *)
692 The pointer to an already existing codeset structure which
693 will immediately be added to the created list. Please be
694 careful when adding one codeset to multiple lists, especially
695 when you do a CodesetsListDelete() to free the list.
699 list - the private codeset list or NULL on an error condition
702 For convenience, if no tag item attribute at all is supplied to the
703 function, codesets.library will try to load charsets from the
704 corresponding "PROGDIR:Charsets" directory and add the codesets found to
705 the list. However, in case a tag item is specified (no matter what
706 kind) the PROGDIR: scanning will be omitted.
709 For loading all found charset files from PROGDIR:Charsets:
712 struct codesetList *csList;
714 if((csList = CodesetsListCreateA(NULL)))
716 STRPTR *codesetArray = CodesetsSupported(CSA_CodesetList, csList,
719 // codesetArray should now also carry our private
720 // codesets from PROGDIR:Charsets
723 CodesetsListDelete(CSA_CodesetList, csList,
729 codesets.library/CodesetsListDeleteA
730 codesets.library/CodesetsListAddA
731 codesets.library/CodesetsListRemoveA
732 codesets.library/CodesetsSupportedA
733 codesets.library/CodesetsFindA
734 codesets.library/CodesetsFindBestA
736 \fcodesets.library/CodesetsListDeleteA
739 CodesetsListDeleteA - deletes/frees all resources of previously created
740 private codeset lists.
743 result = CodesetsListDeleteA(attrs);
746 BOOL CodesetsListDeleteA(struct TagItem *);
748 result = CodesetsListDelete(tag1, ...);
751 BOOL CodesetsListDelete(Tag, ...);
754 This function deletes all resources (also the contained codeset
755 structures per default) and frees the memory of previously allocated
756 private codeset lists.
759 attrs - a list of mandatory tag items. Valid items are:
761 CSA_CodesetList (struct codesetList *)
762 Pointer to a previously created, private codeset list whose
763 resources should be freed.
766 CSA_FreeCodesets (BOOL)
767 If TRUE, all contained codesets should also be freed/deleted,
768 otherwise just frees the list object itself.
772 result - TRUE on success otherwise FALSE
775 Please note that if you added an explicit codeset structure to more
776 than one private codeset list you may run into problems if you
777 don't take care of this yourself. This is a dumb function which just
778 walks through the list and frees all resources. Set CSA_FreeCodesets
779 to FALSE in case you just want to free the list object.
782 codesets.library/CodesetsListCreateA
783 codesets.library/CodesetsListAddA
784 codesets.library/CodesetsListRemoveA
786 \fcodesets.library/CodesetsListAddA
789 CodesetsListAddA - allows to add additional codesets to an already
790 existing private codeset list previously created with
791 CodesetsListCreateA().
794 result = CodesetsListAddA(attrs);
797 BOOL CodesetsListAddA(struct TagItem *);
799 result = CodesetsListAdd(tag1, ...);
802 BOOL CodesetsListAdd(Tag, ...);
805 This function adds additional codesets to an already existing
806 private codeset list. Either codesets themselves may be added directly, or
807 the path to either a file or a directory may be specified from which
808 additional codesets may be loaded from known charset files.
811 attrs - a list of mandatory tag items. Valid items are:
813 CSA_CodesetDir (STRPTR)
814 The path to a whole directory which codesets.library will
815 walk through for searching for proper charset files.
818 CSA_CodesetFile (STRPTR)
819 The path to a specific file which codesets.library will try
820 to load as a standard charset translation file.
823 CSA_SourceCodeset (struct codeset *)
824 The pointer to an already existing codeset structure which
825 will immediately be added to the existing list. Please be
826 careful when adding one codeset to multiple lists, especially
827 when you do a CodesetsListDelete() to free the list.
831 result - TRUE on success otherwise FALSE
834 Be careful when adding one codeset to more than one codeset list as
835 you may run into problems when freeing the list afterwards.
838 codesets.library/CodesetsListCreateA
839 codesets.library/CodesetsListDeleteA
840 codesets.library/CodesetsListRemoveA
842 \fcodesets.library/CodesetsListRemoveA
845 CodesetsListRemoveA - removes a single or multiple codesets from a
846 previously created codeset list.
849 result = CodesetsListRemoveA(attrs);
852 BOOL CodesetsListRemoveA(struct TagItem *);
854 result = CodesetsListRemove(tag1, ...);
857 BOOL CodesetsListRemove(Tag, ...);
860 This function removes single or multiple codesets from a
861 previously created codeset list. The removed codeset structures will
862 also be freed/deleted per default.
865 attrs - a list of mandatory tag items. Valid items are:
867 CSA_SourceCodeset (struct codeset *)
868 Pointer to a codeset structure which should be removed from
869 its corresponding list. Per default its resources will also
873 CSA_FreeCodesets (BOOL)
874 If TRUE, all supplied codesets should also be freed/deleted,
875 otherwise the codesets will just be removed from their lists.
879 result - TRUE on success otherwise FALSE
882 The function will automatically prevent removal of codesets from the
883 internal codeset list of codesets.library and will return FALSE in
884 case a user tried to remove a codeset from the internal list.
887 codesets.library/CodesetsListDeleteA
888 codesets.library/CodesetsListAddA
890 \fcodesets.library/CodesetsUTF8CreateA
893 CodesetsUTF8CreateA - creates an UTF8 compliant string
894 interpretation out of a supplied source
899 utf8 = CodesetsUTF8CreateA(attrs);
901 UTF8 * CodesetsUTF8CreateA(struct TagItem *);
903 utf8 = CodesetsUTF8Create(tag1, ...);
905 UTF8 * CodesetsUTF8Create(Tag, ...);
909 Creates an UTF8 from a string which is encoded in specified
913 attrs - a list of mandatory tag items. Valid items are:
916 The string which you want to convert. Must be supplied,
917 otherwise the function returns NULL.
919 CSA_SourceLen (ULONG)
920 Length of CSA_Source or less to convert just a part
921 Default: string length of CSA_Source
923 CSA_SourceCodeset (struct codeset *)
924 The codeset in which the source string is encoded.
925 Default: the system's default codeset
928 Destination buffer. If you supply a valid buffer here, you
929 must also set CSA_DestLen to the length of your buffer. If
930 CSA_AllocIfNeeded is TRUE, CSA_DestLen is checked to see if
931 CSA_Dest may contain the whole utf8. If CSA_Dest can't
932 contain the utf8, a brand new buffer is allocated. If
933 CSA_AllocIfNeeded is FALSE, up to CSA_DestLen (ending '\0'
934 included) are written to CSA_Dest. If CSA_DestHook is supplied,
938 CSA_DestHook (struct Hook *)
939 Destination hook. If this is supplied, it is called with a
940 partial converted string.
942 The hook function should be declared as:
944 ULONG ASM SAVEDS fun(REG(a0, struct Hook *hook),
945 REG(a2, struct convertMsg *msg),
952 The partial '\0' terminated buffer
963 length of string 'buf'
965 You may define the min length of the buffer via CSA_DestLen.
966 If so, accepted values are 16<=v<=sizeof_codeset_buffer.
968 Don't count on this size to be fixed, even if you used
972 If CSA_DestHook is used, it represents the min length of the
973 buffer that causes hook calls. Otherwise it is the size of
974 the buffer supplied in CSA_Dest. So if CSA_DestHook is
975 supplied, CSA_DestLen is optional, otherwise it is required.
977 CSA_DestLenPtr (ULONG *)
978 If supplied, will contain the length of the utf8 string
980 CSA_AllocIfNeeded (BOOL)
981 If the destination buffer length is too small to contain
982 the UTF8 a new buffer is allocated
986 If a new destination buffer needs to be allocated (it happens
987 if and only if CSA_DestHook is not used, CSA_AllocIfNeeded
988 is TRUE, or if CSA_Dest buffer is too small for the utf8) this
989 pool is used. The result must be freed via
990 CodesetsFreeVecPooledA(pool, utf8, NULL).
991 If CSA_Pool is not supplied, the destination buffer is allocated
992 from the internal memory pool and must be freed via
993 CodesetsFreeA(utf8, NULL).
995 CSA_PoolSem (struct SignalSemaphore *)
996 A semaphore to lock when using CSA_Pool
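The interplay of CSA_Dest, CSA_DestLen and CSA_AllocIfNeeded described above can be sketched in portable C as follows. This is only an illustration of the documented decision, not the library's implementation: `demo_store()` is a hypothetical helper, it uses malloc() instead of the library's pools, and its truncation is byte based, so it does not take care of cutting inside a multibyte UTF8 sequence.

```c
#include <stdlib.h>
#include <string.h>

/* Stores 'result' either in the caller's buffer or in a new one.
 * Returns 'dest' if the caller's buffer was used, a malloc()ed
 * buffer if one had to be allocated, or NULL on failure.
 * Hypothetical helper, NOT part of codesets.library. */
char *demo_store(const char *result, char *dest, size_t destLen,
                 int allocIfNeeded)
{
  size_t need = strlen(result) + 1;  /* ending '\0' included */

  /* the supplied buffer is large enough: just use it */
  if(dest != NULL && destLen >= need)
  {
    memcpy(dest, result, need);
    return dest;
  }

  /* no buffer at all, or too small and allocation is allowed */
  if(dest == NULL || allocIfNeeded)
  {
    char *buf = malloc(need);

    if(buf != NULL)
      memcpy(buf, result, need);

    return buf;
  }

  /* too small and no allocation allowed: write up to destLen
   * bytes, ending '\0' included */
  if(destLen == 0)
    return NULL;

  memcpy(dest, result, destLen - 1);
  dest[destLen - 1] = '\0';

  return dest;
}
```

A caller of such a function has to compare the returned pointer with its own buffer and free the result only if the two differ.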
999 utf8 - the utf8 string or NULL
1000 If CSA_DestHook is used always NULL.
1001 If CSA_DestHook is not used NULL means failure
1005 The shortest invocation is:
1010 if((utf8 = CodesetsUTF8Create(CSA_Source, str,
1015 CodesetsFreeA(utf8,NULL);
1020 In case you want to use your pool to allocate mem:
1026 if((utf8 = CodesetsUTF8Create(CSA_Source, str,
1032 CodesetsFreeVecPooledA(pool,utf8,NULL);
1037 If your pool is to be arbitrated via a semaphore:
1042 struct SignalSemaphore *sem;
1044 if((utf8 = CodesetsUTF8Create(CSA_Source, str,
1051 CodesetsFreeVecPooledA(pool,utf8,NULL);
1056 If you want to use your own buffer to reduce mem
1062 if((utf8 = CodesetsUTF8Create(CSA_Source, str,
1064 CSA_DestLen, sizeof(buf),
1070 CodesetsFreeA(utf8,NULL);
1075 If your strings are at most MAXLEN chars long (e.g. imagine being
1076 in a MUI application where you know the max size of your
1077 string gadgets), you should better supply your own buffer:
1080 char buf[MAXLEN*6+1];
1082 if((utf8 = CodesetsUTF8Create(CSA_Source, str,
1084 CSA_DestLen, sizeof(buf),
1092 If your strings are very large, so that you cannot be sure there is
1093 enough memory for them, or you have your own reasons to do
1096 static ULONG ASM SAVEDS
1097 destFun(REG(a0, struct Hook *hook),
1098 REG(a2, struct convertMsg *msg),
1099 REG(a1, STRPTR buf))
1101 printf("[%3ld] [%s]\n",msg->len,buf);
1102 if(msg->state == CSV_End)
1109 dest.h_Entry = (HOOKFUNC)destFun;
1111 CodesetsUTF8Create(CSA_Source, str,
1112 CSA_DestHook, &dest,
1117 codesets.library/CodesetsUTF8ToStrA
1118 codesets.library/CodesetsUTF8Len
1120 \fcodesets.library/CodesetsUTF8ToStrA
1123 CodesetsUTF8ToStrA - converts a UTF8 encoded string into
1124 a specified destination codeset.
1128 str = CodesetsUTF8ToStrA(attrs);
1131 STRPTR CodesetsUTF8ToStrA(struct TagItem *);
1133 str = CodesetsUTF8ToStr(tag1, ...);
1136 STRPTR CodesetsUTF8ToStr(Tag,...);
1140 Converts a UTF8 string to a specified codeset.
1143 attrs - a list of mandatory tag items. Valid items are:
1146 The string which you want to convert. Must be supplied,
1147 otherwise the function returns NULL.
1149 CSA_SourceLen (ULONG)
1150 Length of CSA_Source. Must be > 0 or the function returns
1152 Default: string length of CSA_Source - strlen()
1155 Destination buffer. If you supply a valid buffer here, you
1156 must also set CSA_DestLen to the length of your buffer. If
1157 CSA_AllocIfNeeded is TRUE, CSA_DestLen is checked to see if
1158 CSA_Dest may contain the whole converted string. If CSA_Dest
1159 can't contain the output string, a brand new buffer is allocated.
1160 If CSA_AllocIfNeeded is FALSE, up to CSA_DestLen (ending '\0'
1161 included) are written to CSA_Dest. If CSA_DestHook is supplied,
1162 CSA_Dest is ignored.
1165 CSA_DestCodeset (struct codeset *)
1166 The codeset to which the UTF8 string should be converted.
1167 Default: the system's default codeset
1169 CSA_DestHook (struct Hook *)
1170 Destination hook. If this is supplied, it is called with a
1171 partial converted string.
1173 The hook function should be declared as:
1175 ULONG ASM SAVEDS fun(REG(a0, struct Hook *hook),
1176 REG(a2, struct convertMsg *msg),
1177 REG(a1, STRPTR buf))
1183 The partial '\0' terminated buffer
1194 length of string 'buf'
1196 You may define the min length of the buffer via CSA_DestLen.
1197 If so, accepted values are 16<=v<=sizeof_codeset_buffer.
1199 Don't count on this size to be fixed, even if you used
1203 If CSA_DestHook is used, it represents the min length of the
1204 buffer that causes hook calls. Otherwise it is the size of
1205 the buffer supplied in CSA_Dest. So if CSA_DestHook is
1206 supplied, CSA_DestLen is optional, otherwise it is required.
1208 CSA_DestLenPtr (ULONG *)
1209 If supplied, will contain the length of the converted string.
1211 CSA_AllocIfNeeded (BOOL)
1212 If the destination buffer length is too small to contain
1213 the output string, a new buffer is allocated.
1217 If a new destination buffer needs to be allocated (it happens
1218 if and only if CSA_DestHook is not used, CSA_AllocIfNeeded
1219 is TRUE, or if CSA_Dest buffer is too small for the utf8) this
1220 pool is used. The result must be freed via
1221 CodesetsFreeVecPooledA(pool, string, NULL).
1222 If CSA_Pool is not supplied, the destination buffer is allocated
1223 from the internal memory pool and must be freed via
1224 CodesetsFreeA(string, NULL).
1226 CSA_PoolSem (struct SignalSemaphore *)
1227 A semaphore to lock when using CSA_Pool
1230 Pointer to an integer variable which will be filled with the
1231 number of found issues (number of non-convertible chars)
1234 CSA_MapForeignChars (BOOL)
1235 If a character of the source string cannot be directly mapped
1236 to the destination codeset a "?" character will normally be used
1237 to signal this case. If this attribute is set, an internal
1238 replacement table will be used which tries to replace these
1239 "foreign" characters by "looklike" ASCII character sequences.
1240 Please note, that this functionality is mostly just usable by
1241 Latin users due to the straight mapping to ASCII (7bit).
1244 CSA_MapForeignCharsHook (struct Hook *)
1245 If a character of the source string cannot be directly mapped
1246 to the destination codeset a "?" character will normally be used
1247 to signal this case. By using this attribute, a hook can be
1248 supplied which is called for every such foreign character.
1249 Within this hook the UTF8 sequence is supplied which cannot be
1250 directly mapped to the destination codeset. During the execution
1251 of the hook a replacement string might be specified, which in turn
1252 will be used by the internals of codesets.library to map this
1253 "foreign" char to a different character or UTF8 sequence.
1255 If both, CSA_MapForeignChars and CSA_MapForeignCharsHook, are
1256 specified the hook will only be executed in case the internal
1257 routines don't supply an own mapping for the foreign UTF8 sequence.
1259 The hook function should be declared as:
1261 ULONG ASM SAVEDS fun(REG(a0, struct Hook *hook),
1262 REG(a2, struct replaceMsg *msg),
1263 REG(a1, void *dummy))
1269 place your desired replacement string here
1272 the UTF8 sequence to be replaced, this string is READ-ONLY!
1275 the length of the UTF8 sequence to be replaced, do NOT peek
1278 The return value of this hook function is the length of the replacement
1279 string. Return zero if no replacement took place. Positive values are
1280 treated as lengths of ASCII strings. Negative values signal a
1281 replacement by another UTF8 sequence. Note that if you
1282 supply a UTF8 sequence as a replacement for the "foreign" UTF8, your
1283 hook might be called again if this sequence still cannot be mapped to
1284 the destination codeset and is therefore again a "foreign" sequence.
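The return-value convention described above can be sketched in portable C.
This is only an illustration: the structure below is a hypothetical stand-in
for struct replaceMsg (the field names dst, src and srclen are assumptions),
the ASM/SAVEDS/REG annotations are left out so that the sketch compiles
anywhere, and it is assumed that a negative return value is the negated
length of the UTF8 replacement.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct replaceMsg; field names are assumptions. */
struct replace_msg_sketch
{
  const char **dst;          /* place the replacement string here        */
  const unsigned char *src;  /* UTF8 sequence to be replaced (read-only) */
  int srclen;                /* length of that UTF8 sequence in bytes    */
};

/* Returns >0: length of an ASCII replacement,
 *         <0: negated length of a UTF8 replacement (assumption),
 *          0: no replacement took place. */
long replace_foreign_sketch(struct replace_msg_sketch *msg)
{
  assert(msg != NULL && msg->dst != NULL && msg->src != NULL);

  /* U+20AC EURO SIGN (E2 82 AC): replace by the ASCII string "EUR". */
  if(msg->srclen == 3 && memcmp(msg->src, "\xE2\x82\xAC", 3) == 0)
  {
    *msg->dst = "EUR";
    return 3;
  }

  /* U+201E DOUBLE LOW-9 QUOTATION MARK (E2 80 9E): replace by another
   * UTF8 sequence, U+00AB (C2 AB), signalled by a negative length. */
  if(msg->srclen == 3 && memcmp(msg->src, "\xE2\x80\x9E", 3) == 0)
  {
    *msg->dst = "\xC2\xAB";
    return -2;
  }

  /* Anything else is left to the library's default handling. */
  return 0;
}
```

In a real hook the function would carry the register annotations shown in
the declaration above and would be installed through the h_Entry field of
a struct Hook.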
1288 str - the string or NULL
1289 If CSA_DestHook is used, the result is always NULL.
1290 If CSA_DestHook is not used, NULL means failure.
1294 codesets.library/CodesetsUTF8CreateA
1295 codesets.library/CodesetsUTF8Len
1297 \fcodesets.library/CodesetsUTF8Len
1300 CodesetsUTF8Len - returns the length of a supplied utf8 string.
1303 len = CodesetsUTF8Len(utf8);
1306 ULONG CodesetsUTF8Len(UTF8 *);
1309 Returns the number of real characters stored in a supplied UTF8
1310 string. This is _NOT_ the space required to store the UTF8 string;
1311 it is the actual number of _real_ characters the UTF8 represents.
1314 utf8 - pointer to the UTF8 string generated by the internal
1315 functions of codesets.library
1318 len - length of utf8
1321 codesets.library/CodesetsUTF8CreateA
1322 codesets.library/CodesetsUTF8ToStrA
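The difference between the number of characters and the number of bytes can
be illustrated with a small portable C sketch. It is not the library's
implementation; the function name utf8_char_count_sketch is hypothetical,
and the input is assumed to be well-formed, NUL-terminated UTF8.

```c
#include <assert.h>
#include <stddef.h>

/* Counts characters by skipping UTF8 continuation bytes (10xxxxxx).
 * Every character contributes exactly one byte that is NOT a
 * continuation byte, so counting those bytes counts the characters. */
size_t utf8_char_count_sketch(const unsigned char *s)
{
  size_t count = 0;

  assert(s != NULL);

  for(; *s != '\0'; s++)
  {
    if((*s & 0xC0) != 0x80)
      count++;
  }

  return count;
}
```

For the UTF8 bytes 47 72 C3 BC C3 9F 65 (seven bytes) the sketch counts five
characters, because C3 BC and C3 9F each encode a single character.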
1324 \fcodesets.library/CodesetsIsValidUTF8
1327 CodesetsIsValidUTF8 - tells if a supplied standard string is meant to
1328 carry a perfectly valid UTF8 sequence
1331 result = CodesetsIsValidUTF8(str);
1334 BOOL CodesetsIsValidUTF8(STRPTR);
1337 Returns TRUE if the supplied string contains only char sequences
1338 which are compatible with the UTF8 standard.
1341 str - a standard STRPTR string.
1344 result - TRUE in case the string contains valid UTF8 data.
1347 This function uses the common 'GOOD_UCS' macro together with parsing
1348 the whole string. This means that it will only return TRUE in case
1349 the supplied string only contains UTF8 sequences. A mixture of UTF8
1350 and non-UTF8 sequences will result in the function returning FALSE.
1353 codesets.library/CodesetsUTF8CreateA
1354 codesets.library/CodesetsUTF8ToStrA
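The "whole string must be UTF8" rule from the notes above can be sketched in
portable C as follows. This is a generic validity check written for
illustration; it does not use the 'GOOD_UCS' macro and makes no claim to
match the library's results in every corner case. The function name
utf8_is_valid_sketch is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the NUL-terminated string consists only of well-formed
 * UTF8 sequences, 0 as soon as one malformed sequence is found. */
int utf8_is_valid_sketch(const unsigned char *s)
{
  assert(s != NULL);

  while(*s != '\0')
  {
    unsigned long cp;
    int extra, i;

    /* The lead byte determines how many continuation bytes follow. */
    if(*s < 0x80)                { cp = *s;        extra = 0; }
    else if((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
    else if((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
    else if((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
    else return 0; /* stray continuation byte or invalid lead byte */
    s++;

    for(i = 0; i < extra; i++, s++)
    {
      if((*s & 0xC0) != 0x80)
        return 0; /* also catches a premature NUL terminator */
      cp = (cp << 6) | (unsigned long)(*s & 0x3F);
    }

    /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
    if((extra == 1 && cp < 0x80) ||
       (extra == 2 && cp < 0x800) ||
       (extra == 3 && cp < 0x10000) ||
       (cp >= 0xD800 && cp <= 0xDFFF) ||
       cp > 0x10FFFF)
      return 0;
  }

  return 1;
}
```

A string that mixes a valid sequence such as C3 BC with a lone ISO-8859-1
byte such as FC is rejected by this sketch, which mirrors the behaviour
described in the notes.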
1356 \fcodesets.library/CodesetsIsLegalUTF8
1359 CodesetsIsLegalUTF8 - check a UTF8 sequence
1362 res = CodesetsIsLegalUTF8(source, length);
1365 ULONG CodesetsIsLegalUTF8(UTF8 *, ULONG);
1369 Checks if source is a valid UTF8 sequence generated
1370 by the internal functions of codesets.library
1373 source - the char sequence to check
1374 length - size of source
1380 codesets.library/CodesetsUTF8CreateA
1381 codesets.library/CodesetsUTF8ToStrA
1383 \fcodesets.library/CodesetsIsLegalUTF8Sequence
1386 CodesetsIsLegalUTF8Sequence - check a char sequence
1389 res = CodesetsIsLegalUTF8Sequence(source, end);
1392 ULONG CodesetsIsLegalUTF8Sequence(UTF8 *, UTF8 *);
1395 Check if source is a valid UTF8 sequence within the
1396 source and end boundaries.
1399 source - the char sequence to check
1400 end - pointer to the end of the sequence to check
1406 codesets.library/CodesetsUTF8CreateA
1407 codesets.library/CodesetsUTF8ToStrA
1409 \fcodesets.library/CodesetsStrLenA
1412 CodesetsStrLenA - returns the length the source string
1413 would have if it were converted to UTF8
1417 len = CodesetsStrLenA(str, attrs);
1420 ULONG CodesetsStrLenA(STRPTR, struct TagItem *);
1422 len = CodesetsStrLen(str, tag1, ...);
1425 ULONG CodesetsStrLen(STRPTR, Tag, ...);
1428 Returns the length (size) str would have if it were converted to
1429 a UTF8-compliant string.
1432 str - the string to obtain length of
1433 attrs - a list of additional tag items. Valid items are:
1435 CSA_SourceCodeset (struct codeset *)
1436 The codeset the source string is encoded in.
1437 Default: the system's default codeset
1439 CSA_SourceLen (ULONG)
1441 Default: string length of CSA_Source
1444 len - the length of the string if it will be converted to
1448 codesets.library/CodesetsUTF8CreateA
1450 \fcodesets.library/CodesetsConvertUTF16toUTF32
1453 CodesetsConvertUTF16toUTF32 - converts from UTF16 to UTF32
1456 res = CodesetsConvertUTF16toUTF32(sourceStart,sourceEnd,targetStart,targetEnd,flags );
1459 ULONG CodesetsConvertUTF16toUTF32(const UTF16 **,const UTF16 *,UTF32 **,UTF32 *,ULONG);
1462 Converts UTF16 to UTF32.
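The start/end pointer style of the conversion functions can be illustrated
with a portable C sketch of a UTF16 to UTF32 loop. It is not the library's
code: the type names, the function name and the result codes (0 = ok,
1 = illegal or truncated source, 2 = target full) are assumptions made for
this example, and the flags argument is left out.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short utf16_sketch_t; /* assumed to be 16 bits wide */
typedef unsigned long  utf32_sketch_t; /* at least 32 bits wide      */

/* Converts the code units in [*src, srcEnd) into [*dst, dstEnd).
 * On return *src and *dst point just past the last unit consumed
 * and the last value written, respectively. */
int utf16_to_utf32_sketch(const utf16_sketch_t **src,
                          const utf16_sketch_t *srcEnd,
                          utf32_sketch_t **dst,
                          utf32_sketch_t *dstEnd)
{
  const utf16_sketch_t *s;
  utf32_sketch_t *d;
  int result = 0;

  assert(src != NULL && dst != NULL);
  s = *src;
  d = *dst;

  while(s < srcEnd)
  {
    utf32_sketch_t cp = *s;
    size_t used = 1;

    if(cp >= 0xD800 && cp <= 0xDBFF)
    {
      /* A high surrogate must be followed by a low surrogate. */
      if(s + 1 >= srcEnd || s[1] < 0xDC00 || s[1] > 0xDFFF)
      {
        result = 1;
        break;
      }
      cp = 0x10000 + ((cp - 0xD800) << 10) + (utf32_sketch_t)(s[1] - 0xDC00);
      used = 2;
    }
    else if(cp >= 0xDC00 && cp <= 0xDFFF)
    {
      result = 1; /* lone low surrogate */
      break;
    }

    if(d >= dstEnd)
    {
      result = 2; /* no room left in the target buffer */
      break;
    }

    *d++ = cp;
    s += used;
  }

  *src = s;
  *dst = d;
  return result;
}
```

The surrogate pair D83D DE00 is combined into the single value 0x1F600.
Because both cursors are passed by reference, a caller can see how far the
conversion got when it stops early.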
1470 \fcodesets.library/CodesetsConvertUTF16toUTF8
1473 CodesetsConvertUTF16toUTF8 - converts from UTF16 to UTF8
1476 res = CodesetsConvertUTF16toUTF8(sourceStart,sourceEnd,targetStart,targetEnd,flags );
1479 ULONG CodesetsConvertUTF16toUTF8(const UTF16 **,const UTF16 *,UTF8 **,UTF8 *,ULONG);
1482 Converts UTF16 to UTF8.
1490 \fcodesets.library/CodesetsConvertUTF32toUTF16
1493 CodesetsConvertUTF32toUTF16 - converts from UTF32 to UTF16
1496 res = CodesetsConvertUTF32toUTF16(sourceStart,sourceEnd,targetStart,targetEnd,flags );
1499 ULONG CodesetsConvertUTF32toUTF16(const UTF32 **,const UTF32 *,UTF16 **,UTF16 *,ULONG);
1502 Converts UTF32 to UTF16.
1510 \fcodesets.library/CodesetsConvertUTF32toUTF8
1513 CodesetsConvertUTF32toUTF8 - converts from UTF32 to UTF8
1516 res = CodesetsConvertUTF32toUTF8(sourceStart,sourceEnd,targetStart,targetEnd,flags );
1519 ULONG CodesetsConvertUTF32toUTF8(const UTF32 **,const UTF32 *,UTF8 **,UTF8 *,ULONG);
1522 Converts UTF32 to UTF8.
1530 \fcodesets.library/CodesetsConvertUTF8toUTF16
1533 CodesetsConvertUTF8toUTF16 - converts from UTF8 to UTF16
1536 res = CodesetsConvertUTF8toUTF16(sourceStart,sourceEnd,targetStart,targetEnd,flags );
1539 ULONG CodesetsConvertUTF8toUTF16(const UTF8 **,const UTF8 *,UTF16 **,UTF16 *,ULONG);
1542 Converts UTF8 to UTF16.
1550 \fcodesets.library/CodesetsConvertUTF8toUTF32
1553 CodesetsConvertUTF8toUTF32 - converts from UTF8 to UTF32
1556 res = CodesetsConvertUTF8toUTF32(sourceStart,sourceEnd,targetStart,targetEnd,flags );
1559 ULONG CodesetsConvertUTF8toUTF32(const UTF8 **,const UTF8 *,UTF32 **,UTF32 *,ULONG);
1562 Converts UTF8 to UTF32.
1570 \fcodesets.library/CodesetsDecodeB64A
1573 CodesetsDecodeB64A - decodes a supplied base64 encoded string
1574 or file into plain text charwise.
1577 res = CodesetsDecodeB64A(attrs);
1580 ULONG CodesetsDecodeB64A(struct TagItem *);
1582 res = CodesetsDecodeB64(tag1, ...);
1585 ULONG CodesetsDecodeB64(Tag, ...);
1588 Decodes a string or a complete base64 encoded file into a
1589 plain text buffer or a destination file.
1592 attrs - a list of mandatory tag items. Valid items are:
1594 CSA_B64SourceString (STRPTR)
1595 The source string to decode
1597 CSA_B64SourceLen (ULONG)
1598 The length of CSA_B64SourceString. Must be supplied if
1599 CSA_B64SourceString is used.
1601 CSA_B64SourceFile (STRPTR)
1604 CSA_B64DestPtr (STRPTR *)
1605 Destination buffer pointer. Set to the allocated buffer.
1606 Must be supplied if CSA_B64DestFile is not used. To
1607 free the buffer use CodesetsFreeA().
1609 CSA_B64DestFile (STRPTR)
1610 Destination file name. Must be supplied if
1611 CSA_B64DestPtr is not used.
1613 CSA_B64FLG_NtCheckErr (BOOL)
1614 Don't stop on error.
1617 res - result, one of (if 0 OK, if >0 error)
1621 CSR_B64_ERROR_INCOMPLETE
1622 CSR_B64_ERROR_ILLEGAL
1625 It operates strictly charwise and does not take into account the
1626 individual codeset the decoded data may still be encoded in.
1629 codesets.library/CodesetsEncodeB64A
1631 \fcodesets.library/CodesetsEncodeB64A
1634 CodesetsEncodeB64A - encodes a string or whole file
1638 res = CodesetsEncodeB64A(attrs);
1641 ULONG CodesetsEncodeB64A(struct TagItem *);
1643 res = CodesetsEncodeB64(tag1, ...);
1646 ULONG CodesetsEncodeB64(Tag, ...);
1649 Encodes the supplied string or file into either a
1650 buffer or a file.
1653 attrs - a list of mandatory tag items. Valid items are:
1655 CSA_B64SourceString (STRPTR)
1656 The source string to encode
1658 CSA_B64SourceLen (ULONG)
1659 The length of CSA_B64SourceString. Must be supplied if
1660 CSA_B64SourceString is used.
1662 CSA_B64SourceFile (STRPTR)
1665 CSA_B64DestPtr (STRPTR *)
1666 Destination buffer pointer. Set to the allocated buffer.
1667 Must be supplied if CSA_B64DestFile is not used. To
1668 free the buffer use CodesetsFreeA().
1670 CSA_B64DestFile (STRPTR)
1671 Destination file name. Must be supplied if
1672 CSA_B64DestPtr is not used.
1674 CSA_B64MaxLineLen (ULONG)
1675 Maximum length of encoded lines. The value v must satisfy 0 < v < 256.
1679 If TRUE eol is \n (LF), otherwise \r\n (CRLF).
1683 res - result, one of (if 0 OK, if >0 error)
1687 CSR_B64_ERROR_INCOMPLETE
1688 CSR_B64_ERROR_ILLEGAL
1691 It operates strictly charwise and does not take into account the
1692 individual codeset the source data may be encoded in.
1695 codesets.library/CodesetsDecodeB64A
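The interplay of a maximum line length and the end-of-line sequence can be
sketched with a small portable base64 encoder. This is an illustration of
the general technique only, not the library's implementation: the function
name b64_encode_sketch and its parameters are hypothetical, the caller must
supply a sufficiently large destination buffer, and lines are broken on
4-character boundaries, so a line may be shorter than maxLine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encodes len bytes from src into dst and inserts eol whenever the next
 * 4-character group would exceed maxLine characters on the current line.
 * Returns the number of characters written, excluding the final NUL. */
size_t b64_encode_sketch(const unsigned char *src, size_t len,
                         char *dst, size_t maxLine, const char *eol)
{
  static const char tab[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  size_t i, col = 0;
  char *d = dst;

  assert(src != NULL && dst != NULL && eol != NULL && maxLine >= 4);

  for(i = 0; i < len; i += 3)
  {
    size_t n = (len - i < 3) ? len - i : 3;

    /* Pack up to three source bytes into one 24-bit value. */
    unsigned long v = (unsigned long)src[i] << 16;
    if(n > 1) v |= (unsigned long)src[i + 1] << 8;
    if(n > 2) v |= (unsigned long)src[i + 2];

    if(col + 4 > maxLine)
    {
      strcpy(d, eol);
      d += strlen(eol);
      col = 0;
    }

    /* Emit four 6-bit groups; pad a short final group with '='. */
    *d++ = tab[(v >> 18) & 0x3F];
    *d++ = tab[(v >> 12) & 0x3F];
    *d++ = (n > 1) ? tab[(v >> 6) & 0x3F] : '=';
    *d++ = (n > 2) ? tab[v & 0x3F] : '=';
    col += 4;
  }

  *d = '\0';
  return (size_t)(d - dst);
}
```

Encoding the six bytes of "foobar" with maxLine 4 and "\n" as eol yields the
two lines "Zm9v" and "YmFy". The encoder looks only at byte values, which
is why the codeset of the source data plays no role.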