1 This is ../../../apps/gperf/ace_gperf.info, produced by makeinfo version
2 4.3 from ../../../apps/gperf/ace_gperf.texi.
4 INFO-DIR-SECTION GNU programming tools
6 * ACE_Gperf: (ace_gperf). Perfect Hash Function Generator.
9 This file documents the features of the GNU Perfect Hash Function
12 Copyright (C) 1989 Free Software Foundation, Inc.
14 Permission is granted to make and distribute verbatim copies of this
15 manual provided the copyright notice and this permission notice are
16 preserved on all copies.
18 Permission is granted to copy and distribute modified versions of
19 this manual under the conditions for verbatim copying, provided also
20 that the section entitled "GNU General Public License" is included
21 exactly as in the original, and provided that the entire resulting
22 derived work is distributed under the terms of a permission notice
23 identical to this one.
25 Permission is granted to copy and distribute translations of this
26 manual into another language, under the above conditions for modified
27 versions, except that the section entitled "GNU `gperf' General Public
28 License" an d this permission notice may be included in translations
29 approved by the Free Software Foundation instead of in the original
33 File: gperf.info, Node: Top, Next: Copying, Prev: (dir), Up: (dir)
41 This manual documents the GNU `gperf' perfect hash function generator
42 utility, focusing on its features and how to use them, and how to report
47 * Copying:: GNU `gperf' General Public License says
48 how you can copy and share `gperf'.
49 * Contributors:: People who have contributed to `gperf'.
50 * Motivation:: Static search structures and GNU GPERF.
51 * Search Structures:: Static search structures and GNU `gperf'
52 * Description:: High-level discussion of how GPERF functions.
53 * Options:: A description of options to the program.
54 * Bugs:: Known bugs and limitations with GPERF.
55 * Projects:: Things still left to do.
56 * Implementation:: Implementation Details for GNU GPERF.
57 * Bibliography:: Material Referenced in this Report.
59 --- The Detailed Node Listing ---
61 High-Level Description of GNU `gperf'
63 * Input Format:: Input Format to `gperf'
64 * Output Format:: Output Format for Generated C Code with `gperf'
66 Input Format to `gperf'
68 * Declarations:: `struct' Declarations and C Code Inclusion.
69 * Keywords:: Format for Keyword Entries.
70 * Functions:: Including Additional C Functions.
73 File: gperf.info, Node: Copying, Next: Contributors, Prev: Top, Up: Top
75 GNU GENERAL PUBLIC LICENSE
76 **************************
78 Version 1, February 1989
79 Copyright (C) 1989 Free Software Foundation, Inc.
80 675 Mass Ave, Cambridge, MA 02139, USA
82 Everyone is permitted to copy and distribute verbatim copies
83 of this license document, but changing it is not allowed.
88 The license agreements of most software companies try to keep users
89 at the mercy of those companies. By contrast, our General Public
90 License is intended to guarantee your freedom to share and change free
91 software--to make sure the software is free for all its users. The
92 General Public License applies to the Free Software Foundation's
93 software and to any other program whose authors commit to using it.
94 You can use it for your programs, too.
96 When we speak of free software, we are referring to freedom, not
97 price. Specifically, the General Public License is designed to make
98 sure that you have the freedom to give away or sell copies of free
99 software, that you receive source code or can get it if you want it,
100 that you can change the software or use pieces of it in new free
101 programs; and that you know you can do these things.
103 To protect your rights, we need to make restrictions that forbid
104 anyone to deny you these rights or to ask you to surrender the rights.
105 These restrictions translate to certain responsibilities for you if you
106 distribute copies of the software, or if you modify it.
108 For example, if you distribute copies of a such a program, whether
109 gratis or for a fee, you must give the recipients all the rights that
110 you have. You must make sure that they, too, receive or can get the
111 source code. And you must tell them their rights.
113 We protect your rights with two steps: (1) copyright the software,
114 and (2) offer you this license which gives you legal permission to copy,
115 distribute and/or modify the software.
117 Also, for each author's protection and ours, we want to make certain
118 that everyone understands that there is no warranty for this free
119 software. If the software is modified by someone else and passed on, we
120 want its recipients to know that what they have is not the original, so
121 that any problems introduced by others will not reflect on the original
122 authors' reputations.
124 The precise terms and conditions for copying, distribution and
128 1. This License Agreement applies to any program or other work which
129 contains a notice placed by the copyright holder saying it may be
130 distributed under the terms of this General Public License. The
131 "Program", below, refers to any such program or work, and a "work
132 based on the Program" means either the Program or any work
133 containing the Program or a portion of it, either verbatim or with
134 modifications. Each licensee is addressed as "you".
136 2. You may copy and distribute verbatim copies of the Program's source
137 code as you receive it, in any medium, provided that you
138 conspicuously and appropriately publish on each copy an
139 appropriate copyright notice and disclaimer of warranty; keep
140 intact all the notices that refer to this General Public License
141 and to the absence of any warranty; and give any other recipients
142 of the Program a copy of this General Public License along with
143 the Program. You may charge a fee for the physical act of
146 3. You may modify your copy or copies of the Program or any portion of
147 it, and copy and distribute such modifications under the terms of
148 Paragraph 1 above, provided that you also do the following:
150 * cause the modified files to carry prominent notices stating
151 that you changed the files and the date of any change; and
153 * cause the whole of any work that you distribute or publish,
154 that in whole or in part contains the Program or any part
155 thereof, either with or without modifications, to be licensed
156 at no charge to all third parties under the terms of this
157 General Public License (except that you may choose to grant
158 warranty protection to some or all third parties, at your
161 * If the modified program normally reads commands interactively
162 when run, you must cause it, when started running for such
163 interactive use in the simplest and most usual way, to print
164 or display an announcement including an appropriate copyright
165 notice and a notice that there is no warranty (or else,
166 saying that you provide a warranty) and that users may
167 redistribute the program under these conditions, and telling
168 the user how to view a copy of this General Public License.
170 * You may charge a fee for the physical act of transferring a
171 copy, and you may at your option offer warranty protection in
174 Mere aggregation of another independent work with the Program (or
175 its derivative) on a volume of a storage or distribution medium
176 does not bring the other work under the scope of these terms.
178 4. You may copy and distribute the Program (or a portion or
179 derivative of it, under Paragraph 2) in object code or executable
180 form under the terms of Paragraphs 1 and 2 above provided that you
181 also do one of the following:
183 * accompany it with the complete corresponding machine-readable
184 source code, which must be distributed under the terms of
185 Paragraphs 1 and 2 above; or,
187 * accompany it with a written offer, valid for at least three
188 years, to give any third party free (except for a nominal
189 charge for the cost of distribution) a complete
190 machine-readable copy of the corresponding source code, to be
191 distributed under the terms of Paragraphs 1 and 2 above; or,
193 * accompany it with the information you received as to where the
194 corresponding source code may be obtained. (This alternative
195 is allowed only for noncommercial distribution and only if you
196 received the program in object code or executable form alone.)
198 Source code for a work means the preferred form of the work for
199 making modifications to it. For an executable file, complete
200 source code means all the source code for all modules it contains;
201 but, as a special exception, it need not include source code for
202 modules which are standard libraries that accompany the operating
203 system on which the executable file runs, or for standard header
204 files or definitions files that accompany that operating system.
206 5. You may not copy, modify, sublicense, distribute or transfer the
207 Program except as expressly provided under this General Public
208 License. Any attempt otherwise to copy, modify, sublicense,
209 distribute or transfer the Program is void, and will automatically
210 terminate your rights to use the Program under this License.
211 However, parties who have received copies, or rights to use
212 copies, from you under this General Public License will not have
213 their licenses terminated so long as such parties remain in full
216 6. By copying, distributing or modifying the Program (or any work
217 based on the Program) you indicate your acceptance of this license
218 to do so, and all its terms and conditions.
220 7. Each time you redistribute the Program (or any work based on the
221 Program), the recipient automatically receives a license from the
222 original licensor to copy, distribute or modify the Program
223 subject to these terms and conditions. You may not impose any
224 further restrictions on the recipients' exercise of the rights
227 8. The Free Software Foundation may publish revised and/or new
228 versions of the General Public License from time to time. Such
229 new versions will be similar in spirit to the present version, but
230 may differ in detail to address new problems or concerns.
232 Each version is given a distinguishing version number. If the
233 Program specifies a version number of the license which applies to
234 it and "any later version", you have the option of following the
235 terms and conditions either of that version or of any later
236 version published by the Free Software Foundation. If the Program
237 does not specify a version number of the license, you may choose
238 any version ever published by the Free Software Foundation.
240 9. If you wish to incorporate parts of the Program into other free
241 programs whose distribution conditions are different, write to the
242 author to ask for permission. For software which is copyrighted
243 by the Free Software Foundation, write to the Free Software
244 Foundation; we sometimes make exceptions for this. Our decision
245 will be guided by the two goals of preserving the free status of
246 all derivatives of our free software and of promoting the sharing
247 and reuse of software generally.
251 10. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO
252 WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE
253 LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
254 HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT
255 WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT
256 NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
257 FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE
258 QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
259 PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY
260 SERVICING, REPAIR OR CORRECTION.
262 11. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
263 WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY
264 MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE
265 LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL,
266 INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR
267 INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
268 DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU
269 OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY
270 OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN
271 ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
273 END OF TERMS AND CONDITIONS
275 Appendix: How to Apply These Terms to Your New Programs
276 =======================================================
278 If you develop a new program, and you want it to be of the greatest
279 possible use to humanity, the best way to achieve this is to make it
280 free software which everyone can redistribute and change under these
283 To do so, attach the following notices to the program. It is safest
284 to attach them to the start of each source file to most effectively
285 convey the exclusion of warranty; and each file should have at least the
286 "copyright" line and a pointer to where the full notice is found.
288 ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES.
289 Copyright (C) 19YY NAME OF AUTHOR
291 This program is free software; you can redistribute it and/or modify
292 it under the terms of the GNU General Public License as published by
293 the Free Software Foundation; either version 1, or (at your option)
296 This program is distributed in the hope that it will be useful,
297 but WITHOUT ANY WARRANTY; without even the implied warranty of
298 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
299 GNU General Public License for more details.
301 You should have received a copy of the GNU General Public License
302 along with this program; if not, write to the Free Software
303 Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
305 Also add information on how to contact you by electronic and paper
308 If the program is interactive, make it output a short notice like
309 this when it starts in an interactive mode:
311 Gnomovision version 69, Copyright (C) 19YY NAME OF AUTHOR
312 Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
313 This is free software, and you are welcome to redistribute it
314 under certain conditions; type `show c' for details.
316 The hypothetical commands `show w' and `show c' should show the
317 appropriate parts of the General Public License. Of course, the
318 commands you use may be called something other than `show w' and `show
319 c'; they could even be mouse-clicks or menu items--whatever suits your
322 You should also get your employer (if you work as a programmer) or
323 your school, if any, to sign a "copyright disclaimer" for the program,
324 if necessary. Here a sample; alter the names:
326 Yoyodyne, Inc., hereby disclaims all copyright interest in the
327 program `Gnomovision' (a program to direct compilers to make passes
328 at assemblers) written by James Hacker.
330 SIGNATURE OF TY COON, 1 April 1989
331 Ty Coon, President of Vice
333 That's all there is to it!
336 File: gperf.info, Node: Contributors, Next: Motivation, Prev: Copying, Up: Top
338 Contributors to GNU `gperf' Utility
339 ***********************************
341 * The GNU `gperf' perfect hash function generator utility was
342 originally written in GNU C++ by Douglas C. Schmidt. It is now
343 also available in a highly-portable "old-style" C version. The
344 general idea for the perfect hash function generator was inspired
345 by Keith Bostic's algorithm written in C, and distributed to
346 net.sources around 1984. The current program is a heavily
347 modified, enhanced, and extended implementation of Keith's basic
348 idea, created at the University of California, Irvine. Bugs,
349 patches, and suggestions should be reported to schmidt at
352 * Special thanks is extended to Michael Tiemann and Doug Lea, for
353 providing a useful compiler, and for giving me a forum to exhibit
356 In addition, Adam de Boor and Nels Olson provided many tips and
357 insights that greatly helped improve the quality and functionality
361 File: ace_gperf.info, Node: Motivation, Next: Search Structures, Prev: Contributors, Up: Top
366 `ace_gperf' is a perfect hash function generator written in C++. It
367 transforms an _n_ element user-specified keyword set _W_ into a perfect
368 hash function _F_. _F_ uniquely maps keywords in _W_ onto the range
369 0.._k_, where _k_ >= _n_. If _k = n_ then _F_ is a _minimal_ perfect
370 hash function. `gperf' generates a 0.._k_ element static lookup table
371 and a pair of C functions. These functions determine whether a given
372 character string _s_ occurs in _W_, using at most one probe into the
375 `ace_gperf' currently generates the reserved keyword recognizer for
376 lexical analyzers in several production and research compilers and
377 language processing tools, including GNU C, GNU C++, GNU Pascal, GNU
378 Modula 3, and GNU indent. Complete C++ source code for `gperf' is
379 available via anonymous ftp from ics.uci.edu. `gperf' also is
380 distributed along with the GNU libg++ library. A highly portable,
381 functionally equivalent K&R C version of `gperf' is archived in
382 comp.sources.unix, volume 20. Finally, a paper describing `gperf''s
383 design and implementation in greater detail is available in the Second
384 USENIX C++ Conference proceedings.
387 File: gperf.info, Node: Search Structures, Next: Description, Prev: Motivation, Up: Top
389 Static search structures and GNU `gperf'
390 ****************************************
392 A "static search structure" is an Abstract Data Type with certain
393 fundamental operations, _e.g._, _initialize_, _insert_, and _retrieve_.
394 Conceptually, all insertions occur before any retrievals. In
395 practice, `gperf' generates a `static' array containing search set
396 keywords and any associated attributes specified by the user. Thus,
397 there is essentially no execution-time cost for the insertions. It is
398 a useful data structure for representing _static search sets_. Static
399 search sets occur frequently in software system applications. Typical
400 static search sets include compiler reserved words, assembler
401 instruction opcodes, and built-in shell interpreter commands. Search
402 set members, called "keywords", are inserted into the structure only
403 once, usually during program initialization, and are not generally
404 modified at run-time.
406 Numerous static search structure implementations exist, _e.g._,
407 arrays, linked lists, binary search trees, digital search tries, and
408 hash tables. Different approaches offer trade-offs between space
409 utilization and search time efficiency. For example, an _n_ element
410 sorted array is space efficient, though the average-case time
411 complexity for retrieval operations using binary search is proportional
412 to log _n_. Conversely, hash table implementations often locate a
413 table entry in constant time, but typically impose additional memory
414 overhead and exhibit poor worst case performance.
416 _Minimal perfect hash functions_ provide an optimal solution for a
417 particular class of static search sets. A minimal perfect hash
418 function is defined by two properties:
420 * It allows keyword recognition in a static search set using at most
421 _one_ probe into the hash table. This represents the "perfect"
424 * The actual memory allocated to store the keywords is precisely
425 large enough for the keyword set, and _no larger_. This is the
428 For most applications it is far easier to generate _perfect_ hash
429 functions than _minimal perfect_ hash functions. Moreover, non-minimal
430 perfect hash functions frequently execute faster than minimal ones in
431 practice. This phenomena occurs since searching a sparse keyword table
432 increases the probability of locating a "null" entry, thereby reducing
433 string comparisons. `gperf''s default behavior generates
434 _near-minimal_ perfect hash functions for keyword sets. However,
435 `gperf' provides many options that permit user control over the degree
436 of minimality and perfection.
438 Static search sets often exhibit relative stability over time. For
439 example, Ada's 63 reserved words have remained constant for nearly a
440 decade. It is therefore frequently worthwhile to expend concerted
441 effort building an optimal search structure _once_, if it subsequently
442 receives heavy use multiple times. `gperf' removes the drudgery
443 associated with constructing time- and space-efficient search
444 structures by hand. It has proven a useful and practical tool for
445 serious programming projects. Output from `gperf' is currently used in
446 several production and research compilers, including GNU C, GNU C++,
447 GNU Pascal, and GNU Modula 3. The latter two compilers are not yet
448 part of the official GNU distribution. Each compiler utilizes `gperf'
449 to automatically generate static search structures that efficiently
450 identify their respective reserved keywords.
453 File: gperf.info, Node: Description, Next: Options, Prev: Search Structures, Up: Top
455 High-Level Description of GNU `gperf'
456 *************************************
460 * Input Format:: Input Format to `gperf'
461 * Output Format:: Output Format for Generated C Code with `gperf'
463 The perfect hash function generator `gperf' reads a set of
464 "keywords" from a "keyfile" (or from the standard input by default).
465 It attempts to derive a perfect hashing function that recognizes a
466 member of the "static keyword set" with at most a single probe into the
467 lookup table. If `gperf' succeeds in generating such a function it
468 produces a pair of C source code routines that perform hashing and
469 table lookup recognition. All generated C code is directed to the
470 standard output. Command-line options described below allow you to
471 modify the input and output format to `gperf'.
473 By default, `gperf' attempts to produce time-efficient code, with
474 less emphasis on efficient space utilization. However, several options
475 exist that permit trading-off execution time for storage space and vice
476 versa. In particular, expanding the generated table size produces a
477 sparse search structure, generally yielding faster searches.
478 Conversely, you can direct `gperf' to utilize a C `switch' statement
479 scheme that minimizes data space storage size. Furthermore, using a C
480 `switch' may actually speed up the keyword retrieval time somewhat.
481 Actual results depend on your C compiler, of course.
483 In general, `gperf' assigns values to the characters it is using for
484 hashing until some set of values gives each keyword a unique value. A
485 helpful heuristic is that the larger the hash value range, the easier
486 it is for `gperf' to find and generate a perfect hash function.
487 Experimentation is the key to getting the most from `gperf'.
490 File: gperf.info, Node: Input Format, Next: Output Format, Prev: Description, Up: Description
492 Input Format to `gperf'
493 =======================
495 You can control the input keyfile format by varying certain
496 command-line arguments, in particular the `-t' option. The input's
497 appearance is similar to GNU utilities `flex' and `bison' (or UNIX
498 utilities `lex' and `yacc'). Here's an outline of the general format:
506 _Unlike_ `flex' or `bison', all sections of `gperf''s input are
507 optional. The following sections describe the input format for each
512 * Declarations:: `struct' Declarations and C Code Inclusion.
513 * Keywords:: Format for Keyword Entries.
514 * Functions:: Including Additional C Functions.
517 File: gperf.info, Node: Declarations, Next: Keywords, Prev: Input Format, Up: Input Format
519 `struct' Declarations and C Code Inclusion
520 ------------------------------------------
522 The keyword input file optionally contains a section for including
523 arbitrary C declarations and definitions, as well as provisions for
524 providing a user-supplied `struct'. If the `-t' option _is_ enabled,
525 you _must_ provide a C `struct' as the last component in the
526 declaration section from the keyfile file. The first field in this
527 struct must be a `char *' identifier called "name," although it is
528 possible to modify this field's name with the `-K' option described
531 Here is simple example, using months of the year and their
534 struct months { char *name; int number; int days; int leap_days; };
549 Separating the `struct' declaration from the list of key words and
550 other fields are a pair of consecutive percent signs, `%%', appearing
551 left justified in the first column, as in the UNIX utility `lex'.
553 Using a syntax similar to GNU utilities `flex' and `bison', it is
554 possible to directly include C source text and comments verbatim into
555 the generated output file. This is accomplished by enclosing the region
556 inside left-justified surrounding `%{', `%}' pairs. Here is an input
557 fragment based on the previous example that illustrates this feature:
561 /* This section of code is inserted directly into the output. */
562 int return_month_days (struct months *months, int is_leap_year);
564 struct months { char *name; int number; int days; int leap_days; };
571 It is possible to omit the declaration section entirely. In this
572 case the keyfile begins directly with the first keyword line, _e.g._:
581 File: gperf.info, Node: Keywords, Next: Functions, Prev: Declarations, Up: Input Format
583 Format for Keyword Entries
584 --------------------------
586 The second keyfile format section contains lines of keywords and any
587 associated attributes you might supply. A line beginning with `#' in
588 the first column is considered a comment. Everything following the `#'
589 is ignored, up to and including the following newline.
591 The first field of each non-comment line is always the key itself.
592 It should be given as a simple name, _i.e._, without surrounding string
593 quotation marks, and be left-justified flush against the first column.
594 In this context, a "field" is considered to extend up to, but not
595 include, the first blank, comma, or newline. Here is a simple example
596 taken from a partial list of C reserved words:
598 # These are a few C reserved words, see the c.`gperf' file
599 # for a complete list of ANSI C reserved words.
610 Note that unlike `flex' or `bison' the first `%%' marker may be
611 elided if the declaration section is empty.
613 Additional fields may optionally follow the leading keyword. Fields
614 should be separated by commas, and terminate at the end of line. What
615 these fields mean is entirely up to you; they are used to initialize the
616 elements of the user-defined `struct' provided by you in the
617 declaration section. If the `-t' option is _not_ enabled these fields
618 are simply ignored. All previous examples except the last one contain
622 File: gperf.info, Node: Functions, Prev: Keywords, Up: Input Format
624 Including Additional C Functions
625 --------------------------------
627 The optional third section also corresponds closely with conventions
628 found in `flex' and `bison'. All text in this section, starting at the
629 final `%%' and extending to the end of the input file, is included
630 verbatim into the generated output file. Naturally, it is your
631 responsibility to ensure that the code contained in this section is
635 File: gperf.info, Node: Output Format, Prev: Input Format, Up: Description
637 Output Format for Generated C Code with `gperf'
638 ===============================================
640 Several options control how the generated C code appears on the
641 standard output. Two C function are generated. They are called `hash'
642 and `in_word_set', although you may modify the name for `in_word_set'
643 with a command-line option. Both functions require two arguments, a
644 string, `char *' STR, and a length parameter, `int' LEN. Their default
645 function prototypes are as follows:
647 static int hash (char *str, int len);
648 int in_word_set (char *str, int len);
650 By default, the generated `hash' function returns an integer value
651 created by adding LEN to several user-specified STR key positions
652 indexed into an "associated values" table stored in a local static
653 array. The associated values table is constructed internally by
654 `gperf' and later output as a static local C array called HASH_TABLE;
655 its meaning and properties are described below. *Note
656 Implementation::. The relevant key positions are specified via the `-k'
657 option when running `gperf', as detailed in the _Options_ section
658 below. *Note Options::.
660 Two options, `-g' (assume you are compiling with GNU C and its
661 `inline' feature) and `-a' (assume ANSI C-style function prototypes),
662 alter the content of both the generated `hash' and `in_word_set'
663 routines. However, function `in_word_set' may be modified more
664 extensively, in response to your option settings. The options that
665 affect the `in_word_set' structure are:
668 Have function `in_word_set' return a pointer rather than a
672 Make use of the user-defined `struct'.
674 `-S TOTAL SWITCH STATEMENTS'
675 Generate 1 or more C `switch' statement rather than use a
676 large, (and potentially sparse) static array. Although the
677 exact time and space savings of this approach vary according
678 to your C compiler's degree of optimization, this method
679 often results in smaller and faster code.
681 If the `-t', `-S', and `-p' options are omitted the default action
682 is to generate a `char *' array containing the keys, together with
683 additional null strings used for padding the array. By experimenting
684 with the various input and output options, and timing the resulting C
685 code, you can determine the best option choices for different keyword
689 File: gperf.info, Node: Options, Next: Bugs, Prev: Description, Up: Top
691 Options to the `gperf' Utility
692 ******************************
694 There are _many_ options to `gperf'. They were added to make the
695 program more convenient for use with real applications. "On-line" help
696 is readily available via the `-h' option. Other options include:
699 Generate ANSI Standard C code using function prototypes. The
700 default is to use "classic" K&R C function declaration syntax.
703 Generates C code that uses the `strncmp' function to perform
704 string comparisons. The default action is to use `strcmp'.
707 Makes the contents of all generated lookup tables constant,
708 _i.e._, "readonly." Many compilers can generate more
709 efficient code for this by putting the tables in readonly
713 Enables the debugging option. This produces verbose
714 diagnostics to "standard error" when `gperf' is executing.
715 It is useful both for maintaining the program and for
716 determining whether a given set of options is actually
717 speeding up the search for a solution. Some useful
718 information is dumped at the end of the program when the `-d'
722 Handle keywords whose key position sets hash to duplicate
723 values. Duplicate hash values occur for two reasons:
725 * Since `gperf' does not backtrack it is possible for it
726 to process all your input keywords without finding a
727 unique mapping for each word. However, frequently only
728 a very small number of duplicates occur, and the
729 majority of keys still require one probe into the table.
731 * Sometimes a set of keys may have the same names, but
732 possess different attributes. With the -D option
733 `gperf' treats all these keys as part of an equivalence
734 class and generates a perfect hash function with multiple
735 comparisons for duplicate keys. It is up to you to
736 completely disambiguate the keywords by modifying the
737 generated C code. However, `gperf' helps you out by
738 organizing the output.
740 Option `-D' is extremely useful for certain large or highly
741 redundant keyword sets, _i.e._, assembler instruction opcodes.
742 Using this option usually means that the generated hash
743 function is no longer perfect. On the other hand, it permits
744 `gperf' to work on keyword sets that it otherwise could not
747 `-e KEYWORD DELIMITER LIST'
748 Allows the user to provide a string containing delimiters
749 used to separate keywords from their attributes. The default
750 is ",\n". This option is essential if you want to use
751 keywords that have embedded commas or newlines. One useful
752 trick is to use -e'TAB', where TAB is the literal tab
756 Define constant values using an enum local to the lookup
757 function rather than with #defines. This also means that
758 different lookup functions can reside in the same file.
759 Thanks to James Clark (jjc at ai.mit.edu).
761 `-f ITERATION AMOUNT'
762 Generate the perfect hash function "fast." This decreases
763 `gperf''s running time at the cost of minimizing generated
764 table-size. The iteration amount represents the number of
765 times to iterate when resolving a collision. `0' means
766 `iterate by the number of keywords. This option is probably
767 most useful when used in conjunction with options `-D' and/or
768 `-S' for _large_ keyword sets.
771 Assume a GNU compiler, _e.g._, `g++' or `gcc'. This makes
772 all generated routines use the "inline" keyword to remove the
773 cost of function calls. Note that `-g' does _not_ imply
774 `-a', since other non-ANSI C compilers may have provisions
775 for a function `inline' feature.
778 Generate the static table of keywords as a static global
779 variable, rather than hiding it inside of the lookup function
780 (which is the default behavior).
783 Prints a short summary on the meaning of each program option.
784 Aborts further program execution.
786 `-H HASH FUNCTION NAME'
787 Allows you to specify the name for the generated hash
788 function. Default name is `hash.' This option permits the
789 use of two hash tables in the same file.
792 Provides an initial VALUE for the associate values array.
793 Default is 0. Increasing the initial value helps inflate the
794 final table size, possibly leading to more time efficient
795 keyword lookups. Note that this option is not particularly
796 useful when `-S' is used. Also, `-i' is overriden when the
800 Affects the "jump value," _i.e._, how far to advance the
801 associated character value upon collisions. JUMP VALUE is
802 rounded up to an odd number, the default is 5. If the JUMP
803 VALUE is 0 `gper f' jumps by random amounts.
806 Allows selection of the character key positions used in the
807 keywords' hash function. The allowable choices range between
808 1-126, inclusive. The positions are separated by commas,
809 _e.g._, `-k 9,4,13,14'; ranges may be used, _e.g._, `-k 2-7';
810 and positions may occur in any order. Furthermore, the
811 meta-character '*' causes the generated hash function to
812 consider *all* character positions in each key, whereas '$'
813 instructs the hash function to use the "final character" of a
814 key (this is the only way to use a character position greater
815 than 126, incidentally).
817 For instance, the option `-k 1,2,4,6-10,'$'' generates a hash
818 function that considers positions 1,2,4,6,7,8,9,10, plus the
819 last character in each key (which may differ for each key,
820 obviously). Keys with length less than the indicated key
821 positions work properly, since selected key positions
822 exceeding the key length are simply not referenced in the
826 By default, the program assumes the structure component
827 identifier for the keyword is "name." This option allows an
828 arbitrary choice of identifier for this component, although
829 it still must occur as the first field in your supplied
833 Compare key lengths before trying a string comparison. This
834 might cut down on the number of string comparisons made
835 during the lookup, since keys with different lengths are
836 never compared via `strcmp'. However, using `-l' might
837 greatly increase the size of the generated C code if the
838 lookup table range is large (which implies that the switch
839 option `-S' is not enabled), since the length table contains
840 as many elements as there are entries in the lookup table.
842 `-L GENERATED LANGUAGE NAME'
843 Instructs `gperf' to generate code in the language specified
844 by the option's argument. Languages handled are currently
845 C++ and C. The default is C.
848 Instructs the generator not to include the length of a
849 keyword when computing its hash value. This may save a few
850 assembly instructions in the generated lookup table.
852 `-N LOOKUP FUNCTION NAME'
853 Allows you to specify the name for the generated lookup
854 function. Default name is `in_word_set.' This option
855 permits completely automatic generation of perfect hash
856 functions, especially when multiple generated hash functions
857 are used in the same application.
860 Reorders the keywords by sorting the keywords so that
861 frequently occuring key position set components appear first.
862 A second reordering pass follows so that keys with "already
863 determined values" are placed towards the front of the
864 keylist. This may decrease the time required to generate a
865 perfect hash function for many keyword sets, and also produce
866 more minimal perfect hash functions. The reason for this is
867 that the reordering helps prune the search time by handling
868 inevitable collisions early in the search process. On the
869 other hand, if the number of keywords is _very_ large using
870 `-o' may _increase_ `gperf''s execution time, since
871 collisions will begin earlier and continue throughout the
872 remainder of keyword processing. See Cichelli's paper from
873 the January 1980 Communications of the ACM for details.
876 Changes the return value of the generated function
877 `in_word_set' from boolean (_i.e._, 0 or 1), to either type
878 "pointer to user-defined struct," (if the `-t' option is
879 enabled), or simply to `char *', if `-t' is not enabled.
880 This option is most useful when the `-t' option (allowing
881 user-defined structs) is used. For example, it is possible
882 to automatically generate the GNU C reserved word lookup
883 routine with the options `-p' and `-t'.
886 Utilizes randomness to initialize the associated values
887 table. This frequently generates solutions faster than using
888 deterministic initialization (which starts all associated
889 values at 0). Furthermore, using the randomization option
890 generally increases the size of the table. If `gperf' has
891 difficultly with a certain keyword set try using `-r' or `-D'.
894 Affects the size of the generated hash table. The numeric
895 argument for this option indicates "how many times larger or
896 smaller" the maximum associated value range should be, in
897 relationship to the number of keys. If the SIZE-MULTIPLE is
898 negative the maximum associated value is calculated by
899 _dividing_ it into the total number of keys. For example, a
900 value of 3 means "allow the maximum associated value to be
901 about 3 times larger than the number of input keys."
903 Conversely, a value of -3 means "allow the maximum associated
904 value to be about 3 times smaller than the number of input
905 keys." Negative values are useful for limiting the overall
906 size of the generated hash table, though this usually
907 increases the number of duplicate hash values.
909 If `generate switch' option `-S' is _not_ enabled, the maximum
910 associated value influences the static array table size, and
911 a larger table should decrease the time required for an
912 unsuccessful search, at the expense of extra table space.
914 The default value is 1, thus the default maximum associated
915 value about the same size as the number of keys (for
916 efficiency, the maximum associated value is always rounded up
917 to a power of 2). The actual table size may vary somewhat,
918 since this technique is essentially a heuristic. In
919 particular, setting this value too high slows down `gperf''s
920 runtime, since it must search through a much larger range of
921 values. Judicious use of the `-f' option helps alleviate this
924 `-S TOTAL SWITCH STATEMENTS'
925 Causes the generated C code to use a `switch' statement
926 scheme, rather than an array lookup table. This can lead to
927 a reduction in both time and space requirements for some
928 keyfiles. The argument to this option determines how many
929 `switch' statements are generated. A value of 1 generates 1
930 `switch' containing all the elements, a value of 2 generates
931 2 tables with 1/2 the elements in each `switch', etc. This
932 is useful since many C compilers cannot correctly generate
933 code for large `switch' statements. This option was inspired
934 in part by Keith Bostic's original C program.
937 Allows you to include a `struct' type declaration for
938 generated code. Any text before a pair of consecutive %% is
939 consider part of the type declaration. Key words and
940 additional fields may follow this, one group of fields per
941 line. A set of examples for generating perfect hash tables
942 and functions for Ada, C, and G++, Pascal, and Modula 2 and 3
943 reserved words are distributed with this release.
946 Prevents the transfer of the type declaration to the output
947 file. Use this option if the type is already defined
951 Prints out the current version number.
954 Allow user to specify name of generated C++ class. Default
955 name is `Perfect_Hash'.
958 File: gperf.info, Node: Bugs, Next: Projects, Prev: Options, Up: Top
960 Known Bugs and Limitations with `gperf'
961 ***************************************
963 The following are some limitations with the current release of
966 * The `gperf' utility is tuned to execute quickly, and works quickly
967 for small to medium size data sets (around 1000 keywords). It is
968 extremely useful for maintaining perfect hash functions for
969 compiler keyword sets. Several recent enhancements now enable
970 `gperf' to work efficiently on much larger keyword sets (over
971 15,000 keywords). When processing large keyword sets it helps
972 greatly to have over 8 megs of RAM.
974 However, since `gperf' does not backtrack no guaranteed solution
975 occurs on every run. On the other hand, it is usually easy to
976 obtain a solution by varying the option parameters. In
977 particular, try the `-r' option, and also try changing the default
978 arguments to the `-s' and `-j' options. To _guarantee_ a
979 solution, use the `-D' and `-S' options, although the final
980 results are not likely to be a _perfect_ hash function anymore!
981 Finally, use the `-f' option if you want `gperf' to generate the
982 perfect hash function _fast_, with less emphasis on making it
985 * The size of the generate static keyword array can get _extremely_
986 large if the input keyword file is large or if the keywords are
987 quite similar. This tends to slow down the compilation of the
988 generated C code, and _greatly_ inflates the object code size. If
989 this situation occurs, consider using the `-S' option to reduce
990 data size, potentially increasing keyword recognition time a
991 negligible amount. Since many C compilers cannot correctly
992 generated code for large switch statements it is important to
993 qualify the -S option with an appropriate numerical argument that
994 controls the number of switch statements generated.
996 * The maximum number of key positions selected for a given key has an
997 arbitrary limit of 126. This restriction should be removed, and if
998 anyone considers this a problem write me and let me know so I can
999 remove the constraint.
1001 * The C++ source code only compiles correctly with GNU G++, version
1002 1.36 (and hopefully later versions). Porting to AT&T cfront would
1003 be tedious, but possible (and desirable). There is also a K&R C
1004 version available now. This should compile without change on most
1005 BSD systems, but may require a bit of work to run on SYSV, since
1006 `gperf' uses ALLOCA in several places. Send mail to schmidt at
1007 ics.uci.edu for information.
1010 File: gperf.info, Node: Projects, Next: Implementation, Prev: Bugs, Up: Top
1012 Things Still Left to Do
1013 ***********************
1015 It should be "relatively" easy to replace the current perfect hash
1016 function algorithm with a more exhaustive approach; the perfect hash
1017 module is essential independent from other program modules. Additional
1018 worthwhile improvements include:
1020 * Make the algorithm more robust. At present, the program halts
1021 with an error diagnostic if it can't find a direct solution and
1022 the `-D' option is not enabled. A more comprehensive, albeit
1023 computationally expensive, approach would employ backtracking or
1024 enable alternative options and retry. It's not clear how helpful
1025 this would be, in general, since most search sets are rather small
1028 * Another useful extension involves modifying the program to generate
1029 "minimal" perfect hash functions (under certain circumstances, the
1030 current version can be rather extravagant in the generated table
1031 size). Again, this is mostly of theoretical interest, since a
1032 sparse table often produces faster lookups, and use of the `-S'
1033 `switch' option can minimize the data size, at the expense of
1034 slightly longer lookups (note that the gcc compiler generally
1035 produces good code for `switch' statements, reducing the need for
1036 more complex schemes).
1038 * In addition to improving the algorithm, it would also be useful to
1039 generate a C++ class or Ada package as the code output, in
1040 addition to the current C routines.
1043 File: gperf.info, Node: Implementation, Next: Bibliography, Prev: Projects, Up: Top
1045 Implementation Details of GNU `gperf'
1046 *************************************
1048 A paper describing the high-level description of the data structures
1049 and algorithms used to implement `gperf' will soon be available. This
1050 paper is useful not only from a maintenance and enhancement perspective,
1051 but also because they demonstrate several clever and useful programming
1052 techniques, _e.g._, `Iteration Number' boolean arrays, double hashing,
1053 a "safe" and efficient method for reading arbitrarily long input from a
1054 file, and a provably optimal algorithm for simultaneously determining
1055 both the minimum and maximum elements in a list.
1058 File: gperf.info, Node: Bibliography, Prev: Implementation, Up: Top
1063 [1] Chang, C.C.: A Scheme for Constructing Ordered Minimal Perfect
1064 Hashing Functions Information Sciences 39(1986), 187-195.
1066 [2] Cichelli, Richard J. Author's Response to "On Cichelli's Minimal
1067 Perfec t Hash Functions Method" Communications of the ACM, 23,
1068 12(December 1980), 729.
1070 [3] Cichelli, Richard J. Minimal Perfect Hash Functions Made Simple
1071 Communications of the ACM, 23, 1(January 1980), 17-19.
1073 [4] Cook, C. R. and Oldehoeft, R.R. A Letter Oriented Minimal
1074 Perfect Hashing Function SIGPLAN Notices, 17, 9(September 1982), 18-27.
1076 [5] Cormack, G. V. and Horspool, R. N. S. and Kaiserwerth, M.
1077 Practical Perfect Hashing Computer Journal, 28, 1(January 1985), 54-58.
1079 [6] Jaeschke, G. Reciprocal Hashing: A Method for Generating Minimal
1080 Perfect Hashing Functions Communications of the ACM, 24, 12(December
1083 [7] Jaeschke, G. and Osterburg, G. On Cichelli's Minimal Perfect
1084 Hash Functions Method Communications of the ACM, 23, 12(December 1980),
1087 [8] Sager, Thomas J. A Polynomial Time Generator for Minimal Perfect
1088 Hash Functions Communications of the ACM, 28, 5(December 1985), 523-532
1090 [9] Schmidt, Douglas C. GPERF: A Perfect Hash Function Generator
1091 Second USENIX C++ Conference Proceedings, April 1990.
1093 [10] Sebesta, R.W. and Taylor, M.A. Minimal Perfect Hash Functions
1094 for Reserved Word Lists SIGPLAN Notices, 20, 12(September 1985), 47-53.
1096 [11] Sprugnoli, R. Perfect Hashing Functions: A Single Probe
1097 Retrieving Method for Static Sets Communications of the ACM, 20
1098 11(November 1977), 841-850.
1100 [12] Stallman, Richard M. Using and Porting GNU CC Free Software
1103 [13] Stroustrup, Bjarne The C++ Programming Language.
1104 Addison-Wesley, 1986.
1106 [14] Tiemann, Michael D. User's Guide to GNU C++ Free Software
1113 Node: Copying
\x7f2566
1114 Node: Contributors
\x7f15867
1115 Node: Motivation
\x7f16967
1116 Node: Search Structures
\x7f18234
1117 Node: Description
\x7f21787
1118 Node: Input Format
\x7f23607
1119 Node: Declarations
\x7f24402
1120 Node: Keywords
\x7f26709
1121 Node: Functions
\x7f28300
1122 Node: Output Format
\x7f28794
1123 Node: Options
\x7f31264
1125 Node: Projects
\x7f47321
1126 Node: Implementation
\x7f48898
1127 Node: Bibliography
\x7f49617