1 # Natural Order String Comparison
3 Martin Pool <http://sourcefrog.net>
5 Computer string sorting algorithms generally don't order strings
6 containing numbers in the same way that a human would do. Consider:
12 It would be more friendly if the program listed the files as
18 Filenames sort properly if people insert leading zeros, but they
21 I've written a subroutine that compares strings according to this
22 natural ordering. You can use this routine in your own software, or
23 download a patch to add it to your favourite Unix program.
28 Strings are sorted as usual, except that decimal integer substrings
29 are compared on their numeric value. For example,
32 a < a0 < a1 < a1a < a1b < a2 < a10 < a20
35 Strings can contain several number parts:
37 x2-g8 < x2-y7 < x2-y08 < x8-y8
39 in which case numeric fields are separated by nonnumeric characters.
40 Leading spaces are ignored. This works very well for IP addresses
41 from log files, for example.
44 Leading zeros are *not* ignored, which tends to give more
45 reasonable results on decimal fractions.
47 1.001 < 1.002 < 1.010 < 1.02 < 1.1 < 1.3
49 Some applications may wish to change this by modifying the test that calls `isspace`.
51 Performance is linear: each character of the string is scanned
52 at most once, and only as many characters as necessary to decide
58 This software is copyright by Martin Pool, and made available under
59 the same licence as zlib:
62 > This software is provided 'as-is', without any express or implied
63 > warranty. In no event will the authors be held liable for any damages
64 > arising from the use of this software.
66 > Permission is granted to anyone to use this software for any purpose,
67 > including commercial applications, and to alter it and redistribute it
68 > freely, subject to the following restrictions:
70 > 1. The origin of this software must not be misrepresented; you must not
71 > claim that you wrote the original software. If you use this software
72 > in a product, an acknowledgment in the product documentation would be
73 > appreciated but is not required.
75 > 2. Altered source versions must be plainly marked as such, and must not be
76 > misrepresented as being the original software.
78 > 3. This notice may not be removed or altered from any source distribution.
81 This licence applies only to the C implementation. You are free to
82 reimplement the idea fom scratch in any language.
86 `strnatcmp.c`, `strnatcmp.h` - the algorithm itself
88 `natsort.c` - example driver program.
90 `natcompare.js` - Kristof Coomans wrote a natural sort comparison in Javascript.
92 `natcmp.rb` -- An implementation by Alan Davies in Ruby.
98 POSIX sort(1) has the -n option to sort numbers, but this doesn't
99 work if there is a non-numeric prefix.
102 GNU ls(1) has the `--sort=version` option, which works
107 The PHP scripting language now has a
108 [strnatcmp](http://us3.php.net/manual/en/function.strnatcmp.php)
109 function based on this code.
110 The PHP wrapper was done by Andrei Zimievsky.
114 [Stuart Cheshire has a Macintosh system extension](http://www.naturalordersort.org/)
115 to do natural ordering.
116 I indepdendently reinvented the algorithm, but Stuart had it
117 first. I borrowed the term natural sort from him.
122 [`Sort::Versions`](http://search.cpan.org/src/EDAVIS/Sort-Versions-1.4/README)
123 in Perl. "The code has some special magic to deal with common conventions in program version numbers, like the difference between 'decimal' versions (eg perl 5.005) and the Unix kind (eg perl 5.6.1)."
125 [Sort::Naturally](http://www.cpan.org/modules/by-module/Sort/Sort-Naturally-1.01.readme)
126 is also in Perl, by Sean M. Burke. It uses locale-sensitive character classes to sort words and numeric substrings
127 in a way similar to natsort.
130 Ed Avis wrote [something similar in Haskell](http://membled.com/work/apps/todo/numsort).
134 Pierre-Luc Paour wrote a
135 [`NaturalOrderComparator`](http://pierre-luc.paour.9online.fr/NaturalOrderComparator.java)
138 [Numacomp](http://sourceforge.net/projects/numacomp) - similar thing in Python.
140 [as3natcompare](http://code.google.com/p/as3natcompare/) implementation in Flash ActionScript 3.
144 Comparison of characters is purely numeric, without taking
145 character set or locale into account. So it is only correct for
146 ASCII. This should probably be a separate function because doing
147 the comparisons will probably introduce a dependency on the OS
148 mechanism for finding the locale and comparing characters.
150 It might be good to support multibyte character sets too.
152 If you fix either of these, please mail me. They should not be