# This file contains a list of stemmers to include in the distribution.
# Each line consists of three space-separated items:
#  The first item is the name of the stemmer.
#  The second item is a comma-separated list of character sets.
#  The third item is a comma-separated list of names to refer to the stemmer by.
#
# Lines starting with a #, and blank lines, are ignored.
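#
# As an illustration of the format described above, the entry for the danish
# stemmer breaks down as follows (the angle-bracket labels are only
# explanatory and are not part of the syntax):
#
#   <name>  <character sets>   <names to refer to the stemmer by>
#   danish  UTF_8,ISO_8859_1   danish,da,dan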

# List all the main algorithms for each language, in UTF-8, and also with
# the most commonly used encoding.

danish      UTF_8,ISO_8859_1  danish,da,dan
dutch       UTF_8,ISO_8859_1  dutch,nl,dut,nld
english     UTF_8,ISO_8859_1  english,en,eng
finnish     UTF_8,ISO_8859_1  finnish,fi,fin
french      UTF_8,ISO_8859_1  french,fr,fre,fra
german      UTF_8,ISO_8859_1  german,de,ger,deu
hungarian   UTF_8,ISO_8859_1  hungarian,hu,hun
italian     UTF_8,ISO_8859_1  italian,it,ita
norwegian   UTF_8,ISO_8859_1  norwegian,no,nor
portuguese  UTF_8,ISO_8859_1  portuguese,pt,por
russian     UTF_8,KOI8_R      russian,ru,rus
spanish     UTF_8,ISO_8859_1  spanish,es,esl,spa
swedish     UTF_8,ISO_8859_1  swedish,sv,swe

# Also include the traditional Porter algorithm for English.
# The porter algorithm is included in the libstemmer distribution to assist
# with backwards compatibility, but new systems should use the english
# algorithm in preference.
porter      UTF_8,ISO_8859_1  porter

# Some other stemmers in the Snowball project are not included in the standard
# distribution. To compile a libstemmer with them in, add them to this list
# and regenerate the distribution. (You will need a full source checkout for
# this.) They are included on the Snowball website as curiosities, but are not
# intended for general use, and use of them is not fully supported. These
# are:
#
# german2          - This is a slight modification of the german stemmer.
# kraaij_pohlmann  - This is a different Dutch stemmer.
# lovins           - This is an English stemmer, but fairly outdated, and
#                    only really applicable to a restricted type of input text
#                    (keywords in academic publications).
# romanian1        - This is a Romanian stemmer, which we haven't yet
# romanian2        - This is a different Romanian stemmer, which we haven't