snowball_stemmer

Description

This program is a simple wrapper around libstemmer that provides an easy-to-use stemmer for many languages (Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, and Turkish). The original Porter stemmer for English is also available under the name porter.

It reads text from stdin and writes the stemmed text to stdout. Example:

$ echo "arguable importance" | ./snowball_stemmer english
arguabl import

Note: the input should be tokenized, i.e., words should be separated from punctuation by whitespace. Multiple spaces between words are not preserved in the output.
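
As an illustration of the tokenization requirement above, here is a minimal sketch of a pre-processing step that puts whitespace around punctuation before the text is piped into the stemmer. It is not part of snowball_stemmer; the function name separate_punctuation is hypothetical, and the approach is deliberately naive: it only recognizes ASCII punctuation in the default C locale, so it also splits apostrophes and hyphens ("don't" becomes "don ' t"), which may not be what you want for every language.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of snowball_stemmer.
   Copies `in` to `out`, writing every ASCII punctuation character as its own
   token and separating tokens with exactly one space. Bytes >= 0x80 (for
   example UTF-8 sequences) are copied unchanged in the default "C" locale.
   Returns the number of bytes written, excluding the terminating NUL.
   Output that does not fit in `out_size` bytes is silently truncated. */
size_t separate_punctuation(const char *in, char *out, size_t out_size)
{
    size_t n = 0;
    int pending_space = 0;   /* a token boundary is waiting to be written */

    if (out_size == 0)
        return 0;

    for (; *in != '\0'; ++in) {
        unsigned char c = (unsigned char)*in;
        int punct;
        size_t need;

        if (isspace(c)) {            /* whitespace only marks a boundary */
            pending_space = 1;
            continue;
        }
        punct = ispunct(c) != 0;

        /* One byte for the character, plus one for a separating space
           if a boundary precedes it and something was already written. */
        need = 1 + (((pending_space || punct) && n > 0) ? 1 : 0);
        if (n + need >= out_size)    /* keep room for the NUL */
            break;

        if (need == 2)
            out[n++] = ' ';
        out[n++] = (char)c;

        /* Punctuation always ends its own token. */
        pending_space = punct;
    }
    out[n] = '\0';
    return n;
}
```

Calling separate_punctuation("Hello, world!", buf, sizeof buf) leaves "Hello , world !" in buf, which is the form the stemmer expects.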

Usage

The program reads from stdin and writes to stdout.
Command syntax:

$ ./snowball_stemmer ALGORITHM < input.txt > output.txt
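
To make the stdin-to-stdout behaviour more concrete, the following sketch shows one plausible shape for the per-line loop of such a filter: split on whitespace, stem each word, and join the results with single spaces (which would explain why runs of spaces are not preserved). This is an assumption about the design, not the program's actual source. The names stem_fn, toy_stem and stem_line are hypothetical, and toy_stem is only a stand-in that strips a trailing "s"; in a libstemmer-based program the callback would presumably wrap the library's sb_stemmer_stem call instead.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: writes the stem of word[0..len) into dst
   (capacity cap bytes) and returns the number of bytes written. */
typedef size_t (*stem_fn)(const char *word, size_t len, char *dst, size_t cap);

/* Stand-in "stemmer" for illustration only: drops one trailing 's'.
   This is NOT a Snowball algorithm. */
size_t toy_stem(const char *word, size_t len, char *dst, size_t cap)
{
    if (len > 1 && word[len - 1] == 's')
        len--;
    if (len > cap)
        len = cap;               /* truncate rather than overflow */
    memcpy(dst, word, len);
    return len;
}

/* Stems every whitespace-separated word of `line` and joins the results
   with single spaces. Returns the number of bytes written to `out`,
   excluding the terminating NUL. */
size_t stem_line(const char *line, char *out, size_t out_size, stem_fn stem)
{
    size_t n = 0;
    const char *p = line;

    if (out_size == 0)
        return 0;

    for (;;) {
        const char *start;
        size_t room;

        while (*p != '\0' && isspace((unsigned char)*p))
            p++;                 /* skip any run of whitespace */
        if (*p == '\0')
            break;

        start = p;
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;                 /* p now points just past the word */

        if (n > 0) {
            if (n + 1 >= out_size)
                break;           /* no room left for a separator */
            out[n++] = ' ';
        }
        room = out_size - 1 - n; /* reserve one byte for the NUL */
        n += stem(start, (size_t)(p - start), out + n, room);
    }
    out[n] = '\0';
    return n;
}
```

With the toy callback, stem_line("  cats   and dogs ", buf, sizeof buf, toy_stem) leaves "cat and dog" in buf: leading, trailing and repeated spaces are all collapsed.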

To get a list of the available stemming algorithms, run the command without arguments:

$ ./snowball_stemmer
usage:
./snowball_stemmer danish
./snowball_stemmer dutch
./snowball_stemmer english
./snowball_stemmer finnish
./snowball_stemmer french
./snowball_stemmer german
./snowball_stemmer hungarian
./snowball_stemmer italian
./snowball_stemmer norwegian
./snowball_stemmer porter
./snowball_stemmer portuguese
./snowball_stemmer romanian
./snowball_stemmer russian
./snowball_stemmer spanish
./snowball_stemmer swedish
./snowball_stemmer turkish

Download

snowball_stemmer_rev0.tar.gz

The program requires a C compiler. Tested with gcc 4.4.3 on Linux.

This code is released under a Creative Commons Attribution 3.0 Unported License.