A stemmer for Czech implemented in Python.
I ported the algorithm from the Java implementation by Ljiljana Dolamic, University of Neuchatel.
The file czech_stemmer.py can be used either as a standalone program or as a module. When run as a program, it reads text from stdin and writes the stemmed text to stdout. Examples:
$ echo "listina základních práv evropské unie" | ./czech_stemmer.py light
list základn práv evropsk uni
$ echo "listina základních práv evropské unie" | ./czech_stemmer.py aggressive
lis základ práv evrops uni
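To illustrate what the light mode does, here is a minimal sketch of a light Czech stemmer that works in three passes: strip a case ending, then a possessive suffix, then normalize alternating final consonants. It is not the code in czech_stemmer.py. The suffix tables and length thresholds are assumptions modelled on Apache Lucene's CzechStemmer (a separate Java light stemmer for Czech), and the function names remove_case, remove_possessive, normalize and stem_light are hypothetical. Input is assumed to be lowercase and NFC-normalized.

```python
# Minimal sketch of a light Czech stemmer. Suffix tables and thresholds are
# assumptions modelled on Lucene's CzechStemmer; czech_stemmer.py may differ.
# Input words are assumed lowercase and NFC-normalized.

# Case endings grouped by length. A group applies only if the word is longer
# than its threshold; longer endings are tried first.
CASE_SUFFIXES = [
    (7, ("atech",)),
    (6, ("ětem", "etem", "atům")),
    (5, ("ech", "ich", "ích", "ého", "ěmi", "emi", "ému", "ěte", "ete",
         "ěti", "eti", "ího", "iho", "ími", "ímu", "imu", "ách", "ata",
         "aty", "ých", "ama", "ami", "ové", "ovi", "ými")),
    (4, ("em", "es", "ém", "ím", "ům", "at", "ám", "os", "us", "ým",
         "mi", "ou")),
    (3, tuple("aeiouůyáéíýě")),
]

POSSESSIVE_SUFFIXES = ("ov", "in", "ův")


def remove_case(word):
    """Strip the first (longest) matching case ending, if any."""
    for threshold, suffixes in CASE_SUFFIXES:
        if len(word) > threshold and word.endswith(suffixes):
            # All suffixes within one group have the same length.
            return word[:-len(suffixes[0])]
    return word


def remove_possessive(word):
    """Strip a possessive-adjective suffix from words longer than 5 letters."""
    if len(word) > 5 and word.endswith(POSSESSIVE_SUFFIXES):
        return word[:-2]
    return word


def normalize(word):
    """Map alternating final consonants to a common form."""
    if word.endswith("čt"):
        return word[:-2] + "ck"
    if word.endswith("št"):
        return word[:-2] + "sk"
    if word.endswith(("c", "č")):
        return word[:-1] + "k"
    if word.endswith(("z", "ž")):
        return word[:-1] + "h"
    if len(word) > 1 and word[-2] == "e":
        # Drop an "e" that directly precedes the final letter (mobile e).
        return word[:-2] + word[-1]
    if len(word) > 2 and word[-2] == "ů":
        # Replace "ů" before the final letter with "o".
        return word[:-2] + "o" + word[-1]
    return word


def stem_light(word):
    """Case ending first, then possessive suffix, then normalization."""
    return normalize(remove_possessive(remove_case(word)))


sentence = "listina základních práv evropské unie"
print(" ".join(stem_light(w) for w in sentence.split()))
# prints: list základn práv evropsk uni
```

On the example sentence, this sketch happens to produce the same stems as the light output shown above; on other words its results may differ from czech_stemmer.py.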
Command syntax:
$ ./czech_stemmer.py MODE < input.txt > output.txt
MODE is either light (may understem some words) or aggressive (may overstem some words).
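A command-line front end with this interface might look like the sketch below. It only shows the MODE check and the stdin-to-stdout loop. The names stem_lines and main are hypothetical, the stemming functions are passed in rather than defined here, and splitting on whitespace is an assumption (the real script may tokenize and rejoin text differently).

```python
import sys
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical signature: a stemmer maps one word to its stem.
Stemmer = Callable[[str], str]


def stem_lines(lines: Iterable[str], stem: Stemmer) -> Iterator[str]:
    """Stem each whitespace-separated token; one output line per input line."""
    for line in lines:
        yield " ".join(stem(token) for token in line.split())


def main(argv: List[str], stemmers: Dict[str, Stemmer]) -> int:
    """Validate MODE, then stream stdin to stdout through the chosen stemmer."""
    if len(argv) != 2 or argv[1] not in stemmers:
        print("usage: {} MODE < input.txt > output.txt".format(argv[0]),
              file=sys.stderr)
        print("MODE is one of: " + ", ".join(sorted(stemmers)), file=sys.stderr)
        return 2  # conventional exit status for a usage error
    for out in stem_lines(sys.stdin, stemmers[argv[1]]):
        print(out)
    return 0
```

A script built this way would end with `sys.exit(main(sys.argv, stemmers))`, where `stemmers` maps "light" and "aggressive" to the actual stemming functions. Keeping the modes in a dictionary means an unknown MODE is rejected with a usage message instead of failing later.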
Download: czech_stemmer_rev0.tar.gz
The program requires Python ≥ 3.1. Tested on Linux.
This code is released under a Creative Commons Attribution 3.0 Unported License.
Luís Gomes, luismsgomes@gmail.com, http://research.variancia.com/ (updated 19 November 2010)