A stemmer for Czech implemented in Python.

I ported the algorithm from the Java implementation by Ljiljana Dolamic, University of Neuchatel.

The file may be used as a standalone program or as a module. When used as a program, it reads text from stdin and writes the stemmed text to stdout. Examples:

$ echo "listina základních práv evropské unie" | ./ light
list základn práv evropsk uni
$ echo "listina základních práv evropské unie" | ./ aggressive
lis základ práv evrops uni
Note: the input should be tokenized, i.e., words should be separated from punctuation by whitespace. Multiple spaces between words are not preserved in the output.
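If your text is not tokenized yet, a small preprocessing step can do it. The following is a minimal sketch and not part of the stemmer: the function name `tokenize` and the regular expression are assumptions chosen for illustration. It splits on a simple rule (runs of word characters versus single punctuation marks), so it also splits hyphenated words and decimal numbers, which may or may not be what you want.

```python
import re
import unicodedata

# A token is either a run of word characters (Unicode-aware in Python 3,
# so Czech letters with diacritics count as word characters) or a single
# character that is neither a word character nor whitespace.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Return text with words and punctuation separated by single spaces."""
    # Compose letters and combining marks (NFC) so that accented letters
    # are matched as word characters rather than split off.
    text = unicodedata.normalize("NFC", text)
    return " ".join(_TOKEN_RE.findall(text))
```

For example, `tokenize("práv, unie.")` yields `práv , unie .`, which can then be passed to the stemmer on stdin.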

Command syntax:

$ ./ MODE < input.txt > output.txt

MODE is either light (may understem some words) or aggressive (may overstem some words).



The program requires Python ≥ 3.1. Tested on Linux.

This code is released under a Creative Commons Attribution 3.0 Unported License.