hindi_stemmer

Description

A stemmer for Hindi implemented in Python.

This program implements the suffix-stripping algorithm described in "A Lightweight Stemmer for Hindi" by Ananthakrishnan Ramanathan and Durgesh D Rao.
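As a rough illustration of the suffix-stripping idea, the sketch below removes the longest matching suffix from a word. It is not the code in hindi_stemmer.py: the function name strip_suffix, the short suffix list, and the guard that leaves very short words untouched are assumptions made for this example, and the suffix list in the paper is much longer.

```python
# Illustrative subset only; the suffix list in the paper is much longer.
SUFFIXES = ["ों", "ें", "ा", "ी", "े"]


def strip_suffix(word, suffixes=SUFFIXES):
    """Return word with its longest matching suffix removed (hypothetical helper)."""
    # Try longer suffixes first so a two-code-point suffix wins over a shorter one.
    for suffix in sorted(suffixes, key=len, reverse=True):
        # Assumed guard: keep at least two code points of stem.
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[:-len(suffix)]
    return word


print(strip_suffix("खरीदारों"))  # खरीदार
print(strip_suffix("दर्शिका"))   # दर्शिक
print(strip_suffix("के"))        # के (too short to strip)
```

Sorting by length on every call keeps the sketch short; a real implementation would more likely group the suffixes by length once, up front.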

The file (hindi_stemmer.py) may be used as a standalone program or imported as a module. When run as a program, it reads text from stdin and writes the stemmed text to stdout. Example:

$ echo "ख़रीदारों के लिए मार्ग दर्शिका" | ./hindi_stemmer.py
खरीदार के लिए मार्ग दर्शिक
Note: the input should be tokenized, i.e., words should be separated from punctuation by whitespace. Multiple spaces between words are not preserved in the output.
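One way to prepare raw text is to pad every punctuation character with spaces before piping it to the stemmer. The sketch below is an assumption about what suitable preprocessing could look like and is not part of hindi_stemmer.py; the function name tokenize is hypothetical. It relies on Unicode general categories, so the danda (।) is treated as punctuation while Devanagari vowel signs and the virama stay attached to their words.

```python
import unicodedata


def tokenize(text):
    """Separate punctuation from words with single spaces (hypothetical helper)."""
    padded = []
    for ch in text:
        # General categories starting with "P" cover punctuation, including the danda.
        if unicodedata.category(ch).startswith("P"):
            padded.append(" " + ch + " ")
        else:
            padded.append(ch)
    # split() with no arguments collapses the runs of whitespace created above.
    return " ".join("".join(padded).split())


print(tokenize("मार्ग दर्शिका।"))  # मार्ग दर्शिका ।
print(tokenize("hello, world!"))   # hello , world !
```

Hyphens also count as punctuation here, so hyphenated words are split into separate tokens; adjust the category test if that is not wanted.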
Usage

The program reads from stdin and writes to stdout, so files are processed with shell redirection:

$ ./hindi_stemmer.py < input.txt > output.txt

Download

hindi_stemmer_rev0.tar.gz

The program requires Python ≥ 3.1. Tested on Linux.

This code is released under a Creative Commons Attribution 3.0 Unported License.