Stemming
• Stemming is the process of reducing a word to its stem.

• The stem or root form is not necessarily a word by itself, but it
can be used to generate words by concatenating the right suffix.

    • Example:
    • fish, fishes and fishing all stem to fish,
      which is a valid English word.

    • study, studies and studying all stem to studi,
      which is not an English word.

• Most commonly, stemming algorithms (a.k.a. stemmers) are
based on rules for suffix stripping.

• The most famous algorithm is the Porter stemmer, introduced in 1979 (published in 1980).
• A more aggressive stemming algorithm is the Lancaster stemmer, introduced in 1990.
• Several Python libraries provide stemmers, for example NLTK and PyStemmer.
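
To make the idea of rule-based suffix stripping concrete, here is a minimal sketch in plain Python. It is a toy, not the Porter or Lancaster algorithm: the rule table `STRIP_RULES`, the minimum stem length of 3, and the function name `toy_stem` are all assumptions chosen only so that the examples above come out as described.

```python
# Toy suffix-stripping stemmer (illustrative only, NOT the Porter algorithm).
# Each rule is (suffix, replacement); the first matching rule wins.
STRIP_RULES = [
    ("ies", "y"),   # studies  -> study
    ("ing", ""),    # fishing  -> fish
    ("es", ""),     # fishes   -> fish
    ("s", ""),      # plain plural "s"
]

MIN_STEM_LEN = 3  # assumed guard so very short words are left alone


def toy_stem(word):
    """Strip one suffix, then turn a final 'y' into 'i'."""
    for suffix, replacement in STRIP_RULES:
        stem_len = len(word) - len(suffix)
        if word.endswith(suffix) and stem_len >= MIN_STEM_LEN:
            word = word[:stem_len] + replacement
            break
    # Second step: normalise a trailing 'y' so study/studies/studying agree.
    if word.endswith("y") and len(word) > MIN_STEM_LEN:
        word = word[:-1] + "i"
    return word


words = ["fish", "fishes", "fishing", "study", "studies", "studying"]
toy_stems = [toy_stem(w) for w in words]
print(toy_stems)

# A stem need not be a word, but adding the right suffix yields one again.
print("studi" + "es", "fish" + "ing")
```

Real stemmers use many more rules plus conditions on the remaining stem, so this sketch will mis-stem plenty of words (for example, it turns "class" into "clas").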

Stemming in Python

• Stemming with NLTK

from nltk.stem.porter import PorterStemmer

def stem(tokens):
    # Create the stemmer once instead of once per token.
    stemmer = PorterStemmer()
    stems = []
    for item in tokens:
        stems.append(stemmer.stem(item))
    return stems
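
As a usage sketch, the snippet below runs the example words from the first card through NLTK's `PorterStemmer` and, for comparison, through `LancasterStemmer`. It assumes NLTK is installed; neither stemmer needs any additional data download. The variable names are illustrative only.

```python
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem.porter import PorterStemmer

words = ["fish", "fishes", "fishing", "study", "studies", "studying"]

porter = PorterStemmer()
lancaster = LancasterStemmer()

porter_stems = [porter.stem(w) for w in words]
lancaster_stems = [lancaster.stem(w) for w in words]

# Print the two results side by side to see where the stemmers differ.
for word, p_stem, l_stem in zip(words, porter_stems, lancaster_stems):
    print(f"{word:10} porter={p_stem:10} lancaster={l_stem}")
```

With the Porter stemmer, the first three words reduce to fish and the last three to studi, matching the example above. The Lancaster output is worth inspecting rather than assuming, since its more aggressive rules often produce shorter stems.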

• Stemming with PyStemmer

import Stemmer

def stem(tokens):
    stemmer = Stemmer.Stemmer('english')
    stems = stemmer.stemWords(tokens)
    return stems
Flashcard info:
Author: CoboCards-User
Main topic: PTT
Topic: PTT
School / Univ.: Uni Koblenz
City: Koblenz
Published: 08.07.2016
