Stemming
• Stemming is the process of reducing a word to its stem.

• The stem or root form is not necessarily a word by itself, but it
can be used to generate words by concatenating the right suffix.

    • Example:
    • fish, fishes and fishing all stem to fish,
      which is itself an English word.

    • study, studies and studying all stem to studi,
      which is not an English word.

• Most commonly, stemming algorithms (a.k.a. stemmers) are
based on rules for suffix stripping.

• The most famous algorithm is the Porter stemmer, introduced in 1979 (published in 1980).
• A more aggressive stemming algorithm is the Lancaster stemmer, introduced in 1990.
• Several Python libraries implement stemming, for example NLTK and PyStemmer.
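
The rule-based suffix stripping mentioned above can be illustrated with a small toy stemmer in plain Python. This is only a sketch and not the Porter or Lancaster algorithm: the rule list, the minimum stem length and the name toy_stem are assumptions made for this example. Real stemmers use many more rules and conditions.

```python
# Toy suffix-stripping stemmer: an illustration only, NOT the Porter algorithm.
# The rules and the minimum stem length below are assumptions for this sketch.
SUFFIX_RULES = [
    ("ies", "i"),  # studies -> studi
    ("ing", ""),   # fishing -> fish
    ("es", ""),    # fishes  -> fish
    ("s", ""),     # cats    -> cat
]

MIN_STEM_LENGTH = 3  # never strip a suffix if fewer characters would remain


def toy_stem(word):
    """Strip the first matching suffix, then turn a trailing 'y' into 'i'."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            word = word[: len(word) - len(suffix)] + replacement
            break  # apply at most one stripping rule
    # Final normalisation: study -> studi, so it matches studies and studying.
    if word.endswith("y") and len(word) > MIN_STEM_LENGTH:
        word = word[:-1] + "i"
    return word


words = ["fish", "fishes", "fishing", "study", "studies", "studying"]
print([toy_stem(w) for w in words])
# ['fish', 'fish', 'fish', 'studi', 'studi', 'studi']
```

The order of the rules matters: "ies" has to be tried before "es" and "s", otherwise studies would lose only its final letters and no longer share a stem with study.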

Stemming in Python

• Stemming with NLTK

from nltk.stem.porter import PorterStemmer

def stem(tokens):
    # Create the stemmer once and reuse it for every token.
    stemmer = PorterStemmer()
    stems = []
    for item in tokens:
        stems.append(stemmer.stem(item))
    return stems

• Stemming with PyStemmer

import Stemmer

def stem(tokens):
    # Build a stemmer for English and stem the whole token list in one call.
    stemmer = Stemmer.Stemmer('english')
    stems = stemmer.stemWords(tokens)
    return stems
Card info:
Author: CoboCards-User
Main topic: PTT
Topic: PTT
School / University: Uni Koblenz
Location: Koblenz
Published: 08.07.2016
