KinoSearch::Analysis::Stopalizer - suppress a "stoplist" of common words


NAME

KinoSearch::Analysis::Stopalizer - suppress a "stoplist" of common words


SYNOPSIS

    my $stopalizer = KinoSearch::Analysis::Stopalizer->new(
        language => 'fr',
    );
    # $lc_normalizer, $tokenizer, and $stemmer are analyzers created elsewhere
    my $polyanalyzer = KinoSearch::Analysis::PolyAnalyzer->new(
        analyzers => [ $lc_normalizer, $tokenizer, $stopalizer, $stemmer ],
    );


DESCRIPTION

A "stoplist" is a collection of "stopwords": words which are common enough to be of little value when determining search results. For example, so many documents in English contain "the", "if", and "maybe" that blocking them may improve both performance and relevance.

    # before
    @token_texts = ('i', 'am', 'the', 'walrus');
    
    # after
    @token_texts = ('',  '',   '',    'walrus');
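
The before/after transformation above can be sketched in plain Perl. This is only an illustration of the idea, not the module's actual implementation: the helper name blank_stopwords and the three-word stoplist are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical stoplist: stopwords as keys, values set to 1.
my %stoplist = map { $_ => 1 } qw( i am the );

# Illustrative helper (not part of KinoSearch): replace each token text
# found in the stoplist with an empty string, keeping the token count
# unchanged.
sub blank_stopwords {
    my ( $stoplist, @token_texts ) = @_;
    return map { exists $stoplist->{$_} ? '' : $_ } @token_texts;
}

my @before = ( 'i', 'am', 'the', 'walrus' );
my @after  = blank_stopwords( \%stoplist, @before );
```

Because the stopwords are blanked rather than dropped, @after holds the same number of elements as @before.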


CONSTRUCTOR

new

    my $stopalizer = KinoSearch::Analysis::Stopalizer->new(
        language => 'de',
    );
    
    # or...
    my $stopalizer = KinoSearch::Analysis::Stopalizer->new(
        stoplist => \%stoplist,
    );

new() accepts two parameters, language and stoplist. If stoplist is supplied, it takes precedence over language.

stoplist - must be a hashref, with stopwords as the keys of the hash and values set to 1.

language - must be the two-letter ISO code for a language, such as 'en', 'fr', or 'de'. Loads a default stoplist supplied by Lingua::StopWords.
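
A custom stoplist in the form described above might be built as follows. This is a minimal sketch: the word list is made up for the example, and lowercasing the keys assumes that token texts have already been lowercased by an earlier analyzer, as in the SYNOPSIS chain.

```perl
use strict;
use warnings;

# Hypothetical stopwords; substitute a list suited to your own documents.
my @custom_words = qw( The a an of Walrus );

# Stopwords become the keys of the hash; every value is set to 1.
# Keys are lowercased on the assumption that tokens arrive lowercased.
my %stoplist = map { lc($_) => 1 } @custom_words;

# The hashref would then be passed to the constructor:
# my $stopalizer = KinoSearch::Analysis::Stopalizer->new(
#     stoplist => \%stoplist,
# );
```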


SEE ALSO

Lingua::StopWords


COPYRIGHT

Copyright 2005-2006 Marvin Humphrey


LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch version 0.15.
