zensols.nlparse.stopword

This namespace provides ways of filtering stop word tokens.

To avoid the double negative in function names, go words are defined as the complement of the stop words within a vocabulary: any token that is not a stop word is a go word. Functions like meaningful-word? tell whether or not a token is a stop word, where stop words are defined to be:

  • stop words (from a predefined list)
  • punctuation
  • numbers
  • non-alphabetic characters
  • URLs
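The criteria above can be sketched as a single predicate. This is a minimal, self-contained illustration, not the library's implementation: it assumes a token is a map with a :text key, and the names example-stop-words and stop-word? and the tiny word list are hypothetical.

```clojure
(require '[clojure.string :as s])

;; Hypothetical stand-in for the predefined stop word list.
(def example-stop-words #{"a" "an" "the" "of" "and" "is"})

(defn stop-word?
  "Return true if TOKEN (assumed to be a map with a :text key) matches any
  of the stop word criteria. Each clause mirrors one bullet; some overlap."
  [{:keys [text]}]
  (boolean
   (or (contains? example-stop-words (s/lower-case text)) ; predefined list
       (re-matches #"\p{Punct}+" text)                    ; punctuation
       (re-matches #"[-+]?\d+([.,]\d+)*" text)            ; numbers
       (re-matches #"\P{Alpha}+" text)                    ; only non-alphabetic characters
       (re-matches #"(?i)(https?|ftp)://\S+" text))))     ; URLs
```

Because the checks are joined with or, a token is rejected as soon as any one criterion matches; a go word is then simply a token for which this predicate returns false.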

*stopword-config*

dynamic

Configuration for filtering stop words.

Keys

  • :post-tags POS tags for go words (see namespace docs)
  • :word-form-fn function applied to the token in go-word-form; for example, if it is #(-> % :lemma s/lower-case), lemmatization is used (e.g. Running -> run)
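Since the config var is dynamic, a caller can presumably override it for the extent of a single call with binding. The sketch below does not call the library; it defines a hypothetical *example-config* and word-form that model only the :word-form-fn key, and assumes tokens are maps with :text and :lemma keys.

```clojure
(require '[clojure.string :as s])

;; Hypothetical stand-in for *stopword-config*; only :word-form-fn is modeled.
(def ^:dynamic *example-config*
  {:word-form-fn #(-> % :text s/lower-case)})

(defn word-form
  "Apply the currently bound :word-form-fn to TOKEN."
  [token]
  ((:word-form-fn *example-config*) token))

(def token {:text "Running" :lemma "run"})

;; Default: lower-cased surface text, i.e. "running".
(word-form token)

;; Rebind only within this body to use the lower-cased lemma, i.e. "run".
(binding [*example-config* (assoc *example-config*
                                  :word-form-fn #(-> % :lemma s/lower-case))]
  (word-form token))
```

Using binding rather than redefining the var keeps the change local to the current thread and the body of the form.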

go-word-form

(go-word-form token)

Canonical string form of a token, as used for word counts. The form is produced by the :word-form-fn of *stopword-config*.

go-word-forms

(go-word-forms tokens)

Filter tokens with go-word? and return the form of each remaining token as produced by go-word-form.
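The filter-then-map behavior can be sketched with core sequence functions. The predicate and form function here are hypothetical simplifications (a tiny stop word set plus an alphabetic-only check, and the lower-cased lemma as the form), and the :text and :lemma token keys are assumptions.

```clojure
(require '[clojure.string :as s])

;; Hypothetical simplified criteria: a tiny stop list plus an alphabetic check.
(def stop-words #{"the" "a" "of"})

(defn sketch-go-word? [{:keys [text]}]
  (and (re-matches #"\p{Alpha}+" text)
       (not (contains? stop-words (s/lower-case text)))))

(defn sketch-word-form [token]
  (-> token :lemma s/lower-case))

(defn sketch-go-word-forms
  "Keep only go words, then map each survivor to its word form."
  [tokens]
  (->> tokens
       (filter sketch-go-word?)
       (map sketch-word-form)))

(sketch-go-word-forms
 [{:text "The" :lemma "the"}
  {:text "parsers" :lemma "parser"}
  {:text "ran" :lemma "run"}
  {:text "quickly" :lemma "quickly"}
  {:text "." :lemma "."}])
;; => ("parser" "run" "quickly")
```

Filtering before mapping means the form function only runs on tokens that survive, and the result stays lazy.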

go-word?

(go-word? token)

Return whether a token is a go word, that is, not a stop word.