zensols.nlparse.stopword
This namespace provides ways of filtering stop word tokens.
To avoid the double negative in function names, go words are defined to be the complement of a vocabulary with a stop word list. Functions like meaningful-word? tell whether or not a token is a stop word. Stop words are defined to be:
- stopwords (predefined list)
- punctuation
- numbers
- non-alphabetic characters
- URLs
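The categories above can be illustrated with a small standalone predicate. This is only a sketch and not the library's implementation: the names stopwords, stop-token? and go-token? are hypothetical, the stop word set is a made-up sample, and the regular expressions are one assumed way of recognizing each category on a plain string.

```clojure
(require '[clojure.string :as s])

;; Hypothetical sample list; the library's predefined list is not shown here.
(def stopwords #{"the" "a" "an" "of" "and" "is"})

(defn stop-token?
  "Return true when the string text falls into any of the stop word categories."
  [text]
  (boolean
   (or (contains? stopwords (s/lower-case text)) ; predefined list
       (re-matches #"\p{Punct}+" text)           ; punctuation
       (re-matches #"\d+([.,]\d+)*" text)        ; numbers
       (re-find #"^https?://" text)              ; URLs
       (not (re-matches #"\p{Alpha}+" text)))))  ; non-alphabetic characters

;; A go word is the complement of a stop word, which avoids the double negative.
(def go-token? (complement stop-token?))

(map go-token? ["The" "," "42" "http://example.com" "running"])
;; => (false false false false true)
```

Defining the positive predicate with complement keeps call sites readable: (filter go-token? words) rather than (remove stop-token? words).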
*stopword-config*
dynamic
Configuration for filtering stop words.
Keys
- :post-tags POS tags for go words (see namespace docs)
- :word-form-fn function run on the token in go-word-form; for example, if it is #(-> % :lemma s/lower-case), then lemmatization is used (e.g. Running -> run)
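Because the configuration is a dynamic var, it can be overridden for the extent of a call with binding. The sketch below shows the general pattern with stand-ins: *sketch-config* and sketch-word-form are hypothetical names, the default of lower-casing :text is an assumption made for the example, and tokens are assumed to be maps with :text and :lemma keys.

```clojure
(require '[clojure.string :as s])

;; Hypothetical stand-in for the library's dynamic configuration var.
(def ^:dynamic *sketch-config*
  {:word-form-fn #(-> % :text s/lower-case)})

(defn sketch-word-form
  "Apply the currently configured :word-form-fn to a token map."
  [token]
  ((:word-form-fn *sketch-config*) token))

;; Assumed token shape: a map with :text and :lemma keys.
(def token {:text "Running" :lemma "run"})

;; With the (assumed) default, the surface text is lower-cased.
(sketch-word-form token)
;; => "running"

;; Rebinding the config switches to the lemmatized form for this call only.
(binding [*sketch-config* {:word-form-fn #(-> % :lemma s/lower-case)}]
  (sketch-word-form token))
;; => "run"
```

The binding is thread-local and is undone when the form exits, so other callers keep seeing the root configuration.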
go-word-forms
(go-word-forms tokens)
Filter tokens per go-word? and return their form based on go-word-form.
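The description amounts to a filter followed by a map. A minimal sketch of that shape follows, under stated assumptions: the sketch- prefixed functions are hypothetical stand-ins for go-word?, go-word-form and go-word-forms, tokens are assumed to be maps with :lemma and :stopword keys, and the word form is assumed to be the lower-cased lemma.

```clojure
(require '[clojure.string :as s])

(defn sketch-go-word?
  "Hypothetical predicate: a token is a go word when it is not flagged as a stop word."
  [token]
  (not (:stopword token)))

(defn sketch-go-word-form
  "Hypothetical word form: the lower-cased lemma of the token."
  [token]
  (-> token :lemma s/lower-case))

(defn sketch-go-word-forms
  "Keep only go words and return their word forms."
  [tokens]
  (->> tokens
       (filter sketch-go-word?)
       (map sketch-go-word-form)))

(sketch-go-word-forms
 [{:text "The" :lemma "the" :stopword true}
  {:text "dogs" :lemma "dog" :stopword false}
  {:text "Running" :lemma "run" :stopword false}])
;; => ("dog" "run")
```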