zensols.nlparse.feature.word

Feature utility functions for tokens and words.

dictionary-feature-metas

(dictionary-feature-metas)
(dictionary-feature-metas lang-codes)

dictionary-features

(dictionary-features tokens)
(dictionary-features tokens lang-codes)

Dictionary features include the in/out-of-vocabulary ratio. The lang-codes parameter is a hash set of two-letter language code strings (see zensols.nlparse.wordlist/in-word-list?) to look up; it defaults to en for English.

See zensols.nlparse.wordlist/word-list-locales
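
As a rough illustration of what an in-vocabulary ratio means, the following is a minimal sketch in plain Clojure. It does not call this library: toy-word-list and in-vocab-ratio are hypothetical names, and the real function consults the word lists in zensols.nlparse.wordlist and may compute the ratio differently.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for a real per-language word list.
(def toy-word-list #{"the" "cat" "sat" "on" "mat"})

(defn in-vocab-ratio
  "Fraction of words found in word-list (case-insensitive),
  or 0 when words is empty."
  [words word-list]
  (if (empty? words)
    0
    (/ (count (filter #(contains? word-list (str/lower-case %)) words))
       (count words))))

;; Five of the six words are in the toy list ("teh" is not).
(in-vocab-ratio ["The" "cat" "sat" "on" "teh" "mat"] toy-word-list)
;; => 5/6
```

The out-of-vocabulary ratio in this sketch would simply be one minus the value above.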

token-feature-metas

(token-feature-metas)

Metadata for token-features.

token-features

(token-features panon tokens)

Return features computed from panon over all tokens. The following features are provided:

  • :utterance-length The character length of the utterance.
  • :mention-count Number of mentions in the utterance.
  • :sent-count Number of sentences in the utterance.
  • :token-count Total tokens across all sentences.
  • :token-average-length Average character length of all tokens.
  • :stopword-count Number of stop words in the utterance.
  • :is-question Whether the last token across all sentences indicates a question.
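
To make a few of these features concrete, here is a small sketch over a plain vector of token strings. It is not the library's implementation: toy-stopwords and toy-token-features are hypothetical, the real function works on panon and token data rather than bare strings, and treating a trailing "?" as the question test is an assumption.

```clojure
;; Hypothetical stop word set; the library's own list will differ.
(def toy-stopwords #{"the" "is" "a" "on"})

(defn toy-token-features
  "Compute a subset of the features listed above from a vector of
  token strings."
  [tokens]
  (let [n (count tokens)]
    {:token-count n
     ;; mean character length, guarded against an empty input
     :token-average-length (if (zero? n)
                             0
                             (/ (reduce + (map count tokens)) (double n)))
     ;; a set used as a predicate keeps only the tokens it contains
     :stopword-count (count (filter toy-stopwords tokens))
     ;; assumption: a question is signalled by a final "?" token
     :is-question (= "?" (last tokens))}))

(toy-token-features ["is" "the" "cat" "on" "the" "mat" "?"])
```

For this input the sketch reports seven tokens, four stop words, and a true :is-question.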

wordnet-feature-metas

(wordnet-feature-metas)

wordnet-features

(wordnet-features word)
(wordnet-features word pos-tag)

Get WordNet-derived features for word.