zensols.nlparse.feature.word
Feature utility functions for tokens and words.
dictionary-feature-metas
(dictionary-feature-metas)
(dictionary-feature-metas lang-codes)
See dictionary-features.
dictionary-features
(dictionary-features tokens)
(dictionary-features tokens lang-codes)
Dictionary features include the in/out-of-vocabulary ratio. The lang-codes parameter is a hash set of two-letter language code strings (see zensols.nlparse.wordlist/in-word-list?) to look up; it defaults to en
for English.
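As a rough illustration of what an in-vocabulary ratio measures, here is a minimal sketch. It is not the library's implementation: the function in-vocab-ratio and the set toy-word-list are hypothetical names, tokens are assumed to be plain strings, and the word list is a small in-memory set rather than the lookup done by zensols.nlparse.wordlist/in-word-list?.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for a real word list (assumption for illustration).
(def toy-word-list #{"the" "cat" "sat"})

(defn in-vocab-ratio
  "Fraction of tokens found in word-list, compared case-insensitively.
  Tokens are assumed to be plain strings. Returns 0 for no tokens."
  [tokens word-list]
  (if (empty? tokens)
    0
    (/ (count (filter #(contains? word-list (str/lower-case %)) tokens))
       (count tokens))))

;; Two of the three tokens are in the toy list.
(in-vocab-ratio ["The" "cat" "zzyx"] toy-word-list)
;; => 2/3
```

The out-of-vocabulary ratio is then one minus this value under the same assumptions.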
token-features
(token-features panon tokens)
Return token features for panon for all tokens. The following features are given:
- :utterance-length The character length of the utterance.
- :mention-count Number of mentions in the utterance.
- :sent-count Number of sentences in the utterance.
- :token-count Total tokens across all sentences.
- :token-average-length Average character length of all tokens.
- :stopword-count Number of stop words in the utterance.
- :is-question Whether or not the last token across all sentences is a question.
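Several of the features listed above can be sketched over a flat sequence of token strings. This is only an approximation under stated assumptions: simple-token-features and toy-stopwords are hypothetical names, the real function works on the panon parse annotation rather than bare strings, and :is-question is assumed here to mean that the last token is a question mark.

```clojure
;; Hypothetical stop word set (assumption for illustration).
(def toy-stopwords #{"the" "is" "a" "of"})

(defn simple-token-features
  "Compute a few token-level features from a sequence of token strings.
  Assumes lower-cased tokens and treats a trailing \"?\" as a question."
  [tokens]
  (let [n (count tokens)]
    {:token-count n
     ;; Total characters divided by token count; 0 when there are no tokens.
     :token-average-length (if (zero? n)
                             0
                             (/ (reduce + (map count tokens)) n))
     ;; A set acts as a predicate, so filter keeps only stop words.
     :stopword-count (count (filter toy-stopwords tokens))
     :is-question (= "?" (last tokens))}))

(simple-token-features ["is" "the" "cat" "home" "?"])
;; => {:token-count 5, :token-average-length 13/5,
;;     :stopword-count 2, :is-question true}
```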
wordnet-feature-metas
(wordnet-feature-metas)
wordnet-features
(wordnet-features word)
(wordnet-features word pos-tag)
Get features generated from WordNet for word.
- word the word to look up
- pos-tag a wordnet pos tag (see zensols.nlparse.wordnet/pos-tags)