Token Normalizers and Mappers List

This package provides a simple yet robust way to generate a stream of string tokens using a TokenNormalizer, as described in the parsing documentation (please read that first).

The full list of token normalizers and mappers is given below. The API is designed to be easy to extend, so you can create your own using the configuration factory API.

  • TokenNormalizer: Base token extractor that returns tuples of tokens and their normalized versions.

  • TokenMapper: Abstract class used to transform token tuples generated from TokenNormalizer.normalize.

  • MapTokenNormalizer: A normalizer that applies a sequence of TokenMappers to transform the normalized token text.

  • SplitTokenMapper: Splits the normalized text on a per token basis with a regular expression.

  • LemmatizeTokenMapper: Lemmatizes tokens and optionally removes entity stop words.

  • FilterTokenMapper: Filters tokens based on token (spaCy) attributes.

  • SubstituteTokenMapper: Replace a string in normalized token text.

  • LambdaTokenMapper: Use a lambda expression to map a token tuple.
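
The relationship between a normalizer and its mappers can be illustrated with a small, self-contained sketch. This is not the package's actual API: the `Sketch*` class names, the `map_tokens` method, and the use of plain strings in place of spaCy tokens are all assumptions made for illustration. It only shows the general idea of a normalizer emitting (token, normalized text) tuples and passing them through a sequence of mappers.

```python
import re
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical tuple shape: (original token, normalized text).
TokenTuple = Tuple[str, str]


class SketchMapper:
    """Hypothetical stand-in for an abstract token mapper."""

    def map_tokens(self, tuples: Iterable[TokenTuple]) -> Iterator[TokenTuple]:
        raise NotImplementedError


class SketchSplitMapper(SketchMapper):
    """Split the normalized text of each token on a regular expression."""

    def __init__(self, pattern: str):
        self.regex = re.compile(pattern)

    def map_tokens(self, tuples):
        for tok, norm in tuples:
            for part in self.regex.split(norm):
                if part:  # skip empty pieces produced by the split
                    yield (tok, part)


class SketchSubstituteMapper(SketchMapper):
    """Replace a pattern in the normalized text of each token."""

    def __init__(self, pattern: str, replacement: str):
        self.regex = re.compile(pattern)
        self.replacement = replacement

    def map_tokens(self, tuples):
        for tok, norm in tuples:
            yield (tok, self.regex.sub(self.replacement, norm))


class SketchLambdaMapper(SketchMapper):
    """Apply an arbitrary function to each token tuple."""

    def __init__(self, func: Callable[[TokenTuple], TokenTuple]):
        self.func = func

    def map_tokens(self, tuples):
        return map(self.func, tuples)


class SketchMapNormalizer:
    """Apply a sequence of mappers, in order, to the token tuples."""

    def __init__(self, mappers: List[SketchMapper]):
        self.mappers = mappers

    def normalize(self, tokens: Iterable[str]) -> List[TokenTuple]:
        # Start with the normalized text equal to the original token.
        stream: Iterable[TokenTuple] = ((t, t) for t in tokens)
        for mapper in self.mappers:
            stream = mapper.map_tokens(stream)
        return list(stream)


normalizer = SketchMapNormalizer([
    SketchSplitMapper(r"-"),                          # "New-York" -> "New", "York"
    SketchLambdaMapper(lambda t: (t[0], t[1].lower())),  # lowercase normalized text
    SketchSubstituteMapper(r"^big$", "large"),        # "big" -> "large"
])
result = normalizer.normalize(["New-York", "is", "BIG"])
print(result)
# [('New-York', 'new'), ('New-York', 'york'), ('is', 'is'), ('BIG', 'large')]
```

Because each mapper only consumes and produces token tuples, the order of the mappers matters: in this sketch the substitution only matches because the lowercasing step runs before it.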