zensols.nlparse.tok-re

This namespace extends the NER system so that arbitrary regular expressions can be added through the Stanford TokensRegex API.

It takes a sequence of regular expressions and entity metadata as input and writes a file in the format the TokensRegex API consumes to tag entities.

This is an example of the output.

item

(item content label & opts)

Create an item that becomes one pattern (line) in the Stanford CoreNLP regular expression definition file, with a regex generated from content and tagged with the NER label.

The opts parameter takes the following keys:

:lem-min-len minimum item utterance length needed to turn on lemmatization for the last token (defaults to -1), for example:
- 2: if the string is 2 or more characters long, lemmatize the last token
- 0: always lemmatize
- -1: never lemmatize
:case-min-tok must have at least N tokens to turn on case sensitivity (defaults to -1), for example:
- 2: if there are 1 or 2 tokens, make it case sensitive
- 1: if there is only one token, make it case sensitive
- 0: always case sensitive
- -1: always case insensitive
:conj-regexp? add an and|& regex so that either form of the conjunction matches, defaults to true
:first-det-chop? chop off ‘the’ at the beginning of the item utterance, defaults to true
:is-regexp? if true, write the regular expression verbatim instead of generating one from the utterance-like form
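
The :lem-min-len thresholds listed above can be read as a single predicate. The following is a minimal sketch of that reading only; lemmatize-last? is a hypothetical helper that is not part of this namespace, and it assumes the utterance length is counted in characters, as the "2" example suggests.

```clojure
;; Hypothetical helper, NOT part of zensols.nlparse.tok-re.
;; It only restates the documented :lem-min-len rule as a predicate.
(defn lemmatize-last?
  "Return true when the last token of `utterance` would be lemmatized
  under the given `lem-min-len` threshold (assumed to count characters)."
  [utterance lem-min-len]
  (and (>= lem-min-len 0)                      ; a negative threshold never lemmatizes
       (>= (count utterance) lem-min-len)))    ; 0 is satisfied by every string

(lemmatize-last? "dogs" 2)   ; => true,  4 characters meets the threshold of 2
(lemmatize-last? "a" 2)      ; => false, 1 character is below the threshold
(lemmatize-last? "a" 0)      ; => true,  0 always lemmatizes
(lemmatize-last? "dogs" -1)  ; => false, -1 never lemmatizes
```

The negative check comes first so that the default of -1 turns lemmatization off regardless of the utterance.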

parse-features

(parse-features feature-string)

write-regex-files

(write-regex-files regex-output-file features-output-file items)

Write all items to the Stanford token regular expression file regex-output-file, and all possible features to features-output-file.
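
A rough usage sketch of item together with write-regex-files might look like the following. It assumes the library is on the classpath, that the opts are passed as trailing keyword/value pairs, and that the output arguments accept java.io.File objects; the utterances, labels, and file names are made up for illustration.

```clojure
;; Sketch only: requires the zensols NLP parse library on the classpath.
(require '[clojure.java.io :as io]
         '[zensols.nlparse.tok-re :as tok-re])

;; Illustrative items; the labels and utterances are hypothetical.
(def items
  [;; lemmatize the last token when the utterance has 2 or more characters
   (tok-re/item "the united nations" "ORGANIZATION" :lem-min-len 2)
   ;; rely on the defaults, including :conj-regexp? for and|&
   (tok-re/item "johnson and johnson" "COMPANY")])

;; Hypothetical output locations for the regex and feature files.
(tok-re/write-regex-files (io/file "token-regex.txt")
                          (io/file "token-regex-features.txt")
                          items)
```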

NLP Parsing and Feature Creation 0.1.6