zensols.annotate.gate

Wrapper for the Gate natural language processing annotation utility. This small wrapper makes the following easier:

  • Annotating Documents
  • Creating and Storing Documents
  • Creating Annotation Schemas

*corpus-name*

dynamic

The default corpus name when creating a Gate data store.

annotate-document

(annotate-document start end label doc)
(annotate-document start end label features doc)

Annotate document doc with an entity of type label over the half-open character interval [start, end). If features is given, it supplies additional entity metadata.
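The half-open interval can be illustrated with a minimal sketch. The helper span-of, the sample text, and the "Person" label are hypothetical and not part of this namespace. The library calls sit in a comment form because they need zensols.annotate.gate on the classpath, and passing the label as a string is an assumption.

```clojure
(ns example.annotate
  (:require [clojure.string :as str]))

;; Hypothetical helper (not part of zensols.annotate.gate): find the
;; half-open character interval [start, end) of the first occurrence of
;; `phrase` in `text`, or nil when the phrase does not occur.
(defn span-of [text phrase]
  (when-let [start (str/index-of text phrase)]
    [start (+ start (count phrase))]))

(def text "Barack Obama was born in Hawaii.")

;; Each call yields a [start end] pair usable as annotation offsets.
(span-of text "Barack Obama")
(span-of text "Hawaii")

;; Evaluate at a REPL with the library available.
(comment
  (require '[zensols.annotate.gate :as gate])
  (let [doc (gate/create-document text "example-doc")
        [start end] (span-of text "Barack Obama")]
    ;; Assumption: the label is given as a string.
    (gate/annotate-document start end "Person" doc)))
```

Because end is exclusive, (subs text start end) returns exactly the annotated phrase.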

annotation-schema

(annotation-schema label)
(annotation-schema label options)

Create an annotation schema (i.e. entity) label. If options is given, it provides additional feature schema metadata. See annotation-schema-from-resource.

annotation-schema-from-resource

(annotation-schema-from-resource resource)

Create an annotation schema from the contents of resource (see the Gate docs). See annotation-schema.

configure-plugins

(configure-plugins)
(configure-plugins plugins)

Configure Gate plugins. The no-arg default configures the Alignment plugin.

create-document

(create-document text)
(create-document text name)

Create a document with raw text. You can annotate the returned document with annotate-document.

initialize

(initialize)

Initialize the Gate system. This is called when this namespace is loaded.

retrieve-documents

(retrieve-documents store-dir)

Retrieve Gate documents as maps that were stored by a human annotator or by store-documents. The data is read from the file system directory store-dir.

This returns a lazy sequence of maps that have the following keys:

  • :document The gate.Document instance (if you really need it)
  • :name The name of the document
  • :content The text string content of the document.
  • :annotation A map of annotation maps that have the following keys:
    • :text The text of the annotation
    • :label The label of the annotation (*type* in Gate parlance)
    • :annotations The character interval of the annotation text (start/end node in Gate parlance)
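As a rough sketch of consuming the returned maps, the following uses only the :name and :content keys listed above. The helper summarize, the sample data, and the store path are hypothetical. Passing store-dir as a java.io.File is an assumption.

```clojure
(ns example.retrieve)

;; Hypothetical helper: reduce one retrieved document map to its name
;; and content length, using only the :name and :content keys.
(defn summarize [doc]
  {:name (:name doc)
   :chars (count (:content doc))})

;; Stand-in data with the documented shape, so the helper can be tried
;; without a Gate data store.
(def sample-docs
  [{:name "doc-1" :content "Barack Obama was born in Hawaii."}
   {:name "doc-2" :content "Gate stores annotations."}])

(map summarize sample-docs)

;; Evaluate at a REPL with the library available.
(comment
  (require '[zensols.annotate.gate :as gate]
           '[clojure.java.io :as io])
  ;; Assumption: store-dir is a java.io.File; the path is made up.
  (->> (gate/retrieve-documents (io/file "target/gate-store"))
       (take 5) ; the sequence is lazy, so only a prefix is consumed
       (map summarize)))
```

Since the result is lazy, wrap it in doall or vec if the documents must all be read before further processing.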

store-documents

(store-documents store-dir documents & {:keys [resources]})

Create a Gate data store that can be opened by the Gate GUI. This creates a directory structure at store-dir and populates it with documents that were created with create-document. The name of the corpus is taken from *corpus-name*.

Important: this first deletes the store-dir directory if it exists.
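Because the target directory is deleted first, a caller may want a guard before storing. The sketch below is one possible approach, not part of this namespace: safe-to-store?, the path, the document texts, and the corpus name are all hypothetical, and passing store-dir as a java.io.File is an assumption.

```clojure
(ns example.store
  (:require [clojure.java.io :as io]))

;; Hypothetical guard (not part of the library): true only when nothing
;; exists at `dir`, so store-documents would have nothing to delete.
(defn safe-to-store? [dir]
  (not (.exists (io/file dir))))

;; Evaluate at a REPL with the library available.
(comment
  (require '[zensols.annotate.gate :as gate])
  (let [store-dir (io/file "target/gate-store") ; made-up path
        docs [(gate/create-document "First text." "doc-1")
              (gate/create-document "Second text." "doc-2")]]
    (when (safe-to-store? store-dir)
      ;; *corpus-name* is dynamic, so it can be rebound for this call.
      (binding [gate/*corpus-name* "example-corpus"]
        (gate/store-documents store-dir docs)))))
```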

Keys