zensols.deepnlp.model package¶
Submodules¶
zensols.deepnlp.model.facade module¶
A facade that supports updating natural language model features.
- class zensols.deepnlp.model.facade.LanguageModelFacade(config, config_factory=<property object>, progress_bar=True, progress_bar_cols='term', executor_name='executor', writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, predictions_dataframe_factory_class=<class 'zensols.deeplearn.result.pred.PredictionsDataFrameFactory'>, suppress_transformer_warnings=True)[source]¶
Bases: ModelFacade
A facade that supports updating natural language model features. This facade also provides logging configuration for NLP domains for this package.
This class makes assumptions about the naming of the embedding layer vectorizer. See embedding.
- __init__(config, config_factory=<property object>, progress_bar=True, progress_bar_cols='term', executor_name='executor', writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, predictions_dataframe_factory_class=<class 'zensols.deeplearn.result.pred.PredictionsDataFrameFactory'>, suppress_transformer_warnings=True)¶
- property doc_parser: FeatureDocumentParser¶
Return the document parser associated with the language vectorizer manager.
- See: language_vectorizer_manager
- property embedding: str¶
The embedding layer.
Important: the naming of the embedding parameter is that which is given in the configuration without the _layer postfix. For example, embedding is glove_50_embedding for:
- glove_50_embedding is the name of the GloveWordEmbedModel
- glove_50_feature_vectorizer is the name of the WordVectorEmbeddingFeatureVectorizer
- glove_50_embedding_layer is the name of the WordVectorEmbeddingLayer
- Parameters:
embedding – the kind of embedding, e.g. glove_50_embedding
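The naming convention above can be sketched in plain Python. This is only an illustration: the helper `related_section_names` is hypothetical and not part of zensols.deepnlp, and it assumes every embedding name ends in `_embedding`, as in the glove_50 example.

```python
def related_section_names(embedding: str) -> dict:
    """Derive the configuration names that share a prefix with ``embedding``.

    Hypothetical helper for illustration only; it mirrors the glove_50
    example and is not an API of zensols.deepnlp.
    """
    suffix = '_embedding'
    if not embedding.endswith(suffix):
        raise ValueError(f"expected a name ending in '{suffix}': {embedding}")
    prefix = embedding[:-len(suffix)]
    return {
        # the embedding model has the same name as the ``embedding`` value
        'model': embedding,
        # the feature vectorizer replaces the suffix
        'vectorizer': f'{prefix}_feature_vectorizer',
        # the layer appends the ``_layer`` postfix
        'layer': f'{embedding}_layer',
    }


names = related_section_names('glove_50_embedding')
```

Here `names['layer']` is the only name carrying the `_layer` postfix, which is why the `embedding` value is given without it.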
- property enum_feature_ids: Set[str]¶
spaCy enumeration encodings applied token-wise to widen the input embeddings.
- get_max_word_piece_len()[source]¶
Get the longest word piece length for the first found configured transformer embedding feature vectorizer.
- Return type:
- get_transformer_vectorizer()[source]¶
Return the first found transformer token vectorizer.
- Return type:
- property language_vectorizer_manager: FeatureVectorizerManager¶
Return the language vectorizer manager for the class.
- class zensols.deepnlp.model.facade.LanguageModelFacadeConfig(manager_name, attribs, embedding_attribs)[source]¶
Bases: object
Configuration that defines which language configuration data to access and how to access it. Note that this data reflects how you have the model configured per the configuration file. Parameter examples are given per the Movie Review example.
- __init__(manager_name, attribs, embedding_attribs)¶
- attribs: Set[str]¶
The language attributes (all levels: token, document etc), such as enum, count, dep etc.
zensols.deepnlp.model.sequence module¶
Utility classes for mapping, aggregating and collating sequence (e.g. NER) labels.
- class zensols.deepnlp.model.sequence.BioSequenceAnnotationMapper(begin_tag='B', in_tag='I', out_tag='O')[source]¶
Bases: object
Matches feature documents/tokens with spaCy document/tokens and entity labels.
- __init__(begin_tag='B', in_tag='I', out_tag='O')¶
- map(classes, docs)[source]¶
Map BIO entities and documents to pairings as annotations.
- Parameters:
classes (Tuple[List[str]]) – a tuple of lists, each list containing the class of the token in BIO format
docs (Tuple[FeatureDocument]) – the feature documents to assign labels
- Return type:
- Returns:
a tuple of annotation instances, each coupling a label, feature token and spaCy token
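To make the BIO input concrete, the following is a minimal sketch of grouping BIO classes into labeled token spans. It is not the library's implementation: the function `bio_spans` is hypothetical, and it assumes tags of the form `B-PER` / `I-PER` with every other tag treated as outside an entity.

```python
from typing import List, Optional, Tuple


def bio_spans(tags: List[str], begin_tag: str = 'B',
              in_tag: str = 'I') -> List[Tuple[str, int, int]]:
    """Group BIO tags into ``(label, start, end)`` spans, ``end`` exclusive.

    Hypothetical helper; tags that start with neither ``begin_tag`` nor
    ``in_tag`` (such as ``O``) are treated as outside any entity.
    """
    spans: List[Tuple[str, int, int]] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition('-')
        # an inside tag continues the open span only if the label matches
        continues = prefix == in_tag and label is not None and name == label
        if not continues:
            if label is not None:
                # close the span that was open before this token
                spans.append((label, start, i))
                label = None
            if prefix in (begin_tag, in_tag):
                # open a new span; a stray inside tag is treated as a begin
                label, start = name, i
    if label is not None:
        # close a span that runs to the end of the sentence
        spans.append((label, start, len(tags)))
    return spans


spans = bio_spans(['O', 'B-PER', 'I-PER', 'O', 'B-LOC'])
```

Each resulting span indexes into the token sequence of the corresponding document, which is the information an annotation needs to pair a label with its tokens.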
- class zensols.deepnlp.model.sequence.SequenceAnnotation(label, doc, tokens)[source]¶
Bases: PersistableContainer, Dictable
An annotation of a pair matching feature and spaCy tokens.
- __init__(label, doc, tokens)¶
- doc: FeatureDocument¶
The feature document associated with this annotation.
- property sent: FeatureSentence¶
The sentence containing the annotated tokens.
- property token_matches: Tuple[FeatureToken, Token]¶
Pairs that map each feature token to its matching spaCy token. This is useful for annotating spaCy documents.
- tokens: Tuple[FeatureToken]¶
The tokens annotated with label.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, short=False)[source]¶
Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
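The "create a dict recursively, then format it" behavior described for write() can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the Dictable implementation: `write_dict`, `Span` and `Annotation` are hypothetical names, the four-space indent is an arbitrary choice, and `excludes` only plays the role that _DICTABLE_WRITE_EXCLUDES is described as having.

```python
import sys
from dataclasses import asdict, dataclass
from io import StringIO
from typing import AbstractSet, TextIO


def write_dict(data: dict, depth: int = 0, writer: TextIO = sys.stdout,
               excludes: AbstractSet[str] = frozenset()) -> None:
    """Write ``data`` one key per line, indenting nested dicts one level."""
    indent = ' ' * (depth * 4)
    for key, value in data.items():
        if key in excludes:
            # excluded attributes are skipped at every nesting level
            continue
        if isinstance(value, dict):
            writer.write(f'{indent}{key}:\n')
            write_dict(value, depth + 1, writer, excludes)
        else:
            writer.write(f'{indent}{key}: {value}\n')


@dataclass
class Span:
    start: int
    end: int


@dataclass
class Annotation:
    label: str
    span: Span
    cache: str = 'internal'


buf = StringIO()
# dataclasses.asdict converts nested dataclasses recursively
write_dict(asdict(Annotation('PER', Span(1, 3))), writer=buf,
           excludes={'cache'})
output = buf.getvalue()
```

Because the exclusion set is passed down through every recursive call, it applies to the whole object graph, which matches the note that the attribute must be set on all descendants.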
- class zensols.deepnlp.model.sequence.SequenceDocumentAnnotation(doc, sequence_anons)[source]¶
Bases: Dictable
Contains token annotations for a FeatureDocument as a tuple of SequenceAnnotation.
- __init__(doc, sequence_anons)¶
- doc: FeatureDocument¶
The feature document associated with this annotation.
- sequence_anons: Tuple[SequenceAnnotation]¶
The annotations for the respective doc.
- property spacy_doc: Doc¶
The spaCy document associated with this annotation.
- property token_matches: Tuple[str, FeatureToken, Token]¶
Triple of matching feature token to token mapping in the form (label, feature token, spacy token). This is useful for annotating spaCy documents.
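The relationship between the per-annotation pairs and the per-document triples can be sketched as a simple flattening step. The class `Anon` and the function `flatten` below are hypothetical stand-ins (plain strings take the place of feature and spaCy tokens), so this shows the shape of the data rather than how the library computes it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Anon:
    """Hypothetical stand-in for an annotation: a label plus token pairs."""
    label: str
    # (feature token, spaCy token) pairs; strings stand in for both
    token_matches: Tuple[Tuple[str, str], ...]


def flatten(anons: Tuple[Anon, ...]) -> Tuple[Tuple[str, str, str], ...]:
    """Prefix every token pair with the label of its annotation."""
    return tuple((a.label, ftok, stok)
                 for a in anons
                 for ftok, stok in a.token_matches)


anons = (
    Anon('PER', (('Barack', 'Barack'), ('Obama', 'Obama'))),
    Anon('LOC', (('Hawaii', 'Hawaii'),)),
)
triples = flatten(anons)
```

Carrying the label in each triple means a consumer can annotate tokens in a single pass without looking the annotation up again.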
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, short=False)[source]¶
Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable