zensols.amr package¶
Subpackages¶
- zensols.amr.wlk package
- Submodules
- zensols.amr.wlk.amr_similarity module
- zensols.amr.wlk.graph_helpers module
- zensols.amr.wlk.score module
- Module contents
Submodules¶
zensols.amr.align module¶
Add alignments to AMR sentences.
- class zensols.amr.align.AmrAlignmentPopulator(aligner, add_missing_metadata=True, raise_exception=True)[source]¶
Bases:
object
Adds alignment markers to AMR graphs.
- __init__(aligner, add_missing_metadata=True, raise_exception=True)¶
- zensols.amr.align.create_amr_align_component(nlp, name, aligner)[source]¶
Create an instance of
AmrAlignmentPopulator
.
zensols.amr.alignpop module¶
Includes classes to add alignments to AMR graphs using an ISI formatted alignment string.
- class zensols.amr.alignpop.AlignmentPopulator(graph, alignment_key='alignments')[source]¶
Bases:
object
Adds alignments from an ISI formatted string.
- __init__(graph, alignment_key='alignments')¶
-
alignment_key:
str
= 'alignments'¶ The key in the graph’s metadata with the ISI formatted alignment string.
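As a rough illustration of what an ISI formatted alignment string might look like, the sketch below splits one into token index and graph path pairs. The sample string, the `parse_isi` helper and the handling of a trailing `.r` as a role (edge) marker are assumptions made for illustration; none of them are part of zensols.amr.

```python
from typing import List, Tuple


def parse_isi(alignments: str) -> List[Tuple[int, str, bool]]:
    """Split an ISI style alignment string into (token index, path, is_role)."""
    parsed = []
    for item in alignments.split():
        # each item is assumed to be '<token index>-<dot separated path>'
        tok, path = item.split('-', 1)
        # a trailing '.r' is assumed to mark an alignment to a role (edge)
        is_role = path.endswith('.r')
        if is_role:
            path = path[:-2]
        parsed.append((int(tok), path, is_role))
    return parsed


# hypothetical metadata value stored under the 'alignments' key
pairs = parse_isi('0-1.1 1-1 3-1.2.r')
```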
- class zensols.amr.alignpop.PathAlignment(index, path, alignment_str, alignment, triple)[source]¶
Bases:
object
An alignment that contains the path and alignment to node, or an edge for role alignments.
- __init__(index, path, alignment_str, alignment, triple)¶
-
alignment:
Union
[Alignment
,RoleAlignment
]¶ The alignment of the node or edge.
zensols.amr.amrlib module¶
AMR parser and generator model implementations using amrlib
.
- class zensols.amr.amrlib.AmrlibGenerator(name=None, installer=None, alternate_path=None, use_tense=True)[source]¶
Bases:
_AmrlibModelContainer
,AmrGenerator
- __init__(name=None, installer=None, alternate_path=None, use_tense=True)¶
- generate(doc)[source]¶
Generate a sentence from a spaCy document.
- Parameters:
doc (
AmrDocument
) – the spaCy document used to generate the sentence
- Return type:
- Returns:
a text sentence for each respective sentence in
doc
zensols.amr.annotate module¶
AMR annotated corpus utility classes.
- class zensols.amr.annotate.AnnotatedAmrDocument(sents, path=None, doc_id=None)[source]¶
Bases:
AmrDocument
An AMR document containing a unique document identifier from the corpus.
- __init__(sents, path=None, doc_id=None)¶
Initialize.
- Parameters:
sents (Tuple[AmrSentence, …]) – the document’s sentences
path (Optional[Path, …]) – the path to file containing the Penman notation sentence graphs used in
sents
model – the model to initialize
AmrSentence
when sents
is a list of string Penman graphs
- property body: AmrDocument¶
The sentences that make up the body of the document.
- doc_id: str = None¶
The unique document identifier.
- static get_feature_sentences(feature_doc, amr_docs)[source]¶
Return the feature sentences of those that refer to the AMR sentences, but starting from the AMR side.
- Parameters:
feature_doc (
AmrFeatureDocument
) – the document having the FeatureSentence
instances
amr_docs (
Iterable
[AmrDocument
]) – the documents having the sentences, such as summary
- Return type:
- property sections: Tuple[AnnotatedAmrSectionDocument]¶
The sections of the document.
- property summary: AmrDocument¶
The sentences that make up the summary of the document.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_summary=True, include_sections=True, include_body=False, include_amr=True, **kwargs)[source]¶
Write the contents of this instance to
writer
using indentation depth
.
- Parameters:
include_summary (
bool
) – whether to include the summary sentences
include_sections – whether to include the sectional sentences
include_body (
bool
) – whether to include the body sentences
include_amr (
bool
) – whether to include the super class AMR output
kwargs – arguments given to the super class’s write, such as
limit_sent=0
to effectively disable it
- class zensols.amr.annotate.AnnotatedAmrDocumentStash(installer, doc_dir, corpus_cache_dir, id_name, id_regexp=re.compile('([^.]+)\\\\.(\\\\d+)'), sent_type_col='snt-type', sent_type_mapping=None, doc_parser=None, amr_sent_model=None, amr_sent_class=<class 'zensols.amr.annotate.AnnotatedAmrSentence'>, amr_doc_class=<class 'zensols.amr.annotate.AnnotatedAmrDocument'>, doc_annotator=None)[source]¶
Bases:
Stash
A factory stash that creates
AnnotatedAmrDocument
instances of annotated documents from a single text file containing a corpus of AMR Penman formatted graphs.
- __init__(installer, doc_dir, corpus_cache_dir, id_name, id_regexp=re.compile('([^.]+)\\\\.(\\\\d+)'), sent_type_col='snt-type', sent_type_mapping=None, doc_parser=None, amr_sent_model=None, amr_sent_class=<class 'zensols.amr.annotate.AnnotatedAmrSentence'>, amr_doc_class=<class 'zensols.amr.annotate.AnnotatedAmrDocument'>, doc_annotator=None)¶
- amr_doc_class¶
The class used to create new instances of
AmrDocument
.
alias of
AnnotatedAmrDocument
- amr_sent_class¶
The class used to create new instances of
AmrSentence
.
alias of
AnnotatedAmrSentence
-
amr_sent_model:
str
= None¶ The model set in the
AmrSentence
initializer.
- property corpus_df: pd.DataFrame¶
A data frame containing the identifier, text of the sentences and the annotated sentence types of the corpus.
- property corpus_doc: AmrDocument¶
A document containing all the sentences from the corpus.
- delete(name=None)[source]¶
Delete the resource for data pointed to by
name
or the entire resource if name
is not given.
-
doc_annotator:
AnnotationFeatureDocumentParser
= None¶ Used to annotate AMR documents if not
None
.
- property doc_counts: pd.DataFrame¶
A data frame of the counts by unique identifier.
-
doc_dir:
Path
¶ The directory containing sentence type mapping for documents or
None
if there are no sentence type alignments.
-
doc_parser:
FeatureDocumentParser
= None¶ If provided, AMR metadata is added to sentences, which is needed by the AMR populator.
- export_sent_type_template(doc_id, out_path=None)[source]¶
Create a CSV file that contains the sentences and other metadata of an annotated document used to annotate sentence types.
-
id_regexp:
Pattern
= re.compile('([^.]+)\\.(\\d+)')¶ The regular expression used to create the
id_name
if it exists. The regular expression must have two groups: the first is the ID and the second is the sentence index.
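A minimal sketch of how the default pattern separates an identifier into its two groups, using only the standard library. The sample ID `liu-example.0` is illustrative, and whether the stash applies the pattern with `match` or another method is an assumption.

```python
import re

# the default pattern shown above: group 1 is the ID, group 2 the sentence index
id_regexp = re.compile(r'([^.]+)\.(\d+)')

# 'liu-example.0' is a sample sentence identifier
match = id_regexp.match('liu-example.0')
doc_id, sent_idx = match.group(1), int(match.group(2))
```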
-
installer:
Installer
¶ The installer containing the AMR annotated corpus.
-
sent_type_mapping:
Dict
[str
,str
] = None¶ Used to map what’s in the corpus to a value of
SentenceType
if given.
- class zensols.amr.annotate.AnnotatedAmrFeatureDocumentFactory(doc_parser, remove_wiki_attribs=False, remove_alignments=False, doc_id=0)[source]¶
Bases:
object
Creates instances of
AmrFeatureDocument
each with an AmrFeatureDocument.amr
instance of AnnotatedAmrDocument
and AmrFeatureSentence.amr
with AnnotatedAmrSentence
. This is created using a JSON file or a list of dict
.
The keys of each dictionary are the case-insensitive enumeration values of
SentenceType
. Keys id
and comment
are the unique document identifier and a comment that is added to the AMR sentence metadata. Both are optional, and if id
is missing, doc_id
is used.
An example JSON creates a document with ID
ex1
, a comment
metadata, one SentenceType.SUMMARY
and two SentenceType.BODY
sentences:
[{ "id": "ex1", "comment": "very short", "body": "The man ran to make the train. He just missed it.", "summary": "A man got caught in the door of a train he just missed." }]
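The layout of the example JSON can be inspected with the standard library alone. This sketch only separates the optional `id` and `comment` keys from the sentence type keys; it does not parse any AMR graphs, and the `RESERVED` set and `by_type` name are illustrative rather than part of the library.

```python
import json

raw = '''[{
    "id": "ex1",
    "comment": "very short",
    "body": "The man ran to make the train. He just missed it.",
    "summary": "A man got caught in the door of a train he just missed."
}]'''

# keys other than 'id' and 'comment' name a sentence type (case-insensitive)
RESERVED = {'id', 'comment'}
for doc in json.loads(raw):
    doc_id = doc.get('id')          # optional
    comment = doc.get('comment')    # optional
    by_type = {k.lower(): v for k, v in doc.items()
               if k.lower() not in RESERVED}
```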
- See:
- __init__(doc_parser, remove_wiki_attribs=False, remove_alignments=False, doc_id=0)¶
-
doc_id:
int
= 0¶ An instance based enumerated value is used, which is enumerated for each document missing an ID.
-
doc_parser:
FeatureDocumentParser
¶ The feature document parser used to create the Penman formatted graphs.
- from_data(data)[source]¶
Create AMR documents based on the type of
data
.
- from_dicts(data)[source]¶
Parse and create AMR documents from a list of
dict
.
- Parameters:
- See:
- Return type:
- from_file(input_file)[source]¶
Read annotated documents from a file and create AMR documents.
- Parameters:
input_file (
Path
) – the JSON file to read the doc text
- Return type:
- from_str(sents, stype)[source]¶
Parse and create AMR sentences from a string.
- Parameters:
sents (
str
) – the string containing a space separated list of sentences
stype (
SentenceType
) – the sentence type assigned to each new AMR sentence
- Return type:
-
remove_alignments:
bool
= False¶ Whether to remove text-to-graph alignments in all sentence graphs after parsing.
-
remove_wiki_attribs:
bool
= False¶ Whether to remove the
:wiki
roles from all sentence graphs after parsing.
- to_annotated_doc(doc)[source]¶
Clone
doc.amr
into an AnnotatedAmrDocument
.
- Parameters:
doc – the document to convert to an
AnnotatedAmrDocument
- Return type:
- Returns:
a feature document with a new
amr
set to a new AnnotatedAmrDocument
, which is a new instance if doc
isn’t an annotated AMR document
- to_annotated_sent(sent, sent_type=None)[source]¶
Clone
sent.amr
into an AnnotatedAmrSentence
.
- Parameters:
sent (
AmrFeatureSentence
) – the sentence to convert to an AnnotatedAmrSentence
sent_type (
SentenceType
) – the type of sentence to set on
- Return type:
- Returns:
a feature sentence with a new
amr
set to a new AnnotatedAmrSentence
, which is a new instance if sent
isn’t an annotated AMR sentence
- class zensols.amr.annotate.AnnotatedAmrFeatureDocumentStash(feature_doc_factory, doc_stash, amr_stash, coref_resolver=None)[source]¶
Bases:
PrimeableStash
A stash that persists
AmrFeatureDocument
instances using AMR annotations from AnnotatedAmrDocumentStash
as a source. The key set and exists behavior is identical between the two stashes. However, the instances of AmrFeatureDocument
(and its constituent sentences) are generated from the AMR annotated sentences (i.e. from the ::snt metadata field).
This stash keeps the persistence of the
AmrDocument
separate from the instance of the feature document to avoid persisting it twice across doc_stash
and amr_stash
. On load, these two data structures are stitched together.
- __init__(feature_doc_factory, doc_stash, amr_stash, coref_resolver=None)¶
-
amr_stash:
AnnotatedAmrDocumentStash
¶ The stash used to persist
AmrDocument
instances that are stitched together with the AmrFeatureDocument
(see class docs).
- clear()[source]¶
Delete all data from the stash.
Important: Exercise caution with this method, of course.
-
coref_resolver:
CoreferenceResolver
= None¶ Adds coreferences between the sentences of the document.
- delete(name=None)[source]¶
Delete the resource for data pointed to by
name
or the entire resource if name
is not given.
-
doc_stash:
Stash
¶ The stash used to persist instances of
AmrFeatureDocument
. It does not persist the AmrDocument
(see class docs).
- exists(doc_id)[source]¶
Return
True
if data with key name
exists.
Implementation note: This
Stash.exists()
method is very inefficient and should be overridden.
- Return type:
-
feature_doc_factory:
AmrFeatureDocumentFactory
¶ Creates
AmrFeatureDocument
from AmrDocument
instances.
- class zensols.amr.annotate.AnnotatedAmrSectionDocument(sents, path=None, section_sents=())[source]¶
Bases:
AmrDocument
Represents a section from an annotated document.
- __init__(sents, path=None, section_sents=())¶
Initialize.
- Parameters:
sents (Tuple[AmrSentence, …]) – the document’s sentences
path (Optional[Path, …]) – the path to file containing the Penman notation sentence graphs used in
sents
model – the model to initialize
AmrSentence
when sents
is a list of string Penman graphs
- section_sents: Tuple[AmrSentence] = ()¶
The sentences that make up the section title (usually just one).
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
- Parameters:
limit_sent – the max number of sentences to write
add_sent_id – add the sentence ID to the output
include_metadata – whether to add graph metadata to the output
text_ony – whether to only write the sentence text rather than the AMR Penman notation
- class zensols.amr.annotate.AnnotatedAmrSentence(data, model, doc_sent_idx, sent_type)[source]¶
Bases:
AmrSentence
A sentence containing its index in the document and the functional type.
- __init__(data, model, doc_sent_idx, sent_type)[source]¶
Initialize based on the kind of data given.
- Parameters:
data (
Union
[str
,Graph
]) – either a Penman formatted string graph, an already parsed graph or an AmrFailure
for upstream issues
model (
str
) – the model to use for encoding and decoding
- class zensols.amr.annotate.CorpusWriter(anon_doc_factory)[source]¶
Bases:
Writable
Writes
AmrDocument
instances to a file. To use, first add documents either directly with docs
or using add()
.
- __init__(anon_doc_factory)¶
- add(data)[source]¶
Add document(s) to this corpus writer. This uses the
AnnotatedAmrFeatureDocumentFactory.from_data()
and adds the instances of AmrFeatureDocument
.
- anon_doc_factory: AnnotatedAmrFeatureDocumentFactory¶
The factory used to create the
AmrFeatureDocument
instances that are in turn used to format the graphs as Penman text output.
- property docs: List[AmrDocument]¶
The documents to write.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write the contents of the documents added to this writer to
writer
as flat formatted Penman AMRs.
- Parameters:
depth (
int
) – the starting indentation depth
writer (
TextIOBase
) – the writer to dump the content of this writable
- class zensols.amr.annotate.FileCorpusWriter(anon_doc_factory, input_file, output_file)[source]¶
Bases:
CorpusWriter
A corpus writer that parses a JSON file for its source input, then uses the configured AMR parser to generate the graphs.
- __init__(anon_doc_factory, input_file, output_file)¶
- input_file: Path¶
The JSON file as formatted per
AnnotatedAmrFeatureDocumentFactory
.
- output_file: Path¶
The file path to write the AMR sentences.
- class zensols.amr.annotate.SentenceType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases:
Enum
The type of sentence in relation to its function in the document.
- BODY = 'b'¶
- FIGURE = 'f'¶
- FIGURE_TITLE = 'ft'¶
- OTHER = 'o'¶
- SECTION = 's'¶
- SUMMARY = 'a'¶
- TITLE = 't'¶
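The enumeration can be looked up either by its short value or by its name. The class below is a local stand-in that mirrors the members listed above so that the sketch runs without the library; that the short values are what a corpus column stores is an assumption.

```python
from enum import Enum


class SentenceType(Enum):
    """Local stand-in that mirrors the values listed above."""
    BODY = 'b'
    FIGURE = 'f'
    FIGURE_TITLE = 'ft'
    OTHER = 'o'
    SECTION = 's'
    SUMMARY = 'a'
    TITLE = 't'


# lookup by value (assumed to be what a column such as 'snt-type' holds)
by_value = SentenceType('a')
# lookup by case-insensitive name, as with keys such as "summary"
by_name = SentenceType['summary'.upper()]
```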
zensols.amr.app module¶
Adapts amrlib
in the Zensols framework.
- class zensols.amr.app.Application(log_config, config_factory, doc_parser, anon_doc_stash, dumper)[source]¶
Bases:
BaseApplication
Parse and plot AMR graphs in Penman notation.
- __init__(log_config, config_factory, doc_parser, anon_doc_stash, dumper)¶
- anon_doc_stash: Stash¶
The annotated document stash.
- config_factory: ConfigFactory¶
Application context used by programmatic clients of this class.
- count(input_file)[source]¶
Provide counts on an AMR corpus file.
- Parameters:
input_file (
Path
) – a file with newline separated AMR Penman graphs
- doc_parser: FeatureDocumentParser¶
The feature document parser for the app. This is not done via the application config to allow overriding of the defaults.
- dumper: Dumper¶
Plots and writes AMR content in human readable formats.
- parse(text)[source]¶
Parse the natural language text to AMR graphs.
- Parameters:
text (
str
) – the sentence(s) to parse
- class zensols.amr.app.BaseApplication(log_config)[source]¶
Bases:
object
Base class for applications.
- __init__(log_config)¶
-
log_config:
LogConfigurator
¶ Used to update logging levels based on the action that was run.
- class zensols.amr.app.Format(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases:
Enum
Format output type for AMR corpus documents.
- csv = 3¶
- json = 2¶
- txt = 1¶
- class zensols.amr.app.ScorerApplication(log_config, config_factory, doc_factory)[source]¶
Bases:
BaseApplication
Creates parsed files for comparing, and scores.
- __init__(log_config, config_factory, doc_factory)¶
- config_factory: ConfigFactory¶
Application context.
- doc_factory: AmrFeatureDocumentFactory¶
Creates
AmrFeatureDocument
from AmrDocument
instances.
- parse_penman(input_file, output_dir=None, meta_keys='id,snt', limit=None)[source]¶
Parse Penman sentence(s) by
id
and write a parsed AMR.
- score(input_gold, input_parsed=None, output_dir=None, output_format=Format.csv, limit=None, methods=None)[source]¶
Score AMRs by ID and dump the results to a file or directory.
- Parameters:
input_gold (Path) – the file containing the gold AMR graphs
input_parsed (Path) – the file containing the parser output graphs, defaults to
gold-parsed.txt
output_dir (Path) – the output directory
output_format (Format) – the output format
limit (int) – the maximum number of items to process
methods (str) – a comma separated list of scoring methods
- Return type:
ScoreSet
- class zensols.amr.app.TrainerApplication(log_config, config_factory)[source]¶
Bases:
BaseApplication
Trains and evaluates models.
- __init__(log_config, config_factory)¶
-
config_factory:
ConfigFactory
¶ Application context.
- train(dry_run=False)[source]¶
Continue fine tuning on additional corpora.
- Parameters:
dry_run (
bool
) – don’t do anything; just act like it
zensols.amr.cli module¶
Command line entry point to the application.
- class zensols.amr.cli.ApplicationFactory(*args, **kwargs)[source]¶
Bases:
ApplicationFactory
zensols.amr.container module¶
Extensions of zensols.nlp
feature containers.
- class zensols.amr.container.AmrFeatureDocument(sents, text=None, spacy_doc=None, amr=None, coreference_relations=None)[source]¶
Bases:
FeatureDocument
A feature document that contains an
amr
graph.
- __init__(sents, text=None, spacy_doc=None, amr=None, coreference_relations=None)¶
- add_coreferences(to_populate)[source]¶
Add
coreference_relations
to to_populate
using this instance’s coreferences. Note that from_sentences()
, from_amr_sentences()
, get_overlapping_document()
and clone() already do this.
- property amr: AmrDocument¶
The AMR representation of the document.
- clone(cls=None, **kwargs)[source]¶
- Parameters:
kwargs – if copy_spacy is
True
, the spaCy document is copied to the clone in addition to the parameters passed to the new clone initializer
- Return type:
- property coreference_relations: Tuple[Tuple[Tuple[int, str], ...], ...]¶
The coreferences tuple sets between the sentences of the document:
((<sentence index 1>, <variable 1>), (<sentence index 2>, <variable 2>)...)
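To make the nested tuple shape concrete, the sketch below builds a small value of the same type and groups the variables by sentence index. The relations and variable names are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# two hypothetical relations; each holds (sentence index, variable) pairs
coreference_relations: Tuple[Tuple[Tuple[int, str], ...], ...] = (
    ((0, 'm'), (1, 'h')),
    ((0, 't'), (1, 'i')),
)

# collect the variables that take part in a coreference, per sentence
vars_by_sent: Dict[int, List[str]] = defaultdict(list)
for relation in coreference_relations:
    for sent_idx, variable in relation:
        vars_by_sent[sent_idx].append(variable)
```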
- from_amr_sentences(amr_sents)[source]¶
Like
from_sentences()
, return a new document with FeatureDocument
sentences sync’d with AmrSentence
.
- Parameters:
amr_sents (
Iterable
[AmrSentence
]) – the sentences that will make up the returned document
- Return type:
- Returns:
a new document composed of
amr_sent
- See:
- from_sentences(sents, deep=False)[source]¶
Return a new cloned document using the given sentences.
- Parameters:
sents (
Iterable
[FeatureSentence
]) – the sentences to add to the new cloned document
deep (
bool
) – whether or not to clone the sentences
- See:
- See:
- Return type:
- get_overlapping_span(span, inclusive=True)[source]¶
Return a feature span that includes the lexical scope of
span
.
- Return type:
- property relation_set: RelationSet¶
The relations in the contained document as a set of relations.
- sync_amr_sents()[source]¶
Copy
amr
sentences to each respective AmrFeatureSentence.amr
. This is necessary when the AmrDocument
is updated with new sentences that need to percolate down to the feature sentences.
- class zensols.amr.container.AmrFeatureSentence(tokens, text=None, spacy_span=None, amr=None)[source]¶
Bases:
FeatureSentence
A sentence that holds an instance of
AmrSentence
.
- __init__(tokens, text=None, spacy_span=None, amr=None)¶
- property alignments: Dict[Tuple[str, str, str], Tuple[FeatureToken, ...]]¶
The tokens only, as returned from
indexed_alignments
.
- property amr: AmrSentence¶
The AMR representation of the sentence.
- clone(cls=None, **kwargs)[source]¶
Clone an instance of this token container.
- Parameters:
cls (
Type
) – the type of the new instance
kwargs – arguments to add as attributes to the clone
- Return type:
- Returns:
the cloned instance of this instance
- property indexed_alignments: Dict[Tuple[str, str, str], Tuple[Tuple[int, FeatureToken]], ...]¶
The graph alignments as a triple-to-token dict. The values are tuples of the 0-index token offset and the feature token pointed to by the alignment.
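The relationship between the two alignment properties can be sketched with plain data. Here strings stand in for FeatureToken instances and the triples are hypothetical; the comprehension only shows how dropping the offsets yields the shape described for `alignments`, not how the library computes it.

```python
from typing import Dict, Tuple

Triple = Tuple[str, str, str]

# hypothetical data: plain strings stand in for FeatureToken instances
indexed_alignments: Dict[Triple, Tuple[Tuple[int, str], ...]] = {
    ('r', ':instance', 'run-02'): ((2, 'ran'),),
    ('m', ':instance', 'man'): ((1, 'man'),),
}

# drop the token offsets to get the shape described for `alignments`
alignments = {triple: tuple(tok for _, tok in pairs)
              for triple, pairs in indexed_alignments.items()}
```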
- class zensols.amr.container.Reference(sent, variable)[source]¶
Bases:
ReferenceObject
A multi-document coreference target, which points to a node in an AMR graph.
- __init__(sent, variable)¶
-
sent:
AmrFeatureSentence
¶ The sentence containing the reference.
- property short¶
A short string describing the reference.
- property subtree: AmrSentence¶
The subtree of the sentence containing the target as an
AmrFeatureSentence
.
- property triple: Tuple[str, str, str]¶
The AMR triple of
(source relation target)
of the reference.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a
Writable
or as a Dictable
. If class attribute _DICTABLE_WRITABLE_DESCENDANTS
is set as True
, then use the write()
method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict
recursively using asdict()
, then formatting the output.
If the attribute
_DICTABLE_WRITE_EXCLUDES
is set, those attributes are removed from what is written in the write()
method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (
int
) – the starting indentation depth
writer (
TextIOBase
) – the writer to dump the content of this writable
- class zensols.amr.container.ReferenceObject[source]¶
Bases:
PersistableContainer
,Dictable
A base class for reference and relation classes.
- __init__()¶
- class zensols.amr.container.ReindexVariableFeatureDocumentDecorator[source]¶
Bases:
FeatureDocumentDecorator
Reindex AMR concept variables to be unique across all sentences.
- __init__()¶
- class zensols.amr.container.Relation(seq_id, references)[source]¶
Bases:
ReferenceObject
A relation makes up a set of references across multiple sentences of a document. This is what Christopher Manning calls a cluster.
- __init__(seq_id, references)¶
- property by_sent: Dict[AmrFeatureSentence, Reference]¶
An association from sentences to their references.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a
Writable
or as a Dictable
. If class attribute _DICTABLE_WRITABLE_DESCENDANTS
is set as True
, then use the write()
method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict
recursively using asdict()
, then formatting the output.
If the attribute
_DICTABLE_WRITE_EXCLUDES
is set, those attributes are removed from what is written in the write()
method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (
int
) – the starting indentation depth
writer (
TextIOBase
) – the writer to dump the content of this writable
- class zensols.amr.container.RelationSet(relations)[source]¶
Bases:
ReferenceObject
All coreference relations for a given document.
- __init__(relations)¶
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a
Writable
or as a Dictable
. If class attribute _DICTABLE_WRITABLE_DESCENDANTS
is set as True
, then use the write()
method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict
recursively using asdict()
, then formatting the output.
If the attribute
_DICTABLE_WRITE_EXCLUDES
is set, those attributes are removed from what is written in the write()
method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (
int
) – the starting indentation depth
writer (
TextIOBase
) – the writer to dump the content of this writable
zensols.amr.coref module¶
Wrap the amr_coref
module for AMR co-reference resolution.
- class zensols.amr.coref.CoreferenceResolver(installer, stash=<factory>, use_multithreading=True, robust=True)[source]¶
Bases:
object
Resolve coreferences in AMR graphs.
- __init__(installer, stash=<factory>, use_multithreading=True, robust=True)¶
-
installer:
Installer
¶ The
amr_coref
module’s coreference module installer.
- property model: Inference¶
The
amr_coref
coreference model.
-
robust:
bool
= True¶ Whether to robustly deal with exceptions in the coreference model. If
True
, instances of AmrFailure
are stored in the stash and empty coreferences used for caught errors.
zensols.amr.corpprep module¶
Prepare and compile AMR corpora for training.
- class zensols.amr.corpprep.AmrReleaseCorpusPrepper(name, installer, transform_ascii=True, remove_wiki=True, shuffle=False)[source]¶
Bases:
CorpusPrepper
Writes the AMR 3 release corpus files.
- __init__(name, installer, transform_ascii=True, remove_wiki=True, shuffle=False)¶
- class zensols.amr.corpprep.CorpusPrepper(name, installer, transform_ascii=True, remove_wiki=True, shuffle=False)[source]¶
Bases:
Dictable
Subclasses know where to download, install, and split the corpus into train and dev data sets. Each subclass generates only the training and dev/validation datasets, which is an aspect of AMR parser and text generation models. Both the input and output are Penman encoded AMR graphs.
- __init__(name, installer, transform_ascii=True, remove_wiki=True, shuffle=False)¶
-
installer:
Installer
¶ The location and decompression details.
- abstract read_docs(target)[source]¶
Read and return tuples of where to write the output of the sentences of the corresponding document.
- Parameters:
target (
Path
) – the location of where to copy the finished files
- Return type:
- Returns:
tuples of the dataset name and the read document
-
remove_wiki:
bool
= True¶ Whether to remove
:wiki
relations, which are not predicted by the model and negatively affect validation set performance while training.
- class zensols.amr.corpprep.CorpusPrepperManager(name, preppers, stage_dir, shuffle=True, key_splits=None)[source]¶
Bases:
Dictable
Aggregates and applies corpus prepare instances.
- __init__(name, preppers, stage_dir, shuffle=True, key_splits=None)¶
-
key_splits:
Path
= None¶ The AMR id values from the sentence metadata for each split are written to this JSON file if specified.
- prepare()[source]¶
Download, install and write the corpus to disk from all
preppers
. The output of each is placed in the corresponding training
or dev
directories in stage_dir
. The data is then ready for AMR parser and generator trainers.
-
preppers:
Tuple
[CorpusPrepper
,...
]¶ The corpus prepare instances used to create the training files.
-
shuffle:
bool
= True¶ Whether to shuffle the AMR sentences before writing to the target directory. This is used to shuffle across all corpora per split.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a
Writable
or as a Dictable
. If class attribute _DICTABLE_WRITABLE_DESCENDANTS
is set as True
, then use the write()
method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict
recursively using asdict()
, then formatting the output.
If the attribute
_DICTABLE_WRITE_EXCLUDES
is set, those attributes are removed from what is written in the write()
method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (
int
) – the starting indentation depth
writer (
TextIOBase
) – the writer to dump the content of this writable
- class zensols.amr.corpprep.SingletonCorpusPrepper(name, installer, transform_ascii=True, remove_wiki=True, shuffle=False, dev_portion=0.15)[source]¶
Bases:
CorpusPrepper
Prepares the corpus training files from a single AMR Penman encoded file.
- __init__(name, installer, transform_ascii=True, remove_wiki=True, shuffle=False, dev_portion=0.15)¶
-
dev_portion:
float
= 0.15¶ The portion of the dev/validation set in sentences of the single input file.
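One way to picture what a dev portion of 0.15 means is the split below. It is only a sketch: the `split_dev` helper, taking the dev sentences from the end of the list, and the rounding are all assumptions, since the actual splitting logic is not described here.

```python
import random
from typing import List, Sequence, Tuple


def split_dev(sents: Sequence[str], dev_portion: float = 0.15,
              shuffle: bool = False,
              seed: int = 0) -> Tuple[List[str], List[str]]:
    """Return (training, dev) lists; the last ``dev_portion`` goes to dev."""
    items = list(sents)
    if shuffle:
        # a seeded generator keeps the hypothetical split reproducible
        random.Random(seed).shuffle(items)
    n_dev = round(len(items) * dev_portion)
    n_train = len(items) - n_dev
    return items[:n_train], items[n_train:]


# 20 placeholder sentences yield 17 training and 3 dev sentences
train, dev = split_dev([f's{i}' for i in range(20)])
```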
zensols.amr.doc module¶
AMR container classes that fit a document/sentence hierarchy.
- class zensols.amr.doc.AmrDocument(sents, path=None, model=None)[source]¶
Bases:
PersistableContainer
,Writable
A document of AMR graphs, which is indexable and iterable.
- __init__(sents, path=None, model=None)[source]¶
Initialize.
- Parameters:
sents (
Iterable
[Union
[str
,Graph
,AmrSentence
]]) – the document’s sentences
path (
Optional
[Path
]) – the path to the file containing the Penman notation sentence graphs used in sents
model (
str
) – the model to initialize AmrSentence
when sents
is a list of string Penman graphs
- from_sentences(sents, deep=False)[source]¶
Return a new cloned document using the given sentences.
- Parameters:
sents (
Iterable
[AmrSentence
]) – the sentences to add to the new cloned document
deep (
bool
) – whether or not to clone the sentences
- See:
- Return type:
- classmethod from_source(source, transform_ascii=False, **kwargs)[source]¶
Return a new document created for
source
.
- Parameters:
source (
Union
[Path
,Installer
]) – either a file with a double newline separated list of AMR graphs or an installer that has a singleton path to a like file
transform_ascii (
bool
) – whether to replace non-ASCII characters with their ASCII equivalents (i.e. removes umlauts)
kwargs – additional keyword arguments given to the initializer of the document
- Return type:
- get_doc_id()[source]¶
Get the ID of the document from the first sentence’s ID, if there is one. For example, if the first sentence’s ID is
liu-example.0
, the string liu-example
is returned.
- property graph_string: str¶
The graph of all sentences with two newlines as a separator as a string in Penman format.
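The two-newline separator can be shown with plain strings. The Penman graphs below are hypothetical and are treated as opaque text; the sketch joins them with a blank line and splits them apart again, which assumes no graph contains a blank line itself.

```python
# two hypothetical Penman graphs, each preceded by its ::snt metadata
graphs = [
    '# ::snt The man ran.\n(r / run-02\n   :ARG0 (m / man))',
    '# ::snt He missed it.\n(m / miss-01\n   :ARG0 (h / he)\n   :ARG1 (i / it))',
]

# join with a blank line between graphs, as the property describes
graph_string = '\n\n'.join(graphs)

# splitting on blank lines recovers the individual graphs
recovered = [g for g in graph_string.split('\n\n') if g.strip()]
```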
- path: Optional[Path, ...] = None¶
If set, the file the sentences were parsed from in Penman notation.
- reindex_variables()[source]¶
Reindexes all variables for sentences of an
AmrDocument
so all node variables are unique in the document.
- sents: Tuple[AmrSentence, ...]¶
The AMR sentences that make up the document.
- property text: str¶
The text of the natural language form of the document. This is the concatenation of all the sentence text.
- class zensols.amr.doc.AmrGeneratedDocument(sents, amr)[source]¶
Bases:
Writable
A sentence generated by the graph-to-text model.
- __init__(sents, amr)¶
zensols.amr.docfac module¶
Feature sentence and document utilities.
- class zensols.amr.docfac.AmrFeatureDocumentFactory(name, doc_parser, alignment_populator=None)[source]¶
Bases:
object
Creates
AmrFeatureDocument
from AmrDocument
instances.
- __init__(name, doc_parser, alignment_populator=None)¶
-
alignment_populator:
AmrAlignmentPopulator
= None¶ Adds the alignment markings.
-
doc_parser:
FeatureDocumentParser
¶ The document parser used to create
AmrFeatureDocument
instances.
- to_feature_doc(amr_doc, catch=False, add_metadata=False, add_alignment=False)[source]¶
Create an
AmrFeatureDocument
from an AmrDocument by parsing the snt
metadata with a FeatureDocumentParser
.
- Parameters:
add_metadata (
Union
[str
,bool
]) – add missing annotation metadata to amr_doc
parsed from spaCy if missing (see AmrParser.add_metadata()
) if True
and replace any previous metadata if this value is the string clobber
catch (
bool
) – if True
, catch exceptions, creating an AmrFailure
from each, and return them
- Return type:
Union
[AmrFeatureDocument
,Tuple
[AmrFeatureDocument
,List
[AmrFailure
]]]
- Returns:
an AMR feature document if
catch
is False
; otherwise, a tuple of a document with sentences that were successfully parsed and a list of any exceptions raised during the parsing
- class zensols.amr.docfac.EntityCopySpacyFeatureDocumentParser(config_factory, name, lang='en', model_name=None, token_feature_ids=<factory>, components=(), token_decorators=(), sentence_decorators=(), document_decorators=(), disable_component_names=None, token_normalizer=None, special_case_tokens=<factory>, doc_class=<class 'zensols.nlp.container.FeatureDocument'>, sent_class=<class 'zensols.nlp.container.FeatureSentence'>, token_class=<class 'zensols.nlp.tok.SpacyFeatureToken'>, remove_empty_sentences=None, reload_components=False, auto_install_model=False)[source]¶
Bases:
SpacyFeatureDocumentParser
Copy spaCy
ent_type_
named entity (NER) tags to FeatureToken
ent_
tags.
The AMR document’s metadata
ner_tags
is populated in AmrParser
from the spaCy document. But this document parser instance is configured with embedded entities turned off so whitespace delimited tokens match with the alignments.- __init__(config_factory, name, lang='en', model_name=None, token_feature_ids=<factory>, components=(), token_decorators=(), sentence_decorators=(), document_decorators=(), disable_component_names=None, token_normalizer=None, special_case_tokens=<factory>, doc_class=<class 'zensols.nlp.container.FeatureDocument'>, sent_class=<class 'zensols.nlp.container.FeatureSentence'>, token_class=<class 'zensols.nlp.tok.SpacyFeatureToken'>, remove_empty_sentences=None, reload_components=False, auto_install_model=False)¶
zensols.amr.docparser module¶
AMR document annotation.
- exception zensols.amr.docparser.AmrParseError(msg, sent=None)[source]¶
Bases:
AmrError
- __module__ = 'zensols.amr.docparser'¶
- class zensols.amr.docparser.AnnotationFeatureDocumentParser(name, delegate, token_decorators=(), sentence_decorators=(), document_decorators=(), token_feature_ids=<factory>, silencer=None, stash=None, hasher=<factory>, amr_parser=None, alignment_populator=None, coref_resolver=None, reparse=True, amr_doc_class=<class 'zensols.amr.container.AmrFeatureDocument'>, amr_sent_class=<class 'zensols.amr.container.AmrFeatureSentence'>)[source]¶
Bases:
CachingFeatureDocumentParser
A document parser that adds and further annotates AMR graphs. This has the advantage of avoiding a second AMR construction when annotating a graph with features (i.e. ent, POS tag, etc) because it uses (adapted) spaCy class’s normalized features. For this reason, use this class if your application needs such annotations.
This parses and populates AMR graphs as
AmrDocument
at the document level and an AmrSentence
at the sentence level using a zensols.nlp.FeatureDocument
.
This class will also recreate the AMR on the normalized text of the document. This is necessary since AMR parsing and alignment happen at the spaCy level, while token normalization happens at the
zensols.nlp
feature token level. Since spaCy does not allow for filtering tokens (i.e. stop words), there is no way to avoid a reparse.
However, if your application makes no modification to the document, a reparse is not needed and you should set
reparse
to False.
A consideration is that the spaCy adaptation module (
spacyadapt
) is not thoroughly tested and future updates might break it. If you are not comfortable using it, or cannot, use the spaCy pipeline by setting amr_default:doc_parser = amr_anon_doc_parser
in the application configuration and annotate the graph yourself.
The AMR graphs are optionally cached using a
Stash
when stash
is set.
Important: when using stash caching, only the
AmrDocument
is cached and not the entire feature document. This could lead to the documents and AMR graphs getting out of sync if both are cached. Use the clear()
method to clear the stash if ever in doubt.
A new instance of
AmrFeatureDocument
is returned.
- __init__(name, delegate, token_decorators=(), sentence_decorators=(), document_decorators=(), token_feature_ids=<factory>, silencer=None, stash=None, hasher=<factory>, amr_parser=None, alignment_populator=None, coref_resolver=None, reparse=True, amr_doc_class=<class 'zensols.amr.container.AmrFeatureDocument'>, amr_sent_class=<class 'zensols.amr.container.AmrFeatureSentence'>)¶
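The stash caching described in the class docs can be pictured as a cache keyed by a hash of the input text. The sketch below is only an illustration under that assumption: CachingParser is a hypothetical name, a plain dictionary stands in for a Stash, and the real hasher and key scheme may differ.

```python
import hashlib
from typing import Callable, Dict


class CachingParser:
    """Hypothetical cache wrapper: parse once per distinct text."""

    def __init__(self, parse: Callable[[str], str]):
        self._parse = parse
        self._stash: Dict[str, str] = {}  # stands in for a ``Stash``
        self.misses = 0

    def __call__(self, text: str) -> str:
        # key the cache on a digest of the text (an assumption)
        key = hashlib.sha256(text.encode('utf-8')).hexdigest()
        if key not in self._stash:
            self.misses += 1
            self._stash[key] = self._parse(text)
        return self._stash[key]

    def clear(self) -> None:
        """Drop all cached results, forcing the next call to reparse."""
        self._stash.clear()


parser = CachingParser(lambda text: f'(graph for: {text})')
first = parser('The boy sees.')
second = parser('The boy sees.')  # served from the cache
```

Only the parsed result is cached here, which is why a second cache holding related data could drift out of sync, and why clearing is the safe reset.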
-
alignment_populator:
AmrAlignmentPopulator
= None¶ Adds the alignment markings.
- amr_doc_class¶
The
FeatureDocument
class created to store zensols.amr.AmrDocument
instances.
alias of
AmrFeatureDocument
- amr_sent_class¶
The
FeatureSentence
class created to store zensols.amr.AmrSentence
instances.
alias of
AmrFeatureSentence
- annotate(doc)[source]¶
Parse and annotate a new AMR feature document using features from
doc
. Since the AMR document itself is not cached, using a separate document cache is necessary for caching/storage.
- Parameters:
doc (
FeatureDocument
) – the source feature document to parse into AMRs
key – the key used to cache the
AmrDocument
in stash
if provided (see class docs)
- Return type:
-
coref_resolver:
CoreferenceResolver
= None¶ Adds coreferences between the sentences of the document.
- parse(text, *args, **kwargs)[source]¶
Parse text or a text as a list of sentences.
- Parameters:
text (
str
) – either a string or a list of strings; if the former a document with one sentence will be created, otherwise a document is returned with a sentence for each string in the list
args – the arguments used to create the FeatureDocument instance
kwargs – the key word arguments used to create the FeatureDocument instance
- Return type:
-
reparse:
bool
= True¶ Reparse the normalized
FeatureSentence
text for each AMR sentence, which is necessary when tokens are removed (i.e. stop words). See the class docs.
- class zensols.amr.docparser.TokenAnnotationFeatureDocumentDecorator(name, feature_id, indexed=False, add_none=False, use_sent_index=True, method='attribute')[source]¶
Bases:
FeatureDocumentDecorator
Annotate features in AMR sentence graphs from indexes annotated from
AmrAlignmentPopulator
.
- __init__(name, feature_id, indexed=False, add_none=False, use_sent_index=True, method='attribute')¶
-
add_none:
bool
= False¶ Whether to add missing or empty values. This includes string values of
zensols.nlp.FeatureToken.NONE
.
-
method:
str
= 'attribute'¶ Where to add the data, which may be one of:
attribute
: add as a new attribute node using name
as the role and the value as the attribute constant
epi
: as epigraph data; however, the current Penman implementation assumes only alignments and the graph string will no longer be parsable
Otherwise, it uses the string to format a replacement node text using
target
as the previous/original node text and value
as the feature value text.
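The format-string case of method can be sketched as follows. This assumes the string is applied with Python's str.format using target and value as named fields, which the description suggests but does not spell out; replace_node_text is a hypothetical helper, not a package function.

```python
def replace_node_text(method: str, target: str, value: str) -> str:
    """Format a replacement node text from a ``method`` template.

    Assumption: ``method`` is a ``str.format`` template with the named
    fields ``target`` (original node text) and ``value`` (feature value).
    """
    return method.format(target=target, value=value)


# hypothetical template that appends a POS tag to the concept
new_text = replace_node_text('{target}_{value}', 'see-01', 'VBD')
```

Here new_text is `see-01_VBD`; a template that omits `{target}` would replace the node text entirely.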
-
name:
str
¶ The triple role (if
add_to_epi
is False
) used to label the edge between the token and the feature. Otherwise, this string is used in the epidata of the graph.
-
use_sent_index:
bool
= True¶ Whether to map alignments by (iterated) index position, or by using the per sentence index
FeatureToken
attribute i_sent
. Set this to False
if the FeatureDocumentParser
was configured with a token normalizer configured with embedding named entities turned off.
zensols.amr.domain module¶
Error and exception classes.
- exception zensols.amr.domain.AmrError(msg, sent=None)[source]¶
Bases:
APIError
Raised for package API errors.
- __annotations__ = {}¶
- __module__ = 'zensols.amr.domain'¶
- to_failure()[source]¶
Create an
AmrFailure
from this error.
- Return type:
- class zensols.amr.domain.AmrFailure(exception=None, thrower=None, traceback=None, message=None, sent=None)[source]¶
Bases:
Failure
A container class that describes an AMR graph creation or handling error.
- __init__(exception=None, thrower=None, traceback=None, message=None, sent=None)¶
- sent: str = None¶
The natural language sentence that caused the error (usually parsing).
- class zensols.amr.domain.Feature(feat_id, value)[source]¶
Bases:
FeatureMarker
zensols.amr.dumper module¶
Plot AMR graphs.
- class zensols.amr.dumper.Dumper(target_dir, add_doc_dir=True, write_text=True, sent_file_format='{sent.short_name}', add_description=True, front_text=None, width=79, overwrite_dir=False, overwrite_sent_file=False)[source]¶
Bases:
Dictable
Plots and writes AMR content in human readable formats.
- __init__(target_dir, add_doc_dir=True, write_text=True, sent_file_format='{sent.short_name}', add_description=True, front_text=None, width=79, overwrite_dir=False, overwrite_sent_file=False)¶
- clean()[source]¶
Remove the output directory if it exists.
- Return type:
- Returns:
whether the output directory existed
- dump_doc(doc, target_name=None)[source]¶
Dump the contents of the document to a directory. This includes the plots and graph strings of all sentences. This also includes a
doc.txt
file that has the graph strings and their sentence index.
- Parameters:
doc (
AmrDocument
) – the document to plot
- Return type:
- Returns:
the paths to each file that was generated
- plot_doc(doc, target_name=None)[source]¶
Create a plot for each AMR sentence as a graph. The file is generated with graphviz in a temporary space, then moved to the target directory.
If the directory doesn’t exist, it is created.
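The "generate in a temporary space, then move" pattern described above can be sketched with the standard library alone. The function name write_then_move is hypothetical, and plain text stands in for the file that graphviz would render.

```python
import shutil
import tempfile
from pathlib import Path


def write_then_move(content: str, target: Path) -> Path:
    """Write ``content`` in a temporary directory, then move it to ``target``."""
    # create the destination directory if it does not exist yet
    target.parent.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        tmp_file = Path(tmp) / target.name
        tmp_file.write_text(content)
        # only a fully written file ever appears at the target path
        shutil.move(str(tmp_file), str(target))
    return target
```

Writing to a scratch location first means a failed or interrupted render does not leave a partial file in the target directory.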
- plot_sent(sent, target_name=None)[source]¶
Create a plot of the AMR graph visually. The file is generated with graphviz in a temporary space, then moved to the target path.
- Parameters:
target_name (
str
) – the file name added to target_dir, or if None
, computed from the sentence text
- Return type:
- Returns:
the path(s) where the file(s) were generated
- render(cont, target_name=None)[source]¶
Create a PDF for an AMR document or sentence as a graph. The file is generated with graphviz in a temporary space, then moved to the target directory.
- See:
- See:
- Return type:
- Returns:
the path(s) where the file(s) were generated
- class zensols.amr.dumper.GraphvizDumper(target_dir, add_doc_dir=True, write_text=True, sent_file_format='{sent.short_name}', add_description=True, front_text=None, width=79, overwrite_dir=False, overwrite_sent_file=False, extension='pdf', attribs=<factory>)[source]¶
Bases:
Dumper
Dumps plots created by graphviz using the
dot
program.- __init__(target_dir, add_doc_dir=True, write_text=True, sent_file_format='{sent.short_name}', add_description=True, front_text=None, width=79, overwrite_dir=False, overwrite_sent_file=False, extension='pdf', attribs=<factory>)¶
zensols.amr.model module¶
AMR parsing spaCy pipeline component and sentence generator.
- class zensols.amr.model.AmrGenerator[source]¶
Bases:
object
A callable that generates natural language text from an AMR graph.
- See:
__call__()
- __init__()¶
- abstract generate(doc)[source]¶
Generate a sentence from the AMR graph
doc
.
- Parameters:
doc (
AmrDocument
) – the spaCy document used to generate the sentence
- Return type:
- Returns:
a document with the generated sentences
- class zensols.amr.model.AmrParser(model='noop', add_missing_metadata=True)[source]¶
Bases:
ComponentInitializer
Parses natural language into AMR graphs. It has the ability to change out different installed models in the same Python session.
- __init__(model='noop', add_missing_metadata=True)¶
- classmethod add_metadata(amr_sent, sent, clobber=False)[source]¶
Add missing annotation metadata parsed from spaCy if missing, which happens in the case of using the T5 AMR model.
- Parameters:
amr_sent (
AmrSentence
) – the sentence to populate
sent (
Span
) – the spaCy sentence used as the source
clobber (
bool
) – whether or not to overwrite any existing metadata fields
- See:
- annotate_amr(doc)[source]¶
Add an
amr
attribute to the spaCy document.
- Parameters:
doc (
Doc
) – the document to annotate
- static is_missing_metadata(amr_sent)[source]¶
Return whether
amr_sent
is missing annotated metadata. T5 model sentences only have the snt
metadata entry.
- Parameters:
amr_sent (
AmrSentence
) – the sentence to populate- See:
- Return type:
-
model:
str
= 'noop'¶ The
penman
AMR model to use when creating AmrSentence
instances, which is one of noop
or amr
. The first does not modify the graph but the latter normalizes out inverse relationships such as ARG*-of
.
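To make "normalizes out inverse relationships" concrete, here is a minimal sketch of deinverting a single triple. It is not the Penman implementation: deinvert is a hypothetical helper, and it ignores AMR roles that legitimately end in -of (such as :consist-of), which a real model has to treat as exceptions.

```python
from typing import Tuple

Triple = Tuple[str, str, str]  # (source, role, target)


def deinvert(triple: Triple) -> Triple:
    """Rewrite an inverted role such as ``:ARG0-of`` in its normal direction.

    Simplification: every role ending in ``-of`` is treated as inverted.
    """
    source, role, target = triple
    if role.endswith('-of'):
        # swap the end points and drop the ``-of`` suffix
        return (target, role[:-len('-of')], source)
    return triple


# "the boy who sees": b is the ARG0 of s
normalized = deinvert(('b', ':ARG0-of', 's'))
```

With the noop model the triple would be left as written; the amr model corresponds to applying this kind of normalization.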
zensols.amr.score module¶
Produces matching scores.
- class zensols.amr.score.AmrScoreParser(doc_parser, keep_keys=None)[source]¶
Bases:
object
Parses
AmrSentence
instances from the snt
metadata text string from a human annotated AMR. It then returns an instance that is later scored by a ScoreMethod
such as SmatchScoreCalculator
.
- __init__(doc_parser, keep_keys=None)¶
-
doc_parser:
FeatureDocumentParser
¶ The document parser used to generate the AMR. This should have sentence boundaries removed so only one
AmrSentence
is returned from the parse.
-
keep_keys:
Tuple
[str
] = None¶ The keys to keep/copy from the source
AmrSentence
.
- class zensols.amr.score.SmatchScoreCalculator(reverse_sents=False)[source]¶
Bases:
ScoreMethod
Computes the smatch scores of AMR sentences using the Smatch package.
Citation:
Shu Cai and Kevin Knight 2013. Smatch: an Evaluation Metric for Semantic Feature Structures. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 748–752, Sofia, Bulgaria. Association for Computational Linguistics.
- __init__(reverse_sents=False)¶
- document_smatch(gold, pred)[source]¶
Return the smatch score produced from the sentences as pairs from two documents.
- Return type:
- Returns:
a score with the precision, recall and F1
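The three values in the returned score relate to each other as in the sketch below. Smatch itself searches for the variable mapping that maximizes the number of matching triples; this hypothetical helper covers only the final arithmetic once that count is known.

```python
from typing import Tuple


def precision_recall_f1(matched: int, n_pred: int,
                        n_gold: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from triple counts.

    ``matched`` is the number of predicted triples that match gold triples
    under the best variable mapping (finding it is not shown here).
    """
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# 3 of 4 predicted triples match; the gold graph has 6 triples
p, r, f1 = precision_recall_f1(3, 4, 6)
```

In this example precision is 0.75, recall is 0.5 and F1 is 0.6.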
zensols.amr.sent module¶
AMR container classes that fit a document/sentence hierarchy.
- class zensols.amr.sent.AmrGeneratedSentence(text, clipped, amr)[source]¶
Bases:
Writable
A sentence generated by the graph-to-text model.
- __init__(text, clipped, amr)¶
-
amr:
AmrSentence
¶ The input sentence to the model used to predict
text
.
- class zensols.amr.sent.AmrSentence(data, model=None)[source]¶
Bases:
PersistableContainer
,Writable
Contains a sentence with an AMR graph and a Penman string version of the graph. Instances can be created with a Penman formatted string, an already parsed
Graph
or an AmrFailure
for any upstream issues.
Failures are accepted for situations where downstream APIs expect instances of this class, such as bulk processing. When this happens, the instance renders with an error message in the AMR metadata.
-
DEFAULT_MODEL:
ClassVar
[str
] = 'noop'¶ The default
penman
AMR model to use in the initializer, which is one of noop
or amr
. The first does not modify the graph but the latter normalizes out inverse relationships such as ARG*-of
.
- __init__(data, model=None)[source]¶
Initialize based on the kind of data given.
- Parameters:
data (
Union
[str
,Graph
,AmrFailure
]) – either a Penman formatted string graph, an already parsed graph or an AmrFailure
for upstream issues
model (
str
) – the model to use for encoding and decoding
- get_data()[source]¶
Return the
graph
if it is parsed, else return the graph_string
.
- property graph_only: str¶
Like
graph_string
but without metadata
- property graph_single_line: str¶
Like
graph_only
but return as a single one line string.
- invalidate_graph_string(check_graph=True)[source]¶
Call this when the graph changes so that the change is propagated to
graph_string
.
- iter_aligns(include_types=False)[source]¶
Return an iterator of the alignments of the graph as tuples. Each iteration is a tuple of the triple, the list of alignment indexes, and a tuple of bools indicating whether each index is a role alignment.
- Parameters:
include_types – whether to include types, which is the third element in each tuple, else that element is
None
- property tokenized_text: str¶
This is useful when it is necessary to force white space tokenization to match the already tokenized metadata (‘tokens’ key). Examples include numbers followed by commas such as dates like
April 25 , 2008
.
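A minimal sketch of whitespace tokenization from the tokens metadata follows. It assumes the tokens entry holds a JSON-encoded list of strings, which is an assumption about the metadata format rather than something stated here; the function is a hypothetical stand-in for the property.

```python
import json
from typing import Dict


def tokenized_text(metadata: Dict[str, str]) -> str:
    """Join the already tokenized metadata with single spaces.

    Assumption: ``metadata['tokens']`` is a JSON list of token strings.
    """
    tokens = json.loads(metadata['tokens'])
    return ' '.join(tokens)


meta = {'snt': 'April 25, 2008',
        'tokens': '["April", "25", ",", "2008"]'}
text = tokenized_text(meta)
```

The result keeps the comma as its own token, so splitting text on whitespace yields exactly the tokens the alignments refer to.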
zensols.amr.serial module¶
A small serialization framework for AmrDocument
and
AmrSentence
and other AMR artifacts.
- class zensols.amr.serial.AmrSerializedFactory(includes)[source]¶
Bases:
Dictable
Creates instances of
Serialized
from instances of AmrDocument
, AmrSentence
or AnnotatedAmrDocument
. These can then be used as Dictable
instances, specifically with the asdict
and asjson
methods.
- __init__(includes)¶
- create(instance)[source]¶
Create a serializer from
instance
(see class docs).
- Parameters:
instance (
Union
[AmrSentence
,AmrDocument
]) – the instance to be serialized
- Return type:
- Returns:
an object that can be serialized using the
asdict
and asjson
methods.
- class zensols.amr.serial.Include(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases:
Enum
Indicates what to include at each level.
- annotated_body = 9¶
- annotated_document_id = 7¶
- annotated_sections = 10¶
- annotated_summary = 8¶
- document_text = 1¶
- sentence_graph = 5¶
- sentence_id = 3¶
- sentence_metadata = 6¶
- sentence_text = 4¶
- sentences = 2¶
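One way to picture how a set of includes selects what gets serialized is the sketch below. ToyInclude is a reduced, hypothetical enum (its values do not match Include), and serialize_sentence is an illustrative helper rather than the package's serializer.

```python
from enum import Enum, auto
from typing import Any, Dict, Set


class ToyInclude(Enum):
    """Reduced, hypothetical version of ``Include``."""
    sentence_text = auto()
    sentence_graph = auto()
    sentence_metadata = auto()


def serialize_sentence(sent: Dict[str, Any],
                       includes: Set[ToyInclude]) -> Dict[str, Any]:
    """Copy only the parts of ``sent`` named by ``includes``."""
    fields = {
        ToyInclude.sentence_text: 'text',
        ToyInclude.sentence_graph: 'graph',
        ToyInclude.sentence_metadata: 'metadata',
    }
    return {name: sent[name]
            for inc, name in fields.items() if inc in includes}


sent = {'text': 'The boy sees.',
        'graph': '(s / see-01 :ARG0 (b / boy))',
        'metadata': {'snt': 'The boy sees.'}}
result = serialize_sentence(
    sent, {ToyInclude.sentence_text, ToyInclude.sentence_graph})
```

Leaving sentence_metadata out of the set drops that key from the output while the rest is copied unchanged.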
- class zensols.amr.serial.Serialized(includes)[source]¶
Bases:
Dictable
A base strategy class that can serialize
AmrDocument
and AmrSentence
and other AMR artifacts.
- __init__(includes)¶
- class zensols.amr.serial.SerializedAmrDocument(includes, document)[source]¶
Bases:
Serialized
Serializes an instance of
AmrDocument
.
- __init__(includes, document)¶
-
document:
AmrDocument
¶ The document to serialize.
- class zensols.amr.serial.SerializedAmrSentence(includes, sentence)[source]¶
Bases:
Serialized
Serializes an instance of
AmrSentence
.
- __init__(includes, sentence)¶
-
sentence:
AmrSentence
¶ The sentence to serialize.
- class zensols.amr.serial.SerializedAnnotatedAmrDocument(includes, document)[source]¶
Bases:
SerializedAmrDocument
Serializes an instance of
AnnotatedAmrDocument
.
- __init__(includes, document)¶
zensols.amr.spacyadapt module¶
A set of adaptor classes from zensols.nlp.FeatureToken
to
spacy.tokens.Doc
.
zensols.amr.trainer module¶
Continues training on an AMR model.
- class zensols.amr.trainer.HFTrainer(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))[source]¶
Bases:
Trainer
Interface in to the
amrlib
package’s HuggingFace model trainers.- __init__(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))¶
- class zensols.amr.trainer.SpringTrainer(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'), train_files=None, dev_files=None)[source]¶
Bases:
Trainer
SPRING model trainer.
Citation:
Michele Bevilacqua et al. 2021. One SPRING to Rule Them Both: Symmetric AMR Semantic Parsing and Generation without a Complex Pipeline. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 12564–12573, Virtual, May.
- __init__(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'), train_files=None, dev_files=None)¶
- class zensols.amr.trainer.T5Trainer(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))[source]¶
Bases:
XfmTrainer
T5 model trainer.
Citation:
Colin Raffel et al. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):140:5485-140:5551, January.
- __init__(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))¶
- class zensols.amr.trainer.T5WithTenseGeneratorTrainer(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'), nltk_lib_dir=None, annotate_dir=None, annotate_model='en_core_web_sm')[source]¶
Bases:
XfmTrainer
- __init__(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'), nltk_lib_dir=None, annotate_dir=None, annotate_model='en_core_web_sm')¶
- class zensols.amr.trainer.Trainer(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))[source]¶
Bases:
Dictable
Interface in to the
amrlib
package’s trainers- __init__(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))¶
-
corpus_prep_manager:
CorpusPrepperManager
¶ Aggregates and applies corpus prepare instances.
-
model_installer:
Installer
= None¶ The installer for the model used to train the model previously (i.e. by
amrlib
).
-
model_name:
str
A human-readable string identifying the model, which ends up in the
amrlib_meta.json
.
- property pretrained_path_or_model: str | Path¶
The path to the checkpoint file or the string
scratch
if starting from scratch.
- train(dry_run=False)[source]¶
Train the model (see class docs).
- Parameters:
dry_run (
bool
) – when True
, don’t do anything, just act like it.
- property training_config: Dict[str, Any]¶
The parameters given to the instance of the trainer, which is the class derived with
trainer_class
.
- property training_config_file: Path¶
The path to the JSON configuration file in the
amrlib
repo, such as amrlib/configs/model_parse_*.json
. If None
, then try to find the configuration file generated by the last pretrained model.
-
training_config_overrides:
Dict
[str
,Any
¶ Additional configuration that overrides (clobbers) the contents found in
training_config_file
.
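The override relationship can be sketched as a dictionary merge in which override keys win. Whether the real merge is shallow or recursive is not stated here, so a shallow merge is assumed; the configuration keys and the helper name are hypothetical.

```python
from typing import Any, Dict


def apply_overrides(config: Dict[str, Any],
                    overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of ``config`` with ``overrides`` taking precedence.

    Assumption: a shallow, top-level merge.
    """
    merged = dict(config)
    merged.update(overrides)
    return merged


# hypothetical keys standing in for the contents of the JSON file
file_config = {'num_train_epochs': 16, 'batch_size': 8}
final_config = apply_overrides(file_config, {'num_train_epochs': 2})
```

Keys absent from the overrides keep the values read from the file, and the original dictionary is left untouched.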
- class zensols.amr.trainer.XfmTrainer(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))[source]¶
Bases:
HFTrainer
Trainer for XFM and T5 models.
- __init__(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))¶
zensols.amr.tree module¶
Penman graph utilities and algorithms. These classes are currently only used for debugging and do not have any significant bearing on the overall package.
Bases:
object
Finds nodes using indexed paths.
Get a triple representing a graph node/edge of the given path.
The graph that will be populated with alignments.
Whether or not to strip alignment tags from the output. This is set to
True
in get_missing_alignments()
for test cases. However, it might also be useful for get_node()
.
- class zensols.amr.tree.TreePruner(graph, keep_root_meta=True)[source]¶
Bases:
object
Create a subgraph using a tuple found in the graph configured (penman.configure) as a tree.
- __init__(graph, keep_root_meta=True)¶
- create_sub(query)[source]¶
Create a subgraph using a tuple found in the graph configured (penman.configure) as a tree. Everything starting at
query
and down is included in the resulting graph.
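The idea of keeping "everything starting at query and down" can be shown on a nested tuple tree of the form (variable, [(role, target), ...]). This sketch looks a node up by variable name rather than by a triple, and find_subtree is a hypothetical helper, so it only approximates what create_sub does.

```python
from typing import Any, List, Optional, Tuple

Node = Tuple[str, List[Tuple[str, Any]]]  # (variable, [(role, target), ...])


def find_subtree(node: Node, var: str) -> Optional[Node]:
    """Return the node whose variable is ``var`` with all its descendants."""
    if node[0] == var:
        return node
    for _role, target in node[1]:
        # only nested nodes (tuples) can contain the variable
        if isinstance(target, tuple):
            found = find_subtree(target, var)
            if found is not None:
                return found
    return None


tree: Node = ('s', [('/', 'see-01'),
                    (':ARG0', ('b', [('/', 'boy')])),
                    (':ARG1', ('g', [('/', 'girl'),
                                     (':mod', ('y', [('/', 'young')]))]))])
sub = find_subtree(tree, 'g')
```

The result for `g` contains the `girl` concept and its `:mod` child, while the `see-01` root and the `boy` branch are left out.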
-
keep_root_meta:
bool
= True¶ Whether to keep the original metadata when the query is the root. When this is
True
, the original graph
is returned from create_sub()
when its query
parameter is the root of graph
.
zensols.amr.varidx module¶
A utility class to reindex variables in an AmrDocument.
- class zensols.amr.varidx.VariableIndexer[source]¶
Bases:
object
This reentrant class reindexes all variables for sentences of an
AmrDocument
so all node variables are unique. This is done by:
Index concepts by the first character of their name (i.e.
s
for see-01
) across sentences.
Compile a list of variable replacements (i.e.
s2
-> s5
) on a per sentence basis.
Replace variable names based on their document level index order (i.e.
s
, s2
, etc). This is done for all concepts, edges, roles, and the epigraph. A new graph is created for those that have at least one modification, otherwise the original sentence is kept.
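The first two steps above can be sketched on a simplified representation in which each sentence is a list of (variable, concept) pairs. The function name and data layout are hypothetical; the real class works on Penman graphs and also rewrites edges, roles and the epigraph.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sent = List[Tuple[str, str]]  # (variable, concept) pairs of one sentence


def compile_replacements(sents: List[Sent]) -> List[Dict[str, str]]:
    """Compute per-sentence variable renames so variables are unique."""
    # running count of concepts per first character, across sentences
    counts: Dict[str, int] = defaultdict(int)
    replacements: List[Dict[str, str]] = []
    for sent in sents:
        repl: Dict[str, str] = {}
        for var, concept in sent:
            prefix = concept[0]
            counts[prefix] += 1
            n = counts[prefix]
            # first use is the bare letter, then s2, s3, ...
            new_var = prefix if n == 1 else f'{prefix}{n}'
            if new_var != var:
                repl[var] = new_var
        replacements.append(repl)
    return replacements


sents = [[('s', 'see-01'), ('b', 'boy')],
         [('s', 'sleep-01'), ('b', 'boy')]]
repls = compile_replacements(sents)
```

The first sentence needs no changes, so its replacement map is empty and it would be kept as is; the second sentence maps `s` to `s2` and `b` to `b2`.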
- __init__()¶
- reindex(sents)[source]¶
Reindex and replace variables in
sents
. Any modified graphs are updated in the sents
instances.
- Parameters:
sents (
Sequence
[AmrSentence
]) – sentences whose variables will be reindexed