zensols.amr package¶
Subpackages¶
- zensols.amr.wlk package
- Submodules
- zensols.amr.wlk.amr_similarity module
- zensols.amr.wlk.graph_helpers module
- zensols.amr.wlk.score module
- Module contents
Submodules¶
zensols.amr.align module¶
Add alignments to AMR sentences. To use the Fast Aligner (aka FAA aligner),
set the environment variable FABIN_DIR to the directory where it is
installed.
- class zensols.amr.align.AmrAlignmentPopulator(aligner, add_missing_metadata=True, raise_exception=True)[source]¶
Bases:
object
Adds alignment markers to AMR graphs.
- __init__(aligner, add_missing_metadata=True, raise_exception=True)¶
- zensols.amr.align.create_amr_align_component(nlp, name, aligner)[source]¶
Create an instance of AmrAlignmentPopulator.
zensols.amr.alignpop module¶
Includes classes to add alignments to AMR graphs using an ISI formatted alignment string.
- class zensols.amr.alignpop.AlignmentPopulator(graph, alignment_key='alignments')[source]¶
Bases:
object
Adds alignments from an ISI formatted string.
- __init__(graph, alignment_key='alignments')¶
- alignment_key: str = 'alignments'¶
The key in the graph’s metadata with the ISI formatted alignment string.
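The ISI alignment string format is not described further here. As a rough sketch, assuming the common ISI convention of space-separated token-path pairs (a 0-based token index, a dash, then a dot-separated path of 1-based child positions from the root, with a trailing `r` marking a role alignment), such a string could be parsed as follows. The `IsiAlignment` and `parse_isi` names are hypothetical and not part of zensols.amr.

```python
from typing import List, NamedTuple, Tuple


class IsiAlignment(NamedTuple):
    """One token-to-graph alignment parsed from an ISI string (hypothetical)."""
    token_index: int
    path: Tuple[int, ...]
    is_role: bool


def parse_isi(alignment_str: str) -> List[IsiAlignment]:
    """Parse an ISI-style string such as '0-1 1-1.1 2-1.1.r' (assumed format)."""
    alignments: List[IsiAlignment] = []
    for pair in alignment_str.split():
        # each pair is "<token index>-<dotted node path>"
        token, path_str = pair.split('-', 1)
        parts = path_str.split('.')
        # a trailing "r" component marks an alignment to a role (edge)
        is_role = parts[-1] == 'r'
        if is_role:
            parts = parts[:-1]
        alignments.append(
            IsiAlignment(int(token), tuple(int(p) for p in parts), is_role))
    return alignments


print(parse_isi('0-1 1-1.1 2-1.1.r'))
```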
- class zensols.amr.alignpop.PathAlignment(index, path, alignment_str, alignment, triple)[source]¶
Bases:
object
An alignment that contains the path and alignment to a node, or to an edge for role alignments.
- __init__(index, path, alignment_str, alignment, triple)¶
- alignment: Union[Alignment, RoleAlignment]¶
The alignment of the node or edge.
zensols.amr.amrlib module¶
AMR parser and generator model implementations using amrlib.
- class zensols.amr.amrlib.AmrlibGenerator(name=None, installer=None, alternate_path=None, use_tense=True)[source]¶
Bases:
_AmrlibModelContainer, AmrGenerator
- __init__(name=None, installer=None, alternate_path=None, use_tense=True)¶
- generate(doc)[source]¶
Generate sentences from an AMR document.
- Parameters:
doc (AmrDocument) – the AMR document used to generate the sentences
- Return type:
- Returns:
a text sentence for each respective sentence in doc
zensols.amr.annotate module¶
AMR annotated corpus utility classes.
- class zensols.amr.annotate.AnnotatedAmrDocument(sents, path=None, doc_id=None)[source]¶
Bases:
AmrDocument
An AMR document containing a unique document identifier from the corpus.
- __init__(sents, path=None, doc_id=None)¶
Initialize.
- Parameters:
sents (Tuple[AmrSentence, …]) – the document’s sentences
path (Optional[Path, …]) – the path to the file containing the Penman notation sentence graphs used in sents
model – the model used to initialize AmrSentence when sents is a list of string Penman graphs
- property body: AmrDocument¶
The sentences that make up the body of the document.
- static get_feature_sentences(feature_doc, amr_docs)[source]¶
Return the feature sentences of those that refer to the AMR sentences, but starting from the AMR side.
- Parameters:
feature_doc (AmrFeatureDocument) – the document having the FeatureSentence instances
amr_docs (Iterable[AmrDocument]) – the documents having the sentences, such as summary
- Return type:
- property sections: Tuple[AnnotatedAmrSectionDocument]¶
The sections of the document.
- property summary: AmrDocument¶
The sentences that make up the summary of the document.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_summary=True, include_sections=True, include_body=False, include_amr=True, **kwargs)[source]¶
Write the contents of this instance to writer using indentation depth.
- Parameters:
include_summary (bool) – whether to include the summary sentences
include_sections – whether to include the section sentences
include_body (bool) – whether to include the body sentences
include_amr (bool) – whether to include the super class AMR output
kwargs – arguments given to the super class’s write, such as limit_sent=0 to effectively disable it
- class zensols.amr.annotate.AnnotatedAmrDocumentStash(installer, doc_dir, corpus_cache_dir, id_name, id_regexp=re.compile('([^.]+)\\\\.(\\\\d+)'), sent_type_col='snt-type', sent_type_mapping=None, doc_parser=None, amr_sent_model=None, amr_sent_class=<class 'zensols.amr.annotate.AnnotatedAmrSentence'>, amr_doc_class=<class 'zensols.amr.annotate.AnnotatedAmrDocument'>, doc_annotator=None)[source]¶
Bases:
PrimeableStash
A factory stash that creates AnnotatedAmrDocument instances of annotated documents from a single text file containing a corpus of AMR Penman formatted graphs.
- __init__(installer, doc_dir, corpus_cache_dir, id_name, id_regexp=re.compile('([^.]+)\\\\.(\\\\d+)'), sent_type_col='snt-type', sent_type_mapping=None, doc_parser=None, amr_sent_model=None, amr_sent_class=<class 'zensols.amr.annotate.AnnotatedAmrSentence'>, amr_doc_class=<class 'zensols.amr.annotate.AnnotatedAmrDocument'>, doc_annotator=None)¶
- amr_doc_class¶
The class used to create new instances of AmrDocument.
alias of AnnotatedAmrDocument
- amr_sent_class¶
The class used to create new instances of AmrSentence.
alias of AnnotatedAmrSentence
- amr_sent_model: str = None¶
The model set in the AmrSentence initializer.
- property corpus_df: DataFrame¶
A data frame containing the identifier, text of the sentences and the annotated sentence types of the corpus.
- property corpus_doc: AmrDocument¶
A document containing all the sentences from the corpus.
- delete(name=None)[source]¶
Delete the resource for data pointed to by name or the entire resource if name is not given.
- doc_annotator: AnnotationFeatureDocumentParser = None¶
Used to annotate AMR documents if not None.
- doc_dir: Path¶
The directory containing sentence type mapping for documents or None if there are no sentence type alignments.
- doc_parser: FeatureDocumentParser = None¶
If provided, AMR metadata is added to sentences, which is needed by the AMR populator.
- export_sent_type_template(doc_id, out_path=None)[source]¶
Create a CSV file that contains the sentences and other metadata of an annotated document used to annotate sentence types.
- id_regexp: Pattern = re.compile('([^.]+)\\.(\\d+)')¶
The regular expression used to create the id_name if it exists. The regular expression must have two groups: the first is the ID and the second is the sentence index.
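As an illustration of the default id_regexp, the following sketch splits a sentence ID such as liu-example.0 into its document ID and sentence index. The `split_sent_id` helper is hypothetical, and using `re.match` (anchored at the start of the ID) is an assumption about how the stash applies the pattern.

```python
import re
from typing import Optional, Tuple

# the default id_regexp: group 1 is the document ID, group 2 the sentence index
ID_REGEXP = re.compile(r'([^.]+)\.(\d+)')


def split_sent_id(sent_id: str) -> Optional[Tuple[str, int]]:
    """Return (doc_id, sent_index), or None if the ID doesn't match."""
    m = ID_REGEXP.match(sent_id)
    if m is None:
        return None
    return m.group(1), int(m.group(2))


print(split_sent_id('liu-example.0'))  # ('liu-example', 0)
```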
- installer: Installer¶
The installer containing the AMR annotated corpus.
- sent_type_mapping: Dict[str, str] = None¶
Used to map what’s in the corpus to a value of SentenceType if given.
- class zensols.amr.annotate.AnnotatedAmrFeatureDocumentFactory(doc_parser, remove_wiki_attribs=False, remove_alignments=False, doc_id=0)[source]¶
Bases:
object
Creates instances of AmrFeatureDocument, each with AmrFeatureDocument.amr as an instance of AnnotatedAmrDocument and AmrFeatureSentence.amr as an instance of AnnotatedAmrSentence. These are created from a JSON file or a list of dict.
The keys of each dictionary are the case-insensitive enumeration names of SentenceType (for example, body and summary). Keys id and comment are the unique document identifier and a comment that is added to the AMR sentence metadata. Both are optional, and if id is missing, doc_id is used.
An example JSON creates a document with ID ex1, a comment metadata, one SentenceType.SUMMARY and two SentenceType.BODY sentences:
    [{
        "id": "ex1",
        "comment": "very short",
        "body": "The man ran to make the train. He just missed it.",
        "summary": "A man got caught in the door of a train he just missed."
    }]
- See:
- __init__(doc_parser, remove_wiki_attribs=False, remove_alignments=False, doc_id=0)¶
- doc_id: int = 0¶
An instance-based counter whose value is used as the ID of each document missing one; it is incremented for each such document.
- doc_parser: FeatureDocumentParser¶
The feature document parser used to create the Penman formatted graphs.
- from_data(data)[source]¶
Create AMR documents based on the type of data.
- from_dicts(data)[source]¶
Parse and create AMR documents from a list of dict.
- Parameters:
- See:
- Return type:
- from_file(input_file)[source]¶
Read annotated documents from a file and create AMR documents.
- Parameters:
input_file (Path) – the JSON file from which to read the document text
- Return type:
- from_str(sents, stype)[source]¶
Parse and create AMR sentences from a string.
- Parameters:
sents (str) – the string containing a space separated list of sentences
stype (SentenceType) – the sentence type assigned to each new AMR sentence
- Return type:
- remove_alignments: bool = False¶
Whether to remove text-to-graph alignments in all sentence graphs after parsing.
- remove_wiki_attribs: bool = False¶
Whether to remove the :wiki roles from all sentence graphs after parsing.
- to_annotated_doc(doc)[source]¶
Clone doc.amr into an AnnotatedAmrDocument.
- Parameters:
doc – the document to convert to an AnnotatedAmrDocument
- Return type:
- Returns:
a feature document whose amr is an AnnotatedAmrDocument, which is a new instance if doc.amr isn’t already an annotated AMR document
- to_annotated_sent(sent, sent_type=None)[source]¶
Clone sent.amr into an AnnotatedAmrSentence.
- Parameters:
sent (AmrFeatureSentence) – the sentence to convert to an AnnotatedAmrSentence
sent_type (SentenceType) – the type of sentence to set on the new AnnotatedAmrSentence
- Return type:
- Returns:
a feature sentence whose amr is an AnnotatedAmrSentence, which is a new instance if sent.amr isn’t already an annotated AMR sentence
- class zensols.amr.annotate.AnnotatedAmrFeatureDocumentStash(feature_doc_factory, doc_stash, amr_stash, coref_resolver=None)[source]¶
Bases:
PrimeableStash
A stash that persists AmrFeatureDocument instances using AMR annotations from AnnotatedAmrDocumentStash as a source. The key set and exists behavior is identical between the two stashes. However, the instances of AmrFeatureDocument (and their constituent sentences) are generated from the AMR annotated sentences (i.e. from the ::snt metadata field).
This stash keeps the persistence of the AmrDocument separate from the instance of the feature document to avoid persisting it twice across doc_stash and amr_stash. On load, these two data structures are stitched together.
- __init__(feature_doc_factory, doc_stash, amr_stash, coref_resolver=None)¶
- amr_stash: AnnotatedAmrDocumentStash¶
The stash used to persist AmrDocument instances that are stitched together with the AmrFeatureDocument (see class docs).
- clear()[source]¶
Delete all data from the stash.
Important: Exercise caution with this method, of course.
- coref_resolver: CoreferenceResolver = None¶
Adds coreferences between the sentences of the document.
- delete(name=None)[source]¶
Delete the resource for data pointed to by name or the entire resource if name is not given.
- doc_stash: Stash¶
The stash used to persist instances of AmrFeatureDocument. It does not persist the AmrDocument (see class docs).
- exists(doc_id)[source]¶
Return True if data with key name exists.
Implementation note: This Stash.exists() method is very inefficient and should be overridden.
- Return type:
- feature_doc_factory: AmrFeatureDocumentFactory¶
Creates AmrFeatureDocument from AmrDocument instances.
- class zensols.amr.annotate.AnnotatedAmrSectionDocument(sents, path=None, section_sents=())[source]¶
Bases:
AmrDocument
Represents a section from an annotated document.
- __init__(sents, path=None, section_sents=())¶
Initialize.
- Parameters:
sents (Tuple[AmrSentence, …]) – the document’s sentences
path (Optional[Path, …]) – the path to the file containing the Penman notation sentence graphs used in sents
model – the model used to initialize AmrSentence when sents is a list of string Penman graphs
- section_sents: Tuple[AmrSentence] = ()¶
The sentences that make up the section title (usually just one).
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
- Parameters:
limit_sent – the max number of sentences to write
add_sent_id – add the sentence ID to the output
include_metadata – whether to add graph metadata to the output
text_ony – whether to only write the sentence text rather than the AMR Penman notation
- class zensols.amr.annotate.AnnotatedAmrSentence(data, model, doc_sent_idx, sent_type)[source]¶
Bases:
AmrSentence
A sentence containing its index in the document and the functional type.
- __init__(data, model, doc_sent_idx, sent_type)[source]¶
Initialize based on the kind of data given.
- Parameters:
data (Union[str, Graph]) – either a Penman formatted string graph, an already parsed graph or an AmrFailure for upstream issues
model (str) – the model to use for encoding and decoding
- class zensols.amr.annotate.CorpusWriter(anon_doc_factory)[source]¶
Bases:
Writable
Writes AmrDocument instances to a file. To use, first add documents either directly with docs or using add().
- __init__(anon_doc_factory)¶
- add(data)[source]¶
Add document(s) to this corpus writer. This uses AnnotatedAmrFeatureDocumentFactory.from_data() and adds the instances of AmrFeatureDocument.
- anon_doc_factory: AnnotatedAmrFeatureDocumentFactory¶
The factory used to create the AmrFeatureDocument instances that are in turn used to format the graphs as Penman text output.
- property docs: List[AmrDocument]¶
The documents to write.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write the contents of the documents added to this writer to writer as flat formatted Penman AMRs.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.amr.annotate.FileCorpusWriter(anon_doc_factory, input_file, output_file)[source]¶
Bases:
CorpusWriter
A corpus writer that parses a JSON file for its source input, then uses the configured AMR parser to generate the graphs.
- __init__(anon_doc_factory, input_file, output_file)¶
- input_file: Path¶
The JSON file as formatted per AnnotatedAmrFeatureDocumentFactory.
- class zensols.amr.annotate.SentenceType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases:
Enum
The type of sentence in relation to its function in the document.
- BODY = 'b'¶
- FIGURE = 'f'¶
- FIGURE_TITLE = 'ft'¶
- OTHER = 'o'¶
- SECTION = 's'¶
- SUMMARY = 'a'¶
- TITLE = 't'¶
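The following sketch shows how the example JSON for AnnotatedAmrFeatureDocumentFactory might map onto these members. It uses a local mirror of SentenceType and assumes, based on that example, that each non-metadata key is matched case-insensitively against a member name; the `sentence_types` helper and `META_KEYS` set are hypothetical, not the factory's actual implementation.

```python
import json
from enum import Enum
from typing import Dict, List


class SentenceType(Enum):
    """Local mirror of the members listed above (for illustration only)."""
    TITLE = 't'
    BODY = 'b'
    SUMMARY = 'a'
    SECTION = 's'
    FIGURE_TITLE = 'ft'
    FIGURE = 'f'
    OTHER = 'o'


# the example input from the AnnotatedAmrFeatureDocumentFactory docs
DATA = '''[{"id": "ex1", "comment": "very short",
  "body": "The man ran to make the train. He just missed it.",
  "summary": "A man got caught in the door of a train he just missed."}]'''

# keys that hold document metadata rather than sentence text (assumption)
META_KEYS = {'id', 'comment'}


def sentence_types(doc: Dict[str, str]) -> Dict[SentenceType, str]:
    """Map each non-metadata key to a SentenceType by case-insensitive name."""
    return {SentenceType[k.upper()]: text
            for k, text in doc.items() if k.lower() not in META_KEYS}


docs: List[Dict[str, str]] = json.loads(DATA)
for stype, text in sentence_types(docs[0]).items():
    print(stype.name, stype.value, text)
```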
zensols.amr.app module¶
Adapts amrlib in the Zensols framework.
- class zensols.amr.app.Application(log_config, config_factory, doc_parser, anon_doc_stash, dumper)[source]¶
Bases:
BaseApplication
Parse and plot AMR graphs in Penman notation.
- __init__(log_config, config_factory, doc_parser, anon_doc_stash, dumper)¶
- anon_doc_stash: Stash¶
The annotated document stash.
- config_factory: ConfigFactory¶
Application context used by programmatic clients of this class.
- count(input_file)[source]¶
Provide counts on an AMR corpus file.
- Parameters:
input_file (Path) – a file with newline separated AMR Penman graphs
- doc_parser: FeatureDocumentParser¶
The feature document parser for the app. This is not done via the application config to allow overriding of the defaults.
- dumper: Dumper¶
Plots and writes AMR content in human readable formats.
- parse(text)[source]¶
Parse the natural language text to AMR graphs.
- Parameters:
text (str) – the sentence(s) to parse
- class zensols.amr.app.BaseApplication(log_config)[source]¶
Bases:
object
Base class for applications.
- __init__(log_config)¶
- log_config: LogConfigurator¶
Used to update logging levels based on the action that was run.
- class zensols.amr.app.Format(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases:
Enum
Format output type for AMR corpus documents.
- csv = 3¶
- json = 2¶
- txt = 1¶
- class zensols.amr.app.ScorerApplication(log_config, config_factory, doc_factory)[source]¶
Bases:
BaseApplication
Creates parsed files for comparison and scores them.
- __init__(log_config, config_factory, doc_factory)¶
- config_factory: ConfigFactory¶
Application context.
- doc_factory: AmrFeatureDocumentFactory¶
Creates AmrFeatureDocument from AmrDocument instances.
- parse_penman(input_file, output_dir=None, meta_keys='id,snt', limit=None)[source]¶
Parse Penman sentence(s) by id and write a parsed AMR.
- score(input_gold, input_parsed=None, output_dir=None, output_format=Format.csv, limit=None, methods=None)[source]¶
Score AMRs by ID and dump the results to a file or directory.
- Parameters:
input_gold (Path) – the file containing the gold AMR graphs
input_parsed (Path) – the file containing the parser output graphs, defaults to gold-parsed.txt
output_dir (Path) – the output directory
output_format (Format) – the output format
limit (int) – the maximum number of items to process
methods (str) – a comma separated list of scoring methods
- Return type:
ScoreSet
- class zensols.amr.app.TrainerApplication(log_config, config_factory)[source]¶
Bases:
BaseApplication
Trains and evaluates models.
- __init__(log_config, config_factory)¶
- config_factory: ConfigFactory¶
Application context.
- train(dry_run=False)[source]¶
Continue fine tuning on additional corpora.
- Parameters:
dry_run (bool) – don’t do anything; just act like it
zensols.amr.cli module¶
Command line entry point to the application.
- class zensols.amr.cli.ApplicationFactory(*args, **kwargs)[source]¶
Bases:
ApplicationFactory
zensols.amr.container module¶
Extensions of zensols.nlp feature containers.
- class zensols.amr.container.AmrFeatureDocument(sents, text=None, spacy_doc=None, amr=None, coreference_relations=None)[source]¶
Bases:
FeatureDocument
A feature document that contains an amr graph.
- __init__(sents, text=None, spacy_doc=None, amr=None, coreference_relations=None)¶
- add_coreferences(to_populate)[source]¶
Add coreference_relations to to_populate using this instance’s coreferences. Note that from_sentences(), from_amr_sentences(), get_overlapping_document() and clone() already do this.
- property amr: AmrDocument¶
The AMR representation of the document.
- clone(cls=None, **kwargs)[source]¶
- Parameters:
kwargs – if copy_spacy is True, the spaCy document is copied to the clone; the remaining parameters are passed to the new clone’s initializer
- Return type:
- property coreference_relations: Tuple[Tuple[Tuple[int, str], ...], ...]¶
The coreferences tuple sets between the sentences of the document:
((<sentence index 1>, <variable 1>), (<sentence index 2>, <variable 2>)...)
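To make the nested coreference_relations structure concrete, the sketch below uses hypothetical relations for the two-sentence example document ("The man ran to make the train. He just missed it.") and groups the coreferent variables by sentence index. The variable names and the `variables_by_sentence` helper are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# each inner tuple is one relation (cluster) of (sentence index, variable)
Relations = Tuple[Tuple[Tuple[int, str], ...], ...]

# hypothetical relations: "The man" ... "He" and "the train" ... "it"
relations: Relations = (
    ((0, 'm'), (1, 'h')),
    ((0, 't'), (1, 'i')),
)


def variables_by_sentence(rels: Relations) -> Dict[int, List[str]]:
    """Collect the coreferent variables of each sentence index."""
    by_sent: Dict[int, List[str]] = defaultdict(list)
    for relation in rels:
        for sent_idx, var in relation:
            by_sent[sent_idx].append(var)
    return dict(by_sent)


print(variables_by_sentence(relations))  # {0: ['m', 't'], 1: ['h', 'i']}
```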
- from_amr_sentences(amr_sents)[source]¶
Like from_sentences(), return a new document with FeatureDocument sentences sync’d with AmrSentence.
- Parameters:
amr_sents (Iterable[AmrSentence]) – the sentences that will make up the returned document
- Return type:
- Returns:
a new document composed of amr_sents
- See:
- from_sentences(sents, deep=False)[source]¶
Return a new cloned document using the given sentences.
- Parameters:
sents (Iterable[FeatureSentence]) – the sentences to add to the new cloned document
deep (bool) – whether or not to clone the sentences
- See:
- See:
- Return type:
- get_overlapping_span(span, inclusive=True)[source]¶
Return a feature span that includes the lexical scope of span.
- Return type:
- property relation_set: RelationSet¶
The relations in the contained document as a set of relations.
- sync_amr_sents()[source]¶
Copy amr sentences to each respective AmrFeatureSentence.amr. This is necessary when the AmrDocument is updated with new sentences that need to percolate down to the feature sentences.
- class zensols.amr.container.AmrFeatureSentence(tokens, text=None, spacy_span=None, amr=None)[source]¶
Bases:
FeatureSentence
A sentence that holds an instance of AmrSentence.
- __init__(tokens, text=None, spacy_span=None, amr=None)¶
- property alignments: Dict[Tuple[str, str, str], Tuple[FeatureToken, ...]]¶
Only the tokens (without their offsets) from indexed_alignments.
- property amr: AmrSentence¶
The AMR representation of the sentence.
- clone(cls=None, **kwargs)[source]¶
Clone an instance of this token container.
- Parameters:
cls (Type) – the type of the new instance
kwargs – arguments to add as attributes to the clone
- Return type:
- Returns:
the cloned instance of this instance
- property indexed_alignments: Dict[Tuple[str, str, str], Tuple[Tuple[int, FeatureToken]], ...]¶
The graph alignments as a triple-to-token dict. The values are tuples of the 0-indexed token offset and the feature token pointed to by the alignment.
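Since alignments is described as only the tokens from indexed_alignments, the relationship between the two properties can be sketched as dropping the offsets. Plain strings stand in for FeatureToken instances, and the triples, offsets and `tokens_only` helper are hypothetical.

```python
from typing import Dict, Tuple

Triple = Tuple[str, str, str]
Indexed = Dict[Triple, Tuple[Tuple[int, str], ...]]

# hypothetical indexed alignments: triple -> ((token offset, token), ...)
indexed: Indexed = {
    ('m', ':instance', 'man'): ((1, 'man'),),
    ('r', ':instance', 'run-02'): ((2, 'ran'),),
}


def tokens_only(idx: Indexed) -> Dict[Triple, Tuple[str, ...]]:
    """Drop the token offsets, keeping only the aligned tokens."""
    return {triple: tuple(tok for _, tok in pairs)
            for triple, pairs in idx.items()}


print(tokens_only(indexed))
```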
- class zensols.amr.container.Reference(sent, variable)[source]¶
Bases:
ReferenceObject
A multi-document coreference target, which points to a node in an AMR graph.
- __init__(sent, variable)¶
- sent: AmrFeatureSentence¶
The sentence containing the reference.
- property short¶
A short string describing the reference.
- property subtree: AmrSentence¶
The subtree of the sentence containing the target as an AmrFeatureSentence.
- property triple: Tuple[str, str, str]¶
The AMR triple (source, relation, target) of the reference.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.amr.container.ReferenceObject[source]¶
Bases:
PersistableContainer, Dictable
A base class for reference and relation classes.
- __init__()¶
- class zensols.amr.container.ReindexVariableFeatureDocumentDecorator[source]¶
Bases:
FeatureDocumentDecorator
Reindex AMR concept variables to be unique across all sentences.
- __init__()¶
- class zensols.amr.container.Relation(seq_id, references)[source]¶
Bases:
ReferenceObject
A relation makes up a set of references across multiple sentences of a document. This is what Christopher Manning calls a cluster.
- __init__(seq_id, references)¶
- property by_sent: Dict[AmrFeatureSentence, Reference]¶
An association from sentences to their references.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.amr.container.RelationSet(relations)[source]¶
Bases:
ReferenceObject
All coreference relations for a given document.
- __init__(relations)¶
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
zensols.amr.coref module¶
Wrap the amr_coref module for AMR coreference resolution.
- class zensols.amr.coref.CoreferenceResolver(installer, stash=<factory>, use_multithreading=True, robust=True, hasher=<factory>)[source]¶
Bases:
object
Resolve coreferences in AMR graphs.
- __init__(installer, stash=<factory>, use_multithreading=True, robust=True, hasher=<factory>)¶
- installer: Installer¶
The amr_coref module’s coreference module installer.
- property model: Inference¶
The amr_coref coreference model.
- robust: bool = True¶
Whether to robustly deal with exceptions in the coreference model. If True, instances of AmrFailure are stored in the stash and empty coreferences are used for caught errors.
zensols.amr.corpprep module¶
Prepare and compile AMR corpora for training.
- class zensols.amr.corpprep.AmrReleaseCorpusPrepper(name, installer, transform_ascii=True, remove_wiki=True, shuffle=False)[source]¶
Bases:
CorpusPrepper
Writes the AMR 3 release corpus files.
- __init__(name, installer, transform_ascii=True, remove_wiki=True, shuffle=False)¶
- class zensols.amr.corpprep.CorpusPrepper(name, installer, transform_ascii=True, remove_wiki=True, shuffle=False)[source]¶
Bases:
Dictable
Subclasses know where to download, install, and split the corpus into train and dev data sets. Each subclass generates only the training and dev/validation datasets, which is an aspect of AMR parser and text generation models. Both the input and output are Penman encoded AMR graphs.
- __init__(name, installer, transform_ascii=True, remove_wiki=True, shuffle=False)¶
- installer: Installer¶
The location and decompression details.
- abstract read_docs(target)[source]¶
Read and return tuples of where to write the output of the sentences of the corresponding document.
- Parameters:
target (Path) – the location of where to copy the finished files
- Return type:
- Returns:
tuples of the dataset name and the read document
- remove_wiki: bool = True¶
Whether to remove :wiki relations, which are not predicted by the model and negatively affect validation set performance while training.
- class zensols.amr.corpprep.CorpusPrepperManager(name, preppers, stage_dir, shuffle=True, key_splits=None)[source]¶
Bases:
Dictable
Aggregates and applies corpus prepper instances.
- __init__(name, preppers, stage_dir, shuffle=True, key_splits=None)¶
- key_splits: Path = None¶
The AMR id keys from the sentence metadata for each split are written to this JSON file if specified.
- prepare()[source]¶
Download, install and write the corpus to disk from all preppers. The output of each is placed in the corresponding training or dev directories in stage_dir. The data is then ready for AMR parser and generator trainers.
- preppers: Tuple[CorpusPrepper, ...]¶
The corpus prepper instances used to create the training files.
- shuffle: bool = True¶
Whether to shuffle the AMR sentences before writing to the target directory. This is used to shuffle across all corpora per split.
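A minimal sketch of what shuffling across corpora per split could look like, assuming each prepper yields Penman graph strings keyed by split name (training or dev): each split is concatenated across corpora before it is shuffled. The data layout, the `pool_splits` helper and its `seed` parameter are hypothetical.

```python
import random
from typing import Dict, List

# hypothetical per-prepper output: split name -> Penman graph strings
prepper_outputs: List[Dict[str, List[str]]] = [
    {'training': ['(a / a1)', '(a / a2)'], 'dev': ['(a / a3)']},
    {'training': ['(b / b1)'], 'dev': ['(b / b2)', '(b / b3)']},
]


def pool_splits(outputs: List[Dict[str, List[str]]], shuffle: bool = True,
                seed: int = 0) -> Dict[str, List[str]]:
    """Concatenate each split across all corpora, then optionally shuffle it."""
    pooled: Dict[str, List[str]] = {}
    for out in outputs:
        for split, graphs in out.items():
            # new lists are created, so the inputs are not modified
            pooled.setdefault(split, []).extend(graphs)
    if shuffle:
        rng = random.Random(seed)
        for graphs in pooled.values():
            rng.shuffle(graphs)
    return pooled


print(pool_splits(prepper_outputs))
```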
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.amr.corpprep.SingletonCorpusPrepper(name, installer, transform_ascii=True, remove_wiki=True, shuffle=False, dev_portion=0.15)[source]¶
Bases:
CorpusPrepper
Prepares the corpus training files from a single AMR Penman encoded file.
- __init__(name, installer, transform_ascii=True, remove_wiki=True, shuffle=False, dev_portion=0.15)¶
- dev_portion: float = 0.15¶
The portion of the sentences of the single input file used for the dev/validation set.
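A rough sketch of splitting a single input file's graphs by dev_portion. Which sentences end up in the dev set (here, the last ones after an optional shuffle) and the rounding are assumptions; `split_train_dev` is a hypothetical helper, not a method of the class.

```python
import random
from typing import List, Tuple


def split_train_dev(graphs: List[str], dev_portion: float = 0.15,
                    shuffle: bool = False,
                    seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split graphs into (training, dev) with roughly dev_portion in dev."""
    graphs = list(graphs)  # copy so the caller's list is not shuffled in place
    if shuffle:
        random.Random(seed).shuffle(graphs)
    n_dev = round(len(graphs) * dev_portion)
    cut = len(graphs) - n_dev
    # assumption: the trailing sentences form the dev/validation set
    return graphs[:cut], graphs[cut:]


train, dev = split_train_dev([f'(s{i} / sent)' for i in range(20)])
print(len(train), len(dev))  # 17 3
```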
zensols.amr.doc module¶
AMR container classes that fit a document/sentence hierarchy.
- class zensols.amr.doc.AmrDocument(sents, path=None, model=None)[source]¶
Bases:
PersistableContainer, Writable
A document of AMR graphs, which is indexable and iterable.
- __init__(sents, path=None, model=None)[source]¶
Initialize.
- Parameters:
sents (Iterable[Union[str, Graph, AmrSentence]]) – the document’s sentences
path (Optional[Path]) – the path to the file containing the Penman notation sentence graphs used in sents
model (str) – the model used to initialize AmrSentence when sents is a list of string Penman graphs
- from_sentences(sents, deep=False)[source]¶
Return a new cloned document using the given sentences.
- Parameters:
sents (Iterable[AmrSentence]) – the sentences to add to the new cloned document
deep (bool) – whether or not to clone the sentences
- See:
- Return type:
- classmethod from_source(source, transform_ascii=False, **kwargs)[source]¶
Return a new document created for source.
- Parameters:
source (Union[Path, Installer]) – either a file containing a double-newline-separated list of AMR graphs or an installer that has a singleton path to such a file
transform_ascii (bool) – whether to replace non-ASCII characters with their ASCII equivalents (e.g. removes umlauts)
kwargs – additional keyword arguments given to the initializer of the document
- Return type:
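The following sketch approximates the input that from_source works with: Penman graphs separated by blank lines, optionally transliterated to ASCII. The Unicode NFKD-based transliteration is a stand-in technique that may differ from the library's transform_ascii; `to_ascii` and `split_graphs` are hypothetical helpers.

```python
import re
import unicodedata
from typing import List


def to_ascii(text: str) -> str:
    """Decompose accented characters and drop non-ASCII code points."""
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


def split_graphs(content: str, transform_ascii: bool = False) -> List[str]:
    """Split file content into Penman graphs separated by blank lines."""
    if transform_ascii:
        content = to_ascii(content)
    # one or more blank (possibly whitespace-only) lines separate graphs
    return [g.strip() for g in re.split(r'\n\s*\n', content) if g.strip()]


text = '# ::snt Müller ran.\n(r / run-02\n   :ARG0 (p / person))\n\n(b / boy)\n'
print(split_graphs(text, transform_ascii=True))
```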
- get_doc_id()[source]¶
Get the ID of the document from the first sentence’s ID, if there is one. For example, if the first sentence’s ID is liu-example.0, the string liu-example is returned.
- property graph_string: str¶
The graphs of all sentences as a string in Penman format, separated by two newlines.
- path: Optional[Path, ...] = None¶
If set, the file the sentences were parsed from in Penman notation.
- reindex_variables()[source]¶
Reindexes all variables for sentences of an AmrDocument so all node variables are unique in the document.
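As a sketch of the idea behind reindex_variables, the function below renames variables that already appear in earlier sentences by appending a numeric suffix. Sentences are represented as lists of Penman-style (source, role, target) triples, and the suffix scheme is an assumption; the library's actual renaming may differ. The `reindex` helper is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def reindex(sents: List[List[Triple]]) -> List[List[Triple]]:
    """Rename variables so that no two sentences share a variable name."""
    used: Set[str] = set()
    out: List[List[Triple]] = []
    for triples in sents:
        # variables are the sources of :instance triples
        variables = [s for s, role, _ in triples if role == ':instance']
        mapping: Dict[str, str] = {}
        for var in variables:
            new, n = var, 2
            while new in used:
                new, n = f'{var}{n}', n + 1
            mapping[var] = new
            used.add(new)
        # rename sources, and targets that are variables (not concepts)
        out.append([(mapping.get(s, s), r,
                     t if r == ':instance' else mapping.get(t, t))
                    for s, r, t in triples])
    return out


s0 = [('w', ':instance', 'want-01'), ('b', ':instance', 'boy'), ('w', ':ARG0', 'b')]
s1 = [('w', ':instance', 'win-01'), ('g', ':instance', 'girl'), ('w', ':ARG0', 'g')]
print(reindex([s0, s1])[1])
```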
- sents: Tuple[AmrSentence, ...]¶
The AMR sentences that make up the document.
- property text: str¶
The text of the natural language form of the document. This is the concatenation of the text of all the sentences.
- class zensols.amr.doc.AmrGeneratedDocument(sents, amr)[source]¶
Bases:
Writable
A sentence generated by the graph-to-text model.
- __init__(sents, amr)¶
zensols.amr.docfac module¶
Feature sentence and document utilities.
- class zensols.amr.docfac.AmrFeatureDocumentFactory(name, doc_parser, alignment_populator=None)[source]¶
Bases:
object
Creates AmrFeatureDocument from AmrDocument instances.
- __init__(name, doc_parser, alignment_populator=None)¶
- alignment_populator: AmrAlignmentPopulator = None¶
Adds the alignment markings.
- doc_parser: FeatureDocumentParser¶
The document parser used to create AmrFeatureDocument instances.
- to_feature_doc(amr_doc, catch=False, add_metadata=False, add_alignment=False)[source]¶
Create an AmrFeatureDocument from an AmrDocument by parsing the snt metadata with a FeatureDocumentParser.
- Parameters:
add_metadata (Union[str, bool]) – if True, add annotation metadata parsed from spaCy to amr_doc where it is missing (see AmrParser.add_metadata()); if this value is the string clobber, replace any previous metadata
catch (bool) – if True, catch exceptions raised while creating the document, create an AmrFailure from each, and return them
- Return type:
Union[AmrFeatureDocument, Tuple[AmrFeatureDocument, List[AmrFailure]]]
- Returns:
an AMR feature document if catch is False; otherwise, a tuple of a document with the sentences that were successfully parsed and a list of any exceptions raised during the parsing
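The catch contract described above (return the result alone, or the result together with the caught failures) can be sketched with a stand-in parser. `Failure` only loosely mimics AmrFailure, and `parse_all` with its placeholder parse step is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Failure:
    """Stand-in for AmrFailure: records the input and the raised error."""
    sent: str
    exception: Exception


def parse_all(sents: List[str], catch: bool = False
              ) -> Union[List[str], Tuple[List[str], List[Failure]]]:
    """Parse each sentence; with catch=True collect failures instead of raising."""
    parsed: List[str] = []
    failures: List[Failure] = []
    for sent in sents:
        try:
            if not sent.strip():
                raise ValueError('empty sentence')
            # placeholder "parse" standing in for real AMR parsing
            parsed.append(f'(s / sentence :text "{sent}")')
        except Exception as e:
            if not catch:
                raise
            failures.append(Failure(sent, e))
    return (parsed, failures) if catch else parsed


ok, bad = parse_all(['The man ran.', '  '], catch=True)
print(len(ok), len(bad))
```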
- class zensols.amr.docfac.EntityCopySpacyFeatureDocumentParser(config_factory, name, lang='en', model_name=None, token_feature_ids=<factory>, components=(), token_decorators=(), sentence_decorators=(), document_decorators=(), disable_component_names=None, token_normalizer=None, special_case_tokens=<factory>, doc_class=<class 'zensols.nlp.container.FeatureDocument'>, sent_class=<class 'zensols.nlp.container.FeatureSentence'>, token_class=<class 'zensols.nlp.tok.SpacyFeatureToken'>, remove_empty_sentences=None, reload_components=False, auto_install_model=False, package_manager=<factory>)[source]¶
Bases:
SpacyFeatureDocumentParser
Copies spaCy ent_type_ named entity (NER) tags to FeatureToken ent_ tags.
The AMR document’s ner_tags metadata is populated in AmrParser from the spaCy document. However, this document parser instance is configured with embedded entities turned off so that whitespace-delimited tokens match the alignments.
- __init__(config_factory, name, lang='en', model_name=None, token_feature_ids=<factory>, components=(), token_decorators=(), sentence_decorators=(), document_decorators=(), disable_component_names=None, token_normalizer=None, special_case_tokens=<factory>, doc_class=<class 'zensols.nlp.container.FeatureDocument'>, sent_class=<class 'zensols.nlp.container.FeatureSentence'>, token_class=<class 'zensols.nlp.tok.SpacyFeatureToken'>, remove_empty_sentences=None, reload_components=False, auto_install_model=False, package_manager=<factory>)¶
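The token-to-tag pairing this relies on can be pictured with a small stdlib sketch. It assumes, as amrlib's annotator is believed to do, that the tokens and ner_tags metadata entries hold JSON-encoded lists with one entry per whitespace-delimited token; token_ner_pairs and the sample tags are hypothetical, not part of zensols.amr.

```python
import json
from typing import Dict, List, Tuple


def token_ner_pairs(metadata: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair each whitespace-delimited token with its NER tag.

    Hypothetical helper: assumes ``tokens`` and ``ner_tags`` are JSON lists.
    """
    tokens: List[str] = json.loads(metadata['tokens'])
    tags: List[str] = json.loads(metadata['ner_tags'])
    # with embedded entities turned off there is one tag per token
    if len(tokens) != len(tags):
        raise ValueError(f'{len(tokens)} tokens but {len(tags)} NER tags')
    return list(zip(tokens, tags))


meta = {
    'snt': 'Obama visited Paris.',
    'tokens': json.dumps(['Obama', 'visited', 'Paris', '.']),
    'ner_tags': json.dumps(['PERSON', 'O', 'GPE', 'O']),
}
print(token_ner_pairs(meta))
```

If a multi-word entity such as "New York" were merged into one token, the two lists would no longer line up, which is the kind of mismatch the parser's configuration is meant to avoid.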
zensols.amr.docparser module¶
AMR document annotation.
- exception zensols.amr.docparser.AmrParseError(msg, sent=None)[source]¶
Bases:
AmrError
- __module__ = 'zensols.amr.docparser'¶
- class zensols.amr.docparser.AnnotationFeatureDocumentParser(name, delegate, token_decorators=(), sentence_decorators=(), document_decorators=(), token_feature_ids=<factory>, silencer=None, stash=None, hasher=<factory>, amr_parser=None, alignment_populator=None, coref_resolver=None, reparse=True, amr_doc_class=<class 'zensols.amr.container.AmrFeatureDocument'>, amr_sent_class=<class 'zensols.amr.container.AmrFeatureSentence'>)[source]¶
Bases:
CachingFeatureDocumentParser
A document parser that adds AMR graphs and further annotates them. This avoids a second AMR construction when annotating a graph with features (e.g. ent, POS tag, etc.) because it uses the normalized features of the (adapted) spaCy classes. For this reason, use this class if your application needs such annotations.
This parses and populates AMR graphs as an AmrDocument at the document level and an AmrSentence at the sentence level using a zensols.nlp.FeatureDocument.
This class will also recreate the AMR on the normalized text of the document. This is necessary since AMR parsing and alignment happen at the spaCy level, while token normalization happens at the zensols.nlp feature token level. Since spaCy does not allow filtering tokens (e.g. stop words), there is no way to avoid a reparse.
However, if your application makes no modifications to the document, a second parse is not needed and you should set reparse to False.
One consideration is that the spaCy adaptation module (spacyadapt) is not thoroughly tested and future updates might break it. If you do not feel comfortable using it, or cannot, use the spaCy pipeline by setting amr_default:doc_parser = amr_anon_doc_parser in the application configuration and annotate the graph yourself.
The AMR graphs are optionally cached using a Stash when stash is set.
Important: when using stash caching, only the AmrDocument is cached and not the entire feature document. This could lead to the documents and AMR graphs getting out of sync if both are cached. Use the clear() method to clear the stash if ever in doubt.
New instances of AmrFeatureDocument are returned.
- __init__(name, delegate, token_decorators=(), sentence_decorators=(), document_decorators=(), token_feature_ids=<factory>, silencer=None, stash=None, hasher=<factory>, amr_parser=None, alignment_populator=None, coref_resolver=None, reparse=True, amr_doc_class=<class 'zensols.amr.container.AmrFeatureDocument'>, amr_sent_class=<class 'zensols.amr.container.AmrFeatureSentence'>)¶
- alignment_populator: AmrAlignmentPopulator = None¶
Adds the alignment markings.
- amr_doc_class¶
The FeatureDocument class created to store zensols.amr.AmrDocument instances.
alias of AmrFeatureDocument
- amr_sent_class¶
The FeatureSentence class created to store zensols.amr.AmrSentence instances.
alias of AmrFeatureSentence
- annotate(doc)[source]¶
Parse and annotate a new AMR feature document using features from doc. Since the AMR document itself is not cached, a separate document cache is necessary for caching/storage.
- Parameters:
doc (FeatureDocument) – the source feature document to parse into AMRs
key – the key used to cache the AmrDocument in stash if provided (see class docs)
- Return type:
AmrFeatureDocument
- coref_resolver: CoreferenceResolver = None¶
Adds coreferences between the sentences of the document.
- parse(text, *args, **kwargs)[source]¶
Parse text, or text given as a list of sentences.
- Parameters:
text (str) – either a string or a list of strings; if the former, a document with one sentence will be created, otherwise a document is returned with a sentence for each string in the list
args – the arguments used to create the FeatureDocument instance
kwargs – the keyword arguments used to create the FeatureDocument instance
- Return type:
- reparse: bool = True¶
Reparse the normalized FeatureSentence text for each AMR sentence, which is necessary when tokens are removed (e.g. stop words). See the class docs.
- class zensols.amr.docparser.TokenAnnotationFeatureDocumentDecorator(name, feature_id, indexed=False, add_none=False, use_sent_index=True, method='attribute')[source]¶
Bases:
FeatureDocumentDecorator
Annotates features in AMR sentence graphs from indexes annotated by AmrAlignmentPopulator.
- __init__(name, feature_id, indexed=False, add_none=False, use_sent_index=True, method='attribute')¶
- add_none: bool = False¶
Whether to add missing or empty values. This includes string values of zensols.nlp.FeatureToken.NONE.
- method: str = 'attribute'¶
Where to add the data, which may be one of:
attribute: add as a new attribute node using name as the role and the value as the attribute constant
epi: add as epigraph data; however, the current Penman implementation assumes only alignments, so the graph string will no longer be parsable
Otherwise, the string is used to format the replacement node text, with target as the previous/original node text and value as the feature value text.
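A minimal sketch of how the three method options could be dispatched, assuming Penman-style (source, role, target) triples with colon-prefixed roles and quoted string constants. apply_method and its parameter names are hypothetical; the epi branch is left out because its output is only described as epigraph data.

```python
from typing import Tuple, Union

Triple = Tuple[str, str, str]


def apply_method(method: str, var: str, node_text: str,
                 name: str, value: str) -> Union[Triple, str]:
    """Hypothetical dispatch on the ``method`` option."""
    if method == 'attribute':
        # new attribute edge: ``name`` is the role, ``value`` the constant
        return (var, f':{name}', f'"{value}"')
    if method == 'epi':
        raise NotImplementedError('epigraph output is not sketched here')
    # any other string is a template for the replacement node text
    return method.format(target=node_text, value=value)


print(apply_method('attribute', 'd', 'dog', 'pos', 'NN'))
print(apply_method('{target}[{value}]', 'd', 'dog', 'pos', 'NN'))  # dog[NN]
```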
- name: str¶
The triple role (if add_to_epi is False) used to label the edge between the token and the feature. Otherwise, this string is used in the epidata of the graph.
- use_sent_index: bool = True¶
Whether to map alignments using the per-sentence index FeatureToken attribute i_sent rather than by (iterated) index position. Set this to False if the FeatureDocumentParser was configured with a token normalizer that has embedding named entities turned off.
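Why the two lookups can disagree is easiest to see when a named entity has been embedded as a single token. The sketch below assumes an embedded entity token keeps the i_sent of its first spaCy token; Token and map_alignment are hypothetical stand-ins, not zensols.nlp classes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    """Hypothetical stand-in for a zensols.nlp FeatureToken."""
    norm: str
    i_sent: int  # index of the (first) spaCy token in the sentence


def map_alignment(tokens: List[Token], align_ix: int,
                  use_sent_index: bool = True) -> Optional[Token]:
    """Return the token an alignment index refers to, or None."""
    if use_sent_index:
        # look up by the original per-sentence index
        by_sent: Dict[int, Token] = {t.i_sent: t for t in tokens}
        return by_sent.get(align_ix)
    # look up by position in the (possibly merged) token list
    return tokens[align_ix] if 0 <= align_ix < len(tokens) else None


# "I love New York ." with the entity "New York" embedded as one token
toks = [Token('I', 0), Token('love', 1), Token('New York', 2), Token('.', 4)]
print(map_alignment(toks, 4))                        # Token(norm='.', i_sent=4)
print(map_alignment(toks, 4, use_sent_index=False))  # None: positions drifted
```

When no entities are merged, positions and i_sent coincide and both lookups agree.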
zensols.amr.domain module¶
Error and exception classes.
- exception zensols.amr.domain.AmrError(msg, sent=None)[source]¶
Bases:
APIError
Raised for package API errors.
- __annotations__ = {}¶
- __module__ = 'zensols.amr.domain'¶
- to_failure()[source]¶
Create an AmrFailure from this error.
- Return type:
AmrFailure
- class zensols.amr.domain.AmrFailure(exception=None, thrower=None, traceback=None, message=None, sent=None)[source]¶
Bases:
Failure
A container class that describes an AMR graph creation or handling error.
- __init__(exception=None, thrower=None, traceback=None, message=None, sent=None)¶
- sent: str = None¶
The natural language sentence that caused the error (usually parsing).
- class zensols.amr.domain.Feature(feat_id, value)[source]¶
Bases:
FeatureMarker
zensols.amr.dumper module¶
Plot AMR graphs.
- class zensols.amr.dumper.Dumper(target_dir, add_doc_dir=True, write_text=True, sent_file_format='{sent.short_name}', add_description=True, front_text=None, width=79, overwrite_dir=False, overwrite_sent_file=False)[source]¶
Bases:
Dictable
Plots and writes AMR content in human-readable formats.
- __init__(target_dir, add_doc_dir=True, write_text=True, sent_file_format='{sent.short_name}', add_description=True, front_text=None, width=79, overwrite_dir=False, overwrite_sent_file=False)¶
- clean()[source]¶
Remove the output directory if it exists.
- Return type:
bool
- Returns:
whether the output directory existed
- dump_doc(doc, target_name=None)[source]¶
Dump the contents of the document to a directory. This includes the plots and graph strings of all sentences, as well as a doc.txt file that has the graph strings and their sentence indexes.
- Parameters:
doc (AmrDocument) – the document to plot
- Return type:
- Returns:
the paths to each file that was generated
- plot_doc(doc, target_name=None)[source]¶
Create a plot for each AMR sentence as a graph. The file is generated with graphviz in a temporary space, then moved to the target directory.
If the directory doesn’t exist, it is created.
- plot_sent(sent, target_name=None)[source]¶
Create a plot of the AMR graph visually. The file is generated with graphviz in a temporary space, then moved to the target path.
- Parameters:
target_name (str) – the file name added to target_dir, or if None, computed from the sentence text
- Return type:
- Returns:
the path(s) where the file(s) were generated
- render(cont, target_name=None)[source]¶
Create a PDF for an AMR document or sentence as a graph. The file is generated with graphviz in a temporary space, then moved to the target directory.
- See:
- See:
- Return type:
- Returns:
the path(s) where the file(s) were generated
- class zensols.amr.dumper.GraphvizDumper(target_dir, add_doc_dir=True, write_text=True, sent_file_format='{sent.short_name}', add_description=True, front_text=None, width=79, overwrite_dir=False, overwrite_sent_file=False, extension='pdf', attribs=<factory>)[source]¶
Bases:
Dumper
Dumps plots created by graphviz using the dot program.
- __init__(target_dir, add_doc_dir=True, write_text=True, sent_file_format='{sent.short_name}', add_description=True, front_text=None, width=79, overwrite_dir=False, overwrite_sent_file=False, extension='pdf', attribs=<factory>)¶
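For readers unfamiliar with the dot step, here is a rough stdlib sketch of turning Penman-style triples into Graphviz DOT source and rendering it with the standard dot -Tpdf -o FILE command line. The to_dot and render_pdf helpers, node naming and label layout are assumptions for illustration, not the package's rendering code.

```python
import subprocess
from typing import List, Tuple

Triple = Tuple[str, str, str]


def to_dot(triples: List[Triple]) -> str:
    """Build DOT source: one node per variable, one leaf per constant."""
    concepts = {s: t for s, r, t in triples if r == ':instance'}
    lines = ['digraph amr {']
    for var, concept in concepts.items():
        lines.append(f'  {var} [label="{var} / {concept}"];')
    for i, (src, role, tgt) in enumerate(triples):
        if role == ':instance':
            continue
        if tgt in concepts:
            lines.append(f'  {src} -> {tgt} [label="{role[1:]}"];')
        else:
            # attribute constants (e.g. polarity, names) become box leaves
            leaf = f'_c{i}'
            label = tgt.replace('"', '\\"')
            lines.append(f'  {leaf} [label="{label}", shape=box];')
            lines.append(f'  {src} -> {leaf} [label="{role[1:]}"];')
    lines.append('}')
    return '\n'.join(lines)


def render_pdf(dot_source: str, path: str) -> None:
    """Render with the Graphviz ``dot`` program (must be on the PATH)."""
    subprocess.run(['dot', '-Tpdf', '-o', path],
                   input=dot_source.encode('utf-8'), check=True)


triples = [('b', ':instance', 'bark-01'), ('b', ':ARG0', 'd'),
           ('d', ':instance', 'dog'), ('b', ':polarity', '-')]
print(to_dot(triples))
```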
zensols.amr.model module¶
AMR parsing spaCy pipeline component and sentence generator.
- class zensols.amr.model.AmrGenerator[source]¶
Bases:
object
A callable that generates natural language text from an AMR graph.
- See:
__call__()
- __init__()¶
- abstract generate(doc)[source]¶
Generate a sentence from the AMR graph doc.
- Parameters:
doc (AmrDocument) – the AMR document used to generate the sentence
- Return type:
- Returns:
a document with the generated sentences
- class zensols.amr.model.AmrParser(model='noop', add_missing_metadata=True)[source]¶
Bases:
ComponentInitializer
Parses natural language into AMR graphs. It can switch between different installed models in the same Python session.
- __init__(model='noop', add_missing_metadata=True)¶
- classmethod add_metadata(amr_sent, sent, clobber=False)[source]¶
Add annotation metadata parsed from spaCy when it is missing, which happens when using the T5 AMR model.
- Parameters:
amr_sent (AmrSentence) – the sentence to populate
sent (Span) – the spaCy sentence used as the source
clobber (bool) – whether or not to overwrite any existing metadata fields
- See:
- annotate_amr(doc)[source]¶
Add an amr attribute to the spaCy document.
- Parameters:
doc (
Doc) – the document to annotate
- static is_missing_metadata(amr_sent)[source]¶
Return whether amr_sent is missing annotated metadata. T5 model sentences only have the snt metadata entry.
- Parameters:
amr_sent (AmrSentence) – the sentence to check
- See:
- Return type:
bool
- model: str = 'noop'¶
The penman AMR model to use when creating AmrSentence instances, which is one of noop or amr. The first does not modify the graph, but the latter normalizes out inverse relationships such as ARG*-of.
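To make the noop versus amr difference concrete, the following sketch re-attaches :ARGn-of inverse edges on plain (source, role, target) triples, which is the kind of normalization described above. It only handles the ARG*-of pattern named here and is not Penman's implementation; deinvert is a hypothetical helper.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]

# only numbered core roles such as :ARG0-of are treated as inverses here
_INVERSE_ARG = re.compile(r'^(:ARG\d+)-of$')


def deinvert(triples: List[Triple]) -> List[Triple]:
    """Rewrite (a, :ARGn-of, b) as (b, :ARGn, a); keep other triples."""
    normalized: List[Triple] = []
    for source, role, target in triples:
        match = _INVERSE_ARG.match(role)
        if match:
            normalized.append((target, match.group(1), source))
        else:
            normalized.append((source, role, target))
    return normalized


# (b / boy :ARG0-of (w / want-01))  "the boy who wants"
triples = [('b', ':instance', 'boy'), ('b', ':ARG0-of', 'w'),
           ('w', ':instance', 'want-01')]
print(deinvert(triples))
# [('b', ':instance', 'boy'), ('w', ':ARG0', 'b'), ('w', ':instance', 'want-01')]
```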
zensols.amr.score module¶
Produces matching scores.
- class zensols.amr.score.AmrScoreParser(doc_parser, keep_keys=None)[source]¶
Bases:
object
Parses AmrSentence instances from the snt metadata text string of a human-annotated AMR. It then returns an instance that is later scored by a ScoreMethod such as SmatchScoreCalculator.
- __init__(doc_parser, keep_keys=None)¶
- doc_parser: FeatureDocumentParser¶
The document parser used to generate the AMR. This should have sentence boundaries removed so only one AmrSentence is returned from the parse.
- keep_keys: Tuple[str] = None¶
The keys to keep/copy from the source AmrSentence.
- class zensols.amr.score.SmatchScoreCalculator(reverse_sents=False)[source]¶
Bases:
ScoreMethod
Computes the smatch scores of AMR sentences using the Smatch package.
Citation:
Shu Cai and Kevin Knight 2013. Smatch: an Evaluation Metric for Semantic Feature Structures. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 748–752, Sofia, Bulgaria. Association for Computational Linguistics.
- __init__(reverse_sents=False)¶
- document_smatch(gold, pred)[source]¶
Return the smatch score produced from the sentences as pairs from two documents.
- Return type:
- Returns:
a score with the precision, recall and F1
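Smatch reports precision, recall and F1 over matching triples. The sketch below shows only that arithmetic, under two simplifications: the variables of each gold/predicted pair are assumed to be already aligned (real Smatch searches for the best variable mapping with hill climbing), and the document-level score is assumed to sum match counts over sentence pairs before dividing. triple_prf is a hypothetical helper.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def triple_prf(pairs: Iterable[Tuple[Set[Triple], Set[Triple]]]
               ) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over (gold, pred) pairs."""
    matched = gold_total = pred_total = 0
    for gold, pred in pairs:
        matched += len(gold & pred)  # identity variable mapping assumed
        gold_total += len(gold)
        pred_total += len(pred)
    precision = matched / pred_total if pred_total else 0.0
    recall = matched / gold_total if gold_total else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


gold1 = {('b', ':instance', 'bark-01'), ('d', ':instance', 'dog'), ('b', ':ARG0', 'd')}
pred1 = {('b', ':instance', 'bark-01'), ('d', ':instance', 'dog'), ('b', ':ARG1', 'd')}
gold2 = {('c', ':instance', 'cat')}
pred2 = {('c', ':instance', 'cat'), ('c', ':mod', 'b2')}
print(triple_prf([(gold1, pred1), (gold2, pred2)]))
```

Here 3 of 5 predicted and 3 of 4 gold triples match, giving a precision of 0.6, a recall of 0.75 and an F1 of about 0.667.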
zensols.amr.sent module¶
AMR container classes that fit a document/sentence hierarchy.
- class zensols.amr.sent.AmrGeneratedSentence(text, clipped, amr)[source]¶
Bases:
Writable
A sentence generated by the graph-to-text model.
- __init__(text, clipped, amr)¶
- amr: AmrSentence¶
The input sentence to the model used to predict text.
- class zensols.amr.sent.AmrSentence(data, model=None)[source]¶
Bases:
PersistableContainer, Writable
Contains a sentence with an AMR graph and a Penman string version of the graph. Instances can be created with a Penman formatted string, an already parsed Graph, or an AmrFailure for any upstream issues.
Accepting an AmrFailure helps in situations where downstream APIs expect instances of this class, such as bulk processing. When this happens, the instance renders with an error message in the AMR metadata.
- DEFAULT_MODEL: ClassVar[str] = 'noop'¶
The default penman AMR model to use in the initializer, which is one of noop or amr. The first does not modify the graph, but the latter normalizes out inverse relationships such as ARG*-of.
- __init__(data, model=None)[source]¶
Initialize based on the kind of data given.
- Parameters:
data (Union[str, Graph, AmrFailure]) – either a Penman formatted string graph, an already parsed graph or an AmrFailure for upstream issues
model (str) – the model to use for encoding and decoding
- property failure: AmrFailure¶
The failure if is_failure is True.
- property failure_reason: str¶
Get the reason for the parse failure, or None if is_failure returns False.
- get_data()[source]¶
Return the graph if it is parsed, else return the graph_string.
- property graph_only: str¶
Like graph_string but without metadata.
- property graph_single_line: str¶
Like graph_only but returned as a single-line string.
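A rough stdlib approximation of the two output forms on a Penman string, assuming metadata lines start with #. This whitespace collapse would also squeeze repeated spaces inside quoted constants, so treat it only as an illustration of what graph_only and graph_single_line return, not as their implementation.

```python
def graph_only(graph_string: str) -> str:
    """Drop Penman metadata lines such as ``# ::snt ...``."""
    lines = [line for line in graph_string.splitlines()
             if not line.lstrip().startswith('#')]
    return '\n'.join(lines)


def graph_single_line(graph_string: str) -> str:
    """Collapse the metadata-free graph onto one line."""
    return ' '.join(graph_only(graph_string).split())


penman_str = '''# ::snt The dog barked.
(b / bark-01
   :ARG0 (d / dog))'''
print(graph_single_line(penman_str))  # (b / bark-01 :ARG0 (d / dog))
```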
- invalidate_graph_string(check_graph=True)[source]¶
Call this when the graph changes so that the change is propagated to graph_string.
- iter_aligns(include_types=False)[source]¶
Return an iterator of the alignments of the graph. Each iteration yields a tuple of the triple, the list of alignment indexes, and a tuple of bools indicating whether each index is a role alignment.
- Parameters:
include_types – whether to include types, which is the third element in each tuple, else that element is
None
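Penman writes alignments as surface markers such as ~e.1 after a concept (node alignment) or after a role (role alignment). The regex sketch below pulls those markers out of a graph string and flags role alignments, loosely mirroring the index list and role flag that iter_aligns yields; it works on raw text rather than parsed triples, and iter_alignment_markers is a hypothetical helper.

```python
import re
from typing import List, Tuple

# a token followed by "~", an optional prefix such as "e.", and indexes "1,2"
_ALIGN = re.compile(r'(\S+?)~(?:[a-z]\.?)?(\d+(?:,\d+)*)')


def iter_alignment_markers(graph: str) -> List[Tuple[str, Tuple[int, ...], bool]]:
    """Return (token, alignment indexes, is_role_alignment) per marker."""
    markers: List[Tuple[str, Tuple[int, ...], bool]] = []
    for match in _ALIGN.finditer(graph):
        token = match.group(1)
        indexes = tuple(int(i) for i in match.group(2).split(','))
        # role alignments are attached to a role such as :ARG0
        markers.append((token, indexes, token.startswith(':')))
    return markers


g = '(w / want-01~e.1 :ARG0~e.0 (b / boy~e.0,2))'
print(iter_alignment_markers(g))
```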
- property tokenized_text: str¶
This is useful when it is necessary to force white space tokenization to match the already tokenized metadata (‘tokens’ key). Examples include numbers followed by commas such as dates like
April 25 , 2008.
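The difference shows up with Python's plain str.split: splitting the original sentence leaves the comma attached to the number, while joining the tokens metadata (assumed here to be a JSON-encoded list) reproduces the whitespace-tokenized form. This illustrates the idea only and is not the property's implementation.

```python
import json

sentence = 'April 25, 2008'
# the tokens metadata entry, assumed to be a JSON-encoded list
metadata = {'tokens': json.dumps(['April', '25', ',', '2008'])}

naive = sentence.split()
tokenized_text = ' '.join(json.loads(metadata['tokens']))
print(naive)           # ['April', '25,', '2008']
print(tokenized_text)  # April 25 , 2008
```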
zensols.amr.serial module¶
A small serialization framework for AmrDocument, AmrSentence and other AMR artifacts.
- class zensols.amr.serial.AmrSerializedFactory(includes)[source]¶
Bases:
Dictable
Creates instances of Serialized from instances of AmrDocument, AmrSentence or AnnotatedAmrDocument. These can then be used as Dictable instances, specifically with the asdict() and asjson() methods.
- __init__(includes)¶
- create(instance)[source]¶
Create a serializer from
instance (see class docs).
- Parameters:
instance (Union[AmrSentence, AmrDocument]) – the instance to be serialized
- Return type:
Serialized
- Returns:
an object that can be serialized using the asdict and asjson methods
- class zensols.amr.serial.Include(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases:
Enum
Indicates what to include at each level.
- annotated_body = 9¶
- annotated_document_id = 7¶
- annotated_sections = 10¶
- annotated_summary = 8¶
- document_text = 1¶
- sentence_graph = 5¶
- sentence_id = 3¶
- sentence_metadata = 6¶
- sentence_text = 4¶
- sentences = 2¶
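One way to picture how Include values select content is a small filter over a plain dictionary. The sketch reuses a subset of the member names and values listed above, but serialize_sentence and the output keys (id, text, graph, metadata) are hypothetical and are not the actual output of SerializedAmrSentence.

```python
import json
from enum import Enum
from typing import Any, Dict, Set


class Include(Enum):
    """Subset of the zensols.amr.serial.Include members listed above."""
    sentence_id = 3
    sentence_text = 4
    sentence_graph = 5
    sentence_metadata = 6


# hypothetical mapping from each flag to an output key
_FIELDS = {
    Include.sentence_id: 'id',
    Include.sentence_text: 'text',
    Include.sentence_graph: 'graph',
    Include.sentence_metadata: 'metadata',
}


def serialize_sentence(sent: Dict[str, Any],
                       includes: Set[Include]) -> Dict[str, Any]:
    """Keep only the fields whose Include flag was requested."""
    return {key: sent[key] for inc, key in _FIELDS.items() if inc in includes}


sent = {'id': '1', 'text': 'The dog barked.',
        'graph': '(b / bark-01 :ARG0 (d / dog))',
        'metadata': {'snt': 'The dog barked.'}}
out = serialize_sentence(sent, {Include.sentence_text, Include.sentence_graph})
print(json.dumps(out))
```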
- class zensols.amr.serial.Serialized(includes)[source]¶
Bases:
Dictable
A base strategy class that can serialize AmrDocument, AmrSentence and other AMR artifacts.
- __init__(includes)¶
- class zensols.amr.serial.SerializedAmrDocument(includes, document)[source]¶
Bases:
Serialized
Serializes instances of AmrDocument.
- __init__(includes, document)¶
- document: AmrDocument¶
The document to serialize.
- class zensols.amr.serial.SerializedAmrSentence(includes, sentence)[source]¶
Bases:
Serialized
Serializes instances of AmrSentence.
- __init__(includes, sentence)¶
- sentence: AmrSentence¶
The sentence to serialize.
- class zensols.amr.serial.SerializedAnnotatedAmrDocument(includes, document)[source]¶
Bases:
SerializedAmrDocument
Serializes instances of AnnotatedAmrDocument.
- __init__(includes, document)¶
zensols.amr.spacyadapt module¶
A set of adaptor classes from zensols.nlp.FeatureToken to
spacy.tokens.Doc.
zensols.amr.trainer module¶
Continues training on an AMR model.
- class zensols.amr.trainer.HFTrainer(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))[source]¶
Bases:
Trainer
Interface into the amrlib package’s HuggingFace model trainers.
- __init__(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))¶
- class zensols.amr.trainer.SpringTrainer(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'), train_files=None, dev_files=None)[source]¶
Bases:
Trainer
SPRING model trainer.
Citation:
Michele Bevilacqua et al. 2021. One SPRING to Rule Them Both: Symmetric AMR Semantic Parsing and Generation without a Complex Pipeline. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 12564–12573, Virtual, May.
- __init__(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'), train_files=None, dev_files=None)¶
- class zensols.amr.trainer.T5Trainer(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))[source]¶
Bases:
XfmTrainer
T5 model trainer.
Citation:
Colin Raffel et al. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):140:5485-140:5551, January.
- __init__(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))¶
- class zensols.amr.trainer.T5WithTenseGeneratorTrainer(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'), nltk_lib_dir=None, annotate_dir=None, annotate_model='en_core_web_sm')[source]¶
Bases:
XfmTrainer
- __init__(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'), nltk_lib_dir=None, annotate_dir=None, annotate_model='en_core_web_sm')¶
- class zensols.amr.trainer.Trainer(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))[source]¶
Bases:
Dictable
Interface into the amrlib package’s trainers.
- __init__(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))¶
- corpus_prep_manager: CorpusPrepperManager¶
Aggregates and applies corpus preparer instances.
- model_installer: Installer = None¶
The installer for the model used to train the model previously (i.e. by amrlib).
- model_name: str¶
A human-readable string identifying the model, which ends up in amrlib_meta.json.
- property pretrained_path_or_model: str | Path¶
The path to the checkpoint file, or the string scratch if starting from scratch.
- train(dry_run=False)[source]¶
Train the model (see class docs).
- Parameters:
dry_run (bool) – when True, don’t do anything, just act like it.
- property training_config: Dict[str, Any]¶
The parameters given to the instance of the trainer, which is the class derived with
trainer_class.
- property training_config_file: Path¶
The path to the JSON configuration file in the amrlib repo, such as amrlib/configs/model_parse_*.json. If None, try to find the configuration file generated by the last pretrained model.
- training_config_overrides: Dict[str, Any]¶
Additional configuration that overrides (clobbers) the contents found in training_config_file.
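A minimal sketch of layering overrides onto a JSON training configuration. Whether the trainer merges nested sections or simply clobbers top-level keys is not stated here; this version merges recursively, whereas a shallow dict.update would replace a whole nested section. merge_config and the configuration keys are illustrative assumptions, not amrlib's exact schema.

```python
import json
from typing import Any, Dict


def merge_config(base: Dict[str, Any],
                 overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of ``base`` with ``overrides`` applied recursively."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # descend into nested sections instead of replacing them whole
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


config_text = ('{"model_name_or_path": "t5-base", '
               '"hf_args": {"num_train_epochs": 8, "learning_rate": 0.0001}}')
config = merge_config(json.loads(config_text),
                      {'hf_args': {'num_train_epochs': 2}})
print(config)
```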
- class zensols.amr.trainer.XfmTrainer(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))[source]¶
Bases:
HFTrainer
Trainer for XFM and T5 models.
- __init__(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))¶
zensols.amr.tree module¶
Penman graph utilities and algorithms. These classes are currently only used for debugging and do not have any significant bearing on the overall package.
Bases:
object
Finds nodes using indexed paths.
Get a triple representing a graph node/edge of the given path.
The graph that will be populated with alignments.
Whether or not to strip alignment tags from the output. This is set to True in get_missing_alignments() for test cases. However, it might also be useful for get_node().
- class zensols.amr.tree.TreePruner(graph, keep_root_meta=True)[source]¶
Bases:
object
Create a subgraph using a tuple found in the graph configured (penman.configure()) as a tree.
- __init__(graph, keep_root_meta=True)¶
- create_sub(query)[source]¶
Create a subgraph using a tuple found in the graph configured (penman.configure()) as a tree. Everything from query down is included in the resulting graph.
- keep_root_meta: bool = True¶
Whether to keep the original metadata when the query is the root. When this is True, the original graph is returned from create_sub() when its query parameter is the root of graph.
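The traversal behind taking "everything from query down" can be sketched on plain triples: start at the query variable, follow edges to other instance variables, and keep every triple whose source was reached. This simplified version ignores inverted roles and re-entrancies, which a tree layout of the graph would need to account for; create_sub here is a standalone hypothetical function, not TreePruner's code.

```python
from collections import deque
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def create_sub(triples: List[Triple], query: str) -> List[Triple]:
    """Return the triples of the subtree rooted at variable ``query``."""
    variables: Set[str] = {s for s, r, _ in triples if r == ':instance'}
    reached: Set[str] = {query}
    queue = deque([query])
    while queue:
        var = queue.popleft()
        for source, _, target in triples:
            # follow edges that point at another node variable
            if source == var and target in variables and target not in reached:
                reached.add(target)
                queue.append(target)
    return [t for t in triples if t[0] in reached]


# (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG4 (c / city)))
graph = [('w', ':instance', 'want-01'), ('w', ':ARG0', 'b'),
         ('b', ':instance', 'boy'), ('w', ':ARG1', 'g'),
         ('g', ':instance', 'go-02'), ('g', ':ARG4', 'c'),
         ('c', ':instance', 'city')]
print(create_sub(graph, 'g'))
```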
zensols.amr.varidx module¶
A utility class to reindex variables in an AmrDocument.
- class zensols.amr.varidx.VariableIndexer[source]¶
Bases:
object
This reentrant class reindexes all variables for the sentences of an AmrDocument so that all node variables are unique. This is done by:
1. Indexing concepts by the first character of their name (e.g. s for see-01) across sentences.
2. Compiling a list of variable replacements (e.g. s2 -> s5) on a per-sentence basis.
3. Replacing variable names based on their document-level index order (e.g. s, s2, etc.). This is done for all concepts, edges, roles, and the epigraph.
A new graph is created for each sentence that has at least one modification; otherwise, the original sentence is kept.
- __init__()¶
- reindex(sents)[source]¶
Reindex and replace variables in sents. Any modified graphs are updated in the sents instances.
- Parameters:
sents (
Sequence[AmrSentence]) – sentences whose variables will be reindexed
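The three reindexing steps described for VariableIndexer can be sketched on per-sentence lists of Penman-style triples. The sketch assumes instance triples use the :instance role and numbers variables as the first letter of the concept, then 2, 3, ... across the whole document. It does not touch the epigraph, and reindex here is a standalone hypothetical function rather than the class method.

```python
from collections import Counter
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def reindex(sents: List[List[Triple]]) -> List[List[Triple]]:
    """Make variables unique across sentences (s, s2, s3, ...)."""
    counts: Counter = Counter()  # step 1: document-level index by first char
    result: List[List[Triple]] = []
    for triples in sents:
        # step 2: compile this sentence's variable replacements
        repl: Dict[str, str] = {}
        for var, role, concept in triples:
            if role == ':instance':
                prefix = concept[0].lower()
                counts[prefix] += 1
                n = counts[prefix]
                repl[var] = prefix if n == 1 else f'{prefix}{n}'
        # step 3: rename variables in one pass; concepts are never renamed
        new = [(repl.get(s, s), r, t if r == ':instance' else repl.get(t, t))
               for s, r, t in triples]
        # keep the original sentence when nothing changed
        result.append(triples if new == triples else new)
    return result


s1 = [('s', ':instance', 'see-01'), ('s', ':ARG0', 'p'), ('p', ':instance', 'person')]
s2 = [('s', ':instance', 'sleep-01'), ('s', ':ARG0', 'p'), ('p', ':instance', 'person')]
out = reindex([s1, s2])
print(out[1])
# [('s2', ':instance', 'sleep-01'), ('s2', ':ARG0', 'p2'), ('p2', ':instance', 'person')]
```

Building each sentence's replacement map before rewriting means a rename such as s to s2 never chains into a second rename within the same sentence.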