zensols.amr package#

Subpackages#

Submodules#

zensols.amr.align#

Inheritance diagram of zensols.amr.align

Add alignments to AMR sentences.

class zensols.amr.align.AmrAlignmentPopulator(aligner, add_missing_metadata=True, raise_exception=True)[source]#

Bases: object

Adds alignment markers to AMR graphs.

__init__(aligner, add_missing_metadata=True, raise_exception=True)#
add_missing_metadata: bool = True#

Whether to add annotation metadata to sentences when it is missing.

align(doc)[source]#

Add alignment markers to sentence AMR graphs.

aligner: Union[str, Callable]#

The aligner used to annotate sentence AMR graphs.

raise_exception: bool = True#

Whether to raise exceptions when adding alignments on a per sentence basis. If set to False, alignments will be missing for any sentence that produces an error.
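
A minimal usage sketch follows; the corpus path and the 'faa_aligner' name are illustrative assumptions, and in practice the populator is created from the application configuration:

from pathlib import Path
from zensols.amr.doc import AmrDocument
from zensols.amr.align import AmrAlignmentPopulator

# read already parsed sentences from a Penman corpus file (path is illustrative)
amr_doc = AmrDocument.from_source(Path('corpus/amr-sents.txt'))
# 'faa_aligner' is an assumed aligner name; skip failed sentences rather than raise
populator = AmrAlignmentPopulator(aligner='faa_aligner', raise_exception=False)
# add alignment markers to each sentence graph in place
populator.align(amr_doc)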

zensols.amr.align.create_amr_align_component(nlp, name, aligner)[source]#

Create an instance of AmrAlignmentPopulator.

zensols.amr.alignpop#

Inheritance diagram of zensols.amr.alignpop

Includes classes to add alignments to AMR graphs using an ISI formatted alignment string.

class zensols.amr.alignpop.AlignmentPopulator(graph, alignment_key='alignments')[source]#

Bases: object

Adds alignments from an ISI formatted string.

__init__(graph, alignment_key='alignments')#
alignment_key: str = 'alignments'#

The key in the graph’s metadata with the ISI formatted alignment string.

get_alignments()[source]#

Return the alignments for the graph.

Return type:

Tuple[PathAlignment, ...]

get_missing_alignments()[source]#

Find all path alignments not in the graph. This is done by matching against the epi mapping. This is only useful for testing.

Return type:

Tuple[PathAlignment, ...]

graph: Graph#

The graph that will be populated with alignments.
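
A sketch of populating alignments from a graph whose metadata carries an ISI formatted alignment string; the toy graph and the alignment values are illustrative only:

import penman
from zensols.amr.alignpop import AlignmentPopulator

graph_str = """\
# ::snt The boy runs.
# ::alignments 1-1.1 2-1
(r / run-02
   :ARG0 (b / boy))"""
graph = penman.decode(graph_str)
pop = AlignmentPopulator(graph=graph)
for path_align in pop.get_alignments():
    # each PathAlignment has the 0-indexed path, the raw alignment string
    # and the (source, role, target) triple it points to
    print(path_align.index, path_align.path,
          path_align.alignment_str, path_align.triple)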

class zensols.amr.alignpop.PathAlignment(index, path, alignment_str, alignment, triple)[source]#

Bases: object

An alignment that contains the path and alignment to a node, or an edge for role alignments.

__init__(index, path, alignment_str, alignment, triple)#
alignment: Union[Alignment, RoleAlignment]#

The alignment of the node or edge.

alignment_str: str#

The original unparsed alignment.

index: int#

The index of this alignment in the ISI formatted alignment string.

property is_role: bool#

Whether or not the alignment is a role alignment.

path: Tuple[int, ...]#

The 0-indexed path to the node or the edge.

triple: Tuple[str, str, str]#

The triple specifying the node or edge of the alignment.

zensols.amr.amrlib#

Inheritance diagram of zensols.amr.amrlib

AMR parser and generator model implementations using amrlib.

class zensols.amr.amrlib.AmrlibGenerator(name=None, installer=None, alternate_path=None, use_tense=True)[source]#

Bases: _AmrlibModelContainer, AmrGenerator

__init__(name=None, installer=None, alternate_path=None, use_tense=True)#
generate(doc)[source]#

Generate a sentence from a spaCy document.

Parameters:

doc (AmrDocument) – the spaCy document used to generate the sentence

Return type:

AmrGeneratedDocument

Returns:

a text sentence for each respective sentence in doc

use_tense: bool = True#

Whether to try to add tense information by tagging the graph, which requires the sentence or its annotations and then performs an alignment.

See:

amrlib.models.generate_t5wtense.inference.Inference
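
A sketch of generating text from parsed graphs; the generator instance is assumed to come from the application context since it needs a model installer:

# `generator` is an AmrlibGenerator from the application context and
# `amr_doc` an already parsed AmrDocument (both assumed here)
gen_doc = generator.generate(amr_doc)
for gen_sent in gen_doc.sents:
    # each AmrGeneratedSentence carries the predicted text, whether the
    # prediction was clipped, and the source AMR sentence
    print(gen_sent.text, gen_sent.clipped)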

class zensols.amr.amrlib.AmrlibParser(model='noop', add_missing_metadata=True, name=None, installer=None, alternate_path=None)[source]#

Bases: _AmrlibModelContainer, AmrParser

__init__(model='noop', add_missing_metadata=True, name=None, installer=None, alternate_path=None)#
init_nlp_model(model, component)[source]#

Reset the installer to allow reloads in a Python REPL with different installers.

zensols.amr.annotate#

Inheritance diagram of zensols.amr.annotate

AMR annotated corpus utility classes.

class zensols.amr.annotate.AnnotatedAmrDocument(sents, path=None, doc_id=None)[source]#

Bases: AmrDocument

An AMR document containing a unique document identifier from the corpus.

__init__(sents, path=None, doc_id=None)#

Initialize.

Parameters:
  • sents (Tuple[AmrSentence, …]) – the document’s sentences

  • path (Optional[Path, …]) – the path to the file containing the Penman notation sentence graphs used in sents

  • model – the model used to initialize AmrSentence when sents is a list of string Penman graphs

property body: AmrDocument#

The sentences that make up the body of the document.

clone(**kwargs)[source]#

Return a deep copy of this instance.

Return type:

AmrDocument

doc_id: str = None#

The unique document identifier.

static get_feature_sentences(feature_doc, amr_docs)[source]#

Return the feature sentences that refer to the AMR sentences, but starting from the AMR side.

Parameters:
Return type:

Iterable[AmrFeatureSentence]

property sections: Tuple[AnnotatedAmrSectionDocument]#

The sections of the document.

property summary: AmrDocument#

The sentences that make up the summary of the document.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_summary=True, include_sections=True, include_body=False, include_amr=True, **kwargs)[source]#

Write the contents of this instance to writer using indention depth.

Parameters:
  • include_summary (bool) – whether to include the summary sentences

  • include_sections – whether to include the section sentences

  • include_body (bool) – whether to include the body sentences

  • include_amr (bool) – whether to include the super class AMR output

  • kwargs – arguments given to the super class’s write, such as limit_sent=0 to effectively disable it

class zensols.amr.annotate.AnnotatedAmrDocumentStash(installer, doc_dir, corpus_cache_dir, id_name, id_regexp=re.compile('([^.]+)\\\\.(\\\\d+)'), sent_type_col='snt-type', sent_type_mapping=None, doc_parser=None, amr_sent_model=None, amr_sent_class=<class 'zensols.amr.annotate.AnnotatedAmrSentence'>, amr_doc_class=<class 'zensols.amr.annotate.AnnotatedAmrDocument'>, doc_annotator=None)[source]#

Bases: Stash

A factory stash that creates AnnotatedAmrDocument instances of annotated documents from a single text file containing a corpus of AMR Penman formatted graphs.

__init__(installer, doc_dir, corpus_cache_dir, id_name, id_regexp=re.compile('([^.]+)\\\\.(\\\\d+)'), sent_type_col='snt-type', sent_type_mapping=None, doc_parser=None, amr_sent_model=None, amr_sent_class=<class 'zensols.amr.annotate.AnnotatedAmrSentence'>, amr_doc_class=<class 'zensols.amr.annotate.AnnotatedAmrDocument'>, doc_annotator=None)#
amr_doc_class#

The class used to create new instances of AmrDocument.

alias of AnnotatedAmrDocument

amr_sent_class#

The class used to create new instances of AmrSentence.

alias of AnnotatedAmrSentence

amr_sent_model: str = None#

The model set in the AmrSentence initializer.

clear()[source]#

Remove all corpus cache files.

corpus_cache_dir: Path#

A directory to store pickle cache files of the annotated corpus.

property corpus_df: pd.DataFrame#

A data frame containing the identifier, text of the sentences and the annotated sentence types of the corpus.

property corpus_doc: AmrDocument#

A document containing all the sentences from the corpus.

delete(name=None)[source]#

Delete the resource for data pointed to by name or the entire resource if name is not given.

doc_annotator: AnnotationFeatureDocumentParser = None#

Used to annotate AMR documents if not None.

property doc_counts: pd.DataFrame#

A data frame of the counts by unique identifier.

doc_dir: Path#

The directory containing sentence type mapping for documents or None if there are no sentence type alignments.

doc_parser: FeatureDocumentParser = None#

If provided, AMR metadata is added to sentences, which is needed by the AMR populator.

dump(name, inst)[source]#

Persist data value inst with key name.

exists(doc_id)[source]#
Parameters:

doc_id (str) – the document unique identifier

Return type:

bool

export_sent_type_template(doc_id, out_path=None)[source]#

Create a CSV file that contains the sentences and other metadata of an annotated document, used to annotate sentence types.

id_name: str#

The ID used in the graph string comments containing the document ID.

id_regexp: Pattern = re.compile('([^.]+)\\.(\\d+)')#

The regular expression used to create the id_name if it exists. The regular expression must have two groups: the first is the ID and the second is the sentence index.

installer: Installer#

The installer containing the AMR annotated corpus.

keys()[source]#

Return an iterable of keys in the collection.

Return type:

Iterable[str]

load(doc_id)[source]#
Parameters:

doc_id (str) – the document unique identifier

Return type:

AnnotatedAmrDocument

parse_id(id)[source]#

Parse an AMR ID and return it as (doc_id, sent_id), both strings.

Return type:

Tuple[str, str]

sent_type_col: str = 'snt-type'#

The AMR metadata ID used for the sentence type.

sent_type_mapping: Dict[str, str] = None#

Used to map what’s in the corpus to a value of SentenceType if given.
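
A usage sketch, assuming `stash` is a configured AnnotatedAmrDocumentStash obtained from the application context:

# iterate the corpus by document identifier
for doc_id in stash.keys():
    doc = stash.load(doc_id)      # an AnnotatedAmrDocument
    print(doc_id, len(doc.sents))
    doc.summary.write()           # only the summary sentences
    break

# a pandas overview of sentence text and annotated sentence types
print(stash.corpus_df.head())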

class zensols.amr.annotate.AnnotatedAmrFeatureDocumentFactory(doc_parser, remove_wiki_attribs=False, remove_alignments=False, doc_id=0)[source]#

Bases: object

Creates instances of AmrFeatureDocument, each with an AmrFeatureDocument.amr instance of AnnotatedAmrDocument and AmrFeatureSentence.amr instances of AnnotatedAmrSentence. These are created from a JSON file or a list of dicts.

The keys of each dictionary are the case-insensitive enumeration values of SentenceType. Keys id and comment are the unique document identifier and a comment that is added to the AMR sentence metadata. Both are optional; if id is missing, doc_id is used.

An example JSON creates a document with ID ex1, a comment metadata, one SentenceType.SUMMARY and two SentenceType.BODY sentences:

[{
    "id": "ex1",
    "comment": "very short",
    "body": "The man ran to make the train. He just missed it.",
    "summary": "A man got caught in the door of a train he just missed."
}]
See:

from_dict()
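
A sketch using the example above, assuming `factory` is a configured AnnotatedAmrFeatureDocumentFactory (its doc_parser comes from the application context):

data = [{'id': 'ex1',
         'comment': 'very short',
         'body': 'The man ran to make the train. He just missed it.',
         'summary': 'A man got caught in the door of a train he just missed.'}]
for doc in factory.from_dicts(data):
    # each document is an AmrFeatureDocument whose `amr` is an AnnotatedAmrDocument
    print(doc.amr.doc_id)
    doc.write()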

__init__(doc_parser, remove_wiki_attribs=False, remove_alignments=False, doc_id=0)#
doc_id: int = 0#

A counter whose value is used, and then incremented, for each document missing an ID.

doc_parser: FeatureDocumentParser#

The feature document parser used to create the Penman formatted graphs.

from_data(data)[source]#

Create AMR documents based on the type of data.

Parameters:

data (Union[Path, Dict, Sequence]) – the data that contains the annotated AMR document

See:

from_file()

See:

from_dicts()

See:

from_dict()

Return type:

Iterable[AmrFeatureDocument]

from_dict(data)[source]#

Parse and create an AMR document from a dict.

Parameters:
  • data (Dict[str, str]) – the AMR text to be parsed, with each entry having keys summary and body

  • doc_id – the document ID to set as AmrFeatureDocument.doc_id

Return type:

AmrFeatureDocument

from_dicts(data)[source]#

Parse and create AMR documents from a list of dicts.

Parameters:
See:

from_dict()

Return type:

Iterable[AmrFeatureDocument]

from_file(input_file)[source]#

Read annotated documents from a file and create AMR documents.

Parameters:

input_file (Path) – the JSON file to read the doc text

Return type:

Iterable[AmrFeatureDocument]

from_str(sents, stype)[source]#

Parse and create AMR sentences from a string.

Parameters:
  • sents (str) – the string containing a space separated list of sentences

  • stype (SentenceType) – the sentence type assigned to each new AMR sentence

Return type:

Iterable[AmrFeatureSentence]

remove_alignments: bool = False#

Whether to remove text-to-graph alignments in all sentence graphs after parsing.

remove_wiki_attribs: bool = False#

Whether to remove the :wiki roles from all sentence graphs after parsing.

class zensols.amr.annotate.AnnotatedAmrFeatureDocumentStash(feature_doc_factory, doc_stash, amr_stash, coref_resolver=None)[source]#

Bases: PrimeableStash

A stash that persists AmrFeatureDocument instances using AMR annotations from AnnotatedAmrDocumentStash as a source. The key set and exists behavior are identical between the two stashes. However, the instances of AmrFeatureDocument (and their constituent sentences) are generated from the AMR annotated sentences (i.e. from the ::snt metadata field).

This stash keeps the persistence of the AmrDocument separate from the instance of the feature document to avoid persisting it twice across doc_stash and amr_stash. On load, these two data structures are stitched together.

__init__(feature_doc_factory, doc_stash, amr_stash, coref_resolver=None)#
amr_stash: AnnotatedAmrDocumentStash#

The stash used to persist AmrDocument instances that are stitched together with the AmrFeatureDocument (see class docs).

clear()[source]#

Delete all data from the stash.

Important: Exercise caution with this method, of course.

coref_resolver: CoreferenceResolver = None#

Adds coreferences between the sentences of the document.

delete(name=None)[source]#

Delete the resource for data pointed to by name or the entire resource if name is not given.

doc_stash: Stash#

The stash used to persist instances of AmrFeatureDocument. It does not persist the AmrDocument (see class docs).

dump(name, inst)[source]#

Persist data value inst with key name.

exists(doc_id)[source]#

Return True if data with key name exists.

Implementation note: This Stash.exists() method is very inefficient and should be overridden.

Return type:

bool

feature_doc_factory: AmrFeatureDocumentFactory#

Creates AmrFeatureDocument from AmrDocument instances.

keys()[source]#

Return an iterable of keys in the collection.

Return type:

Iterable[str]

load(doc_id)[source]#

Load a data value from the pickled data with key name. Semantically, this method loads the data using the stash’s implementation. For example, DirectoryStash loads the data from a file if it exists, but factory type stashes will always re-generate the data.

See:

get()

Return type:

AmrFeatureDocument

prime()[source]#
class zensols.amr.annotate.AnnotatedAmrSectionDocument(sents, path=None, section_sents=())[source]#

Bases: AmrDocument

Represents a section from an annotated document.

__init__(sents, path=None, section_sents=())#

Initialize.

Parameters:
  • sents (Tuple[AmrSentence, …]) – the document’s sentences

  • path (Optional[Path, …]) – the path to file containing the Penman notation sentence graphs used in sents

  • model – the model to initailize AmrSentence when sents is a list of string Penman graphs

section_sents: Tuple[AmrSentence] = ()#

The sentences that make up the section title (usually just one).

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#
Parameters:
  • limit_sent – the max number of sentences to write

  • add_sent_id – add the sentence ID to the output

  • include_metadata – whether to add graph metadata to the output

  • text_only – whether to only write the sentence text rather than the AMR Penman notation

class zensols.amr.annotate.AnnotatedAmrSentence(data, model, doc_sent_idx, sent_type)[source]#

Bases: AmrSentence

A sentence containing its index in the document and the functional type.

__init__(data, model, doc_sent_idx, sent_type)[source]#

Initialize based on the kind of data given.

Parameters:
  • data (Union[str, Graph]) – either a Penman formatted string graph, an already parsed graph or an AmrFailure for upstream issues

  • model (str) – the model to use for encoding and decoding

clone(cls=None, **kwargs)[source]#

Return a deep copy of this instance.

Return type:

AmrSentence

class zensols.amr.annotate.CorpusWriter(anon_doc_factory)[source]#

Bases: Writable

Writes AmrDocument instances to a file. To use, first add documents either directly with docs or by using add().

__init__(anon_doc_factory)#
add(data)[source]#

Add document(s) to this corpus writer. This uses the AnnotatedAmrFeatureDocumentFactory.from_data() and adds the instances of AmrFeatureDocument.

anon_doc_factory: AnnotatedAmrFeatureDocumentFactory#

The factory used to create the AmrFeatureDocument instances that are in turn used to format the graphs as Penman text output.

property docs: List[AmrDocument]#

The documents to write.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#

Write the contents of the documents added to this writer to writer as flat formatted Penman AMRs.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable
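
A sketch of writing a corpus, assuming `corpus_writer` is a configured CorpusWriter and 'docs.json' is a file formatted per AnnotatedAmrFeatureDocumentFactory:

from pathlib import Path

# parse the annotated documents and add them to the writer
corpus_writer.add(Path('docs.json'))
with open('corpus.txt', 'w') as fout:
    # write every added document as flat Penman formatted AMRs
    corpus_writer.write(writer=fout)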

class zensols.amr.annotate.FileCorpusWriter(anon_doc_factory, input_file, output_file)[source]#

Bases: CorpusWriter

A corpus writer that parses a JSON file for its source input, then uses the configured AMR parser to generate the graphs.

__init__(anon_doc_factory, input_file, output_file)#
input_file: Path#

The JSON file as formatted per AnnotatedAmrFeatureDocumentFactory.

output_file: Path#

The file path to write the AMR sentences.

class zensols.amr.annotate.SentenceType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Bases: Enum

The type of sentence in relation to its function in the document.

BODY = 'b'#
FIGURE = 'f'#
FIGURE_TITLE = 'ft'#
OTHER = 'o'#
SECTION = 's'#
SUMMARY = 'a'#
TITLE = 't'#

zensols.amr.app#

Inheritance diagram of zensols.amr.app

Adapts amrlib in the Zensols framework.

class zensols.amr.app.Application(log_config, config_factory, doc_parser, anon_doc_stash, dumper)[source]#

Bases: BaseApplication

Parse and plot AMR graphs in Penman notation.

__init__(log_config, config_factory, doc_parser, anon_doc_stash, dumper)#
anon_doc_stash: Stash#

The annotated document stash.

clear()[source]#

Clear all cached parsed AMR documents and data.

config_factory: ConfigFactory#

Application context used by programmatic clients of this class.

count(input_file)[source]#

Provide counts on an AMR corpus file.

Parameters:

input_file (Path) – a file with newline separated AMR Penman graphs

doc_parser: FeatureDocumentParser#

The feature document parser for the app. This is not done via the application config to allow overriding of the defaults.

dumper: Dumper#

Plots and writes AMR content in human readable formats.

parse(text)[source]#

Parse the natural language text into AMR graphs.

Parameters:

text (str) – the sentence(s) to parse

plot(text, output_dir=None)[source]#

Parse a sentence into an AMR graph.

Parameters:
  • text (str) – the sentence(s) to parse, or a number selecting a pre-written sentence

  • output_dir (Path) – the output directory

plot_file(input_file, output_dir=None)[source]#

Render a Penman file or a JSON formatted sentence list.

Parameters:
  • input_file (Path) – a file with newline separated AMR Penman graphs

  • output_dir (Path) – the output directory

write_metadata(input_file, output_dir=None)[source]#

Write the metadata of each AMR in a corpus file.

Parameters:
  • input_file (Path) – a file with newline separated AMR Penman graphs

  • output_dir (Path) – the output directory

class zensols.amr.app.BaseApplication(log_config)[source]#

Bases: object

Base class for applications.

__init__(log_config)#
log_config: LogConfigurator#

Used to update logging levels based on the ran action.

class zensols.amr.app.Format(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Bases: Enum

Format output type for AMR corpus documents.

csv = 3#
json = 2#
classmethod to_ext(f)[source]#
Return type:

str

txt = 1#
class zensols.amr.app.ScorerApplication(log_config, config_factory, doc_factory)[source]#

Bases: BaseApplication

Creates parsed files for comparing, and scores.

__init__(log_config, config_factory, doc_factory)#
config_factory: ConfigFactory#

Application context.

doc_factory: AmrFeatureDocumentFactory#

Creates AmrFeatureDocument from AmrDocument instances.

parse_penman(input_file, output_dir=None, meta_keys='id,snt', limit=None)[source]#

Parse Penman sentence(s) by id and write a parsed AMR.

Parameters:
  • input_file (Path) – a file with newline separated AMR Penman graphs

  • output_dir (Path) – the output directory

  • meta_keys (str) – a comma separated list of metadata keys

  • limit (int) – the maximum number of items to process

Return type:

List[Path]

remove_wiki(input_file, output_dir=None)[source]#

Remove wiki attributes necessary for scoring.

Parameters:
  • input_file (Path) – a file with newline separated AMR Penman graphs

  • output_dir (Path) – the output directory

score(input_gold, input_parsed=None, output_dir=None, output_format=Format.csv, limit=None, methods=None)[source]#

Score AMRs by ID and dump the results to a file or directory.

Parameters:
  • input_gold (Path) – the file containing the gold AMR graphs

  • input_parsed (Path) – the file containing the parser output graphs, defaults to gold-parsed.txt

  • output_dir (Path) – the output directory

  • output_format (Format) – the output format

  • limit (int) – the maximum number of items to process

  • methods (str) – a comma separated list of scoring methods

Return type:

ScoreSet

class zensols.amr.app.TrainerApplication(log_config, config_factory)[source]#

Bases: BaseApplication

Trains and evaluates models.

__init__(log_config, config_factory)#
config_factory: ConfigFactory#

Application context.

prep_corpus()[source]#

Download and install the training corpus.

restore_splits(output_dir=None, id_pattern=None)[source]#

Restore corpus splits used for training.

Parameters:
  • output_dir (Path) – the output directory

  • id_pattern (str) – the AMR metadata ID regular expression to match

train(dry_run=False)[source]#

Continue fine tuning on additional corpora.

Parameters:

dry_run (bool) – don’t do anything; just act like it

property trainer: Trainer#

Interface into the amrlib package’s trainer. This is not done via the application config to allow overriding of the defaults.

write_corpus(text_or_file, out_file=None)[source]#

Write a corpus from ad hoc text.

Parameters:
  • text_or_file (str) – if the file exists, use the contents of the file; otherwise, the sentence(s) to parse

  • out_file (Path) – the output file

zensols.amr.cli#

Inheritance diagram of zensols.amr.cli

Command line entry point to the application.

class zensols.amr.cli.ApplicationFactory(*args, **kwargs)[source]#

Bases: ApplicationFactory

__init__(*args, **kwargs)[source]#
classmethod get_doc_parser()[source]#

Return the natural language parser that also creates AMR graphs as the amr attribute in the document.

Return type:

FeatureDocumentParser

classmethod get_dumper()[source]#

Return a object that plots and writes AMR content in human readable formats.

Return type:

Dumper
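
A programmatic usage sketch; the sentence text is illustrative and the first call downloads and installs the parser model:

from zensols.amr.cli import ApplicationFactory

# create the NLP parser that also produces AMR graphs on the document
doc_parser = ApplicationFactory.get_doc_parser()
doc = doc_parser.parse('The man ran to make the train.')
# the AMR graphs are available on the document's `amr` attribute
doc.amr.write()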

zensols.amr.cli.main(args=sys.argv, **kwargs)[source]#
Return type:

ActionResult

zensols.amr.container#

Inheritance diagram of zensols.amr.container

Extensions of zensols.nlp feature containers.

class zensols.amr.container.AmrFeatureDocument(sents, text=None, spacy_doc=None, amr=None, coreference_relations=None)[source]#

Bases: FeatureDocument

A feature document that contains an amr graph.

__init__(sents, text=None, spacy_doc=None, amr=None, coreference_relations=None)#
add_coreferences(to_populate)[source]#

Add coreference_relations to to_populate using this instance’s coreferences. Note that from_sentences(), from_amr_sentences(), get_overlapping_document() and clone() already do this.

property amr: AmrDocument#

The AMR representation of the document.

clone(cls=None, **kwargs)[source]#
Parameters:

kwargs – if copy_spacy is True, the spaCy document is copied to the clone in addition to parameters passed to the new clone’s initializer

Return type:

TokenContainer

property coreference_relations: Tuple[Tuple[Tuple[int, str], ...], ...]#

The coreference tuple sets between the sentences of the document:

((<sentence index 1>, <variable 1>),
 (<sentence index 2>, <variable 2>)...)
from_amr_sentences(amr_sents)[source]#

Like from_sentences(), return a new document with FeatureDocument sentences sync’d with AmrSentence.

See:

add_coreferences()

Return type:

AmrFeatureDocument

from_sentences(sents, deep=False)[source]#

Return a new cloned document using the given sentences.

Parameters:
  • sents (Iterable[FeatureSentence]) – the sentences to add to the new cloned document

  • deep (bool) – whether or not to clone the sentences

See:

clone()

See:

add_coreferences()

Return type:

AmrFeatureDocument

get_overlapping_span(span, inclusive=True)[source]#

Return a feature span that includes the lexical scope of span.

Return type:

TokenContainer

property relation_set: RelationSet#

The relations in the contained document as a set of relations.
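
A sketch of iterating coreference relations, assuming `doc` is an AmrFeatureDocument created with coreference resolution enabled:

for relation in doc.relation_set.relations:
    for ref in relation.references:
        # each Reference points at a variable in one sentence's AMR graph
        print(relation.seq_id, ref.variable, ref.sent.text)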

sync_amr_sents()[source]#

Copy amr sentences to each respective AmrFeatureSentence.amr. This is necessary when the AmrDocument is updated with new sentences that need to percolate down to the feature sentences.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, n_tokens=0, include_relation_set=False, include_original=False, include_normalized=True, include_amr=None, sent_kwargs={}, amr_kwargs={})[source]#

Write the document and optionally sentence features.

Parameters:
  • n_sents – the number of sentences to write

  • n_tokens (int) – the number of tokens to print across all sentences

  • include_original (bool) – whether to include the original text

  • include_normalized (bool) – whether to include the normalized text

class zensols.amr.container.AmrFeatureSentence(tokens, text=None, spacy_span=None, amr=None)[source]#

Bases: FeatureSentence

A sentence that holds an instance of AmrSentence.

__init__(tokens, text=None, spacy_span=None, amr=None)#
property alignments: Dict[Tuple[str, str, str], Tuple[FeatureToken, ...]]#

Only the tokens returned from indexed_alignments.

property amr: AmrSentence#

The AMR representation of the sentence.

clone(cls=None, **kwargs)[source]#

Clone an instance of this token container.

Parameters:
  • cls (Type) – the type of the new instance

  • kwargs – arguments to add to as attributes to the clone

Return type:

TokenContainer

Returns:

the cloned instance of this instance

property indexed_alignments: Dict[Tuple[str, str, str], Tuple[Tuple[int, FeatureToken]], ...]#

The graph alignments as a triple-to-token dict. The values are tuples of the 0-indexed token offset and the feature token pointed to by the alignment.
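
A sketch of reading the alignments, assuming `sent` is an AmrFeatureSentence whose graph carries alignments:

for triple, tok_offsets in sent.indexed_alignments.items():
    source, role, target = triple
    for tok_ix, tok in tok_offsets:
        # `tok_ix` is the 0-indexed token offset, `tok` the aligned FeatureToken
        print(f'{source} {role} {target} -> {tok_ix}: {tok.norm}')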

property is_failure: bool#

Whether the AMR graph failed to be parsed.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, n_tokens=0, include_metadata=False, include_original=False, include_normalized=True, include_stack=False, include_amr=True)[source]#
Parameters:
  • include_stack (bool) – whether to add the stack trace of the parse if an error occurred while trying to do so

  • include_metadata (bool) – whether to add graph metadata to the output

class zensols.amr.container.Reference(sent, variable)[source]#

Bases: ReferenceObject

A multi-document coreference target, which points to a node in an AMR graph.

__init__(sent, variable)#
sent: AmrFeatureSentence#

The sentence containing the reference.

property short#

A short string describing the reference.

property subtree: AmrSentence#

The subtree of the sentence containing the target as an AmrSentence.

property target: str#

The target of the coreference.

property triple: Tuple[str, str, str]#

The AMR triple of (source relation target) of the reference.

variable: str#

The variable in the AMR graph.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

class zensols.amr.container.ReferenceObject[source]#

Bases: PersistableContainer, Dictable

A base class for reference and relation classes.

__init__()#
class zensols.amr.container.ReindexVariableFeatureDocumentDecorator[source]#

Bases: FeatureDocumentDecorator

Reindex AMR concept variables to be unique across all sentences.

See:

AmrDocument.reindex_variables().

__init__()#
decorate(doc)[source]#
class zensols.amr.container.Relation(seq_id, references)[source]#

Bases: ReferenceObject

A relation makes up a set of references across multiple sentences of a document. This is what Christopher Manning calls a cluster.

__init__(seq_id, references)#
property by_sent: Dict[AmrFeatureSentence, Reference]#

An association from sentences to their references.

references: Tuple[Reference, ...]#

The references for this relation.

seq_id: int#

The sequence identifier of the relation.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

class zensols.amr.container.RelationSet(relations)[source]#

Bases: ReferenceObject

All coreference relations for a given document.

__init__(relations)#
as_set(**kwargs) → Set[Relation]#

A set version of relations.

Return type:

Set[Relation]

relations: Tuple[Relation, ...]#

The relations for all documents computed by the coreferencer.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

zensols.amr.coref#

Inheritance diagram of zensols.amr.coref

Wraps the amr_coref module for AMR coreference resolution.

class zensols.amr.coref.CoreferenceResolver(installer, stash=<factory>, use_multithreading=True, robust=True)[source]#

Bases: object

Resolve coreferences in AMR graphs.

__init__(installer, stash=<factory>, use_multithreading=True, robust=True)#
clear()[source]#

Clear the stash cache.

installer: Installer#

The amr_coref module’s coreference module installer.

property model: Inference#

The amr_coref coreference model.

robust: bool = True#

Whether to robustly deal with exceptions in the coreference model. If True, instances of AmrFailure are stored in the stash and empty coreferences used for caught errors.

stash: Stash#

The stash used to cache results. Inference takes a while, but the results are small in memory.

use_multithreading: bool = True#

By default, multithreading is enabled for Linux systems. However, an error is raised when invoked from a child thread. Set to False to turn off multithreading for coreference resolution.

zensols.amr.corpprep#

Inheritance diagram of zensols.amr.corpprep

Prepare and compile AMR corpora for training.

class zensols.amr.corpprep.AmrReleaseCorpusPrepper(name, installer, transform_ascii=True, remove_wiki=True, shuffle=False)[source]#

Bases: CorpusPrepper

Writes the AMR 3 release corpus files.

__init__(name, installer, transform_ascii=True, remove_wiki=True, shuffle=False)#
read_docs(target)[source]#

Read the documents and return tuples of the dataset name (where the output is written) and the corresponding read document.

Parameters:

target (Path) – the location of where to copy the finished files

Return type:

Iterable[Tuple[str, AmrDocument]]

Returns:

tuples of the dataset name and the read document

class zensols.amr.corpprep.CorpusPrepper(name, installer, transform_ascii=True, remove_wiki=True, shuffle=False)[source]#

Bases: Dictable

Subclasses know where to download, install, and split the corpus into train and dev data sets. Each subclass generates only the training and dev/validation datasets, which is an aspect of AMR parser and text generation models. Both the input and output are Penman encoded AMR graphs.

DEV_SPLIT_NAME: ClassVar[str] = 'dev'#

The development / validation dataset name.

TRAINING_SPLIT_NAME: ClassVar[str] = 'training'#

The training dataset name.

__init__(name, installer, transform_ascii=True, remove_wiki=True, shuffle=False)#
installer: Installer#

The location and decompression details.

name: str#

Used for logging and directory naming.

abstract read_docs(target)[source]#

Read the documents and return tuples of the dataset name (where the output is written) and the corresponding read document.

Parameters:

target (Path) – the location of where to copy the finished files

Return type:

Iterable[Tuple[str, AmrDocument]]

Returns:

tuples of the dataset name and the read document

remove_wiki: bool = True#

Whether to remove :wiki relations, which are not predicted by the model and negatively affect validation performance while training.

shuffle: bool = False#

Whether to shuffle the AMR sentences before writing to the target directory. Use this to randomize the per-corpus train and dev sets.

transform_ascii: bool = True#

Whether to replace non-ASCII characters for models.

class zensols.amr.corpprep.CorpusPrepperManager(name, preppers, stage_dir, shuffle=True, key_splits=None)[source]#

Bases: Dictable

Aggregates and applies corpus prepare instances.

__init__(name, preppers, stage_dir, shuffle=True, key_splits=None)#
clear()[source]#
property dev_file: Path#

The development / validation dataset directory.

property is_done: bool#

Whether or not the preparation is already complete.

key_splits: Path = None#

The AMR ids from the sentence metadata for each split are written to this JSON file if specified.

name: str#

Name of application configuration instance for debugging.

prepare()[source]#

Download, install and write the corpus to disk from all preppers. The output of each is placed in the corresponding training or dev directories in stage_dir. The data is then ready for AMR parser and generator trainers.

preppers: Tuple[CorpusPrepper, ...]#

The corpus prepare instances used to create the training files.

restore_splits(output_dir, id_pattern)[source]#

Restore corpus splits used for training.

Parameters:
  • output_dir (Path) – the output directory

  • id_pattern (Pattern) – the AMR metadata ID regular expression to match

shuffle: bool = True#

Whether to shuffle the AMR sentences before writing to the target directory. This is used to shuffle across corpora per split.

stage_dir: Path#

The location of where to copy the finished files.

property training_file: Path#

The training dataset directory.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

class zensols.amr.corpprep.SingletonCorpusPrepper(name, installer, transform_ascii=True, remove_wiki=True, shuffle=False, dev_portion=0.15)[source]#

Bases: CorpusPrepper

Prepares the corpus training files from a single AMR Penman encoded file.

__init__(name, installer, transform_ascii=True, remove_wiki=True, shuffle=False, dev_portion=0.15)#
dev_portion: float = 0.15#

The portion of sentences from the single input file used for the dev/validation set.

read_docs(target)[source]#

Read the documents and return tuples of the dataset name (where the output is written) and the corresponding read document.

Parameters:

target (Path) – the location of where to copy the finished files

Return type:

Iterable[Tuple[str, AmrDocument]]

Returns:

tuples of the dataset name and the read document

zensols.amr.doc#

Inheritance diagram of zensols.amr.doc

AMR container classes that fit a document/sentence hierarchy.

class zensols.amr.doc.AmrDocument(sents, path=None, model=None)[source]#

Bases: PersistableContainer, Writable

A document of AMR graphs, which is indexible and iterable.

__init__(sents, path=None, model=None)[source]#

Initialize.

Parameters:
clone(**kwargs)[source]#

Return a deep copy of this instance.

Return type:

AmrDocument

from_sentences(sents, deep=False)[source]#

Return a new cloned document using the given sentences.

Parameters:
  • sents (Iterable[AmrSentence]) – the sentences to add to the new cloned document

  • deep (bool) – whether or not to clone the sentences

See:

clone()

Return type:

AmrDocument

classmethod from_source(source, transform_ascii=False, **kwargs)[source]#

Return a new document created for source.

Parameters:
  • source (Union[Path, Installer]) – either a double newline list of AMR graphs or an installer that has a singleton path to a like file

  • transform_ascii (bool) – whether to replace non-ASCII characters to their ASCII equivalents (i.e. removes umlauts)

  • kwargs – additional keyword arguments given to the initializer of the document

Return type:

AmrDocument
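
A sketch of reading a corpus file of Penman graphs separated by blank lines; the path is illustrative:

from pathlib import Path
from zensols.amr.doc import AmrDocument

doc = AmrDocument.from_source(Path('corpus/amr-sents.txt'), transform_ascii=True)
print(len(doc.sents))
# write the first two sentences along with their graph metadata
doc.write(limit_sent=2, include_metadata=True)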

property graph_string: str#

The graph of all sentences with two newlines as a separator as a string in Penman format.

classmethod is_comment(doc)[source]#

Return whether or not doc is all comments.

Return type:

bool

normalize()[source]#

Normalize the graph string to standard notation per the Penman API.

See:

AmrSentence.normalize()

path: Optional[Path, ...] = None#

If set, the file the sentences were parsed from in Penman notation.

reindex_variables()[source]#

Reindexes all variables for sentences of an AmrDocument so all node variables are unique in the document.

remove_alignments()[source]#

Remove text-to-graph alignments in all sentence graphs.

remove_wiki_attribs()[source]#

Removes the :wiki roles from all sentence graphs.

static resolve_source(source)[source]#

Coerce a source, which is optionally an installer, to a path.

Return type:

Path

sents: Tuple[AmrSentence, ...]#

The AMR sentences that make up the document.

property text: str#

The text of the natural language form of the document. This is the concatenation of all the sentence text.

classmethod to_document(sents)[source]#

Create a new document from an iterable of sentences.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, limit_sent=9223372036854775807, add_sent_id=False, include_metadata=True, text_only=False)[source]#
Parameters:
  • limit_sent (int) – the max number of sentences to write

  • add_sent_id (bool) – add the sentence ID to the output

  • include_metadata (bool) – whether to add graph metadata to the output

  • text_only – whether to only write the sentence text rather than the AMR Penman notation

class zensols.amr.doc.AmrGeneratedDocument(sents, amr)[source]#

Bases: Writable

A document of sentences generated by the graph-to-text model.

__init__(sents, amr)#
amr: AmrDocument#

The input document to the model used to predict sents.

sents: Tuple[AmrGeneratedSentence, ...]#

The predicted sentences using amr as observation.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, clipped_inline=True, amr_doc_kwargs=None, amr_sent_kwargs=None)[source]#
Parameters:
  • clipped_inline (bool) – whether to render clipped sentences in the text output line

  • amr_kwargs – the keyword arguments to pass on to the rendering of amr

zensols.amr.docfac#

Inheritance diagram of zensols.amr.docfac

Feature sentence and document utilities.

class zensols.amr.docfac.AmrFeatureDocumentFactory(name, doc_parser, alignment_populator=None)[source]#

Bases: object

Creates AmrFeatureDocument from AmrDocument instances.

__init__(name, doc_parser, alignment_populator=None)#
alignment_populator: AmrAlignmentPopulator = None#

Adds the alignment markings.

doc_parser: FeatureDocumentParser#

The document parser used to create AmrFeatureDocument instances.

name: str#

The name of this factory in the application config.

to_feature_doc(amr_doc, catch=False, add_metadata=False, add_alignment=False)[source]#

Create an AmrFeatureDocument from an AmrDocument by parsing the snt metadata with a FeatureDocumentParser.

Parameters:
  • add_metadata (Union[str, bool]) – add missing annotation metadata to amr_doc parsed from spaCy if missing (see AmrParser.add_metadata()) if True and replace any previous metadata if this value is the string clobber

  • catch (bool) – if True, catch exceptions, creating an AmrFailure from each, and return them

Return type:

Union[AmrFeatureDocument, Tuple[AmrFeatureDocument, List[AmrFailure]]]

Returns:

an AMR feature document if catch is False; otherwise, a tuple of a document with sentences that were successfully parsed and a list of any exceptions raised during the parsing
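
A usage sketch, assuming `doc_factory` is a configured AmrFeatureDocumentFactory and `amr_doc` an AmrDocument whose sentences carry ::snt metadata:

feature_doc, failures = doc_factory.to_feature_doc(
    amr_doc, catch=True, add_metadata=True)
for failure in failures:
    # each AmrFailure describes a sentence that could not be converted
    print(failure)
feature_doc.write()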

class zensols.amr.docfac.EntityCopySpacyFeatureDocumentParser(config_factory, name, lang='en', model_name=None, token_feature_ids=<factory>, components=(), token_decorators=(), sentence_decorators=(), document_decorators=(), disable_component_names=None, token_normalizer=None, special_case_tokens=<factory>, doc_class=<class 'zensols.nlp.container.FeatureDocument'>, sent_class=<class 'zensols.nlp.container.FeatureSentence'>, token_class=<class 'zensols.nlp.tok.SpacyFeatureToken'>, remove_empty_sentences=None, reload_components=False, auto_install_model=False)[source]#

Bases: SpacyFeatureDocumentParser

Copy spaCy ent_type_ named entity (NER) tags to FeatureToken ent_ tags.

The AMR document’s metadata ner_tags is populated in AmrParser from the spaCy document. But this document parser instance is configured with embedded entities turned off so that whitespace delimited tokens match the alignments.

__init__(config_factory, name, lang='en', model_name=None, token_feature_ids=<factory>, components=(), token_decorators=(), sentence_decorators=(), document_decorators=(), disable_component_names=None, token_normalizer=None, special_case_tokens=<factory>, doc_class=<class 'zensols.nlp.container.FeatureDocument'>, sent_class=<class 'zensols.nlp.container.FeatureSentence'>, token_class=<class 'zensols.nlp.tok.SpacyFeatureToken'>, remove_empty_sentences=None, reload_components=False, auto_install_model=False)#

zensols.amr.docparser#

Inheritance diagram of zensols.amr.docparser

AMR document annotation.

exception zensols.amr.docparser.AmrParseError(msg, sent=None)[source]#

Bases: AmrError

__module__ = 'zensols.amr.docparser'#
class zensols.amr.docparser.AnnotationFeatureDocumentParser(name, delegate, token_decorators=(), sentence_decorators=(), document_decorators=(), token_feature_ids=<factory>, stash=None, hasher=<factory>, amr_parser=None, alignment_populator=None, coref_resolver=None, reparse=True, amr_doc_class=<class 'zensols.amr.container.AmrFeatureDocument'>, amr_sent_class=<class 'zensols.amr.container.AmrFeatureSentence'>)[source]#

Bases: CachingFeatureDocumentParser

A document parser that adds and further annotates AMR graphs. This has the advantage of avoiding a second AMR construction when annotating a graph with features (i.e. ent, POS tag, etc) because it uses (adapted) spaCy class’s normalized features. For this reason, use this class if your application needs such annotations.

This parses and populates AMR graphs as AmrDocument at the document level and AmrSentence at the sentence level using a zensols.nlp.FeatureDocument.

This class will also recreate the AMR on the normalized text of the document. This is necessary since AMR parsing and alignment happen at the spaCy level and token normalization happens at the zensols.nlp feature token level. Since spaCy does not allow for filtering tokens (i.e. stop words) there is no way to avoid a reparse.

However, if your application makes no modification to the document, a second reparse is not needed and you should set reparse to False.

A consideration is that the spaCy adaptation module (spacyadapt) is not thoroughly tested and future updates might break it. If you do not feel comfortable using it, or can not, use the spaCy pipeline by setting amr_default:doc_parser = amr_anon_doc_parser in the application configuration and annotate the graph yourself.

The AMR graphs are optionally cached using a Stash when stash is set.

Important: when using stash caching only the AmrDocument is cached and not the entire feature document. This could lead to the documents and AMR graphs getting out of sync if both are cached. Use the clear() method to clear the stash if ever in doubt.

A new instance of AmrFeatureDocument is returned.
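
A usage sketch, assuming `anon_doc_parser` is a configured AnnotationFeatureDocumentParser obtained from the application context:

doc = anon_doc_parser.parse('The man ran to make the train. He just missed it.')
# the result is an AmrFeatureDocument whose `amr` holds the annotated graphs
doc.amr.write()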

__init__(name, delegate, token_decorators=(), sentence_decorators=(), document_decorators=(), token_feature_ids=<factory>, stash=None, hasher=<factory>, amr_parser=None, alignment_populator=None, coref_resolver=None, reparse=True, amr_doc_class=<class 'zensols.amr.container.AmrFeatureDocument'>, amr_sent_class=<class 'zensols.amr.container.AmrFeatureSentence'>)#
alignment_populator: AmrAlignmentPopulator = None#

Adds the alignment markings.

amr_doc_class#

The FeatureDocument class created to store zensols.amr.AmrDocument instances.

alias of AmrFeatureDocument

amr_parser: AmrParser = None#

The AMR parser used to induce the graphs.

amr_sent_class#

The FeatureSentence class created to store zensols.amr.AmrSentence instances.

alias of AmrFeatureSentence

annotate(doc)[source]#

Parse and annotate a new AMR feature document using features from doc. Since the AMR document itself is not cached, using a separate document cache is necessary for caching/storage.

Parameters:
  • doc (FeatureDocument) – the source feature document to parse in to AMRs

  • key – the key used to cache the AmrDocument in stash if provided (see class docs)

Return type:

AmrFeatureDocument

clear()[source]#

Clear the caching stash.

coref_resolver: CoreferenceResolver = None#

Adds coreferences between the sentences of the document.

parse(text, *args, **kwargs)[source]#

Parse text or a text as a list of sentences.

Parameters:
  • text (str) – either a string or a list of strings; if the former a document with one sentence will be created, otherwise a document is returned with a sentence for each string in the list

  • args – the arguments used to create the FeatureDocument instance

  • kwargs – the key word arguments used to create the FeatureDocument instance

Return type:

FeatureDocument

reparse: bool = True#

Reparse the normalized FeatureSentence text for each AMR sentence, which is necessary when tokens are removed (i.e. stop words). See the class docs.

class zensols.amr.docparser.TokenAnnotationFeatureDocumentDecorator(name, feature_id, indexed=False, add_none=False, use_sent_index=True, method='attribute')[source]#

Bases: FeatureDocumentDecorator

Annotate features in AMR sentence graphs from indexes annotated from AmrAlignmentPopulator.

__init__(name, feature_id, indexed=False, add_none=False, use_sent_index=True, method='attribute')#
add_none: bool = False#

Whether to add missing or empty values. This includes string values of zensols.nlp.FeatureToken.NONE.

decorate(doc)[source]#
feature_id: str#

The FeatureToken ID (attribute) to annotate in the AMR graph.

indexed: bool = False#

Whether or not to append an index to the role.

method: str = 'attribute'#

Where to add the data, which may be one of:

  • attribute: add as a new attribute node using name as the role and the value as the attribute constant

  • epi: as epigraph data; however, the current Penman implementation assumes only alignments and the graph string will no longer be parsable

Otherwise, it uses the string to format a replacement node text using target as the previous/original node text and value as the feature value text.

name: str#

The triple role (if add_to_epi is False) used to label the edge between the token and the feature. Otherwise, this string is used in the epidata of the graph.

use_sent_index: bool = True#

Whether to map alignments by (iterated) index position, or by using the per sentence index FeatureToken attribute i_sent. Set this to False if the FeatureDocumentParser was configured with a token normalizer with embedded named entities turned off.

zensols.amr.domain#

Inheritance diagram of zensols.amr.domain

Error and exception classes.

exception zensols.amr.domain.AmrError(msg, sent=None)[source]#

Bases: APIError

Raised for package API errors.

__annotations__ = {}#
__init__(msg, sent=None)[source]#
__module__ = 'zensols.amr.domain'#
to_failure()[source]#

Create an AmrFailure from this error.

Return type:

AmrFailure

class zensols.amr.domain.AmrFailure(exception=None, thrower=None, traceback=None, message=None, sent=None)[source]#

Bases: Failure

A container class that describes AMR graph creation or handling error.

__init__(exception=None, thrower=None, traceback=None, message=None, sent=None)#
sent: str = None#

The natural language sentence that caused the error (usually during parsing).

class zensols.amr.domain.Feature(feat_id, value)[source]#

Bases: FeatureMarker

mode = 2#

The mode attribute specifies what the Epidatum annotates:

  • mode=0 – unspecified

  • mode=1 – role epidata

  • mode=2 – target epidata

class zensols.amr.domain.FeatureMarker(feat_id, value)[source]#

Bases: Epidatum

__init__(feat_id, value)[source]#
feat_id#
classmethod from_string(s)[source]#
Return type:

Feature

value#

zensols.amr.dumper#

Inheritance diagram of zensols.amr.dumper

Plot AMR graphs.

class zensols.amr.dumper.Dumper(target_dir, add_doc_dir=True, write_text=True, sent_file_format='{sent.short_name}', add_description=True, front_text=None, width=79, overwrite_dir=False, overwrite_sent_file=False)[source]#

Bases: Dictable

Plots and writes AMR content in human readable formats.

__init__(target_dir, add_doc_dir=True, write_text=True, sent_file_format='{sent.short_name}', add_description=True, front_text=None, width=79, overwrite_dir=False, overwrite_sent_file=False)#
add_description: bool = True#

Whether to add the description.

add_doc_dir: bool = True#

Whether to write files to a directory for the document.

clean()[source]#

Remove the output directory if it exists.

Return type:

bool

Returns:

whether the output directory existed

dump_doc(doc, target_name=None)[source]#

Dump the contents of the document to a directory. This includes the plots and graph strings of all sentences. This also includes a doc.txt file that has the graph strings and their sentence index.

Parameters:

doc (AmrDocument) – the document to plot

Return type:

List[Path]

Returns:

the paths to each file that was generated
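
A sketch, assuming `dumper` is a configured Dumper (such as a GraphvizDumper) and `amr_doc` a parsed AmrDocument:

paths = dumper.dump_doc(amr_doc)
for path in paths:
    # one plot and graph string file per sentence, plus a doc.txt summary
    print(f'wrote: {path}')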

dump_sent(sent, target_name=None)[source]#

Dump the contents of the sentence.

Return type:

List[Path]

front_text: str = None#

Text to add in the description.

overwrite_dir: bool = False#

Whether to remove the output directories.

overwrite_sent_file: bool = False#

Whether to remove the output sentence files.

plot_doc(doc, target_name=None)[source]#

Create a plot for each AMR sentence as a graph. The file is generated with graphviz in a temporary space, then moved to the target directory.

If the directory doesn’t exist, it is created.

Return type:

List[Path]

Returns:

the path(s) where the file(s) were generated

plot_sent(sent, target_name=None)[source]#

Create a plot of the AMR graph visually. The file is generated with graphviz in a temporary space, then moved to the target path.

Parameters:

target_name (str) – the file name added to target_dir, or if None, computed from the sentence text

Return type:

List[Path]

Returns:

the path(s) where the file(s) were generated

render(cont, target_name=None)[source]#

Create a PDF for an AMR document or sentence as a graph. The file is generated with graphviz in a temporary space, then moved to the target directory.

See:

dump_sent()

See:

dump_doc()

Return type:

List[Path]

Returns:

the path(s) where the file(s) were generated

sent_file_format: str = '{sent.short_name}'#

The file format for sentence files when not supplied.

target_dir: Path#

The path where the file ends up; this defaults to the text of the sentence with an extension (i.e. .pdf).

width: int = 79#

The width of the text when rendering the graph.

write_text: bool = True#

Whether to write the sentence text in to the generated diagram.

class zensols.amr.dumper.GraphvizDumper(target_dir, add_doc_dir=True, write_text=True, sent_file_format='{sent.short_name}', add_description=True, front_text=None, width=79, overwrite_dir=False, overwrite_sent_file=False, extension='pdf', attribs=<factory>)[source]#

Bases: Dumper

Dumps plots created by graphviz using the dot program.

__init__(target_dir, add_doc_dir=True, write_text=True, sent_file_format='{sent.short_name}', add_description=True, front_text=None, width=79, overwrite_dir=False, overwrite_sent_file=False, extension='pdf', attribs=<factory>)#
attribs: Dict[str, str]#
extension: str = 'pdf'#

zensols.amr.model#

Inheritance diagram of zensols.amr.model

AMR parsing spaCy pipeline component and sentence generator.

class zensols.amr.model.AmrGenerator[source]#

Bases: object

A callable that generates natural language text from an AMR graph.

See:

__call__()

__init__()#
abstract generate(doc)[source]#

Generate a sentence from the AMR graph doc.

Parameters:

doc (AmrDocument) – the spaCy document used to generate the sentence

Return type:

AmrGeneratedDocument

Returns:

a document with the generated sentences

class zensols.amr.model.AmrParser(model='noop', add_missing_metadata=True)[source]#

Bases: ComponentInitializer

Parses natural language into AMR graphs. It has the ability to change out different installed models in the same Python session.

__init__(model='noop', add_missing_metadata=True)#
classmethod add_metadata(amr_sent, sent, clobber=False)[source]#

Add annotation metadata parsed from spaCy if missing, which happens in the case of using the T5 AMR model.

Parameters:
  • amr_sent (AmrSentence) – the sentence to populate

  • sent (Span) – the spaCy sentence used as the source

  • clobber (bool) – whether or not to overwrite any existing metadata fields

See:

is_missing_metadata()

add_missing_metadata: bool = True#

Whether to add metadata to sentences when it is missing.

See:

add_metadata()

annotate_amr(doc)[source]#

Add an amr attribute to the spaCy document.

Parameters:

doc (Doc) – the document to annotate

init_nlp_model(model, component)[source]#

Initialize the parser with spaCy API components.

static is_missing_metadata(amr_sent)[source]#

Return whether amr_sent is missing annotated metadata. T5 model sentences only have the snt metadata entry.

Parameters:

amr_sent (AmrSentence) – the sentence to populate

See:

add_metadata()

Return type:

bool

model: str = 'noop'#

The penman AMR model to use when creating AmrSentence instances, which is one of noop or amr. The first does not modify the graph but the latter normalizes out inverse relationships such as ARG*-of.

zensols.amr.model.create_amr_parser(nlp, name, parser_name)[source]#

Create an instance of AmrParser.

Return type:

AmrParser

zensols.amr.score#

Inheritance diagram of zensols.amr.score

Produces matching scores.

class zensols.amr.score.AmrScoreParser(doc_parser, keep_keys=None)[source]#

Bases: object

Parses AmrSentence instances from the snt metadata text string of a human annotated AMR. It then returns an instance to be scored later by a ScoreMethod such as SmatchScoreCalculator.

__init__(doc_parser, keep_keys=None)#
doc_parser: FeatureDocumentParser#

The document parser used to generate the AMR. This should have sentence boundaries removed so only one AmrSentence is returned from the parse.

keep_keys: Tuple[str] = None#

The keys to keep/copy from the source AmrSentence.

parse(sent)[source]#

Parse the snt metadata from sent and return it as an AMR sentence.

Return type:

AmrSentence

class zensols.amr.score.SmatchScoreCalculator(reverse_sents=False)[source]#

Bases: ScoreMethod

Computes the smatch scores of AMR sentences using the Smatch package.

Citation:

Shu Cai and Kevin Knight. 2013. Smatch: an Evaluation Metric for Semantic Feature Structures. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 748–752, Sofia, Bulgaria. Association for Computational Linguistics.

__init__(reverse_sents=False)#
document_smatch(gold, pred)[source]#

Return the smatch score produced from the sentences as pairs from two documents.

Return type:

HarmonicMeanScore

Returns:

a score with the precision, recall and F1

sentence_score(gold, pred)[source]#

Return the smatch score produced from two AMR sentences.

Return type:

HarmonicMeanScore

Returns:

a score with the precision, recall and F-score
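
Example (a minimal sketch comparing a gold and a predicted graph for the same sentence; the Penman strings are illustrative only):

    from zensols.amr.sent import AmrSentence
    from zensols.amr.score import SmatchScoreCalculator

    gold = AmrSentence('(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))')
    pred = AmrSentence('(w / want-01 :ARG0 (b / boy))')
    calc = SmatchScoreCalculator()
    # HarmonicMeanScore carrying precision, recall and the F-score
    score = calc.sentence_score(gold, pred)
    print(score)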

zensols.amr.sent#

Inheritance diagram of zensols.amr.sent

AMR container classes that fit a document/sentence hierarchy.

class zensols.amr.sent.AmrGeneratedSentence(text, clipped, amr)[source]#

Bases: Writable

A sentence generated by the graph-to-text model.

__init__(text, clipped, amr)#
amr: AmrSentence#

The input sentence to the model used to predict text.

clipped: bool#

Whether the predicted output sentence was truncated.

text: str#

The predicted text using amr as observation.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, clipped_inline=True, amr_kwargs=None)[source]#
Parameters:
  • clipped_inline (bool) – whether to render clipped in the text output line

  • amr_kwargs (Dict[str, Any]) – the keyword arguments to pass on to the rendering of amr

class zensols.amr.sent.AmrSentence(data, model=None)[source]#

Bases: PersistableContainer, Writable

Contains a sentence with an AMR graph and a Penman string version of the graph. Instances can be created with a Penman formatted string, an already parsed Graph, or an AmrFailure for any upstream issues.

These kinds of issues arise when downstream APIs expect instances of this class, such as in bulk processing. When this happens, the instance renders with an error message in the AMR metadata.

DEFAULT_MODEL: ClassVar[str] = 'noop'#

The default penman AMR model to use in the initializer, which is one of noop or amr. The first does not modify the graph but the latter normalizes out inverse relationships such as ARG*-of.

MAX_SHORT_NAME_LEN: ClassVar[int] = 30#

Max length of short_name property.

__init__(data, model=None)[source]#

Initialize based on the kind of data given.

Parameters:
  • data (Union[str, Graph, AmrFailure]) – either a Penman formatted string graph, an already parsed graph or an AmrFailure for upstream issues

  • model (str) – the model to use for encoding and decoding

clone(cls=None, **kwargs)[source]#

Return a deep copy of this instance.

Return type:

AmrSentence

get_data()[source]#

Return the graph if it is parsed, else return the graph_string.

Return type:

Union[Graph, str]

property graph: Graph#

The graph as an in memory data structure.

property graph_only: str#

Like graph_string but without metadata.

property graph_single_line: str#

Like graph_only but returned as a single line string.

property graph_string: str#

The graph as a string in Penman format.

property has_alignments: bool#

Whether this sentence has any alignments.

property instances: Dict[str, Tuple[str, str, str]]#
invalidate_graph_string(check_graph=True)[source]#

To be called when the graph changes so that the change is propagated to graph_string.

property is_failure: bool#

Whether the AMR graph failed to be parsed.

iter_aligns(include_types=False)[source]#

Return an iterator over the alignments of the graph. Each iteration yields a tuple of the triple, the list of alignment indexes, and a tuple of booleans indicating whether each index is a role alignment.

Parameters:

include_types – whether to include types, which is the third element in each tuple, else that element is None

property metadata: Dict[str, str]#

The graph metadata as a dict.

normalize()[source]#

Normalize the graph string to standard notation per the Penman API.

remove_alignments()[source]#

Remove text-to-graph alignments.

remove_wiki_attribs()[source]#

Removes the :wiki roles from the graph.

set_metadata(k, v)[source]#

Set a metadata value on the graph.

property short_name: str#

The short name of the sentence, which is the first several words.

property text: str#

The text of the natural language form of the sentence.

property tokenized_text: str#

This is useful when it is necessary to force white space tokenization to match the already tokenized metadata (‘tokens’ key). Examples include numbers followed by commas such as dates like April 25 , 2008.

property tree: Tree#

Return a tree structure of the graph using the top node.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_metadata=True, include_stack=False)[source]#
Parameters:
  • include_metadata (bool) – whether to add graph metadata to the output

  • include_stack (bool) – whether to add the stack trace of the parse if an error occurred while trying to do so
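
Example (a minimal sketch creating a sentence from an illustrative Penman string with metadata and exercising a few of the properties above):

    from zensols.amr.sent import AmrSentence

    penman_str = ('# ::snt The boy wants to go.\n'
                  '(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))')
    sent = AmrSentence(penman_str)
    print(sent.text)               # the natural language text ('snt' metadata)
    print(sent.graph_single_line)  # the graph without metadata on one line
    sent.set_metadata('id', 'example.1')
    sent.write()                   # pretty print the graph with its metadata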

zensols.amr.serial#

Inheritance diagram of zensols.amr.serial

A small serialization framework for AmrDocument, AmrSentence and other AMR artifacts.

class zensols.amr.serial.AmrSerializedFactory(includes)[source]#

Bases: Dictable

Creates instances of Serialized from instances of AmrDocument, AmrSentence or AnnotatedAmrDocument. These can then be used as Dictable instances, specifically with the asdict and asjson methods.

__init__(includes)#
create(instance)[source]#

Create a serializer from instance (see class docs).

Parameters:

instance (Union[AmrSentence, AmrDocument]) – the instance to be serialized

Return type:

Serialized

Returns:

an object that can be serialized using the asdict and asjson methods

includes: Sequence[Union[Include, str]]#

The AMR data to include in the serialized output.

See:

Include
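
Example (a minimal sketch serializing a previously parsed document; amr_doc is assumed to be an AmrDocument obtained elsewhere):

    from zensols.amr.serial import AmrSerializedFactory, Include

    factory = AmrSerializedFactory(
        includes=[Include.sentences, Include.sentence_text, Include.sentence_graph])
    serialized = factory.create(amr_doc)
    # Serialized instances are Dictable, so they support asdict() and asjson()
    print(serialized.asdict())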

class zensols.amr.serial.Include(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Bases: Enum

Indicates what to include at each level.

annotated_body = 9#
annotated_document_id = 7#
annotated_sections = 10#
annotated_summary = 8#
document_text = 1#
sentence_graph = 5#
sentence_id = 3#
sentence_metadata = 6#
sentence_text = 4#
sentences = 2#
class zensols.amr.serial.Serialized(includes)[source]#

Bases: Dictable

A base strategy class that can serialize AmrDocument, AmrSentence and other AMR artifacts.

__init__(includes)#
includes: Set[str]#

The AMR data to include in the serialized output.

See:

Include

class zensols.amr.serial.SerializedAmrDocument(includes, document)[source]#

Bases: Serialized

Serializes an instance of AmrDocument.

__init__(includes, document)#
document: AmrDocument#

The document to serialize.

class zensols.amr.serial.SerializedAmrSentence(includes, sentence)[source]#

Bases: Serialized

Serializes an instance of AmrSentence.

__init__(includes, sentence)#
sentence: AmrSentence#

The sentence to serialize.

class zensols.amr.serial.SerializedAnnotatedAmrDocument(includes, document)[source]#

Bases: SerializedAmrDocument

Serializes an instance of AnnotatedAmrDocument.

__init__(includes, document)#

zensols.amr.spacyadapt#

Inheritance diagram of zensols.amr.spacyadapt

A set of adaptor classes from zensols.nlp.FeatureToken to spacy.tokens.Doc.

class zensols.amr.spacyadapt.SpacyDocAdapter(cont)[source]#

Bases: _SpacySpanAdapter

Adapts a zensols.nlp.FeatureDocument to a spacy.tokens.Doc.

__init__(cont)[source]#
property sents#
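
Example (a minimal sketch; feature_doc is assumed to be a zensols.nlp.FeatureDocument created by a document parser configured elsewhere):

    from zensols.amr.spacyadapt import SpacyDocAdapter

    # wrap the feature document so it can be handed to APIs that expect a
    # spacy.tokens.Doc
    spacy_like_doc = SpacyDocAdapter(feature_doc)
    for span in spacy_like_doc.sents:
        print(span)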

zensols.amr.trainer#

Inheritance diagram of zensols.amr.trainer

Continues training on an AMR model.

class zensols.amr.trainer.HFTrainer(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))[source]#

Bases: Trainer

Interface into the amrlib package’s HuggingFace model trainers.

__init__(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))#
property token_model_name: str#

The name of the tokenization model, since the pretrained AMR model files do not include tokenizer files.

class zensols.amr.trainer.SpringTrainer(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'), train_files=None, dev_files=None)[source]#

Bases: Trainer

SPRING model trainer.

Citation:

Michele Bevilacqua et al. 2021. One SPRING to Rule Them Both: Symmetric AMR Semantic Parsing and Generation without a Complex Pipeline. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 12564–12573, Virtual, May.

__init__(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'), train_files=None, dev_files=None)#
dev_files: str = None#
train_files: str = None#
property training_config_file: Path#
class zensols.amr.trainer.T5Trainer(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))[source]#

Bases: XfmTrainer

T5 model trainer.

Citation:

Colin Raffel et al. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):140:5485-140:5551, January.

__init__(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))#
class zensols.amr.trainer.T5WithTenseGeneratorTrainer(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'), nltk_lib_dir=None, annotate_dir=None, annotate_model='en_core_web_sm')[source]#

Bases: XfmTrainer

__init__(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'), nltk_lib_dir=None, annotate_dir=None, annotate_model='en_core_web_sm')#
annotate_dir: Path = None#

The directory to which the annotated graphs are added.

annotate_model: str = 'en_core_web_sm'#

The spaCy model used to annotate graphs as features to the model.

nltk_lib_dir: Path = None#

Where to install the punkt tokenizer used by the trainer.

class zensols.amr.trainer.Trainer(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))[source]#

Bases: Dictable

Interface into the amrlib package’s trainers.

__init__(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))#
corpus_prep_manager: CorpusPrepperManager#

Aggregates and applies corpus prepper instances.

model_installer: Installer = None#

The installer for the previously trained model (i.e. trained by amrlib).

model_name: str#

A human readable string identifying the model, which ends up in the amrlib_meta.json metadata file.

property output_dir: Path#
output_model_dir: Path#

The path where the model is copied and metadata files generated.

package_dir: Path = PosixPath('.')#

The directory to install the compressed distribution file.

property pretrained_path_or_model: str | Path#

The path to the checkpoint file or the string scratch if starting from scratch.

temporary_dir: Path#

The path where the trained model is saved.

train(dry_run=False)[source]#

Train the model (see class docs).

Parameters:

dry_run (bool) – when True, do not actually train; only simulate the actions that would be taken.

property trainer_class: Type#

The AMR API class used for the training.

property training_config: Dict[str, Any]#

The parameters given to the trainer instance, which is of the class given by trainer_class.

property training_config_file: Path#

The path to the JSON configuration file in the amrlib repo, such as amrlib/configs/model_parse_*.json. If None, then try to find the configuration file generated by the last pretrained model.

training_config_overrides: Dict[str, Any]#

Additional configuration that overrides the contents found in training_config_file.

version: str = '0.1.0'#

The version used in the amrlib_meta.json output metadata file.
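
Example (a minimal sketch; the trainer and its required fields, such as the corpus prepper manager, model name and output directories, are assumed to be created from the application configuration, so a hypothetical helper stands in for that wiring):

    from zensols.amr.trainer import Trainer

    # hypothetical helper: obtain a fully configured Trainer (e.g. an XfmTrainer)
    trainer: Trainer = create_trainer_from_config()
    # inspect the resolved training parameters
    print(trainer.training_config)
    # simulate the training steps without actually training
    trainer.train(dry_run=True)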

class zensols.amr.trainer.XfmTrainer(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))[source]#

Bases: HFTrainer

Trainer for XFM and T5 models.

__init__(corpus_prep_manager, model_name, output_model_dir, temporary_dir, version='0.1.0', model_installer=None, training_config_file=None, training_config_overrides=<factory>, pretrained_path_or_model=None, package_dir=PosixPath('.'))#

zensols.amr.tree#

Inheritance diagram of zensols.amr.tree

Penman graph utilities and algorithms. These classes are currently only used for debugging and do not have any significant bearing on the overall package.

class zensols.amr.tree.TreeNavigator(graph, strip_alignments=False)[source]#

Bases: object

Finds nodes using indexed paths.

__init__(graph, strip_alignments=False)#
get_node(path)[source]#

Get a triple representing a graph node/edge of the given path.

Parameters:

path (Tuple[int, ...]) – a tuple of 0-based indexes used to get a node or edge.

Return type:

Tuple[Tuple[str, str, str], bool]

Returns:

the node/edge triple and True if a role edge

graph: Graph#

The graph that will be populated with alignments.

strip_alignments: bool = False#

Whether or not to strip alignment tags from the output. This is set to True in get_missing_alignments() for test cases. However, it might also be useful for get_node().
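
Example (a minimal sketch on a small illustrative graph; the exact path value depends on the graph's shape and the indexing convention described above, so it is shown only for illustration):

    from penman import decode
    from zensols.amr.tree import TreeNavigator

    graph = decode('(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))')
    nav = TreeNavigator(graph, strip_alignments=True)
    # a 0-based index path into the configured tree; (0,) is assumed here to
    # address the first branch under the root
    triple, is_role = nav.get_node((0,))
    print(triple, is_role)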

class zensols.amr.tree.TreePruner(graph, keep_root_meta=True)[source]#

Bases: object

Create a subgraph using a tuple found in the graph configured (penman.configure()) as a tree.

__init__(graph, keep_root_meta=True)#
create_sub(query)[source]#

Create a subgraph using a tuple found in the graph configured (penman.configure()) as a tree. Everything starting at query and down is included in the resulting graph.

Parameters:

query (Tuple[str, str, Any]) – a triple found in the contained graph

Return type:

Graph

Returns:

a subgraph starting at the query node

graph: Graph#

The graph to be pruned.

keep_root_meta: bool = True#

Whether to keep the original metadata when the query is the root. When this is True, the original graph is returned from create_sub() when its query parameter is the root of the graph.
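
Example (a minimal sketch pruning the :ARG1 subtree of an illustrative graph; the query must be a triple that exists in the graph):

    from penman import decode, encode
    from zensols.amr.tree import TreePruner

    graph = decode('(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))')
    pruner = TreePruner(graph)
    # everything from the ('w', ':ARG1', 'g') triple down is kept
    sub = pruner.create_sub(('w', ':ARG1', 'g'))
    print(encode(sub))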

zensols.amr.varidx#

Inheritance diagram of zensols.amr.varidx

A utility class to reindex variables in an AmrDocument.

class zensols.amr.varidx.VariableIndexer[source]#

Bases: object

This reentrant class reindexes all variables for sentences of an AmrDocument so all node variables are unique. This is done by:

  1. Index concepts by the first character of their name (e.g. s for see-01) across sentences.

  2. Compile a list of variable replacements (e.g. s2 -> s5) on a per sentence basis.

  3. Replace variable names based on their document level index order (e.g. s, s2, etc.). This is done for all concepts, edges, roles, and the epigraph. A new graph is created for those that have at least one modification; otherwise the original sentence is kept.

__init__()#
reindex(sents)[source]#

Reindex and replace variables in sents. Any modified graphs are updated in the sents instances.

Parameters:

sents (Sequence[AmrSentence]) – sentences whose variables will be reindexed
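
Example (a minimal sketch; amr_doc is assumed to be an AmrDocument whose sentences may reuse variable names, and the sents attribute access is assumed for illustration):

    from zensols.amr.varidx import VariableIndexer

    indexer = VariableIndexer()
    # reindex in place: modified graphs are written back to the AmrSentence
    # instances so all node variables are unique across the document
    indexer.reindex(amr_doc.sents)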

Module contents#