zensols.calamr package

Subpackages

Submodules

zensols.calamr.alignconst module

Defines a class that aligns components of a (bipartite) graph.

class zensols.calamr.alignconst.GraphAlignmentConstructor(doc_graph=None, source_flow=10000000000.0)[source]

Bases: object

Adds additional nodes and edges that enable the maxflow algorithm to be used on the graph. These include the component alignment edges, the source node, and the sink node. Capacities for the component alignment edges are also set.
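To illustrate why the source and sink nodes are needed, the following sketch runs Edmonds-Karp maxflow over a toy graph using only the standard library. It is not calamr's igraph-based implementation. A source node feeds the source component with a large finite capacity, mirroring the source_flow default. Capacitated alignment edges span the two components, and the summary component drains into a sink. The function name, vertex layout, and capacities are hypothetical, and which vertexes calamr actually connects to its source and sink nodes is not shown here.

```python
from collections import deque
from typing import Dict, List, Tuple

# hypothetical stand-in for an "infinite" capacity, mirroring the documented
# source_flow default of 10000000000.0
SOURCE_FLOW: float = 10000000000.0


def max_flow(n: int, edges: List[Tuple[int, int, float]], s: int, t: int) -> float:
    """Edmonds-Karp maxflow: augment along shortest (BFS) paths until none remain."""
    # residual capacity for each directed pair; reverse pairs start at 0
    cap: Dict[Tuple[int, int], float] = {}
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v, c in edges:
        if (u, v) not in cap:
            # first time this pair is seen: link both directions for the residual search
            adj[u].append(v)
            adj[v].append(u)
        cap[(u, v)] = cap.get((u, v), 0.0) + c
        cap.setdefault((v, u), 0.0)
    total = 0.0
    while True:
        # breadth first search for a path with positive residual capacity
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in adj[u]:
                if parent[v] == -1 and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            return total
        # the bottleneck is the smallest residual capacity on the path
        bottleneck = float('inf')
        v = t
        while v != s:
            bottleneck = min(bottleneck, cap[(parent[v], v)])
            v = parent[v]
        # push the flow and update the residual graph
        v = t
        while v != s:
            u = parent[v]
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck
            v = u
        total += bottleneck


# vertex 0: source node; 1, 2: source component; 3, 4: summary component; 5: sink
toy_edges = [
    (0, 1, SOURCE_FLOW), (0, 2, SOURCE_FLOW),  # source node feeds the source component
    (1, 3, 0.7), (1, 4, 0.2), (2, 4, 0.5),     # alignment edges with capacities
    (3, 5, SOURCE_FLOW), (4, 5, SOURCE_FLOW),  # summary component drains to the sink
]
print(max_flow(6, toy_edges, 0, 5))
```

Because the source and sink edges are effectively unbounded, the total flow here is limited only by the alignment edge capacities.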

__init__(doc_graph=None, source_flow=10000000000.0)
add_edges(capacities, cls=zensols.calamr.attr.ComponentAlignmentGraphEdge)[source]

Add the capacities to the graph as component alignment edges (instances of cls).

Parameters:
  • capacities (Iterable[Tuple[int, int, float]]) – the vertexes and capacities in the form: (<source vertex index>, <summary index>, <capacity>)

  • cls (Type[GraphEdge]) – the type of object to instantiate for the GraphEdge alignment

Return type:

List[GraphEdge]
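The capacities argument is an iterable of (<source vertex index>, <summary index>, <capacity>) triples. The following sketch shows one way to produce such triples from hypothetical per-vertex embeddings. The cosine scoring, the threshold, and the helper names are assumptions for illustration, not how calamr computes its capacities.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def alignment_capacities(src: Dict[int, Sequence[float]],
                         smy: Dict[int, Sequence[float]],
                         threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Create (<source vertex index>, <summary index>, <capacity>) triples
    for every source/summary vertex pair whose similarity meets threshold."""
    caps: List[Tuple[int, int, float]] = []
    for s_ix, s_emb in src.items():
        for m_ix, m_emb in smy.items():
            sim = cosine(s_emb, m_emb)
            if sim >= threshold:
                caps.append((s_ix, m_ix, sim))
    return caps


# hypothetical two dimensional embeddings keyed by vertex index
source_vecs = {0: [1.0, 0.0], 1: [0.0, 1.0]}
summary_vecs = {2: [1.0, 0.0]}
print(alignment_capacities(source_vecs, summary_vecs))  # [(0, 2, 1.0)]
```

A list like this has the shape expected by the capacities parameter. The scoring function is a placeholder for whatever capacity calculation is configured.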

build()[source]

Build the graph by adding component alignment capacities.

property doc_graph: DocumentGraph

A document graph that contains the graph to be aligned.

requires_reversed_edges()[source]
Return type:

bool

set_capacities(edges, capacity=10000000000.0)[source]

Set capacity on all edges.

property sink_flow_node: Vertex

The sink flow node.

source_flow: float = 10000000000.0

The capacity to use for the source node of the transportation graph.

property source_flow_node: Vertex

The source flow node.

update_capacities(caps)[source]

Update the capacities of the graph component.

Parameters:

caps (Dict[int, int]) – the capacities with key/value pairs as <edge ID>/<capacity>

Return type:

Dict[int, int]

Returns:

the caps parameter

zensols.calamr.aligner module

Contains classes that run the algorithm to compute graph component alignments.

class zensols.calamr.aligner.DocumentGraphAligner(config_factory, flow_graph_result_name, doc_graph_name, renderer, render_level, init_loops_render_level, output_dir)[source]

Bases: ABC

Aligns the graph components of doc_graph and visualizes them with renderer.

MAX_RENDER_LEVEL: ClassVar[int] = 10

The maximum value for render_level.

__init__(config_factory, flow_graph_result_name, doc_graph_name, renderer, render_level, init_loops_render_level, output_dir)
align(doc_graph)[source]

Align the graph components of doc_graph and optionally visualize them with renderer. To disable rendering, set render_level to 0.

Parameters:

doc_graph (DocumentGraph) – the graph created by the DocumentGraphFactory

Return type:

FlowGraphResult

Returns:

the alignments, available as the in-memory graph and object graph, Pandas dataframes, statistics, and scores

config_factory: ConfigFactory

Used to create the GraphSequencer instance.

create_error_result(ex, msg='Could not align')[source]

Create an error graph result (rather than an alignment result). This should be called in the except block of a try/except to obtain the error information.

Parameters:
  • ex (Exception) – the exception that caused the issue

  • msg (str) – the error message for the failure

Return type:

FlowGraphResult

doc_graph_name: str

The DocumentGraph.name of the document returned from align().

flow_graph_result_name: str

The app configuration section name of FlowGraphResult.

init_loops_render_level: int

The render_level to use for all iteration loops except for the last before the algorithm converges.

classmethod is_valid_render_level(render_level, should_raise=False)[source]
Return type:

bool

Return whether render_level is a valid value for the render_level attribute.
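The following sketch shows the check is_valid_render_level could perform. It assumes valid values are the integers 0 through MAX_RENDER_LEVEL inclusive and that should_raise turns an invalid value into an exception. The class name and the use of ValueError are assumptions.

```python
from typing import ClassVar


class RenderLevelSketch:
    """Hypothetical stand-in; not DocumentGraphAligner itself."""
    MAX_RENDER_LEVEL: ClassVar[int] = 10

    @classmethod
    def is_valid_render_level(cls, render_level: int,
                              should_raise: bool = False) -> bool:
        # valid levels are assumed to be 0 (never render) .. MAX_RENDER_LEVEL
        valid = isinstance(render_level, int) and \
            0 <= render_level <= cls.MAX_RENDER_LEVEL
        if not valid and should_raise:
            raise ValueError(
                f'render_level must be in [0, {cls.MAX_RENDER_LEVEL}], '
                f'got: {render_level}')
        return valid


print(RenderLevelSketch.is_valid_render_level(5))   # True
print(RenderLevelSketch.is_valid_render_level(11))  # False
```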

output_dir: Path

If this is set, the graphs are written to this created directory on the file system. Otherwise, they are displayed and cleaned up afterward.

render_level: int

How many graphs to render on a scale from 0 - 10. The higher the number the more likely a graph is to be rendered. A value of 0 prevents rendering and a setting of 10 will render all graphs.

See:

MAX_RENDER_LEVEL

renderer: GraphRenderer

Visually render the graph into a human-understandable presentation.

class zensols.calamr.aligner.DocumentGraphController(name)[source]

Bases: Dictable

Executes the maxflow/min cut algorithm on a document graph.

__init__(name)
invoke(doc_graph)[source]

Perform this controller's operations on the graph as part of the alignment algorithm.

Parameters:

doc_graph (DocumentGraph) – the graph to edit

Return type:

int

Returns:

the number of edits made to the graph

name: str

The configuration instance name for debugging.

reset()[source]

Reset all state in application context shared objects so new data is forced to be created on the next alignment request.

class zensols.calamr.aligner.GraphIteration(sequence, render_level, updates)[source]

Bases: Dictable

An iteration of the alignment algorithm.

__init__(sequence, render_level, updates)
render_level: int

Whether to render graphs on a scale from 0 - 10. The higher the number the more likely it is to be rendered with 0 never rendering the graph, and 10 always rendering the graph.

See:

DocumentGraphAligner.MAX_RENDER_LEVEL

reset()[source]

Reset all state in application context shared objects so new data is forced to be created on the next alignment request.

sequence: GraphSequence

The sequence to use for this iteration.

updates: bool

Whether to report the updates made by the iteration; otherwise, the iteration's updates are counted.

class zensols.calamr.aligner.GraphSequence(name, process_name, render_name, heading, controller, sequencer)[source]

Bases: Dictable

A strategy (GoF pattern) that models what to do during a sequence of graph modifications using a DocumentGraphController. It also contains rendering information for visualization.

__init__(name, process_name, render_name, heading, controller, sequencer)
controller: Optional[DocumentGraphController]

The controller used in the invocation of this strategy.

heading: str

The text used in the heading of the graph rendering.

invoke()[source]

Invoke the strategy. This implementation calls the controller with the process_graph to be processed and passes back the update count.

Return type:

int

name: str

The name of the sequence, which is used to key its graphs.

populate_render_context(context)[source]

Allows the sequence to override the parameters before they are sent to the graph rendering API.

property process_graph: DocumentGraph

The graph provided to the graph controller.

process_name: str

The name of the graph provided to the graph controller. See process_graph.

property render_graph

The graph to render.

render_name: str

The name of the graph to render. See render_graph.

reset()[source]

Reset all state in application context shared objects so new data is forced to be created on the next alignment request.

sequencer: GraphSequencer

Owns and controls this instance.

class zensols.calamr.aligner.GraphSequencer(config_factory, sequence_path, nascent_graph, render, descriptor=None, heading_format=None)[source]

Bases: object

This invokes the GraphSequence objects in the provided sequence to automate the graph alignment algorithm; it is used by MaxflowDocumentGraphAligner.

__init__(config_factory, sequence_path, nascent_graph, render, descriptor=None, heading_format=None)[source]

Initialize this instance.

Parameters:
get_graph(name=None)[source]

Return a graph by name.

Return type:

DocumentGraph

property render_level: int

Whether to render graphs on a scale from 0 - 10. See DocumentGraphAligner.MAX_RENDER_LEVEL.

reset()[source]

Reset all state in application context shared objects so new data is forced to be created on the next alignment request.

run(name)[source]
Return type:

int

class zensols.calamr.aligner.MaxflowDocumentGraphAligner(config_factory, flow_graph_result_name, doc_graph_name, renderer, render_level, init_loops_render_level, output_dir, graph_sequencer_name, max_sequencer_iterations, hyp)[source]

Bases: DocumentGraphAligner

Uses the maxflow/min cut algorithm to compute graph component alignments.

__init__(config_factory, flow_graph_result_name, doc_graph_name, renderer, render_level, init_loops_render_level, output_dir, graph_sequencer_name, max_sequencer_iterations, hyp)
graph_sequencer_name: str

The app configuration section name of GraphSequencer.

hyp: HyperparamModel

The capacity calculator hyperparameters.

See:

summary.CapacityCalculator.hyp

max_sequencer_iterations: int

The max number of iterations of the sequencer loop. This is the max number of times the loop iteration set runs if the maxflow algorithm doesn’t converge (0 changes on bipartite capacities) first.
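The convergence behavior described above can be sketched as the loop below. In each iteration, every sequence is invoked. Each sequence acts as a strategy that delegates to its controller and returns an edit count, just as GraphSequence.invoke returns the update count from DocumentGraphController.invoke. The loop stops early once an iteration makes no changes. SequenceSketch, run_until_converged, and the toy controller are hypothetical and only illustrate the control flow, not calamr's GraphSequencer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SequenceSketch:
    """Hypothetical strategy: delegates to a controller that returns an edit count."""
    name: str
    controller: Callable[[], int]

    def invoke(self) -> int:
        return self.controller()


def run_until_converged(sequences: List[SequenceSketch],
                        max_iterations: int) -> int:
    """Run all sequences per iteration; return the number of iterations run."""
    for iteration in range(1, max_iterations + 1):
        # invoke every sequence in order and total their edits
        updates = sum(seq.invoke() for seq in sequences)
        if updates == 0:
            # converged: no capacities changed in this iteration
            return iteration
    # never converged; stopped at the iteration cap
    return max_iterations


edits = iter([3, 1, 0])
seq = SequenceSketch('update capacities', lambda: next(edits, 0))
print(run_until_converged([seq], max_iterations=10))  # 3
```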

class zensols.calamr.aligner.RenderUpSideDownGraphSequence(name, process_name, render_name, heading, controller, sequencer)[source]

Bases: GraphSequence

A graph sequence that tells graphviz to render the diagram upside down, which is useful for reverse flow graphs.
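Graphviz's rankdir graph attribute controls layout direction, and rankdir=BT lays out ranks from bottom to top. The following sketch shows how a populate_render_context override might request that layout. It assumes the context is a dict holding graph attributes, but the actual context type and keys used by calamr's renderer are not documented here. All class and function names are hypothetical.

```python
from typing import Any, Dict, List, Tuple


class SequenceRenderSketch:
    """Hypothetical base: leaves the render context unchanged."""
    def populate_render_context(self, context: Dict[str, Any]) -> None:
        pass


class UpSideDownSketch(SequenceRenderSketch):
    """Hypothetical subclass: asks graphviz to lay out ranks bottom to top."""
    def populate_render_context(self, context: Dict[str, Any]) -> None:
        super().populate_render_context(context)
        context.setdefault('graph_attrs', {})['rankdir'] = 'BT'


def to_dot(edges: List[Tuple[str, str]], context: Dict[str, Any]) -> str:
    """Render a minimal graphviz DOT digraph using the context's graph attributes."""
    attrs = ''.join(f'  {k}={v};\n' for k, v in context.get('graph_attrs', {}).items())
    body = ''.join(f'  {u} -> {v};\n' for u, v in edges)
    return 'digraph G {\n' + attrs + body + '}\n'


ctx: Dict[str, Any] = {}
UpSideDownSketch().populate_render_context(ctx)
print(to_dot([('a', 'b')], ctx))
```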

__init__(name, process_name, render_name, heading, controller, sequencer)
populate_render_context(context)[source]

Allows the sequence to override the parameters before they are sent to the graph rendering API.

zensols.calamr.annotate module

Contains a class to add embeddings to AMR feature documents.

class zensols.calamr.annotate.AddEmbeddingsFeatureDocumentStash(delegate, word_piece_doc_factory=None)[source]

Bases: DelegateStash, PrimeableStash

Add embeddings to AMR feature documents. Embedding population is disabled by configuring word_piece_doc_factory as None.

__init__(delegate, word_piece_doc_factory=None)
get(name, default=None)[source]

Load an object or a default if key name doesn’t exist.

Implementation note: subclasses will probably want to override this method, given the super method is cavalier about calling exists() and load(). Based on the implementation, this can be problematic.

Return type:

Any

load(name)[source]

Load a data value from the pickled data with key name. Semantically, this method loads the data using the stash’s implementation. For example, DirectoryStash loads the data from a file if it exists, but factory type stashes will always re-generate the data.

See:

get()

Return type:

AmrFeatureDocument

word_piece_doc_factory: WordPieceFeatureDocumentFactory = None

The feature document factory that populates embeddings.

class zensols.calamr.annotate.CalamrAnnotatedAmrFeatureDocumentFactory(doc_parser, remove_wiki_attribs=False, remove_alignments=False, doc_id=0, word_piece_doc_factory=None)[source]

Bases: AnnotatedAmrFeatureDocumentFactory

Adds wordpiece embeddings to AmrFeatureDocument instances.

__init__(doc_parser, remove_wiki_attribs=False, remove_alignments=False, doc_id=0, word_piece_doc_factory=None)
from_dict(data)[source]

Parse and create an AMR document from a dict.

Parameters:
  • data (Dict[str, str]) – the AMR text to be parsed, with each entry having keys summary and body

  • doc_id – the document ID to set as AmrFeatureDocument.doc_id

Return type:

AmrFeatureDocument

to_annotated_doc(doc)[source]

Clone doc.amr into an AnnotatedAmrDocument.

Parameters:

doc – the document to convert to an AnnotatedAmrDocument

Return type:

AmrFeatureDocument

Returns:

a feature document whose amr is a new AnnotatedAmrDocument, which is a new instance if doc isn’t an annotated AMR document

word_piece_doc_factory: WordPieceFeatureDocumentFactory = None

The feature document factory that populates embeddings.

class zensols.calamr.annotate.ProxyReportAnnotatedAmrDocument(sents, path=None, doc_id=None)[source]

Bases: AnnotatedAmrDocument

Overrides the sections property to skip duplicate summary sentences also found in the body.

__init__(sents, path=None, doc_id=None)

Initialize.

Parameters:
  • sents (Tuple[AmrSentence, …]) – the document’s sentences

  • path (Optional[Path]) – the path to the file containing the Penman notation sentence graphs used in sents

  • model – the model to initialize AmrSentence when sents is a list of string Penman graphs

property sections: Tuple[AnnotatedAmrSectionDocument]

The sentences that make up the body of the document.

zensols.calamr.app module

Alignment entry point application.

class zensols.calamr.app.AlignmentApplication(resource, config_factory)[source]

Bases: _AlignmentBaseApplication

This application aligns data in files.

__init__(resource, config_factory)
align_file(input_file, output_dir=None, output_format=Format.csv, render_level=None)[source]

Align annotated documents from a JSON file.

Parameters:
  • input_file (Path) – the input JSON file.

  • output_dir (Path) – the output directory

  • output_format (Format) – the output format

  • render_level (int) – how many graphs to render (0 - 10), higher means more

config_factory: ConfigFactory

Application configuration factory.

class zensols.calamr.app.CorpusApplication(resource, config_factory, results_dir)[source]

Bases: _AlignmentBaseApplication

AMR graph alignment.

__init__(resource, config_factory, results_dir)
align_corpus(keys, output_dir=None, output_format=Format.csv, render_level=None, use_cached=False)[source]

Align an annotated AMR document from the corpus.

Parameters:
  • keys (str) – comma-separated list of dataset keys or file name

  • output_dir (Path) – the output directory

  • output_format (Format) – the output format

  • render_level (int) – how many graphs to render (0 - 10), higher means more

  • use_cached (bool) – whether to use a cached result if available

config_factory: ConfigFactory

For prototyping.

dump_annotated(limit=None, output_dir=None, output_format=Format.csv)[source]

Write annotated documents and their keys.

Parameters:
  • limit (int) – the maximum number of items to process

  • output_dir (Path) – the output directory

  • output_format (Format) – the output format

get_annotated_summary(limit=None)[source]

Return a CSV file with a summary of the annotated AMR dataset.

Parameters:

limit (int) – the maximum number of items to process

Return type:

DataFrame

results_dir: Path

The directory where the output results are written, then read back for analysis reporting.

write_adhoc_corpus(corpus_file=None)[source]

Write the adhoc corpus from the created JSON file.

Parameters:

corpus_file (Path) – the file with the source and summary sentences

write_keys()[source]

Write the keys of the configured corpus.

class zensols.calamr.app.Resource(anon_doc_stash, doc_factory, serialized_factory, doc_graph_factory, doc_graph_aligner, flow_results_stash, anon_doc_factory)[source]

Bases: object

A client facade (GoF) for Calamr annotated AMR corpus access and alignment.

__init__(anon_doc_stash, doc_factory, serialized_factory, doc_graph_factory, doc_graph_aligner, flow_results_stash, anon_doc_factory)
align(doc_graph)[source]

Create flow results from a document graph.

Parameters:

doc_graph (DocumentGraph) – the source/summary components to align

Return type:

FlowGraphResult

Returns:

the aligned bipartite graph and its statistics

align_corpus_document(doc_id, use_cache=True)[source]

Create flow results of a corpus AMR document.

Parameters:
  • doc_id (str) – the corpus document ID (e.g. liu-example or 20041010_0024)

  • use_cache (bool) – whether to cache (and use cached) results

Return type:

Optional[FlowGraphResult]

Returns:

the flow results for the corpus document or None if doc_id is not a valid key

anon_doc_factory: AnnotatedAmrFeatureDocumentFactory

Creates instances of AmrFeatureDocument.

anon_doc_stash: Stash

Contains human annotated AMRs. This could be from the adhoc (micro) corpus (small toy corpus), AMR 3.0 Proxy Report corpus, Little Prince, or the Bio AMR corpus.

create_graph(doc)[source]

Return a new document graph based on a feature document.

Parameters:

doc (AmrFeatureDocument) – the document on which to base the new graph

Return type:

DocumentGraph

Returns:

a new AMR document graph

doc_factory: AmrFeatureDocumentFactory

Creates AmrFeatureDocument from AmrDocument instances.

doc_graph_aligner: DocumentGraphAligner

Aligns the components of document graphs.

doc_graph_factory: DocumentGraphFactory

Create document graphs.

flow_results_stash: Stash

Creates cached instances of FlowGraphResult.

get_corpus_document(doc_id)[source]

Get an AMR feature document by key from the application configured corpus.

Parameters:

doc_id (str) – the corpus document ID (e.g. liu-example or 20041010_0024)

Return type:

AmrFeatureDocument

Returns:

the AMR feature document

get_corpus_keys()[source]

Get the keys of the application configured AMR corpus.

Return type:

Iterable[str]

parse_documents(data)[source]

Parse documents with keys id, comment, body, and summary from a dict, a sequence of dict instances, or a JSON file in the format:

[{
    "id": "ex1",
    "comment": "very short",
    "body": "The man ran to make the train. He just missed it.",
    "summary": "A man got caught in the door of a train he missed."
}]
Return type:

Iterable[AmrFeatureDocument]

Returns:

the parsed AMR feature documents

See:

AnnotatedAmrFeatureDocumentFactory
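The input format above can be produced with the standard library alone. The sketch below assumes id, body, and summary are required and comment is optional. The source lists all four keys but does not say which are mandatory. The helper name and file name are hypothetical. The written file is the kind of input parse_documents (or AlignmentApplication.align_file) reads, but neither is called here because that needs a configured calamr installation.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List

# assumption: id, body and summary are required; comment is optional
REQUIRED_KEYS = ('id', 'body', 'summary')


def write_alignment_input(docs: List[Dict[str, str]], path: Path) -> Path:
    """Validate documents and write them as a JSON list to path."""
    for doc in docs:
        missing = [k for k in REQUIRED_KEYS if k not in doc]
        if missing:
            raise ValueError(f"document {doc.get('id')!r} missing keys: {missing}")
    path.write_text(json.dumps(docs, indent=4))
    return path


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        out = write_alignment_input([{
            'id': 'ex1',
            'comment': 'very short',
            'body': 'The man ran to make the train. He just missed it.',
            'summary': 'A man got caught in the door of a train he missed.'}],
            Path(tmp) / 'input.json')
        print(out.read_text())
```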

serialized_factory: AmrSerializedFactory

Creates a Serialized from AmrDocument, AmrSentence or AnnotatedAmrDocument.

to_annotated_doc(doc)[source]

Return an annotated feature document, creating the feature document if necessary. The doc.amr attribute is set to annotated AMR document.

Parameters:

doc (Union[AmrDocument, AmrFeatureDocument]) – an AMR document or an AMR feature document

Return type:

AmrFeatureDocument

Returns:

a new instance of a document if doc is not an AmrFeatureDocument or if doc.amr is not an AnnotatedAmrDocument

to_feature_doc(amr_doc, catch=False, add_metadata=False, add_alignment=False)[source]

Create an AmrFeatureDocument from an AmrDocument by parsing the snt metadata with a FeatureDocumentParser.

Parameters:
  • add_metadata (Union[str, bool]) – if True, add annotation metadata parsed from spaCy to amr_doc where it is missing (see AmrParser.add_metadata()); if this value is the string clobber, replace any previous metadata

  • catch (bool) – if True, catch exceptions, create an AmrFailure from each, and return them

Return type:

Union[AmrFeatureDocument, Tuple[AmrFeatureDocument, List[AmrFailure]]]

Returns:

an AMR feature document if catch is False; otherwise, a tuple of a document with the sentences that were successfully parsed and a list of any exceptions raised during the parsing

zensols.calamr.attr module

Graph node and edge domain classes.

Terminology: token alignments refer to the sentence index based token alignments to AMR nodes. This is not to be confused with alignment edges (aka graph component alignment edges).

class zensols.calamr.attr.AmrDocumentNode(context, name, root, children, doc)[source]

Bases: DocumentNode

A composite node containing a subset of the DocumentNode.root sentences. This includes the text, text features, and AMR Penman graph data.

__init__(context, name, root, children, doc)
doc: AmrFeatureDocument

A document containing a subset of sentences that fall under this portion of the graph.

class zensols.calamr.attr.AttributeGraphNode(context, sent, token_aligns, triple)[source]

Bases: TripleGraphNode

Attribute data from AMR attribute nodes grafted on to the igraph.Graph.

ATTRIB_TYPE: ClassVar[str] = 'attribute'

The attribute type this class represents.

__init__(context, sent, token_aligns, triple)
property constant: Any

The constant defined by the attribute from the Penman graph.

property role: str

The AMR role taken from Penman graph node.

class zensols.calamr.attr.ComponentAlignmentGraphEdge(context, capacity=0, flow=0)[source]

Bases: GraphEdge

An edge that spans graph components.

ATTRIB_TYPE: ClassVar[str] = 'component alignment'

The attribute type this class represents.

__init__(context, capacity=0, flow=0)
class zensols.calamr.attr.ComponentCorefAlignmentGraphEdge(context, capacity=0, flow=0, relation=None, is_bipartite=False)[source]

Bases: ComponentAlignmentGraphEdge

An edge that spans graph components.

ATTRIB_TYPE: ClassVar[str] = 'component coref alignment'

The attribute type this class represents.

__init__(context, capacity=0, flow=0, relation=None, is_bipartite=False)
is_bipartite: bool = False

Whether the coreference spans components.

relation: Relation = None

The AMR coreference relation between this node and all other refs.

class zensols.calamr.attr.ConceptGraphNode(context, sent, token_aligns, triple)[source]

Bases: TripleGraphNode

Attribute data from AMR concept nodes grafted on to the igraph.Graph.

ATTRIB_TYPE: ClassVar[str] = 'concept'

The attribute type this class represents.

__init__(context, sent, token_aligns, triple)
property has_roleset: bool
property instance: str

The concept instance, such as the PropBank entry (e.g. see-01). Other examples include nouns.

property roleset: Roleset
property roleset_embedding: Tensor
property roleset_id: RolesetId
property token_embedding: Tensor | None
class zensols.calamr.attr.DocumentGraphEdge(context, capacity=0, flow=0, relation='')[source]

Bases: GraphEdge

An edge that has data about the non-AMR parts of the graph, such as sentences.

ATTRIB_TYPE: ClassVar[str] = 'doc'

The attribute type this class represents.

__init__(context, capacity=0, flow=0, relation='')
relation: str = ''

The edge relation between two document nodes or document to igraph node.

class zensols.calamr.attr.DocumentGraphNode(context, level_name, doc_node)[source]

Bases: GraphNode

A node that has data about the non-AMR parts of the graph, such as the unifying top level node that ties the sentences together. However, it can contain the root to an AMR sentence (see AmrDocumentNode).

ATTRIB_TYPE: ClassVar[str] = 'doc'

The attribute type this class represents.

__init__(context, level_name, doc_node)
doc_node: DocumentNode

The document node associated with the attached igraph node.

level_name: str

The descriptive name of the node such as doc or section.

class zensols.calamr.attr.DocumentNode(context, name, root, children)[source]

Bases: GraphNode

A composite node in the document tree that is associated with the FeatureDocument as its root node. This class represents nodes in a graph that either:

  • make up the part of the graph that’s disjoint from the AMR sentinel subgraphs (i.e. a root doc node), or

  • the root to an AMR sentence (see AmrDocumentNode)

The in-memory object graph of these instances depends on the type of data it represents. For example, the Proxy Report corpus has top-level summary and body nodes with AMR sentences below them (root on top).

__init__(context, name, root, children)
children: Tuple[DocumentNode, ...]

The children of this node with respect to the composite pattern.

property children_by_name: DocumentNode

The children’s names as keys and respective document nodes as values.

name: str

The descriptive name of the node such as doc or section.

root: AmrFeatureDocument

The owning feature document containing all sentences/tokens of the graph.

property sents: Tuple[AmrFeatureSentence]

The sentences of this document level.

write(depth=0, writer=sys.stdout)[source]

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

class zensols.calamr.attr.GraphEdge(context, capacity=0, flow=0)[source]

Bases: GraphAttribute

Graph attribute data added to the igraph.Graph edges.

MAX_CAPACITY: ClassVar[float] = 10000000000.0

Maximum value of a capacity.

Implementation note: It seems igraph can only handle large values to represent infinity, and not float inf or the system defined largest float value.
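The snippet below uses plain Python floats (IEEE 754) to show two hazards that a large finite sentinel avoids. Residual arithmetic on float inf yields nan, and adding the largest representable float to itself overflows to inf. It illustrates only the float behavior. Whether these are the exact failure modes inside igraph is an assumption.

```python
import math
import sys

MAX_CAPACITY: float = 10000000000.0  # the documented GraphEdge.MAX_CAPACITY


def residual(capacity: float, flow: float) -> float:
    """Remaining capacity of an edge after flow has been pushed through it."""
    return capacity - flow


inf = float('inf')
print(residual(inf, inf))                    # nan: undefined residual
print(residual(MAX_CAPACITY, MAX_CAPACITY))  # 0.0
# summing two of the largest representable floats overflows to inf
print(sys.float_info.max + sys.float_info.max)  # inf
```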

__init__(context, capacity=0, flow=0)
property capacity: float

The capacity of the edge.

capacity_str(precision=None)[source]
Return type:

str

property flow: float

The flow calculated by the maxflow algorithm.

flow_str(precision=None)[source]
Return type:

str

property value_str: str
class zensols.calamr.attr.GraphNode(context)[source]

Bases: GraphAttribute

Graph attribute data added to the igraph.Graph vertexes.

__init__(context)
property partition: int
class zensols.calamr.attr.RoleGraphEdge(context, sent, token_aligns, capacity=0, flow=0, triple=None, role=None)[source]

Bases: GraphEdge, SentenceGraphAttribute

Attribute data from the AMR role edges grafted on to the igraph.Graph.

ATTRIB_TYPE: ClassVar[str] = 'role'

The attribute type this class represents.

__init__(context, sent, token_aligns, capacity=0, flow=0, triple=None, role=None)
role: Union[str, Role] = None

The role name of the edge such as :ARG0.

triple: Union[Instance, Attribute] = None

The AMR Penman graph triple.

class zensols.calamr.attr.SentenceGraphAttribute(context, sent, token_aligns)[source]

Bases: GraphAttribute

A node containing zero or more tokens with its parent sentence. Usually the AMR node represents a single token, but can have more than one token alignment.

__init__(context, sent, token_aligns)
property has_token_embedding: bool

Whether this attribute node has token embeddings.

property indices: Tuple[int, ...]

Return the concatenated list of indices of the alignments.

sent: AmrFeatureSentence

The sentence from which this node was created.

property token_align_str: str

A string representation of the AMR Penman representation of the token alignment.

token_aligns: Tuple[Union[Alignment, RoleAlignment], ...]

The node to sentinel token index.

property tokens: Tuple[FeatureToken, ...]

The tokens of this node.

write(depth=0, writer=sys.stdout)[source]

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

class zensols.calamr.attr.SentenceGraphEdge(context, capacity=0, flow=0, relation='', sent=None, sent_ix=None)[source]

Bases: DocumentGraphEdge

An edge from a document node to a SentenceGraphNode.

ATTRIB_TYPE: ClassVar[str] = 'sentence'

The attribute type this class represents.

__init__(context, capacity=0, flow=0, relation='', sent=None, sent_ix=None)
sent: AmrFeatureSentence = None

The sentence from which this node was created.

sent_ix: int = None

The sentence index.

class zensols.calamr.attr.SentenceGraphNode(context, sent, sent_ix)[source]

Bases: GraphNode

A graph node containing the root of a sentence.

ATTRIB_TYPE: ClassVar[str] = 'sentence'

The attribute type this class represents.

SENT_TEXT_LEN: ClassVar[int] = 20

The truncated sentence length.

__init__(context, sent, sent_ix)
sent: AmrFeatureSentence

The sentence from which this node was created.

sent_ix: int

The sentence index.

class zensols.calamr.attr.TerminalGraphEdge(context, capacity=0, flow=0)[source]

Bases: GraphEdge

An edge that connects to a terminal node (TerminalGraphNode).

ATTRIB_TYPE: ClassVar[str] = 'terminal'

The attribute type this class represents.

__init__(context, capacity=0, flow=0)
class zensols.calamr.attr.TerminalGraphNode(context, is_source)[source]

Bases: GraphNode

A flow control: source or sink.

ATTRIB_TYPE: ClassVar[str] = 'control'

The attribute type this class represents.

__init__(context, is_source)
is_source: bool

Whether this node is the source (s); otherwise, it is the sink (t).

class zensols.calamr.attr.TripleGraphNode(context, sent, token_aligns, triple)[source]

Bases: SentenceGraphAttribute, GraphNode

Contains a Penman triple with token alignments used for concepts and AMR attributes. Instances of this class get their embedding via SentenceGraphAttribute._get_embedding().

__init__(context, sent, token_aligns, triple)
triple: Union[Instance, Attribute]

The AMR Penman graph triple.

property variable: str

The variable, which comes from the source of the triple, such as s0.

zensols.calamr.cli module

Command line entry point to the application.

class zensols.calamr.cli.ApplicationFactory(*args, **kwargs)[source]

Bases: ApplicationFactory

__init__(*args, **kwargs)[source]
classmethod get_resource(*args, **kwargs)[source]

A client facade (GoF) for Calamr annotated AMR corpus access and alignment.

Return type:

Resource

zensols.calamr.cli.main(args=sys.argv, **kwargs)[source]
Return type:

ActionResult

zensols.calamr.comp module

Base graph component class.

class zensols.calamr.comp.GraphComponent(graph)[source]

Bases: PersistableContainer, Writable

A container class for an igraph.Graph, which also has caching data structures for fast access to graph attributes.

GRAPH_ATTRIB_NAME: ClassVar[str] = 'ga'

The name of the graph attributes on igraph nodes and edges.

ROOT_ATTRIB_NAME: ClassVar[str] = 'root'

The attribute that identifies the root vertex.

__init__(graph)
property adjacency_list: List[List[int]]

An adjacency list of vertexes based on their relation to each other in the graph. The outer list’s index is the source vertex and the inner list is that vertex’s neighbors.

Implementation note: at both the inner and outer level, the list is restricted to the vertexes in this component.
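The following sketch shows that subsetting on a plain directed edge list, using only the standard library. The text above does not say whether GraphComponent's subsetted outer list is indexed by global vertex ID or by position within the component. This sketch assumes the latter and uses the sorted component vertexes. The function name is hypothetical.

```python
from typing import List, Set, Tuple


def component_adjacency(n: int, edges: List[Tuple[int, int]],
                        component: Set[int]) -> List[List[int]]:
    """Adjacency list restricted to component at the outer and inner level.

    The outer list follows the sorted component vertexes (so its index is a
    position in that order, not a global vertex ID)."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    return [[v for v in adj[u] if v in component] for u in sorted(component)]


# vertexes 3 and 4 belong to another component and are filtered out
print(component_adjacency(5, [(0, 1), (0, 3), (1, 2), (3, 4)], {0, 1, 2}))  # [[1], [2], []]
```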

clone(reverse_edges=False, deep=True, **kwargs)[source]

Clone an instance and return it. The graph is deep copied, but all GraphAttribute instances are not.

Parameters:
  • reverse_edges (bool) – whether to reverse the direction of edges in the directed graph, which is used to create the reverse flow graphs used in the maxflow algorithm

  • deep (bool) – whether to create a deep clone, which detaches a component from any bipartite graph by keeping only the graph composed of vs; otherwise the graph is copied as a subcomponent starting from root

  • kwargs – arguments to add as attributes to the clone; include cls to set the type of the new instance

Return type:

GraphComponent

Returns:

the cloned instance of this instance

copy_graph(reverse_edges=False, subgraph_type=None)[source]

Return a copy of the igraph.Graph.

Parameters:
  • reverse_edges (bool) – whether to reverse the direction on all edges in the graph

  • subgraph_type (str) – the method of creating a subgraph, which is either induced to create it from the nodes of the graph; or sub (the default) to create it from the subcomponent from the root

Return type:

Graph
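The two subgraph_type values, and the effect of reverse_edges, can be sketched on a plain directed edge list. An induced subgraph keeps every edge whose endpoints are both in a given vertex set. A subcomponent keeps what is reachable from the root by following edge direction. This is a stdlib illustration under those assumptions, not the code copy_graph runs. python-igraph offers Graph.induced_subgraph and Graph.subcomponent for the corresponding graph operations. The function names here are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def reverse_edges(edges: List[Edge]) -> List[Edge]:
    """Flip the direction of every edge (as used for reverse flow graphs)."""
    return [(v, u) for u, v in edges]


def induced(edges: List[Edge], vertexes: Set[int]) -> List[Edge]:
    """Keep edges whose endpoints are both in vertexes."""
    return [(u, v) for u, v in edges if u in vertexes and v in vertexes]


def subcomponent(edges: List[Edge], root: int) -> List[Edge]:
    """Keep edges reachable from root by following edge direction."""
    out: Dict[int, List[int]] = {}
    for u, v in edges:
        out.setdefault(u, []).append(v)
    seen = {root}
    queue = deque([root])
    while queue:
        for v in out.get(queue.popleft(), []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    # every target of an edge leaving a reached vertex is also reached
    return [(u, v) for u, v in edges if u in seen]


edges = [(0, 1), (1, 2), (3, 2)]
print(subcomponent(edges, 0))                 # [(0, 1), (1, 2)]
print(induced(edges, {1, 2, 3}))              # [(1, 2), (3, 2)]
print(subcomponent(reverse_edges(edges), 2))  # [(1, 0), (2, 1), (2, 3)]
```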

static create_edges(g, es)[source]

Add the edges of list es and return them after being added.

Return type:

Tuple[Edge]

static create_vertexes(g, n)[source]

Create n vertexes and return them after being added.

Return type:

Tuple[Vertex]

deallocate()[source]

Deallocate all resources for this instance.

delete_edges(edges, include_nodes=False)[source]

Remove edges from the graph.

Parameters:
  • edges (Iterable[GraphEdge]) – the edges to remove from the graph

  • include_nodes (bool) – whether to remove nodes that become orphans after deleting edges

Return type:

Tuple[Set[GraphEdge], Set[GraphNode]]

Returns:

a tuple of the edges and nodes removed
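A minimal sketch of the delete_edges contract (remove edges, optionally drop the nodes orphaned by the removal, and return both as sets), using string node names and (source, target) tuples. The function is hypothetical and only sees nodes that appear in at least one edge.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def delete_edges(edges: Set[Edge], to_remove: Set[Edge],
                 include_nodes: bool = False) -> Tuple[Set[Edge], Set[str]]:
    """Hypothetical sketch: remove ``to_remove`` from ``edges`` in place and
    return a tuple of the edges and nodes removed."""
    removed = edges & to_remove
    nodes_before = {n for edge in edges for n in edge}
    edges -= removed
    orphans: Set[str] = set()
    if include_nodes:
        nodes_after = {n for edge in edges for n in edge}
        # nodes that were only attached to the deleted edges are now orphans
        orphans = nodes_before - nodes_after
    return removed, orphans
```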

edge_by_graph_edge_id(ge_id)[source]

Return an edge based on the graph (attribute) edge id.

Return type:

GraphEdge

edge_by_id(ix)[source]

Return the graph edge for the edge ID.

Return type:

GraphEdge

edge_ref_by_id(ix)[source]

Get the igraph.Edge instance by its index.

Return type:

Edge

property edges_reversed: bool

Whether the edge direction in the graph is reversed. This is True for reverse flow graphs.

See:

summary.ReverseFlowGraphAlignmentConstructor

property es: Dict[Edge, GraphEdge]

The igraph to domain object edge mapping.

get_attributes()[source]

Return all graph attributes of the component, which include instances of both GraphNode and GraphEdge.

Return type:

Iterable[GraphAttribute]

property graph: Graph

The graph used for computational manipulation of the synthesized AMR sentences.

property graph_edge_id_to_edge_ref: Dict[int, int]

Graph (attribute) edge ID to igraph edge index.

property graph_edge_to_edge: Dict[GraphEdge, Edge]

A mapping from graph edges to igraph edges.

static graph_instance()[source]

Create a new directed nascent graph.

Return type:

Graph

property graph_node_id_to_vertex_ref: Dict[int, int]

Graph node index to vertex index.

invalidate()[source]

Clear cached data structures to force them to be recreated after igraph level data has changed.

node_by_graph_node_id(gn_id)[source]

Return a node based on the graph (attribute) node id.

Return type:

GraphNode

node_by_id(ix)[source]

Return the graph node for the vertex ID.

Return type:

GraphNode

property node_to_vertex: Dict[GraphNode, Vertex]

A mapping from graph nodes to vertexes.

property root: Vertex | None

The singular (first found) root of the graph, which is usually the top level DocumentNode instance.

property roots: Iterable[Vertex]

The roots of the graph, which are usually top level DocumentNode instances.

select_edges(**kwargs)[source]

Return matched graph edges from an igraph.Graph.es.select().

Return type:

Iterable[Edge]

select_vertices(**kwargs)[source]

Return matched graph nodes from an igraph.Graph.vs.select().

Return type:

Iterable[Vertex]

classmethod set_edge(e, ge)[source]

Set the graph edge data in the igraph edge.

classmethod set_node(v, n)[source]

Set the graph node data in the igraph vertex.

classmethod to_edge(e)[source]

Narrow an igraph edge to a graph edge.

Return type:

GraphEdge

classmethod to_node(v)[source]

Narrow a vertex to a node.

Return type:

GraphNode

vertex_ref_by_id(ix)[source]

Get the igraph.Vertex instance by its index.

Return type:

Vertex

property vs: Dict[Vertex, GraphNode]

The igraph to domain object vertex mapping.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write the contents of this instance to writer using indentation depth.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

zensols.calamr.ctrl module

Document graph controller implementations.

class zensols.calamr.ctrl.AlignmentCapacitySetDocumentGraphController(name, min_capacity, capacity)[source]

Bases: DocumentGraphController

Set the capacity on edges if the criteria match min_flow, component_names and match_edge_classes.

__init__(name, min_capacity, capacity)
capacity: float

The capacity to set.

min_capacity: float

The minimum capacity for clamping the capacity of a target GraphEdge to capacity.

class zensols.calamr.ctrl.ConstructDocumentGraphController(name, build_graph_name, constructor, renderer)[source]

Bases: DocumentGraphController

Constructs the graph that will later be used for the min cut/max flow algorithm (see MaxflowDocumentGraphController). After its invoke() method is called, build_graph is available, which is the constructed graph provided by constructor.

__init__(name, build_graph_name, constructor, renderer)
build_graph_name: str

The name given to newly created instances of DocumentGraph.

constructor: GraphAlignmentConstructor

The constructor used to get the source and sink nodes.

renderer: GraphRenderer

Visually renders the graph into a human understandable presentation.

class zensols.calamr.ctrl.FixReentrancyDocumentGraphController(name, component_name, maxflow_controller, only_report)[source]

Bases: DocumentGraphController

Fix reentrancies by splitting the flow of the last calculated maxflow as the capacity of the outgoing edges in the reversed graph. This fixes the issue of edges getting flow starved and then later eliminated in the graph reduction steps.

Subsequently, the maxflow algorithm is rerun if there is at least one reentrancy after reallocating the capacities.

__init__(name, component_name, maxflow_controller, only_report)
component_name: str

The name of the components to restore.

maxflow_controller: MaxflowDocumentGraphController

The maxflow component used to recalculate the maxflow.

only_report: bool

Whether to only report reentrancies rather than fix them.
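A sketch of the reallocation step described in the class docs, under the assumption that the flow reaching a reentrant node is split evenly among its outgoing edges in the reversed graph (one per parent in the nascent graph). The even split and the helper name split_reentrant_flow are assumptions, not taken from the package.

```python
from typing import Dict, List


def split_reentrant_flow(node_flow: float,
                         out_edges: List[str]) -> Dict[str, float]:
    """Hypothetical sketch: split the maxflow that reached a reentrant node
    evenly across its outgoing edges in the reversed graph and return each
    share as that edge's new capacity."""
    if not out_edges:
        return {}
    share = node_flow / len(out_edges)
    return {edge: share for edge in out_edges}


# a concept with two parents that received 3.0 units of flow
print(split_reentrant_flow(3.0, ['to_parent_a', 'to_parent_b']))
# {'to_parent_a': 1.5, 'to_parent_b': 1.5}
```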

class zensols.calamr.ctrl.FlowDiscountDocumentGraphController(name, discount_sum, component_names=<factory>)[source]

Bases: DocumentGraphController

Decrease/constrict the capacities by making the sum of the incoming flows from the bipartite edges the value of discount_sum. The capacities are only updated if the incoming bipartite edges have a flow sum greater than discount_sum.

__init__(name, discount_sum, component_names=<factory>)
component_names: Set[str]

The name of the components to discount.

discount_sum: float

The capacity sum will be this value (see class docs).
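A sketch of the discounting rule, assuming the incoming bipartite flows are rescaled proportionally so the new capacities sum to discount_sum. Proportional scaling and the function discount_capacities are assumptions for illustration only, and the sketch assumes caps and flows share the same edge keys.

```python
from typing import Dict


def discount_capacities(caps: Dict[str, float], flows: Dict[str, float],
                        discount_sum: float) -> Dict[str, float]:
    """Hypothetical sketch: constrict the incoming bipartite edge capacities
    so they sum to ``discount_sum`` when the flows exceed it."""
    total = sum(flows.values())
    if total <= discount_sum:
        # under the budget: the capacities are not updated
        return dict(caps)
    # scale each edge's flow so the new capacities add up to discount_sum
    scale = discount_sum / total
    return {edge: flows.get(edge, 0.0) * scale for edge in caps}
```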

class zensols.calamr.ctrl.FlowSetDocumentGraphController(name, component_names=<factory>, match_edge_classes=<factory>, flow=0)[source]

Bases: DocumentGraphController

Set a static flow on components based on name and edges based on class.

__init__(name, component_names=<factory>, match_edge_classes=<factory>, flow=0)
component_names: Set[str]

The components on which to set the flow.

flow: float = 0

The flow to set.

match_edge_classes: Set[Type[GraphEdge]]

The edge classes (e.g. TerminalGraphEdge) on which to set the flow.

class zensols.calamr.ctrl.MaxflowDocumentGraphController(name, constructor)[source]

Bases: DocumentGraphController

Executes the maxflow/min cut algorithm on a document graph.

__init__(name, constructor)
constructor: GraphAlignmentConstructor

The constructor used to get the source and sink nodes.

reset()[source]

Clears the cached build_graph instance.
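For reference, the textbook Edmonds-Karp algorithm computes a maximum flow between a source and a sink node, such as the ones provided by the constructor. This plain-Python version is illustrative only and is not the controller's implementation; python-igraph offers Graph.maxflow() for real graphs.

```python
from collections import deque
from typing import Dict, Hashable, Optional

Capacities = Dict[Hashable, Dict[Hashable, float]]


def edmonds_karp(capacity: Capacities, source: Hashable,
                 sink: Hashable) -> float:
    """Return the maximum source-to-sink flow using breadth-first
    augmenting paths (the Edmonds-Karp variant of Ford-Fulkerson)."""
    if source == sink:
        raise ValueError('source and sink must differ')
    # residual graph, with a zero-capacity reverse entry for every edge
    residual: Capacities = {}
    for u, nbrs in capacity.items():
        for v, cap in nbrs.items():
            residual.setdefault(u, {})
            residual[u][v] = residual[u].get(v, 0.0) + cap
            residual.setdefault(v, {}).setdefault(u, 0.0)
    max_flow = 0.0
    while True:
        # find the shortest augmenting path with a breadth-first search
        parent: Dict[Hashable, Optional[Hashable]] = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual.get(u, {}).items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return max_flow
        # bottleneck: the smallest residual capacity along the path
        bottleneck = float('inf')
        v = sink
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        # push the bottleneck forward and free the same amount backward
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        max_flow += bottleneck


caps = {'s': {'a': 10, 'b': 5}, 'a': {'b': 15, 't': 5}, 'b': {'t': 10}}
print(edmonds_karp(caps, 's', 't'))  # 15.0
```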

class zensols.calamr.ctrl.NormFlowDocumentGraphController(name, component_names, constructor, normalize_mode='fpn')[source]

Bases: DocumentGraphController

Normalizes the flow on edges using the flow going through each edge and its total number of descendants. Descendants are counted as the edge’s source node and all children/descendants of that node.

This is done recursively to calculate flow per node. For each recursive call, it computes the flow per node of the parent edge(s) from the perspective of the nascent graph (root at top with arrows pointed to children underneath). However, the graphs this operates on are the reverse flow max flow graphs (flow direction is handled by the adjacency list computed in GraphComponent).

Since an AMR node can have multiple parents, we keep track of descendants as a set rather than a count to avoid duplicate counts when nodes have more than one parent. Otherwise, in multiple parent case, duplicates would be counted when the path later converges closer to the root.

__init__(name, component_names, constructor, normalize_mode='fpn')
component_names: Set[str]

The name of the components to minimize.

constructor: GraphAlignmentConstructor

The instance used to construct the graph passed in the invoke() method.

normalize_mode: str = 'fpn'

How to normalize nodes (if at all), which is one of:

  • fpn: leaves flow values as they were after the initial flow per node calculation

  • norm: normalize so all values add to one

  • vis: same as norm but add a vis_flow attribute to the edges so the original flow is displayed and visualized as the flow color
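A sketch of the flow per node calculation, assuming the nascent graph is given as a parent to children mapping and the reversed-graph edges as (child, parent) pairs. It uses an iterative traversal rather than recursion, covers only the fpn and norm modes, and all names are hypothetical. Collecting descendants in a set is what keeps a node with two parents from being counted twice.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def descendants(children: Dict[str, List[str]], node: str) -> Set[str]:
    """Return ``node`` plus everything below it as a set, so a node reached
    through several parents (a reentrancy) is counted only once."""
    found = {node}
    stack = [node]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def flow_per_node(children: Dict[str, List[str]],
                  edge_flows: Dict[Edge, float],
                  normalize_mode: str = 'fpn') -> Dict[Edge, float]:
    """Hypothetical sketch: ``edge_flows`` maps reversed-graph edges
    (child, parent) to their flow, and each flow is divided by the size of
    the child's descendant set."""
    if normalize_mode not in {'fpn', 'norm'}:
        raise ValueError(f'mode not covered by this sketch: {normalize_mode}')
    fpn = {(src, targ): flow / len(descendants(children, src))
           for (src, targ), flow in edge_flows.items()}
    total = sum(fpn.values())
    if normalize_mode == 'norm' and total > 0:
        # rescale so all flow per node values add to one
        fpn = {edge: flow / total for edge, flow in fpn.items()}
    return fpn
```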

class zensols.calamr.ctrl.RemoveAlignsDocumentGraphController(name, min_capacity)[source]

Bases: DocumentGraphController

Removes graph component alignment for low capacity links.

__init__(name, min_capacity)
min_capacity: float

The graph component alignment edges are removed if their capacities are at or below this value.
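A short sketch of the removal criterion, with alignment edges represented as a hypothetical mapping from edge ID to capacity; edges at or below min_capacity are dropped.

```python
from typing import Dict


def remove_low_capacity_aligns(align_caps: Dict[int, float],
                               min_capacity: float) -> Dict[int, float]:
    """Hypothetical sketch: keep only the alignment edges whose capacity is
    strictly greater than ``min_capacity``."""
    return {eid: cap for eid, cap in align_caps.items() if cap > min_capacity}


print(remove_low_capacity_aligns({1: 0.1, 2: 0.5, 3: 0.9}, 0.5))  # {3: 0.9}
```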

class zensols.calamr.ctrl.RoleCapacitySetDocumentGraphController(name, min_flow, capacity, component_names)[source]

Bases: DocumentGraphController

This finds low flow role edges and sets (zeros out) all the capacities of all the connected edge alignments recursively for all descendants. We “slough off” entire subtrees (sometimes entire sentences or document nodes) for low flow ancestors.

__init__(name, min_flow, capacity, component_names)
capacity: float

The capacity (and flow) to set.

component_names: Set[str]

The name of the components to minimize.

min_flow: float

The minimum amount of flow to trigger setting the capacity of a target GraphEdge to capacity.
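A sketch of the recursive sloughing described in the class docs, assuming a parent to children mapping for the nascent graph, the role flow stored per target node, and one alignment capacity per node. Treating low flow as strictly below min_flow is an assumption, as are all names.

```python
from typing import Dict, List, Set


def slough_low_flow(children: Dict[str, List[str]],
                    role_flow: Dict[str, float],
                    align_caps: Dict[str, float],
                    min_flow: float, capacity: float = 0.0) -> Set[str]:
    """Hypothetical sketch: for each node whose incoming role edge carries
    less than ``min_flow``, set the alignment capacity of that node and all
    of its descendants to ``capacity``. Return the affected nodes."""
    sloughed: Set[str] = set()
    for node, flow in role_flow.items():
        if flow >= min_flow:
            continue
        # walk the whole subtree under the low flow role edge
        stack = [node]
        while stack:
            n = stack.pop()
            if n in sloughed:
                continue
            sloughed.add(n)
            if n in align_caps:
                align_caps[n] = capacity
            stack.extend(children.get(n, []))
    return sloughed
```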

class zensols.calamr.ctrl.SnapshotDocumentGraphController(name, component_names, snapshot_source)[source]

Bases: DocumentGraphController

Record flows, then later restore. If snapshot_source is not None, then this instance restores from it. Otherwise it records.

__init__(name, component_names, snapshot_source)
component_names: Set[str]

The name of the components on which to record or restore flows.

reset()[source]

Clears the cached build_graph instance.

snapshot_source: SnapshotDocumentGraphController

The source instance that contains the data from which to restore.
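A sketch of the record/restore pattern with flows as a mapping from edge ID to flow. FlowSnapshot and its invoke() method are hypothetical stand-ins, not the controller's API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FlowSnapshot:
    """Hypothetical stand-in for the record/restore pattern: with no
    ``snapshot_source`` an instance records flows; with one, it restores
    the flows that source recorded."""
    snapshot_source: Optional['FlowSnapshot'] = None
    recorded: Dict[int, float] = field(default_factory=dict)

    def invoke(self, flows: Dict[int, float]) -> None:
        if self.snapshot_source is None:
            # record a copy so later edits to ``flows`` do not leak in
            self.recorded = dict(flows)
        else:
            # restore the source's recorded flows over the current ones
            flows.update(self.snapshot_source.recorded)

    def reset(self) -> None:
        # forget any recorded flows
        self.recorded.clear()
```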

zensols.calamr.dcomp module

A document centric graph component.

class zensols.calamr.dcomp.DocumentGraphComponent(graph, root_node, sent_index=<factory>, description=None)[source]

Bases: GraphComponent

A class containing the root information of the document tree and the igraph.Graph vertex. When the igraph.Graph is set with the graph property, a strongly connected subgraph component is induced. It does this by traversing all vertices and edges reachable from the root. Examples of these induced components include source and summary components of a document AMR graph.

Instances are created by DocumentGraphFactory.

__init__(graph, root_node, sent_index=<factory>, description=None)
clone(reverse_edges=False, deep=True, **kwargs)[source]

Clone an instance and return it. The graph is deep copied, but all GraphAttribute instances are not.

Parameters:
  • reverse_edges (bool) – whether to reverse the direction of edges in the directed graph, which is used to create the reverse flow graphs used in the maxflow algorithm

  • deep (bool) – whether to create a deep clone, which detaches a component from any bipartite graph by keeping only the graph composed of vs; otherwise the graph is copied as a subcomponent starting from root

  • kwargs – arguments to add as attributes to the clone; include cls to set the type of the new instance

Return type:

GraphComponent

Returns:

the cloned instance of this instance

deallocate()[source]

Deallocate all resources for this instance.

description: str = None

A description of the component used for debugging.

property doc_vertices: Iterable[Vertex]

Get the vertices of DocumentGraphNode. This only fetches those document nodes that do not branch.

get_attributes()[source]

Return all graph attributes of the component, which include instances of both GraphNode and GraphEdge.

Return type:

Iterable[GraphAttribute]

property name: str

Return the name of the AMR document node.

property relation_set: RelationSet

The relations in the contained root node document.

property root: Vertex | None

The root of the graph, which is usually the top level DocumentNode instance.

root_node: AmrDocumentNode

The root of the document tree.

sent_index: SentenceIndex

An index of the sentences of a DocumentGraphComponent.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write the contents of this instance to writer using indentation depth.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

class zensols.calamr.dcomp.SentenceEntry(node=None, concepts=None, attributes=None)[source]

Bases: Dictable

Contains the sentence node of a sentence, and the respective concept and attribute nodes.

__init__(node=None, concepts=None, attributes=None)
attributes: Tuple[AttributeGraphNode] = None

The AMR attribute nodes of the sentence.

property concept_by_variable: Dict[str, ConceptGraphNode]
concepts: Tuple[ConceptGraphNode] = None

The AMR concept nodes of the sentence.

node: SentenceGraphNode = None

The sentence node, which is the root of the sentence subgraph.

class zensols.calamr.dcomp.SentenceIndex(entries=None)[source]

Bases: Dictable

An index of the sentences of a DocumentGraphComponent.

__init__(entries=None)
property by_sentence: Dict[AmrFeatureSentence, SentenceEntry]
entries: Tuple[SentenceEntry] = None

The entries of the index, each of which is a sentence.

zensols.calamr.doc module

Document based graph container, factory and strategy classes.

class zensols.calamr.doc.DocumentGraph(graph, name, graph_attrib_context, doc, components, children=<factory>)[source]

Bases: GraphComponent

A graph containing the text, text features, AMR Penman graph and igraph.

This class roughly follows a GoF composite pattern, with children being a collection of instances of this class, which are the reversed source and summary graphs created for the max flow algorithm. The root is constructed from the DocumentGraphFactory class and the children are built by the DocumentGraphController instances.

The children of this composite are not to be confused with components, which are the disconnected source and summary graph components in the root graph instance. Each child also has the reversed flow graphs, but are connected as a bipartite flow graph for use by the max flow algorithm.

__init__(graph, name, graph_attrib_context, doc, components, children=<factory>)
add_child(child)[source]

Add a child graph.

See:

children

property bipartite_relation_set: RelationSet

The bipartite relations that span components. This set includes all top level relations that are not self contained in any components.

children: Dict[str, DocumentGraph]

The children of this instance, which for now, are only instances of FlowDocumentGraph.

clone(reverse_edges=False, deep=True, **kwargs)[source]

Clone an instance and return it. The graph is deep copied, but all GraphAttribute instances are not.

Parameters:
  • reverse_edges (bool) – whether to reverse the direction of edges in the directed graph, which is used to create the reverse flow graphs used in the maxflow algorithm

  • deep (bool) – whether to create a deep clone, which detaches a component from any bipartite graph by keeping only the graph composed of vs; otherwise the graph is copied as a subcomponent starting from root

  • kwargs – arguments to add as attributes to the clone; include cls to set the type of the new instance

Return type:

GraphComponent

Returns:

the cloned instance of this instance

component_iter()[source]

Return an iterable of the components of this graph and recursively over the children.

Return type:

Iterable[GraphComponent]
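A sketch of the composite traversal, assuming a graph yields its own components before recursing into its children in insertion order. CompositeGraph is a hypothetical stand-in for DocumentGraph, with component names as strings.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Tuple


@dataclass
class CompositeGraph:
    """Hypothetical stand-in for DocumentGraph: a graph with components and
    child graphs that have their own components."""
    components: Tuple[str, ...]
    children: Dict[str, 'CompositeGraph'] = field(default_factory=dict)

    def component_iter(self) -> Iterator[str]:
        # this graph's components first, then each child's, recursively
        yield from self.components
        for child in self.children.values():
            yield from child.component_iter()


root = CompositeGraph(('source', 'summary'),
                      {'reversed_source': CompositeGraph(('source', 'summary'))})
print(list(root.component_iter()))
# ['source', 'summary', 'source', 'summary']
```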

components: Tuple[DocumentGraphComponent, ...]

The roots of the trees created by the DocumentGraphFactory.

property components_by_name: Dict[str, DocumentGraphComponent]

Get document graph components by name.

property components_by_name_sorted: Tuple[Tuple[str, DocumentGraphComponent], ...]

Get document graph components sorted by name.

deallocate()[source]

Deallocate all resources for this instance.

delete_edges(edges, include_nodes=False)[source]

Remove edges from the graph.

Parameters:
  • edges (Iterable[GraphEdge]) – the edges to remove from the graph

  • include_nodes (bool) – whether to remove nodes that become orphans after deleting edges

Returns:

a tuple of the edges and nodes removed

doc: AmrFeatureDocument

The document that represents the graph.

get_containing_component(n)[source]

Return the component that contains graph node n.

Return type:

DocumentGraphComponent

property graph_attrib_context: GraphAttributeContext

The context given to all nodes and edges of the graph.

name: str

The name of the graph used to identify it. For now, this is only reversed_source for the graph that flows from the summary to the source, and reversed_summary for the graph that flows from the source to the summary. These are “reversed” because the flow is reversed from the leaf nodes to the root.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write the contents of this instance to writer using indentation depth.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

class zensols.calamr.doc.DocumentGraphDecorator[source]

Bases: ABC

A strategy to create a graph from a document structure.

__init__()
abstract decorate(component)[source]

Creates the graph from a DocumentNode root node.

Parameters:

component (DocumentGraphComponent) – the graph to populate from the decorating process

class zensols.calamr.doc.DocumentGraphFactory(config_factory, graph_decorators, doc_graph_section_name, graph_attrib_context)[source]

Bases: ABC

Creates a document graph. After the document portion of the graph is created, the igraph is built and merged using a DocumentGraphDecorator. This igraph has the corresponding vertexes and edges associated with the document graph, which includes AMR Penman graph and feature document artifacts.

__init__(config_factory, graph_decorators, doc_graph_section_name, graph_attrib_context)
config_factory: ConfigFactory

Used to create a DocumentGraphDecorator.

create(root)[source]

Create a document graph and return it starting from the root node. See class docs.

Parameters:

root (AmrFeatureDocument) – the feature document from which to create the graph

Return type:

DocumentGraph

doc_graph_section_name: str

The name of a section in the configuration that defines new instances of DocumentGraph.

graph_attrib_context: GraphAttributeContext

The context given to all nodes and edges of the graph.

graph_decorators: Tuple[DocumentGraphDecorator, ...]

The name of the section that defines a DocumentGraphDecorator instance.

zensols.calamr.domain module

Classes that organize document content into a hierarchy.

Terminology: token alignments refer to the sentence index based token alignments to AMR nodes. This is not to be confused with alignment edges (aka graph component alignment edges).

exception zensols.calamr.domain.ComponentAlignmentError(msg, sent=None)[source]

Bases: AmrError

Package level errors.

__module__ = 'zensols.calamr.domain'
class zensols.calamr.domain.ComponentAlignmentFailure(exception=None, thrower=None, traceback=None, message=None)[source]

Bases: Failure

Package level failures.

__init__(exception=None, thrower=None, traceback=None, message=None)
class zensols.calamr.domain.EmbeddingResource(torch_config, word_piece_doc_parser=None, word_piece_doc_factory=None, roleset_stash=None)[source]

Bases: object

Generates embeddings for roles, role sets, text, and feature tokens.

__init__(torch_config, word_piece_doc_parser=None, word_piece_doc_factory=None, roleset_stash=None)
get_role_embedding(role)[source]

Return an embedding for a role. This uses the role’s relation’s embedding if available. Otherwise, it uses the embedding created from the role’s prefix.

Return type:

Tensor

get_roleset_embedding(roleset_id, roleset)[source]

Return the embedding for roleset.

Parameters:
  • roleset_id (Optional[RolesetId]) – the role’s ID, which is used when roleset is None

  • roleset (Roleset) – the role set to use for the embedding if available

Return type:

Tensor

get_sentence_tokens_embedding(sent)[source]

Return the sentence embeddings of sent.

Return type:

Tensor

get_token_embedding(text)[source]

Return the mean of the token embeddings of text.

Return type:

Tensor

get_tokens_embedding(tokens)[source]

Return the mean of the embeddings of tokens.

Return type:

Tensor

get_word_piece_document(text)[source]

Return a word piece document parsed from text.

Return type:

WordPieceFeatureDocument

roleset_stash: Stash = None

A stash with RolesetId as keys and Roleset as values.

torch_config: TorchConfig

Used to create unknown_edge_embedding

property unknown_edge_embedding: Tensor

A zero embedding.

property unknown_node_embedding: Tensor

A zero embedding.

word_piece_doc_factory: WordPieceFeatureDocumentFactory = None

Creates word piece data structures that have embeddings.

word_piece_doc_parser: FeatureDocumentParser = None

Used to get single token embeddings for nodes with no token alignments.

class zensols.calamr.domain.GraphAttribute(context)[source]

Bases: PersistableContainer, Dictable

Contains AMR document attribute data added to the igraph.Graph. This is added as vertexes or edge attribute data.

ATTRIB_TYPE: ClassVar[str] = 'base'

The attribute type this class represents.

__init__(context)
property attrib_type: str

The attribute type this class represents.

context: GraphAttributeContext

Contains context data used by nodes and edges of the graph.

deallocate()[source]

Deallocate all resources for this instance.

property description: str

A human readable description that is usually used as the label and __str__().

property embedding: Tensor

The default embedding of the attribute. Note that some attributes have several different embeddings.

property embedding_resource: EmbeddingResource

Generates embeddings for roles, role sets, text, and feature tokens.

property id: int

The unique identifier for this graph attribute.

property label: str

Text used when rendering graphs.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

class zensols.calamr.domain.GraphAttributeContext(embedding_resource, relation_stash, default_capacity, sink_capacity, component_alignment_capacity, doc_capacity, similarity_threshold, default_format_strlen)[source]

Bases: Dictable

Contains context data used by nodes and edges of the graph.

__init__(embedding_resource, relation_stash, default_capacity, sink_capacity, component_alignment_capacity, doc_capacity, similarity_threshold, default_format_strlen)
component_alignment_capacity: float

The default initial capacity for source/summary component alignment edges.

default_capacity: float

The default initial capacity for inter-AMR edges.

default_format_strlen: int

The default capacity and flow string formatting label length.

doc_capacity: float

The bipartite (between source and summary) capacity value of DocumentGraphNode.

embedding_resource: EmbeddingResource

The manager that contains vectorizers that create node and edge embeddings.

relation_stash: Stash

Creates instances of role relations (zensols.propbankdb.domain.Relation).

reset_attrib_id()[source]

Reset the unique attribute ID counter.

similarity_threshold: float

The similarity threshold (in the range [0, 1]) necessary to allow component alignment edges from the source to the summary graph.
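A sketch of how such a threshold could gate alignment edges, assuming cosine similarity between node embeddings (as plain float lists rather than tensors) and an inclusive comparison. The similarity measure, the comparison and both function names are assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def allow_alignment(source_emb: Sequence[float], summary_emb: Sequence[float],
                    similarity_threshold: float) -> bool:
    """Hypothetical gate: admit a source/summary alignment edge only when the
    embeddings are at least ``similarity_threshold`` similar."""
    return cosine_similarity(source_emb, summary_emb) >= similarity_threshold
```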

sink_capacity: float

The capacity value to use for the sink terminal node.

to_role(role_str)[source]
Return type:

Role

class zensols.calamr.domain.Role(label)[source]

Bases: Dictable

Represents an AMR role, which is a label on the edge of an AMR graph such as :ARG0-of.

__init__(label)[source]
index: Optional[int]

The index of the role (i.e. 0 in :ARG0-of), or None if the role has no index.

is_inverted: bool

True if the role is inverted (i.e. has of in :ARG0-of).

label: str

The surface name of the role (i.e. :ARG0-of).

prefix: str

The prefix of the role (i.e. ARG in :ARG0-of).

relation: Optional[Relation]

The relation metadata, which has the same label as this role.
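A sketch of how a label such as :ARG0-of could be split into the prefix, index and is_inverted values documented above. The regular expression and parse_role are hypothetical; in particular, a role whose name naturally ends in -of (such as :consist-of) would be reported as inverted by this simplified rule.

```python
import re
from dataclasses import dataclass
from typing import Optional

# optional leading colon, a (lazily matched) name, optional digits, then an
# optional -of suffix marking an inverted role
ROLE_REGEX = re.compile(r'^:?([A-Za-z-]+?)(\d+)?(-of)?$')


@dataclass
class ParsedRole:
    prefix: str
    index: Optional[int]
    is_inverted: bool


def parse_role(label: str) -> ParsedRole:
    """Hypothetical sketch: split a role label such as ``:ARG0-of``."""
    match = ROLE_REGEX.match(label)
    if match is None:
        raise ValueError(f'not a role label: {label}')
    prefix, index, inverted = match.groups()
    return ParsedRole(prefix=prefix,
                      index=None if index is None else int(index),
                      is_inverted=inverted is not None)


print(parse_role(':ARG0-of'))
# ParsedRole(prefix='ARG', index=0, is_inverted=True)
```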

zensols.calamr.flow module

Provides container classes and computes statistics for graph alignments.

class zensols.calamr.flow.Flow(source, target, edge)[source]

Bases: Dictable

A triple of a source node, target node and connecting edge from the graph. The connecting edge has a flow value associated with it.

__init__(source, target, edge)
edge: GraphEdge

The edge that connects source and target.

property edge_type: str

Whether the edge is an AMR role or alignment.

source: GraphNode

The starting node in the DAG.

target: GraphNode

The ending node (arrow head) in the DAG.

property to_row: List[Any]

Create a row from the data in this flow used in FlowDocumentGraphComponent.create_df().

class zensols.calamr.flow.FlowDocumentGraph(graph, name, graph_attrib_context, doc, components, children=<factory>)[source]

Bases: DocumentGraph

Contains all the flows of a DocumentGraph and has FlowDocumentGraphComponent as components. Instances of this document graph have no children.

__init__(graph, name, graph_attrib_context, doc, components, children=<factory>)
class zensols.calamr.flow.FlowDocumentGraphComponent(graph, root_node, sent_index=<factory>, description=None, reentrancy_set=None)[source]

Bases: DocumentGraphComponent

Contains all the flows of a DocumentComponent.

__init__(graph, root_node, sent_index=<factory>, description=None, reentrancy_set=None)
property flows: Tuple[Flow, ...]

The flows aggregated from the document components.

reentrancy_set: ReentrancySet = None

Concept nodes with multiple parents.

property result: FlowGraphComponentResult

The flow results for this component.

property root_flow: Flow

The root flow of the document component, which has the component’s DocumentGraphNode as the source node and the sink as the target node.

property root_flow_value: float

The flow from the root node to the sink in the reversed graph.

class zensols.calamr.flow.FlowGraphComponentResult(component)[source]

Bases: Dictable

A container class for the flow data from a DocumentComponent flow instance (aka reverse flow graph). This includes the data as dictionaries of statistics, pandas.DataFrame and DataDescriber instances.

__init__(component)[source]
property connected_stats: Dict[str, int | float]

The statistics on how well the two graphs are aligned, counted as:

  • alignable: the number of nodes that are eligible for having an alignment (i.e. sentence, concept, and attribute nodes)

  • aligned: the number of aligned nodes in the FlowDocumentGraphComponent this instance holds

  • aligned_portion: the quotient of $aligned / alignable$, which is a number between $[0, 1]$ representing a score of how well the two graphs match
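The aligned_portion statistic can be sketched as below; returning 0.0 when there are no alignable nodes is an assumption, and connected_stats here is a hypothetical free function rather than the property itself.

```python
from typing import Dict, Union


def connected_stats(alignable: int,
                    aligned: int) -> Dict[str, Union[int, float]]:
    """Hypothetical helper mirroring the three documented statistics."""
    # quotient aligned / alignable, in [0, 1]; 0.0 if nothing is alignable
    portion = aligned / alignable if alignable > 0 else 0.0
    return {'alignable': alignable, 'aligned': aligned,
            'aligned_portion': portion}


print(connected_stats(alignable=8, aligned=6))
# {'alignable': 8, 'aligned': 6, 'aligned_portion': 0.75}
```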

create_data_frame_describer()[source]

Like create_align_df() but includes a human readable description of the data.

Return type:

DataFrameDescriber

property df: pd.DataFrame

The data in flows and root as a dataframe. Note the terms source and target refer to the nodes at the ends of the directed edge in a reversed graph.

  • s_descr: source node descriptions such as concept names, attribute constants and sentence text

  • t_descr: target node of s_descr

  • s_toks: any source node aligned tokens

  • t_toks: any target node aligned tokens

  • s_attr: source node attribute name given by GraphAttribute.attrib_type, such as doc, sentence, concept, attribute

  • t_attr: target node of s_attr

  • s_id: source node igraph ID

  • t_id: target node igraph ID

  • edge_type: whether the edge is an AMR role or alignment

  • rel_id: the coreference relation ID or null if the edge is not a coreference

  • is_bipartite: whether relation rel_id spans components or null if the edge is not a coreference

  • flow: the (normalized/flow per node) flow of the edge

  • reentrancy: whether the edge participates in an AMR reentrancy

  • align_flow: the flow sum of the alignment edges for the respective edge

  • align_count: the count of incoming alignment edges to the target node in the FlowDocumentGraphComponent this instance holds

property edge_counts: Dict[Type[GraphEdge], int]

The number of edges by type.

property n_alignable_nodes: int

The number of nodes in the component that can take alignment edges. Whether those nodes in the count have edges does not affect the result.

property node_counts: Dict[Type[GraphNode], int]

The number of nodes by type.

property stats: Dict[str, Any]

All statistics/scores available for this instance, which include:

  • root_flow: the flow from the root node to the sink

  • connected: connected_stats

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

class zensols.calamr.flow.FlowGraphResult(component_paths, context, data)[source]

Bases: PersistableContainer, Dictable

A container class for flow document results, which include the detailed data as dictionaries of statistics, pandas.DataFrame and DataDescriber instances. This is aggregated from doc_graph and the flow children’s flow graph components.

All graphs (from nascent to the reversed flow children graphs) have the final state of the actions of the DocumentGraphController as coordinated by the GraphSequencer. Since the flows are copied from the reversed source graph to the root level (doc_graph) factory built nascent graph, all flows are the same. However, the nascent graph will still be the disconnected source and summary graphs.

__init__(component_paths, context, data)[source]

Initialize the flow results.

Parameters:
create_data_describer()[source]

Like create_align_df() but includes a human readable description of the data.

Throws ComponentAlignmentError:

if this instance resulted in an error

See:

is_error

Return type:

DataDescriber

deallocate()[source]

Deallocate all resources for this instance.

property df: pd.DataFrame

A concatenation of frames created with FlowDocumentGraphComponent.create_align_df() with the name of each component.

Throws ComponentAlignmentError:

if this instance resulted in an error

See:

is_error

property doc_graph: DocumentGraph

The root nascent DocumentGraphFactory built graph.

Throws ComponentAlignmentError:

if this instance resulted in an error

See:

is_error

property failure: ComponentAlignmentFailure | None

What caused the alignment to fail, or None if it was a success.

get_render_contexts(child_names=None, include_nascent=False)[source]

Get contexts used to render the graphs with render.base.rendergroup.

Parameters:
  • child_names (Iterable[str]) – the names of the DocumentGraph.children to render, which defaults to the nascent graph and the final bipartite graph rendered (“restore previous flow on source”)

  • include_nascent (bool) – whether to include the nascent graphs

Return type:

List[RenderContext]

property is_error: bool

Whether the graph resulted in an error.
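The properties above that throw ComponentAlignmentError follow a guard pattern: when failure is set, is_error is true and the data accessors raise instead of returning partial results. This is a minimal sketch in which the Toy* classes are hypothetical stand-ins for FlowGraphResult, ComponentAlignmentFailure and ComponentAlignmentError; the library's internals may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


class ToyAlignmentError(Exception):
    """Hypothetical stand-in for ComponentAlignmentError."""


@dataclass
class ToyFailure:
    """Hypothetical stand-in for ComponentAlignmentFailure."""
    message: str


@dataclass
class ToyFlowResult:
    stats_data: Optional[Dict[str, Any]] = None
    failure: Optional[ToyFailure] = None

    @property
    def is_error(self) -> bool:
        # a result is an error exactly when a failure was recorded
        return self.failure is not None

    def _assert_success(self):
        if self.is_error:
            raise ToyAlignmentError(self.failure.message)

    @property
    def stats(self) -> Dict[str, Any]:
        # guarded accessor: raises rather than returning missing data
        self._assert_success()
        return self.stats_data


ok = ToyFlowResult(stats_data={'source': {'flow': 1.0}})
bad = ToyFlowResult(failure=ToyFailure('no alignment'))
```

Callers can check is_error (or inspect failure) before touching stats, df or doc_graph.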

render(contexts=None, graph_id='graph', display=True, directory=None)[source]

Render several graphs at a time, then optionally display them.

Parameters:
  • contexts (Tuple[RenderContext]) – the data to render, which defaults to the output of get_render_contexts()

  • graph_id (str) – a unique identifier prefixed to generated files if none is provided in the call method

  • display (bool) – whether to display the files after generated

  • directory (Path) – the directory to create the files in place of the temporary directory; if provided the directory is not removed after the graphs are rendered
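The cleanup rule for directory (a temporary directory is removed, a caller-supplied one is kept) can be sketched with the standard library. write_graph_files is a hypothetical helper that writes plain text files; it does not render graphs, display them, or use the library's RenderContext.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List, Optional


def write_graph_files(graph_id: str, contents: Iterable[str],
                      directory: Optional[Path] = None) -> List[str]:
    """Write one file per graph (hypothetical helper).

    Without a directory, a temporary one is created and removed afterwards;
    a caller-supplied directory is left in place.
    """
    temp = directory is None
    out_dir = Path(tempfile.mkdtemp()) if temp else directory
    out_dir.mkdir(parents=True, exist_ok=True)
    try:
        names = []
        for i, content in enumerate(contents):
            path = out_dir / f'{graph_id}-{i}.txt'
            path.write_text(content)
            names.append(path.name)
        # a real renderer would open the files here when display=True
        return names
    finally:
        if temp:
            shutil.rmtree(out_dir)


names = write_graph_files('graph', ['digraph {}'])
```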

property stats: Dict[str, Any]

The statistics with keys as component names and values taken from FlowDocumentGraphComponent.stats.

Throws ComponentAlignmentError:

if this instance resulted in an error

See:

is_error

property stats_df: pd.DataFrame

A Pandas dataframe version of stats.
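Assuming stats maps each component name to a flat dictionary of statistics, a frame like stats_df can be built with one row per component. The statistic names below are hypothetical; the actual keys come from FlowDocumentGraphComponent.stats.

```python
import pandas as pd

# hypothetical stats keyed by component name
stats = {
    'source': {'flow': 3.5, 'n_nodes': 12},
    'summary': {'flow': 2.0, 'n_nodes': 7},
}

# one row per component, one column per statistic
stats_df = pd.DataFrame.from_dict(stats, orient='index')
print(stats_df)
```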

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_doc=True)[source]

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

zensols.calamr.flowmeta module

Metadata for the flow module.

zensols.calamr.morph module

Populate an igraph from AMR graphs.

class zensols.calamr.morph.IsomorphDocumentGraphDecorator(config_factory, graph_attrib_context_name)[source]

Bases: DocumentGraphDecorator

Populates igraph.Graph attributes from DocumentGraph data by adding AMR node and edge information.

__init__(config_factory, graph_attrib_context_name)
config_factory: ConfigFactory

The configuration factory used to create a GraphAttributeContext.

decorate(comp)[source]

Creates the graph from a DocumentNode root node.

Parameters:

component – the graph to populate in the decorating process

graph_attrib_context_name: str

The section name of the GraphAttributeContext context given to all nodes and edges of the graph.

zensols.calamr.proto module

Prototyping and cookbook.

zensols.calamr.reentrancy module

Reentrancy container classes.

class zensols.calamr.reentrancy.EdgeFlow(edge, flow=None)[source]

Bases: PersistableContainer, Dictable

The flow over a graph edge. This keeps the flow of the edge as a “snapshot” of the value at a particular point in the algorithm, before it is modified to correct the side effects of a reentrancy.

__init__(edge, flow=None)
edge: GraphEdge

The outgoing (in the reverse graph) edge of the reentrancy.

flow: float = None

The flow value at the time of the algorithm.
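The snapshot idea is that the recorded flow is a copy of a number, so later changes to the edge do not alter it. In this sketch ToyEdge and ToyEdgeFlow are hypothetical. The rule that an omitted flow defaults to the edge's current flow is an assumption made for illustration, not documented behavior of EdgeFlow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyEdge:
    """Hypothetical mutable edge whose flow the algorithm changes."""
    flow: float


@dataclass
class ToyEdgeFlow:
    """Snapshot of an edge's flow at one point in the algorithm."""
    edge: ToyEdge
    flow: Optional[float] = None

    def __post_init__(self):
        # assumption: capture the edge's current flow when none is given
        if self.flow is None:
            self.flow = self.edge.flow


edge = ToyEdge(flow=0.8)
snap = ToyEdgeFlow(edge)
edge.flow = 0.0  # the algorithm later modifies the edge
```

After the edge is modified, snap.flow still holds 0.8.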

class zensols.calamr.reentrancy.Reentrancy(concept_node, concept_node_vertex, edge_flows)[source]

Bases: PersistableContainer, Dictable

Reentrancies are concept nodes with multiple parents (in the forward graph) and have side effects when running the algorithm.

Note: an AMR graph (always acyclic) with no reentrancies is a tree.
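Because a reentrancy is a node with more than one parent in the forward graph, reentrancies can be found by counting parents per node. This sketch uses plain (parent, child) pairs rather than the library's graph types, and the function names are hypothetical. The edges encode the standard AMR for "The boy wants to go", where boy (b) is an argument of both want-01 (w) and go-02 (g).

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def find_reentrancies(edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Map each node with more than one parent to its parent count.

    edges are (parent, child) pairs of a forward (root-to-leaf) graph.
    """
    parents = Counter(child for _, child in edges)
    return {node: n for node, n in parents.items() if n > 1}


def is_tree(edges: List[Tuple[str, str]]) -> bool:
    # for a rooted, connected, acyclic graph such as an AMR,
    # having no reentrancies is what makes it a tree
    return not find_reentrancies(edges)


# (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))
edges = [('w', 'b'), ('w', 'g'), ('g', 'b')]
print(find_reentrancies(edges))
```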

__init__(concept_node, concept_node_vertex, edge_flows)
concept_node: ConceptGraphNode

The concept node of the reentrancy.

concept_node_vertex: int

The igraph.Vertex.index associated with the node.

edge_flows: Tuple[EdgeFlow]

The outgoing edges connected to the reentrant concept_node.

property has_zero_flow: bool

Whether the reentrancy has any edges with no flow.

property total_flow: float

The total flow of all outgoing (in the reverse graph) edges.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

property zero_flows: Tuple[GraphEdge]

The edges that have no flow.
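The three flow properties of Reentrancy can be read as simple reductions over edge_flows. This is a minimal sketch with hypothetical Toy* classes where an edge is just a string label; it uses exact zero comparison, whereas the library may apply a tolerance.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ToyEdgeFlow:
    """Hypothetical stand-in for EdgeFlow; edge is only a label here."""
    edge: str
    flow: float


@dataclass
class ToyReentrancy:
    """Hypothetical stand-in for Reentrancy."""
    concept_node_vertex: int
    edge_flows: Tuple[ToyEdgeFlow, ...]

    @property
    def total_flow(self) -> float:
        # sum of the flow over all outgoing (reverse graph) edges
        return sum(ef.flow for ef in self.edge_flows)

    @property
    def zero_flows(self) -> Tuple[str, ...]:
        # exact comparison; a real implementation might use a tolerance
        return tuple(ef.edge for ef in self.edge_flows if ef.flow == 0)

    @property
    def has_zero_flow(self) -> bool:
        return len(self.zero_flows) > 0


r = ToyReentrancy(5, (ToyEdgeFlow('e1', 0.75), ToyEdgeFlow('e2', 0.0)))
```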

class zensols.calamr.reentrancy.ReentrancySet(reentrancies=())[source]

Bases: PersistableContainer, Dictable

A set of reentrancies, one for each iteration of the algorithm.

__init__(reentrancies=())
property by_vertex: Dict[int, Reentrancy]
classmethod combine(sets)[source]

Combine by adding reentrancy links for all in sets.

Return type:

ReentrancySet

reentrancies: Tuple[Reentrancy] = ()

Concept nodes with multiple parents.

property stats: Dict[str, Any]

Get the stats for this set of reentrancies.
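combine is documented only as adding the reentrancies of all sets, so this sketch assumes plain concatenation with no de-duplication, and assumes by_vertex maps each concept_node_vertex to its reentrancy. All Toy* names are hypothetical stand-ins, not the library's classes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ToyReentrancy:
    """Hypothetical stand-in for Reentrancy."""
    concept_node_vertex: int
    total_flow: float


@dataclass
class ToyReentrancySet:
    """Hypothetical stand-in for ReentrancySet."""
    reentrancies: Tuple[ToyReentrancy, ...] = ()

    @property
    def by_vertex(self) -> Dict[int, ToyReentrancy]:
        # index by vertex index; a later entry replaces an earlier one
        return {r.concept_node_vertex: r for r in self.reentrancies}

    @classmethod
    def combine(cls, sets: Iterable['ToyReentrancySet']) -> 'ToyReentrancySet':
        # pool the reentrancies of all sets, preserving order
        return cls(tuple(r for s in sets for r in s.reentrancies))


first = ToyReentrancySet((ToyReentrancy(1, 0.5),))
second = ToyReentrancySet((ToyReentrancy(2, 0.0),))
combined = ToyReentrancySet.combine([first, second])
```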

zensols.calamr.score module

Produces CALAMR scores.

class zensols.calamr.score.CalamrScore(flow_graph_res)[source]

Bases: Score

Contains all CALAMR scores.

NAN_INSTANCE = CalamrScore()
__init__(flow_graph_res)
flow_graph_res: FlowGraphResult
class zensols.calamr.score.CalamrScoreMethod(reverse_sents=False, word_piece_doc_factory=None, doc_graph_factory=None, doc_graph_aligner=None)[source]

Bases: ScoreMethod

Computes the smatch scores of AMR sentences. Sentence pairs are ordered (<summary>, <source>).

__init__(reverse_sents=False, word_piece_doc_factory=None, doc_graph_factory=None, doc_graph_aligner=None)
clear()[source]
doc_graph_aligner: DocumentGraphAligner = None

Aligns document graphs.

doc_graph_factory: DocumentGraphFactory = None

Create document graphs.

score_annotated_doc(doc)[source]

Score a document that has an amr of type AnnotatedAmrDocument.

Raises:

[zensols.amr.domain.AmrError]: if the AMR could not be parsed or aligned

Return type:

CalamrScore

score_pair(smy, src)[source]
Return type:

CalamrScore

word_piece_doc_factory: WordPieceFeatureDocumentFactory = None

The feature document factory that populates embeddings.

zensols.calamr.stash module

Alignment dataframe stash.

class zensols.calamr.stash.FlowGraphRestoreStash(delegate, flow_graph_result_context)[source]

Bases: DelegateStash, PrimeableStash

A stash that restores transient data on FlowGraphResult instances.

__init__(delegate, flow_graph_result_context)
exists(name)[source]

Return True if data with key name exists.

Implementation note: This Stash.exists() method is very inefficient and should be overridden.

Return type:

bool

flow_graph_result_context: _FlowGraphResultContext

Contains in-memory (interpreter session) data needed by FlowGraphResult when it is created or unpickled.

get(name, default=None)[source]

Load an object or a default if key name doesn’t exist.

Implementation note: subclasses will probably want to override this method given the super method is cavalier about calling exists() and load(). Based on the implementation, this can be problematic.

Return type:

FlowGraphResult

load(name)[source]

Load a data value from the pickled data with key name. Semantically, this method loads the data using the stash’s implementation. For example, DirectoryStash loads the data from a file if it exists, but factory type stashes will always re-generate the data.

See:

get()

Return type:

FlowGraphResult
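The restore idea is that session-only state is left out of the persisted object and re-attached by the wrapping stash on each load. This is a minimal sketch with hypothetical ToyResult and ToyRestoreStash classes; a dict stands in for the delegate stash and for _FlowGraphResultContext. It also overrides exists() and get() so neither scans keys or calls load() twice, as the implementation notes suggest.

```python
from typing import Any, Dict, Optional


class ToyResult:
    """Hypothetical stand-in for FlowGraphResult."""

    def __init__(self, data: Dict[str, Any]):
        self.data = data
        # transient, session-only state (not persisted)
        self.context: Optional[Dict[str, Any]] = None

    def __getstate__(self) -> Dict[str, Any]:
        # leave the session context out of the pickled state
        state = self.__dict__.copy()
        state['context'] = None
        return state


class ToyRestoreStash:
    """Wraps a delegate mapping and re-attaches the context on load."""

    def __init__(self, delegate: Dict[str, ToyResult],
                 context: Dict[str, Any]):
        self.delegate = delegate
        self.context = context

    def exists(self, name: str) -> bool:
        # direct key lookup instead of scanning every key
        return name in self.delegate

    def load(self, name: str) -> Optional[ToyResult]:
        res = self.delegate.get(name)
        if res is not None:
            res.context = self.context
        return res

    def get(self, name: str, default: Any = None) -> Any:
        # a single load() call rather than exists() followed by load()
        res = self.load(name)
        return default if res is None else res


stash = ToyRestoreStash({'doc1': ToyResult({'flow': 1.0})},
                        {'session': 'ctx'})
res = stash.get('doc1')
```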

class zensols.calamr.stash.FlowGraphResultFactoryStash(anon_doc_stash, doc_graph_aligner, doc_graph_factory, limit=9223372036854775807)[source]

Bases: ReadOnlyStash, PrimeableStash

A factory stash that creates aligned FlowGraphResult instances or ComponentAlignmentFailure when the document cannot be aligned.

__init__(anon_doc_stash, doc_graph_aligner, doc_graph_factory, limit=9223372036854775807)
anon_doc_stash: Stash

Contains human annotated AMRs.

doc_graph_aligner: DocumentGraphAligner

Aligns document graphs.

doc_graph_factory: DocumentGraphFactory

Create document graphs.

exists(name)[source]

Return True if data with key name exists.

Implementation note: This Stash.exists() method is very inefficient and should be overridden.

Return type:

bool

keys(**kwargs) → Iterable[str]

Return an iterable of keys in the collection.

Return type:

Iterable[str]

limit: int = 9223372036854775807

The limit of the number of items to create.

load(name)[source]

Load a data value from the pickled data with key name. Semantically, this method loads the data using the stash’s implementation. For example, DirectoryStash loads the data from a file if it exists, but factory type stashes will always re-generate the data.

See:

get()

Return type:

FlowGraphResult
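A factory stash of this kind generates each value on demand from a source stash, caps its keys with limit, and returns a failure object instead of raising when a document cannot be aligned. This is a minimal sketch: ToyFactoryStash, ToyFailure and toy_align are hypothetical, a dict plays the role of anon_doc_stash, and ValueError stands in for whatever alignment error the library catches.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Union


@dataclass
class ToyFailure:
    """Hypothetical stand-in for ComponentAlignmentFailure."""
    key: str
    message: str


class ToyFactoryStash:
    """Read-only stash that creates results on demand (hypothetical)."""

    def __init__(self, docs: Dict[str, str],
                 align: Callable[[str], Dict[str, Any]],
                 limit: int = 9223372036854775807):
        self.docs = docs    # plays the role of anon_doc_stash
        self.align = align  # plays the role of the aligner and graph factory
        self.limit = limit

    def keys(self) -> Iterable[str]:
        # expose at most limit keys of the source documents
        return itertools.islice(self.docs.keys(), self.limit)

    def exists(self, name: str) -> bool:
        return name in set(self.keys())

    def load(self, name: str) -> Union[Dict[str, Any], ToyFailure]:
        doc = self.docs[name]
        # regenerate on every call; report failures as data, not exceptions
        try:
            return self.align(doc)
        except ValueError as e:
            return ToyFailure(name, str(e))


def toy_align(text: str) -> Dict[str, Any]:
    if not text:
        raise ValueError('empty document')
    return {'tokens': len(text.split())}


stash = ToyFactoryStash({'a': 'the boy wants', 'b': '', 'c': 'go'},
                        toy_align, limit=2)
```

Returning a failure object keeps iteration over a corpus going when a single document cannot be aligned.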

prime()[source]

Module contents