zensols.calamr package¶

Subpackages¶

Submodules¶

zensols.calamr.alignconst module¶

Defines a class that aligns components of a (bipartite) graph.

class zensols.calamr.alignconst.GraphAlignmentConstructor(doc_graph=None, source_flow=10000000000.0)[source]¶

Bases: object

Adds the nodes and edges that enable the maxflow algorithm to be used on the graph. This includes the component alignment edges, the source node and the sink node. Capacities for the component alignment edges are also set.

__init__(doc_graph=None, source_flow=10000000000.0)¶

add_edges(capacities, cls=<class 'zensols.calamr.attr.ComponentAlignmentGraphEdge'>)[source]¶

Add capacities as graph capacities to the graph.

Parameters:

capacities (Iterable[Tuple[int, int, float]]) – the vertexes and capacities in the form: (<source vertex index>, <summary index>, <capacity>)
cls (Type[GraphEdge]) – the type of object to instantiate for the GraphEdge alignment

Return type:

List[GraphEdge]

build()[source]¶: Build the graph by adding component alignment capacities.

property doc_graph: DocumentGraph¶: A document graph that contains the graph to be aligned.

requires_reversed_edges()[source]¶

Return type:: bool

set_capacities(edges, capacity=10000000000.0)[source]¶: Set capacity on all edges.

property sink_flow_node: Vertex¶: The sink flow node.

source_flow: float = 10000000000.0¶: The capacity to use for the source node of the transportation graph.

property source_flow_node: Vertex¶: The source flow node.

update_capacities(caps)[source]¶

Update the capacities of the graph component.

Parameters:: caps (Dict[int, int]) – the capacities with key/value pairs as <edge ID>/<capacity>
Return type:: Dict[int, int]
Returns:: the caps parameter
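
The construction described for this class (a source node, a sink node and capacitated component alignment edges) can be illustrated without the library. The following is a minimal sketch in plain Python, not the package's implementation: `build_network` and `max_flow` are hypothetical helpers, the `(source index, summary index, capacity)` tuples follow the form documented for add_edges(), and the large finite constant mirrors the documented default of `source_flow`.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Tuple

MAX_CAPACITY = 1e10  # large finite stand-in for "infinite" capacity


def build_network(capacities: Iterable[Tuple[int, int, float]],
                  source: str = 's', sink: str = 't') -> Dict:
    """Return nested capacity dicts for a bipartite flow network."""
    net: Dict = defaultdict(lambda: defaultdict(float))
    for src_ix, smy_ix, cap in capacities:
        u, v = ('src', src_ix), ('smy', smy_ix)
        net[source][u] = MAX_CAPACITY  # source -> source component vertex
        net[u][v] += cap               # component alignment edge
        net[v][sink] = MAX_CAPACITY    # summary component vertex -> sink
    return net


def max_flow(net: Dict, source: str = 's', sink: str = 't') -> float:
    """Edmonds-Karp: augment along shortest residual paths until none remain."""
    residual: Dict = defaultdict(lambda: defaultdict(float))
    for u, nbrs in net.items():
        for v, cap in nbrs.items():
            residual[u][v] += cap
    total = 0.0
    while True:
        # breadth first search for a path with spare capacity
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in list(residual[u].items()):
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # walk back from the sink to collect the path edges
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[a][b] for a, b in path)
        for a, b in path:
            residual[a][b] -= bottleneck
            residual[b][a] += bottleneck
        total += bottleneck


caps = [(0, 0, 0.5), (0, 1, 0.25), (1, 1, 1.0)]
flow = max_flow(build_network(caps))
```

Because the source and sink edges are effectively unbounded in this sketch, the total flow is limited only by the alignment edge capacities.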

zensols.calamr.aligner module¶

Contains classes that run the algorithm to compute graph component alignments.

class zensols.calamr.aligner.DocumentGraphAligner(config_factory, flow_graph_result_name, doc_graph_name, renderer, render_level, init_loops_render_level, output_dir)[source]¶

Bases: ABC

Aligns the graph components of doc_graph and visualizes them with renderer.

MAX_RENDER_LEVEL: ClassVar[int] = 10¶: The maximum value for render_level.

__init__(config_factory, flow_graph_result_name, doc_graph_name, renderer, render_level, init_loops_render_level, output_dir)¶

align(doc_graph)[source]¶

Align the graph components of doc_graph and optionally visualize them with renderer. To disable rendering, set render_level to 0.

Parameters:: doc_graph (DocumentGraph) – the graph created by the DocumentGraphFactory
Return type:: FlowGraphResult
Returns:: the alignments available as the in memory graph and object graph, Pandas dataframes, statistics and scores

config_factory: ConfigFactory¶: Used to create the GraphSequencer instance.

create_error_result(ex, msg='Could not align')[source]¶

Create an error graph result (rather than an alignment result). This should be called in a try/except block to obtain the error information.

Parameters:

ex (Exception) – the exception that caused the issue
msg (str) – the error message for the failure

Return type:: FlowGraphResult

doc_graph_name: str¶: The DocumentGraph.name of the document graph returned from align().

flow_graph_result_name: str¶: The app configuration section name of FlowGraphResult.

init_loops_render_level: int¶: The render_level to use for all iteration loops except for the last before the algorithm converges.

classmethod is_valid_render_level(render_level, should_raise=False)[source]¶

Return type:: bool

Return whether render_level is a valid value for render_level.

output_dir: Path¶: If this is set, the graphs are written to this created directory on the file system. Otherwise, they are displayed and cleaned up afterward.

render_level: int¶

How many graphs to render on a scale from 0 - 10. The higher the number the more likely a graph is to be rendered. A value of 0 prevents rendering and a setting of 10 will render all graphs.

See:: MAX_RENDER_LEVEL
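
The documented scale (0 prevents rendering, 10 renders all graphs) suggests a simple range check. The following is a minimal sketch of what is_valid_render_level() might do, written as a standalone function. The exception type and message are assumptions, not taken from the library.

```python
MAX_RENDER_LEVEL = 10  # mirrors DocumentGraphAligner.MAX_RENDER_LEVEL


def is_valid_render_level(render_level: int,
                          should_raise: bool = False) -> bool:
    """Return whether ``render_level`` is in ``[0, MAX_RENDER_LEVEL]``.

    Raising ValueError when ``should_raise`` is set is an assumption made
    for this sketch.
    """
    valid = (isinstance(render_level, int) and
             0 <= render_level <= MAX_RENDER_LEVEL)
    if not valid and should_raise:
        raise ValueError(
            f'render level must be in [0, {MAX_RENDER_LEVEL}], '
            f'got: {render_level}')
    return valid
```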

renderer: GraphRenderer¶: Visually render the graph into a human-understandable presentation.

class zensols.calamr.aligner.DocumentGraphController(name)[source]¶

Bases: Dictable

Executes the maxflow/min cut algorithm on a document graph.

__init__(name)¶

invoke(doc_graph)[source]¶

Perform operations on the graph algorithm.

Parameters:: doc_graph (DocumentGraph) – the graph to edit
Return type:: int
Returns:: the number of edits made to the graph

name: str¶: The configuration instance name for debugging

reset()[source]¶: Reset all state in application context shared objects so new data is forced to be created on the next alignment request.

class zensols.calamr.aligner.GraphIteration(sequence, render_level, updates)[source]¶

Bases: Dictable

An iteration of the alignment algorithm.

__init__(sequence, render_level, updates)¶

render_level: int¶

Whether to render graphs on a scale from 0 - 10. The higher the number the more likely it is to be rendered with 0 never rendering the graph, and 10 always rendering the graph.

See:: DocumentGraphAligner.MAX_RENDER_LEVEL

reset()[source]¶: Reset all state in application context shared objects so new data is forced to be created on the next alignment request.

sequence: GraphSequence¶: The sequence to use for this iteration.

updates: bool¶: Whether to report updates by the iteration, otherwise the iteration updates are counted.

class zensols.calamr.aligner.GraphSequence(name, process_name, render_name, heading, controller, sequencer)[source]¶

Bases: Dictable

A strategy GoF pattern that models what to do during a sequence of graph modifications using DocumentGraphController. It also contains rendering information for visualization.

__init__(name, process_name, render_name, heading, controller, sequencer)¶

controller: Optional[DocumentGraphController]¶: The controller used in the invocation of this strategy.

heading: str¶: The text used in the heading of the graph rendering.

invoke()[source]¶

Invoke the strategy. This implementation calls the controller with the process_graph to be processed and passes back the update count.

Return type:: int

name: str¶: The name of the sequence, which is used to key its graphs.

populate_render_context(context)[source]¶: Allows the sequence to override the parameters before they are sent to the graph rendering API.

property process_graph: DocumentGraph¶: The graph provided to the graph controller.

process_name: str¶: The name of the graph provided to the graph controller. See process_graph.

property render_graph¶: The graph to render.

render_name: str¶: The name of the graph to render. See render_graph.

reset()[source]¶: Reset all state in application context shared objects so new data is forced to be created on the next alignment request.

sequencer: GraphSequencer¶: Owns and controls this instance.

class zensols.calamr.aligner.GraphSequencer(config_factory, sequence_path, nascent_graph, render, descriptor=None, heading_format=None)[source]¶

Bases: object

This invokes the GraphSequence objects in the provided sequence to automate the graph alignment algorithm and is used by MaxflowDocumentGraphAligner.

__init__(config_factory, sequence_path, nascent_graph, render, descriptor=None, heading_format=None)[source]¶

Initialize this instance.

Parameters:

config_factory (ConfigFactory) – used to create the controller in the sequence instances
sequence_path (Path) – the path to the JSON file that has the sequences’ configuration
nascent_graph (DocumentGraph) – the initial disconnected graph created by DocumentGraphFactory
render (rendergroup) – the render object created by base.rendergroup

get_graph(name=None)[source]¶

Return a graph by name.

Return type:: DocumentGraph

property render_level: int¶: Whether to render graphs on a scale from 0 - 10. See DocumentGraphAligner.MAX_RENDER_LEVEL.

reset()[source]¶: Reset all state in application context shared objects so new data is forced to be created on the next alignment request.

run(name)[source]¶

Return type:: int

class zensols.calamr.aligner.MaxflowDocumentGraphAligner(config_factory, flow_graph_result_name, doc_graph_name, renderer, render_level, init_loops_render_level, output_dir, graph_sequencer_name, max_sequencer_iterations, hyp)[source]¶

Bases: DocumentGraphAligner

Uses the maxflow/min cut algorithm to compute graph component alignments.

__init__(config_factory, flow_graph_result_name, doc_graph_name, renderer, render_level, init_loops_render_level, output_dir, graph_sequencer_name, max_sequencer_iterations, hyp)¶

graph_sequencer_name: str¶: The app configuration section name of GraphSequencer.

hyp: HyperparamModel¶

The capacity calculator hyperparameters.

See:: summary.CapacityCalculator.hyp

max_sequencer_iterations: int¶: The max number of iterations of the sequencer loop. This is the max number of times the loop iteration set runs if the maxflow algorithm doesn’t converge (0 changes on bipartite capacities) first.
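
The stopping rule described for max_sequencer_iterations (stop at convergence, meaning an iteration that reports 0 changes, or at the iteration cap, whichever comes first) can be sketched as a small loop. `run_until_converged` and its `iteration` callable are hypothetical names used only for illustration. In the library, the update count comes from the sequencer.

```python
from typing import Callable


def run_until_converged(iteration: Callable[[], int],
                        max_iterations: int) -> int:
    """Run ``iteration`` until it reports 0 updates or the cap is reached.

    :return: the number of iterations that were run
    """
    for count in range(1, max_iterations + 1):
        if iteration() == 0:
            # no capacities changed: the algorithm has converged
            return count
    return max_iterations


# an iteration stub that reports 3, then 1, then 0 updates
updates = iter([3, 1, 0, 5])
n_converged = run_until_converged(lambda: next(updates), max_iterations=10)

# the same update pattern, but capped before convergence
updates = iter([3, 1, 0, 5])
n_capped = run_until_converged(lambda: next(updates), max_iterations=2)
```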

class zensols.calamr.aligner.RenderUpSideDownGraphSequence(name, process_name, render_name, heading, controller, sequencer)[source]¶

Bases: GraphSequence

A graph sequence that tells graphviz to render the diagram upside down, which is useful for reverse flow graphs.

__init__(name, process_name, render_name, heading, controller, sequencer)¶

populate_render_context(context)[source]¶: Allows the sequence to override the parameters before they are sent to the graph rendering API.
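
The override hook can be pictured as a subclass that edits a parameter dictionary before rendering. This sketch assumes the render context is a plain `dict` and that the flip is expressed with Graphviz's `rankdir` graph attribute (`BT` lays the graph out bottom to top). Both are assumptions for illustration, and the class names are hypothetical.

```python
from typing import Any, Dict


class Sequence:
    """Hypothetical stand-in for a graph sequence."""

    def populate_render_context(self, context: Dict[str, Any]) -> None:
        """Default behavior: leave the render parameters untouched."""


class UpsideDownSequence(Sequence):
    """Asks the renderer to draw the graph bottom to top."""

    def populate_render_context(self, context: Dict[str, Any]) -> None:
        # 'rankdir' is a Graphviz graph attribute; whether the library
        # uses this exact key is an assumption
        context['rankdir'] = 'BT'


context = {'rankdir': 'TB', 'heading': 'reverse flow'}
UpsideDownSequence().populate_render_context(context)
```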

zensols.calamr.annotate module¶

Contains a class to add embeddings to AMR feature documents.

class zensols.calamr.annotate.AddEmbeddingsFeatureDocumentStash(delegate, word_piece_doc_factory=None)[source]¶

Bases: DelegateStash, PrimeableStash

Add embeddings to AMR feature documents. Embedding population is disabled by configuring word_piece_doc_factory as None.

__init__(delegate, word_piece_doc_factory=None)¶

get(name, default=None)[source]¶

Load an object or a default if key name doesn’t exist.

Implementation note: subclasses will probably want to override this method given the super method is cavalier about calling exists() and load(). Depending on the implementation, this can be problematic.

Return type:: Any

load(name)[source]¶

Load a data value from the pickled data with key name. Semantically, this method loads the data using the stash’s implementation. For example, DirectoryStash loads the data from a file if it exists, but factory type stashes will always re-generate the data.

See:: get()
Return type:: AmrFeatureDocument

word_piece_doc_factory: WordPieceFeatureDocumentFactory = None¶: The feature document factory that populates embeddings.
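
The delegate pattern described here (wrap another stash, decorate what it loads and skip the decoration when the factory is None) can be sketched with plain classes. `DictStash` and `AnnotatingStash` are hypothetical stand-ins, not the library's classes, and the `annotate` callable plays the role of word_piece_doc_factory.

```python
from typing import Any, Callable, Dict, Optional


class DictStash:
    """A trivial in-memory stash."""

    def __init__(self, data: Dict[str, Any]):
        self._data = dict(data)

    def exists(self, name: str) -> bool:
        return name in self._data

    def load(self, name: str) -> Any:
        return self._data.get(name)

    def get(self, name: str, default: Any = None) -> Any:
        # the "cavalier" default: one exists() call and one load() call
        return self.load(name) if self.exists(name) else default


class AnnotatingStash:
    """Delegates loading and optionally annotates the loaded item."""

    def __init__(self, delegate: DictStash,
                 annotate: Optional[Callable[[Any], Any]] = None):
        self.delegate = delegate
        self.annotate = annotate

    def load(self, name: str) -> Any:
        item = self.delegate.load(name)
        if item is not None and self.annotate is not None:
            item = self.annotate(item)
        return item

    def get(self, name: str, default: Any = None) -> Any:
        # overridden so the delegate is consulted only once per request
        item = self.load(name)
        return default if item is None else item


stash = AnnotatingStash(DictStash({'a': 'doc'}), annotate=str.upper)
plain = AnnotatingStash(DictStash({'a': 'doc'}))
```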

class zensols.calamr.annotate.CalamrAnnotatedAmrFeatureDocumentFactory(doc_parser, remove_wiki_attribs=False, remove_alignments=False, doc_id=0, word_piece_doc_factory=None)[source]¶

Bases: AnnotatedAmrFeatureDocumentFactory

Adds wordpiece embeddings to AmrFeatureDocument instances.

__init__(doc_parser, remove_wiki_attribs=False, remove_alignments=False, doc_id=0, word_piece_doc_factory=None)¶

from_dict(data)[source]¶

Parse and create an AMR document from a dict.

Parameters:

data (Dict[str, str]) – the AMR text to be parsed, with each entry having the keys summary and body
doc_id – the document ID to set as AmrFeatureDocument.doc_id

Return type:

AmrFeatureDocument

to_annotated_doc(doc)[source]¶

Clone doc.amr into an AnnotatedAmrDocument.

Parameters:: doc – the document to convert to an AnnotatedAmrDocument
Return type:: AmrFeatureDocument
Returns:: a feature document with its amr set to a new AnnotatedAmrDocument, which is a new instance if doc isn’t an annotated AMR document

word_piece_doc_factory: WordPieceFeatureDocumentFactory = None¶: The feature document factory that populates embeddings.

class zensols.calamr.annotate.ProxyReportAnnotatedAmrDocument(sents, path=None, doc_id=None)[source]¶

Bases: AnnotatedAmrDocument

Overrides the sections property to skip duplicate summary sentences also found in the body.

__init__(sents, path=None, doc_id=None)¶

Initialize.

Parameters:

sents (Tuple[AmrSentence, …]) – the document’s sentences
path (Optional[Path, …]) – the path to the file containing the Penman notation sentence graphs used in sents
model – the model to initialize AmrSentence when sents is a list of string Penman graphs

property sections: Tuple[AnnotatedAmrSectionDocument]¶: The sentences that make up the body of the document.

zensols.calamr.app module¶

Alignment entry point application.

class zensols.calamr.app.AlignmentApplication(resource, config_factory)[source]¶

Bases: _AlignmentBaseApplication

This application aligns data in files.

__init__(resource, config_factory)¶

align_file(input_file, output_dir=None, output_format=Format.csv, render_level=None)[source]¶

Align annotated documents from a JSON file.

Parameters:

input_file (Path) – the input JSON file.
output_dir (Path) – the output directory
output_format (Format) – the output format
render_level (int) – how many graphs to render (0 - 10), higher means more

config_factory: ConfigFactory¶: Application configuration factory.

class zensols.calamr.app.CorpusApplication(resource, config_factory, results_dir)[source]¶

Bases: _AlignmentBaseApplication

AMR graph alignment.

__init__(resource, config_factory, results_dir)¶

align_corpus(keys, output_dir=None, output_format=Format.csv, render_level=None, use_cached=False)[source]¶

Align an annotated AMR document from the corpus.

Parameters:

keys (str) – comma-separated list of dataset keys or file name
output_dir (Path) – the output directory
output_format (Format) – the output format
render_level (int) – how many graphs to render (0 - 10), higher means more
use_cached (bool) – whether to use a cached result if available

config_factory: ConfigFactory¶: For prototyping.

dump_annotated(limit=None, output_dir=None, output_format=Format.csv)[source]¶

Write annotated documents and their keys.

Parameters:

limit (int) – the maximum number of items to process
output_dir (Path) – the output directory
output_format (Format) – the output format

get_annotated_summary(limit=None)[source]¶

Return a CSV file with a summary of the annotated AMR dataset.

Parameters:: limit (int) – the maximum number of items to process
Return type:: DataFrame

results_dir: Path¶: The directory where the output results are written, then read back for analysis reporting.

write_adhoc_corpus(corpus_file=None)[source]¶

Write the adhoc corpus from the JSON created file.

Parameters:: corpus_file (Path) – the file with the source and summary sentences

write_keys()[source]¶: Write the keys of the configured corpus.

class zensols.calamr.app.Resource(anon_doc_stash, doc_factory, serialized_factory, doc_graph_factory, doc_graph_aligner, flow_results_stash, anon_doc_factory)[source]¶

Bases: object

A client facade (GoF) for Calamr annotated AMR corpus access and alignment.

__init__(anon_doc_stash, doc_factory, serialized_factory, doc_graph_factory, doc_graph_aligner, flow_results_stash, anon_doc_factory)¶

align(doc_graph)[source]¶

Create flow results from a document graph.

Parameters:: doc_graph (DocumentGraph) – the source/summary components to align
Return type:: FlowGraphResult
Returns:: the aligned bipartite graph and its statistics

align_corpus_document(doc_id, use_cache=True)[source]¶

Create flow results of a corpus AMR document.

Parameters:

doc_id (str) – the corpus document ID (e.g. liu-example or 20041010_0024)
use_cache (bool) – whether to cache (and use cached) results

Return type:

Optional[FlowGraphResult]

Returns:

the flow results for the corpus document or None if doc_id is not a valid key

anon_doc_factory: AnnotatedAmrFeatureDocumentFactory¶: Creates instances of AmrFeatureDocument.

anon_doc_stash: Stash¶: Contains human annotated AMRs. This could be from the adhoc (micro) corpus (small toy corpus), AMR 3.0 Proxy Report corpus, Little Prince, or the Bio AMR corpus.

create_graph(doc)[source]¶

Return a new document graph based on feature document.

Parameters:: doc (AmrFeatureDocument) – the document on which to base the new graph
Return type:: DocumentGraph
Returns:: a new AMR document graph

doc_factory: AmrFeatureDocumentFactory¶: Creates AmrFeatureDocument from AmrDocument instances.

doc_graph_aligner: DocumentGraphAligner¶: Aligns document graphs.

doc_graph_factory: DocumentGraphFactory¶: Create document graphs.

flow_results_stash: Stash¶: Creates cached instances of FlowGraphResult.

get_corpus_document(doc_id)[source]¶

Get an AMR feature document by key from the application configured corpus.

Parameters:: doc_id (str) – the corpus document ID (e.g. liu-example or 20041010_0024)
Return type:: AmrFeatureDocument
Returns:: the AMR feature document

get_corpus_keys()[source]¶

Get the keys of the application configured AMR corpus.

Return type:: Iterable[str]

parse_documents(data)[source]¶

Parse documents with keys id, comment, body, and summary from a dict, a sequence of dict instances, or a JSON file in the format:

[{
    "id": "ex1",
    "comment": "very short",
    "body": "The man ran to make the train. He just missed it.",
    "summary": "A man got caught in the door of a train he missed."
}]

Return type:: Iterable[AmrFeatureDocument]
Returns:: the parsed AMR feature documents
See:: AnnotatedAmrFeatureDocumentFactory
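
Before handing a file in this format to the parser, it can help to check its shape. The following sketch uses only the standard library. `load_documents` is a hypothetical helper, and treating body and summary as the required keys is an assumption based on the from_dict() documentation.

```python
import json
from typing import Any, Dict, List

REQUIRED_KEYS = {'body', 'summary'}  # assumed minimum per entry

SAMPLE = """[{
    "id": "ex1",
    "comment": "very short",
    "body": "The man ran to make the train. He just missed it.",
    "summary": "A man got caught in the door of a train he missed."
}]"""


def load_documents(text: str) -> List[Dict[str, Any]]:
    """Parse JSON text into a list of document entries and validate keys."""
    data = json.loads(text)
    if isinstance(data, dict):
        # accept a single document as well as a sequence of documents
        data = [data]
    for ix, entry in enumerate(data):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f'entry {ix} is missing keys: {sorted(missing)}')
    return data


docs = load_documents(SAMPLE)
```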

serialized_factory: AmrSerializedFactory¶: Creates a Serialized from AmrDocument, AmrSentence or AnnotatedAmrDocument.

to_annotated_doc(doc)[source]¶

Return an annotated feature document, creating the feature document if necessary. The doc.amr attribute is set to annotated AMR document.

Parameters:: doc (Union[AmrDocument, AmrFeatureDocument]) – an AMR document or an AMR feature document
Return type:: AmrFeatureDocument
Returns:: a new instance of a document if doc is not a AmrFeatureDocument or if doc.amr is not an AnnotatedAmrDocument

to_feature_doc(amr_doc, catch=False, add_metadata=False, add_alignment=False)[source]¶

Create an AmrFeatureDocument from an AmrDocument by parsing the snt metadata with a FeatureDocumentParser.

Parameters:

add_metadata (Union[str, bool]) – add missing annotation metadata to amr_doc parsed from spaCy if missing (see AmrParser.add_metadata()) if True and replace any previous metadata if this value is the string clobber
catch (bool) – if True, catch exceptions, create an AmrFailure from each, and return them

Return type:

Union[AmrFeatureDocument, Tuple[AmrFeatureDocument, List[AmrFailure]]]

Returns:

an AMR feature document if catch is False; otherwise, a tuple of a document with the sentences that were successfully parsed and a list of any exceptions raised during the parsing
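
The catch flag changes the return shape: either the parsed result alone, or the result paired with the failures that were collected instead of raised. The following is a generic sketch of that pattern with a hypothetical `parse_all` helper. It uses plain exceptions where the library creates AmrFailure instances.

```python
from typing import Any, Callable, Iterable, List, Tuple, Union


def parse_all(items: Iterable[str], parse: Callable[[str], Any],
              catch: bool = False
              ) -> Union[List[Any], Tuple[List[Any], List[Exception]]]:
    """Parse each item, optionally collecting failures instead of raising."""
    parsed: List[Any] = []
    failures: List[Exception] = []
    for item in items:
        try:
            parsed.append(parse(item))
        except Exception as e:
            if not catch:
                raise
            failures.append(e)
    return (parsed, failures) if catch else parsed


# 'x' cannot be parsed as an integer, so it ends up in the failures
parsed, failures = parse_all(['1', 'x', '3'], int, catch=True)
```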

zensols.calamr.attr module¶

Graph node and edge domain classes.

Terminology: token alignments refer to the sentence index based token alignments to AMR nodes. This is not to be confused with alignment edges (aka graph component alignment edges).

class zensols.calamr.attr.AmrDocumentNode(context, name, root, children, doc)[source]¶

Bases: DocumentNode

A composite node containing a subset of the DocumentNode.root sentences. This includes the text, text features, and AMR Penman graph data.

__init__(context, name, root, children, doc)¶

doc: AmrFeatureDocument¶: A document containing a subset of sentences that fall under this portion of the graph.

class zensols.calamr.attr.AttributeGraphNode(context, sent, token_aligns, triple)[source]¶

Bases: TripleGraphNode

Attribute data from AMR attribute nodes grafted on to the igraph.Graph.

ATTRIB_TYPE: ClassVar[str] = 'attribute'¶: The attribute type this class represents.

__init__(context, sent, token_aligns, triple)¶

property constant: Any¶: The constant defined by the attribute from the Penman graph.

property role: str¶: The AMR role taken from Penman graph node.

class zensols.calamr.attr.ComponentAlignmentGraphEdge(context, capacity=0, flow=0)[source]¶

Bases: GraphEdge

An edge that spans graph components.

ATTRIB_TYPE: ClassVar[str] = 'component alignment'¶: The attribute type this class represents.

__init__(context, capacity=0, flow=0)¶

class zensols.calamr.attr.ComponentCorefAlignmentGraphEdge(context, capacity=0, flow=0, relation=None, is_bipartite=False)[source]¶

Bases: ComponentAlignmentGraphEdge

An edge that spans graph components.

ATTRIB_TYPE: ClassVar[str] = 'component coref alignment'¶: The attribute type this class represents.

__init__(context, capacity=0, flow=0, relation=None, is_bipartite=False)¶

is_bipartite: bool = False¶: Whether the coreference spans components.

relation: Relation = None¶: The AMR coreference relation between this node and all other refs.

class zensols.calamr.attr.ConceptGraphNode(context, sent, token_aligns, triple)[source]¶

Bases: TripleGraphNode

Attribute data from AMR concept nodes grafted on to the igraph.Graph.

ATTRIB_TYPE: ClassVar[str] = 'concept'¶: The attribute type this class represents.

__init__(context, sent, token_aligns, triple)¶

property has_roleset: bool¶

property instance: str¶: The concept instance, such as the PropBank entry (e.g. see-01). Other examples include nouns.

property roleset: Roleset¶

property roleset_embedding: Tensor¶

property roleset_id: RolesetId¶

property token_embedding: Tensor | None¶

class zensols.calamr.attr.DocumentGraphEdge(context, capacity=0, flow=0, relation='')[source]¶

Bases: GraphEdge

An edge that has data about the non-AMR parts of the graph, such as sentence.

ATTRIB_TYPE: ClassVar[str] = 'doc'¶: The attribute type this class represents.

__init__(context, capacity=0, flow=0, relation='')¶

relation: str = ''¶: The edge relation between two document nodes or document to igraph node.

class zensols.calamr.attr.DocumentGraphNode(context, level_name, doc_node)[source]¶

Bases: GraphNode

A node that has data about the non-AMR parts of the graph, such as the unifying top level node that ties the sentences together. However, it can contain the root to an AMR sentence (see AmrDocumentNode).

ATTRIB_TYPE: ClassVar[str] = 'doc'¶: The attribute type this class represents.

__init__(context, level_name, doc_node)¶

doc_node: DocumentNode¶: The document node associated with the attached igraph node.

level_name: str¶: The descriptive name of the node such as doc or section.

class zensols.calamr.attr.DocumentNode(context, name, root, children)[source]¶

Bases: GraphNode

A composite node in the document tree that is associated with the FeatureDocument as the root node. This class represents nodes in a graph that either:

make up the part of the graph that’s disjoint from the AMR sentinel subgraphs (i.e. a root doc node), or

the root to an AMR sentence (see AmrDocumentNode)

The in-memory object graph of these instances is dependent on the type of data it represents. For example, the Proxy Report corpus has top level summary and body nodes with AMR sentences below (root on top).

__init__(context, name, root, children)¶

children: Tuple[DocumentNode, ...]¶: The children of this node with respect to the composite pattern.

property children_by_name: DocumentNode¶: The children’s names as keys and respective document nodes as values.

name: str¶: The descriptive name of the node such as doc or section.

root: AmrFeatureDocument¶: The owning feature document containing all sentences/tokens of the graph.

property sents: Tuple[AmrFeatureSentence]¶: The sentences of this document level.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:

depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable

class zensols.calamr.attr.GraphEdge(context, capacity=0, flow=0)[source]¶

Bases: GraphAttribute

Graph attribute data added to the igraph.Graph edges.

MAX_CAPACITY: ClassVar[float] = 10000000000.0¶

Maximum value of a capacity.

Implementation note: It seems igraph can only handle large values to represent infinity, and not float inf or the system defined largest float value.
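
One way to follow this note in calling code is to clamp capacities to the finite maximum before they reach the graph. This is a minimal sketch. `finite_capacity` is a hypothetical helper, not part of the library.

```python
import math

MAX_CAPACITY = 1e10  # mirrors GraphEdge.MAX_CAPACITY


def finite_capacity(capacity: float) -> float:
    """Replace infinite or overly large capacities with MAX_CAPACITY."""
    if math.isinf(capacity) or capacity > MAX_CAPACITY:
        return MAX_CAPACITY
    return capacity
```

Any capacity at or below the maximum passes through unchanged, so ordinary alignment capacities are unaffected.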

__init__(context, capacity=0, flow=0)¶

property capacity: float¶: The capacity of the edge.

capacity_str(precision=None)[source]¶

Return type:: str

property flow: float¶: The flow calculated by the maxflow algorithm.

flow_str(precision=None)[source]¶

Return type:: str

property value_str: str¶

class zensols.calamr.attr.GraphNode(context)[source]¶

Bases: GraphAttribute

Graph attribute data added to the igraph.Graph vertexes.

__init__(context)¶

property partition: int¶

class zensols.calamr.attr.RoleGraphEdge(context, sent, token_aligns, capacity=0, flow=0, triple=None, role=None)[source]¶

Bases: GraphEdge, SentenceGraphAttribute

Attribute data from the AMR role edges grafted on to the igraph.Graph.

ATTRIB_TYPE: ClassVar[str] = 'role'¶: The attribute type this class represents.

__init__(context, sent, token_aligns, capacity=0, flow=0, triple=None, role=None)¶

role: Union[str, Role] = None¶: The role name of the edge such as :ARG0.

triple: Union[Instance, Attribute] = None¶: The AMR Penman graph triple.

class zensols.calamr.attr.SentenceGraphAttribute(context, sent, token_aligns)[source]¶

Bases: GraphAttribute

A node containing zero or more tokens with its parent sentence. Usually the AMR node represents a single token, but can have more than one token alignment.

__init__(context, sent, token_aligns)¶

property has_token_embedding: bool¶: Whether this attribute node has token embeddings.

property indices: Tuple[int, ...]¶: Return the concatenated list of indices of the alignments.

sent: AmrFeatureSentence¶: The sentence from which this node was created.

property token_align_str: str¶: A string representation of the AMR Penman representation of the token alignment.

token_aligns: Tuple[Union[Alignment, RoleAlignment], ...]¶: The node to sentinel token index.

property tokens: Tuple[FeatureToken, ...]¶: The tokens

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:

depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable

class zensols.calamr.attr.SentenceGraphEdge(context, capacity=0, flow=0, relation='', sent=None, sent_ix=None)[source]¶

Bases: DocumentGraphEdge

An edge from a document node to a SentenceGraphNode.

ATTRIB_TYPE: ClassVar[str] = 'sentence'¶: The attribute type this class represents.

__init__(context, capacity=0, flow=0, relation='', sent=None, sent_ix=None)¶

sent: AmrFeatureSentence = None¶: The sentence from which this node was created.

sent_ix: int = None¶: The sentence index.

class zensols.calamr.attr.SentenceGraphNode(context, sent, sent_ix)[source]¶

Bases: GraphNode

A graph node containing the root of a sentence.

ATTRIB_TYPE: ClassVar[str] = 'sentence'¶: The attribute type this class represents.

SENT_TEXT_LEN: ClassVar[int] = 20¶: The truncated sentence length

__init__(context, sent, sent_ix)¶

sent: AmrFeatureSentence¶: The sentence from which this node was created.

sent_ix: int¶: The sentence index.

class zensols.calamr.attr.TerminalGraphEdge(context, capacity=0, flow=0)[source]¶

Bases: GraphEdge

An edge that connects to a TerminalGraphNode.

ATTRIB_TYPE: ClassVar[str] = 'terminal'¶: The attribute type this class represents.

__init__(context, capacity=0, flow=0)¶

class zensols.calamr.attr.TerminalGraphNode(context, is_source)[source]¶

Bases: GraphNode

A flow control: source or sink.

ATTRIB_TYPE: ClassVar[str] = 'control'¶: The attribute type this class represents.

__init__(context, is_source)¶

is_source: bool¶: Whether this is the source (s) or the sink (t).

class zensols.calamr.attr.TripleGraphNode(context, sent, token_aligns, triple)[source]¶

Bases: SentenceGraphAttribute, GraphNode

Contains a Penman triple with token alignments used for concepts and AMR attributes. Instances of this class get their embedding via SentenceGraphAttribute._get_embedding().

__init__(context, sent, token_aligns, triple)¶

triple: Union[Instance, Attribute]¶: The AMR Penman graph triple.

property variable: str¶: The variable, which comes from the source of the triple, such as s0.
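
A Penman triple has the form (source, role, target), and the variable is the source element. The sketch below models this with a standard library named tuple rather than the Penman library's own types. The example triples and helper names are illustrative assumptions.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A (source, role, target) triple in the style of Penman notation."""
    source: str
    role: str
    target: str


def variable(triple: Triple) -> str:
    """The variable comes from the source of the triple."""
    return triple.source


def is_concept(triple: Triple) -> bool:
    """Concept (instance) triples use the ``:instance`` role."""
    return triple.role == ':instance'


concept = Triple('s0', ':instance', 'see-01')
attribute = Triple('s0', ':polarity', '-')
```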

zensols.calamr.cli module¶

Command line entry point to the application.

class zensols.calamr.cli.ApplicationFactory(*args, **kwargs)[source]¶

Bases: ApplicationFactory

__init__(*args, **kwargs)[source]¶

classmethod get_resource(*args, **kwargs)[source]¶

A client facade (GoF) for Calamr annotated AMR corpus access and alignment.

Return type:: Resource

zensols.calamr.cli.main(args=sys.argv, **kwargs)[source]¶

Return type:: ActionResult

zensols.calamr.comp module¶

Base graph component class.

class zensols.calamr.comp.GraphComponent(graph)[source]¶

Bases: PersistableContainer, Writable

A container class for an igraph.Graph, which also has caching data structures for fast access to graph attributes.

GRAPH_ATTRIB_NAME: ClassVar[str] = 'ga'¶: The name of the graph attributes on igraph nodes and edges.

ROOT_ATTRIB_NAME: ClassVar[str] = 'root'¶: The attribute that identifies the root vertex.

__init__(graph)¶

property adjacency_list: List[List[int]]¶

An adjacency list of vertexes based on their relation to each other in the graph. The outer list’s index is the source vertex and the inner list is that vertex’s neighbors.

Implementation note: the list is sub-setted at both the inner and outer level for those vertexes in this component.
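
The layout described above (the outer index is the source vertex and the inner list holds its neighbors) can be built from an edge list as follows. This sketch assumes "neighbors" means the targets of outgoing edges in the directed graph. `adjacency_list` is a hypothetical helper that does not use igraph.

```python
from typing import Iterable, List, Tuple


def adjacency_list(n_vertexes: int,
                   edges: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Return, for each source vertex index, the list of its neighbors."""
    adj: List[List[int]] = [[] for _ in range(n_vertexes)]
    for source, target in edges:
        adj[source].append(target)
    return adj


adj = adjacency_list(3, [(0, 1), (0, 2), (1, 2)])
```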

clone(reverse_edges=False, deep=True, **kwargs)[source]¶

Clone an instance and return it. The graph is deep copied, but all GraphAttribute instances are not.

Parameters:

reverse_edges (bool) – whether to reverse the direction of edges in the directed graph, which is used to create the reverse flow graphs used in the maxflow algorithm
deep (bool) – whether to create a deep clone, which detaches a component from any bipartite graph by keeping only the graph composed of vs; otherwise the graph is copied as a subcomponent starting from root
kwargs – arguments to add as attributes to the clone; include cls to set the type of the new instance

Return type:

GraphComponent

Returns:

the cloned instance of this instance

copy_graph(reverse_edges=False, subgraph_type=None)[source]¶

Return a copy of the igraph.Graph.

Parameters:

reverse_edges (bool) – whether to reverse the direction on all edges in the graph
subgraph_type (str) – the method of creating a subgraph, which is either induced to create it from the nodes of the graph; or sub (the default) to create it from the subcomponent from the root

Return type:

Graph
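
Reversing every edge of a directed graph swaps the roles of roots and leaves, which is what makes a reverse flow graph possible. The following sketch shows this on a plain edge list. Both helpers are hypothetical and do not use igraph.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def reverse_edges(edges: List[Edge]) -> List[Edge]:
    """Flip the direction of every (source, target) edge."""
    return [(target, source) for source, target in edges]


def roots(n_vertexes: int, edges: List[Edge]) -> List[int]:
    """Vertexes with no incoming edges."""
    targets = {target for _, target in edges}
    return [v for v in range(n_vertexes) if v not in targets]


edges = [(0, 1), (0, 2), (1, 3)]
forward_roots = roots(4, edges)
reverse_roots = roots(4, reverse_edges(edges))
```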

static create_edges(g, es)[source]¶

Add the edges of list es and return them after being added.

Return type:: Tuple[Edge]

static create_vertexes(g, n)[source]¶

Create n vertexes and return them after being added.

Return type:: Tuple[Vertex]

deallocate()[source]¶: Deallocate all resources for this instance.

delete_edges(edges, include_nodes=False)[source]¶

Remove edges from the graph.

Parameters:

edges (Iterable[GraphEdge]) – the edges to remove from the graph
include_nodes (bool) – whether to remove nodes that become orphans after deleting edges

Return type:

Tuple[Set[GraphEdge], Set[GraphNode]]

Returns:

a tuple of the edges and nodes removed
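
A rough sketch of the include_nodes behavior, using sets of integer vertex IDs in place of GraphEdge and GraphNode instances. It assumes an orphan is a node left with no remaining edges; the library's own definition may differ.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def delete_edges(nodes: Set[int], edges: Set[Edge],
                 to_remove: Iterable[Edge],
                 include_nodes: bool = False) -> Tuple[Set[Edge], Set[int]]:
    """Hypothetical stand-in that mutates ``nodes`` and ``edges`` in place."""
    removed_edges = set(to_remove) & edges
    edges -= removed_edges
    removed_nodes: Set[int] = set()
    if include_nodes:
        # nodes touched by a removed edge that no remaining edge references
        touched = {n for e in removed_edges for n in e}
        connected = {n for e in edges for n in e}
        removed_nodes = touched - connected
        nodes -= removed_nodes
    return removed_edges, removed_nodes


nodes = {0, 1, 2}
edges = {(0, 1), (1, 2)}
removed = delete_edges(nodes, edges, [(1, 2)], include_nodes=True)
print(removed)
```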

edge_by_graph_edge_id(ge_id)[source]¶

Return an edge based on the graph (attribute) edge id.

Return type:: GraphEdge

edge_by_id(ix)[source]¶

Return the graph edge for the edge ID.

Return type:: GraphEdge

edge_ref_by_id(ix)[source]¶

Get the igraph.Edge instance by its index.

Return type:: Edge

property edges_reversed: bool¶

Whether the edge direction in the graph is reversed. This is True for reverse flow graphs.

See:: summary.ReverseFlowGraphAlignmentConstructor

property es: Dict[Edge, GraphEdge]¶: The igraph to domain object edge mapping.

get_attributes()[source]¶

Return all graph attributes of the component, which include instances of both GraphNode and GraphEdge.

Return type:: Iterable[GraphAttribute]

property graph: Graph¶: The graph used for computational manipulation of the synthesized AMR sentences.

property graph_edge_id_to_edge_ref: Dict[int, int]¶: Graph edge index to igraph edge index.

property graph_edge_to_edge: Dict[GraphEdge, Edge]¶: A mapping from graph edges to igraph edges.

static graph_instance()[source]¶

Create a new directed nascent graph.

Return type:: Graph

property graph_node_id_to_vertex_ref: Dict[int, int]¶: Graph node index to vertex index.

invalidate()[source]¶: Clear cached data structures to force them to be recreated after igraph level data has changed.

node_by_graph_node_id(gn_id)[source]¶

Return a node based on the graph (attribute) node id.

Return type:: GraphNode

node_by_id(ix)[source]¶

Return the graph node for the vertex ID.

Return type:: GraphNode

property node_to_vertex: Dict[GraphNode, Vertex]¶: A mapping from graph nodes to vertexes.

property root: Vertex | None¶: The singular (first found) root of the graph, which is usually the top level DocumentNode instance.

property roots: Iterable[Vertex]¶: The roots of the graph, which are usually top level DocumentNode instances.

select_edges(**kwargs)[source]¶

Return matched graph edges from an igraph.Graph.es.select().

Return type:: Iterable[Edge]

select_vertices(**kwargs)[source]¶

Return matched graph nodes from an igraph.Graph.vs.select().

Return type:: Iterable[Vertex]

classmethod set_edge(e, ge)[source]¶: Set the graph edge data in the igraph edge.

classmethod set_node(v, n)[source]¶: Set the graph node data in the igraph vertex.

classmethod to_edge(e)[source]¶

Narrow an igraph edge to a graph edge.

Return type:: GraphEdge

classmethod to_node(v)[source]¶

Narrow a vertex to a node.

Return type:: GraphNode

vertex_ref_by_id(ix)[source]¶

Get the igraph.Vertex instance by its index.

Return type:: Vertex

property vs: Dict[Vertex, GraphNode]¶: The igraph to domain object vertex mapping.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶

Write the contents of this instance to writer using indentation depth.

Parameters:

depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable

zensols.calamr.ctrl module¶

Document graph controller implementations.

class zensols.calamr.ctrl.AlignmentCapacitySetDocumentGraphController(name, min_capacity, capacity)[source]¶

Bases: DocumentGraphController

Set the capacity on edges if the criteria matches min_flow, component_names and match_edge_classes.

__init__(name, min_capacity, capacity)¶

capacity: float¶: The capacity to set.

min_capacity: float¶: The minimum capacity to trigger clamping the capacity of a target GraphEdge to capacity.

class zensols.calamr.ctrl.ConstructDocumentGraphController(name, build_graph_name, constructor, renderer)[source]¶

Bases: DocumentGraphController

Constructs the graph that will later be used for the min cut/max flow algorithm (see MaxflowDocumentGraphController). After its invoke() method is called, build_graph is available, which is the constructed graph provided by constructor.

__init__(name, build_graph_name, constructor, renderer)¶

build_graph_name: str¶: The name given to newly created instances of DocumentGraph.

constructor: GraphAlignmentConstructor¶: The constructor used to get the source and sink nodes.

renderer: GraphRenderer¶: Visually render the graph into a human understandable presentation.

class zensols.calamr.ctrl.FixReentrancyDocumentGraphController(name, component_name, maxflow_controller, only_report)[source]¶

Bases: DocumentGraphController

Fix reentrancies by splitting the flow of the last calculated maxflow as the capacity of the outgoing edges in the reversed graph. This fixes the issue of edges getting flow starved, then later eliminated in the graph reduction steps.

Subsequently, the maxflow algorithm is rerun if we have at least one reentrancy after reallocating the capacit(ies).

__init__(name, component_name, maxflow_controller, only_report)¶

component_name: str¶: The name of the components to restore.

maxflow_controller: MaxflowDocumentGraphController¶: The maxflow component used to recalculate the maxflow.

only_report: bool¶: Whether to only report reentrancies rather than fix them.

class zensols.calamr.ctrl.FlowDiscountDocumentGraphController(name, discount_sum, component_names=<factory>)[source]¶

Bases: DocumentGraphController

Decrease/constrict the capacities by making the sum of the incoming flows from the bipartite edges the value of discount_sum. The capacities are only updated if the sum of the incoming bipartite edges has a flow greater than discount_sum.

__init__(name, discount_sum, component_names=<factory>)¶

component_names: Set[str]¶: The name of the components to discount.

discount_sum: float¶: The capacity sum will be this value (see class docs).
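
One way to read the discounting rule is as a proportional rescale. The sketch below assumes that each incoming flow is scaled by the same factor so the new capacities sum to discount_sum; the documentation states only the target sum and the trigger condition, so the proportional split and the helper name are assumptions.

```python
from typing import Dict


def discount_capacities(flows: Dict[int, float],
                        discount_sum: float) -> Dict[int, float]:
    """Hypothetical helper: map edge ID to the discounted capacity."""
    total = sum(flows.values())
    # only constrict when the incoming flow exceeds the target sum
    if total <= discount_sum:
        return dict(flows)
    scale = discount_sum / total
    return {edge_id: flow * scale for edge_id, flow in flows.items()}


caps = discount_capacities({1: 3.0, 2: 1.0}, discount_sum=2.0)
print(caps)
```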

class zensols.calamr.ctrl.FlowSetDocumentGraphController(name, component_names=<factory>, match_edge_classes=<factory>, flow=0)[source]¶

Bases: DocumentGraphController

Set a static flow on components based on name and edges based on class.

__init__(name, component_names=<factory>, match_edge_classes=<factory>, flow=0)¶

component_names: Set[str]¶: The components on which to set the flow.

flow: float = 0¶: The flow to set.

match_edge_classes: Set[Type[GraphEdge]]¶: The edge classes (i.e. TerminalGraphEdge) to set the flow.

class zensols.calamr.ctrl.MaxflowDocumentGraphController(name, constructor)[source]¶

Bases: DocumentGraphController

Executes the maxflow/min cut algorithm on a document graph.

__init__(name, constructor)¶

constructor: GraphAlignmentConstructor¶: The constructor used to get the source and sink nodes.

reset()[source]¶: Clears the cached build_graph instance.
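
For readers unfamiliar with maxflow/min cut, the following is a generic Edmonds-Karp sketch over a plain edge list. It is not the controller's implementation, which this page does not show; the function name and the (u, v, capacity) input format are assumptions.

```python
from collections import deque
from typing import Dict, List, Tuple


def max_flow(n: int, edges: List[Tuple[int, int, float]],
             s: int, t: int) -> float:
    """Generic Edmonds-Karp: BFS augmenting paths on a residual graph."""
    cap: Dict[Tuple[int, int], float] = {}
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v, c in edges:
        if (u, v) not in cap:
            # forward and reverse residual arcs are always created together
            cap[(u, v)] = 0.0
            cap[(v, u)] = 0.0
            adj[u].append(v)
            adj[v].append(u)
        cap[(u, v)] += c
    total = 0.0
    while True:
        # breadth first search for a shortest augmenting path
        parent = {s: s}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        # find the bottleneck capacity along the path
        bottleneck = float('inf')
        v = t
        while v != s:
            bottleneck = min(bottleneck, cap[(parent[v], v)])
            v = parent[v]
        # push the bottleneck flow and update residual capacities
        v = t
        while v != s:
            u = parent[v]
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck
            v = u
        total += bottleneck


toy_edges = [(0, 1, 3.0), (0, 2, 2.0), (1, 3, 2.0), (2, 3, 3.0), (1, 2, 1.0)]
flow_value = max_flow(4, toy_edges, s=0, t=3)
print(flow_value)
```

In the toy graph, vertex 0 plays the role of the source flow node and vertex 3 the sink flow node.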

class zensols.calamr.ctrl.NormFlowDocumentGraphController(name, component_names, constructor, normalize_mode='fpn')[source]¶

Bases: DocumentGraphController

Normalizes flow on edges as the flow going through the edge and the total number of descendants. Descendants are counted as the edge’s source node and all children/descendants of that node.

This is done recursively to calculate flow per node. For each recursive call, it computes the flow per node of the parent edge(s) from the perspective of the nascent graph (root at top with arrows pointed to children underneath). However, the graphs this operates on are the reverse flow max flow graphs (flow direction is taken care of by the adjacency list computed in GraphComponent).

Since an AMR node can have multiple parents, we keep track of descendants as a set rather than a count to avoid duplicate counts when nodes have more than one parent. Otherwise, in the multiple parent case, duplicates would be counted when the path later converges closer to the root.
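
The set-based descendant count can be sketched as follows. This assumes that "flow per node" means the edge flow divided by the number of descendants, including the edge's source node; the helper names and the dictionary-of-children input are hypothetical.

```python
from typing import Dict, List, Set


def descendant_set(children: Dict[str, List[str]], node: str) -> Set[str]:
    """Return ``node`` plus everything reachable from it, without duplicates."""
    found: Set[str] = {node}
    for child in children.get(node, []):
        found |= descendant_set(children, child)
    return found


def flow_per_node(flow: float, children: Dict[str, List[str]],
                  source: str) -> float:
    """Assumed normalization: edge flow divided by the descendant count."""
    return flow / len(descendant_set(children, source))


# node 'd' has two parents (a reentrancy), so a naive count would see it twice
children = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d']}
print(sorted(descendant_set(children, 'a')))
print(flow_per_node(8.0, children, 'a'))
```

Counting paths instead of collecting a set would yield five descendants for 'a' here, which is the duplicate counting the paragraph above warns about.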

__init__(name, component_names, constructor, normalize_mode='fpn')¶

component_names: Set[str]¶: The name of the components to minimize.

constructor: GraphAlignmentConstructor¶: The instance used to construct the graph passed in the invoke() method.

normalize_mode: str = 'fpn'¶

How to normalize nodes (if at all), which is one of:

fpn: leaves flow values as they were after the initial flow per node calculation
norm: normalize so all values add to one
vis: same as norm but add a vis_flow attribute to the edges so the original flow is displayed and visualized as the flow color
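
The norm mode amounts to dividing every value by the total. A small sketch, with a hypothetical helper name and a zero-total guard that is an assumption:

```python
from typing import Dict


def normalize_flows(flows: Dict[str, float]) -> Dict[str, float]:
    """Scale flow values so they add to one."""
    total = sum(flows.values())
    if total == 0:
        # nothing to normalize; return an unchanged copy
        return dict(flows)
    return {edge: value / total for edge, value in flows.items()}


normed = normalize_flows({'e1': 1.0, 'e2': 3.0})
print(normed)
```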

class zensols.calamr.ctrl.RemoveAlignsDocumentGraphController(name, min_capacity)[source]¶

Bases: DocumentGraphController

Removes graph component alignment for low capacity links.

__init__(name, min_capacity)¶

min_capacity: float¶: The graph component alignment edges are removed if their capacities are at or below this value.
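
The "at or below" rule can be expressed as a simple filter. The sketch uses a dictionary of edge ID to capacity in place of real alignment edges, and the helper name is hypothetical.

```python
from typing import Dict, Set, Tuple


def remove_low_capacity(capacities: Dict[int, float],
                        min_capacity: float) -> Tuple[Dict[int, float], Set[int]]:
    """Split alignment edges into those kept and the IDs of those removed."""
    # an edge survives only when its capacity is strictly above the minimum
    kept = {eid: cap for eid, cap in capacities.items() if cap > min_capacity}
    removed = set(capacities) - set(kept)
    return kept, removed


kept, removed_ids = remove_low_capacity({1: 0.1, 2: 0.5, 3: 0.9}, min_capacity=0.5)
print(kept, removed_ids)
```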

class zensols.calamr.ctrl.RoleCapacitySetDocumentGraphController(name, min_flow, capacity, component_names)[source]¶

Bases: DocumentGraphController

This finds low flow role edges and sets (zeros out) all the capacities of all the connected edge alignments recursively for all descendants. We “slough off” entire subtrees (sometimes entire sentences or document nodes) for low flow ancestors.

__init__(name, min_flow, capacity, component_names)¶

capacity: float¶: The capacity (and flow) to set.

component_names: Set[str]¶: The name of the components to minimize.

min_flow: float¶: The minimum amount of flow to trigger setting the capacity of a target GraphEdge to capacity.
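
The "slough off" step might look roughly like the recursion below. Several details are assumptions made for the sketch: role edges are keyed as (parent, child), an edge counts as low flow when its flow is at or below min_flow, and each node carries at most one alignment capacity.

```python
from typing import Dict, List, Set, Tuple


def slough_off(children: Dict[str, List[str]],
               role_flow: Dict[Tuple[str, str], float],
               align_capacity: Dict[str, float],
               min_flow: float, capacity: float = 0.0) -> Set[str]:
    """Set ``capacity`` on alignments of every node under a low flow role edge."""
    updated: Set[str] = set()

    def visit(node: str) -> None:
        if node in updated:
            return
        updated.add(node)
        if node in align_capacity:
            align_capacity[node] = capacity
        # recurse so the whole subtree is affected
        for child in children.get(node, []):
            visit(child)

    for (_parent, child), flow in role_flow.items():
        if flow <= min_flow:
            visit(child)
    return updated


children = {'root': ['a', 'b'], 'a': ['c']}
role_flow = {('root', 'a'): 0.05, ('root', 'b'): 0.9, ('a', 'c'): 0.5}
align_capacity = {'a': 1.0, 'b': 1.0, 'c': 1.0}
updated = slough_off(children, role_flow, align_capacity, min_flow=0.1)
print(updated, align_capacity)
```

Node 'c' loses its capacity even though its own role edge has ample flow, because its ancestor edge is starved.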

class zensols.calamr.ctrl.SnapshotDocumentGraphController(name, component_names, snapshot_source)[source]¶

Bases: DocumentGraphController

Record flows, then later restore. If snapshot_source is not None, then this instance restores from it. Otherwise it records.

__init__(name, component_names, snapshot_source)¶

component_names: Set[str]¶: The name of the components on which to record or restore flows.

reset()[source]¶: Clears the cached build_graph instance.

snapshot_source: SnapshotDocumentGraphController¶: The source instance that contains the data from which to restore.
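
The record-or-restore switch on snapshot_source can be illustrated with a toy class. FlowSnapshot and its invoke() signature are hypothetical; flows are modeled as a dictionary of edge ID to flow value.

```python
from typing import Dict, Optional


class FlowSnapshot:
    """Toy stand-in: records flows, or restores them from another instance."""

    def __init__(self, snapshot_source: Optional['FlowSnapshot'] = None):
        self.snapshot_source = snapshot_source
        self.recorded: Dict[int, float] = {}

    def invoke(self, flows: Dict[int, float]) -> None:
        if self.snapshot_source is None:
            # no source given: record the current flows
            self.recorded = dict(flows)
        else:
            # source given: restore what it recorded
            flows.update(self.snapshot_source.recorded)


flows = {1: 0.5, 2: 0.25}
recorder = FlowSnapshot()
recorder.invoke(flows)
flows[1] = 0.0  # some later step changes a flow
FlowSnapshot(snapshot_source=recorder).invoke(flows)
print(flows)
```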

zensols.calamr.dcomp module¶

A document centric graph component.

class zensols.calamr.dcomp.DocumentGraphComponent(graph, root_node, sent_index=<factory>, description=None)[source]¶

Bases: GraphComponent

A class containing the root information of the document tree and the igraph.Graph vertex. When the igraph.Graph is set with the graph property, a strongly connected subgraph component is induced. It does this by traversing all reachable vertices and edges from the root. Examples of these induced components include source and summary components of a document AMR graph.

Instances are created by DocumentGraphFactory.
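
Traversing everything reachable from the root can be sketched as a breadth first search over a directed edge list. This is a dependency free illustration with a hypothetical function name, not the igraph-based code the class uses.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def reachable_from(edges: Iterable[Tuple[int, int]], root: int) -> Set[int]:
    """Return the vertexes reachable from ``root``, including ``root``."""
    adj: Dict[int, List[int]] = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
    seen: Set[int] = {root}
    queue = deque([root])
    while queue:
        vertex = queue.popleft()
        for neighbor in adj.get(vertex, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# two disconnected trees, e.g. a source and a summary component
component = reachable_from([(0, 1), (1, 2), (3, 4)], root=0)
print(component)
```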

__init__(graph, root_node, sent_index=<factory>, description=None)¶

clone(reverse_edges=False, deep=True, **kwargs)[source]¶

Clone an instance and return it. The graph is deep copied, but all GraphAttribute instances are not.

Parameters:

reverse_edges (bool) – whether to reverse the direction of edges in the directed graph, which is used to create the reverse flow graphs used in the maxflow algorithm
deep (bool) – whether to create a deep clone, which detaches a component from any bipartite graph by keeping only the graph composed of vs; otherwise the graph is copied as a subcomponent starting from root
kwargs – arguments to add as attributes to the clone; if included, cls is the type of the new instance

Return type:

GraphComponent

Returns:

the cloned instance of this instance

deallocate()[source]¶: Deallocate all resources for this instance.

description: str = None¶: A description of the component used for debugging.

property doc_vertices: Iterable[Vertex]¶: Get the vertices of DocumentGraphNode. This only fetches those document nodes that do not branch.

get_attributes()[source]¶

Return all graph attributes of the component, which include instances of both GraphNode and GraphEdge.

Return type:: Iterable[GraphAttribute]

property name: str¶: Return the name of the AMR document node.

property relation_set: RelationSet¶: The relations in the contained root node document.

property root: Vertex | None¶: The roots of the graph, which are usually top level DocumentNode instances.

root_node: AmrDocumentNode¶: The root of the document tree.

sent_index: SentenceIndex¶: An index of the sentences of a DocumentGraphComponent.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶

Write the contents of this instance to writer using indentation depth.

Parameters:

depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable

class zensols.calamr.dcomp.SentenceEntry(node=None, concepts=None, attributes=None)[source]¶

Bases: Dictable

Contains the sentence node of a sentence, and the respective concept and attribute nodes.

__init__(node=None, concepts=None, attributes=None)¶

attributes: Tuple[AttributeGraphNode] = None¶: The AMR attribute nodes of the sentence.

property concept_by_variable: Dict[str, ConceptGraphNode]¶

concepts: Tuple[ConceptGraphNode] = None¶: The AMR concept nodes of the sentence.

node: SentenceGraphNode = None¶: The sentence node, which is the root of the sentence subgraph.

class zensols.calamr.dcomp.SentenceIndex(entries=None)[source]¶

Bases: Dictable

An index of the sentences of a DocumentGraphComponent.

__init__(entries=None)¶

property by_sentence: Dict[AmrFeatureSentence, SentenceEntry]¶

entries: Tuple[SentenceEntry] = None¶: The entries of the index, each of which is a sentence.

zensols.calamr.doc module¶

Document based graph container, factory and strategy classes.

class zensols.calamr.doc.DocumentGraph(graph, name, graph_attrib_context, doc, components, children=<factory>)[source]¶

Bases: GraphComponent

A graph containing the text, text features, AMR Penman graph and igraph.

This class roughly follows a GoF composite pattern with children a collection of instance of this class, which are the reversed source and summary graphs created for the max flow algorithm. The root is constructed from the DocumentGraphFactory class and the children are built by the DocumentGraphController instances.

The children of this composite are not to be confused with components, which are the disconnected source and summary graph components in the root graph instance. Each child also has the reversed flow graphs, but they are connected as a bipartite flow graph for use by the max flow algorithm.

__init__(graph, name, graph_attrib_context, doc, components, children=<factory>)¶

add_child(child)[source]¶

Add a child graph.

See:: children

property bipartite_relation_set: RelationSet¶: The bipartite relations that span components. This set includes all top level relations that are not self contained in any components.

children: Dict[str, DocumentGraph]¶: The children of this instance, which for now, are only instances of FlowDocumentGraph.

clone(reverse_edges=False, deep=True, **kwargs)[source]¶

Clone an instance and return it. The graph is deep copied, but all GraphAttribute instances are not.

Parameters:

reverse_edges (bool) – whether to reverse the direction of edges in the directed graph, which is used to create the reverse flow graphs used in the maxflow algorithm
deep (bool) – whether to create a deep clone, which detaches a component from any bipartite graph by keeping only the graph composed of vs; otherwise the graph is copied as a subcomponent starting from root
kwargs – arguments to add as attributes to the clone; if included, cls is the type of the new instance

Return type:

GraphComponent

Returns:

the cloned instance of this instance

component_iter()[source]¶

Return an iterable of the components of this graph and recursively over the children.

Return type:: Iterable[GraphComponent]
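
The composite traversal can be pictured with a toy class. ToyDocumentGraph is hypothetical, components are plain strings, and the order (own components first, then each child's) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass
class ToyDocumentGraph:
    """Toy composite: a graph with components and named child graphs."""
    name: str
    components: Tuple[str, ...]
    children: Dict[str, 'ToyDocumentGraph'] = field(default_factory=dict)

    def add_child(self, child: 'ToyDocumentGraph') -> None:
        self.children[child.name] = child

    def component_iter(self) -> Iterable[str]:
        # this graph's components first, then recurse into the children
        yield from self.components
        for child in self.children.values():
            yield from child.component_iter()


root = ToyDocumentGraph('nascent', ('source', 'summary'))
root.add_child(ToyDocumentGraph('reversed_source', ('rs-source', 'rs-summary')))
root.add_child(ToyDocumentGraph('reversed_summary', ('rm-source', 'rm-summary')))
names = list(root.component_iter())
print(names)
```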

components: Tuple[DocumentGraphComponent, ...]¶: The roots of the trees created by the DocumentGraphFactory.

property components_by_name: Dict[str, DocumentGraphComponent]¶: Get document graph components by name.

property components_by_name_sorted: Tuple[Tuple[str, DocumentGraphComponent], ...]¶: Get document graph components sorted by name.

deallocate()[source]¶: Deallocate all resources for this instance.

delete_edges(edges, include_nodes=False)[source]¶

Remove edges from the graph.

Parameters:

edges (Iterable[GraphEdge]) – the edges to remove from the graph
include_nodes (bool) – whether to remove nodes that become orphans after deleting edges

Returns:

a tuple of the edges and nodes removed

doc: AmrFeatureDocument¶: The document that represents the graph.

get_containing_component(n)[source]¶

Return the component that contains graph node n.

Return type:: DocumentGraphComponent

property graph_attrib_context: GraphAttributeContext¶: The context given to all nodes and edges of the graph.

name: str¶: The name of the graph used to identify it. For now, this is only reversed_source for the graph that flows from the summary to the source, and reversed_summary for the graph that flows from the source to the summary. These are “reversed” because the flow is reversed from the leaf nodes to the root.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶

Write the contents of this instance to writer using indentation depth.

Parameters:

depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable

class zensols.calamr.doc.DocumentGraphDecorator[source]¶

Bases: ABC

A strategy to create a graph from a document structure.

__init__()¶

abstract decorate(component)[source]¶

Creates the graph from a DocumentNode root node.

Parameters:: component (DocumentGraphComponent) – the graph to populate from the decorating process

class zensols.calamr.doc.DocumentGraphFactory(config_factory, graph_decorators, doc_graph_section_name, graph_attrib_context)[source]¶

Bases: ABC

Creates a document graph. After the document portion of the graph is created, the igraph is built and merged using a DocumentGraphDecorator. This igraph has the corresponding vertexes and edges associated with the document graph, which includes AMR Penman graph and feature document artifacts.

__init__(config_factory, graph_decorators, doc_graph_section_name, graph_attrib_context)¶

config_factory: ConfigFactory¶: Used to create a DocumentGraphDecorator.

create(root)[source]¶

Create a document graph and return it starting from the root node. See class docs.

Parameters:: root (AmrFeatureDocument) – the feature document from which to create the graph
Return type:: DocumentGraph

doc_graph_section_name: str¶: The name of a section in the configuration that defines new instances of DocumentGraph.

graph_attrib_context: GraphAttributeContext¶: The context given to all nodes and edges of the graph.

graph_decorators: Tuple[DocumentGraphDecorator, ...]¶: The name of the section that defines a DocumentGraphDecorator instance.

zensols.calamr.domain module¶

Classes that organize document content into a hierarchy.

Terminology: token alignments refer to the sentence index based token alignments to AMR nodes. This is not to be confused with alignment edges (aka graph component alignment edges).

exception zensols.calamr.domain.ComponentAlignmentError(msg, sent=None)[source]¶

Bases: AmrError

Package level errors.

__module__ = 'zensols.calamr.domain'¶

class zensols.calamr.domain.ComponentAlignmentFailure(exception=None, thrower=None, traceback=None, message=None)[source]¶

Bases: Failure

Package level failures.

__init__(exception=None, thrower=None, traceback=None, message=None)¶

class zensols.calamr.domain.EmbeddingResource(torch_config, word_piece_doc_parser=None, word_piece_doc_factory=None, roleset_stash=None)[source]¶

Bases: object

Generates embeddings for roles, role sets, text, and feature tokens.

__init__(torch_config, word_piece_doc_parser=None, word_piece_doc_factory=None, roleset_stash=None)¶

get_role_embedding(role)[source]¶

Return an embedding for a role. This uses the role’s relation’s embedding if available. Otherwise, it uses the embedding created from the role’s prefix.

Return type:: Tensor

get_roleset_embedding(roleset_id, roleset)[source]¶

Return the embedding for roleset.

Parameters:

roleset_id (Optional[RolesetId]) – the role’s ID, which is used when roleset is None
roleset (Roleset) – the role set to use for the embedding if available

Return type:

Tensor

get_sentence_tokens_embedding(sent)[source]¶

Return the sentence embeddings of sent.

Return type:: Tensor

get_token_embedding(text)[source]¶

Return the mean of the token embeddings of text.

Return type:: Tensor

get_tokens_embedding(tokens)[source]¶

Return the mean of the embeddings of tokens.

Return type:: Tensor
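
Taking the mean of token embeddings reduces several vectors to one by averaging each dimension. The library returns a Tensor; the sketch below uses plain lists of floats to stay dependency free, and mean_embedding is a hypothetical name.

```python
from typing import List, Sequence


def mean_embedding(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average same-length vectors dimension by dimension."""
    if not token_vectors:
        raise ValueError('At least one token vector is required')
    n = len(token_vectors)
    # zip(*...) groups the values of each dimension together
    return [sum(dim) / n for dim in zip(*token_vectors)]


mean_vec = mean_embedding([[1.0, 2.0], [3.0, 4.0]])
print(mean_vec)
```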

get_word_piece_document(text)[source]¶

Return a word piece document parsed from text.

Return type:: WordPieceFeatureDocument

roleset_stash: Stash = None¶: A stash with RolesetId as keys and Roleset as values.

torch_config: TorchConfig¶: Used to create unknown_edge_embedding

property unknown_edge_embedding: Tensor¶: A zero embedding.

property unknown_node_embedding: Tensor¶: A zero embedding.

word_piece_doc_factory: WordPieceFeatureDocumentFactory = None¶: Creates word piece data structures that have embeddings.

word_piece_doc_parser: FeatureDocumentParser = None¶: Used to get single token embeddings for nodes with no token alignments.

class zensols.calamr.domain.GraphAttribute(context)[source]¶

Bases: PersistableContainer, Dictable

Contains AMR document attribute data added to the igraph.Graph. This is added as vertexes or edge attribute data.

ATTRIB_TYPE: ClassVar[str] = 'base'¶: The attribute type this class represents.

__init__(context)¶

property attrib_type: str¶: The attribute type this class represents.

context: GraphAttributeContext¶: Contains context data used by nodes and edges of the graph.

deallocate()[source]¶: Deallocate all resources for this instance.

property description: str¶: A human readable description that is usually used as the label and __str__().

property embedding: Tensor¶: The default embedding of the attribute. Note that some attributes have several different embeddings.

property embedding_resource: EmbeddingResource¶: Generates embeddings for roles, role sets, text, and feature tokens.

property id: int¶: The unqiue identifier for this graph attribute.

property label: str¶: Text used when rendering graphs.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:

depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable

class zensols.calamr.domain.GraphAttributeContext(embedding_resource, relation_stash, default_capacity, sink_capacity, component_alignment_capacity, doc_capacity, similarity_threshold, default_format_strlen)[source]¶

Bases: Dictable

Contains context data used by nodes and edges of the graph.

__init__(embedding_resource, relation_stash, default_capacity, sink_capacity, component_alignment_capacity, doc_capacity, similarity_threshold, default_format_strlen)¶

component_alignment_capacity: float¶: The default initial capacity for source/summary component alignment edges.

default_capacity: float¶: The default initial capacity for inter-AMR edges.

default_format_strlen: int¶: The default capacity and flow string formatting label length.

doc_capacity: float¶: The bipartitie (between source and summary) capacity value of DocumentGraphNode.

embedding_resource: EmbeddingResource¶: The manager that contains vectorizers that create node and edge embeddings.

relation_stash: Stash¶: Creates instances of zensols.propbankdb.domain.Relation.

reset_attrib_id()[source]¶: Reset the unique attribute ID counter.

similarity_threshold: float¶: The similarity threshold (in the range [0, 1]) necessary to allow component alignment edges from the source to the summary graph.

sink_capacity: float¶: The value to use as the sink terminal node.

to_role(role_str)[source]¶

Return type:: Role

class zensols.calamr.domain.Role(label)[source]¶

Bases: Dictable

Represents an AMR role, which is a label on the edge of an AMR graph such as :ARG0-of.

__init__(label)[source]¶

index: Optional[int]¶: The index of the role (i.e. 0 in :ARG0-of).

is_inverted: bool¶: True if the role is inverted (i.e. has of in :ARG0-of).

label: str¶: The surface name of the role (i.e. :ARG0-of).

prefix: str¶: The prefix of the role (i.e. ARG in :ARG0-of).

relation: Optional[Relation]¶: The relation metadata, which has the same label as this role.
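
How a label such as :ARG0-of breaks down into prefix, index and inversion can be sketched with a regular expression. The pattern and the parse_role name are assumptions for illustration and not the parsing the Role class performs.

```python
import re
from typing import Optional, Tuple

# optional colon, alphabetic prefix, optional numeric index, optional "-of"
_ROLE_REGEX = re.compile(r'^:?([A-Za-z]+)(\d+)?(-of)?$')


def parse_role(label: str) -> Tuple[str, Optional[int], bool]:
    """Return (prefix, index, is_inverted) for a role label."""
    match = _ROLE_REGEX.match(label)
    if match is None:
        raise ValueError(f'Not a role label: {label}')
    prefix, index, inverted = match.groups()
    # caveat: roles that naturally end in "-of" (e.g. :consist-of) are not
    # truly inverted, but this simple pattern treats them as such
    return prefix, (int(index) if index is not None else None), inverted is not None


print(parse_role(':ARG0-of'))
print(parse_role(':mod'))
```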

zensols.calamr.flow module¶

Provides container classes and computes statistics for graph alignments.

class zensols.calamr.flow.Flow(source, target, edge)[source]¶

Bases: Dictable

A triple of a source node, target node and connecting edge from the graph. The connecting edge has a flow value associated with it.

__init__(source, target, edge)¶

edge: GraphEdge¶: The edge that connects source and target.

property edge_type: str¶: Whether the edge is an AMR role or alignment.

source: GraphNode¶: The starting node in the DAG.

target: GraphNode¶: The ending node (arrow head) in the DAG.

property to_row: List[Any]¶: Create a row from the data in this flow used in FlowDocumentGraphComponent.create_df().

class zensols.calamr.flow.FlowDocumentGraph(graph, name, graph_attrib_context, doc, components, children=<factory>)[source]¶

Bases: DocumentGraph

Contains all the flows of a DocumentGraph and has FlowDocumentGraphComponent as components. Instances of this document graph have no children.

__init__(graph, name, graph_attrib_context, doc, components, children=<factory>)¶

class zensols.calamr.flow.FlowDocumentGraphComponent(graph, root_node, sent_index=<factory>, description=None, reentrancy_set=None)[source]¶

Bases: DocumentGraphComponent

Contains all the flows of a DocumentComponent.

__init__(graph, root_node, sent_index=<factory>, description=None, reentrancy_set=None)¶

property flows: Tuple[Flow, ...]¶: The flows aggregated from the document components.

reentrancy_set: ReentrancySet = None¶: Concept nodes with multiple parents.

property result: FlowGraphComponentResult¶: The flow results for this component.

property root_flow: Flow¶: The root flow of the document component, which has the component’s DocumentGraphNode as the source node and the sink as the target node.

property root_flow_value: float¶: The flow from the root node to the sink in the reversed graph.

class zensols.calamr.flow.FlowGraphComponentResult(component)[source]¶

Bases: Dictable

A container class for the flow data from a DocumentComponent flow instance (aka reverse flow graph). This includes the data as dictionaries of statistics, pandas.DataFrame and DataDescriber instances.

__init__(component)[source]¶

property connected_stats: Dict[str, int | float]¶

The statistics on how well the two graphs are aligned, counted as:

alignable: the number of nodes that are eligible for having an alignment (i.e. sentence, concept, and attribute nodes)
aligned: the number of aligned nodes in the FlowDocumentGraphComponent this instance holds
aligned_portion: the quotient aligned / alignable, which is a number in [0, 1] representing a score of how well the two graphs match
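
The three statistics relate as in this small sketch. The function name and the zero-division guard are assumptions; only the key names and the quotient come from the description above.

```python
from typing import Dict, Union


def connected_stats(alignable: int, aligned: int) -> Dict[str, Union[int, float]]:
    """Compute the aligned portion from the two counts."""
    # guard against a component with no alignable nodes
    portion = aligned / alignable if alignable > 0 else 0.0
    return {'alignable': alignable,
            'aligned': aligned,
            'aligned_portion': portion}


stats = connected_stats(alignable=8, aligned=6)
print(stats)
```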

create_data_frame_describer()[source]¶

Like create_align_df() but includes a human readable description of the data.

Return type:: DataFrameDescriber

property df: pd.DataFrame¶

The data in flows and root as a dataframe. Note the terms source and target refer to the nodes at the ends of the directed edge in a reversed graph.

s_descr: source node descriptions such as concept names, attribute constants and sentence text

t_descr: target node of s_descr

s_toks: any source node aligned tokens

t_toks: any target node aligned tokens

s_attr: source node attribute name given by GraphAttribute.attrib_type, such as doc, sentence, concept, attribute

t_attr: target node of s_attr

s_id: source node igraph ID

t_id: target node igraph ID

edge_type: whether the edge is an AMR role or alignment

rel_id: the coreference relation ID or null if the edge is not a coreference

is_bipartite: whether relation rel_id spans components or null if the edge is not a coreference

flow: the (normalized/flow per node) flow of the edge

reentrancy: whether the edge participates in an AMR reentrancy

align_flow: the flow sum of the alignment edges for the respective edge

align_count: the count of incoming alignment edges to the target node in the FlowDocumentGraphComponent this instance holds

property edge_counts: Dict[Type[GraphEdge], int]¶: The number of edges by type.

property n_alignable_nodes: int¶: The number of nodes in the component that can take alignment edges. Whether those nodes in the count have edges does not affect the result.

property node_counts: Dict[Type[GraphNode], int]¶: The number of nodes by type.

property stats: Dict[str, Any]¶

All statistics/scores available for this instance, which include:

root_flow: the flow from the root node to the sink
connected: connected_stats

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:

depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable

class zensols.calamr.flow.FlowGraphResult(component_paths, context, data)[source]¶

Bases: PersistableContainer, Dictable

A container class for flow document results, which include the detailed data as dictionaries of statistics, pandas.DataFrame and DataDescriber instances. This is aggregated from doc_graph and the flow children’s flow graph components.

All graphs (from nascent to the reversed flow children graphs) have the final state of the actions of the DocumentGraphController as coordinated by the GraphSequencer. Since the flows are copied from the reversed source graph to the root level (doc_graph) factory built nascent graph, all flows are the same. However, the nascent graph will still be the disconnected source and summary graphs.

__init__(component_paths, context, data)[source]¶

Initialize the flow results.

Parameters:

data (Union[DocumentGraph, ComponentAlignmentFailure]) – the root nascent graph built by DocumentGraphFactory, or an instance of ComponentAlignmentFailure if the alignment failed
component_paths (Tuple[Tuple[str, str], ...]) – a set of paths that indicate which flow components to use for the results in the form (<child name>, <component name>)

create_data_describer()[source]¶

Like create_align_df() but includes a human readable description of the data.

Throws ComponentAlignmentError:: if this instance resulted in an error
See:: is_error
Return type:: DataDescriber

deallocate()[source]¶: Deallocate all resources for this instance.

property df: pd.DataFrame¶

A concatenation of frames created with FlowDocumentGraphComponent.create_align_df() with the name of each component.

Throws ComponentAlignmentError:: if this instance resulted in an error
See:: is_error

property doc_graph: DocumentGraph¶

The root nascent graph built by DocumentGraphFactory.

Throws ComponentAlignmentError:: if this instance resulted in an error
See:: is_error

property failure: ComponentAlignmentFailure | None¶: What caused the alignment to fail, or None if it was a success.

get_render_contexts(child_names=None, include_nascent=False)[source]¶

Get contexts used to render the graphs with render.base.rendergroup.

Parameters:

child_names (Iterable[str]) – the name of the DocumentGraph.children to render, which defaults to the nascent graph and the final bipartite graph rendered (“restore previous flow on source”)
include_nascent (bool) – whether to include the nascent graphs

Return type:

List[RenderContext]

property is_error: bool¶: Whether the graph resulted in an error.
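
Several properties above throw ComponentAlignmentError when the alignment failed, so callers are expected to check is_error (or failure) first. The following is a minimal sketch of that guard pattern using stand-in classes. Result, Failure and AlignmentError are hypothetical names that only mimic the documented behavior; they are not the library's classes.

```python
from dataclasses import dataclass
from typing import Any, Optional


class AlignmentError(Exception):
    """Hypothetical stand-in for ComponentAlignmentError."""


@dataclass
class Failure:
    """Hypothetical stand-in for ComponentAlignmentFailure."""
    message: str


class Result:
    """Holds either the aligned data or the failure that prevented it."""

    def __init__(self, data: Any):
        self._data = data

    @property
    def is_error(self) -> bool:
        return isinstance(self._data, Failure)

    @property
    def failure(self) -> Optional[Failure]:
        return self._data if self.is_error else None

    @property
    def doc_graph(self) -> Any:
        # Accessing the data of a failed result raises, as documented above.
        if self.is_error:
            raise AlignmentError(self._data.message)
        return self._data


def describe(res: Result) -> str:
    """Check is_error before touching properties that may raise."""
    if res.is_error:
        return f'failed: {res.failure.message}'
    return f'aligned: {res.doc_graph}'
```

Checking is_error up front keeps the failure path explicit and avoids wrapping every property access in a try block.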

render(contexts=None, graph_id='graph', display=True, directory=None)[source]¶

Render several graphs at a time, then optionally display them.

Parameters:

contexts (Tuple[RenderContext]) – the data to render, which defaults to the output of get_render_contexts()
graph_id (str) – a unique identifier prefixed to the generated files if none is provided in the call method
display (bool) – whether to display the files after they are generated
directory (Path) – the directory to create the files in place of the temporary directory; if provided the directory is not removed after the graphs are rendered

property stats: Dict[str, Any]¶

The statistics with keys as component names and values taken from FlowDocumentGraphComponent.stats.

Throws ComponentAlignmentError:: if this instance resulted in an error
See:: is_error

property stats_df: pd.DataFrame¶: A Pandas dataframe version of stats.

write(depth=0, writer=sys.stdout, include_doc=True)[source]¶

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:

depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
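
The depth and writer parameters follow a common recursive pretty-printing pattern: each nesting level is written one indentation step deeper to the same writer. The sketch below shows the idea with plain dictionaries. The write_dict function and the four-space indent are assumptions made for illustration and are not taken from the library.

```python
import io
import sys
from typing import Any, Dict, TextIO


def write_dict(data: Dict[str, Any], depth: int = 0,
               writer: TextIO = sys.stdout, indent: int = 4) -> None:
    """Write nested dictionaries, indenting one step per nesting level."""
    pad = ' ' * (depth * indent)
    for key, value in data.items():
        if isinstance(value, dict):
            writer.write(f'{pad}{key}:\n')
            # Recurse with a deeper starting indentation and the same writer.
            write_dict(value, depth + 1, writer, indent)
        else:
            writer.write(f'{pad}{key}: {value}\n')


# Capture the output in a string instead of printing to standard out.
buf = io.StringIO()
write_dict({'source': {'flow': 1.5}, 'name': 'example'}, depth=1, writer=buf)
text = buf.getvalue()
```

Passing an io.StringIO as the writer is a convenient way to capture the output for logging or tests.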

zensols.calamr.flowmeta module¶

Metadata for the flow module.

zensols.calamr.morph module¶

Populate an igraph from AMR graphs.

class zensols.calamr.morph.IsomorphDocumentGraphDecorator(config_factory, graph_attrib_context_name)[source]¶

Bases: DocumentGraphDecorator

Populates igraph.Graph attributes from DocumentGraph data by adding AMR node and edge information.

__init__(config_factory, graph_attrib_context_name)¶

config_factory: ConfigFactory¶: The configuration factory used to create a GraphAttributeContext.

decorate(comp)[source]¶

Creates the graph from a DocumentNode root node.

Parameters:: comp – the graph to populate from the decorating process

graph_attrib_context_name: str¶: The section name of the GraphAttributeContext context given to all nodes and edges of the graph.

zensols.calamr.proto module¶

Prototyping and cookbook.

zensols.calamr.reentrancy module¶

Reentrancy container classes.

class zensols.calamr.reentrancy.EdgeFlow(edge, flow=None)[source]¶

Bases: PersistableContainer, Dictable

The flow over a graph edge. This keeps the flow of the edge as a “snapshot” of the value at a particular point in the algorithm, before it is modified to fix the issue.

__init__(edge, flow=None)¶

edge: GraphEdge¶: The outgoing (in the reverse graph) edge of the reentrancy.

flow: float = None¶: The flow value at the time of the algorithm.

class zensols.calamr.reentrancy.Reentrancy(concept_node, concept_node_vertex, edge_flows)[source]¶

Bases: PersistableContainer, Dictable

Reentrancies are concept nodes with multiple parents (in the forward graph) and have side effects when running the algorithm.

Note: an AMR (always acyclic) graph with no reentrancies is a tree.
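
To make the definition concrete, the sketch below finds the nodes with more than one parent in a small directed graph given as (parent, child) pairs. It is an illustration under stated assumptions and not the library's detection code: the find_reentrant_nodes helper and the node labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_reentrant_nodes(
        edges: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each node that has more than one parent to its parents.

    Edges are (parent, child) pairs of a directed acyclic graph.
    """
    parents: Dict[str, List[str]] = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)
    return {node: ps for node, ps in parents.items() if len(ps) > 1}


# "The boy wants to go": boy is an argument of both want and go.
edges = [('want', 'boy'), ('want', 'go'), ('go', 'boy')]
reentrant = find_reentrant_nodes(edges)

# Removing the second edge into boy leaves a tree with no reentrancies.
tree_edges = [('want', 'boy'), ('want', 'go')]
no_reentrancies = find_reentrant_nodes(tree_edges)
```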

__init__(concept_node, concept_node_vertex, edge_flows)¶

concept_node: ConceptGraphNode¶: The concept node of the reentrancy

concept_node_vertex: int¶: The igraph.Vertex.index associated with the node.

edge_flows: Tuple[EdgeFlow]¶: The outgoing edges connected to the reentrant concept_node.

property has_zero_flow: bool¶: Whether the reentrancy has any edges with no flow.

property total_flow: float¶: The total flow of all outgoing (in the reverse graph) edges.

write(depth=0, writer=sys.stdout)[source]¶

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:

depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable

property zero_flows: Tuple[GraphEdge]¶: The edges that have no flow.
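
The relationship between edge_flows, total_flow, zero_flows and has_zero_flow can be sketched with two small dataclasses. These are stand-ins, not the library's classes: EdgeFlowSketch and ReentrancySketch are hypothetical names, edges are plain strings, and treating a flow of None as zero is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EdgeFlowSketch:
    """Snapshot of the flow over one edge (edge is a stand-in string)."""
    edge: str
    flow: Optional[float] = None


@dataclass(frozen=True)
class ReentrancySketch:
    """A node with several parents and the flows on its connecting edges."""
    concept_node_vertex: int
    edge_flows: Tuple[EdgeFlowSketch, ...]

    @property
    def total_flow(self) -> float:
        # Assumption: an unset flow (None) counts as zero.
        return sum(ef.flow or 0.0 for ef in self.edge_flows)

    @property
    def zero_flows(self) -> Tuple[str, ...]:
        return tuple(ef.edge for ef in self.edge_flows if not ef.flow)

    @property
    def has_zero_flow(self) -> bool:
        return len(self.zero_flows) > 0


reentrancy = ReentrancySketch(
    concept_node_vertex=7,
    edge_flows=(EdgeFlowSketch('e1', 2.5), EdgeFlowSketch('e2', 0.0)))
```

Here one of the two edges carries all of the flow, so the second edge is reported by zero_flows.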

class zensols.calamr.reentrancy.ReentrancySet(reentrancies=())[source]¶

Bases: PersistableContainer, Dictable

A set of reentrancies, one for each iteration of the algorithm.

__init__(reentrancies=())¶

property by_vertex: Dict[int, Reentrancy]¶

classmethod combine(sets)[source]¶

Combine by adding reentrancy links for all in sets.

Return type:: ReentrancySet

reentrancies: Tuple[Reentrancy] = ()¶: Concept nodes with multiple parents.

property stats: Dict[str, Any]¶: Get the stats for this set of reentrancies.
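
The by_vertex index and the combine class method can be sketched as follows. This is one plausible reading of the description above, not the library's implementation: Entry and EntrySet are hypothetical stand-ins, and combining by concatenating the entries of every set is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Entry:
    """Stand-in for a reentrancy, identified by its vertex index."""
    vertex: int
    label: str


@dataclass(frozen=True)
class EntrySet:
    """Stand-in for a set of reentrancies from one iteration."""
    entries: Tuple[Entry, ...] = ()

    @property
    def by_vertex(self) -> Dict[int, Entry]:
        # If a vertex occurs more than once, the last entry wins.
        return {e.vertex: e for e in self.entries}

    @classmethod
    def combine(cls, sets: Iterable['EntrySet']) -> 'EntrySet':
        # Assumption: combining concatenates the entries of every set.
        return cls(tuple(e for s in sets for e in s.entries))


first = EntrySet((Entry(1, 'a'),))
second = EntrySet((Entry(2, 'b'), Entry(3, 'c')))
combined = EntrySet.combine([first, second])
```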

zensols.calamr.score module¶

Produces CALAMR scores.

class zensols.calamr.score.CalamrScore(flow_graph_res)[source]¶

Bases: Score

Contains all CALAMR scores.

NAN_INSTANCE = CalamrScore()¶

__init__(flow_graph_res)¶

flow_graph_res: FlowGraphResult¶

class zensols.calamr.score.CalamrScoreMethod(reverse_sents=False, word_piece_doc_factory=None, doc_graph_factory=None, doc_graph_aligner=None)[source]¶

Bases: ScoreMethod

Computes the smatch scores of AMR sentences. Sentence pairs are ordered (<summary>, <source>).

__init__(reverse_sents=False, word_piece_doc_factory=None, doc_graph_factory=None, doc_graph_aligner=None)¶

clear()[source]¶

doc_graph_aligner: DocumentGraphAligner = None¶: Aligns document graphs.

doc_graph_factory: DocumentGraphFactory = None¶: Create document graphs.

score_annotated_doc(doc)[source]¶

Score a document that has an amr of type AnnotatedAmrDocument.

Raises:: [zensols.amr.domain.AmrError]: if the AMR could not be parsed or aligned
Return type:: CalamrScore

score_pair(smy, src)[source]¶

Return type:: CalamrScore

word_piece_doc_factory: WordPieceFeatureDocumentFactory = None¶: The feature document factory that populates embeddings.

zensols.calamr.stash module¶

Alignment dataframe stash.

class zensols.calamr.stash.FlowGraphRestoreStash(delegate, flow_graph_result_context)[source]¶

Bases: DelegateStash, PrimeableStash

A stash that restores transient data on FlowGraphResult instances.

__init__(delegate, flow_graph_result_context)¶

exists(name)[source]¶

Return True if data with key name exists.

Implementation note: This Stash.exists() method is very inefficient and should be overridden.

Return type:: bool

flow_graph_result_context: _FlowGraphResultContext¶: Contains in-memory/interpreter session data needed by FlowGraphResult when it is created or unpickled.

get(name, default=None)[source]¶

Load an object or a default if key name doesn’t exist.

Return type:: FlowGraphResult

load(name)[source]¶

See:: get()
Return type:: FlowGraphResult
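
The idea of a delegating stash that restores transient data can be sketched without the library. In the sketch below the delegate is a plain dictionary and the restored data is a single context value. RestoringStashSketch and its attributes are hypothetical, and the real class may restore its data differently.

```python
from typing import Any, Dict, Optional


class RestoringStashSketch:
    """Wrap a delegate store and re-attach session data on every load."""

    def __init__(self, delegate: Dict[str, Dict[str, Any]], context: Any):
        self.delegate = delegate
        self.context = context

    def exists(self, name: str) -> bool:
        # Ask the delegate directly rather than loading the item.
        return name in self.delegate

    def load(self, name: str) -> Optional[Dict[str, Any]]:
        item = self.delegate.get(name)
        if item is not None:
            # Transient data is not persisted, so restore it here.
            item['context'] = self.context
        return item

    def get(self, name: str, default: Any = None) -> Any:
        item = self.load(name)
        return default if item is None else item


stash = RestoringStashSketch({'doc1': {'value': 1}}, context='session data')
restored = stash.get('doc1')
missing = stash.get('doc2', default='not found')
```

Restoring on load means the persisted form stays small, while every caller still receives a fully usable object.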

class zensols.calamr.stash.FlowGraphResultFactoryStash(anon_doc_stash, doc_graph_aligner, doc_graph_factory, limit=9223372036854775807)[source]¶

Bases: ReadOnlyStash, PrimeableStash

A factory stash that creates aligned FlowGraphResult instances or ComponentAlignmentFailure when the document cannot be aligned.

__init__(anon_doc_stash, doc_graph_aligner, doc_graph_factory, limit=9223372036854775807)¶

anon_doc_stash: Stash¶: Contains human annotated AMRs.

doc_graph_aligner: DocumentGraphAligner¶: Aligns document graphs.

doc_graph_factory: DocumentGraphFactory¶: Create document graphs.

exists(name)[source]¶

Return True if data with key name exists.

Implementation note: This Stash.exists() method is very inefficient and should be overridden.

Return type:: bool

keys(**kwargs) → Iterable[str]¶

Return an iterable of keys in the collection.

Return type:: Iterable[str]

limit: int = 9223372036854775807¶: The limit of the number of items to create.

load(name)[source]¶

See:: get()
Return type:: FlowGraphResult

prime()[source]¶
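
A read-only factory stash creates its items on demand, caps how many keys it exposes with limit, and returns a failure object instead of raising when an item cannot be created. The sketch below illustrates that shape only. FactoryStashSketch, the align callable and the use of ValueError as the failure type are all assumptions, not the library's behavior.

```python
import itertools
import sys
from typing import Callable, Dict, Iterable, Union


class FactoryStashSketch:
    """Read-only store whose items are created from source documents."""

    def __init__(self, docs: Dict[str, str], align: Callable[[str], str],
                 limit: int = sys.maxsize):
        self.docs = docs
        self.align = align
        self.limit = limit

    def keys(self) -> Iterable[str]:
        # Expose at most `limit` keys.
        return itertools.islice(self.docs.keys(), self.limit)

    def exists(self, name: str) -> bool:
        return name in self.docs

    def load(self, name: str) -> Union[str, ValueError]:
        try:
            return self.align(self.docs[name])
        except ValueError as e:
            # Return the failure as data instead of propagating it.
            return e


def align(text: str) -> str:
    """Hypothetical alignment step that fails on empty input."""
    if not text:
        raise ValueError('nothing to align')
    return text.upper()


stash = FactoryStashSketch({'a': 'x', 'b': '', 'c': 'z'}, align, limit=2)
keys = list(stash.keys())
ok = stash.load('a')
failed = stash.load('b')
```

Returning the failure as a value lets a batch run continue past documents that cannot be aligned and report them afterward.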

Module contents¶