zensols.calamr package
Subpackages
- zensols.calamr.render namespace
- zensols.calamr.summary namespace
  - Submodules
    - zensols.calamr.summary.alignconst module
      - ReverseFlowGraphAlignmentConstructor
      - SharedGraphAlignmentConstructor
      - SummaryGraphAlignmentConstructor
        - SummaryGraphAlignmentConstructor.__init__()
        - SummaryGraphAlignmentConstructor.build()
        - SummaryGraphAlignmentConstructor.capacity_calculator
        - SummaryGraphAlignmentConstructor.component_alignment_capacities
        - SummaryGraphAlignmentConstructor.connections
        - SummaryGraphAlignmentConstructor.find_flow_diffs()
        - SummaryGraphAlignmentConstructor.source
        - SummaryGraphAlignmentConstructor.summary
    - zensols.calamr.summary.capacity module
    - zensols.calamr.summary.coref module
    - zensols.calamr.summary.factory module
Submodules
zensols.calamr.alignconst module
Defines a class that aligns components of a (bipartite) graph.
- class zensols.calamr.alignconst.GraphAlignmentConstructor(doc_graph=None, source_flow=10000000000.0)
Bases: object
Adds nodes and edges that allow the maxflow algorithm to be run on the graph. These include component alignment edges, a source node and a sink node. It also sets the capacities of the component alignment edges.
- __init__(doc_graph=None, source_flow=10000000000.0)
- add_edges(capacities, cls=<class 'zensols.calamr.attr.ComponentAlignmentGraphEdge'>)
Add capacities as graph capacities to the graph.
- property doc_graph: DocumentGraph
A document graph that contains the graph to be aligned.
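GraphAlignmentConstructor adds source and sink terminals so that a maxflow can be computed over the graph. The stand-alone sketch below illustrates that idea on a toy bipartite graph. It does not use calamr or igraph. The helper names (`add_terminals`, `edmonds_karp`), the node names and the capacities are all hypothetical. The large terminal capacity mirrors the `source_flow` default above.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical toy representation: (from_node, to_node) -> capacity.
Capacities = Dict[Tuple[str, str], float]

# Large finite stand-in for "unbounded", like the source_flow default.
SOURCE_FLOW = 10_000_000_000.0


def add_terminals(caps: Capacities, sources: Iterable[str],
                  sinks: Iterable[str]) -> Capacities:
    """Return a copy of caps with a source 's' feeding every node in
    sources and every node in sinks draining into a sink 't'."""
    out = dict(caps)
    for node in sources:
        out[('s', node)] = SOURCE_FLOW
    for node in sinks:
        out[(node, 't')] = SOURCE_FLOW
    return out


def edmonds_karp(caps: Capacities, s: str, t: str) -> float:
    """Maximum s-t flow using shortest (BFS) augmenting paths."""
    residual: Dict[str, Dict[str, float]] = {}
    for (u, v), cap in caps.items():
        residual.setdefault(u, {}).setdefault(v, 0.0)
        residual[u][v] += cap
        residual.setdefault(v, {}).setdefault(u, 0.0)  # reverse edge
    total = 0.0
    while True:
        # breadth-first search for an augmenting path
        parent: Dict[str, Optional[str]] = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        # bottleneck capacity along the path that was found
        bottleneck = float('inf')
        v = t
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        # push the bottleneck flow and update the residual capacities
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# two source-side nodes aligned to one summary-side node
toy_caps = {('a1', 'b1'): 0.5, ('a2', 'b1'): 0.25}
print(edmonds_karp(add_terminals(toy_caps, ['a1', 'a2'], ['b1']), 's', 't'))  # 0.75
```

The terminal edges never become the bottleneck. The maxflow is therefore limited only by the alignment edge capacities.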
zensols.calamr.aligner module
Contains classes that run the algorithm to compute graph component alignments.
- class zensols.calamr.aligner.DocumentGraphAligner(config_factory, flow_graph_result_name, doc_graph_name, renderer, render_level, init_loops_render_level, output_dir)
Bases: ABC
Aligns the graph components of doc_graph and visualizes them with renderer.
- MAX_RENDER_LEVEL: ClassVar[int] = 10
The maximum value for render_level.
- __init__(config_factory, flow_graph_result_name, doc_graph_name, renderer, render_level, init_loops_render_level, output_dir)
- align(doc_graph)
Align the graph components of doc_graph and optionally visualize them with renderer. To disable rendering, set render_level to 0.
- Parameters:
doc_graph (DocumentGraph) – the graph created by the DocumentGraphFactory
- Return type:
- Returns:
the alignments, available as the in-memory graph and object graph, Pandas dataframes, statistics and scores
- config_factory: ConfigFactory
Used to create the GraphSequencer instance.
- create_error_result(ex, msg='Could not align')
Create an error graph result instead of an alignment result. Call this from an exception handler (a try/except block) so the error information is available.
- Parameters:
ex (Exception) – the exception that caused the issue
msg (str) – the error message for the failure
- Return type:
- doc_graph_name: str
The name (DocumentGraph.name) of the document graph returned from align().
- flow_graph_result_name: str
The app configuration section name of FlowGraphResult.
- init_loops_render_level: int
The render_level to use for every iteration loop except the last one before the algorithm converges.
- classmethod is_valid_render_level(render_level, should_raise=False)
Return whether render_level is a valid value for the render_level attribute (see MAX_RENDER_LEVEL).
- Return type:
- output_dir: Path
If this is set, the graphs are written to this directory, which is created on the file system. Otherwise, they are displayed and cleaned up afterward.
- render_level: int
How many graphs to render, on a scale from 0 to 10. The higher the number, the more likely a graph is to be rendered. A value of 0 prevents rendering and a value of 10 renders all graphs.
- See:
- renderer: GraphRenderer
Visually renders the graph into a human-understandable presentation.
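The render level can be pictured as a threshold. The sketch below checks that a level is within 0 to MAX_RENDER_LEVEL and gates rendering on it. Two parts are assumptions rather than calamr's documented behavior: that `should_raise` raises instead of returning False, and that each graph carries an integer priority from 1 (most important) to 10. The `should_render` helper is hypothetical.

```python
MAX_RENDER_LEVEL = 10  # mirrors DocumentGraphAligner.MAX_RENDER_LEVEL


def is_valid_render_level(render_level: int, should_raise: bool = False) -> bool:
    """Return whether render_level lies in [0, MAX_RENDER_LEVEL]; if
    should_raise is set, raise instead of returning False (assumed)."""
    valid = isinstance(render_level, int) and 0 <= render_level <= MAX_RENDER_LEVEL
    if not valid and should_raise:
        raise ValueError(
            f'render_level must be between 0 and {MAX_RENDER_LEVEL}: {render_level}')
    return valid


def should_render(render_level: int, graph_priority: int) -> bool:
    """Hypothetical gate: a graph with priority p (1 = most important) is
    rendered when render_level >= p, so 0 renders nothing and 10 renders all."""
    is_valid_render_level(render_level, should_raise=True)
    return render_level >= graph_priority
```

With this scheme a higher level admits more graphs. That matches the description of render_level above.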
- class zensols.calamr.aligner.DocumentGraphController(name)
Bases: Dictable
Executes the maxflow/min cut algorithm on a document graph.
- __init__(name)
- invoke(doc_graph)
Perform operations on the graph as part of the alignment algorithm.
- Parameters:
doc_graph (DocumentGraph) – the graph to edit
- Return type:
- Returns:
the number of edits made to the graph
- class zensols.calamr.aligner.GraphIteration(sequence, render_level, updates)
Bases: Dictable
An iteration of the alignment algorithm.
- __init__(sequence, render_level, updates)
- render_level: int
Whether to render graphs, on a scale from 0 to 10. The higher the number, the more likely a graph is to be rendered: 0 never renders the graph and 10 always renders it.
- reset()
Reset all state in application context shared objects so new data is forced to be created on the next alignment request.
- sequence: GraphSequence
The sequence to use for this iteration.
- class zensols.calamr.aligner.GraphSequence(name, process_name, render_name, heading, controller, sequencer)
Bases: Dictable
A strategy (GoF pattern) that models what to do during a sequence of graph modifications using DocumentGraphController. It also contains rendering information for visualization.
- __init__(name, process_name, render_name, heading, controller, sequencer)
- controller: Optional[DocumentGraphController]
The controller used in the invocation of this strategy.
- invoke()
Invoke the strategy. This implementation calls the controller with process_graph and returns the update count.
- Return type:
- populate_render_context(context)
Allows the sequence to override the parameters before they are sent to the graph rendering API.
- property process_graph: DocumentGraph
The graph provided to the graph controller.
- process_name: str
The name of the graph provided to the graph controller. See process_graph.
- property render_graph
The graph to render.
- render_name: str
The name of the graph to render. See render_graph.
- reset()
Reset all state in application context shared objects so new data is forced to be created on the next alignment request.
- sequencer: GraphSequencer
Owns and controls this instance.
- class zensols.calamr.aligner.GraphSequencer(config_factory, sequence_path, nascent_graph, render, descriptor=None, heading_format=None)
Bases: object
Invokes the GraphSequence objects in the provided sequence to automate the graph alignment algorithm. It is used by MaxflowDocumentGraphAligner.
- __init__(config_factory, sequence_path, nascent_graph, render, descriptor=None, heading_format=None)
Initialize this instance.
- Parameters:
config_factory (ConfigFactory) – used to create the controller in the sequence instances
sequence_path (Path) – the path to the JSON file that has the sequences’ configuration
nascent_graph (DocumentGraph) – the initial disconnected graph created by DocumentGraphFactory
render (rendergroup) – the render object created by base.rendergroup
- property render_level: int
Whether to render graphs, on a scale from 0 to 10. See DocumentGraphAligner.MAX_RENDER_LEVEL.
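The sequencer runs GraphSequence strategies whose controllers report how many edits they made. The loop below is a minimal sketch of that pattern. It assumes convergence means one full pass produces zero updates, and that a cap such as MaxflowDocumentGraphAligner's max_sequencer_iterations bounds the loop. `ToySequence`, `run_sequencer` and `prune_one` are hypothetical stand-ins, and a plain dict stands in for the document graph.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Graph = Dict[str, float]  # toy graph: edge name -> capacity


@dataclass
class ToySequence:
    """Hypothetical stand-in for GraphSequence: a named step whose
    controller edits the graph and returns the number of edits."""
    name: str
    controller: Callable[[Graph], int]

    def invoke(self, graph: Graph) -> int:
        return self.controller(graph)


def run_sequencer(sequences: List[ToySequence], graph: Graph,
                  max_iterations: int) -> int:
    """Invoke each sequence in order per iteration until an iteration makes
    no edits (convergence) or max_iterations is reached; return the number
    of iterations run."""
    for iteration in range(1, max_iterations + 1):
        updates = sum(seq.invoke(graph) for seq in sequences)
        if updates == 0:
            return iteration
    return max_iterations


def prune_one(graph: Graph) -> int:
    """Example controller: remove a single edge with capacity <= 0.1."""
    low = [edge for edge, cap in graph.items() if cap <= 0.1]
    if low:
        del graph[low[0]]
        return 1
    return 0
```

With `{'a': 0.05, 'b': 0.5, 'c': 0.1}`, the first two iterations each prune one edge. The third makes no edits and stops the loop.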
- class zensols.calamr.aligner.MaxflowDocumentGraphAligner(config_factory, flow_graph_result_name, doc_graph_name, renderer, render_level, init_loops_render_level, output_dir, graph_sequencer_name, max_sequencer_iterations, hyp)
Bases: DocumentGraphAligner
Uses the maxflow/min cut algorithm to compute graph component alignments.
- __init__(config_factory, flow_graph_result_name, doc_graph_name, renderer, render_level, init_loops_render_level, output_dir, graph_sequencer_name, max_sequencer_iterations, hyp)
- graph_sequencer_name: str
The app configuration section name of GraphSequencer.
- hyp: HyperparamModel
The capacity calculator hyperparameters.
- See:
summary.CapacityCalculator.hyp
- class zensols.calamr.aligner.RenderUpSideDownGraphSequence(name, process_name, render_name, heading, controller, sequencer)
Bases: GraphSequence
A graph sequence that tells graphviz to render the diagram upside down, which is useful for reverse flow graphs.
- __init__(name, process_name, render_name, heading, controller, sequencer)
zensols.calamr.annotate module
Contains a class to add embeddings to AMR feature documents.
- class zensols.calamr.annotate.AddEmbeddingsFeatureDocumentStash(delegate, word_piece_doc_factory=None)
Bases: DelegateStash, PrimeableStash
Adds embeddings to AMR feature documents. Embedding population is disabled by configuring word_piece_doc_factory as None.
- __init__(delegate, word_piece_doc_factory=None)
- get(name, default=None)
Load an object, or a default if key name doesn’t exist.
Implementation note: subclasses will probably want to override this method, since the super method is cavalier about calling exists() and load(). Depending on the implementation, this can be problematic.
- Return type:
- load(name)
Load a data value from the pickled data with key name. Semantically, this method loads the data using the stash’s implementation. For example, DirectoryStash loads the data from a file if it exists, but factory type stashes always regenerate the data.
- See:
- Return type:
- word_piece_doc_factory: WordPieceFeatureDocumentFactory = None
The feature document factory that populates embeddings.
- class zensols.calamr.annotate.CalamrAnnotatedAmrFeatureDocumentFactory(doc_parser, remove_wiki_attribs=False, remove_alignments=False, doc_id=0, word_piece_doc_factory=None)
Bases: AnnotatedAmrFeatureDocumentFactory
Adds wordpiece embeddings to AmrFeatureDocument instances.
- __init__(doc_parser, remove_wiki_attribs=False, remove_alignments=False, doc_id=0, word_piece_doc_factory=None)
- to_annotated_doc(doc)
Clone doc.amr into an AnnotatedAmrDocument.
- Parameters:
doc – the document to convert to an AnnotatedAmrDocument
- Return type:
- Returns:
a feature document whose amr is set to a new AnnotatedAmrDocument, which is a new instance if doc isn’t an annotated AMR document
- word_piece_doc_factory: WordPieceFeatureDocumentFactory = None
The feature document factory that populates embeddings.
- class zensols.calamr.annotate.ProxyReportAnnotatedAmrDocument(sents, path=None, doc_id=None)
Bases: AnnotatedAmrDocument
Overrides the sections property to skip duplicate summary sentences that are also found in the body.
- __init__(sents, path=None, doc_id=None)
Initialize.
- Parameters:
sents (Tuple[AmrSentence, …]) – the document’s sentences
path (Optional[Path, …]) – the path to the file containing the Penman notation sentence graphs used in sents
model – the model used to initialize AmrSentence when sents is a list of string Penman graphs
- property sections: Tuple[AnnotatedAmrSectionDocument]
The sentences that make up the body of the document.
zensols.calamr.app module
Alignment entry point application.
- class zensols.calamr.app.AlignmentApplication(resource, config_factory)
Bases: _AlignmentBaseApplication
This application aligns data in files.
- __init__(resource, config_factory)
- align_file(input_file, output_dir=None, output_format=Format.csv, render_level=None)
Align annotated documents from a JSON file.
- config_factory: ConfigFactory
Application configuration factory.
- class zensols.calamr.app.CorpusApplication(resource, config_factory, results_dir)
Bases: _AlignmentBaseApplication
AMR graph alignment.
- __init__(resource, config_factory, results_dir)
- align_corpus(keys, output_dir=None, output_format=Format.csv, render_level=None, use_cached=False)
Align an annotated AMR document from the corpus.
- config_factory: ConfigFactory
For prototyping.
- dump_annotated(limit=None, output_dir=None, output_format=Format.csv)
Write annotated documents and their keys.
- get_annotated_summary(limit=None)
Return a CSV file with a summary of the annotated AMR dataset.
- results_dir: Path
The directory where the output results are written and then read back for analysis reporting.
- class zensols.calamr.app.Resource(anon_doc_stash, doc_factory, serialized_factory, doc_graph_factory, doc_graph_aligner, flow_results_stash, anon_doc_factory)
Bases: object
A client facade (GoF) for Calamr annotated AMR corpus access and alignment.
- __init__(anon_doc_stash, doc_factory, serialized_factory, doc_graph_factory, doc_graph_aligner, flow_results_stash, anon_doc_factory)
- align(doc_graph)
Create flow results from a document graph.
- Parameters:
doc_graph (DocumentGraph) – the source/summary components to align
- Return type:
- Returns:
the aligned bipartite graph and its statistics
- align_corpus_document(doc_id, use_cache=True)
Create flow results of a corpus AMR document.
- Parameters:
- Return type:
- Returns:
the flow results for the corpus document, or None if doc_id is not a valid key
- anon_doc_factory: AnnotatedAmrFeatureDocumentFactory
Creates instances of AmrFeatureDocument.
- anon_doc_stash: Stash
Contains human annotated AMRs. These could come from the adhoc (micro) corpus (a small toy corpus), the AMR 3.0 Proxy Report corpus, the Little Prince corpus, or the Bio AMR corpus.
- create_graph(doc)
Return a new document graph based on a feature document.
- Parameters:
doc (AmrFeatureDocument) – the document on which to base the new graph
- Return type:
- Returns:
a new AMR document graph
- doc_factory: AmrFeatureDocumentFactory
Creates AmrFeatureDocument instances from AmrDocument instances.
- doc_graph_aligner: DocumentGraphAligner
Aligns the components of document graphs.
- doc_graph_factory: DocumentGraphFactory
Creates document graphs.
- flow_results_stash: Stash
Creates cached instances of FlowGraphResult.
- get_corpus_document(doc_id)
Get an AMR feature document by key from the corpus configured for the application.
- Parameters:
doc_id (str) – the corpus document ID (e.g. liu-example or 20041010_0024)
- Return type:
- Returns:
the AMR feature document
- parse_documents(data)
Parse documents with keys id, comment, body, and summary from a dict, a sequence of dict instances, or a JSON file in the format:
  [{
      "id": "ex1",
      "comment": "very short",
      "body": "The man ran to make the train. He just missed it.",
      "summary": "A man got caught in the door of a train he missed."
  }]
- Return type:
- Returns:
the parsed AMR feature document
- See:
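parse_documents accepts three input shapes: a dict, a sequence of dicts, or a JSON file. The sketch below shows one way to normalize those shapes before any AMR parsing happens. It is not calamr's code. `load_doc_records` is hypothetical, and treating `comment` as the only optional key is an assumption.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Sequence, Union

Record = Dict[str, Any]

# Assumption: 'comment' is optional and the other documented keys are required.
REQUIRED_KEYS = ('id', 'body', 'summary')


def load_doc_records(data: Union[Record, Sequence[Record], Path]) -> List[Record]:
    """Normalize a dict, a sequence of dicts, or a JSON file path into a
    list of records, checking that each record has the required keys."""
    if isinstance(data, Path):
        data = json.loads(data.read_text(encoding='utf-8'))
    if isinstance(data, dict):
        data = [data]
    records = list(data)
    for rec in records:
        missing = [key for key in REQUIRED_KEYS if key not in rec]
        if missing:
            raise ValueError(f'document {rec.get("id")!r} is missing keys: {missing}')
    return records
```

Each normalized record would then supply a body and a summary to parse into the source and summary components.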
- serialized_factory: AmrSerializedFactory
Creates a Serialized instance from an AmrDocument, AmrSentence or AnnotatedAmrDocument.
- to_annotated_doc(doc)
Return an annotated feature document, creating the feature document if necessary. The doc.amr attribute is set to an annotated AMR document.
- Parameters:
doc (Union[AmrDocument, AmrFeatureDocument]) – an AMR document or an AMR feature document
- Return type:
- Returns:
a new document instance if doc is not an AmrFeatureDocument or if doc.amr is not an AnnotatedAmrDocument
- to_feature_doc(amr_doc, catch=False, add_metadata=False, add_alignment=False)
Create an AmrFeatureDocument from an AmrDocument by parsing the snt metadata with a FeatureDocumentParser.
- Parameters:
add_metadata (Union[str, bool]) – if True, add annotation metadata parsed with spaCy to amr_doc where it is missing (see AmrParser.add_metadata()); if this value is the string clobber, replace any previous metadata
catch (bool) – if True, catch exceptions, create an AmrFailure from each, and return them
- Return type:
Union[AmrFeatureDocument, Tuple[AmrFeatureDocument, List[AmrFailure]]]
- Returns:
an AMR feature document if catch is False; otherwise, a tuple of a document with the sentences that were successfully parsed and a list of any exceptions raised during parsing
zensols.calamr.attr module
Graph node and edge domain classes.
Terminology: token alignments are the sentence-index-based alignments of tokens to AMR nodes. They should not be confused with alignment edges (also called graph component alignment edges).
- class zensols.calamr.attr.AmrDocumentNode(context, name, root, children, doc)
Bases: DocumentNode
A composite node containing a subset of the DocumentNode.root sentences. This includes the text, text features, and AMR Penman graph data.
- __init__(context, name, root, children, doc)
- doc: AmrFeatureDocument
A document containing the subset of sentences that fall under this portion of the graph.
- class zensols.calamr.attr.AttributeGraphNode(context, sent, token_aligns, triple)
Bases: TripleGraphNode
Attribute data from AMR attribute nodes grafted onto the igraph.Graph.
- ATTRIB_TYPE: ClassVar[str] = 'attribute'
The attribute type this class represents.
- __init__(context, sent, token_aligns, triple)
- class zensols.calamr.attr.ComponentAlignmentGraphEdge(context, capacity=0, flow=0)
Bases: GraphEdge
An edge that spans graph components.
- ATTRIB_TYPE: ClassVar[str] = 'component alignment'
The attribute type this class represents.
- __init__(context, capacity=0, flow=0)
- class zensols.calamr.attr.ComponentCorefAlignmentGraphEdge(context, capacity=0, flow=0, relation=None, is_bipartite=False)
Bases: ComponentAlignmentGraphEdge
An edge that spans graph components.
- ATTRIB_TYPE: ClassVar[str] = 'component coref alignment'
The attribute type this class represents.
- __init__(context, capacity=0, flow=0, relation=None, is_bipartite=False)
- is_bipartite: bool = False
Whether the coreference spans components.
- relation: Relation = None
The AMR coreference relation between this node and all other references.
- class zensols.calamr.attr.ConceptGraphNode(context, sent, token_aligns, triple)
Bases: TripleGraphNode
Attribute data from AMR concept nodes grafted onto the igraph.Graph.
- ATTRIB_TYPE: ClassVar[str] = 'concept'
The attribute type this class represents.
- __init__(context, sent, token_aligns, triple)
- property instance: str
The concept instance, such as a PropBank entry (e.g. see-01). Nouns are another example.
- property roleset: Roleset
- property roleset_embedding: Tensor
- property roleset_id: RolesetId
- class zensols.calamr.attr.DocumentGraphEdge(context, capacity=0, flow=0, relation='')
Bases: GraphEdge
An edge that has data about the non-AMR parts of the graph, such as sentences.
- ATTRIB_TYPE: ClassVar[str] = 'doc'
The attribute type this class represents.
- __init__(context, capacity=0, flow=0, relation='')
- relation: str = ''
The edge relation between two document nodes, or between a document node and an igraph node.
- class zensols.calamr.attr.DocumentGraphNode(context, level_name, doc_node)
Bases: GraphNode
A node that has data about the non-AMR parts of the graph, such as the unifying top level node that ties the sentences together. It can, however, contain the root of an AMR sentence (see AmrDocumentNode).
- ATTRIB_TYPE: ClassVar[str] = 'doc'
The attribute type this class represents.
- __init__(context, level_name, doc_node)
- level_name: str
The descriptive name of the node, such as doc or section.
- class zensols.calamr.attr.DocumentNode(context, name, root, children)
Bases: GraphNode
A composite node in the document tree that is associated with the FeatureDocument as the root node. This class represents graph nodes that either:
  - make up the part of the graph that’s disjoint from the AMR sentinel subgraphs (e.g. a root doc node), or
  - are the root of an AMR sentence (see AmrDocumentNode).
The in-memory object graph of these instances depends on the type of data it represents. For example, the Proxy Report corpus has top level summary and body nodes with AMR sentences below them (root on top).
- __init__(context, name, root, children)
- children: Tuple[DocumentNode, ...]
The children of this node with respect to the composite pattern.
- property children_by_name: DocumentNode
The children’s names as keys and their respective document nodes as values.
- name: str
The descriptive name of the node, such as doc or section.
- root: AmrFeatureDocument
The owning feature document containing all sentences/tokens of the graph.
- property sents: Tuple[AmrFeatureSentence]
The sentences of this document level.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)
Write this instance as either a Writable or a Dictable. If the class attribute _DICTABLE_WRITABLE_DESCENDANTS is set to True, the write() method of the children is used instead of writing the generated dictionary. Otherwise, this instance is written by first creating a dict recursively with asdict() and then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute needs to be set in all descendants in the instance hierarchy, since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.calamr.attr.GraphEdge(context, capacity=0, flow=0)
Bases: GraphAttribute
Graph attribute data added to the igraph.Graph edges.
- MAX_CAPACITY: ClassVar[float] = 10000000000.0
The maximum value of a capacity.
Implementation note: igraph appears to handle only large finite values as infinity, not float inf or the largest float value the system defines.
- __init__(context, capacity=0, flow=0)
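Given the note above, any "infinite" capacity has to be mapped to a large finite value before it reaches igraph. `clamp_capacity` below is a hypothetical helper that shows this. Rejecting NaN and negative values is an added assumption, not documented calamr behavior.

```python
import math

MAX_CAPACITY = 10_000_000_000.0  # mirrors GraphEdge.MAX_CAPACITY


def clamp_capacity(capacity: float) -> float:
    """Map infinite or huge capacities to MAX_CAPACITY before handing them
    to igraph; reject NaN and negative values (an assumption)."""
    if math.isnan(capacity) or capacity < 0:
        raise ValueError(f'invalid capacity: {capacity}')
    return min(capacity, MAX_CAPACITY)
```

For example, `clamp_capacity(float('inf'))` returns 10000000000.0, while ordinary capacities such as 0.5 pass through unchanged.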
- class zensols.calamr.attr.GraphNode(context)
Bases: GraphAttribute
Graph attribute data added to the igraph.Graph vertexes.
- __init__(context)
- class zensols.calamr.attr.RoleGraphEdge(context, sent, token_aligns, capacity=0, flow=0, triple=None, role=None)
Bases: GraphEdge, SentenceGraphAttribute
Attribute data from the AMR role edges grafted onto the igraph.Graph.
- ATTRIB_TYPE: ClassVar[str] = 'role'
The attribute type this class represents.
- __init__(context, sent, token_aligns, capacity=0, flow=0, triple=None, role=None)
- role: Union[str, Role] = None
The role name of the edge, such as :ARG0.
- triple: Union[Instance, Attribute] = None
The AMR Penman graph triple.
- class zensols.calamr.attr.SentenceGraphAttribute(context, sent, token_aligns)
Bases: GraphAttribute
A node containing zero or more tokens along with its parent sentence. An AMR node usually represents a single token, but it can have more than one token alignment.
- __init__(context, sent, token_aligns)
- sent: AmrFeatureSentence
The sentence from which this node was created.
- property token_align_str: str
A string representation of the AMR Penman representation of the token alignment.
- token_aligns: Tuple[Union[Alignment, RoleAlignment], ...]
The node to sentinel token index.
- property tokens: Tuple[FeatureToken, ...]
The tokens.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)
Write this instance as either a Writable or a Dictable. If the class attribute _DICTABLE_WRITABLE_DESCENDANTS is set to True, the write() method of the children is used instead of writing the generated dictionary. Otherwise, this instance is written by first creating a dict recursively with asdict() and then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute needs to be set in all descendants in the instance hierarchy, since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.calamr.attr.SentenceGraphEdge(context, capacity=0, flow=0, relation='', sent=None, sent_ix=None)
Bases: DocumentGraphEdge
An edge from a document node to a SentenceGraphNode.
- ATTRIB_TYPE: ClassVar[str] = 'sentence'
The attribute type this class represents.
- __init__(context, capacity=0, flow=0, relation='', sent=None, sent_ix=None)
- sent: AmrFeatureSentence = None
The sentence from which this node was created.
- sent_ix: int = None
The sentence index.
- class zensols.calamr.attr.SentenceGraphNode(context, sent, sent_ix)
Bases: GraphNode
A graph node containing the root of a sentence.
- ATTRIB_TYPE: ClassVar[str] = 'sentence'
The attribute type this class represents.
- SENT_TEXT_LEN: ClassVar[int] = 20
The truncated sentence length.
- __init__(context, sent, sent_ix)
- sent: AmrFeatureSentence
The sentence from which this node was created.
- sent_ix: int
The sentence index.
- class zensols.calamr.attr.TerminalGraphEdge(context, capacity=0, flow=0)
Bases: GraphEdge
An edge that connects to a terminal node (TerminalGraphNode).
- ATTRIB_TYPE: ClassVar[str] = 'terminal'
The attribute type this class represents.
- __init__(context, capacity=0, flow=0)
- class zensols.calamr.attr.TerminalGraphNode(context, is_source)
Bases: GraphNode
A flow control node: the source or the sink.
- ATTRIB_TYPE: ClassVar[str] = 'control'
The attribute type this class represents.
- __init__(context, is_source)
- is_source: bool
Whether this node is the source (s) rather than the sink (t).
- class zensols.calamr.attr.TripleGraphNode(context, sent, token_aligns, triple)
Bases: SentenceGraphAttribute, GraphNode
Contains a Penman triple with token alignments, used for concepts and AMR attributes. Instances of this class get their embedding from SentenceGraphAttribute._get_embedding().
- __init__(context, sent, token_aligns, triple)
- triple: Union[Instance, Attribute]
The AMR Penman graph triple.
zensols.calamr.cli module
Command line entry point to the application.
- class zensols.calamr.cli.ApplicationFactory(*args, **kwargs)
Bases: ApplicationFactory
zensols.calamr.comp module
Base graph component class.
- class zensols.calamr.comp.GraphComponent(graph)
Bases: PersistableContainer, Writable
A container class for an igraph.Graph that also has caching data structures for fast access to graph attributes.
- GRAPH_ATTRIB_NAME: ClassVar[str] = 'ga'
The name of the graph attributes on igraph nodes and edges.
- __init__(graph)
- property adjacency_list: List[List[int]]
An adjacency list of vertexes based on how they relate to each other in the graph. The outer list’s index is the source vertex and the inner list holds that vertex’s neighbors.
Implementation note: at both the inner and outer level, the list is restricted to the vertexes in this component.
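The sketch below builds an adjacency list restricted to one component, with optional edge reversal as used for reverse flow graphs. It uses plain integer vertex indexes instead of igraph. `component_adjacency` is hypothetical. Keeping an empty list for each vertex outside the component, so that indexes stay aligned with the full graph, is an assumed reading of "subset at the outer level".

```python
from typing import Iterable, List, Set, Tuple


def component_adjacency(n_vertices: int, edges: Iterable[Tuple[int, int]],
                        component: Set[int], reverse: bool = False) -> List[List[int]]:
    """Adjacency list indexed by vertex index, keeping only edges whose two
    endpoints are both in the component. Vertexes outside the component keep
    an empty list so indexes still line up with the full graph (assumed)."""
    adj: List[List[int]] = [[] for _ in range(n_vertices)]
    for src, dst in edges:
        if reverse:  # reverse flow graphs swap the edge direction
            src, dst = dst, src
        if src in component and dst in component:
            adj[src].append(dst)
    return adj
```

For the chain 0→1→2→3 with component {0, 1, 2}, the forward list is [[1], [2], [], []] and the reversed list is [[], [0], [1], []].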
- clone(reverse_edges=False, deep=True, **kwargs)
Clone this instance and return the clone. The graph is deep copied, but the GraphAttribute instances are not.
- Parameters:
reverse_edges (bool) – whether to reverse the direction of edges in the directed graph, which is used to create the reverse flow graphs used in the maxflow algorithm
deep (bool) – whether to create a deep clone, which detaches a component from any bipartite graph by keeping only the graph composed of vs; otherwise the graph is copied as a subcomponent starting from root
kwargs – arguments to add as attributes to the clone; include cls to set the type of the new instance
- Return type:
- Returns:
a clone of this instance
- copy_graph(reverse_edges=False, subgraph_type=None)
Return a copy of the igraph.Graph.
- Parameters:
- Return type:
- edge_by_graph_edge_id(ge_id)
Return an edge based on the graph (attribute) edge ID.
- Return type:
- edge_ref_by_id(ix)
Get the igraph.Edge instance by its index.
- Return type:
- property edges_reversed: bool
Whether the edge direction in the graph is reversed. This is True for reverse flow graphs.
- See:
summary.ReverseFlowGraphAlignmentConstructor
- get_attributes()
Return all graph attributes of the component, which include instances of both GraphNode and GraphEdge.
- Return type:
- property graph: Graph
The graph used for computational manipulation of the synthesized AMR sentences.
- invalidate()
Clear cached data structures to force them to be recreated after igraph level data has changed.
- node_by_graph_node_id(gn_id)
Return a node based on the graph (attribute) node ID.
- Return type:
- property root: Vertex | None
The singular (first found) root of the graph, which is usually the top level DocumentNode instance.
- property roots: Iterable[Vertex]
The roots of the graph, which are usually top level DocumentNode instances.
- vertex_ref_by_id(ix)
Get the igraph.Vertex instance by its index.
- Return type:
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)
Write the contents of this instance to writer using indentation depth.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
zensols.calamr.ctrl module
Document graph controller implementations.
- class zensols.calamr.ctrl.AlignmentCapacitySetDocumentGraphController(name, min_capacity, capacity)
Bases: DocumentGraphController
Set the capacity on edges if they match the criteria min_flow, component_names and match_edge_classes.
- __init__(name, min_capacity, capacity)
- capacity: float
The capacity to set.
- class zensols.calamr.ctrl.ConstructDocumentGraphController(name, build_graph_name, constructor, renderer)
Bases: DocumentGraphController
Constructs the graph that is later used for the min cut/max flow algorithm (see MaxflowDocumentGraphController). After its invoke() method is called, build_graph is available; this is the constructed graph provided by constructor.
- __init__(name, build_graph_name, constructor, renderer)
- build_graph_name: str
The name given to newly created instances of DocumentGraph.
- constructor: GraphAlignmentConstructor
The constructor used to get the source and sink nodes.
- renderer: GraphRenderer
Visually renders the graph into a human-understandable presentation.
- class zensols.calamr.ctrl.FixReentrancyDocumentGraphController(name, component_name, maxflow_controller, only_report)
Bases: DocumentGraphController
Fixes reentrancies by splitting the flow of the last calculated maxflow and using it as the capacity of the outgoing edges in the reversed graph. This prevents edges from being starved of flow and later eliminated in the graph reduction steps.
The maxflow algorithm is then rerun if at least one reentrancy remains after the capacities are reallocated.
- __init__(name, component_name, maxflow_controller, only_report)
- component_name: str
The name of the components to restore.
- maxflow_controller: MaxflowDocumentGraphController
The maxflow component used to recalculate the maxflow.
- only_report: bool
Whether to only report reentrancies rather than fix them.
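A reentrant AMR node has several parents. In the reversed graph this shows up as a node with several outgoing edges. The sketch below splits the flow that reached such a node evenly across those edges and uses each share as the new capacity. The even split is an assumption; the text above says only that the flow is "split". `split_reentrant_capacities` and the toy inputs are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (from, to) in the reversed graph


def split_reentrant_capacities(out_edges: Dict[str, List[Edge]],
                               inflow: Dict[str, float]) -> Dict[Edge, float]:
    """For each node with more than one outgoing edge (a reentrancy in the
    reversed graph), give each outgoing edge an equal share of the flow that
    reached the node in the last maxflow run. Returns only the new
    capacities; other edges are left alone."""
    new_caps: Dict[Edge, float] = {}
    for node, edges in out_edges.items():
        if len(edges) > 1:
            share = inflow.get(node, 0.0) / len(edges)
            for edge in edges:
                new_caps[edge] = share
    return new_caps
```

A non-empty result means at least one reentrancy was found. In that case the caller would rerun maxflow, as the description above states.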
- class zensols.calamr.ctrl.FlowDiscountDocumentGraphController(name, discount_sum, component_names=<factory>)
Bases: DocumentGraphController
Decreases (constricts) the capacities so that the incoming flows from the bipartite edges sum to discount_sum. The capacities are updated only when the total flow of the incoming bipartite edges is greater than discount_sum.
- __init__(name, discount_sum, component_names=<factory>)
- component_names: Set[str]
The name of the components to discount.
- discount_sum: float
The capacity sum will be this value (see class docs).
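The sketch below shows one way to meet the constraint above for a single node. When the node's incoming bipartite flow exceeds discount_sum, each edge receives a capacity proportional to its flow, so that the capacities sum to discount_sum. Proportional scaling is an assumption; the description does not say how the discounted sum is divided among the edges. `discount_capacities` is hypothetical.

```python
from typing import List, Tuple


def discount_capacities(edges: List[Tuple[float, float]],
                        discount_sum: float) -> List[float]:
    """edges holds (capacity, flow) pairs for one node's incoming bipartite
    edges. If their total flow exceeds discount_sum, return capacities
    proportional to each edge's flow that sum to discount_sum; otherwise
    return the capacities unchanged."""
    total_flow = sum(flow for _, flow in edges)
    if total_flow <= discount_sum:
        return [cap for cap, _ in edges]
    scale = discount_sum / total_flow
    return [flow * scale for _, flow in edges]
```

Two edges that each carry a flow of 2.0, with a discount_sum of 1.0, end up with capacities of 0.5 each.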
- class zensols.calamr.ctrl.FlowSetDocumentGraphController(name, component_names=<factory>, match_edge_classes=<factory>, flow=0)
Bases: DocumentGraphController
Sets a static flow on edges, selecting components by name and edges by class.
- __init__(name, component_names=<factory>, match_edge_classes=<factory>, flow=0)
- component_names: Set[str]
The components on which to set the flow.
- flow: float = 0
The flow to set.
- match_edge_classes: Set[Type[GraphEdge]]
The edge classes (e.g. TerminalGraphEdge) on which to set the flow.
- class zensols.calamr.ctrl.MaxflowDocumentGraphController(name, constructor)
Bases: DocumentGraphController
Executes the maxflow/min cut algorithm on a document graph.
- __init__(name, constructor)
- constructor: GraphAlignmentConstructor
The constructor used to get the source and sink nodes.
- class zensols.calamr.ctrl.NormFlowDocumentGraphController(name, component_names, constructor, normalize_mode='fpn')[source]¶
Bases:
DocumentGraphController
Normalizes the flow on edges using the flow going through each edge and its total number of descendants. Descendants are counted as the edge’s source node and all children/descendants of that node.
This is done recursively to calculate the flow per node. For each recursive call, it computes the flow per node of the parent edge(s) from the perspective of the nascent graph (root at the top with arrows pointing to the children underneath). However, the graphs this operates on are the reversed max flow graphs (the flow direction is taken care of by the adjacency list computed in
GraphComponent). Since an AMR node can have multiple parents, descendants are tracked as a set rather than a count to avoid duplicate counts when nodes have more than one parent. Otherwise, in the multiple parent case, duplicates would be counted when the paths later converge closer to the root.
- __init__(name, component_names, constructor, normalize_mode='fpn')¶
- component_names: Set[str]¶
The name of the components to minimize.
- constructor: GraphAlignmentConstructor¶
The instance used to construct the graph passed in the
invoke() method.
- normalize_mode: str = 'fpn'¶
How to normalize nodes (if at all), which is one of:
fpn: leaves flow values as they were after the initial flow per node calculation
norm: normalize so all values add to one
vis: same as norm but add a vis_flow attribute to the edges so the original flow is displayed and visualized as the flow color
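The descendant bookkeeping and the norm mode can be sketched as follows. The adjacency dict, the function names, and the choice to divide an edge's flow by the size of its source node's descendant set are assumptions made for illustration; the controller itself works on the reversed flow graphs.

```python
from typing import Dict, List, Set

# hypothetical forward adjacency (root at top): parent -> children
Children = Dict[str, List[str]]


def descendants(children: Children, node: str) -> Set[str]:
    """Return node and every node below it as a set, so a node reached
    through several parents is only counted once."""
    found: Set[str] = {node}
    for child in children.get(node, []):
        found |= descendants(children, child)
    return found


def flow_per_node(children: Children, node: str, flow: float) -> float:
    """Divide an edge's flow by the size of its source node's subtree."""
    return flow / len(descendants(children, node))


def normalize(flows: Dict[str, float]) -> Dict[str, float]:
    """The 'norm' mode: rescale so the values sum to one."""
    total = sum(flows.values())
    return {k: v / total for k, v in flows.items()} if total else dict(flows)


# 'boy' has two parents: it is the agent of both wanting and going
tree = {'want': ['boy', 'go'], 'go': ['boy']}
```

With a plain counter, boy would be counted twice (under want and under go), giving four nodes instead of three; the set avoids this.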
- class zensols.calamr.ctrl.RemoveAlignsDocumentGraphController(name, min_capacity)[source]¶
Bases:
DocumentGraphController
Removes graph component alignment for low capacity links.
- __init__(name, min_capacity)¶
- min_capacity: float¶
The graph component alignment edges are removed if their capacities are at or below this value.
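Removing low capacity alignments amounts to a threshold filter. A minimal sketch, assuming a hypothetical AlignEdge record in place of the package's edge classes; note that edges exactly at min_capacity are dropped, matching the "at or below" wording.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlignEdge:
    """Hypothetical stand-in for a component alignment edge."""
    source: str
    target: str
    capacity: float


def remove_low_capacity(edges: List[AlignEdge],
                        min_capacity: float) -> List[AlignEdge]:
    # edges at or below the threshold are dropped; the rest keep their order
    return [e for e in edges if e.capacity > min_capacity]


edges = [AlignEdge('s1', 'x', 0.1), AlignEdge('s2', 'y', 0.2),
         AlignEdge('s3', 'z', 0.5)]
kept = remove_low_capacity(edges, min_capacity=0.2)
```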
- class zensols.calamr.ctrl.RoleCapacitySetDocumentGraphController(name, min_flow, capacity, component_names)[source]¶
Bases:
DocumentGraphController
This finds low flow role edges and sets (zeros out) the capacities of all connected edge alignments recursively for all descendants. We “slough off” entire subtrees (sometimes entire sentences or document nodes) for low flow ancestors.
- __init__(name, min_flow, capacity, component_names)¶
- capacity: float¶
The capacity (and flow) to set.
- component_names: Set[str]¶
The name of the components to minimize.
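A minimal sketch of the recursive "slough off" step, assuming hypothetical dict-based structures for role edge flows and alignment edge capacities. The function name is illustrative, and treating "low flow" as strictly below min_flow is also an assumption.

```python
from typing import Dict, List, Set, Tuple


def slough_off(children: Dict[str, List[str]],
               role_flow: Dict[Tuple[str, str], float],
               align_caps: Dict[str, Dict[str, float]],
               min_flow: float, capacity: float = 0.0) -> Set[str]:
    """Set the capacity of every alignment edge attached to a node below a
    low flow role edge; return the affected nodes.

    role_flow maps (parent, child) role edges to their flow; align_caps maps
    a node to the capacities of its alignment edges (both hypothetical).
    """
    # start from the child end of each low flow role edge
    stack = [child for (_, child), flow in role_flow.items() if flow < min_flow]
    affected: Set[str] = set()
    while stack:
        node = stack.pop()
        if node in affected:
            continue
        affected.add(node)
        # update values in place; the key set does not change
        for target in align_caps.get(node, {}):
            align_caps[node][target] = capacity
        # descend into the whole subtree
        stack.extend(children.get(node, []))
    return affected


children = {'r': ['a', 'b'], 'a': ['c']}
role_flow = {('r', 'a'): 0.1, ('r', 'b'): 0.9, ('a', 'c'): 0.5}
align_caps = {'a': {'x': 1.0}, 'b': {'y': 1.0}, 'c': {'z': 1.0}}
affected = slough_off(children, role_flow, align_caps, min_flow=0.2)
```

Here the low flow edge r → a removes the alignments of both a and its child c, even though a → c itself has adequate flow.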
- class zensols.calamr.ctrl.SnapshotDocumentGraphController(name, component_names, snapshot_source)[source]¶
Bases:
DocumentGraphController
Record flows, then later restore. If
snapshot_source is not None, then this instance restores from it. Otherwise it records.
- __init__(name, component_names, snapshot_source)¶
- component_names: Set[str]¶
The name of the components on which to record or restore flows.
- snapshot_source: SnapshotDocumentGraphController¶
The source instance that contains the data from which to restore.
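The record-or-restore behavior can be sketched with a small hypothetical class. FlowSnapshot and its invoke method are illustrative names only, and flows are keyed by a plain edge identifier rather than graph edges.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FlowSnapshot:
    """Hypothetical mirror of the record-then-restore idea."""
    snapshot_source: Optional['FlowSnapshot'] = None
    flows: Dict[str, float] = field(default_factory=dict)

    def invoke(self, edge_flows: Dict[str, float]) -> None:
        if self.snapshot_source is None:
            # no source: record a copy of the current flows
            self.flows = dict(edge_flows)
        else:
            # a source was given: restore the flows it recorded
            edge_flows.update(self.snapshot_source.flows)


flows = {'e1': 2.0, 'e2': 1.0}
recorder = FlowSnapshot()
recorder.invoke(flows)          # record
flows['e1'] = 0.0               # later steps modify the flows
FlowSnapshot(recorder).invoke(flows)  # restore
```

Copying the dict when recording matters: keeping a reference instead would let later modifications leak into the snapshot.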
zensols.calamr.dcomp module¶
A document centric graph component.
- class zensols.calamr.dcomp.DocumentGraphComponent(graph, root_node, sent_index=<factory>, description=None)[source]¶
Bases:
GraphComponent
A class containing the root information of the document tree and the
igraph.Graph vertex. When the igraph.Graph is set with the graph property, a strongly connected subgraph component is induced. It does this by traversing all reachable vertices and edges from the root. Examples of these induced components include the source and summary components of a document AMR graph.
Instances are created by
DocumentGraphFactory.
- __init__(graph, root_node, sent_index=<factory>, description=None)¶
- clone(reverse_edges=False, deep=True, **kwargs)[source]¶
Clone an instance and return it. The
graph is deep copied, but all GraphAttribute instances are not.
- Parameters:
reverse_edges (bool) – whether to reverse the direction of edges in the directed graph, which is used to create the reverse flow graphs used in the maxflow algorithm
deep (bool) – whether to create a deep clone, which detaches a component from any bipartite graph by keeping only the graph composed of vs; otherwise the graph is copied as a subcomponent starting from root
kwargs – arguments to add as attributes to the clone; include cls to set the type of the new instance
- Return type:
- Returns:
the cloned instance of this instance
- property doc_vertices: Iterable[Vertex]¶
Get the vertices of
DocumentGraphNode. This only fetches those document nodes that do not branch.
- get_attributes()[source]¶
Return all graph attributes of the component, which include instances of both
GraphNode and GraphEdge.
- Return type:
- property relation_set: RelationSet¶
The relations in the contained root node document.
- property root: Vertex | None¶
The roots of the graph, which are usually top level
DocumentNode instances.
-
root_node:
AmrDocumentNode¶ The root of the document tree.
-
sent_index:
SentenceIndex¶ An index of the sentences of a
DocumentGraphComponent.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write the contents of this instance to
writer using indentation depth.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
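Inducing a component from its root, and reversing its edges as clone(reverse_edges=True) does for the reverse flow graphs, can be sketched on a plain edge list. The helper names are hypothetical and the sketch ignores graph attributes entirely; it keeps the edges reachable from the root.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def induce_component(edges: List[Edge], root: str) -> List[Edge]:
    """Keep only the edges reachable from root, in breadth-first order."""
    out: Dict[str, List[str]] = {}
    for s, t in edges:
        out.setdefault(s, []).append(t)
    seen: Set[str] = {root}
    queue = deque([root])
    kept: List[Edge] = []
    while queue:
        u = queue.popleft()
        for v in out.get(u, []):
            kept.append((u, v))
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return kept


def reverse(edges: List[Edge]) -> List[Edge]:
    # flip each edge so flow runs from the leaves toward the root
    return [(t, s) for s, t in edges]


# two disconnected components, as in the nascent source/summary graph
edges = [('src', 'a'), ('a', 'b'), ('sum', 'x'), ('x', 'y')]
source_comp = induce_component(edges, 'src')
```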
- class zensols.calamr.dcomp.SentenceEntry(node=None, concepts=None, attributes=None)[source]¶
Bases:
Dictable
Contains the sentence node of a sentence, and the respective concept and attribute nodes.
- __init__(node=None, concepts=None, attributes=None)¶
-
attributes:
Tuple[AttributeGraphNode] = None¶ The AMR attribute nodes of the sentence.
- property concept_by_variable: Dict[str, ConceptGraphNode]¶
-
concepts:
Tuple[ConceptGraphNode] = None¶ The AMR concept nodes of the sentence.
-
node:
SentenceGraphNode= None¶ The sentence node, which is the root of the sentence subgraph.
- class zensols.calamr.dcomp.SentenceIndex(entries=None)[source]¶
Bases:
Dictable
An index of the sentences of a
DocumentGraphComponent.
- __init__(entries=None)¶
- property by_sentence: Dict[AmrFeatureSentence, SentenceEntry]¶
-
entries:
Tuple[SentenceEntry] = None¶ The entries of the index, each of which is a sentence.
zensols.calamr.doc module¶
Document based graph container, factory and strategy classes.
- class zensols.calamr.doc.DocumentGraph(graph, name, graph_attrib_context, doc, components, children=<factory>)[source]¶
Bases:
GraphComponent
A graph containing the text, text features, AMR Penman graph and igraph.
This class roughly follows a GoF composite pattern with
children a collection of instances of this class, which are the reversed source and summary graphs created for the max flow algorithm. The root is constructed by the DocumentGraphFactory class and the children are built by the DocumentGraphController instances.
The children of this composite are not to be confused with
components, which are the disconnected source and summary graph components in the root graph instance. Each child also has the reversed flow graphs, but they are connected as a bipartite flow graph for use by the max flow algorithm.
- __init__(graph, name, graph_attrib_context, doc, components, children=<factory>)¶
- property bipartite_relation_set: RelationSet¶
The bipartite relations that span components. This set includes all top level relations that are not self-contained in any component.
-
children:
Dict[str,DocumentGraph]¶ The children of this instance, which for now, are only instances of
FlowDocumentGraph.
- clone(reverse_edges=False, deep=True, **kwargs)[source]¶
Clone an instance and return it. The
graph is deep copied, but all GraphAttribute instances are not.
- Parameters:
reverse_edges (bool) – whether to reverse the direction of edges in the directed graph, which is used to create the reverse flow graphs used in the maxflow algorithm
deep (bool) – whether to create a deep clone, which detaches a component from any bipartite graph by keeping only the graph composed of vs; otherwise the graph is copied as a subcomponent starting from root
kwargs – arguments to add as attributes to the clone; include cls to set the type of the new instance
- Return type:
- Returns:
the cloned instance of this instance
- component_iter()[source]¶
Return an iterable of the components of this graph and recursively over the children.
- Return type:
-
components:
Tuple[DocumentGraphComponent,...]¶ The roots of the trees created by the
DocumentGraphFactory.
- property components_by_name: Dict[str, DocumentGraphComponent]¶
Get document graph components by name.
- property components_by_name_sorted: Tuple[Tuple[str, DocumentGraphComponent], ...]¶
Get document graph components sorted by name.
-
doc:
AmrFeatureDocument¶ The document that represents the graph.
- property graph_attrib_context: GraphAttributeContext¶
The context given to all nodes and edges of the graph.
-
name:
str¶ The name of the graph used to identify it. For now, this is only
reversed_source for the graph that flows from the summary to the source, and reversed_summary for the graph that flows from the source to the summary. These are “reversed” because the flow is reversed from the leaf nodes to the root.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write the contents of this instance to
writer using indentation depth.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
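The composite relationship between a graph, its components, and its children can be sketched with a hypothetical stand-in class. Here component_iter() yields this graph's own components and then recurses into each child, as the method description states; components are plain strings for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass
class ToyDocumentGraph:
    """Hypothetical stand-in for DocumentGraph (composite pattern)."""
    name: str
    components: Tuple[str, ...] = ()
    children: Dict[str, 'ToyDocumentGraph'] = field(default_factory=dict)

    def component_iter(self) -> Iterable[str]:
        # this graph's own components first ...
        yield from self.components
        # ... then each child's components, recursively
        for child in self.children.values():
            yield from child.component_iter()


nascent = ToyDocumentGraph(
    'nascent', ('source', 'summary'),
    {'reversed_source': ToyDocumentGraph('reversed_source',
                                         ('source', 'summary'))})
```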
- class zensols.calamr.doc.DocumentGraphDecorator[source]¶
Bases:
ABC
A strategy to create a graph from a document structure.
- __init__()¶
- abstract decorate(component)[source]¶
Creates the graph from a
DocumentNode root node.
- Parameters:
component (DocumentGraphComponent) – the graph to populate from the decorating process
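DocumentGraphDecorator is an abstract strategy: the factory applies each configured decorator to a component. A minimal sketch of that pattern, using a hypothetical Decorator base class and a dict in place of a DocumentGraphComponent:

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Decorator(ABC):
    """Hypothetical mirror of DocumentGraphDecorator."""

    @abstractmethod
    def decorate(self, component: Dict[str, List[str]]) -> None:
        """Populate ``component`` in place."""


class LabelDecorator(Decorator):
    def decorate(self, component: Dict[str, List[str]]) -> None:
        component.setdefault('labels', []).append('decorated')


def build(decorators: List[Decorator]) -> Dict[str, List[str]]:
    # apply each strategy in order to the same component
    component: Dict[str, List[str]] = {}
    for dec in decorators:
        dec.decorate(component)
    return component
```

Because decorate is abstract, the base class cannot be instantiated; only concrete strategies can be configured.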
- class zensols.calamr.doc.DocumentGraphFactory(config_factory, graph_decorators, doc_graph_section_name, graph_attrib_context)[source]¶
Bases:
ABC
Creates a document graph. After the document portion of the graph is created, the igraph is built and merged using a
DocumentGraphDecorator. This igraph has the corresponding vertices and edges associated with the document graph, which includes AMR Penman graph and feature document artifacts.
- __init__(config_factory, graph_decorators, doc_graph_section_name, graph_attrib_context)¶
-
config_factory:
ConfigFactory¶ Used to create a
DocumentGraphDecorator.
- create(root)[source]¶
Create a document graph and return it starting from the root node. See class docs.
- Parameters:
root (AmrFeatureDocument) – the feature document from which to create the graph
- Return type:
-
doc_graph_section_name:
str¶ The name of a section in the configuration that defines new instances of
DocumentGraph.
-
graph_attrib_context:
GraphAttributeContext¶ The context given to all nodes and edges of the graph.
-
graph_decorators:
Tuple[DocumentGraphDecorator,...]¶ The name of the section that defines a
DocumentGraphDecorator instance.
zensols.calamr.domain module¶
Classes that organize document content into a hierarchy.
Terminology: token alignments refer to the sentence index based token alignments to AMR nodes. This is not to be confused with alignment edges (aka graph component alignment edges).
- exception zensols.calamr.domain.ComponentAlignmentError(msg, sent=None)[source]¶
Bases:
AmrError
Package level errors.
- __module__ = 'zensols.calamr.domain'¶
- class zensols.calamr.domain.ComponentAlignmentFailure(exception=None, thrower=None, traceback=None, message=None)[source]¶
Bases:
Failure
Package level failures.
- __init__(exception=None, thrower=None, traceback=None, message=None)¶
- class zensols.calamr.domain.EmbeddingResource(torch_config, word_piece_doc_parser=None, word_piece_doc_factory=None, roleset_stash=None)[source]¶
Bases:
object
Generates embeddings for roles, role sets, text, and feature tokens.
- __init__(torch_config, word_piece_doc_parser=None, word_piece_doc_factory=None, roleset_stash=None)¶
- get_role_embedding(role)[source]¶
Return an embedding for a role. This uses the role’s relation’s embedding if available. Otherwise, it uses the embedding created from the role’s prefix.
- Return type:
Tensor
- get_sentence_tokens_embedding(sent)[source]¶
Return the sentence embeddings of
sent.
- Return type:
Tensor
- get_token_embedding(text)[source]¶
Return the mean of the token embeddings of
text.
- Return type:
Tensor
- get_tokens_embedding(tokens)[source]¶
Return the mean of the embeddings of
tokens.
- Return type:
Tensor
-
torch_config:
TorchConfig¶ Used to create
unknown_edge_embedding
- property unknown_edge_embedding: Tensor¶
A zero embedding.
- property unknown_node_embedding: Tensor¶
A zero embedding.
-
word_piece_doc_factory:
WordPieceFeatureDocumentFactory= None¶ Creates word piece data structures that have embeddings.
-
word_piece_doc_parser:
FeatureDocumentParser= None¶ Used to get single token embeddings for nodes with no token alignments.
- class zensols.calamr.domain.GraphAttribute(context)[source]¶
Bases:
PersistableContainer, Dictable
Contains AMR document attribute data added to the
igraph.Graph. This is added as vertex or edge attribute data.
- ATTRIB_TYPE: ClassVar[str] = 'base'¶
The attribute type this class represents.
- __init__(context)¶
- context: GraphAttributeContext¶
Contains context data used by nodes and edges of the graph.
- property description: str¶
A human readable description that is usually used as the label and
__str__().
- property embedding: Tensor¶
The default embedding of the attribute. Note that some attributes have several different embeddings.
- property embedding_resource: EmbeddingResource¶
Generates embeddings for roles, role sets, text, and feature tokens.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a
Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute
_DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.calamr.domain.GraphAttributeContext(embedding_resource, relation_stash, default_capacity, sink_capacity, component_alignment_capacity, doc_capacity, similarity_threshold, default_format_strlen)[source]¶
Bases:
Dictable
Contains context data used by nodes and edges of the graph.
- __init__(embedding_resource, relation_stash, default_capacity, sink_capacity, component_alignment_capacity, doc_capacity, similarity_threshold, default_format_strlen)¶
-
component_alignment_capacity:
float¶ The default initial capacity for source/summary component alignment edges.
-
doc_capacity:
float¶ The bipartite (between source and summary) capacity value of
DocumentGraphNode.
-
embedding_resource:
EmbeddingResource¶ The manager that contains vectorizers that create node and edge embeddings.
-
relation_stash:
Stash¶ Creates instances of role relations (zensols.propbankdb.domain.Relation).
zensols.calamr.flow module¶
Provides container classes and computes statistics for graph alignments.
- class zensols.calamr.flow.Flow(source, target, edge)[source]¶
Bases:
Dictable
A triple of a source node, target node and connecting edge from the graph. The connecting edge has a flow value associated with it.
- __init__(source, target, edge)¶
- class zensols.calamr.flow.FlowDocumentGraph(graph, name, graph_attrib_context, doc, components, children=<factory>)[source]¶
Bases:
DocumentGraph
Contains all the flows of a
DocumentGraph and has FlowDocumentGraphComponent as components. Instances of this document graph have no children.
- __init__(graph, name, graph_attrib_context, doc, components, children=<factory>)¶
- class zensols.calamr.flow.FlowDocumentGraphComponent(graph, root_node, sent_index=<factory>, description=None, reentrancy_set=None)[source]¶
Bases:
DocumentGraphComponent
Contains all the flows of a
DocumentComponent.
- __init__(graph, root_node, sent_index=<factory>, description=None, reentrancy_set=None)¶
- reentrancy_set: ReentrancySet = None¶
Concept nodes with multiple parents.
- property result: FlowGraphComponentResult¶
The flow results for this component.
- property root_flow: Flow¶
The root flow of the document component, which has the component’s
DocumentGraphNode as the source node and the sink as the target node.
- class zensols.calamr.flow.FlowGraphComponentResult(component)[source]¶
Bases:
Dictable
A container class for the flow data from a
DocumentComponent flow instance (aka reverse flow graph). This includes the data as dictionaries of statistics, pandas.DataFrame and DataDescriber instances.
- property connected_stats: Dict[str, int | float]¶
The statistics on how well the two graphs are aligned, counted as:
alignable: the number of nodes that are eligible for having an alignment (i.e. sentence, concept, and attribute nodes)
aligned: the number of aligned nodes in the FlowDocumentGraphComponent this instance holds
aligned_portion: the quotient of $aligned / alignable$, which is a number in $[0, 1]$ representing a score of how well the two graphs match
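The three connected_stats values can be computed as sketched below. The node type labels and the function signature are hypothetical; returning 0.0 for a component with no alignable nodes is an assumption, since the docs do not cover that case.

```python
from typing import Dict, Iterable, Union

# hypothetical attribute types eligible for alignment edges
ALIGNABLE_TYPES = frozenset({'sentence', 'concept', 'attribute'})


def connected_stats(node_types: Dict[int, str],
                    aligned_ids: Iterable[int]) -> Dict[str, Union[int, float]]:
    alignable = {nid for nid, typ in node_types.items()
                 if typ in ALIGNABLE_TYPES}
    # only alignable nodes count toward the aligned total
    aligned = alignable & set(aligned_ids)
    n = len(alignable)
    return {'alignable': n,
            'aligned': len(aligned),
            # guard against an empty component (assumed behavior)
            'aligned_portion': len(aligned) / n if n else 0.0}


types = {0: 'doc', 1: 'sentence', 2: 'concept', 3: 'concept', 4: 'attribute'}
stats = connected_stats(types, aligned_ids=[0, 2, 4])
```

Node 0 is a document node, so it is excluded even though it appears in aligned_ids, leaving 2 of 4 alignable nodes aligned.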
- create_data_frame_describer()[source]¶
Like
create_align_df() but includes a human readable description of the data.
- Return type:
DataFrameDescriber
- property df: pd.DataFrame¶
The data in
flows and root as a dataframe. Note the terms source and target refer to the nodes at the ends of the directed edge in a reversed graph.
s_descr: source node descriptions such as concept names, attribute constants and sentence text
t_descr: target node of s_descr
s_toks: any source node aligned tokens
t_toks: any target node aligned tokens
s_attr: source node attribute name given by GraphAttribute.attrib_type, such as doc, sentence, concept, attribute
t_attr: target node of s_attr
s_id: source node igraph ID
t_id: target node igraph ID
edge_type: whether the edge is an AMR role or alignment
rel_id: the coreference relation ID or null if the edge is not a coreference
is_bipartite: whether relation rel_id spans components or null if the edge is not a coreference
flow: the (normalized/flow per node) flow of the edge
reentrancy: whether the edge participates in an AMR reentrancy
align_flow: the flow sum of the alignment edges for the respective edge
align_count: the count of incoming alignment edges to the target node in the FlowDocumentGraphComponent this instance holds
- property n_alignable_nodes: int¶
The number of nodes in the component that can take alignment edges. Whether the counted nodes have edges does not affect the result.
- property stats: Dict[str, Any]¶
All statistics/scores available for this instances, which include:
root_flow: the flow from the root node to the sink
connected: connected_stats
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a
Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute
_DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.calamr.flow.FlowGraphResult(component_paths, context, data)[source]¶
Bases:
PersistableContainer, Dictable
A container class for flow document results, which include the detailed data as dictionaries of statistics,
pandas.DataFrame and DataDescriber instances. This is aggregated from doc_graph and the flow children’s flow graph components.
All graphs (from nascent to the reversed flow children graphs) have the final state of the actions of the
DocumentGraphController as coordinated by the GraphSequencer. Since the flows are copied from the reversed source graph to the root level (doc_graph) factory built nascent graph, all flows are the same. However, the nascent graph will still consist of the disconnected source and summary graphs.
- __init__(component_paths, context, data)[source]¶
Initialize the flow results.
- Parameters:
data (Union[DocumentGraph, ComponentAlignmentFailure]) – the root nascent DocumentGraphFactory built graph or an instance of ComponentAlignmentFailure if the alignment failed
component_paths (Tuple[Tuple[str, str], ...]) – a set of paths that indicate which flow components to use for the results in the form (<child name>, <component name>)
- create_data_describer()[source]¶
Like
create_align_df() but includes a human readable description of the data.
- Throws ComponentAlignmentError:
if this instance resulted in an error
- See:
- Return type:
DataDescriber
- property df: pd.DataFrame¶
A concatenation of frames created with
FlowDocumentGraphComponent.create_align_df() with the name of each component.
- Throws ComponentAlignmentError:
if this instance resulted in an error
- See:
- property doc_graph: DocumentGraph¶
The root nascent
DocumentGraphFactory built graph.
- Throws ComponentAlignmentError:
if this instance resulted in an error
- See:
- property failure: ComponentAlignmentFailure | None¶
What caused the alignment to fail, or
Noneif it was a success.
- get_render_contexts(child_names=None, include_nascent=False)[source]¶
Get contexts used to render the graphs with
render.base.rendergroup.
- Parameters:
child_names (Iterable[str]) – the names of the DocumentGraph.children to render, which defaults to the nascent graph and the final rendered bipartite graph (“restore previous flow on source”)
include_nascent (bool) – whether to include the nascent graphs
- Return type:
- render(contexts=None, graph_id='graph', display=True, directory=None)[source]¶
Render several graphs at a time, then optionally display them.
- Parameters:
contexts (Tuple[RenderContext]) – the data to render, which defaults to the output of get_render_contexts()
graph_id (str) – a unique identifier prefixed to generated files if none is provided in the call method
display (bool) – whether to display the files after they are generated
directory (Path) – the directory to create the files in place of the temporary directory; if provided, the directory is not removed after the graphs are rendered
- property stats: Dict[str, Any]¶
The statistics with keys as component names and values taken from
FlowDocumentGraphComponent.stats.
- Throws ComponentAlignmentError:
if this instance resulted in an error
- See:
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_doc=True)[source]¶
Write this instance as either a
Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute
_DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
zensols.calamr.flowmeta module¶
Metadata for the flow module.
zensols.calamr.morph module¶
Populate an igraph from AMR graphs.
- class zensols.calamr.morph.IsomorphDocumentGraphDecorator(config_factory, graph_attrib_context_name)[source]¶
Bases:
DocumentGraphDecorator
Populates
igraph.Graph attributes from DocumentGraph data by adding AMR node and edge information.
- __init__(config_factory, graph_attrib_context_name)¶
-
config_factory:
ConfigFactory¶ The configuration factory used to create a
GraphAttributeContext.
- decorate(comp)[source]¶
Creates the graph from a
DocumentNode root node.
- Parameters:
component – the graph to populate from the decorating process
-
graph_attrib_context_name:
str¶ The section name of the
GraphAttributeContext context given to all nodes and edges of the graph.
zensols.calamr.proto module¶
Prototyping and cookbook.
zensols.calamr.reentrancy module¶
Reentrancy container classes.
- class zensols.calamr.reentrancy.EdgeFlow(edge, flow=None)[source]¶
Bases:
PersistableContainer, Dictable
The flow over a graph edge. This keeps the flow of the edge as a “snapshot” of the value at a particular point in the algorithm, before it is modified to fix the issue.
- __init__(edge, flow=None)¶
- class zensols.calamr.reentrancy.Reentrancy(concept_node, concept_node_vertex, edge_flows)[source]¶
Bases:
PersistableContainer, Dictable
Reentrancies are concept nodes with multiple parents (in the forward graph) and have side effects when running the algorithm.
Note: AMR graphs (which are always acyclic) with no reentrancies are trees.
- __init__(concept_node, concept_node_vertex, edge_flows)¶
-
concept_node:
ConceptGraphNode¶ The concept node of the reentrancy.
-
edge_flows:
Tuple[EdgeFlow]¶ The outgoing edges connected to the reentrant
concept_node.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a
Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute
_DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.calamr.reentrancy.ReentrancySet(reentrancies=())[source]¶
Bases:
PersistableContainer, Dictable
A set of reentrancies, one for each iteration of the algorithm.
- __init__(reentrancies=())¶
- property by_vertex: Dict[int, Reentrancy]¶
-
reentrancies:
Tuple[Reentrancy] = ()¶ Concept nodes with multiple parents.
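Reentrancies can be found by counting incoming role edges in the forward (root to leaf) graph. A minimal sketch on a plain edge list, using a simplified AMR for "The boy wants to go", in which boy is the agent of both wanting and going; the function name is hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple


def find_reentrancies(edges: List[Tuple[str, str]]) -> Set[str]:
    """Return the nodes with more than one incoming edge in a forward graph."""
    n_parents = Counter(target for _, target in edges)
    return {node for node, n in n_parents.items() if n > 1}


# want -> boy (:ARG0), want -> go (:ARG1), go -> boy (:ARG0)
amr_edges = [('want', 'boy'), ('want', 'go'), ('go', 'boy')]
```

A graph for which this returns an empty set is a tree, which matches the note on the Reentrancy class.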
zensols.calamr.score module¶
Produces CALAMR scores.
- class zensols.calamr.score.CalamrScore(flow_graph_res)[source]¶
Bases:
Score
Contains all CALAMR scores.
- NAN_INSTANCE = CalamrScore()¶
- __init__(flow_graph_res)¶
-
flow_graph_res:
FlowGraphResult¶
- class zensols.calamr.score.CalamrScoreMethod(reverse_sents=False, word_piece_doc_factory=None, doc_graph_factory=None, doc_graph_aligner=None)[source]¶
Bases:
ScoreMethod
Computes the smatch scores of AMR sentences. Sentence pairs are ordered
(<summary>, <source>).
- __init__(reverse_sents=False, word_piece_doc_factory=None, doc_graph_factory=None, doc_graph_aligner=None)¶
-
doc_graph_aligner:
DocumentGraphAligner= None¶ Aligns the components of document graphs.
-
doc_graph_factory:
DocumentGraphFactory= None¶ Create document graphs.
- score_annotated_doc(doc)[source]¶
Score a document that has an
amr of type AnnotatedAmrDocument.
- Raises:
[zensols.amr.domain.AmrError]: if the AMR could not be parsed or aligned
- Return type:
-
word_piece_doc_factory:
WordPieceFeatureDocumentFactory= None¶ The feature document factory that populates embeddings.
zensols.calamr.stash module¶
Alignment dataframe stash.
- class zensols.calamr.stash.FlowGraphRestoreStash(delegate, flow_graph_result_context)[source]¶
Bases:
DelegateStash, PrimeableStash
A stash that restores transient data on
FlowGraphResult instances.
- __init__(delegate, flow_graph_result_context)¶
- exists(name)[source]¶
Return
True if data with key name exists.
Implementation note: This
Stash.exists() method is very inefficient and should be overridden.
- Return type:
-
flow_graph_result_context:
_FlowGraphResultContext¶ Contains in-memory/interpreter session data needed by
FlowGraphResultwhen it is created or unpickled.
- get(name, default=None)[source]¶
Load an object or a default if key
name doesn’t exist.
Implementation note: subclasses will probably want to override this method given the super method is cavalier about calling
exists() and load(). Based on the implementation, this can be problematic.
- Return type:
- class zensols.calamr.stash.FlowGraphResultFactoryStash(anon_doc_stash, doc_graph_aligner, doc_graph_factory, limit=9223372036854775807)[source]¶
Bases:
ReadOnlyStash, PrimeableStash
A factory stash that creates aligned
FlowGraphResult instances or ComponentAlignmentFailure when the document cannot be aligned.
- __init__(anon_doc_stash, doc_graph_aligner, doc_graph_factory, limit=9223372036854775807)¶
-
doc_graph_aligner:
DocumentGraphAligner¶ Aligns the components of document graphs.
-
doc_graph_factory:
DocumentGraphFactory¶ Create document graphs.
- exists(name)[source]¶
Return
True if data with key name exists.
Implementation note: This
Stash.exists() method is very inefficient and should be overridden.
- Return type: