zensols.calamr package¶
Subpackages¶
- zensols.calamr.render namespace
- zensols.calamr.summary namespace
- Submodules
- zensols.calamr.summary.alignconst module
ReverseFlowGraphAlignmentConstructor
SharedGraphAlignmentConstructor
SummaryGraphAlignmentConstructor
SummaryGraphAlignmentConstructor.__init__()
SummaryGraphAlignmentConstructor.build()
SummaryGraphAlignmentConstructor.capacity_calculator
SummaryGraphAlignmentConstructor.component_alignment_capacities
SummaryGraphAlignmentConstructor.connections
SummaryGraphAlignmentConstructor.find_flow_diffs()
SummaryGraphAlignmentConstructor.source
SummaryGraphAlignmentConstructor.summary
- zensols.calamr.summary.capacity module
- zensols.calamr.summary.coref module
- zensols.calamr.summary.factory module
Submodules¶
zensols.calamr.alignconst module¶
Defines a class that aligns components of a (bipartite) graph.
- class zensols.calamr.alignconst.GraphAlignmentConstructor(doc_graph=None, source_flow=10000000000.0)[source]¶
Bases:
object
Adds additional nodes and edges that enable the maxflow algorithm to be used on the graph. This includes the component alignment edges, the source node and the sink node. Capacities for the component alignment edges are also set.
- __init__(doc_graph=None, source_flow=10000000000.0)¶
- add_edges(capacities, cls=<class 'zensols.calamr.attr.ComponentAlignmentGraphEdge'>)[source]¶
Add capacities as graph capacities to the graph.
- property doc_graph: DocumentGraph¶
A document graph that contains the graph to be aligned.
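As a rough illustration of the construction described for GraphAlignmentConstructor, the sketch below adds a source node, a sink node and a few component alignment edges with capacities to a toy graph, then runs a small Edmonds-Karp maxflow over it. It uses only the standard library. The node names, the `add_terminals` and `max_flow` helpers and the capacities are hypothetical and are not part of the calamr API.

```python
from collections import deque
from typing import Dict, List, Tuple

Capacities = Dict[Tuple[str, str], float]

# large finite stand-in for "infinite" flow (mirrors the documented
# default of source_flow)
SOURCE_FLOW = 1e10


def add_terminals(caps: Capacities, source_side: List[str],
                  sink_side: List[str]) -> Capacities:
    """Return a copy of ``caps`` with a source ``s`` and a sink ``t`` added."""
    out = dict(caps)
    for n in source_side:
        out[('s', n)] = SOURCE_FLOW
    for n in sink_side:
        out[(n, 't')] = SOURCE_FLOW
    return out


def max_flow(caps: Capacities, s: str = 's', t: str = 't') -> float:
    """Edmonds-Karp maxflow over a dict of directed edge capacities."""
    residual: Dict[str, Dict[str, float]] = {}
    for (u, v), c in caps.items():
        residual.setdefault(u, {})[v] = residual.get(u, {}).get(v, 0.0) + c
        residual.setdefault(v, {}).setdefault(u, 0.0)
    total = 0.0
    while True:
        # breadth first search for an augmenting path
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        # walk back from the sink, find the bottleneck and push flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        total += push


# hypothetical alignment edges between two components (a* and b*)
alignment_caps = {('a1', 'b1'): 3.0, ('a1', 'b2'): 1.0, ('a2', 'b2'): 2.0}
network = add_terminals(alignment_caps, ['a1', 'a2'], ['b1', 'b2'])
flow = max_flow(network)
```

Because the terminal edges carry a very large capacity, the alignment edges are the only bottleneck, so the total flow is bounded by the sum of their capacities.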
zensols.calamr.aligner module¶
Contains classes that run the algorithm to compute graph component alignments.
- class zensols.calamr.aligner.DocumentGraphAligner(config_factory, flow_graph_result_name, doc_graph_name, renderer, render_level, init_loops_render_level, output_dir)[source]¶
Bases:
ABC
Aligns the graph components of doc_graph and visualizes them with renderer.
- MAX_RENDER_LEVEL: ClassVar[int] = 10¶
The maximum value for render_level.
- __init__(config_factory, flow_graph_result_name, doc_graph_name, renderer, render_level, init_loops_render_level, output_dir)¶
- align(doc_graph)[source]¶
Align the graph components of doc_graph and optionally visualize them with renderer. To disable rendering, set render_level to 0.
- Parameters:
doc_graph (DocumentGraph) – the graph created by the DocumentGraphFactory
- Return type:
- Returns:
the alignments available as the in memory graph and object graph, Pandas dataframes, statistics and scores
- config_factory: ConfigFactory¶
Used to create the GraphSequencer instance.
- create_error_result(ex, msg='Could not align')[source]¶
Create an error graph result (rather than an alignment result). This should be called from a try/except block to obtain the error information.
- Parameters:
ex (Exception) – the exception that caused the issue
msg (str) – the error message for the failure
- Return type:
- doc_graph_name: str¶
The name (DocumentGraph.name) of the document graph returned from align().
- flow_graph_result_name: str¶
The app configuration section name of FlowGraphResult.
- init_loops_render_level: int¶
The render_level to use for all iteration loops except for the last before the algorithm converges.
- classmethod is_valid_render_level(render_level, should_raise=False)[source]¶
Return whether the render_level argument is a valid value for the render_level attribute.
- Return type:
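The range check that is_valid_render_level() describes can be pictured with the following minimal sketch. It assumes only what this page documents: render levels run from 0 to MAX_RENDER_LEVEL (10). The function below is a standalone stand-in rather than the library's classmethod, and raising `ValueError` when should_raise is set is an assumption.

```python
MAX_RENDER_LEVEL = 10  # mirrors DocumentGraphAligner.MAX_RENDER_LEVEL


def is_valid_render_level(render_level: int,
                          should_raise: bool = False) -> bool:
    """Return whether ``render_level`` is an integer in [0, MAX_RENDER_LEVEL]."""
    valid = (isinstance(render_level, int)
             and not isinstance(render_level, bool)
             and 0 <= render_level <= MAX_RENDER_LEVEL)
    if not valid and should_raise:
        # ValueError is an assumption; the library may raise another type
        raise ValueError(
            f'render level must be in [0, {MAX_RENDER_LEVEL}]: '
            f'{render_level!r}')
    return valid
```

A level of 0 passes the check and, per the attribute documentation, disables rendering entirely.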
- output_dir: Path¶
If this is set, the graphs are written to this created directory on the file system. Otherwise, they are displayed and cleaned up afterward.
- render_level: int¶
How many graphs to render on a scale from 0 - 10. The higher the number the more likely a graph is to be rendered. A value of 0 prevents rendering and a setting of 10 will render all graphs.
- renderer: GraphRenderer¶
Visually render the graph into a human-understandable presentation.
- class zensols.calamr.aligner.DocumentGraphController(name)[source]¶
Bases:
Dictable
Executes the maxflow/min cut algorithm on a document graph.
- __init__(name)¶
- invoke(doc_graph)[source]¶
Perform operations on the graph algorithm.
- Parameters:
doc_graph (DocumentGraph) – the graph to edit
- Return type:
- Returns:
the number of edits made to the graph
- class zensols.calamr.aligner.GraphIteration(sequence, render_level, updates)[source]¶
Bases:
Dictable
An iteration of the alignment algorithm.
- __init__(sequence, render_level, updates)¶
- render_level: int¶
Whether to render graphs on a scale from 0 - 10. The higher the number the more likely it is to be rendered with 0 never rendering the graph, and 10 always rendering the graph.
- reset()[source]¶
Reset all state in application context shared objects so new data is forced to be created on the next alignment request.
- sequence: GraphSequence¶
The sequence to use for this iteration.
- class zensols.calamr.aligner.GraphSequence(name, process_name, render_name, heading, controller, sequencer)[source]¶
Bases:
Dictable
A strategy GoF pattern that models what to do during a sequence of graph modifications using DocumentGraphController. It also contains rendering information for visualization.
- __init__(name, process_name, render_name, heading, controller, sequencer)¶
- controller: Optional[DocumentGraphController]¶
The controller used in the invocation of this strategy.
- invoke()[source]¶
Invoke the strategy. This implementation calls the controller with the process_graph to be processed and passes back the update count.
- Return type:
- populate_render_context(context)[source]¶
Allows the sequence to override the parameters before they are sent to the graph rendering API.
- property process_graph: DocumentGraph¶
The graph provided to the graph controller.
- process_name: str¶
The name of the graph provided to the graph controller. See process_graph.
- property render_graph¶
The graph to render.
- render_name: str¶
The name of the graph to render. See render_graph.
- reset()[source]¶
Reset all state in application context shared objects so new data is forced to be created on the next alignment request.
- sequencer: GraphSequencer¶
Owns and controls this instance.
- class zensols.calamr.aligner.GraphSequencer(config_factory, sequence_path, nascent_graph, render, descriptor=None, heading_format=None)[source]¶
Bases:
object
This invokes the GraphSequence objects in the provided sequence to automate the graph alignment algorithm and is used by MaxflowDocumentGraphAligner.
- __init__(config_factory, sequence_path, nascent_graph, render, descriptor=None, heading_format=None)[source]¶
Initialize this instance.
- Parameters:
config_factory (ConfigFactory) – used to create the controller in the sequence instances
sequence_path (Path) – the path to the JSON file that has the sequences’ configuration
nascent_graph (DocumentGraph) – the initial disconnected graph created by DocumentGraphFactory
render (rendergroup) – the render object created by base.rendergroup
- property render_level: int¶
Whether to render graphs on a scale from 0 - 10. See DocumentGraphAligner.MAX_RENDER_LEVEL.
- class zensols.calamr.aligner.MaxflowDocumentGraphAligner(config_factory, flow_graph_result_name, doc_graph_name, renderer, render_level, init_loops_render_level, output_dir, graph_sequencer_name, max_sequencer_iterations, hyp)[source]¶
Bases:
DocumentGraphAligner
Uses the maxflow/min cut algorithm to compute graph component alignments.
- __init__(config_factory, flow_graph_result_name, doc_graph_name, renderer, render_level, init_loops_render_level, output_dir, graph_sequencer_name, max_sequencer_iterations, hyp)¶
- graph_sequencer_name: str¶
The app configuration section name of GraphSequencer.
- hyp: HyperparamModel¶
The capacity calculator hyperparameters.
- See:
summary.CapacityCalculator.hyp
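This page states that a controller's invoke() returns the number of edits made, that iteration proceeds until the algorithm converges, and that the aligner takes a max_sequencer_iterations setting. How the library combines these is not shown here, so the following is only a guess at the general shape: invoke every controller in order and stop once a full pass makes no edits or the iteration cap is reached. The dict-based graph, `remove_low_capacity` and `run_until_converged` are hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Tuple

Graph = Dict[Tuple[str, str], float]   # edge -> capacity (toy stand-in)
Controller = Callable[[Graph], int]    # returns the number of edits made


def remove_low_capacity(min_capacity: float) -> Controller:
    """Create a controller that drops edges at or below ``min_capacity``."""
    def invoke(graph: Graph) -> int:
        doomed = [e for e, cap in graph.items() if cap <= min_capacity]
        for edge in doomed:
            del graph[edge]
        return len(doomed)
    return invoke


def run_until_converged(graph: Graph, controllers: List[Controller],
                        max_iterations: int) -> int:
    """Run all controllers repeatedly; return the number of passes used."""
    for iteration in range(1, max_iterations + 1):
        updates = sum(ctrl(graph) for ctrl in controllers)
        if updates == 0:
            # a full pass without edits: treat as converged
            return iteration
    return max_iterations


graph = {('a', 'x'): 0.05, ('a', 'y'): 0.7, ('b', 'y'): 0.4}
iterations = run_until_converged(
    graph, [remove_low_capacity(0.1)], max_iterations=5)
```

The first pass removes the single low capacity edge and the second pass makes no edits, so the loop stops after two passes.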
- class zensols.calamr.aligner.RenderUpSideDownGraphSequence(name, process_name, render_name, heading, controller, sequencer)[source]¶
Bases:
GraphSequence
A graph sequence that tells graphviz to render the diagram upside down, which is useful for reverse flow graphs.
- __init__(name, process_name, render_name, heading, controller, sequencer)¶
zensols.calamr.annotate module¶
Contains a class that adds embeddings to AMR feature documents.
- class zensols.calamr.annotate.AddEmbeddingsFeatureDocumentStash(delegate, word_piece_doc_factory=None)[source]¶
Bases:
DelegateStash, PrimeableStash
Add embeddings to AMR feature documents. Embedding population is disabled by configuring word_piece_doc_factory as None.
- __init__(delegate, word_piece_doc_factory=None)¶
- get(name, default=None)[source]¶
Load an object or a default if key name doesn’t exist.
Implementation note: subclasses will probably want to override this method given the super method is cavalier about calling exists() and load(). Based on the implementation, this can be problematic.
- Return type:
- load(name)[source]¶
Load a data value from the pickled data with key name. Semantically, this method loads the data using the stash’s implementation. For example, DirectoryStash loads the data from a file if it exists, but factory type stashes will always re-generate the data.
- Return type:
- word_piece_doc_factory: WordPieceFeatureDocumentFactory = None¶
The feature document factory that populates embeddings.
- class zensols.calamr.annotate.CalamrAnnotatedAmrFeatureDocumentFactory(doc_parser, remove_wiki_attribs=False, remove_alignments=False, doc_id=0, word_piece_doc_factory=None)[source]¶
Bases:
AnnotatedAmrFeatureDocumentFactory
Adds wordpiece embeddings to AmrFeatureDocument instances.
- __init__(doc_parser, remove_wiki_attribs=False, remove_alignments=False, doc_id=0, word_piece_doc_factory=None)¶
- to_annotated_doc(doc)[source]¶
Clone doc.amr into an AnnotatedAmrDocument.
- Parameters:
doc – the document to convert to an AnnotatedAmrDocument
- Return type:
- Returns:
a feature document with amr set to a new AnnotatedAmrDocument; this is a new instance if doc isn’t an annotated AMR document
- word_piece_doc_factory: WordPieceFeatureDocumentFactory = None¶
The feature document factory that populates embeddings.
- class zensols.calamr.annotate.ProxyReportAnnotatedAmrDocument(sents, path=None, doc_id=None)[source]¶
Bases:
AnnotatedAmrDocument
Overrides the sections property to skip duplicate summary sentences also found in the body.
- __init__(sents, path=None, doc_id=None)¶
Initialize.
- Parameters:
sents (Tuple[AmrSentence, …]) – the document’s sentences
path (Optional[Path, …]) – the path to the file containing the Penman notation sentence graphs used in sents
model – the model to initialize AmrSentence when sents is a list of string Penman graphs
- property sections: Tuple[AnnotatedAmrSectionDocument]¶
The sentences that make up the body of the document.
zensols.calamr.app module¶
Alignment entry point application.
- class zensols.calamr.app.AlignmentApplication(resource, config_factory)[source]¶
Bases:
_AlignmentBaseApplication
This application aligns data in files.
- __init__(resource, config_factory)¶
- align_file(input_file, output_dir=None, output_format=Format.csv, render_level=None)[source]¶
Align annotated documents from a JSON file.
-
config_factory:
ConfigFactory
¶ Application configuration factory.
- class zensols.calamr.app.CorpusApplication(resource, config_factory, results_dir)[source]¶
Bases:
_AlignmentBaseApplication
AMR graph alignment.
- __init__(resource, config_factory, results_dir)¶
- align_corpus(keys, output_dir=None, output_format=Format.csv, render_level=None, use_cached=False)[source]¶
Align an annotated AMR document from the corpus.
-
config_factory:
ConfigFactory
¶ For prototyping.
- dump_annotated(limit=None, output_dir=None, output_format=Format.csv)[source]¶
Write annotated documents and their keys.
- get_annotated_summary(limit=None)[source]¶
Return a CSV file with a summary of the annotated AMR dataset.
-
results_dir:
Path
¶ The directory where the output results are written, then read back for analysis reporting.
- class zensols.calamr.app.Resource(anon_doc_stash, doc_factory, serialized_factory, doc_graph_factory, doc_graph_aligner, flow_results_stash, anon_doc_factory)[source]¶
Bases:
object
A client facade (GoF) for Calamr annotated AMR corpus access and alignment.
- __init__(anon_doc_stash, doc_factory, serialized_factory, doc_graph_factory, doc_graph_aligner, flow_results_stash, anon_doc_factory)¶
- align(doc_graph)[source]¶
Create flow results from a document graph.
- Parameters:
doc_graph (DocumentGraph) – the source/summary components to align
- Return type:
- Returns:
the aligned bipartite graph and its statistics
- align_corpus_document(doc_id, use_cache=True)[source]¶
Create flow results of a corpus AMR document.
- Parameters:
- Return type:
- Returns:
the flow results for the corpus document or None if doc_id is not a valid key
- anon_doc_factory: AnnotatedAmrFeatureDocumentFactory¶
Creates instances of AmrFeatureDocument.
- anon_doc_stash: Stash¶
Contains human annotated AMRs. This could be from the adhoc (micro) corpus (small toy corpus), AMR 3.0 Proxy Report corpus, Little Prince, or the Bio AMR corpus.
- create_graph(doc)[source]¶
Return a new document graph based on feature document.
- Parameters:
doc (AmrFeatureDocument) – the document on which to base the new graph
- Return type:
- Returns:
a new AMR document graph
- doc_factory: AmrFeatureDocumentFactory¶
Creates AmrFeatureDocument from AmrDocument instances.
- doc_graph_aligner: DocumentGraphAligner¶
Aligns the graph components of document graphs.
- doc_graph_factory: DocumentGraphFactory¶
Create document graphs.
- flow_results_stash: Stash¶
Creates cached instances of FlowGraphResult.
- get_corpus_document(doc_id)[source]¶
Get an AMR feature document by key from the application configured corpus.
- Parameters:
doc_id (str) – the corpus document ID (e.g. liu-example or 20041010_0024)
- Return type:
- Returns:
the AMR feature document
- parse_documents(data)[source]¶
Parse documents with keys id, comment, body, and summary from a dict, a sequence of dict instances, or a JSON file in the format:
[{
    "id": "ex1",
    "comment": "very short",
    "body": "The man ran to make the train. He just missed it.",
    "summary": "A man got caught in the door of a train he missed."
}]
- Return type:
- Returns:
the parsed AMR feature document
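The three accepted input shapes for parse_documents() (a dict, a sequence of dict instances, or a JSON file) can be reduced to one list of records before any parsing happens. The sketch below shows one way to do that with the standard library only. `normalize_documents` is a hypothetical helper, not a calamr function, and treating body and summary as required while id and comment are optional is an assumption.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Sequence, Union

EXPECTED_KEYS = ('id', 'comment', 'body', 'summary')


def normalize_documents(
        data: Union[Dict[str, Any], Sequence[Dict[str, Any]], Path]
) -> List[Dict[str, Any]]:
    """Return a list of records, each with exactly the expected keys."""
    if isinstance(data, Path):
        # a JSON file containing a list of records
        with open(data, encoding='utf-8') as f:
            data = json.load(f)
    if isinstance(data, dict):
        data = [data]
    docs: List[Dict[str, Any]] = []
    for i, rec in enumerate(data):
        if not isinstance(rec, dict):
            raise TypeError(f'record {i} is not a dict: {type(rec)}')
        # assumption: body and summary are required
        missing = [k for k in ('body', 'summary') if k not in rec]
        if missing:
            raise ValueError(f'record {i} is missing keys: {missing}')
        docs.append({k: rec.get(k) for k in EXPECTED_KEYS})
    return docs


docs = normalize_documents({
    'id': 'ex1',
    'comment': 'very short',
    'body': 'The man ran to make the train. He just missed it.',
    'summary': 'A man got caught in the door of a train he missed.',
})
```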
serialized_factory:
AmrSerializedFactory
¶ Creates a
Serialized
fromAmrDocument
,AmrSentence
orAnnotatedAmrDocument
.
- to_annotated_doc(doc)[source]¶
Return an annotated feature document, creating the feature document if necessary. The doc.amr attribute is set to an annotated AMR document.
- Parameters:
doc (Union[AmrDocument, AmrFeatureDocument]) – an AMR document or an AMR feature document
- Return type:
- Returns:
a new instance of a document if doc is not an AmrFeatureDocument or if doc.amr is not an AnnotatedAmrDocument
- to_feature_doc(amr_doc, catch=False, add_metadata=False, add_alignment=False)[source]¶
Create an AmrFeatureDocument from an AmrDocument by parsing the snt metadata with a FeatureDocumentParser.
- Parameters:
add_metadata (Union[str, bool]) – add missing annotation metadata to amr_doc parsed from spaCy if missing (see AmrParser.add_metadata()) if True and replace any previous metadata if this value is the string clobber
catch (bool) – if True, return caught exceptions creating an AmrFailure from each and return them
- Return type:
Union[AmrFeatureDocument, Tuple[AmrFeatureDocument, List[AmrFailure]]]
- Returns:
an AMR feature document if catch is False; otherwise, a tuple of a document with sentences that were successfully parsed and a list of any exceptions raised during the parsing
zensols.calamr.attr module¶
Graph node and edge domain classes.
Terminology: token alignments refer to the sentence index based token alignments to AMR nodes. This is not to be confused with alignment edges (aka graph component alignment edges).
- class zensols.calamr.attr.AmrDocumentNode(context, name, root, children, doc)[source]¶
Bases:
DocumentNode
A composite node containing a subset of the DocumentNode.root sentences. This includes the text, text features, and AMR Penman graph data.
- __init__(context, name, root, children, doc)¶
- doc: AmrFeatureDocument¶
A document containing a subset of sentences that fall under this portion of the graph.
- class zensols.calamr.attr.AttributeGraphNode(context, sent, token_aligns, triple)[source]¶
Bases:
TripleGraphNode
Attribute data from AMR attribute nodes grafted on to the igraph.Graph.
- ATTRIB_TYPE: ClassVar[str] = 'attribute'¶
The attribute type this class represents.
- __init__(context, sent, token_aligns, triple)¶
- class zensols.calamr.attr.ComponentAlignmentGraphEdge(context, capacity=0, flow=0)[source]¶
Bases:
GraphEdge
An edge that spans graph components.
- ATTRIB_TYPE: ClassVar[str] = 'component alignment'¶
The attribute type this class represents.
- __init__(context, capacity=0, flow=0)¶
- class zensols.calamr.attr.ComponentCorefAlignmentGraphEdge(context, capacity=0, flow=0, relation=None, is_bipartite=False)[source]¶
Bases:
ComponentAlignmentGraphEdge
An edge that spans graph components.
- ATTRIB_TYPE: ClassVar[str] = 'component coref alignment'¶
The attribute type this class represents.
- __init__(context, capacity=0, flow=0, relation=None, is_bipartite=False)¶
- is_bipartite: bool = False¶
Whether the coreference spans components.
- relation: Relation = None¶
The AMR coreference relation between this node and all other refs.
- class zensols.calamr.attr.ConceptGraphNode(context, sent, token_aligns, triple)[source]¶
Bases:
TripleGraphNode
Attribute data from AMR concept nodes grafted on to the igraph.Graph.
- ATTRIB_TYPE: ClassVar[str] = 'concept'¶
The attribute type this class represents.
- __init__(context, sent, token_aligns, triple)¶
- property instance: str¶
The concept instance, such as the propbank entry (e.g. see-01). Other examples include nouns.
- property roleset: Roleset¶
- property roleset_embedding: Tensor¶
- property roleset_id: RolesetId¶
- class zensols.calamr.attr.DocumentGraphEdge(context, capacity=0, flow=0, relation='')[source]¶
Bases:
GraphEdge
An edge that has data about the non-AMR parts of the graph, such as sentence.
- ATTRIB_TYPE: ClassVar[str] = 'doc'¶
The attribute type this class represents.
- __init__(context, capacity=0, flow=0, relation='')¶
- relation: str = ''¶
The edge relation between two document nodes or document to igraph node.
- class zensols.calamr.attr.DocumentGraphNode(context, level_name, doc_node)[source]¶
Bases:
GraphNode
A node that has data about the non-AMR parts of the graph, such as the unifying top level node that ties the sentences together. However, it can contain the root to an AMR sentence (see AmrDocumentNode).
- ATTRIB_TYPE: ClassVar[str] = 'doc'¶
The attribute type this class represents.
- __init__(context, level_name, doc_node)¶
- level_name: str¶
The descriptive name of the node such as doc or section.
- class zensols.calamr.attr.DocumentNode(context, name, root, children)[source]¶
Bases:
GraphNode
A composite of a node in the document tree that is associated with the FeatureDocument as the root node. This class represents nodes in a graph that either:
make up the part of the graph that’s disjoint from the AMR sentence subgraphs (i.e. a root doc node), or
are the root to an AMR sentence (see AmrDocumentNode)
The in-memory object graph of these instances is dependent on the type of data it represents. For example, the Proxy Report corpus has top level summary and body nodes with AMR sentences below (root on top).
- __init__(context, name, root, children)¶
- children: Tuple[DocumentNode, ...]¶
The children of this node with respect to the composite pattern.
- property children_by_name: DocumentNode¶
The children’s names as keys and respective document nodes as values.
- name: str¶
The descriptive name of the node such as doc or section.
- root: AmrFeatureDocument¶
The owning feature document containing all sentences/tokens of the graph.
- property sents: Tuple[AmrFeatureSentence]¶
The sentences of this document level.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.calamr.attr.GraphEdge(context, capacity=0, flow=0)[source]¶
Bases:
GraphAttribute
Graph attribute data added to the igraph.Graph edges.
- MAX_CAPACITY: ClassVar[float] = 10000000000.0¶
Maximum value of a capacity.
Implementation note: It seems igraph can only handle large values to represent infinity, and not float inf or the system defined largest float value.
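The implementation note on MAX_CAPACITY suggests that "infinite" capacities have to be represented by a large finite number. One way to picture that is to clamp every capacity before it is stored on an edge, as in this sketch. `clamp_capacity` is a hypothetical helper, not part of calamr, and only the constant's value is taken from this page.

```python
import math

MAX_CAPACITY = 1e10  # mirrors GraphEdge.MAX_CAPACITY


def clamp_capacity(value: float) -> float:
    """Map ``inf`` (or any overly large value) to ``MAX_CAPACITY``."""
    if math.isnan(value):
        raise ValueError('capacity must be a number')
    return min(value, MAX_CAPACITY)
```

With this in place, `clamp_capacity(float('inf'))` yields the same finite value used as the default source_flow, while ordinary capacities pass through unchanged.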
- __init__(context, capacity=0, flow=0)¶
- class zensols.calamr.attr.GraphNode(context)[source]¶
Bases:
GraphAttribute
Graph attribute data added to the igraph.Graph vertexes.
- __init__(context)¶
- class zensols.calamr.attr.RoleGraphEdge(context, sent, token_aligns, capacity=0, flow=0, triple=None, role=None)[source]¶
Bases:
GraphEdge, SentenceGraphAttribute
Attribute data from the AMR role edges grafted on to the igraph.Graph.
- ATTRIB_TYPE: ClassVar[str] = 'role'¶
The attribute type this class represents.
- __init__(context, sent, token_aligns, capacity=0, flow=0, triple=None, role=None)¶
- role: Union[str, Role] = None¶
The role name of the edge such as :ARG0.
- triple: Union[Instance, Attribute] = None¶
The AMR Penman graph triple.
- class zensols.calamr.attr.SentenceGraphAttribute(context, sent, token_aligns)[source]¶
Bases:
GraphAttribute
A node containing zero or more tokens with its parent sentence. Usually the AMR node represents a single token, but can have more than one token alignment.
- __init__(context, sent, token_aligns)¶
- sent: AmrFeatureSentence¶
The sentence from which this node was created.
- property token_align_str: str¶
A string representation of the AMR Penman representation of the token alignment.
- token_aligns: Tuple[Union[Alignment, RoleAlignment], ...]¶
The node to sentence token index alignments.
- property tokens: Tuple[FeatureToken, ...]¶
The tokens
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.calamr.attr.SentenceGraphEdge(context, capacity=0, flow=0, relation='', sent=None, sent_ix=None)[source]¶
Bases:
DocumentGraphEdge
An edge from a document node to a SentenceGraphNode.
- ATTRIB_TYPE: ClassVar[str] = 'sentence'¶
The attribute type this class represents.
- __init__(context, capacity=0, flow=0, relation='', sent=None, sent_ix=None)¶
- sent: AmrFeatureSentence = None¶
The sentence from which this node was created.
- sent_ix: int = None¶
The sentence index.
- class zensols.calamr.attr.SentenceGraphNode(context, sent, sent_ix)[source]¶
Bases:
GraphNode
A graph node containing the root of a sentence.
- ATTRIB_TYPE: ClassVar[str] = 'sentence'¶
The attribute type this class represents.
- SENT_TEXT_LEN: ClassVar[int] = 20¶
The truncated sentence length
- __init__(context, sent, sent_ix)¶
- sent: AmrFeatureSentence¶
The sentence from which this node was created.
- sent_ix: int¶
The sentence index.
- class zensols.calamr.attr.TerminalGraphEdge(context, capacity=0, flow=0)[source]¶
Bases:
GraphEdge
An edge that connects to a TerminalGraphNode.
- ATTRIB_TYPE: ClassVar[str] = 'terminal'¶
The attribute type this class represents.
- __init__(context, capacity=0, flow=0)¶
- class zensols.calamr.attr.TerminalGraphNode(context, is_source)[source]¶
Bases:
GraphNode
A flow control: source or sink.
- ATTRIB_TYPE: ClassVar[str] = 'control'¶
The attribute type this class represents.
- __init__(context, is_source)¶
- is_source: bool¶
Whether this is the source (s) or the sink (t).
- class zensols.calamr.attr.TripleGraphNode(context, sent, token_aligns, triple)[source]¶
Bases:
SentenceGraphAttribute, GraphNode
Contains a Penman triple with token alignments used for concepts and AMR attributes. Instances of this class get their embedding via SentenceGraphAttribute._get_embedding().
- __init__(context, sent, token_aligns, triple)¶
- triple: Union[Instance, Attribute]¶
The AMR Penman graph triple.
zensols.calamr.cli module¶
Command line entry point to the application.
- class zensols.calamr.cli.ApplicationFactory(*args, **kwargs)[source]¶
Bases:
ApplicationFactory
zensols.calamr.comp module¶
Base graph component class.
- class zensols.calamr.comp.GraphComponent(graph)[source]¶
Bases:
PersistableContainer, Writable
A container class for an igraph.Graph, which also has caching data structures for fast access to graph attributes.
- GRAPH_ATTRIB_NAME: ClassVar[str] = 'ga'¶
The name of the graph attributes on igraph nodes and edges.
- __init__(graph)¶
- property adjacency_list: List[List[int]]¶
An adjacency list of vertexes based on their relation to each other in the graph. The outer list’s index is the source vertex and the inner list is that vertex’s neighbors.
Implementation note: the list is sub-setted at both the inner and outer level for those vertexes in this component.
- clone(reverse_edges=False, deep=True, **kwargs)[source]¶
Clone an instance and return it. The graph is deep copied, but all GraphAttribute instances are not.
- Parameters:
reverse_edges (bool) – whether to reverse the direction of edges in the directed graph, which is used to create the reverse flow graphs used in the maxflow algorithm
deep (bool) – whether to create a deep clone, which detaches a component from any bipartite graph by keeping only the graph composed of vs; otherwise the graph is copied as a subcomponent starting from root
kwargs – arguments to add as attributes to the clone; if included, cls is the type of the new instance
- Return type:
- Returns:
the cloned instance of this instance
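To make the reverse_edges option and the adjacency_list layout more concrete, here is a small standalone sketch over a plain edge list. It does not use igraph or any calamr class. `reverse_edges` and `adjacency_list` below are hypothetical helper functions, and treating "neighbors" as the targets of outgoing edges is an assumption.

```python
from typing import List, Tuple

Edge = Tuple[int, int]  # (source vertex index, target vertex index)


def reverse_edges(edges: List[Edge]) -> List[Edge]:
    """Flip the direction of every edge in a directed graph."""
    return [(target, source) for source, target in edges]


def adjacency_list(n_vertices: int, edges: List[Edge]) -> List[List[int]]:
    """Outer index is the source vertex; inner list holds its neighbors."""
    adj: List[List[int]] = [[] for _ in range(n_vertices)]
    for source, target in edges:
        adj[source].append(target)
    return adj


# vertex 0 is the root; vertex 3 has two parents (a reentrancy)
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
forward = adjacency_list(4, edges)
backward = adjacency_list(4, reverse_edges(edges))
```

In the forward list the root points at its children, whereas in the reversed list the leaf points back at both of its parents, which is the orientation a reverse flow graph needs.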
- copy_graph(reverse_edges=False, subgraph_type=None)[source]¶
Return a copy of the igraph.Graph.
- Parameters:
- Return type:
- edge_by_graph_edge_id(ge_id)[source]¶
Return an edge based on the graph (attribute) edge id.
- Return type:
- edge_ref_by_id(ix)[source]¶
Get the igraph.Edge instance by its index.
- Return type:
- property edges_reversed: bool¶
Whether the edge direction in the graph is reversed. This is True for reverse flow graphs.
- See:
summary.ReverseFlowGraphAlignmentConstructor
- get_attributes()[source]¶
Return all graph attributes of the component, which include instances of both GraphNode and GraphEdge.
- Return type:
- property graph: Graph¶
The graph used for computational manipulation of the synthesized AMR sentences.
- invalidate()[source]¶
Clear cached data structures to force them to be recreated after igraph level data has changed.
- node_by_graph_node_id(gn_id)[source]¶
Return a node based on the graph (attribute) node id.
- Return type:
- property root: Vertex | None¶
The singular (first found) root of the graph, which is usually the top level DocumentNode instance.
- property roots: Iterable[Vertex]¶
The roots of the graph, which are usually top level DocumentNode instances.
- vertex_ref_by_id(ix)[source]¶
Get the igraph.Vertex instance by its index.
- Return type:
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write the contents of this instance to writer using indentation depth.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
zensols.calamr.ctrl module¶
Document graph controller implementations.
- class zensols.calamr.ctrl.AlignmentCapacitySetDocumentGraphController(name, min_capacity, capacity)[source]¶
Bases:
DocumentGraphController
Set the capacity on edges if the criteria matches min_flow, component_names and match_edge_classes.
- __init__(name, min_capacity, capacity)¶
- capacity: float¶
The capacity to set.
- class zensols.calamr.ctrl.ConstructDocumentGraphController(name, build_graph_name, constructor, renderer)[source]¶
Bases:
DocumentGraphController
Constructs the graph that will later be used for the min cut/max flow algorithm (see MaxflowDocumentGraphController). After its invoke() method is called, build_graph is available, which is the constructed graph provided by constructor.
- __init__(name, build_graph_name, constructor, renderer)¶
- build_graph_name: str¶
The name given to new instances of DocumentGraph.
- constructor: GraphAlignmentConstructor¶
The constructor used to get the source and sink nodes.
- renderer: GraphRenderer¶
Visually render the graph into a human-understandable presentation.
- class zensols.calamr.ctrl.FixReentrancyDocumentGraphController(name, component_name, maxflow_controller, only_report)[source]¶
Bases:
DocumentGraphController
Fix reentrancies by splitting the flow of the last calculated maxflow as the capacity of the outgoing edges in the reversed graph. This fixes the issue of edges getting flow starved and later eliminated in the graph reduction steps.
Subsequently, the maxflow algorithm is rerun if we have at least one reentrancy after reallocating the capacities.
- __init__(name, component_name, maxflow_controller, only_report)¶
- component_name: str¶
The name of the components to restore.
- maxflow_controller: MaxflowDocumentGraphController¶
The maxflow component used to recalculate the maxflow.
- only_report: bool¶
Whether to only report reentrancies rather than fix them.
- class zensols.calamr.ctrl.FlowDiscountDocumentGraphController(name, discount_sum, component_names=<factory>)[source]¶
Bases:
DocumentGraphController
Decrease/constrict the capacities by making the sum of the incoming flows from the bipartite edges the value of discount_sum. The capacities are only updated if the sum of the flows on the incoming bipartite edges is greater than discount_sum.
- __init__(name, discount_sum, component_names=<factory>)¶
- component_names: Set[str]¶
The name of the components to discount.
- discount_sum: float¶
The capacity sum will be this value (see class docs).
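One reading of the discounting rule is sketched below: when the flows entering a node over its bipartite edges sum to more than discount_sum, new capacities are derived from those flows so that they add up to exactly discount_sum; otherwise the capacities are left alone. Scaling each flow proportionally is an assumption (this page does not say how the sum is distributed), and `discount_capacities` is a hypothetical helper rather than the controller's actual code.

```python
from typing import List


def discount_capacities(flows: List[float], capacities: List[float],
                        discount_sum: float) -> List[float]:
    """Return new capacities for the incoming edges of one node."""
    total = sum(flows)
    if total <= discount_sum:
        # nothing to constrict: keep the capacities as they are
        return list(capacities)
    # assumption: shrink each flow by the same factor
    scale = discount_sum / total
    return [flow * scale for flow in flows]


updated = discount_capacities([1.0, 3.0], [5.0, 5.0], discount_sum=2.0)
unchanged = discount_capacities([0.5, 0.5], [5.0, 5.0], discount_sum=2.0)
```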
- class zensols.calamr.ctrl.FlowSetDocumentGraphController(name, component_names=<factory>, match_edge_classes=<factory>, flow=0)[source]¶
Bases:
DocumentGraphController
Set a static flow on components based on name and edges based on class.
- __init__(name, component_names=<factory>, match_edge_classes=<factory>, flow=0)¶
- component_names: Set[str]¶
The components on which to set the flow.
- flow: float = 0¶
The flow to set.
- match_edge_classes: Set[Type[GraphEdge]]¶
The edge classes (e.g. TerminalGraphEdge) to set the flow.
- class zensols.calamr.ctrl.MaxflowDocumentGraphController(name, constructor)[source]¶
Bases:
DocumentGraphController
Executes the maxflow/min cut algorithm on a document graph.
- __init__(name, constructor)¶
- constructor: GraphAlignmentConstructor¶
The constructor used to get the source and sink nodes.
- class zensols.calamr.ctrl.NormFlowDocumentGraphController(name, component_names, constructor, normalize_mode='fpn')[source]¶
Bases:
DocumentGraphController
Normalizes flow on edges as the flow going through the edge and the total number of descendants. Descendants are counted as the edge’s source node and all children/descendants of that node.
This is done recursively to calculate flow per node. For each recursive call, it computes the flow per node of the parent edge(s) from the perspective of the nascent graph (root at top with arrows pointed to children underneath). However, the graphs this operates on are the reverse flow max flow graphs (flow direction is taken care of by the adjacency list computed in GraphComponent).
Since an AMR node can have multiple parents, we keep track of descendants as a set rather than a count to avoid duplicate counts when nodes have more than one parent. Otherwise, in the multiple parent case, duplicates would be counted when the path later converges closer to the root.
- __init__(name, component_names, constructor, normalize_mode='fpn')¶
- component_names: Set[str]¶
The name of the components to minimize.
- constructor: GraphAlignmentConstructor¶
The instance used to construct the graph passed in the invoke() method.
- normalize_mode: str = 'fpn'¶
How to normalize nodes (if at all), which is one of:
fpn: leaves flow values as they were after the initial flow per node calculation
norm: normalize so all values add to one
vis: same as norm but add a vis_flow attribute to the edges so the original flow is displayed and visualized as the flow color
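A small sketch of the flow per node idea, assuming the flow is divided by the size of the descendant set. The helper names and the toy adjacency dictionary are hypothetical and the code does not use this package. It shows why descendants are tracked as a set, and what the norm mode's "add to one" amounts to.

```python
from typing import Dict, List, Set


def descendants(node: str, children: Dict[str, List[str]],
                cache: Dict[str, Set[str]]) -> Set[str]:
    """Return ``node`` together with everything reachable below it."""
    if node not in cache:
        found = {node}
        for child in children.get(node, ()):
            # a set union counts a shared (reentrant) node only once
            found |= descendants(child, children, cache)
        cache[node] = found
    return cache[node]


def flow_per_node(flow: float, node: str,
                  children: Dict[str, List[str]]) -> float:
    """Divide the flow by the number of distinct descendants of ``node``."""
    return flow / len(descendants(node, children, {}))


def normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Scale the values so they add to one (the idea behind ``norm``)."""
    total = sum(values.values())
    if total == 0:
        return dict(values)
    return {name: value / total for name, value in values.items()}


# 'd' has two parents ('b' and 'c'), as in an AMR reentrancy
children = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d']}
n_desc = len(descendants('a', children, {}))
fpn = flow_per_node(8.0, 'a', children)
normed = normalize({'x': 1.0, 'y': 3.0})
```

Counting children with an integer would give five descendants for 'a' because 'd' is reached twice. The set yields four.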
- class zensols.calamr.ctrl.RemoveAlignsDocumentGraphController(name, min_capacity)[source]¶
Bases:
DocumentGraphController
Removes graph component alignment for low capacity links.
- __init__(name, min_capacity)¶
- min_capacity: float¶
The graph component alignment edges are removed if their capacities are at or below this value.
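The "at or below" threshold can be illustrated with a dictionary of edge capacities. The edge keys and the function name are hypothetical and this is not the package's own code.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def remove_low_capacity(capacities: Dict[Edge, float],
                        min_capacity: float) -> Dict[Edge, float]:
    """Keep only the alignment edges whose capacity exceeds the minimum."""
    # edges at or below min_capacity are dropped, so the test is strict
    return {edge: cap for edge, cap in capacities.items()
            if cap > min_capacity}


aligns = {('src1', 'sum1'): 0.9, ('src2', 'sum1'): 0.1,
          ('src3', 'sum2'): 0.05}
kept = remove_low_capacity(aligns, min_capacity=0.1)
```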
- class zensols.calamr.ctrl.RoleCapacitySetDocumentGraphController(name, min_flow, capacity, component_names)[source]¶
Bases:
DocumentGraphController
This finds low flow role edges and sets (zeros out) all the capacities of all the connected edge alignments recursively for all descendants. We “slough off” entire subtrees (sometimes entire sentences or document nodes) for low flow ancestors.
- __init__(name, min_flow, capacity, component_names)¶
- capacity: float¶
The capacity (and flow) to set.
- component_names: Set[str]¶
The name of the components to minimize.
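The "slough off" behavior can be pictured with the sketch below. The data layout (a child adjacency dictionary, role edge flows keyed by node pairs and one alignment capacity per node) and the at-or-below comparison against min_flow are assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple


def slough_off(children: Dict[str, List[str]],
               role_flow: Dict[Tuple[str, str], float],
               align_capacity: Dict[str, float],
               min_flow: float, capacity: float = 0.0) -> Dict[str, float]:
    """Set the alignment capacity of every node under a low flow role edge."""
    updated = dict(align_capacity)
    seen: Set[str] = set()

    def set_subtree(node: str) -> None:
        # visit each node once, even when it has several parents
        if node in seen:
            return
        seen.add(node)
        if node in updated:
            updated[node] = capacity
        for child in children.get(node, ()):
            set_subtree(child)

    for (_, child), flow in role_flow.items():
        # assumption: "low flow" means at or below min_flow
        if flow <= min_flow:
            set_subtree(child)
    return updated


children = {'s': ['a', 'b'], 'a': ['c']}
role_flow = {('s', 'a'): 0.05, ('s', 'b'): 0.9, ('a', 'c'): 0.5}
caps = slough_off(children, role_flow,
                  {'a': 1.0, 'b': 1.0, 'c': 1.0}, min_flow=0.1)
```

The edge into 'a' carries little flow, so 'a' and its descendant 'c' are both zeroed even though the edge into 'c' itself has a healthy flow.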
- class zensols.calamr.ctrl.SnapshotDocumentGraphController(name, component_names, snapshot_source)[source]¶
Bases:
DocumentGraphController
Record flows, then later restore. If snapshot_source is not None, then this instance restores from it. Otherwise it records.
- __init__(name, component_names, snapshot_source)¶
- component_names: Set[str]¶
The name of the components on which to record or restore flows.
- snapshot_source: SnapshotDocumentGraphController¶
The source instance that contains the data from which to restore.
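The record-or-restore switch can be sketched as follows. FlowSnapshot is a hypothetical stand-in that works on a plain dictionary of flows, not the controller class documented here.

```python
from typing import Dict, Optional


class FlowSnapshot:
    """Hypothetical stand-in: records flows, or restores them from a source."""

    def __init__(self, source: Optional['FlowSnapshot'] = None):
        self.source = source
        self.recorded: Dict[str, float] = {}

    def invoke(self, flows: Dict[str, float]) -> None:
        if self.source is None:
            # no source given: record a copy of the current flows
            self.recorded = dict(flows)
        else:
            # a source was given: overwrite the flows with what it recorded
            flows.update(self.source.recorded)


flows = {'e1': 0.7, 'e2': 0.2}
recorder = FlowSnapshot()
recorder.invoke(flows)
flows['e1'] = 0.0  # some later step changes the flow
FlowSnapshot(source=recorder).invoke(flows)
```

Two instances are paired in this sketch: the first records, and the second is handed the first as its source and restores from it.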
zensols.calamr.dcomp module¶
A document centric graph component.
- class zensols.calamr.dcomp.DocumentGraphComponent(graph, root_node, sent_index=<factory>, description=None)[source]¶
Bases:
GraphComponent
A class containing the root information of the document tree and the igraph.Graph vertex. When the igraph.Graph is set with the graph property, a strongly connected subgraph component is induced. It does this by traversing all reachable vertices and edges from the root. Examples of these induced components include source and summary components of a document AMR graph.
Instances are created by DocumentGraphFactory.
- __init__(graph, root_node, sent_index=<factory>, description=None)¶
- clone(reverse_edges=False, deep=True, **kwargs)[source]¶
Clone an instance and return it. The graph is deep copied, but all GraphAttribute instances are not.
- Parameters:
reverse_edges (bool) – whether to reverse the direction of edges in the directed graph, which is used to create the reverse flow graphs used in the maxflow algorithm
deep (bool) – whether to create a deep clone, which detaches a component from any bipartite graph by keeping only the graph composed of vs; otherwise the graph is copied as a subcomponent starting from root
kwargs – arguments to add as attributes to the clone; include cls as the type of the new instance
- Return type:
- Returns:
the cloned instance of this instance
- property doc_vertices: Iterable[Vertex]¶
Get the vertices of DocumentGraphNode. This only fetches those document nodes that do not branch.
- get_attributes()[source]¶
Return all graph attributes of the component, which include instances of both GraphNode and GraphEdge.
- Return type:
- property relation_set: RelationSet¶
The relations in the contained root node document.
- property root: Vertex | None¶
The roots of the graph, which are usually top level DocumentNode instances.
- root_node: AmrDocumentNode¶
The root of the document tree.
- sent_index: SentenceIndex¶
An index of the sentences of a DocumentGraphComponent.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write the contents of this instance to writer using indentation depth.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.calamr.dcomp.SentenceEntry(node=None, concepts=None, attributes=None)[source]¶
Bases:
Dictable
Contains the sentence node of a sentence, and the respective concept and attribute nodes.
- __init__(node=None, concepts=None, attributes=None)¶
- attributes: Tuple[AttributeGraphNode] = None¶
The AMR attribute nodes of the sentence.
- property concept_by_variable: Dict[str, ConceptGraphNode]¶
- concepts: Tuple[ConceptGraphNode] = None¶
The AMR concept nodes of the sentence.
- node: SentenceGraphNode = None¶
The sentence node, which is the root of the sentence subgraph.
- class zensols.calamr.dcomp.SentenceIndex(entries=None)[source]¶
Bases:
Dictable
An index of the sentences of a DocumentGraphComponent.
- __init__(entries=None)¶
- property by_sentence: Dict[AmrFeatureSentence, SentenceEntry]¶
- entries: Tuple[SentenceEntry] = None¶
The entries of the index, each of which is a sentence.
zensols.calamr.doc module¶
Document based graph container, factory and strategy classes.
- class zensols.calamr.doc.DocumentGraph(graph, name, graph_attrib_context, doc, components, children=<factory>)[source]¶
Bases:
GraphComponent
A graph containing the text, text features, AMR Penman graph and igraph.
This class roughly follows a GoF composite pattern with children as a collection of instances of this class, which are the reversed source and summary graphs created for the max flow algorithm. The root is constructed by the DocumentGraphFactory class and the children are built by the DocumentGraphController instances.
The children of this composite are not to be confused with components, which are the disconnected source and summary graph components in the root graph instance. Each child also has the reversed flow graphs, but they are connected as a bipartite flow graph for use by the max flow algorithm.
- __init__(graph, name, graph_attrib_context, doc, components, children=<factory>)¶
- property bipartite_relation_set: RelationSet¶
The bipartite relations that span components. This set includes all top level relations that are not self contained in any components.
- children: Dict[str, DocumentGraph]¶
The children of this instance, which for now, are only instances of FlowDocumentGraph.
- clone(reverse_edges=False, deep=True, **kwargs)[source]¶
Clone an instance and return it. The graph is deep copied, but all GraphAttribute instances are not.
- Parameters:
reverse_edges (bool) – whether to reverse the direction of edges in the directed graph, which is used to create the reverse flow graphs used in the maxflow algorithm
deep (bool) – whether to create a deep clone, which detaches a component from any bipartite graph by keeping only the graph composed of vs; otherwise the graph is copied as a subcomponent starting from root
kwargs – arguments to add as attributes to the clone; include cls as the type of the new instance
- Return type:
- Returns:
the cloned instance of this instance
- component_iter()[source]¶
Return an iterable of the components of this graph and recursively over the children.
- Return type:
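The composite structure and the recursive iteration over components and children can be sketched with a small dataclass. The Graph class below is a hypothetical stand-in with string components, not DocumentGraph itself, and the iteration order shown is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Tuple


@dataclass
class Graph:
    """Hypothetical stand-in for a composite graph with named children."""
    name: str
    components: Tuple[str, ...]
    children: Dict[str, 'Graph'] = field(default_factory=dict)

    def component_iter(self) -> Iterator[str]:
        # first this graph's own components ...
        yield from self.components
        # ... then, recursively, those of every child
        for child in self.children.values():
            yield from child.component_iter()


child = Graph('reversed_source', ('source', 'summary'))
root = Graph('root', ('source', 'summary'), {child.name: child})
names = list(root.component_iter())
```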
- components: Tuple[DocumentGraphComponent, ...]¶
The roots of the trees created by the DocumentGraphFactory.
- property components_by_name: Dict[str, DocumentGraphComponent]¶
Get document graph components by name.
- property components_by_name_sorted: Tuple[Tuple[str, DocumentGraphComponent], ...]¶
Get document graph components sorted by name.
- doc: AmrFeatureDocument¶
The document that represents the graph.
- property graph_attrib_context: GraphAttributeContext¶
The context given to all nodes and edges of the graph.
- name: str¶
The name of the graph used to identify it. For now, this is only reversed_source for the graph that flows from the summary to the source, and reversed_summary for the graph that flows from the source to the summary. These are “reversed” because the flow is reversed from the leaf nodes to the root.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write the contents of this instance to writer using indentation depth.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.calamr.doc.DocumentGraphDecorator[source]¶
Bases:
ABC
A strategy to create a graph from a document structure.
- __init__()¶
- abstract decorate(component)[source]¶
Creates the graph from a DocumentNode root node.
- Parameters:
component (DocumentGraphComponent) – the graph to populate from the decorating process
- class zensols.calamr.doc.DocumentGraphFactory(config_factory, graph_decorators, doc_graph_section_name, graph_attrib_context)[source]¶
Bases:
ABC
Creates a document graph. After the document portion of the graph is created, the igraph is built and merged using a DocumentGraphDecorator. This igraph has the corresponding vertexes and edges associated with the document graph, which includes AMR Penman graph and feature document artifacts.
- __init__(config_factory, graph_decorators, doc_graph_section_name, graph_attrib_context)¶
- config_factory: ConfigFactory¶
Used to create a DocumentGraphDecorator.
- create(root)[source]¶
Create a document graph and return it starting from the root node. See class docs.
- Parameters:
root (AmrFeatureDocument) – the feature document from which to create the graph
- Return type:
- doc_graph_section_name: str¶
The name of a section in the configuration that defines new instances of DocumentGraph.
- graph_attrib_context: GraphAttributeContext¶
The context given to all nodes and edges of the graph.
- graph_decorators: Tuple[DocumentGraphDecorator, ...]¶
The name of the section that defines a DocumentGraphDecorator instance.
zensols.calamr.domain module¶
Classes that organize document content into a hierarchy.
Terminology: token alignments refer to the sentence index based token alignments to AMR nodes. This is not to be confused with alignment edges (aka graph component alignment edges).
- exception zensols.calamr.domain.ComponentAlignmentError(msg, sent=None)[source]¶
Bases:
AmrError
Package level errors.
- __module__ = 'zensols.calamr.domain'¶
- class zensols.calamr.domain.ComponentAlignmentFailure(exception=None, thrower=None, traceback=None, message=None)[source]¶
Bases:
Failure
Package level failures.
- __init__(exception=None, thrower=None, traceback=None, message=None)¶
- class zensols.calamr.domain.EmbeddingResource(torch_config, word_piece_doc_parser=None, word_piece_doc_factory=None, roleset_stash=None)[source]¶
Bases:
object
Generates embeddings for roles, role sets, text, and feature tokens.
- __init__(torch_config, word_piece_doc_parser=None, word_piece_doc_factory=None, roleset_stash=None)¶
- get_role_embedding(role)[source]¶
Return an embedding for a role. This uses the role’s relation’s embedding if available. Otherwise, it uses the embedding created from the role’s prefix.
- Return type:
Tensor
- get_sentence_tokens_embedding(sent)[source]¶
Return the sentence embeddings of sent.
- Return type:
Tensor
- get_token_embedding(text)[source]¶
Return the mean of the token embeddings of text.
- Return type:
Tensor
- get_tokens_embedding(tokens)[source]¶
Return the mean of the embeddings of tokens.
- Return type:
Tensor
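Taking the mean of several token embeddings is an element-wise average over the vectors. The sketch below uses plain Python lists instead of Tensor objects so it runs without any dependency. The function name is hypothetical and it says nothing about how the library computes the mean internally.

```python
from typing import List, Sequence


def mean_embedding(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Return the element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError('need at least one vector')
    n = len(vectors)
    # zip(*vectors) groups the values of each dimension together
    return [sum(dim) / n for dim in zip(*vectors)]


emb = mean_embedding([[1.0, 2.0], [3.0, 6.0]])
```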
- torch_config: TorchConfig¶
Used to create unknown_edge_embedding.
- property unknown_edge_embedding: Tensor¶
A zero embedding.
- property unknown_node_embedding: Tensor¶
A zero embedding.
- word_piece_doc_factory: WordPieceFeatureDocumentFactory = None¶
Creates word piece data structures that have embeddings.
- word_piece_doc_parser: FeatureDocumentParser = None¶
Used to get single token embeddings for nodes with no token alignments.
- class zensols.calamr.domain.GraphAttribute(context)[source]¶
Bases:
PersistableContainer
,Dictable
Contains AMR document attribute data added to the igraph.Graph. This is added as vertex or edge attribute data.
- ATTRIB_TYPE: ClassVar[str] = 'base'¶
The attribute type this class represents.
- __init__(context)¶
- context: GraphAttributeContext¶
Contains context data used by nodes and edges of the graph.
- property description: str¶
A human readable description that is usually used as the label and __str__().
- property embedding: Tensor¶
The default embedding of the attribute. Note that some attributes have several different embeddings.
- property embedding_resource: EmbeddingResource¶
Generates embeddings for roles, role sets, text, and feature tokens.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.calamr.domain.GraphAttributeContext(embedding_resource, relation_stash, default_capacity, sink_capacity, component_alignment_capacity, doc_capacity, similarity_threshold, default_format_strlen)[source]¶
Bases:
Dictable
Contains context data used by nodes and edges of the graph.
- __init__(embedding_resource, relation_stash, default_capacity, sink_capacity, component_alignment_capacity, doc_capacity, similarity_threshold, default_format_strlen)¶
- component_alignment_capacity: float¶
The default initial capacity for source/summary component alignment edges.
- doc_capacity: float¶
The bipartite (between source and summary) capacity value of DocumentGraphNode.
- embedding_resource: EmbeddingResource¶
The manager that contains vectorizers that create node and edge embeddings.
- relation_stash: Stash¶
Creates instances of the role class zensols.propbankdb.domain.Relation.
zensols.calamr.flow module¶
Provides container classes and computes statistics for graph alignments.
- class zensols.calamr.flow.Flow(source, target, edge)[source]¶
Bases:
Dictable
A triple of a source node, target node and connecting edge from the graph. The connecting edge has a flow value associated with it.
- __init__(source, target, edge)¶
- class zensols.calamr.flow.FlowDocumentGraph(graph, name, graph_attrib_context, doc, components, children=<factory>)[source]¶
Bases:
DocumentGraph
Contains all the flows of a DocumentGraph and has FlowDocumentGraphComponent as components. Instances of this document graph have no children.
- __init__(graph, name, graph_attrib_context, doc, components, children=<factory>)¶
- class zensols.calamr.flow.FlowDocumentGraphComponent(graph, root_node, sent_index=<factory>, description=None, reentrancy_set=None)[source]¶
Bases:
DocumentGraphComponent
Contains all the flows of a DocumentComponent.
- __init__(graph, root_node, sent_index=<factory>, description=None, reentrancy_set=None)¶
- reentrancy_set: ReentrancySet = None¶
Concept nodes with multiple parents.
- property result: FlowGraphComponentResult¶
The flow results for this component.
- property root_flow: Flow¶
The root flow of the document component, which has the component’s DocumentGraphNode as the source node and the sink as the target node.
- class zensols.calamr.flow.FlowGraphComponentResult(component)[source]¶
Bases:
Dictable
A container class for the flow data from a DocumentComponent flow instance (aka reverse flow graph). This includes the data as dictionaries of statistics, pandas.DataFrame and DataDescriber instances.
- property connected_stats: Dict[str, int | float]¶
The statistics on how well the two graphs are aligned, counted as:
alignable: the number of nodes that are eligible for having an alignment (i.e. sentence, concept, and attribute nodes)
aligned: the number of aligned nodes in the FlowDocumentGraphComponent this instance holds
aligned_portion: the quotient of $aligned / alignable$, which is a number in $[0, 1]$ representing a score of how well the two graphs match
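The three statistics are related by a single division, as sketched below. The function is a hypothetical helper that mirrors the key names listed above. Returning 0.0 when nothing is alignable is an assumption and not documented behavior.

```python
from typing import Dict, Union


def connected_stats(alignable: int,
                    aligned: int) -> Dict[str, Union[int, float]]:
    """Build the alignment statistics from the two node counts."""
    # assumption: report 0.0 rather than failing on an empty component
    portion = aligned / alignable if alignable > 0 else 0.0
    return {'alignable': alignable,
            'aligned': aligned,
            'aligned_portion': portion}


stats = connected_stats(alignable=8, aligned=6)
```

With six of eight eligible nodes aligned, aligned_portion is 0.75.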
- create_data_frame_describer()[source]¶
Like create_align_df() but includes a human readable description of the data.
- Return type:
DataFrameDescriber
- property df: pd.DataFrame¶
The data in flows and root as a dataframe. Note the terms source and target refer to the nodes at the ends of the directed edge in a reversed graph.
s_descr: source node descriptions such as concept names, attribute constants and sentence text
t_descr: target node of s_descr
s_toks: any source node aligned tokens
t_toks: any target node aligned tokens
s_attr: source node attribute name given by GraphAttribute.attrib_type, such as doc, sentence, concept, attribute
t_attr: target node of s_attr
s_id: source node igraph ID
t_id: target node igraph ID
edge_type: whether the edge is an AMR role or alignment
rel_id: the coreference relation ID or null if the edge is not a coreference
is_bipartite: whether relation rel_id spans components or null if the edge is not a coreference
flow: the (normalized/flow per node) flow of the edge
reentrancy: whether the edge participates in an AMR reentrancy
align_flow: the flow sum of the alignment edges for the respective edge
align_count: the count of incoming alignment edges to the target node in the FlowDocumentGraphComponent this instance holds
- property n_alignable_nodes: int¶
The number of nodes in the component that can take alignment edges. Whether those nodes in the count have edges does not affect the result.
- property stats: Dict[str, Any]¶
All statistics/scores available for this instance, which include:
root_flow: the flow from the root node to the sink
connected: connected_stats
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.calamr.flow.FlowGraphResult(component_paths, context, data)[source]¶
Bases:
PersistableContainer
,Dictable
A container class for flow document results, which include the detailed data as dictionaries of statistics, pandas.DataFrame and DataDescriber instances. This is aggregated from doc_graph and the flow children’s flow graph components.
All graphs (from nascent to the reversed flow children graphs) have the final state of the actions of the DocumentGraphController as coordinated by the GraphSequencer. Since the flows are copied from the reversed source graph to the root level (doc_graph) factory built nascent graph, all flows are the same. However, the nascent graph will still be the disconnected source and summary graphs.
- __init__(component_paths, context, data)[source]¶
Initialize the flow results.
- Parameters:
data (Union[DocumentGraph, ComponentAlignmentFailure]) – the root nascent DocumentGraphFactory built graph or an instance of ComponentAlignmentFailure if the alignment failed
component_paths (Tuple[Tuple[str, str], ...]) – a set of paths that indicate which flow components to use for the results in the form (<child name>, <component name>)
- create_data_describer()[source]¶
Like create_align_df() but includes a human readable description of the data.
- Throws ComponentAlignmentError:
if this instance resulted in an error
- See:
- Return type:
DataDescriber
- property df: pd.DataFrame¶
A concatenation of frames created with FlowDocumentGraphComponent.create_align_df() with the name of each component.
- Throws ComponentAlignmentError:
if this instance resulted in an error
- See:
- property doc_graph: DocumentGraph¶
The root nascent DocumentGraphFactory built graph.
- Throws ComponentAlignmentError:
if this instance resulted in an error
- See:
- property failure: ComponentAlignmentFailure | None¶
What caused the alignment to fail, or None if it was a success.
- get_render_contexts(child_names=None, include_nascent=False)[source]¶
Get contexts used to render the graphs with render.base.rendergroup.
- Parameters:
child_names (Iterable[str]) – the name of the DocumentGraph.children to render, which defaults to the nascent graph and the final bipartite graph rendered (“restore previous flow on source”)
include_nascent (bool) – whether to include the nascent graphs
- Return type:
- render(contexts=None, graph_id='graph', display=True, directory=None)[source]¶
Render several graphs at a time, then optionally display them.
- Parameters:
contexts (Tuple[RenderContext]) – the data to render, which defaults to the output of get_render_contexts()
graph_id (str) – a unique identifier prefixed to files generated if none provided in the call method
display (bool) – whether to display the files after they are generated
directory (Path) – the directory in which to create the files in place of the temporary directory; if provided the directory is not removed after the graphs are rendered
- property stats: Dict[str, Any]¶
The statistics with keys as component names and values taken from FlowDocumentGraphComponent.stats.
- Throws ComponentAlignmentError:
if this instance resulted in an error
- See:
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_doc=True)[source]¶
Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
zensols.calamr.flowmeta module¶
Metadata for the flow module.
zensols.calamr.morph module¶
Populate an igraph from AMR graphs.
- class zensols.calamr.morph.IsomorphDocumentGraphDecorator(config_factory, graph_attrib_context_name)[source]¶
Bases:
DocumentGraphDecorator
Populates igraph.Graph attributes from DocumentGraph data by adding AMR node and edge information.
- __init__(config_factory, graph_attrib_context_name)¶
- config_factory: ConfigFactory¶
The configuration factory used to create a GraphAttributeContext.
- decorate(comp)[source]¶
Creates the graph from a DocumentNode root node.
- Parameters:
component – the graph to populate from the decorating process
- graph_attrib_context_name: str¶
The section name of the GraphAttributeContext context given to all nodes and edges of the graph.
zensols.calamr.proto module¶
Prototyping and cookbook.
zensols.calamr.reentrancy module¶
Reentrancy container classes.
- class zensols.calamr.reentrancy.EdgeFlow(edge, flow=None)[source]¶
Bases:
PersistableContainer
,Dictable
The flow over a graph edge. This keeps the flow of the edge as a “snapshot” of the value at a particular point in the algorithm, before it is modified to fix the issue.
- __init__(edge, flow=None)¶
- class zensols.calamr.reentrancy.Reentrancy(concept_node, concept_node_vertex, edge_flows)[source]¶
Bases:
PersistableContainer
,Dictable
Reentrancies are concept nodes with multiple parents (in the forward graph) and have side effects when running the algorithm.
Note: an AMR graph (always acyclic) with no reentrancies is a tree.
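Detecting reentrancies amounts to finding nodes with more than one parent. The sketch below does this over a plain list of (parent, child) edges. The function name and the toy edges (loosely modeled on the AMR for "the boy wants to go") are illustrative and are not taken from this package.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_reentrancies(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each node with multiple parents to the set of those parents."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    # a graph with no such nodes (and a single root) is a tree
    return {node: ps for node, ps in parents.items() if len(ps) > 1}


edges = [('want', 'boy'), ('want', 'go'), ('go', 'boy')]
reentrant = find_reentrancies(edges)
```

The node 'boy' is an argument of both 'want' and 'go', which makes it the only reentrancy in this toy graph.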
- __init__(concept_node, concept_node_vertex, edge_flows)¶
- concept_node: ConceptGraphNode¶
The concept node of the reentrancy.
- edge_flows: Tuple[EdgeFlow]¶
The outgoing edges connected to the reentrant concept_node.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.calamr.reentrancy.ReentrancySet(reentrancies=())[source]¶
Bases:
PersistableContainer
,Dictable
A set of reentrancies, one for each iteration of the algorithm.
- __init__(reentrancies=())¶
- property by_vertex: Dict[int, Reentrancy]¶
- reentrancies: Tuple[Reentrancy] = ()¶
Concept nodes with multiple parents.
zensols.calamr.score module¶
Produces CALAMR scores.
- class zensols.calamr.score.CalamrScore(flow_graph_res)[source]¶
Bases:
Score
Contains all CALAMR scores.
- NAN_INSTANCE = CalamrScore()¶
- __init__(flow_graph_res)¶
- flow_graph_res: FlowGraphResult¶
- class zensols.calamr.score.CalamrScoreMethod(reverse_sents=False, word_piece_doc_factory=None, doc_graph_factory=None, doc_graph_aligner=None)[source]¶
Bases:
ScoreMethod
Computes the smatch scores of AMR sentences. Sentence pairs are ordered (<summary>, <source>).
- __init__(reverse_sents=False, word_piece_doc_factory=None, doc_graph_factory=None, doc_graph_aligner=None)¶
- doc_graph_aligner: DocumentGraphAligner = None¶
Create document graphs.
- doc_graph_factory: DocumentGraphFactory = None¶
Create document graphs.
- score_annotated_doc(doc)[source]¶
Score a document that has an amr of type AnnotatedAmrDocument.
- Raises:
[zensols.amr.domain.AmrError]: if the AMR could not be parsed or aligned
- Return type:
- word_piece_doc_factory: WordPieceFeatureDocumentFactory = None¶
The feature document factory that populates embeddings.
zensols.calamr.stash module¶
Alignment dataframe stash.
- class zensols.calamr.stash.FlowGraphRestoreStash(delegate, flow_graph_result_context)[source]¶
Bases:
DelegateStash
,PrimeableStash
A stash that restores transient data on FlowGraphResult instances.
- __init__(delegate, flow_graph_result_context)¶
- exists(name)[source]¶
Return True if data with key name exists.
Implementation note: This Stash.exists() method is very inefficient and should be overridden.
- Return type:
- flow_graph_result_context: _FlowGraphResultContext¶
Contains in memory/interpreter session data needed by FlowGraphResult when it is created or unpickled.
- get(name, default=None)[source]¶
Load an object or a default if key name doesn’t exist.
Implementation note: sub classes will probably want to override this method given the super method is cavalier about calling exists() and load(). Based on the implementation, this can be problematic.
- Return type:
- class zensols.calamr.stash.FlowGraphResultFactoryStash(anon_doc_stash, doc_graph_aligner, doc_graph_factory, limit=9223372036854775807)[source]¶
Bases:
ReadOnlyStash
,PrimeableStash
A factory stash that creates aligned FlowGraphResult instances or ComponentAlignmentFailure when the document cannot be aligned.
- __init__(anon_doc_stash, doc_graph_aligner, doc_graph_factory, limit=9223372036854775807)¶
- doc_graph_aligner: DocumentGraphAligner¶
Create document graphs.
- doc_graph_factory: DocumentGraphFactory¶
Create document graphs.
- exists(name)[source]¶
Return True if data with key name exists.
Implementation note: This Stash.exists() method is very inefficient and should be overridden.
- Return type: