zensols.calamr package¶
Subpackages¶
- zensols.calamr.render namespace
- zensols.calamr.summary namespace
- Submodules
- zensols.calamr.summary.alignconst module
  - ReverseFlowGraphAlignmentConstructor
  - SharedGraphAlignmentConstructor
  - SummaryGraphAlignmentConstructor
    - SummaryGraphAlignmentConstructor.__init__()
    - SummaryGraphAlignmentConstructor.build()
    - SummaryGraphAlignmentConstructor.capacity_calculator
    - SummaryGraphAlignmentConstructor.component_alignment_capacities
    - SummaryGraphAlignmentConstructor.connections
    - SummaryGraphAlignmentConstructor.find_flow_diffs()
    - SummaryGraphAlignmentConstructor.source
    - SummaryGraphAlignmentConstructor.summary
- zensols.calamr.summary.capacity module
- zensols.calamr.summary.coref module
- zensols.calamr.summary.factory module
Submodules¶
zensols.calamr.adhoc module¶
Classes that aid in creating and aligning documents without a corpus.
- class zensols.calamr.adhoc.AdhocAmrDocumentPartStash(anon_doc_factory, corpus=None)[source]¶
Bases: ReadOnlyStash
A factory stash that creates AmrDocument instances for each entry in a typing.Dict in corpus. This is used by AdhocAnnotatedAmrDocumentStash to roll all individual documents into a single document.
- __init__(anon_doc_factory, corpus=None)¶
-
anon_doc_factory:
AnnotatedAmrFeatureDocumentFactory¶ Parses text data into AMR source and summary documents.
-
corpus:
Dict[str,Dict[str,str]] = None¶ Documents as sentences, given with the document ID mapped to a typing.Dict as described in the corpus parameter in AdhocAnnotatedAmrDocumentStash.set_corpus().
- exists(did)[source]¶
Return True if data with key name exists.
Implementation note: This Stash.exists() method is very inefficient and should be overridden.
- Return type:
- class zensols.calamr.adhoc.AdhocAnnotatedAmrDocumentStash(delegate, hasher, doc_part_stash, root_dir, swap_keys, swapper, amr_anon_doc_stash, _cache_dir=None)[source]¶
Bases: PrimeableStash, DelegateStash
A stash that generates and caches instances of AmrDocument. This rolls sentences from all documents given to set_corpus() into a single document, which is then used to create annotated AMR feature sentences.
- __init__(delegate, hasher, doc_part_stash, root_dir, swap_keys, swapper, amr_anon_doc_stash, _cache_dir=None)¶
-
amr_anon_doc_stash:
PrimeableStash¶
-
doc_part_stash:
Stash¶ Leverages AdhocAmrDocumentPartStash (in some object graph) to create AmrDocument instances. These, in turn, are used to create one document from all the sentences.
- restore()[source]¶
Restores the configuration that writes to the file system after the call to
set_corpus().
- set_corpus(corpus, corpus_id=None)[source]¶
Set the corpus documents that will be used for parsing and annotating. The data (corpus) is parsed into AMRs immediately in this call, and the configuration that writes to the file system is updated to point to a new .../adhoc directory so that it does not interfere with any corpus documents (see ConfigSwapper).
To restore the configuration after adhoc document processing is finished, call restore().
- Parameters:
corpus (Sequence[Dict[str,str]]) – the AMR summary documents, which is usually a sequence of Dict instances (see AnnotatedAmrFeatureDocumentFactory for data structure details)
corpus_id (str) – a unique identifier for data, or None to use a hashed string, which in turn is used as the directory name for the cached data
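The corpus_id fallback described above can be sketched as follows. This is only an illustration under stated assumptions: the function name adhoc_cache_dir, the SHA-256 hash, the 16 character truncation, the root_dir/adhoc/<id> layout, and the id/body/summary document keys are all hypothetical and are not taken from the library.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Optional, Sequence


def adhoc_cache_dir(root_dir: Path, corpus: Sequence[Dict[str, str]],
                    corpus_id: Optional[str] = None) -> Path:
    """Pick a cache directory name from corpus_id or a hash of the corpus.

    Hypothetical helper: the hashing scheme and directory layout are
    assumptions made for this sketch.
    """
    if corpus_id is None:
        # serialize deterministically so the same corpus gives the same hash
        text = json.dumps(list(corpus), sort_keys=True)
        corpus_id = hashlib.sha256(text.encode('utf-8')).hexdigest()[:16]
    return root_dir / 'adhoc' / corpus_id


# hypothetical document keys, used only to have something to hash
corpus = [{'id': 'doc-1', 'body': 'A dog barked.', 'summary': 'Barking.'}]
hashed = adhoc_cache_dir(Path('data'), corpus)
named = adhoc_cache_dir(Path('data'), corpus, corpus_id='demo')
```

Hashing the content means that repeated calls with the same documents land in the same directory, which is what makes the cached data reusable.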
-
swap_keys:
Dict[str,List[str]]¶ Indicates the (config section, attribute) to locate the persisted work to set with the cached document so the factory stash doesn’t.
-
swapper:
ConfigSwapper¶ Used to swap in the adhoc paths and document and then back out.
- class zensols.calamr.adhoc.ConfigSwapper(config_factory, root_dir, swaps)[source]¶
Bases: Dictable
Swap file system paths and PersistedWork instances. This is used to temporarily cache files for AdhocAnnotatedAmrDocumentStash. All paths specified in swaps are “redirected” to directories (with the same names as the swapped paths) stemming from the parent/ancestor directory root_dir.
First, swap() is called to replace data. Then restore() is called to restore the values of all data to what they were before swap() was called.
- __init__(config_factory, root_dir, swaps)¶
-
config_factory:
ConfigFactory¶ Used to retrieve instances that will have data swapped.
-
swaps:
Sequence[Tuple[str,str,str]]¶ Target factory objects to swap data (see class docs). Each is a tuple of: (config section, attribute, <path|persist>).
The third string tells whether to treat the attribute as a path or a PersistedWork. If the former, the replaced data will have a new path that starts at root_dir. If the latter, a new uninitialized persisted work is swapped in.
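The swap and restore cycle can be sketched with plain objects. SketchSwapper below is a hypothetical stand-in written for illustration, not the library's ConfigSwapper: it takes a dictionary of already created objects in place of a ConfigFactory, and it models an uninitialized PersistedWork as None.

```python
from pathlib import Path
from types import SimpleNamespace
from typing import Any, Dict, List, Tuple


class SketchSwapper:
    """Hypothetical stand-in that mimics the documented swap()/restore()."""

    def __init__(self, objects: Dict[str, Any], root_dir: Path,
                 swaps: List[Tuple[str, str, str]]):
        self.objects = objects      # config section name -> object
        self.root_dir = root_dir
        self.swaps = swaps
        self._saved: List[Tuple[str, str, Any]] = []

    def swap(self) -> None:
        for section, attr, kind in self.swaps:
            target = self.objects[section]
            old = getattr(target, attr)
            self._saved.append((section, attr, old))
            if kind == 'path':
                # redirect to a same-named directory under root_dir
                new = self.root_dir / Path(old).name
            else:
                # 'persist': assumption, an empty cache is modeled as None
                new = None
            setattr(target, attr, new)

    def restore(self) -> None:
        # put back every value saved by swap(), most recent first
        while self._saved:
            section, attr, old = self._saved.pop()
            setattr(self.objects[section], attr, old)


stash = SimpleNamespace(path=Path('data/corpus/amr'), cache={'doc': 1})
swapper = SketchSwapper(
    {'amr_stash': stash}, Path('data/adhoc'),
    [('amr_stash', 'path', 'path'), ('amr_stash', 'cache', 'persist')])
swapper.swap()
swapped_path, swapped_cache = stash.path, stash.cache
swapper.restore()
```

Saving the old values inside swap() is what lets restore() work without knowing anything about the objects being redirected.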
zensols.calamr.alignconst module¶
Defines a class that aligns components of a (bipartite) graph.
- class zensols.calamr.alignconst.GraphAlignmentConstructor(doc_graph=None, source_flow=10000000000.0)[source]¶
Bases: object
Adds additional nodes and edges that enable the maxflow algorithm to be used on the graph. This includes component alignment edges, a source node and a sink node. Capacities for the component alignment edges are also set.
- __init__(doc_graph=None, source_flow=10000000000.0)¶
- add_edges(capacities, cls=<class 'zensols.calamr.attr.ComponentAlignmentGraphEdge'>)[source]¶
Add capacities as graph capacities to the graph.
- property doc_graph: DocumentGraph¶
A document graph that contains the graph to be aligned.
- property sink_flow_node: Vertex¶
The sink flow node.
-
source_flow:
float = 10000000000.0¶ The capacity to use for the source node of the transportation graph.
- property source_flow_node: Vertex¶
The source flow node.
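To illustrate why a source node, a sink node and capacitated alignment edges are needed, here is a minimal maximum flow sketch in pure Python (Edmonds-Karp). It is not the library's implementation, which operates on igraph graphs; the node names s, t, a1, a2 and b1 and the alignment capacities are made up for the example, and only the SOURCE_FLOW value mirrors the documented source_flow default.

```python
from collections import deque
from typing import Dict, Tuple

Edge = Tuple[str, str]
SOURCE_FLOW = 10_000_000_000.0  # mirrors the documented source_flow default


def max_flow(capacity: Dict[Edge, float], source: str, sink: str) -> float:
    """Edmonds-Karp maximum flow over a dict of directed edge capacities."""
    residual: Dict[Edge, float] = dict(capacity)
    for u, v in capacity:
        residual.setdefault((v, u), 0.0)  # reverse residual edges
    neighbors: Dict[str, list] = {}
    for u, v in residual:
        neighbors.setdefault(u, []).append(v)
    total = 0.0
    while True:
        # breadth-first search for an augmenting path
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbors.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # walk back from the sink to collect the path and its bottleneck
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        total += bottleneck


# two nodes of one component aligned to one node of the other component
capacity = {
    ('s', 'a1'): SOURCE_FLOW, ('s', 'a2'): SOURCE_FLOW,  # source edges
    ('a1', 'b1'): 0.75, ('a2', 'b1'): 0.5,               # alignment edges
    ('b1', 't'): SOURCE_FLOW,                            # sink edge
}
flow = max_flow(capacity, 's', 't')
```

Because the terminal edges have a very large capacity, the resulting flow is limited only by the alignment edges, which is the quantity of interest.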
zensols.calamr.aligner module¶
Contains classes that run the algorithm to compute graph component alignments.
- class zensols.calamr.aligner.DocumentGraphAligner(config_factory, flow_graph_result_name, doc_graph_name, renderer, render_level, init_loops_render_level, output_dir)[source]¶
Bases: ABC
Aligns the graph components of doc_graph and visualizes them with renderer.
-
MAX_RENDER_LEVEL:
ClassVar[int] = 10¶ The maximum value for render_level.
- __init__(config_factory, flow_graph_result_name, doc_graph_name, renderer, render_level, init_loops_render_level, output_dir)¶
- align(doc_graph)[source]¶
Align the graph components of doc_graph and optionally visualize them with renderer. To disable rendering, set render_level to 0.
- Parameters:
doc_graph (DocumentGraph) – the graph created by the DocumentGraphFactory
- Return type:
- Returns:
the alignments available as the in-memory graph and object graph, Pandas dataframes, statistics and scores
-
config_factory:
ConfigFactory¶ Used to create the
GraphSequencerinstance.
- create_error_result(ex, msg='Could not align')[source]¶
Create an error graph result (rather than an alignment result). This should be called in a try/except block to obtain the error information.
- Parameters:
ex (Exception) – the exception that caused the issue
msg – the error message for the failure
- Return type:
-
doc_graph_name:
str¶ The
DocumentGraph.namedocument return fromalign().
-
flow_graph_result_name:
str¶ The app configuration section name of
FlowGraphResult.
-
init_loops_render_level:
int¶ The
render_levelto use for all iteration loops except for the last before the algorithm converges.
- classmethod is_valid_render_level(render_level, should_raise=False)[source]¶
Return whether render_level is a valid value for render_level.
- Return type:
-
output_dir:
Path¶ If this is set, the graphs are written to this created directory on the file system. Otherwise, they are displayed and cleaned up afterward.
-
render_level:
int¶ How many graphs to render on a scale from 0 - 10. The higher the number the more likely a graph is to be rendered. A value of 0 prevents rendering and a setting of 10 will render all graphs.
- See:
-
renderer:
GraphRenderer¶ Visually render the graph into a human-understandable presentation.
- class zensols.calamr.aligner.DocumentGraphController(name)[source]¶
Bases: Dictable
Executes the maxflow/min cut algorithm on a document graph.
- __init__(name)¶
- invoke(doc_graph)[source]¶
Perform operations on the graph algorithm.
- Parameters:
doc_graph (DocumentGraph) – the graph to edit
- Return type:
- Returns:
the number of edits made to the graph
- class zensols.calamr.aligner.GraphIteration(sequence, render_level, updates)[source]¶
Bases: Dictable
An iteration of the alignment algorithm.
- __init__(sequence, render_level, updates)¶
-
render_level:
int¶ Whether to render graphs on a scale from 0 - 10. The higher the number the more likely it is to be rendered with 0 never rendering the graph, and 10 always rendering the graph.
- reset()[source]¶
Reset all state in application context shared objects so new data is forced to be created on the next alignment request.
-
sequence:
GraphSequence¶ The sequence to use for this iteration.
- class zensols.calamr.aligner.GraphSequence(name, process_name, render_name, heading, controller, sequencer)[source]¶
Bases: Dictable
A strategy GoF pattern that models what to do during a sequence of graph modifications using DocumentGraphController. It also contains rendering information for visualization.
- __init__(name, process_name, render_name, heading, controller, sequencer)¶
-
controller:
Optional[DocumentGraphController]¶ The controller used in the invocation of this strategy.
- invoke()[source]¶
Invoke the strategy. This implementation calls the controller with the process_graph to be processed and passes back the update count.
- Return type:
- populate_render_context(context)[source]¶
Allows the sequence to override the parameters before they are sent to the graph rendering API.
- property process_graph: DocumentGraph¶
The graph provided to the graph controller.
-
process_name:
str¶ The name of the graph provided to the graph controller. See
process_graph.
- property render_graph¶
The graph to render.
-
render_name:
str¶ The name of the graph to render. See
render_graph.
- reset()[source]¶
Reset all state in application context shared objects so new data is forced to be created on the next alignment request.
-
sequencer:
GraphSequencer¶ Owns and controls this instance.
- class zensols.calamr.aligner.GraphSequencer(config_factory, sequence_path, nascent_graph, render, descriptor=None, heading_format=None)[source]¶
Bases: object
This invokes the GraphSequence objects in the provided sequence to automate the graph alignment algorithm and is used by MaxflowDocumentGraphAligner.
- __init__(config_factory, sequence_path, nascent_graph, render, descriptor=None, heading_format=None)[source]¶
Initialize this instance.
- Parameters:
config_factory (ConfigFactory) – used to create the controller in the sequence instances
sequence_path (Path) – the path to the JSON file that has the sequences’ configuration
nascent_graph (DocumentGraph) – the initial disconnected graph created by DocumentGraphFactory
render (rendergroup) – the render object created by base.rendergroup
- property render_level: int¶
Whether to render graphs on a scale from 0 - 10. See
DocumentGraphAligner.MAX_RENDER_LEVEL.
- class zensols.calamr.aligner.MaxflowDocumentGraphAligner(config_factory, flow_graph_result_name, doc_graph_name, renderer, render_level, init_loops_render_level, output_dir, graph_sequencer_name, max_sequencer_iterations, hyp)[source]¶
Bases: DocumentGraphAligner
Uses the maxflow/min cut algorithm to compute graph component alignments.
- __init__(config_factory, flow_graph_result_name, doc_graph_name, renderer, render_level, init_loops_render_level, output_dir, graph_sequencer_name, max_sequencer_iterations, hyp)¶
-
graph_sequencer_name:
str¶ The app configuration section name of
GraphSequencer.
-
hyp:
HyperparamModel¶ The capacity calculator hyperparameters.
- See:
summary.CapacityCalculator.hyp
- class zensols.calamr.aligner.RenderUpSideDownGraphSequence(name, process_name, render_name, heading, controller, sequencer)[source]¶
Bases: GraphSequence
A graph sequence that tells graphviz to render the diagram upside down, which is useful for reverse flow graphs.
- __init__(name, process_name, render_name, heading, controller, sequencer)¶
zensols.calamr.annotate module¶
Contains a class to add embeddings to AMR feature documents.
- class zensols.calamr.annotate.AddEmbeddingsFeatureDocumentStash(delegate, word_piece_doc_factory=None)[source]¶
Bases: DelegateStash, PrimeableStash
Add embeddings to AMR feature documents. Embedding population is disabled by configuring word_piece_doc_factory as None.
- __init__(delegate, word_piece_doc_factory=None)¶
- get(name, default=None)[source]¶
Load an object or a default if key name doesn’t exist.
Implementation note: subclasses will probably want to override this method given the super method is cavalier about calling exists() and load(). Based on the implementation, this can be problematic.
- Return type:
- load(name)[source]¶
Load a data value from the pickled data with key name. Semantically, this method loads the data using the stash’s implementation. For example, DirectoryStash loads the data from a file if it exists, but factory type stashes will always re-generate the data.
- See:
- Return type:
-
word_piece_doc_factory:
WordPieceFeatureDocumentFactory= None¶ The feature document factory that populates embeddings.
- class zensols.calamr.annotate.CalamrAnnotatedAmrFeatureDocumentFactory(doc_parser, remove_wiki_attribs=False, remove_alignments=False, doc_id=0, word_piece_doc_factory=None)[source]¶
Bases: AnnotatedAmrFeatureDocumentFactory
Adds wordpiece embeddings to AmrFeatureDocument instances.
- __init__(doc_parser, remove_wiki_attribs=False, remove_alignments=False, doc_id=0, word_piece_doc_factory=None)¶
- to_annotated_doc(doc)[source]¶
Clone doc.amr into an AnnotatedAmrDocument.
- Parameters:
sent – the document to convert to an AnnotatedAmrDocument
- Return type:
- Returns:
a feature document with its amr set to a new AnnotatedAmrDocument, which is a new instance if sent isn’t an annotated AMR document
-
word_piece_doc_factory:
WordPieceFeatureDocumentFactory= None¶ The feature document factory that populates embeddings.
- class zensols.calamr.annotate.ProxyReportAnnotatedAmrDocument(sents, path=None, doc_id=None)[source]¶
Bases: AnnotatedAmrDocument
Overrides the sections property to skip duplicate summary sentences also found in the body.
- __init__(sents, path=None, doc_id=None)¶
Initialize.
- Parameters:
sents (Tuple[AmrSentence, …]) – the document’s sentences
path (Optional[Path, …]) – the path to the file containing the Penman notation sentence graphs used in sents
model – the model to initialize AmrSentence when sents is a list of string Penman graphs
- property sections: Tuple[AnnotatedAmrSectionDocument]¶
The sentences that make up the body of the document.
zensols.calamr.app module¶
Alignment entry point application.
- class zensols.calamr.app.AlignmentApplication(resources, config_factory)[source]¶
Bases: _AlignmentBaseApplication
This application aligns data in files.
- __init__(resources, config_factory)¶
- align(output_dir=PosixPath('-'), input_file=None, output_format=Format.csv, keys=None, render_level=None)[source]¶
Align the configured corpus, or annotated documents from a JSON file input_file if given.
-
config_factory:
ConfigFactory¶ Application configuration factory.
- class zensols.calamr.app.CorpusApplication(resources, config_factory, results_dir)[source]¶
Bases: _AlignmentBaseApplication
AMR graph alignment.
- __init__(resources, config_factory, results_dir)¶
-
config_factory:
ConfigFactory¶ For prototyping.
- dump_annotated(limit=None, output_dir=PosixPath('-'), output_format=Format.csv)[source]¶
Write annotated documents and their keys.
- get_annotated_summary(limit=None)[source]¶
Return a CSV file with a summary of the annotated AMR dataset.
-
results_dir:
Path¶ The directory where the output results are written, then read back for analysis reporting.
zensols.calamr.attr module¶
Graph node and edge domain classes.
Terminology: token alignments refer to the sentence index based token alignments to AMR nodes. This is not to be confused with alignment edges (aka graph component alignment edges).
- class zensols.calamr.attr.AmrDocumentNode(context, name, root, children, doc)[source]¶
Bases: DocumentNode
A composite node containing a subset of the DocumentNode.root sentences. This includes the text, text features, and AMR Penman graph data.
- __init__(context, name, root, children, doc)¶
- doc: AmrFeatureDocument¶
A document containing a subset of sentences that fall under this portion of the graph.
- class zensols.calamr.attr.AttributeGraphNode(context, sent, token_aligns, triple)[source]¶
Bases: TripleGraphNode
Attribute data from AMR attribute nodes grafted on to the igraph.Graph.
- ATTRIB_TYPE: ClassVar[str] = 'attribute'¶
The attribute type this class represents.
- __init__(context, sent, token_aligns, triple)¶
- class zensols.calamr.attr.ComponentAlignmentGraphEdge(context, capacity=0, flow=0)[source]¶
Bases: GraphEdge
An edge that spans graph components.
- ATTRIB_TYPE: ClassVar[str] = 'component alignment'¶
The attribute type this class represents.
- __init__(context, capacity=0, flow=0)¶
- class zensols.calamr.attr.ComponentCorefAlignmentGraphEdge(context, capacity=0, flow=0, relation=None, is_bipartite=False)[source]¶
Bases: ComponentAlignmentGraphEdge
An edge that spans graph components.
- ATTRIB_TYPE: ClassVar[str] = 'component coref alignment'¶
The attribute type this class represents.
- __init__(context, capacity=0, flow=0, relation=None, is_bipartite=False)¶
- is_bipartite: bool = False¶
Whether the coreference spans components.
- relation: Relation = None¶
The AMR coreference relation between this node and all other refs.
- class zensols.calamr.attr.ConceptGraphNode(context, sent, token_aligns, triple)[source]¶
Bases: TripleGraphNode
Attribute data from AMR concept nodes grafted on to the igraph.Graph.
- ATTRIB_TYPE: ClassVar[str] = 'concept'¶
The attribute type this class represents.
- __init__(context, sent, token_aligns, triple)¶
- property instance: str¶
The concept instance, such as the propbank entry (e.g. see-01). Other examples include nouns.
- property roleset: Roleset¶
- property roleset_embedding: Tensor¶
- property roleset_id: RolesetId¶
- class zensols.calamr.attr.DocumentGraphEdge(context, capacity=0, flow=0, relation='')[source]¶
Bases: GraphEdge
An edge that has data about the non-AMR parts of the graph, such as sentence.
- ATTRIB_TYPE: ClassVar[str] = 'doc'¶
The attribute type this class represents.
- __init__(context, capacity=0, flow=0, relation='')¶
- relation: str = ''¶
The edge relation between two document nodes or document to igraph node.
- class zensols.calamr.attr.DocumentGraphNode(context, level_name, doc_node)[source]¶
Bases: GraphNode
A node that has data about the non-AMR parts of the graph, such as the unifying top level node that ties the sentences together. However, it can contain the root to an AMR sentence (see AmrDocumentNode).
- ATTRIB_TYPE: ClassVar[str] = 'doc'¶
The attribute type this class represents.
- __init__(context, level_name, doc_node)¶
- doc_node: DocumentNode¶
The document node associated with the attached igraph node.
- level_name: str¶
The descriptive name of the node such as doc or section.
- class zensols.calamr.attr.DocumentNode(context, name, root, children)[source]¶
Bases: GraphNode
A composite of a node in the document tree that is associated with the FeatureDocument as root node. This class represents nodes in a graph that either:
- make up the part of the graph that’s disjoint from the AMR sentinel subgraphs (i.e. a root doc node), or
- are the root to an AMR sentence (see AmrDocumentNode)
The in-memory object graph of these instances is dependent on the type of data it represents. For example, the Proxy Report corpus has top level summary and body nodes with AMR sentences below (root on top).
- __init__(context, name, root, children)¶
- children: Tuple[DocumentNode, ...]¶
The children of this node with respect to the composite pattern.
- property children_by_name: DocumentNode¶
The children’s names as keys and respective document nodes as values.
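The composite structure can be sketched with a small dataclass. SketchNode is a hypothetical stand-in and not the library's DocumentNode; the depth_first helper is added only for the illustration, and the doc, summary and body names follow the Proxy Report example mentioned in the class description.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Tuple


@dataclass
class SketchNode:
    """Hypothetical stand-in for a composite document node."""
    name: str
    children: Tuple['SketchNode', ...] = ()

    @property
    def children_by_name(self) -> Dict[str, 'SketchNode']:
        # child names as keys, child nodes as values
        return {child.name: child for child in self.children}

    def depth_first(self) -> Iterator['SketchNode']:
        """Yield this node and then every node below it."""
        yield self
        for child in self.children:
            yield from child.depth_first()


doc = SketchNode('doc', (SketchNode('summary'), SketchNode('body')))
names = [node.name for node in doc.depth_first()]
```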
- name: str¶
The descriptive name of the node such as doc or section.
- root: AmrFeatureDocument¶
The owning feature document containing all sentences/tokens of the graph.
- property sents: Tuple[AmrFeatureSentence]¶
The sentences of this document level.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.calamr.attr.GraphEdge(context, capacity=0, flow=0)[source]¶
Bases: GraphAttribute
Graph attribute data added to the igraph.Graph edges.
- MAX_CAPACITY: ClassVar[float] = 10000000000.0¶
Maximum value of a capacity.
Implementation note: It seems igraph can only handle large values to represent infinity, and not float inf or the system-defined largest float value.
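One way to apply this note is to clamp capacities to a large finite number before handing them to a flow solver. The helper name to_finite_capacity is hypothetical; only the constant mirrors the documented MAX_CAPACITY value.

```python
import math

MAX_CAPACITY = 10_000_000_000.0  # mirrors the documented GraphEdge.MAX_CAPACITY


def to_finite_capacity(capacity: float) -> float:
    """Clamp a capacity so "infinite" becomes a large finite number."""
    if math.isnan(capacity):
        raise ValueError('capacity must be a number')
    return min(capacity, MAX_CAPACITY)


capacities = [0.25, 3.0, math.inf]
finite = [to_finite_capacity(c) for c in capacities]
```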
- __init__(context, capacity=0, flow=0)¶
- class zensols.calamr.attr.GraphNode(context)[source]¶
Bases: GraphAttribute
Graph attribute data added to the igraph.Graph vertexes.
- __init__(context)¶
- class zensols.calamr.attr.RoleGraphEdge(context, sent, token_aligns, capacity=0, flow=0, triple=None, role=None)[source]¶
Bases: GraphEdge, SentenceGraphAttribute
Attribute data from the AMR role edges grafted on to the igraph.Graph.
- ATTRIB_TYPE: ClassVar[str] = 'role'¶
The attribute type this class represents.
- __init__(context, sent, token_aligns, capacity=0, flow=0, triple=None, role=None)¶
- role: Union[str, Role] = None¶
The role name of the edge such as
:ARG0.
- triple: Union[Instance, Attribute] = None¶
The AMR Penman graph triple.
- class zensols.calamr.attr.SentenceGraphAttribute(context, sent, token_aligns)[source]¶
Bases: GraphAttribute
A node containing zero or more tokens with its parent sentence. Usually the AMR node represents a single token, but can have more than one token alignment.
- __init__(context, sent, token_aligns)¶
- sent: AmrFeatureSentence¶
The sentence from which this node was created.
- property token_align_str: str¶
A string representation of the AMR Penman representation of the token alignment.
- token_aligns: Tuple[Union[Alignment, RoleAlignment], ...]¶
The node to sentinel token index.
- property tokens: Tuple[FeatureToken, ...]¶
The tokens
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.calamr.attr.SentenceGraphEdge(context, capacity=0, flow=0, relation='', sent=None, sent_ix=None)[source]¶
Bases: DocumentGraphEdge
An edge from a document node to a SentenceGraphNode.
- ATTRIB_TYPE: ClassVar[str] = 'sentence'¶
The attribute type this class represents.
- __init__(context, capacity=0, flow=0, relation='', sent=None, sent_ix=None)¶
- sent: AmrFeatureSentence = None¶
The sentence from which this node was created.
- sent_ix: int = None¶
The sentence index.
- class zensols.calamr.attr.SentenceGraphNode(context, sent, sent_ix)[source]¶
Bases: GraphNode
A graph node containing the root of a sentence.
- ATTRIB_TYPE: ClassVar[str] = 'sentence'¶
The attribute type this class represents.
- SENT_TEXT_LEN: ClassVar[int] = 20¶
The truncated sentence length
- __init__(context, sent, sent_ix)¶
- sent: AmrFeatureSentence¶
The sentence from which this node was created.
- sent_ix: int¶
The sentence index.
- class zensols.calamr.attr.TerminalGraphEdge(context, capacity=0, flow=0)[source]¶
Bases: GraphEdge
An edge that connects to a TerminalGraphNode.
- ATTRIB_TYPE: ClassVar[str] = 'terminal'¶
The attribute type this class represents.
- __init__(context, capacity=0, flow=0)¶
- class zensols.calamr.attr.TerminalGraphNode(context, is_source)[source]¶
Bases: GraphNode
A flow control: source or sink.
- ATTRIB_TYPE: ClassVar[str] = 'control'¶
The attribute type this class represents.
- __init__(context, is_source)¶
- is_source: bool¶
Whether this is the source (s) or the sink (t).
- class zensols.calamr.attr.TripleGraphNode(context, sent, token_aligns, triple)[source]¶
Bases: SentenceGraphAttribute, GraphNode
Contains a Penman triple with token alignments used for concepts and AMR attributes. Instances of this class get their embedding via SentenceGraphAttribute._get_embedding().
- __init__(context, sent, token_aligns, triple)¶
- triple: Union[Instance, Attribute]¶
The AMR Penman graph triple.
zensols.calamr.cli module¶
Command line entry point to the application.
- class zensols.calamr.cli.ApplicationFactory(*args, **kwargs)[source]¶
Bases:
ApplicationFactory
zensols.calamr.comp module¶
Base graph component class.
- class zensols.calamr.comp.GraphComponent(graph)[source]¶
Bases: PersistableContainer, Writable
A container class for an igraph.Graph, which also has caching data structures for fast access to graph attributes.
-
GRAPH_ATTRIB_NAME:
ClassVar[str] = 'ga'¶ The name of the graph attributes on igraph nodes and edges.
- __init__(graph)¶
- property adjacency_list: List[List[int]]¶
An adjacency list of vertexes based on their relation to each other in the graph. The outer list’s index is the source vertex and the inner list is that vertex’s neighbors.
Implementation note: the list is sub-setted at both the inner and outer level for those vertexes in this component.
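The shape of such a list can be sketched in plain Python. This is an assumption-laden illustration rather than the library's code: the function name component_adjacency_list and the integer edge tuples are hypothetical, and it assumes that vertexes outside the component keep an (empty) slot in the outer list so that indexes still match vertex ids.

```python
from typing import List, Set, Tuple


def component_adjacency_list(n_vertices: int, edges: List[Tuple[int, int]],
                             component: Set[int]) -> List[List[int]]:
    """Adjacency list restricted to the vertexes of one component.

    The outer index is the source vertex; the inner list holds its
    neighbors. Vertexes outside ``component`` are dropped at both levels.
    """
    adjacency: List[List[int]] = [[] for _ in range(n_vertices)]
    for source, target in edges:
        if source in component and target in component:
            adjacency[source].append(target)
    return adjacency


edges = [(0, 1), (0, 2), (1, 3), (3, 4)]
adjacency = component_adjacency_list(5, edges, component={0, 1, 3})
```

Here vertex 0 keeps only neighbor 1 because vertex 2 is outside the component, and the edge from 3 to 4 is dropped for the same reason.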
- clone(reverse_edges=False, deep=True, **kwargs)[source]¶
Clone an instance and return it. The graph is deep copied, but all GraphAttribute instances are not.
- Parameters:
reverse_edges (bool) – whether to reverse the direction of edges in the directed graph, which is used to create the reverse flow graphs used in the maxflow algorithm
deep (bool) – whether to create a deep clone, which detaches a component from any bipartite graph by keeping only the graph composed of vs; otherwise the graph is copied as a subcomponent starting from root
kwargs – arguments to add as attributes to the clone; include cls as the type of the new instance
- Return type:
- Returns:
the cloned instance of this instance
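The effect of reverse_edges can be illustrated on a bare edge list. The copy_edges function is a hypothetical simplification that ignores graph attributes and the deep option; it only shows what flipping the direction of every edge means.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def copy_edges(edges: List[Edge], reverse_edges: bool = False) -> List[Edge]:
    """Return a new edge list, flipping each (source, target) if requested."""
    if reverse_edges:
        return [(target, source) for source, target in edges]
    return list(edges)


nascent = [(0, 1), (0, 2), (1, 3)]              # root 0 points to children
reverse_flow = copy_edges(nascent, reverse_edges=True)
```

In the reversed copy every edge points from child to parent, so flow entering at the leaves travels toward the root.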
- copy_graph(reverse_edges=False, subgraph_type=None)[source]¶
Return a copy of the igraph.Graph.
- Parameters:
- Return type:
Graph
- static create_edges(g, es)[source]¶
Add the edges of list es and return them after being added.
- Return type:
Tuple[Edge]
- static create_vertexes(g, n)[source]¶
Create n vertexes and return them after being added.
- Return type:
Tuple[Vertex]
- edge_by_graph_edge_id(ge_id)[source]¶
Return an edge based on the graph (attribute) edge id.
- Return type:
- property edges_reversed: bool¶
Whether the edge direction in the graph is reversed. This is True for reverse flow graphs.
- See:
summary.ReverseFlowGraphAlignmentConstructor
- get_attributes()[source]¶
Return all graph attributes of the component, which include instances of both GraphNode and GraphEdge.
- Return type:
- property graph: Graph¶
The graph used for computational manipulation of the synthesized AMR sentences.
- invalidate()[source]¶
Clear cached data structures to force them to be recreated after igraph level data has changed.
- node_by_graph_node_id(gn_id)[source]¶
Return a node based on the graph (attribute) node id.
- Return type:
- property root: Vertex | None¶
The singular (first found) root of the graph, which is usually the top level DocumentNode instance.
- property roots: Iterable[Vertex]¶
The roots of the graph, which are usually top level DocumentNode instances.
- select_edges(**kwargs)[source]¶
Return matched graph edges from an igraph.Graph.vs.select().
- Return type:
Iterable[Edge]
- select_vertices(**kwargs)[source]¶
Return matched graph nodes from an igraph.Graph.vs.select().
- Return type:
Iterable[Vertex]
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write the contents of this instance to writer using indentation depth.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
zensols.calamr.ctrl module¶
Document graph controller implementations.
- class zensols.calamr.ctrl.AlignmentCapacitySetDocumentGraphController(name, min_capacity, capacity)[source]¶
Bases: DocumentGraphController
Set the capacity on edges if the criteria matches min_flow, component_names and match_edge_classes.
- __init__(name, min_capacity, capacity)¶
- capacity: float¶
The capacity to set.
- class zensols.calamr.ctrl.ConstructDocumentGraphController(name, build_graph_name, constructor, renderer)[source]¶
Bases: DocumentGraphController
Constructs the graph that will later be used for the min cut/max flow algorithm (see MaxflowDocumentGraphController). After its invoke() method is called, build_graph is available, which is the constructed graph provided by constructor.
- __init__(name, build_graph_name, constructor, renderer)¶
- build_graph_name: str¶
The name given to new instances of DocumentGraph.
- constructor: GraphAlignmentConstructor¶
The constructor used to get the source and sink nodes.
- renderer: GraphRenderer¶
Visually render the graph into a human-understandable presentation.
- class zensols.calamr.ctrl.FixReentrancyDocumentGraphController(name, component_name, maxflow_controller, only_report)[source]¶
Bases: DocumentGraphController
Fix reentrancies by splitting the flow of the last calculated maxflow as the capacity of the outgoing edges in the reversed graph. This fixes the issue of edges getting flow starved, then later eliminated in the graph reduction steps.
Subsequently, the maxflow algorithm is rerun if we have at least one reentrancy after reallocating the capacities.
- __init__(name, component_name, maxflow_controller, only_report)¶
- component_name: str¶
The name of the components to restore.
- maxflow_controller: MaxflowDocumentGraphController¶
The maxflow component used to recalculate the maxflow.
- only_report: bool¶
Whether to only report reentrancies rather than fix them.
- class zensols.calamr.ctrl.FlowDiscountDocumentGraphController(name, discount_sum, component_names=<factory>)[source]¶
Bases: DocumentGraphController
Decrease/constrict the capacities by making the sum of the incoming flows from the bipartite edges the value of discount_sum. The capacities are only updated if the sum of the incoming bipartite edges has a flow greater than discount_sum.
- __init__(name, discount_sum, component_names=<factory>)¶
- component_names: Set[str]¶
The name of the components to discount.
- discount_sum: float¶
The capacity sum will be this value (see class docs).
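A minimal sketch of this kind of discounting, for the edges coming into a single node, is given below. It assumes that each edge's new capacity is its flow scaled proportionally so that the capacities sum to discount_sum; the proportional rule and the function name discount_capacities are assumptions made for illustration and may differ from what the controller does.

```python
from typing import List


def discount_capacities(capacities: List[float], flows: List[float],
                        discount_sum: float) -> List[float]:
    """Constrict incoming edge capacities when their flow sum is too high.

    Assumption: flows are scaled proportionally so the new capacities
    add up to ``discount_sum``; otherwise capacities are left unchanged.
    """
    total = sum(flows)
    if total <= discount_sum:
        return list(capacities)
    scale = discount_sum / total
    return [flow * scale for flow in flows]


discounted = discount_capacities([1.0, 1.0, 1.0], [0.5, 1.0, 0.5], 1.0)
unchanged = discount_capacities([1.0, 1.0], [0.25, 0.25], 1.0)
```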
- class zensols.calamr.ctrl.FlowSetDocumentGraphController(name, component_names=<factory>, match_edge_classes=<factory>, flow=0)[source]¶
Bases: DocumentGraphController
Set a static flow on components based on name and edges based on class.
- __init__(name, component_names=<factory>, match_edge_classes=<factory>, flow=0)¶
- component_names: Set[str]¶
The components on which to set the flow.
- flow: float = 0¶
The flow to set.
- match_edge_classes: Set[Type[GraphEdge]]¶
The edge classes (e.g. TerminalGraphEdge) on which to set the flow.
- class zensols.calamr.ctrl.MaxflowDocumentGraphController(name, constructor)[source]¶
Bases:
DocumentGraphControllerExecutes the maxflow/min cut algorithm on a document graph.
- __init__(name, constructor)¶
- constructor: GraphAlignmentConstructor¶
The constructor used to get the source and sink nodes.
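For readers unfamiliar with the maxflow/min cut computation this controller runs between the source and sink nodes, the following is a generic Edmonds-Karp sketch in plain Python. It is not the library's implementation; the nested dictionary graph representation and the function name are assumptions for illustration.

```python
from collections import deque
from typing import Dict, Hashable

Graph = Dict[Hashable, Dict[Hashable, float]]


def max_flow(capacity: Graph, source: Hashable, sink: Hashable) -> float:
    """Compute the maximum flow value from ``source`` to ``sink``."""
    # residual capacities, including zero-capacity reverse edges
    residual: Graph = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0.0)
    residual.setdefault(source, {})
    residual.setdefault(sink, {})
    total = 0.0
    while True:
        # breadth-first search for the shortest augmenting path
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # find the bottleneck capacity along the path
        bottleneck = float('inf')
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # push the bottleneck flow along the path
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck
```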
- class zensols.calamr.ctrl.NormFlowDocumentGraphController(name, component_names, constructor, normalize_mode='fpn')[source]¶
Bases:
DocumentGraphController
Normalizes flow on edges using the flow going through the edge and the total number of descendants. Descendants are counted as the edge’s source node and all children/descendants of that node.
This is done recursively to calculate flow per node. For each recursive iteration, it computes the flow per node of the parent edge(s) from the perspective of the nascent graph (root at top with arrows pointed to children underneath). However, the graphs this operates on are the reverse flow max flow graphs (flow direction is taken care of by the adjacency list computed in GraphComponent).
Since an AMR node can have multiple parents, we keep track of descendants as a set rather than a count to avoid duplicate counts when nodes have more than one parent. Otherwise, in the multiple parent case, duplicates would be counted when the path later converges closer to the root.
- __init__(name, component_names, constructor, normalize_mode='fpn')¶
- component_names: Set[str]¶
The name of the components to minimize.
- constructor: GraphAlignmentConstructor¶
The instance used to construct the graph passed in the
invoke()method.
- normalize_mode: str = 'fpn'¶
How to normalize nodes (if at all), which is one of:
fpn: leaves flow values as they were after the initial flow per node calculation
norm: normalize so all values add to one
vis: same as norm but add a vis_flow attribute to the edges so the original flow is displayed and visualized as the flow color
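The flow per node idea and the norm mode described for NormFlowDocumentGraphController can be sketched as follows. This is a simplified illustration under stated assumptions: the graph is a plain dictionary of children in the nascent (root at top) orientation, edges are (parent, child) tuples, and flow per node is taken to be the edge flow divided by the size of the descendant set. None of these names come from the library.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child) in the nascent (root at top) graph


def descendants(children: Dict[str, List[str]], node: str) -> Set[str]:
    """Return ``node`` plus every node below it, as a set to avoid double
    counting nodes that have more than one parent."""
    found: Set[str] = {node}
    for child in children.get(node, ()):
        found |= descendants(children, child)
    return found


def flow_per_node(children: Dict[str, List[str]],
                  flows: Dict[Edge, float]) -> Dict[Edge, float]:
    """Divide each edge's flow by the descendant count of its child node
    (the child is the edge's source in the reversed flow graph)."""
    return {(parent, child): flow / len(descendants(children, child))
            for (parent, child), flow in flows.items()}


def normalize(values: Dict[Edge, float]) -> Dict[Edge, float]:
    """Sketch of the ``norm`` mode: rescale so all values add to one."""
    total = sum(values.values())
    if total == 0:
        return dict(values)
    return {edge: value / total for edge, value in values.items()}
```

Because descendants are collected in a set, a node reachable through two parents is counted once when the paths converge.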
- class zensols.calamr.ctrl.RemoveAlignsDocumentGraphController(name, min_capacity)[source]¶
Bases:
DocumentGraphControllerRemoves graph component alignment for low capacity links.
- __init__(name, min_capacity)¶
- min_capacity: float¶
The graph component alignment edges are removed if their capacities are at or below this value.
- class zensols.calamr.ctrl.RoleCapacitySetDocumentGraphController(name, min_flow, capacity, component_names)[source]¶
Bases:
DocumentGraphControllerThis finds low flow role edges and sets (zeros out) all the capacities of all the connected edge alignments recursively for all descendants. We “slough off” entire subtrees (sometimes entire sentences or document nodes) for low flow ancestors.
- __init__(name, min_flow, capacity, component_names)¶
- capacity: float¶
The capacity (and flow) to set.
- component_names: Set[str]¶
The name of the components to minimize.
- class zensols.calamr.ctrl.SnapshotDocumentGraphController(name, component_names, snapshot_source)[source]¶
Bases:
DocumentGraphController
Record flows, then later restore. If snapshot_source is not None, then this instance restores from it. Otherwise it records.
- __init__(name, component_names, snapshot_source)¶
- component_names: Set[str]¶
The name of the components on which to record or restore flows.
- snapshot_source: SnapshotDocumentGraphController¶
The source instance that contains the data from which to restore.
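The record-or-restore behavior of SnapshotDocumentGraphController follows a simple pattern, sketched below. The FlowSnapshot class, its invoke method, and the dictionary of flows are hypothetical stand-ins and not the library's API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FlowSnapshot:
    """Records flows when ``source`` is None, otherwise restores from it."""
    source: Optional['FlowSnapshot'] = None
    recorded: Dict[str, float] = field(default_factory=dict)

    def invoke(self, flows: Dict[str, float]) -> None:
        if self.source is None:
            # record mode: keep a copy of the current flows
            self.recorded = dict(flows)
        else:
            # restore mode: overwrite with the flows the source recorded
            flows.update(self.source.recorded)
```

A recording instance is invoked early in the sequence, and a second instance configured with it as the source is invoked later to put the earlier flows back.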
zensols.calamr.dcomp module¶
A document centric graph component.
- class zensols.calamr.dcomp.DocumentGraphComponent(graph, root_node, sent_index=<factory>, description=None)[source]¶
Bases:
GraphComponent
A class containing the root information of the document tree and the igraph.Graph vertex. When the igraph.Graph is set with the graph property, a strongly connected subgraph component is induced. It does this by traversing all reachable vertices and edges from the root. Examples of these induced components include source and summary components of a document AMR graph.
Instances are created by DocumentGraphFactory.
- __init__(graph, root_node, sent_index=<factory>, description=None)¶
- clone(reverse_edges=False, deep=True, **kwargs)[source]¶
Clone an instance and return it. The graph is deep copied, but all GraphAttribute instances are not.
- Parameters:
reverse_edges (bool) – whether to reverse the direction of edges in the directed graph, which is used to create the reverse flow graphs used in the maxflow algorithm
deep (bool) – whether to create a deep clone, which detaches a component from any bipartite graph by keeping only the graph composed of vs; otherwise the graph is copied as a subcomponent starting from root
kwargs – arguments to add as attributes to the clone; include cls as the type of the new instance
- Return type:
- Returns:
the cloned instance of this instance
- property doc_vertices: Iterable[Vertex]¶
Get the vertices of DocumentGraphNode. This only fetches those document nodes that do not branch.
- get_attributes()[source]¶
Return all graph attributes of the component, which include instances of both
GraphNodeandGraphEdge.- Return type:
- property relation_set: RelationSet¶
The relations in the contained root node document.
- property root: Vertex | None¶
The roots of the graph, which are usually top level
DocumentNodeinstances.
-
root_node:
AmrDocumentNode¶ The root of the document tree.
-
sent_index:
SentenceIndex¶ An index of the sentences of a
DocumentGraphComponent.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write the contents of this instance to writer using indentation depth.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
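The way DocumentGraphComponent induces its subgraph, by traversing everything reachable from the root, can be pictured with a small breadth-first sketch. The adjacency dictionary and the function name are assumptions for illustration; the library itself operates on an igraph.Graph.

```python
from collections import deque
from typing import Dict, Hashable, List, Set


def reachable_from(adjacency: Dict[Hashable, List[Hashable]],
                   root: Hashable) -> Set[Hashable]:
    """Return the set of vertices reachable from ``root``, root included."""
    seen: Set[Hashable] = {root}
    queue = deque([root])
    while queue:
        vertex = queue.popleft()
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen
```

Vertices that belong to another component, such as the summary when starting from the source root, are left out because no edge leads to them.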
- class zensols.calamr.dcomp.SentenceEntry(node=None, concepts=None, attributes=None)[source]¶
Bases:
DictableContains the sentence node of a sentence, and the respective concept and attribute nodes.
- __init__(node=None, concepts=None, attributes=None)¶
-
attributes:
Tuple[AttributeGraphNode] = None¶ The AMR attribute nodes of the sentence.
- property concept_by_variable: Dict[str, ConceptGraphNode]¶
-
concepts:
Tuple[ConceptGraphNode] = None¶ The AMR concept nodes of the sentence.
-
node:
SentenceGraphNode= None¶ The sentence node, which is the root of the sentence subgraph.
- class zensols.calamr.dcomp.SentenceIndex(entries=None)[source]¶
Bases:
DictableAn index of the sentences of a
DocumentGraphComponent.- __init__(entries=None)¶
- property by_sentence: Dict[AmrFeatureSentence, SentenceEntry]¶
-
entries:
Tuple[SentenceEntry] = None¶ The entries of the index, each of which is a sentence.
zensols.calamr.doc module¶
Document based graph container, factory and strategy classes.
- class zensols.calamr.doc.DocumentGraph(graph, name, graph_attrib_context, doc, components, children=<factory>)[source]¶
Bases:
GraphComponent
A graph containing the text, text features, AMR Penman graph and igraph.
This class roughly follows a GoF composite pattern with children a collection of instances of this class, which are the reversed source and summary graphs created for the max flow algorithm. The root is constructed from the DocumentGraphFactory class and the children are built by the DocumentGraphController instances.
The children of this composite are not to be confused with components, which are the disconnected source and summary graph components in the root graph instance. Each child also has the reversed flow graphs, but they are connected as a bipartite flow graph for use by the max flow algorithm.
- __init__(graph, name, graph_attrib_context, doc, components, children=<factory>)¶
- property bipartite_relation_set: RelationSet¶
The bipartite relations that span components. This set includes all top level relations that are not self contained in any components.
-
children:
Dict[str,DocumentGraph]¶ The children of this instance, which for now, are only instances of
FlowDocumentGraph.
- clone(reverse_edges=False, deep=True, **kwargs)[source]¶
Clone an instance and return it. The graph is deep copied, but all GraphAttribute instances are not.
- Parameters:
reverse_edges (bool) – whether to reverse the direction of edges in the directed graph, which is used to create the reverse flow graphs used in the maxflow algorithm
deep (bool) – whether to create a deep clone, which detaches a component from any bipartite graph by keeping only the graph composed of vs; otherwise the graph is copied as a subcomponent starting from root
kwargs – arguments to add as attributes to the clone; include cls as the type of the new instance
- Return type:
- Returns:
the cloned instance of this instance
- component_iter()[source]¶
Return an iterable of the components of this graph and recursively over the children.
- Return type:
-
components:
Tuple[DocumentGraphComponent,...]¶ The roots of the trees created by the
DocumentGraphFactory.
- property components_by_name: Dict[str, DocumentGraphComponent]¶
Get document graph components by name.
- property components_by_name_sorted: Tuple[Tuple[str, DocumentGraphComponent], ...]¶
Get document graph components sorted by name.
-
doc:
AmrFeatureDocument¶ The document that represents the graph.
- property graph_attrib_context: GraphAttributeContext¶
The context given to all nodes and edges of the graph.
-
name:
str¶ The name of the graph used to identify it. For now, this is only reversed_source for the graph that flows from the summary to the source, and reversed_summary for the graph that flows from the source to the summary. These are “reversed” because the flow is reversed from the leaf nodes to the root.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write the contents of this instance to writer using indentation depth.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.calamr.doc.DocumentGraphDecorator[source]¶
Bases:
ABCA strategy to create a graph from a document structure.
- __init__()¶
- abstract decorate(component)[source]¶
Creates the graph from a DocumentNode root node.
- Parameters:
component (DocumentGraphComponent) – the graph to populate from the decorating process
- class zensols.calamr.doc.DocumentGraphFactory(config_factory, graph_decorators, doc_graph_section_name, graph_attrib_context)[source]¶
Bases:
ABCCreates a document graph. After the document portion of the graph is created, the igraph is built and merged using a
DocumentGraphDecorator. This igraph has the corresponding vertexes and edges associated with the document graph, which includes AMR Penman graph and feature document artifacts.- __init__(config_factory, graph_decorators, doc_graph_section_name, graph_attrib_context)¶
-
config_factory:
ConfigFactory¶ Used to create a
DocumentGraphDecorator.
- create(root)[source]¶
Create a document graph and return it starting from the root node. See class docs.
- Parameters:
root (
AmrFeatureDocument) – the feature document from which to create the graph- Return type:
-
doc_graph_section_name:
str¶ The name of a section in the configuration that defines new instances of
DocumentGraph.
-
graph_attrib_context:
GraphAttributeContext¶ The context given to all nodes and edges of the graph.
-
graph_decorators:
Tuple[DocumentGraphDecorator,...]¶ The name of the section that defines a
DocumentGraphDecoratorinstance.
zensols.calamr.domain module¶
Classes that organize document content into a hierarchy.
Terminology: token alignments refer to the sentence index based token alignments to AMR nodes. This is not to be confused with alignment edges (aka graph component alignment edges).
- exception zensols.calamr.domain.ComponentAlignmentError(msg, sent=None)[source]¶
Bases:
AmrErrorPackage level errors.
- __module__ = 'zensols.calamr.domain'¶
- class zensols.calamr.domain.ComponentAlignmentFailure(exception=None, thrower=None, traceback=None, message=None)[source]¶
Bases:
FailurePackage level failures.
- __init__(exception=None, thrower=None, traceback=None, message=None)¶
- class zensols.calamr.domain.EmbeddingResource(torch_config, word_piece_doc_parser=None, word_piece_doc_factory=None, roleset_stash=None)[source]¶
Bases:
objectGenerates embeddings for roles, role sets, text, and feature tokens.
- __init__(torch_config, word_piece_doc_parser=None, word_piece_doc_factory=None, roleset_stash=None)¶
- get_role_embedding(role)[source]¶
Return an embedding for a role. This uses the role’s relation’s embedding if available. Otherwise, it uses the embedding created from the role’s prefix.
- Return type:
Tensor
- get_sentence_tokens_embedding(sent)[source]¶
Return the sentence embeddings of
sent.- Return type:
Tensor
- get_token_embedding(text)[source]¶
Return the mean of the token embeddings of
text.- Return type:
Tensor
- get_tokens_embedding(tokens)[source]¶
Return the mean of the embeddings of
tokens.- Return type:
Tensor
- get_word_piece_document(text)[source]¶
Return a word piece document parsed from
text.- Return type:
WordPieceFeatureDocument
-
torch_config:
TorchConfig¶ Used to create
unknown_edge_embedding
- property unknown_edge_embedding: Tensor¶
A zero embedding.
- property unknown_node_embedding: Tensor¶
A zero embedding.
-
word_piece_doc_factory:
WordPieceFeatureDocumentFactory= None¶ Creates word piece data structures that have embeddings.
-
word_piece_doc_parser:
FeatureDocumentParser= None¶ Used to get single token embeddings for nodes with no token alignments.
- class zensols.calamr.domain.GraphAttribute(context)[source]¶
Bases:
PersistableContainer,DictableContains AMR document attribute data added to the
igraph.Graph. This is added as vertexes or edge attribute data.- ATTRIB_TYPE: ClassVar[str] = 'base'¶
The attribute type this class represents.
- __init__(context)¶
- context: GraphAttributeContext¶
Contains context data used by nodes and edges of the graph.
- property description: str¶
A human readable description that is usually used as the label and
__str__().
- property embedding: Tensor¶
The default embedding of the attribute. Note that some attributes have several different embeddings.
- property embedding_resource: EmbeddingResource¶
Generates embeddings for roles, role sets, text, and feature tokens.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a
Writableor as aDictable. If class attribute_DICTABLE_WRITABLE_DESCENDANTSis set asTrue, then use thewrite()method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating adictrecursively usingasdict(), then formatting the output.If the attribute
_DICTABLE_WRITE_EXCLUDESis set, those attributes are removed from what is written in thewrite()method.Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (
int) – the starting indentation depthwriter (
TextIOBase) – the writer to dump the content of this writable
- class zensols.calamr.domain.GraphAttributeContext(embedding_resource, relation_stash, default_capacity, sink_capacity, component_alignment_capacity, doc_capacity, similarity_threshold, default_format_strlen)[source]¶
Bases:
DictableContains context data used by nodes and edges of the graph.
- __init__(embedding_resource, relation_stash, default_capacity, sink_capacity, component_alignment_capacity, doc_capacity, similarity_threshold, default_format_strlen)¶
-
component_alignment_capacity:
float¶ The default initial capacity for source/summary component alignment edges.
-
doc_capacity:
float¶ The bipartitie (between source and summary) capacity value of
DocumentGraphNode.
-
embedding_resource:
EmbeddingResource¶ The manager that contains vectorizers that create node and edge embeddings.
zensols.calamr.flow module¶
Provides container classes and computes statistics for graph alignments.
- class zensols.calamr.flow.Flow(source, target, edge)[source]¶
Bases:
DictableA triple of a source node, target node and connecting edge from the graph. The connecting edge has a flow value associated with it.
- __init__(source, target, edge)¶
- class zensols.calamr.flow.FlowDocumentGraph(graph, name, graph_attrib_context, doc, components, children=<factory>)[source]¶
Bases:
DocumentGraphContains all the flows of a
DocumentGraphand hasFlowDocumentGraphComponentas components. Instances of this document graph have no children.- __init__(graph, name, graph_attrib_context, doc, components, children=<factory>)¶
- class zensols.calamr.flow.FlowDocumentGraphComponent(graph, root_node, sent_index=<factory>, description=None, reentrancy_set=None)[source]¶
Bases:
DocumentGraphComponentContains all the flows of a
DocumentComponent.- __init__(graph, root_node, sent_index=<factory>, description=None, reentrancy_set=None)¶
- reentrancy_set: ReentrancySet = None¶
Concept nodes with multiple parents.
- property result: FlowGraphComponentResult¶
The flow results for this component.
- property root_flow: Flow¶
The root flow of the document component, which has the component’s
DocumentGraphNodeas the source node and the sink as the target node.
- class zensols.calamr.flow.FlowGraphComponentResult(component)[source]¶
Bases:
DictableA container class for the flow data from a
DocumentComponentflow instance (aka reverse flow graph). This includes the data as dictionaries of statistics,pandas.DataFrameandDataDescriberinstances.- property connected_stats: Dict[str, int | float]¶
The statistics on how well the two graphs are aligned, counted as:
alignable: the number of nodes that are eligible for having an alignment (i.e. sentence, concept, and attribute nodes)
aligned: the number of aligned nodes in the FlowDocumentGraphComponent this instance holds
aligned_portion: the quotient of $aligned / alignable$, which is a number in $[0, 1]$ representing a score of how well the two graphs match
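The three statistics above relate as in the sketch below. The function name and the handling of a component with no alignable nodes (returning 0.0) are assumptions for illustration and are not taken from the library.

```python
from typing import Dict, Union


def connected_stats(alignable: int, aligned: int) -> Dict[str, Union[int, float]]:
    """Build the alignment statistics from the two node counts."""
    # assumption: report 0.0 rather than dividing by zero for empty components
    portion = aligned / alignable if alignable > 0 else 0.0
    return {'alignable': alignable,
            'aligned': aligned,
            'aligned_portion': portion}
```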
- create_data_frame_describer()[source]¶
Like
create_align_df()but includes a human readable description of the data.- Return type:
DataFrameDescriber
- property df: pd.DataFrame¶
The data in flows and root as a dataframe. Note the terms source and target refer to the nodes at the ends of the directed edge in a reversed graph.
s_descr: source node descriptions such as concept names, attribute constants and sentence text
t_descr: target node of s_descr
s_toks: any source node aligned tokens
t_toks: any target node aligned tokens
s_attr: source node attribute name given by GraphAttribute.attrib_type, such as doc, sentence, concept, attribute
t_attr: target node of s_attr
s_id: source node igraph ID
t_id: target node igraph ID
edge_type: whether the edge is an AMR role or alignment
rel_id: the coreference relation ID or null if the edge is not a coreference
is_bipartite: whether relation rel_id spans components or null if the edge is not a coreference
flow: the (normalized/flow per node) flow of the edge
reentrancy: whether the edge participates in an AMR reentrancy
align_flow: the flow sum of the alignment edges for the respective edge
align_count: the count of incoming alignment edges to the target node in the FlowDocumentGraphComponent this instance holds
- property n_alignable_nodes: int¶
The number of nodes in the component that can take alignment edges. Whether those nodes in the count have edges does not affect the result.
- property stats: Dict[str, Any]¶
All statistics/scores available for this instance, which include:
root_flow: the flow from the root node to the sink
connected: connected_stats
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a
Writableor as aDictable. If class attribute_DICTABLE_WRITABLE_DESCENDANTSis set asTrue, then use thewrite()method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating adictrecursively usingasdict(), then formatting the output.If the attribute
_DICTABLE_WRITE_EXCLUDESis set, those attributes are removed from what is written in thewrite()method.Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (
int) – the starting indentation depthwriter (
TextIOBase) – the writer to dump the content of this writable
- class zensols.calamr.flow.FlowGraphResult(component_paths, context, data)[source]¶
Bases:
PersistableContainer, Dictable
A container class for flow document results, which include the detailed data as dictionaries of statistics, pandas.DataFrame and DataDescriber instances. This is aggregated from doc_graph and the flow children’s flow graph components.
All graphs (from nascent to the reversed flow children graphs) have the final state of the actions of the DocumentGraphController as coordinated by the GraphSequencer. Since the flows are copied from the reversed source graph to the root level (doc_graph) factory built nascent graph, all flows are the same. However, the nascent graph will still be the disconnected source and summary graphs.
- __init__(component_paths, context, data)[source]¶
Initialize the flow results.
- Parameters:
data (
Union[DocumentGraph,ComponentAlignmentFailure]) – the root nascentDocumentGraphFactorybuild graph or an instance ofComponentAlignmentFailureif the alignment failedcomponent_paths (
Tuple[Tuple[str,str],...]) – a set of paths that indicate which flow components to use for the results in the form(<child name>, <component name>)
- create_data_describer()[source]¶
Like
create_align_df() but includes a human readable description of the data.
- Throws ComponentAlignmentError:
if this instance resulted in an error
- See:
- Return type:
DataDescriber
- property df: pd.DataFrame¶
A concatenation of frames created with
FlowDocumentGraphComponent.create_align_df() with the name of each component.
- Throws ComponentAlignmentError:
if this instance resulted in an error
- See:
- property doc_graph: DocumentGraph¶
The root nascent
DocumentGraphFactory built graph.
- Throws ComponentAlignmentError:
if this instance resulted in an error
- See:
- property failure: ComponentAlignmentFailure | None¶
What caused the alignment to fail, or
Noneif it was a success.
- get_render_contexts(child_names=None, include_nascent=False)[source]¶
Get contexts used to render the graphs with
render.base.rendergroup.- Parameters:
child_names (Iterable[str]) – the names of the DocumentGraph.children to render, which default to the nascent graph and the final bipartite graph rendered (“restore previous flow on source”)
include_nascent (bool) – whether to include the nascent graphs
- Return type:
- reduce(child_name=None, component_name=None, prune=True)[source]¶
Deletes flow graph terminals and optionally prunes. To compute the reduced graph, add it back to this flow, and render it, use:
    from zensols.calamr import FlowGraphResult, Resources, ApplicationFactory

    resources: Resources = ApplicationFactory.get_resources()
    with resources.corpus() as r:
        flow: FlowGraphResult = r.alignments['some_key']
        flow.doc_graph.children['reduced'] = flow.reduce()
        flow.render(flow.get_render_contexts(['reduced']))
- Parameters:
- Return type:
- Returns:
a new cloned graph without terminals and optionally pruned (see
prune)
- render(contexts=None, graph_id='graph', display=True, directory=None)[source]¶
Render several graphs at a time, then optionally display them.
- Parameters:
contexts (Tuple[RenderContext, ...]) – the data to render, which defaults to the output of get_render_contexts()
graph_id (str) – a unique identifier prefixed to files generated if none provided in the call method
display (bool) – whether to display the files after they are generated
directory (Path) – the directory to create the files in place of the temporary directory; if provided the directory is not removed after the graphs are rendered
- property stats: Dict[str, Any]¶
The statistics with keys as component names and values taken from
FlowDocumentGraphComponent.stats.
- Throws ComponentAlignmentError:
if this instance resulted in an error
- See:
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_doc=True)[source]¶
Write this instance as either a
Writableor as aDictable. If class attribute_DICTABLE_WRITABLE_DESCENDANTSis set asTrue, then use thewrite()method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating adictrecursively usingasdict(), then formatting the output.If the attribute
_DICTABLE_WRITE_EXCLUDESis set, those attributes are removed from what is written in thewrite()method.Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (
int) – the starting indentation depthwriter (
TextIOBase) – the writer to dump the content of this writable
zensols.calamr.flowmeta module¶
Metadata for the flow module.
zensols.calamr.morph module¶
Populate an igraph from AMR graphs.
- class zensols.calamr.morph.IsomorphDocumentGraphDecorator(config_factory, graph_attrib_context_name)[source]¶
Bases:
DocumentGraphDecoratorPopulates a
igraph.Graphattributes from aDocumentGraphdata by adding AMR node and edge information.- __init__(config_factory, graph_attrib_context_name)¶
-
config_factory:
ConfigFactory¶ The configuration factory used to create a
GraphAttributeContext.
- decorate(comp)[source]¶
Creates the graph from a DocumentNode root node.
- Parameters:
component – the graph to populate from the decorating process
-
graph_attrib_context_name:
str¶ The section name of the GraphAttributeContext context given to all nodes and edges of the graph.
zensols.calamr.multi module¶
Frankenstein PyTorch multiprocessing multi-stashes.
- class zensols.calamr.multi.TorchMultiProcessFactoryStash(config, name, factory, enable_preemptive=False, **kwargs)[source]¶
Bases:
MultiProcessFactoryStash
zensols.calamr.proto module¶
Prototyping and cookbook.
zensols.calamr.reentrancy module¶
Reentrancy container classes.
- class zensols.calamr.reentrancy.EdgeFlow(edge, flow=None)[source]¶
Bases:
PersistableContainer,DictableThe flow over a graph edge. This keeps the flow of the edge as a “snapshot” of the value at a particular point in the algorithm, before it is modified to fix the issue.
- __init__(edge, flow=None)¶
- class zensols.calamr.reentrancy.Reentrancy(concept_node, concept_node_vertex, edge_flows)[source]¶
Bases:
PersistableContainer,DictableReentrancies are concept nodes with multiple parents (in the forward graph) and have side effects when running the algorithm.
Note: an AMR (always acyclic) graph with no reentrancies is a tree.
- __init__(concept_node, concept_node_vertex, edge_flows)¶
-
concept_node:
ConceptGraphNode¶ The concept node of the reentrancy
-
edge_flows:
Tuple[EdgeFlow]¶ The outgoing edges connected to the reentrant
concept_node.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a
Writableor as aDictable. If class attribute_DICTABLE_WRITABLE_DESCENDANTSis set asTrue, then use thewrite()method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating adictrecursively usingasdict(), then formatting the output.If the attribute
_DICTABLE_WRITE_EXCLUDESis set, those attributes are removed from what is written in thewrite()method.Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (
int) – the starting indentation depthwriter (
TextIOBase) – the writer to dump the content of this writable
- class zensols.calamr.reentrancy.ReentrancySet(reentrancies=())[source]¶
Bases:
PersistableContainer,DictableA set of reentrancies, one for each iteration of the algorithm.
- __init__(reentrancies=())¶
- property by_vertex: Dict[int, Reentrancy]¶
-
reentrancies:
Tuple[Reentrancy] = ()¶ Concept nodes with multiple parents.
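Since a reentrancy is defined above as a concept node with multiple parents in the forward graph, detecting one reduces to counting distinct parents per node. The sketch below shows this on a plain edge list; the function name and the (parent, child) tuple representation are assumptions for illustration, not the library's data structures.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def find_reentrancies(edges: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each node having more than one parent to its sorted parents."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    # only nodes with two or more distinct parents are reentrant
    return {child: sorted(found)
            for child, found in parents.items() if len(found) > 1}
```

An edge list that yields an empty result describes a graph with no reentrancies.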
zensols.calamr.resource module¶
Client facade access to annotated AMR documents and alignment.
- class zensols.calamr.resource.Resource(documents, alignments, _resources)[source]¶
Bases:
object
Contains objects that parse AMR annotated documents and align them. Instances of this class are created with Resources.
- __init__(documents, alignments, _resources)¶
- align(doc, render_level=None, directory=None)[source]¶
Align a document, which allows for the rendering of a document since it does not use the cached results from alignments.
- Parameters:
doc (Union[AmrFeatureDocument, str]) – either the document or a unique document ID that indicates which document to align
render_level (int) – how many graphs to render (0 - 10), higher means more
directory (Path) – the output directory
- Return type:
-
alignments:
Stash¶ A stash (
dictlike) collection with AMR doc IDs keys toFlowGraphResultvalues.
-
documents:
Stash¶ A stash (
dictlike) collection with AMR doc IDs keys toAmrFeatureDocumentvalues.
- class zensols.calamr.resource.Resources(serialized_factory, doc_graph_factory, doc_graph_aligner, _anon_doc_stash, _adhoc_doc_stash, _flow_results_stash)[source]¶
Bases:
object
A client facade (GoF) for Calamr annotated AMR corpus access and alignment. This object is used as a context manager.
The
corpus()andadhoc()methods provide access to documents and alignments. Use the stashes provided by those methods to clear respective cached data.- __init__(serialized_factory, doc_graph_factory, doc_graph_aligner, _anon_doc_stash, _adhoc_doc_stash, _flow_results_stash)¶
- adhoc(corpus=None, corpus_id=None, clear=False)[source]¶
Return a context manager for parsing and aligning adhoc documents. This sets the corpus documents that will be used for parsing and annotating. The data will immediately be parsed into AMRs in this call and the data that writes to the file system will be updated to point to a new .../adhoc directory to not interfere with any corpus documents.
The data input can be a file name that contains parsed parenthetical AMRs, a single document, or a sequence of documents. The keys of each dictionary are the case-insensitive enumeration values of SentenceType. Keys id and comment are the unique document identifier and a comment that is added to the AMR sentence metadata.
The following example JSON creates a document with ID ex1, a comment metadata, one SUMMARY and two BODY sentences:
    corpus = [{
        "id": "ex1",
        "comment": "very short",
        "body": "The man ran to make the train. He just missed it.",
        "summary": "A man got caught in a train he just missed."
    }]
This source / summary text can then be AMR parsed, aligned, and rendered with:
    from zensols.calamr import Resources, ApplicationFactory

    resources: Resources = ApplicationFactory.get_resources()
    with resources.adhoc(corpus, clear=True) as r:
        # render an aligned document
        r.alignments['some_key'].render()
The clear=True means to delete all cached files generated in the block.
At least one of corpus and corpus_id must be given. If corpus is not given but corpus_id is, it will assume there is an existing set of data files to use for accessing.
- Parameters:
corpus (Sequence[Dict[str, str]]) – the AMR summary documents, which is usually a sequence of Dict instances (see AnnotatedAmrFeatureDocumentFactory for data structure details)
corpus_id (str) – a unique identifier for data, or None to use a hashed string, which in turn is used as the directory name for the cached data
clear (bool) – whether or not to delete the cached files (parsed documents, aligned graphs etc) after leaving the lexical boundaries of the context manager
- Return type:
- corpus()[source]¶
Return a context manager for corpus access. A corpus must be created before using this method, which amounts to using an AMR parser to create the parenthetical text files. These files are then made available as a resource to be downloaded or available on the file system.
Example:
    from zensols.calamr import Resources, ApplicationFactory

    resources: Resources = ApplicationFactory.get_resources()
    with resources.corpus() as r:
        # print the keys of the annotated AMR documents
        print(tuple(r.documents.keys()))
        # determine if a document is in the stash
        print('some_key' in r.documents)
        # write an AMR document
        r.documents['some_key'].write()
- Return type:
-
doc_graph_aligner:
DocumentGraphAligner¶ Align document graphs.
-
doc_graph_factory:
DocumentGraphFactory¶ Create document graphs.
- restore(res)[source]¶
Restore the information on a flow graph result needed to render it. Without it, FlowGraphResult.render() will raise errors. This is only needed when unpickling a FlowGraphResult.
- Parameters:
res (FlowGraphResult) – the instance to have additional context information set
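The reason a restore step is needed after unpickling can be illustrated with a generic pattern: an object drops its in-memory context when it is pickled, so the context has to be set again before rendering. The sketch below uses only the standard library. The classes Context and Result and the function restore are hypothetical stand-ins and do not reflect how FlowGraphResult is implemented.

```python
import pickle
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Context:
    """Hypothetical stand-in for in-memory data needed to render."""
    renderer_name: str


@dataclass
class Result:
    """Hypothetical stand-in for a result pickled without its context."""
    score: float
    context: Optional[Context] = field(default=None, repr=False)

    def __getstate__(self) -> Dict[str, Any]:
        # drop the transient context so that it is never serialized
        state = dict(self.__dict__)
        state['context'] = None
        return state

    def render(self) -> str:
        if self.context is None:
            raise RuntimeError('no context: restore the result first')
        return f'{self.context.renderer_name}: {self.score:.2f}'


def restore(res: Result, context: Context) -> None:
    """Set the transient context on an unpickled result."""
    res.context = context


context = Context('plot')
unpickled: Result = pickle.loads(pickle.dumps(Result(0.75, context)))
# calling unpickled.render() here would raise RuntimeError
restore(unpickled, context)
rendered = unpickled.render()
```

Keeping the context out of the pickled state keeps the serialized data small and free of session-specific objects, at the cost of one explicit restore call per unpickled instance.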
-
serialized_factory:
AmrSerializedFactory¶ Creates a Serialized from AmrDocument, AmrSentence or AnnotatedAmrDocument.
zensols.calamr.score module¶
Produces CALAMR scores.
- class zensols.calamr.score.CalamrScore(flow_graph_res)[source]¶
Bases:
Score
Contains all CALAMR scores.
- NAN_INSTANCE = CalamrScore()¶
- __init__(flow_graph_res)¶
-
flow_graph_res:
FlowGraphResult¶
- class zensols.calamr.score.CalamrScoreMethod(reverse_sents=False, word_piece_doc_factory=None, doc_graph_factory=None, doc_graph_aligner=None)[source]¶
Bases:
ScoreMethod
Computes the smatch scores of AMR sentences. Sentence pairs are ordered (<summary>, <source>).
- __init__(reverse_sents=False, word_piece_doc_factory=None, doc_graph_factory=None, doc_graph_aligner=None)¶
-
doc_graph_aligner:
DocumentGraphAligner = None¶ Align document graphs.
-
doc_graph_factory:
DocumentGraphFactory = None¶ Create document graphs.
- score_annotated_doc(doc)[source]¶
Score a document that has an amr of type AnnotatedAmrDocument.
- Raises:
[zensols.amr.domain.AmrError]: if the AMR could not be parsed or aligned
- Return type:
-
word_piece_doc_factory:
WordPieceFeatureDocumentFactory = None¶ The feature document factory that populates embeddings.
zensols.calamr.stash module¶
Alignment dataframe stash.
- class zensols.calamr.stash.FlowGraphRestoreStash(delegate, flow_graph_result_context)[source]¶
Bases:
DelegateStash, PrimeableStash
A stash that restores transient data on FlowGraphResult instances.
- __init__(delegate, flow_graph_result_context)¶
- exists(name)[source]¶
Return True if data with key name exists.
Implementation note: This Stash.exists() method is very inefficient and should be overridden.
- Return type:
-
flow_graph_result_context:
_FlowGraphResultContext¶ Contains in-memory/interpreter session data needed by FlowGraphResult when it is created or unpickled.
- get(name, default=None)[source]¶
Load an object or a default if key name doesn’t exist.
Implementation note: subclasses will probably want to override this method given the super method is cavalier about calling exists() and load(). Based on the implementation, this can be problematic.
- Return type:
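The implementation notes on exists() and get() can be made concrete with a small stand-in. The sketch below assumes, for illustration only, that the inefficient exists() loads the object to test for presence and that the default get() calls exists() and then load(); neither assumption is confirmed by this reference. All class names are hypothetical and none of them come from zensols.

```python
from typing import Any, Callable, Dict, Optional


class DictStash:
    """Hypothetical backing stash that counts calls to ``load``."""
    def __init__(self, data: Dict[str, Any]):
        self.data = data
        self.loads = 0

    def load(self, name: str) -> Optional[Any]:
        self.loads += 1
        return self.data.get(name)

    def exists(self, name: str) -> bool:
        # cheap membership test: no object is loaded
        return name in self.data


class NaiveRestoreStash:
    """Wrap a delegate and apply ``restore`` to each loaded object."""
    def __init__(self, delegate: DictStash, restore: Callable[[Any], None]):
        self.delegate = delegate
        self.restore = restore

    def load(self, name: str) -> Optional[Any]:
        obj = self.delegate.load(name)
        if obj is not None:
            self.restore(obj)
        return obj

    def exists(self, name: str) -> bool:
        # assumed inefficient default: load (and restore) to test presence
        return self.load(name) is not None

    def get(self, name: str, default: Any = None) -> Any:
        # assumed default: exists followed by load, so two loads here
        return self.load(name) if self.exists(name) else default


class RestoreStash(NaiveRestoreStash):
    """Override both methods so that each lookup loads at most once."""
    def exists(self, name: str) -> bool:
        return self.delegate.exists(name)

    def get(self, name: str, default: Any = None) -> Any:
        obj = self.load(name)
        return default if obj is None else obj


def mark_restored(obj: Dict[str, bool]) -> None:
    obj['restored'] = True


naive_backing = DictStash({'ex1': {'restored': False}})
NaiveRestoreStash(naive_backing, mark_restored).get('ex1')

fast_backing = DictStash({'ex1': {'restored': False}})
fast = RestoreStash(fast_backing, mark_restored)
fast_result = fast.get('ex1')
```

After both lookups, naive_backing.loads is 2 and fast_backing.loads is 1. When loading means unpickling a large aligned graph, that difference is the cost the implementation notes warn about.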
- class zensols.calamr.stash.FlowGraphResultFactoryStash(anon_doc_stash, doc_graph_aligner, doc_graph_factory)[source]¶
Bases:
ReadOnlyStash, PrimeableStash
A factory stash that creates aligned FlowGraphResult instances or ComponentAlignmentFailure when the document cannot be aligned.
- __init__(anon_doc_stash, doc_graph_aligner, doc_graph_factory)¶
-
doc_graph_aligner:
DocumentGraphAligner¶ Align document graphs.
-
doc_graph_factory:
DocumentGraphFactory¶ Create document graphs.
- exists(name)[source]¶
Return True if data with key name exists.
Implementation note: This Stash.exists() method is very inefficient and should be overridden.
- Return type:
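A factory stash that yields either a result or a failure object lets callers process a whole corpus without wrapping every lookup in a try block. The following is a minimal sketch of that pattern with hypothetical names (Aligned, AlignmentFailure, FactoryStash, toy_align). How the library detects a failed alignment is not stated in this reference, so catching ValueError here is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Union


@dataclass
class Aligned:
    """Hypothetical stand-in for a successful alignment result."""
    doc_id: str
    flow: float


@dataclass
class AlignmentFailure:
    """Hypothetical stand-in for a failure to align a document."""
    doc_id: str
    message: str


class FactoryStash:
    """Read-only stash that creates a result or a failure on each load."""
    def __init__(self, docs: Dict[str, str], align: Callable[[str], float]):
        self.docs = docs
        self.align = align

    def load(self, name: str) -> Optional[Union[Aligned, AlignmentFailure]]:
        doc = self.docs.get(name)
        if doc is None:
            return None
        try:
            return Aligned(name, self.align(doc))
        except ValueError as e:
            # assumption: a failed alignment surfaces as ValueError
            return AlignmentFailure(name, str(e))

    def exists(self, name: str) -> bool:
        # presence depends on the source documents only; nothing is aligned
        return name in self.docs

    def keys(self) -> Iterable[str]:
        return self.docs.keys()


def toy_align(text: str) -> float:
    """Toy stand-in for alignment: the token count of non-empty text."""
    if not text.strip():
        raise ValueError('nothing to align')
    return float(len(text.split()))


stash = FactoryStash({'ex1': 'He just missed it.', 'ex2': ''}, toy_align)
results = {k: stash.load(k) for k in stash.keys()}
failed = [r.doc_id for r in results.values()
          if isinstance(r, AlignmentFailure)]
```

Callers separate the two cases with isinstance, as in the last statement, which leaves the failure message available for later reporting.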