zensols.datdesc package¶
Submodules¶
zensols.datdesc.app module¶
Generate LaTeX tables in a .sty file from CSV files. The paths to the CSV files used to create the tables, and their metadata, are given in a YAML configuration file. Parameters are either both files or both directories. When using directories, only files that match *-table.yml are considered.
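As a rough illustration of the file selection described above, the sketch below applies the default patterns from the Application signature to a few file names using only the standard re module. The file names and the classify helper are made up for the example, and it assumes the patterns are applied to the bare file name.

```python
import re

# Default patterns as listed in the Application signature.
PATTERNS = {
    'table': re.compile(r'^.+-table\.yml$'),
    'serial': re.compile(r'^.+-table\.json$'),
    'figure': re.compile(r'^.+-figure\.yml$'),
    'hyperparam': re.compile(r'^.+-hyperparam\.yml$'),
}


def classify(file_name: str):
    """Return the kind of definition file, or None if nothing matches.

    Hypothetical helper for illustration; not part of the package.
    """
    for kind, pattern in PATTERNS.items():
        if pattern.match(file_name) is not None:
            return kind
    return None


names = ['results-table.yml', 'loss-figure.yml', 'svm-hyperparam.yml',
         'results-table.json', 'notes.yml', '-table.yml']
kinds = {name: classify(name) for name in names}
```

Note that the leading .+ requires at least one character before the suffix, so a file named exactly -table.yml is not selected.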
- class zensols.datdesc.app.Application(config_factory, table_factory_name, figure_factory_name, data_file_regex=re.compile('^.+-table\\\\.yml$'), serial_file_regex=re.compile('^.+-table\\\\.json$'), figure_file_regex=re.compile('^.+-figure\\\\.yml$'), hyperparam_file_regex=re.compile('^.+-hyperparam\\\\.yml$'), hyperparam_table_default=None)[source]¶
Bases:
object
Generate LaTeX table files from CSV files and hyperparameter .sty files.
- __init__(config_factory, table_factory_name, figure_factory_name, data_file_regex=re.compile('^.+-table\\\\.yml$'), serial_file_regex=re.compile('^.+-table\\\\.json$'), figure_file_regex=re.compile('^.+-figure\\\\.yml$'), hyperparam_file_regex=re.compile('^.+-hyperparam\\\\.yml$'), hyperparam_table_default=None)¶
-
config_factory:
ConfigFactory¶ Creates table and figure factories.
- property figure_factory: FigureFactory¶
Reads the figure definitions file and writes
eps figures.
-
figure_factory_name:
str¶ The section name of the figure factory (see
figure_factory).
-
figure_file_regex:
Pattern= re.compile('^.+-figure\\.yml$')¶ Matches file names of figure definitions.
- generate_hyperparam(input_path, output_path, output_format=_OutputFormat.short)[source]¶
Write hyperparameter formatted data.
-
hyperparam_file_regex:
Pattern= re.compile('^.+-hyperparam\\.yml$')¶ Matches file names of tables processed in the LaTeX output.
- list_figures(input_path)[source]¶
Generate figures.
- Parameters:
input_path (
Path) – YAML definitions or JSON serialized file
output_path – output file or directory
output_image_format – the output format (defaults to
svg)
-
serial_file_regex:
Pattern= re.compile('^.+-table\\.json$')¶ Matches file names of serialized dataframe.
- show_table(name=None)[source]¶
Print a list of example LaTeX tables.
- Parameters:
name (
str) – the name of the example table or a listing of tables if omitted
- property table_factory: TableFactory¶
Reads the table definitions file and writes a LaTeX .sty file of the generated tables from the CSV data.
-
table_factory_name:
str¶ The section name of the table factory (see
table_factory).
zensols.datdesc.cli module¶
Command line entry point to the application.
- class zensols.datdesc.cli.ApplicationFactory(*args, **kwargs)[source]¶
Bases:
ApplicationFactory
zensols.datdesc.desc module¶
Metadata container classes.
- class zensols.datdesc.desc.DataDescriber(describers, name='default', mangle_sheet_name=False)[source]¶
Bases:
PersistableContainer, Dictable
Container class for
DataFrameDescriber instances. It also saves their instances as CSV data files and YAML configuration files.
- __init__(describers, name='default', mangle_sheet_name=False)¶
- add_summary()[source]¶
Add a new metadata like
DataFrameDescriber as a first entry in describers that describes what data this instance currently has.
- Return type:
- Returns:
the added metadata
DataFrameDescriber instance
- derive(**kwargs)[source]¶
Create a new instance based on this instance and replace any non-
None kwargs.
- Parameters:
kwargs – the key word arguments to replace any field data from this instance
- Return type:
- Returns:
a new instance with replaced data, or a clone if called with no key word arguments
- derive_with_index_meta(index_format=None)[source]¶
Applies
DataFrameDescriber.derive_with_index_meta() to each element of describers.
- Return type:
-
describers:
Tuple[DataFrameDescriber,...]¶ The contained dataframe and metadata.
- property describers_by_name: Dict[str, DataFrameDescriber]¶
Data frame describers keyed by the describer name.
- classmethod from_describer(dfd)[source]¶
Create a singleton describer. The
name is taken from the dfd DataFrameDescriber.name.
- Return type:
- classmethod from_json(reader)[source]¶
Unserialize a JSON stream into a data descriptor.
- Parameters:
reader (
TextIOWrapper) – the file / data stream
- Return type:
- classmethod from_yaml_file(path)[source]¶
Create a data descriptor from previously written YAML/CSV files using
save().
- See:
- See:
- Return type:
-
mangle_sheet_name:
bool= False¶ Whether to normalize the Excel sheet names when
xlsxwriter.exceptions.InvalidWorksheetName is raised.
- save(csv_dir=None, yaml_dir=None, excel_path=None)[source]¶
Save both the CSV and YAML configuration file.
- Parameters:
- See:
- See:
- Return type:
- save_yaml(csv_dir, yaml_dir)[source]¶
Save all provided dataframe describers as YAML files used by the
datdesc command.
- to_json(writer)[source]¶
Serialize the object to JSON that can be re-instantiated using
from_json().
- Parameters:
writer (
TextIOBase) – the data sink
kwargs – the key word arguments to give to
json.dump()
- class zensols.datdesc.desc.DataFrameDescriber(name, df, desc, head=None, meta_path=None, meta=None, table_kwargs=<factory>, index_meta=None, mangle_file_names=False)[source]¶
Bases:
PersistableContainer, Dictable
A class that contains a Pandas dataframe, a description of the data, and descriptions of all the columns in that dataframe.
- property T: DataFrameDescriber¶
See
transpose().
- __init__(name, df, desc, head=None, meta_path=None, meta=None, table_kwargs=<factory>, index_meta=None, mangle_file_names=False)¶
- property column_descriptions: Dict[str, str]¶
A dictionary of column names to descriptions of the column metadata created from
meta. Any missing column metadata will result in None dictionary values.
- derive(*, name=None, df=None, desc=None, meta=None, index_meta=None)[source]¶
Create a new instance based on this instance and replace any non-
None kwargs.
If
meta is provided, it is merged with the metadata of this instance. However, any metadata provided must match in both column names and descriptions.
- derive_with_index_meta(index_format=None)[source]¶
Like
derive(), but the dataframe is generated with df_with_index_meta() using index_format as a parameter.
- Parameters:
index_format (
str) – see df_with_index_meta()
- Return type:
- df_with_index_meta(index_format=None)[source]¶
Create a dataframe with the first column containing index metadata. This uses
index_meta to create the column values.
- Parameters:
index_format (
str) – the new index column format using index and value, which defaults to {index}
- Return type:
- Returns:
the dataframe with a new first column of the index metadata, or
df if index_meta is None
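The index_format string can be pictured with plain str.format(). The sketch below assumes that index is the row's index value and value is its description taken from index_meta (the text names the two fields but does not define them), and it uses made-up metadata rather than the package's classes.

```python
# Hypothetical index metadata: index value -> description of the row.
index_meta = {'acc': 'accuracy on the test set', 'f1': 'macro F1 score'}


def format_index(index_format: str = '{index}'):
    """Render one label per index entry (illustrative only)."""
    return [index_format.format(index=idx, value=desc)
            for idx, desc in index_meta.items()]


default_labels = format_index()
verbose_labels = format_index('{index} ({value})')
```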
- format_table()[source]¶
Replace (in place) dataframe
df with the formatted table obtained with Table.formatted_dataframe. The Table is created with create_table().
- classmethod from_columns(source, name=None, desc=None)[source]¶
Create a new instance by transposing column data into a new dataframe describer. If
source is a dataframe, it has the following columns:
Otherwise, each element of the sequence is a row of column, meta descriptions, and data sequences.
-
head:
str= None¶ A short summary of the table, used in
Table.head.
-
index_meta:
Dict[Any,str] = None¶ The index metadata, which maps index values to descriptions of the respective row.
- property meta: DataFrame¶
The column metadata for
dataframe, which needs columns name and description. If this is not provided, it is read from file meta_path. If this is set to a tuple of tuples, a dataframe is generated from the form:
((<column name 1>, <column description 1>), (<column name 2>, <column description 2>) ...
If both this and
meta_path are not provided, the following is used:
(('description', 'Description'), ('value', 'Value'))
- save_excel(output_dir=PosixPath('.'))[source]¶
Save as an Excel file using
csv_path. To add column labels use instances of this object with DataDescriber.save_excel().
- See:
- Return type:
-
table_kwargs:
Dict[str,Any]¶ Additional key word arguments given when creating a table in
create_table().
- transpose(row_names=((0, 'value', 'Value'),), name_column='name', name_description='Name', index_column='description')[source]¶
Transpose all data in this descriptor by transposing
df and swapping meta with index_meta as a new instance.
- Parameters:
row_names (
Tuple[int,str,str]) – a tuple of (row index in df, the column in the new df, and the metadata description of that column in the new df); the default takes only the first row
description_column – the column description of this instance’s
df
index_column (
str) – the name of the new index in the returned instance
- Return type:
- Returns:
a new derived instance of the transposed data
zensols.datdesc.dfstash module¶
A stash implementation that uses a Pandas dataframe and is stored as a CSV file.
- class zensols.datdesc.dfstash.DataFrameStash(path, dataframe=None, key_column='key', columns=('value',), mkdirs=True, auto_commit=True, single_column_index=0)[source]¶
Bases:
CloseableStash
A backing stash that persists to a CSV file via a Pandas dataframe. All modifications go through the
pandas.DataFrame and are then saved with commit() or close().
- __init__(path, dataframe=None, key_column='key', columns=('value',), mkdirs=True, auto_commit=True, single_column_index=0)¶
- clear()[source]¶
Delete all data from the stash.
Important: Exercise caution with this method, of course.
-
columns:
Tuple[str,...] = ('value',)¶ The columns to create in the spreadsheet. These must be consistent when the data is restored.
- property dataframe: DataFrame¶
The dataframe to proxy in memory. This is settable on instantiation but read-only afterward. If this is not set an empty dataframe is created with the metadata in this class.
- delete(name=None)[source]¶
Delete the resource for data pointed to by
name or the entire resource if name is not given.
- exists(name)[source]¶
Return
True if data with key name exists.
Implementation note: This
Stash.exists() method is very inefficient and should be overridden.
- Return type:
- get(name, default=None)[source]¶
Load an object or a default if key
name doesn’t exist. Semantically, this method tries not to re-create the data if it already exists. This means that if a stash has built-in caching mechanisms, this method uses them.
- load(name)[source]¶
Load a data value from the pickled data with key
name. Semantically, this method loads the data using the stash’s implementation. For example, DirectoryStash loads the data from a file if it exists, but factory type stashes will always re-generate the data.
-
mkdirs:
bool= True¶ Whether to recursively create the directory where
path is stored if it does not already exist.
zensols.datdesc.figure module¶
A simple object oriented plotting API.
- class zensols.datdesc.figure.Figure(name='Untitled', config_factory=None, title_font_size=0, height=5, width=5, padding=5.0, metadata=<factory>, plots=(), image_dir=PosixPath('.'), image_format='svg', image_file_norm=True, seaborn=<factory>, subplot_params=<factory>)[source]¶
Bases:
Deallocatable, Dictable
An object oriented class to manage
matplotlib.figure.Figure and subplots (matplotlib.pyplot.Axes).
- __init__(name='Untitled', config_factory=None, title_font_size=0, height=5, width=5, padding=5.0, metadata=<factory>, plots=(), image_dir=PosixPath('.'), image_format='svg', image_file_norm=True, seaborn=<factory>, subplot_params=<factory>)¶
- add_plot(plot)[source]¶
Add to the collection of managed plots. This is needed for the plot to work if not created from this manager instance.
- Parameters:
plot (
Plot) – the plot to be managed
-
config_factory:
ConfigFactory= None¶ The configuration factory used to create plots.
- property path: Path¶
The path of the image figure to save. This is constructed from
image_dir, name and image_format. Conversely, when set, it updates these fields.
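The two directions of the path property can be sketched with pathlib alone. This is only an approximation under the assumption that the file name is the figure name plus the format as its extension; it ignores whatever file name normalization image_file_norm applies, and build_path and split_path are hypothetical helpers rather than package functions.

```python
from pathlib import Path


def build_path(image_dir: Path, name: str, image_format: str) -> Path:
    """Compose the output path from the three fields (assumed layout)."""
    return image_dir / f'{name}.{image_format}'


def split_path(path: Path):
    """Recover (image_dir, name, image_format) from a path."""
    return path.parent, path.stem, path.suffix.lstrip('.')


out = build_path(Path('fig'), 'loss', 'svg')
fields = split_path(Path('out') / 'acc.png')
```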
-
plots:
Tuple[Plot,...] = ()¶ The plots managed by this object instance. Use
add_plot() to add new plots.
-
seaborn:
Dict[str,Any]¶ Seaborn (
seaborn) rendering configuration. It has the following optional keys:style: parameters used with :function:`sns.set_style`context: parameters used with :function:`sns.set_context`
- class zensols.datdesc.figure.FigureFactory(config_factory, plot_section_regex)[source]¶
Bases:
Dictable
Create instances of Figure using
create() or from configuration files with from_file(). See the usage documentation for information about the configuration files used by from_file().
- __init__(config_factory, plot_section_regex)¶
-
config_factory:
ConfigFactory¶ The configuration factory used to create
Table instances.
- from_dict(figure_config)[source]¶
Return figures parsed from nested
builtins.dict (see class documentation).
- from_file(figure_path)[source]¶
Like
from_dict() but read from a YAML file.
- class zensols.datdesc.figure.Plot(title=None, row=0, column=0, post_hooks=<factory>, legend_params=<factory>)[source]¶
Bases:
Dictable
An abstract base class for plots. The subclass overrides
plot() to generate the plot. The client can then use save() or render() on it. The plot is created as a subplot providing attributes for space to be taken in rows, columns, height and width.
- __init__(title=None, row=0, column=0, post_hooks=<factory>, legend_params=<factory>)¶
zensols.datdesc.hyperparam module¶
Hyperparameter metadata: access and documentation. This package was designed for the following purposes:
Provide a basic scaffolding to update model hyperparameters with packages such as
hyperopt.
Generate LaTeX tables of the hyperparameters and their descriptions for academic papers.
The object instance graph hierarchy is:
Access to the hyperparameters is done by calling the set or model levels
with a dotted path notation string. For example, svm.C first navigates to
model svm, then to the hyperparameter named C.
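The dotted path notation can be pictured with plain dictionaries. The sketch below is not the package's implementation: the nested models data and the get_by_path and flatten helpers are hypothetical, and only mirror the idea of navigating from a model name to a hyperparameter name.

```python
from typing import Any, Dict

# Hypothetical nested data: model name -> hyperparameter name -> value.
models: Dict[str, Dict[str, Any]] = {
    'svm': {'C': 1.0, 'kernel': 'rbf'},
    'knn': {'n_neighbors': 5},
}


def get_by_path(data: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'svm.C' one level at a time."""
    node = data
    for part in path.split('.'):
        node = node[part]
    return node


def flatten(data: Dict[str, Any], prefix: str = '') -> Dict[str, Any]:
    """Collapse nested dictionaries into dotted path keys."""
    flat = {}
    for key, val in data.items():
        dotted = f'{prefix}.{key}' if prefix else key
        if isinstance(val, dict):
            flat.update(flatten(val, dotted))
        else:
            flat[dotted] = val
    return flat


c_value = get_by_path(models, 'svm.C')
flat = flatten(models)
```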
- class zensols.datdesc.hyperparam.Hyperparam(name, type, doc, choices=None, value=None, interval=None)[source]¶
Bases:
Dictable
A hyperparameter’s metadata, documentation and value. The value is accessed (retrieval and setting) at runtime. Do not use this class explicitly. Instead use
HyperparamModel.
The index access only applies when
type is list or dict. Otherwise, the value member has the value of the hyperparameter.
-
CLASS_MAP:
ClassVar[Dict[str,Type]] = {'bool': <class 'bool'>, 'choice': <class 'str'>, 'dict': <class 'dict'>, 'float': <class 'float'>, 'int': <class 'int'>, 'list': <class 'list'>, 'str': <class 'str'>}¶ A mapping for values set in
type to their Python class equivalents.
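To show how such a mapping might be used, the sketch below copies the CLASS_MAP values listed above and checks a value against its declared type. The is_valid helper and its handling of choice are assumptions for illustration, not the library's validation logic.

```python
# Copied from the CLASS_MAP listing above.
CLASS_MAP = {'bool': bool, 'choice': str, 'dict': dict, 'float': float,
             'int': int, 'list': list, 'str': str}


def is_valid(type_name: str, value, choices=None) -> bool:
    """Hypothetical check of a value against a declared type name."""
    if type_name not in CLASS_MAP:
        raise ValueError(f'unknown type: {type_name}')
    if not isinstance(value, CLASS_MAP[type_name]):
        return False
    if type_name == 'choice':
        # assumed: a choice must be one of the listed strings
        return choices is not None and value in choices
    return True
```

For example, is_valid('choice', 'rbf', ('rbf', 'linear')) is true while is_valid('int', 'x') is false.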
-
VALID_TYPES:
ClassVar[str] = frozenset({'bool', 'choice', 'dict', 'float', 'int', 'list', 'str'})¶ Valid settings for
type.
- __init__(name, type, doc, choices=None, value=None, interval=None)¶
-
doc:
str¶ The human readable documentation for the hyperparameter. This is used in documentation generation tasks.
-
interval:
Union[Tuple[float,float],Tuple[int,int]] = None¶ Valid intervals for
value as an inclusive interval.
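An inclusive interval means both end points are themselves valid values. A minimal sketch of such a check, using a hypothetical in_interval helper and assuming that a missing interval places no restriction on the value:

```python
def in_interval(value, interval=None) -> bool:
    """Return True if value lies within the inclusive (low, high) pair."""
    if interval is None:
        # assumed: no interval means no restriction
        return True
    low, high = interval
    return low <= value <= high
```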
- class zensols.datdesc.hyperparam.HyperparamContainer[source]¶
Bases:
Dictable
A container class for
Hyperparam instances.
- __init__()¶
- abstract flatten(deep=False)[source]¶
Return a flattened dictionary with the dotted path notation (see module docs).
- exception zensols.datdesc.hyperparam.HyperparamError[source]¶
Bases:
DataDescriptionError
Raised for any error related to hyperparameter access.
- __module__ = 'zensols.datdesc.hyperparam'¶
- class zensols.datdesc.hyperparam.HyperparamModel(name, doc, desc=None, params=<factory>, table=None)[source]¶
Bases:
HyperparamContainer
The model level class that contains the parameters. This class represents a machine learning model such as an SVM with hyperparameters such as
C and maximum iterations.
- __init__(name, doc, desc=None, params=<factory>, table=None)¶
- create_dataframe_describer()[source]¶
Return an object with metadata fully describing the hyperparameters of this model.
- Return type:
-
desc:
str= None¶ The description of the model used in the documentation when
name is not sufficient. Since
name has naming constraints, this can be used in its place during documentation generation.
- flatten(deep=False)[source]¶
Return a flattened dictionary with the dotted path notation (see module docs).
- property metadata_dataframe: DataFrame¶
A dataframe describing the
values_dataframe.
-
name:
str¶ The name of the model (i.e.
svm). This name can have only alpha-numeric and underscore charaters.
-
params:
Dict[str,Hyperparam]¶ The hyperparameters keyed by their names.
-
table:
Optional[Dict[str,Any]] = None¶ Overriding data used when creating a
Table from DataFrameDescriber.create_table().
- property values_dataframe: DataFrame¶
A dataframe with parameter data. This includes the name, type, value and documentation.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_doc=False)[source]¶
Write this instance as either a
Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute
_DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (
int) – the starting indentation depth
writer (
TextIOBase) – the writer to dump the content of this writable
- class zensols.datdesc.hyperparam.HyperparamSet(models=<factory>, name=None)[source]¶
Bases:
HyperparamContainer
The top level in the object graph hierarchy (see module docs). This contains a set of models and is typically where calls by packages such as
hyperopt are used to update the hyperparameters of the model(s).
- __init__(models=<factory>, name=None)¶
- create_describer(meta_path=None)[source]¶
Return an object with metadata fully describing the hyperparameters of this model.
- Parameters:
meta_path (
Path) – if provided, set the path on the returned instance
- Return type:
- flatten(deep=False)[source]¶
Return a flattened dictionary with the dotted path notation (see module docs).
-
models:
Dict[str,HyperparamModel]¶ The models containing hyperparameters for this set.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_doc=False)[source]¶
Write this instance as either a
Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute
_DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (
int) – the starting indentation depth
writer (
TextIOBase) – the writer to dump the content of this writable
- class zensols.datdesc.hyperparam.HyperparamSetLoader(data, config=None, updates=())[source]¶
Bases:
object
Loads a set of hyperparameters from a YAML
pathlib.Path, dict or stream io.TextIOBase.
- __init__(data, config=None, updates=())¶
-
config:
Configurable= None¶ The application configuration used to update the hyperparameters from other sections.
-
data:
Union[Dict[str,Any],Path,TextIOBase]¶ The source of data to load, which is a YAML
pathlib.Path, dict or stream io.TextIOBase.
- See:
- load(**kwargs) HyperparamSet¶
Load and return the hyperparameter object graph from
data.
- Return type:
HyperparamSet
- exception zensols.datdesc.hyperparam.HyperparamValueError[source]¶
Bases:
HyperparamError
Raised for bad values set on a hyperparameter.
- __annotations__ = {}¶
- __module__ = 'zensols.datdesc.hyperparam'¶
zensols.datdesc.latex module¶
Contains the manager classes that invoke the tables to generate.
- class zensols.datdesc.latex.CsvToLatexTable(tables, package_name)[source]¶
Bases:
Writable
Generate a LaTeX table from a CSV file.
- __init__(tables, package_name)¶
- class zensols.datdesc.latex.LatexTable(path, name, template, caption='', head=None, type=None, template_params=<factory>, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, rules=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None, code_format=None, row_range=(1, -1), row_deletes=frozenset({}), booktabs=False)[source]¶
Bases:
Table
This subclass generates LaTeX tables.
- __init__(path, name, template, caption='', head=None, type=None, template_params=<factory>, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, rules=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None, code_format=None, row_range=(1, -1), row_deletes=frozenset({}), booktabs=False)¶
-
booktabs:
bool= False¶ Whether or not to use the
booktabs style table and to format using its style.
- class zensols.datdesc.latex.SlackTable(path, name, template, caption='', head=None, type=None, template_params=<factory>, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, rules=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None, code_format=None, row_range=(1, -1), row_deletes=frozenset({}), booktabs=False, slack_column=0)[source]¶
Bases:
LatexTable
An instance of the table that fills up space based on the widest column.
- __init__(path, name, template, caption='', head=None, type=None, template_params=<factory>, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, rules=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None, code_format=None, row_range=(1, -1), row_deletes=frozenset({}), booktabs=False, slack_column=0)¶
zensols.datdesc.opt module¶
zensols.datdesc.optscore module¶
zensols.datdesc.plots module¶
Commonly used plots for ML.
- class zensols.datdesc.plots.BarPlot(title=None, row=0, column=0, post_hooks=<factory>, legend_params=<factory>, data=None, palette=None, x_axis_label=None, y_axis_label=None, x_column_name=None, y_column_name=None, hue_column_name=None, x_label_rotation=0, key_title=None, log_scale=None, render_value_font_size=None, hue_palette=False, plot_params=<factory>)[source]¶
Bases:
PaletteContainerPlot, DataFramePlot
Create a bar plot using
seaborn.barplot().
- __init__(title=None, row=0, column=0, post_hooks=<factory>, legend_params=<factory>, data=None, palette=None, x_axis_label=None, y_axis_label=None, x_column_name=None, y_column_name=None, hue_column_name=None, x_label_rotation=0, key_title=None, log_scale=None, render_value_font_size=None, hue_palette=False, plot_params=<factory>)¶
- class zensols.datdesc.plots.DataFramePlot(title=None, row=0, column=0, post_hooks=<factory>, legend_params=<factory>, data=None)[source]¶
Bases:
Plot
A base class for plots that render data from a Pandas dataframe.
- __init__(title=None, row=0, column=0, post_hooks=<factory>, legend_params=<factory>, data=None)¶
- class zensols.datdesc.plots.HeatMapPlot(title=None, row=0, column=0, post_hooks=<factory>, legend_params=<factory>, data=None, palette=None, format='.2f', x_label_rotation=0, params=<factory>)[source]¶
Bases:
PaletteContainerPlot, DataFramePlot
Create a heat map plot and optionally normalize. This uses
seaborn’s heatmap.
- __init__(title=None, row=0, column=0, post_hooks=<factory>, legend_params=<factory>, data=None, palette=None, format='.2f', x_label_rotation=0, params=<factory>)¶
- class zensols.datdesc.plots.HistPlot(title=None, row=0, column=0, post_hooks=<factory>, legend_params=<factory>, palette=None, data=<factory>, x_axis_label=None, y_axis_label=None, key_title=None, log_scale=None, plot_params=<factory>)[source]¶
Bases:
PaletteContainerPlot
Create a histogram plot using
seaborn.histplot().
- __init__(title=None, row=0, column=0, post_hooks=<factory>, legend_params=<factory>, palette=None, data=<factory>, x_axis_label=None, y_axis_label=None, key_title=None, log_scale=None, plot_params=<factory>)¶
-
data:
List[Tuple[str,DataFrame]]¶ The data to plot. Each element is a tuple whose first component is the plot name.
- class zensols.datdesc.plots.PaletteContainerPlot(title=None, row=0, column=0, post_hooks=<factory>, legend_params=<factory>, palette=None)[source]¶
Bases:
Plot
A base class that supports creating a color palette for subclasses.
- __init__(title=None, row=0, column=0, post_hooks=<factory>, legend_params=<factory>, palette=None)¶
- class zensols.datdesc.plots.PointPlot(title=None, row=0, column=0, post_hooks=<factory>, legend_params=<factory>, palette=None, data=<factory>, x_axis_name=None, y_axis_name=None, x_column_name=None, y_column_name=None, key_title=None, sample_rate=0, plot_params=<factory>)[source]¶
Bases:
PaletteContainerPlot
An abstract base class that renders overlapping lines using a
seaborn pointplot.
- __init__(title=None, row=0, column=0, post_hooks=<factory>, legend_params=<factory>, palette=None, data=<factory>, x_axis_name=None, y_axis_name=None, x_column_name=None, y_column_name=None, key_title=None, sample_rate=0, plot_params=<factory>)¶
- add(name, line)[source]¶
Add the losses of a dataset by adding X values as incrementing integers the size of
line.
-
data:
List[Tuple[str,DataFrame]]¶ The data to plot. Each element is a tuple whose first component is the plot name. The second component is a dataframe with columns:
x_column_name: the X values of the graph, usually an incrementing number
y_column_name: a list of loss float values
Optionally use
add_line() to populate this list.
- class zensols.datdesc.plots.RadarPlot(title=None, row=0, column=0, post_hooks=<factory>, legend_params=<factory>, data=None, key_title=None, frame='circle', render_value_font_size=None, label_gap=None, alpha=0.25)[source]¶
Bases:
DataFramePlot
A radar plot (a.k.a. spider plot).
- __init__(title=None, row=0, column=0, post_hooks=<factory>, legend_params=<factory>, data=None, key_title=None, frame='circle', render_value_font_size=None, label_gap=None, alpha=0.25)¶
zensols.datdesc.table module¶
This module contains classes that generate tables.
- class zensols.datdesc.table.Table(path, name, template, caption='', head=None, type=None, template_params=<factory>, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, rules=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None, code_format=None)[source]¶
Bases:
PersistableContainer, Dictable
Generates a Zensols styled LaTeX table from a CSV file.
- __init__(path, name, template, caption='', head=None, type=None, template_params=<factory>, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, rules=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None, code_format=None)¶
- asflatdict(*args, **kwargs)[source]¶
Like
asdict() but flattened into a data structure suitable for writing to JSON or YAML.
-
blank_columns:
List[int]¶ A list of column indexes to set to the empty string (e.g. the 0th to fix the
Unnamed: 0 issues).
-
capitalize_columns:
Dict[str,bool]¶ Capitalize either sentences (
False values) or every word (True values). The keys are column names.
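The difference between the two settings can be sketched with a small stand-alone function. The capitalize helper below is hypothetical and only approximates the idea (the first letter of the text versus the first letter of every space-separated word); the library's actual rules may differ.

```python
def capitalize(text: str, every_word: bool) -> str:
    """Upper-case the first letter of the text or of every word."""
    if every_word:
        return ' '.join(w[:1].upper() + w[1:] for w in text.split(' '))
    return text[:1].upper() + text[1:]


sentence_style = capitalize('mean loss', every_word=False)
title_style = capitalize('mean loss', every_word=True)
```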
-
code_format:
str= None¶ Like
code_post but modifies the table after all formatting of the table (including that applied by this class).
-
code_post:
str= None¶ Like
code_pre but modifies the table after this class’s modifications of the table.
-
code_pre:
str= None¶ Python code executed that manipulates the table’s dataframe before modifications made by this class. The code has a local
df variable and the returned value is used as the replacement. This is usually a one-liner used to subset the data etc. The code is evaluated with eval().
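The mechanism described for code_pre (an expression evaluated with a local named df whose result replaces the table) can be sketched with the built-in eval(). To keep the example free of third-party imports, a list of row dictionaries stands in for the dataframe; that stand-in and the apply_code helper are assumptions of this sketch, not the library's code.

```python
# Stand-in for a dataframe: a list of row dictionaries (assumption made
# only to avoid third-party imports in this example).
rows = [{'model': 'svm', 'f1': 0.81},
        {'model': 'knn', 'f1': 0.74},
        {'model': 'nb', 'f1': 0.69}]


def apply_code(code: str, df):
    """Evaluate an expression with df as its only local variable and
    return the result, which becomes the replacement table."""
    return eval(code, {}, {'df': df})


top_two = apply_code('df[:2]', rows)
```

Because eval() runs arbitrary Python, table definition files that carry such snippets should come from trusted sources only.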
-
column_aligns:
str= None¶ The alignment/justification (e.g.
|l|l| for two columns). If not provided, they are automatically generated based on the columns of the table.
-
column_value_replaces:
Dict[str,Dict[Any,Any]]¶ Data values to replace in the dataframe. It is keyed by the column name and values are the replacements. Each value is a
dict with original value keys and the replacements as values.
-
default_params:
Sequence[Sequence[str]]¶ Default parameters to be substituted in the template that are interpolated by the LaTeX numeric values such as #1, #2, etc. This is a sequence (list or tuple) of
(<name>, [<default>]) where name is substituted by name in the template and default is the default if not given in params.
-
format_scientific_column_names:
Dict[str,Optional[int]]¶ Format a column using LaTeX formatted scientific notation using
format_scientific(). Keys are column names and values are the mantissa length or 1 if None.
- static format_thousand(x, apply_k=True, add_comma=True, round=None)[source]¶
Format a number as a string with comma separating thousands.
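Comma-separated thousands can be produced with Python's format specification, as in the sketch below. `with_commas` is a hypothetical stand-in, not the library's format_thousand(); in particular it makes no attempt to model the apply_k parameter, whose behavior is not described here.

```python
from typing import Optional, Union


def with_commas(x: Union[int, float], ndigits: Optional[int] = None) -> str:
    """Hypothetical stand-in: optionally round, then add thousands commas."""
    if ndigits is not None:
        x = round(x, ndigits)
    # the ',' format option inserts a comma between groups of three digits
    return f'{x:,}'


whole = with_commas(1234567)
rounded = with_commas(1234.5678, 1)
print(whole, rounded)
```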
-
format_thousands_column_names:
Dict[str,Optional[Dict[str,Any]]]¶ Columns to format using thousands, and optionally round. The keys are the column names of the table and the values are either None or the keyword arguments to format_thousand().
- property formatted_dataframe: DataFrame¶
The dataframe with the formatting applied to it, used to create the LaTeX table. Modifications such as string replacements for adding percents are done.
-
head:
str= None¶ The header to use for the table, which is used as the text in the list of tables and made bold in the table.
-
make_percent_column_names:
Dict[str,Union[int,str]]¶ Each column in the map is multiplied by 100 and rounded to the number of decimal places given as its value. For example, {'ann_per': 3} will round column ann_per to 3 decimal places.
If the values are strings then they are interpreted as a Python f-string using v as the value. For example, {'ann_per': '{v:.1f}'} gives a percentage to the first decimal without the percentage sign (%).
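A minimal sketch of the two value types follows. `make_percent` is a hypothetical helper, and appending a percent sign in the integer case is an assumption inferred from the remark that the string form omits it.

```python
from typing import Union


def make_percent(value: float, spec: Union[int, str]) -> str:
    """Hypothetical helper: scale a proportion to a percentage string."""
    scaled = value * 100
    if isinstance(spec, str):
        # string spec: a format template that refers to the value as ``v``
        return spec.format(v=scaled)
    # integer spec: number of decimal places (percent sign is an assumption)
    return f'{scaled:.{spec}f}%'


as_int_spec = make_percent(0.25, 3)
as_str_spec = make_percent(0.25, '{v:.1f}')
print(as_int_spec, as_str_spec)
```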
-
params:
Dict[str,str]¶ Parameters used in the template that override the default_params.
-
read_params:
Dict[str,str]¶ Keyword arguments used in the read_csv() call when reading the CSV file.
-
replace_nan:
str= None¶ Replace NaN values with the value of this field, as tabulate() does not use its missing value setting, presumably due to a bug.
-
round_column_names:
Dict[str,Union[Tuple[int,int],int]]¶ Each column in the map will get rounded to their respective values.
For tuple values the number will be rounded as an integer if higher than a cutoff (second element), otherwise it is rounded to the decimal (first element).
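The integer and tuple forms can be sketched as follows. `round_value` is a hypothetical helper that follows the description above, with the tuple read as (decimals, cutoff); returning strings is an assumption made for display purposes.

```python
from typing import Tuple, Union


def round_value(x: float, spec: Union[int, Tuple[int, int]]) -> str:
    """Hypothetical helper: round per an int or a (decimals, cutoff) tuple."""
    if isinstance(spec, tuple):
        decimals, cutoff = spec
        if x > cutoff:
            # above the cutoff: round to an integer
            return str(round(x))
        # at or below the cutoff: round to the given decimal places
        return f'{x:.{decimals}f}'
    return f'{x:.{spec}f}'


large = round_value(1234.56, (2, 100))
small = round_value(3.14159, (2, 100))
plain = round_value(3.14159, 3)
print(large, small, plain)
```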
-
rules:
Dict[int,str]¶ Like hlines but allows other horizontal lines such as toprule. Each key/value pair is a row and the verbatim text to add at that place.
-
tabulate_params:
Dict[str,str]¶ Keyword arguments used in the tabulate() call when writing the table. The default tells tabulate to not parse/format numerical data.
-
variables:
Dict[str,Union[Tuple[int,int],str]]¶ A mapping of variable names to a dataframe cell or a Python code snippet that is evaluated with exec(). In LaTeX, this is done by setting a newcommand (see LatexTable).
If set to a tuple of (<row>, <column>), the value of the pre-formatted dataframe is used (see unformatted below).
If set to a Python evaluation string, the code must set the variable v to the variable value. A variable stages is a Dict used to get one of the dataframes created at various stages of formatting the table, with entries:
- nascent: same as dataframe
- unformatted: after the pre-evaluation but before any formatting
- postformat: after number formatting and post evaluation, but before remaining column and cell modifications
- formatted: same as formatted_dataframe
For example, the following uses the value at row 2 and column 3 of the unformatted dataframe:
v = stages['unformatted'].iloc[2, 3]
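The exec() convention described above, where the snippet assigns its result to `v`, can be sketched as follows. `eval_variable` is a hypothetical helper, and nested lists stand in for the dataframes so that the snippet runs without pandas (hence plain indexing instead of `.iloc`).

```python
from typing import Any, Dict


def eval_variable(code: str, stages: Dict[str, Any]) -> Any:
    """Hypothetical helper: run ``code`` and return what it assigned to ``v``."""
    namespace: Dict[str, Any] = {'stages': stages}
    # the snippet reads ``stages`` and is expected to assign ``v``
    exec(code, namespace)
    return namespace['v']


# nested lists stand in for the dataframes (assumption for illustration)
stages = {'unformatted': [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]}
value = eval_variable("v = stages['unformatted'][2][3]", stages)
print(value)
```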
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a Writable or as a Dictable. If the class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.datdesc.table.TableFactory(config_factory, table_section_regex, default_table_type)[source]¶
Bases: Dictable
Reads the table definitions file and writes a LaTeX .sty file of the generated tables from the CSV data. Tables are created with either usage() or from_file(). See the usage documentation for information about the configuration files used by from_file().
- __init__(config_factory, table_section_regex, default_table_type)¶
-
config_factory:
ConfigFactory¶ The configuration factory used to create Table instances.
-
default_table_type:
str¶ The default name, which resolves to a section name, to use when creating anonymous tables.
Module contents¶
Generate LaTeX tables in a .sty file from CSV files. The paths to the CSV files to create tables from and their metadata are given in a YAML configuration file.
- Example::

    latextablenamehere:
      type: slack
      slack_col: 0
      path: ../config/table-name.csv
      caption: Some Caption
      placement: t!
      size: small
      single_column: true
      percent_column_names: ['Proportion']
- exception zensols.datdesc.DataDescriptionError[source]¶
Bases: APIError
Thrown for any application-level error.
- __annotations__ = {}¶
- __module__ = 'zensols.datdesc'¶
- exception zensols.datdesc.FigureError(reason, figure=None)[source]¶
Bases: DataDescriptionError
Thrown for any application-level error related to creating figures.
- __annotations__ = {}¶
- __module__ = 'zensols.datdesc'¶