zensols.datdesc package

Submodules

zensols.datdesc.app module

Generate LaTeX tables in a .sty file from CSV files. The paths to the CSV files to create tables from, along with their metadata, are given as a YAML configuration file. The parameters are either both files or both directories. When using directories, only files that match *-table.yml are considered.

class zensols.datdesc.app.Application(table_factory, hyperparam_file_regex=re.compile('^.+-hyperparam\\.yml$'), hyperparam_table_default=None, data_file_regex=re.compile('^.+-table\\.yml$'))[source]

Bases: object

Generate LaTeX table files from CSV files and hyperparameter .sty files.

__init__(table_factory, hyperparam_file_regex=re.compile('^.+-hyperparam\\.yml$'), hyperparam_table_default=None, data_file_regex=re.compile('^.+-table\\.yml$'))
data_file_regex: Pattern = re.compile('^.+-table\\.yml$')

Matches file names of table definitions processed in the LaTeX output.

generate_hyperparam(input_path, output_path, output_format=_OutputFormat.short)[source]

Write hyperparameter formatted data.

Parameters:
  • input_path (Path) – definitions YAML path location or directory

  • output_path (Path) – output file or directory

  • output_format (_OutputFormat) – output format of the hyperparameter metadata

generate_tables(input_path, output_path)[source]

Create LaTeX tables.

Parameters:
  • input_path (Path) – definitions YAML path location or directory

  • output_path (Path) – output file or directory
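
The following is a minimal programmatic sketch; the CLI (zensols.datdesc.cli) normally constructs this class from the application configuration, and the directory names here are hypothetical:

from pathlib import Path
from zensols.datdesc.app import Application
from zensols.datdesc.table import TableFactory

# create the application with the default (singleton) table factory
app = Application(table_factory=TableFactory.default_instance())
# both arguments are directories; only files matching *-table.yml are read
app.generate_tables(Path('config'), Path('results'))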

hyperparam_file_regex: Pattern = re.compile('^.+-hyperparam\\.yml$')

Matches file names of hyperparameter definitions processed in the LaTeX output.

hyperparam_table_default: Settings = None

Default settings for hyperparameter Table instances.

show_table(name=None)[source]

Print a list of example LaTeX tables.

Parameters:

name (str) – the name of the example table or a listing of tables if omitted

table_factory: TableFactory

Reads the table definitions file and writes a LaTeX .sty file of the tables generated from the CSV data.

write_excel(input_path, output_file=None, output_latex_format=False)[source]

Create an Excel file from table data.

Parameters:
  • input_path (Path) – definitions YAML path location or directory

  • output_file (Path) – the output file, which defaults to the input prefix with the appropriate extension

  • output_latex_format (bool) – whether to output with LaTeX commands

class zensols.datdesc.app.PrototypeApplication(app)[source]

Bases: object

CLI_META = {'is_usage_visible': False}
__init__(app)
app: Application
proto()[source]

Prototype test.

zensols.datdesc.cli module

Command line entry point to the application.

class zensols.datdesc.cli.ApplicationFactory(*args, **kwargs)[source]

Bases: ApplicationFactory

__init__(*args, **kwargs)[source]
zensols.datdesc.cli.main(args=sys.argv, **kwargs)[source]
Return type:

ActionResult

zensols.datdesc.desc module

Metadata container classes.

class zensols.datdesc.desc.DataDescriber(describers, name='default', output_dir=PosixPath('results'), csv_dir=PosixPath('csv'), yaml_dir=PosixPath('config'), mangle_sheet_name=False)[source]

Bases: PersistableContainer, Dictable

Container class for DataFrameDescriber instances. It also saves their instances as CSV data files and YAML configuration files.
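
A minimal usage sketch (the names and values are hypothetical; the tuple-of-tuples meta form is documented on DataFrameDescriber.meta):

import pandas as pd
from zensols.datdesc.desc import DataDescriber, DataFrameDescriber

# wrap a dataframe with its column metadata
dfd = DataFrameDescriber(
    name='stats',
    df=pd.DataFrame({'min': [1.2], 'max': [3.4]}),
    desc='summary statistics',
    meta=(('min', 'Minimum value'), ('max', 'Maximum value')))
dd = DataDescriber(describers=(dfd,), name='example')
dd.add_summary()   # prepend a describer that documents the contained data
paths = dd.save()  # write the CSV data and YAML configuration files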

SHEET_NAME_MAXLEN: ClassVar[int] = 31

Maximum allowed characters in an Excel spreadsheet’s name.

__init__(describers, name='default', output_dir=PosixPath('results'), csv_dir=PosixPath('csv'), yaml_dir=PosixPath('config'), mangle_sheet_name=False)
add_summary()[source]

Add a new metadata-like DataFrameDescriber as the first entry in describers that describes the data this instance currently has.

Return type:

DataFrameDescriber

Returns:

the added metadata DataFrameDescriber instance

csv_dir: Path = PosixPath('csv')

The directory where to write the CSV files.

describers: Tuple[DataFrameDescriber, ...]

The contained dataframe and metadata.

property describers_by_name: Dict[str, DataFrameDescriber]

Data frame describers keyed by the describer name.

format_tables()[source]

See DataFrameDescriber.format_table().

classmethod from_yaml_file(path)[source]

Create a data describer from YAML/CSV files previously written with save().

See:

save()

See:

DataFrameDescriber.from_table()

Return type:

DataDescriber

items()[source]
Return type:

Iterable[Tuple[str, DataFrameDescriber]]

keys()[source]
Return type:

Sequence[str]

mangle_sheet_name: bool = False

Whether to normalize the Excel sheet names when xlsxwriter.exceptions.InvalidWorksheetName is raised.

name: str = 'default'

The name of the dataset.

output_dir: Path = PosixPath('results')

The directory where to write the results.

save(output_dir=None, yaml_dir=None, include_excel=False)[source]

Save both the CSV and YAML configuration file.

Parameters:

include_excel (bool) – whether to also write the Excel file to its default output file name

See:

save_csv()

Return type:

List[Path]

See:

save_yaml()

save_csv(output_dir=None)[source]

Save all provided dataframe describers to CSV files.

Parameters:

output_dir (Path) – the directory in which to save the data

Return type:

List[Path]

save_excel(output_file=None)[source]

Save all provided dataframe describers to an Excel file.

Parameters:

output_file (Path) – the Excel file to write, which needs an .xlsx extension; this defaults to a path created from output_dir and name

Return type:

Path

save_yaml(output_dir=None, yaml_dir=None)[source]

Save all provided dataframe describers as YAML files used by the datdesc command.

Parameters:

output_dir (Path) – the directory in which to save the data

Return type:

List[Path]

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, df_params=None)[source]
Parameters:

df_params (Dict[str, Any]) – the formatting pandas options, which defaults to max_colwidth=80

yaml_dir: Path = PosixPath('config')

The directory where to write the YAML configuration files.

class zensols.datdesc.desc.DataFrameDescriber(name, df, desc, head=None, meta_path=None, meta=None, table_kwargs=<factory>, index_meta=None)[source]

Bases: PersistableContainer, Dictable

A class that contains a Pandas dataframe, a description of the data, and descriptions of all the columns in that dataframe.
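
A minimal sketch (hypothetical names and values; create_table() assumes the package's default table configuration is loadable):

import pandas as pd
from zensols.datdesc.desc import DataFrameDescriber

dfd = DataFrameDescriber(
    name='scores',
    df=pd.DataFrame({'f1': [0.81], 'acc': [0.90]}),
    desc='test set scores',
    meta=(('f1', 'Macro F1'), ('acc', 'Accuracy')))
dfd.write_pretty(include_metadata=True)  # tabulated console rendering
table = dfd.create_table(caption='Test set scores')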

property T: DataFrameDescriber

See transpose().

__init__(name, df, desc, head=None, meta_path=None, meta=None, table_kwargs=<factory>, index_meta=None)
create_table(**kwargs)[source]

Create a table from the metadata using table_kwargs and the given keyword arguments.

Parameters:

kwargs – keyword arguments that override the default parameterized data passed to Table

Return type:

Table

property csv_path: Path

The CSV file that contains the data this instance describes.

derive(*, name=None, df=None, desc=None, meta=None, index_meta=None)[source]

Create a new instance based on this instance and replace any non-None kwargs.

If meta is provided, it is merged with the metadata of this instance. However, any metadata provided must match in both column names and descriptions.

Raises:

DataDescriptionError – if multiple metadata columns with differing descriptions are found

Return type:

DataFrameDescriber

derive_with_index_meta(index_format=None)[source]

Like derive(), but the dataframe is generated with df_with_index_meta() using index_format as a parameter.

Parameters:

index_format (str) – see df_with_index_meta()

Return type:

DataFrameDescriber

desc: str

The description of the data frame.

df: DataFrame

The dataframe to describe.

df_with_index_meta(index_format=None)[source]

Create a dataframe with the first column containing index metadata. This uses index_meta to create the column values.

Parameters:

index_format (str) – the new index column format using index and value, which defaults to {index}

Return type:

DataFrame

Returns:

the dataframe with a new first column of the index metadata, or df if index_meta is None

format_table()[source]

Replace (in place) dataframe df with the formatted table obtained with Table.formatted_dataframe. The Table is created with create_table().

classmethod from_columns(source, name=None, desc=None)[source]

Create a new instance by transposing column data into a new dataframe describer. If source is a dataframe, it must have the following columns:

  • column: the column names of the resulting describer

  • meta: the description that makes up the meta

  • data: sequences of the data

Otherwise, each element of the sequence is a row of column names, meta descriptions, and data sequences.

Return type:

DataFrameDescriber
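
A minimal sketch of the dataframe form described above (hypothetical values; each data entry is a sequence that becomes a column of the new dataframe):

import pandas as pd
from zensols.datdesc.desc import DataFrameDescriber

source = pd.DataFrame({
    'column': ['f1', 'acc'],                # resulting column names
    'meta': ['Macro F1', 'Accuracy'],       # resulting column descriptions
    'data': [[0.81, 0.79], [0.90, 0.88]]})  # resulting column data
dfd = DataFrameDescriber.from_columns(
    source, name='scores', desc='scores by run')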

classmethod from_table(tab)[source]

Create a frame descriptor from a Table.

Return type:

DataFrameDescriber

get_table_name(form)[source]

The table name derived from name.

Parameters:

form (str) – specifies the format: file means file-friendly, camel is for reverse camel notation

Return type:

str

head: str = None

A short summary of the table, which is used in Table.head.

index_meta: Dict[Any, str] = None

The index metadata, which maps index values to descriptions of the respective row.

property meta: DataFrame

The column metadata for dataframe, which needs columns name and description. If this is not provided, it is read from file meta_path. If this is set to a tuple of tuples, a dataframe is generated from the form:

((<column name 1>, <column description 1>),
 (<column name 2>, <column description 2>), ...)

If both this and meta_path are not provided, the following is used:

(('description', 'Description'),
 ('value', 'Value'))

meta_path: Optional[Path] = None

A path from which to read the meta metadata.

See:

meta

name: str

The name of the data this describer holds.

save_csv(output_dir=PosixPath('.'))[source]

Save as a CSV file using csv_path.

Return type:

Path

save_excel(output_dir=PosixPath('.'))[source]

Save as an Excel file using a file name derived from csv_path. The same file naming semantics are used as with DataDescriber.save_excel().

See:

DataDescriber.save_excel()

Return type:

Path

table_kwargs: Dict[str, Any]

Additional keyword arguments given when creating a table in create_table().

transpose(row_names=((0, 'value', 'Value'),), name_column='name', name_description='Name', index_column='description')[source]

Transpose all data in this descriptor by transposing df and swapping meta with index_meta as a new instance.

Parameters:
  • row_names (Tuple[int, str, str]) – a tuple of (row index in df, the column name in the new df, the metadata description of that column in the new df); the default takes only the first row

  • name_column (str) – the column name of this instance’s df

  • description_column – the column description of this instance’s df

  • index_column (str) – the name of the new index in the returned instance

Return type:

DataFrameDescriber

Returns:

a new derived instance of the transposed data

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, df_params=None)[source]
Parameters:

df_params (Dict[str, Any]) – the formatting pandas options, which defaults to max_colwidth=80

write_pretty(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_metadata=False, title_format='{name} ({desc})', **tabulate_params)[source]

Like write(), but generate a visually appealing table and optionally column metadata.

zensols.datdesc.dfstash module

A stash implementation that uses a Pandas dataframe stored as a CSV file.

class zensols.datdesc.dfstash.DataFrameStash(path, dataframe=None, key_column='key', columns=('value',), mkdirs=True, auto_commit=True, single_column_index=0)[source]

Bases: CloseableStash

A backing stash that persists to a CSV file via a Pandas dataframe. All modifications go through the pandas.DataFrame and are then saved with commit() or close().
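
A minimal usage sketch (the file name is hypothetical):

from pathlib import Path
from zensols.datdesc.dfstash import DataFrameStash

stash = DataFrameStash(path=Path('cache.csv'))
stash.dump('k1', 'v1')    # persisted immediately since auto_commit=True
assert stash.exists('k1')
value = stash.load('k1')  # a single value since single_column_index=0
stash.close()             # commit any remaining changes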

__init__(path, dataframe=None, key_column='key', columns=('value',), mkdirs=True, auto_commit=True, single_column_index=0)
auto_commit: bool = True

Whether to save to the file system after any modification.

clear()[source]

Delete all data from the stash.

Important: Exercise caution with this method.

close()[source]

Close all resources created by the stash.

columns: Tuple[str, ...] = ('value',)

The columns to create in the spreadsheet. These must be consistent when the data is restored.

commit()[source]

Commit changes to the file system.

property dataframe: DataFrame

The dataframe to proxy in memory. This is settable on instantiation but read-only afterward. If this is not set, an empty dataframe is created with the metadata in this class.

delete(name=None)[source]

Delete the resource for data pointed to by name or the entire resource if name is not given.

dump(name, inst)[source]

Persist data value inst with key name.

exists(name)[source]

Return True if data with key name exists.

Implementation note: This Stash.exists() method is very inefficient and should be overridden.

Return type:

bool

get(name, default=None)[source]

Load an object or a default if key name doesn’t exist. Semantically, this method tries not to re-create the data if it already exists. This means that if a stash has built-in caching mechanisms, this method uses it.

See:

load()

Return type:

Union[Any, Tuple[Any, ...]]

key_column: str = 'key'

The spreadsheet column name used to store stash keys.

keys()[source]

Return an iterable of keys in the collection.

Return type:

Iterable[str]

load(name)[source]

Load a data value from the pickled data with key name. Semantically, this method loads the data using the stash’s implementation. For example, DirectoryStash loads the data from a file if it exists, but factory type stashes will always re-generate the data.

See:

get()

Return type:

Union[Any, Tuple[Any, ...]]

mkdirs: bool = True

Whether to recursively create the directory where path is stored if it does not already exist.

path: Path

The path of the file from which to read and write.

single_column_index: Optional[int] = 0

If this is set, then a single type is assumed for loads and restores. Otherwise, if set to None, multiple columns are saved and retrieved.

values()[source]

Return the values in the stash.

Return type:

Iterable[Union[Any, Tuple[Any, ...]]]

zensols.datdesc.hyperparam module

Hyperparameter metadata: access and documentation. This package was designed for the following purposes:

  • Provide basic scaffolding to update model hyperparameters from optimization packages such as hyperopt.

  • Generate LaTeX tables of the hyperparameters and their descriptions for academic papers.

The object instance graph hierarchy is: a HyperparamSet contains HyperparamModel instances, which in turn contain Hyperparam instances.

Access to the hyperparameters is done by calling the set or model levels with a dotted path notation string. For example, svm.C first navigates to model svm, then to the hyperparameter named C.
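
A minimal sketch of dotted path access (the dict schema below is an assumption about the YAML layout, not a confirmed format):

from zensols.datdesc.hyperparam import HyperparamSetLoader

# hypothetical definition: a model 'svm' with one float hyperparameter 'C'
data = {'svm': {'doc': 'support vector machine',
                'params': {'C': {'type': 'float',
                                 'doc': 'regularization strength',
                                 'value': 1.0}}}}
hset = HyperparamSetLoader(data).load()
flat = hset.flatten()        # dotted path keys, e.g. {'svm.C': 1.0}
hset.update({'svm.C': 0.5})  # update using dotted path notation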

class zensols.datdesc.hyperparam.Hyperparam(name, type, doc, choices=None, value=None, interval=None)[source]

Bases: Dictable

A hyperparameter’s metadata, documentation and value. The value is accessed (retrieval and setting) at runtime. Do not use this class explicitly. Instead use HyperparamModel.

The index access only applies when type is list or dict. Otherwise, the value member has the value of the hyperparameter.

CLASS_MAP: ClassVar[Dict[str, Type]] = {'bool': <class 'bool'>, 'choice': <class 'str'>, 'dict': <class 'dict'>, 'float': <class 'float'>, 'int': <class 'int'>, 'list': <class 'list'>, 'str': <class 'str'>}

A mapping for values set in type to their Python class equivalents.

VALID_TYPES: ClassVar[str] = frozenset({'bool', 'choice', 'dict', 'float', 'int', 'list', 'str'})

Valid settings for type.

__init__(name, type, doc, choices=None, value=None, interval=None)
choices: Tuple[str, ...] = None

When type is choice, the list of valid strings for value.

property cls: Type

The Python equivalent class of type.

doc: str

The human readable documentation for the hyperparameter. This is used in documentation generation tasks.

get_type_str(short=True)[source]
Return type:

str

interval: Union[Tuple[float, float], Tuple[int, int]] = None

Valid intervals for value as an inclusive interval.

property interval_str: str
name: str

The name of the hyperparameter (e.g. C or learning_rate).

type: str

The type of value (e.g. float or int).

property value: str | float | int | bool | list | dict | None

The value of the hyperparameter used in the application.

write_sphinx(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]
class zensols.datdesc.hyperparam.HyperparamContainer[source]

Bases: Dictable

A container class for Hyperparam instances.

__init__()
abstract flatten(deep=False)[source]

Return a flattened dictionary with the dotted path notation (see module docs).

Parameters:

deep (bool) – if True, recurse into dict and list hyperparameter values

Return type:

Dict[str, Any]

update(params)[source]

Update parameter values.

Parameters:

params (Union[Dict[str, Any], HyperparamContainer]) – a dict of dotted path notation keys

abstract write_sphinx(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write Sphinx autodoc used in a class as a dataclasses.dataclass field.

exception zensols.datdesc.hyperparam.HyperparamError[source]

Bases: DataDescriptionError

Raised for any error related to hyperparameter access.

__module__ = 'zensols.datdesc.hyperparam'
class zensols.datdesc.hyperparam.HyperparamModel(name, doc, desc=None, params=<factory>, table=None)[source]

Bases: HyperparamContainer

The model level class that contains the parameters. This class represents a machine learning model, such as an SVM, with hyperparameters such as C and maximum iterations.

__init__(name, doc, desc=None, params=<factory>, table=None)
clone()[source]

Make a copy of this instance.

Return type:

HyperparamModel

create_dataframe_describer()[source]

Return an object with metadata fully describing the hyperparameters of this model.

Return type:

DataFrameDescriber

desc: str = None

The description of the model used in the documentation when name is not sufficient. Since name has naming constraints, this can be used in its place during documentation generation.

doc: str

The human readable documentation for the model.

flatten(deep=False)[source]

Return a flattened dictionary with the dotted path notation (see module docs).

Parameters:

deep (bool) – if True, recurse into dict and list hyperparameter values

Return type:

Dict[str, Any]

get(name)[source]
Return type:

HyperparamModel

property metadata_dataframe: DataFrame

A dataframe describing the values_dataframe.

name: str

The name of the model (e.g. svm). This name can have only alphanumeric and underscore characters.

params: Dict[str, Hyperparam]

The hyperparameters keyed by their names.

table: Optional[Dict[str, Any]] = None

Overriding data used when creating a Table from DataFrameDescriber.create_table().

property values_dataframe: DataFrame

A dataframe with parameter data. This includes the name, type, value and documentation.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_doc=False)[source]

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

write_sphinx(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write Sphinx autodoc used in a class as a dataclasses.dataclass field.

class zensols.datdesc.hyperparam.HyperparamSet(models=<factory>, name=None)[source]

Bases: HyperparamContainer

The top level in the object graph hierarchy (see module docs). This contains a set of models and is typically where packages such as hyperopt update the hyperparameters of the model(s).

__init__(models=<factory>, name=None)
create_describer(meta_path=None)[source]

Return an object with metadata fully describing the hyperparameters of this model.

Parameters:

meta_path (Path) – if provided, set the path on the returned instance

Return type:

DataDescriber

flatten(deep=False)[source]

Return a flattened dictionary with the dotted path notation (see module docs).

Parameters:

deep (bool) – if True, recurse into dict and list hyperparameter values

Return type:

Dict[str, Any]

get(name)[source]
Return type:

HyperparamModel

models: Dict[str, HyperparamModel]

The models containing hyperparameters for this set.

name: Optional[str] = None

The name of the hyperparameter set.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_doc=False)[source]

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

write_sphinx(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write Sphinx autodoc used in a class as a dataclasses.dataclass field.

class zensols.datdesc.hyperparam.HyperparamSetLoader(data, config=None, updates=())[source]

Bases: object

Loads a set of hyperparameters from a YAML pathlib.Path, dict or stream io.TextIOBase.

__init__(data, config=None, updates=())
config: Configurable = None

The application configuration used to update the hyperparameters from other sections.

data: Union[Dict[str, Any], Path, TextIOBase]

The source of data to load, which is a YAML pathlib.Path, dict or stream io.TextIOBase.

See:

updates

load(**kwargs)[source]

Load and return the hyperparameter object graph from data.

Return type:

HyperparamSet

updates: Sequence[Dict[str, Any]] = ()

A sequence of dictionaries with keys as HyperparamModel names and values as sections with values to set after loading using data.

exception zensols.datdesc.hyperparam.HyperparamValueError[source]

Bases: HyperparamError

Raised for bad values set on a hyperparameter.

__annotations__ = {}
__module__ = 'zensols.datdesc.hyperparam'

zensols.datdesc.latex module

Contains the manager classes that generate the tables.

class zensols.datdesc.latex.CsvToLatexTable(tables, package_name)[source]

Bases: Writable

Generate a LaTeX table from a CSV file.

__init__(tables, package_name)
package_name: str

The name of the LaTeX .sty package.

tables: Sequence[Table]

A list of table instances from which to create LaTeX table definitions.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write the LaTeX table to the given writer.

class zensols.datdesc.latex.LatexTable(path, name, template, caption='', head=None, type=None, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, column_evals=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None)[source]

Bases: Table

This subclass generates LaTeX tables.

__init__(path, name, template, caption='', head=None, type=None, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, column_evals=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None)
format_scientific(x, sig_digits=1)[source]

Format x in scientific notation.

Parameters:
  • x (float) – the number to format

  • sig_digits (int) – the number of digits after the decimal point

Return type:

str

class zensols.datdesc.latex.SlackTable(path, name, template, caption='', head=None, type=None, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, column_evals=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None, slack_column=0)[source]

Bases: LatexTable

An instance of the table that fills up space based on the widest column.

__init__(path, name, template, caption='', head=None, type=None, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, column_evals=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None, slack_column=0)
property columns: str

Return the columns field in the LaTeX environment header.

slack_column: int = 0

Which column elastically grows or shrinks to make the table fit.

zensols.datdesc.opt module

Contains container and utility classes for hyperparameter optimization. These classes find optimal hyperparameters for a model and save the results as JSON files. This module is meant to be used by command line applications configured as Zensols Resource libraries.

See:

Resource libraries
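
An outline of typical use (hypothetical names; this class is normally subclassed to define the search space and objective, see hyperparam_names and _create_space()):

from pathlib import Path
from zensols.datdesc.opt import HyperparameterOptimizer

opt = HyperparameterOptimizer(
    name='svm-tuning',              # hypothetical experiment name
    hyperparam_names=('svm.C',),    # names used to create the space
    max_evals=50,
    intermediate_dir=Path('opthyper'))
opt.optimize()             # run the optimization algorithm
opt.write_best_result()    # print the results from the best run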

class zensols.datdesc.opt.CompareResult(initial_param, initial_loss, initial_scores, best_eval_ix, best_param, best_loss, best_scores)[source]

Bases: Dictable

Contains the loss and scores of an initial run and a run found on the optimal hyperparameters.

__init__(initial_param, initial_loss, initial_scores, best_eval_ix, best_param, best_loss, best_scores)
best_eval_ix: int

The evaluation index of the best run.

best_loss: float

The optimized loss.

best_param: Dict[str, Any]

The optimized hyperparameters.

best_scores: DataFrame

The optimized scores.

initial_loss: float

The initial loss.

initial_param: Dict[str, Any]

The initial hyperparameters.

initial_scores: DataFrame

The initial scores.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

class zensols.datdesc.opt.HyperparamResult(name, hyp, scores, loss, eval_ix)[source]

Bases: Dictable

Results of an optimization and optionally the best fit.

__init__(name, hyp, scores, loss, eval_ix)
eval_ix: int

The index of the optimization.

classmethod from_file(path)[source]

Restore a result from a file name.

Parameters:

path (Path) – the path from which to restore

Return type:

HyperparamResult

hyp: HyperparamModel

The updated hyperparameters.

loss: float

The last loss.

name: str

The name of the HyperparameterOptimizer, which is the directory name.

scores: DataFrame

The last score results computed during the optimization.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

class zensols.datdesc.opt.HyperparamRun(runs)[source]

Bases: Dictable

A container for the entire optimization run. The best run contains the best fit (HyperparamResult) as predicted by the hyperparameter optimization algorithm.
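
A minimal sketch (the directory layout under the intermediate directory is an assumption):

from pathlib import Path
from zensols.datdesc.opt import HyperparamRun

run = HyperparamRun.from_dir(Path('opthyper/default'))  # hypothetical path
best = run.best_result            # the result with the lowest loss
print(best.loss, run.loss_stats)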

__init__(runs)
property best_result: HyperparamResult

The result that had the lowest loss.

property final: HyperparamResult

The results of the final run, which has the best fit (see class docs).

property final_path: Path

The path of the final run.

classmethod from_dir(path)[source]

Return an instance with the runs stored in directory path.

Return type:

HyperparamRun

property initial_loss: float

The loss from the first run.

property loss_stats: Dict[str, float]

The loss statistics (min, max, ave, etc).

property losses: Tuple[float]

The loss values for all runs.

runs: Tuple[Tuple[Path, HyperparamResult]]

The results from previous runs.

class zensols.datdesc.opt.HyperparameterOptimizer(name='default', hyperparam_names=(), max_evals=1, show_progressbar=True, intermediate_dir=PosixPath('opthyper'), baseline_path=None)[source]

Bases: object

Creates the files used to score optimizer output.

__init__(name='default', hyperparam_names=(), max_evals=1, show_progressbar=True, intermediate_dir=PosixPath('opthyper'), baseline_path=None)
property aggregate_score_dir: Path

The output directory containing runs with the best parameters of the top N results (see aggregate_scores()).

aggregate_scores()[source]

Aggregate the best score results as a separate CSV file for each data point with get_score_dataframe(). This is saved as a separate file for each optimization run since this method can take a long time, as it re-scores the dataset. These results are then “stitched” together with gather_aggregate_scores().

baseline_path: Path = None

A JSON file with hyperparameter settings to set on start. This file contains the output portion of the final.json results (which are the results parsed and set in HyperparamResult).

property config_factory: ConfigFactory

The app config factory.

gather_aggregate_scores()[source]

Return a dataframe of all the aggregate scores written by aggregate_scores().

Return type:

DataFrame

get_best_result()[source]
Return type:

HyperparamResult

get_best_results()[source]

Return the best results across all hyperparameter optimization runs with keys as run names.

Return type:

Dict[str, HyperparamResult]

get_comparison()[source]

Compare the scores of the default parameters with those predicted by the optimizer of the best run.

Return type:

CompareResult

get_run(result_dir=None)[source]

Get the best run from the file system.

Parameters:

result_dir (Path) – the result directory, which defaults to opt_intermediate_dir

Return type:

HyperparamRun

get_score_dataframe(iterations=None)[source]

Create a dataframe from the results scored from the best hyperparameters.

Parameters:

iterations (int) – the number of times the objective is called to produce the results (the objective space is not altered)

Return type:

DataFrame

hyperparam_names: Tuple[str, ...] = ()

The names of the hyperparameters used to create the space.

See:

_create_space()

property hyperparams: HyperparamModel

The model hyperparameters to be updated by the optimizer.

intermediate_dir: Path = PosixPath('opthyper')

The directory where the intermediate results are saved while the algorithm works.

max_evals: int = 1

The maximum number of evaluations of the hyperparameter optimization algorithm to execute.

name: str = 'default'

The name of the optimization experiment set. This has a bearing on where files are stored (see opt_intermediate_dir).

property opt_intermediate_dir: Path

The optimization result directory for the config/parser.

optimize()[source]

Run the optimization algorithm.

remove_result()[source]

Remove an entire run’s previous optimization results.

property results_intermediate_dir: Path

The directory that has all intermediate results by subdirectory name.

show_progressbar: bool = True

Whether or not to show the progress bar while running the optimization.

write_best_result(writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_param_json=False)[source]

Print the results from the best run.

Parameters:

include_param_json (bool) – whether to output the JSON formatted hyperparameters

write_compare(writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write the results of a comparison of the initial hyperparameters against the optimized.

write_score(writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Restore the hyperparameter state, score the data and print the results. Use the baseline parameters if available, otherwise use the parameters from the best run.

Return type:

HyperparamResult

write_scores(output_file=None, iterations=None)[source]

Write a file of the results scored from the best hyperparameters.

Parameters:
  • output_file (Path) – where to write the CSV file; defaults to a file in opt_intermediate_dir

  • iterations (int) – the number of times the objective is called to produce the results (the objective space is not altered)

zensols.datdesc.optscore module

An optimizer that uses a Scorer as an objective.

class zensols.datdesc.optscore.ScoringHyperparameterOptimizer(name='default', hyperparam_names=(), max_evals=1, show_progressbar=True, intermediate_dir=PosixPath('opthyper'), baseline_path=None)[source]

Bases: HyperparameterOptimizer

An optimizer that uses a Scorer as the objective and a means to determine the loss. The default loss function is defined as 1 - F1 using the f1_score column of as_dataframe().

__init__(name='default', hyperparam_names=(), max_evals=1, show_progressbar=True, intermediate_dir=PosixPath('opthyper'), baseline_path=None)
property scorer: Scorer

zensols.datdesc.table module

This module contains classes that generate tables.

class zensols.datdesc.table.Table(path, name, template, caption='', head=None, type=None, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, column_evals=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None)[source]

Bases: PersistableContainer, Dictable

Generates a Zensols-styled LaTeX table from a CSV file.

__init__(path, name, template, caption='', head=None, type=None, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, column_evals=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None)
asflatdict(*args, **kwargs)[source]

Like asdict(), but flattens the result into a data structure suitable for writing to JSON or YAML.

Return type:

Dict[str, Any]

blank_columns: List[int]

A list of column indexes to set to the empty string (e.g. the 0th to fix Unnamed: 0 issues).

bold_cells: List[Tuple[int, int]]

A list of row/column cells to bold.

bold_max_columns: List[str]

A list of column names that will have their max value bolded.

capitalize_columns: Dict[str, bool]

Capitalize either sentences (False values) or every word (True values). The keys are column names.

caption: str = ''

The human readable string used as the caption of the table.

code_post: str = None

Like code_pre but modifies the table after this class’s modifications of the table.

code_pre: str = None

Python code that manipulates the table’s dataframe before modifications made by this class. The code has a local df variable, and the returned value is used as the replacement. This is usually a one-liner used to subset the data, etc. The code is evaluated with eval().

column_aligns: str = None

The alignment/justification (i.e. |l|l| for two columns). If not provided, they are automatically generated based on the columns of the table.

column_evals: Dict[str, str]

Keys are column names with values as functions (i.e. lambda expressions) evaluated with a single column value parameter. The return value replaces the column identified by the key.

column_keeps: Optional[List[str]] = None

If provided, only keep the columns in the list.

column_removes: List[str]

The name of the columns to remove from the table, if any.

column_renames: Dict[str, str]

Columns to rename, if any.

column_value_replaces: Dict[str, Dict[Any, Any]]

Data values to replace in the dataframe. It is keyed by the column name and the values are the replacements. Each value is a dict with original value keys and the replacements as values.

property columns: str

Return the columns field in the LaTeX environment header.

property dataframe: DataFrame

The Pandas dataframe that holds the CSV data.

default_params: Sequence[Sequence[str]]

Default parameters to be substituted in the template that are interpolated by the LaTeX numeric values such as #1, #2, etc. This is a sequence (list or tuple) of (<name>, [<default>]) entries, where <name> is substituted in the template and <default> is used if the parameter is not given in params.

definition_file: Path = None

The YAML file from which this instance was created.

double_hlines: Sequence[int]

Indexes of rows to put double horizontal line breaks.

abstract format_scientific(x, sig_digits=1)[source]

Format x in scientific notation.

Parameters:
  • x (float) – the number to format

  • sig_digits (int) – the number of digits after the decimal point

Return type:

str

format_scientific_column_names: Dict[str, Optional[int]]

Format a column using LaTeX formatted scientific notation using format_scientific(). Keys are column names and values are the mantissa length, or 1 if None.

static format_thousand(x, apply_k=True, add_comma=True)[source]

Format a number as a string with comma separating thousands.

Parameters:
  • x (int) – the number to format

  • apply_k (bool) – add a K to the end of large numbers

  • add_comma (bool) – whether to add a comma

Return type:

str

format_thousands_column_names: Dict[str, Optional[Dict[str, Any]]]

Columns to format using thousands. The keys are the column names of the table and the values are either None or the keyword arguments to format_thousand().

property formatted_dataframe: DataFrame

The dataframe with the formatting applied to it, used to create the LaTeX table. Modifications, such as string replacements for adding percents, are done here.

head: str = None

The header to use for the table, which is used as the text in the list of tables and made bold in the table.

hlines: Sequence[int]

Indexes of rows to put horizontal line breaks.

index_col_name: str = None

If set, add an index column with the given name.

make_percent_column_names: Dict[str, int]

Each column in the map is converted to a percentage (multiplied by 100) and rounded to the given number of decimal places. For example, {'ann_per': 3} rounds column ann_per to 3 decimal places.

name: str

The name of the table, also used as the label.

property package_name: str

Return the package name for the table in table_path.

params: Dict[str, str]

Parameters used in the template that override of the default_params.

path: Union[Path, str]

The path to the CSV file from which to make a LaTeX table.

percent_column_names: Sequence[str] = ()

Column names that have a percent sign to be escaped.

read_params: Dict[str, str]

Keyword arguments used in the read_csv() call when reading the CSV file.

replace_nan: str = None

Replace NaN values with the value of this field, as tabulate() does not use the missing value (presumably due to a bug).

round_column_names: Dict[str, int]

Each column in the map will get rounded to their respective values.

tabulate_params: Dict[str, str]

Keyword arguments used in the tabulate() call when writing the table. The default tells tabulate to not parse/format numerical data.

template: str

The table template, which lives in the application configuration obj.yml.

type: str = None
uses: List[str]

Comma separated list of packages to use.

variables: Dict[str, Union[Tuple[int, int], str]]

A mapping of variable names to a dataframe cell or a Python code snippet that is evaluated with exec(). In LaTeX, this is done by setting a newcommand (see LatexTable).

If set to a tuple of (<row>, <column>) the value of the pre-formatted dataframe is used (see unformatted below).

If a Python evaluation string, the code must set the variable v to the variable’s value. A variable stages is a Dict used to get one of the dataframes created at various stages of formatting the table, with entries:

  • nascent: same as dataframe

  • unformatted: after the pre-evaluation but before any formatting

  • postformat: after number formatting and post evaluation, but before remaining column and cell modifications

  • formatted: same as formatted_dataframe

For example, the following uses the value at row 2 and column 3 of the unformatted dataframe:

v = stages['unformatted'].iloc[2, 3]

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

writes: List[str]

A list of what to output for this table. Entries are table and variables.

class zensols.datdesc.table.TableFactory(config_factory, table_section_regex, default_table_type)[source]

Bases: Dictable

Reads the table definitions file and writes a LaTeX .sty file of the tables generated from the CSV data.
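
A minimal sketch that parses table definitions and renders them as a LaTeX .sty package ('results-table.yml' is a hypothetical definitions file; see the app module docs):

import sys
from pathlib import Path
from zensols.datdesc.table import TableFactory
from zensols.datdesc.latex import CsvToLatexTable

factory = TableFactory.default_instance()
tables = list(factory.from_file(Path('results-table.yml')))
CsvToLatexTable(tables=tables, package_name='results').write(writer=sys.stdout)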

__init__(config_factory, table_section_regex, default_table_type)
config_factory: ConfigFactory

The configuration factory used to create Table instances.

create(type=None, **params)[source]

Create a table from the application configuration.

Parameters:
  • type (str) – the name used to find the table by section

  • params (Dict[str, Any]) – the keyword arguments used to create the table

Return type:

Table

Returns:

a new instance of the table defined by the template

See:

get_table_names()

classmethod default_instance()[source]

Get the singleton instance.

Return type:

TableFactory

default_table_type: str

The default name, which resolves to a section name, to use when creating anonymous tables.

from_file(table_path)[source]

Return tables parsed from a YAML file.

Parameters:

table_path (Path) – the file containing the table configurations

Return type:

Iterable[Table]

get_table_names()[source]

Return the names of tables used in create().

Return type:

Iterable[str]

classmethod reset_default_instance()[source]

Force default_instance() to re-instantiate a new instance on a subsequent call.

table_section_regex: Pattern

A regular expression that matches table entries.

to_file(table, table_path)[source]
Return type:

Dict[str, Any]

Module contents

Generate LaTeX tables in a .sty file from CSV files. The paths to the CSV files to create tables from, along with their metadata, are given as a YAML configuration file.

Example:

    latextablenamehere:
        type: slack
        slack_col: 0
        path: ../config/table-name.csv
        caption: Some Caption
        placement: t!
        size: small
        single_column: true
        percent_column_names: ['Proportion']

exception zensols.datdesc.DataDescriptionError[source]

Bases: APIError

Thrown for any application level error.

__annotations__ = {}
__module__ = 'zensols.datdesc'
exception zensols.datdesc.LatexTableError(reason, table=None)[source]

Bases: DataDescriptionError

Thrown for any application level error related to creating tables.

__annotations__ = {}
__init__(reason, table=None)[source]
__module__ = 'zensols.datdesc'