zensols.datdesc package

Submodules

zensols.datdesc.app module

Generate LaTeX tables in a .sty file from CSV files. The paths to the CSV files to create tables from, along with their metadata, are given as a YAML configuration file. The parameters are either both files or both directories. When using directories, only files that match *-table.yml are considered.

class zensols.datdesc.app.Application(table_factory, hyperparam_file_regex=re.compile('^.+-hyperparam\\.yml$'), hyperparam_table_default=None, data_file_regex=re.compile('^.+-table\\.yml$'))[source]

Bases: object

Generate LaTeX table files from CSV files and hyperparameter .sty files.

__init__(table_factory, hyperparam_file_regex=re.compile('^.+-hyperparam\\.yml$'), hyperparam_table_default=None, data_file_regex=re.compile('^.+-table\\.yml$'))
data_file_regex: Pattern = re.compile('^.+-table\\.yml$')

Matches file names of table definitions processed in the LaTeX output.

generate_hyperparam(input_path, output_path, output_format=_OutputFormat.short)[source]

Write hyperparameter formatted data.

Parameters:
  • input_path (Path) – definitions YAML path location or directory

  • output_path (Path) – output file or directory

  • output_format (_OutputFormat) – output format of the hyperparameter metadata

generate_tables(input_path, output_path)[source]

Create LaTeX tables.

Parameters:
  • input_path (Path) – definitions YAML path location or directory

  • output_path (Path) – output file or directory
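
The following is a minimal programmatic sketch; the CLI (zensols.datdesc.cli) normally constructs this class from the application configuration, and the directory names here are hypothetical:

from pathlib import Path
from zensols.datdesc.app import Application
from zensols.datdesc.table import TableFactory

# create the application with the default (singleton) table factory
app = Application(table_factory=TableFactory.default_instance())
# both arguments are directories; only files matching *-table.yml are read
app.generate_tables(Path('config'), Path('results'))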

hyperparam_file_regex: Pattern = re.compile('^.+-hyperparam\\.yml$')

Matches file names of hyperparameter definitions processed in the LaTeX output.

hyperparam_table_default: Settings = None

Default settings for hyperparameter Table instances.

show_table(name=None)[source]

Print a list of example LaTeX tables.

Parameters:

name (str) – the name of the example table or a listing of tables if omitted

table_factory: TableFactory

Reads the table definitions file and writes a LaTeX .sty file of the tables generated from the CSV data.

write_excel(input_path, output_file=None, output_latex_format=False)[source]

Create an Excel file from table data.

Parameters:
  • input_path (Path) – definitions YAML path location or directory

  • output_file (Path) – the output file, which defaults to the input prefix with the appropriate extension

  • output_latex_format (bool) – whether to output with LaTeX commands

class zensols.datdesc.app.PrototypeApplication(app)[source]

Bases: object

CLI_META = {'is_usage_visible': False}
__init__(app)
app: Application
proto()[source]

Prototype test.

zensols.datdesc.cli module

Command line entry point to the application.

class zensols.datdesc.cli.ApplicationFactory(*args, **kwargs)[source]

Bases: ApplicationFactory

__init__(*args, **kwargs)[source]
zensols.datdesc.cli.main(args=sys.argv, **kwargs)[source]
Return type:

ActionResult

zensols.datdesc.desc module

Metadata container classes.

class zensols.datdesc.desc.DataDescriber(describers, name='default', output_dir=PosixPath('results'), csv_dir=PosixPath('csv'), yaml_dir=PosixPath('config'), mangle_sheet_name=False)[source]

Bases: PersistableContainer, Dictable

Container class for DataFrameDescriber instances. It also saves their instances as CSV data files and YAML configuration files.
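
A minimal usage sketch (the names and values are hypothetical; the tuple-of-tuples meta form is documented on DataFrameDescriber.meta):

import pandas as pd
from zensols.datdesc.desc import DataDescriber, DataFrameDescriber

# wrap a dataframe with its column metadata
dfd = DataFrameDescriber(
    name='stats',
    df=pd.DataFrame({'min': [1.2], 'max': [3.4]}),
    desc='summary statistics',
    meta=(('min', 'Minimum value'), ('max', 'Maximum value')))
dd = DataDescriber(describers=(dfd,), name='example')
dd.add_summary()   # prepend a describer that documents the contained data
paths = dd.save()  # write the CSV data and YAML configuration files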

SHEET_NAME_MAXLEN: ClassVar[int] = 31

Maximum allowed characters in an Excel spreadsheet’s name.

__init__(describers, name='default', output_dir=PosixPath('results'), csv_dir=PosixPath('csv'), yaml_dir=PosixPath('config'), mangle_sheet_name=False)
add_summary()[source]

Add a new metadata-like DataFrameDescriber as the first entry in describers that describes the data this instance currently has.

Return type:

DataFrameDescriber

Returns:

the added metadata DataFrameDescriber instance

csv_dir: Path = PosixPath('csv')

The directory where to write the CSV files.

describers: Tuple[DataFrameDescriber, ...]

The contained dataframe and metadata.

property describers_by_name: Dict[str, DataFrameDescriber]

Data frame describers keyed by the describer name.

format_tables()[source]

See DataFrameDescriber.format_table().

classmethod from_yaml_file(path)[source]

Create a data describer from YAML/CSV files previously written with save().

See:

save()

See:

DataFrameDescriber.from_table()

Return type:

DataDescriber

items()[source]
Return type:

Iterable[Tuple[str, DataFrameDescriber]]

keys()[source]
Return type:

Sequence[str]

mangle_sheet_name: bool = False

Whether to normalize the Excel sheet names when xlsxwriter.exceptions.InvalidWorksheetName is raised.

name: str = 'default'

The name of the dataset.

output_dir: Path = PosixPath('results')

The directory where to write the results.

save(output_dir=None, yaml_dir=None, include_excel=False)[source]

Save both the CSV and YAML configuration file.

Parameters:

include_excel (bool) – whether to also write the Excel file to its default output file name

See:

save_csv()

Return type:

List[Path]

See:

save_yaml()

save_csv(output_dir=None)[source]

Save all provided dataframe describers to CSV files.

Parameters:

output_dir (Path) – the directory in which to save the data

Return type:

List[Path]

save_excel(output_file=None)[source]

Save all provided dataframe describers to an Excel file.

Parameters:

output_file (Path) – the Excel file to write, which needs an .xlsx extension; this defaults to a path created from output_dir and name

Return type:

Path

save_yaml(output_dir=None, yaml_dir=None)[source]

Save all provided dataframe describers as YAML files used by the datdesc command.

Parameters:

output_dir (Path) – the directory in which to save the data

Return type:

List[Path]

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, df_params=None)[source]
Parameters:

df_params (Dict[str, Any]) – the formatting pandas options, which defaults to max_colwidth=80

yaml_dir: Path = PosixPath('config')

The directory where to write the YAML configuration files.

class zensols.datdesc.desc.DataFrameDescriber(name, df, desc, head=None, meta_path=None, meta=None, table_kwargs=<factory>, index_meta=None)[source]

Bases: PersistableContainer, Dictable

A class that contains a Pandas dataframe, a description of the data, and descriptions of all the columns in that dataframe.
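
A minimal sketch (hypothetical names and values; create_table() assumes the package's default table configuration is loadable):

import pandas as pd
from zensols.datdesc.desc import DataFrameDescriber

dfd = DataFrameDescriber(
    name='scores',
    df=pd.DataFrame({'f1': [0.81], 'acc': [0.90]}),
    desc='test set scores',
    meta=(('f1', 'Macro F1'), ('acc', 'Accuracy')))
dfd.write_pretty(include_metadata=True)  # tabulated console rendering
table = dfd.create_table(caption='Test set scores')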

property T: DataFrameDescriber

See transpose().

__init__(name, df, desc, head=None, meta_path=None, meta=None, table_kwargs=<factory>, index_meta=None)
create_table(**kwargs)[source]

Create a table from the metadata using table_kwargs and the given keyword arguments.

Parameters:

kwargs – keyword arguments that override the default parameterized data passed to Table

Return type:

Table

property csv_path: Path

The CSV file that contains the data this instance describes.

derive(*, name=None, df=None, desc=None, meta=None, index_meta=None)[source]

Create a new instance based on this instance and replace any non-None kwargs.

If meta is provided, it is merged with the metadata of this instance. However, any metadata provided must match in both column names and descriptions.

Raises:

DataDescriptionError – if multiple metadata columns with differing descriptions are found

Return type:

DataFrameDescriber

derive_with_index_meta(index_format=None)[source]

Like derive(), but the dataframe is generated with df_with_index_meta() using index_format as a parameter.

Parameters:

index_format (str) – see df_with_index_meta()

Return type:

DataFrameDescriber

desc: str

The description of the data frame.

df: DataFrame

The dataframe to describe.

df_with_index_meta(index_format=None)[source]

Create a dataframe with the first column containing index metadata. This uses index_meta to create the column values.

Parameters:

index_format (str) – the new index column format using index and value, which defaults to {index}

Return type:

DataFrame

Returns:

the dataframe with a new first column of the index metadata, or df if index_meta is None

format_table()[source]

Replace (in place) dataframe df with the formatted table obtained with Table.formatted_dataframe. The Table is created with create_table().

classmethod from_columns(source, name=None, desc=None)[source]

Create a new instance by transposing column data into a new dataframe describer. If source is a dataframe, it must have the following columns:

  • column: the column names of the resulting describer

  • meta: the description that makes up the meta

  • data: sequences of the data

Otherwise, each element of the sequence is a row of column names, meta descriptions, and data sequences.

Return type:

DataFrameDescriber
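
A minimal sketch of the dataframe form described above (hypothetical values; each data entry is a sequence that becomes a column of the new dataframe):

import pandas as pd
from zensols.datdesc.desc import DataFrameDescriber

source = pd.DataFrame({
    'column': ['f1', 'acc'],                # resulting column names
    'meta': ['Macro F1', 'Accuracy'],       # resulting column descriptions
    'data': [[0.81, 0.79], [0.90, 0.88]]})  # resulting column data
dfd = DataFrameDescriber.from_columns(
    source, name='scores', desc='scores by run')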

classmethod from_table(tab)[source]

Create a frame descriptor from a Table.

Return type:

DataFrameDescriber

get_table_name(form)[source]

The table name derived from name.

Parameters:

form (str) – specifies the format: file means file-friendly, camel is for reverse camel notation

Return type:

str

head: str = None

A short summary of the table, which is used in Table.head.

index_meta: Dict[Any, str] = None

The index metadata, which maps index values to descriptions of the respective row.

property meta: DataFrame

The column metadata for dataframe, which needs columns name and description. If this is not provided, it is read from file meta_path. If this is set to a tuple of tuples, a dataframe is generated from the form:

((<column name 1>, <column description 1>),
 (<column name 2>, <column description 2>), ...)

If both this and meta_path are not provided, the following is used:

(('description', 'Description'),
 ('value', 'Value'))

meta_path: Optional[Path] = None

A path from which to read the meta metadata.

See:

meta

name: str

The name of the data this describer holds.

save_csv(output_dir=PosixPath('.'))[source]

Save as a CSV file using csv_path.

Return type:

Path

save_excel(output_dir=PosixPath('.'))[source]

Save as an Excel file using a file name derived from csv_path. The same file naming semantics are used as with DataDescriber.save_excel().

See:

DataDescriber.save_excel()

Return type:

Path

table_kwargs: Dict[str, Any]

Additional keyword arguments given when creating a table in create_table().

transpose(row_names=((0, 'value', 'Value'),), name_column='name', name_description='Name', index_column='description')[source]

Transpose all data in this descriptor by transposing df and swapping meta with index_meta as a new instance.

Parameters:
  • row_names (Tuple[int, str, str]) – a tuple of (row index in df, the column name in the new df, the metadata description of that column in the new df); the default takes only the first row

  • name_column (str) – the column name of this instance’s df

  • description_column – the column description of this instance’s df

  • index_column (str) – the name of the new index in the returned instance

Return type:

DataFrameDescriber

Returns:

a new derived instance of the transposed data

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, df_params=None)[source]
Parameters:

df_params (Dict[str, Any]) – the formatting pandas options, which defaults to max_colwidth=80

write_pretty(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_metadata=False, title_format='{name} ({desc})', **tabulate_params)[source]

Like write(), but generate a visually appealing table and optionally column metadata.

zensols.datdesc.dfstash module

A stash implementation that uses a Pandas dataframe stored as a CSV file.

class zensols.datdesc.dfstash.DataFrameStash(path, dataframe=None, key_column='key', columns=('value',), mkdirs=True, auto_commit=True, single_column_index=0)[source]

Bases: CloseableStash

A backing stash that persists to a CSV file via a Pandas dataframe. All modifications go through the pandas.DataFrame and are then saved with commit() or close().
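
A minimal usage sketch (the file name is hypothetical):

from pathlib import Path
from zensols.datdesc.dfstash import DataFrameStash

stash = DataFrameStash(path=Path('cache.csv'))
stash.dump('k1', 'v1')    # persisted immediately since auto_commit=True
assert stash.exists('k1')
value = stash.load('k1')  # a single value since single_column_index=0
stash.close()             # commit any remaining changes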

__init__(path, dataframe=None, key_column='key', columns=('value',), mkdirs=True, auto_commit=True, single_column_index=0)
auto_commit: bool = True

Whether to save to the file system after any modification.

clear()[source]

Delete all data from the stash.

Important: Exercise caution with this method.

close()[source]

Close all resources created by the stash.

columns: Tuple[str, ...] = ('value',)

The columns to create in the spreadsheet. These must be consistent when the data is restored.

commit()[source]

Commit changes to the file system.

property dataframe: DataFrame

The dataframe to proxy in memory. This is settable on instantiation but read-only afterward. If this is not set, an empty dataframe is created with the metadata in this class.

delete(name=None)[source]

Delete the resource for data pointed to by name or the entire resource if name is not given.

dump(name, inst)[source]

Persist data value inst with key name.

exists(name)[source]

Return True if data with key name exists.

Implementation note: This Stash.exists() method is very inefficient and should be overridden.

Return type:

bool

get(name, default=None)[source]

Load an object or a default if key name doesn’t exist. Semantically, this method tries not to re-create the data if it already exists. This means that if a stash has built-in caching mechanisms, this method uses it.

See:

load()

Return type:

Union[Any, Tuple[Any, ...]]

key_column: str = 'key'

The spreadsheet column name used to store stash keys.

keys()[source]

Return an iterable of keys in the collection.

Return type:

Iterable[str]

load(name)[source]

Load a data value from the pickled data with key name. Semantically, this method loads the data using the stash’s implementation. For example, DirectoryStash loads the data from a file if it exists, but factory type stashes will always re-generate the data.

See:

get()

Return type:

Union[Any, Tuple[Any, ...]]

mkdirs: bool = True

Whether to recursively create the directory where path is stored if it does not already exist.

path: Path

The path of the file from which to read and write.

single_column_index: Optional[int] = 0

If this is set, then a single type is assumed for loads and restores. Otherwise, if set to None, multiple columns are saved and retrieved.

values()[source]

Return the values in the stash.

Return type:

Iterable[Union[Any, Tuple[Any, ...]]]

zensols.datdesc.hyperparam module

Hyperparameter metadata: access and documentation. This package was designed for the following purposes:

  • Provide basic scaffolding to update model hyperparameters from optimization packages such as hyperopt.

  • Generate LaTeX tables of the hyperparameters and their descriptions for academic papers.

The object instance graph hierarchy is: a HyperparamSet contains HyperparamModel instances, which in turn contain Hyperparam instances.

Access to the hyperparameters is done by calling the set or model levels with a dotted path notation string. For example, svm.C first navigates to model svm, then to the hyperparameter named C.
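
A minimal sketch of dotted path access (the dict schema below is an assumption about the YAML layout, not a confirmed format):

from zensols.datdesc.hyperparam import HyperparamSetLoader

# hypothetical definition: a model 'svm' with one float hyperparameter 'C'
data = {'svm': {'doc': 'support vector machine',
                'params': {'C': {'type': 'float',
                                 'doc': 'regularization strength',
                                 'value': 1.0}}}}
hset = HyperparamSetLoader(data).load()
flat = hset.flatten()        # dotted path keys, e.g. {'svm.C': 1.0}
hset.update({'svm.C': 0.5})  # update using dotted path notation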

class zensols.datdesc.hyperparam.Hyperparam(name, type, doc, choices=None, value=None, interval=None)[source]

Bases: Dictable

A hyperparameter’s metadata, documentation and value. The value is accessed (retrieval and setting) at runtime. Do not use this class explicitly. Instead use HyperparamModel.

The index access only applies when type is list or dict. Otherwise, the value member has the value of the hyperparameter.

CLASS_MAP: ClassVar[Dict[str, Type]] = {'bool': <class 'bool'>, 'choice': <class 'str'>, 'dict': <class 'dict'>, 'float': <class 'float'>, 'int': <class 'int'>, 'list': <class 'list'>, 'str': <class 'str'>}

A mapping for values set in type to their Python class equivalents.

VALID_TYPES: ClassVar[str] = frozenset({'bool', 'choice', 'dict', 'float', 'int', 'list', 'str'})

Valid settings for type.

__init__(name, type, doc, choices=None, value=None, interval=None)
choices: Tuple[str, ...] = None

When type is choice, the list of valid strings for value.

property cls: Type

The Python equivalent class of type.

doc: str

The human readable documentation for the hyperparameter. This is used in documentation generation tasks.

get_type_str(short=True)[source]
Return type:

str

interval: Union[Tuple[float, float], Tuple[int, int]] = None

Valid intervals for value as an inclusive interval.

property interval_str: str
name: str

The name of the hyperparameter (e.g. C or learning_rate).

type: str

The type of value (e.g. float or int).

property value: str | float | int | bool | list | dict | None

The value of the hyperparameter used in the application.

write_sphinx(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]
class zensols.datdesc.hyperparam.HyperparamContainer[source]

Bases: Dictable

A container class for Hyperparam instances.

__init__()
abstract flatten(deep=False)[source]

Return a flattened dictionary with the dotted path notation (see module docs).

Parameters:

deep (bool) – if True, recurse into dict and list hyperparameter values

Return type:

Dict[str, Any]

update(params)[source]

Update parameter values.

Parameters:

params (Union[Dict[str, Any], HyperparamContainer]) – a dict of dotted path notation keys

abstract write_sphinx(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write Sphinx autodoc used in a class as a dataclasses.dataclass field.

exception zensols.datdesc.hyperparam.HyperparamError[source]

Bases: DataDescriptionError

Raised for any error related to hyperparameter access.

__module__ = 'zensols.datdesc.hyperparam'
class zensols.datdesc.hyperparam.HyperparamModel(name, doc, desc=None, params=<factory>, table=None)[source]

Bases: HyperparamContainer

The model level class that contains the parameters. This class represents a machine learning model, such as an SVM, with hyperparameters such as C and maximum iterations.

__init__(name, doc, desc=None, params=<factory>, table=None)
clone()[source]

Make a copy of this instance.

Return type:

HyperparamModel

create_dataframe_describer()[source]

Return an object with metadata fully describing the hyperparameters of this model.

Return type:

DataFrameDescriber

desc: str = None

The description of the model used in the documentation when name is not sufficient. Since name has naming constraints, this can be used in its place during documentation generation.

doc: str

The human readable documentation for the model.

flatten(deep=False)[source]

Return a flattened dictionary with the dotted path notation (see module docs).

Parameters:

deep (bool) – if True, recurse into dict and list hyperparameter values

Return type:

Dict[str, Any]

get(name)[source]
Return type:

HyperparamModel

property metadata_dataframe: DataFrame

A dataframe describing the values_dataframe.

name: str

The name of the model (e.g. svm). This name can have only alphanumeric and underscore characters.

params: Dict[str, Hyperparam]

The hyperparameters keyed by their names.

table: Optional[Dict[str, Any]] = None

Overriding data used when creating a Table from DataFrameDescriber.create_table().

property values_dataframe: DataFrame

A dataframe with parameter data. This includes the name, type, value and documentation.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_doc=False)[source]

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

write_sphinx(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write Sphinx autodoc used in a class as a dataclasses.dataclass field.

class zensols.datdesc.hyperparam.HyperparamSet(models=<factory>, name=None)[source]

Bases: HyperparamContainer

The top level in the object graph hierarchy (see module docs). This contains a set of models and is typically where packages such as hyperopt update the hyperparameters of the model(s).

__init__(models=<factory>, name=None)
create_describer(meta_path=None)[source]

Return an object with metadata fully describing the hyperparameters of this model.

Parameters:

meta_path (Path) – if provided, set the path on the returned instance

Return type:

DataDescriber

flatten(deep=False)[source]

Return a flattened dictionary with the dotted path notation (see module docs).

Parameters:

deep (bool) – if True, recurse into dict and list hyperparameter values

Return type:

Dict[str, Any]

get(name)[source]
Return type:

HyperparamModel

models: Dict[str, HyperparamModel]

The models containing hyperparameters for this set.

name: Optional[str] = None

The name of the hyperparameter set.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_doc=False)[source]

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

write_sphinx(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write Sphinx autodoc used in a class as a dataclasses.dataclass field.

class zensols.datdesc.hyperparam.HyperparamSetLoader(data, config=None, updates=())[source]

Bases: object

Loads a set of hyperparameters from a YAML pathlib.Path, dict or stream io.TextIOBase.

__init__(data, config=None, updates=())
config: Configurable = None

The application configuration used to update the hyperparameters from other sections.

data: Union[Dict[str, Any], Path, TextIOBase]

The source of data to load, which is a YAML pathlib.Path, dict or stream io.TextIOBase.

See:

updates

load(**kwargs)[source]

Load and return the hyperparameter object graph from data.

Return type:

HyperparamSet

updates: Sequence[Dict[str, Any]] = ()

A sequence of dictionaries with keys as HyperparamModel names and values as sections with values to set after loading using data.

exception zensols.datdesc.hyperparam.HyperparamValueError[source]

Bases: HyperparamError

Raised for bad values set on a hyperparameter.

__annotations__ = {}
__module__ = 'zensols.datdesc.hyperparam'

zensols.datdesc.latex module

Contains the manager classes that generate the tables.

class zensols.datdesc.latex.CsvToLatexTable(tables, package_name)[source]

Bases: Writable

Generate a LaTeX table from a CSV file.

__init__(tables, package_name)
package_name: str

The name of the LaTeX .sty package.

tables: Sequence[Table]

A list of table instances from which to create LaTeX table definitions.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write the LaTeX table to the given writer.

class zensols.datdesc.latex.LatexTable(path, name, template, caption='', head=None, type=None, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, column_evals=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None)[source]

Bases: Table

This subclass generates LaTeX tables.

__init__(path, name, template, caption='', head=None, type=None, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, column_evals=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None)
format_scientific(x, sig_digits=1)[source]

Format x in scientific notation.

Parameters:
  • x (float) – the number to format

  • sig_digits (int) – the number of digits after the decimal point

Return type:

str

class zensols.datdesc.latex.SlackTable(path, name, template, caption='', head=None, type=None, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, column_evals=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None, slack_column=0)[source]

Bases: LatexTable

An instance of the table that fills up space based on the widest column.

__init__(path, name, template, caption='', head=None, type=None, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, column_evals=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None, slack_column=0)
property columns: str

Return the columns field in the LaTeX environment header.

slack_column: int = 0

Which column elastically grows or shrinks to make the table fit.

zensols.datdesc.opt module

Contains container and utility classes for hyperparameter optimization. These classes find optimal hyperparameters for a model and save the results as JSON files. This module is meant to be used by command line applications configured as Zensols Resource libraries.

See:

Resource libraries
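
An outline of typical use (hypothetical names; this class is normally subclassed to define the search space and objective, see hyperparam_names and _create_space()):

from pathlib import Path
from zensols.datdesc.opt import HyperparameterOptimizer

opt = HyperparameterOptimizer(
    name='svm-tuning',              # hypothetical experiment name
    hyperparam_names=('svm.C',),    # names used to create the space
    max_evals=50,
    intermediate_dir=Path('opthyper'))
opt.optimize()             # run the optimization algorithm
opt.write_best_result()    # print the results from the best run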

class zensols.datdesc.opt.CompareResult(initial_param, initial_loss, initial_scores, best_eval_ix, best_param, best_loss, best_scores)[source]

Bases: Dictable

Contains the loss and scores of an initial run and a run found on the optimal hyperparameters.

__init__(initial_param, initial_loss, initial_scores, best_eval_ix, best_param, best_loss, best_scores)
best_eval_ix: int

The evaluation index of the best run.

best_loss: float

The optimized loss.

best_param: Dict[str, Any]

The optimized hyperparameters.

best_scores: DataFrame

The optimized scores.

initial_loss: float

The initial loss.

initial_param: Dict[str, Any]

The initial hyperparameters.

initial_scores: DataFrame

The initial scores.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

class zensols.datdesc.opt.HyperparamResult(name, hyp, scores, loss, eval_ix)[source]

Bases: Dictable

Results of an optimization and optionally the best fit.

__init__(name, hyp, scores, loss, eval_ix)
eval_ix: int

The index of the optimization.

classmethod from_file(path)[source]

Restore a result from a file name.

Parameters:

path (Path) – the path from which to restore

Return type:

HyperparamResult

hyp: HyperparamModel

The updated hyperparameters.

loss: float

The last loss.

name: str

The name of the HyperparameterOptimizer, which is the directory name.

scores: DataFrame

The last score results computed during the optimization.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

class zensols.datdesc.opt.HyperparamRun(runs)[source]

Bases: Dictable

A container for the entire optimization run. The best run contains the best fit (HyperparamResult) as predicted by the hyperparameter optimization algorithm.
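
A minimal sketch (the directory layout under the intermediate directory is an assumption):

from pathlib import Path
from zensols.datdesc.opt import HyperparamRun

run = HyperparamRun.from_dir(Path('opthyper/default'))  # hypothetical path
best = run.best_result            # the result with the lowest loss
print(best.loss, run.loss_stats)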

__init__(runs)
property best_result: HyperparamResult

The result that had the lowest loss.

property final: HyperparamResult

The results of the final run, which has the best fit (see class docs).

property final_path: Path

The path of the final run.

classmethod from_dir(path)[source]

Return an instance with the runs stored in directory path.

Return type:

HyperparamRun

property initial_loss: float

The loss from the first run.

property loss_stats: Dict[str, float]

The loss statistics (min, max, ave, etc).

property losses: Tuple[float]

The loss values for all runs.

runs: Tuple[Tuple[Path, HyperparamResult]]

The results from previous runs.

class zensols.datdesc.opt.HyperparameterOptimizer(name='default', hyperparam_names=(), max_evals=1, show_progressbar=True, intermediate_dir=PosixPath('opthyper'), baseline_path=None)[source]

Bases: object

Creates the files used to score optimizer output.

__init__(name='default', hyperparam_names=(), max_evals=1, show_progressbar=True, intermediate_dir=PosixPath('opthyper'), baseline_path=None)
property aggregate_score_dir: Path

The output directory containing runs with the best parameters of the top N results (see aggregate_scores()).

aggregate_scores()[source]

Aggregate the best score results as a separate CSV file for each data point with get_score_dataframe(). This is saved as a separate file for each optimization run since this method can take a long time, as it re-scores the dataset. These results are then “stitched” together with gather_aggregate_scores().

baseline_path: Path = None

A JSON file with hyperparameter settings to set on start. This file contains the output portion of the final.json results (which are the results parsed and set in HyperparamResult).

property config_factory: ConfigFactory

The app config factory.

gather_aggregate_scores()[source]

Return a dataframe of all the aggregate scores written by aggregate_scores().

Return type:

DataFrame

get_best_result()[source]
Return type:

HyperparamResult

get_best_results()[source]

Return the best results across all hyperparameter optimization runs with keys as run names.

Return type:

Dict[str, HyperparamResult]

get_comparison()[source]

Compare the scores of the default parameters with those predicted by the optimizer of the best run.

Return type:

CompareResult

get_run(result_dir=None)[source]

Get the best run from the file system.

Parameters:

result_dir (Path) – the result directory, which defaults to opt_intermediate_dir

Return type:

HyperparamRun

get_score_dataframe(iterations=None)[source]

Create a dataframe from the results scored from the best hyperparameters.

Parameters:

iterations (int) – the number of times the objective is called to produce the results (the objective space is not altered)

Return type:

DataFrame

hyperparam_names: Tuple[str, ...] = ()

The names of the hyperparameters used to create the space.

See:

_create_space()

property hyperparams: HyperparamModel

The model hyperparameters to be updated by the optimizer.

intermediate_dir: Path = PosixPath('opthyper')

The directory where the intermediate results are saved while the algorithm works.

max_evals: int = 1

The maximum number of evaluations of the hyperparameter optimization algorithm to execute.

name: str = 'default'

The name of the optimization experiment set. This has a bearing on where files are stored (see opt_intermediate_dir).

property opt_intermediate_dir: Path

The optimization result directory for the config/parser.

optimize()[source]

Run the optimization algorithm.

remove_result()[source]

Remove an entire run’s previous optimization results.

property results_intermediate_dir: Path

The directory that has all intermediate results by subdirectory name.

show_progressbar: bool = True

Whether or not to show the progress bar while running the optimization.

write_best_result(writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_param_json=False)[source]

Print the results from the best run.

Parameters:

include_param_json (bool) – whether to output the JSON formatted hyperparameters

write_compare(writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write the results of a comparison of the initial hyperparameters against the optimized.

write_score(writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Restore the hyperparameter state, score the data and print the results. Use the baseline parameters if available, otherwise use the parameters from the best run.

Return type:

HyperparamResult

write_scores(output_file=None, iterations=None)[source]

Write a file of the results scored from the best hyperparameters.

Parameters:
  • output_file (Path) – where to write the CSV file; defaults to a file in opt_intermediate_dir

  • iterations (int) – the number of times the objective is called to produce the results (the objective space is not altered)

zensols.datdesc.optscore module

An optimizer that uses a Scorer as an objective.

class zensols.datdesc.optscore.ScoringHyperparameterOptimizer(name='default', hyperparam_names=(), max_evals=1, show_progressbar=True, intermediate_dir=PosixPath('opthyper'), baseline_path=None)[source]

Bases: HyperparameterOptimizer

An optimizer that uses a Scorer as the objective and a means to determine the loss. The default loss function is defined as 1 - F1 using the f1_score column of as_dataframe().

__init__(name='default', hyperparam_names=(), max_evals=1, show_progressbar=True, intermediate_dir=PosixPath('opthyper'), baseline_path=None)
property scorer: Scorer

zensols.datdesc.table module

This module contains classes that generate tables.

class zensols.datdesc.table.Table(path, name, template, caption='', head=None, type=None, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, column_evals=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None)[source]

Bases: PersistableContainer, Dictable

Generates a Zensols-styled LaTeX table from a CSV file.

__init__(path, name, template, caption='', head=None, type=None, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, column_evals=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None)
asflatdict(*args, **kwargs)[source]

Like asdict(), but flattens the result into a data structure suitable for writing to JSON or YAML.

Return type:

Dict[str, Any]

blank_columns: List[int]

A list of column indexes to set to the empty string (e.g. the 0th to fix Unnamed: 0 issues).

bold_cells: List[Tuple[int, int]]

A list of row/column cells to bold.

bold_max_columns: List[str]

A list of column names that will have their max value bolded.

capitalize_columns: Dict[str, bool]

Capitalize either sentences (False values) or every word (True values). The keys are column names.

caption: str = ''

The human readable string used as the caption of the table.

code_post: str = None

Like code_pre but modifies the table after this class’s modifications of the table.

code_pre: str = None

Python code that manipulates the table’s dataframe before modifications made by this class. The code has a local df variable, and the returned value is used as the replacement. This is usually a one-liner used to subset the data, etc. The code is evaluated with eval().

column_aligns: str = None

The alignment/justification (i.e. |l|l| for two columns). If not provided, they are automatically generated based on the columns of the table.

column_evals: Dict[str, str]

Keys are column names with values as functions (i.e. lambda expressions) evaluated with a single column value parameter. The return value replaces the column identified by the key.

column_keeps: Optional[List[str]] = None

If provided, only keep the columns in the list.

column_removes: List[str]

The name of the columns to remove from the table, if any.

column_renames: Dict[str, str]

Columns to rename, if any.

column_value_replaces: Dict[str, Dict[Any, Any]]

Data values to replace in the dataframe. It is keyed by the column name and the values are the replacements. Each value is a dict with original value keys and the replacements as values.

property columns: str

Return the columns field in the LaTeX environment header.

property dataframe: DataFrame

The Pandas dataframe that holds the CSV data.

default_params: Sequence[Sequence[str]]

Default parameters to be substituted in the template that are interpolated by the LaTeX numeric values such as #1, #2, etc. This is a sequence (list or tuple) of (<name>, [<default>]) entries, where <name> is substituted in the template and <default> is used if the parameter is not given in params.

definition_file: Path = None

The YAML file from which this instance was created.

double_hlines: Sequence[int]

Indexes of rows to put double horizontal line breaks.

abstract format_scientific(x, sig_digits=1)[source]

Format x in scientific notation.

Parameters:
  • x (float) – the number to format

  • sig_digits (int) – the number of digits after the decimal point

Return type:

str

format_scientific_column_names: Dict[str, Optional[int]]

Format a column using LaTeX formatted scientific notation using format_scientific(). Keys are column names and values are the mantissa length, or 1 if None.

static format_thousand(x, apply_k=True, add_comma=True)[source]

Format a number as a string with comma separating thousands.

Parameters:
  • x (int) – the number to format

  • apply_k (bool) – add a K to the end of large numbers

  • add_comma (bool) – whether to add a comma

Return type:

str

format_thousands_column_names: Dict[str, Optional[Dict[str, Any]]]

Columns to format using thousands. The keys are the column names of the table and the values are either None or the keyword arguments to format_thousand().

property formatted_dataframe: DataFrame

The dataframe with the formatting applied to it, used to create the LaTeX table. Modifications, such as string replacements for adding percents, are done here.

head: str = None

The header to use for the table, which is used as the text in the list of tables and made bold in the table.

hlines: Sequence[int]

Indexes of rows to put horizontal line breaks.

index_col_name: str = None

If set, add an index column with the given name.

make_percent_column_names: Dict[str, int]

Each column in the map is converted to a percentage (multiplied by 100) and rounded to the given number of decimal places. For example, {'ann_per': 3} rounds column ann_per to 3 decimal places.

name: str

The name of the table, also used as the label.

property package_name: str

Return the package name for the table in table_path.

params: Dict[str, str]

Parameters used in the template that override of the default_params.

path: Union[Path, str]

The path to the CSV file from which to make a LaTeX table.

percent_column_names: Sequence[str] = ()

Column names that have a percent sign to be escaped.

read_params: Dict[str, str]

Keyword arguments used in the read_csv() call when reading the CSV file.

replace_nan: str = None

Replace NaN values with the value of this field, as tabulate() does not use the missing value (presumably due to a bug).

round_column_names: Dict[str, int]

Each column in the map will get rounded to their respective values.

tabulate_params: Dict[str, str]

Keyword arguments used in the tabulate() call when writing the table. The default tells tabulate to not parse/format numerical data.

template: str

The table template, which lives in the application configuration obj.yml.

type: str = None
uses: List[str]

Comma separated list of packages to use.

variables: Dict[str, Union[Tuple[int, int], str]]

A mapping of variable names to a dataframe cell or a Python code snippet that is evaluated with exec(). In LaTeX, this is done by setting a newcommand (see LatexTable).

If set to a tuple of (<row>, <column>) the value of the pre-formatted dataframe is used (see unformatted below).

If a Python evaluation string, the code must set the variable v to the variable’s value. A variable stages is a Dict used to get one of the dataframes created at various stages of formatting the table, with entries:

  • nascent: same as dataframe

  • unformatted: after the pre-evaluation but before any formatting

  • postformat: after number formatting and post evaluation, but before remaining column and cell modifications

  • formatted: same as formatted_dataframe

For example, the following uses the value at row 2 and column 3 of the unformatted dataframe:

v = stages['unformatted'].iloc[2, 3]

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

writes: List[str]

A list of what to output for this table. Entries are table and variables.

class zensols.datdesc.table.TableFactory(config_factory, table_section_regex, default_table_type)[source]

Bases: Dictable

Reads the table definitions file and writes a LaTeX .sty file of the tables generated from the CSV data.
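
A minimal sketch that parses table definitions and renders them as a LaTeX .sty package ('results-table.yml' is a hypothetical definitions file; see the app module docs):

import sys
from pathlib import Path
from zensols.datdesc.table import TableFactory
from zensols.datdesc.latex import CsvToLatexTable

factory = TableFactory.default_instance()
tables = list(factory.from_file(Path('results-table.yml')))
CsvToLatexTable(tables=tables, package_name='results').write(writer=sys.stdout)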

__init__(config_factory, table_section_regex, default_table_type)
config_factory: ConfigFactory

The configuration factory used to create Table instances.

create(type=None, **params)[source]

Create a table from the application configuration.

Parameters:
  • type (str) – the name used to find the table by section

  • params (Dict[str, Any]) – the keyword arguments used to create the table

Return type:

Table

Returns:

a new instance of the table defined by the template

See:

get_table_names()

classmethod default_instance()[source]

Get the singleton instance.

Return type:

TableFactory

default_table_type: str

The default name, which resolves to a section name, to use when creating anonymous tables.

from_file(table_path)[source]

Return tables parsed from a YAML file.

Parameters:

table_path (Path) – the file containing the table configurations

Return type:

Iterable[Table]

get_table_names()[source]

Return the names of tables used in create().

Return type:

Iterable[str]

classmethod reset_default_instance()[source]

Force default_instance() to re-instantiate a new instance on a subsequent call.

table_section_regex: Pattern

A regular expression that matches table entries.

to_file(table, table_path)[source]
Return type:

Dict[str, Any]

Module contents

Generate LaTeX tables in a .sty file from CSV files. The paths to the CSV files to create tables from, along with their metadata, are given as a YAML configuration file.

Example:

    latextablenamehere:
        type: slack
        slack_col: 0
        path: ../config/table-name.csv
        caption: Some Caption
        placement: t!
        size: small
        single_column: true
        percent_column_names: ['Proportion']

exception zensols.datdesc.DataDescriptionError[source]

Bases: APIError

Thrown for any application level error.

__annotations__ = {}
__module__ = 'zensols.datdesc'
exception zensols.datdesc.LatexTableError(reason, table=None)[source]

Bases: DataDescriptionError

Thrown for any application level error related to creating tables.

__annotations__ = {}
__init__(reason, table=None)[source]
__module__ = 'zensols.datdesc'