zensols.datdesc package#

Submodules#

zensols.datdesc.app#

Inheritance diagram of zensols.datdesc.app

Generate LaTeX tables in a .sty file from CSV files. The paths to the CSV files to create tables from, along with their metadata, are given in a YAML configuration file. Parameters are either both files or both directories. When using directories, only files that match *-table.yml are considered.

class zensols.datdesc.app.Application(data_file_regex=re.compile('^.+-table\\.yml$'), hyperparam_file_regex=re.compile('^.+-hyperparam\\.yml$'), hyperparam_table_default=None)[source]#

Bases: object

Generate LaTeX tables files from CSV files and hyperparameter .sty files.

__init__(data_file_regex=re.compile('^.+-table\\.yml$'), hyperparam_file_regex=re.compile('^.+-hyperparam\\.yml$'), hyperparam_table_default=None)#
data_file_regex: Pattern = re.compile('^.+-table\\.yml$')#

Matches the file names of the table definition files to process into the LaTeX output.

generate_hyperparam(input_path, output_path, output_format=_OutputFormat.short)[source]#

Write hyperparameter formatted data.

Parameters:
  • input_path (Path) – definitions YAML path location or directory

  • output_path (Path) – output file or directory

  • output_format (_OutputFormat) – output format of the hyperparameter metadata

generate_tables(input_path, output_path)[source]#

Create LaTeX tables.

Parameters:
  • input_path (Path) – definitions YAML path location or directory

  • output_path (Path) – output file or directory
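
A minimal sketch of programmatic use (the file names are hypothetical); per the module docs, input and output must be either both files or both directories:

from pathlib import Path
from zensols.datdesc.app import Application

# render the LaTeX table definitions for one YAML definitions file
app = Application()
app.generate_tables(input_path=Path('model-table.yml'),
                    output_path=Path('tables.sty'))

When directories are given instead, only the files matching *-table.yml under the input directory are processed.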

hyperparam_file_regex: Pattern = re.compile('^.+-hyperparam\\.yml$')#

Matches the file names of the hyperparameter definition files to process into the LaTeX output.

hyperparam_table_default: Settings = None#

Default settings for hyperparameter Table instances.

write_excel(input_path, output_file=None, output_latex_format=False)[source]#

Create an Excel file from table data.

Parameters:
  • input_path (Path) – definitions YAML path location or directory

  • output_file (Path) – the output file, which defaults to the input prefix with the appropriate extension

  • output_latex_format (bool) – whether to output with LaTeX commands

class zensols.datdesc.app.PrototypeApplication(app)[source]#

Bases: object

CLI_META = {'is_usage_visible': False}#
__init__(app)#
app: Application#
proto()[source]#

Prototype test.

zensols.datdesc.cli#

Inheritance diagram of zensols.datdesc.cli

Command line entry point to the application.

class zensols.datdesc.cli.ApplicationFactory(*args, **kwargs)[source]#

Bases: ApplicationFactory

__init__(*args, **kwargs)[source]#
zensols.datdesc.cli.main(args=sys.argv, **kwargs)[source]#
Return type:

ActionResult

zensols.datdesc.desc#

Inheritance diagram of zensols.datdesc.desc

Metadata container classes.

class zensols.datdesc.desc.DataDescriber(describers, name='default', output_dir=PosixPath('results'), csv_dir=PosixPath('csv'), yaml_dir=PosixPath('config'), mangle_sheet_name=False)[source]#

Bases: PersistableContainer, Dictable

Container class for DataFrameDescriber instances. It also saves their instances as CSV data files and YAML configuration files.
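
A minimal sketch (the data, column names, and descriptions are hypothetical) of bundling a single DataFrameDescriber and saving its artifacts:

import pandas as pd
from zensols.datdesc.desc import DataDescriber, DataFrameDescriber

dfd = DataFrameDescriber(
    name='scores',
    df=pd.DataFrame({'model': ['svm', 'crf'], 'f1': [0.92, 0.89]}),
    desc='model performance scores',
    # tuple-of-tuples form of the column metadata (see the meta property)
    meta=(('model', 'Model name'), ('f1', 'F1 score')))
dd = DataDescriber(describers=(dfd,))
dd.save()  # writes the CSV and YAML files under output_dir ('results')

Passing include_excel=True to save() also writes the Excel file to its default output file name.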

__init__(describers, name='default', output_dir=PosixPath('results'), csv_dir=PosixPath('csv'), yaml_dir=PosixPath('config'), mangle_sheet_name=False)#
add_summary()[source]#

Add a new metadata-like DataFrameDescriber as the first entry in describers that describes the data this instance currently contains.

Return type:

DataFrameDescriber

Returns:

the added metadata DataFrameDescriber instance

csv_dir: Path = PosixPath('csv')#

The directory where to write the CSV files.

describers: Tuple[DataFrameDescriber]#

The contained dataframe and metadata.

property describers_by_name: Dict[str, DataFrameDescriber]#

Data frame describers keyed by the describer name.

format_tables()[source]#

See DataFrameDescriber.format_table().

classmethod from_yaml_file(path)[source]#

Create a data descriptor from previously written YAML/CSV files using save().

See:

save()

See:

DataFrameDescriber.from_table()

Return type:

DataDescriber
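
Continuing the sketch above, a previously saved descriptor might be reloaded as follows (the YAML file name is hypothetical and depends on the describer name):

from pathlib import Path
dd2 = DataDescriber.from_yaml_file(Path('results/config/scores.yml'))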

keys()[source]#
Return type:

Sequence[str]

mangle_sheet_name: bool = False#

Whether to normalize the Excel sheet names when xlsxwriter.exceptions.InvalidWorksheetName is raised.

name: str = 'default'#

The name of the dataset.

output_dir: Path = PosixPath('results')#

The directory where to write the results.

save(output_dir=None, yaml_dir=None, include_excel=False)[source]#

Save both the CSV and YAML configuration file.

Parameters:

include_excel (bool) – whether to also write the Excel file to its default output file name

See:

save_csv()

Return type:

List[Path]

See:

save_yaml()

save_csv(output_dir=None)[source]#

Save all provided dataframe describers to CSV files.

Parameters:

output_dir (Path) – the directory where to save the data

Return type:

List[Path]

save_excel(output_file=None)[source]#

Save all provided dataframe describers to an Excel file.

Parameters:

output_file (Path) – the Excel file to write, which needs an .xlsx extension; this defaults to a path created from output_dir and name

Return type:

Path

save_yaml(output_dir=None, yaml_dir=None)[source]#

Save YAML files for all provided dataframe describers, which are used by the datdesc command.

Parameters:

output_dir (Path) – the directory where to save the data

Return type:

List[Path]

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, df_params=None)[source]#
Parameters:

df_params (Dict[str, Any]) – the pandas formatting options, which defaults to max_colwidth=80

yaml_dir: Path = PosixPath('config')#

The directory where to write the YAML configuration files.

class zensols.datdesc.desc.DataFrameDescriber(name, df, desc, meta_path=None, meta=None, table_kwargs=<factory>, index_meta=None)[source]#

Bases: PersistableContainer, Dictable

A class that contains a Pandas dataframe, a description of the data, and descriptions of all the columns in that dataframe.

__init__(name, df, desc, meta_path=None, meta=None, table_kwargs=<factory>, index_meta=None)#
create_table(**kwargs)[source]#

Create a table from the metadata, using table_kwargs as default parameters.

Parameters:

kwargs – keyword arguments that override the default parameterized data passed to Table

Return type:

Table

property csv_path: Path#

The CSV file that contains the data this instance describes.

derive(*, name=None, df=None, desc=None, meta=None, index_meta=None)[source]#

Create a new instance based on this instance and replace any non-None kwargs.

If meta is provided, it is merged with the metadata of this instance. However, any metadata provided must match in both column names and descriptions.

Raises:

DataDescriptionError – if multiple metadata columns with differing descriptions are found

Return type:

DataFrameDescriber
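
Continuing the dfd instance from the earlier DataDescriber sketch, a hedged example of deriving a filtered describer (the filter and name are hypothetical):

# keep the matching metadata, replace the name and subset the dataframe
subset = dfd.derive(
    name='scores-svm',
    df=dfd.df[dfd.df['model'] == 'svm'])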

derive_with_index_meta(index_format=None)[source]#

Like derive(), but the dataframe is generated with df_with_index_meta() using index_format as a parameter.

Parameters:

index_format (str) – see df_with_index_meta()

Return type:

DataFrameDescriber

desc: str#

The description of the data frame.

df: DataFrame#

The dataframe to describe.

df_with_index_meta(index_format=None)[source]#

Create a dataframe with the first column containing index metadata. This uses index_meta to create the column values.

Parameters:

index_format (str) – the new index column format using index and value, which defaults to {index}

Return type:

DataFrame

Returns:

the dataframe with a new first column of the index metadata, or df if index_meta is None

format_table()[source]#

Replace (in place) dataframe df with the formatted table obtained from Table.formatted_dataframe. The Table is created with create_table().

classmethod from_table(tab)[source]#

Create a frame descriptor from a Table.

Return type:

DataFrameDescriber

index_meta: Dict[Any, str] = None#

The index metadata, which maps index values to descriptions of the respective row.

property meta: DataFrame#

The column metadata for dataframe, which needs columns name and description. If this is not provided, it is read from file meta_path. If this is set to a tuple of tuples, a dataframe is generated from the form:

((<column name 1>, <column description 1>),
 (<column name 2>, <column description 2>), ...)

If both this and meta_path are not provided, the following is used:

(('description', 'Description'),
 ('value', 'Value'))

meta_path: Optional[Path] = None#

A path from which to read the meta column metadata.

See:

meta

name: str#

The name of the data this describer holds.

property tab_name: str#

The table name derived from name.

table_kwargs: Dict[str, Any]#

Additional keyword arguments given when creating a table in create_table().

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, df_params=None)[source]#
Parameters:

df_params (Dict[str, Any]) – the pandas formatting options, which defaults to max_colwidth=80

write_pretty(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_metadata=False, title_format='{name} ({desc})', **tabulate_params)[source]#

Like write(), but generate a visually appealing table and optionally column metadata.

zensols.datdesc.domain#

Inheritance diagram of zensols.datdesc.domain

Domain classes used by the API.

exception zensols.datdesc.domain.DataDescriptionError[source]#

Bases: APIError

Thrown for any application level error.

__module__ = 'zensols.datdesc.domain'#
exception zensols.datdesc.domain.LatexTableError[source]#

Bases: DataDescriptionError

Thrown for any application level error related to creating tables.

__annotations__ = {}#
__module__ = 'zensols.datdesc.domain'#
class zensols.datdesc.domain.VariableParam(name, index_format='#{index}', value_format='\\{val}')[source]#

Bases: object

Represents a Latex command variable.

__init__(name, index_format='#{index}', value_format='\\{val}')#
index_format: str = '#{index}'#

Text to generate for the index number.

name: str#

The name of the variable.

value_format: str = '\\{val}'#

Text to generate for the value of the variable.

zensols.datdesc.hyperparam#

Inheritance diagram of zensols.datdesc.hyperparam

Hyperparameter metadata: access and documentation. This package was designed for the following purposes:

  • Provide a basic scaffolding to update model hyperparameters from packages such as hyperopt.

  • Generate LaTeX tables of the hyperparameters and their descriptions for academic papers.

The object instance graph hierarchy is: HyperparamSet → HyperparamModel → Hyperparam.

Access to the hyperparameters is done by calling the set or model levels with a dotted path notation string. For example, svm.C first navigates to model svm, then to the hyperparameter named C.
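
A minimal sketch of this access pattern (the YAML schema below is an assumption inferred from the Hyperparam fields, not a documented format):

from io import StringIO
from zensols.datdesc.hyperparam import HyperparamSetLoader

config = StringIO('''
svm:
  doc: a support vector machine
  params:
    C:
      type: float
      doc: the regularization parameter
      value: 1.0
''')
hset = HyperparamSetLoader(config).load()
print(hset.flatten())        # e.g. {'svm.C': 1.0} in dotted path notation
hset.update({'svm.C': 0.5})  # set a value by its dotted path key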

class zensols.datdesc.hyperparam.Hyperparam(name, type, doc, choices=None, value=None, interval=None)[source]#

Bases: Dictable

A hyperparameter’s metadata, documentation and value. The value is accessed (retrieval and setting) at runtime. Do not use this class explicitly. Instead use HyperparamModel.

The index access only applies when type is list or dict. Otherwise, the value member has the value of the hyperparameter.

CLASS_MAP: ClassVar[Dict[str, Type]] = {'bool': <class 'bool'>, 'choice': <class 'str'>, 'dict': <class 'dict'>, 'float': <class 'float'>, 'int': <class 'int'>, 'list': <class 'list'>, 'str': <class 'str'>}#

A mapping for values set in type to their Python class equivalents.

VALID_TYPES: ClassVar[str] = frozenset({'bool', 'choice', 'dict', 'float', 'int', 'list', 'str'})#

Valid settings for type.

__init__(name, type, doc, choices=None, value=None, interval=None)#
choices: Tuple[str, ...] = None#

When type is choice, the valid strings that may be used as value.

property cls: Type#

The Python equivalent class of type.

doc: str#

The human readable documentation for the hyperparameter. This is used in documentation generation tasks.

get_type_str(short=True)[source]#
Return type:

str

interval: Union[Tuple[float, float], Tuple[int, int]] = None#

The valid range for value as an inclusive interval.

property interval_str: str#
name: str#

The name of the hyperparameter (e.g. C or learning_rate).

type: str#

The type of value (e.g. float or int).

property value: str | float | int | bool | list | dict | None#

The value of the hyperparameter used in the application.

write_sphinx(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#
class zensols.datdesc.hyperparam.HyperparamContainer[source]#

Bases: Dictable

A container class for Hyperparam instances.

__init__()#
abstract flatten(deep=False)[source]#

Return a flattened dictionary with the dotted path notation (see module docs).

Parameters:

deep (bool) – if True, recurse into dict and list hyperparameter values

Return type:

Dict[str, Any]

update(params)[source]#

Update parameter values.

Parameters:

params (Union[Dict[str, Any], HyperparamContainer]) – a dict of dotted path notation keys

abstract write_sphinx(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#

Write the Sphinx autodoc text used in a class as a dataclasses.dataclass field.

exception zensols.datdesc.hyperparam.HyperparamError[source]#

Bases: DataDescriptionError

Raised for any error related to hyperparameter access.

__annotations__ = {}#
__module__ = 'zensols.datdesc.hyperparam'#
class zensols.datdesc.hyperparam.HyperparamModel(name, doc, desc=None, params=<factory>, table=None)[source]#

Bases: HyperparamContainer

The model level class that contains the parameters. This class represents a machine learning model such as an SVM with hyperparameters such as C and maximum iterations.

__init__(name, doc, desc=None, params=<factory>, table=None)#
clone()[source]#

Make a copy of this instance.

Return type:

HyperparamModel

create_dataframe_describer()[source]#

Return an object with metadata fully describing the hyperparameters of this model.

Return type:

DataFrameDescriber

desc: str = None#

The description of the model used in documentation generation when name is not sufficient. Since name has naming constraints, this can be used in its place.

doc: str#

The human readable documentation for the model.

flatten(deep=False)[source]#

Return a flattened dictionary with the dotted path notation (see module docs).

Parameters:

deep (bool) – if True, recurse into dict and list hyperparameter values

Return type:

Dict[str, Any]

get(name)[source]#
Return type:

HyperparamModel

property metadata_dataframe: DataFrame#

A dataframe describing the values_dataframe.

name: str#

The name of the model (e.g. svm). This name can have only alphanumeric and underscore characters.

params: Dict[str, Hyperparam]#

The hyperparameters keyed by their names.

table: Optional[Dict[str, Any]] = None#

Overriding data used when creating a Table from DataFrameDescriber.create_table().

property values_dataframe: DataFrame#

A dataframe with parameter data. This includes the name, type, value and documentation.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_doc=False)[source]#

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

write_sphinx(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#

Write the Sphinx autodoc text used in a class as a dataclasses.dataclass field.

class zensols.datdesc.hyperparam.HyperparamSet(models=<factory>, name=None)[source]#

Bases: HyperparamContainer

The top level in the object graph hierarchy (see module docs). This contains a set of models and is typically where packages such as hyperopt update the hyperparameters of the model(s).

__init__(models=<factory>, name=None)#
create_describer(meta_path=None)[source]#

Return an object with metadata fully describing the hyperparameters of this model.

Parameters:

meta_path (Path) – if provided, set the path on the returned instance

Return type:

DataDescriber

flatten(deep=False)[source]#

Return a flattened dictionary with the dotted path notation (see module docs).

Parameters:

deep (bool) – if True, recurse into dict and list hyperparameter values

Return type:

Dict[str, Any]

get(name)[source]#
Return type:

HyperparamModel

models: Dict[str, HyperparamModel]#

The models containing hyperparameters for this set.

name: Optional[str] = None#

The name of the hyperparameter set.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_doc=False)[source]#

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

write_sphinx(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#

Write the Sphinx autodoc text used in a class as a dataclasses.dataclass field.

class zensols.datdesc.hyperparam.HyperparamSetLoader(data, config=None, updates=())[source]#

Bases: object

Loads a set of hyperparameters from a YAML pathlib.Path, dict or stream io.TextIOBase.

__init__(data, config=None, updates=())#
config: Configurable = None#

The application configuration used to update the hyperparameters from other sections.

data: Union[Dict[str, Any], Path, TextIOBase]#

The source of data to load, which is a YAML pathlib.Path, dict or stream io.TextIOBase.

See:

updates

load(**kwargs) → HyperparamSet#

Load and return the hyperparameter object graph from data.

Return type:

HyperparamSet

updates: Sequence[Dict[str, Any]] = ()#

A sequence of dictionaries with keys as HyperparamModel names and values as sections with values to set after loading using data.
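
Continuing the loader sketch in the module docs (config is the same hypothetical YAML stream), an override for svm.C might look like:

loader = HyperparamSetLoader(
    config, updates=({'svm': {'C': 0.1}},))
hset = loader.load()  # svm.C is set to 0.1 after loading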

exception zensols.datdesc.hyperparam.HyperparamValueError[source]#

Bases: HyperparamError

Raised for bad values set on a hyperparameter.

__annotations__ = {}#
__module__ = 'zensols.datdesc.hyperparam'#

zensols.datdesc.mng#

Inheritance diagram of zensols.datdesc.mng

Contains the manager classes that generate the tables.

class zensols.datdesc.mng.CsvToLatexTable(tables, package_name)[source]#

Bases: Writable

Generate a Latex table from a CSV file.

__init__(tables, package_name)#
package_name: str#

The name of the Latex .sty package.

tables: List[Table]#

A list of Table instances from which to create Latex table definitions.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#

Write the Latex table to the given writer.

class zensols.datdesc.mng.TableFileManager(table_path)[source]#

Bases: object

Reads the table definitions file and writes a Latex .sty file of the generated tables from the CSV data.

__init__(table_path)#
property package_name: str#
table_path: Path#

The path to the table YAML definitions file.

property tables: List[Table]#
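
A hedged sketch tying the two classes together (the YAML file name is hypothetical):

import sys
from pathlib import Path
from zensols.datdesc.mng import CsvToLatexTable, TableFileManager

mng = TableFileManager(table_path=Path('model-table.yml'))
# generate the .sty content for all tables defined in the YAML file
CsvToLatexTable(tables=mng.tables,
                package_name=mng.package_name).write(writer=sys.stdout)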

zensols.datdesc.opt#

Inheritance diagram of zensols.datdesc.opt

Contains container and utility classes for hyperparameter optimization. These classes find optimal hyperparameters for a model and save the results as JSON files. This module is meant to be used by command line applications configured as Zensols Resource libraries.

see:

Resource libraries
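
A minimal sketch of the call sequence (the experiment name is hypothetical; in practice a subclass such as ScoringHyperparameterOptimizer supplies the objective):

from zensols.datdesc.opt import HyperparameterOptimizer

opt = HyperparameterOptimizer(name='svm-tuning', max_evals=50)
opt.optimize()           # run the optimization algorithm
opt.write_best_result()  # print the results from the best run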

class zensols.datdesc.opt.CompareResult(initial_param, initial_loss, initial_scores, best_eval_ix, best_param, best_loss, best_scores)[source]#

Bases: Dictable

Contains the loss and scores of an initial run and a run found on the optimal hyperparameters.

__init__(initial_param, initial_loss, initial_scores, best_eval_ix, best_param, best_loss, best_scores)#
best_eval_ix: int#

The evaluation index of the best run.

best_loss: float#

The optimized loss.

best_param: Dict[str, Any]#

The optimized hyperparameters.

best_scores: DataFrame#

The optimized scores.

initial_loss: float#

The initial loss.

initial_param: Dict[str, Any]#

The initial hyperparameters.

initial_scores: DataFrame#

The initial scores.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

class zensols.datdesc.opt.HyperparamResult(name, hyp, scores, loss, eval_ix)[source]#

Bases: Dictable

Results of an optimization and optionally the best fit.

__init__(name, hyp, scores, loss, eval_ix)#
eval_ix: int#

The index of the optimization.

classmethod from_file(path)[source]#

Restore a result from a file name.

Parameters:

path (Path) – the path from which to restore

Return type:

HyperparamResult

hyp: HyperparamModel#

The updated hyperparameters.

loss: float#

The last loss.

name: str#

The name of the HyperparameterOptimizer, which is the directory name.

scores: DataFrame#

The last score results computed during the optimization.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable

class zensols.datdesc.opt.HyperparamRun(runs)[source]#

Bases: Dictable

A container for the entire optimization run. The best run contains the best fit (HyperparamResult) as predicted by the hyperparameter optimization algorithm.

__init__(runs)#
property best_result: HyperparamResult#

The result that had the lowest loss.

property final: HyperparamResult#

The results of the final run, which has the best fit (see class docs).

property final_path: Path#

The path of the final run.

classmethod from_dir(path)[source]#

Return an instance with the runs stored in directory path.

Return type:

HyperparamRun

property initial_loss: float#

The loss from the first run.

property loss_stats: Dict[str, float]#

The loss statistics (min, max, ave, etc).

property losses: Tuple[float]#

The loss value for all runs.

runs: Tuple[Tuple[Path, HyperparamResult]]#

The results from previous runs.
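
A hedged sketch of inspecting a completed run from disk (the directory is hypothetical; see HyperparameterOptimizer.opt_intermediate_dir for where results are stored):

from pathlib import Path
from zensols.datdesc.opt import HyperparamRun

run = HyperparamRun.from_dir(Path('opthyper/default'))
print(run.best_result.loss)  # the result with the lowest loss
print(run.loss_stats)        # min, max, ave, etc.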

class zensols.datdesc.opt.HyperparameterOptimizer(name='default', hyperparam_names=(), max_evals=1, show_progressbar=True, intermediate_dir=PosixPath('opthyper'), baseline_path=None)[source]#

Bases: object

Creates the files used to score optimizer output.

__init__(name='default', hyperparam_names=(), max_evals=1, show_progressbar=True, intermediate_dir=PosixPath('opthyper'), baseline_path=None)#
property aggregate_score_dir: Path#

The output directory containing runs with the best parameters of the top N results (see aggregate_scores()).

aggregate_scores()[source]#

Aggregate best score results as a separate CSV file for each data point with get_score_dataframe(). This is saved as a separate file for each optimization run since this method can take a long time as it re-scores the dataset. These results are then “stitched” together with gather_aggregate_scores().

baseline_path: Path = None#

A JSON file with hyperparameter settings to set on start. This file contains the output portion of the final.json results (which are the results parsed and set in HyperparamResult).

property config_factory: ConfigFactory#

The app config factory.

gather_aggregate_scores()[source]#

Return a dataframe of all the aggregate scores written by aggregate_scores().

Return type:

DataFrame

get_best_result()[source]#
Return type:

HyperparamResult

get_best_results()[source]#

Return the best results across all hyperparameter optimization runs with keys as run names.

Return type:

Dict[str, HyperparamResult]

get_comparison()[source]#

Compare the scores of the default parameters with those predicted by the optimizer of the best run.

Return type:

CompareResult

get_run(result_dir=None)[source]#

Get the best run from the file system.

Parameters:

result_dir (Path) – the result directory, which defaults to opt_intermediate_dir

Return type:

HyperparamRun

get_score_dataframe(iterations=None)[source]#

Create a dataframe from the results scored from the best hyperparameters.

Parameters:

iterations (int) – the number of times the objective is called to produce the results (the objective space is not altered)

Return type:

DataFrame

hyperparam_names: Tuple[str, ...] = ()#

The names of the hyperparameters used to create the space.

See:

_create_space()

property hyperparams: HyperparamModel#

The model hyperparameters to be updated by the optimizer.

intermediate_dir: Path = PosixPath('opthyper')#

The directory where the intermediate results are saved while the algorithm works.

max_evals: int = 1#

The maximum number of evaluations of the hyperparameter optimization algorithm to execute.

name: str = 'default'#

The name of the optimization experiment set. This has a bearing on where files are stored (see opt_intermediate_dir).

property opt_intermediate_dir: Path#

The optimization result directory for the config/parser.

optimize()[source]#

Run the optimization algorithm.

remove_result()[source]#

Remove an entire run’s previous optimization results.

property results_intermediate_dir: Path#

The directory that has all intermediate results by subdirectory name.

show_progressbar: bool = True#

Whether or not to show the progress bar while running the optimization.

write_best_result(writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_param_json=False)[source]#

Print the results from the best run.

Parameters:

include_param_json (bool) – whether to output the JSON formatted hyperparameters

write_compare(writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#

Write the results of comparing the initial hyperparameters against the optimized ones.

write_score(writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#

Restore the hyperparameter state, score the data and print the results. Use the baseline parameters if available; otherwise use the parameters from the best run.

Return type:

HyperparamResult

write_scores(output_file=None, iterations=None)[source]#

Write a file of the results scored from the best hyperparameters.

Parameters:
  • output_file (Path) – where to write the CSV file; defaults to a file in opt_intermediate_dir

  • iterations (int) – the number of times the objective is called to produce the results (the objective space is not altered)

zensols.datdesc.optscore#

Inheritance diagram of zensols.datdesc.optscore

An optimizer that uses a Scorer as an objective.

class zensols.datdesc.optscore.ScoringHyperparameterOptimizer(name='default', hyperparam_names=(), max_evals=1, show_progressbar=True, intermediate_dir=PosixPath('opthyper'), baseline_path=None)[source]#

Bases: HyperparameterOptimizer

An optimizer that uses a Scorer as the objective and as a means to determine the loss. The default loss function is defined as 1 - F1 using the f1_score column of as_dataframe().

__init__(name='default', hyperparam_names=(), max_evals=1, show_progressbar=True, intermediate_dir=PosixPath('opthyper'), baseline_path=None)#
property scorer: Scorer#

zensols.datdesc.table#

Inheritance diagram of zensols.datdesc.table

This module contains classes that generate tables.

class zensols.datdesc.table.LongTable(path, name, caption, head=None, placement=None, size='normalsize', uses=('zentable', ), single_column=True, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, column_evals=<factory>, read_kwargs=<factory>, write_kwargs=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, df_code=None, df_code_pre=None, df_code_exec=None, df_code_exec_pre=None, slack_col=0)[source]#

Bases: SlackTable

__init__(path, name, caption, head=None, placement=None, size='normalsize', uses=('zentable', ), single_column=True, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, column_evals=<factory>, read_kwargs=<factory>, write_kwargs=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, df_code=None, df_code_pre=None, df_code_exec=None, df_code_exec_pre=None, slack_col=0)#
property latex_environment#

Return the latex environment for the table.

write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOWrapper) – the writer to dump the content of this writable

class zensols.datdesc.table.SlackTable(path, name, caption, head=None, placement=None, size='normalsize', uses=('zentable', ), single_column=True, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, column_evals=<factory>, read_kwargs=<factory>, write_kwargs=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, df_code=None, df_code_pre=None, df_code_exec=None, df_code_exec_pre=None, slack_col=0)[source]#

Bases: Table

An instance of the table that fills up space based on the widest column.

__init__(path, name, caption, head=None, placement=None, size='normalsize', uses=('zentable', ), single_column=True, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, column_evals=<factory>, read_kwargs=<factory>, write_kwargs=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, df_code=None, df_code_pre=None, df_code_exec=None, df_code_exec_pre=None, slack_col=0)#
property columns: str#

Return the columns field in the Latex environment header.

property latex_environment#

Return the latex environment for the table.

slack_col: int = 0#

Which column elastically grows or shrinks to make the table fit.

class zensols.datdesc.table.Table(path, name, caption, head=None, placement=None, size='normalsize', uses=('zentable', ), single_column=True, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, column_evals=<factory>, read_kwargs=<factory>, write_kwargs=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, df_code=None, df_code_pre=None, df_code_exec=None, df_code_exec_pre=None)[source]#

Bases: PersistableContainer, Dictable

Generates a Zensols-styled Latex table from a CSV file.
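
A minimal sketch (the CSV path, name, and caption are hypothetical):

from zensols.datdesc.table import Table

tab = Table(
    path='results/scores.csv',
    name='scorestab',  # also used as the LaTeX label
    caption='Model performance scores.')
print(tab.header)               # the Latex environment header
print(tab.formatted_dataframe)  # the dataframe used to render the table

Tables are more commonly declared in YAML (see the example in the module contents below) and rendered through zensols.datdesc.mng.CsvToLatexTable.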

__init__(path, name, caption, head=None, placement=None, size='normalsize', uses=('zentable', ), single_column=True, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, column_evals=<factory>, read_kwargs=<factory>, write_kwargs=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, df_code=None, df_code_pre=None, df_code_exec=None, df_code_exec_pre=None)#
blank_columns: List[int]#

A list of column indexes to set to the empty string (e.g. the 0th to fix Unnamed: 0 issues).

bold_cells: List[Tuple[int, int]]#

A list of row/column cells to bold.

bold_max_columns: List[str]#

A list of column names that will have their maximum values bolded.

capitalize_columns: Dict[str, bool]#

Capitalize either sentences (False values) or every word (True values). The keys are column names.

caption: str#

The human readable string used as the caption of the table.

column_aligns: str = None#

The alignment/justification (i.e. |l|l| for two columns). If not provided, they are automatically generated based on the columns of the table.

column_evals: Dict[str, str]#

Keys are column names with values as functions (e.g. lambda expressions) evaluated with a single column value parameter. The return value replaces the column identified by the key.

column_keeps: Optional[List[str]] = None#

If provided, only keep the columns in the list.

column_removes: List[str]#

The name of the columns to remove from the table, if any.

column_renames: Dict[str, str]#

Columns to rename, if any.

column_value_replaces: Dict[str, Dict[Any, Any]]#

Data values to replace in the dataframe. It is keyed by the column name and values are the replacements. Each value is a dict with original value keys and the replacements as values.

property columns: str#

Return the columns field in the Latex environment header.

property dataframe: DataFrame#

The Pandas dataframe that holds the CSV data.

df_code: str = None#

Python code executed that manipulates the table’s dataframe. The code has a local df variable and the returned value is used as the replacement. This is usually a one-liner used to subset the data etc. The code is evaluated with eval().

df_code_exec: str = None#

Like df_code but invoke with exec() instead of eval().

df_code_exec_pre: str = None#

Like df_code_pre but invoke with exec() instead of eval().

df_code_pre: str = None#

Like df_code but right after the source data is read and before any modifications. The code is evaluated with eval().

double_hlines: Sequence[int]#

Indexes of rows to put double horizontal line breaks.

static format_thousand(x, apply_k=True, add_comma=True)[source]#

Format a number as a string with comma separating thousands.

Parameters:
  • x (int) – the number to format

  • apply_k (bool) – add a K to the end of large numbers

  • add_comma (bool) – whether to add a comma

Return type:

str

format_thousands_column_names: Dict[str, Optional[Dict[str, Any]]]#

Columns to format with thousands separators. The keys are the column names of the table and the values are either None or the keyword arguments to format_thousand().

property formatted_dataframe: DataFrame#

The dataframe with the formatting applied to it, used to create the Latex table. Modifications such as string replacements for adding percents are applied.

get_cmd_args(add_brackets)[source]#
Return type:

Dict[str, str]

get_params(add_brackets)[source]#

Return the parameters used for creating the table.

Return type:

Dict[str, str]

head: str = None#

The header to use for the table, which is used as the text in the list of tables and made bold in the table.

property header: str#

The Latex environment header.

hlines: Sequence[int]#

Indexes of rows to put horizontal line breaks.

index_col_name: str = None#

If set, add an index column with the given name.

property latex_environment: str#

Return the latex environment for the table.

make_percent_column_names: Dict[str, int]#

Each column in the map is converted to a percentage (multiplied by 100) and rounded to the number of decimal places given as the value. For example, {'ann_per': 3} rounds column ann_per to 3 decimal places.

name: str#

The name of the table, also used as the label.

path: Union[Path, str]#

The path to the CSV file from which to create the Latex table.

percent_column_names: Sequence[str] = ()#

Column names that have a percent sign to be escaped.

placement: str = None#

The placement of the table.

read_kwargs: Dict[str, str]#

Keyword arguments used in the read_csv() call when reading the CSV file.

replace_nan: str = None#

Replace NaN values with the value of this field, since tabulate() does not appear to honor the missing value setting.

serialize()[source]#

Return a data structure usable for YAML or JSON output by flattening Python objects.

Return type:

Dict[str, Any]

single_column: bool = True#

Makes the table one column wide in a two-column document. Setting this to false generates a table* environment spanning both columns, which won’t work in beamer (slides) document types.

size: str = 'normalsize'#

The size of the table, and one of:

  • Huge

  • huge

  • LARGE

  • Large

  • large

  • normalsize (default)

  • small

  • footnotesize

  • scriptsize

  • tiny

uses: Sequence[str] = ('zentable',)#

Comma-separated list of packages to use.

property var_args: Tuple[str]#
write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#

Write this instance as either a Writable or as a Dictable. If class attribute _DICTABLE_WRITABLE_DESCENDANTS is set as True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.

If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.

Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.

Parameters:
  • depth (int) – the starting indentation depth

  • writer (TextIOWrapper) – the writer to dump the content of this writable

write_kwargs: Dict[str, str]#

Keyword arguments used in the tabulate() call when writing the table. The default tells tabulate to not parse/format numerical data.

Module contents#

Generate Latex tables in a .sty file from CSV files. The paths to the CSV files to create tables from, along with their metadata, are given in a YAML configuration file.

Example:

latextablenamehere:
  type: slack
  slack_col: 0
  path: ../config/table-name.csv
  caption: Some Caption
  placement: t!
  size: small
  single_column: true
  percent_column_names: ['Proportion']