zensols.datdesc package¶
Submodules¶
zensols.datdesc.app module¶
Generate LaTeX tables in a .sty file from CSV files. The paths to the CSV files to create tables from and their metadata are given in a YAML configuration file. Parameters are either both files or both directories. When using directories, only files that match *-table.yml are considered.
- class zensols.datdesc.app.Application(table_factory, hyperparam_file_regex=re.compile('^.+-hyperparam\\.yml$'), hyperparam_table_default=None, data_file_regex=re.compile('^.+-table\\.yml$'))[source]¶
Bases:
object
Generate LaTeX table files from CSV files and hyperparameter .sty files.
- __init__(table_factory, hyperparam_file_regex=re.compile('^.+-hyperparam\\.yml$'), hyperparam_table_default=None, data_file_regex=re.compile('^.+-table\\.yml$'))¶
- data_file_regex: Pattern = re.compile('^.+-table\\.yml$')¶
Matches file names of tables to process in the LaTeX output.
- generate_hyperparam(input_path, output_path, output_format=_OutputFormat.short)[source]¶
Write hyperparameter formatted data.
- hyperparam_file_regex: Pattern = re.compile('^.+-hyperparam\\.yml$')¶
Matches file names of hyperparameter files to process in the LaTeX output.
- show_table(name=None)[source]¶
Print a list of example LaTeX tables.
- Parameters: name (str) – the name of the example table, or a listing of tables if omitted
- table_factory: TableFactory¶
Reads the table definitions file and writes a LaTeX .sty file of the generated tables from the CSV data.
zensols.datdesc.cli module¶
Command line entry point to the application.
- class zensols.datdesc.cli.ApplicationFactory(*args, **kwargs)[source]¶
Bases:
ApplicationFactory
zensols.datdesc.desc module¶
Metadata container classes.
- class zensols.datdesc.desc.DataDescriber(describers, name='default', output_dir=PosixPath('results'), csv_dir=PosixPath('csv'), yaml_dir=PosixPath('config'), mangle_sheet_name=False)[source]¶
Bases: PersistableContainer, Dictable
Container class for DataFrameDescriber instances. It also saves their instances as CSV data files and YAML configuration files.
- __init__(describers, name='default', output_dir=PosixPath('results'), csv_dir=PosixPath('csv'), yaml_dir=PosixPath('config'), mangle_sheet_name=False)¶
- add_summary()[source]¶
Add a new metadata-like DataFrameDescriber as the first entry in describers that describes what data this instance currently has.
- Return type: DataFrameDescriber
- Returns: the added metadata DataFrameDescriber instance
- describers: Tuple[DataFrameDescriber, ...]¶
The contained dataframes and metadata.
- property describers_by_name: Dict[str, DataFrameDescriber]¶
Data frame describers keyed by the describer name.
- classmethod from_yaml_file(path)[source]¶
Create a data descriptor from previously written YAML/CSV files using save().
- Return type: DataDescriber
- mangle_sheet_name: bool = False¶
Whether to normalize the Excel sheet names when xlsxwriter.exceptions.InvalidWorksheetName is raised.
- save(output_dir=None, yaml_dir=None, include_excel=False)[source]¶
Save both the CSV and YAML configuration file.
- Parameters: include_excel (bool) – whether to also write the Excel file to its default output file name
- See: save_yaml()
- save_excel(output_file=None)[source]¶
Save all provided dataframe describers to an Excel file.
- Parameters: output_file (Path) – the Excel file to write, which needs an .xlsx extension; this defaults to a path created from output_dir and name
- save_yaml(output_dir=None, yaml_dir=None)[source]¶
Save all provided dataframe describers as YAML files used by the datdesc command.
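Example (a minimal usage sketch; the dataframe contents and the restore path are illustrative assumptions, not part of the API):

    from pathlib import Path
    import pandas as pd
    from zensols.datdesc.desc import DataDescriber, DataFrameDescriber

    # a dataframe with a table description and per-column descriptions
    dfd = DataFrameDescriber(
        name='scores',
        df=pd.DataFrame({'run': ['a', 'b'], 'f1': [0.82, 0.87]}),
        desc='model scores by run',
        meta=(('run', 'the run identifier'),
              ('f1', 'the macro F1 score')))

    # the container writes the CSV data and YAML configuration files
    dd = DataDescriber(describers=(dfd,), name='scores-example')
    dd.save()

    # restore from the previously written files; the exact YAML path
    # depends on output_dir/yaml_dir (this one is an assumption)
    restored = DataDescriber.from_yaml_file(Path('results/config/scores.yml'))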
- class zensols.datdesc.desc.DataFrameDescriber(name, df, desc, head=None, meta_path=None, meta=None, table_kwargs=<factory>, index_meta=None)[source]¶
Bases: PersistableContainer, Dictable
A class that contains a Pandas dataframe, a description of the data, and descriptions of all the columns in that dataframe.
- property T: DataFrameDescriber¶
See transpose().
- __init__(name, df, desc, head=None, meta_path=None, meta=None, table_kwargs=<factory>, index_meta=None)¶
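Example (a sketch of the derivation methods, reusing the dfd instance from the earlier example; the new names are illustrative):

    # replace selected fields; a provided ``meta`` would be merged
    dfd2 = dfd.derive(name='scores-v2', desc='model scores, second pass')

    # swap rows for columns, exchanging ``meta`` with ``index_meta``
    transposed = dfd.transpose()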
- derive(*, name=None, df=None, desc=None, meta=None, index_meta=None)[source]¶
Create a new instance based on this instance, replacing any non-None kwargs.
If meta is provided, it is merged with the metadata of this instance. However, any metadata provided must match in both column names and descriptions.
- derive_with_index_meta(index_format=None)[source]¶
Like derive(), but the dataframe is generated with df_with_index_meta() using index_format as a parameter.
- Parameters: index_format (str) – see df_with_index_meta()
- Return type: DataFrameDescriber
- df_with_index_meta(index_format=None)[source]¶
Create a dataframe with the first column containing index metadata. This uses index_meta to create the column values.
- Parameters: index_format (str) – the new index column format using index and value, which defaults to {index}
- Return type: DataFrame
- Returns: the dataframe with a new first column of the index metadata, or df if index_meta is None
- format_table()[source]¶
Replace (in place) dataframe df with the formatted table obtained with Table.formatted_dataframe. The Table is created with create_table().
- classmethod from_columns(source, name=None, desc=None)[source]¶
Create a new instance by transposing column data into a new dataframe describer. If source is a dataframe, it must have the required columns. Otherwise, each element of the sequence is a row of column name, meta description, and data sequence.
- head: str = None¶
A short summary of the table, used in Table.head.
- index_meta: Dict[Any, str] = None¶
The index metadata, which maps index values to descriptions of the respective row.
- property meta: DataFrame¶
The column metadata for dataframe, which needs columns name and description. If this is not provided, it is read from the file meta_path. If this is set to a tuple of tuples, a dataframe is generated from the form:
((<column name 1>, <column description 1>), (<column name 2>, <column description 2>), ...)
If neither this nor meta_path is provided, the following is used:
(('description', 'Description'), ('value', 'Value'))
- save_excel(output_dir=PosixPath('.'))[source]¶
Save as an Excel file using csv_path. The same file naming semantics are used as with DataDescriber.save_excel().
- table_kwargs: Dict[str, Any]¶
Additional keyword arguments given when creating a table in create_table().
- transpose(row_names=((0, 'value', 'Value'),), name_column='name', name_description='Name', index_column='description')[source]¶
Transpose all data in this descriptor by transposing df and swapping meta with index_meta as a new instance.
- Parameters:
row_names (Tuple[int, str, str]) – a tuple of (row index in df, the column in the new df, the metadata description of that column in the new df); the default takes only the first row
description_column – the column description of this instance's df
index_column (str) – the name of the new index in the returned instance
- Return type: DataFrameDescriber
- Returns: a new derived instance of the transposed data
zensols.datdesc.dfstash module¶
A stash implementation that uses a Pandas dataframe stored as a CSV file.
- class zensols.datdesc.dfstash.DataFrameStash(path, dataframe=None, key_column='key', columns=('value',), mkdirs=True, auto_commit=True, single_column_index=0)[source]¶
Bases:
CloseableStash
A backing stash that persists to a CSV file via a Pandas dataframe. All modifications go through the pandas.DataFrame and are then saved with commit() or close().
- __init__(path, dataframe=None, key_column='key', columns=('value',), mkdirs=True, auto_commit=True, single_column_index=0)¶
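Example (a usage sketch; dump() is assumed from the standard zensols Stash API rather than documented in this module, and the path is illustrative):

    from pathlib import Path
    from zensols.datdesc.dfstash import DataFrameStash

    # keys and single-column values persist to the CSV file at ``path``
    stash = DataFrameStash(path=Path('target/results.csv'))
    stash.dump('run-1', 0.82)
    assert stash.exists('run-1')
    print(stash.get('run-1'))
    stash.close()  # flushes any outstanding modifications to the CSV file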
- clear()[source]¶
Delete all data from the stash.
Important: Exercise caution with this method, of course.
- columns: Tuple[str, ...] = ('value',)¶
The columns to create in the spreadsheet. These must be consistent when the data is restored.
- property dataframe: DataFrame¶
The dataframe to proxy in memory. This is settable on instantiation but read-only afterward. If this is not set, an empty dataframe is created with the metadata in this class.
- delete(name=None)[source]¶
Delete the resource for data pointed to by name, or the entire resource if name is not given.
- exists(name)[source]¶
Return True if data with key name exists.
Implementation note: this Stash.exists() method is very inefficient and should be overridden.
- Return type: bool
- get(name, default=None)[source]¶
Load an object or a default if key name doesn't exist. Semantically, this method tries not to re-create the data if it already exists. This means that if a stash has built-in caching mechanisms, this method uses it.
- load(name)[source]¶
Load a data value from the pickled data with key name. Semantically, this method loads the data using the stash's implementation. For example, DirectoryStash loads the data from a file if it exists, but factory type stashes will always re-generate the data.
- mkdirs: bool = True¶
Whether to recursively create the directory where path is stored if it does not already exist.
zensols.datdesc.hyperparam module¶
Hyperparameter metadata: access and documentation. This package was designed for the following purposes:
Provide basic scaffolding to update model hyperparameters from optimization packages such as hyperopt.
Generate LaTeX tables of the hyperparameters and their descriptions for academic papers.
The object instance graph hierarchy is: a HyperparamSet contains HyperparamModel instances, each of which contains Hyperparam instances.
Access to the hyperparameters is done by calling the set or model levels with a dotted path notation string. For example, svm.C first navigates to model svm, then to the hyperparameter named C.
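For example (a sketch assuming an already loaded HyperparamSet named hyp_set, and that calls return hyperparameter values as described above):

    # set level: navigate to model ``svm``, then to hyperparameter ``C``
    c_value = hyp_set('svm.C')

    # model level: models are keyed by name in the ``models`` attribute
    svm = hyp_set.models['svm']
    c_value = svm('C')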
- class zensols.datdesc.hyperparam.Hyperparam(name, type, doc, choices=None, value=None, interval=None)[source]¶
Bases:
Dictable
A hyperparameter's metadata, documentation and value. The value is accessed (retrieval and setting) at runtime. Do not use this class explicitly; instead use HyperparamModel.
Index access only applies when type is list or dict. Otherwise, the value member has the value of the hyperparameter.
- CLASS_MAP: ClassVar[Dict[str, Type]] = {'bool': <class 'bool'>, 'choice': <class 'str'>, 'dict': <class 'dict'>, 'float': <class 'float'>, 'int': <class 'int'>, 'list': <class 'list'>, 'str': <class 'str'>}¶
A mapping from values set in type to their Python class equivalents.
- VALID_TYPES: ClassVar[str] = frozenset({'bool', 'choice', 'dict', 'float', 'int', 'list', 'str'})¶
Valid settings for type.
- __init__(name, type, doc, choices=None, value=None, interval=None)¶
- doc: str¶
The human readable documentation for the hyperparameter. This is used in documentation generation tasks.
- interval: Union[Tuple[float, float], Tuple[int, int]] = None¶
The valid interval for value, inclusive.
- class zensols.datdesc.hyperparam.HyperparamContainer[source]¶
Bases:
Dictable
A container class for Hyperparam instances.
- __init__()¶
- abstract flatten(deep=False)[source]¶
Return a flattened dictionary with the dotted path notation (see module docs).
- exception zensols.datdesc.hyperparam.HyperparamError[source]¶
Bases:
DataDescriptionError
Raised for any error related to hyperparameter access.
- __module__ = 'zensols.datdesc.hyperparam'¶
- class zensols.datdesc.hyperparam.HyperparamModel(name, doc, desc=None, params=<factory>, table=None)[source]¶
Bases:
HyperparamContainer
The model level class that contains the parameters. This class represents a machine learning model such as an SVM with hyperparameters such as C and maximum iterations.
- __init__(name, doc, desc=None, params=<factory>, table=None)¶
- create_dataframe_describer()[source]¶
Return an object with metadata fully describing the hyperparameters of this model.
- Return type: DataFrameDescriber
- desc: str = None¶
The description of the model used in the documentation when name is not sufficient. Since name has naming constraints, this can be used in its place during documentation generation.
- flatten(deep=False)[source]¶
Return a flattened dictionary with the dotted path notation (see module docs).
- property metadata_dataframe: DataFrame¶
A dataframe describing the
values_dataframe
.
- name: str¶
The name of the model (i.e. svm). This name can have only alphanumeric and underscore characters.
- params: Dict[str, Hyperparam]¶
The hyperparameters keyed by their names.
- table: Optional[Dict[str, Any]] = None¶
Overriding data used when creating a Table from DataFrameDescriber.create_table().
- property values_dataframe: DataFrame¶
A dataframe with parameter data. This includes the name, type, value and documentation.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_doc=False)[source]¶
Write this instance as either a Writable or as a Dictable. If the class attribute _DICTABLE_WRITABLE_DESCENDANTS is set to True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.datdesc.hyperparam.HyperparamSet(models=<factory>, name=None)[source]¶
Bases:
HyperparamContainer
The top level in the object graph hierarchy (see module docs). This contains a set of models and is typically where calls by packages such as hyperopt are used to update the hyperparameters of the model(s).
- __init__(models=<factory>, name=None)¶
- create_describer(meta_path=None)[source]¶
Return an object with metadata fully describing the hyperparameters of this model.
- Parameters: meta_path (Path) – if provided, set the path on the returned instance
- Return type: DataDescriber
- flatten(deep=False)[source]¶
Return a flattened dictionary with the dotted path notation (see module docs).
- models: Dict[str, HyperparamModel]¶
The models containing hyperparameters for this set.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_doc=False)[source]¶
Write this instance as either a Writable or as a Dictable. If the class attribute _DICTABLE_WRITABLE_DESCENDANTS is set to True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.datdesc.hyperparam.HyperparamSetLoader(data, config=None, updates=())[source]¶
Bases:
object
Loads a set of hyperparameters from a YAML pathlib.Path, dict or stream io.TextIOBase.
- __init__(data, config=None, updates=())¶
- config: Configurable = None¶
The application configuration used to update the hyperparameters from other sections.
- data: Union[Dict[str, Any], Path, TextIOBase]¶
The source of data to load, which is a YAML pathlib.Path, dict or stream io.TextIOBase.
- load(**kwargs) HyperparamSet ¶
Load and return the hyperparameter object graph from data.
- Return type: HyperparamSet
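Example (a loading sketch; the dict schema shown is an assumption inferred from the Hyperparam fields type, doc, value and interval):

    from zensols.datdesc.hyperparam import HyperparamSetLoader

    data = {
        'svm': {
            'doc': 'support vector machine',
            'params': {
                'C': {'type': 'float',
                      'doc': 'regularization parameter',
                      'value': 1.0,
                      'interval': [0.0, 10.0]},
            },
        },
    }
    hyp_set = HyperparamSetLoader(data).load()
    print(hyp_set('svm.C'))  # prints the value set above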
- exception zensols.datdesc.hyperparam.HyperparamValueError[source]¶
Bases:
HyperparamError
Raised for bad values set on a hyperparameter.
- __annotations__ = {}¶
- __module__ = 'zensols.datdesc.hyperparam'¶
zensols.datdesc.latex module¶
Contains the manager classes that invoke table generation.
- class zensols.datdesc.latex.CsvToLatexTable(tables, package_name)[source]¶
Bases:
Writable
Generate a LaTeX table from a CSV file.
- __init__(tables, package_name)¶
- class zensols.datdesc.latex.LatexTable(path, name, template, caption='', head=None, type=None, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, column_evals=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None)[source]¶
Bases:
Table
This subclass generates LaTeX tables.
- __init__(path, name, template, caption='', head=None, type=None, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, column_evals=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None)¶
- class zensols.datdesc.latex.SlackTable(path, name, template, caption='', head=None, type=None, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, column_evals=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None, slack_column=0)[source]¶
Bases:
LatexTable
A table that fills up space based on the widest column.
- __init__(path, name, template, caption='', head=None, type=None, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, column_evals=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None, slack_column=0)¶
zensols.datdesc.opt module¶
Contains container and utility classes for hyperparameter optimization. These classes find optimal hyperparameters for a model and save the results as JSON files. This module is meant to be used by command line applications configured as Zensols resource libraries.
- class zensols.datdesc.opt.CompareResult(initial_param, initial_loss, initial_scores, best_eval_ix, best_param, best_loss, best_scores)[source]¶
Bases:
Dictable
Contains the loss and scores of an initial run and a run found on the optimal hyperparameters.
- __init__(initial_param, initial_loss, initial_scores, best_eval_ix, best_param, best_loss, best_scores)¶
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a Writable or as a Dictable. If the class attribute _DICTABLE_WRITABLE_DESCENDANTS is set to True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.datdesc.opt.HyperparamResult(name, hyp, scores, loss, eval_ix)[source]¶
Bases:
Dictable
Results of an optimization and optionally the best fit.
- __init__(name, hyp, scores, loss, eval_ix)¶
- classmethod from_file(path)[source]¶
Restore a result from a file name.
- Parameters: path (Path) – the path from which to restore
- Return type: HyperparamResult
- hyp: HyperparamModel¶
The updated hyperparameters.
- name: str¶
The name of the HyperparameterOptimizer, which is the directory name.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a Writable or as a Dictable. If the class attribute _DICTABLE_WRITABLE_DESCENDANTS is set to True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.datdesc.opt.HyperparamRun(runs)[source]¶
Bases:
Dictable
A container for the entire optimization run. The best run contains the best fit (HyperparamResult) as predicted by the hyperparameter optimization algorithm.
- __init__(runs)¶
- property best_result: HyperparamResult¶
The result that had the lowest loss.
- property final: HyperparamResult¶
The results of the final run, which has the best fit (see class docs).
- classmethod from_dir(path)[source]¶
Return an instance with the runs stored in directory path.
- Return type: HyperparamRun
- runs: Tuple[Tuple[Path, HyperparamResult]]¶
The results from previous runs.
- class zensols.datdesc.opt.HyperparameterOptimizer(name='default', hyperparam_names=(), max_evals=1, show_progressbar=True, intermediate_dir=PosixPath('opthyper'), baseline_path=None)[source]¶
Bases:
object
Creates the files used to score optimizer output.
- __init__(name='default', hyperparam_names=(), max_evals=1, show_progressbar=True, intermediate_dir=PosixPath('opthyper'), baseline_path=None)¶
- property aggregate_score_dir: Path¶
The output directory containing runs with the best parameters of the top N results (see aggregate_scores()).
- aggregate_scores()[source]¶
Aggregate best score results as a separate CSV file for each data point with get_score_dataframe(). This is saved as a separate file for each optimization run since this method can take a long time as it re-scores the dataset. These results are then "stitched" together with gather_aggregate_scores().
- baseline_path: Path = None¶
A JSON file with hyperparameter settings to set on start. This file contains the output portion of the final.json results (which are the results parsed and set in HyperparamResult).
- property config_factory: ConfigFactory¶
The app config factory.
- gather_aggregate_scores()[source]¶
Return a dataframe of all the aggregate scores written by aggregate_scores().
- Return type: DataFrame
- get_best_results()[source]¶
Return the best results across all hyperparameter optimization runs, keyed by run name.
- Return type: Dict[str, HyperparamResult]
- get_comparison()[source]¶
Compare the scores of the default parameters with those predicted by the optimizer of the best run.
- Return type: CompareResult
- get_run(result_dir=None)[source]¶
Get the best run from the file system.
- Parameters: result_dir (Path) – the result directory, which defaults to opt_intermediate_dir
- Return type: HyperparamRun
- get_score_dataframe(iterations=None)[source]¶
Create a dataframe from the results scored from the best hyperparameters.
- hyperparam_names: Tuple[str, ...] = ()¶
The names of the hyperparameters used to create the space.
- See: _create_space()
- property hyperparams: HyperparamModel¶
The model hyperparameters to be updated by the optimizer.
- intermediate_dir: Path = PosixPath('opthyper')¶
The directory where intermediate results are saved while the algorithm works.
- max_evals: int = 1¶
The maximum number of evaluations of the hyperparameter optimization algorithm to execute.
- name: str = 'default'¶
The name of the optimization experiment set. This has a bearing on where files are stored (see opt_intermediate_dir).
- property results_intermediate_dir: Path¶
The directory that has all intermediate results by subdirectory name.
- show_progressbar: bool = True¶
Whether or not to show the progress bar while running the optimization.
- write_best_result(writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_param_json=False)[source]¶
Print the results from the best run.
- Parameters: include_param_json (bool) – whether to output the JSON formatted hyperparameters
- write_compare(writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write the results of a comparison of the initial hyperparameters against the optimized ones.
- write_score(writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Restore the hyperparameter state, score the data and print the results. Use the baseline parameters if available; otherwise use the parameters from the best run.
- write_scores(output_file=None, iterations=None)[source]¶
Write a file of the results scored from the best hyperparameters.
- Parameters:
output_file (Path) – where to write the CSV file; defaults to a file in opt_intermediate_dir
iterations (int) – the number of times the objective is called to produce the results (the objective space is not altered)
zensols.datdesc.optscore module¶
An optimizer that uses a Scorer as an objective.
- class zensols.datdesc.optscore.ScoringHyperparameterOptimizer(name='default', hyperparam_names=(), max_evals=1, show_progressbar=True, intermediate_dir=PosixPath('opthyper'), baseline_path=None)[source]¶
Bases:
HyperparameterOptimizer
An optimizer that uses a Scorer as the objective and as a means to determine the loss. The default loss function is 1 - F1, using the f1_score column of as_dataframe().
- __init__(name='default', hyperparam_names=(), max_evals=1, show_progressbar=True, intermediate_dir=PosixPath('opthyper'), baseline_path=None)¶
zensols.datdesc.table module¶
This module contains classes that generate tables.
- class zensols.datdesc.table.Table(path, name, template, caption='', head=None, type=None, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, column_evals=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None)[source]¶
Bases:
PersistableContainer
,Dictable
Generates a Zensols styled LaTeX table from a CSV file.
- __init__(path, name, template, caption='', head=None, type=None, default_params=<factory>, params=<factory>, definition_file=None, uses=<factory>, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, round_column_names=<factory>, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, format_scientific_column_names=<factory>, column_evals=<factory>, read_params=<factory>, tabulate_params=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, variables=<factory>, writes=<factory>, code_pre=None, code_post=None)¶
- asflatdict(*args, **kwargs)[source]¶
Like asdict() but flattened into a data structure suitable for writing to JSON or YAML.
- blank_columns: List[int]¶
A list of column indexes to set to the empty string (i.e. the 0th to fix the Unnamed: 0 issue).
- capitalize_columns: Dict[str, bool]¶
Capitalize either sentences (False values) or every word (True values). The keys are column names.
- code_post: str = None¶
Like code_pre, but modifies the table after this class's modifications.
- code_pre: str = None¶
Python code that manipulates the table's dataframe before modifications made by this class. The code has a local df variable and the returned value is used as the replacement. This is usually a one-liner used to subset the data, etc. The code is evaluated with eval().
- column_aligns: str = None¶
The alignment/justification (i.e. |l|l| for two columns). If not provided, they are automatically generated based on the columns of the table.
- column_evals: Dict[str, str]¶
Keys are column names with values as functions (i.e. lambda expressions) evaluated with a single column value parameter. The return value replaces the column identified by the key.
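For example, a hypothetical mapping that rescales an f1 column to percentages (the column name and lambda are illustrative):

    # keys are column names; values are lambda source strings applied to
    # the column's values
    column_evals = {'f1': 'lambda v: v * 100'}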
- column_value_replaces: Dict[str, Dict[Any, Any]]¶
Data values to replace in the dataframe, keyed by column name. Each value is a dict with original value keys and the replacements as values.
- default_params: Sequence[Sequence[str]]¶
Default parameters to be substituted in the template that are interpolated by the LaTeX numeric values such as #1, #2, etc. This is a sequence (list or tuple) of (<name>, [<default>]) where name is substituted by name in the template and default is the default if not given in params.
- format_scientific_column_names: Dict[str, Optional[int]]¶
Format a column using LaTeX formatted scientific notation with format_scientific(). Keys are column names and values are the mantissa length, or 1 if None.
- static format_thousand(x, apply_k=True, add_comma=True)[source]¶
Format a number as a string with comma separating thousands.
- format_thousands_column_names: Dict[str, Optional[Dict[str, Any]]]¶
Columns to format using thousands separators. The keys are the column names of the table and the values are either None or the keyword arguments to format_thousand().
- property formatted_dataframe: DataFrame¶
The dataframe with the formatting applied to it, used to create the LaTeX table. Modifications such as string replacements for adding percents are done.
- head: str = None¶
The header to use for the table, which is used as the text in the list of tables and made bold in the table.
- make_percent_column_names: Dict[str, int]¶
Each column in the map is multiplied by 100 and rounded to the given number of decimal places. For example, {'ann_per': 3} will round column ann_per to 3 decimal places.
- params: Dict[str, str]¶
Parameters used in the template that override the default_params.
- read_params: Dict[str, str]¶
Keyword arguments used in the read_csv() call when reading the CSV file.
- replace_nan: str = None¶
Replace NaN values with the value of this field, as tabulate() does not use the missing value (presumably a bug).
- round_column_names: Dict[str, int]¶
Each column in the map is rounded to its respective value.
- tabulate_params: Dict[str, str]¶
Keyword arguments used in the tabulate() call when writing the table. The default tells tabulate not to parse/format numerical data.
- variables: Dict[str, Union[Tuple[int, int], str]]¶
A mapping of variable names to a dataframe cell or a Python code snippet that is evaluated with exec(). In LaTeX, this is done by setting a newcommand (see LatexTable).
If set to a tuple of (<row>, <column>), the value of the pre-formatted dataframe is used (see unformatted below).
If a Python evaluation string, the code must set the variable v to the variable value. A variable stages is a Dict used to get one of the dataframes created at various stages of formatting the table, with entries:
nascent: same as dataframe
unformatted: after the pre-evaluation but before any formatting
postformat: after number formatting and post evaluation, but before remaining column and cell modifications
formatted: same as formatted_dataframe
For example, the following uses the value at row 2 and column 3 of the unformatted dataframe:
v = stages['unformatted'].iloc[2, 3]
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write this instance as either a Writable or as a Dictable. If the class attribute _DICTABLE_WRITABLE_DESCENDANTS is set to True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- class zensols.datdesc.table.TableFactory(config_factory, table_section_regex, default_table_type)[source]¶
Bases:
Dictable
Reads the table definitions file and writes a LaTeX .sty file of the generated tables from the CSV data.
- __init__(config_factory, table_section_regex, default_table_type)¶
- config_factory: ConfigFactory¶
The configuration factory used to create Table instances.
- default_table_type: str¶
The default name, which resolves to a section name, to use when creating anonymous tables.
Module contents¶
Generate LaTeX tables in a .sty file from CSV files. The paths to the CSV files and their metadata are given in a YAML configuration file.
- Example::
    latextablenamehere:
      type: slack
      slack_col: 0
      path: ../config/table-name.csv
      caption: Some Caption
      placement: t!
      size: small
      single_column: true
      percent_column_names: ['Proportion']