zensols.datdesc package#
Submodules#
zensols.datdesc.app#
Generate LaTeX tables in a .sty file from CSV files. The paths to the CSV files to create tables from and their metadata are given as a YAML configuration file. Parameters are either both files or both directories. When using directories, only files that match *-table.yml are considered.
- class zensols.datdesc.app.Application(data_file_regex=re.compile('^.+-table\\.yml$'), hyperparam_file_regex=re.compile('^.+-hyperparam\\.yml$'), hyperparam_table_default=None)[source]#
Bases:
object
Generate LaTeX table files from CSV files and hyperparameter .sty files.
- __init__(data_file_regex=re.compile('^.+-table\\.yml$'), hyperparam_file_regex=re.compile('^.+-hyperparam\\.yml$'), hyperparam_table_default=None)#
-
data_file_regex:
Pattern
= re.compile('^.+-table\\.yml$')# Matches file names of tables processed in the LaTeX output.
- generate_hyperparam(input_path, output_path, output_format=_OutputFormat.short)[source]#
Write formatted hyperparameter data.
-
hyperparam_file_regex:
Pattern
= re.compile('^.+-hyperparam\\.yml$')# Matches file names of hyperparameter files processed in the LaTeX output.
zensols.datdesc.cli#
Command line entry point to the application.
- class zensols.datdesc.cli.ApplicationFactory(*args, **kwargs)[source]#
Bases:
ApplicationFactory
zensols.datdesc.desc#
Metadata container classes.
- class zensols.datdesc.desc.DataDescriber(describers, name='default', output_dir=PosixPath('results'), csv_dir=PosixPath('csv'), yaml_dir=PosixPath('config'), mangle_sheet_name=False)[source]#
Bases:
PersistableContainer
,Dictable
Container class for
DataFrameDescriber
instances. It also saves their instances as CSV data files and YAML configuration files.- __init__(describers, name='default', output_dir=PosixPath('results'), csv_dir=PosixPath('csv'), yaml_dir=PosixPath('config'), mangle_sheet_name=False)#
- add_summary()[source]#
Add a new metadata-like
DataFrameDescriber
as the first entry in describers
that describes what data this instance currently has.- Return type:
- Returns:
the added metadata
DataFrameDescriber
instance
-
describers:
Tuple
[DataFrameDescriber
]# The contained dataframe and metadata.
- property describers_by_name: Dict[str, DataFrameDescriber]#
Data frame describers keyed by the describer name.
- classmethod from_yaml_file(path)[source]#
Create a data descriptor from previously written YAML/CSV files using
save()
.- See:
- Return type:
-
mangle_sheet_name:
bool
= False# Whether to normalize the Excel sheet names when
xlsxwriter.exceptions.InvalidWorksheetName
is raised.
- save(output_dir=None, yaml_dir=None, include_excel=False)[source]#
Save both the CSV and YAML configuration file.
- Parameters:
include_excel (
bool
) – whether to also write the Excel file to its default output file name- See:
save_yaml()
- Return type:
- save_excel(output_file=None)[source]#
Save all provided dataframe describers to an Excel file.
- Parameters:
output_file (
Path
) – the Excel file to write, which needs an.xlsx
extension; this defaults to a path created fromoutput_dir
andname
- Return type:
- save_yaml(output_dir=None, yaml_dir=None)[source]#
Save all provided dataframe describers YAML files used by the
datdesc
command.
- class zensols.datdesc.desc.DataFrameDescriber(name, df, desc, meta_path=None, meta=None, table_kwargs=<factory>, index_meta=None)[source]#
Bases:
PersistableContainer
,Dictable
A class that contains a Pandas dataframe, a description of the data, and descriptions of all the columns in that dataframe.
- __init__(name, df, desc, meta_path=None, meta=None, table_kwargs=<factory>, index_meta=None)#
- derive(*, name=None, df=None, desc=None, meta=None, index_meta=None)[source]#
Create a new instance based on this instance and replace any non-
None
kwargs.If
meta
is provided, it is merged with the metadata of this instance. However, any metadata provided must match in both column names and descriptions.
- derive_with_index_meta(index_format=None)[source]#
Like
derive()
, but the dataframe is generated withdf_with_index_meta()
usingindex_format
as a parameter.- Parameters:
index_format (
str
) – seedf_with_index_meta()
- Return type:
- df_with_index_meta(index_format=None)[source]#
Create a dataframe with the first column containing index metadata. This uses
index_meta
to create the column values.- Parameters:
index_format (
str
) – the new index column format usingindex
andvalue
, which defaults to{index}
- Return type:
- Returns:
the dataframe with a new first column of the index metadata, or
df
ifindex_meta
isNone
- format_table()[source]#
Replace (in place) dataframe
df
with the formatted table obtained withTable.formatted_dataframe
. The Table
is created with create_table()
.
-
index_meta:
Dict
[Any
,str
] = None# The index metadata, which maps index values to descriptions of the respective row.
- property meta: DataFrame#
The column metadata for
dataframe
, which needs columnsname
anddescription
. If this is not provided, it is read from filemeta_path
. If this is set to a tuple of tuples, a dataframe is generated from the form:((<column name 1>, <column description 1>), (<column name 2>, <column description 2>) ...
If both this and
meta_path
are not provided, the following is used:(('description', 'Description'), ('value', 'Value')))
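The tuple-of-tuples form above can be read as a column-name to column-description mapping. A minimal sketch in plain Python (illustrative only, not the library's parsing code):

```python
# The documented default used when neither meta nor meta_path is given.
DEFAULT_META = (('description', 'Description'), ('value', 'Value'))


def meta_as_mapping(meta):
    """View (column name, column description) pairs as a dict."""
    return dict(meta)
```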
-
table_kwargs:
Dict
[str
,Any
]# Additional key word arguments given when creating a table in
create_table()
.
zensols.datdesc.domain#
Domain classes used by the API.
- exception zensols.datdesc.domain.DataDescriptionError[source]#
Bases:
APIError
Thrown for any application level error.
- __module__ = 'zensols.datdesc.domain'#
- exception zensols.datdesc.domain.LatexTableError[source]#
Bases:
DataDescriptionError
Thrown for any application level error related to creating tables.
- __annotations__ = {}#
- __module__ = 'zensols.datdesc.domain'#
zensols.datdesc.hyperparam#
Hyperparameter metadata: access and documentation. This package was designed for the following purposes:
Provide a basic scaffolding to update model hyperparameters with packages such as hyperopt.
Generate LaTeX tables of the hyperparameters and their descriptions for academic papers.
The object instance graph hierarchy is: HyperparamSet contains HyperparamModel instances, which contain Hyperparam instances.
Access to the hyperparameters is done by calling the set or model levels
with a dotted path notation string. For example, svm.C
first navigates to
model svm
, then to the hyperparameter named C
.
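The dotted path access described above can be sketched with plain nested dictionaries (an illustrative re-implementation, not the library's code; the nested-dict shape is an assumption):

```python
from typing import Any, Dict


def resolve(root: Dict[str, Any], dotted_path: str) -> Any:
    """Navigate nested dicts with a dotted path such as 'svm.C'."""
    node: Any = root
    for part in dotted_path.split('.'):
        node = node[part]
    return node


def flatten(root: Dict[str, Any], prefix: str = '') -> Dict[str, Any]:
    """Flatten nested dicts into dotted-path keys such as 'svm.C'."""
    flat: Dict[str, Any] = {}
    for key, val in root.items():
        path = f'{prefix}{key}'
        if isinstance(val, dict):
            flat.update(flatten(val, path + '.'))
        else:
            flat[path] = val
    return flat


# A set with one model ('svm') having hyperparameters 'C' and 'kernel'.
hyperparams = {'svm': {'C': 1.0, 'kernel': 'rbf'}}
```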
- class zensols.datdesc.hyperparam.Hyperparam(name, type, doc, choices=None, value=None, interval=None)[source]#
Bases:
Dictable
A hyperparameter’s metadata, documentation and value. The value is accessed (retrieval and setting) at runtime. Do not use this class explicitly. Instead use
HyperparamModel
.The index access only applies when
type
islist
ordict
. Otherwise, thevalue
member has the value of the hyperparameter.-
CLASS_MAP:
ClassVar
[Dict
[str
,Type
]] = {'bool': <class 'bool'>, 'choice': <class 'str'>, 'dict': <class 'dict'>, 'float': <class 'float'>, 'int': <class 'int'>, 'list': <class 'list'>, 'str': <class 'str'>}# A mapping for values set in
type
to their Python class equivalents.
-
VALID_TYPES:
ClassVar
[str
] = frozenset({'bool', 'choice', 'dict', 'float', 'int', 'list', 'str'})# Valid settings for
type
.
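The mapping from type names to Python classes suggests a simple coercion step. A hedged sketch (the actual validation logic inside Hyperparam is not shown in this documentation):

```python
# Mirrors the CLASS_MAP documented above ('choice' values are strings).
CLASS_MAP = {'bool': bool, 'choice': str, 'dict': dict, 'float': float,
             'int': int, 'list': list, 'str': str}
VALID_TYPES = frozenset(CLASS_MAP)


def coerce(type_name: str, value):
    """Coerce a raw value to the Python class named by the type setting."""
    if type_name not in VALID_TYPES:
        raise ValueError(f'unknown hyperparameter type: {type_name}')
    return CLASS_MAP[type_name](value)
```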
- __init__(name, type, doc, choices=None, value=None, interval=None)#
-
doc:
str
# The human readable documentation for the hyperparameter. This is used in documentation generation tasks.
-
interval:
Union
[Tuple
[float
,float
],Tuple
[int
,int
]] = None# Valid intervals for
value
as an inclusive interval.
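The inclusive interval check described above can be sketched as (illustrative, not the library's implementation):

```python
from typing import Tuple, Union

Interval = Union[Tuple[float, float], Tuple[int, int]]


def in_interval(value: float, interval: Interval) -> bool:
    """Return whether a value lies in the inclusive interval [low, high]."""
    low, high = interval
    return low <= value <= high
```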
- class zensols.datdesc.hyperparam.HyperparamContainer[source]#
Bases:
Dictable
A container class for
Hyperparam
instances.- __init__()#
- abstract flatten(deep=False)[source]#
Return a flattened dictionary with the dotted path notation (see module docs).
- exception zensols.datdesc.hyperparam.HyperparamError[source]#
Bases:
DataDescriptionError
Raised for any error related to hyperparameter access.
- __annotations__ = {}#
- __module__ = 'zensols.datdesc.hyperparam'#
- class zensols.datdesc.hyperparam.HyperparamModel(name, doc, desc=None, params=<factory>, table=None)[source]#
Bases:
HyperparamContainer
The model level class that contains the parameters. This class represents a machine learning model such as a SVM with hyperparameters such as
C
andmaximum iterations
.- __init__(name, doc, desc=None, params=<factory>, table=None)#
- create_dataframe_describer()[source]#
Return an object with metadata fully describing the hyperparameters of this model.
- Return type:
-
desc:
str
= None# The description of the model used in the documentation when
name
is not sufficient. Since
name
has naming constraints, this can be used in its place during documentation generation.
- flatten(deep=False)[source]#
Return a flattened dictionary with the dotted path notation (see module docs).
- property metadata_dataframe: DataFrame#
A dataframe describing the
values_dataframe
.
-
name:
str
# The name of the model (i.e.
svm
). This name can have only alpha-numeric and underscore characters.
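The naming constraint can be expressed as a regular expression. A sketch (the exact pattern used by the library is an assumption):

```python
import re

# The documented constraint: only alpha-numeric and underscore characters.
NAME_PATTERN = re.compile(r'^[A-Za-z0-9_]+$')


def is_valid_name(name: str) -> bool:
    """Check a model name against the documented character constraint."""
    return NAME_PATTERN.match(name) is not None
```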
-
params:
Dict
[str
,Hyperparam
]# The hyperparameters keyed by their names.
-
table:
Optional
[Dict
[str
,Any
]] = None# Overriding data used when creating a
Table
fromDataFrameDescriber.create_table()
.
- property values_dataframe: DataFrame#
A dataframe with parameter data. This includes the name, type, value and documentation.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_doc=False)[source]#
Write this instance as either a
Writable
or as aDictable
. If class attribute_DICTABLE_WRITABLE_DESCENDANTS
is set asTrue
, then use thewrite()
method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating adict
recursively usingasdict()
, then formatting the output.If the attribute
_DICTABLE_WRITE_EXCLUDES
is set, those attributes are removed from what is written in thewrite()
method.Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (
int
) – the starting indentation depthwriter (
TextIOBase
) – the writer to dump the content of this writable
- class zensols.datdesc.hyperparam.HyperparamSet(models=<factory>, name=None)[source]#
Bases:
HyperparamContainer
The top level in the object graph hierarchy (see module docs). This contains a set of models and is typically where calls by packages such as
hyperopt
are used to update the hyperparameters of the model(s).- __init__(models=<factory>, name=None)#
- create_describer(meta_path=None)[source]#
Return an object with metadata fully describing the hyperparameters of this model.
- Parameters:
meta_path (
Path
) – if provided, set the path on the returned instance- Return type:
- flatten(deep=False)[source]#
Return a flattened dictionary with the dotted path notation (see module docs).
-
models:
Dict
[str
,HyperparamModel
]# The models containing hyperparameters for this set.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_doc=False)[source]#
Write this instance as either a
Writable
or as aDictable
. If class attribute_DICTABLE_WRITABLE_DESCENDANTS
is set asTrue
, then use thewrite()
method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating adict
recursively usingasdict()
, then formatting the output.If the attribute
_DICTABLE_WRITE_EXCLUDES
is set, those attributes are removed from what is written in thewrite()
method.Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (
int
) – the starting indentation depthwriter (
TextIOBase
) – the writer to dump the content of this writable
- class zensols.datdesc.hyperparam.HyperparamSetLoader(data, config=None, updates=())[source]#
Bases:
object
Loads a set of hyperparameters from a YAML
pathlib.Path
,dict
or streamio.TextIOBase
.- __init__(data, config=None, updates=())#
-
config:
Configurable
= None# The application configuration used to update the hyperparameters from other sections.
-
data:
Union
[Dict
[str
,Any
],Path
,TextIOBase
]# The source of data to load, which is a YAML
pathlib.Path
,dict
or streamio.TextIOBase
.- See:
- load(**kwargs) HyperparamSet #
Load and return the hyperparameter object graph from
data
.- Return type:
HyperparamSet
- exception zensols.datdesc.hyperparam.HyperparamValueError[source]#
Bases:
HyperparamError
Raised for bad values set on a hyperparameter.
- __annotations__ = {}#
- __module__ = 'zensols.datdesc.hyperparam'#
zensols.datdesc.mng#
Contains the manager classes that generate the tables.
- class zensols.datdesc.mng.CsvToLatexTable(tables, package_name)[source]#
Bases:
Writable
Generate a Latex table from a CSV file.
- __init__(tables, package_name)#
zensols.datdesc.opt#
Contains container and utility classes for hyperparameter optimization. These classes find optimal hyperparameters for a model and save the results as JSON files. This module is meant to be used by command line applications configured as Zensols Resource libraries.
- class zensols.datdesc.opt.CompareResult(initial_param, initial_loss, initial_scores, best_eval_ix, best_param, best_loss, best_scores)[source]#
Bases:
Dictable
Contains the loss and scores of an initial run and a run found on the optimal hyperparameters.
- __init__(initial_param, initial_loss, initial_scores, best_eval_ix, best_param, best_loss, best_scores)#
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#
Write this instance as either a
Writable
or as aDictable
. If class attribute_DICTABLE_WRITABLE_DESCENDANTS
is set asTrue
, then use thewrite()
method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating adict
recursively usingasdict()
, then formatting the output.If the attribute
_DICTABLE_WRITE_EXCLUDES
is set, those attributes are removed from what is written in thewrite()
method.Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (
int
) – the starting indentation depthwriter (
TextIOBase
) – the writer to dump the content of this writable
- class zensols.datdesc.opt.HyperparamResult(name, hyp, scores, loss, eval_ix)[source]#
Bases:
Dictable
Results of an optimization and optionally the best fit.
- __init__(name, hyp, scores, loss, eval_ix)#
- classmethod from_file(path)[source]#
Restore a result from a file name.
- Parameters:
path (
Path
) – the path from which to restore- Return type:
-
hyp:
HyperparamModel
# The updated hyperparameters.
-
name:
str
# The name of the
HyperparameterOptimizer
, which is the directory name.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#
Write this instance as either a
Writable
or as aDictable
. If class attribute_DICTABLE_WRITABLE_DESCENDANTS
is set asTrue
, then use thewrite()
method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating adict
recursively usingasdict()
, then formatting the output.If the attribute
_DICTABLE_WRITE_EXCLUDES
is set, those attributes are removed from what is written in thewrite()
method.Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (
int
) – the starting indentation depthwriter (
TextIOBase
) – the writer to dump the content of this writable
- class zensols.datdesc.opt.HyperparamRun(runs)[source]#
Bases:
Dictable
A container for the entire optimization run. The best run contains the best fit (
HyperparamResult
) as predicted by the hyperparameter optimization algorithm.- __init__(runs)#
- property best_result: HyperparamResult#
The result that had the lowest loss.
- property final: HyperparamResult#
The results of the final run, which has the best fit (see class docs).
- classmethod from_dir(path)[source]#
Return an instance with the runs stored in directory
path
.- Return type:
-
runs:
Tuple
[Tuple
[Path
,HyperparamResult
]]# The results from previous runs.
- class zensols.datdesc.opt.HyperparameterOptimizer(name='default', hyperparam_names=(), max_evals=1, show_progressbar=True, intermediate_dir=PosixPath('opthyper'), baseline_path=None)[source]#
Bases:
object
Creates the files used to score optimizer output.
- __init__(name='default', hyperparam_names=(), max_evals=1, show_progressbar=True, intermediate_dir=PosixPath('opthyper'), baseline_path=None)#
- property aggregate_score_dir: Path#
The output directory containing runs with the best parameters of the top N results (see
aggregate_scores()
).
- aggregate_scores()[source]#
Aggregate best score results as a separate CSV file for each data point with
get_score_dataframe()
. This is saved as a separate file for each optimization run since this method can take a long time as it will re-score the dataset. These results are then “stitched” together with gather_aggregate_scores()
.
-
baseline_path:
Path
= None# A JSON file with hyperparameter settings to set on start. This file contains the output portion of the
final.json
results (which are parsed and set inHyperparamResult
).
- property config_factory: ConfigFactory#
The app config factory.
- gather_aggregate_scores()[source]#
Return a dataframe of all the aggregate scores written by
aggregate_scores()
.- Return type:
- get_best_results()[source]#
Return the best results across all hyperparameter optimization runs with keys as run names.
- Return type:
- get_comparison()[source]#
Compare the scores of the default parameters with those predicted by the optimizer of the best run.
- Return type:
- get_run(result_dir=None)[source]#
Get the best run from the file system.
- Parameters:
result_dir (
Path
) – the result directory, which defaults toopt_intermediate_dir
- Return type:
- get_score_dataframe(iterations=None)[source]#
Create a dataframe from the results scored from the best hyperparameters.
-
hyperparam_names:
Tuple
[str
,...
] = ()# The name of the hyperparameters to use to create the space.
- See:
_create_space()
- property hyperparams: HyperparamModel#
The model hyperparameters to be updated by the optimizer.
-
intermediate_dir:
Path
= PosixPath('opthyper')# The directory where the intermediate results are saved while the algorithm works.
-
max_evals:
int
= 1# The maximum number of evaluations of the hyperparameter optimization algorithm to execute.
-
name:
str
= 'default'# The name of the optimization experiment set. This has a bearing on where files are stored (see
opt_intermediate_dir
).
- property results_intermediate_dir: Path#
The directory that has all intermediate results by subdirectory name.
-
show_progressbar:
bool
= True# Whether or not to show the progress bar while running the optimization.
- write_best_result(writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_param_json=False)[source]#
Print the results from the best run.
- Parameters:
include_param_json (
bool
) – whether to output the JSON formatted hyperparameters
- write_compare(writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#
Write the results of a compare of the initial hyperparameters against the optimized.
- write_score(writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#
Restore the hyperparameter state, score the data and print the results. Use the
baseline
parameters if available, otherwise use the parameters from the best run.- Return type:
- write_scores(output_file=None, iterations=None)[source]#
Write a file of the results scored from the best hyperparameters.
- Parameters:
output_file (
Path
) – where to write the CSV file; defaults to a file inopt_intermediate_dir
iterations (
int
) – the number times the objective is called to produce the results (the objective space is not altered)
zensols.datdesc.optscore#
An optimizer that uses a Scorer
as an objective.
- class zensols.datdesc.optscore.ScoringHyperparameterOptimizer(name='default', hyperparam_names=(), max_evals=1, show_progressbar=True, intermediate_dir=PosixPath('opthyper'), baseline_path=None)[source]#
Bases:
HyperparameterOptimizer
An optimizer that uses a
Scorer
as the objective and a means to determine the loss. The default loss function is defined as 1 - F1 using thef1_score
as_dataframe()
column.- __init__(name='default', hyperparam_names=(), max_evals=1, show_progressbar=True, intermediate_dir=PosixPath('opthyper'), baseline_path=None)#
zensols.datdesc.table#
This module contains classes that generate tables.
- class zensols.datdesc.table.LongTable(path, name, caption, head=None, placement=None, size='normalsize', uses=('zentable', ), single_column=True, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, column_evals=<factory>, read_kwargs=<factory>, write_kwargs=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, df_code=None, df_code_pre=None, df_code_exec=None, df_code_exec_pre=None, slack_col=0)[source]#
Bases:
SlackTable
- __init__(path, name, caption, head=None, placement=None, size='normalsize', uses=('zentable', ), single_column=True, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, column_evals=<factory>, read_kwargs=<factory>, write_kwargs=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, df_code=None, df_code_pre=None, df_code_exec=None, df_code_exec_pre=None, slack_col=0)#
- property latex_environment#
Return the latex environment for the table.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#
Write this instance as either a
Writable
or as aDictable
. If class attribute_DICTABLE_WRITABLE_DESCENDANTS
is set asTrue
, then use thewrite()
method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating adict
recursively usingasdict()
, then formatting the output.If the attribute
_DICTABLE_WRITE_EXCLUDES
is set, those attributes are removed from what is written in thewrite()
method.Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (
int
) – the starting indentation depthwriter (
TextIOWrapper
) – the writer to dump the content of this writable
- class zensols.datdesc.table.SlackTable(path, name, caption, head=None, placement=None, size='normalsize', uses=('zentable', ), single_column=True, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, column_evals=<factory>, read_kwargs=<factory>, write_kwargs=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, df_code=None, df_code_pre=None, df_code_exec=None, df_code_exec_pre=None, slack_col=0)[source]#
Bases:
Table
An instance of the table that fills up space based on the widest column.
- __init__(path, name, caption, head=None, placement=None, size='normalsize', uses=('zentable', ), single_column=True, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, column_evals=<factory>, read_kwargs=<factory>, write_kwargs=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, df_code=None, df_code_pre=None, df_code_exec=None, df_code_exec_pre=None, slack_col=0)#
- property latex_environment#
Return the latex environment for the table.
- class zensols.datdesc.table.Table(path, name, caption, head=None, placement=None, size='normalsize', uses=('zentable', ), single_column=True, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, column_evals=<factory>, read_kwargs=<factory>, write_kwargs=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, df_code=None, df_code_pre=None, df_code_exec=None, df_code_exec_pre=None)[source]#
Bases:
PersistableContainer
,Dictable
Generates a Zensols styled Latex table from a CSV file.
- __init__(path, name, caption, head=None, placement=None, size='normalsize', uses=('zentable', ), single_column=True, hlines=<factory>, double_hlines=<factory>, column_keeps=None, column_removes=<factory>, column_renames=<factory>, column_value_replaces=<factory>, column_aligns=None, percent_column_names=(), make_percent_column_names=<factory>, format_thousands_column_names=<factory>, column_evals=<factory>, read_kwargs=<factory>, write_kwargs=<factory>, replace_nan=None, blank_columns=<factory>, bold_cells=<factory>, bold_max_columns=<factory>, capitalize_columns=<factory>, index_col_name=None, df_code=None, df_code_pre=None, df_code_exec=None, df_code_exec_pre=None)#
-
blank_columns:
List
[int
]# A list of column indexes to set to the empty string (e.g. the 0th to fix the
Unnamed: 0
issue).
-
capitalize_columns:
Dict
[str
,bool
]# Capitalize either sentences (
False
values) or every word (True
values). The keys are column names.
-
caption:
str
# The human readable string used as the caption in the table.
-
column_aligns:
str
= None# The alignment/justification (i.e.
|l|l|
for two columns). If not provided, they are automatically generated based on the columns of the table.
-
column_evals:
Dict
[str
,str
]# Keys are column names with values as functions (i.e. lambda expressions) evaluated with a single column value parameter. The return value replaces the column identified by the key.
-
column_value_replaces:
Dict
[str
,Dict
[Any
,Any
]]# Data values to replace in the dataframe. It is keyed by the column name and values are the replacements. Each value is a
dict
with original value keys and the replacements as values.
-
df_code:
str
= None# Python code executed that manipulates the table’s dataframe. The code has a local
df
variable and the returned value is used as the replacement. This is usually a one-liner used to subset the data etc. The code is evaluated witheval()
.
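The eval() contract described for df_code can be illustrated without pandas by letting a plain list stand in for the dataframe (the list stand-in is purely for illustration; the library binds a real pandas DataFrame):

```python
# A plain list stands in for the dataframe to keep the sketch
# dependency-free.
df = [4, 8, 15, 16, 23, 42]

# As documented: the code sees a local 'df' and its return value
# replaces the dataframe (here, a one-liner that subsets the data).
df_code = '[x for x in df if x > 10]'
df = eval(df_code, {}, {'df': df})
```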
-
df_code_exec_pre:
str
= None# Like
df_code_pre
but invoke withexec()
instead ofeval()
.
-
df_code_pre:
str
= None# Like
df_code
but right after the source data is read and before any modifications. The code is evaluated witheval()
.
- static format_thousand(x, apply_k=True, add_comma=True)[source]#
Format a number as a string with commas separating thousands.
-
format_thousands_column_names:
Dict
[str
,Optional
[Dict
[str
,Any
]]]# Columns to format using thousands. The keys are the column names of the table and the values are either
None
or the keyword arguments toformat_thousand()
.
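A hedged sketch in the spirit of format_thousand() (the exact behavior of apply_k is an assumption here, taken to abbreviate values of a thousand or more with a 'K' suffix):

```python
def format_thousand(x: int, apply_k: bool = True, add_comma: bool = True) -> str:
    """Format a number with commas separating thousands; 'apply_k' is
    assumed here to abbreviate values >= 1000 with a 'K' suffix."""
    suffix = ''
    if apply_k and x >= 1000:
        x = round(x / 1000)
        suffix = 'K'
    s = f'{x:,}' if add_comma else str(x)
    return s + suffix
```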
- property formatted_dataframe: DataFrame#
The
dataframe
with the formatting applied to it used to create the Latex table. Modifications such as string replacements for adding percents is done.
-
head:
str
= None# The header to use for the table, which is used as the text in the list of tables and made bold in the table.
-
make_percent_column_names:
Dict
[str
,int
]# Each column in the map has its values multiplied by 100 and rounded to the given number of decimal places. For example,
{'ann_per': 3}
will round columnann_per
to 3 decimal places.
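The rounding rule can be illustrated with a one-line helper (illustrative; the library applies this per configured column over the dataframe):

```python
def make_percent(value: float, places: int) -> float:
    """Turn a proportion into a percentage rounded to 'places' decimals,
    as in {'ann_per': 3} rounding column ann_per to 3 places."""
    return round(value * 100, places)
```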
-
read_kwargs:
Dict
[str
,str
]# Keyword arguments used in the
read_csv()
call when reading the CSV file.
-
replace_nan:
str
= None# Replace NaN values with the value of this field, since
tabulate()
does not use the missing value (presumably due to a bug).
- serialize()[source]#
Return a data structure usable for YAML or JSON output by flattening Python objects.
-
single_column:
bool
= True# Makes the table one column wide in a two column. Setting this to false generates a
table*
two column table, which won’t work in beamer (slides) document types.
-
size:
str
= 'normalsize'# The size of the table, and one of:
Huge
huge
LARGE
Large
large
normalsize (default)
small
footnotesize
scriptsize
tiny
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]#
Write this instance as either a
Writable
or as aDictable
. If class attribute_DICTABLE_WRITABLE_DESCENDANTS
is set asTrue
, then use thewrite()
method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating adict
recursively usingasdict()
, then formatting the output.If the attribute
_DICTABLE_WRITE_EXCLUDES
is set, those attributes are removed from what is written in thewrite()
method.Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (
int
) – the starting indentation depthwriter (
TextIOWrapper
) – the writer to dump the content of this writable
Module contents#
Generate Latex tables in a .sty file from CSV files. The paths to the CSV files to create tables from and their metadata is given as a YAML configuration file.
- Example::
latextablenamehere:
    type: slack
    slack_col: 0
    path: ../config/table-name.csv
    caption: Some Caption
    placement: t!
    size: small
    single_column: true
    percent_column_names: ['Proportion']