zensols.dataframe package
Submodules
zensols.dataframe.config module
Configuration classes using dataframes as sources.
- class zensols.dataframe.config.DataframeConfig(csv_path, default_section, columns=None, column_eval=None, counts=None)
Bases: DictionaryConfig
A Configurable that uses dataframes as sources. This is useful for providing labels to nominal label vectorizers.
- __init__(csv_path, default_section, columns=None, column_eval=None, counts=None)
Initialize the configuration from a dataframe (see parameters).
- Parameters:
  - csv_path (Path) – the path to the CSV file used to create the dataframe
  - default_section (str) – the singleton section name, which has as options a list of the columns of the dataframe
  - columns (Dict[str, str]) – the columns to add to the configuration from the dataframe, with key, values as column names, option names
  - column_eval (str) – Python code to evaluate and apply to each column, if provided
  - counts (Dict[str, str]) – additional option entries in the section to add as counts of respective columns, with key, values as column option names, new entry option names; where the column option names are those given as values from the columns dict
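The parameters above describe a mapping from dataframe columns to options of a single section. The following is a minimal sketch of that mapping under stated assumptions, not the library's implementation: the helper dataframe_to_config is hypothetical, a list of dicts stands in for the Pandas dataframe to keep it self-contained, and column_eval is assumed to evaluate to a callable over a column's values.

```python
from typing import Callable, Dict, List, Optional


def dataframe_to_config(rows: List[Dict[str, str]],
                        default_section: str,
                        columns: Dict[str, str],
                        column_eval: Optional[str] = None,
                        counts: Optional[Dict[str, str]] = None
                        ) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: build one config section from tabular rows.

    ``rows`` stands in for the dataframe (one dict per CSV row).
    """
    # assumption: column_eval evaluates to a callable over a column's values
    transform: Callable[[List[str]], List[str]] = (
        eval(column_eval) if column_eval is not None else list)
    section: Dict[str, str] = {}
    values_by_option: Dict[str, List[str]] = {}
    for col_name, option_name in columns.items():
        values = transform([row[col_name] for row in rows])
        values_by_option[option_name] = values
        # config option values are strings, so join the values with commas
        section[option_name] = ', '.join(values)
    for option_name, count_name in (counts or {}).items():
        # count entries refer to option names produced from ``columns``
        section[count_name] = str(len(values_by_option[option_name]))
    return {default_section: section}


rows = [{'label': 'pos'}, {'label': 'neg'}, {'label': 'pos'}]
config = dataframe_to_config(rows, 'dataframe', {'label': 'labels'},
                             column_eval='lambda vals: sorted(set(vals))',
                             counts={'labels': 'n_labels'})
print(config)  # {'dataframe': {'labels': 'neg, pos', 'n_labels': '2'}}
```

Deduplicating through column_eval is what turns a raw label column into the label set a nominal vectorizer would need; without it every row's value is kept.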
- default_section
- serializer
zensols.dataframe.stash module
Stashes that operate on a dataframe, which are useful for common machine learning tasks.
- class zensols.dataframe.stash.AutoSplitDataframeStash(dataframe_path, split_col, key_path, distribution)
Bases: SplitKeyDataframeStash
Automatically splits a dataframe into train, test and validation datasets by adding a split_col column with the split name.
- __init__(dataframe_path, split_col, key_path, distribution)
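One way to produce such a split column is sketched below. It assumes distribution maps split names to proportions summing to 1 (the actual format expected by the class may differ), and assign_splits is a hypothetical helper; each row's split_col value would then be set to assignment[key].

```python
import random
from collections import Counter
from typing import Dict, List


def assign_splits(keys: List[str], distribution: Dict[str, float],
                  seed: int = 0) -> Dict[str, str]:
    """Map each key to a split name in proportion to ``distribution``."""
    if abs(sum(distribution.values()) - 1.0) > 1e-6:
        raise ValueError(f'distribution must sum to 1: {distribution}')
    shuffled = list(keys)
    # fixed seed so the same keys always land in the same splits
    random.Random(seed).shuffle(shuffled)
    names = list(distribution)
    assignment: Dict[str, str] = {}
    start = 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            # the last split takes the remainder so rounding never drops a key
            end = len(shuffled)
        else:
            end = min(len(shuffled),
                      start + round(distribution[name] * len(shuffled)))
        for key in shuffled[start:end]:
            assignment[key] = name
        start = end
    return assignment


keys = [str(i) for i in range(10)]
splits = assign_splits(keys, {'train': 0.8, 'test': 0.1, 'validation': 0.1})
print(Counter(splits.values()))  # Counter({'train': 8, 'test': 1, 'validation': 1})
```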
- exception zensols.dataframe.stash.DataframeError
Bases: APIError
Thrown for dataframe stash issues.
- __module__ = 'zensols.dataframe.stash'
- class zensols.dataframe.stash.DataframeStash(dataframe_path)
Bases: ReadOnlyStash, Deallocatable, Writable, PrimeableStash
A factory stash that loads its data from a Pandas dataframe. It uses the dataframe index as the keys and pandas.Series as values. The dataframe is usually constructed by reading a file (e.g. CSV) and doing some transformation before using it in an implementation of this stash.
The dataframe created by _get_dataframe() must have a string or integer index since keys for all stashes are of type str. An integer index is automatically mapped to a string.
- __init__(dataframe_path)
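The key convention described above can be illustrated without Pandas. In this sketch, RowStash is a hypothetical class, and dict rows stand in for the pandas.Series values the real stash returns; it only shows integer index values becoming str keys and load() returning the matching row.

```python
from typing import Any, Dict, Iterable, List, Optional, Union


class RowStash:
    """Hypothetical read-only stash over table rows keyed by their index."""

    def __init__(self, index: List[Union[int, str]],
                 rows: List[Dict[str, Any]]):
        if len(index) != len(rows):
            raise ValueError('index and rows must have the same length')
        # stash keys are always str, so integer index values are converted
        self._rows: Dict[str, Dict[str, Any]] = {
            str(i): row for i, row in zip(index, rows)}

    def keys(self) -> Iterable[str]:
        return self._rows.keys()

    def load(self, name: str) -> Optional[Dict[str, Any]]:
        # a dict row stands in for the pandas.Series the real stash returns
        return self._rows.get(name)


stash = RowStash([0, 1], [{'text': 'good', 'label': 'pos'},
                          {'text': 'bad', 'label': 'neg'}])
print(list(stash.keys()))  # ['0', '1']
print(stash.load('1'))     # {'text': 'bad', 'label': 'neg'}
```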
- clear()
Delete all data from the stash.
Important: Exercise caution with this method, of course.
- property dataframe
- dataframe_path: Path
The path to store the pickled version of the generated dataframe created with _get_dataframe().
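Caching the generated dataframe at dataframe_path amounts to a load-or-create pattern. The sketch below is an assumption about that pattern, not the stash's actual code: load_or_create is hypothetical and uses the standard pickle module, whereas a Pandas-based version might use DataFrame.to_pickle and pandas.read_pickle instead.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_or_create(path: Path, create: Callable[[], Any]) -> Any:
    """Return the object pickled at ``path``; create and pickle it on first use."""
    if path.exists():
        with open(path, 'rb') as f:
            return pickle.load(f)
    obj = create()
    # make sure the parent directory exists before writing the cache file
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, 'wb') as f:
        pickle.dump(obj, f)
    return obj
```

Here create plays the role of _get_dataframe(): it runs only when no pickle exists yet, so deleting the file forces the dataframe to be rebuilt.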
- exists(name)
Return True if data with key name exists.
Implementation note: This Stash.exists() method is very inefficient and should be overridden.
- Return type:
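The inefficiency noted above comes from a generic check that has to walk every key. Both classes below are hypothetical illustrations: the first is a linear-scan fallback, the second overrides exists() with a set lookup, which is the kind of override a dataframe-backed stash could make (for example, by testing membership in its string index).

```python
from typing import Iterable, List


class ScanningStash:
    """Hypothetical fallback: exists() scans every key, O(n) per call."""

    def __init__(self, keys: List[str]):
        self._keys = keys

    def keys(self) -> Iterable[str]:
        return iter(self._keys)

    def exists(self, name: str) -> bool:
        return any(k == name for k in self.keys())


class IndexedStash(ScanningStash):
    """Overrides exists() with a set lookup, O(1) on average."""

    def __init__(self, keys: List[str]):
        super().__init__(keys)
        self._key_set = frozenset(keys)

    def exists(self, name: str) -> bool:
        return name in self._key_set
```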
- load(name)
Load a data value from the pickled data with key name. Semantically, this method loads the data using the stash's implementation. For example, DirectoryStash loads the data from a file if it exists, but factory type stashes will always re-generate the data.
- See: get()
- Return type:
- write(depth=0, writer=sys.stdout)
Write the contents of this instance to writer using indentation depth.
- Parameters:
  - depth (int) – the starting indentation depth
  - writer (TextIOBase) – the writer to dump the content of this writable
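A write(depth, writer) method of this shape typically indents each line by the depth and writes to any text stream. LabelSummary below is a hypothetical Writable-style class, and the two-spaces-per-level indentation is an assumption; passing an io.StringIO as writer captures the output instead of printing it.

```python
import io
import sys
from typing import Dict, TextIO


class LabelSummary:
    """Hypothetical Writable-style class that dumps label counts."""

    def __init__(self, counts: Dict[str, int]):
        self.counts = counts

    def write(self, depth: int = 0, writer: TextIO = sys.stdout) -> None:
        # two spaces per indentation level
        indent = '  ' * depth
        writer.write(f'{indent}labels:\n')
        for label, count in sorted(self.counts.items()):
            writer.write(f'{indent}  {label}: {count}\n')


buf = io.StringIO()
LabelSummary({'pos': 2, 'neg': 1}).write(depth=1, writer=buf)
print(buf.getvalue(), end='')
```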
- class zensols.dataframe.stash.DefaultDataframeStash(dataframe_path, split_col, key_path, input_csv_path)
Bases: SplitKeyDataframeStash
A default implementation of DataframeSplitStash that creates the Pandas dataframe by simply reading it from a specified CSV file. The index is a string type appropriate for a stash.
- __init__(dataframe_path, split_col, key_path, input_csv_path)
- class zensols.dataframe.stash.ResourceFeatureDataframeStash(dataframe_path, split_col, installer, resource)
Bases: SplitColumnDataframeStash
A dataframe stash that installs a corpus and then reads a file to create the Pandas dataframe.
- __init__(dataframe_path, split_col, installer, resource)
- class zensols.dataframe.stash.SplitColumnDataframeStash(dataframe_path, split_col)
Bases: DataframeStash
A stash that provides a way to get the labels and label count of the dataframe.
- __init__(dataframe_path, split_col)
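Assuming the labels are the distinct values of split_col, both pieces of information reduce to a set and a count per value. The helpers split_labels and split_counts are hypothetical, and a list of dicts stands in for the dataframe.

```python
from collections import Counter
from typing import Dict, List


def split_labels(rows: List[Dict[str, str]], split_col: str) -> List[str]:
    """Return the distinct split names found in ``split_col``, sorted."""
    return sorted({row[split_col] for row in rows})


def split_counts(rows: List[Dict[str, str]], split_col: str) -> Dict[str, int]:
    """Return how many rows fall in each split."""
    return dict(Counter(row[split_col] for row in rows))


rows = [{'split': 'train'}, {'split': 'train'}, {'split': 'test'}]
print(split_labels(rows, 'split'))  # ['test', 'train']
print(split_counts(rows, 'split'))  # {'train': 2, 'test': 1}
```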
- class zensols.dataframe.stash.SplitKeyDataframeStash(dataframe_path, split_col, key_path)
Bases: SplitColumnDataframeStash, SplitKeyContainer
A stash and split key container that reads from a dataframe.
- __init__(dataframe_path, split_col, key_path)
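A split key container answers "which keys belong to each split". Deriving that from a dataframe is a grouping of index values by their split_col value; the sketch below shows only that grouping, with keys_by_split as a hypothetical helper. How the real class persists the result under key_path is not described here and is left out.

```python
from typing import Dict, Iterable, List, Tuple


def keys_by_split(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group stash keys by split name from (key, split) pairs."""
    groups: Dict[str, List[str]] = {}
    for key, split in pairs:
        # preserve the dataframe's row order within each split
        groups.setdefault(split, []).append(key)
    return groups


index = ['0', '1', '2', '3']
split_col_values = ['train', 'test', 'train', 'validation']
print(keys_by_split(zip(index, split_col_values)))
# {'train': ['0', '2'], 'test': ['1'], 'validation': ['3']}
```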
- write(depth=0, writer=sys.stdout)
Write the contents of this instance to writer using indentation depth.
- Parameters:
  - depth (int) – the starting indentation depth
  - writer (TextIOBase) – the writer to dump the content of this writable