zensols.dataframe package¶
Submodules¶
zensols.dataframe.config module¶
Configuration classes using dataframes as sources.
- class zensols.dataframe.config.DataframeConfig(csv_path, default_section, columns=None, column_eval=None, counts=None)[source]¶
Bases:
DictionaryConfig
A
Configurable
that uses dataframes as sources. This is useful for providing labels to nominal label vectorizers.
- __init__(csv_path, default_section, columns=None, column_eval=None, counts=None)[source]¶
Initialize the configuration from a dataframe (see parameters).
- Parameters:
csv_path (Path) – the path to the CSV file to create the dataframe
default_section (str) – the singleton section name, which has as options a list of the columns of the dataframe
columns (Dict[str, str]) – the columns to add to the configuration from the dataframe with key, values as column names, option names
column_eval (str) – Python code to evaluate and apply to each column if provided
counts (Dict[str, str]) – additional option entries in the section to add as counts of respective columns with key, values as column option names, new entry option names; where the column option names are those given as values from the columns dict
- default_section¶
- serializer¶
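The following is a minimal sketch of one plausible reading of the columns and counts parameters, written with pandas only. It is not the library's implementation: build_section, the inline CSV data and the "dataframe" section name are hypothetical, and column_eval is left out.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the file at ``csv_path``.
CSV = io.StringIO("label,text\npos,good\nneg,bad\npos,fine\n")


def build_section(df, columns=None, counts=None):
    """Map dataframe columns to option names, then add count options."""
    # Assumption: with no mapping given, every column keeps its own name.
    columns = columns or {c: c for c in df.columns}
    section = {opt: df[col].tolist() for col, opt in columns.items()}
    # ``counts`` keys are option names taken from the values of ``columns``.
    for opt, count_opt in (counts or {}).items():
        section[count_opt] = len(section[opt])
    return section


df = pd.read_csv(CSV)
section = build_section(
    df, columns={"label": "labels"}, counts={"labels": "num_labels"})
# Assumption: ``default_section`` is "dataframe" in this sketch.
config = {"dataframe": section}
```

Here the label column is exposed as the option labels, and num_labels holds the number of entries in that option.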
zensols.dataframe.stash module¶
Stashes that operate on a dataframe, which are useful for common machine learning tasks.
- class zensols.dataframe.stash.AutoSplitDataframeStash(dataframe_path, split_col, key_path, distribution)[source]¶
Bases:
SplitKeyDataframeStash
Automatically splits a dataframe into train, test and validation datasets by adding a split_col column with the split name.
- __init__(dataframe_path, split_col, key_path, distribution)¶
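As an illustration of the idea behind AutoSplitDataframeStash, the sketch below adds a split column to a dataframe using pandas and NumPy. It assumes distribution is a mapping of split name to proportion; the function add_split_column and the seed argument are hypothetical and do not come from the library.

```python
import numpy as np
import pandas as pd


def add_split_column(df, split_col, distribution, seed=0):
    """Return a copy of ``df`` with ``split_col`` naming each row's split."""
    names = list(distribution)
    total = len(df)
    # Turn the proportions into row counts; the last split absorbs rounding.
    sizes = [int(total * distribution[n]) for n in names[:-1]]
    sizes.append(total - sum(sizes))
    labels = np.repeat(names, sizes)
    # Shuffle so that split membership does not depend on row order.
    rng = np.random.default_rng(seed)
    rng.shuffle(labels)
    out = df.copy()
    out[split_col] = labels
    return out


df = pd.DataFrame({"x": range(10)})
split_df = add_split_column(
    df, "split", {"train": 0.8, "test": 0.1, "validation": 0.1})
```

Fixing the seed keeps the assignment reproducible across runs, which matters when the split keys are persisted and reused.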
- exception zensols.dataframe.stash.DataframeError[source]¶
Bases:
APIError
Thrown for dataframe stash issues.
- __module__ = 'zensols.dataframe.stash'¶
- class zensols.dataframe.stash.DataframeStash(dataframe_path)[source]¶
Bases:
ReadOnlyStash, Deallocatable, Writable, PrimeableStash
A factory stash that loads its data from a Pandas dataframe. It uses the dataframe index as the keys and pandas.Series as values. The dataframe is usually constructed by reading a file (e.g. CSV) and doing some transformation before using it in an implementation of this stash.
The dataframe created by _get_dataframe() must have a string or integer index since keys for all stashes are of type str. An integer index is automatically mapped to strings.
- __init__(dataframe_path)¶
- clear()[source]¶
Delete all data from the stash.
Important: Exercise caution with this method, of course.
- property dataframe¶
- dataframe_path: Path¶
The path to store the pickled version of the generated dataframe created with _get_dataframe().
- exists(name)[source]¶
Return True if data with key name exists.
Implementation note: This Stash.exists() method is very inefficient and should be overridden.
- Return type:
- load(name)[source]¶
Load a data value from the pickled data with key name. Semantically, this method loads the data using the stash’s implementation. For example, DirectoryStash loads the data from a file if it exists, but factory type stashes will always re-generate the data.
- See: get()
- Return type:
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write the contents of this instance to writer using indentation depth.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
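The key and value contract described for DataframeStash (string keys taken from the index, pandas.Series values) can be sketched with plain pandas. The free functions load and exists below are hypothetical stand-ins that only mirror the documented behavior; they are not the stash methods themselves.

```python
import pandas as pd

df = pd.DataFrame(
    {"text": ["good", "bad"], "label": ["pos", "neg"]}, index=[10, 11])
# Stash keys are strings, so an integer index is converted up front.
df.index = df.index.map(str)


def exists(name):
    """Return True if a row is keyed by ``name``."""
    return name in df.index


def load(name):
    """Return the row keyed by ``name`` as a pandas.Series, or None."""
    return df.loc[name] if exists(name) else None


row = load("10")
keys = list(df.index)
```

Note that after the conversion the integer 10 is no longer a valid key; only the string "10" is.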
- class zensols.dataframe.stash.DefaultDataframeStash(dataframe_path, split_col, key_path, input_csv_path)[source]¶
Bases:
SplitKeyDataframeStash
A default implementation of DataframeSplitStash that creates the Pandas dataframe by simply reading it from a specified CSV file. The index is a string type appropriate for a stash.
- __init__(dataframe_path, split_col, key_path, input_csv_path)¶
- class zensols.dataframe.stash.ResourceFeatureDataframeStash(dataframe_path, split_col, installer, resource)[source]¶
Bases:
SplitColumnDataframeStash
A dataframe that installs a corpus and then reads a file to create the Pandas dataframe.
- __init__(dataframe_path, split_col, installer, resource)¶
- installer: Installer¶
The installer used to download and uncompress the dataset.
- resource: Resource¶
Used to resolve the corpus file.
- class zensols.dataframe.stash.SplitColumnDataframeStash(dataframe_path, split_col)[source]¶
Bases:
DataframeStash
A stash that provides a way to get the labels and label count of the dataframe.
- __init__(dataframe_path, split_col)¶
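A minimal sketch of how labels and label counts might be derived from a split column, assuming the "labels" are the distinct values found in split_col. The dataframe and variable names here are illustrative only and are not part of the library.

```python
import pandas as pd

df = pd.DataFrame(
    {"split": ["train", "train", "test", "validation", "train"]},
    index=["0", "1", "2", "3", "4"])
split_col = "split"

# Distinct split names, sorted for a stable order.
labels = sorted(df[split_col].unique())
# Number of rows carrying each split name.
label_counts = df[split_col].value_counts().to_dict()
```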
- class zensols.dataframe.stash.SplitKeyDataframeStash(dataframe_path, split_col, key_path)[source]¶
Bases:
SplitColumnDataframeStash, SplitKeyContainer
A stash and split key container that reads from a dataframe.
- __init__(dataframe_path, split_col, key_path)¶
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write the contents of this instance to writer using indentation depth.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
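To illustrate what a split key container reading from a dataframe might expose, the sketch below groups the index keys by the value of the split column. The name keys_by_split and the sample data are assumptions for illustration and do not reflect the library's actual attributes.

```python
import pandas as pd

df = pd.DataFrame(
    {"split": ["train", "train", "test", "validation", "train"]},
    index=["0", "1", "2", "3", "4"])
split_col = "split"

# Map each split name to the tuple of stash keys (index values) it contains.
keys_by_split = {
    name: tuple(group.index) for name, group in df.groupby(split_col)}
```

Every key belongs to exactly one split, so the tuples partition the dataframe index.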