zensols.deeplearn.dataframe package


zensols.deeplearn.dataframe.batch module

An implementation of the batch-level API for Pandas dataframe-based data.

class zensols.deeplearn.dataframe.batch.DataframeBatch(batch_stash, id, split_name, data_points)[source]

Bases: Batch

A batch of data that contains instances of DataframeDataPoint, each of which has the row data from the dataframe.

__init__(batch_stash, id, split_name, data_points)

A utility method to create a tensor of all features of all columns in the data points.

Return type:



a tensor of shape (batch size, feature size), where the feature size is the number of all features vectorized; that is, each row in the batch is a data instance holding the flattened set of features that represents the respective row from the dataframe
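As a rough illustration of the shape described above, the following sketch flattens per-column feature vectors into one row per data point. It uses plain Python lists rather than tensors, and `flatten_row` and the sample feature values are hypothetical, not part of this package.

```python
from typing import Dict, List, Sequence


def flatten_row(features: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate the feature vectors of every column into one flat list."""
    flat: List[float] = []
    for column in sorted(features):  # a fixed column order keeps rows aligned
        flat.extend(features[column])
    return flat


# two hypothetical data points: one continuous column and one one-hot column
batch = [
    {'age': [0.39], 'sex': [1.0, 0.0]},
    {'age': [0.50], 'sex': [0.0, 1.0]},
]
flattened = [flatten_row(dp) for dp in batch]
# (batch size, feature size)
shape = (len(flattened), len(flattened[0]))
```

Here the feature size is 3: one value for the continuous column plus two for the one-hot encoded column.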

class zensols.deeplearn.dataframe.batch.DataframeBatchStash(name, config_factory, delegate, config, chunk_size, workers, data_point_type, batch_type, split_stash_container, vectorizer_manager_set, batch_size, model_torch_config, data_point_id_sets_path, decoded_attributes, batch_feature_mappings=None, batch_limit=9223372036854775807)[source]

Bases: BatchStash

A stash used for batches of data using DataframeBatch instances. This stash uses an instance of DataframeFeatureVectorizerManager to vectorize the data in the batches.

__init__(name, config_factory, delegate, config, chunk_size, workers, data_point_type, batch_type, split_stash_container, vectorizer_manager_set, batch_size, model_torch_config, data_point_id_sets_path, decoded_attributes, batch_feature_mappings=None, batch_limit=9223372036854775807)
property feature_vectorizer_manager: DataframeFeatureVectorizerManager
property flattened_features_shape: Tuple[int]
property label_shape: Tuple[int]
class zensols.deeplearn.dataframe.batch.DataframeDataPoint(id, batch_stash, row)[source]

Bases: DataPoint

A data point used in a batch, which contains a single row of data in the Pandas dataframe. When created, each column is saved as an attribute of the instance.

__init__(id, batch_stash, row)
row: InitVar
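The pattern of an init-only `row` whose columns become instance attributes might look like the sketch below. `RowDataPoint` is a hypothetical stand-in written with the standard `dataclasses` module, and the row is assumed to be a plain dictionary; it is not the class documented here.

```python
from dataclasses import dataclass, InitVar
from typing import Any, Dict


@dataclass
class RowDataPoint:
    """Hypothetical stand-in for a data point built from one dataframe row."""
    id: int
    # init-only: passed to __post_init__ but not stored as a field
    row: InitVar[Dict[str, Any]]

    def __post_init__(self, row: Dict[str, Any]):
        # copy each column of the row onto the instance as an attribute
        for name, value in row.items():
            setattr(self, name, value)


point = RowDataPoint(0, {'age': 39, 'education': 'Bachelors'})
```

After construction, the column values are reachable as `point.age` and `point.education`.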

zensols.deeplearn.dataframe.util module

Utility functionality for dataframe-related containers.

class zensols.deeplearn.dataframe.util.DataFrameDictable[source]

Bases: Dictable

A container with utility methods that convert Pandas dataframes to JSON and write them.


Default width when writing the dataframe.


String used for NaNs.


zensols.deeplearn.dataframe.vectorize module

Contains classes used to vectorize dataframe data.

class zensols.deeplearn.dataframe.vectorize.DataframeFeatureVectorizerManager(name, config_factory, torch_config, configured_vectorizers, prefix, label_col, stash, include_columns=None, exclude_columns=None)[source]

Bases: FeatureVectorizerManager, Writable

A pure instance-based feature vectorizer manager for a Pandas dataframe. All vectorizers used in this vectorizer manager are dynamically allocated and attached.

This class not only acts as the feature manager itself to be used in a FeatureVectorizerManager, but also provides a batch mapping to be used in a BatchStash.

__init__(name, config_factory, torch_config, configured_vectorizers, prefix, label_col, stash, include_columns=None, exclude_columns=None)
property batch_feature_mapping: BatchFeatureMapping

Return the mapping for zensols.deeplearn.batch.Batch instances.


Generate a feature id from the column name. This just attaches the prefix to the column name.

Return type:


property dataset_metadata: DataframeMetadata

Create metadata from the data in the dataframe.

exclude_columns: Tuple[str] = None

The columns to be excluded, or if None (the default), no columns are excluded as features.


Return the shape if all vectorizers were used.

Return type:


include_columns: Tuple[str] = None

The columns to be included, or if None (the default), all columns are used as features.

property label_attribute_name: str

Return the label attribute.

label_col: str

The column that contains the label/class.

property label_shape: Tuple[int]

Return the shape if all vectorizers were used.

prefix: str

The prefix to use for all vectorizers in the dataframe (e.g. adl_ for the Adult dataset test case example).

stash: DataframeStash

The stash that contains the dataframe.

write(depth=0, writer=sys.stdout)[source]

Write the contents of this instance to writer, starting at indentation depth.

  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable
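The column selection and feature id behavior documented for this class can be sketched in plain Python. The helpers `select_columns` and `feature_id` are hypothetical names for illustration only; they mirror the documented semantics (a `None` option applies no filtering, and a feature id is the prefix attached to the column name) rather than the actual implementation.

```python
from typing import Iterable, List, Optional, Tuple


def select_columns(columns: Iterable[str],
                   include: Optional[Tuple[str, ...]] = None,
                   exclude: Optional[Tuple[str, ...]] = None) -> Tuple[str, ...]:
    """Pick feature columns; ``None`` means no filtering for that option."""
    selected: List[str] = []
    for col in columns:
        if include is not None and col not in include:
            continue
        if exclude is not None and col in exclude:
            continue
        selected.append(col)
    return tuple(selected)


def feature_id(prefix: str, column: str) -> str:
    """Attach the prefix to the column name."""
    return f'{prefix}{column}'


cols = ('age', 'education', 'sex')
features = select_columns(cols, exclude=('sex',))
ids = [feature_id('adl_', c) for c in features]
```

With the `adl_` prefix from the Adult dataset example, the remaining columns map to `adl_age` and `adl_education`.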

class zensols.deeplearn.dataframe.vectorize.DataframeMetadata(prefix, label_col, label_values, continuous, descrete)[source]

Bases: Writable

Metadata for a Pandas dataframe.

__init__(prefix, label_col, label_values, continuous, descrete)
continuous: Tuple[str]

The list of data columns that are continuous.

descrete: Dict[str, Tuple[str]]

A mapping of label to the nominal values the column takes, for discrete mappings.

label_col: str

The column that contains the label/class.

label_values: Tuple[str]

All classes (unique across label_col).

prefix: str

The prefix to use for all vectorizers in the dataframe (e.g. adl_ for the Adult dataset test case example).

write(depth=0, writer=sys.stdout)[source]

Write the contents of this instance to writer, starting at indentation depth.

  • depth (int) – the starting indentation depth

  • writer (TextIOBase) – the writer to dump the content of this writable
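To make the metadata fields above more concrete, here is a minimal sketch that derives them from rows given as dictionaries and writes them at an indentation depth. `SketchMetadata` and `build_metadata` are hypothetical. Treating numeric columns as continuous and all others as discrete, leaving the label column out of both, and indenting two spaces per depth level are all assumptions of this sketch, not documented behavior.

```python
import sys
from dataclasses import dataclass
from io import StringIO
from typing import Dict, List, TextIO, Tuple


@dataclass
class SketchMetadata:
    """Hypothetical container mirroring the fields documented above."""
    label_col: str
    label_values: Tuple[str, ...]
    continuous: Tuple[str, ...]
    descrete: Dict[str, Tuple[str, ...]]

    def write(self, depth: int = 0, writer: TextIO = sys.stdout):
        # assumption: two spaces of indentation per depth level
        indent = '  ' * depth
        writer.write(f'{indent}label: {self.label_col}\n')
        writer.write(f'{indent}continuous: {", ".join(self.continuous)}\n')
        for col, nominals in self.descrete.items():
            writer.write(f'{indent}{col}: {", ".join(nominals)}\n')


def build_metadata(rows: List[dict], label_col: str) -> SketchMetadata:
    continuous, descrete = [], {}
    for col in rows[0]:
        if col == label_col:
            continue
        values = [r[col] for r in rows]
        # assumption: numeric columns are continuous, all others discrete
        if all(isinstance(v, (int, float)) for v in values):
            continuous.append(col)
        else:
            descrete[col] = tuple(sorted(set(values)))
    labels = tuple(sorted({r[label_col] for r in rows}))
    return SketchMetadata(label_col, labels, tuple(continuous), descrete)


rows = [
    {'age': 39, 'sex': 'Male', 'target': '<=50K'},
    {'age': 50, 'sex': 'Female', 'target': '>50K'},
]
meta = build_metadata(rows, 'target')
buf = StringIO()
meta.write(depth=1, writer=buf)
```

Passing a `StringIO` as the writer captures the output for inspection instead of printing it to standard output.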

Module contents

Contains API framework code for vectorizing and batching dataframe data without requiring a domain-specific model implementation.