Model¶
Please first read the data pre-processing documentation before proceeding. In this document, we define the model for the Iris example and then train and test it.
The model consists of two kinds of configuration and setup:
network settings: configuration of the neural network itself, such as types and numbers of layers
model settings: configuration of the model, such as the criterion, optimizer, and learning rate
Class Name Parlance¶
Network models usually implement layers of a deep learning network. To follow the convention set by PyTorch, the Python classes that implement the layers are referred to as modules and don’t carry the term layer in the class name.
Given that the modules that implement layers in this framework typically require a lot of configuration, a separate settings class is given to each corresponding module implementation. For example, a DeepLinearNetworkSettings configures a DeepLinear layer.
Network Settings¶
The network settings declare which network module to use in the model. A network settings class contains the configuration that tells the module how to build itself, and names the module to build. In the Iris example, we first need to create the class:
@dataclass
class IrisNetworkSettings(DeepLinearNetworkSettings):
    def get_module_class_name(self) -> str:
        return __name__ + '.IrisNetwork'
We only need to extend DeepLinearNetworkSettings, which already has all the configuration we need since our model is a simple linear stack of layers. We'll define that configuration soon. However, we must override the abstract method get_module_class_name, which tells the framework which model class to create at train/test time.
Configuring the Network Settings¶
The IrisNetworkSettings
instance will be populated by the ConfigFactory
with fields inherited from DeepLinearNetworkSettings:
[net_settings]
class_name = iris.model.IrisNetworkSettings
dropout = 0.1
activation = None
middle_features = eval: [5, 1]
in_features = 4
out_features = 3
proportions = False
repeats = 1
batch_norm_d = None
batch_norm_features = None
which will create a deep linear network expecting 4 input features (one for each feature column of the flower size data), with 5 times as many features in the second layer, the same number of features in the third layer, and finally an output of three features, one for each flower type. See DeepLinearNetworkSettings for more details.
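The layer widths implied by this configuration can be sketched with a small helper. This is not the framework's implementation; it only assumes, per the description above, that each middle_features entry acts as a multiplicative factor on the previous layer's width when proportions is False.

```python
# Minimal sketch (not framework code) of how the configured layer widths
# work out, assuming middle_features entries multiply the previous width.
def layer_widths(in_features, middle_features, out_features):
    widths = [in_features]
    for factor in middle_features:
        # each middle entry scales the previous layer's width
        widths.append(int(widths[-1] * factor))
    widths.append(out_features)
    return widths

# in_features = 4, middle_features = [5, 1], out_features = 3
print(layer_widths(4, [5, 1], 3))  # → [4, 20, 20, 3]
```

So the network goes from 4 input features to 20, stays at 20, and ends with the 3-neuron output.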
Network Model¶
The BaseNetworkModule class provides additional debugging and logging convenience methods.
Debugging¶
The method _debug in the base class logs as debug to the passed logger, but also adds formatting that indicates the name of the module, which is taken from MODULE_NAME. For a simple module like this, it might seem unnecessary. However, this additional information is crucial when debugging large models. The _shape_debug method logs the shape of a tensor.
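To make the convention concrete, here is a hypothetical sketch of the kind of formatting these helpers provide. The class name, message format, and exact behavior are illustrative stand-ins, not the framework's code.

```python
import logging

# Hypothetical sketch of module-name-prefixed debug logging; the real
# _debug/_shape_debug live in BaseNetworkModule and may format differently.
class SketchModule:
    MODULE_NAME = 'iris'

    def __init__(self, logger):
        self.logger = logger

    def _debug(self, msg):
        # prefix the message with the module name for traceability
        self.logger.debug(f'[{self.MODULE_NAME}] {msg}')

    def _shape_debug(self, name, tensor):
        # log the shape of a tensor-like object
        self._debug(f'{name} shape: {tuple(tensor.shape)}')
```

With many nested modules in a large model, the module-name prefix is what lets you tell whose tensors a given log line describes.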
Extending the Base Module¶
Finally, we provide an implementation of BaseNetworkModule, which extends from torch.nn.Module.
class IrisNetwork(BaseNetworkModule):
    MODULE_NAME = 'iris'

    def __init__(self, net_settings: IrisNetworkSettings):
        super().__init__(net_settings, logger)
        self.fc = DeepLinear(net_settings)

    def deallocate(self):
        super().deallocate()
        self._deallocate_children_modules()

    def _forward(self, batch: Batch) -> Tensor:
        if self.logger.isEnabledFor(logging.DEBUG):
            self._debug(f'label shape: {batch.get_labels().shape}, ' +
                        f'{batch.get_labels().dtype}')
        x = batch.get_flower_dimensions()
        self._shape_debug('input', x)
        x = self.fc(x)
        self._shape_debug('linear', x)
        return x
Note that we pass our own logger to the base class, which is needed for the aforementioned logging convenience methods and is set as the logger attribute in the base class. Another important difference is that we provide a deallocate method (see the memory management section for more information) and a _forward method. This private forward method takes a batch object instance that's been decoded by a vectorizer. The Batch object has the vectorized tensors ready to be used, the labels, and BatchMetadata. The Batch object can load the original data point objects with the get_data_points method.
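The decode-then-access pattern _forward relies on can be sketched with a stand-in batch class. SketchBatch below is not the framework's Batch; it only mimics the two accessors used in the code above to show the shape of the contract.

```python
# Stand-in (not the framework's Batch) illustrating the contract _forward
# uses: vectorized tensors and labels are ready once the batch is decoded.
class SketchBatch:
    def __init__(self, features, labels):
        self._features = features  # vectorized flower measurements
        self._labels = labels      # vectorized labels

    def get_flower_dimensions(self):
        return self._features

    def get_labels(self):
        return self._labels

batch = SketchBatch([[5.1, 3.5, 1.4, 0.2]], [0])
x = batch.get_flower_dimensions()  # what _forward feeds to the layers
print(len(x[0]))  # → 4 input features per data point
```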
Configuring the Network Model¶
In the initializer, all we need to do is create a DeepLinear module with the configuration given by the ConfigFactory that was used to create the network settings instance.
The model is configured as follows:
[model_settings]
class_name = zensols.deeplearn.ModelSettings
path = path: ${default:temporary_dir}/model
nominal_labels = False
learning_rate = 0.1
batch_iteration = gpu
epochs = 15
which creates a ModelSettings instance used to create the model, storing the model in the temporary directory under model. The nominal_labels setting tells the framework that the labels are not integer nominal indexes for each class. This is because the model outputs three neurons, one for each flower type (see the network model section). The model will train for 15 epochs using a learning rate of 0.1 with the (default) torch.optim.Adam optimizer and the (default) loss function torch.nn.CrossEntropyLoss. See the ModelSettings documentation for more information.
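The nominal_labels = False distinction can be made concrete with a small sketch contrasting the two label encodings. The flower names and helper functions here are illustrative, not framework code.

```python
# Sketch (not framework code) contrasting a nominal integer label with the
# one-output-neuron-per-class encoding implied by nominal_labels = False.
FLOWERS = ['setosa', 'versicolor', 'virginica']

def nominal_label(name):
    # a single integer class index
    return FLOWERS.index(name)

def one_hot_label(name):
    # a three-element vector, one position per output neuron
    return [1.0 if f == name else 0.0 for f in FLOWERS]

print(nominal_label('versicolor'))   # → 1
print(one_hot_label('versicolor'))   # → [0.0, 1.0, 0.0]
```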
Setting batch_iteration = gpu means the entire batch data set will be decoded into GPU memory (see the training section). See the ModelSettings class for full documentation on each option.
Executor¶
Finally, we define the ModelExecutor, which is the class that trains, validates, and tests the model:
[executor]
class_name = zensols.deeplearn.model.ModelExecutor
model_name = Iris
model_settings = instance: model_settings
net_settings = instance: net_settings
dataset_stash = instance: iris_dataset_stash
dataset_split_names = eval: 'train dev test'.split()
result_path = path: ${default:results_dir}
This creates an executor that uses the string Iris in all generated graphs and result output. It refers to the model and network settings we have already defined, and it uses the iris_dataset_stash we defined in the preprocess documentation.
You can use the executor directly as demonstrated in the Iris notebook or with a facade as shown in the facade documentation.
During the training of the model, if update_path is configured on the executor, the training and validation loss are plotted. This file also lets you inform the ModelExecutor of changes while training by providing configuration as a JSON file. For example:
{"epoch": 20}
resets the current epoch to 20 via the TrainManager. By doing this, you can shorten or lengthen training time. If the file exists but is empty, or otherwise cannot be parsed, training is stopped early.
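Producing such an update file is a one-liner; the sketch below writes the two kinds of file described above. The path 'update.json' is a hypothetical stand-in for whatever update_path is configured on the executor.

```python
import json

# Sketch: writing the control file the TrainManager reads during training.
# 'update.json' stands in for the configured update_path.
def write_epoch_update(path, epoch):
    # resets the current epoch, shortening or lengthening training
    with open(path, 'w') as f:
        json.dump({'epoch': epoch}, f)

def write_early_stop(path):
    # an empty (unparseable) file triggers an early stop
    open(path, 'w').close()
```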
Training¶
The executor is used to train, validate, and test the model. During the training phase, this includes:
Loading the batch(es) into memory.
Training the model on the training data set with auto gradients enabled for each epoch.
Using the validation data set to calculate the validation loss for each epoch.
Across each epoch, if and only if the validation loss is lower, saving the model.
Adding the validation loss and outcome of each data point to a ModelResult.
If the validation loss has not decreased within the window set by the ModelSettings max_consecutive_increased_count parameter, early stopping training of the model.
If the model has been trained for the number of epochs set in the ModelSettings, stopping.
Otherwise, adjusting the learning rate with a schedule based on the scheduler_class_name ModelSettings parameter and iterating over another epoch.
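The per-epoch control flow above can be sketched without any framework or PyTorch code. The function below is a simplified, torch-free stand-in: validation losses are supplied as a list for illustration, and saving a model is reduced to recording the epoch number.

```python
# Torch-free sketch of the save/early-stop control flow described above;
# val_losses stands in for per-epoch validation results.
def run_training(val_losses, max_consecutive_increased_count, max_epochs):
    best = float('inf')
    consecutive_increases = 0
    saved_epochs = []
    for epoch, val_loss in enumerate(val_losses[:max_epochs]):
        if val_loss < best:
            best = val_loss            # validation improved: save the model
            consecutive_increases = 0
            saved_epochs.append(epoch)
        else:
            consecutive_increases += 1
            if consecutive_increases >= max_consecutive_increased_count:
                break                  # early stop: window exhausted
    return saved_epochs

# losses improve for three epochs, then fail to improve three times in a row
print(run_training([1.0, 0.8, 0.6, 0.7, 0.9, 0.95], 3, 15))  # → [0, 1, 2]
```

Only epochs 0 through 2 are saved; the three consecutive non-improvements then trigger the early stop.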
Memory Management¶
When the training process starts, batches are loaded in one of three ways:
gpu, which buffers all data in the GPU,
cpu, which keeps all batches in CPU memory (the default), or
buffered, which buffers only one batch at a time (only for very large data).
When using the gpu setting, all batches (and thus all data) are loaded into GPU memory at once, meaning the output of the decoding process detailed in the vectorizer documentation is kept on the GPU. If this parameter is set to cpu, all batches are decoded to CPU memory, then moved to the GPU for each epoch. With the buffered setting, all batches are decoded anew for each epoch.
To keep a low memory profile, the Python garbage collector is called at different intervals depending on the gc_level parameter in the ModelSettings.
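As an illustration only, the pattern of level-gated collection might look like the sketch below. The mapping of levels to events is entirely hypothetical; see the ModelSettings documentation for the actual gc_level semantics.

```python
import gc

# Hypothetical sketch of level-gated garbage collection; the level-to-event
# mapping here is invented for illustration, not the framework's behavior.
EVENTS_BY_LEVEL = {
    0: set(),
    1: {'end_of_training'},
    2: {'end_of_training', 'end_of_epoch'},
    3: {'end_of_training', 'end_of_epoch', 'end_of_batch'},
}

def maybe_collect(gc_level, event):
    # higher levels collect at more frequent events
    if event in EVENTS_BY_LEVEL.get(gc_level, set()):
        gc.collect()
        return True
    return False
```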
See the model section for more details on the model settings.
Testing¶
The testing data set is loaded in the same way as the training data set. Likewise, the outcome of each testing data point is stored in a ModelResult after the model is loaded from the file system, where it was saved during the training phase.