zensols.deeplearn.layer package

Submodules

zensols.deeplearn.layer.conv module

Convolution network creation utilities.

class zensols.deeplearn.layer.conv.Convolution1DLayerFactory(stride=1, padding=0, pool_stride=1, pool_padding=0, in_channels=1, out_channels=1, kernel_filter=2, pool_kernel_filter=2)[source]

Bases: ConvolutionLayerFactory

One dimensional convolution and output shape factory.

property C_in: int
property F: int
property L_in: int
__init__(stride=1, padding=0, pool_stride=1, pool_padding=0, in_channels=1, out_channels=1, kernel_filter=2, pool_kernel_filter=2)
create_batch_norm_layer()[source]

Create the batch norm layer that follows the pool layer.

Return type:

Module

create_conv_layer()[source]

Create the convolution layer for this layer in the stack.

Return type:

Module

create_pool_layer()[source]

Create the pool layer that follows the convolutional layer.

Return type:

Module

in_channels: int = 1

Number of channels in the input image (C_in).

kernel_filter: int = 2

Size of the kernel filter dimension in length (F).

out_channels: int = 1

Number of channels/filters produced by the convolution.

pool_kernel_filter: Tuple[int] = 2

The filter used for max pooling.

class zensols.deeplearn.layer.conv.Convolution2DLayerFactory(stride=1, padding=0, pool_stride=1, pool_padding=0, width=1, height=1, depth=1, kernel_filter=(2, 2), n_filters=1, pool_kernel_filter=(2, 2))[source]

Bases: ConvolutionLayerFactory

Two dimensional convolution and output shape factory. The implementation as matrix multiplication section is taken from the Stanford CNN class.

Example (im2col)::

    W_in = H_in = 227
    Ch_in = D_in = 3
    Ch_out = D_out = 3
    K = 96
    F = (11, 11)
    S = 4
    P = 0
    W_out = H_out = (227 - 11 + (2 * 0)) / 4 + 1 = 55 output locations
    X_col = Fw^2 * D_out x W_out * H_out = 11^2 * 3 x 55 * 55 = 363 x 3025

Example (im2row)::

W_row = 96 filters of size 11 x 11 x 3 => K x 11 * 11 * 3 = 96 x 363

Result of convolution: W_row dot X_col (96 x 363 times 363 x 3025 gives 96 x 3025), which must be reshaped back to 55 x 55 x 96.
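The shape arithmetic in the two examples above can be checked with a short sketch. It assumes the standard convolution output-size formula (W - F + 2P) / S + 1. The helper name conv_out_size is hypothetical and is not part of this package.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length along one dimension (assumes the standard formula)."""
    return (size - kernel + 2 * padding) // stride + 1


# values taken from the example above
W = H = 227
D = 3
K = 96
F = (11, 11)
S, P = 4, 0

W_out = conv_out_size(W, F[0], S, P)
H_out = conv_out_size(H, F[1], S, P)

# im2col: one row per filter cell, one column per output location
X_col = (F[0] * F[1] * D, W_out * H_out)
# im2row: one row per filter
W_row = (K, F[0] * F[1] * D)

print(W_out, H_out)  # 55 55
print(X_col)         # (363, 3025)
print(W_row)         # (96, 363)
```

The inner dimensions of W_row and X_col agree (363), so their product has shape 96 x 3025, which holds the same number of values as a 55 x 55 x 96 volume.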

property D: int
property F: int
property H: int
property H_out
property K: int
property W: int
property W_out
property W_row
property X_col
__init__(stride=1, padding=0, pool_stride=1, pool_padding=0, width=1, height=1, depth=1, kernel_filter=(2, 2), n_filters=1, pool_kernel_filter=(2, 2))
create_batch_norm_layer()[source]

Create the batch norm layer that follows the pool layer.

Return type:

Module

create_conv_layer()[source]

Create the convolution layer for this layer in the stack.

Return type:

Module

create_pool_layer()[source]

Create the pool layer that follows the convolutional layer.

Return type:

Module

depth: int = 1

The volume, which is usually the same as n_filters (D).

height: int = 1

The height of the image/data (H).

kernel_filter: Tuple[int, int] = (2, 2)

The kernel filter dimension in width X height (F).

n_filters: int = 1

The number of filters, aka the filter depth/volume (K).

pool_kernel_filter: Tuple[int] = (2, 2)

The filter used for max pooling.

width: int = 1

The width of the image/data (W).

class zensols.deeplearn.layer.conv.ConvolutionLayerFactory(stride=1, padding=0, pool_stride=1, pool_padding=0)[source]

Bases: Dictable

Create convolution layers and output shape calculator.

property P: int

Padding.

property S: int

Stride.

__init__(stride=1, padding=0, pool_stride=1, pool_padding=0)
clone()[source]

Return a clone of this factory instance.

Return type:

ConvolutionLayerFactory

abstract create_batch_norm_layer()[source]

Create the batch norm layer that follows the pool layer.

Return type:

Module

abstract create_conv_layer()[source]

Create the convolution layer for this layer in the stack.

Return type:

Module

abstract create_pool_layer()[source]

Create the pool layer that follows the convolutional layer.

Return type:

Module

property dim: int
iter_layers(use_pool=True)[source]

Iterate over subsequent convolution and pooled stacked networks. Use with itertools.islice() to limit the output.

Return type:

Iterable[ConvolutionLayerFactory]

Returns:

subsequent layers after the current instance for all valid layers
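A minimal sketch of why itertools.islice is useful with an iterator of this kind. The generator iter_out_sizes is a hypothetical stand-in for a layer factory (it is not this package's API) and assumes the standard output-size formula is applied to each previous layer's output.

```python
from itertools import islice
from typing import Iterator


def iter_out_sizes(size: int, kernel: int, stride: int = 1,
                   padding: int = 0) -> Iterator[int]:
    """Yield the output size of each subsequent layer while the kernel fits.

    Hypothetical helper: with kernel=1 and stride=1 the size never
    shrinks, so the iterator never ends unless the caller limits it.
    """
    while size + 2 * padding >= kernel:
        size = (size + 2 * padding - kernel) // stride + 1
        yield size


# all valid layers for an input of length 10 and a kernel of length 3
print(list(iter_out_sizes(10, 3)))             # [8, 6, 4, 2]
# only the first two layers
print(list(islice(iter_out_sizes(10, 3), 2)))  # [8, 6]
# islice keeps an otherwise endless iteration bounded
print(list(islice(iter_out_sizes(10, 1), 3)))  # [10, 10, 10]
```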

next_layer(use_pool=True)[source]

Get a new factory that represents the next layer of the convolution stack.

Parameters:

use_pool (bool) – whether to use the output shape of the pool for the next layer’s input and output channel settings

Return type:

ConvolutionLayerFactory

property out_conv_shape: Tuple[int, ...]

The convolution layer shape before it is flattened into one dimension.

property out_pool_shape: Tuple[int, ...]

The pooling layer shape before it is flattened into one dimension.

padding: int = 0

The number of zero-padded cells on the ends of the image/data (P).

pool_padding: int = 0

The number of zero-padded cells on the ends of the image/data used for pooling.

pool_stride: int = 1

The pooling stride, which is the number of cells to skip for each step.

stride: int = 1

The stride, which is the number of cells to skip for each step (S).

validate(raise_error=True)[source]

Validate the parameters of the factory.

Parameters:

raise_error (bool) – if True, raises an error when invalid

Raises:

LayerError – if invalid and raise_error is True

Return type:

str

zensols.deeplearn.layer.crf module

Conditional random field PyTorch module forked from Kemal Kurniawan’s pytorch_crf GitHub repository. See the Torch CRF section of the README.md module documentation for more information.

see:

pytorch_crf

see:

Torch CRF Readme

class zensols.deeplearn.layer.crf.CRF(num_tags, batch_first=False, score_reduction='skip')[source]

Bases: Module

Conditional random field.

This module implements a conditional random field [LMP01]. The forward computation of this class computes the log likelihood of the given sequence of tags and emission score tensor. This class also has a decode() method, which finds the best tag sequence given an emission score tensor using the Viterbi algorithm.

Parameters:
  • num_tags – Number of tags.

  • batch_first – Whether the first dimension corresponds to the size of a minibatch.

  • score_reduction – how the score output is reduced over batches and then tags; the output has shape (batch size, number of tags), with the exception of tags, which has shape (batch_size, sequence length, number of tags). The scores are returned by decode() according to one of:

    • skip: do not return scores, only the decoded output (default)

    • none: return the scores unaltered, then divide by the batch count

    • tags: all scores

    • sum: sum the max over batches, then divide by the batch count

    • max: max over each batch max, then divide by the batch count

    • min: min over each batch max, then divide by the batch count

    • mean: average the max over batches, then divide by the batch count

start_transitions

Start transition score tensor of size (num_tags,).

Type:

torch.nn.Parameter

end_transitions

End transition score tensor of size (num_tags,).

Type:

torch.nn.Parameter

transitions

Transition score tensor of size (num_tags, num_tags).

Type:

torch.nn.Parameter

[LMP01]

Lafferty, J., McCallum, A., Pereira, F. (2001). “Conditional random fields: Probabilistic models for segmenting and labeling sequence data”. Proc. 18th International Conf. on Machine Learning. Morgan Kaufmann. pp. 282–289.

__init__(num_tags, batch_first=False, score_reduction='skip')[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

decode(emissions, mask=None)[source]

Find the most likely tag sequence using the Viterbi algorithm.

Return type:

Union[List[List[int]], Tuple[List[List[int]], Tensor]]

Parameters:
  • emissions (torch.Tensor) – Emission score tensor of size (seq_length, batch_size, num_tags) if batch_first is False, (batch_size, seq_length, num_tags) otherwise.

  • mask (torch.ByteTensor) – Mask tensor of size (seq_length, batch_size) if batch_first is False, (batch_size, seq_length) otherwise.

Returns:

A list of lists containing the best tag sequence for each batch and, optionally, the scores based on the score_reduction parameter in __init__().
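As an illustration of the decoding step, the following is a minimal pure-Python sketch of textbook Viterbi decoding for a single unmasked sequence. It is not this module's implementation: the function name viterbi and the list-based arguments are assumptions made for the example, with start, end and trans playing the roles of start_transitions, end_transitions and transitions.

```python
from typing import List, Tuple


def viterbi(emissions: List[List[float]], start: List[float],
            end: List[float], trans: List[List[float]]
            ) -> Tuple[List[int], float]:
    """Return the best tag sequence and its score for one sequence.

    emissions[t][j] is the score of tag j at step t and trans[i][j] is
    the score of moving from tag i to tag j.
    """
    n_tags = len(start)
    # best score of any path ending in each tag at the first step
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    history: List[List[int]] = []
    for emit in emissions[1:]:
        back, nxt = [], []
        for j in range(n_tags):
            # best previous tag when the current tag is j
            i_best = max(range(n_tags), key=lambda i: score[i] + trans[i][j])
            back.append(i_best)
            nxt.append(score[i_best] + trans[i_best][j] + emit[j])
        history.append(back)
        score = nxt
    score = [score[j] + end[j] for j in range(n_tags)]
    last = max(range(n_tags), key=lambda j: score[j])
    best_score = score[last]
    # follow the back pointers from the last tag to recover the path
    path = [last]
    for back in reversed(history):
        path.append(back[path[-1]])
    path.reverse()
    return path, best_score


emissions = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(viterbi(emissions, [0.0, 0.0], [0.0, 0.0], zeros))  # ([0, 1, 0], 3.0)
```

With zero transition scores the best path simply follows the highest emission at each step; a negative score for switching tags would favor staying on one tag instead.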

forward(emissions, tags, mask=None, reduction='sum')[source]

Compute the conditional log likelihood of a sequence of tags given emission scores.

Return type:

Tensor

Parameters:
  • emissions (torch.Tensor) – Emission score tensor of size (seq_length, batch_size, num_tags) if batch_first is False, (batch_size, seq_length, num_tags) otherwise.

  • tags (torch.LongTensor) – Sequence of tags tensor of size (seq_length, batch_size) if batch_first is False, (batch_size, seq_length) otherwise.

  • mask (torch.ByteTensor) – Mask tensor of size (seq_length, batch_size) if batch_first is False, (batch_size, seq_length) otherwise.

  • reduction – Specifies the reduction to apply to the output: none|sum|mean|token_mean. none: no reduction will be applied. sum: the output will be summed over batches. mean: the output will be averaged over batches. token_mean: the output will be averaged over tokens.

Returns:

The log likelihood. This will have size (batch_size,) if reduction is none, () otherwise.
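The quantity described above can be sketched for one unmasked sequence using the textbook linear-chain CRF formulation: the score of the given tag path minus the log partition function computed with the forward algorithm. This is an illustration under those assumptions, not the module's code; all function names here are hypothetical.

```python
import math
from itertools import product
from typing import Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def path_score(tags, emissions, start, end, trans) -> float:
    """Unnormalized score of one tag sequence."""
    score = start[tags[0]] + emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += trans[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score + end[tags[-1]]


def log_partition(emissions, start, end, trans) -> float:
    """Forward algorithm: log of the summed exp-scores over all paths."""
    n = len(start)
    alpha = [start[j] + emissions[0][j] for j in range(n)]
    for emit in emissions[1:]:
        alpha = [logsumexp([alpha[i] + trans[i][j] for i in range(n)]) + emit[j]
                 for j in range(n)]
    return logsumexp([alpha[j] + end[j] for j in range(n)])


def log_likelihood(tags, emissions, start, end, trans) -> float:
    """Log probability of the tag sequence given the emission scores."""
    return (path_score(tags, emissions, start, end, trans)
            - log_partition(emissions, start, end, trans))


emissions = [[0.5, -0.2], [0.1, 0.9], [-0.3, 0.4]]
start, end = [0.1, -0.1], [0.0, 0.2]
trans = [[0.3, -0.4], [-0.2, 0.6]]

# sanity check: the probabilities of all tag sequences sum to one
total = sum(math.exp(log_likelihood(tags, emissions, start, end, trans))
            for tags in product(range(2), repeat=3))
print(round(total, 6))  # 1.0
```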

Return type:

torch.Tensor

reset_parameters()[source]

Initialize the transition parameters.

The parameters will be initialized randomly from a uniform distribution between -0.1 and 0.1.

Return type:

None

zensols.deeplearn.layer.linear module

Convenience classes for linear layers.

class zensols.deeplearn.layer.linear.DeepLinear(net_settings, sub_logger=None)[source]

Bases: BaseNetworkModule

A layer that contains one or more nested layers, including batch normalization and activation. The input and output layer shapes are given, and zero or more optional middle layers are given as percent changes in size or exact numbers.

If the network settings are configured to have batch normalization, batch normalization layers are added after each linear layer.

The drop out and activation function (if any) are applied in between each layer allowing other drop outs and activation functions to be applied before and after. Note that the activation is implemented as a function, and not a layer.

For example, if batch normalization and an activation function are configured and two layers are configured, the network is configured as:

  1. linear

  2. batch normalization

  3. activation

  4. dropout

  5. linear

  6. batch normalization

  7. activation

  8. dropout
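The ordering listed above can be reproduced with a small sketch. The function layer_order and its flags are hypothetical names used only for illustration; they are not part of DeepLinear.

```python
from typing import List


def layer_order(n_layers: int, batch_norm: bool = True,
                activation: bool = True, dropout: bool = True) -> List[str]:
    """Return the sequence of operations applied for n_layers linear layers."""
    order: List[str] = []
    for _ in range(n_layers):
        order.append('linear')
        if batch_norm:
            order.append('batch normalization')
        if activation:
            order.append('activation')
        if dropout:
            order.append('dropout')
    return order


for step, name in enumerate(layer_order(2), start=1):
    print(f'{step}. {name}')
```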

The module also provides the output features of each layer with n_features_after_layer() and the ability to forward through only the first given set of layers with forward_n_layers().

MODULE_NAME: ClassVar[str] = 'linear'

The module name used in the logging message. This is set in each inherited class.

__init__(net_settings, sub_logger=None)[source]

Initialize the deep linear layer.

Parameters:
  • net_settings (DeepLinearNetworkSettings) – the deep linear layer configuration

  • sub_logger (Logger) – the logger to use for the forward process in this layer

deallocate()[source]

Deallocate all resources for this instance.

forward_n_layers(x, n_layers, full_forward=False)[source]

Forward through the first N (0-index based) layers.

Parameters:
  • n_layers (int) – the number of layers to forward through (0-based index)

  • full_forward (bool) – if True, also return the full forward as a second parameter

Return type:

Tensor

Returns:

the tensor output of all layers or a tuple of (N-th layer, all layers)

get_batch_norm_layers()[source]

Return all batch normalization layers.

Return type:

Tuple[Module]

get_linear_layers()[source]

Return all linear layers.

Return type:

Tuple[Module]

n_features_after_layer(nth_layer)[source]

Get the output features of the Nth (0 index based) layer.

Parameters:

nth_layer – the layer to use for getting the output features

Return type:

int

property out_features: int

The number of features output from all layers of this module.

class zensols.deeplearn.layer.linear.DeepLinearNetworkSettings(name, config_factory, torch_config, batch_norm_d, batch_norm_features, dropout, activation, in_features, out_features, middle_features, proportions, repeats)[source]

Bases: ActivationNetworkSettings, DropoutNetworkSettings, BatchNormNetworkSettings

Settings for a deep fully connected network using DeepLinear.

__init__(name, config_factory, torch_config, batch_norm_d, batch_norm_features, dropout, activation, in_features, out_features, middle_features, proportions, repeats)
get_module_class_name()[source]

Returns the fully qualified class name of the module to create by ModelManager. This module takes as the first parameter an instance of this class.

Important: This method is not used for nested modules. You must declare specific class names in the configuration for those nested class names.

Return type:

str

in_features: int

The number of features to the first layer.

middle_features: Tuple[Union[int, float, Dict[str, Any]], ...]

The number of features in the middle layers; if proportions is True, then each number is how much to grow or shrink as a percentage of the last layer, otherwise, it’s the number of features.

If any element is a dictionary, then it interprets the keys as:

  • value: the value as if the entry was a number, and defaults to 1

  • apply: a sequence of strings indicating the order of the layers to apply, with default linear, bnorm, activation, dropout; if a layer is omitted it won’t be applied

  • batch_norm_features: the number of features to use in a batch, which might change based on ordering, or last to use the last number of parameters computed in the deep linear network; otherwise it is computed as the size of the current linear input

out_features: Union[int, Dict[str, Any]]

The number of features as output from the last layer. If a dictionary, it follows the same rules as middle_features.

proportions: bool

Whether or not to interpret middle_features as a proportion of the previous layer or use directly as the size of the middle layer.

repeats: int

The number of repeats of the middle_features configuration.
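A rough sketch of how in_features, middle_features, proportions and repeats might combine into a list of layer sizes. The rounding rule (truncation with int), the handling of repeats as a simple repetition of the middle entries, and the function name layer_sizes are all assumptions made for illustration; dictionary entries are not handled, and the library's actual computation may differ.

```python
from typing import List, Sequence, Union


def layer_sizes(in_features: int, out_features: int,
                middle_features: Sequence[Union[int, float]],
                proportions: bool = False, repeats: int = 1) -> List[int]:
    """Return the feature count at each layer boundary (hypothetical rule)."""
    sizes = [in_features]
    for value in list(middle_features) * repeats:
        if proportions:
            # assumed: scale the previous layer size and truncate
            sizes.append(int(sizes[-1] * value))
        else:
            # the entry is the exact number of features
            sizes.append(int(value))
    sizes.append(out_features)
    return sizes


print(layer_sizes(100, 2, (40, 10)))                       # [100, 40, 10, 2]
print(layer_sizes(100, 2, (0.5, 0.5), proportions=True))   # [100, 50, 25, 2]
print(layer_sizes(100, 2, (0.5,), proportions=True, repeats=3))
# [100, 50, 25, 12, 2]
```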

zensols.deeplearn.layer.recur module

This file contains a convenience wrapper around RNN, GRU and LSTM modules in PyTorch.

class zensols.deeplearn.layer.recur.RecurrentAggregation(net_settings, sub_logger=None)[source]

Bases: BaseNetworkModule

A recurrent neural network model with an output aggregation. This includes RNNs, LSTMs and GRUs.

MODULE_NAME: ClassVar[str] = 'recur'

The module name used in the logging message. This is set in each inherited class.

__init__(net_settings, sub_logger=None)[source]

Initialize the recurrent layer.

deallocate()[source]

Deallocate all resources for this instance.

property out_features: int

The number of features output from all layers of this module.

class zensols.deeplearn.layer.recur.RecurrentAggregationNetworkSettings(name, config_factory, torch_config, dropout, network_type, aggregation, bidirectional, input_size, hidden_size, num_layers)[source]

Bases: DropoutNetworkSettings

Settings for a recurrent neural network. This configures a RecurrentAggregation layer.

__init__(name, config_factory, torch_config, dropout, network_type, aggregation, bidirectional, input_size, hidden_size, num_layers)
aggregation: str

A convenience operation to aggregate the parameters; this is one of:

  • max: return the max of the output states

  • ave: return the average of the output states

  • last: return the last output state

  • none: do not apply an aggregation function
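The aggregation options can be illustrated with plain Python lists standing in for the recurrent output states of one sequence. This sketch assumes the aggregation is applied element-wise across the sequence dimension; the function name aggregate is hypothetical and the module itself operates on tensors.

```python
from typing import List, Union


def aggregate(states: List[List[float]], aggregation: str
              ) -> Union[List[float], List[List[float]]]:
    """Aggregate per-step output states (one inner list per time step)."""
    if aggregation == 'max':
        return [max(col) for col in zip(*states)]
    if aggregation == 'ave':
        return [sum(col) / len(col) for col in zip(*states)]
    if aggregation == 'last':
        return states[-1]
    if aggregation == 'none':
        return states
    raise ValueError(f'unknown aggregation: {aggregation}')


states = [[1.0, 4.0], [3.0, 2.0]]
print(aggregate(states, 'max'))   # [3.0, 4.0]
print(aggregate(states, 'ave'))   # [2.0, 3.0]
print(aggregate(states, 'last'))  # [3.0, 2.0]
```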

bidirectional: bool

Whether or not the network is bidirectional.

get_module_class_name()[source]

Returns the fully qualified class name of the module to create by ModelManager. This module takes as the first parameter an instance of this class.

Important: This method is not used for nested modules. You must declare specific class names in the configuration for those nested class names.

Return type:

str

hidden_size: int

The size of the hidden states of the network.

input_size: int

The input size to the network.

network_type: str

One of rnn, lstm or gru.

num_layers: int

The number of “stacked” layers.

zensols.deeplearn.layer.recurcrf module

Contains an implementation of a recurrent network with a conditional random field layer. This is usually configured as a BiLSTM CRF.

class zensols.deeplearn.layer.recurcrf.RecurrentCRF(net_settings, sub_logger=None, use_crf=True)[source]

Bases: BaseNetworkModule

Adapt the CRF module using the framework based BaseNetworkModule class. This provides methods forward_recur_decode() and decode(), which decodes the input.

This adds a recurrent neural network and a fully connected feed forward decoder layer before the CRF layer.

MODULE_NAME: ClassVar[str] = 'recur crf'

The module name used in the logging message. This is set in each inherited class.

__init__(net_settings, sub_logger=None, use_crf=True)[source]

Initialize the recurrent CRF layer.

deallocate()[source]

Deallocate all resources for this instance.

decode(x, mask)[source]

Forward the input through the recurrent network, decoder, and then the CRF.

Parameters:
  • x (Tensor) – the input

  • mask (Tensor) – the mask used to block the last N states not provided

Return type:

Tuple[Tensor, Tensor]

Returns:

the CRF sequence output and the score provided by the CRF’s Viterbi algorithm as a tuple

forward_recur_decode(x)[source]

Forward the input through the recurrent network (i.e. LSTM), batch normalization and activation (if configured), and decoder output.

Note: this layer forwards batch normalization, activation and drop out (for those configured) after the recurrent layer is forwarded. However, the subordinate recurrent layer can also be configured with a dropout when having more than one stacked layer.

Parameters:

x (Tensor) – the network input

Return type:

Tensor

Returns:

the fully connected linear feed forward decoded output

to(*args, **kwargs)[source]

Moves and/or casts the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)[source]
to(dtype, non_blocking=False)[source]
to(tensor, non_blocking=False)[source]
to(memory_format=torch.channels_last)[source]

Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Parameters:
  • device (torch.device) – the desired device of the parameters and buffers in this module

  • dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module

  • tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module

  • memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)

Returns:

self

Return type:

Module

Examples:

>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_CUDA1)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
class zensols.deeplearn.layer.recurcrf.RecurrentCRFNetworkSettings(name, config_factory, torch_config, batch_norm_d, batch_norm_features, dropout, activation, network_type, bidirectional, input_size, hidden_size, num_layers, num_labels, decoder_settings, score_reduction)[source]

Bases: ActivationNetworkSettings, DropoutNetworkSettings, BatchNormNetworkSettings

Settings for a recurrent neural network using RecurrentCRF.

__init__(name, config_factory, torch_config, batch_norm_d, batch_norm_features, dropout, activation, network_type, bidirectional, input_size, hidden_size, num_layers, num_labels, decoder_settings, score_reduction)
bidirectional: bool

Whether or not the network is bidirectional (usually True).

decoder_settings: DeepLinearNetworkSettings

The decoder feed forward network.

get_module_class_name()[source]

Returns the fully qualified class name of the module to create by ModelManager. This module takes as the first parameter an instance of this class.

Important: This method is not used for nested modules. You must declare specific class names in the configuration for those nested class names.

Return type:

str

hidden_size: int

The size of the hidden states of the network.

input_size: int

The input size to the layer.

network_type: str

One of rnn, lstm or gru (usually lstm).

num_labels: int

The number of output labels from the CRF.

num_layers: int

The number of “stacked” layers.

score_reduction: str

How the score output is reduced over batches.

See:

CRF

to_recurrent_aggregation()[source]
Return type:

RecurrentAggregationNetworkSettings

Module contents

Provides neural network layer implementations, which are all subclasses of torch.nn.Module.

exception zensols.deeplearn.layer.LayerError[source]

Bases: ModelError

Thrown for all deep learning layer errors.
