flint.nn¶

Module¶

class flint.nn.modules.Module[source]¶

Bases: object

Base class for all modules.

Parameters: name (str) – name of the module

add_module(name: str, module: Optional[flint.nn.modules.module.Module]) → None[source]¶

Add a child module to the current module.

Parameters

name (str) – name of the child module
module (Module) – child module to be added to the module

children() → Iterator[flint.nn.modules.module.Module][source]¶

Returns an iterator over immediate children modules.

Yields: module (Module) – A child module

eval() → flint.nn.modules.module.Module[source]¶

Sets the module in evaluation mode.

This has an effect only on the following modules:

flint.nn.Dropout

See their documentation for details of their behavior in training / evaluation mode.

Returns: module
Return type: Module

modules() → Iterator[flint.nn.modules.module.Module][source]¶

Returns an iterator over all modules in the network, only yielding the module itself.

Yields: Module – a module in the network

Note

Duplicate modules are returned only once.

named_children() → Iterator[Tuple[str, flint.nn.modules.module.Module]][source]¶

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

Yields: (string, Module) – Tuple containing a name and child module

named_modules(memo: Optional[Set[flint.nn.modules.module.Module]] = None, prefix: str = '') → Iterator[Tuple[str, flint.nn.modules.module.Module]][source]¶

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself. Borrowed from: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py

Parameters

memo (Set) – a set for recording visited modules
prefix (str) – prefix to prepend to all module names

Yields

(string, Module) – Tuple of name and module

Note

Duplicate modules are returned only once.
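The memo-based traversal described above can be sketched as follows. This is a minimal illustration, not flint's actual implementation: the `Node` class and its `_modules` dictionary are hypothetical stand-ins, and the dotted naming scheme is assumed to follow the PyTorch code the method is borrowed from.

```python
from typing import Dict, Iterator, Optional, Set, Tuple


class Node:
    """Hypothetical stand-in for a module that only tracks its children."""

    def __init__(self) -> None:
        self._modules: Dict[str, "Node"] = {}

    def add_module(self, name: str, module: "Node") -> None:
        self._modules[name] = module

    def named_modules(
        self, memo: Optional[Set["Node"]] = None, prefix: str = ""
    ) -> Iterator[Tuple[str, "Node"]]:
        if memo is None:
            memo = set()
        if self in memo:
            return  # already visited, so a duplicate is not yielded again
        memo.add(self)
        yield prefix, self
        for name, child in self._modules.items():
            # Assumed naming scheme: parent and child names joined by a dot.
            child_prefix = prefix + ("." if prefix else "") + name
            yield from child.named_modules(memo, child_prefix)


root, shared = Node(), Node()
root.add_module("a", shared)
root.add_module("b", shared)  # the same object registered under two names
names = [name for name, _ in root.named_modules()]
```

Here `names` contains the root (empty prefix) and `"a"` only, because the shared child is already in `memo` when it is reached again as `"b"`.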

named_parameters(prefix: str = '', recurse: bool = True) → Iterator[Tuple[str, flint.nn.parameter.Parameter]][source]¶

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

Adapted from: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py

Parameters

prefix (str) – prefix to prepend to all parameter names.
recurse (bool) – If True, yields parameters of this module and all submodules. If False, yields only parameters that are direct members of this module.

Yields

(string, Parameter) – Tuple containing the name and parameter

parameters(recurse: bool = True) → Iterator[flint.nn.parameter.Parameter][source]¶

Returns an iterator over module parameters, only yielding the parameter itself.

Parameters: recurse (bool) – If True, yields parameters of this module and all submodules. If False, yields only parameters that are direct members of this module.
Yields: Parameter – module parameter

register_parameter(name: str, param: Optional[flint.nn.parameter.Parameter]) → None[source]¶

Add a parameter to the module.

Parameters

name (str) – name of the parameter
param (Parameter) – parameter to be added to the module

train(mode: bool = True) → flint.nn.modules.module.Module[source]¶

Sets the module in training mode.

This has an effect only on the following modules:

flint.nn.Dropout

See their documentation for details of their behavior in training / evaluation mode.

Parameters: mode (bool, optional, default=True) – Whether to set training mode (True) or evaluation mode (False)
Returns: module
Return type: Module

training: bool¶
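A minimal sketch of how train() and eval() might toggle the training flag. It assumes the flag is propagated recursively to child modules and that eval() is equivalent to train(False), as in PyTorch; the text above does not state either point, and the `Node` class is a hypothetical stand-in.

```python
class Node:
    """Hypothetical stand-in for a module with a training flag."""

    def __init__(self, *children: "Node") -> None:
        self.training = True
        self._children = list(children)

    def train(self, mode: bool = True) -> "Node":
        self.training = mode
        for child in self._children:
            child.train(mode)  # assumed: the mode is pushed down to children
        return self  # returning self allows call chaining

    def eval(self) -> "Node":
        return self.train(False)


leaf = Node()
net = Node(Node(leaf))

flag_after_eval = net.eval().training
leaf_after_eval = leaf.training
leaf_after_train = net.train().training and leaf.training
```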

Containers¶

class flint.nn.modules.Sequential(*args: flint.nn.modules.module.Module)[source]¶

class flint.nn.modules.Sequential(arg: OrderedDict[str, Module])

Bases: flint.nn.modules.module.Module

A sequential container. Modules will be added to it in the order they are passed in the constructor. Alternatively, an ordered dict of modules can also be passed in.
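The two constructor forms can be illustrated with a small stand-in container. `SequentialSketch` is hypothetical and chains plain callables rather than flint modules; naming positional layers by their index is an assumption.

```python
from collections import OrderedDict


class SequentialSketch:
    """Hypothetical container that applies callables in insertion order."""

    def __init__(self, *args) -> None:
        if len(args) == 1 and isinstance(args[0], OrderedDict):
            # Ordered-dict form: keep the given names and order.
            self.layers = OrderedDict(args[0])
        else:
            # Positional form: name each layer by its index (assumed).
            self.layers = OrderedDict((str(i), f) for i, f in enumerate(args))

    def __call__(self, x):
        for layer in self.layers.values():
            x = layer(x)
        return x


def double(v):
    return 2 * v


def add_one(v):
    return v + 1


by_position = SequentialSketch(double, add_one)
by_name = SequentialSketch(OrderedDict([("inc", add_one), ("dbl", double)]))

out_position = by_position(3)  # (3 * 2) + 1
out_name = by_name(3)          # (3 + 1) * 2
```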

Activations¶

class flint.nn.modules.ReLU[source]¶

Bases: flint.nn.modules.module.Module

ReLU (Rectified Linear Unit) activation function. See flint.nn.functional.relu() for more details.

class flint.nn.modules.LeakyReLU(negative_slope: float = 0.01)[source]¶

Bases: flint.nn.modules.module.Module

Leaky ReLU activation function. See flint.nn.functional.leaky_relu() for more details.

\[\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} * \min(0, x) \]

Parameters: negative_slope (float, optional, default=1e-2) – Controls the angle of the negative slope.
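As a scalar illustration of the formula above (a sketch in plain Python, not the tensor implementation):

```python
def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    # max(0, x) keeps positive inputs; min(0, x) scales negative inputs.
    return max(0.0, x) + negative_slope * min(0.0, x)


values = [leaky_relu(v) for v in (-2.0, 0.0, 3.0)]
```

Negative inputs are multiplied by `negative_slope`, while non-negative inputs pass through unchanged.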

class flint.nn.modules.Sigmoid[source]¶

Bases: flint.nn.modules.module.Module

Sigmoid activation function. See flint.nn.functional.sigmoid() for more details.

\[\text{sigmoid}(x) = \frac{1}{1 + \exp(-x)} \]

class flint.nn.modules.Tanh[source]¶

Bases: flint.nn.modules.module.Module

Tanh (Hyperbolic Tangent) activation function. See flint.nn.functional.tanh() for more details.

\[\text{tanh}(x) = \frac{\sinh(x)}{\cosh(x)} = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)} \]

class flint.nn.modules.GELU[source]¶

Bases: flint.nn.modules.module.Module

Gaussian Error Linear Units (GELU) function. See flint.nn.functional.gelu() for more details.

\[\text{GELU}(x) = x \cdot \Phi(x) = x \cdot \frac{1}{2} [1 + \text{erf} (x / \sqrt{2})] \]

where \(\Phi(x)\) is the cumulative distribution function of the standard Gaussian distribution.

We can approximate it with:

\[\text{GELU}(x) = 0.5 x (1 + \text{tanh}[ \sqrt{2 / \pi} (x + 0.044715 x^3) ]) \]

or

\[\text{GELU}(x) = x \sigma(1.702 x) \]

References

“Gaussian Error Linear Units (GELUs).” Dan Hendrycks and Kevin Gimpel. arXiv 2016.
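The exact form and the two approximations can be compared numerically. This is a scalar sketch using only the standard library; the function names are illustrative and are not part of flint.

```python
import math


def gelu_exact(x: float) -> float:
    # x * Phi(x), with Phi written in terms of the error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    # tanh-based approximation.
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def gelu_sigmoid(x: float) -> float:
    # sigmoid-based approximation: x * sigma(1.702 x).
    return x / (1.0 + math.exp(-1.702 * x))


grid = [i / 10.0 for i in range(-30, 31)]
max_err_tanh = max(abs(gelu_tanh(x) - gelu_exact(x)) for x in grid)
max_err_sigmoid = max(abs(gelu_sigmoid(x) - gelu_exact(x)) for x in grid)
```

On this grid the tanh form tracks the exact function more closely than the sigmoid form, which is the cheaper of the two.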

Loss Functions¶

class flint.nn.modules.Loss(reduction: str = 'mean')[source]¶

Bases: flint.nn.modules.module.Module

class flint.nn.modules.BCELoss(reduction: str = 'mean')[source]¶

Bases: flint.nn.modules.loss.Loss

Binary Cross Entropy Loss:

\[\text{loss} = -\left[ y \log(x) + (1 - y) \log(1 - x) \right] \]

See flint.nn.functional.binary_cross_entropy() for more details.

Parameters: reduction (str, optional) – ‘none’ / ‘mean’ / ‘sum’
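A minimal sketch of binary cross entropy with the three reduction modes, written with the conventional leading minus sign so that the loss is non-negative. It works on plain lists of probabilities and targets; the function name `bce` is illustrative, and inputs are assumed to lie strictly between 0 and 1.

```python
import math
from typing import List, Union


def bce(
    x: List[float], y: List[float], reduction: str = "mean"
) -> Union[float, List[float]]:
    # Per-element loss: -(y * log(x) + (1 - y) * log(1 - x)).
    losses = [
        -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
        for p, t in zip(x, y)
    ]
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)
    raise ValueError(f"unknown reduction: {reduction!r}")


probs, targets = [0.5, 0.5], [1.0, 0.0]
loss_mean = bce(probs, targets)
loss_sum = bce(probs, targets, reduction="sum")
loss_none = bce(probs, targets, reduction="none")
```

With both predictions at 0.5, every element contributes log 2, so the mean is log 2 and the sum is twice that.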

class flint.nn.modules.MSELoss(reduction: str = 'mean')[source]¶

Bases: flint.nn.modules.loss.Loss

Mean Squared Error Loss: \((x - y)^2\). See flint.nn.functional.mse_loss() for more details.

Parameters: reduction (str, optional) – ‘none’ / ‘mean’ / ‘sum’

class flint.nn.modules.NllLoss(reduction: str = 'mean')[source]¶

Bases: flint.nn.modules.loss.Loss

Negative Log Likelihood Loss. See flint.nn.functional.nll_loss() for more details.

Parameters: reduction (str, optional) – ‘none’ / ‘mean’ / ‘sum’

class flint.nn.modules.CrossEntropyLoss(reduction: str = 'mean')[source]¶

Bases: flint.nn.modules.loss.Loss

Cross Entropy Loss, which combines softmax() and nll_loss(). See flint.nn.functional.cross_entropy() for more details.

Parameters: reduction (str, optional) – ‘none’ / ‘mean’ / ‘sum’
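The combination of softmax and negative log likelihood can be sketched for a single sample as follows. Taking the logarithm of the softmax in one numerically stable step is an assumption about how the two pieces are joined, and the helper names are illustrative.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


def nll(log_probs: List[float], target: int) -> float:
    # Negative log likelihood of the target class.
    return -log_probs[target]


def cross_entropy(logits: List[float], target: int) -> float:
    return nll(log_softmax(logits), target)


uniform_loss = cross_entropy([0.0, 0.0, 0.0], 1)
confident_loss = cross_entropy([1.0, 2.0, 3.0], 2)
```

Uniform logits over three classes give a loss of log 3; the loss shrinks as the target logit grows relative to the others.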

Convolution¶

class flint.nn.modules.Conv1d(in_channels: int, out_channels: int, kernel_size: Union[T, Tuple[T]], stride: Union[T, Tuple[T]] = 1, padding: Union[T, Tuple[T]] = 0, dilation: Union[T, Tuple[T]] = 1, bias: bool = True)[source]¶

Bases: flint.nn.modules.conv._ConvNd

Apply a 1D convolution over an input signal composed of several input planes.

input shape: (batch_size, in_channels, L_in)
output shape: (batch_size, out_channels, L_out)

where:

\[\text{L\_out} = \frac{\text{L\_in + 2 * padding - dilation * (kernel\_size - 1) - 1}}{\text{stride}} + 1 \]

Parameters

in_channels (int) – Number of channels in the input signal
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional, default=1) – Stride of the convolution kernels as they move over the input volume
padding (int or tuple, optional, default=0) – Zero-padding added to both sides of the input
dilation (int or tuple, optional, default=1) – Spacing between kernel elements
bias (bool, optional, default=True) – Enable bias or not
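The output-length formula can be checked against a naive single-channel convolution. This sketch assumes the division in the formula is rounded down, and it computes a cross-correlation (no kernel flip), as deep learning libraries conventionally do; neither point is stated above.

```python
from typing import List


def conv1d_out_len(
    l_in: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1
) -> int:
    # Floor division is an assumption; the formula above shows a plain fraction.
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv1d_single(
    signal: List[float],
    kernel: List[float],
    stride: int = 1,
    padding: int = 0,
    dilation: int = 1,
) -> List[float]:
    """Naive 1D cross-correlation for one input and one output channel."""
    padded = [0.0] * padding + list(signal) + [0.0] * padding
    l_out = conv1d_out_len(len(signal), len(kernel), stride, padding, dilation)
    out = []
    for i in range(l_out):
        start = i * stride
        out.append(
            sum(kernel[k] * padded[start + k * dilation] for k in range(len(kernel)))
        )
    return out


y = conv1d_single([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 0.0, -1.0], stride=2, padding=1)
```

For an input of length 5 with kernel size 3, stride 2 and padding 1, the formula gives 3 output positions, which matches the length of `y`.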

class flint.nn.modules.Conv2d(in_channels: int, out_channels: int, kernel_size: Union[T, Tuple[T]], stride: Union[T, Tuple[T]] = 1, padding: Union[T, Tuple[T]] = 0, dilation: Union[T, Tuple[T]] = 1, bias: bool = True)[source]¶

Bases: flint.nn.modules.conv._ConvNd

Apply a 2D convolution over an input signal composed of several input planes. See flint.nn.functional.conv2d() for more details.

input shape: (batch_size, in_channels, h_in, w_in)
output shape: (batch_size, out_channels, h_out, w_out)

where:

\[\text{h\_out} = \frac{\text{h\_in + 2 * padding[0] - dilation[0] * (kernel\_size[0] - 1) - 1}}{\text{stride}[0]} + 1 \]

\[\text{w\_out} = \frac{\text{w\_in + 2 * padding[1] - dilation[1] * (kernel\_size[1] - 1) - 1}}{\text{stride}[1]} + 1 \]

Parameters

in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple[int, int], optional, default=1) – Stride of the convolution kernels as they move over the input volume
padding (int or tuple[int, int], optional, default=0) – Zero-padding added to both sides of the input
dilation (int or tuple[int, int], optional, default=1) – Spacing between kernel elements
bias (bool, optional, default=True) – Enable bias or not

Linear¶

class flint.nn.modules.Linear(in_features: int, out_features: int, bias: bool = True)[source]¶

Bases: flint.nn.modules.module.Module

Fully connected layer:

\[y = x A^T + b \]

Input shape: (batch_size, in_features)
Output shape: (batch_size, out_features)

Parameters

in_features (int) – Size of each input sample.
out_features (int) – Size of each output sample.
bias (bool, optional, default=True) – Enable bias or not.
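The affine map above can be written out with nested lists. This sketch assumes the weight matrix A has shape (out_features, in_features), which is what the transpose in the formula implies; the function name is illustrative.

```python
from typing import List, Optional


def linear(
    x: List[List[float]],
    weight: List[List[float]],
    bias: Optional[List[float]] = None,
) -> List[List[float]]:
    """Compute y = x A^T + b for x of shape (batch_size, in_features)."""
    out = []
    for row in x:
        # Each output feature is the dot product of the input row with one weight row.
        y = [sum(a * v for a, v in zip(w_row, row)) for w_row in weight]
        if bias is not None:
            y = [yi + bi for yi, bi in zip(y, bias)]
        out.append(y)
    return out


x = [[1.0, 2.0, 3.0]]                    # (batch_size=1, in_features=3)
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]   # (out_features=2, in_features=3)
b = [0.5, -0.5]

y = linear(x, A, b)
y_no_bias = linear(x, A)
```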

class flint.nn.modules.Identity(*args: Any, **kwargs: Any)[source]¶

Bases: flint.nn.modules.module.Module

A placeholder identity operator that is argument-insensitive.

Input shape: \((*)\), where \(*\) means any number of dimensions.
Output shape: \((*)\), same shape as the input.

Parameters

args (Any) – Any argument (unused).
kwargs (Any) – Any keyword argument (unused).

Pooling¶

class flint.nn.modules.MaxPool1d(kernel_size: Union[T, Tuple[T]], stride: Union[T, Tuple[T]] = 1, padding: Union[T, Tuple[T]] = 0, dilation: Union[T, Tuple[T]] = 1, return_indices: bool = False)[source]¶

Bases: flint.nn.modules.pooling._MaxPoolNd

Apply a 1D max pooling over an input signal composed of several input planes. See flint.nn.functional.maxpool1d() for more details.

input shape: (batch_size, in_channels, L_in)
output shape: (batch_size, out_channels, L_out)

where:

\[\text{L\_out} = \frac{\text{L\_in + 2 * padding - dilation * (kernel\_size - 1) - 1}}{\text{stride}} + 1 \]

Note

Note that PyTorch's documentation states that the input is implicitly zero-padded when padding is non-zero. In practice, PyTorch uses implicit negative-infinity padding rather than zero-padding; see this issue.

In this class, zero-padding is used.

Parameters

kernel_size (_size_1_t) – Size of the sliding window, must be > 0.
stride (_size_1_t, optional) – Stride of the window, must be > 0. Defaults to kernel_size.
padding (_size_1_t, optional, default=0) – Zero-padding added to both sides of the input, must be >= 0 and <= kernel_size / 2.
dilation (_size_1_t, optional, default=1) – Spacing between the elements in the window, must be > 0
return_indices (bool, optional, default=False) – If True, will return the max indices along with the outputs
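The difference between the two padding conventions mentioned in the note only shows up when the input contains negative values near the border. The sketch below is a plain-Python illustration with dilation fixed at 1; the `pad_value` argument is hypothetical and exists only to compare the two behaviors.

```python
import math
from typing import List


def max_pool1d(
    signal: List[float],
    kernel_size: int,
    stride: int,
    padding: int = 0,
    pad_value: float = 0.0,
) -> List[float]:
    """Naive single-channel 1D max pooling (dilation fixed at 1)."""
    padded = [pad_value] * padding + list(signal) + [pad_value] * padding
    l_out = (len(signal) + 2 * padding - (kernel_size - 1) - 1) // stride + 1
    return [
        max(padded[i * stride : i * stride + kernel_size]) for i in range(l_out)
    ]


signal = [-3.0, -1.0, -2.0, -4.0]
zero_padded = max_pool1d(signal, 2, 2, padding=1, pad_value=0.0)
inf_padded = max_pool1d(signal, 2, 2, padding=1, pad_value=-math.inf)
```

With zero-padding, the border windows return the padded 0 because it exceeds every real value in the window; with negative-infinity padding, they return the real values -3 and -4.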

class flint.nn.modules.MaxPool2d(kernel_size: Union[T, Tuple[T]], stride: Optional[Union[T, Tuple[T]]] = None, padding: Union[T, Tuple[T]] = 0, dilation: Union[T, Tuple[T]] = 1, return_indices: bool = False)[source]¶

Bases: flint.nn.modules.pooling._MaxPoolNd

Apply a 2D max pooling over an input signal composed of several input planes. See flint.nn.functional.maxpool2d() for more details.

input shape: (batch_size, in_channels, h_in, w_in)
output shape: (batch_size, out_channels, h_out, w_out)

where:

\[\text{h\_out} = \frac{\text{h\_in + 2 * padding[0] - dilation[0] * (kernel\_size[0] - 1) - 1}}{\text{stride}[0]} + 1 \]

\[\text{w\_out} = \frac{\text{w\_in + 2 * padding[1] - dilation[1] * (kernel\_size[1] - 1) - 1}}{\text{stride}[1]} + 1 \]

Note

Note that PyTorch's documentation states that the input is implicitly zero-padded when padding is non-zero. In practice, PyTorch uses implicit negative-infinity padding rather than zero-padding; see this issue.

In this class, zero-padding is used.

Parameters

kernel_size (_size_2_t) – Size of the sliding window, must be > 0.
stride (_size_2_t, optional) – Stride of the window, must be > 0. Defaults to kernel_size.
padding (_size_2_t, optional, default=0) – Zero-padding added to both sides of the input, must be >= 0 and <= kernel_size / 2.
dilation (_size_2_t, optional, default=1) – Spacing between the elements in the window, must be > 0
return_indices (bool, optional, default=False) – If True, will return the max indices along with the outputs

Unfold¶

class flint.nn.modules.Unfold(kernel_size: Union[T, Tuple[T]], stride: Union[T, Tuple[T]] = 1, padding: Union[T, Tuple[T]] = 0, dilation: Union[T, Tuple[T]] = 1)[source]¶

Bases: flint.nn.modules.module.Module

Extracts sliding local blocks from a batched input tensor. See flint.nn.functional.unfold() for more details.

input shape: \((N, C, H, W)\)
output shape: \((N, C \times \prod(\text{kernel\_size}), L)\)

where:

\[L = \prod_d \left( \frac{\text{spatial\_size[d] + 2 * padding[d] - dilation[d] * (kernel\_size[d] - 1) - 1}}{\text{stride}[d]} + 1 \right) \]

where \(\text{spatial\_size}\) is formed by the spatial dimensions of input (H and W above), and \(d\) is over all spatial dimensions.

Parameters

input (Tensor) – Input tensor
kernel_size (int or tuple) – Size of the sliding blocks.
stride (int or tuple, optional, default=1) – Stride of the sliding blocks in the input spatial dimensions.
padding (int or tuple, optional, default=0) – Implicit zero padding to be added on both sides of input.
dilation (int or tuple, optional, default=1) – A parameter that controls the stride of elements within the neighborhood.
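For a single sample with one channel, block extraction can be sketched as below. The example is restricted to a square kernel, no padding and dilation 1 to stay short, and it assumes the per-dimension division is rounded down; `unfold_2d` is an illustrative name.

```python
from typing import List


def unfold_2d(
    image: List[List[float]], kernel_size: int, stride: int = 1
) -> List[List[float]]:
    """Return a (kernel_size**2, L) matrix of flattened sliding blocks."""
    h, w = len(image), len(image[0])
    # Number of block positions per dimension (padding 0, dilation 1).
    h_out = (h - kernel_size) // stride + 1
    w_out = (w - kernel_size) // stride + 1
    blocks = []
    for i in range(h_out):
        for j in range(w_out):
            top, left = i * stride, j * stride
            blocks.append(
                [
                    image[top + di][left + dj]
                    for di in range(kernel_size)
                    for dj in range(kernel_size)
                ]
            )
    # Transpose so that each column is one block: shape (C * prod(kernel_size), L), C = 1.
    return [list(col) for col in zip(*blocks)]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
cols = unfold_2d(image, kernel_size=2)
num_rows, num_blocks = len(cols), len(cols[0])
```

A 3x3 input with a 2x2 kernel yields L = 2 * 2 = 4 blocks of 4 values each.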

Dropout¶

class flint.nn.modules.Dropout(p: float = 0.5)[source]¶

Bases: flint.nn.modules.module.Module

During training, randomly zeroes some of the elements of the input tensor with probability p, using samples from a Bernoulli distribution. The outputs are also scaled by a factor of \(\frac{1}{1 - p}\) during training. Each channel will be zeroed out independently on every forward call.

During evaluation, the module simply computes an identity function.

This has proven to be an effective technique for regularization and preventing the co-adaptation of neurons as described in the paper [1].

See flint.nn.functional.dropout() for more details.

Parameters: p (float, optional, default=0.5) – Probability of an element to be zeroed

References

“Improving Neural Networks by Preventing Co-adaptation of Feature Detectors.” Geoffrey E. Hinton, et al. arXiv 2012.
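The training and evaluation behavior can be sketched on a flat list of values. This illustration drops individual elements and takes an explicit `training` flag and random generator, both of which are assumptions made for the example rather than flint's signature.

```python
import random
from typing import List, Optional


def dropout(
    x: List[float],
    p: float = 0.5,
    training: bool = True,
    rng: Optional[random.Random] = None,
) -> List[float]:
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1) for this sketch")
    if not training:
        return list(x)  # evaluation mode: identity
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)  # rescale survivors so the expected value is unchanged
    # Each element is dropped when its Bernoulli draw falls below p.
    return [0.0 if rng.random() < p else v * scale for v in x]


ones = [1.0] * 10000
train_out = dropout(ones, p=0.5, training=True, rng=random.Random(0))
eval_out = dropout(ones, p=0.5, training=False)
train_mean = sum(train_out) / len(train_out)
```

Because surviving values are scaled by 1 / (1 - p), the mean of `train_out` stays close to the mean of the input, so no rescaling is needed at evaluation time.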

Flatten¶

class flint.nn.modules.Flatten[source]¶

Bases: flint.nn.modules.module.Module

Flatten the input. Does not affect the batch size.

Note

If inputs are shaped (batch,) without a feature axis, then flattening adds an extra channel dimension and output shape is (batch, 1).
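The effect on shapes, including the special case in the note, can be expressed with a small helper. `flatten_shape` is a hypothetical function that only computes the output shape.

```python
from math import prod
from typing import Tuple


def flatten_shape(shape: Tuple[int, ...]) -> Tuple[int, int]:
    batch, *rest = shape
    # prod of an empty sequence is 1, which covers inputs shaped (batch,).
    return (batch, prod(rest))


image_batch = flatten_shape((32, 3, 28, 28))
no_feature_axis = flatten_shape((32,))
```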