flint.nn.functional

flint.nn.functional.binary_cross_entropy(input: flint.tensor.Tensor, target: flint.tensor.Tensor, reduction: str = 'mean') → flint.tensor.Tensor[source]

Binary Cross Entropy Loss

\[\text{loss} = - (y \log(x) + (1 - y) \log(1 - x)) \]
Parameters
  • input (Tensor) – Tensor of shape (batch_size, *)

  • target (Tensor) – Tensor of the same shape as input

  • reduction (str, optional, default='mean') – ‘none’ / ‘mean’ / ‘sum’
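
A minimal NumPy sketch of the formula and the reductions above, written independently of the flint API (illustrative only):

    import numpy as np

    def bce_reference(x, y, reduction="mean"):
        # loss = -(y * log(x) + (1 - y) * log(1 - x)), element-wise
        loss = -(y * np.log(x) + (1 - y) * np.log(1 - x))
        if reduction == "mean":
            return loss.mean()
        if reduction == "sum":
            return loss.sum()
        return loss  # 'none'

    x = np.array([0.9, 0.2, 0.7])   # predicted probabilities in (0, 1)
    y = np.array([1.0, 0.0, 1.0])   # binary targets
    print(bce_reference(x, y))      # ≈ 0.2284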

flint.nn.functional.conv1d(input: flint.tensor.Tensor, weight: flint.tensor.Tensor, bias: Optional[flint.tensor.Tensor] = None, stride: Tuple[int] = (1,), padding: Tuple[int] = (0,), dilation: Tuple[int] = (1,))[source]

Apply a 1D convolution over an input signal composed of several input planes.

  • input shape: (batch_size, in_channels, L_in)

  • output shape: (batch_size, out_channels, L_out)

where:

\[\text{L\_out} = \frac{\text{L\_in + 2 * padding - dilation * (kernel\_size - 1) - 1}}{\text{stride}} + 1 \]
Parameters
  • input (Tensor) – Input tensor

  • weight (Tensor) – Weight of the conv1d layer

  • bias (Tensor, optional) – Bias of the conv1d layer

  • stride (Tuple[int], optional, default: (1, )) – Stride of the convolution

  • padding (Tuple[int], optional, default: (0, )) – Zero-padding added to both sides of the input

  • dilation (Tuple[int], optional, default: (1, )) – Spacing between kernel elements
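
As a quick check of the output-length formula, here is a worked computation in plain Python; the numbers are illustrative and integer division stands in for the implicit floor:

    L_in, kernel_size = 32, 3
    stride, padding, dilation = 2, 1, 1

    L_out = (L_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1
    print(L_out)   # 16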

flint.nn.functional.conv2d(input: flint.tensor.Tensor, weight: flint.tensor.Tensor, bias: Optional[flint.tensor.Tensor] = None, stride: Tuple[int] = (1, 1), padding: Tuple[int] = (0, 0), dilation: Tuple[int] = (1, 1))[source]

Apply a 2D convolution over an input signal composed of several input planes.

  • input shape: (batch_size, in_channels, h_in, w_in)

  • output shape: (batch_size, out_channels, h_out, w_out)

where:

\[\text{h\_out} = \frac{\text{h\_in + 2 * padding[0] - dilation[0] * (kernel\_size[0] - 1) - 1}}{\text{stride}[0]} + 1 \]
\[\text{w\_out} = \frac{\text{w\_in + 2 * padding[1] - dilation[1] * (kernel\_size[1] - 1) - 1}}{\text{stride}[1]} + 1 \]

Note

The unfold function is used to perform the convolution as a single matrix multiplication. For more details, see [1].

Parameters
  • input (Tensor) – Input tensor

  • weight (Tensor) – Weight of the conv2d layer

  • bias (Tensor, optional) – Bias of the conv2d layer

  • stride (Tuple[int, int], optional, default=(1, 1)) – Stride of the convolution

  • padding (Tuple[int, int], optional, default=(0, 0)) – Zero-padding added to both sides of the input

  • dilation (Tuple[int, int], optional, default=(1, 1)) – Spacing between kernel elements

References

  1. “Why GEMM is at the heart of deep learning.” Pete Warden. 2015.
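
A hedged usage sketch: the Tensor(ndarray) constructor and the (out_channels, in_channels, kh, kw) weight layout follow PyTorch conventions and are assumptions not stated on this page; adjust to the actual flint API if it differs.

    import numpy as np
    import flint.nn.functional as F
    from flint.tensor import Tensor               # class path taken from the signatures above

    # Assumed: Tensor wraps a NumPy array; weight layout is the PyTorch convention.
    x = Tensor(np.random.randn(8, 3, 32, 32))     # (batch_size, in_channels, h_in, w_in)
    w = Tensor(np.random.randn(16, 3, 3, 3))      # (out_channels, in_channels, kh, kw) -- assumption
    b = Tensor(np.zeros(16))

    out = F.conv2d(x, w, bias=b, stride=(1, 1), padding=(1, 1), dilation=(1, 1))
    # With a 3x3 kernel, stride 1, padding 1 and dilation 1, the formulas above give
    # h_out = w_out = 32, so `out` should have shape (8, 16, 32, 32).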

flint.nn.functional.cross_entropy(input: flint.tensor.Tensor, target: flint.tensor.Tensor, reduction: str = 'mean') → flint.tensor.Tensor[source]

Cross Entropy Loss

Note

Combines softmax() and nll_loss(), which is DIFFERENT FROM nn.functional.cross_entropy() IN PYTORCH!

Parameters
  • input (Tensor) – A 2-dim (batch_size, n_classes) tensor

  • target (Tensor) – A 1-dim (batch_size) tensor where each value: 0 <= target[i] <= n_classes-1

  • reduction (str, optional, default='mean') – ‘none’ / ‘mean’ / ‘sum’
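
Per the Note, this loss composes softmax() and nll_loss(); the NumPy sketch below mirrors that composition and is a reference for the math, not the flint implementation:

    import numpy as np

    def cross_entropy_reference(logits, target, reduction="mean"):
        # softmax over the class axis, then negative log-likelihood of the target class
        z = logits - logits.max(axis=1, keepdims=True)          # subtract max for numerical stability
        probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
        nll = -np.log(probs[np.arange(len(target)), target])
        return {"mean": nll.mean(), "sum": nll.sum(), "none": nll}[reduction]

    logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])      # (batch_size, n_classes)
    target = np.array([0, 2])
    print(cross_entropy_reference(logits, target))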

flint.nn.functional.dropout(input: flint.tensor.Tensor, p: float = 0.5, training: bool = True) → flint.tensor.Tensor[source]

During training, randomly zeroes some of the elements of the input tensor with probability p, using samples from a Bernoulli distribution. The outputs are then scaled by a factor of \(\frac{1}{1 - p}\). Each channel will be zeroed out independently on every forward call.

During evaluation, the module simply computes an identity function.

This has proven to be an effective technique for regularization and preventing the co-adaptation of neurons as described in the paper [1].

Parameters
  • p (float, optional, default=0.5) – Probability of an element to be zeroed

  • training (bool, optional, default=True) – Apply dropout if True

References

  1. “Improving Neural Networks by Preventing Co-adaptation of Feature Detectors.” Geoffrey E. Hinton, et al. arXiv 2012.
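
A NumPy sketch of the behaviour described above (inverted dropout: a Bernoulli keep-mask plus 1 / (1 - p) scaling during training, identity during evaluation); it is illustrative and not the flint implementation:

    import numpy as np

    def dropout_reference(x, p=0.5, training=True, rng=np.random.default_rng(0)):
        if not training or p == 0.0:
            return x                                # identity at evaluation time
        mask = rng.random(x.shape) >= p             # each element is kept with probability 1 - p
        return x * mask / (1.0 - p)                 # scale kept elements by 1 / (1 - p)

    x = np.ones((2, 4))
    print(dropout_reference(x, p=0.5))                   # surviving entries become 2.0
    print(dropout_reference(x, p=0.5, training=False))   # unchanged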

flint.nn.functional.flatten(input: flint.tensor.Tensor) → flint.tensor.Tensor[source]

Flatten the input. Does not affect the batch size.

Note

If inputs are shaped (batch,) without a feature axis, then flattening adds an extra channel dimension and output shape is (batch, 1).
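
A NumPy illustration of the shape behaviour described above, assuming flatten keeps the batch axis and collapses the remaining axes (the flint call itself is not shown):

    import numpy as np

    x = np.zeros((8, 3, 4, 4))
    print(x.reshape(x.shape[0], -1).shape)   # (8, 48): batch axis kept, features collapsed

    v = np.zeros((8,))
    print(v.reshape(v.shape[0], -1).shape)   # (8, 1): the edge case described in the Note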

flint.nn.functional.gelu(input: flint.tensor.Tensor) → flint.tensor.Tensor[source]

Compute GELU (Gaussian Error Linear Units) [1] element-wise.

\[\text{GELU}(x) = x \cdot \Phi(x) = x \cdot \frac{1}{2} [1 + \text{erf} (x / \sqrt{2})] \]

where \(\Phi(x)\) is the Cumulative Distribution Function for Gaussian Distribution.

We can approximate it with:

\[\text{GELU}(x) = 0.5 x (1 + \text{tanh}[ \sqrt{2 / \pi} (x + 0.044715 x^3) ]) \]

or

\[\text{GELU}(x) = x \sigma(1.702 x) \]

References

  1. “Gaussian Error Linear Units (GELUs).” Dan Hendrycks and Kevin Gimpel. arXiv 2016.
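
A NumPy comparison of the exact erf form with the two approximations above (SciPy is used only for erf); this illustrates the formulas and is not part of flint:

    import numpy as np
    from scipy.special import erf

    x = np.linspace(-3.0, 3.0, 7)
    exact = x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))
    tanh_approx = 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))
    sigmoid_approx = x / (1.0 + np.exp(-1.702 * x))

    print(np.max(np.abs(exact - tanh_approx)))      # small: the tanh approximation tracks the exact form
    print(np.max(np.abs(exact - sigmoid_approx)))   # slightly larger, as expected for the cruder approximation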

flint.nn.functional.leaky_relu(input: flint.tensor.Tensor, negative_slope: float = 0.01) → flint.tensor.Tensor[source]

Compute Leaky ReLU element-wise.

\[\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} * \min(0, x) \]
Parameters

negative_slope (float, optional, default=1e-2) – Controls the angle of the negative slope.
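
A one-line NumPy check of the formula above:

    import numpy as np

    x = np.array([-2.0, -0.5, 0.0, 1.5])
    print(np.maximum(0, x) + 0.01 * np.minimum(0, x))   # [-0.02  -0.005  0.     1.5  ]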

flint.nn.functional.linear(input: flint.tensor.Tensor, weight: flint.tensor.Tensor, bias: Optional[flint.tensor.Tensor] = None)[source]

Apply a linear transformation to the incoming data.

\[y = x A^T + b \]
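
A shape sketch of y = x A^T + b in NumPy; the (out_features, in_features) weight layout follows from the formula and is an assumption about how flint stores the weight:

    import numpy as np

    x = np.random.randn(8, 20)     # (batch_size, in_features)
    A = np.random.randn(30, 20)    # (out_features, in_features) -- assumed layout
    b = np.zeros(30)

    y = x @ A.T + b                # y = x A^T + b
    print(y.shape)                 # (8, 30)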
flint.nn.functional.max_pool1d(input: flint.tensor.Tensor, kernel_size: Tuple[int], stride: Tuple[int] = (1,), padding: Tuple[int] = (0,), dilation: Tuple[int] = (1,), return_indices: bool = False)[source]

Apply a 1D max pooling over an input signal composed of several input planes.

  • input shape: (batch_size, in_channels, L_in)

  • output shape: (batch_size, in_channels, L_out)

where:

\[\text{L\_out} = \frac{\text{L\_in + 2 * padding - dilation * (kernel\_size - 1) - 1}}{\text{stride}} + 1 \]

Note

Note that the PyTorch documentation states the input will be implicitly zero-padded when padding is non-zero. In fact, PyTorch uses implicit negative-infinity padding rather than zero-padding; see this issue.

Here, zero-padding is used.

Parameters
  • kernel_size (Tuple[int]) – Size of the sliding window, must be > 0.

  • stride (Tuple[int], optional, default=(1,)) – Stride of the window, must be > 0.

  • padding (Tuple[int], optional, default=0) – Zero-padding added to both sides of the input, must be >= 0 and <= kernel_size / 2.

  • dilation (Tuple[int], optional, default=1) – Spacing between the elements in the window, must be > 0

  • return_indices (bool, optional, default=False) – If True, will return the max indices along with the outputs

flint.nn.functional.max_pool2d(input: flint.tensor.Tensor, kernel_size: Tuple[int], stride: Tuple[int], padding: Tuple[int] = (0, 0), dilation: Tuple[int] = (1, 1), return_indices: bool = False)[source]

Apply a 2D max pooling over an input signal composed of several input planes.

  • input shape: (batch_size, in_channels, h_in, w_in)

  • output shape: (batch_size, in_channels, h_out, w_out)

where:

\[\text{h\_out} = \frac{\text{h\_in + 2 * padding[0] - dilation[0] * (kernel\_size[0] - 1) - 1}}{\text{stride}[0]} + 1 \]
\[\text{w\_out} = \frac{\text{w\_in + 2 * padding[1] - dilation[1] * (kernel\_size[1] - 1) - 1}}{\text{stride}[1]} + 1 \]

Note

The unfold function is used to extract the sliding blocks so that the max pooling can be computed in a single vectorized operation. For more details, see [1].

Note

Note that the PyTorch documentation states the input will be implicitly zero-padded when padding is non-zero. In fact, PyTorch uses implicit negative-infinity padding rather than zero-padding; see this issue.

Here, zero-padding is used.

Parameters
  • kernel_size (Tuple[int, int]) – Size of the sliding window, must be > 0.

  • stride (Tuple[int, int]) – Stride of the window, must be > 0.

  • padding (Tuple[int, int], optional, default=(0, 0)) – Zero-padding added to both sides of the input, must be >= 0 and <= kernel_size / 2.

  • dilation (Tuple[int, int], optional, default=(1, 1)) – Spacing between the elements in the window, must be > 0

  • return_indices (bool, optional, default=False) – If True, will return the max indices along with the outputs

References

  1. “Why GEMM is at the heart of deep learning.” Pete Warden. 2015.
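
A naive NumPy reference for the no-padding, no-dilation case; it matches the output-size formulas above with padding=(0, 0) and dilation=(1, 1), and is independent of the flint implementation:

    import numpy as np

    def max_pool2d_reference(x, kernel_size, stride):
        n, c, h, w = x.shape
        kh, kw = kernel_size
        sh, sw = stride
        h_out = (h - kh) // sh + 1
        w_out = (w - kw) // sw + 1
        out = np.empty((n, c, h_out, w_out))
        for i in range(h_out):
            for j in range(w_out):
                window = x[:, :, i * sh:i * sh + kh, j * sw:j * sw + kw]
                out[:, :, i, j] = window.max(axis=(2, 3))   # max over each spatial window
        return out

    x = np.random.randn(2, 3, 8, 8)
    print(max_pool2d_reference(x, (2, 2), (2, 2)).shape)    # (2, 3, 4, 4)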

flint.nn.functional.mse_loss(input: flint.tensor.Tensor, target: flint.tensor.Tensor, reduction: str = 'mean') → flint.tensor.Tensor[source]

Mean Squared Error Loss \((x - y)^2\)

Parameters
  • input (Tensor) – Tensor of shape (batch_size, *)

  • target (Tensor) – Tensor of the same shape as input

  • reduction (str, optional, default='mean') – ‘none’ / ‘mean’ / ‘sum’

flint.nn.functional.nll_loss(input: flint.tensor.Tensor, target: flint.tensor.Tensor, reduction: str = 'mean') → flint.tensor.Tensor[source]

Negative Log Likelihood Loss

Note

Here, log() is applied to the prediction data, which is DIFFERENT FROM nn.functional.nll_loss() IN PYTORCH!

Parameters
  • input (Tensor) – A 2-dim (batch_size, n_classes) tensor

  • target (Tensor) – A 1-dim (batch_size) tensor where each value: 0 <= target[i] <= n_classes-1

  • reduction (str, optional, default='mean') – ‘none’ / ‘mean’ / ‘sum’
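
A NumPy sketch reflecting the Note: log() is applied to the predictions inside the loss, so the input is expected to hold probabilities rather than log-probabilities (illustrative, not the flint implementation):

    import numpy as np

    def nll_loss_reference(probs, target, reduction="mean"):
        # log() is applied here, unlike PyTorch's nll_loss, which expects log-probabilities
        nll = -np.log(probs[np.arange(len(target)), target])
        return {"mean": nll.mean(), "sum": nll.sum(), "none": nll}[reduction]

    probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]])   # each row sums to 1
    target = np.array([0, 2])
    print(nll_loss_reference(probs, target))               # ≈ 0.2899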

flint.nn.functional.pad(input: flint.tensor.Tensor, pad: Tuple[int], value: int = 0) → flint.tensor.Tensor[source]

Pad tensor.

Parameters
  • input (Tensor) – N-dimensional tensor

  • pad (Tuple[int]) – Padding sizes: an m-element tuple, where m is even and m/2 <= the number of input dimensions. The padding sizes are described starting from the (m/2)-th-to-last dimension up to the last dimension; that is, only the last m/2 dimensions of input will be padded.

  • value (int, optional, default=0) – Fill value for ‘constant’ padding
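
A NumPy illustration of how an m-element pad tuple with m = 2 affects only the last dimension; the (before, after) ordering within flint's tuple is an assumption, so check the source if the order matters:

    import numpy as np

    x = np.zeros((2, 3, 5))
    # Only the last dimension is padded (m/2 = 1); ordering within the pair is assumed.
    padded = np.pad(x, ((0, 0), (0, 0), (1, 2)), constant_values=0)
    print(padded.shape)   # (2, 3, 8)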

flint.nn.functional.relu(input: flint.tensor.Tensor) → flint.tensor.Tensor[source]

Compute ReLU (Rectified Linear Unit) element-wise.
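
\[\text{ReLU}(x) = \max(0, x) \]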

flint.nn.functional.sigmoid(input: flint.tensor.Tensor) → flint.tensor.Tensor[source]

Compute Sigmoid element-wise.

\[\text{sigmoid}(x) = \frac{1}{1 + \exp(-x)} \]
flint.nn.functional.tanh(input: flint.tensor.Tensor) → flint.tensor.Tensor[source]

Compute Tanh (Hyperbolic Tangent) element-wise.

\[\text{tanh}(x) = \frac{\sinh(x)}{\cosh(x)} = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)} \]
flint.nn.functional.unfold(input: flint.tensor.Tensor, kernel_size: Union[T, Tuple[T]], stride: Union[T, Tuple[T]] = 1, padding: Union[T, Tuple[T]] = 0, dilation: Union[T, Tuple[T]] = 1)[source]

Extracts sliding local blocks from a batched input tensor.

  • input shape: \((N, C, H, W)\)

  • output shape: \((N, C \times \prod(\text{kernel\_size}), L)\)

where:

\[L = \prod_d \left( \frac{\text{spatial\_size[d] + 2 * padding[d] - dilation[d] * (kernel\_size[d] - 1) - 1}}{\text{stride}[d]} + 1 \right) \]

where \(\text{spatial\_size}\) is formed by the spatial dimensions of input (H and W above), and \(d\) is over all spatial dimensions.

Parameters
  • input (Tensor) – Input tensor

  • kernel_size (int or tuple) – Size of the sliding blocks.

  • stride (int or tuple, optional, default=1) – Stride of the sliding blocks in the input spatial dimensions.

  • padding (int or tuple, optional, default=0) – Implicit zero padding to be added on both sides of input.

  • dilation (int or tuple, optional, default=1) – A parameter that controls the stride of elements within the neighborhood.
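
A minimal NumPy version of unfold for the stride=1, padding=0, dilation=1 case, followed by the GEMM-style convolution mentioned in the conv2d Note; it is a reference for the idea, not the flint implementation:

    import numpy as np

    def unfold_reference(x, kernel_size):
        # im2col for stride=1, padding=0, dilation=1
        n, c, h, w = x.shape
        kh, kw = kernel_size
        h_out, w_out = h - kh + 1, w - kw + 1
        cols = np.empty((n, c * kh * kw, h_out * w_out))
        for i in range(h_out):
            for j in range(w_out):
                patch = x[:, :, i:i + kh, j:j + kw].reshape(n, -1)
                cols[:, :, i * w_out + j] = patch
        return cols                                  # (N, C * prod(kernel_size), L)

    x = np.random.randn(2, 3, 6, 6)
    cols = unfold_reference(x, (3, 3))
    print(cols.shape)                                # (2, 27, 16)

    # Convolution as a single matrix multiplication (the trick referenced in conv2d):
    w = np.random.randn(8, 3, 3, 3)                  # (out_channels, in_channels, kh, kw)
    out = w.reshape(8, -1) @ cols                    # broadcasts over the batch -> (2, 8, 16)
    print(out.reshape(2, 8, 4, 4).shape)             # (2, 8, 4, 4)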