flint.optim¶
- class flint.optim.Optimizer(params=None, lr: float = 0.01, weight_decay: float = 0.0)[source]¶
Bases:
object
Base class for all optimizers.
- Parameters
params (iterable) – An iterable of Tensor
lr (float, optional, default=0.01) – Learning rate
weight_decay (float, optional, default=0.) – Weight decay (L2 penalty)
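Every optimizer in this module accepts a weight_decay coefficient, which acts as an L2 penalty and is conventionally folded into the gradient before the update. A minimal NumPy sketch of that step (illustrative only; the function and variable names are not part of flint's API):

```python
import numpy as np

def apply_weight_decay(grad, param, weight_decay):
    """Fold the L2 penalty into the gradient: g <- g + weight_decay * theta."""
    return grad + weight_decay * param

param = np.array([1.0, -2.0, 3.0])
grad = np.array([0.1, 0.2, -0.1])
print(apply_weight_decay(grad, param, weight_decay=0.01))  # [ 0.11  0.18 -0.07]
```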
- class flint.optim.SGD(params=None, lr: float = 0.01, momentum: float = 0.0, nesterov: bool = False, weight_decay: float = 0.0)[source]¶
Bases:
flint.optim.optimizer.Optimizer
Implementation of Stochastic Gradient Descent (optionally with momentum).
\[v_{t+1} = \mu \cdot v_t + g_{t+1}\]
\[\theta_{t+1} = \theta_t - \text{lr} \cdot v_{t+1}\]
where \(\theta\), \(g\), \(v\) and \(\mu\) denote the parameters, gradient, velocity, and momentum respectively.
- Parameters
params (iterable) – An iterable of Tensor
lr (float, optional, default=0.01) – Learning rate
momentum (float, optional, default=0.) – Momentum factor
nesterov (bool, optional, default=False) – Enable Nesterov momentum or not
weight_decay (float, optional, default=0) – Weight decay (L2 penalty)
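As an illustration of the update rule above, here is a minimal NumPy sketch of one SGD step with momentum. It is not flint's internal code; the function name is made up, and the Nesterov handling follows the common PyTorch-style formulation, which may differ in detail from flint's implementation.

```python
import numpy as np

def sgd_step(param, grad, velocity, lr=0.01, momentum=0.0,
             nesterov=False, weight_decay=0.0):
    """One SGD update: v <- mu * v + g, theta <- theta - lr * v."""
    grad = grad + weight_decay * param       # L2 penalty
    velocity = momentum * velocity + grad    # v_{t+1} = mu * v_t + g_{t+1}
    step = grad + momentum * velocity if nesterov else velocity
    return param - lr * step, velocity

theta, v = np.zeros(3), np.zeros(3)
g = np.array([0.5, -0.2, 0.1])
theta, v = sgd_step(theta, g, v, lr=0.01, momentum=0.9)
print(theta)  # first step is -lr * g, since the velocity starts at zero
```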
- class flint.optim.Adadelta(params=None, rho: float = 0.99, eps: float = 1e-06, lr: float = 1.0, weight_decay: float = 0.0)[source]¶
Bases:
flint.optim.optimizer.Optimizer
Implementation of Adadelta algorithm proposed in [1].
\[h_t = \rho h_{t-1} + (1 - \rho) g_t^2\]
\[g'_t = \sqrt{\frac{\Delta \theta_{t-1} + \epsilon}{h_t + \epsilon}} \cdot g_t\]
\[\Delta \theta_t = \rho \Delta \theta_{t-1} + (1 - \rho) (g'_t)^2\]
\[\theta_t = \theta_{t-1} - g'_t\]
where \(h\) is the moving average of the squared gradients and \(\epsilon\) improves numerical stability.
- Parameters
params (iterable) – An iterable of Tensor
rho (float, optional, default=0.99) – Coefficient used for computing a running average of squared gradients
eps (float, optional, default=1e-6) – Term added to the denominator to improve numerical stability
lr (float, optional, default=1.0) – Coefficient that scales the delta before it is applied to the parameters
weight_decay (float, optional, default=0) – Weight decay (L2 penalty)
References
“ADADELTA: An Adaptive Learning Rate Method.” Matthew D. Zeiler. arXiv 2012.
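A minimal NumPy sketch of one Adadelta step following the four equations above (illustrative only, not flint's code; applying lr to the final delta reflects the parameter description and is an assumption):

```python
import numpy as np

def adadelta_step(param, grad, h, delta_acc,
                  rho=0.99, eps=1e-6, lr=1.0, weight_decay=0.0):
    """One Adadelta update following the equations above."""
    grad = grad + weight_decay * param
    h = rho * h + (1 - rho) * grad ** 2                      # running avg of squared gradients
    g_prime = np.sqrt((delta_acc + eps) / (h + eps)) * grad  # rescaled gradient
    delta_acc = rho * delta_acc + (1 - rho) * g_prime ** 2   # running avg of squared updates
    param = param - lr * g_prime                             # lr scales the delta (default 1.0)
    return param, h, delta_acc

theta, h, d = np.zeros(2), np.zeros(2), np.zeros(2)
g = np.array([0.3, -0.1])
theta, h, d = adadelta_step(theta, g, h, d)
print(theta)
```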
- class flint.optim.Adagrad(params=None, lr: float = 0.01, eps: float = 1e-10, weight_decay: float = 0.0)[source]¶
Bases:
flint.optim.optimizer.Optimizer
Implementation of Adagrad algorithm proposed in [1].
\[h_t = h_{t-1} + g_t^2\]
\[\theta_{t+1} = \theta_t - \frac{\text{lr}}{\sqrt{h_t + \epsilon}} \cdot g_t\]
- Parameters
params (iterable) – An iterable of Tensor
lr (float, optional, default=0.01) – Learning rate
eps (float, optional, default=1e-10) – Term added to the denominator to improve numerical stability
weight_decay (float, optional, default=0) – Weight decay (L2 penalty)
References
“Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.” John Duchi, et al. JMLR 2011.
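The Adagrad update accumulates squared gradients in \(h\) and divides the learning rate by their square root, so frequently updated parameters take smaller steps. A minimal NumPy sketch (illustrative only; the names are not flint's API):

```python
import numpy as np

def adagrad_step(param, grad, h, lr=0.01, eps=1e-10, weight_decay=0.0):
    """One Adagrad update: accumulate squared gradients, then scale the step."""
    grad = grad + weight_decay * param
    h = h + grad ** 2                              # h_t = h_{t-1} + g_t^2
    param = param - lr / np.sqrt(h + eps) * grad   # theta <- theta - lr / sqrt(h + eps) * g
    return param, h

theta, h = np.zeros(2), np.zeros(2)
g = np.array([0.4, -0.2])
theta, h = adagrad_step(theta, g, h)
print(theta)  # roughly -lr * sign(g) on the first step
```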
- class flint.optim.Adam(params=None, lr: float = 0.001, betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0.0)[source]¶
Bases:
flint.optim.optimizer.Optimizer
Implementation of Adam algorithm proposed in [1].
\[v_t = \beta_1 v_{t-1} + (1 - \beta_1) g_t\]
\[h_t = \beta_2 h_{t-1} + (1 - \beta_2) g_t^2\]
Bias correction:
\[\hat{v}_t = \frac{v_t}{1 - \beta_1^t}\]
\[\hat{h}_t = \frac{h_t}{1 - \beta_2^t}\]
Update parameters:
\[\theta_t = \theta_{t-1} - \text{lr} \cdot \frac{\hat{v}_t}{\sqrt{\hat{h}_t + \epsilon}}\]
- Parameters
params (iterable) – An iterable of Tensor
lr (float, optional, default=1e-3) – Learning rate
betas (Tuple[float, float], optional, default=(0.9, 0.999)) – Coefficients used for computing running averages of gradient and its square
eps (float, optional, default=1e-8) – Term added to the denominator to improve numerical stability
weight_decay (float, optional, default=0) – Weight decay (L2 penalty)
References
“Adam: A Method for Stochastic Optimization.” Diederik P. Kingma and Jimmy Ba. ICLR 2015.
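A minimal NumPy sketch of one Adam step following the equations above, where t is the 1-based step count used for bias correction (illustrative only, not flint's code; eps is placed inside the square root to match the formula above):

```python
import numpy as np

def adam_step(param, grad, v, h, t, lr=1e-3, betas=(0.9, 0.999),
              eps=1e-8, weight_decay=0.0):
    """One Adam update; t is the 1-based step count used for bias correction."""
    beta1, beta2 = betas
    grad = grad + weight_decay * param
    v = beta1 * v + (1 - beta1) * grad          # first moment (running avg of gradients)
    h = beta2 * h + (1 - beta2) * grad ** 2     # second moment (running avg of squared gradients)
    v_hat = v / (1 - beta1 ** t)                # bias correction
    h_hat = h / (1 - beta2 ** t)
    param = param - lr * v_hat / np.sqrt(h_hat + eps)  # eps inside the sqrt, as in the formula
    return param, v, h

theta, v, h = np.zeros(2), np.zeros(2), np.zeros(2)
g = np.array([0.3, -0.5])
theta, v, h = adam_step(theta, g, v, h, t=1)
print(theta)  # roughly -lr * sign(g) on the first step
```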
- class flint.optim.RMSprop(params=None, lr: float = 0.01, alpha: float = 0.99, eps: float = 1e-08, weight_decay: float = 0.0)[source]¶
Bases:
flint.optim.optimizer.Optimizer
Implementation of RMSprop algorithm proposed in [1].
\[h_t = \alpha h_{t-1} + (1 - \alpha) g_t^2\]
\[\theta_{t+1} = \theta_t - \frac{\text{lr}}{\sqrt{h_t + \epsilon}} \cdot g_t\]
- Parameters
params (iterable) – An iterable of Tensor
lr (float, optional, default=0.01) – Learning rate
alpha (float, optional, default=0.99) – Coefficient used for computing a running average of squared gradients
eps (float, optional, default=1e-8) – Term added to the denominator to improve numerical stability
weight_decay (float, optional, default=0) – Weight decay (L2 penalty)
References
“Lecture 6.5 - rmsprop: Divide the Gradient by a Running Average of Its Recent Magnitude.” Tijmen Tieleman and Geoffrey Hinton. COURSERA: Neural Networks for Machine Learning, 2012.
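A minimal NumPy sketch of one RMSprop step following the equations above (illustrative only; the function and variable names are not part of flint's API):

```python
import numpy as np

def rmsprop_step(param, grad, h, lr=0.01, alpha=0.99, eps=1e-8, weight_decay=0.0):
    """One RMSprop update: exponential moving average of squared gradients."""
    grad = grad + weight_decay * param
    h = alpha * h + (1 - alpha) * grad ** 2        # h_t = alpha * h_{t-1} + (1 - alpha) * g_t^2
    param = param - lr / np.sqrt(h + eps) * grad   # scale the step by the RMS of recent gradients
    return param, h

theta, h = np.zeros(2), np.zeros(2)
g = np.array([0.2, -0.3])
theta, h = rmsprop_step(theta, g, h)
print(theta)
```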