Commit d440f6fb authored by David Frank

Add detailed documentation for first-order methods

parent 50eb96fb
Pipeline #778905 passed with stages
in 9 minutes and 1 second
@@ -16,11 +16,6 @@ CG
.. doxygenclass:: elsa::CG
GradientDescent
===============
.. doxygenclass:: elsa::GradientDescent
Iterative Shrinkage-Thresholding Algorithm
==========================================
@@ -31,15 +26,6 @@ Fast Iterative Shrinkage-Thresholding Algorithm
.. doxygenclass:: elsa::FISTA
Nesterov's Fast Gradient Method
===============================
.. doxygenclass:: elsa::FGM
Optimized Gradient Method
=========================
.. doxygenclass:: elsa::OGM
Alternating Direction Method of Multipliers
===========================================
@@ -55,3 +41,32 @@ Orthogonal Matching Pursuit
===========================
.. doxygenclass:: elsa::OrthogonalMatchingPursuit
.. _elsa-solvers-api-first-order-methods:
First-order optimization algorithms
===================================
.. include:: first_order_methods.rst
.. _elsa-solvers-api-gradientdescent:
GradientDescent
###############
.. doxygenclass:: elsa::GradientDescent
.. _elsa-solvers-api-fgm:
Nesterov's Fast Gradient Method
###############################
.. doxygenclass:: elsa::FGM
.. _elsa-solvers-api-ogm:
Optimized Gradient Method
#########################
.. doxygenclass:: elsa::OGM
.. _elsa-first-order-methods-doc:
Background
##########
This section gives a short overview of the preconditions these algorithms rely on.
For a more detailed discussion, see the paper *Optimized first-order methods for smooth convex
minimization* by Kim and Fessler (link at the bottom).
Intuition
*********
Let's first establish a visual intuition for momentum-based gradient descent algorithms.
A helpful analogy is a ball rolling through hilly terrain. The ball starts at a random position
with zero initial velocity. The algorithm computes the gradient of the potential energy, which is
the force acting on the ball; in our case, this is exactly the (negative) gradient of :math:`f`.
The algorithm then updates the velocity, which in turn updates the position of the ball. Vanilla
gradient descent, in contrast, integrates the position directly instead of the velocity.
Phrased differently, the velocity defines a look-ahead position, to which the gradient of the
current solution is then applied. Nesterov's algorithm improves on this by evaluating the
gradient at the look-ahead position instead of at the current solution's position.
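To make this concrete, the following sketch contrasts plain gradient descent with a
Nesterov-style momentum update on a made-up one-dimensional function (``df`` and all constants
are illustrative stand-ins, not part of elsa):

.. code-block:: cpp

   int main()
   {
       // made-up smooth convex example: f(x) = (x - 3)^2, so df(x) = 2 * (x - 3)
       auto df = [](double x) { return 2.0 * (x - 3.0); };
       const double stepSize = 0.1;

       // vanilla gradient descent: integrate the position directly
       double x = 0.0;
       for (int i = 0; i < 50; ++i)
           x -= stepSize * df(x);

       // Nesterov-style update: keep a velocity and evaluate the gradient
       // at the look-ahead position x + momentum * v
       double xNes = 0.0, v = 0.0;
       const double momentum = 0.9;
       for (int i = 0; i < 50; ++i) {
           v = momentum * v - stepSize * df(xNes + momentum * v);
           xNes += v;
       }

       // both approach the minimizer x = 3; the momentum variant pays off
       // on poorly conditioned problems
   }
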
Problem definition
******************
First-order algorithms solve problems of the form
.. math::
\min_{x \in \mathbb{R}^d} f(x)
with two assumptions (a concrete example follows the list):
- :math:`f: \mathbb{R}^d \to \mathbb{R}` is a convex, continuously differentiable function
  with Lipschitz continuous gradient, i.e. :math:`f \in C_{L}^{1, 1}(\mathbb{R}^d)`, where
  :math:`L > 0` is the Lipschitz constant
- The problem is solvable, i.e. there exists an optimal solution :math:`x^{*}`
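As an illustration (not a requirement of the algorithms), consider the least squares data term
common in tomographic reconstruction:

.. math::
   f(x) = \frac{1}{2} \| Ax - b \|_2^2, \qquad \nabla f(x) = A^T (Ax - b)

This :math:`f` is convex and its gradient is Lipschitz continuous with constant
:math:`L = \| A^T A \|_2`, the largest eigenvalue of :math:`A^T A`.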
Solvers
*******
These solvers are currently implemented in elsa (a usage sketch follows the list):
#. :ref:`Gradient Descent <elsa-solvers-api-gradientdescent>`
#. :ref:`Nesterov's Fast Gradient Method <elsa-solvers-api-fgm>`
#. :ref:`Optimized Gradient Method <elsa-solvers-api-ogm>`
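The sketch below shows how such a solver is typically constructed from a problem and run for a
fixed number of iterations. It is only meant as orientation: the exact constructor arguments
(e.g. the step size of ``GradientDescent``) and the iteration count passed to ``solve`` are
assumptions here, see the linked class documentation for the authoritative signatures.

.. code-block:: cpp

   // assumption: a problem has been set up beforehand, e.g.
   // elsa::WLSProblem<real_t> problem(projector, sinogram);

   // vanilla gradient descent with a fixed (assumed) step size
   elsa::GradientDescent<real_t> gd(problem, 0.01f);
   auto gdReconstruction = gd.solve(100);

   // Nesterov's Fast Gradient Method on the same problem
   elsa::FGM<real_t> fgm(problem);
   auto fgmReconstruction = fgm.solve(100);

   // Optimized Gradient Method, an accelerated variant of FGM
   elsa::OGM<real_t> ogm(problem);
   auto ogmReconstruction = ogm.solve(100);
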
.. _elsa_solvers_index:
************
elsa solvers
************
......
@@ -5,20 +5,46 @@
namespace elsa
{
/**
 * @brief Class implementing Nesterov's Fast Gradient Method.
 *
 * This class implements Nesterov's Fast Gradient Method. FGM is a first-order method to
 * efficiently optimize convex functions with Lipschitz-continuous gradients.
 *
* @details
* # Algorithm overview #
* The algorithm repeats the following update steps for \f$i = 0, \dots, N-1\f$
* \f{align*}{
* y_{i+1} &= x_i - \frac{1}{L} f'(x_i) \\
* t_{i+1} &= \frac{1 + \sqrt{1 + 4 t^2_i}}{2} \\
 * x_{i+1} &= y_{i+1} + \frac{t_i - 1}{t_{i+1}}(y_{i+1} - y_i)
* \f}
* The inputs are \f$f \in C_{L}^{1, 1}(\mathbb{R}^d)\f$, \f$x_0 \in \mathbb{R}^d\f$,
* \f$y_0 = x_0\f$, \f$t_0 = 1\f$
*
* The presented (and also implemented) version of the algorithm corresponds to _FGM1_ in the
* referenced paper.
*
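 * A rough sketch of this update loop (illustrative only, not the actual implementation of this
 * class; `gradient(x)` and the Lipschitz constant `L` are hypothetical placeholders):
 * @code{.cpp}
 * auto x = x0;   // main iterate x_i
 * auto y = x0;   // gradient-step iterate y_i
 * data_t t = 1;  // momentum parameter t_0
 * for (index_t i = 0; i < N; ++i) {
 *     auto yPrev = y;
 *     y = x - (1 / L) * gradient(x);                      // y_{i+1}
 *     data_t tNext = (1 + std::sqrt(1 + 4 * t * t)) / 2;  // t_{i+1}
 *     x = y + ((t - 1) / tNext) * (y - yPrev);            // x_{i+1}
 *     t = tNext;
 * }
 * @endcode
 *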
* ## Convergence ##
 * Compared to standard gradient descent, which has a convergence rate of
 * \f$\mathcal{O}(\frac{1}{N})\f$, Nesterov's fast gradient method boosts the convergence rate to
 * \f$\mathcal{O}(\frac{1}{N^2})\f$.
*
* In the current implementation, no particular stopping rule is implemented, only a fixed
* number of iterations is used.
*
* ## References ##
* - Kim, D., Fessler, J.A. _Optimized first-order methods for smooth convex minimization_
 *   (2016) https://doi.org/10.1007/s10107-015-0949-3
*
* @tparam data_t data type for the domain and range of the problem, defaulting to real_t
*
* @see \verbatim embed:rst
For a basic introduction and problem statement of first-order methods, see
:ref:`here <elsa-first-order-methods-doc>` \endverbatim
*
* @author
* - Michael Loipführer - initial code
* - David Frank - Detailed Documentation
*/
template <typename data_t = real_t>
class FGM : public Solver<data_t>
......
@@ -7,13 +7,18 @@ namespace elsa
/**
* @brief Class representing a simple gradient descent solver with a fixed, given step size.
*
 * This class implements a simple gradient descent iterative solver with a fixed, given step
 * size. No particular stopping rule is currently implemented (only a fixed number of
 * iterations, defaulting to 100).
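 *
 * A rough sketch of the update rule (illustrative only, not the actual implementation of this
 * class; `gradient(x)` is a hypothetical placeholder):
 * @code{.cpp}
 * auto x = x0;
 * for (index_t i = 0; i < N; ++i) {
 *     x = x - stepSize * gradient(x);  // fixed step along the negative gradient
 * }
 * @endcode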
*
* @tparam data_t data type for the domain and range of the problem, defaulting to real_t
*
* @see \verbatim embed:rst
For a basic introduction and problem statement of first-order methods, see
:ref:`here <elsa-first-order-methods-doc>` \endverbatim
*
* @author
* - Tobias Lasser - initial code
*/
template <typename data_t = real_t>
class GradientDescent : public Solver<data_t>
......
@@ -7,20 +7,44 @@ namespace elsa
/**
* @brief Class representing the Optimized Gradient Method.
*
 * This class implements the Optimized Gradient Method as introduced by Kim and Fessler in 2016.
 * OGM is a first-order method to efficiently optimize convex functions with
 * Lipschitz-continuous gradients. It can be seen as an improvement on Nesterov's Fast Gradient
 * Method.
 *
 * No particular stopping rule is currently implemented (only a fixed number of iterations,
 * defaulting to 100).
* @details
* # Algorithm overview #
* The algorithm repeats the following update steps for \f$i = 0, \dots, N-1\f$
* \f{align*}{
* y_{i+1} &= x_i - \frac{1}{L} f'(x_i) \\
 * \Theta_{i+1} &=
 * \begin{cases}
 * \frac{1 + \sqrt{1 + 4 \Theta_i^2}}{2} & i \leq N - 2 \\
 * \frac{1 + \sqrt{1 + 8 \Theta_i^2}}{2} & i = N - 1
 * \end{cases} \\
 * x_{i+1} &= y_{i+1} + \frac{\Theta_i - 1}{\Theta_{i+1}}(y_{i+1} - y_i) +
 * \frac{\Theta_i}{\Theta_{i+1}}(y_{i+1} - x_i)
 * \f}
 * The inputs are \f$f \in C_{L}^{1, 1}(\mathbb{R}^d)\f$, \f$x_0 \in \mathbb{R}^d\f$,
 * \f$y_0 = x_0\f$, \f$\Theta_0 = 1\f$
*
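 * A rough sketch of this update loop (illustrative only, not the actual implementation of this
 * class; `gradient(x)` and the Lipschitz constant `L` are hypothetical placeholders):
 * @code{.cpp}
 * auto x = x0;
 * auto y = x0;
 * data_t theta = 1;
 * for (index_t i = 0; i < N; ++i) {
 *     auto yPrev = y;
 *     y = x - (1 / L) * gradient(x);                                 // y_{i+1}
 *     data_t thetaNext = (i <= N - 2)
 *                            ? (1 + std::sqrt(1 + 4 * theta * theta)) / 2
 *                            : (1 + std::sqrt(1 + 8 * theta * theta)) / 2;
 *     x = y + ((theta - 1) / thetaNext) * (y - yPrev)                // Nesterov momentum
 *         + (theta / thetaNext) * (y - x);                           // extra OGM momentum term
 *     theta = thetaNext;
 * }
 * @endcode
 *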
* ## Comparison to Nesterov's Fast Gradient ##
 * The presented algorithm accelerates FGM by introducing an additional momentum term, which
 * adds only negligible computational cost.
*
* ## References ##
* - Kim, D., Fessler, J.A. _Optimized first-order methods for smooth convex minimization_
 *   (2016) https://doi.org/10.1007/s10107-015-0949-3
*
* @tparam data_t data type for the domain and range of the problem, defaulting to real_t
*
* @see \verbatim embed:rst
For a basic introduction and problem statement of first-order methods, see
:ref:`here <elsa-first-order-methods-doc>` \endverbatim
*
* @author
* - Michael Loipführer - initial code
* - David Frank - Detailed Documentation
*/
template <typename data_t = real_t>
class OGM : public Solver<data_t>
......