## Table of Contents

- Saddle points
- Lagrangian duality as a saddle-point
- Stranger duals: Turning Lagrangian duality into von Neumann's minimax theorem

## Saddle points

Consider a two-player zero sum game, where there is a player $x \in X \subseteq \R^n$ that wishes to maximize the value of the game and another player $y \in Y \subseteq \R^m$ that wishes to minimize it. We can model the value of the game as a function $f \colon X \times Y \to \R$. Without any assumptions on $f$, we get the following inequality, called the max-min inequality:

We can interpret the sides of this inequality as follows. On the right hand side, $y$ makes the first move, followed by $x$, who has knowledge of $y$'s move. On the left hand side, $x$ makes the first move, and $y$ follows with knowledge of $x$'s move. Intuitively, it is better for you if you play after your opponent, and this is what the max-min inequality says.

A key question is existence of saddle-point.

We say $f$ is *convex-concave* if
$f(\cdot, y) \colon X \to \R$ is convex for each $y \in Y$ and $f(x, \cdot)$ is concave for each $x \in X$.
[^1]

In this setting, we have the following, Von Neumann's Minimax theorem.

If $X$ and $Y$ are compact convex sets and $f$ is convex-concave, then we have

In fact there exists a generalization of this theorem to quasiconcave functions called Sion's minimax theorem, although I will not use it any further in this post. In applications, the biggest constraint that compactness of $X$ and $Y$ imposes is that those sets must be bounded.

## Lagrangian duality as a saddle-point

Let $f_0, f_1, \ldots, f_m \colon X \to \R$ be convex functions, defined on a convex domain $X$. We consider convex optimization problems in the following standard form:

Denote the optimal value of this problem by $p^*$.
As is standard in convex optimization, we define a quantity, the *Lagrangian* $\mathcal{L}(x, \lambda)$ for $\lambda \in \R_{\geq 0}^m$
as follows:

Observe that if $f_i(x) > 0$ for any $i$, then taking $\lambda_i \to +\infty$ we can make $\mathcal{L}(x, \lambda)$ arbitrarily large. Thus we may actually think of the optimization problem above as a two-player game, where $x$ (the 'primal player') plays first trying to minimize the optimization problem, and $\lambda$ (the 'dual player') plays second trying to maximize it. Then we have

In the last line, we define $d^*$ to be the value of $\max_{\lambda \geq 0} \min_{x \in X} \mathcal{L}(x, \lambda)$,
the *Lagrangian dual* optimization problem.
In contrast, we call the original optimization problem the *primal*.
By applying the max-min inequality, we have just shown *weak duality*, that $p^* \geq d^*$.

In convex optimization, we have a very famous form of a minimax theorem (although for some reason, I almost never see it called that). It is Slater's condition:

If there exists $x \in X$ such that $f_i(x) < 0$ for all $i \in \{1, \ldots, m\}$, then $p^* = d^*$. For any $i \in \{1, \ldots, m\}$, if $f_i$ is linear, we only require $f_i(x) \leq 0$.

In convex optimization the situation where $p^* = d^*$ is called *strong duality*, but I really think of it as just a minimax theorem.
This is really clear when we write the conclusion to match the von Neumann minimax theorem:

Despite the obvious similarity in the statement of the theorem, this theorem is not directly comparable to von Neumann's minimax theorem. We can immediately observe the following differences:

- $\mathcal{L}(x, \lambda)$ is a convex-
*linear*function, not just convex-concave. Linearity implies concavity, so this condition is stronger. - $Y = \R_{\geq 0}^m$ is fixed in Slater's condition, while in von Neumann's minimax theorem it can be any compact convex set.
- We place a condition on $X$ that is often merely technical in applications: that there exists some point $x \in X$ that is strictly feasible with respect to the constraints $f_1, \ldots, f_m$.
- $X$ is allowed to be any convex set, namely, it need not be compact (i.e. bounded). $Y$ is not bounded either!

I find it quite interesting, though, that given we also require the first 3 conditions to be true (the last of which does not usually cause trouble), we get an 'unbounded' version of the minimax theorem.

Quote the following table from [LJJ20].

Setting | Gradient complexity | Lower bound |
---|---|---|

Strongly-Convex-Linear | $O(\sqrt{\kappa_x/\varepsilon})$ | ? |

Strongly-Convex-Concave | $\widetilde{O}(\sqrt{\kappa_x/\varepsilon})$ | $\widetilde{\Omega}(\sqrt{\kappa_x/\varepsilon})$ |

$\kappa_x$ is the strong-convexity condition number.

## Stranger duals: Turning Lagrangian duality into von Neumann's minimax theorem

To be continued