Convexity and Minimaxity

August 10th, 2023

Table of Contents

Saddle points

Consider a two-player zero sum game, where there is a player xXRnx \in X \subseteq \R^n that wishes to maximize the value of the game and another player yYRmy \in Y \subseteq \R^m that wishes to minimize it. We can model the value of the game as a function f ⁣:X×YRf \colon X \times Y \to \R. Without any assumptions on ff, we get the following inequality, called the max-min inequality:

minxXmaxyYf(x,y)maxyYminxXf(x,y)\min_{x \in X} \max_{y \in Y} f(x, y) \geq \max_{y \in Y} \min_{x \in X} f(x, y)

We can interpret the sides of this inequality as follows. On the right hand side, yy makes the first move, followed by xx, who has knowledge of yy's move. On the left hand side, xx makes the first move, and yy follows with knowledge of xx's move. Intuitively, it is better for you if you play after your opponent, and this is what the max-min inequality says.

A key question is existence of saddle-point.

We say ff is convex-concave if f(,y) ⁣:XRf(\cdot, y) \colon X \to \R is convex for each yYy \in Y and f(x,)f(x, \cdot) is concave for each xXx \in X. [^1]

In this setting, we have the following, Von Neumann's Minimax theorem.


If XX and YY are compact convex sets and ff is convex-concave, then we have

minxXmaxyYf(x,y)=maxyYminxXf(x,y)\min_{x \in X} \max_{y \in Y} f(x, y) = \max_{y \in Y} \min_{x \in X} f(x, y)

In fact there exists a generalization of this theorem to quasiconcave functions called Sion's minimax theorem, although I will not use it any further in this post. In applications, the biggest constraint that compactness of XX and YY imposes is that those sets must be bounded.

Lagrangian duality as a saddle-point

Let f0,f1,,fm ⁣:XRf_0, f_1, \ldots, f_m \colon X \to \R be convex functions, defined on a convex domain XX. We consider convex optimization problems in the following standard form:

min f0(x)s.t. fi(x)0i{1,,m}\begin{align*} \min \ & f_0(x) \\ \text{s.t. } & f_i(x) \leq 0 \qquad \forall i \in \{1, \ldots, m\} \end{align*}

Denote the optimal value of this problem by pp^*. As is standard in convex optimization, we define a quantity, the Lagrangian L(x,λ)\mathcal{L}(x, \lambda) for λR0m\lambda \in \R_{\geq 0}^m as follows:

L(x,λ)=f0(x)+i=1mλifi(x) \mathcal{L}(x, \lambda) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x)

Observe that if fi(x)>0f_i(x) > 0 for any ii, then taking λi+\lambda_i \to +\infty we can make L(x,λ)\mathcal{L}(x, \lambda) arbitrarily large. Thus we may actually think of the optimization problem above as a two-player game, where xx (the 'primal player') plays first trying to minimize the optimization problem, and λ\lambda (the 'dual player') plays second trying to maximize it. Then we have

p=minxXmaxλ0L(x,λ)by abovemaxλ0minxXL(x,λ)max-min inequality=d \begin{align*} p^* &= \min_{x \in X} \max_{\lambda \geq 0} \mathcal{L}(x, \lambda) & \text{by above}\\ &\geq \max_{\lambda \geq 0} \min_{x \in X} \mathcal{L}(x, \lambda) & \text{max-min inequality}\\ &= d^* \end{align*}

In the last line, we define dd^* to be the value of maxλ0minxXL(x,λ)\max_{\lambda \geq 0} \min_{x \in X} \mathcal{L}(x, \lambda), the Lagrangian dual optimization problem. In contrast, we call the original optimization problem the primal. By applying the max-min inequality, we have just shown weak duality, that pdp^* \geq d^*.

In convex optimization, we have a very famous form of a minimax theorem (although for some reason, I almost never see it called that). It is Slater's condition:


If there exists xXx \in X such that fi(x)<0f_i(x) < 0 for all i{1,,m}i \in \{1, \ldots, m\}, then p=dp^* = d^*. For any i{1,,m}i \in \{1, \ldots, m\}, if fif_i is linear, we only require fi(x)0f_i(x) \leq 0.

In convex optimization the situation where p=dp^* = d^* is called strong duality, but I really think of it as just a minimax theorem. This is really clear when we write the conclusion to match the von Neumann minimax theorem:

minxXmaxλ0L(x,λ)=maxλ0minxXL(x,λ) \min_{x \in X} \max_{\lambda \geq 0} \mathcal{L}(x, \lambda) = \max_{\lambda \geq 0} \min_{x \in X} \mathcal{L}(x, \lambda)

Despite the obvious similarity in the statement of the theorem, this theorem is not directly comparable to von Neumann's minimax theorem. We can immediately observe the following differences:

  1. L(x,λ)\mathcal{L}(x, \lambda) is a convex-linear function, not just convex-concave. Linearity implies concavity, so this condition is stronger.
  2. Y=R0mY = \R_{\geq 0}^m is fixed in Slater's condition, while in von Neumann's minimax theorem it can be any compact convex set.
  3. We place a condition on XX that is often merely technical in applications: that there exists some point xXx \in X that is strictly feasible with respect to the constraints f1,,fmf_1, \ldots, f_m.
  4. XX is allowed to be any convex set, namely, it need not be compact (i.e. bounded). YY is not bounded either!

I find it quite interesting, though, that given we also require the first 3 conditions to be true (the last of which does not usually cause trouble), we get an 'unbounded' version of the minimax theorem.

Quote the following table from [LJJ20].

SettingGradient complexityLower bound

κx\kappa_x is the strong-convexity condition number.

Stranger duals: Turning Lagrangian duality into von Neumann's minimax theorem

To be continued