Written by Yuto Watanabe https://watanabeyuto.github.io/ (Feb 7th, 2026)
I recommend light mode when reading this article.
In this blog post, I will show the equivalence of five LQR formulations. They are very common in today's control theory community (e.g., policy optimization, data-driven control, etc.), but the equivalence may not be clear at first glance. I am not aware of any material that summarizes all of them, so I hope this post can help you.
Here I focus on discrete-time systems; the continuous-time case can be shown in almost the same way. I assume basic knowledge of the infinite-horizon LQR and discrete-time systems.
Throughout, I make the following assumptions:
<aside> 💡
Assumptions:
Note that the matrix $W$ may have a different meaning in each formulation. The assumption can be relaxed to $W\succeq 0$, but I assume strict positive definiteness for simplicity.
The second assumption is reasonable because the LQR optimal controller is known to be linear and static! This form facilitates the policy optimization approach, which became an active research subject after Fazel et al., and it is famous for the remarkable gradient dominance property!
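To make the "linear and static" point concrete, here is a minimal numerical sketch: iterating the discrete-time Riccati recursion until convergence yields a single time-invariant gain $K$, so the optimal control is $u_t = -Kx_t$. The system matrices below are hypothetical (not from this post), chosen only for illustration.

```python
import numpy as np

# Hypothetical system and cost matrices (assumptions, for illustration only).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])   # discrete-time double integrator, dt = 0.1
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)                # state cost (W-like matrix, positive definite)
R = np.array([[1.0]])        # input cost (positive definite)

# Fixed-point iteration on the discrete-time algebraic Riccati equation:
#   P = Q + A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A
P = Q.copy()
for _ in range(500):
    gain_term = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ A - A.T @ P @ B @ gain_term

# One static gain matrix, independent of time: u_t = -K x_t.
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print(K.shape)  # (1, 2)
```

The closed-loop matrix $A - BK$ is then Schur stable (spectral radius below one), which is exactly why restricting attention to static linear policies loses nothing for LQR.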
First, let me describe the following five formulations:
All of them appear frequently in the literature; in particular, you may encounter all five when reading recent policy optimization papers. This indicates the theoretical richness of the LQR problem!
Throughout, I denote the trace inner product by $\langle A,B\rangle =\mathbf{Tr}(A^\mathsf{T}B)$.
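As a quick sanity check of this notation, $\langle A,B\rangle =\mathbf{Tr}(A^\mathsf{T}B)$ coincides with the entrywise sum $\sum_{ij} A_{ij}B_{ij}$, which is the identity used repeatedly when manipulating LQR costs. A short numerical verification (random matrices are arbitrary, chosen only for the check):

```python
import numpy as np

# Verify the trace inner product identity <A, B> = Tr(A^T B) = sum_ij A_ij * B_ij.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))

inner_trace = np.trace(A.T @ B)   # trace form
inner_sum = np.sum(A * B)         # entrywise form
print(np.isclose(inner_trace, inner_sum))  # True
```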