Written by Yuto Watanabe https://watanabeyuto.github.io/ (Feb 7th, 2026)
I recommend light mode when reading this article.
In this blog post, I will show the equivalence of five LQR formulations. They are very common in today's control theory community (e.g., policy optimization, data-driven control, etc.), but the equivalence may not be clear at first glance. I am not aware of any material that summarizes all of them, so I hope this post can help you.
Here I focus on discrete-time systems; the continuous-time case can be shown in almost the same way. I assume basic knowledge of the infinite-horizon LQR and discrete-time systems.
Throughout, I make the following assumptions:
<aside> 💡
Assumptions:
Note that the matrix $W$ may have a different meaning in each formulation. The assumption can be relaxed to $W\succeq 0$, but I assume strict positive definiteness for simplicity.
The second assumption is reasonable because the LQR optimal controller is known to be linear and static! This form facilitates the policy optimization approach, which became an active research subject after Fazel et al., and it is famous for the remarkable gradient dominance property!
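To make the "linear and static" point concrete, here is a minimal numerical sketch: iterating the discrete-time Riccati recursion until convergence yields a single time-invariant gain $K$, so the optimal control is $u_t = -Kx_t$. The system matrices below are hypothetical (not from this post), chosen only for illustration.

```python
import numpy as np

# Hypothetical system and cost matrices (assumptions, for illustration only).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])   # discrete-time double integrator, dt = 0.1
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)                # state cost (W-like matrix, positive definite)
R = np.array([[1.0]])        # input cost (positive definite)

# Fixed-point iteration on the discrete-time algebraic Riccati equation:
#   P = Q + A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A
P = Q.copy()
for _ in range(500):
    gain_term = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ A - A.T @ P @ B @ gain_term

# One static gain matrix, independent of time: u_t = -K x_t.
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print(K.shape)  # (1, 2)
```

The closed-loop matrix $A - BK$ is then Schur stable (spectral radius below one), which is exactly why restricting attention to static linear policies loses nothing for LQR.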
First, let me describe the following five formulations:
All of them appear frequently in the literature; in particular, you may encounter all five when reading recent policy optimization papers. This indicates the theoretical richness of the LQR problem!
Throughout, I denote the trace inner product by $\langle A,B\rangle =\mathbf{Tr}(A^\mathsf{T}B)$.
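As a quick sanity check of this notation, $\langle A,B\rangle =\mathbf{Tr}(A^\mathsf{T}B)$ coincides with the entrywise sum $\sum_{ij} A_{ij}B_{ij}$, which is the identity used repeatedly when manipulating LQR costs. A short numerical verification (random matrices are arbitrary, chosen only for the check):

```python
import numpy as np

# Verify the trace inner product identity <A, B> = Tr(A^T B) = sum_ij A_ij * B_ij.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))

inner_trace = np.trace(A.T @ B)   # trace form
inner_sum = np.sum(A * B)         # entrywise form
print(np.isclose(inner_trace, inner_sum))  # True
```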