May 10, 2024

Kalman Filter Tutorial: Derivation of the Kalman Filter by Using the Recursive Least-Squares Method


In this tutorial, we present a simple derivation of the Kalman filter equations. Before reading this tutorial, we advise you to read our tutorial on the recursive least-squares method and our tutorial on the propagation of the mean and covariance matrix through the system dynamics. In the follow-up post, which can be found here, we explain how to implement the Kalman filter in Python.

Model and Model Assumptions

We are considering the following state-space model of a dynamical system:

(1)   \begin{align*}\mathbf{x}_{k}& =A_{k-1}\mathbf{x}_{k-1}+B_{k-1}\mathbf{u}_{k-1}+\mathbf{w}_{k-1} \\\mathbf{y}_{k}& = C_{k}\mathbf{x}_{k}+\mathbf{v}_{k}\end{align*}

where

  • k is the discrete-time instant
  • \mathbf{x}_{k}\in \mathbb{R}^{n} is the state vector at the discrete time instant k
  • \mathbf{u}_{k-1}\in \mathbb{R}^{m} is the control input vector
  • \mathbf{w}_{k-1}\in  \mathbb{R}^{n} is the disturbance or process noise vector. We assume that \mathbf{w}_{k} is zero-mean white noise (uncorrelated in time), with the covariance matrix given by Q_{k}=E[\mathbf{w}_{k}\mathbf{w}_{k}^{T}], where E[\cdot] is the mathematical expectation operator.
  • A_{k-1}\in \mathbb{R}^{n\times n} and B_{k-1}\in \mathbb{R}^{n\times m} are state and input matrices
  • C_{k}\in \mathbb{R}^{r\times n} is the output matrix
  • \mathbf{y}_{k}\in \mathbb{R}^{r} is the output vector (observed measurements)
  • \mathbf{v}_{k}\in  \mathbb{R}^{r} is the measurement noise vector. We assume that \mathbf{v}_{k} is zero-mean white noise (uncorrelated in time), with the covariance matrix given by R_{k}=E[\mathbf{v}_{k}\mathbf{v}_{k}^{T}]

This is a time-varying system. However, if the system is time-invariant, then all the system matrices are constant. We assume that the covariance matrices Q_{k} and R_{k}, as well as the system matrices A_{k}, B_{k}, and C_{k}, are known. The goal is to design an observer (estimator) that estimates the state vector \mathbf{x}_{k}.
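
To make the model (1) concrete before we proceed, here is a minimal simulation sketch in Python (the full implementation is the subject of the follow-up post). The particular matrices A, B, C, Q, and R below are illustrative assumptions, not taken from any specific physical system.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Illustrative time-invariant system matrices (assumed for this sketch)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])    # state matrix A, n = 2
B = np.array([[0.005],
              [0.1]])         # input matrix B, m = 1
C = np.array([[1.0, 0.0]])    # output matrix C, r = 1

Q = 0.01 * np.eye(2)          # process noise covariance Q
R = np.array([[0.1]])         # measurement noise covariance R

def simulate(x0, u_seq):
    """Simulate x_k = A x_{k-1} + B u_{k-1} + w_{k-1} and y_k = C x_k + v_k."""
    x = x0
    xs, ys = [], []
    for u in u_seq:
        w = rng.multivariate_normal(np.zeros(2), Q)   # process noise w_{k-1}
        x = A @ x + B @ u + w                         # state equation
        v = rng.multivariate_normal(np.zeros(1), R)   # measurement noise v_k
        y = C @ x + v                                 # output equation
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

x_true, y_meas = simulate(np.zeros(2), [np.ones(1)] * 100)
```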

A priori and a posteriori state estimates, errors, and covariance matrices

In Kalman filtering, we have two important state estimates: the a priori state estimate and the a posteriori state estimate. Beyond their use in Kalman filtering, a priori and a posteriori are Latin phrases whose general meaning is explained here.

The a priori state estimate of the state vector \mathbf{x}_{k} is obtained implicitly on the basis of the output vector measurements \mathbf{y}_{1},\mathbf{y}_{2},\ldots, \mathbf{y}_{k-1} (as well as on some other information that is not relevant for this discussion). That is, the a priori state estimate is obtained on the basis of the past measurements up to the current discrete-time instant k and NOT on the basis of the measurement \mathbf{y}_{k} at the current discrete-time instant k. The a priori state estimate is denoted by

(2)   \begin{align*}\hat{\mathbf{x}}_{k}^{-} \;\; -\;\; \text{a priori state estimate}\end{align*}

where “the hat notation” denotes an estimate, and where the minus superscript denotes the a priori state estimate. The minus superscript originates from the fact that this estimate is obtained before we process the measurement \mathbf{y}_{k} at the time instant k.

The a posteriori estimate of the state \mathbf{x}_{k} is obtained implicitly on the basis of the measurements \mathbf{y}_{1},\mathbf{y}_{2},\ldots, \mathbf{y}_{k-1},\mathbf{y}_{k}. The a posteriori estimate is denoted by

(3)   \begin{align*}\hat{\mathbf{x}}_{k}^{+} \;\; -\;\; \text{a posteriori state estimate}\end{align*}

where the plus superscript in the state estimate notation denotes the fact that the a posteriori estimate is obtained by processing the measurement \mathbf{y}_{k} obtained at the time instant k.

In our previous post, we explained how to derive the recursive least-squares method from scratch. There, we defined and derived the expressions for the covariance matrices of the estimation errors in the context of recursive least-squares filtering. Here, we also need to introduce covariance matrices of the a priori and a posteriori estimation errors. They are defined as follows

(4)   \begin{align*}P_{k}^{-}&=E[(\mathbf{x}_{k}-\hat{\mathbf{x}}_{k}^{-})(\mathbf{x}_{k}-\hat{\mathbf{x}}_{k}^{-})^{T}] \\P_{k}^{+}&=E[(\mathbf{x}_{k}-\hat{\mathbf{x}}_{k}^{+})(\mathbf{x}_{k}-\hat{\mathbf{x}}_{k}^{+})^{T}]\end{align*}

where P_{k}^{-} is the covariance matrix of the a priori estimation error and P_{k}^{+} is the covariance matrix of the a posteriori estimation error.

Derivation of the Kalman Filter Equations from the Recursive Least-Squares Method

The following figure illustrates how the Kalman filter works in practice.

Figure 1: Illustration of Kalman filtering equations.

For the time being, it is beneficial to briefly explain this figure. Even if some concepts are not completely clear yet, they will become clear by the end of this post. Immediately after the discrete-time instant k-1, we compute the a posteriori state estimate \hat{\mathbf{x}}_{k-1}^{+} and the covariance matrix of the estimation error P_{k-1}^{+}. Then, we propagate these quantities through the equations describing the system dynamics and through the covariance matrix propagation equation (which is also derived on the basis of the system dynamics). That is, we propagate the a posteriori state estimate and covariance matrix through our model. By propagating these quantities, we obtain the a priori state estimate \hat{\mathbf{x}}_{k}^{-} and the covariance matrix of the estimation error P_{k}^{-} for the time step k. Then, after the measurement vector \mathbf{y}_{k} is observed at the discrete-time step k, we use this measurement and the recursive least-squares method to compute the a posteriori estimate \hat{\mathbf{x}}_{k}^{+} and the covariance matrix of the estimation error P_{k}^{+} for the time instant k.

Now that we have a general idea of how Kalman filtering works, let us derive the Kalman filter equations. At the initial time instant k=0, we need to choose an initial guess of the state estimate. Let this initial guess be denoted by \hat{\mathbf{x}}_{0}. This will be our initial a posteriori state estimate, that is

(5)   \begin{align*}\hat{\mathbf{x}}_{0}^{+}=\hat{\mathbf{x}}_{0}\end{align*}

Then, the question is: how do we compute the a priori estimate \hat{\mathbf{x}}_{1}^{-} for the time instant k=1, before the measurement \mathbf{y}_{1} is taken?

The natural answer is that the system states, as well as the estimates, need to satisfy the system dynamics (1), and consequently,

(6)   \begin{align*}\hat{\mathbf{x}}_{1}^{-}& =A_{0}\hat{\mathbf{x}}_{0}^{+}+B_{0}\mathbf{u}_{0}\end{align*}

where we excluded the disturbance term, since the disturbance vector is not known. Besides the initial guess of the state estimate, we also need to select an initial guess of the covariance matrix of the estimation error. That is, we need to select P_{0}^{+}. If we have perfect knowledge of the initial state, then we select P_{0}^{+} as the zero matrix, that is, P_{0}^{+}=0\cdot I, where I is the identity matrix. This is because the covariance matrix of the estimation error is a measure of uncertainty, and if we have perfect knowledge, then the uncertainty should be zero. On the other hand, if we do not have any a priori knowledge of the initial state, then we select P_{0}^{+}=c\cdot I, where c is a large number. Next, we need to compute the covariance matrix of the a priori state estimation error, that is, we need to compute P_{1}^{-}. In our previous post, which can be found here, we derived the expression for the time propagation of the covariance matrix of the state estimation error:

(7)   \begin{align*}P_{k}=A_{k-1}P_{k-1}A_{k-1}^{T}+Q_{k-1}\end{align*}

By using this expression, we obtain the following equation for P_{1}^{-}

(8)   \begin{align*}P_{1}^{-}=A_{0}P_{0}^{+}A_{0}^{T}+Q_{0}\end{align*}

So this is the best we can do at the initial time step k=0, before we obtain any measurement information. We computed the a priori estimate \hat{\mathbf{x}}_{1}^{-} and the covariance matrix P_{1}^{-} for the time step k=1, before the measurement \mathbf{y}_{1} at the time step k=1 has arrived.
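
In code, the initialization (5) and the prediction equations (6) and (8) could look as follows. This is a minimal sketch that continues the simulation example above; the value c=1000 for an uninformative P_{0}^{+} is an illustrative assumption.

```python
# Initialization, Eq. (5): we assume no knowledge of the initial state,
# so we pick a zero estimate and a large initial covariance (c = 1000).
x_post = np.zeros(2)           # \hat{x}_0^+
P_post = 1000.0 * np.eye(2)    # P_0^+ = c * I

u0 = np.ones(1)                # control input u_0

# Prediction, Eqs. (6) and (8): propagate the estimate and the covariance
# through the model; the unknown disturbance w_0 is dropped.
x_prior = A @ x_post + B @ u0          # \hat{x}_1^-
P_prior = A @ P_post @ A.T + Q         # P_1^-
```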

At the time instant k=1, the measurement \mathbf{y}_{1} arrives. Then, the natural question is: how do we update the a priori estimate \hat{\mathbf{x}}_{1}^{-} and the covariance matrix P_{1}^{-}?

To update these quantities, we need to use the recursive least-squares method. We derived the equations of the recursive least-squares method in our previous post, which can be found here. Here are the recursive least-squares equations (a code sketch of one such update is given after the list):

  1. Update the gain matrix K_{k}:

    (9)   \begin{align*}K_{k}=P_{k-1}C_{k}^{T}(R_{k}+C_{k}P_{k-1}C_{k}^{T})^{-1}\end{align*}

  2. Update the estimate:

    (10)   \begin{align*}\hat{\mathbf{x}}_{k}= \hat{\mathbf{x}}_{k-1}+K_{k}(\mathbf{y}_{k}-C_{k}\hat{\mathbf{x}}_{k-1}) \end{align*}

  3. Propagate the covariance matrix by using this equation:

    (11)   \begin{align*}P_{k}=(I-K_{k}C_{k})P_{k-1}(I-K_{k}C_{k})^{T}+K_{k}R_{k}K_{k}^{T}\end{align*}


    or this equation

    (12)   \begin{align*}P_{k}=(I-K_{k}C_{k})P_{k-1}\end{align*}
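
Collecting the equations (9), (10), and (11) in one place, here is a minimal sketch of a single recursive least-squares update, continuing with the NumPy conventions used above:

```python
def rls_update(x_est, P, y, C, R):
    """One recursive least-squares update, Eqs. (9)-(11)."""
    # Gain matrix, Eq. (9)
    S = R + C @ P @ C.T
    K = P @ C.T @ np.linalg.inv(S)
    # Estimate update, Eq. (10)
    x_new = x_est + K @ (y - C @ x_est)
    # Covariance update, Eq. (11) (the numerically robust form)
    I = np.eye(P.shape[0])
    P_new = (I - K @ C) @ P @ (I - K @ C).T + K @ R @ K.T
    return x_new, P_new, K
```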


Now, the question is: how do we use these equations to derive the Kalman filter equations? First, we perform the following substitutions in the recursive least-squares equations (these substitutions are explained below):

(13)   \begin{align*}P_{k-1}\rightarrow P_{k}^{-}   \end{align*}

(14)   \begin{align*}P_{k}\rightarrow P_{k}^{+}   \end{align*}

(15)   \begin{align*}\hat{\mathbf{x}}_{k-1} \rightarrow \hat{\mathbf{x}}_{k}^{-}\end{align*}

(16)   \begin{align*}\hat{\mathbf{x}}_{k} \rightarrow \hat{\mathbf{x}}_{k}^{+}\end{align*}

Here is the main idea behind these substitutions. Let us first focus on the equation (13). In the case of the recursive least-squares method, the covariance before the measurement arrives is P_{k-1}. In the case of the Kalman filter, P_{k-1} becomes P_{k}^{-}, since P_{k}^{-} is the covariance before the measurement arrives. Next, let us focus on (14). In the case of the recursive least-squares method, the covariance after the measurement arrives is P_{k}. In the case of the Kalman filter, P_{k} becomes P_{k}^{+}, since P_{k}^{+} is the updated covariance after the measurement arrives. Next, let us focus on (15). In the case of the recursive least-squares method, \hat{\mathbf{x}}_{k-1} is the state estimate before the measurement \mathbf{y}_{k} arrives. Consequently, in the case of the Kalman filter, \hat{\mathbf{x}}_{k-1} becomes the a priori estimate \hat{\mathbf{x}}_{k}^{-}, since \hat{\mathbf{x}}_{k}^{-} does not take into account the measurement \mathbf{y}_{k}. Finally, let us focus on (16). In the case of the recursive least-squares method, \hat{\mathbf{x}}_{k} is the state estimate that takes into account the measurement \mathbf{y}_{k}. Consequently, in the case of the Kalman filter, \hat{\mathbf{x}}_{k} becomes \hat{\mathbf{x}}_{k}^{+}, since \hat{\mathbf{x}}_{k}^{+} takes into account the measurement \mathbf{y}_{k} obtained at the time instant k.

After these substitutions, we obtain the update equations:

(17)   \begin{align*}K_{k}=P_{k}^{-}C_{k}^{T}(R_{k}+C_{k}P_{k}^{-}C_{k}^{T})^{-1}\end{align*}

(18)   \begin{align*}\hat{\mathbf{x}}_{k}^{+}= \hat{\mathbf{x}}_{k}^{-}+K_{k}(\mathbf{y}_{k}-C_{k}\hat{\mathbf{x}}_{k}^{-}) \end{align*}

(19)   \begin{align*}P_{k}^{+}&=(I-K_{k}C_{k})P_{k}^{-}(I-K_{k}C_{k})^{T}+K_{k}R_{k}K_{k}^{T} \\P_{k}^{+}& =(I-K_{k}C_{k})P_{k}^{-}\end{align*}

Now, let us go back to the time instant k=1. By using the newly established update equations, we update \hat{\mathbf{x}}_{1}^{-} and P_{1}^{-} as follows

(20)   \begin{align*}K_{1} & =P_{1}^{-}C_{1}^{T}(R_{1}+C_{1}P_{1}^{-}C_{1}^{T})^{-1}  \\\hat{\mathbf{x}}_{1}^{+} & = \hat{\mathbf{x}}_{1}^{-}+K_{1}(\mathbf{y}_{1}-C_{1}\hat{\mathbf{x}}_{1}^{-}) \\P_{1}^{+}&=(I-K_{1}C_{1})P_{1}^{-}(I-K_{1}C_{1})^{T}+K_{1}R_{1}K_{1}^{T} \\P_{1}^{+}& =(I-K_{1}C_{1})P_{1}^{-}\end{align*}
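
Continuing the sketch from above, this update at k=1 is exactly one call of the rls_update function, applied to the a priori quantities:

```python
# Measurement update at k = 1, Eq. (20): one call of rls_update applied
# to the a priori quantities (rls_update uses the robust covariance form).
y1 = y_meas[0]   # measurement y_1 from the simulation sketch
x_post, P_post, K1 = rls_update(x_prior, P_prior, y1, C, R)
```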

As a result, we obtain P_{1}^{+} and the a posteriori estimate \hat{\mathbf{x}}_{1}^{+}. Next, similarly to the initial step, we propagate \hat{\mathbf{x}}_{1}^{+} through the system dynamics (the model of the system), and P_{1}^{+} through the covariance propagation equation:

(21)   \begin{align*}\hat{\mathbf{x}}_{2}^{-}& =A_{1}\hat{\mathbf{x}}_{1}^{+}+B_{1}\mathbf{u}_{1} \\P_{2}^{-}&=A_{1}P_{1}^{+}A_{1}^{T}+Q_{1}\end{align*}

to obtain \hat{\mathbf{x}}_{2}^{-} and P_{2}^{-}. After this step, we compute the a posteriori estimate and the covariance matrix of the estimation error by using Eqs. (17), (18), and (19) for k=2. We learned that the two main steps of the Kalman filter are:

  1. Before the measurement arrives, use the system dynamics to propagate the a posteriori estimate and covariance matrix of the estimation error from the step k-1, to obtain the a priori estimate and covariance matrix of the estimation error for the step k.
  2. At the time step k, use the recursive least squares equations, and the obtained measurement, to compute the a posteriori estimate and the covariance matrix of the estimation error.

This is illustrated in Fig. 2 (the same as Fig. 1, repeated here for completeness).

Figure 2: Graphical explanation of the Kalman filter (same as Fig. 1).

Let us now summarize the Kalman filter equations.

Summary of the derived Kalman filter equations

In the initial step, we select an initial estimate \hat{\mathbf{x}}_{0}^{+} and an initial covariance matrix of the estimation error P_{0}^{+}. Then, for k=1,2,3,\ldots, we perform steps 1 and 2 summarized below.

Step 1: Propagate the a posteriori estimate \hat{\mathbf{x}}_{k-1}^{+} and the covariance matrix of the estimation error P_{k-1}^{+} through the system dynamics to obtain the a priori state estimate and the covariance matrix of the estimation error for the step k:

(22)   \begin{align*}\hat{\mathbf{x}}_{k}^{-}& =A_{k-1}\hat{\mathbf{x}}_{k-1}^{+}+B_{k-1}\mathbf{u}_{k-1} \\P_{k}^{-}& =A_{k-1}P_{k-1}^{+}A_{k-1}^{T}+Q_{k-1}\end{align*}

Step 2: Obtain the measurement \mathbf{y}_{k}. Use this measurement, together with the a priori estimate \hat{\mathbf{x}}_{k}^{-} and the covariance matrix of the estimation error P_{k}^{-}, to compute the a posteriori state estimate and the covariance matrix of the estimation error:

(23)   \begin{align*}K_{k} & =P_{k}^{-}C_{k}^{T}(R_{k}+C_{k}P_{k}^{-}C_{k}^{T})^{-1} \\\hat{\mathbf{x}}_{k}^{+} & = \hat{\mathbf{x}}_{k}^{-}+K_{k}(\mathbf{y}_{k}-C_{k}\hat{\mathbf{x}}_{k}^{-})  \\P_{k}^{+}&=(I-K_{k}C_{k})P_{k}^{-}(I-K_{k}C_{k})^{T}+K_{k}R_{k}K_{k}^{T} \\P_{k}^{+}& =(I-K_{k}C_{k})P_{k}^{-} \end{align*}
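
Putting steps 1 and 2 together, a minimal sketch of the complete filter loop (continuing the NumPy example above and using the numerically robust covariance update) could look like this:

```python
def kalman_filter(x0_post, P0_post, u_seq, y_seq, A, B, C, Q, R):
    """Run the Kalman filter, Eqs. (22) and (23), over a measurement sequence."""
    x_post, P_post = x0_post, P0_post
    estimates = []
    for u, y in zip(u_seq, y_seq):
        # Step 1: prediction, Eq. (22)
        x_prior = A @ x_post + B @ u
        P_prior = A @ P_post @ A.T + Q
        # Step 2: measurement update, Eq. (23), via the RLS update
        x_post, P_post, _ = rls_update(x_prior, P_prior, y, C, R)
        estimates.append(x_post)
    return np.array(estimates)

# Usage with the simulated data and initial guesses from the sketches above
x_hat = kalman_filter(np.zeros(2), 1000.0 * np.eye(2),
                      [np.ones(1)] * 100, y_meas, A, B, C, Q, R)
```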

Here are a few comments about the derived Kalman filter equations. The matrix K_{k} is called the Kalman filter gain matrix. Next, the covariance-update equation

(24)   \begin{align*}P_{k}^{+}&=(I-K_{k}C_{k})P_{k}^{-}(I-K_{k}C_{k})^{T}+K_{k}R_{k}K_{k}^{T}\end{align*}

is more numerically stable and robust (it is known as the Joseph stabilized form) than the alternative equation

(25)   \begin{align*}P_{k}^{+}& =(I-K_{k}C_{k})P_{k}^{-} \end{align*}

Consequently, the equation (24) is preferable for computing the covariance update. Another important thing to mention is that the Kalman filter gain K_{k} does not depend on the measurements \mathbf{y}_{k}. This is because the computations of P_{k}^{-}, P_{k}^{+}, and K_{k} depend only on the system matrices A_{k-1} and C_{k}, and on the covariance matrices Q_{k-1} and R_{k}. This means that the Kalman filter gain matrices K_{k}, for k=1,2,3,\ldots, can be precomputed before the Kalman filter starts to operate online. This is important for saving computational time in embedded systems.
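
Since P_{k}^{-}, P_{k}^{+}, and K_{k} do not depend on the measurements, the gain sequence can indeed be computed offline. Here is a minimal sketch of this precomputation, under the same illustrative assumptions as the earlier code:

```python
def precompute_gains(P0_post, A, C, Q, R, num_steps):
    """Precompute the gains K_1, ..., K_N offline; no measurements are needed."""
    P_post = P0_post
    gains = []
    for _ in range(num_steps):
        P_prior = A @ P_post @ A.T + Q            # covariance prediction
        S = R + C @ P_prior @ C.T
        K = P_prior @ C.T @ np.linalg.inv(S)      # Kalman gain K_k
        I = np.eye(P_post.shape[0])
        P_post = (I - K @ C) @ P_prior @ (I - K @ C).T + K @ R @ K.T
        gains.append(K)
    return gains

K_sequence = precompute_gains(1000.0 * np.eye(2), A, C, Q, R, num_steps=100)
```

In the online loop, the stored gains then replace the gain computation, so each step reduces to the prediction in (22) and the estimate update in (23).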