January 7, 2025

Kalman Filter Tutorial- Derivation of the Kalman Filter by Using the Recursive Least-Squares Method


In this tutorial, we present a simple derivation of the Kalman filter equations. Before reading this tutorial, we advise you to read this tutorial on the recursive least squares method, and this tutorial on mean and covariance matrix propagation through the system dynamics. In the follow-up post which can be found here, we explain how to implement the Kalman filter in Python. The YouTube video accompanying this post is given below.

Model and Model Assumptions

We are considering the following state-space model of a dynamical system:

(1)   \begin{align*}\mathbf{x}_{k}& =A_{k-1}\mathbf{x}_{k-1}+B_{k-1}\mathbf{u}_{k-1}+\mathbf{w}_{k-1} \\\mathbf{y}_{k}& = C_{k}\mathbf{x}_{k}+\mathbf{v}_{k}\end{align*}

where

  • k is the discrete-time instant
  • \mathbf{x}_{k}\in \mathbb{R}^{n} is the state vector at the discrete time instant k
  • \mathbf{u}_{k-1}\in \mathbb{R}^{m} is the control input vector
  • \mathbf{w}_{k-1}\in  \mathbb{R}^{n} is the disturbance or process noise vector. We assume that \mathbf{w}_{k} is white, zero mean, uncorrelated, with the covariance matrix given by Q_{k}=E[\mathbf{w}_{k}\mathbf{w}_{k}^{T}], where E[\cdot] is the mathematical expectation operator.
  • A_{k-1}\in \mathbb{R}^{n\times n} and B_{k-1}\in \mathbb{R}^{n\times m} are state and input matrices
  • C_{k}\in \mathbb{R}^{r\times n} is the output matrix
  • \mathbf{y}_{k}\in \mathbb{R}^{r} is the output vector (observed measurements)
  • \mathbf{v}_{k}\in  \mathbb{R}^{r} is the measurement noise vector. We assume that \mathbf{v}_{k} is white, zero mean, uncorrelated, with the covariance matrix given by R_{k}=E[\mathbf{v}_{k}\mathbf{v}_{k}^{T}]

This is a time-varying system. However, if the system is time-invariant, then all the system matrices are constant. We assume that the covariance matrices R_{k}, Q_{k}, as well as system matrices A_{k}, B_{k}, and C_{k} are known. The goal is to design an observer (estimator) that will estimate the state vector \mathbf{x}_{k}.

A priori and a posteriori state estimates, errors, and covariance matrices

In Kalman filtering, we have two important state estimates: a priori state estimates and a posteriori state estimates. Apart from this post and Kalman filtering, a priori and a posteriori are Latin phrases whose meaning is explained here.

The a priori state estimate of the state vector \mathbf{x}_{k} is obtained implicitly on the basis of the output vector measurements \mathbf{y}_{1},\mathbf{y}_{2},\ldots, \mathbf{y}_{k-1} (as well as on some other information that is not relevant for this discussion). That is, the a priori state estimate is obtained on the basis of the past measurements up to the current discrete-time instant k and NOT on the basis of the measurement \mathbf{y}_{k} at the current discrete-time instant k. The a priori state estimate is denoted by

(2)   \begin{align*}\hat{\mathbf{x}}_{k}^{-} \;\; -\;\; \text{a priori state estimate}\end{align*}

where “the hat notation” denotes an estimate, and where the minus superscript denotes the a priori state estimate. The minus superscript originates from the fact that this estimate is obtained before we process the measurement \mathbf{y}_{k} at the time instant k.

The a posteriori estimate of the state \mathbf{x}_{k} is obtained implicitly on the basis of the measurements \mathbf{y}_{1},\mathbf{y}_{2},\ldots, \mathbf{y}_{k-1},\mathbf{y}_{k}. The a posteriori estimate is denoted by

(3)   \begin{align*}\hat{\mathbf{x}}_{k}^{+} \;\; -\;\; \text{a posteriori state estimate}\end{align*}

where the plus superscript in the state estimate notation denotes the fact that the a posteriori estimate is obtained by processing the measurement \mathbf{y}_{k} obtained at the time instant k.

In our previous post, we explained how to derive a recursive least squares method from scratch. We defined and derived the expression for the covariance matrices of estimation errors in the context of recursive least squares filtering. Here, we also need to introduce covariance matrices of the a priori and a posteriori estimation errors. The covariance matrices of the a priori and a posteriori estimation errors are defined as follows

(4)   \begin{align*}P_{k}^{-}=E[(\mathbf{x}_{k}-\mathbf{x}_{k}^{-})(\mathbf{x}_{k}-\mathbf{x}_{k}^{-})^{T}] \\P_{k}^{+}=E[(\mathbf{x}_{k}-\mathbf{x}_{k}^{+})(\mathbf{x}_{k}-\mathbf{x}_{k}^{+})^{T}]\end{align*}

where P_{k}^{-} is the covariance matrix of the a priori estimation error and P_{k}^{+} is the covariance matrix of the a posteriori estimation error.

Derivation of Kalman Equations from the Recursive-Least Squares Method

The following figure illustrates how the Kalman filter works in practice.

Figure 1: Illustration of Kalman filtering equations.

For the time being it is beneficial to briefly explain this figure. Even if some concepts are not completely clear, they will become clear by the end of this post. Immediately after the discrete-time instant k-1, we compute the a posteriori state estimate \mathbf{x}_{k-1}^{+} and the covariance matrix of the estimation error P_{k-1}^{+}. Then, we propagate these quantities through the equations describing the system dynamics and covariance matrix propagation (that is also derived on the basis of the system dynamics). That is, we propagate the a posteriori state and covariance through our model. By propagating these quantities, we obtain the a priori state estimate \mathbf{x}_{k}^{-} and the covariance matrix of the estimation error P_{k}^{-} for the time step k. Then, after the measurement vector \mathbf{y}_{k} is observed at the discrete-time step k, we use this measurement and the recursive-least squares method to compute the a posteriori estimate \mathbf{x}_{k}^{+} and the covariance matrix of the estimation error P_{k}^{+} for the time instant k.

Now that we have a general idea of how Kalman filtering works, let us now derive the Kalman filter equations. At the initial time instant k=0, we need to choose an initial guess of the state estimate. Let this chosen initial guess of the state estimate be denoted by \hat{\mathbf{x}}_{0}. This will be our initial a posteriori state estimate, that is

(5)   \begin{align*}\hat{\mathbf{x}}_{0}^{+}=\hat{\mathbf{x}}_{0}\end{align*}

Then, the question is how to compute the a priori estimate \hat{\mathbf{x}}_{1}^{-} for the time instant 1, before the measurements are taken?

The natural answer is that the system states, as well as the estimates, need to satisfy the system dynamics (1), and consequently,

(6)   \begin{align*}\hat{\mathbf{x}}_{1}^{-}& =A_{0}\hat{\mathbf{x}}_{0}^{+}+B_{0}\mathbf{u}_{0}\end{align*}

where we excluded the disturbance part since the disturbance vector is not known. Besides the initial guess of the estimate, we also need to select an initial guess of the covariance matrix of the estimation error. That is, we need to select P_{0}^{+}. If we have perfect knowledge about the initial state, then we select P_{0}^{+} as a zero matrix, that is P_{0}^{+}=0\cdot I, where I is an identity matrix. This is because the covariance matrix of the estimation error is the measure of uncertainty, and if we have perfect knowledge, then the measure of uncertainty should be zero. On the other hand, if we do not have any a priori knowledge of the initial state, then P_{0}^{+}=c\cdot I, where c is a large number. Next, we need to compute the covariance matrix of the a priori state estimation error, that is, we need to compute P_{1}^{-}. In our previous post, which can be found here, we derived the expression for the time propagation of the covariance matrix of the state estimation error:

(7)   \begin{align*}P_{k}=A_{k-1}P_{k-1}A_{k-1}^{T}+Q_{k-1}\end{align*}

By using this expression, we obtain the following equation for P_{1}^{-}

(8)   \begin{align*}P_{1}^{-}=A_{0}P_{0}^{+}A_{0}^{T}+Q_{0}\end{align*}

So this is the best we can do at the initial time step k=0 before we obtain measurement information. We computed the a priori estimate \mathbf{x}_{1}^{-} and the covariance matrix P_{1}^{-} for the time step k=1, and before the measurement \mathbf{y}_{1} at the time step k=1 has arrived.

At the time instant k=1, the measurement \mathbf{y}_{1}, arrives. Then, the natural question is how to update the a priori estimate \hat{\mathbf{x}}_{1}^{-} and the covariance matrix P_{1}^{-}?

To update these quantities, we need to use the recursive least-squares method. We derived the equations of the recursive least-squares method in our previous post, which can be found here. Here are the recursive least-squares equations:

  1. Update the gain matrix K_{k}:

    (9)   \begin{align*}K_{k}=P_{k-1}C_{k}^{T}(R_{k}+C_{k}P_{k-1}C_{k}^{T})^{-1}\end{align*}

  2. Update the estimate:

    (10)   \begin{align*}\hat{\mathbf{x}}_{k}= \hat{\mathbf{x}}_{k-1}+K_{k}(\mathbf{y}_{k}-C_{k}\hat{\mathbf{x}}_{k-1}) \end{align*}

  3. Propagate the covariance matrix by using this equation:

    (11)   \begin{align*}P_{k}=(I-K_{k}C_{k})P_{k-1}(I-K_{k}C_{k})^{T}+K_{k}R_{k}K_{k}^{T}\end{align*}


    or this equation

    (12)   \begin{align*}P_{k}=(I-K_{k}C_{k})P_{k-1}\end{align*}


Now, the question is how to use these equations to derive the Kalman filter equations? First, we perform the following substitutions in the recursive least-squares equations (these substitutions will be explained later in the text):

(13)   \begin{align*}P_{k-1}\rightarrow P_{k}^{-}   \end{align*}

(14)   \begin{align*}P_{k}\rightarrow P_{k}^{+}   \end{align*}

(15)   \begin{align*}\hat{\mathbf{x}}_{k-1} \rightarrow \hat{\mathbf{x}}_{k}^{-}\end{align*}

(16)   \begin{align*}\hat{\mathbf{x}}_{k} \rightarrow \hat{\mathbf{x}}_{k}^{+}\end{align*}

Here is the main idea behind these substitutions. Let us first focus on the equation (13). In the case of the recursive least squares method, the covariance before the measurement arrives is P_{k-1}. In the case of the Kalman filter, P_{k-1} becomes P_{k}^{-}, since P_{k}^{-} is the covariance before the measurement arrived. Next, let us focus on (14). In the case of the recursive least squares method, the covariance after the measurement arrives is P_{k}. In the case of the Kalman filter, P_{k} becomes P_{k}^{+}, since P_{k}^{+} is the updated covariance after the measurement arrived. Next, let us focus on (15). In the case of the recursive least squares, \hat{\mathbf{x}}_{k-1} is the state estimate before the measurement \mathbf{y}_{k} arrived. Consequently, in the case of the Kalman filter, \hat{\mathbf{x}}_{k-1} becomes the a priori estimate \hat{\mathbf{x}}_{k}^{-}, since \hat{\mathbf{x}}_{k}^{-} does not take into account the measurement \mathbf{y}_{k}. Finally, let us focus on (16). In the case of the recursive least squares method, \hat{\mathbf{x}}_{k} is the state estimate that takes into account the measurement \mathbf{y}_{k}. Consequently, in the case of the Kalman filter, \hat{\mathbf{x}}_{k} becomes \hat{\mathbf{x}}_{k}^{+}, since \hat{\mathbf{x}}_{k}^{+} takes into account the measurement \mathbf{y}_{k} obtained at the time instant k.

After these substitutions, we obtain the update equations:

(17)   \begin{align*}K_{k}=P_{k}^{-}C_{k}^{T}(R_{k}+C_{k}P_{k}^{-}C_{k}^{T})^{-1} \\\end{align*}

(18)   \begin{align*}\hat{\mathbf{x}}_{k}^{+}= \hat{\mathbf{x}}_{k}^{-}+K_{k}(\mathbf{y}_{k}-C_{k}\hat{\mathbf{x}}_{k}^{-}) \end{align*}

(19)   \begin{align*}P_{k}^{+}&=(I-K_{k}C_{k})P_{k}^{-}(I-K_{k}C_{k})^{T}+K_{k}R_{k}K_{k}^{T} \\P_{k}^{+}& =(I-K_{k}C_{k})P_{k}^{-}\end{align*}

Now, let us go back at the time instant k=1. By using the newly established update equations, we update \mathbf{x}_{1}^{-} and P_{1}^{-}, as follows

(20)   \begin{align*}K_{1} & =P_{1}^{-}C_{1}^{T}(R_{1}+C_{1}P_{1}^{-}C_{0}^{T})^{-1}  \\\hat{\mathbf{x}}_{1}^{+} & = \hat{\mathbf{x}}_{1}^{-}+K_{1}(\mathbf{y}_{1}-C_{1}\hat{\mathbf{x}}_{1}^{-}) \\P_{1}^{+}&=(I-K_{1}C_{1})P_{1}^{-}(I-K_{1}C_{1})^{T}+K_{1}R_{1}K_{1}^{T} \\P_{1}^{+}& =(I-K_{1}C_{1})P_{1}^{-}\end{align*}

As the result, we obtained P_{1}^{+} and the a posteriori estimate \hat{\mathbf{x}}_{1}^{+}. Next, similarly to the initial step, we propagate \hat{\mathbf{x}}_{1}^{-} through the system dynamics (model of the system), and P_{1}^{+} through covariance propagation equation:

(21)   \begin{align*}\hat{\mathbf{x}}_{2}^{-}& =A_{1}\mathbf{x}_{1}^{+}+B_{1}\mathbf{u}_{1} \\P_{2}^{-}&=A_{1}P_{1}^{+}A_{1}^{T}+Q_{1}\end{align*}

to obtain \hat{\mathbf{x}}_{2}^{-} and P_{2}^{-}. After this step, we compute the a posteriori estimate and the covariance matrix of the estimation error by using Eqs. (17), (18), and (19) for k=2. We learned that the two main steps of the Kalman filter are:

  1. Before the measurement arrives, use the system dynamics to propagate the a posteriori estimate and covariance matrix of the estimation error from the step k-1, to obtain the a priori estimate and covariance matrix of the estimation error for the step k.
  2. At the time step k, use the recursive least squares equations, and the obtained measurement, to compute the a posteriori estimate and the covariance matrix of the estimation error.

This is illustrated in Fig. 2 (the same as Fig. 1 which is repeated here for completeness).

Figure 2: Graphical explanation of the Kalman filter (same as Fig. 1).

Let us now summarize the Kalman filter equations.

Summary of the derived Kalman filter equations

In the initial step, we select and initial estimate \hat{\mathbf{x}}_{0}^{+} and the covariance matrix of the estimation error P_{0}^{+}. Then, for k=1,2,3,\ldots, we perform steps 1 and 2 summarized below

Step 1: Propagate the a posteriori estimate \hat{\mathbf{x}}_{k-1}^{+} and the covariance matrix of the estimation error P_{k-1}^{+}, through the system dynamics to obtain the a priori state estimate and the covariance matrix of the estimation error for the step k:

(22)   \begin{align*}\hat{\mathbf{x}}_{k}^{-}& =A_{k-1}\mathbf{x}_{k-1}^{+}+B_{k-1}\mathbf{u}_{k-1} \\P_{k}^{-}& =A_{k-1}P_{k-1}^{+}A_{k-1}^{T}+Q_{k-1}\end{align*}

Step 2: Obtain the measurement \mathbf{y}_{k}. Use this measurement, and the a priori estimate \hat{\mathbf{x}}_{k}^{-} and the covariance matrix of the estimation error P_{k}^{-} to compute the a posteriori state estimate and the covariance matrix of the estimation error

(23)   \begin{align*}K_{k} & =P_{k}^{-}C_{k}^{T}(R_{k}+C_{k}P_{k}^{-}C_{k}^{T})^{-1} \\\hat{\mathbf{x}}_{k}^{+} & = \hat{\mathbf{x}}_{k}^{-}+K_{k}(\mathbf{y}_{k}-C_{k}\hat{\mathbf{x}}_{k}^{-})  \\P_{k}^{+}&=(I-K_{k}C_{k})P_{k}^{-}(I-K_{k}C_{k})^{T}+K_{k}R_{k}K_{k}^{T} \\P_{k}^{+}& =(I-K_{k}C_{k})P_{k}^{-} \end{align*}

Here are a few comments about the derived Kalman filter equations. The matrix K_{k} is called the Kalman filter gain matrix. Then, the equation for the covariance update

(24)   \begin{align*}P_{k}^{+}&=(I-K_{k}C_{k})P_{k}^{-}(I-K_{k}C_{k})^{T}+K_{k}R_{k}K_{k}^{T}\end{align*}

is more numerically stable and robust than the alternative equation

(25)   \begin{align*}P_{k}^{+}& =(I-K_{k}C_{k})P_{k}^{-} \end{align*}

Consequently, the equation (24) is preferable for computing the covariance update. Another important thing to mention is that the Kalman filter gain K_{k} does not depend on the measurements \mathbf{y}_{k}. This is because the calculations of P_{k}^{-}, P_{k}^{+}, and K_{k} only depend on the system matrices A_{k-1}, C_{k-1}, and Q_{k} and R_{k}. This means that the Kalman filter matrices K_{k}, for k=1,2,3,\ldots can be precomputed before the Kalman filter starts to operate online. This is important for saving computational time in embedded systems.