January 9, 2025

Extended Kalman Filter Tutorial With Example and Disciplined Python Codes – PART I -Derivation


In this tutorial, we derive the extended Kalman filter that is used for the state estimation of nonlinear systems. We furthermore develop a Python implementation of the Kalman filter and we test the extended Kalman filter by using an example of a nonlinear dynamical system. In this first part of the tutorial, we explain how to derive the extended Kalman filter.

The YouTube videos accompanying this tutorial are given below.

PART 1:

PART 2:

Before reading this tutorial, it is recommended to go over these tutorials on linear Kalman filtering:

  1. Introduction to Kalman Filter: Derivation of the Recursive Least Squares Method
  2. Introduction to Kalman Filter: Disciplined Python Implementation of Recursive Least Squares Method
  3. Time Propagation of State Vector Expectation and State Covariance Matrix of Linear Dynamical Systems – Intro to Kalman Filtering
  4. Kalman Filter Tutorial- Derivation of the Kalman Filter by Using the Recursive Least-Squares Method
  5. Disciplined Kalman Filter Implementation in Python by Using Object-Oriented Approach

General Information About Extended Kalman Filter

The extended Kalman filter is a generalization of the linear Kalman filter for nonlinear dynamical systems in the fairly general mathematical form given by the following state and output equations:

(1)   \begin{align*}\mathbf{x}_{k} & =\mathbf{f}_{k-1}\big(\mathbf{x}_{k-1}, \mathbf{u}_{k-1} \big)+\mathbf{d}_{k-1}, \\\mathbf{y}_{k} & =\mathbf{g}_{k}\big(\mathbf{x}_{k} \big) + \mathbf{q}_{k} \end{align*}

where

  • k is a discrete-time instant. That is, we look at the system behavior at the time instants 0, \Delta t, 2\Delta t, \ldots,  k \Delta t ,\ldots, where \Delta t is the sampling period of the system. That is, before applying the filter, we assume that the signals are sampled in time.
  • \mathbf{x}_{k}\in \mathbb{R}^{n} is the n-dimensional state vector at the discrete-time instant k.
  • \mathbf{u}_{k}\in \mathbb{R}^{m} is the m-dimensional input vector at the discrete-time instant k.
  • \mathbf{d}_{k}\in \mathbb{R}^{n} is the n_{1}-dimensional disturbance vector affecting the state and dynamics of the system. This vector is also called the process noise vector. We assume that the disturbance vector is white, zero-mean, uncorrelated, with the covariance matrix given by Q_{k}.
  • \mathbf{y}_{k} \in \mathbb{R}^{r} is the r-dimensional output vector. This vector is also known as the observation vector.
  • \mathbf{q}_{k} \in \mathbb{R}^{r} is the r-dimensional measurement noise vector. We assume that the measurement noise vector is white, zero-mean, uncorrelated, with the covariance matrix given by R_{k}.
  • \mathbf{f}_{k-1}() is the nonlinear vector function that is allowed to vary from one time instant to another. This functions mathematically models the dynamics of the system.
  • \mathbf{g}_{k}() is the nonlinear vector function that is allowed to vary from one time instant to another. This function mathematically models the relationship between the observed vector and the state of the system.

Before we start with the development of the extended Kalman filter, we need to explain the following notation. In Kalman filtering, we have two important state estimates: the a priori state estimate and the a posteriori state estimate. Apart from this post and Kalman filtering, a priori and a posteriori are Latin phrases whose meaning is explained here.

The a priori state estimate of the state vector \mathbf{x}_{k} is obtained implicitly on the basis of the output vector measurements \mathbf{y}_{1},\mathbf{y}_{2},\ldots, \mathbf{y}_{k-1} (as well as on some other information that will be explained later). That is, the a priori state estimate is obtained on the basis of the past measurements up to the current discrete-time instant k and NOT on the basis of the measurement \mathbf{y}_{k} at the current discrete-time instant k. The a priori state estimate is denoted by

(2)   \begin{align*}\hat{\mathbf{x}}_{k}^{-} \;\; -\;\; \text{a priori state estimate}\end{align*}

where “the hat notation” denotes an estimate, and where the minus superscript denotes the a priori state estimate. The minus superscript originates from the fact that this estimate is obtained before we process the measurement \mathbf{y}_{k} at the time instant k.

The a posteriori estimate of the state \mathbf{x}_{k} is obtained implicitly on the basis of the measurements \mathbf{y}_{1},\mathbf{y}_{2},\ldots, \mathbf{y}_{k-1},\mathbf{y}_{k}. The a posteriori estimate is denoted by

(3)   \begin{align*}\hat{\mathbf{x}}_{k}^{+} \;\; -\;\; \text{a posteriori state estimate}\end{align*}

where the plus superscript in the state estimate notation denotes the fact that the a posteriori estimate is obtained by processing the measurement \mathbf{y}_{k} obtained at the time instant k.

Another concept that is important for understanding and implementing the extended Kalman filter is the concept of the covariance matrices of the estimation error. The a priori covariance matrix of the state estimation error is defined by

(4)   \begin{align*}P_{k}^{-}=E[(\mathbf{x}_{k}-\hat{\mathbf{x}}_{k}^{-})(\mathbf{x}_{k}-\hat{\mathbf{x}}_{k}^{-})^{T}]\end{align*}

On the other hand, the a posteriori covariance matrix of the state estimation error is defined by

(5)   \begin{align*}P_{k}^{+}=E[(\mathbf{x}_{k}-\hat{\mathbf{x}}_{k}^{+})(\mathbf{x}_{k}-\hat{\mathbf{x}}_{k}^{+})^{T}]\end{align*}

Derivation of the Extended Kalman Filter

We start from the nonlinear state equation that is rewritten over here for clarity

(6)   \begin{align*}\mathbf{x}_{k} & =\mathbf{f}_{k-1}\big(\mathbf{x}_{k-1}, \mathbf{u}_{k-1} \big)+\mathbf{d}_{k-1}\end{align*}

Let us assume that at the discrete-time instant k-1 the a posteriori estimate \hat{\mathbf{x}}_{k-1}^{+} is available. Also, let us assume that the value of \mathbf{u}_{k-1}=\mathbf{u}_{k-1}^{*} is also known at this time instant. The first main idea of the extended Kalman filter is to linearize the state equation around \hat{\mathbf{x}}_{k-1}^{+}. So, let us do that. By linearizing the state equation, we obtain

(7)   \begin{align*}\mathbf{x}_{k}& \approx \mathbf{f}_{k-1}\big(\hat{\mathbf{x}}_{k-1}^{+}, \mathbf{u}_{k-1}^{*} \big)+A_{k-1}\big(\mathbf{x}_{k-1} -\hat{\mathbf{x}}_{k-1}^{+}\big)+\mathbf{d}_{k-1} \\\mathbf{x}_{k}& \approx A_{k-1}\mathbf{x}_{k-1}+\mathbf{f}_{k-1}\big(\hat{\mathbf{x}}_{k-1}^{+}, \mathbf{u}_{k-1}^{*} \big)-A_{k-1}\hat{\mathbf{x}}_{k-1}^{+}+\mathbf{d}_{k-1} \\\mathbf{x}_{k}& \approx A_{k-1}\mathbf{x}_{k-1} + \mathbf{s}_{k-1}+\mathbf{d}_{k-1}\end{align*}

where

  • The matrix A_{k-1} is obtained by linearizing the nonlinear dynamics around \hat{\mathbf{x}}_{k-1}^{+} and \mathbf{u}_{k-1}^{*}

(8)   \begin{align*}A_{k-1}=\frac{\partial \mathbf{f}_{k-1}\big(\mathbf{x}_{k-1}, \mathbf{u}_{k-1} \big) }{\partial \mathbf{x}_{k-1} }\Big|_{\mathbf{x}_{k-1}= \hat{\mathbf{x}}_{k-1}^{+},\mathbf{u}_{k-1} =\mathbf{u}_{k-1}^{*}} \end{align*}

where \mathbf{u}_{k-1}^{*} is the known value of the input at the time instant k-1. This matrix is the Jacobian matrix of the function \mathbf{f}_{k-1}\big(\mathbf{x}_{k-1}, \mathbf{u}_{k-1} \big) evaluated at \hat{\mathbf{x}}_{k-1}^{+} and \mathbf{u}_{k-1}^{*}. The Jacobian matrix is defined by

(9)   \begin{align*}\frac{\partial \mathbf{f}_{k-1}\big(\mathbf{x}_{k-1}, \mathbf{u}_{k-1} \big) }{\partial \mathbf{x}_{k-1} }=\begin{bmatrix} \frac{\partial  f_{1,k-1} }{\partial x_{1,k-1}}  & \frac{\partial  f_{1,k-1} }{\partial x_{2,k-1}} & \ldots & \frac{\partial  f_{1,k-1} }{\partial x_{n,k-1}} \\ \frac{\partial  f_{2,k-1} }{\partial x_{1,k-1}}  & \frac{\partial  f_{2,k-1} }{\partial x_{2,k-1}} & \ldots & \frac{\partial  f_{2,k-1} }{\partial x_{n,k-1}}  \\ \vdots & \vdots & \ldots & \vdots \\  \frac{\partial  f_{n,k-1} }{\partial x_{1,k-1}}  & \frac{\partial  f_{n,k-1} }{\partial x_{2,k-1}} & \ldots & \frac{\partial  f_{n,k-1} }{\partial x_{n,k-1}}  \end{bmatrix}\end{align*}

where

(10)   \begin{align*}\mathbf{f}_{k-1}\big(\mathbf{x}_{k-1}, \mathbf{u}_{k-1} \big)=\begin{bmatrix} f_{1,k-1} \\ f_{2,k-1} \\ \vdots \\ f_{n,k-1}  \end{bmatrix},\;\; \mathbf{x}_{k-1}=\begin{bmatrix} x_{1,k-1} \\ x_{2,k-1} \\ \vdots \\ x_{n,k-1}  \end{bmatrix}\end{align*}

That is, the (i,j)th entry of the Jacobian matrix is the partial derivative of the ith entry of the vector function \mathbf{f}_{k-1}\big(\mathbf{x}_{k-1}, \mathbf{u}_{k-1} \big) with respect to the jth entry of the state vector \mathbf{x}_{k-1}.

  • The vector \mathbf{s}_{k-1} contains known quantities at the time instant k-1 and is defined as

(11)   \begin{align*}\mathbf{s}_{k-1} = \mathbf{f}_{k-1}\big(\hat{\mathbf{x}}_{k-1}^{+}, \mathbf{u}_{k-1}^{*} \big)-A_{k-1}\hat{\mathbf{x}}_{k-1}^{+}\end{align*}

The next step is to linearize the output equation. As it will be explained later, at the time instant k-1 we can propagate the state estimate \hat{\mathbf{x}}_{k-1}^{+} through the nonlinear dynamics to compute \hat{\mathbf{x}}_{k}^{-}. We do that before the output measurement at the time instant k arrives. The second main idea of the Kalman filter is to linearize the output equation around the known value of the a priori state estimate \hat{\mathbf{x}}_{k}^{-}. Let us do that. First, for clarity, here is our output equation again:

(12)   \begin{align*}\mathbf{y}_{k} & =\mathbf{g}_{k}\big(\mathbf{x}_{k} \big) + \mathbf{q}_{k}\end{align*}

The linearization procedure produces:

(13)   \begin{align*}\mathbf{y}_{k} & \approx \mathbf{g}_{k}\big(\hat{\mathbf{x}}_{k}^{-} \big)+ C_{k}\big( \mathbf{x}_{k}-\hat{\mathbf{x}}_{k}^{-}\big)+ \mathbf{q}_{k} \\\mathbf{y}_{k} & \approx C_{k} \mathbf{x}_{k}+ \mathbf{g}_{k}\big(\hat{\mathbf{x}}_{k}^{-} \big)- C_{k}\hat{\mathbf{x}}_{k}^{-}+ \mathbf{q}_{k} \\\mathbf{y}_{k} & \approx C_{k} \mathbf{x}_{k}+\mathbf{b}_{k}+\mathbf{q}_{k}\end{align*}

where

  • The matrix C_{k} is obtained by linearizing the nonlinear output equation around \hat{\mathbf{x}}_{k}^{-}

(14)   \begin{align*}C_{k}=\frac{\partial \mathbf{g}_{k}\big(\mathbf{x}_{k} \big) }{\partial \mathbf{x}_{k} }\Big|_{\mathbf{x}_{k}= \hat{\mathbf{x}}_{k}^{-}} \end{align*}

This matrix is the Jacobian matrix of the function \mathbf{g}_{k}\big(\mathbf{x}_{k} \big) evaluated at \hat{\mathbf{x}}_{k}^{-}. The Jacobian matrix is defined by

(15)   \begin{align*}\frac{\partial \mathbf{g}_{k}\big(\mathbf{x}_{k} \big) }{\partial \mathbf{x}_{k} }=\begin{bmatrix} \frac{\partial  g_{1,k} }{\partial x_{1,k}}  & \frac{\partial  g_{1,k} }{\partial x_{2,k}} & \ldots & \frac{\partial  g_{1,k} }{\partial x_{n,k}} \\ \frac{\partial  g_{2,k} }{\partial x_{1,k}}  & \frac{\partial  g_{2,k} }{\partial x_{2,k}} & \ldots & \frac{\partial  g_{2,k} }{\partial x_{n,k}}  \\ \vdots & \vdots & \ldots & \vdots \\  \frac{\partial  g_{n,k} }{\partial x_{1,k}}  & \frac{\partial  g_{n,k} }{\partial x_{2,k}} & \ldots & \frac{\partial  g_{n,k} }{\partial x_{n,k}}  \end{bmatrix}\end{align*}

where

(16)   \begin{align*}\mathbf{g}_{k}\big(\mathbf{x}_{k} \big)=\begin{bmatrix} g_{1,k} \\ g_{2,k} \\ \vdots \\ g_{n,k}  \end{bmatrix},\;\; \mathbf{x}_{k}=\begin{bmatrix} x_{1,k} \\ x_{2,k} \\ \vdots \\ x_{n,k}  \end{bmatrix}\end{align*}

That is, the (i,j)th entry of the Jacobian matrix is the partial derivative of the ith entry of the vector function \mathbf{g}_{k}\big(\mathbf{x}_{k} \big) with respect to the jth entry of the state vector \mathbf{x}_{k}.

The vector \mathbf{b}_{k} is defined as follows

(17)   \begin{align*}\mathbf{b}_{k} = \mathbf{g}_{k}\big(\hat{\mathbf{x}}_{k}^{-} \big)- C_{k}\hat{\mathbf{x}}_{k}^{-}\end{align*}

It is important to emphasize that this vector is completely known.

Let us now summarize the linearized state and output equations:

(18)   \begin{align*}\textbf{Linearized state equation:}\;\;  \mathbf{x}_{k}& \approx A_{k-1}\mathbf{x}_{k-1} + \mathbf{s}_{k-1}+\mathbf{d}_{k-1} \\\textbf{Linearized output equation:}\;\;  \mathbf{y}_{k} & \approx C_{k} \mathbf{x}_{k}+\mathbf{b}_{k}+\mathbf{q}_{k}\end{align*}

The third main idea of the extended Kalman filter is to apply the linear Kalman filter to these equations. The only divergence from the linear Kalman filter is to use the nonlinear state equation to propagate the a posteriori estimates in time. We derived the linear Kalman filter in our previous tutorial which can be found here. Consequently, we will use the derived equations and apply them to our case.

EXTENDED KALMAN FILTER:

At the initial discrete-time step k=0 we select an initial a posteriori estimate \hat{\mathbf{x}}_{0}^{+} and an initial a posteriori covariance matrix of the estimation error P_{0}^{+}. Then, for k=1,2,\ldots we recursively perform the following steps:

STEP 1 (after the time step k-1 and before the time step k): At this time step we know the a posteriori estimate \hat{\mathbf{x}}_{k-1}^{+} and the a posteriori covariance matrix P_{k-1}^{+} from the time step k-1. First, we compute the linearized matrix A_{k-1}:

(19)   \begin{align*}A_{k-1}=\frac{\partial \mathbf{f}_{k-1}\big(\mathbf{x}_{k-1}, \mathbf{u}_{k-1} \big) }{\partial \mathbf{x}_{k-1} }\Big|_{\mathbf{x}_{k-1}= \hat{\mathbf{x}}_{k-1}^{+},\mathbf{u}_{k-1} =\mathbf{u}_{k-1}^{*}} \end{align*}

Then, we compute the a priori estimate \hat{\mathbf{x}}_{k}^{-} and the a priori covariance matrix P_{k}^{-} for the upcoming time step k by propagating \hat{\mathbf{x}}_{k-1}^{+} and P_{k-1}^{+} through the nonlinear dynamics and the covariance equation

(20)   \begin{align*}\hat{\mathbf{x}}_{k}^{-}=\mathbf{f}_{k-1}\big(\hat{\mathbf{x}}_{k-1}^{+}, \mathbf{u}_{k-1}= \mathbf{u}_{k-1}^{*} \big) \\P_{k}^{-}=A_{k-1}P_{k-1}^{+}A_{k-1}^{T}+Q_{k-1}\end{align*}

At the end of this step, compute the Jacobian matrix of the output equation

(21)   \begin{align*}C_{k}=\frac{\partial \mathbf{g}_{k}\big(\mathbf{x}_{k} \big) }{\partial \mathbf{x}_{k} }\Big|_{\mathbf{x}_{k}= \hat{\mathbf{x}}_{k}^{-}} \end{align*}

STEP 2 (immediately after the time step k): At the time step k, the measurement \mathbf{y}_{k} arrives. Immediately after the time step k, we compute the Kalman gain matrix

(22)   \begin{align*}K_{k}=P_{k}^{-}C_{k}^{T}\big(C_{k}P_{k}^{-}C_{k}^{T}+R_{k} \big)^{-1}\end{align*}

Then, compute the a posteriori state estimate:

(23)   \begin{align*}\hat{\mathbf{x}}_{k}^{+} & =\hat{\mathbf{x}}_{k}^{-}+K_{k}\big(\mathbf{y}_{k}-C_{k}\hat{\mathbf{x}}_{k}^{-}-\mathbf{b}_{k} \big) \\\hat{\mathbf{x}}_{k}^{+} & =\hat{\mathbf{x}}_{k}^{-}+K_{k}\big(\mathbf{y}_{k}-C_{k}\hat{\mathbf{x}}_{k}^{-}- \mathbf{g}_{k}\big(\hat{\mathbf{x}}_{k}^{-} \big)+ C_{k}\hat{\mathbf{x}}_{k}^{-}  \big) \\\hat{\mathbf{x}}_{k}^{+} & =\hat{\mathbf{x}}_{k}^{-}+K_{k}\big(\mathbf{y}_{k}- \mathbf{g}_{k}\big(\hat{\mathbf{x}}_{k}^{-} \big) \big) \end{align*}

where the third form of the above-stated equation is the most appropriate for computations. Finally, compute the a posteriori covariance matrix by using either this equation

(24)   \begin{align*}P_{k}^{+}=\big(I-K_{k}C_{k} \big)P_{k}^{-}\end{align*}

or

(25)   \begin{align*}P_{k}^{+}=\big(I-K_{k}C_{k} \big)P_{k}^{-}\big(I-K_{k}C_{k} \big)^{T}+K_{k}R_{k}K_{k}^{T}\end{align*}

Due to numerical stability, the equation (25) is a more preferable option to be used for covariance matrix time propagation.

The above-summarized two steps are graphically illustrated in the figure below.