January 28, 2025

Clear and Concise Particle Filter Tutorial with Python Implementation- Part 2: Derivation of Particle Filter Algorithm From Scratch


This is the second part of the tutorial series on particle filters. In this tutorial part, we derive the particle filter algorithm from scratch. Here are all the tutorials in this series (click on the link to access the tutorial webpage):

PART 1: In Part 1, we explain all the statistics and dynamical system concepts necessary for understanding particle filters. Also in Part 1, we present the problem formulation for state estimation by using particle filters.

PART 2: You are currently reading Part 2. In Part 2 we derive the particle filter algorithm from scratch.

PART 3: In part 3, we explain how to implement the particle filter algorithm in Python from scratch.

In this tutorial part, first, we briefly revise the big picture of particle filters. Then, we briefly revise the main results from probability and statistics that a student needs to know in order to understand the derivation of particle filters. Then, we derive the particle filter algorithm. Finally, we concisely summarize the particle filter algorithm. Before reading this tutorial part, it is suggested to go over the first tutorial part.

The YouTube tutorial accompanying this webpage is given below.

Big Picture of Particle Filters – Approximation of Posterior Probability Density Function of State Estimate

As explained in the first tutorial part, for presentation clarity and not to blur the main ideas of particle filters with too many control theory details, we develop a particle filter for the linear state-space model given by the following two equations:

(1)   \begin{align*}\mathbf{x}_{k}& =A\mathbf{x}_{k-1}+B\mathbf{u}_{k-1}+\mathbf{w}_{k-1} \\\mathbf{y}_{k}& =C\mathbf{x}_{k}+\mathbf{v}_{k}\end{align*}

where \mathbf{x}_{k}\in \mathbb{R}^{n} is the state vector at the discrete time step k, \mathbf{u}_{k-1}\in \mathbb{R}^{m_{1}} is the control input vector at the time step k-1, \mathbf{w}_{k-1}\in \mathbb{R}^{m_{2}} is the process disturbance vector (process noise vector) at the time step k-1, \mathbf{y}_{k}\in \mathbb{R}^{r} is the observed output vector at the time step k, \mathbf{v}_{k}\in \mathbb{R}^{r} is the measurement noise vector at the discrete time step k, and A, B, and C are system matrices.

In the first tutorial part, we explained that the main goal of particle filters is to recursively approximate the state posterior probability density function, denoted by

(2)   \begin{align*}p(\mathbf{x}_{k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1})\end{align*}

where \mathbf{y}_{0:k} and \mathbf{u}_{0:k-1} denote the sets of inputs and outputs

(3)   \begin{align*}\mathbf{y}_{0:k} & =\{\mathbf{y}_{0},\mathbf{y}_{1},\mathbf{y}_{2},\ldots,\mathbf{y}_{k} \} \\\mathbf{u}_{0:k-1} & =\{\mathbf{u}_{0},\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{k-1} \} \\\end{align*}

In the particle filter and robotics literature, the posterior density is also called the filtering density (filtering distribution) or the marginal density (marginal distribution). Some researchers call the density p(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}) as the posterior density (this density will be formally introduced later on in this tutorial), and they refer to the density p(\mathbf{x}_{k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}) as the filtering density or as the marginal density. However, we will not follow this naming convention. We refer to p(\mathbf{x}_{k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}) as the posterior density, and we refer to the density p(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}) as the smoothing density. This is mainly because the density p(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}) takes the complete state sequence into account, and this fact means that estimates are smoothed.

At the time instant k, the particle filter computes the set of particles

(4)   \begin{align*}\{(\mathbf{x}_{k}^{(i)},w^{(i)}_{k}) | i=1,2,3,\ldots, N\}\end{align*}

A particle with the index of i consists of the tuple (\mathbf{x}_{k}^{(i)},w^{(i)}_{k}), where \mathbf{x}_{k}^{(i)} is the state sample, and w^{(i)}_{k} is the importance weight. The set of particles approximates the posterior distribution as follows

(5)   \begin{align*}p(\mathbf{x}_{k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}) \approx \sum_{i=1}^{N}w^{(i)}_{k} \delta(\mathbf{x}_{k}- \mathbf{x}_{k}^{(i)})\end{align*}

where \delta(\cdot) is the Dirac delta function that is shifted and centered at \mathbf{x}_{k}^{(i)}.

The particle filter design problem is to design an update rule (update functions) for particles that will update the particles from the time step k-1 to the time step k:

(6)   \begin{align*}& w^{(i)}_{k-1} \rightarrow w^{(i)}_{k} \\& \mathbf{x}_{k-1}^{(i)} \rightarrow \mathbf{x}_{k}^{(i)} \\& i=1,2,3,\ldots, N\end{align*}

where the update rule should incorporate the observed output \mathbf{y}_{k}, control input \mathbf{u}_{k-1}, state-space model (1), state transition probability, and measurement probability (see the first tutorial part for the precise definitions of these probabilities).

Once we have the approximation of the posterior distribution (5), we can approximate the mean and variance of the estimate, as well as any statistics, such as higher-order moments or confidence intervals.

Basic Results from Statistics and Probability Necessary to Know For Deriving the Particle Filter

To develop the particle filter algorithm, it is important to know the basic rules of probability and the Bayes’ theorem. Here, we briefly revise the main results from the probability theory you need to know in order to understand the derivation of the particle filter algorithm. Let p(\mathbf{z}_{1}|\mathbf{z}_{2}) be an arbitrary conditional probability density function, and let \mathbf{z}_{1}, and \mathbf{z}_{2} be two vectors. Then, we have

(7)   \begin{align*}p(\mathbf{z}_{1}|\mathbf{z}_{2})=\frac{p(\mathbf{z}_{1},\mathbf{z}_{2})}{p(\mathbf{z}_{2})}  \end{align*}

or

(8)   \begin{align*}p(\mathbf{z}_{1},\mathbf{z}_{2})=p(\mathbf{z}_{1}|\mathbf{z}_{2})p(\mathbf{z}_{2}) \end{align*}

On the other hand, we also have

(9)   \begin{align*}p(\mathbf{z}_{1},\mathbf{z}_{2})=p(\mathbf{z}_{2}|\mathbf{z}_{1})p(\mathbf{z}_{1})\end{align*}

By combining (8) and (9), we have

(10)   \begin{align*}p(\mathbf{z}_{1}|\mathbf{z}_{2})p(\mathbf{z}_{2})=p(\mathbf{z}_{2}|\mathbf{z}_{1})p(\mathbf{z}_{1})\end{align*}

From the last equation, we obtain

(11)   \begin{align*}p(\mathbf{z}_{1}|\mathbf{z}_{2})=\frac{p(\mathbf{z}_{2}|\mathbf{z}_{1})p(\mathbf{z}_{1})}{p(\mathbf{z}_{2})}\end{align*}

The equation represents the mathematical formulation of Bayes’ theorem or Bayes’ rule. Although it looks simple, this is one of the most important and fundamental results in statistics and probability.

In the context of the Bayesian estimation, we want to infer statistics or an estimate of \mathbf{z}_{1} from the observation \mathbf{z}_{2}. The density p(\mathbf{z}_{1}|\mathbf{z}_{2}) is called the posterior and we want to compute this density. The density p(\mathbf{z}_{2}|\mathbf{z}_{1}) is called the likelihood. The density p(\mathbf{z}_{1}) is called the prior density, and the density p(\mathbf{z}_{2}) is called the marginal density. Usually, the marginal density is not important and it serves as a normalizing factor. Consequently, the Bayes’ rule can be written as

(12)   \begin{align*}p(\mathbf{z}_{1}|\mathbf{z}_{2})=\gamma p(\mathbf{z}_{2}|\mathbf{z}_{1})p(\mathbf{z}_{1})\end{align*}

where \gamma is the normalizing constant. The Bayes rule can be also expanded to the cases of three or more vectors. For example, if we would have three vectors \mathbf{z}_{1}, \mathbf{z}_{2}, and \mathbf{z}_{3}, the Bayes’ rule takes this form

(13)   \begin{align*}p(\mathbf{z}_{1}|\mathbf{z}_{2},\mathbf{z}_{3})=\frac{p(\mathbf{z}_{2}|\mathbf{z}_{1},\mathbf{z}_{3})p(\mathbf{z}_{1}|\mathbf{z}_{3})}{p(\mathbf{z}_{2}|\mathbf{z}_{3})}\end{align*}

or in the normalized form

(14)   \begin{align*}p(\mathbf{z}_{1}|\mathbf{z}_{2},\mathbf{z}_{3})=\gamma p(\mathbf{z}_{2}|\mathbf{z}_{1},\mathbf{z}_{3})p(\mathbf{z}_{1}|\mathbf{z}_{3})\end{align*}

Derivation of Particle Filter Algorithm

After revising the basics of the Bayes’ rule, let us now go back to the derivation of the particle filter. We start from the probability density function that takes into account all the states starting from \mathbf{x}_{0} and ending at \mathbf{x}_{k}:

(15)   \begin{align*}p(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1})\end{align*}

where \mathbf{x}_{0:k}, \mathbf{y}_{0:k}, and \mathbf{u}_{0:k-1} are sets defined as follows

(16)   \begin{align*}\mathbf{x}_{0:k}=\{ \mathbf{x}_{0},\mathbf{x}_{1},\ldots, \mathbf{x}_{k} \} \\\mathbf{y}_{0:k}=\{ \mathbf{y}_{0},\mathbf{y}_{1},\ldots, \mathbf{y}_{k} \} \\\mathbf{u}_{0:k-1}=\{ \mathbf{u}_{0},\mathbf{u}_{1},\ldots, \mathbf{u}_{k-1} \}\end{align*}

This density takes into account the complete state-sequence \mathbf{x}_{0},\mathbf{x}_{1},\mathbf{x}_{2},\ldots, \mathbf{x}_{k}. That is, the complete state sequence is \mathbf{x}_{0},\mathbf{x}_{1},\mathbf{x}_{2},\ldots, \mathbf{x}_{k} is seen as a random “variable” or “augmented random vector” that we want to estimate. By knowing the mathematical form of this density, we can easily derive the posterior distribution (5). The density (15) is often called the smoothing posterior density. The particle filter is derived by first considering the expectation involving the smoothing density

(17)   \begin{align*}E[\boldsymbol{\beta}(\mathbf{x}_{0:k})|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}] = \int_{\mathbf{x}_{0:k}\in \tilde{S}} \boldsymbol{\beta}(\mathbf{x}_{0:k}) p(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1})\textrm{d}\mathbf{x}_{0:k} \end{align*}

where \boldsymbol{\beta}(\mathbf{x}_{0:k}) is a nonlinear function of the state sequence, E[\cdot] is the expectation operator, and \tilde{S} is the state support (domain) of the smoothed posterior density.

This equation is the expectation of the (vector) function of the state sequence \mathbf{x}_{0:k} computed under the assumption that the state sequence is distributed according to the posterior smoothed density p(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}). Note here that \mathbf{x}_{0:k} is defined as a set in (16). However, we are also using \mathbf{x}_{0:k} as a variable. This is a slight abuse of notation that is necessary in order to ease the notation complexity in this tutorial and to make the tutorial as clear as possible. You can think of \mathbf{x}_{0:k} as a vector that is formed by stacking (augmenting) the states \mathbf{x}_{0},\mathbf{x}_{1},\mathbf{x}_{2},\ldots, \mathbf{x}_{k} on top of each other.

The equation (17) is the most important equation for us. The main purpose of the estimation process is to compute an approximation of the smoothed posterior density that will enable us to compute this integral. Namely, if we would know the smoothed posterior density, then we would be able at least in theory (keep in mind that solving an expectation integral can sometimes be a very difficult task) to approximate any statistics of the smoothed posterior state estimate. For example, if we want to compute the mean of the estimated state sequence \mathbf{x}_{0:k} , then \boldsymbol{\beta}(\mathbf{x}_{0:k})=\mathbf{x}_{0:k}.

However, we do not know the smoothed posterior density p(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}), and we cannot compute or approximate this expectation integral.

Here comes the first remedy for this problem (we will need several remedies). The remedy is to use the importance sampling method. In our previous tutorial, given here, we explained the basics of the importance sampling method. The main idea of the importance sampling is to select a new density or a function denoted by q(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}) (called as the proposal density), and to approximate the integral (17) as follows

(18)   \begin{align*}E[\boldsymbol{\beta}(\mathbf{x}_{0:k})|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}] \approx \sum_{i=1}^{N}\tilde{w}_{k}^{(i)}  \boldsymbol{\beta}(\mathbf{x}_{0:k}^{(i)}) \end{align*}

where \mathbf{x}_{0:k}^{(i)} is a sample of the state sequence \mathbf{x}_{0:k} sampled from the smoothed proposal (importance) density q(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}), and the weight is defined by

(19)   \begin{align*}\tilde{w}_{k}^{(i)}=\frac{p(\mathbf{x}_{0:k}^{(i)}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1})}{q(\mathbf{x}_{0:k}^{(i)}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1})} \end{align*}

the general form of the weight is

(20)   \begin{align*}\tilde{w}_{k}=\frac{p(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1})}{q(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1})}\end{align*}

What did we achieve with this strategy? First of all, notice that we did not use the unknown smoothed posterior distribution to generate the state sequence samples \mathbf{x}_{0:k}^{(i)}. We generated the state sequence samples \mathbf{x}_{0:k}^{(i)} from the smoothed proposal distribution with the density of q(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}). We can generate these samples since we have complete freedom to select the proposal density. So one part of the issue is resolved.

However, the unknown smoothed posterior is used to compute the weights \tilde{w}_{k}^{(i)}. If we can somehow compute the weights without explicitly using the smoothed posterior distribution p(\mathbf{x}_{0:k}^{(i)}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}), then, we can approximate the original expectation without using the smoothed posterior distribution. From this fact, we conclude that the key to the computation of the expectation is to compute the weight \tilde{w}_{k}^{(i)} without explicit knowledge of the smoothed posterior density. In the sequel, we show that this is indeed possible.

To summarize: the main idea of the particle filter derivation is to derive the recursive equation for the weight \tilde{w}_{k}. From the recursive equation for the weight \tilde{w}_{k} of the smoothed posterior density, we will be able to derive the recursive equation for the importance weight w_{k} of the original posterior density p(\mathbf{x}_{k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}).

Let us now start with the derivation of the recursive relation for \tilde{w}_{k}. We can formally write the smoothing density as follows

(21)   \begin{align*}p(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1})=p(\mathbf{x}_{0:k}|\mathbf{y}_{k},\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-1})\end{align*}

Then, by using Bayes’ rule, we have

(22)   \begin{align*}p(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}) & =p(\mathbf{x}_{0:k}|\mathbf{y}_{k},\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-1}) \\& = \gamma \cdot p(\mathbf{y}_{k}|\mathbf{x}_{0:k},\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-1})p(\mathbf{x}_{0:k}|\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-1})\end{align*}

where \gamma is the normalization factor explained previously. Next, since the state-space model is a Markov state-space model, we have

(23)   \begin{align*}p(\mathbf{y}_{k}|\mathbf{x}_{0:k},\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-1})=p(\mathbf{y}_{k}|\mathbf{x}_{k})\end{align*}

This is the direct consequence of the form of the output equation of the state-space model (1). The distribution of the output only depends on the state and measurement noise at the time step k. The density p(\mathbf{y}_{k}|\mathbf{x}_{k}) is the measurement probability density function explained in the first tutorial part.

By substituting (23) in (22), we obtain

(24)   \begin{align*}p(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}) & = \gamma \cdot p(\mathbf{y}_{k}|\mathbf{x}_{k})p(\mathbf{x}_{0:k}|\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-1})\end{align*}

Next, we can write

(25)   \begin{align*}p(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}) & = \gamma \cdot p(\mathbf{y}_{k}|\mathbf{x}_{k})p(\mathbf{x}_{k},\mathbf{x}_{0:k-1}|\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-1})\end{align*}

Next, we can use (8) to write the density p(\mathbf{x}_{k},\mathbf{x}_{0:k-1}|\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-1}) as follows

(26)   \begin{align*}p(\mathbf{x}_{k},\mathbf{x}_{0:k-1}|\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-1})=p(\mathbf{x}_{k}|\mathbf{x}_{0:k-1},\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-1})p(\mathbf{x}_{0:k-1}|\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-1})\end{align*}

Next, since the state-space model is Markov, the probability density p(\mathbf{x}_{k}|\mathbf{x}_{0:k-1},\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-1}) is

(27)   \begin{align*}p(\mathbf{x}_{k}|\mathbf{x}_{0:k-1},\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-1})=p(\mathbf{x}_{k}|\mathbf{x}_{k-1},\mathbf{u}_{k-1})\end{align*}

This is because of the structure of the state equation of (1). The probability density function of the state at the time instant k depends only on the state and input at the time step k-1. The density p(\mathbf{x}_{k}|\mathbf{x}_{k-1},\mathbf{u}_{k-1}) in (27) is the state transition density of the state-space model introduced in the first tutorial part.

On the other hand, the density p(\mathbf{x}_{0:k-1}|\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-1}) should not depend on the input \mathbf{u}_{k-1}. This is because from the state equation of (1) it follows that the state at the time instant k-1 depends on \mathbf{u}_{k-2} and not on \mathbf{u}_{k-1}. That is, we have

(28)   \begin{align*}p(\mathbf{x}_{0:k-1}|\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-1})=p(\mathbf{x}_{0:k-1}|\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-2})\end{align*}

By substituting (27) and (28) in (26), we obtain

(29)   \begin{align*}p(\mathbf{x}_{k},\mathbf{x}_{0:k-1}|\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-1})=p(\mathbf{x}_{k}|\mathbf{x}_{k-1},\mathbf{u}_{k-1})p(\mathbf{x}_{0:k-1}|\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-2})\end{align*}

By substituting (29) in (25), we obtain

(30)   \begin{align*}p(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}) & = \gamma \cdot p(\mathbf{y}_{k}|\mathbf{x}_{k}) p(\mathbf{x}_{k}|\mathbf{x}_{k-1},\mathbf{u}_{k-1})p(\mathbf{x}_{0:k-1}|\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-2})\end{align*}

The expression (30) is important since it establishes a recursive relation for the probability density p(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}). Namely, we can observe that the density p(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}), depends on its previous value at the time step k-1, measurement density, and the state transition density. This recursive relation is very important for deriving the recursive expression for the importance weight w_{k}.

Next, we need to derive a recursive equation for the smoothed proposal density (also known as smoothed importance density). Again, we start from the general form of the proposal density that takes into account the complete state sequence \mathbf{x}_{0},\mathbf{x}_{1},\mathbf{x}_{2},\ldots, \mathbf{x}_{k}:

(31)   \begin{align*}q(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1})\end{align*}

This density can be written as

(32)   \begin{align*}q(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}) = q(\mathbf{x}_{k},\mathbf{x}_{0:k-1}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}) \end{align*}

Next, by using (8), we have

(33)   \begin{align*}q(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}) &  = q(\mathbf{x}_{k},\mathbf{x}_{0:k-1}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1})  \\& =q(\mathbf{x}_{k}| \mathbf{x}_{0:k-1}, \mathbf{y}_{0:k},\mathbf{u}_{0:k-1})q(\mathbf{x}_{0:k-1}| \mathbf{y}_{0:k},\mathbf{u}_{0:k-1})\end{align*}

Since the proposal density is not strictly related to our dynamical system, the Markov property does not hold by default. However, we can always make an assumption that similar to the Markov property. Consequently, we assume

(34)   \begin{align*}q(\mathbf{x}_{k}| \mathbf{x}_{0:k-1}, \mathbf{y}_{0:k},\mathbf{u}_{0:k-1})=q(\mathbf{x}_{k}| \mathbf{x}_{k-1}, \mathbf{u}_{k-1})\end{align*}

Also, the conditional density q(\mathbf{x}_{0:k-1}| \mathbf{y}_{0:k},\mathbf{u}_{0:k-1}) does not depend on the future output \mathbf{y}_{k}. Also, this density does not depend on the input \mathbf{u}_{k-1}. This is because from the state equation of the model (1) it follows that the last state in the sequence \mathbf{x}_{0:k-1}, that is the state \mathbf{x}_{k-1}, does not depend on \mathbf{u}_{k-1}. Instead, the state \mathbf{x}_{k-1} depends on \mathbf{u}_{k-2}. Consequently, we have that the conditional density has the following form

(35)   \begin{align*}q(\mathbf{x}_{0:k-1}| \mathbf{y}_{0:k},\mathbf{u}_{0:k-1})=q(\mathbf{x}_{0:k-1}| \mathbf{y}_{0:k-1},\mathbf{u}_{0:k-2}) \end{align*}

By substituting (34) and (35) in (33), we obtain the final equation for the batch importance density:

(36)   \begin{align*}q(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1})=q(\mathbf{x}_{k}| \mathbf{x}_{k-1}, \mathbf{u}_{k-1})q(\mathbf{x}_{0:k-1}| \mathbf{y}_{0:k-1},\mathbf{u}_{0:k-2})\end{align*}

We see that the smoothed proposal density q(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}) depends on its value at the time step k-1 and q(\mathbf{x}_{k}| \mathbf{x}_{k-1}, \mathbf{u}_{k-1}). This important conclusion and derivation enables us to develop a recursive relation for the weight \tilde{w}_{k}, and consequently, for the weight w_{k}.

Now, we can substitute the recursive relations given by (30) and (36) into the equation for the importance weight (20) of the smoothed posterior density

(37)   \begin{align*}\tilde{w}_{k} & =\frac{p(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1})}{q(\mathbf{x}_{0:k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1})} \\& = \frac{\gamma \cdot p(\mathbf{y}_{k}|\mathbf{x}_{k}) p(\mathbf{x}_{k}|\mathbf{x}_{k-1},\mathbf{u}_{k-1})p(\mathbf{x}_{0:k-1}|\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-2})}{q(\mathbf{x}_{k}| \mathbf{x}_{k-1}, \mathbf{u}_{k-1})q(\mathbf{x}_{0:k-1}| \mathbf{y}_{0:k-1},\mathbf{u}_{0:k-2})} \\& = \frac{\gamma \cdot p(\mathbf{y}_{k}|\mathbf{x}_{k}) p(\mathbf{x}_{k}|\mathbf{x}_{k-1},\mathbf{u}_{k-1})}{q(\mathbf{x}_{k}| \mathbf{x}_{k-1}, \mathbf{u}_{k-1})}\cdot \frac{p(\mathbf{x}_{0:k-1}|\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-2})}{q(\mathbf{x}_{0:k-1}| \mathbf{y}_{0:k-1},\mathbf{u}_{0:k-2})}\end{align*}

The last term in the last equation is actually the value of the weight of the smoothed posterior density at the time step k-1:

(38)   \begin{align*}\tilde{w}_{k-1} = \frac{p(\mathbf{x}_{0:k-1}|\mathbf{y}_{0:k-1},\mathbf{u}_{0:k-2})}{q(\mathbf{x}_{0:k-1}| \mathbf{y}_{0:k-1},\mathbf{u}_{0:k-2})}\end{align*}

Consequently, we can write (37) as follows

(39)   \begin{align*}\tilde{w}_{k} & = \frac{\gamma \cdot p(\mathbf{y}_{k}|\mathbf{x}_{k}) p(\mathbf{x}_{k}|\mathbf{x}_{k-1},\mathbf{u}_{k-1})}{q(\mathbf{x}_{k}| \mathbf{x}_{k-1}, \mathbf{u}_{k-1})}\cdot \tilde{w}_{k-1} \end{align*}

Here, it is important to observe the following:

We can compute the weight \tilde{w}_{k} without knowing the full state trajectory \mathbf{x}_{0:k-1} or the previous input trajectory \mathbf{u}_{0:k-2}. The information that is necessary to compute the weight is stored in \mathbf{y}_{k}, \mathbf{x}_{k-1}, \mathbf{u}_{k-1}, and \tilde{w}_{k-1}. Due to this, from (39), it follows that the importance weight w_{k} of the posterior density p(\mathbf{x}_{k}|\mathbf{y}_{0:k},\mathbf{u}_{0:k-1}) satisfies the following equation


(40)   \begin{align*}w_{k} \propto  \frac{ p(\mathbf{y}_{k}|\mathbf{x}_{k}) p(\mathbf{x}_{k}|\mathbf{x}_{k-1},\mathbf{u}_{k-1})}{q(\mathbf{x}_{k}| \mathbf{x}_{k-1}, \mathbf{u}_{k-1})}\cdot w_{k-1} \end{align*}

where \propto is the proportionality symbol that is often used in probability and statistics to denote that one distribution or a quantity is proportional to another one. Here, we used the proportionality symbol instead of equality. This is done since we need to introduce a normalization factor that will ensure that all the weights w_{k}^{(i)}, i=1,2,3,\ldots, N sum up to 1. In practice, we first compute the right-hand side (40)

(41)   \begin{align*}\frac{ p(\mathbf{y}_{k}|\mathbf{x}_{k}^{(i)}) p(\mathbf{x}_{k}^{(i)}|\mathbf{x}_{k-1}^{(i)},\mathbf{u}_{k-1})}{q(\mathbf{x}_{k}^{(i)}| \mathbf{x}_{k-1}^{(i)}, \mathbf{u}_{k-1})}\cdot w_{k-1}^{(i)}, \;\; i=1,2,3,\ldots, N \end{align*}

and we normalize these computed quantities to obtain w_{k}^{(i)}. This will be explained in more detail in the sequel.

Now that we know how the weights should be computed in the recursive manner, we need to address the following problem.

We can observe that in order to compute the weight (40), we need to know the density (function) q(\mathbf{x}_{k}| \mathbf{x}_{k-1}, \mathbf{u}_{k-1}). How to select this function?

The idea is to select this function such that

  1. We can easily draw samples from the distribution corresponding to this density.
  2. The expression for the weights (40) simplifies.

One of the simplest possible particle filters, called the bootstrap particle filter or the Sequential Importance Resampling (SIR) filter selects this function as the state transition density

(42)   \begin{align*}q(\mathbf{x}_{k}| \mathbf{x}_{k-1}, \mathbf{u}_{k-1}) = p(\mathbf{x}_{k}|\mathbf{x}_{k-1},\mathbf{u}_{k-1})\end{align*}

By substituting (42) in (40), we obtain

(43)   \begin{align*}w_{k} \propto  p(\mathbf{y}_{k}|\mathbf{x}_{k}) \cdot w_{k-1} \end{align*}

This implies that for the sample \mathbf{x}_{k}^{(i)}, the weight w_{k}^{(i)} is computed as follows

(44)   \begin{align*}w_{k}^{(i)} \propto  p(\mathbf{y}_{k}|\mathbf{x}_{k}^{(i)}) \cdot w_{k-1}^{(i)} \end{align*}

That is, the weight is computed by using the previous value of the weight and the value of the measurement density at the sample \mathbf{x}_{k}^{(i)} (also, the weight should be normalized). Here, it should be kept in mind that samples \mathbf{x}_{k}^{(i)} should be sampled from the distribution with the density q(\mathbf{x}_{k}| \mathbf{x}_{k-1}, \mathbf{u}_{k-1}) = p(\mathbf{x}_{k}|\mathbf{x}_{k-1},\mathbf{u}_{k-1}). That is, the samples are selected from the state transition probability (state transition distribution). The computed weights are used to approximate the posterior distribution given by the equation (5).

Now, we are ready to formulate the particle filter algorithm.

Particle Filter Algorithm

First, we state the algorithm, and then, we provide detailed comments about every step of the algorithm.

STATMENT OF THE SIR (BOOTSTRAP) PARTICLE FILTER ALGORITHM: For the initial set of particles

(45)   \begin{align*}\{(\mathbf{x}_{0}^{(i)},w_{0}^{(i)})| i=1,2,3,\ldots, N \}\end{align*}

Perform the following steps for k=1,2,3,4,\ldots

  • STEP 1: Using the state samples \mathbf{x}_{k-1}^{(i)}, i=1,2,\ldots, N, computed at the previous step k-1, at the time step k generate N state samples \mathbf{x}_{k}^{(i)}, i=1,2,\ldots, N, from the state transition probability with the density of p(\mathbf{x}_{k}|\mathbf{x}_{k-1}^{(i)},\mathbf{u}_{k-1}). That is,

    (46)   \begin{align*}\mathbf{x}_{k}^{(i)} \sim p(\mathbf{x}_{k}|\mathbf{x}_{k-1}^{(i)},\mathbf{u}_{k-1})\end{align*}

  • STEP 2: By using the observed output \mathbf{y}_{k}, and the weight w_{k-1}^{(i)} computed at the previous time step k-1, calculate the intermediate weight update, denoted by \hat{w}_{k}^{(i)}, by using the density function of the measurement probability

    (47)   \begin{align*}\hat{w}_{k}^{(i)}=p(\mathbf{y}_{k}|\mathbf{x}_{k}^{(i)}) \cdot w_{k-1}^{(i)}, \;\; i=1,2,3,\ldots, N \end{align*}



    After all the intermediate weights \hat{w}_{k}^{(i)}, i=1,2,3,\ldots, N are computed, compute the normalized weights w_{k}^{(i)} as follows

    (48)   \begin{align*}w_{k}^{(i)}=\frac{1}{\bar{w}} \hat{w}_{k}^{(i)}, \\\bar{w}=\sum_{i=1}^{N} \hat{w}_{k}^{(i)}\end{align*}



    After STEP 1 and STEP 2, we computed the set of particles for the time step k

    (49)   \begin{align*}\{(\mathbf{x}_{k}^{(i)},w_{k}^{(i)})| i=1,2,3,\ldots, N \}\end{align*}


  • STEP 3 (RESAMPLING): This is a resampling step that is performed only if the condition given below is satisfied. Calculate the constant N_{eff}

    (50)   \begin{align*}N_{eff}=\frac{1}{\sum_{i=1}^{N} (w_{k}^{(i)})^{2}}\end{align*}



    If N_{eff}<N/3, then generate a new set of N particles from the computed set of particles

    (51)   \begin{align*}\{(\mathbf{x}_{k}^{(i)},w_{k}^{(i)})| i=1,2,3,\ldots, N \}^{\text{NEW}}=\text{RESAMPLE}\Big({\{(\mathbf{x}_{k}^{(i)},w_{k}^{(i)})| i=1,2,3,\ldots, N \}}^{\text{OLD}}\Big)\end{align*}


    by using a resampling method (see the comments below). In (51), the “OLD” set of particles is the set of particles computed after STEP 1 and STEP 2, that is, the set given in (49).

Algorithm Explanation: We initialize the algorithm with an initial set of particles. This set of particles can be obtained by generating initial state samples from some distribution that covers the expected range of initial states and computing the weights. For example, we can assume a uniform distribution and we can assign equal weights to all the points generated from a uniform distribution.

In step 1 of the algorithm, for the given value of the state sample \mathbf{x}_{k-1}^{(i)} that is computed at the previous time k-1 step of the algorithm, and for the given input value \mathbf{u}_{k-1}, we generate a state sample \mathbf{x}_{k}^{(i)} from the state transition distribution. As explained in our previous tutorial, this step consists of generating a sample of a process noise vector in (1), and by computing \mathbf{x}_{k}^{(i)} by using the state equation of the model (1).

In step 2 of the algorithm, we first compute intermediate (not-normalized) weights \hat{w}_{k}^{(i)}. To do that, we use the weights w_{k-1}^{(i)} from the previous time step k-1 of the algorithm, and the value of the measurement density that is computed for the current output measurement \mathbf{y}_{k} and the state sample \mathbf{x}_{k}^{(i)} computed at step 1 of the algorithm. After we compute all the intermediate weights, we simply normalize them to compute the weights w_{k}^{(i)}.

Step 3 of the algorithm is performed only if the condition stated above is satisfied. If the condition is satisfied, we need to resample our particles. Here is the reason for resampling. It often happens that after several iterations of the particle filter algorithm, we will have one or only a few particles with non-negligible weights. This means that most of the weights of particles will have a very small value. Effectively, this means that the weights of these particles are equal to zero. This is the so-called degeneracy problem in particle filtering. Since the time propagation of every particle consumes computational resources, we waste computational resources by propagating particles with very small weights. One of the most effective approaches to deal with the degeneracy problem is to resample particles. This means that we generate a new set of N particles from the original set of particles. We will thoroughly explain two resampling approaches in the third part of this tutorial. One of the approaches for selecting particles from the original particle set (called “OLD” in the algorithm) is based on selecting N state samples with replacements from the original set. The probability of selecting the sample x_{k}^{(i)} is proportional to its weight w_{k}^{(i)} in the “OLD” particle set. That is, we draw samples from the probability whose density is actually an approximation of the posterior density (5) at the time step k. After, we draw samples, we set their weights to 1/N. That is, we reset the weights. Since this type of resampling can have multiple instances of the same state sample in the new set, when we add their weights, we can actually obtain the final weight of the given state sample for the approximation of the posterior distribution. However, this is only done after all the time steps (time iterations) of the particle filter are completed, and when we finally want to extract an approximation of the posterior distribution. More about this in the third tutorial part.