November 24, 2024

Correct and Intuitive Explanation of Observability of Linear Dynamical Systems


In this control engineering and control theory tutorial, we address the following problems and questions related to observability of linear dynamical systems:

  1. What is observability of linear dynamical systems?
  2. How to properly and intuitively understand the concept of observability of dynamical systems?
  3. How to test the observability of dynamical systems?
  4. What is the connection between observability and conditions for the unique solution of systems of linear equations?

This tutorial is split into two parts. In the first part, we derive and thoroughly explain the theoretical conditions, and in the second part, we explain how to test observability of a system in Python.

Here is the motivation for creating the observability tutorial. The observability of dynamical systems is one of the most fundamental concepts in control engineering that is often not properly explained in control engineering classes. For example, it often happens that students/engineers immediately start designing observers/Kalman filters/particle filters without even testing observability. When the filter is not working, they start to scratch their heads and waste a significant amount of time checking the written code and trying to find a bug. In fact, you cannot uniquely estimate the system state if the system is not observable. This means that you cannot design an estimator for an unobservable system. Or you can, but only for the observable part of the system state (there is something called a triangular decomposition that reveals the observable subspace).

Another pitfall is that people often blindly memorize the observability rank condition without understanding the physical meaning of this condition and without truly understanding the observability concept. In fact, observability is nothing more than a fancy name invented by control engineers in the ’60s for the existence and uniqueness of the solution of a linear system of equations that appears when we “lift” the system dynamics over time. In fact, most of linear control theory is nothing more than applied linear algebra.

Observability analysis is also important for sensor placement problems, in which we are trying to maximize our estimation performance by placing the sensors at the proper locations in the system. In the tutorial presented below, we thoroughly and clearly explain the concept of observability and how to test observability by testing the existence and uniqueness of the solution of a linear system of equations.

The YouTube tutorials accompanying this webpage tutorial are given below.

Before we start with explanations, let us motivate the problem with the following example shown in the figure below.

This is a system composed of two masses connected by springs and dampers. This is a lumped model of a number of physical systems that exhibit oscillatory behavior. The state variables are

  • x_{1}=s_{1} – position of the first mass.
  • x_{2}=\dot{s}_{1} – velocity of the first mass.
  • x_{3}=s_{2} – position of the second mass.
  • x_{4}=\dot{s}_{2} – velocity of the second mass.

The state vector is

(1)   \begin{align*}\mathbf{x}=\begin{bmatrix}x_{1}  \\ x_{2} \\ x_{3}  \\ x_{4}  \end{bmatrix}\end{align*}

For more details about state-space modeling of the double mass spring damper system, see this tutorial.

Sensors are usually expensive and in addition, they complicate the mechatronics design of the system. We would like to minimize the number of sensors, while at the same time, achieving the control objective. On the other hand, ideally, we would like to observe all the state variables such that we can use them in a state feedback controller. The observability analysis of this system can provide us with the answers to the following and similar questions:

  • Can we reconstruct the complete state vector by only observing the position state variable x_{1}? That is, can we compute the velocity of the first mass, as well as the position and velocity of the second mass by only observing the position of the first mass?
  • If not, then, how many entries of the state vector do we need to directly observe in order to reconstruct the complete state vector?
  • How many data samples of the observed variable do we need to have to completely reconstruct the state?

The easiest approach for understanding observability is to start from a discrete-time state-space model given by the following equation

(2)   \begin{align*}\mathbf{x}_{k+1}=A\mathbf{x}_{k} \\\mathbf{y}_{k}=C\mathbf{x}_{k}\end{align*}

where A\in \mathbb{R}^{n\times n} and C\in \mathbb{R}^{r\times n} are the state and output system matrices, \mathbf{x}_{k}\in \mathbb{R}^{n} is the state, \mathbf{y}_{k}\in \mathbb{R}^{r} is the output, and k is a discrete-time instant.

In a practical scenario, the vector \mathbf{y}_{k} is a vector of quantities that are observed by sensors, and the state is either a physical state (for example involving mechanical quantities) or the state is linearly related to the physical state through some matrix which is usually known as a similarity transformation.

The first question is: Why is it easier to understand the observability concept by considering discrete-time systems?

This is because it is easier to propagate the dynamics and equations of discrete-time systems over time. Namely, we can establish recursive relationships connecting different quantities at different time steps. We just perform back-substitution to relate quantities at different time intervals. This will become completely clear by the end of this tutorial. Here, we will briefly illustrate this. For example, by starting from the time instant 0 and by recursively substituting states by using the state-space model (2), we obtain

(3)   \begin{align*}\mathbf{x}_{1}& =A\mathbf{x}_{0} \\\mathbf{x}_{2}& =A\mathbf{x}_{1}=A^{2}\mathbf{x}_{0}  \\\mathbf{x}_{3}& =A\mathbf{x}_{2}=A^{3}\mathbf{x}_{0} \\\vdots \\\mathbf{x}_{l}& =A\mathbf{x}_{l-1}=A^{l}\mathbf{x}_{0}\end{align*}

where l is a positive integer. For linear continuous-time systems, it is more challenging to establish such relationships since we need to use integrals and differential calculus. For example, a continuous-time equivalent to the last equation in (3) is

(4)   \begin{align*}\mathbf{x}(t)=e^{A_{c}t}\mathbf{x}(0)\end{align*}

where A_{c} is the continuous-time system matrix of the linear system \dot{\mathbf{x}}=A_{c}\mathbf{x}, and \mathbf{x}(0) is the initial state. The derivation of the equation (4) involves relatively complex linear algebra and differential calculus.
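The recursion in (3) is easy to check numerically. The following is a minimal sketch using NumPy; the 2 by 2 matrix A, the initial state x0, and the helper name `propagate` are arbitrary choices made only for illustration and do not come from the system discussed in this tutorial.

```python
import numpy as np

# Hypothetical 2x2 system matrix and initial state (illustrative values only)
A = np.array([[0.9, 0.1],
              [-0.2, 0.8]])
x0 = np.array([1.0, -1.0])

def propagate(A, x0, l):
    """Apply x_{k+1} = A x_k recursively l times, starting from x0."""
    x = x0.copy()
    for _ in range(l):
        x = A @ x
    return x

l = 5
x_recursive = propagate(A, x0, l)                   # step-by-step substitution
x_closed_form = np.linalg.matrix_power(A, l) @ x0   # A^l x_0, last line of (3)

print(np.allclose(x_recursive, x_closed_form))
```

Both computations give the same state, which is exactly what the back-substitution in (3) states.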

The second question is: Is there a reason why we are not taking into account inputs when analyzing observability?

The answer is that known control inputs can easily be included as known quantities when analyzing observability. We can subtract their effect from the observed outputs, and the resulting quantity can be used to derive the observability condition, which is independent of the inputs. That is, the inputs do not affect the observability condition, and when analyzing the system observability, the inputs can be completely neglected.
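The argument about inputs can be illustrated with a short numerical sketch. Here we assume, purely for illustration, a model with a known input of the form x_{k+1} = A x_k + B u_k, y_k = C x_k; the matrix B, the input sequence u, and all numerical values are hypothetical and are not taken from this tutorial.

```python
import numpy as np

# Hypothetical system with a known input (all values are illustrative)
A = np.array([[0.9, 0.1],
              [-0.2, 0.8]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])

rng = np.random.default_rng(0)
x0 = np.array([1.0, -1.0])
u = rng.standard_normal((6, 1))      # known input sequence u_0, ..., u_5

x = x0.copy()                        # true state, driven by x0 and the input
x_forced = np.zeros(2)               # response to the input alone (zero initial state)
y_corrected = []
for k in range(6):
    y_k = C @ x                               # measured output
    y_corrected.append(y_k - C @ x_forced)    # subtract the known input contribution
    x = A @ x + B @ u[k]
    x_forced = A @ x_forced + B @ u[k]

# The corrected outputs equal C A^k x_0, the outputs of the input-free model (2)
y_free = [C @ np.linalg.matrix_power(A, k) @ x0 for k in range(6)]
print(np.allclose(y_corrected, y_free))
```

After the known input contribution is removed, the remaining data depend only on A, C, and the initial state, which is why the input-free model is sufficient for the observability analysis.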

Let us continue with the explanation of observability. Here is a formal definition of observability that can be found in many control theory books:

Definition of Observability: The state-space model

(5)   \begin{align*}\mathbf{x}_{k+1}=A\mathbf{x}_{k} \\\mathbf{y}_{k}=C\mathbf{x}_{k}\end{align*}

is observable if any initial state of the system, denoted by \mathbf{x}_{0}, can be uniquely determined from the set of output observations \{\mathbf{y}_{0},\mathbf{y}_{1}, \ldots, \mathbf{y}_{s} \}, where s is a positive integer.

From this definition, we can conclude that the observability problem is closely related to the problem of estimating the state of the system from a sequence of output measurements. Secondly, we can observe that in the definition of observability, there is a parameter s. This parameter s is the length of the observation horizon. Its length generally depends on the number of observed outputs of the system, the structure of the system matrices A and C, and on the state dimension. There is a common misconception among control engineers that s always needs to be at least equal to n-1 (n is the state dimension of the system) to make the system observable. To show that a shorter horizon can be sufficient, let us consider the following example.

Example 1: Is the following system observable

(6)   \begin{align*}\mathbf{x}_{k+1}& =A \mathbf{x}_{k}  \\\underbrace{\begin{bmatrix} \mathbf{y}_{1,k} \\ \mathbf{y}_{2,k} \end{bmatrix}}_{\mathbf{y}_{k}}& =\underbrace{\begin{bmatrix}k_{1} & 0 \\ 0 & k_{2}   \end{bmatrix}}_{C} \underbrace{\begin{bmatrix}  \mathbf{x}_{1,k} \\ \mathbf{x}_{2,k}  \end{bmatrix}}_{ \mathbf{x}_{k}} \end{align*}

where k_{1} and k_{2} are non-zero constants. Here, we directly observe scaled state variables since the matrix C is a 2 by 2 diagonal matrix. In addition, due to the fact that the constants k_{1} and k_{2} are non-zero, the matrix C is invertible. From (6), we have

(7)   \begin{align*}\mathbf{y}_{0}=C\mathbf{x}_{0}\end{align*}

This is a system of two equations with two unknowns (the entries of the initial state vector \mathbf{x}_{0}). Since the matrix C is invertible, there is a unique solution given by the following equation

(8)   \begin{align*}\mathbf{x}_{0}=C^{-1}\mathbf{y}_{0}\end{align*}

This example shows that we can actually uniquely estimate the initial state of the system by using only a single initial measurement of the system output. This is the direct consequence of the fact that we are directly observing all (linearly scaled) state variables.
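Example 1 can be reproduced in a few lines of Python. This is a minimal sketch: the gain values k1 and k2 and the "true" initial state are hypothetical numbers chosen only to generate a measurement.

```python
import numpy as np

k1, k2 = 2.0, -0.5                   # hypothetical non-zero sensor gains
C = np.diag([k1, k2])                # output matrix from (6)
x0_true = np.array([3.0, 4.0])       # "unknown" initial state, used only to generate y0

y0 = C @ x0_true                     # single output measurement, equation (7)
x0_est = np.linalg.solve(C, y0)      # equation (8), solved without forming C^{-1} explicitly

print(np.allclose(x0_est, x0_true))
```

A single output sample is enough here, and the matrix A is never used in the reconstruction.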

Let us now form the so-called lifted state-space description. The description is called “lifted” since we lift the state-space model over time to obtain a batch vector consisting of output samples. From (2) we have

(9)   \begin{align*}\mathbf{y}_{0}= & C\mathbf{x}_{0} \\\mathbf{y}_{1}=& C\mathbf{x}_{1}=CA\mathbf{x}_{0} \\\mathbf{y}_{2}=& C\mathbf{x}_{2}=CA\mathbf{x}_{1}= CA^{2}\mathbf{x}_{0} \\\vdots \\\mathbf{y}_{s-1}=& C\mathbf{x}_{s-1}=CA\mathbf{x}_{s-2}=\ldots =  CA^{s-1}\mathbf{x}_{0} \end{align*}

These equations are obtained by back substitution of the state equation of (2). Namely, from the state equation, we have

(10)   \begin{align*}\mathbf{x}_{1}& =A\mathbf{x}_{0}  \\\mathbf{x}_{2}& =A\mathbf{x}_{1}= A^{2}\mathbf{x}_{0}\\\vdots \\\mathbf{x}_{s-1}& =A\mathbf{x}_{s-2}=A^{2}\mathbf{x}_{s-3}=\ldots = A^{s-1}\mathbf{x}_{0}\end{align*}

The equation (9) can be written in the vector form

(11)   \begin{align*}\begin{bmatrix} \mathbf{y}_{0} \\ \mathbf{y}_{1}  \\ \mathbf{y}_{2} \\ \vdots \\ \mathbf{y}_{s-1} \end{bmatrix}=\begin{bmatrix} C \\ CA \\ CA^{2} \\ \vdots \\ CA^{s-1} \end{bmatrix}\mathbf{x}_{0}\end{align*}

Or more compactly

(12)   \begin{align*}\mathbf{y}_{0,s-1}=O_{s-1}\mathbf{x}_{0}\end{align*}

where

(13)   \begin{align*}\mathbf{y}_{0,s-1}=\begin{bmatrix}  \mathbf{y}_{0} \\ \mathbf{y}_{1}  \\ \mathbf{y}_{2} \\ \vdots \\ \mathbf{y}_{s-1} \end{bmatrix}, \;\; O_{s-1}=\begin{bmatrix}  C \\ CA \\ CA^{2} \\ \vdots \\ CA^{s-1}  \end{bmatrix}\end{align*}

Here we need to explain the following

  • The matrix O_{s-1}\in \mathbb{R}^{s\cdot r\times n} is the s-1 step observability matrix.
  • The vector \mathbf{y}_{0,s-1} is the s-1 step lifted output vector.

The equation (12) represents a system of equations whose unknowns are the entries of the vector \mathbf{x}_{0}. From the derivations presented above, we can observe that the observability problem is equivalent to the problem of the existence and uniqueness of the solution of the system of equations (12).
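The lifted description (11)-(13) can be assembled numerically as follows. This is a sketch under stated assumptions: the function name `observability_matrix` and the example matrices are our own illustrative choices.

```python
import numpy as np

def observability_matrix(A, C, s):
    """Stack C, CA, ..., CA^{s-1} into an (s*r) x n matrix, as in (13)."""
    blocks = []
    block = C.copy()
    for _ in range(s):
        blocks.append(block)
        block = block @ A            # next block row: C A^{i+1}
    return np.vstack(blocks)

# Hypothetical system with n = 2 states and r = 1 output
A = np.array([[0.9, 0.1],
              [-0.2, 0.8]])
C = np.array([[1.0, 0.0]])
x0 = np.array([1.0, -1.0])
s = 3

# Simulate the model (2) and collect the outputs y_0, ..., y_{s-1}
x = x0.copy()
outputs = []
for _ in range(s):
    outputs.append(C @ x)
    x = A @ x
y_lifted = np.concatenate(outputs)   # lifted output vector from (13)

O = observability_matrix(A, C, s)
print(O.shape)                           # (3, 2), that is, (s*r, n)
print(np.allclose(y_lifted, O @ x0))     # checks equation (12)
```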

We need to determine the conditions that will guarantee that this system of equations will have a unique solution.

In order to find the correct value for s, let us analyze the dimensions of the system (12). This system has to have at least as many equations as unknowns. We have in total s\cdot r equations and n unknowns, and we can allow for more equations than unknowns. This means that the necessary condition is s\cdot r \ge n. In the case of Single-Input-Single-Output (SISO) systems, we have r=1, and consequently s \ge n. In the case of Multiple-Input-Multiple-Output (MIMO) systems, depending on r, we can allow s< n, as long as the underlying linear system has at least as many equations as unknowns. However, since we want to derive a general condition that is applicable to both SISO and MIMO systems, we will take s \ge n. But do we really need to select s such that s>n? The answer is NO. The Cayley-Hamilton theorem tells us that every square matrix satisfies its own characteristic equation. This means that we can express any power A^{i}, where i\ge n, as a linear combination of the powers I, A, \ldots, A^{n-1}. This implies that any block row of the observability matrix after the block row CA^{n-1} can be expressed as a linear combination of the previous block rows, so adding such rows cannot increase the rank of the observability matrix.
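The effect of the Cayley-Hamilton theorem on the observability matrix can be checked numerically. In the sketch below, the 3-state single-output system and the helper `observability_matrix` are hypothetical and serve only as an illustration; the rank of the stacked matrix stops growing once s reaches n.

```python
import numpy as np

def observability_matrix(A, C, s):
    """Stack C, CA, ..., CA^{s-1} into an (s*r) x n matrix."""
    blocks = []
    block = C.copy()
    for _ in range(s):
        blocks.append(block)
        block = block @ A
    return np.vstack(blocks)

# Hypothetical system with n = 3 states and r = 1 output
A = np.array([[0.5, 1.0, 0.0],
              [0.0, 0.5, 1.0],
              [0.2, 0.0, 0.3]])
C = np.array([[1.0, 0.0, 0.0]])
n = A.shape[0]

# Cayley-Hamilton: A^n + c_1 A^{n-1} + ... + c_n I = 0,
# where [1, c_1, ..., c_n] are the characteristic polynomial coefficients
coeffs = np.poly(A)
residual = sum(c * np.linalg.matrix_power(A, n - i) for i, c in enumerate(coeffs))
print(np.allclose(residual, 0.0))

# Rank of the observability matrix for horizons s = 1, ..., 2n
ranks = [int(np.linalg.matrix_rank(observability_matrix(A, C, s)))
         for s in range(1, 2 * n + 1)]
print(ranks)   # [1, 2, 3, 3, 3, 3]
```

For this example, the rank reaches n = 3 at s = n and does not change when more block rows are added.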

Taking this analysis into account, we select s=n. Then, our system has the following form

(14)   \begin{align*}\mathbf{y}_{0,n-1}=O_{n-1}\mathbf{x}_{0}\end{align*}

where

(15)   \begin{align*}\mathbf{y}_{0,n-1}=\begin{bmatrix}  \mathbf{y}_{0} \\ \mathbf{y}_{1}  \\ \mathbf{y}_{2} \\ \vdots \\ \mathbf{y}_{n-1} \end{bmatrix}, \;\; O_{n-1}=\begin{bmatrix}  C \\ CA \\ CA^{2} \\ \vdots \\ CA^{n-1}  \end{bmatrix}\end{align*}

Now, we have to address the following question:

What is the condition that the matrix O_{n-1} needs to satisfy such that this system has a unique solution?

The following theorem answers our question.

Observability Condition Theorem: The system of equations (14) has a unique solution if and only if

(16)   \begin{align*}\text{rank}\Big( O_{n-1} \Big)= n\end{align*}

That is, the system is observable if and only if the condition (16) is satisfied.

In words, if the matrix O_{n-1} has full column rank, then the system is observable. And the other way around, if the system is observable, then the matrix O_{n-1} has full column rank. Here, we will only prove that the rank condition (16) guarantees the observability of the system.
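The rank condition (16) translates directly into a numerical test. The following is a minimal sketch; the function names and the two-state example (a position and velocity model with two alternative sensors) are hypothetical and chosen only to show one observable and one unobservable case.

```python
import numpy as np

def observability_matrix(A, C, s):
    """Stack C, CA, ..., CA^{s-1} into an (s*r) x n matrix."""
    blocks = []
    block = C.copy()
    for _ in range(s):
        blocks.append(block)
        block = block @ A
    return np.vstack(blocks)

def is_observable(A, C):
    """Check the rank condition (16): rank(O_{n-1}) == n."""
    n = A.shape[0]
    O = observability_matrix(A, C, n)
    return bool(np.linalg.matrix_rank(O) == n)

# Hypothetical model: state = [position, velocity], position += velocity each step
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
C_position = np.array([[1.0, 0.0]])   # sensor measures position only
C_velocity = np.array([[0.0, 1.0]])   # sensor measures velocity only

print(is_observable(A, C_position))   # True
print(is_observable(A, C_velocity))   # False
```

Measuring only the velocity leaves the initial position undetermined, which shows up as a rank-deficient observability matrix. Note that `np.linalg.matrix_rank` uses a numerical tolerance, so for poorly scaled systems the result should be interpreted with care.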

Let us assume that the condition (16) is satisfied. Then, by multiplying the equation (14) from the left by O^{T}_{n-1} we have

(17)   \begin{align*}O_{n-1}^{T}\mathbf{y}_{0,n-1}=O_{n-1}^{T}O_{n-1}\mathbf{x}_{0}\end{align*}

Since the condition (16) is satisfied, the columns of O_{n-1} are linearly independent, so the n by n matrix O_{n-1}^{T}O_{n-1} is invertible. Consequently, the system of equations (17) has a unique solution, and the solution is given by the following equation

(18)   \begin{align*}\mathbf{x}_{0} =\Big(O_{n-1}^{T}O_{n-1} \Big)^{-1} O_{n-1}^{T}\mathbf{y}_{0,n-1}\end{align*}

Here, one more important thing should be mentioned. Consider this least-squares minimization problem

(19)   \begin{align*}\min_{\mathbf{x}_{0}}\left\|\mathbf{y}_{0,n-1}-O_{n-1}\mathbf{x}_{0}  \right\|_{2}^{2}\end{align*}

Under the condition that the matrix O_{n-1} has a full column rank, the solution is given by the following equation:

(20)   \begin{align*}\mathbf{x}_{0} =\Big(O_{n-1}^{T}O_{n-1} \Big)^{-1} O_{n-1}^{T}\mathbf{y}_{0,n-1}\end{align*}

That is precisely the equation (18). From this, we can observe that there is a close relationship between observability, least-squares solution, and state estimation.
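The link between (18) and the least-squares problem (19) can be illustrated numerically. The sketch below uses a hypothetical 3-state system with two outputs and slightly noisy measurements; the system matrices, the noise level, and the helper `observability_matrix` are assumptions made only for this illustration.

```python
import numpy as np

def observability_matrix(A, C, s):
    """Stack C, CA, ..., CA^{s-1} into an (s*r) x n matrix."""
    blocks = []
    block = C.copy()
    for _ in range(s):
        blocks.append(block)
        block = block @ A
    return np.vstack(blocks)

# Hypothetical system with n = 3 states and r = 2 outputs
A = np.array([[0.5, 1.0, 0.0],
              [0.0, 0.5, 1.0],
              [0.2, 0.0, 0.3]])
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
n = A.shape[0]
x0_true = np.array([1.0, -2.0, 0.5])

O = observability_matrix(A, C, n)            # O_{n-1}, here a 6 x 3 matrix
rng = np.random.default_rng(1)
y = O @ x0_true + 1e-3 * rng.standard_normal(O.shape[0])   # noisy lifted output

# Equation (18): solve the normal equations O^T O x0 = O^T y
x0_normal = np.linalg.solve(O.T @ O, O.T @ y)

# Problem (19): least-squares solver applied directly to O x0 = y
x0_lstsq, *_ = np.linalg.lstsq(O, y, rcond=None)

print(np.allclose(x0_normal, x0_lstsq))
print(np.abs(x0_lstsq - x0_true).max())      # small reconstruction error
```

Both approaches return the same estimate. In practice, a least-squares solver is usually preferred over forming O^T O explicitly, since forming the product squares the condition number of the problem.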