8. Attitude Determination, Control, and Sensing

8.7 Determination

authored by Dr. Zhu with major contributions from Dr. Manchester

Attitude Determination, Dr. Andrew Ketsdever, MAE 5595

Determining attitude is the act of taking sensor measurements and calculating the spacecraft’s attitude. As I was foreshadowing in the sensor section, we need at least two unit vectors (directions) to uniquely determine attitude. These unit vectors, or directional vectors, can point toward a number of reference points: the sun, the Earth’s horizon, the relative direction of Earth’s magnetic field, or a star. A star tracker can uniquely determine attitude because there are multiple stars in the camera’s field of view, offering at least those two unit vectors. The attitude determination problem is solved with a star tracker that can give us a 3DOF attitude estimate! Unfortunately, if you don’t have a star tracker or if you want to achieve a more accurate or precise attitude estimate, you will need to incorporate multiple sensor measurements together. A sun sensor, Earth sensor, or magnetometer can individually provide only one unit vector, so we need to combine these types of sensor measurements to get a 3DOF attitude estimate.

There are two types of attitude determination algorithms. A lost-in-space or static determination algorithm incorporates instantaneous observations (say, a sun vector and a magnetic field vector at a snapshot in time) to produce an attitude. Lost-in-space implies that the spacecraft has no memory, opens its eyes or camera shutters for the first time or after a reboot, and needs to produce an initial estimate of attitude. The other type of determination algorithm comes after the initial observation, when we’ve accumulated multiple sensor measurements over time. With continuous observations, we can combine past measurements and our knowledge of attitude dynamics to get an ever more precise estimate of the spacecraft’s attitude, achieving precision below the noise level of any single sensor! Because of this ability to reject noise, we call these algorithms filters, which fall under the umbrella of determination and estimation.

This section will review common determination algorithms. There will be a LOT of math in the derivation of these algorithms. The intuition is 1) you need observations that span the 3 degrees of freedom and 2) with clever matrix manipulation, geometric relationships, and attitude representation identities, you can find an attitude representation that minimizes the error in rotating observations between an inertial and body frame. If you’re interested in implementing these algorithms, there are many papers and likely code repositories you can dig into outside this text.

TRIAD

TRIAD is one of the first attitude determination algorithms, derived in the early 60’s [Black]. This static algorithm was used for the US Transit satellites, a predecessor to GPS. Say we have two unit vectors that are known in both the body and inertial frames and are linearly independent (they cannot be parallel). The goal is to find a rotation matrix from the observations \leftindex^N{{v}_1}, \leftindex^N{{v}_2}, \leftindex^B{{v}_1}, \leftindex^B{{v}_2}. Let’s start with this known relationship:

\begin{bmatrix}\leftindex^N{{v}_1} &\leftindex^N{{v}_2}\end{bmatrix} =  \leftindex^N{R^B} \begin{bmatrix}\leftindex^B{{v}_1} &\leftindex^B{{v}_2} \end{bmatrix}

We can make a 3rd linearly independent vector by taking a cross-product of our observations:

\leftindex^N{\begin{bmatrix}v_1 & v_2 & (v_1 \times v_2)\end{bmatrix}} = \leftindex^N{R^B} \leftindex^B{\begin{bmatrix}v_1 & v_2 & (v_1 \times v_2)\end{bmatrix}}

\leftindex^N{V} = \leftindex^N{R^B} \leftindex^B{V}

As long as v_1 and v_2 are linearly independent, V will be invertible:

\Longrightarrow \leftindex^N{R^B} = \leftindex^N{V} \leftindex^B{V^{-1}}

Matrix inverses are computationally expensive, so let’s try to avoid that operator. In the classic TRIAD construction, the columns of V are made orthonormal (for example, t_1 = v_1, t_2 = (v_1 \times v_2)/\Vert v_1 \times v_2 \Vert, t_3 = t_1 \times t_2), so V is an orthogonal matrix and its inverse is simply its transpose. Transposes are much cheaper to compute. The final, classic TRIAD solution is:

\Longrightarrow \leftindex^N{R^B} = \leftindex^N{V} \leftindex^B{V^{T}}

The TRIAD method is very simple to implement, but it can only handle two observations, doesn’t account for dynamics, and doesn’t offer “covariance” or “error bound” information. TRIAD is typically used with a sun sensor and magnetometer.

Diagram depicting traditional triad attitude determination technique with the use of a magnetometer and sun sensor. By Rahul Rughani.
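As a concrete illustration, here is a minimal NumPy sketch of the TRIAD computation described above, assuming Python as the implementation language. The orthonormal triad is built from the two observations before forming V, and the function name and sample sun/magnetic-field vectors are purely hypothetical.

import numpy as np

def triad(v1_N, v2_N, v1_B, v2_B):
    """Estimate the rotation N_R_B from two matched vector observations (TRIAD)."""
    def triad_frame(v1, v2):
        t1 = v1 / np.linalg.norm(v1)          # first axis along v1
        t2 = np.cross(v1, v2)
        t2 = t2 / np.linalg.norm(t2)          # second axis normal to the v1-v2 plane
        t3 = np.cross(t1, t2)                 # completes the orthonormal triad
        return np.column_stack((t1, t2, t3))

    V_N = triad_frame(v1_N, v2_N)   # triad expressed in the inertial frame
    V_B = triad_frame(v1_B, v2_B)   # triad expressed in the body frame
    return V_N @ V_B.T              # N_R_B = N_V * B_V^T

# Hypothetical sun and magnetic-field unit vectors seen in each frame
sun_N = np.array([1.0, 0.0, 0.0]); mag_N = np.array([0.0, 1.0, 0.0])
sun_B = np.array([0.0, 0.0, 1.0]); mag_B = np.array([0.0, 1.0, 0.0])
R_NB = triad(sun_N, mag_N, sun_B, mag_B)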

Wahba’s Problem / Batch Estimation

What happens if we want to use more than 2 observations to improve our estimate? Say we have a batch of measurements and we want to solve the static attitude problem at a single time step with all these measurements. We define a least-squares cost function for attitude, which is a minimization problem to find the rotation that minimizes the squared error between the inertial observation and rotated body observation:

L=\sum_i w_i \left\Vert \leftindex^N{v_i} - \leftindex^N{R^B} \leftindex^B{v_i} \right\Vert^2_2

Where w_i are scalar weights. Weights should be inversely proportional to sensor variance to give a maximum likelihood answer.

To find attitude (the rotation matrix), we solve the following optimization problem:

\displaystyle{\min_{R} L(R)}  such that \qquad R \in SO(3)

The SO(3) constraint makes this minimization problem interesting and the solutions diverse. We will talk about three solutions: 1) convex optimization, 2) SVD method and 3) q-Method.

Convex Optimization Solution

The convex optimization solution makes use of constraints and relaxation techniques to ultimately achieve a convex optimization problem that can be solved. Let’s begin the thought process.

If we look closer at Wahba’s cost function, we can expand the squared error. Because the observations are unit vectors, the squared terms are constants, and only the cross term depends on the rotation:

L = \sum_i w_i \left( \leftindex^N{v_i^T}\leftindex^N{v_i} - 2 \, \leftindex^N{v_i^T} \leftindex^N{R^B} \leftindex^B{v_i} + \leftindex^B{v_i^T}\leftindex^B{v_i} \right) = \sum_i w_i \left( 2 - 2 \, \leftindex^N{v_i^T} \leftindex^N{R^B} \leftindex^B{v_i} \right)

We can transform that remaining term into a trace with a trick. The trace trick (a scalar equals its own trace, and the trace is invariant under cyclic permutations) is:

x^T A y = Tr[x^T A y ]

= Tr[y \, x^T A]

There’s a striking observation: the cost function is linear with respect to the rotation matrix, R! To set up a solution to this linear minimization problem, we can form an intermediate variable, B, called the attitude profile matrix. It accumulates the weighted outer products of the observations in the body and inertial frames.

B = \sum_i w_i \leftindex^B{v_i}\leftindex^N{v_i^T}

We can now re-write Wahba’s problem as:

\displaystyle{\max_{R} Tr[B \;R]} such that R^T R = I  and  det(R) = 1

The objective is linear, but the constraints are non-convex. As a lower-dimensional analogy, imagine a unit circle: only the edge of the circle satisfies the equality constraint (a non-convex set), while the whole shaded disk satisfies the inequality constraint (a convex set).

\text{constraints} = \begin{cases}x^T x = 1 & \text{non-convex} \\x^Tx \leq 1 & \text{convex} \end{cases}

Non-convex optimization problems are generally hard to solve, so we’d like to transform the problem into a convex one. Replacing the equality with an inequality is called “relaxation”. The new, more easily solved optimization problem is:

\displaystyle{\max_{R} Tr[B \;R]} such that I - R^T R \succeq 0

Because the objective is linear, we will always end up on the boundary of the feasible set, satisfying the constraint with equality. This particular setup is a special type of relaxation: a “tight” relaxation. Problems with a linear objective and a semidefinite matrix inequality constraint are solved with semidefinite programming (SDP), a class of numerical solvers. There are plenty of good solvers in various languages and libraries, like SeDuMi in MATLAB.

We just walked through the convex optimization solution for Wahba’s problem. Given multiple observations in the body and inertial frame, an optimization problem was set up and transformed into a form that can be encoded into a numerical solver. The computer then solves for the attitude, R.
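To make this concrete, here is a minimal sketch using the CVXPY modeling library in Python (an assumption; the text mentions SeDuMi in MATLAB, but any SDP-capable toolchain works). The relaxed constraint I - R^T R \succeq 0 is expressed equivalently as a bound on the maximum singular value of R, which CVXPY compiles into a semidefinite program. The attitude profile matrix B below is hypothetical.

import cvxpy as cp
import numpy as np

# Hypothetical attitude profile matrix B = sum_i w_i * (B_v_i)(N_v_i)^T
B = np.array([[ 0.5, 0.1, 0.0],
              [-0.2, 0.4, 0.1],
              [ 0.0, 0.1, 0.6]])

R = cp.Variable((3, 3))

# sigma_max(R) <= 1 is equivalent to I - R^T R being positive semidefinite
constraints = [cp.sigma_max(R) <= 1]

prob = cp.Problem(cp.Maximize(cp.trace(B @ R)), constraints)
prob.solve()       # solved internally as a semidefinite program

R_NB = R.value     # tight relaxation: the optimum lands on R^T R = I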

SVD solution

The SVD solution makes use of singular value decomposition (SVD) of the attitude profile matrix we defined earlier. Here’s the thought process.

The rotation matrix can be written in terms of a set of mutually orthogonal unit vectors expressed in both frames:

\leftindex^N{R^B} = \leftindex^N{r_1} \leftindex^B{r_1^T} + \leftindex^N{r_2} \leftindex^B{r_2^T} + \leftindex^N{r_3} \leftindex^B{r_3^T}

Where r_i are mutually orthogonal unit vectors. Previously we used v as the vectors but we need that letter for SVD, so we’re using r for these reference unit vectors now.

But what happens if the r_i are not mutually orthogonal, which is likely for the arbitrary measurement directions a spacecraft typically sees? Let’s revisit the attitude profile matrix:

B = \sum_i w_i \leftindex^B{r_i} \leftindex^N{r_i^T}

The optimal \leftindex^N{R^B} is the closest special-orthogonal matrix to B^T.

To do this, we use singular value decomposition:

M = U \Sigma V^T

Where M is some matrix, U and V are orthogonal matrices, and \Sigma is a diagonal matrix. The diagonal elements of \Sigma are called “singular values”, \sigma_i \geq 0. SVD is like an eigendecomposition but more general. We can construct the closest special orthogonal matrix by replacing the singular values with \pm 1, choosing the sign of the last one so that the determinant is +1.

In summary, the SVD algorithm procedure is:

  1. Form the attitude profile matrix, B, from measurements
  2. Compute the SVD of the transposed attitude profile matrix: U \Sigma V^T = \text{svd}(B^T)
  3. Compute the special orthogonal matrix with the SVD components

\leftindex^N{R^B} = U \begin{bmatrix}1 & 0 & 0 \\0 & 1 & 0 \\0 & 0 & det(U) det(V)\end{bmatrix}V^T
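Here is a minimal NumPy sketch of that three-step procedure, assuming matched lists of weighted unit-vector observations in the body and inertial frames; the function name and sample vectors are hypothetical.

import numpy as np

def wahba_svd(w, v_B, v_N):
    """Solve Wahba's problem via SVD. w: weights; v_B, v_N: matched unit vectors."""
    # 1. Form the attitude profile matrix B = sum_i w_i * (B_v_i)(N_v_i)^T
    B = sum(wi * np.outer(bi, ni) for wi, bi, ni in zip(w, v_B, v_N))
    # 2. SVD of its transpose: U * Sigma * V^T = svd(B^T)
    U, S, Vt = np.linalg.svd(B.T)
    # 3. Force the result onto SO(3) by replacing the singular values with +/-1
    D = np.diag([1.0, 1.0, np.linalg.det(U) * np.linalg.det(Vt)])
    return U @ D @ Vt   # N_R_B

# Hypothetical measurements: two unit vectors seen in both frames
v_N = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
v_B = [np.array([0.0, 1.0, 0.0]), np.array([-1.0, 0.0, 0.0])]
R_NB = wahba_svd([1.0, 1.0], v_B, v_N)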

q-Method

The q-method is named after the quaternion variable name, q. We have to reform the optimization problem to incorporate the quaternion and solve for the quaternion.

Remember that the optimization problem can be rewritten in a linear form, as in the convex optimization solution? Here it is again, written with the attitude profile matrix:

L(\leftindex^N{R^B}) = - \; \sum_i w_i \leftindex^N{r_i^T} \leftindex^N{R^B} \leftindex^B{r_i}

= - \; Tr[(\sum_i w_i \leftindex^B{r_i} \leftindex^N{r_i^T}) \leftindex^N{R^B} ]

= - \; Tr(B \leftindex^N{R^B})

This cost function, L, is linear with respect to the rotation matrix, R.

To incorporate the quaternion into the loss function, let’s relate the quaternion to the rotation matrix first:

R = I + 2 \underbar{v}^\times (sI + \underbar{v}^\times)

Where R is the rotation matrix, I is the identity matrix, \underbar{v} is the vector component of the quaternion, s is the scalar component of the quaternion, and the superscript \times denotes the skew-symmetric cross-product matrix. This expression is a quadratic function of q.

Since L(R) is linear in R and R(q) is quadratic in q, the composition L(R(q)) is a quadratic function of q:

L(q) = L(R(q)) = -q^T K q

This is the ultimate form we want to craft the problem into but what does K look like?

To find out what K looks like, we need to manipulate L(R) with the use of several matrix and quaternion properties. Here’s a matrix property we will eventually use:

\underbar{v}^\times \underbar{v}^\times = \underbar{v} \; \underbar{v}^T - (\underbar{v}^T \underbar{v})I

Here’s the quaternion identity we will use:

q^T q = s^2 + \underbar{v}^T \; \underbar{v} = 1

The rotation matrix can be transformed into this form:

R = I + 2 \underbar{v}^\times (sI + \underbar{v}^\times)

= (s^2 + \underbar{v}^T \; \underbar{v})I + 2s \underbar{v}^\times + 2 \underbar{v} \; \underbar{v}^T - 2 (\underbar{v}^T \underbar{v})I

= (s^2 - \underbar{v}^T \; \underbar{v})I + 2s \underbar{v}^\times + 2 \underbar{v} \; \underbar{v}^T

Let’s plug in R(q) into L(R):

L(\leftindex^N{R^B}) = - \sum_i w_i \leftindex^N{r_i^T} \leftindex^N{R^B} \leftindex^B{r_i}

= - \sum_i w_i \leftindex^N{r_i^T} [ (s^2 - \underbar{v}^T \; \underbar{v})I + 2s \underbar{v}^\times + 2 \underbar{v} \; \underbar{v}^T ] \leftindex^B{r_i}

The goal is to factor all of the “q stuff” out to the sides. After some algebra, the cost collapses into a quadratic form in the quaternion, with a 4×4 matrix K built entirely from the measurements (its explicit form appears in the summary below).

In terms of the quaternion, Wahba’s problem is now:

{\min_{q} -q^T K q} OR  {\max_{q} q^T K q}  such that q^T q = 1

Since ||q|| = 1, we only get to pick its direction.

The final conclusion is that q^T K q will be maximized when q is parallel to the eigenvector corresponding to the maximum eigenvalue of K.

In summary, the q-Method algorithm procedure is:

1. Form the attitude profile matrix, B, and intermediate product, Z:

B = \sum_i w_i \leftindex^B{r_i} \leftindex^N{r_i^T} and Z= \sum_i w_i (\leftindex^B{r_i} \times \leftindex^N{r_i})

2. Construct K

K=\begin{bmatrix} B + B^T-Tr(B)I & Z \\ Z^T & Tr(B) \end{bmatrix}

3. Calculate \max(eig(K)) and corresponding eigenvector

4. Normalize the maximum eigenvector to obtain \leftindex^N{q^B}. Woo! We have our quaternion as a result!
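A minimal NumPy sketch of the q-Method, following the four steps above with the quaternion ordered as [vector; scalar] to match the K matrix; the function name and sample measurements are hypothetical.

import numpy as np

def q_method(w, r_B, r_N):
    """Davenport q-method: returns the quaternion [v; s] maximizing q^T K q."""
    # 1. Attitude profile matrix B and intermediate product Z
    B = sum(wi * np.outer(bi, ni) for wi, bi, ni in zip(w, r_B, r_N))
    Z = sum(wi * np.cross(bi, ni) for wi, bi, ni in zip(w, r_B, r_N))
    # 2. Assemble the 4x4 K matrix
    K = np.zeros((4, 4))
    K[:3, :3] = B + B.T - np.trace(B) * np.eye(3)
    K[:3, 3] = Z
    K[3, :3] = Z
    K[3, 3] = np.trace(B)
    # 3. Eigenvector of the maximum eigenvalue (eigh returns ascending eigenvalues)
    vals, vecs = np.linalg.eigh(K)
    q = vecs[:, -1]
    # 4. Normalize (eigh already returns unit vectors; normalize defensively)
    return q / np.linalg.norm(q)

# Hypothetical measurements: two matched unit vectors
r_N = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
r_B = [np.array([0.0, 1.0, 0.0]), np.array([-1.0, 0.0, 0.0])]
q_NB = q_method([1.0, 1.0], r_B, r_N)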

Conclusion remarks: lots of other methods (QUEST, ESOQ, etc.) are derived from the q-Method by using tricks for solving step 3.

Kalman Filter

What if instead of a batch of measurements at one time step, you have a sequence of observations over time? This system is dynamic in that we have to account for time. It would make sense to incorporate a dynamics model to inform if your measurements are consistent with the way spacecraft dynamics are expected to propagate, right? A common solution is the Kalman filter, which is the optimal linear state estimator for a linear dynamical system.

The Kalman filter keeps track of the estimated state of the system and the variance or uncertainty of the estimate. The estimate is updated using a state transition model and measurements. \bar{x}_{k|k-1} denotes the estimate of the system’s state at time step k before the k-th measurement y_k has been taken into account; P_{k|k-1} is the corresponding uncertainty. CC0. Image by Petteri Aimonen.

“In statistics and control theory, Kalman filtering, also known as linear quadratic estimation (LQE), is an algorithm that uses a series of measurements observed over time, containing statistical noise and other inaccuracies, and produces estimates of unknown variables that tend to be more accurate than those based on a single measurement alone, by estimating a joint probability distribution over the variables for each timeframe. The filter is named after Rudolf E. Kálmán, one of the primary developers of its theory”. The Kalman filter has strong roots in aerospace control: It was during a visit by Kálmán to the NASA Ames Research Center that Schmidt saw the applicability of Kálmán’s ideas to the nonlinear problem of trajectory estimation for the Apollo program leading to its incorporation in the Apollo navigation computer. This Kalman filter was first described and partially developed in technical papers by Swerling (1958), Kalman (1960), and Kalman and Bucy (1961).

The Apollo computer used 2k of magnetic core RAM and 36k wire rope […]. The CPU was built from ICs […]. The clock speed was under 100 kHz […]. The fact that the MIT engineers were able to pack such good software (one of the very first applications of the Kalman filter) into such a tiny computer is truly remarkable.

— Interview with Jack Crenshaw, by Matthew Reed, TRS-80.org (2009)

To explain the Kalman filter, we need to go through a little background on probability and linear systems. Then, we’ll walk through the Kalman filter algorithm. If you’d like a different explanation for Kalman filters, I HIGHLY recommend visiting Tucker McClure’s tutorial.

Background

Notes on Expectation:

E[f(x)] = \int_{-\infty}^{\infty} f(x) p(x) dx

Expectation is the value of a function weighted by the probability distribution. This expression gives the value of f(x) you would expect to see averaged over many random samples. Expectation is a linear operator:

E[x+y] = E[x] + E[y]

E[\alpha x] = \alpha E[x]

Where \alpha is a scalar. Common examples of expectation are the mean and the covariance:

\mu = E[x]

P = E[(x-\mu)(x-\mu)^T]

Notes on Stochastic Linear Systems:

A discrete-time linear time-varying (LTV) system with additive noise has the following form:

x_{k+1} = A_k x_k + B_k u_k + \underbrace{w_k}_{\text{process noise}}

y_k = C_k x_k + \underbrace{v_k}_{\text{measurement noise}}

Where the process and measurement noise are normally distributed with zero mean and covariances W and V, respectively:

w_k \sim \mathcal{N} (0,W) \quad , \quad v_k \sim \mathcal{N} (0,V)
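For instance, here is a short NumPy sketch of propagating such a system forward in time, with hypothetical constant A, B, C matrices and noise covariances:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete-time linear system with process and measurement noise
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # state transition matrix
Bm = np.array([[0.005], [0.1]])          # control input matrix
C = np.array([[1.0, 0.0]])               # we only measure the first state
W = 1e-4 * np.eye(2)                     # process noise covariance
V = np.array([[1e-2]])                   # measurement noise covariance

x = np.zeros(2)
for k in range(100):
    u = np.array([0.1])                             # some control input
    v = rng.multivariate_normal(np.zeros(1), V)     # v_k ~ N(0, V)
    y = C @ x + v                                   # y_k = C_k x_k + v_k
    w = rng.multivariate_normal(np.zeros(2), W)     # w_k ~ N(0, W)
    x = A @ x + Bm @ u + w                          # x_{k+1} = A_k x_k + B_k u_k + w_k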

Notes on Multivariate Gaussians:

In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. We care about multivariates because attitude is in 3 dimensions! The multivariate distribution is given by:

p(x) = \frac{1}{\sqrt{(2\pi)^n det(P)}} \exp\left(-\tfrac{1}{2}(x-\mu)^T P^{-1} (x-\mu)\right)

mean:  \mu = E[x]\in \mathbb{R}^n  ,  covariance:  P=E[(x-\mu)(x-\mu)^T] \in S^n

Where n is the number of states.
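As a quick sketch, the density can be evaluated directly from this formula with NumPy; the mean and covariance values below are hypothetical.

import numpy as np

def gaussian_pdf(x, mu, P):
    """Evaluate the multivariate normal density at x."""
    n = len(mu)
    diff = x - mu
    norm_const = 1.0 / np.sqrt((2 * np.pi) ** n * np.linalg.det(P))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.solve(P, diff))

mu = np.zeros(3)                   # hypothetical 3-state mean
P = np.diag([0.1, 0.2, 0.05])      # hypothetical covariance
p = gaussian_pdf(np.array([0.1, -0.1, 0.0]), mu, P)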

Problem Statement

We are trying to minimize the error between the true spacecraft state and an estimate. For our problem of estimating attitude, the state is a vector of variables, which includes an attitude representation and angular velocity, typically quaternion and angular velocity:

x = \begin{bmatrix}q \\ \omega\end{bmatrix}

Specifically, we are trying to find the minimum mean squared error (MMSE) estimate by varying the state estimate:

    \begin{align*}\underset{\bar{x}}{\arg\min} \; E[(x-\bar{x})^T (x-\bar{x})] & = E[Tr((x-\bar{x})^T (x-\bar{x}))] \\&=Tr[P]\end{align*}

This formulation is also known as “least squares” or “minimum variance”. The least-squares interpretation is seen in the first line, where a quadratic error is minimized. The minimum-variance interpretation is seen in the last line, where the trace of the covariance is minimized.

Another way to form the problem is to find the maximum a-posteriori (MAP) estimate, which combines prior knowledge with observations: we want to find the estimate that is most likely given previous observations. That problem statement is mathematically expressed:

    \begin{align*}\underset{x}{\arg\max} \underbrace{p(x|y)}_{\text{probability of x conditioned on y} }\end{align*}

Where x is the state estimate and y are previous measurements.

The MMSE and MAP estimates turn out to be the same for Gaussian distributions, so we can derive the Kalman filter using either criterion.

Algorithm Solution

To implement a Kalman filter as a recursive linear MMSE estimator, we have to make the following assumptions:

  • Assume we have an estimate of the state that includes measurement information up to t_k:

\bar{x}_{k|k} = E[x_k|y_{1...k}]

  • Assume we know the state error covariance:

P_{k|k} = E[(x_k-\bar{x}_{k|k})(x_k-\bar{x}_{k|k})^T]

We want to update our estimate \bar{x} and covariance P to include a new measurement at time t_{k+1}.

The Kalman Filter can be broken into two steps:

  1. Prediction step:
    Propagate the estimate and covariance through the dynamics. The process noise is zero-mean and uncorrelated with the state, E[x_k w_k^T] = E[w_k x_k^T] = 0, so it does not shift the predicted mean and only inflates the predicted covariance:

\bar{x}_{k+1|k} = A_k \bar{x}_{k|k} + B_k u_k

P_{k+1|k} = A_k P_{k|k} A_k^T + W

  2. Measurement update:
    Define the “innovation”, z_{k+1}:

z_{k+1} = y_{k+1} - C \bar{x}_{k+1|k}

= \underbrace{C x_{k+1} + v_{k+1}}_{y_{k+1}} - C \bar{x}_{k+1|k}

= C (x_{k+1} - \bar{x}_{k+1|k}) + v_{k+1}

The subsequent innovation covariance S_{k+1} is:

S_{k+1} = E[z_{k+1} z_{k+1}^T]

= E[(C (x_{k+1} - \bar{x}_{k+1|k}) + v_{k+1})(C (x_{k+1} - \bar{x}_{k+1|k}) + v_{k+1})^T]

The noise v_{k+1} and the state x_{k+1} are uncorrelated, so the cross terms vanish:

S_{k+1} = C E[(x_{k+1} - \bar{x}_{k+1|k})(x_{k+1} - \bar{x}_{k+1|k})^T] C^T + E[v_{k+1}v_{k+1}^T]
= C P_{k+1|k} C^T + V

Innovation is the thing we feed back into the filter to correct our state estimate. The state update is then:

\bar{x}_{k+1|k+1} = \bar{x}_{k+1|k} + \underbrace{L_{k+1}}_{\text{Kalman Gain}} z_{k+1}

Covariance update is:

P_{k+1|k+1} = E[(x_{k+1} - \bar{x}_{k+1|k+1})(x_{k+1} - \bar{x}_{k+1|k+1})^T]

= E[(x_{k+1} - \bar{x}_{k+1|k} - L_{k+1} z_{k+1})(x_{k+1} - \bar{x}_{k+1|k} - L_{k+1} z_{k+1})^T]

= E[(x_{k+1} - \bar{x}_{k+1|k} - L_{k+1} (C (x_{k+1} - \bar{x}_{k+1|k}) + v_{k+1}))(x_{k+1} - \bar{x}_{k+1|k} - L_{k+1} (C (x_{k+1} - \bar{x}_{k+1|k}) + v_{k+1}))^T]

= (I-L_{k+1}C) E[(x_{k+1} - \bar{x}_{k+1|k})(x_{k+1} - \bar{x}_{k+1|k})^T] (I-L_{k+1}C)^T + L_{k+1} E[v_{k+1}v_{k+1}^T] L_{k+1}^T

= \underbrace{(I-L_{k+1}C) P_{k+1|k} (I-L_{k+1}C)^T + L_{k+1} V L_{k+1}^T}_{\text{"Joseph Form" - you may see other versions}}

To find the Kalman Gain matrix, let’s revisit the MMSE problem:

\underset{\bar{x}_{k+1}}{\arg\min} \; E[(x_{k+1}-\bar{x}_{k+1|k+1})^T (x_{k+1}-\bar{x}_{k+1|k+1})]

= E[Tr((x_{k+1}-\bar{x}_{k+1|k+1})^T (x_{k+1}-\bar{x}_{k+1|k+1}))]

=Tr[P_{k+1|k+1}]

Since Tr[P_{k+1|k+1}] is a quadratic function of L_{k+1}, we take its partial derivative with respect to L_{k+1}, set it to zero, and solve for L_{k+1}:

\frac{\partial \, Tr[P_{k+1|k+1}]}{\partial L_{k+1}} = 0

-2P_{k+1|k}C^T + 2L_{k+1}C P_{k+1|k}C^T+ 2L_{k+1}V = 0

-P_{k+1|k}C^T + L_{k+1} S_{k+1} = 0

\Longrightarrow L_{k+1} = P_{k+1|k}C^T S_{k+1}^{-1}

To summarize all the juicy equations in one straightforward procedure, the Kalman Filter algorithm is as follows:

  1. Start with x_{0|0} , P_{0|0}, W, V
  2. Predict:

\bar{x}_{k+1|k} = A \bar{x}_{k|k} + B u_k

P_{k+1|k} = A P_{k|k} A^T + W

  3. Calculate innovation and covariance:

z_{k+1} = y_{k+1} - C \bar{x}_{k+1|k} \quad , \quad S_{k+1} = C P_{k+1|k} C^T + V

  4. Calculate Kalman Gain:

L_{k+1} = P_{k+1|k}C^T S_{k+1}^{-1}

  5. Update:

\bar{x}_{k+1|k+1} = \bar{x}_{k+1|k} + L_{k+1} z_{k+1}

P_{k+1|k+1} = (I-L_{k+1}C) P_{k+1|k} (I-L_{k+1}C)^T + L_{k+1} V L_{k+1}^T

  6. Go back to step 2 once you get a new measurement
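Putting the six steps together, here is a minimal NumPy sketch of one full predict/update cycle for a generic linear system. The matrices are hypothetical (the same form as the stochastic-system sketch earlier), and the covariance update uses the Joseph form from step 5.

import numpy as np

def kalman_step(x_bar, P, u, y, A, B, C, W, V):
    """One predict/update cycle of the linear Kalman filter (steps 2-5 above)."""
    # 2. Predict
    x_pred = A @ x_bar + B @ u
    P_pred = A @ P @ A.T + W
    # 3. Innovation and innovation covariance
    z = y - C @ x_pred
    S = C @ P_pred @ C.T + V
    # 4. Kalman gain
    L = P_pred @ C.T @ np.linalg.inv(S)
    # 5. Update (Joseph form for the covariance)
    x_new = x_pred + L @ z
    I_LC = np.eye(len(x_bar)) - L @ C
    P_new = I_LC @ P_pred @ I_LC.T + L @ V @ L.T
    return x_new, P_new

# 1. Start with hypothetical initial estimate, covariance, and noise models
A = np.array([[1.0, 0.1], [0.0, 1.0]]); B = np.array([[0.005], [0.1]])
C = np.array([[1.0, 0.0]]); W = 1e-4 * np.eye(2); V = np.array([[1e-2]])
x_bar, P = np.zeros(2), np.eye(2)

# 6. Repeat on every new measurement y_k (hypothetical measurement sequence)
for y in [np.array([0.02]), np.array([0.05]), np.array([0.11])]:
    x_bar, P = kalman_step(x_bar, P, np.array([0.1]), y, A, B, C, W, V)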

Concluding Remarks

This “vanilla” Kalman filter derivation applies to linear systems. To extend this process to nonlinear systems, we would have to linearize the dynamics and measurement models about the current estimate at each time step to obtain the Jacobian matrices A_k, B_k, and C_k. This kind of Kalman filter is called an extended Kalman filter (EKF). Another variant is the unscented Kalman filter (UKF), which estimates nonlinear dynamics where 1) fully nonlinear dynamics are propagated instead of linearized and 2) multiple points are propagated over time [TowardDataScience]. Particle filters are in the same filtering family, but instead of estimating with linear projections, a particle filter uses a sequential Monte Carlo method, which is a fancy way of saying repeated random sampling.

Estimation of a Mackey-Glass time series with the EKF and UKF using a known model. The graph shows a comparison of estimation errors for the complete sequence. Wan, Eric A., and Rudolph Van Der Merwe. “The unscented Kalman filter for nonlinear estimation.” Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No. 00EX373). IEEE, 2000.

Ever since their use in the Apollo program, Kalman filters have been the industry standard for estimating spacecraft dynamics in orbit, as they solve the dynamic state estimation problem. They are not computationally intensive and reject noise from sensor measurements quite effectively. The extended Kalman filter is the most commonly used variant, as it is the most computationally compact and efficient.

License


A Guide to CubeSat Mission and Bus Design Copyright © by Frances Zhu is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.
