The first of eight exercises that I have to solve in the optimal estimation course. The focus is on MAP, MMSE, MMAE, and ML estimation.

In this case, I need to use the minimum risk (Bayes) estimators from The Estimation Paradigm (the static case) and also apply the knowledge from Introduction to Optimal Estimation and Dynamics.

Context

The measurement system

I need to estimate the depth of the water below a ship using an ultrasonic depth gauge. The sensor is mounted on the bottom of the boat and consists of a transmitter, which sends a tone burst downwards, and a receiver, which captures its echo.

The principle is ToF (Time of Flight).

The ToF is proportional to the depth $x$ . Let $c$ be the speed of sound in the water, then

$$x = \frac{c}{2} \cdot \mathrm{ToF}$$

For clarity, we denote the true depth of the water with $x$ and the noisy measurement with $z$ .
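The ToF relation can be sketched in a couple of lines of Python. The function name `depth_from_tof` and the default speed of sound of 1500 m/s are assumptions for illustration; the exercise does not fix a value for $c$.

```python
def depth_from_tof(tof_s, c=1500.0):
    """Depth in metres for a round-trip time of flight in seconds.

    c is the speed of sound in water in m/s (assumed value, not from the exercise).
    """
    return 0.5 * c * tof_s


# A 2 ms round trip corresponds to roughly 1.5 m of water under the assumed c.
print(f"{depth_from_tof(0.002):.3f} m")
```

The factor 1/2 accounts for the burst travelling down and back up.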

However, the measurement can be disturbed by multiple factors:

Secondary echoes (Multipath Interference)
- The echo may reflect off the bottom of the boat and travel down again, causing a second echo that arrives at $t = 4 x / c$ . The second echo may cause a third, and so on. If the first echo is missed and the second one is detected instead, the water appears twice as deep ( $2 x$ ) as it actually is.
Electronic noise
- The measurement is contaminated by Gaussian noise with standard deviation $σ$ . If we assume that $x$ is the real depth and $z$ is the result of our measurement, we can adopt the following Gaussian mixture model (Likelihood Function) for the conditional probability density of $z$

$$p (z \mid x) = P_{0} \frac{1}{\sigma \sqrt{2 \pi}} \exp \left( - \frac{( z - x )^{2}}{2 \sigma^{2}} \right) + P_{1} \frac{1}{\sigma \sqrt{2 \pi}} \exp \left( - \frac{( z - 2 x )^{2}}{2 \sigma^{2}} \right)$$

$P_{1}$ is the probability that the first echo is missed and replaced by the second echo. Obviously, $P_{0}$ is the probability that the first echo is correctly detected. $P_{0} + P_{1} = 1$ .

In simple words, electronic interference adds “jitter” to the measurement, represented by a standard deviation $σ$ .
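A minimal Python sketch of this mixture likelihood follows. The course asks for m-files, so this is only an illustration; the helper names and the default values $σ = 0.1$ m and $P_{1} = 0.05$ (taken from the first question further down) are assumptions.

```python
import math


def gaussian(u, sigma):
    """Zero-mean Gaussian density evaluated at u."""
    return math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def likelihood(z, x, sigma=0.1, p1=0.05):
    """p(z | x): direct echo at x with weight P0, second echo at 2x with weight P1."""
    p0 = 1.0 - p1
    return p0 * gaussian(z - x, sigma) + p1 * gaussian(z - 2.0 * x, sigma)


# Riemann-sum check that the mixture integrates to about 1 over z (here x = 1.5 m).
dz = 0.001
area = sum(likelihood(i * dz, 1.5) for i in range(6001)) * dz
print(f"area under p(z | x=1.5): {area:.4f}")
print(f"main peak {likelihood(1.5, 1.5):.3f}, echo peak {likelihood(3.0, 1.5):.3f}")
```

Because $P_{0} + P_{1} = 1$ and each Gaussian integrates to one, the mixture is itself a valid density in $z$.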

Prior Knowledge (Ground-Truth)

Based on the information from a nautical map, the skipper has some prior knowledge about the depth. The map indicates an interval of possible depths. This interval is considered to be softly bounded. Such prior knowledge can be modeled with a Generalized Normal Distribution.

$$p (x) = \frac{\beta}{2 \alpha \, \Gamma ( 1 / \beta )} \exp \left( - \left( \frac{| x - \mu |}{\alpha} \right)^{\beta} \right)$$

So, the captain gives the Prior Distribution $p (x)$ . Instead of a standard bell curve, it uses a Generalized Normal Distribution, which allows for a “softly bounded” interval of likely depths using parameters $α$ (scale), $μ$ (mean), and $β$ (shape).

For example, if the depth interval is $[x_{min}, x_{max}]$ , then $μ = \frac{1}{2} (x_{min} + x_{max})$ and $α = \frac{1}{2} (x_{max} - x_{min})$ .
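The prior can be sketched in Python with only the standard library. The function name `prior` and the defaults ( $[1, 3]$ m and $β = 20$ , the values used later in these notes) are assumptions for illustration.

```python
import math


def prior(x, x_min=1.0, x_max=3.0, beta=20.0):
    """Generalized normal prior p(x) with a softly bounded interval [x_min, x_max]."""
    mu = 0.5 * (x_min + x_max)      # centre of the interval
    alpha = 0.5 * (x_max - x_min)   # half-width of the interval
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-(abs(x - mu) / alpha) ** beta)


# Riemann-sum check that the prior integrates to about 1.
dx = 0.001
area = sum(prior(i * dx) for i in range(4001)) * dx
print(f"area under p(x): {area:.4f}")
print(f"p(2.0)={prior(2.0):.3f}, p(3.0)={prior(3.0):.3f}, p(3.2)={prior(3.2):.2e}")
```

At the interval edge $x = μ ± α$ the exponent equals $-1$ for any $β$ , so the density there is always $e^{-1}$ times its peak value.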

My case

First Question: If we plot $p (x)$ against $x$ and $p (z ∣ x)$ against $z$ for $x = 1.5 m$ and $x = 2 m$ , what do these PDFs model?

The prior $p (x)$ shows my belief before even measuring. It’s nearly uniform between $x_{min} = 1 m$ and $x_{max} = 3 m$ , meaning I consider all depths in that range roughly equally likely. Again, this reflects the captain’s knowledge based on the nautical map.
The Likelihood $p (z ∣ x)$ tells me what measurement I should expect given the two true depths (blue with $x = 1.5 m$ and red with $x = 2 m$ ).
- The blue PDF shows two peaks - one main peak at $z = 1.5 m$ (correct echo, 95% probable), and a small peak at $z = 3 m$ (secondary echo at 2x, 5% probable)
- Same thing for the red PDF - main peak at $z = 2 m$ and small peak at $z = 4 m$ .
The peaks are sharp, showing the sensor is precise ( $σ = 0.1 m$ ).

What happens if I modify $σ$ ? And $β$ ?

Modifying $σ$ :

The peak locations stay the same, but the PDFs are more spread out. As $σ$ increases, the noise gets larger ⇒ the measurement is less precise.

The area under each peak stays the same ( $P_{0}$ and $P_{1}$ ).

But the peak heights change, and with them the probability of getting a measurement near a given value.

With $σ = 0.3$ (wider), the probability density at $z = 1.7 m$ given $x = 1.5 m$ is higher. With $σ = 0.05$ (narrower), it is much lower.
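The effect of $σ$ can be checked numerically. The sketch below evaluates the likelihood at the main peak and 0.2 m away from it for three noise levels; the function name and $P_{1} = 0.05$ are assumptions for illustration.

```python
import math


def likelihood(z, x, sigma, p1=0.05):
    """Two-component Gaussian mixture p(z | x) with echo probability p1 (assumed)."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    direct = math.exp(-0.5 * ((z - x) / sigma) ** 2)
    echo = math.exp(-0.5 * ((z - 2.0 * x) / sigma) ** 2)
    return norm * ((1.0 - p1) * direct + p1 * echo)


x_true = 1.5
for sigma in (0.05, 0.1, 0.3):
    peak = likelihood(1.5, x_true, sigma)      # height of the main peak
    off_peak = likelihood(1.7, x_true, sigma)  # density 0.2 m away from the peak
    print(f"sigma={sigma}: p(1.5|x)={peak:.3f}, p(1.7|x)={off_peak:.4f}")
```

As $σ$ grows, the peak height drops while the density at $z = 1.7$ m rises, which is the trade-off described above.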

Modifying $β$ :

Changes which depths are considered more likely a priori

As $β$ increases, the prior approaches a uniform distribution on $[μ - α, μ + α]$ , meaning I assign all depths inside the interval the same weight and almost none outside it.

In my current case, $β = 20$ is good for “depth is somewhere between 1-3m, no strong preference”.
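To see how $β$ shapes the prior, the sketch below compares the density near the edge of the interval and just outside it with the density at the centre. The function name and the probe points 2.8 m and 3.3 m are assumptions chosen for illustration.

```python
import math


def prior(x, beta, x_min=1.0, x_max=3.0):
    """Generalized normal prior on a softly bounded interval [x_min, x_max]."""
    mu = 0.5 * (x_min + x_max)
    alpha = 0.5 * (x_max - x_min)
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-(abs(x - mu) / alpha) ** beta)


for beta in (2.0, 5.0, 20.0):
    inside = prior(2.8, beta) / prior(2.0, beta)   # near the edge, inside the interval
    outside = prior(3.3, beta) / prior(2.0, beta)  # just outside the interval
    print(f"beta={beta}: p(2.8)/p(2.0)={inside:.3f}, p(3.3)/p(2.0)={outside:.2e}")
```

For $β = 2$ the prior is an ordinary Gaussian; larger $β$ flattens the inside of the interval and sharpens the drop at its edges.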

example for $β = 5$ and $σ = 0.3$

Second Question: Compute the PDFs $p (z)$ and $p (x ∣ z)$ . Now we plot the posterior against $x$ for $z = 3.1 m$ and for $z = 4 m$ .

What do they represent?

The evidence $p (z)$ is the total probability density of observing a measurement $z$ . It is calculated by integrating the likelihood, weighted by the prior, over all possible true depths $x$ .

$$p (z) = \int_{- \infty}^{\infty} p (z \mid x) \, p (x) \, d x$$

The posterior $p (x ∣ z)$ represents my updated belief about the true depth $x$ after seeing measurement $z$ . Per Bayes’ Theorem:

$$p (x \mid z) = \frac{p ( z \mid x ) \, p ( x )}{p ( z )} \propto p (z \mid x) \, p (x)$$
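Both quantities can be approximated on a grid of candidate depths. This is a Python sketch rather than the m-file the course asks for; the parameter values ( $σ = 0.1$ , $P_{1} = 0.05$ , $β = 20$ ), the grid from 0 to 5 m, and all names are assumptions.

```python
import math

SIGMA, P1, BETA = 0.1, 0.05, 20.0   # assumed parameter values
MU, ALPHA = 2.0, 1.0                # from x_min = 1 m and x_max = 3 m
DX = 0.001
GRID = [i * DX for i in range(5001)]  # candidate depths 0..5 m (assumed range)


def likelihood(z, x):
    norm = 1.0 / (SIGMA * math.sqrt(2.0 * math.pi))
    direct = (1.0 - P1) * math.exp(-0.5 * ((z - x) / SIGMA) ** 2)
    echo = P1 * math.exp(-0.5 * ((z - 2.0 * x) / SIGMA) ** 2)
    return norm * (direct + echo)


def prior(x):
    norm = BETA / (2.0 * ALPHA * math.gamma(1.0 / BETA))
    return norm * math.exp(-(abs(x - MU) / ALPHA) ** BETA)


def evidence(z):
    """p(z): Riemann sum of p(z | x) p(x) over the depth grid."""
    return sum(likelihood(z, x) * prior(x) for x in GRID) * DX


def posterior(z):
    """p(x | z) sampled on GRID, normalised so that it sums to 1 / DX."""
    joint = [likelihood(z, x) * prior(x) for x in GRID]
    pz = sum(joint) * DX
    return [j / pz for j in joint]


post = posterior(4.0)
peak = GRID[max(range(len(GRID)), key=post.__getitem__)]
print(f"p(z=2.0)={evidence(2.0):.4f}, p(z=4.0)={evidence(4.0):.4f}")
print(f"posterior peak for z=4.0 at x={peak:.3f} m")
```

Dividing by the grid estimate of $p (z)$ guarantees that the sampled posterior is normalised on the same grid, which matters for the estimators and risks computed from it.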

The marginal PDF $p (z)$ and the posterior PDF $p (x ∣ z)$ look like this:

The top plot $p (z)$ shows the two plateaus: the taller one for $z \in [1, 3]$ corresponds to the $90\%$ chance of a direct reflection from the uniform prior $x \in [1, 3]$ . It is followed by the $10\%$ plateau, which corresponds to the second echoes.
- “What measurement are we likely to see?”
The bottom plot $p (x ∣ z)$ captures the two cases mentioned above.
- “Given I measured $z$ , what is the true depth $x$ ?”
- For $z = 4 m$ , a direct reflection is impossible since the maximum known depth is $3 m$ (I know that from the prior). Therefore a measurement of $4 m$ cannot possibly be a direct echo. The model infers it must be a double reflection which happens at $2 x$ , resulting in the distinct, confident peak at exactly $x = 2.0 m$ .
- The case $z = 3.1 m$ is ambiguous and presents two conflicting possibilities. It could be a double reflection, meaning the true depth is half of the measurement (the first small peak at $x \approx 1.55 m$ ). Alternatively, it could be a direct reflection of a true depth very close to the $3 m$ maximum, pushed up to $3.1 m$ by sensor noise. Since direct reflections are highly probable, the model strongly leans toward this explanation, causing the massive spike at the $x = 3.0 m$ boundary.

Third Question: Create m-files that calculate:

The MMSE estimator $\hat{x}_{MMSE} (z)$ for $z = 3.1 m$ and for $z = 4 m$ .
- Minimum Mean Square Error calculates the expected value, or the center of mass, of the posterior distribution. It minimizes the expected squared error of the estimate, meaning its position is influenced by all possible outcomes, including the small distant probabilities of secondary echoes.
- $\hat{x}_{MMSE} = \int x \, p (x ∣ z) \, d x$
The MAP estimator $\hat{x}_{MAP} (z)$ for $z = 3.1 m$ and for $z = 4 m$ .
- Maximum A Posteriori maximizes the posterior distribution $p (x ∣ z)$ . It identifies the absolute highest peak of the combined probability, representing the single most likely depth when both the sensor measurement and the prior bounds are factored in.
- $\hat{x}_{MAP} = \arg \max_{x} p (x ∣ z)$
The MMAE estimator $\hat{x}_{MMAE} (z)$ for $z = 3.1 m$ and for $z = 4 m$ .
- Minimum Mean Absolute Error calculates the median of the posterior distribution. It finds the exact depth that divides the total probability area perfectly in half, making it more robust against distant secondary peaks than the MMSE. (look more into this)
- $\hat{x}_{MMAE}$ is the value that satisfies $\int_{- \infty}^{\hat{x}_{MMAE}} p (x^{'} ∣ z) \, d x^{'} = 0.5$
  - In other words, minimize the expected absolute error $E [∣ x - \hat{x} ∣ ∣ z]$ . The solution is provably the median of the posterior, hence finding where the CDF (Cumulative Distribution Function) crosses 0.5.
The ML estimator $\hat{x}_{ML} (z)$ for $z = 3.1 m$ and for $z = 4 m$ .
- Maximum Likelihood maximizes the likelihood function $p (z ∣ x)$ . It strictly trusts the sensor data and finds the depth that makes the observed measurement most probable, completely ignoring the prior knowledge from the nautical map.
- $\hat{x}_{ML} = \arg \max_{x} p (z ∣ x)$
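The four definitions above can be sketched on a grid of candidate depths. This is a Python illustration, not the m-file the course asks for; the parameter values, the grid from 0.5 to 3.5 m, and all names are assumptions. The ML result in particular depends on the grid limits, because the likelihood alone peaks at $x = z$ whenever that depth is among the candidates.

```python
import math

SIGMA, P1, BETA = 0.1, 0.05, 20.0   # assumed parameter values
MU, ALPHA = 2.0, 1.0                # from x_min = 1 m and x_max = 3 m
DX = 0.001
GRID = [0.5 + i * DX for i in range(3001)]  # candidate depths 0.5..3.5 m (assumed)


def likelihood(z, x):
    norm = 1.0 / (SIGMA * math.sqrt(2.0 * math.pi))
    direct = (1.0 - P1) * math.exp(-0.5 * ((z - x) / SIGMA) ** 2)
    echo = P1 * math.exp(-0.5 * ((z - 2.0 * x) / SIGMA) ** 2)
    return norm * (direct + echo)


def prior(x):
    norm = BETA / (2.0 * ALPHA * math.gamma(1.0 / BETA))
    return norm * math.exp(-(abs(x - MU) / ALPHA) ** BETA)


def estimates(z):
    lik = [likelihood(z, x) for x in GRID]
    joint = [l * prior(x) for l, x in zip(lik, GRID)]
    pz = sum(joint) * DX
    post = [j / pz for j in joint]

    # MMSE: posterior mean (centre of mass).
    mmse = sum(x * p for x, p in zip(GRID, post)) * DX
    # MAP: grid point with the highest posterior density.
    x_map = GRID[max(range(len(GRID)), key=post.__getitem__)]
    # MMAE: posterior median, the first grid point where the CDF reaches 0.5.
    cdf, mmae = 0.0, GRID[-1]
    for x, p in zip(GRID, post):
        cdf += p * DX
        if cdf >= 0.5:
            mmae = x
            break
    # ML: grid point with the highest likelihood; the prior is not used.
    x_ml = GRID[max(range(len(GRID)), key=lik.__getitem__)]
    return {"MMSE": mmse, "MAP": x_map, "MMAE": mmae, "ML": x_ml}


for z in (3.1, 4.0):
    print(z, {name: round(value, 3) for name, value in estimates(z).items()})
```

With other parameter values or grid limits the numbers shift, so this sketch is not expected to reproduce the reported results digit for digit.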

Can you explain the results, especially the ones for $z = 3.1 m$ ?

Results:

z=4m

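All four estimators agree at $x = 2 m$ . The prior eliminates any chance of a direct echo, so the posterior is unimodal and MMSE, MAP and MMAE naturally coincide. ML ignores the prior, so it only lands on $x = 2 m$ as well if the search over $x$ is limited to the depth range of the map; over all $x$ , the likelihood is highest for a direct echo at $x = 4 m$ .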

z=3.1m

In this case, the posterior is bimodal (two peaks which I explained earlier).

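The MMSE is the only one that’s pulled noticeably to the left, since it acts as a center of gravity of both peaks. It is the best estimate in the mean-square sense, but the value of 2.66 lies in the valley between the two peaks, where the posterior density is practically zero, so it is not a plausible depth on its own.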
The MAP is the same as ML but multiplies the likelihood by the prior $p (x)$ first. The prior slightly penalizes $x = 3.0$ since it’s near the boundary, nudging the peak marginally left to $x = 2.98$ . Very close to ML here because the likelihood peak dominates.

The MMAE integrates the posterior from left to right until it has accumulated 50% of the total probability mass. The small left peak at $x = 1.55$ contributes some mass, which means the $50\%$ point is reached slightly earlier than the MAP peak, pulling the estimate to $2.93 m$ . Essentially, it asks “where is the middle of all the probability?”

p(x|z)                        CDF
|                              1|          ___
|  /\      /\                   |         /
| /  \    /  \                0.5|_ _ _ _/· · ·  ← median here
|/    \  /    \                  |      /
|      \/      \               0|_____/
+-------------->x               +------------>x

The ML looks at $p (z ∣ x)$ and asks “for which x is this measurement most likely?”. It finds the peak of the likelihood, with no prior involved at all. Since $z = 3.1$ is just outside the prior boundary, the direct echo peak of the likelihood sits at $x = 3.1$ ; an estimate of $x \approx 3.0$ results when the search over $x$ stops at the $3 m$ limit of the map.

Fourth+Fifth Question:

Calculate for each case in 3 the conditional risk. Compare and explain the results. Do that for any of the following cost functions:

Quadratic cost function $(x - \hat{x})^{2}$
Absolute cost function $∣ x - \hat{x} ∣$
Uniform cost function with $Δ = 0.05 m$ . It is defined as $C_{uni} (\hat{x} ∣ x) = 1$ if $∣ x - \hat{x} ∣ > Δ$ and $0$ otherwise.

The risks that are calculated may have a physical unit. Don’t forget to add them.

From The Estimation Paradigm, the risk is defined as the expected cost of an estimation error:

$$R (\hat{x} \mid z) = E_{x} [C (\hat{x} \mid x) \mid z] = \int C (\hat{x} \mid x) \, p (x \mid z) \, d x$$
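The risk integral can be approximated by a sum over a depth grid. The Python sketch below is an illustration only; the parameter values, the grid, and the names `joint`, `posterior` and `risks` are assumptions.

```python
import math

SIGMA, P1, BETA = 0.1, 0.05, 20.0   # assumed parameter values
MU, ALPHA = 2.0, 1.0                # from x_min = 1 m and x_max = 3 m
DX = 0.001
GRID = [0.5 + i * DX for i in range(3001)]  # candidate depths 0.5..3.5 m (assumed)


def joint(z, x):
    """Unnormalised posterior p(z | x) p(x); constant factors cancel later."""
    direct = (1.0 - P1) * math.exp(-0.5 * ((z - x) / SIGMA) ** 2)
    echo = P1 * math.exp(-0.5 * ((z - 2.0 * x) / SIGMA) ** 2)
    return (direct + echo) * math.exp(-(abs(x - MU) / ALPHA) ** BETA)


def posterior(z):
    weights = [joint(z, x) for x in GRID]
    total = sum(weights) * DX
    return [w / total for w in weights]


def risks(post, x_hat, delta=0.05):
    """Conditional risk of the estimate x_hat under the three cost functions."""
    quad = sum((x - x_hat) ** 2 * p for x, p in zip(GRID, post)) * DX      # m^2
    absolute = sum(abs(x - x_hat) * p for x, p in zip(GRID, post)) * DX    # m
    uniform = sum(p for x, p in zip(GRID, post) if abs(x - x_hat) > delta) * DX
    return {"quadratic": quad, "absolute": absolute, "uniform": uniform}


post = posterior(4.0)
print({name: round(value, 4) for name, value in risks(post, 2.0).items()})
```

The uniform risk is simply the posterior probability that the true depth lies more than $Δ$ away from the estimate, so it is dimensionless, while the other two carry units of m² and m.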

z = 3.1m

| Estimator | Quadratic (m²) | Absolute (m) | Uniform (-) |
| --- | --- | --- | --- |
| MMSE | 0.3098 | 0.4435 | 0.9997 |
| MAP | 0.4100 | 0.3211 | 0.4838 |
| MMAE | 0.3819 | 0.3071 | 0.4732 |
| ML | 0.4258 | 0.3406 | 0.6333 |

z = 4.0m

| Estimator | Quadratic (m²) | Absolute (m) | Uniform (-) |
| --- | --- | --- | --- |
| MMSE | 0.0025 | 0.0399 | 0.3168 |
| MAP | 0.0025 | 0.0399 | 0.3267 |
| MMAE | 0.0025 | 0.0399 | 0.3267 |
| ML | 0.0025 | 0.0399 | 0.3267 |

For $z = 4.0 m$ all estimators agree, so the risks are nearly identical across estimators for each cost function. The posterior distribution is unimodal, meaning there is only one logical explanation for the measurement.
For $z = 3.1 m$ the MMSE has the lowest quadratic risk and the MMAE the lowest absolute risk, exactly as the theory predicts. For the uniform cost, MAP (0.4838) and MMAE (0.4732) are close, with MMAE marginally lower: MAP is only guaranteed to minimize the uniform risk in the limit $Δ \to 0$ , so with a finite $Δ = 0.05 m$ another estimate can score slightly better.
The uniform risk of MMSE at $z = 3.1 m$ is nearly $1.0$ , meaning the true depth almost never lies within $Δ = 0.05 m$ of the MMSE estimate. The estimate sits in the valley between the two peaks, where there is virtually zero probability mass, so it is almost guaranteed to incur the maximum penalty.
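As a rough cross-check of the $z = 4.0 m$ table, here is a sketch that assumes the prior is flat around $x = 2 m$ and the direct-echo term is negligible. Only the second-echo term of the likelihood survives, and as a function of $x$ it is a Gaussian with mean $z / 2$ and standard deviation $σ / 2$ :

$$p (x \mid z) \propto \exp \left( - \frac{( z - 2 x )^{2}}{2 \sigma^{2}} \right) = \exp \left( - \frac{( x - z / 2 )^{2}}{2 ( \sigma / 2 )^{2}} \right)$$

With $σ = 0.1 m$ , $\hat{x} = 2 m$ and $Δ = 0.05 m = σ / 2$ , the three risks (the symbols $R_{quad}$ , $R_{abs}$ and $R_{uni}$ are notation introduced here) become:

$$R_{quad} = \left( \frac{\sigma}{2} \right)^{2} = 0.0025 \, m^{2}, \qquad R_{abs} = \frac{\sigma}{2} \sqrt{\frac{2}{\pi}} \approx 0.0399 \, m, \qquad R_{uni} = 2 \left( 1 - \Phi (1) \right) \approx 0.317$$

Here $\Phi$ is the standard normal CDF. These values agree with the table; the small spread in the uniform column (0.3168 versus 0.3267) is plausibly a discretization effect at the edges of the $± Δ$ window.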

🚀 Costin Chitic
