EM on a 2D Gaussian Mixture

What you are seeing: a synthetic 2D dataset drawn from three Gaussian clusters with known parameters (their true 2-sigma ellipses are drawn faint). The EM algorithm tries to recover those parameters using nothing but the data: it alternates between soft-assigning each point to a cluster (E-step) and re-fitting each cluster's mean, covariance, and weight from the soft assignments (M-step).
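For reference, the two steps are usually written as below. This is the textbook form for a Gaussian mixture, not something read off the demo's source; the symbol $n_k$ (the effective number of points assigned to cluster $k$) is introduced here for convenience.

E-step (soft assignment):

$$\gamma_{nk} = \frac{\pi_k\,N(x_n; \mu_k, \Sigma_k)}{\sum_j \pi_j\,N(x_n; \mu_j, \Sigma_j)}$$

M-step (closed-form re-fit, where $\mu_k$ inside $\Sigma_k$ is the newly updated mean):

$$n_k = \sum_n \gamma_{nk}, \qquad \pi_k = \frac{n_k}{\sum_j n_j}, \qquad \mu_k = \frac{1}{n_k}\sum_n \gamma_{nk}\,x_n, \qquad \Sigma_k = \frac{1}{n_k}\sum_n \gamma_{nk}\,(x_n - \mu_k)(x_n - \mu_k)^\top$$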

Each EM iteration is guaranteed never to decrease the log-likelihood $\log p(X \mid \theta) = \sum_n \log \sum_k \pi_k\,N(x_n; \mu_k, \Sigma_k)$; watch the trace under the plot. The cluster ellipses are the 2-sigma contours of the currently estimated $\Sigma_k$ (in 2D, each encloses about 86% of that component's probability mass). Each data point is colored by the cluster with the largest responsibility, $\arg\max_k \gamma_{nk}$.
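The ellipse and coloring rules above can be sketched in a few lines of NumPy. This is a minimal illustration, not the demo's actual drawing code: the function names `ellipse_points` and `hard_labels` are hypothetical. The ellipse is taken to be the set of points whose Mahalanobis distance from $\mu_k$ equals 2, obtained from the eigendecomposition of $\Sigma_k$.

```python
import numpy as np

def ellipse_points(mu, Sigma, n_sigma=2.0, n_points=100):
    """Points on the contour (x - mu)^T Sigma^{-1} (x - mu) = n_sigma^2.

    Hypothetical helper: mu has shape (2,), Sigma has shape (2, 2).
    Returns an array of shape (n_points, 2).
    """
    mu = np.asarray(mu, dtype=float)
    # Symmetric eigendecomposition: Sigma = V diag(eigvals) V^T.
    eigvals, eigvecs = np.linalg.eigh(Sigma)
    t = np.linspace(0.0, 2.0 * np.pi, n_points)
    circle = np.stack([np.cos(t), np.sin(t)])          # unit circle, (2, n_points)
    # Stretch the circle by sqrt(eigenvalue) along each axis, rotate, shift.
    scaled = np.sqrt(eigvals)[:, None] * circle
    return (mu[:, None] + n_sigma * (eigvecs @ scaled)).T

def hard_labels(gamma):
    """Index of the largest responsibility in each row of gamma (shape (n, K))."""
    return np.argmax(gamma, axis=1)
```

The returned points can be handed to any plotting routine as `pts[:, 0]` and `pts[:, 1]`; the labels index into whatever color palette the plot uses.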

Figure 1. EM iterations on a synthetic 3-cluster 2D Gaussian mixture, with log-likelihood trace. Method: closed-form M-step, soft-assignment E-step.
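The method named in the caption can be sketched as follows. This is a minimal NumPy version written under assumptions, not the code behind the figure: the function names, the initialization (means at random data points, shared data covariance, uniform weights), and the small ridge `reg` are all choices made here for illustration.

```python
import numpy as np

def log_gauss(X, mu, Sigma):
    """Log density of N(mu, Sigma) at each row of X (shape (n, d))."""
    d = X.shape[1]
    L = np.linalg.cholesky(Sigma)                 # Sigma = L L^T
    z = np.linalg.solve(L, (X - mu).T)            # whitened residuals, (d, n)
    maha = np.sum(z ** 2, axis=0)                 # squared Mahalanobis distance
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)

def e_step(X, pi, mu, Sigma):
    """Soft assignment: responsibilities gamma (n, K) and the log-likelihood."""
    K = len(pi)
    # log(pi_k * N(x_n; mu_k, Sigma_k)) for every point and cluster.
    log_p = np.stack(
        [np.log(pi[k]) + log_gauss(X, mu[k], Sigma[k]) for k in range(K)], axis=1
    )
    # Log-sum-exp over clusters, shifted by the row maximum for stability.
    m = log_p.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True))
    gamma = np.exp(log_p - log_norm)
    return gamma, float(log_norm.sum())

def m_step(X, gamma, reg=1e-6):
    """Closed-form re-fit of weights, means, and covariances."""
    n, d = X.shape
    Nk = gamma.sum(axis=0)                        # effective counts, (K,)
    pi = Nk / n
    mu = (gamma.T @ X) / Nk[:, None]
    Sigma = np.empty((len(Nk), d, d))
    for k in range(len(Nk)):
        diff = X - mu[k]
        # Assumption: a tiny ridge keeps Sigma invertible; strictly speaking
        # it makes this M-step a slight approximation of the exact update.
        Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + reg * np.eye(d)
    return pi, mu, Sigma

def fit_em(X, K, seed=1, n_iter=50):
    """Run n_iter EM iterations; returns parameters and the log-likelihood trace."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=K, replace=False)]  # init means at random points
    Sigma = np.stack([np.cov(X.T) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    trace = []
    for _ in range(n_iter):
        gamma, ll = e_step(X, pi, mu, Sigma)      # ll is for the current params
        trace.append(ll)
        pi, mu, Sigma = m_step(X, gamma)
    # gamma comes from the last E-step, i.e. before the final M-step.
    return pi, mu, Sigma, gamma, trace
```

Calling `fit_em(X, K=3, seed=1)` on an `(n, 2)` array returns a `trace` that should be non-decreasing up to floating-point noise, which is the behavior the log-likelihood plot displays.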

WHAT TO TRY

Pick two overlapping clusters or unequal weights: EM's soft assignments blur where the clusters overlap, and the fitted ellipses settle close to the faint true ones over a few E- and M-steps.
Change the init seed: the log-likelihood is non-concave, so EM can only be expected to reach a local optimum. A bad seed can converge to a wrong one (for example, one cluster swallowing two); re-seed and EM can find the right fit.
Set K wrong: watch the algorithm merge real clusters (K too small) or split one (K too large). The fitted ellipses then cannot line up with the faint true ones, which is the visible sign that the model is misspecified.
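The last two experiments suggest two simple habits, sketched below under stated assumptions. First, for a fixed K, run EM from several seeds and keep the run with the highest final log-likelihood. Second, when comparing different K, penalize the parameter count, because the best achievable training log-likelihood cannot go down as K grows. The sketch uses BIC for that penalty; BIC is one common criterion brought in here for illustration, not something the demo computes. All names are hypothetical, and `fit` stands for any callable that maps a seed to `(params, final_log_likelihood)`.

```python
import math
from typing import Any, Callable, Iterable, Tuple

def best_of_seeds(
    fit: Callable[[int], Tuple[Any, float]], seeds: Iterable[int]
) -> Tuple[int, Any, float]:
    """Run fit(seed) for each seed; return (seed, params, ll) of the best run.

    Only meaningful when every run uses the same K and the same data.
    """
    best = None
    for seed in seeds:
        params, ll = fit(seed)
        if best is None or ll > best[2]:
            best = (seed, params, ll)
    if best is None:
        raise ValueError("seeds must not be empty")
    return best

def gmm_param_count(K: int, d: int = 2) -> int:
    """Free parameters of a K-component, full-covariance GMM in d dimensions."""
    weights = K - 1                     # weights sum to 1
    means = K * d
    covs = K * d * (d + 1) // 2         # symmetric d x d matrices
    return weights + means + covs

def bic(log_lik: float, n: int, K: int, d: int = 2) -> float:
    """Bayesian information criterion; lower is better."""
    return -2.0 * log_lik + gmm_param_count(K, d) * math.log(n)
```

With these helpers, one would pick the best seed for each candidate K and then compare the resulting BIC values across K, rather than comparing raw log-likelihoods.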