EM on a 2D Gaussian mixture

What you are seeing: a synthetic 2D dataset drawn from three Gaussian clusters with known parameters (their true 2-sigma ellipses are drawn faint). The EM algorithm tries to recover those parameters using nothing but the data: it alternates between soft-assigning each point to a cluster (E-step) and re-fitting each cluster's mean, covariance, and weight from the soft assignments (M-step).
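
As a rough illustration of the alternation described above, here is a minimal NumPy sketch of one possible E-step and M-step. It is not the code behind this demo: the function names (`gaussian_pdf`, `e_step`, `m_step`), the synthetic data, the initialization, and the small ridge `reg` added to each covariance are all assumptions made for the example.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Density of N(mean, cov) evaluated at each row of X, shape (N, D)."""
    d = X.shape[1]
    diff = X - mean
    solved = np.linalg.solve(cov, diff.T).T        # cov^{-1} (x - mean), row by row
    maha = np.sum(diff * solved, axis=1)           # squared Mahalanobis distance
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

def e_step(X, weights, means, covs):
    """Soft-assign: gamma[n, k] is proportional to pi_k * N(x_n; mu_k, Sigma_k)."""
    dens = np.column_stack(
        [w * gaussian_pdf(X, m, c) for w, m, c in zip(weights, means, covs)]
    )
    return dens / dens.sum(axis=1, keepdims=True)  # each row sums to 1

def m_step(X, gamma, reg=1e-6):
    """Re-fit each cluster's weight, mean, and covariance from the soft assignments."""
    n, d = X.shape
    nk = gamma.sum(axis=0)                         # effective number of points per cluster
    weights = nk / n
    means = (gamma.T @ X) / nk[:, None]
    covs = []
    for k in range(gamma.shape[1]):
        diff = X - means[k]
        cov = (gamma[:, k, None] * diff).T @ diff / nk[k]
        covs.append(cov + reg * np.eye(d))         # ridge term: an assumption, for stability
    return weights, means, np.array(covs)

# Hypothetical synthetic data: three isotropic clusters (not the demo's actual parameters).
rng = np.random.default_rng(0)
true_means = np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0]])
X = np.vstack([rng.normal(m, 0.7, size=(100, 2)) for m in true_means])

# Assumed initialization: uniform weights, means at random data points, identity covariances.
K = 3
weights = np.full(K, 1.0 / K)
means = X[rng.choice(len(X), K, replace=False)]
covs = np.array([np.eye(2)] * K)

for _ in range(50):
    gamma = e_step(X, weights, means, covs)
    weights, means, covs = m_step(X, gamma)
```

The densities here are computed directly rather than in log space, which keeps the sketch short but can underflow for points far from every cluster. EM also converges only to a local optimum, so a different initialization can give a different fit.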

Each iteration is guaranteed to never decrease the log-likelihood $\log p(X \mid \theta) = \sum_n \log \sum_k \pi_k\,N(x_n; \mu_k, \Sigma_k)$. Watch the trace under the plot. The cluster ellipses are 2-sigma confidence regions of the currently estimated $\Sigma_k$. Each data point is colored by the cluster $k$ with the largest responsibility $\gamma_{nk}$.
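
The three quantities in this paragraph can each be computed in a few lines. The sketch below is one way to do so with NumPy, not the demo's own code: the helper names (`log_likelihood`, `two_sigma_ellipse`, `hard_labels`) are hypothetical, and "2-sigma" is assumed to mean the contour at Mahalanobis distance 2.

```python
import numpy as np

def log_likelihood(X, weights, means, covs):
    """sum_n log sum_k pi_k N(x_n; mu_k, Sigma_k), evaluated with log-sum-exp."""
    n, d = X.shape
    log_terms = np.empty((n, len(weights)))
    for k, (w, m, c) in enumerate(zip(weights, means, covs)):
        diff = X - m
        maha = np.sum(diff * np.linalg.solve(c, diff.T).T, axis=1)
        _, logdet = np.linalg.slogdet(c)
        # log pi_k + log N(x_n; mu_k, Sigma_k)
        log_terms[:, k] = np.log(w) - 0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)
    peak = log_terms.max(axis=1, keepdims=True)    # subtract the row max for stability
    return float(np.sum(peak[:, 0] + np.log(np.exp(log_terms - peak).sum(axis=1))))

def two_sigma_ellipse(cov):
    """Semi-axis lengths (major first) and orientation in radians for a 2x2 covariance."""
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    semi_axes = 2.0 * np.sqrt(eigvals[::-1])       # 2 standard deviations along each axis
    major = eigvecs[:, -1]                         # direction of largest variance
    angle = np.arctan2(major[1], major[0])
    return semi_axes, angle

def hard_labels(gamma):
    """Index of the cluster with the largest responsibility for each point."""
    return np.argmax(gamma, axis=1)

# Small usage example with made-up values.
axes, angle = two_sigma_ellipse(np.array([[4.0, 0.0], [0.0, 1.0]]))
labels = hard_labels(np.array([[0.1, 0.9], [0.7, 0.3]]))
ll = log_likelihood(np.zeros((1, 2)), np.array([1.0]), np.zeros((1, 2)), np.eye(2)[None])
```

Under that reading of "2-sigma", the ellipse of a 2D Gaussian encloses about 86.5% of the probability mass ($1 - e^{-2}$), not the roughly 95% familiar from the 1D case. Recording `log_likelihood` after every iteration gives a trace like the one under the plot, and a decrease in that trace would usually point to a bug in the update code.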

Figure 1. EM iterations on a synthetic 3-cluster 2D Gaussian mixture, with log-likelihood trace. Method: closed-form M-step, soft-assignment E-step.
Controls: K = 3, init seed = 1

WHAT TO TRY

  • Vary each control and watch the rail readouts respond.
  • Compare the diagnostic plot against the live scene.