Attention as soft retrieval
Single-head dot-product attention over a small key-value bank. Each key is a point in 2D; each value is a scalar rendered as a bar height. Drag the query around the $(k_x, k_y)$ plane to recompute the attention weights; the output bar shows the attention-weighted average of the value bars. As the temperature $\tau$ drops toward zero, attention collapses to one-hot retrieval of the best-matching key (the one with the largest dot product with the query); as $\tau$ grows, the distribution flattens toward uniform.
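One common way to write this down, assuming the temperature divides the raw dot-product scores (the exact scaling used by the demo is not stated here, and the symbols $q$, $k_i$, $v_i$, $w_i$, $o$, $n$ are introduced only for illustration):

$$
w_i = \frac{\exp(q \cdot k_i / \tau)}{\sum_{j=1}^{n} \exp(q \cdot k_j / \tau)}, \qquad o = \sum_{i=1}^{n} w_i \, v_i
$$

Under this form, as $\tau \to 0$ the largest score dominates the sum and $w$ approaches a one-hot vector; as $\tau \to \infty$ every exponent tends to zero and $w_i \to 1/n$. At $q = (0, 0)$ all scores are zero, so the weights are uniform at any temperature.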
tau = 0.50
q_x = 0.00
q_y = 0.00
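A minimal Python sketch of the computation described above, useful for checking the two temperature limits numerically. The key positions, the values, and the `attention` helper are hypothetical and not taken from the demo; dividing the scores by tau is an assumption about how the temperature is applied.

```python
import math

def attention(query, keys, values, tau):
    """Temperature-scaled softmax attention; returns (weights, output)."""
    # Dot-product score of the query against each 2D key, divided by tau
    # (assumed scaling; the demo may apply the temperature differently).
    scores = [(query[0] * kx + query[1] * ky) / tau for kx, ky in keys]
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The output is the attention-weighted average of the scalar values.
    output = sum(w * v for w, v in zip(weights, values))
    return weights, output

# Hypothetical key-value bank: four keys on the unit circle.
keys = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
values = [1.0, 2.0, 3.0, 4.0]

query = (0.8, 0.1)
for tau in (0.01, 0.5, 100.0):
    weights, output = attention(query, keys, values, tau)
    print(tau, [round(w, 3) for w in weights], round(output, 3))
```

With a small tau nearly all of the weight lands on the key with the largest dot product, so the output tracks that key's value; with a large tau the weights approach 1/4 each and the output approaches the plain mean of the values.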
WHAT TO TRY
- Vary each control (tau, q_x, q_y) and watch the rail readouts respond.
- Compare the diagnostic plot against the live scene.