Last time, t-SNE threw a party and let friend groups form naturally.

No assigned seating. No labels. Just vibes and proximity.

But today's contestant?

They look at unlabeled data and say, "I don't just want to see the clusters. I want to know the probability of belonging to each one."

Please welcome Gaussian Mixture Models — or GMM for short.

The algorithm that doesn't just find your friend group. It tells you how much you belong to each one.

📈 GMM: “The Probabilistic Matchmaker”

Most clustering algorithms are decisive. K-Means, for example, looks at a data point and says: "You're in Group 3. That's final. Go stand over there."

Hard assignment. No nuance.

But real data is rarely that clean.

What about the person who kind of fits in with the engineers and kind of fits in with the designers? What about the customer who's mostly a bargain hunter but occasionally splurges?

GMM doesn't force a verdict. Instead, it says:

"You're 70% Cluster A and 30% Cluster B. Both are valid. You contain multitudes."

This is soft assignment — and it's a much more honest way to describe messy, real-world data.

A Gaussian Mixture Model is a probabilistic model that assumes a dataset is generated by a mixture of several Gaussian (bell curve) distributions, each representing a cluster.
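To make soft assignment concrete, here's a minimal sketch using scikit-learn's GaussianMixture on made-up two-blob data (the blob locations and sizes are just for illustration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two overlapping 2-D blobs of synthetic data.
X = np.vstack([
    rng.normal(loc=[0, 0], scale=1.0, size=(100, 2)),
    rng.normal(loc=[3, 3], scale=1.0, size=(100, 2)),
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Hard labels, like K-Means would give:
labels = gmm.predict(X)

# Soft assignment: one probability per cluster, per point.
probs = gmm.predict_proba(X)
print(probs[0])  # two probabilities that sum to 1
```

The `predict_proba` row for each point is exactly the "70% Cluster A, 30% Cluster B" verdict described above.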

🧠 Play By Play: How GMM Works

Let’s walk through what happens behind the scenes.

1️⃣ Assume the Data Is Made of Overlapping Bells

GMM starts with a simple idea:

Your data is made up of multiple bell curves (Gaussians) stacked on top of each other.

Each bell represents a cluster. Some are narrow, some are wide, and they can overlap.

The goal?
Figure out where those bells are, how wide they are, and how much data each one explains.

Think of it like hearing a chord and trying to pick out the individual notes.
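The generative story behind that assumption can be sketched in a few lines: pick a bell according to its weight, then sample from that bell. (The weights, centers, and widths below are arbitrary, just to show the mechanics.)

```python
import numpy as np

rng = np.random.default_rng(42)

weights = np.array([0.6, 0.4])   # how much data each bell explains
means = np.array([0.0, 5.0])     # where each bell sits (1-D for simplicity)
stds = np.array([1.0, 0.5])      # how wide each bell is

n = 1000
# Step 1: choose a bell for each point according to the weights.
which = rng.choice(2, size=n, p=weights)
# Step 2: sample each point from its chosen bell.
data = rng.normal(loc=means[which], scale=stds[which])
```

GMM's job is the inverse of this script: given only `data`, recover `weights`, `means`, and `stds`.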

2️⃣ Make an Initial Guess

GMM needs a starting point, so it initializes the Gaussian bells — placing them somewhere in the data, giving each one a shape, and assigning each cluster a rough "weight" (how much of the data it's responsible for).

This first guess is imperfect. The bells are probably in the wrong places.

That's fine. We're about to fix that.
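One common flavor of that first guess, sketched as a toy helper (the function name and the shared-covariance choice are assumptions for illustration, not any library's API):

```python
import numpy as np

def init_params(X, K, rng):
    """Rough starting guess: random points as centers, shared spread, equal weights."""
    n, d = X.shape
    means = X[rng.choice(n, size=K, replace=False)]          # random data points as centers
    covs = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)    # same wide shape for every bell
    weights = np.full(K, 1.0 / K)                            # each bell starts equally responsible
    return weights, means, covs
```

Libraries often do something smarter (scikit-learn initializes from k-means by default), but any reasonable guess works because EM will refine it.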

3️⃣ E-Step: "Who Do You Probably Belong To?"

Now each data point is asked:

“Given these bells, what are the chances you came from each one?”

Instead of assigning a hard cluster, each point gets probabilities.

  • Close to one bell → high probability there

  • In between → split probability
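A hand-rolled sketch of that question (not scikit-learn's internals, just the math): weight each bell's density at the point, then normalize so the probabilities sum to 1.

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, weights, means, covs):
    """Responsibilities: for each point, the probability it came from each bell."""
    K = len(weights)
    # Weighted density of every point under every bell, shape (n, K).
    dens = np.stack([
        weights[k] * multivariate_normal.pdf(X, means[k], covs[k])
        for k in range(K)
    ], axis=1)
    # Normalize each row so the probabilities sum to 1.
    return dens / dens.sum(axis=1, keepdims=True)
```

A point sitting on top of one bell gets a row like [0.99, 0.01]; a point midway between two bells gets something closer to [0.5, 0.5].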

4️⃣ M-Step: "Now Let's Rebuild the Bells."

Armed with those probabilities, GMM goes back and rebuilds the Gaussian bells to better fit the data.

It recalculates:

  • The center of each bell (the mean)

  • The shape and spread of each bell (the covariance)

  • The weight of each bell (how much data it explains)

This step is called the Maximization step.

The bells shift. They reshape. They rebalance.
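The matching sketch for the rebuild (again a toy version of the math, not a library's code): every point contributes to every bell, in proportion to its responsibility.

```python
import numpy as np

def m_step(X, resp):
    """Refit each bell using the soft responsibilities from the E-step."""
    n, d = X.shape
    Nk = resp.sum(axis=0)                 # "effective" number of points per bell
    weights = Nk / n                      # new weight: share of data each bell explains
    means = (resp.T @ X) / Nk[:, None]    # responsibility-weighted average position
    covs = []
    for k in range(resp.shape[1]):
        diff = X - means[k]
        # Responsibility-weighted spread, with a tiny ridge for numerical stability.
        covs.append((resp[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d))
    return weights, means, np.stack(covs)
```

Note the symmetry: the E-step trusts the bells and updates the points' probabilities; the M-step trusts the probabilities and updates the bells.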

5️⃣ Repeat Until It Stabilizes

GMM alternates between the E-step and M-step over and over.

Each iteration, the bells get better at explaining the data. Each iteration, the probability assignments get more accurate.

Eventually, the bells stop moving meaningfully. The algorithm has converged. The clusters have found their shape.

This back-and-forth loop is called Expectation-Maximization — or EM — and it's one of the most elegant ideas in all of machine learning.
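In practice you rarely write the loop yourself; scikit-learn's `fit` runs EM until the log-likelihood improvement drops below `tol` or `max_iter` rounds pass. A small sketch on made-up data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Two well-separated synthetic blobs.
X = np.vstack([
    rng.normal([0, 0], 1.0, size=(150, 2)),
    rng.normal([4, 4], 0.5, size=(150, 2)),
])

gmm = GaussianMixture(n_components=2, tol=1e-3, max_iter=100, random_state=0)
gmm.fit(X)

print(gmm.converged_)  # whether EM stabilized before hitting max_iter
print(gmm.n_iter_)     # how many E/M rounds it took
```

The fitted `gmm.means_`, `gmm.covariances_`, and `gmm.weights_` are the final positions, shapes, and weights of the bells.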

TLDR: The GMM Mood Board

  • Step 1: Assumes the data is a mix of bell curves. Vibe: “This is all Gaussians, deep down.”

  • Step 2: Places the bells somewhere to start. Vibe: “We’ll figure it out.”

  • Step 3: Assigns soft probabilities to each point. Vibe: “You’re 60% mine.”

  • Step 4: Rebuilds the bells based on those probabilities. Vibe: “Let me reshape around you.”

  • Step 5: Repeats until stable. Vibe: “We’ve reached an understanding.”

Conclusion

Where t-SNE showed you where clusters live, GMM tells you what they are — and how confidently any given point belongs to each one.

It's not just clustering. It's probabilistic reasoning about structure.

GMM takes the messy, overlapping reality of data and, instead of forcing clean lines, it embraces the fuzziness. It models uncertainty as a feature, not a bug.

Because sometimes the most honest answer isn't "you belong here."

It's "you're probably here — but don't rule out there."

In our next and final episode, our next contestant walks in with a ruler, a hierarchy, and zero interest in specifying the number of clusters ahead of time: Hierarchical Clustering.

Stay tuned. The auditions continue.
