In our last episode, we had our analytical detective, Decision Tree, take center stage to wow us all.

Today, however, we have an interesting star. This guy is all about awareness.

Ever meet someone in your neighborhood who seems to know what everyone else is up to?

Well, I’m introducing you to the Machine Learning variant of that nosy neighbor:

K-Nearest Neighbor!

K-Nearest Neighbor: The Ultimate Eavesdropper!

🕵️ K-Nearest Neighbor (KNN) - “The Neighborhood Watch.”

Forget about complex algorithms for a minute.

Imagine you’re at a party, and you meet someone new. You want to figure out what kind of person they are (e.g., are they a quiet introvert or a loud party animal?).

How would you do it?

You’d look at the people around them (or the lack of them)! If they’re quietly hanging by themselves in the corner, yeah, probably an introvert.

If they’re loud, chatty, and in the middle of a game of charades, probably a party animal.

This is how KNN works in machine learning.

KNN is a simple machine learning algorithm that guesses what something is by looking at the “K” most similar things nearby and choosing the majority.

🤖 How KNN Works

KNN is an algorithm used for both classification (predicting a category) and regression (predicting a number).

The core idea revolves around three simple questions:

1. “Who are your neighbors?”
   When you have a new, unknown piece of data (like our new person at the party), KNN looks at the K data points that are "closest" to it in the training data.
2. “What are your neighbors like?”
   - For Classification: If most of its K closest neighbors belong to a certain category (e.g., "introvert"), then the new data point is assigned that same category. It's a "majority vote."
   - For Regression: If you're predicting a number, KNN takes the average (or median) of the values of its K closest neighbors.
3. “How do we measure closest?”
   This is usually done using a distance metric. The most common one is Euclidean Distance (think of it as the straight-line distance between two points on a graph). But it could be others, depending on the data.
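To make those three questions concrete, here is a minimal from-scratch sketch in plain Python. The guest data, the feature meanings, and the function names (`nearest_neighbors`, `knn_classify`, `knn_regress`) are all made up for illustration. It is a sketch under those assumptions, not a production implementation.

```python
import math
from collections import Counter

def nearest_neighbors(train, query, k):
    """Return the k training rows closest to `query`.

    `train` is a list of (features, target) pairs.
    """
    # "How do we measure closest?" -> Euclidean distance (math.dist)
    by_distance = sorted(train, key=lambda row: math.dist(row[0], query))
    # "Who are your neighbors?" -> the first k rows after sorting
    return by_distance[:k]

def knn_classify(train, query, k):
    """Classification: majority vote among the k nearest labels."""
    neighbors = nearest_neighbors(train, query, k)
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def knn_regress(train, query, k):
    """Regression: average of the k nearest numeric targets."""
    neighbors = nearest_neighbors(train, query, k)
    return sum(value for _, value in neighbors) / len(neighbors)

# Hypothetical party data: (chattiness, loudness) on a 0-10 scale.
guests = [
    ((1.0, 2.0), "introvert"),
    ((1.5, 1.0), "introvert"),
    ((2.0, 2.5), "introvert"),
    ((7.0, 8.0), "party animal"),
    ((8.0, 7.5), "party animal"),
    ((9.0, 9.0), "party animal"),
]

# Same made-up guests, but the target is now a number: hours at the party.
stay_hours = [
    ((1.0, 2.0), 1.0),
    ((1.5, 1.0), 2.0),
    ((2.0, 2.5), 3.0),
    ((7.0, 8.0), 5.0),
    ((8.0, 7.5), 6.0),
    ((9.0, 9.0), 7.0),
]

new_guest = (2.5, 2.0)
print(knn_classify(guests, new_guest, k=3))     # introvert
print(knn_regress(stay_hours, new_guest, k=3))  # 2.0
```

The new guest's three closest neighbors are all introverts, so the vote is unanimous, and the regression answer is just the average of their three stay times: (1 + 2 + 3) / 3.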

📚 What Is “K” In KNN?

Simple! In K-Nearest Neighbor, "k" tells the algorithm how many nearby data points to consider for making a prediction.

If k = 5, it looks at the 5 closest points and picks the most common class among them. This helps decide what the new data point should be classified as.
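Here is a small sketch of why the choice of K matters. The points and labels below are invented so that the single closest neighbor disagrees with the wider neighborhood, and `knn_classify` is a hypothetical helper name, not a library function.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Made-up data: one introvert sits very close to the query,
# but most of the surrounding points are party animals.
points = [
    ((1.0, 1.0), "introvert"),
    ((0.0, 3.0), "party animal"),
    ((3.0, 0.0), "party animal"),
    ((3.0, 3.0), "party animal"),
    ((-3.0, -3.0), "introvert"),
]
query = (0.5, 0.5)

for k in (1, 3, 5):
    print(k, knn_classify(points, query, k))
# 1 introvert
# 3 party animal
# 5 party animal
```

With K = 1 the prediction copies the single closest point. With K = 3 or K = 5, the two or three party animals outvote the introverts, so the answer flips.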

Image Credit: Amit Chauhan

✍️ How Does KNN Find Its “Nearest” Neighbor?

KNN uses distance metrics, which basically measure how far apart data points are. The most common one is Euclidean Distance (say ‘hi’ to those straight lines in the image above).

Euclidean Distance

Recognize this guy? He’s kind of a big deal in math.

Euclid: The guy who made straight lines a big deal for 2000 years.

That’s the Greek mathematician, Euclid. He did a whole lot more than just nail the look of a “mage” in an RPG video game.

He built an entire system of geometry based on just a few rules, which has remained a gold standard for over 2000 years.

Based on his intellectual groundwork, Euclidean Distance was formed. Euclidean Distance is the straight-line distance between 2 points.

Alright, heads up, here comes the formula for it:

d = √((x₂ − x₁)² + (y₂ − y₁)²)

Math’s way of saying: “How far is too far?”

See (x₁, y₁) and (x₂, y₂)? Those are literally just coordinates of two points. You simply follow these 4 steps:

1. Subtract the coordinates: Find how far apart the points are horizontally and vertically: (x₂ − x₁) and (y₂ − y₁).
2. Square the differences: This gets rid of negatives and gives you the squared "legs" of a right triangle.
3. Add the squares: Remember the Pythagorean Theorem? You’re applying it here: a² + b² = c².
4. Take the square root: This gives you the actual straight-line distance (the hypotenuse).

Let’s run through a quick example, yeah?

Let’s say our (x₁, y₁) and (x₂, y₂) are: (1, 2) and (4, 6).

Following the four steps above, we get the following (take a moment to try to figure this out on your own, by the way):

Euclidean Distance = √((4 − 1)² + (6 − 2)²) = √(3² + 4²) = √(9 + 16) = √25 = 5
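If you'd rather let Python check the arithmetic, here is a minimal sketch that walks through the same four steps for 2D points. The function name `euclidean_distance` is just an illustrative choice. The last line cross-checks the result against the standard library's `math.dist`.

```python
import math

def euclidean_distance(p1, p2):
    """Straight-line distance between two 2D points, one step at a time."""
    x1, y1 = p1
    x2, y2 = p2
    # Step 1: subtract the coordinates
    dx = x2 - x1
    dy = y2 - y1
    # Step 2: square the differences
    dx_squared = dx ** 2
    dy_squared = dy ** 2
    # Step 3: add the squares
    total = dx_squared + dy_squared
    # Step 4: take the square root
    return math.sqrt(total)

d = euclidean_distance((1, 2), (4, 6))
print(d)  # 5.0

# Cross-check with the built-in helper (Python 3.8+).
print(math.isclose(d, math.dist((1, 2), (4, 6))))  # True
```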

Voila! Now there are other distance metrics used in KNN, such as Manhattan Distance and Minkowski Distance (which I’ll show you in the future), but the most common one is Euclidean Distance.
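As a quick preview, and only as a rough sketch, all three metrics can be written as one formula. Minkowski Distance takes a power p: with p = 1 it is Manhattan Distance (add up the absolute differences), and with p = 2 it is Euclidean Distance. The function name `minkowski_distance` and its `power` parameter are illustrative choices, not a library API.

```python
import math

def minkowski_distance(p, q, power):
    """Minkowski distance between two points given as equal-length tuples."""
    total = sum(abs(a - b) ** power for a, b in zip(p, q))
    return total ** (1 / power)

a, b = (1, 2), (4, 6)

manhattan = minkowski_distance(a, b, power=1)  # |4 - 1| + |6 - 2| = 7
euclidean = minkowski_distance(a, b, power=2)  # same points as the example above

print(manhattan)
print(math.isclose(euclidean, 5.0))  # True
```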

So get comfortable with it!

✅ Advantages & Disadvantages

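KNN is simple to understand and implement, and it works well on small to medium-sized datasets.

Those are some brownie points for sure. But there are some slip-ups as well.

KNN starts to falter on large datasets because, in its basic form, it has to measure the distance from the new point to every stored training point each time it makes a prediction. It also struggles with data that has a lot of dimensions, where distances between points become less informative.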

👨‍💻 So…Do You Actually Understand KNN?

You understand KNN now. But understanding the theory and actually building it? Two different things.


Supervised Learning's Got Talent - Episode 4