Last time on Supervised Learning’s Got Talent, we were introduced to the talented decision maker of the show, Logistic Regression.

Today, we’re going to branch out and meet our next Supervised Learning contestant. Like Logistic Regression, this contestant is skilled at making decisions. So what’s the difference?

They are a lot more detective-like in their process.

Introducing the Decision Tree!

Decision Tree. They may not produce oxygen, but they sure make some definitive calls!

🌳 Decision Tree - “The Overthinking Oracle”

Remember how Logistic Regression is good at making decisions? Well, Decision Trees do this too (I mean, it’s right in their name, right?), but with a key difference:

Decision Trees ask a lot of questions before getting to a decision.

Imagine that you’re trying to decide whether to go out for ice cream. What are you going to do? You’ll probably ask a series of questions:

“Is the ice cream shop even open?” (Yes/No)
“Do I have money?” (Yes/No)
“Am I willing to deal with the guilt later?” (Yes/No)

Notice how each question is relevant to the decision? That’s exactly how a Decision Tree works!

It’s literally a flowchart of questions that leads to a decision or a prediction.

At their core, Decision Trees break your data into smaller and smaller chunks by asking yes/no questions at every level—like a nosy neighbor who won't stop until they know everything about your weekend.
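To make the flowchart idea concrete, here’s a minimal sketch of the ice cream decision written as plain Python. The function name `should_get_ice_cream` and its three yes/no inputs are hypothetical, lifted from the questions above. A real Decision Tree learns its questions from data instead of having them hard-coded like this; the point is only the shape: each `if` is a question, and each `return` is a final answer.

```python
def should_get_ice_cream(shop_open: bool, have_money: bool, ok_with_guilt: bool) -> str:
    """Hand-written decision tree: each `if` asks a question, each `return` is an answer."""
    # First question: if the shop is closed, nothing else matters.
    if not shop_open:
        return "Stay home"
    # Second question: can I actually pay?
    if not have_money:
        return "Stay home"
    # Last question before the final decision.
    if ok_with_guilt:
        return "Go get ice cream"
    return "Stay home"


print(should_get_ice_cream(shop_open=True, have_money=True, ok_with_guilt=True))   # Go get ice cream
print(should_get_ice_cream(shop_open=True, have_money=False, ok_with_guilt=True))  # Stay home
```

Notice how a “No” at any step ends the walk early. Each question narrows down which situation you’re in, which is the “smaller and smaller chunks” idea in action.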

🤖 Components Of A Decision Tree

Root Node: The topmost node. It holds the question (feature) that splits the data best, which is why the tree asks it first.

Internal Nodes: These are points where the tree makes decisions based on certain features. Each node asks a yes/no question to help further divide the data! Kind of like mini crossroad signs within the tree.

Branches: These are the paths you follow after answering a question at a node. They lead you to the next node/outcome, like following a breadcrumb trail.

Leaf Nodes: These nodes provide the final decision/result. Think of them as the end of the trail where you get your answer or prediction, such as classifying data or predicting a value.

Decision Tree: The ultimate family tree for data nerds.

Let’s say you’re trying to decide what show to watch on Netflix.

It all starts at the Root Node, with a big question like “Am I feeling adventurous?”

Based on your answer (that’s a branch!), you travel down to an Internal Node for another question, say, “Is my partner awake?”

You keep following branches through more internal questions until you land on a Leaf Node: your definitive choice, such as “Rewatch Game of Thrones!”
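The components above map neatly onto a tiny data structure. Here’s a minimal sketch in Python, assuming a tree made only of yes/no questions: `Question` plays the Root and Internal Nodes, `Leaf` holds the final decision, and the `yes`/`no` fields are the branches. The class names, the `decide` helper, and every leaf except “Rewatch Game of Thrones!” are hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Leaf Node: the end of the trail, holding the final decision."""
    decision: str


@dataclass
class Question:
    """Root or Internal Node: a yes/no question whose two branches lead onward."""
    text: str
    yes: Union["Question", Leaf]  # branch followed when the answer is yes
    no: Union["Question", Leaf]   # branch followed when the answer is no


def decide(node: Union[Question, Leaf], answers: dict) -> str:
    """Start at the root and follow branches until a Leaf Node is reached."""
    while isinstance(node, Question):
        node = node.yes if answers[node.text] else node.no
    return node.decision


# Hypothetical hand-built Netflix tree (a real tree would be learned from data).
netflix_tree = Question(
    text="Am I feeling adventurous?",          # Root Node
    yes=Leaf("Try a brand-new series"),
    no=Question(
        text="Is my partner awake?",           # Internal Node
        yes=Leaf("Pick a comfort show together"),
        no=Leaf("Rewatch Game of Thrones!"),
    ),
)

answers = {"Am I feeling adventurous?": False, "Is my partner awake?": False}
print(decide(netflix_tree, answers))  # Rewatch Game of Thrones!
```

Only the questions on the path you actually walk get asked: answering “yes” at the root skips the partner question entirely.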

Here’s yet another example of the Decision Tree in action:

If you’re intuitive, you may say to yourself: Okay, yeah, great. Decision Trees ask lots of questions, but how do they know which question to ask next?

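The answer: at every node, the tree scores each possible question with a splitting rule, most commonly Information Gain or the Gini Index, and asks the one that separates the data most cleanly.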

📚 Information Gain

Information Gain is all about finding the question that makes your data less confusing. It measures how much a question reduces Entropy: the measurement of how messy your data is. Here’s the Entropy formula:

$$\text{Entropy}(S) = -\sum_{i} p_i \log_2(p_i)$$

Entropy: Math’s fancy way of saying “It’s a hot mess in here.”

p_i
→ This is the proportion (fraction) of things in a group that belong to category i.
Example: If 7 out of 10 nuts are tasty, p_i = 0.7 for the tasty category.
log₂(p_i)
→ This asks: How surprising is it to find this thing?
The lower the probability, the more surprising it is.
Multiply: p_i × log₂(p_i)
→ This combines: how common something is × how surprising it is.
Sum them all up (Σ)
→ Add this up for all categories (like tasty nuts and yucky nuts).
Put a minus sign in front (–)
→ Every p_i is between 0 and 1, so each log₂(p_i) is zero or negative, and so is the sum. The minus sign flips it so entropy is never negative: 0 for a perfectly pure group, and at most 1 for a two-category group (a 50/50 mix).
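Here’s a minimal sketch of the entropy formula in Python, plus the Information Gain calculation it feeds into: the parent group’s entropy minus the size-weighted average entropy of the groups a question splits it into. The helper names `entropy` and `information_gain`, the “Is the shell unbroken?” question, and the way it splits the nuts are all made-up illustration, not a library API or real data.

```python
import math
from collections import Counter


def entropy(labels):
    """-sum(p_i * log2(p_i)) over every category present in `labels`."""
    total = len(labels)
    result = 0.0
    for count in Counter(labels).values():
        p = count / total            # p_i: share of this category
        result -= p * math.log2(p)   # the leading minus sign, applied term by term
    return result


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# 7 tasty nuts and 3 yucky nuts, as in the example above.
nuts = ["tasty"] * 7 + ["yucky"] * 3
print(round(entropy(nuts), 3))  # 0.881

# Hypothetical question "Is the shell unbroken?" splits the nuts into two piles.
left = ["tasty"] * 6 + ["yucky"] * 1
right = ["tasty"] * 1 + ["yucky"] * 2
print(round(information_gain(nuts, [left, right]), 3))  # 0.192
```

A tree using this rule would compute the gain for every candidate question and ask the one with the biggest gain, since that question removes the most mess.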

✍️ Gini Index

One more thing I want to show you is the Gini Index. It measures how mixed, or impure, a dataset is.

While Information Gain asks, “How much cleaner did I make this data?”, the Gini Index asks, “How clean is this pile?”

It measures impurity directly, using this equation:

$$\text{Gini}(S) = 1 - \sum_{i} p_i^2$$

Believe it or not, this equation tells you how mixed up your life really is.

p_i = the probability (or proportion) of class i in the dataset.
Sum ∑ = add this up for all classes.
p_i² = multiply each class’s probability by itself.
Subtract from 1 = this gives you the impurity: 0 for a perfectly pure pile, and higher the more mixed it is (at most 0.5 with two classes).
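The 7-tasty/3-yucky nut pile from the Entropy section can be scored with the Gini Index too. This minimal sketch computes 1 − Σ p_i², then, as a rough illustration of how a tree picks its next question, compares two hypothetical candidate questions by the size-weighted Gini of the piles they create (lower means cleaner). The function names and both candidate splits are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """1 - sum(p_i ** 2) over every class present in `labels`."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def weighted_gini(children):
    """Average Gini of the child piles, weighted by how many items each holds."""
    total = sum(len(child) for child in children)
    return sum(len(child) / total * gini(child) for child in children)


nuts = ["tasty"] * 7 + ["yucky"] * 3
print(round(gini(nuts), 2))  # 0.42, i.e. 1 - 0.7**2 - 0.3**2

# Two hypothetical candidate questions, each splitting the same 10 nuts.
candidates = {
    "Is the shell unbroken?": [["tasty"] * 6 + ["yucky"], ["tasty"] + ["yucky"] * 2],
    "Is it a large nut?": [["tasty"] * 4 + ["yucky"] * 2, ["tasty"] * 3 + ["yucky"]],
}
# Pick the question whose resulting piles are the least mixed on average.
best = min(candidates, key=lambda q: weighted_gini(candidates[q]))
print(best)  # Is the shell unbroken?
```

Here the shell question leaves a weighted Gini of about 0.305, versus about 0.417 for the size question, so it would be asked first.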

👨‍💻 Coding Exercise + Practice Quiz + Visual PDF

And that’s the Decision Tree. The best way to actually make this stick is to get your hands on it. Run the code and test your knowledge.

Let’s now decide to practice.

Alright, you made it this far… respect 🤝

Want to unlock the rest? Join the Pro Tier and get the good stuff. In other words, the stuff I didn't gatekeep. 😊 This paid tier is brand new, which means you have a chance to be one of the first people in.


What you unlock:

⚡ Founding Member Deal: First 100 subscribers lock in $5/month for life. After that, it's $9.99/month
Cheat Sheets / Visual Guides (Weekly)
10-Question Quiz (Easy / Medium / Hard) + Solutions (Weekly)
Hands-On Notebook: Google Colab Link + Jupyter Notebook For Coding Implementation (Weekly)
Discount Codes For All E-Books

Supervised Learning's Got Talent - Episode 3