Understanding Classification in Machine Learning

Written by f1r3_wh1sk3y | Published 2025/09/17
Tech Story Tags: machine-learning | classification | binary-classification | one-vs-many | one-vs-one | what-is-classification | multi-class-classification | different-classifications


Machine Learning, Artificial Intelligence, and Generative AI are fields of engineering that are advancing at sky-high rates today. Since we’re walking through these changes in a series of blogs, I’ll attempt to break down the more abstruse concepts of this sphere in the simplest ways I can. Without further ado, let’s jump in.

Classification

As the name suggests, classification is a broad umbrella of algorithms tasked with classifying raw data into preset labels.

Too technical? Here’s the simpler version.

A classification model is nothing more than a sequence of steps that helps us sort raw data into categories of our choice. It can be as simple as labeling a picture as sky or earth, or as complex as classifying a retina scan as proof of whether someone may be suffering from Alzheimer’s disease.

It’s important to note that the labels in classification form a categorical variable with discrete values like yes/no or sky/earth.

Having painted a picture of what classification encompasses, let’s now look at some real-world applications.

What is Classification Used For?

Classification has a wide variety of applications across industries. Many problems can be expressed as associations between features and target variables, especially when labeled data is available.

Some practical use cases include:

  • Email filtering
  • Speech-to-text
  • Handwriting recognition
  • Biometric identification
  • Document classification

In the world of consulting, two use cases stand out: churn prediction and customer segmentation.

Churn prediction: The term churn refers to a customer discontinuing a service provided by a company. For instance, if you cancel your Spotify subscription to switch to YouTube Music, that’s churn. Predicting churn means using classification algorithms to figure out if a customer is likely to leave.

Customer segmentation: This involves predicting which category a customer belongs to—for example, whether a passenger is more likely to book first class, business class, or economy on an airline.

Binary vs. Multi-Class Classification

Before diving into algorithms, let’s clarify the difference between binary and multi-class classification. Binary classification involves predictions limited to two possible outcomes (e.g., churn or no churn). Multi-class classification involves predictions among more than two outcomes (e.g., economy, business, or first class).
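As a toy illustration of the binary case, here is a minimal hand-written "classifier" that sorts customers into churn vs. no churn. The feature name and threshold are invented for illustration; a real model would learn its decision rule from labeled data rather than have it hard-coded:

```python
def predict_churn(months_inactive, threshold=3):
    """Toy binary classifier: exactly two possible outputs.

    Labels a customer 'churn' if they have been inactive for at
    least `threshold` months (an assumed, made-up rule).
    """
    return "churn" if months_inactive >= threshold else "no churn"

print(predict_churn(5))  # → churn
print(predict_churn(1))  # → no churn
```

The key property is the output space: no matter the input, the prediction is always one of exactly two labels.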

To pull this off in the real world, engineers use algorithms such as Naive Bayes, Logistic Regression, Decision Trees, K-Nearest Neighbors (KNN), Support Vector Machines (SVMs), and Neural Networks.

Some of these algorithms can directly handle multiple classes, while others only work with binary outcomes. For the latter, we extend them into multi-class problems using strategies like One-vs-All and One-vs-One. Let’s dive deeper.

One-vs-All Strategy

You might come across tough textbook definitions of this, but at its core, One-vs-All (OvA) is simply reusing a binary classifier as many times as the number of classes present.

Here’s an easy example. Imagine you’re classifying colors into Red, Green, or Blue.

Classifier 1 → Is it Red or Not Red?

Classifier 2 → Is it Green or Not Green?

Classifier 3 → Is it Blue or Not Blue?

| Color  | R   | G   | B   |
|--------|-----|-----|-----|
| Red    | 255 | 0   | 0   |
| Green  | 0   | 255 | 0   |
| Blue   | 0   | 0   | 255 |
| Yellow | 255 | 255 | 0   |

Each classifier is trained separately. For example, the “Red classifier” only decides whether a given color is red or not.

If we feed in a new color (200, 50, 50), the classifiers might output:

Red classifier → 0.8 (high probability)

Green classifier → 0.1 (low probability)

Blue classifier → 0.1 (low probability)

Since the Red classifier is most confident, the model classifies the color as Red.
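The OvA logic above can be sketched in plain Python. Each per-class "classifier" here is a toy stand-in that scores its confidence by how close the input is to a prototype color (a real OvA setup would train each binary classifier on labeled data); the prototype values and scoring rule are assumptions made for illustration:

```python
import math

# Prototype RGB values taken from the table above.
PROTOTYPES = {
    "Red":   (255, 0, 0),
    "Green": (0, 255, 0),
    "Blue":  (0, 0, 255),
}

MAX_DIST = math.dist((0, 0, 0), (255, 255, 255))

def make_ova_classifier(target):
    """Build a toy binary scorer for 'target vs. not target'."""
    proto = PROTOTYPES[target]
    def score(color):
        # Confidence in [0, 1]: 1.0 means identical to the prototype.
        return 1.0 - math.dist(color, proto) / MAX_DIST
    return score

def predict_ova(color):
    # One binary classifier per class; the most confident one wins.
    scores = {name: make_ova_classifier(name)(color) for name in PROTOTYPES}
    return max(scores, key=scores.get)

print(predict_ova((200, 50, 50)))  # → Red
```

Note the shape of the strategy: three classes, three binary classifiers, and a final argmax over their confidence scores.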

One-vs-One Strategy

In contrast, the One-vs-One (OvO) strategy changes the question from “Is it this or not?” to “Is it this or that?”

Here’s where a bit of high-school math sneaks in. For a problem with N classes, we train a classifier for every possible pair of classes. The total number of classifiers is:

N × (N − 1) / 2

That’s just the number of ways to choose a pair from N items (a quick combinations refresher!).
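As a quick sanity check that this formula counts pairs, we can compare it against Python’s built-in binomial coefficient:

```python
import math

# math.comb(n, 2) counts the ways to choose an unordered pair from n items.
for n in (3, 4, 10):
    assert math.comb(n, 2) == n * (n - 1) // 2
    print(n, "classes →", math.comb(n, 2), "pairwise classifiers")
```

So three classes need 3 classifiers, four need 6, and ten already need 45.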

For our three colors—Red, Green, Blue—we’d end up with:

Classifier 1 → Red vs. Green

Classifier 2 → Green vs. Blue

Classifier 3 → Blue vs. Red

Now, if we test the color (200, 50, 50):

Red vs. Green → predicts Red

Green vs. Blue → predicts Blue

Blue vs. Red → predicts Red

Here, Red wins the majority vote, so the final prediction is Red.
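The majority vote can be sketched the same way. Each pairwise "classifier" below is a toy stand-in that simply picks whichever class prototype is nearer to the input; in a real OvO setup, each pair of classes would get its own trained binary model. The prototype values are illustrative assumptions:

```python
from itertools import combinations

# Prototype RGB values taken from the table above.
PROTOTYPES = {
    "Red":   (255, 0, 0),
    "Green": (0, 255, 0),
    "Blue":  (0, 0, 255),
}

def pairwise_vote(color, a, b):
    """Toy 'a vs. b' classifier: vote for the nearer prototype."""
    da = sum((c - p) ** 2 for c, p in zip(color, PROTOTYPES[a]))
    db = sum((c - p) ** 2 for c, p in zip(color, PROTOTYPES[b]))
    return a if da <= db else b

def predict_ovo(color):
    # N × (N − 1) / 2 classifiers: one per unordered pair of classes.
    pairs = combinations(PROTOTYPES, 2)
    votes = [pairwise_vote(color, a, b) for a, b in pairs]
    # The class with the most pairwise wins takes the majority vote.
    return max(set(votes), key=votes.count)

print(predict_ovo((200, 50, 50)))  # → Red
```

With three classes, `combinations` yields exactly the three pairs from the walkthrough above, and Red collects the majority of the votes.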

Conclusion

Through this article, I’ve tried to ease into the world of machine learning classification by keeping the explanations simple and intuitive. We touched upon what classification means, how binary and multi-class problems differ, and how One-vs-All and One-vs-One strategies work—with a little help from combinations and probability.

We haven’t yet gone into the deeper mathematics or coding implementations, but this foundation should set the stage for more detailed discussions in the upcoming write-ups.


Published by HackerNoon on 2025/09/17