Machine Learning, Artificial Intelligence, and Generative AI are fields of engineering that are advancing at a breakneck pace today. Since we're walking through these changes in a series of blogs, I'll attempt to break down the more abstruse concepts in this space in the simplest ways I can. Without further ado, let's jump in.

Classification

As the name suggests, classification is a broad umbrella of algorithms tasked with sorting raw data into preset labels. Too technical? Here's the simpler version: a classification model is nothing more than a sequence of steps that helps us sort raw data into categories of our choice. It can be as simple as labeling a picture as sky or earth, or as complex as reading a retina scan for signs that someone may be suffering from Alzheimer's disease. It's important to note that the labels in classification form a categorical variable with discrete values like yes/no or sky/earth.

Having painted a picture of what classification encompasses, let's now look at some real-world applications.

What is Classification Used For?

Classification has a wide variety of applications across industries. Many problems can be expressed as associations between features and target variables, especially when labeled data is available. Some practical use cases include:

- Email filtering
- Speech-to-text
- Handwriting recognition
- Biometric identification
- Document classification

In the world of consulting, two use cases stand out: churn prediction and customer segmentation.

- Churn prediction: The term churn refers to a customer discontinuing a service provided by a company. For instance, if you cancel your Spotify subscription to switch to YouTube Music, that's churn. Predicting churn means using classification algorithms to figure out whether a customer is likely to leave.
- Customer segmentation: This involves predicting which category a customer belongs to, for example, whether a passenger is more likely to book first class, business class, or economy on an airline.

Binary vs. Multi-Class Classification

Before diving into algorithms, let's clarify the difference between binary and multi-class classification.

- Binary classification involves predictions limited to two possible outcomes (e.g., churn or no churn).
- Multi-class classification involves predictions among more than two outcomes (e.g., economy, business, or first class).

To pull this off in the real world, engineers use algorithms such as Naive Bayes, Logistic Regression, Decision Trees, K-Nearest Neighbors (KNN), Support Vector Machines (SVMs), and Neural Networks. Some of these algorithms can directly handle multiple classes, while others only work with binary outcomes. For the latter, we extend them to multi-class problems using strategies like One-vs-All and One-vs-One. Let's dive deeper.

One-vs-All Strategy

You might come across dense textbook definitions of this, but at its core, One-vs-All (OvA) simply reuses a binary classifier as many times as there are classes. Here's an easy example, with a small code sketch right after the list. Imagine you're classifying colors into Red, Green, or Blue:

- Classifier 1 → Is it Red or Not Red?
- Classifier 2 → Is it Green or Not Green?
- Classifier 3 → Is it Blue or Not Blue?
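To make this concrete, here's a minimal sketch of the idea in Python. It's a sketch under assumptions: I'm picking scikit-learn's LogisticRegression as the binary classifier (any binary classifier would do) and hand-rolling the OvA loop so you can see one "Is it X or Not X?" model per class.

```python
# Minimal One-vs-All sketch (assumes scikit-learn; any binary classifier works).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: one pure RGB value per color, scaled to [0, 1]
# (the same encoding as the table below).
X = np.array([
    [255, 0, 0],   # Red
    [0, 255, 0],   # Green
    [0, 0, 255],   # Blue
]) / 255.0
labels = ["Red", "Green", "Blue"]

# Train one binary classifier per class: "this class" vs. "everything else".
classifiers = {}
for label in labels:
    y_binary = np.array([1 if name == label else 0 for name in labels])
    classifiers[label] = LogisticRegression().fit(X, y_binary)

# Score a new color with every classifier; the most confident one wins.
new_color = np.array([[200, 50, 50]]) / 255.0
scores = {label: clf.predict_proba(new_color)[0, 1]
          for label, clf in classifiers.items()}
print(scores)                         # the Red classifier should score highest
print(max(scores, key=scores.get))    # -> Red
```

With that sketch in mind, let's look at the toy data itself.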
Each color is just a triple of R, G, and B values:

| Color  | R   | G   | B   |
|--------|-----|-----|-----|
| Red    | 255 | 0   | 0   |
| Green  | 0   | 255 | 0   |
| Blue   | 0   | 0   | 255 |
| Yellow | 255 | 255 | 0   |

Each classifier is trained separately. For example, the "Red classifier" only decides whether a given color is red or not. If we feed in a new color (200, 50, 50), the classifiers might output:

- Red classifier → 0.8 (high probability)
- Green classifier → 0.1 (low probability)
- Blue classifier → 0.1 (low probability)

Since the Red classifier is the most confident, the model classifies the color as Red.

One-vs-One Strategy

In contrast, the One-vs-One (OvO) strategy changes the question from "Is it this or not?" to "Is it this or that?" Here's where a bit of high-school math sneaks in. For a problem with N classes, we train a classifier for every possible pair of classes, so the total number of classifiers is:

N × (N − 1) / 2

That's just the number of ways to choose a pair from N items (a combinations refresher: order doesn't matter when picking a pair!). For our three colors, that works out to 3 × 2 / 2 = 3 classifiers:

- Classifier 1 → Red vs. Green
- Classifier 2 → Green vs. Blue
- Classifier 3 → Blue vs. Red

Now, if we test the color (200, 50, 50):

- Red vs. Green → predicts Red
- Green vs. Blue → predicts Blue
- Blue vs. Red → predicts Red

Here, Red wins the majority vote (two classifiers out of three), so the final prediction is Red.

Conclusion

Through this article, I've tried to ease into the world of machine learning classification by keeping the explanations simple and intuitive. We touched upon what classification means, how binary and multi-class problems differ, and how One-vs-All and One-vs-One strategies work, with a little help from combinations and probability. We haven't yet gone into the deeper mathematics or full coding implementations, but this foundation should set the stage for more detailed discussions in the upcoming write-ups.
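As a small teaser of those write-ups, here's what both strategies look like with ready-made tools. Again, this is a sketch assuming scikit-learn: its OneVsRestClassifier and OneVsOneClassifier wrappers implement OvA and OvO around any binary estimator.

```python
# Teaser sketch (assumes scikit-learn): OvA and OvO as ready-made wrappers.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# The same toy RGB data, scaled to [0, 1].
X = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]]) / 255.0
y = np.array(["Red", "Green", "Blue"])
new_color = np.array([[200, 50, 50]]) / 255.0

# One-vs-All: one binary classifier per class.
ova = OneVsRestClassifier(LogisticRegression()).fit(X, y)
# One-vs-One: one binary classifier per pair of classes.
ovo = OneVsOneClassifier(LogisticRegression()).fit(X, y)

print(len(ova.estimators_), ova.predict(new_color))  # 3 classifiers -> ['Red']
print(len(ovo.estimators_), ovo.predict(new_color))  # N(N-1)/2 = 3  -> ['Red']
```

Note how the classifier counts line up with what we derived above: three "X or Not X" models for OvA, and N × (N − 1) / 2 = 3 pairwise models for OvO.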