Hey friends! 👋 It's me, Miss Neura, here today to unpack the Naive Bayes classifier.
Now I know "naive" doesn't sound very flattering in the name. 😅 But don't let that fool you!
Naive Bayes is actually a super simple yet powerful algorithm for classification tasks like spam detection and sentiment analysis.
It works by calculating conditional probabilities based on Bayes' theorem and an assumption of independence between features.
I know that sounds a little math-y, but stick with me! 🤓 I'll break down Bayes and the "naive" assumption piece by piece in easy-to-understand terms.
By the end, you'll have a clear understanding of how Naive Bayes turns data into predictions for categorical labels. The key is picking the class with the highest probability!
Let's start with a quick history lesson to see where Naive Bayes originated before we dive into the nitty gritty details. ⏳
The original Bayes' theorem dates back to the 1700s when Thomas Bayes first described it.
The theorem provided a way to calculate conditional probabilities.
It laid the foundation for understanding evidence-based statistics and probabilistic reasoning.
Over the years, Bayes' theorem became an important tool across fields like economics, medicine, and computing.
Fast forward to the 1960s, when researchers started applying Bayes' theorem to classification problems in machine learning.
But estimating all the probabilities needed took a lot of computation.
Then in the 1990s, the "naive" conditional independence assumption dramatically simplified calculations. 💡
This breakthrough yielded the Naive Bayes classifier algorithm we know and love today! 🥰
Now let's dive into exactly how Naive Bayes works its probabilistic magic! 🎩 ✨
The "naive" in Naive Bayes comes from an assumption - all the "features" we use are totally independent of each other once we know the class! 🤔
For example, say we're building a spam filter using words in the email as features. 📧
The naive assumption means that whether the word "free" appears has nothing to do with whether the word "money" appears, once we know the email's class. 💰
In the real world, this is often false - spam emails tend to have multiple sketchy words together. 😬
But it makes the math so much easier! We just calculate the probability of each word on its own.
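Written out, the naive assumption lets us break a joint probability into a simple product of per-word probabilities:
P("free", "money" | spam) = P("free" | spam) x P("money" | spam)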
To classify an email as spam or not spam, we:
1️⃣ Find the base rate of spam emails (the prior probability of spam)
2️⃣ Calculate the probability of each word appearing in spam emails and in not-spam emails (the likelihoods)
3️⃣ Use Bayes' theorem to multiply these together and get a score for the posterior probability that the email is spam
Posterior score = Prior x Likelihood1 x Likelihood2 x Likelihood3... 🧮
(We can skip dividing by the denominator in Bayes' theorem because it's the same for both classes, and we only care which score is bigger.)
4️⃣ Compare the posterior score for spam vs not spam
Whichever posterior is higher tells us how to classify the email! 🎉
So in a nutshell: multiply the prior by the likelihoods for each class, and pick the class with the bigger result.
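To make those steps concrete, here's a minimal Python sketch of the decision rule. The function and the tiny fallback value for unseen words are illustrative choices of mine, not a production-ready spam filter.

```python
# Minimal Naive Bayes decision rule: score each class with
# prior x likelihood(word1) x likelihood(word2) x ... and pick the winner.

def classify(words, priors, likelihoods):
    scores = {}
    for label, prior in priors.items():      # e.g. "spam" and "not spam"
        score = prior                         # start from the prior P(label)
        for word in words:                    # multiply in P(word | label)
            # tiny fallback so an unseen word doesn't zero out the whole score
            score *= likelihoods[label].get(word, 1e-6)
        scores[label] = score
    return max(scores, key=scores.get)        # class with the highest score
```

The worked example below plugs concrete numbers into exactly this calculation.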
Let's walk through the key steps of the Naive Bayes algorithm to see the math in action.
We'll use a simple example trying to classify emails as spam or not spam based on 2 keyword features: contains "free" and contains "money".
1️⃣ Gather your training data
We need a training set with emails labeled as spam or not spam to start. Let's say we have 100 emails: 20 labeled spam and 80 labeled not spam.
2️⃣ Calculate the prior probabilities
The prior probability of an email being spam P(spam) is 20/100 or 0.2
The prior probability of not spam P(not spam) is 80/100 or 0.8
These are our base rates before seeing any email features.
3️⃣ Calculate the likelihood probabilities
Let's say in the training data, 15 of the 20 spam emails contain "free", but only 5 of the 80 not-spam emails do.
So the likelihood P("free"|spam) is 15/20 = 0.75
And P("free"|not spam) is 5/80 = 0.0625
We then do the same for the "money" feature.
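If you want to see how those likelihoods fall out of simple counting, here's a quick sketch. The counts for "free" match the numbers above; the counts for "money" are made-up placeholders, since we haven't specified them.

```python
# Estimate each keyword's likelihood by counting how often it appears per class.
n_spam, n_not_spam = 20, 80                      # from our 100 training emails

word_counts = {
    "spam":     {"free": 15, "money": 12},       # "money" counts are assumed, for illustration only
    "not spam": {"free": 5,  "money": 4},
}

likelihoods = {
    "spam":     {w: c / n_spam     for w, c in word_counts["spam"].items()},
    "not spam": {w: c / n_not_spam for w, c in word_counts["not spam"].items()},
}

print(likelihoods["spam"]["free"])       # 0.75, i.e. 15/20
print(likelihoods["not spam"]["free"])   # 0.0625, i.e. 5/80
```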
4️⃣ Multiply likelihoods and prior to get posteriors
For an email with "free" and "money", the posterior probabilities are proportional to:
P(spam|"free","money") ∝ P(spam) x P("free"|spam) x P("money"|spam)
P(not spam|"free","money") ∝ P(not spam) x P("free"|not spam) x P("money"|not spam)
5️⃣ Classify based on highest posterior
If P(spam|"free","money") is higher, we classify the email as spam!
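Here's the arithmetic for steps 4 and 5 with the numbers above, reusing the same assumed "money" likelihoods from the counting sketch (0.6 for spam, 0.05 for not spam).

```python
# Posterior scores for an email containing both "free" and "money".
p_spam, p_not_spam = 0.2, 0.8                  # priors from step 2

spam_score = p_spam * 0.75 * 0.6               # P(spam) x P("free"|spam) x P("money"|spam)
not_spam_score = p_not_spam * 0.0625 * 0.05    # P(not spam) x P("free"|not spam) x P("money"|not spam)

print(round(spam_score, 4), round(not_spam_score, 4))          # 0.09 vs 0.0025
print("spam" if spam_score > not_spam_score else "not spam")   # -> spam
```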
First, the strengths of Naive Bayes:
Fast and simple ⚡️
Performs well with small data 📊
Easy to implement 💻
Interpretable 🕵️‍♀️
Resilient to irrelevant features 💪
Of course, it has weaknesses too:
Naive independence assumption often violated in real data 🤨
Prone to overfitting 🤪
Probability estimates can be unreliable 📉
Struggles with complex data where features interact 😮
You'll find Naive Bayes in all sorts of places:
Spam filtering 📧
Sentiment analysis 😊😡
Recommender systems 🛍️
Text classification 📄
Disease prediction 🩺
Finally, a quick glossary of key terms:
Bayes' theorem - Relates conditional probabilities: P(A|B) = P(B|A)P(A)/P(B).
Likelihood - Probability of the data given a hypothesis, P(D|H).
Prior probability - Initial probability before seeing new evidence, P(H).
Posterior probability - Updated probability after seeing new evidence, P(H|D).
Conditional independence - Assumption that features are unrelated to each other once the class is known.
Gaussian distribution - Normal distribution shaped like a bell curve.