Naive Bayes

What is Bayes?

Bayesian analysis is a statistical paradigm that answers research questions about unknown parameters using probability statements. Its foundation is Bayes’ theorem, which relates the probability of $A$ given $B$ to the probability of $B$ given $A$:

$ P(A \vert B) = \frac{P(B \vert A) \, P(A)}{P(B)} = \frac{P(A \cap B)}{P(B)} $

You can find more information about Bayes’ theorem here.
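
As a short derivation (a sketch using only the symbols $A$ and $B$ from the formula above), the theorem follows from writing the joint probability $P(A \cap B)$ in two ways with the definition of conditional probability, assuming $P(A) > 0$ and $P(B) > 0$:

$$
P(A \cap B) = P(A \vert B) \, P(B) = P(B \vert A) \, P(A)
\;\Longrightarrow\;
P(A \vert B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \vert A) \, P(A)}{P(B)}
$$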

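What is Naive Bayes?

Naive Bayes classifies data using Bayes’ theorem. It is called “naive” because it assumes that the features are conditionally independent given the class. I will explain it step by step below. Before starting, let’s assume that we have a dataset like the one in the image.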


The gray dot represents a new data point whose category we want to predict. We will choose between “Walks” and “Drives” by comparing two probabilities:

  • Walks probability:
$ P(Walks \vert X) = \frac{P(X \vert Walks) \, P(Walks)}{P(X)} $
  • Drives probability:
$ P(Drives \vert X) = \frac{P(X \vert Drives) \, P(Drives)}{P(X)} $



Steps of calculating probability.



  • Step 1
    Calculate the probability $ P(Walks \vert X) $

  • Step 2
    Calculate the probability $ P(Drives \vert X) $

  • Step 3
    Assign the data point to the class with the higher of the two probabilities calculated in step 1 and step 2.
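
The three steps above can be sketched in Python as follows. This is a minimal illustration, not the author's code: the function name `classify` and the input numbers are assumptions made for the example. The values are chosen to agree with the walkthrough that follows, and the "Drives" numbers are inferred on the assumption that the dataset contains only the two classes. The sketch also relies on $P(X)$ being the same for both classes, so it can be obtained by summing the two numerators.

```python
from fractions import Fraction

def classify(likelihoods, priors):
    """Return (best_class, posteriors) from per-class P(X | class) and P(class)."""
    # Numerator of Bayes' theorem for each class: P(X | class) * P(class).
    scores = {c: likelihoods[c] * priors[c] for c in priors}
    # P(X) via the law of total probability (classes assumed exhaustive and exclusive).
    evidence = sum(scores.values())
    posteriors = {c: s / evidence for c, s in scores.items()}
    # Step 3: pick the class with the largest posterior.
    return max(posteriors, key=posteriors.get), posteriors

# Hypothetical inputs for illustration only.
priors = {"Walks": Fraction(10, 30), "Drives": Fraction(20, 30)}
likelihoods = {"Walks": Fraction(3, 10), "Drives": Fraction(1, 20)}

label, posteriors = classify(likelihoods, priors)
print(label, posteriors["Walks"], posteriors["Drives"])  # Walks 3/4 1/4
```

Because the denominator is shared, comparing the numerators alone would give the same decision; the division is only needed to report actual probabilities.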



How to calculate $ P(Walks \vert X) $?



  • Step 1
$P(Walks) = \frac{\text{Number of Walkers}}{\text{Total Observations}}$

In this case, $P(Walks) = \frac{10}{30}$.

  • Step 2
    For this step, we must select a radius and draw a circle of that size around our observation, as in the image below. We then count all the points that fall inside the circle; these are the similar observations. In this step we find $P(X)$, which is expressed as:
$P(X) = \frac{\text{Number of Similar Observations}}{\text{Total Observations}}$
In this case, $P(X)$ will be $\frac{4}{30}$.
  • Step 3

In this step, we will find $P(X \mid Walks)$, which is expressed as:

$P(X \mid Walks) = \frac{\text{Number of Similar Observations Among those who Walk}}{\text{Total number of Walkers}}$

In this case, $P(X \mid Walks)$ will be $\frac{3}{10}$.
  • Step 4

In this step we find $P(Walks \vert X)$ by combining the results of steps 1 to 3:

$P(Walks \mid X) = \frac{\frac {3}{10} \times \frac{10}{30}} {\frac{4}{30}} = 0.75$
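
The four steps above can be sketched as a small counting routine. This is only an illustration under stated assumptions: the function name `radius_posterior`, the use of Euclidean distance, counting points that lie exactly on the circle as inside, and the six toy points are all hypothetical and are not the dataset from the image.

```python
from fractions import Fraction
from math import dist

def radius_posterior(points, labels, query, radius, target):
    """Estimate P(target | X) by counting points inside a circle around the query."""
    total = len(points)
    # Indices of the "similar observations": points inside the circle.
    similar = [i for i, p in enumerate(points) if dist(p, query) <= radius]
    n_target = sum(1 for lab in labels if lab == target)
    if not similar or n_target == 0:
        raise ValueError("need at least one similar observation and one target point")
    n_similar_target = sum(1 for i in similar if labels[i] == target)

    prior = Fraction(n_target, total)                  # Step 1: P(target)
    evidence = Fraction(len(similar), total)           # Step 2: P(X)
    likelihood = Fraction(n_similar_target, n_target)  # Step 3: P(X | target)
    return likelihood * prior / evidence               # Step 4: P(target | X)

# Hypothetical toy data: three walkers and three drivers.
points = [(0, 0), (1, 0), (1, 1), (0, 1), (5, 5), (6, 5)]
labels = ["Walks", "Walks", "Walks", "Drives", "Drives", "Drives"]

p_walks = radius_posterior(points, labels, (0.5, 0.5), 1.0, "Walks")
p_drives = radius_posterior(points, labels, (0.5, 0.5), 1.0, "Drives")
print(p_walks, p_drives)  # 3/4 1/4
```

With these toy points, four observations fall inside the circle and three of them are walkers, so the result happens to match the 0.75 obtained above. The outcome depends on the chosen radius, which is a free parameter of this procedure.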

Example



Code



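from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB

# Assumes X_train, X_test, and y_train are already defined (e.g. from a train/test split).
# Scale the features using statistics learned from the training set only.
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Fit a Gaussian Naive Bayes classifier on the scaled training data.
classifier = GaussianNB()
classifier.fit(X_train, y_train)

# Predict class labels for the test set.
y_pred = classifier.predict(X_test)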
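
Note that `GaussianNB` does not count points inside a circle as in the walkthrough above; it models each feature within a class with a normal distribution. The following is a simplified, from-scratch sketch of that idea, not scikit-learn's actual implementation: the helper names `fit_gaussian_nb` and `predict_gaussian_nb`, the `eps` constant, and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, eps=1e-9):
    """Return {label: (log_prior, means, variances)} estimated from the training data."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Per-feature population variance; eps keeps it strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + eps
                     for col, m in zip(zip(*rows), means)]
        model[label] = (math.log(n / len(y)), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Pick the class with the largest log prior plus summed Gaussian log-likelihoods."""
    def log_score(params):
        log_prior, means, variances = params
        # The "naive" assumption: features are independent given the class,
        # so their log-likelihoods simply add up.
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
    return max(model, key=lambda label: log_score(model[label]))

# Hypothetical two-feature toy data.
X_toy = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [4.0, 5.0], [4.2, 4.8], [3.8, 5.2]]
y_toy = ["Walks", "Walks", "Walks", "Drives", "Drives", "Drives"]

model = fit_gaussian_nb(X_toy, y_toy)
print(predict_gaussian_nb(model, [1.1, 2.0]))  # Walks
print(predict_gaussian_nb(model, [4.1, 5.1]))  # Drives
```

Working with log-probabilities rather than raw products avoids numerical underflow when many features are multiplied together.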



Result







Implementation

This post is licensed under CC BY 4.0 by the author.