What is Bayes?
Bayesian analysis is a statistical paradigm that answers research questions about unknown parameters using probability statements. Its foundation is Bayes' theorem, which can be expressed with the following formula:
$$ P(A \vert B) = \frac{P(B \vert A) P(A)}{P(B)} $$
You can find more information about Bayes' Theorem here.
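As a small illustration of Bayes' theorem, the sketch below computes $P(A \vert B)$ from $P(B \vert A)$, $P(A)$ and $P(B)$. The function name `bayes_posterior` and the input values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from fractions import Fraction

def bayes_posterior(likelihood, prior, evidence):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence == 0:
        raise ValueError("P(B) must be non-zero")
    return likelihood * prior / evidence

# Hypothetical inputs: P(B|A) = 1/2, P(A) = 1/5, P(B) = 1/4
posterior = bayes_posterior(Fraction(1, 2), Fraction(1, 5), Fraction(1, 4))
print(posterior)  # 2/5
```

Using `Fraction` keeps the result exact, which makes it easier to check by hand.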
What is Naive bayes?
Naive Bayes classifies data based on Bayes' theorem. I will explain it step by step below. Before starting, let's assume that we have a dataset like the one in the image.
The gray dot represents a new data point whose category we want to predict. We will choose between “Walks” and “Drives” by comparing their probabilities, where $X$ stands for the features of the new point:
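- Walks probability: $ P(Walks \vert X) = \frac{P(X \vert Walks) P(Walks)}{P(X)} $
- Drives probability: $ P(Drives \vert X) = \frac{P(X \vert Drives) P(Drives)}{P(X)} $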
Steps of calculating probability.
Step 1
Calculate the probability $ P(Walks \vert X) $
Step 2
Calculate the probability $ P(Drives \vert X) $
Step 3
Classify the data by comparing the probabilities calculated in step 1 and step 2.
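The three steps above amount to computing one probability per category and picking the larger one. Here is a minimal sketch of that final comparison; the helper `classify` and the two probability values are hypothetical and are not derived from the dataset in the image.

```python
def classify(posteriors):
    """Return the category with the highest posterior probability."""
    return max(posteriors, key=posteriors.get)

# Hypothetical results of step 1 and step 2
posteriors = {"Walks": 0.7, "Drives": 0.3}
prediction = classify(posteriors)
print(prediction)  # Walks
```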
How to calculate $ P(Walks \vert X) $?
- Step 1
First we find the prior probability $P(Walks)$, which is the number of walkers divided by the total number of observations. In this case, $P(Walks)$ is $\frac{10}{30}$.
- Step 2
For this step, we must select a radius and draw a circle of that size around our observation, as in the image below. Then we count all the points that fall inside the circle; these are the observations we treat as similar to ours. With that count we can find $P(X)$, which is expressed as:
$$ P(X) = \frac{\text{Number of similar observations}}{\text{Total observations}} $$
In this case, $P(X)$ is $\frac{4}{30}$.
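Counting the points inside the circle can be sketched in a few lines of Python. The coordinates, the radius and the helper `count_within_radius` are all hypothetical; they are a toy stand-in for the dataset in the image, so the resulting value differs from the $\frac{4}{30}$ above.

```python
import math

def count_within_radius(points, center, radius):
    """Count points whose Euclidean distance to `center` is at most `radius`."""
    return sum(1 for p in points if math.dist(p, center) <= radius)

# Hypothetical 2D observations and a new point to classify
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0), (5.0, 5.0)]
new_point = (0.5, 0.5)

similar = count_within_radius(points, new_point, radius=1.0)
p_x = similar / len(points)  # P(X) = similar observations / total observations
print(similar, p_x)  # 3 0.6
```

The choice of radius is up to you, and a different radius can change which points count as similar.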
- Step 3
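In this step, we will find $P(X \mid Walks)$. We only look at the points labeled “Walks” and ask how many of them fall inside the circle, which is expressed as:
$$ P(X \mid Walks) = \frac{\text{Number of similar observations among those who walk}}{\text{Total number of walkers}} $$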
- Step 4
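In this step we find $P(Walks \vert X)$ by plugging the results of the previous steps into Bayes' theorem:
$$ P(Walks \vert X) = \frac{P(X \vert Walks) P(Walks)}{P(X)} $$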
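Putting the four steps together, the following sketch computes the posterior for each category by counting. The dataset, the radius and the function `posterior_by_counting` are hypothetical and only illustrate the procedure; they do not reproduce the numbers from the image.

```python
import math
from fractions import Fraction

def posterior_by_counting(data, new_point, radius, label):
    """Estimate P(label | X) by counting, where `data` is a list of (point, label)."""
    total = len(data)
    in_class = [p for p, lab in data if lab == label]
    near = [(p, lab) for p, lab in data if math.dist(p, new_point) <= radius]
    near_in_class = [p for p, lab in near if lab == label]
    if not near or not in_class:
        return Fraction(0)  # no similar observations, or label never seen
    prior = Fraction(len(in_class), total)                   # Step 1: P(label)
    evidence = Fraction(len(near), total)                    # Step 2: P(X)
    likelihood = Fraction(len(near_in_class), len(in_class)) # Step 3: P(X | label)
    return likelihood * prior / evidence                     # Step 4: P(label | X)

# Hypothetical labeled observations
data = [
    ((0, 0), "Walks"), ((1, 0), "Walks"), ((4, 5), "Walks"),
    ((0, 1), "Drives"), ((4, 4), "Drives"), ((5, 5), "Drives"),
]
new_point = (0.5, 0.5)

p_walks = posterior_by_counting(data, new_point, 1.0, "Walks")
p_drives = posterior_by_counting(data, new_point, 1.0, "Drives")
print(p_walks, p_drives)  # 2/3 1/3
```

Since `p_walks` is larger than `p_drives`, this toy point would be classified as “Walks”.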
Example
Code
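# X_train, X_test and y_train are assumed to come from an earlier train/test split
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB

# Fit the scaler on the training set only, then reuse it on the test set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Train a Gaussian Naive Bayes classifier and predict the test set labels
classifier = GaussianNB()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)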