What is Logistic Regression?
Logistic regression is a supervised learning method. It is used to predict the probability that a binary event occurs. Let's assume that you have a data set like the one in the image below.
How would you predict a value using linear regression, such as simple linear regression or multiple linear regression? If you fit linear regression to the data set above, you get a graph like the one in the image below.
But what hint does linear regression give us? As the graph shows, older people take the action more often, and younger people take it less often.
If you take this trend to the extreme, you get a graph like the one in the image below.
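The idea above can be tried out in a few lines. This is a minimal sketch on hypothetical toy data (the `ages` and `took_action` values are invented for illustration), and it assumes that "the extreme" means cutting the straight line off at 0 and 1. It fits a one-feature least-squares line by hand and shows that the raw line can leave the 0 to 1 range, which is why it is a poor probability estimate.

```python
# Hypothetical toy data: age, and whether the person took the action (1) or not (0).
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
took_action = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(took_action) / n

# Closed-form ordinary least squares for a single feature.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, took_action))
s_xx = sum((x - mean_x) ** 2 for x in ages)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x


def linear_prediction(age):
    """Raw straight-line prediction; not restricted to the range 0..1."""
    return intercept + slope * age


def clipped_prediction(age):
    """The 'extreme' version: the same line, cut off at 0 and at 1."""
    return min(1.0, max(0.0, linear_prediction(age)))


for age in (10, 40, 80):
    print(age, round(linear_prediction(age), 3), clipped_prediction(age))
```

For a very young or very old person the raw line gives a value below 0 or above 1, while the clipped version is stuck at exactly 0 or 1.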
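To make the graph less extreme, we pass the straight line through the sigmoid function.
Let's assume the formula for a straight line, where x is the input (such as age):
$$y = b_0 + b_1 x$$
And the formula for the sigmoid function, where p is the predicted probability, is:
$$p = \frac{1}{1 + e^{-y}}$$
Combining the two formulas gives the formula shown below, which is the formula of logistic regression:
$$\ln\left(\frac{p}{1 - p}\right) = b_0 + b_1 x$$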
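Combining a straight line with the sigmoid function can be sketched in plain Python. The coefficient values `b0` and `b1` below are hypothetical, chosen only so that the probability crosses 0.5 at age 40, and the function names are illustrative rather than part of any library.

```python
import math


def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    # Two branches avoid overflow in math.exp for large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logistic_probability(x, b0, b1):
    """Straight line b0 + b1 * x passed through the sigmoid."""
    return sigmoid(b0 + b1 * x)


# Hypothetical coefficients, for illustration only.
b0, b1 = -8.0, 0.2

for age in (20, 40, 60):
    print(age, round(logistic_probability(age, b0, b1), 3))
```

Taking `math.log(p / (1 - p))` of any returned probability recovers the straight-line value `b0 + b1 * x`, which is the log-odds form of the same formula.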
The resulting graph looks like this:
So, how can we use this graph? You can read off the probability for each age, as in the image below.
After getting the probabilities, classify them as 0 or 1 based on a specific threshold, as shown in the image below.
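The thresholding step can be illustrated with a small helper. This is a sketch with made-up probabilities. The name `classify`, the default threshold of 0.5, and the choice to send a probability exactly equal to the threshold to class 1 are all assumptions made for the example.

```python
def classify(probabilities, threshold=0.5):
    """Turn probabilities into 0/1 labels; values at or above the threshold become 1."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical predicted probabilities.
probs = [0.01, 0.3, 0.5, 0.75, 0.99]

print(classify(probs))       # [0, 0, 1, 1, 1]
print(classify(probs, 0.8))  # [0, 0, 0, 0, 1]
```

Raising the threshold makes the classifier more reluctant to predict 1, which may be preferable when false positives are costly.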
Example
Code
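# Assumes X_train, X_test, and y_train already exist (for example, from a train/test split).
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Scale the features: fit the scaler on the training set only, then reuse it on the test set.
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Fit the classifier and predict class labels (0 or 1) for the test set.
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)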
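The snippet above relies on data prepared elsewhere. For an end-to-end version that can be run as-is, here is a sketch on synthetic data: the ages, the generating coefficients (0.2 and -8.0), the sample size, and the split settings are all invented for illustration and do not come from the original data set. It also uses `predict_proba` to show the probabilities behind the 0/1 predictions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): older people are more likely to take the action.
rng = np.random.default_rng(0)
ages = rng.uniform(18, 70, size=400)
true_prob = 1.0 / (1.0 + np.exp(-(0.2 * ages - 8.0)))
took_action = (rng.uniform(size=400) < true_prob).astype(int)

# scikit-learn expects a 2D feature matrix, so reshape the single feature.
X = ages.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, took_action, test_size=0.25, random_state=0
)

# Fit the scaler on the training set only, then apply it to both sets.
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

classifier = LogisticRegression()
classifier.fit(X_train, y_train)

# Column 1 of predict_proba holds the probability of class 1.
proba = classifier.predict_proba(X_test)[:, 1]
y_pred = classifier.predict(X_test)
accuracy = classifier.score(X_test, y_test)

print("first probabilities:", np.round(proba[:5], 3))
print("first predictions:  ", y_pred[:5])
print("test accuracy:      ", accuracy)
```

The fitted coefficient is positive here because the synthetic data was generated so that the probability rises with age.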