What is Kernel PCA?
PCA is a linear dimensionality-reduction algorithm, whereas Kernel PCA handles non-linear structure: the data are implicitly mapped into a higher-dimensional feature space through a kernel function, and ordinary PCA is performed in that space.
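As a quick illustration (a minimal sketch on scikit-learn's make_circles toy data, not part of the article's own example), two concentric circles stay tangled under linear PCA but are pulled apart by an RBF Kernel PCA projection:
# Minimal sketch: linear PCA vs. RBF Kernel PCA on concentric circles
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

X_pca = PCA(n_components=2).fit_transform(X)                                  # still two nested rings
X_kpca = KernelPCA(n_components=2, kernel='rbf', gamma=10).fit_transform(X)   # rings pulled apart
Plotting the first kernel component already separates the two rings, which no linear projection of this data set can do.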
Kernel PCA is used for:
- Noise filtering
- Visualization
- Feature extraction
- Stock market predictions
- Gene data analysis
The goals of Kernel PCA are:
- Identify patterns in the data.
- Detect correlations between variables.
Kernel PCA carries out the same steps as standard PCA, but in the feature space induced by the kernel (a NumPy sketch of these steps follows the list):
- Standardize the data.
- Obtain the eigenvectors and eigenvalues from the covariance matrix or correlation matrix, or perform Singular Value Decomposition (SVD).
- Sort the eigenvalues in descending order and choose the $k$ eigenvectors corresponding to the $k$ largest eigenvalues, where $k$ is the number of dimensions of the new feature subspace.
- Construct the projection matrix $W$ from the selected $k$ eigenvectors.
- Transform the original data set $X$ via $W$ to obtain a $k$-dimensional feature subspace $Y$.
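The sketch below (a minimal illustration on synthetic data, not part of the original example) implements these steps for ordinary linear PCA with NumPy; Kernel PCA performs the analogous eigendecomposition on the centered kernel matrix instead of the covariance matrix.
# Minimal NumPy sketch of the steps listed above (ordinary/linear PCA)
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                       # toy data: 100 samples, 5 features

# 1. Standardize the data
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Eigendecomposition of the covariance matrix
cov = np.cov(X_std, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# 3. Sort eigenvalues in descending order and keep the k largest
k = 2
order = np.argsort(eigvals)[::-1][:k]

# 4. Build the projection matrix W from the selected eigenvectors
W = eigvecs[:, order]

# 5. Project X onto the k-dimensional subspace Y
Y = X_std @ W
print(Y.shape)   # (100, 2)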
In its standard formulation, Kernel PCA relies on the kernel trick: the feature map $\varphi$ is never computed explicitly; instead, a kernel function supplies the pairwise inner products in feature space. With the RBF kernel used in the example below, this can be expressed as:
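$$K_{ij} = k(x_i, x_j) = \varphi(x_i)^\top \varphi(x_j), \qquad k_{\mathrm{rbf}}(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)$$
The principal components are then obtained from the eigenvectors of the centered kernel matrix $K$, where $\gamma$ is the kernel width parameter (the gamma argument of scikit-learn's KernelPCA).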
Example
Code
# Example data (assumed): load a labeled sample dataset and split it into train/test sets
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Standardize the features before applying the kernel
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Project the standardized data onto 2 components with an RBF kernel
kpca = KernelPCA(n_components=2, kernel='rbf')
X_train = kpca.fit_transform(X_train)
X_test = kpca.transform(X_test)

# Train a logistic regression classifier on the projected features
classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
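As a follow-up (assuming the y_test produced by the train/test split above), the predictions can be checked with scikit-learn's standard metrics:
from sklearn.metrics import accuracy_score, confusion_matrix

print(confusion_matrix(y_test, y_pred))
print('accuracy:', accuracy_score(y_test, y_pred))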