Principal Component Analysis (PCA) is a dimensionality reduction technique used to reduce the number of features in a dataset while retaining most of the information.
Steps to Perform PCA in R
We will load the iris data.
Standardize the data and then compute PCA.
library(factoextra)
library(plotly)
data <- iris
pca_result <- prcomp(data[, 1:4], scale. = TRUE)
pca_result
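As a rough check of what the standardization step does, the sketch below scales the four numeric columns by hand with `scale()` and confirms that running `prcomp()` on the result gives the same component standard deviations as `scale. = TRUE`. The names `scaled` and `pca_manual` are illustrative only and are not part of the walkthrough above.

```r
# Illustrative names (scaled, pca_manual) are assumptions, not from the walkthrough.
data <- iris
scaled <- scale(data[, 1:4])          # center each column to mean 0, scale to sd 1

# Each standardized column now has mean ~0 and standard deviation 1
round(colMeans(scaled), 10)
apply(scaled, 2, sd)

pca_result <- prcomp(data[, 1:4], scale. = TRUE)
pca_manual <- prcomp(scaled)          # already standardized, so scale. is not needed

# The two approaches should agree on the component standard deviations
all.equal(pca_result$sdev, pca_manual$sdev)
```

Without standardization, variables measured on larger scales would dominate the first components simply because their raw variance is larger.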
The scree plot helps decide how many principal components to retain: look for the elbow point where the explained variance starts to level off.
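A minimal sketch of how the scree plot could be produced is shown here. It first computes the proportion of variance per component from `pca_result$sdev`, then draws the plot with `fviz_eig()` from factoextra when that package is installed, falling back to base R's `screeplot()` otherwise. The variable name `explained` and the fallback are assumptions made for illustration.

```r
pca_result <- prcomp(iris[, 1:4], scale. = TRUE)

# Proportion of total variance explained by each component
explained <- pca_result$sdev^2 / sum(pca_result$sdev^2)
names(explained) <- colnames(pca_result$x)
round(explained, 3)
round(cumsum(explained), 3)           # cumulative share, useful for picking a cutoff

# Scree plot: factoextra if installed, otherwise base R's screeplot()
if (requireNamespace("factoextra", quietly = TRUE)) {
  print(factoextra::fviz_eig(pca_result, addlabels = TRUE))
} else {
  screeplot(pca_result, type = "lines", main = "Scree plot")
}
```

The cumulative values give a numeric complement to the visual elbow: components can be kept until the cumulative share reaches whatever threshold the analysis calls for.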
The biplot visualizes both the samples (points) and the variables (arrows). Points that are close to each other represent samples with similar characteristics, while the direction and length of the arrows indicate the contribution of each variable to the principal components.
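One way the biplot might be drawn is sketched below. It prints the loadings that determine the arrows, then calls `fviz_pca_biplot()` from factoextra if the package is available, with base R's `biplot()` as an assumed fallback. Coloring the points by species through `habillage` is an illustrative choice, not something stated in the text above.

```r
data <- iris
pca_result <- prcomp(data[, 1:4], scale. = TRUE)

# Loadings: the arrow for each variable is drawn from these coefficients
round(pca_result$rotation[, 1:2], 3)

if (requireNamespace("factoextra", quietly = TRUE)) {
  # label = "var" labels only the arrows; habillage colors points by species
  print(factoextra::fviz_pca_biplot(pca_result, label = "var",
                                    habillage = data$Species))
} else {
  biplot(pca_result)                  # base R fallback
}
```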
Visualization of PCA in a 3D Scatter Plot
A 3D scatter plot shows the relationships among three principal components at once, and the spread of the points along each axis gives a sense of how much variance each component captures.
It also allows interactive exploration: we can rotate the plot and view it from different angles.
Next, we will create a data frame of the first three principal components and negate PC2 and PC3. This is purely a visual preference that makes the plot look more organised and symmetric in 3D space.
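The step just described could look roughly like the sketch below, which assumes the scores are taken from `pca_result$x` and stored in a data frame named `components`, the name used by the plotting call that follows.

```r
data <- iris
pca_result <- prcomp(data[, 1:4], scale. = TRUE)

# Scores of each sample on the first three principal components
components <- data.frame(pca_result$x[, 1:3])

# The sign of a principal component is arbitrary, so flipping it
# mirrors the plot without changing distances or explained variance
components$PC2 <- -components$PC2
components$PC3 <- -components$PC3

head(components)
```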
fig <- plot_ly(components, x = ~PC1, y = ~PC2, z = ~PC3,
               color = ~data$Species,
               colors = c('darkgreen', 'darkblue', 'darkred')) %>%
  add_markers(size = 12)
fig <- fig %>%
  layout(title = "3d Visualization of PCA",
         scene = list(bgcolor = "lightgray"))
fig