Principal Component Analysis (PCA) is a dimensionality reduction technique used to reduce the number of features in a dataset while retaining most of the information.

Steps to Perform PCA in R

We will load the iris data.

Standardize the data and then compute PCA.

library(factoextra)

library(plotly)

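# Load the iris data
data <- iris

# Standardize the four numeric columns and compute the PCA
pca_result <- prcomp(data[, 1:4], scale. = TRUE)

# Print the standard deviations and loadings
pca_result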

The scree plot helps decide how many principal components to retain: look for the elbow point where the explained variance starts to level off.
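As a rough sketch of what the scree plot summarizes, the proportion of variance explained by each component can be computed from the sdev element of the prcomp result and plotted with base R. The names explained_var and cumulative_var are illustrative and not part of the original walkthrough; factoextra's fviz_eig(pca_result) should give a comparable plot.

```r
# Recompute the PCA so this snippet runs on its own
data <- iris
pca_result <- prcomp(data[, 1:4], scale. = TRUE)

# The variance of each component is the square of its standard deviation
explained_var <- pca_result$sdev^2 / sum(pca_result$sdev^2)
cumulative_var <- cumsum(explained_var)

# Base-R scree plot: look for the point where the curve flattens
plot(explained_var, type = "b",
     xlab = "Principal component",
     ylab = "Proportion of variance explained",
     main = "Scree plot")

# Numeric summary to read alongside the plot
round(rbind(explained_var, cumulative_var), 3)
```

Reading the cumulative row next to the plot is one way to check whether the components before the elbow already cover most of the variance.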

The biplot visualizes both the samples (points) and the variables (arrows). Points that are close to each other represent samples with similar characteristics, while the direction and length of the arrows indicate the contribution of each variable to the principal components.
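A minimal sketch of such a biplot, assuming the same standardized PCA of the iris measurements, can be drawn with the biplot method that base R provides for prcomp objects. The variable name loadings is illustrative; factoextra's fviz_pca_biplot(pca_result) is an alternative that returns a ggplot2 version.

```r
# Recompute the PCA so this snippet runs on its own
data <- iris
pca_result <- prcomp(data[, 1:4], scale. = TRUE)

# Loadings: one row per original variable, one column per component.
# These are the values the arrows are drawn from.
loadings <- pca_result$rotation[, 1:2]
loadings

# Base-R biplot of the first two components:
# samples are shown as labels, variables as arrows
biplot(pca_result, choices = 1:2, cex = 0.6)
```

Variables whose arrows point in similar directions are positively correlated in the plane of the first two components.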

Visualization of PCA in a 3D Scatter Plot

A 3D scatter plot lets us see the relationships among three principal components simultaneously, and the spread of the points along each axis gives a sense of how much variance that component captures.

It also allows for interactive exploration: we can rotate the plot and view it from different angles.

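Next, we will create a data frame of the first three principal components and negate PC2 and PC3 as a visual preference, so the plot looks more organised and symmetric in 3D space. The sign of a principal component is arbitrary, so flipping it does not change the analysis.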
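The step described above is not shown in the text, so here is a minimal sketch of one way to build the components data frame that the plotting code expects. It assumes the scores are taken from the x element of the prcomp result, which is where prcomp stores them.

```r
# Recompute the PCA so this snippet runs on its own
data <- iris
pca_result <- prcomp(data[, 1:4], scale. = TRUE)

# Keep the scores of the first three principal components
components <- data.frame(pca_result$x[, 1:3])

# Flip the sign of PC2 and PC3 (purely a visual preference)
components$PC2 <- -components$PC2
components$PC3 <- -components$PC3

head(components)
```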

fig <- plot_ly(components, x = ~PC1, y = ~PC2, z = ~PC3,
               color = ~data$Species,
               colors = c('darkgreen', 'darkblue', 'darkred')) %>%
  add_markers(size = 12)

fig <- fig %>%
  layout(title = "3d Visualization of PCA",
         scene = list(bgcolor = "lightgray"))

fig