PCA is one of the best-known techniques for dimension reduction, feature extraction, and data visualization. It rotates the dataset so that the rotated features are statistically uncorrelated. This rotation is often followed by selecting only a subset of the new features, based on how useful they are (typically, how much of the variance they explain). PCA is used to reduce the number of independent variables in a dataset and is especially applicable when the ratio of data points to independent variables is low. Each principal component is a linear combination of the original variables, chosen so that it captures the maximum possible variance while staying orthogonal to the components before it. Every principal component is always orthogonal (perpendicular) to every other principal component, and hence the components are linearly independent of each other.
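To make the "rotation" idea concrete, here is a minimal sketch using NumPy. The toy data and the `mixing` matrix are made up for illustration. It rotates centred data onto the eigenvectors of its covariance matrix, then checks that the rotated features are uncorrelated and that the principal directions are orthogonal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 500 samples of 3 correlated features.
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.5, 1.0, 0.0],
                   [0.5, 0.3, 0.2]])
X = rng.normal(size=(500, 3)) @ mixing.T

Xc = X - X.mean(axis=0)                  # centre each feature
cov = np.cov(Xc, rowvar=False)           # 3x3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # columns of eigvecs are the principal directions

rotated = Xc @ eigvecs                   # rotate the data onto the principal axes
rotated_cov = np.cov(rotated, rowvar=False)

# Off-diagonal covariances vanish (up to rounding): the rotated features are uncorrelated.
off_diag = rotated_cov - np.diag(np.diag(rotated_cov))
print(np.allclose(off_diag, 0.0, atol=1e-10))

# The principal directions are mutually orthogonal unit vectors.
print(np.allclose(eigvecs.T @ eigvecs, np.eye(3)))
```

Both checks should come out true here, and the diagonal of `rotated_cov` holds the variance along each principal axis, i.e. the eigenvalues.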

Steps for PCA: (1) Standardise (scale) the data. (2) Compute the covariance matrix. (3) Calculate the eigenvectors and eigenvalues of the covariance matrix. (4) Compute the principal components. (5) Reduce the dimension of the data by selecting the best components without losing much information.
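The steps above can be sketched in a few lines of NumPy. This is an illustrative sketch, not a reference implementation: the helper name `pca_steps`, the `variance_to_keep` threshold, and the synthetic data are all assumptions made for the example. It also assumes no feature has zero variance, since that would break the scaling step.

```python
import numpy as np

def pca_steps(X, variance_to_keep=0.95):
    """Hypothetical helper that follows the five steps listed above."""
    # (1) Standardise: zero mean and unit variance per feature.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # (2) Covariance matrix of the standardised data.
    cov = np.cov(Z, rowvar=False)
    # (3) Eigenvalues and eigenvectors; eigh returns them in ascending
    #     order, so re-sort from largest to smallest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # (4) Principal components: project the data onto the eigenvectors.
    scores = Z @ eigvecs
    # (5) Keep the smallest number of components whose cumulative share
    #     of the variance reaches the requested threshold.
    explained = eigvals / eigvals.sum()
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    k = min(k, len(eigvals))
    return scores[:, :k], eigvecs[:, :k], explained

# Made-up data: 5 observed features driven by 2 hidden factors plus noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 2))
loadings = rng.normal(size=(2, 5))
X = latent @ loadings + 0.05 * rng.normal(size=(300, 5))

reduced, components, explained = pca_steps(X)
print(reduced.shape)
print(explained.round(3))
```

Choosing the number of components by a cumulative-variance threshold is just one common convention; a fixed number of components or a look at the eigenvalue plot would work as well.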

Example of the PCA Algorithm

Let’s try displaying the Times stories using the principal components. First we make an empty plot, just to set up the axes nicely for the data which will actually be displayed. Then we plot a blue “m” at the location of each music story, and a red “a” at the location of each art story. Although we have gone from 4431 dimensions to 2, and so thrown away a lot of information, we could draw a line across this plot and have most of the art stories on one side of it and all the music stories on the other. If we let ourselves use the first four or five principal components, we’d still have a thousand-fold savings in dimensions, but we’d be able to get almost-perfect separation between the two classes. This is a sign that PCA is really doing a good job at summarizing the information in the word-count vectors, and in turn that the bags of words give us a lot of information about the meaning of the stories.
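The Times word-count data is not reproduced here, so the sketch below uses made-up stand-in data: Poisson "word counts" for two classes labelled "a" and "m", with 200 invented words instead of 4431. It only illustrates the idea of projecting high-dimensional count vectors onto the first two principal components and separating the classes with a single line. The actual plotting is left out.

```python
import numpy as np

rng = np.random.default_rng(2)
n_per_class, n_words = 50, 200

# Hypothetical stand-in for the word-count vectors: each class uses
# a different block of (made-up) words more often.
art_rates = np.full(n_words, 1.0)
music_rates = np.full(n_words, 1.0)
art_rates[:20] += 4.0        # words more common in "art" stories
music_rates[20:40] += 4.0    # words more common in "music" stories

counts = np.vstack([
    rng.poisson(art_rates, size=(n_per_class, n_words)),
    rng.poisson(music_rates, size=(n_per_class, n_words)),
]).astype(float)
labels = np.array(["a"] * n_per_class + ["m"] * n_per_class)

# PCA via SVD of the centred data: rows of Vt are the principal directions.
centred = counts - counts.mean(axis=0)
U, S, Vt = np.linalg.svd(centred, full_matrices=False)
coords = centred @ Vt[:2].T   # each story becomes a point in 2 dimensions

# Use the vertical line PC1 = 0 as a crude separator between the classes.
side = coords[:, 0] > 0
art_side = side[labels == "a"].mean() > 0.5   # side holding most "a" stories
predicted = np.where(side == art_side, "a", "m")
accuracy = (predicted == labels).mean()
print(f"{n_words} dimensions -> 2, accuracy of the line PC1 = 0: {accuracy:.2f}")
```

To get a picture like the one described, each row of `coords` could be drawn as its letter from `labels`, for instance with a text-plotting call in the plotting library of your choice.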

Let’s discuss the rest in comments! Thanks for reading.