# Singular Value Decomposition (Unsupervised Algorithms)

Singular value decomposition is used to reduce a dataset containing a large number of values to a dataset with significantly fewer values. This reduced dataset will still contain a large fraction of the variability present in the original data. It is used to extract and untangle information, like PCA.

Eigenvalues and Eigenvectors

An eigenvector of an n × n matrix A is a nonzero vector x such that Ax = λx for some scalar λ. A scalar λ is called an eigenvalue of A if there is a nontrivial solution x of Ax = λx; such an x is called an eigenvector corresponding to λ.

Example : So in the first example, recall we have A1v = 4v. Thus, v is an eigenvector of A with a corresponding eigenvalue λ = 4. If you also try A1w, you will find out A1w = w so w is an eigenvector of A with a corresponding eigenvalues of 1.

Diagonalization

A matrix A is diagonalizable if we can rewrite it as a product A=IDI−1, where I is an invertible matrix (and thus I−1 exists) and D is a diagonal matrix (where all off-diagonal elements are zero).

Since I is invertible, P it must be square; hence, it really only makes sense for square matrices.

Another useful note is that if A=IDI−1, then AI=ID. Let’s define I through its columns ai and D via its diagonal entries, we can consider the columns of I separately from each other, the columns of I must be the eigenvectors of A and the values on the diagonal must be eigenvalues of A.

Somehow, the singular value decomposition is essentially diagonalization in a more general sense. Singular value decomposition is used to reduce a dataset containing a large number of values to a dataset with significantly fewer values.

Let’s start by reviewing the matrix transformations

Singular value decomposition

Singular value decomposition (or SVD) is a factorization of a matrix. In fact, is a generalized version of eigenvalue decomposition. Before, for eigenvalue decomposition, we needed to have square matrices. So, a size n × n matrix would have at most n distinct eigenvalues (possibly less if numbers repeated). This is no longer the case.

Given an m×n matrix A with m > n, A can be factorized by SVD into three matrices:

– U is an m × n orthogonal matrix that satisfies UT U = In,
– S is a n×n diagonal matrix,
– V isann×northogonalmatrixsatisfyingVVT =VTV =In,

such that A = USV T . The entries in the diagonal matrix S are known as the singular values of A. They turn out to be the square roots of the eigenvalues of the square matrix AT A. So, if A is a real symmetric matrix with positive eigenvalues, then the singular values and eigenvalues are the same. However, this is not true in general. It is important to realize they are related, but distinct factorizations.

Image Compression by Using SVD(Singular Value Decomposition ): http://fourier.eng.hmc.edu/e161/lectures/svdcompression.html

## Applications: Data Compression

The SVD is a thoroughly useful decomposition, useful for a whole ton of stuff. I’d like to quickly provide you with some examples, just to show you a small glimpse of what this can be used for in computer science, math, and other disciplines.

One application of the SVD is data compression. Consider some matrix A with rank five hundred; that is, the columns of this matrix span a 500-dimensional space. Encoding this matrix on a computer is going to take quite a lot of memory! We might be interested in approximating this matrix with one of lower rank – how close can we get to this matrix if we only approximate it as a matrix with rank one hundred, so that we only have to store a hundred columns? What if we use a matrix of rank twenty? Can we summarize all of the information in this very dense, 500-rank matrix with only a rank twenty matrix?

It turns out that you can prove that taking the n largest singular values A, replacing the rest with zero (to form Σ′), and recomputing UΣ′VT gives you the provably-best n-rank approximation to the matrix. Not only that, but the total of the first n singular values divided by the sum of all the singular values is the percentage of “information” that those singular values contain. If we want to keep 90% of the information, we just need to compute sums of singular values until we reach 90% of the sum, and discard the rest of the singular values. This yields a quick and dirty compression algorithm for matrices – take the SVD, drop all but a few singular values, and then recompute the approximated matrix. Since we only need to store the columns of U and V that actually get used (many get dropped since we set elements on the diagonal of Σ to zero), we greatly reduce the memory usage.
Here’s a tiger: We can convert this tiger to black and white, and then just treat this tiger as a matrix, where each element is the pixel intensity at the relevant location. Here are the singular values of this tiger: Note that this is a log scale (base 10). Most of the action and the largest singular values are the first thirty or so, and they contain a majority of the “information” in this matrix! We can plot the cumulative percentage, to see how much the first thirty or fifty singular values contain of the information: After just fifty of the singular values, we already have over 70% of the information contained in this tiger! Finally, let’s take some approximations and plot a few approximate tigers: Note that after about thirty or fifty components, adding more singular values doesn’t visually seem to improve image quality. By a quick application of SVD, you’ve just compressed a 500×800 pixel image into a 50×500 matrix (for U), 50 singular values, and a 800×50 matrix (for V).

The MATLAB code for generating these is incredibly straight-forward, as follows below.
Low-Rank Matrix Approximation Image Compression