Principal Component Analysis (PCA) examines the relationships among a set of variables. It can be used, for example, to reduce the number of variables supplied to regression or clustering.
Each principal component is a linear combination of the variables chosen to maximize variance. Let X be a matrix of n observations on p variables, and let S be the covariance matrix. For a linear combination of the variables

$$z_1=\sum_{i=1}^{p}a_{i1}x_i$$

where $x_i$ is the ith variable and $a_{i1}$ are the combination coefficients for $z_1$, the coefficients can be collected in a column vector $a_1$, normalized so that $a_1^Ta_1=1$. The variance of $z_1$ is then $a_1^TSa_1$.
The vector $a_1$ is found by maximizing this variance, and $z_1$ is called the first principal component. The second principal component is found in the same way, by maximizing $a_2^TSa_2$ subject to the constraints $a_2^Ta_2=1$ and $a_2^Ta_1=0$. This yields a second principal component orthogonal to the first; the remaining principal components are derived in the same manner. In fact, the coefficient vectors $a_i$ can be calculated from the eigenvectors of the matrix S. Origin uses different methods according to the way missing values are excluded.
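The claim that the maximizing coefficients come from eigenvectors of S can be checked numerically. The following NumPy sketch (an illustration, not Origin's implementation) verifies that no random unit vector achieves more variance than the leading eigenvector:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))  # correlated sample data
S = np.cov(X, rowvar=False)                              # p x p covariance matrix

# Eigenvectors of S, sorted by descending eigenvalue
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
a1 = eigvecs[:, order[0]]          # candidate first-PC coefficients, a1' a1 = 1

# Variance of z1 = X a1 equals a1' S a1, the largest eigenvalue;
# no other unit vector attains a larger variance.
best = a1 @ S @ a1
for _ in range(1000):
    a = rng.normal(size=4)
    a /= np.linalg.norm(a)         # enforce the normalization a' a = 1
    assert a @ S @ a <= best + 1e-12
```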
When missing values are excluded listwise, an observation containing one or more missing values is excluded from the analysis entirely, and a matrix $X_0$ for SVD is derived from X according to the matrix type chosen for the analysis.

For the covariance matrix, let $X_0$ be the matrix X with each column's mean subtracted and each column scaled by $1/\sqrt{n-1}$.

For the correlation matrix, let $X_0$ be the matrix X with each column's mean subtracted and each column scaled by $1/(\sqrt{n-1}\,s_i)$, where $s_i$ is the standard deviation of the ith variable.

Perform SVD on $X_0$:
$$X_0=V\Sigma P^T$$

where V is an n by p matrix with $V^TV=I$, P is a p by p matrix, and $\Sigma$ is a diagonal matrix with diagonal elements $\sqrt{\lambda_1}\ge\sqrt{\lambda_2}\ge\dots\ge\sqrt{\lambda_p}$.
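For the covariance-matrix analysis type, the SVD step can be sketched in NumPy (an illustration of the decomposition above, not Origin's code):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3)).cumsum(axis=0)   # n = 50 observations, p = 3 variables

n = X.shape[0]
X0 = (X - X.mean(axis=0)) / np.sqrt(n - 1)    # center, then scale by 1/sqrt(n-1)

# numpy returns the singular values (the diagonal of Sigma) in
# descending order, so the eigenvalues come out already sorted.
V, sing, Pt = np.linalg.svd(X0, full_matrices=False)
eigenvalues = sing ** 2                       # lambda_i = (sqrt(lambda_i))^2

# The squared singular values are the eigenvalues of the covariance
# matrix S = X0' X0, as the decomposition requires.
S = np.cov(X, rowvar=False)
assert np.allclose(np.sort(eigenvalues), np.sort(np.linalg.eigvalsh(S)))
```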
The principal component scores are

$$\mathrm{Scores}=\sqrt{n-1}\,V\Sigma$$

where each column of Scores holds the scores for the corresponding principal component. Scores are reported as missing values for any observation that contains missing values.

When missing values are excluded pairwise, an observation is excluded from the calculation of the covariance or correlation between two variables only if it has a missing value in either of those two variables.
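Pairwise exclusion can be sketched as follows; `pairwise_cov` is a hypothetical helper written for illustration in NumPy, with missing values represented as NaN:

```python
import numpy as np

def pairwise_cov(X):
    """Covariance matrix with pairwise deletion of missing values (NaN).

    An observation is dropped from the (i, j) entry only if it has a
    NaN in variable i or variable j, mirroring the pairwise-exclusion
    rule described above.
    """
    n, p = X.shape
    S = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            ok = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])  # rows usable for (i, j)
            xi = X[ok, i] - X[ok, i].mean()
            xj = X[ok, j] - X[ok, j].mean()
            S[i, j] = (xi @ xj) / (ok.sum() - 1)
    return S

X = np.array([[1.0, 2.0, 3.0],
              [2.0, np.nan, 4.0],
              [3.0, 5.0, 1.0],
              [4.0, 6.0, np.nan],
              [5.0, 7.0, 2.0]])
S = pairwise_cov(X)
```

Note that each entry may be computed from a different subset of observations, which is why S is formed first and then decomposed, rather than going through an SVD of the data matrix.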
Eigenvalues and eigenvectors are then calculated from the covariance or correlation matrix S directly:

$$S=PDP^T$$

where P is a p by p matrix whose columns are the eigenvectors and D is a diagonal matrix with diagonal elements $\lambda_1\ge\lambda_2\ge\dots\ge\lambda_p$; $\lambda_i$ is the eigenvalue for the ith principal component, and the eigenvalues are sorted in descending order. The scores are

$$\mathrm{Scores}=X_cP$$

where $X_c$ is the matrix X with each column's mean subtracted from each variable.

Bartlett's Test tests the equality of the remaining $p-k$ eigenvalues. It is available only when the analysis matrix is the covariance matrix.
$$\chi^2=-\left(n-\frac{2p+11}{6}\right)\sum_{i=k+1}^{p}\ln\frac{\lambda_i}{\bar{\lambda}}$$

It approximates a $\chi^2$ distribution with $(p-k-1)(p-k+2)/2$ degrees of freedom, where

$$\bar{\lambda}=\frac{1}{p-k}\sum_{i=k+1}^{p}\lambda_i$$
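Given the eigenvalues sorted in descending order, the statistic and its degrees of freedom can be computed directly. This is an illustrative sketch of the formula above; `bartlett_test` is a hypothetical helper, not an Origin function:

```python
import numpy as np

def bartlett_test(eigenvalues, n, k):
    """Bartlett's chi-square statistic for H0: the last p - k
    eigenvalues of the covariance matrix are equal.

    eigenvalues : eigenvalues of S, sorted in descending order
    n           : number of observations
    k           : number of leading components excluded from the test
    Returns (chi2, degrees_of_freedom).
    """
    lam = np.asarray(eigenvalues, dtype=float)
    p = lam.size
    rest = lam[k:]                      # lambda_{k+1}, ..., lambda_p
    lam_bar = rest.mean()               # average of the remaining eigenvalues
    chi2 = -(n - (2 * p + 11) / 6) * np.sum(np.log(rest / lam_bar))
    df = (p - k - 1) * (p - k + 2) / 2
    return chi2, df
```

A p-value then follows from the upper tail of the $\chi^2$ distribution, e.g. `scipy.stats.chi2.sf(chi2, df)`. When the remaining eigenvalues are exactly equal the statistic is zero, and by the AM-GM inequality it is never negative.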