Discriminant Analysis is used to allocate observations to groups using information from observations whose group memberships are known (i.e., training data).
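Let $X_t$ be the training data with $n$ observations and $p$ variables on $n_g$ groups. $\bar{x}_j$ is a row vector of the sample mean for the jth group, and $n_j$ is the number of observations for the jth group. The within-group covariance matrix for group j can be expressed as:

$$S_j = \frac{1}{n_j - 1}\sum_{i \in \text{group } j} (x_i - \bar{x}_j)^T (x_i - \bar{x}_j)$$

where $x_i$ is the row vector of the ith observation. The pooled within-group covariance matrix is:

$$S = \frac{1}{n - n_g}\sum_{j=1}^{n_g} (n_j - 1)\, S_j$$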
Note that missing values are excluded listwise (i.e., an observation containing one or more missing values is excluded from the analysis).
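The within-group and pooled covariance matrices, together with the listwise exclusion of missing values, can be sketched with NumPy. This is a minimal illustration rather than Origin's implementation; the function name `group_covariances` and the toy data are made up for this example.

```python
import numpy as np

def group_covariances(X, groups):
    """Return ({label: S_j}, pooled S) after listwise deletion of missing rows."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    complete = ~np.isnan(X).any(axis=1)   # drop rows with any missing value
    X, groups = X[complete], groups[complete]
    labels = np.unique(groups)
    p = X.shape[1]
    within = {}
    pooled_scatter = np.zeros((p, p))
    for g in labels:
        Xg = X[groups == g]
        centered = Xg - Xg.mean(axis=0)   # subtract the group mean
        scatter = centered.T @ centered   # sums of squares and cross-products
        within[g] = scatter / (len(Xg) - 1)
        pooled_scatter += scatter
    # Pooled matrix divides the summed scatter by n - n_g.
    return within, pooled_scatter / (len(X) - len(labels))

# Toy data invented for this sketch; the row containing NaN is excluded.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [np.nan, 4.0],
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])
groups = np.array([0, 0, 0, 0, 1, 1, 1])
S_by_group, S_pooled = group_covariances(X, groups)
```

Each entry of `S_by_group` agrees with `np.cov(..., rowvar=False)` applied to the complete rows of that group.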
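If training data are assumed to follow a multivariate normal distribution, the following likelihood-ratio test statistic $G$ can be used to test for equality of within-group covariance matrices:

$$G = C\left\{(n - n_g)\log|S| - \sum_{j=1}^{n_g}(n_j - 1)\log|S_j|\right\}$$

where

$$C = 1 - \frac{2p^2 + 3p - 1}{6(p + 1)(n_g - 1)}\left(\sum_{j=1}^{n_g}\frac{1}{n_j - 1} - \frac{1}{n - n_g}\right)$$

For large $n$, $G$ is approximately distributed as a $\chi^2$ variable with $\frac{1}{2}p(p + 1)(n_g - 1)$ degrees of freedom.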
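The test statistic and its degrees of freedom can be sketched as follows. This is a hedged illustration, not Origin's code: the function name `covariance_equality_test` and the data are hypothetical, and the scaling constant is the usual Box-type correction, which is assumed here to match the statistic G described above.

```python
import numpy as np

def covariance_equality_test(X, groups):
    """Return (G, df) for the test of equal within-group covariance matrices."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, p = X.shape
    n_g = len(labels)
    pooled_scatter = np.zeros((p, p))
    weighted_logdets = 0.0   # sum of (n_j - 1) * log|S_j|
    inv_dof = 0.0            # sum of 1 / (n_j - 1)
    for g in labels:
        Xg = X[groups == g]
        n_j = len(Xg)
        centered = Xg - Xg.mean(axis=0)
        scatter = centered.T @ centered
        _, logdet = np.linalg.slogdet(scatter / (n_j - 1))
        weighted_logdets += (n_j - 1) * logdet
        inv_dof += 1.0 / (n_j - 1)
        pooled_scatter += scatter
    _, logdet_pooled = np.linalg.slogdet(pooled_scatter / (n - n_g))
    # Assumed Box-type scaling constant.
    C = 1.0 - (2 * p * p + 3 * p - 1) / (6.0 * (p + 1) * (n_g - 1)) \
        * (inv_dof - 1.0 / (n - n_g))
    G = C * ((n - n_g) * logdet_pooled - weighted_logdets)
    df = p * (p + 1) * (n_g - 1) // 2
    return G, df

# Toy data: the second group is a shifted copy, so both covariances are equal.
base = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
labels = np.repeat([0, 1], 4)
G_same, df_same = covariance_equality_test(np.vstack([base, base + 5.0]), labels)
# Here the second group is a rescaled copy, so the covariances differ.
G_diff, df_diff = covariance_equality_test(np.vstack([base, base * 3.0]), labels)
```

A p-value can then be taken from the upper tail of the chi-square distribution, for example with `scipy.stats.chi2.sf(G, df)`. Each group needs more observations than variables so that its covariance matrix is non-singular.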
Canonical discriminant analysis is used to find the linear combination of the p variables that maximizes the ratio of between-group to within-group variation. The formed canonical variates can then be used to discriminate between groups.
Let the training data with total means subtracted be $X$, with rank $k$. An orthogonal matrix $Q$ can then be calculated from the QR decomposition of $X$ (when $X$ has full column rank) or from its SVD, and $Q_x$ is the first $k$ columns of $Q$. Let $Q_g$ be an $n$ by $n_g - 1$ orthogonal matrix that defines the groups. Then let the $k$ by $n_g - 1$ matrix $V$ be

$$V = Q_x^T Q_g$$

The SVD of $V$ is:

$$V = U_x \Delta U_g^T$$

The non-zero diagonal elements of the matrix $\Delta$ are the $l$ canonical correlations associated with the $l$ canonical variates, $\delta_i$, $i = 1, 2, \ldots, l$, and $l = \min(k, n_g - 1)$.
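The construction above can be sketched with NumPy's QR and SVD routines. This is an illustrative sketch under stated assumptions: the centered data are assumed to have full column rank, the group-defining matrix is built here from centered indicator columns (one possible choice, not necessarily Origin's), and the function name and simulated data are hypothetical.

```python
import numpy as np

def canonical_correlations(X, groups):
    """Canonical correlations between the variables and group membership."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    Xc = X - X.mean(axis=0)                  # subtract the total means
    Q_x, _ = np.linalg.qr(Xc)                # reduced QR, assumes full column rank
    # Indicator columns for all but the last group, centered, then orthonormalized:
    # an n by (n_g - 1) matrix with orthonormal columns that defines the groups.
    dummies = (groups[:, None] == labels[None, :-1]).astype(float)
    Q_g, _ = np.linalg.qr(dummies - dummies.mean(axis=0))
    V = Q_x.T @ Q_g
    # Singular values of V are the canonical correlations, in descending order.
    return np.linalg.svd(V, compute_uv=False)

# Simulated data for illustration: 3 groups, 3 variables, 5 observations each.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1, 2], 5)
X = rng.normal(size=(15, 3)) + groups[:, None] * np.array([2.0, 0.5, 0.0])
delta = canonical_correlations(X, groups)
```

With three groups and three variables, `delta` holds two canonical correlations, each between 0 and 1.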
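Eigenvalues of the within-group sums of squares matrix are:

$$\lambda_i = \frac{\delta_i^2}{1 - \delta_i^2}$$

where $\delta_i$ is the ith canonical correlation. To test for a significant dimensionality greater than $i$, a $\chi^2$ statistic with $(k - i)(n_g - 1 - i)$ degrees of freedom is used:

$$\left(n - 1 - n_g - \frac{1}{2}(k - n_g)\right)\sum_{j=i+1}^{l}\log(1 + \lambda_j)$$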
The loading matrix $B$ for the canonical variates is scaled so that the canonical variates have unit pooled within-group variance, i.e.

$$B^T S B = I$$

The signs of the loadings are fixed by forcing a quantity defined through $R$ to be positive, where $R$ is the Cholesky factorization of $S$.

Because the canonical variates are defined on mean-centered data, the constant term is

$$c_0 = -\bar{x} B$$

where $\bar{x}$ is a row vector of means for variables. The standardized coefficients are

$$S_d B$$

where $S_d$ is a diagonal matrix, whose diagonal elements are the square roots of the diagonal elements of the pooled within-group covariance matrix $S$. The canonical group means are

$$\bar{m}_j = c_0 + \bar{x}_j B$$

where $\bar{m}_j$ and $\bar{x}_j$ are row vectors of the canonical group mean and group mean for the jth group, respectively. Similarly,

$$a_i = c_0 + x_i B$$

is the canonical score for the ith observation $x_i$.
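Mahalanobis distance is a measure of the distance of an observation from a group. It has two forms. For an observation $x_i$, the squared distance from the jth group using the within-group covariance matrix is:

$$D_{ij}^2 = (x_i - \bar{x}_j)\, S_j^{-1}\, (x_i - \bar{x}_j)^T$$

and using the pooled within-group covariance matrix it is:

$$D_{ij}^2 = (x_i - \bar{x}_j)\, S^{-1}\, (x_i - \bar{x}_j)^T$$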
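Both forms of the distance differ only in which covariance matrix is supplied, as the short sketch below shows. The function name `squared_mahalanobis` and the numbers are made up for illustration.

```python
import numpy as np

def squared_mahalanobis(x, group_mean, cov):
    """Squared Mahalanobis distance of observation x from a group mean."""
    d = np.asarray(x, dtype=float) - np.asarray(group_mean, dtype=float)
    # Solve cov * y = d instead of forming the inverse explicitly.
    return float(d @ np.linalg.solve(cov, d))

x = [3.0, 4.0]
mean_j = [0.0, 0.0]
# Pass the group's own covariance matrix for one form,
# or the pooled within-group covariance matrix for the other.
d2_identity = squared_mahalanobis(x, mean_j, np.eye(2))
d2_scaled = squared_mahalanobis(x, mean_j, np.diag([9.0, 16.0]))
```

With an identity covariance matrix the result reduces to the squared Euclidean distance; larger variances shrink the distance along the corresponding variable.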
The prior probabilities reflect the user’s view as to the likelihood of the observations coming from the different groups. Origin supports two kinds of prior probabilities: equal priors,

$$\pi_j = \frac{1}{n_g}$$

and priors proportional to group size,

$$\pi_j = \frac{n_j}{n}$$

where $n_g$ is the number of groups, $n$ is the total number of observations, and $n_j$ is the number of observations in the jth group of the training data.

The p variables of observations are assumed to follow a multivariate Normal distribution with mean $\mu_j$ and covariance matrix $\Sigma_j$ if the observation comes from the jth group.
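If $p(x_i \mid \mu_j, \Sigma_j)$ is the probability of observing the observation $x_i$ from group j, then the posterior probability of belonging to group j is:

$$q_j = p(j \mid x_i, \mu_j, \Sigma_j) = \frac{\pi_j\, p(x_i \mid \mu_j, \Sigma_j)}{\sum_{k=1}^{n_g} \pi_k\, p(x_i \mid \mu_k, \Sigma_k)}$$

where $\pi_j$ is the prior probability of group j. The parameters $\mu_j$ and $\Sigma_j$ are estimated from the training data $X_t$, and the observation is allocated to the group with the highest posterior probability. Origin provides two methods to calculate posterior probability.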
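Linear discriminant function, which assumes equal within-group covariance matrices:

$$\log q_j = -\frac{1}{2} D_{ij}^2 + \log \pi_j + c_0$$

where $q_j$ is the posterior probability for group j, $D_{ij}^2$ is the squared Mahalanobis distance of the ith observation from the jth group using the pooled within-group covariance matrix, and $c_0$ is a constant.

Quadratic discriminant function, which allows unequal within-group covariance matrices:

$$\log q_j = -\frac{1}{2} D_{ij}^2 + \log \pi_j - \frac{1}{2}\log|S_j| + c_0$$

where $D_{ij}^2$ is the squared Mahalanobis distance of the ith observation from the jth group using the within-group covariance matrices, and $c_0$ is a constant.

The $q_j$ are standardized as follows, and $c_0$ is determined from the standardization:

$$\sum_{j=1}^{n_g} q_j = 1$$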
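The standardization step can be sketched as follows: the log posteriors are formed up to a constant and then normalized so that they sum to one, which removes the constant. This is a hedged sketch; the function name, its arguments, and the input numbers are hypothetical, and the log-determinant term is applied only when the within-group (unequal) covariance matrices are used.

```python
import numpy as np

def posterior_probabilities(d2, priors, log_det=None):
    """Posterior probabilities for one observation.

    d2      : squared Mahalanobis distances to each group
    priors  : prior probability of each group
    log_det : log|S_j| per group (only for unequal covariance matrices)
    """
    log_q = -0.5 * np.asarray(d2, dtype=float) + np.log(priors)
    if log_det is not None:
        log_q = log_q - 0.5 * np.asarray(log_det, dtype=float)
    # Subtracting the maximum avoids underflow; any additive constant
    # cancels when the probabilities are normalized to sum to one.
    q = np.exp(log_q - log_q.max())
    return q / q.sum()

# Equal priors for two groups; proportional priors would be counts / counts.sum().
equal_priors = [0.5, 0.5]
q_linear = posterior_probabilities([0.0, 2 * np.log(3.0)], equal_priors)
q_quadratic = posterior_probabilities([0.0, 0.0], equal_priors,
                                      log_det=[0.0, 2 * np.log(3.0)])
predicted_group = int(np.argmax(q_linear))  # group with the highest posterior
```

In both calls the second group is penalized by the same amount, so the two posterior vectors coincide.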
The Atypicality Index $I_j(x_i)$ indicates the probability of obtaining an observation more typical of group j than the ith observation. If it is close to 1 for all groups, it implies that the observation may come from a grouping not represented in the training data. The Atypicality Index is calculated as:

$$I_j(x_i) = P\left(B \le z : \tfrac{1}{2}p,\ \tfrac{1}{2}(n_j - p)\right)$$

where $P(B \le \beta : a, b)$ is the lower tail probability from a beta distribution and $D_{ij}^2$ is the squared Mahalanobis distance of the ith observation from the jth group. For equal within-group covariance matrices,

$$z = \frac{D_{ij}^2}{D_{ij}^2 + (n - n_g)(n_j + 1)/n_j}$$

and for unequal within-group covariance matrices,

$$z = \frac{D_{ij}^2}{D_{ij}^2 + (n_j^2 - 1)/n_j}$$
Linear discriminant functions (also known as Fisher's linear discriminant functions) can be calculated as follows. The coefficients for the jth group are

$$b_j = S^{-1} \bar{x}_j^T$$

where $b_j$ is a column vector with size of p, and the constant for the jth group is

$$a_j = \log \pi_j - \frac{1}{2}\bar{x}_j S^{-1} \bar{x}_j^T$$
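A minimal sketch of linear discriminant functions is given below. It assumes that the coefficient vector for each group is the pooled inverse covariance applied to the group mean and that the constant includes the log prior; both are assumptions of this example, as are the function names and the numbers.

```python
import numpy as np

def linear_discriminant_functions(means, S, priors):
    """Return (B, a): column j of B holds the coefficients for group j,
    and a[j] is the constant for group j (log prior included, by assumption)."""
    means = np.asarray(means, dtype=float)         # n_g x p group means
    B = np.linalg.solve(S, means.T)                # p x n_g
    a = np.log(priors) - 0.5 * np.einsum("jp,pj->j", means, B)
    return B, a

def classify(x, B, a):
    """Allocate x to the group with the largest discriminant score."""
    return int(np.argmax(a + np.asarray(x, dtype=float) @ B))

means = np.array([[0.0, 0.0], [4.0, 4.0]])
S = np.array([[1.0, 0.5], [0.5, 1.0]])
priors = [0.5, 0.5]
B, a = linear_discriminant_functions(means, S, priors)
near_first = classify([1.0, 1.0], B, a)
near_second = classify([3.0, 3.0], B, a)
```

Differences between the scores of two groups equal the differences in their log posterior probabilities under the equal-covariance model, so picking the largest score gives the same allocation as picking the highest posterior.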
Each observation in training data can be classified by posterior probabilities (i.e., it is allocated to the group with the highest posterior probability). Squared Mahalanobis distance from each group and Atypicality Index of each group can also be calculated.
The classification result for training data is summarized by comparing given group membership and predicted group membership. The misclassification error rate is the percentage of misclassified observations in each group, weighted by the prior probabilities of the groups, i.e.

$$E = \sum_{j=1}^{n_g} \pi_j e_j$$

where $\pi_j$ is the prior probability of the jth group and $e_j$ is the percentage of misclassified observations for the jth group.
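The prior-weighted error rate can be sketched in plain Python. The function name and the labels are invented for this example, and the rate is returned as a fraction rather than a percentage.

```python
def weighted_error_rate(actual, predicted, priors):
    """Sum over groups of prior * fraction of that group's observations misclassified."""
    rate = 0.0
    for group, prior in priors.items():
        # Predictions for the observations that truly belong to this group.
        preds = [p for a, p in zip(actual, predicted) if a == group]
        wrong = sum(1 for p in preds if p != group)
        rate += prior * wrong / len(preds)
    return rate

actual = ["a", "a", "a", "a", "b", "b"]
predicted = ["a", "a", "a", "b", "b", "a"]
rate_equal = weighted_error_rate(actual, predicted, {"a": 0.5, "b": 0.5})
rate_proportional = weighted_error_rate(actual, predicted, {"a": 4 / 6, "b": 2 / 6})
```

With priors proportional to group size, the weighted rate reduces to the overall fraction of misclassified observations (2 of 6 here); with equal priors, the smaller group counts for more.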
Cross-validation follows the same procedure as classifying the training data, except that when the membership of an observation in the training data is predicted, that observation is excluded from the calculation of the within-group covariance matrices or the pooled within-group covariance matrix.
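The leave-one-out procedure can be sketched as follows. The sketch makes several simplifying assumptions that are not stated in the text: it uses the pooled within-group covariance matrix with equal priors (so the highest posterior corresponds to the smallest Mahalanobis distance), and it recomputes the group means as well as the covariance without the held-out observation. The function name and data are hypothetical.

```python
import numpy as np

def leave_one_out_predictions(X, groups):
    """Predict each observation's group from a model fitted without it."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n = len(X)
    predicted = np.empty(n, dtype=labels.dtype)
    for i in range(n):
        keep = np.arange(n) != i                  # exclude observation i
        Xk, gk = X[keep], groups[keep]
        means = np.array([Xk[gk == g].mean(axis=0) for g in labels])
        centered = Xk - means[np.searchsorted(labels, gk)]
        S = centered.T @ centered / (len(Xk) - len(labels))  # pooled covariance
        diff = X[i] - means                       # one row per group
        d2 = np.einsum("jp,pj->j", diff, np.linalg.solve(S, diff.T))
        predicted[i] = labels[np.argmin(d2)]      # equal priors assumed
    return predicted

# Toy data invented for this sketch: two well-separated groups.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])
groups = np.array([0, 0, 0, 1, 1, 1])
cv_predicted = leave_one_out_predictions(X, groups)
```

Comparing `cv_predicted` with `groups` gives a cross-validated classification summary that is less optimistic than the one obtained by classifying the training data directly.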
Within-group covariance matrices and pooled within-group covariance matrix are calculated from training data. Each observation in test data can be classified by posterior probabilities (i.e., it is allocated to the group with the highest posterior probability).