17.7.4.3 Algorithms (Discriminant Analysis)

1 Test for Equality of Within-group Covariance Matrices
2 Canonical Discriminant Analysis
3 Mahalanobis Distance
4 Classify

Discriminant Analysis is used to allocate observations to groups using information from observations whose group memberships are known (i.e., training data).

Let $X_t$ be the training data with n observations and p variables in $n_g$ groups. Let $X_{tj}$ denote the $n_j$ rows of $X_t$ that belong to the jth group, where $n_j$ is the number of observations in that group, and let $\bar{x}_j$ be the row vector of sample means for the jth group. The within-group covariance matrix for group j can be expressed as:

$S_j=\frac{1}{n_j-1}\cdot (X_{tj}-\bar{x}_j)^T(X_{tj}-\bar{x}_j)$

The pooled within-group covariance matrix is:

$S=\frac{1}{n-n_g}\cdot\sum_{j=1}^{n_g} (X_{tj}-\bar{x}_j)^T(X_{tj}-\bar{x}_j)$

Note that missing values are excluded listwise: an observation containing one or more missing values is excluded from the analysis.
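The definitions above can be sketched in a few lines of NumPy. This is an illustrative sketch, not Origin's implementation; the function name `group_covariances` and its return layout are assumptions made for the example.

```python
import numpy as np

def group_covariances(X, groups):
    """Return (labels, group means, list of S_j, pooled S)."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    # Listwise deletion: drop any observation with a missing value.
    keep = ~np.isnan(X).any(axis=1)
    X, groups = X[keep], groups[keep]

    labels = np.unique(groups)
    n, p = X.shape
    means, covs = [], []
    pooled_ss = np.zeros((p, p))
    for g in labels:
        Xj = X[groups == g]                # rows belonging to group j
        dev = Xj - Xj.mean(axis=0)         # deviations from the group mean
        ss = dev.T @ dev                   # within-group sums of squares
        means.append(Xj.mean(axis=0))
        covs.append(ss / (len(Xj) - 1))    # S_j
        pooled_ss += ss
    S = pooled_ss / (n - len(labels))      # pooled within-group covariance
    return labels, np.array(means), covs, S
```

Each $S_j$ computed this way should agree with `np.cov(Xj, rowvar=False)` for the rows of group j, and the pooled matrix is the $(n_j-1)$-weighted average of the $S_j$.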

Test for Equality of Within-group Covariance Matrices

If training data are assumed to follow a multivariate normal distribution, the following likelihood-ratio test statistic G can be used to test for equality of within-group covariance matrices.

$G=C\left\{(n-n_g) \mathrm{log} |S|-\sum_{j=1}^{n_g} (n_j-1) \mathrm{log} |S_j|\right\}$

where

$C=1-\frac{2p^2+3p-1}{6(p+1)(n_g-1)}\cdot(\sum_{j=1}^{n_g} \frac{1}{n_j-1} -\frac{1}{n-n_g})$

For large n, G is approximately distributed as a $\chi^2$ variable with $\frac{1}{2}\cdot p(p+1)(n_g-1)$ degrees of freedom.
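As a rough illustration, the statistic G and its degrees of freedom can be computed directly from the group covariance matrices and group sizes. The function name `box_m_test` is hypothetical, and the sketch assumes every $S_j$ is non-singular.

```python
import numpy as np

def box_m_test(covs, sizes):
    """Return (G, degrees of freedom) for the equality-of-covariances test."""
    covs = [np.asarray(S, dtype=float) for S in covs]
    sizes = np.asarray(sizes, dtype=float)
    n_g, p, n = len(covs), covs[0].shape[0], sizes.sum()

    def logdet(A):
        # slogdet returns (sign, log|det|); steadier than log(det(A)).
        return np.linalg.slogdet(A)[1]

    # Pooled covariance as the (n_j - 1)-weighted average of the S_j.
    S = sum((nj - 1) * Sj for nj, Sj in zip(sizes, covs)) / (n - n_g)
    M = (n - n_g) * logdet(S) - sum(
        (nj - 1) * logdet(Sj) for nj, Sj in zip(sizes, covs))
    C = 1 - (2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (n_g - 1)) * (
        np.sum(1 / (sizes - 1)) - 1 / (n - n_g))
    df = p * (p + 1) * (n_g - 1) // 2
    return C * M, df
```

The returned G would then be compared against a $\chi^2$ distribution with `df` degrees of freedom; when all $S_j$ are identical, G is zero.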

Canonical Discriminant Analysis

Canonical discriminant analysis is used to find the linear combination of the p variables that maximizes the ratio of between-group to within-group variation. The formed canonical variates can then be used to discriminate between groups.

Let X be the training data with the total means subtracted, and let k be its rank. An orthogonal matrix Q can be calculated from the QR decomposition of X (when X has full column rank) or from its SVD, and $Q_X$ is the first k columns of Q. Let $Q_g$ be an n by $n_g-1$ orthogonal matrix that defines the groups. Then let the k by $n_g-1$ matrix V be

$V=Q_X^TQ_g$

The SVD of V is:

$V=U_X \Delta U_g^T$

The non-zero diagonal elements of the matrix $\Delta$ are the l canonical correlations $\delta_i$, $i=1,2,...,l$, associated with the l canonical variates, where $l=\mathrm{min}(k, n_g-1)$.

Eigenvalues of the within-group sums of squares matrix are:

$\lambda_i=\frac{\delta_i^2}{1-\delta_i^2}$
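The steps above can be sketched with NumPy's SVD. The construction of $Q_g$ from centered group-indicator columns is an assumption of this sketch (the text only requires an n by $n_g-1$ orthogonal matrix that defines the groups), and the function name is hypothetical.

```python
import numpy as np

def canonical_correlations(X, groups, tol=1e-10):
    """Return (delta, lam): canonical correlations and derived eigenvalues."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n_g = len(labels)

    # Q_X: orthonormal basis for the column space of the mean-centered data.
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    k = int(np.sum(s > tol * s[0]))        # numerical rank of Xc
    Q_X = U[:, :k]

    # Q_g (assumed construction): orthonormal basis for the centered
    # group-indicator columns, which have rank n_g - 1.
    G = (groups[:, None] == labels[None, :]).astype(float)
    Gc = G - G.mean(axis=0)
    Ug, _, _ = np.linalg.svd(Gc, full_matrices=False)
    Q_g = Ug[:, :n_g - 1]

    V = Q_X.T @ Q_g                        # k by (n_g - 1)
    delta = np.linalg.svd(V, compute_uv=False)
    # Clip to avoid division by zero if groups separate perfectly.
    delta = np.clip(delta, 0.0, 1.0 - 1e-12)
    lam = delta**2 / (1 - delta**2)
    return delta, lam
```

With this construction, the returned `lam` values coincide with the non-zero eigenvalues of $W^{-1}B$, where W and B are the within-group and between-group sums of squares matrices, which gives a convenient cross-check.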

Wilks' Lambda

To test whether the dimensionality is significantly greater than i, Wilks' lambda is computed from the remaining eigenvalues:

$\Lambda_i=\prod_{j=i+1}^{l} 1/(1+\lambda_j)$

A $\chi^2$ statistic with $(k-i)(n_g-1-i)$ degrees of freedom is used:

$(n-1-n_g-\frac{1}{2}(k-n_g))\sum_{j=i+1}^{l} \mathrm{log}(1+\lambda_j),\quad i=0,1,...,l-1$
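A small sketch of the dimensionality tests, using only the formulas above. The function name and the tuple layout of the result are assumptions for illustration.

```python
import math

def wilks_lambda_tests(lams, n, k, n_g):
    """Return a list of (i, Lambda_i, chi-square statistic, df), i = 0..l-1."""
    l = len(lams)
    scale = n - 1 - n_g - 0.5 * (k - n_g)
    results = []
    for i in range(l):
        # Sum over j = i+1..l in the 1-based notation of the text.
        log_sum = sum(math.log(1 + lam) for lam in lams[i:])
        wilks = math.exp(-log_sum)         # product of 1 / (1 + lambda_j)
        df = (k - i) * (n_g - 1 - i)
        results.append((i, wilks, scale * log_sum, df))
    return results
```

For example, with eigenvalues 3 and 1, n = 30, k = 3 and $n_g$ = 3, the first row has $\Lambda_0 = 1/8$ with 6 degrees of freedom, and the second has $\Lambda_1 = 1/2$ with 2 degrees of freedom.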

Unstandardized Canonical Coefficients

The loading matrix B for the canonical variates can be calculated from $U_X$. It is scaled so that the canonical variates have unit pooled within-group variance, i.e.,

$B^TSB=I$

Note that the sign of each singular vector in the SVD result is not unique, so each column of B can be multiplied by -1. Origin normalizes the sign by forcing the sum of each column of $RB$ to be positive, where R is the Cholesky factor of S.

The constant terms can be calculated as follows.

$C_0=-X_mB$

where $X_m$ is a row vector of the variable means.
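The scaling and sign conventions can be illustrated with a sketch that reaches B by a different route from the one described above: instead of deriving B from $U_X$, it solves the equivalent symmetric eigenproblem after whitening with the Cholesky factor of S. It assumes S is positive definite, that $S=R^TR$ with R upper triangular, and that $X_m$ holds the overall variable means; the function name is hypothetical.

```python
import numpy as np

def canonical_coefficients(X, groups):
    """Return (B, C0, S) with B scaled so that B^T S B = I."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, p = X.shape
    X_m = X.mean(axis=0)                   # assumed: overall variable means
    W = np.zeros((p, p))                   # within-group sums of squares
    Bss = np.zeros((p, p))                 # between-group sums of squares
    for g in labels:
        Xj = X[groups == g]
        dev = Xj - Xj.mean(axis=0)
        W += dev.T @ dev
        d = (Xj.mean(axis=0) - X_m)[:, None]
        Bss += len(Xj) * (d @ d.T)
    S = W / (n - len(labels))              # pooled within-group covariance

    L = np.linalg.cholesky(S)              # S = L L^T, so R = L^T (assumed)
    Linv = np.linalg.inv(L)
    evals, E = np.linalg.eigh(Linv @ Bss @ Linv.T)
    order = np.argsort(evals)[::-1][:min(p, len(labels) - 1)]
    E = E[:, order]
    # R B equals E here, so flip columns whose sum is negative.
    E = E * np.where(E.sum(axis=0) < 0, -1.0, 1.0)
    B = Linv.T @ E                         # B^T S B = E^T E = I
    C0 = -X_m @ B
    return B, C0, S
```

Because $C_0=-X_mB$, the scores $C_0+XB$ of the training data average to zero over all observations.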

Standardized Canonical Coefficients

$D=S_aB$

where $S_a$ is a diagonal matrix whose diagonal elements are the square roots of the diagonal elements of the pooled within-group covariance matrix S.

Canonical Structure Matrix

$C=S_a^{-1}SB$

Canonical Group Means

$M_j=C_0+\bar{x}_jB$

where $M_j$ and $\bar{x}_j$ are row vectors of the canonical group mean and group mean for the jth group, respectively.

Canonical Scores

$A_i=C_0+X_iB$

where $A_i$ is the canonical score for the ith observation $X_i$.

Note that the ith observation here may come from either the training data or the test data.
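Given S, B, the variable means, and the group means, the remaining canonical outputs are direct matrix products. The sketch below assumes those inputs are already available; the function name and argument order are hypothetical.

```python
import numpy as np

def canonical_outputs(S, B, X_m, group_means, X):
    """Return (D, structure, M, A) as defined in the text."""
    S, B = np.asarray(S, float), np.asarray(B, float)
    s_a = np.sqrt(np.diag(S))              # diagonal of S_a
    C0 = -np.asarray(X_m, float) @ B       # constant terms
    D = s_a[:, None] * B                   # standardized coefficients S_a B
    structure = (S @ B) / s_a[:, None]     # structure matrix S_a^{-1} S B
    M = C0 + np.asarray(group_means, float) @ B   # canonical group means
    A = C0 + np.asarray(X, float) @ B      # canonical scores, one row each
    return D, structure, M, A
```

Multiplying by a diagonal matrix is done by row scaling here, which avoids forming $S_a$ explicitly. The rows of `X` can be training or test observations.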

Mahalanobis Distance

Mahalanobis distance is a measure of the distance of an observation from a group. It has two forms. For an observation $x_i$, the squared distance from the jth group is:

Using within-group covariance matrix

$D_{ij}^2=(x_i-\bar{x}_j)S_j^{-1}(x_i-\bar{x}_j)^T$

Using pooled within-group covariance matrix

$D_{ij}^2=(x_i-\bar{x}_j)S^{-1}(x_i-\bar{x}_j)^T$
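Both forms reduce to the same quadratic form with a different covariance matrix, as the following minimal sketch shows. The helper name `mahalanobis_sq` is an assumption; passing $S_j$ or the pooled S as `cov` selects the form.

```python
import numpy as np

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of x from a group with given mean and cov."""
    d = np.asarray(x, float) - np.asarray(mean, float)
    # Solve cov * y = d rather than forming the explicit inverse.
    return float(d @ np.linalg.solve(np.asarray(cov, float), d))
```

For instance, the point (2, 0) is at squared distance 4 from the origin under an identity covariance, but only 1 when the first variable has variance 4.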

Classify

Prior Probabilities

The prior probabilities reflect the user’s view as to the likelihood of the observations coming from the different groups. Origin supports two kinds of prior probabilities:

Equal

$\pi_j=1/n_g$

Proportional to Group Size

$\pi_j=n_j/n$

where $n_j$ is the number of observations in the jth group of the training data.

Posterior Probability

The p variables of an observation are assumed to follow a multivariate normal distribution with mean $\mu_j$ and covariance matrix $\Sigma_j$ if the observation comes from the jth group. If $p(x_i|\mu_j,\Sigma_j)$ is the probability of observing the observation $x_i$ from group j, then the posterior probability of belonging to group j is:

$q_j=p(j|x_i,\mu_j,\Sigma_j)\propto p(x_i|\mu_j,\Sigma_j)\pi_j$

The parameters $\mu_j$ and $\Sigma_j$ are estimated from the training data $X_t$, and the observation is allocated to the group with the highest posterior probability. Origin provides two methods to calculate the posterior probability.

Linear Discriminant Function

Within-group covariance matrices are assumed equal.

$\mathrm{log}(q_j)=-\frac{1}{2}D_{ij}^2+\mathrm{log}(\pi_j)+c_0$

where $D_{ij}^2$ is the Mahalanobis distance of the ith observation from the jth group using the pooled within-group covariance matrix, and $c_0$ is a constant.

Quadratic Discriminant Function

Within-group covariance matrices are not assumed equal.

$\mathrm{log}(q_j)=-\frac{1}{2}D_{ij}^2+\mathrm{log}(\pi_j)-\frac{1}{2}\mathrm{log}|S_j|+c_0$

where $D_{ij}^2$ is the Mahalanobis distance of the ith observation from the jth group using the within-group covariance matrices, and $c_0$ is a constant.

The $q_j$ are standardized as follows, and $c_0$ is determined by the standardization.

$\sum_{j=1}^{n_g} q_j=1$
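The two discriminant functions and the standardization step can be sketched together. In this illustration the constant $c_0$ never needs to be computed explicitly, because it cancels when the $q_j$ are rescaled to sum to one. The function name and the `pooled` switch are assumptions of the sketch.

```python
import numpy as np

def posterior_probabilities(x, means, covs, priors, pooled=None):
    """Posterior q_j for one observation x.

    pooled=None uses the quadratic function with each S_j in covs;
    otherwise the linear function is used with the pooled matrix.
    """
    x = np.asarray(x, float)
    log_q = []
    for mean, S_j, pi_j in zip(means, covs, priors):
        cov = np.asarray(S_j if pooled is None else pooled, float)
        d = x - np.asarray(mean, float)
        score = -0.5 * d @ np.linalg.solve(cov, d) + np.log(pi_j)
        if pooled is None:
            score -= 0.5 * np.linalg.slogdet(cov)[1]   # -1/2 log|S_j|
        log_q.append(score)
    log_q = np.array(log_q)
    q = np.exp(log_q - log_q.max())        # subtract max for stability
    return q / q.sum()                     # enforce sum of q_j = 1
```

The observation is then allocated to the group at `np.argmax` of the returned vector. Priors can be passed as `[1 / n_g] * n_g` for equal priors or as group sizes divided by n for proportional priors.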

Atypicality Index

The Atypicality Index $I_j(x_i)$ indicates the probability of obtaining an observation more typical of group j than the ith observation. If it is close to 1 for all groups, it implies that the observation may come from a grouping not represented in the training data. The Atypicality Index is calculated as:

$I_j(x_i)=P(B\le z:\frac{1}{2}p,\frac{1}{2}(n_j-d))$

where $P(B\le \beta:\ a, b)$ is the lower tail probability from a beta distribution with parameters a and b. For equal within-group covariance matrices,

$z=D_{ij}^2/(D_{ij}^2+(n-n_g)(n_j-1)/n_j)$

For unequal within-group covariance matrices,

$z=D_{ij}^2/(D_{ij}^2+(n_j^2-1)/n_j)$

Linear Discriminant Function Coefficients

The linear discriminant functions (also known as Fisher's linear discriminant functions) can be calculated as follows:

Linear Coefficient for the jth Group.

$b_j=S^{-1}\bar{x}_j^T$

where $b_j$ is a column vector of size p.

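Constant Coefficient for the jth Group.

$a_j=\mathrm{log}(\pi_j)-\frac{1}{2}\bar{x}_jb_j$

This follows from expanding the pooled Mahalanobis distance in $\mathrm{log}(q_j)$: apart from terms common to all groups, $\mathrm{log}(q_j)=a_j+x_ib_j$.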
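The link between $b_j$ and the pooled Mahalanobis distance can be checked numerically. The sketch below computes $b_j$ and the product $\bar{x}_jb_j$ for every group; the function name is hypothetical.

```python
import numpy as np

def linear_function_terms(means, S):
    """Return (b, quad): b[:, j] is b_j and quad[j] is xbar_j b_j."""
    means = np.asarray(means, float)
    # Solve S b_j = xbar_j^T for all groups at once.
    b = np.linalg.solve(np.asarray(S, float), means.T)
    # Row j of means times column j of b.
    quad = np.einsum("jp,pj->j", means, b)
    return b, quad
```

Expanding the quadratic form gives $-\frac{1}{2}D_{ij}^2=x_ib_j-\frac{1}{2}\bar{x}_jb_j-\frac{1}{2}x_iS^{-1}x_i^T$. The last term is the same for every group, so only $x_ib_j-\frac{1}{2}\bar{x}_jb_j$ (plus the log prior) affects which group receives the highest linear-function score.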

Classify Training Data

Each observation in training data can be classified by posterior probabilities (i.e., it is allocated to the group with the highest posterior probability). Squared Mahalanobis distance from each group and Atypicality Index of each group can also be calculated.

The classification result for the training data is summarized by comparing the given group membership with the predicted group membership. The misclassification error rate is the percentage of misclassified observations in each group, weighted by the prior probabilities of the groups, i.e.,

$E=\sum_{j=1}^{n_g} e_j\pi_j$

where $e_j$ is the percentage of misclassified observations for the jth group.
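A minimal sketch of the weighted error rate follows. It expresses $e_j$ as a fraction rather than a percentage and assumes `priors` is ordered like the sorted group labels; the function name is hypothetical.

```python
import numpy as np

def misclassification_rate(actual, predicted, priors):
    """Return (E, e): prior-weighted error rate and per-group error rates."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    labels = np.unique(actual)             # sorted group labels
    # e_j: fraction of group-j observations assigned to another group.
    e = np.array([np.mean(predicted[actual == g] != g) for g in labels])
    return float(e @ np.asarray(priors, float)), e
```

With priors proportional to group size, E reduces to the overall fraction of misclassified observations; with equal priors, small groups carry as much weight as large ones.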

Cross Validation for Training Data

Cross validation follows the same procedure as Classify Training Data, except that when predicting the group membership of a training observation, that observation is excluded from the calculation of the within-group covariance matrices or the pooled within-group covariance matrix.
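A leave-one-out loop for the linear discriminant function might look like the sketch below. It makes two assumptions that the text does not spell out: the group means are also recomputed without the held-out observation, and the priors are held fixed. The function name is hypothetical, and `priors` is ordered like the sorted group labels.

```python
import numpy as np

def loo_predict_linear(X, groups, priors):
    """Leave-one-out predicted group for each training observation."""
    X = np.asarray(X, float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, p = X.shape
    predicted = []
    for i in range(n):
        keep = np.arange(n) != i           # leave observation i out
        Xk, gk = X[keep], groups[keep]
        means = np.array([Xk[gk == g].mean(axis=0) for g in labels])
        W = np.zeros((p, p))
        for g, m in zip(labels, means):
            dev = Xk[gk == g] - m
            W += dev.T @ dev
        S = W / (len(Xk) - len(labels))    # pooled S without observation i
        d = X[i] - means                   # deviation from each group mean
        D2 = np.einsum("jp,pj->j", d, np.linalg.solve(S, d.T))
        scores = -0.5 * D2 + np.log(priors)
        predicted.append(labels[np.argmax(scores)])
    return np.array(predicted)
```

The predictions can then be compared with the given memberships to obtain a cross-validated error rate. A quadratic variant would recompute each $S_j$ in the same way and add the $-\frac{1}{2}\mathrm{log}|S_j|$ term.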

Classify Test Data

Within-group covariance matrices and pooled within-group covariance matrix are calculated from training data. Each observation in test data can be classified by posterior probabilities (i.e., it is allocated to the group with the highest posterior probability).