17.5.4.2 Algorithms (Mann-Whitney Test)
Consider two independent samples, drawn from populations with distribution functions \(F(x)\) and \(G(y)\), of sizes \(n_1\) and \(n_2\). The sample data are denoted \(x_1,x_2,\ldots ,x_{n_1}\) and \(y_1,y_2,\ldots ,y_{n_2}\), respectively.
The null hypothesis, \(H_0: F(x) = G(y)\), is that the two distributions are the same. It is tested against one of the following alternative hypotheses \(H_1\):
- \(H_1: F(x) \neq G(y)\) (two-sided); or
- \(H_1: F(x) < G(y)\): the \(x\)'s tend to be greater than the \(y\)'s; or
- \(H_1: F(x) > G(y)\): the \(x\)'s tend to be less than the \(y\)'s.
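Each alternative corresponds to one tail of the approximate Normal statistic \(z\) computed in the procedure. When the \(x\)'s tend to be greater, they receive higher ranks, so \(S_1\), \(U\) and \(z\) tend to be large. The text does not say how p-values are reported, so the following is only a minimal sketch under that gap. It converts \(z\) into an approximate p-value using Python's standard `math.erfc`. The function names `normal_cdf` and `p_value` and the labels `"two-sided"`, `"greater"` and `"less"` are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard Normal CDF, Phi(z) = erfc(-z / sqrt(2)) / 2."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def p_value(z, alternative):
    """Approximate p-value of the Normal statistic z (hypothetical helper).

    "two-sided": H1: F(x) != G(y)
    "greater":   H1: F(x) <  G(y), the x's tend to be greater (upper tail)
    "less":      H1: F(x) >  G(y), the x's tend to be less (lower tail)
    """
    if alternative == "two-sided":
        return min(1.0, 2 * normal_cdf(-abs(z)))
    if alternative == "greater":
        return normal_cdf(-z)  # P(Z >= z)
    if alternative == "less":
        return normal_cdf(z)   # P(Z <= z)
    raise ValueError(f"unknown alternative: {alternative!r}")
```

The two one-sided p-values for the same \(z\) always sum to 1. The two-sided value doubles the smaller tail.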
The test procedure includes the following steps:
- Pool the \(x_i\) and \(y_j\) into a single combined sample of \(n_1+n_2\) observations.
- Rank the combined observations in ascending order. Tied observations receive the average of the ranks they would otherwise occupy. Let \(r_{1i}\) be the rank assigned to \(x_i\), for \(i=1,2,\ldots ,n_1\), and let \(r_{2j}\) be the rank assigned to \(y_j\), for \(j=1,2,\ldots ,n_2\).
- Calculate the sum of ranks for each sample:
  - \(S_1=\sum_{i=1}^{n_1}r_{1i}\) and \(S_2=\sum_{j=1}^{n_2}r_{2j}\). As a check, \(S_1+S_2=\frac{(n_1+n_2)(n_1+n_2+1)}2\).
- The test statistic \(U\) is defined as follows:
  - \[U=S_1-\frac{n_1(n_1+1)}2\]
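- The approximate Normal test statistic \(z\) is calculated as:
  - \[z=\frac{U-M(U)\pm \frac 12}{\sqrt{Var(U)}}\]
  - where \(\pm\frac12\) is a continuity correction, and the mean and variance of \(U\) are:
  - \[M(U)=\frac{n_1n_2}2\]
  - \[Var(U)=\frac{n_1n_2(n_1+n_2+1)}{12}-\frac{n_1n_2}{(n_1+n_2)(n_1+n_2-1)}\times TS\]
  - \[TS=\sum_{j=1}^{\tau} \frac{t_j(t_j-1)(t_j+1)}{12}\]
  - where \(\tau\) is the number of distinct values in the combined sample and \(t_j\) is the number of observations sharing the \(j\)-th value. Untied values have \(t_j=1\) and contribute nothing to \(TS\).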
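The steps above can be sketched in Python as follows. This is a minimal illustration, not the NAG g08amc implementation. The function names `midranks` and `mann_whitney_z` are hypothetical. The text does not state how the sign of \(\pm\frac12\) is chosen, so the sketch assumes that the correction moves \(U\) half a unit toward \(M(U)\).

```python
import math
from collections import Counter


def midranks(values):
    """Ascending 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Advance `end` to the last position in this run of equal values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def mann_whitney_z(x, y):
    """Hypothetical helper returning S1, S2, U, M(U), Var(U) and z."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    pooled = list(x) + list(y)        # combine the two samples
    r = midranks(pooled)
    s1 = sum(r[:n1])                  # S_1: ranks of the x's
    s2 = sum(r[n1:])                  # S_2: ranks of the y's
    u = s1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # TS: each group of t equal values adds t(t-1)(t+1)/12; untied values add 0.
    ts = sum(t * (t - 1) * (t + 1) / 12 for t in Counter(pooled).values())
    var_u = n1 * n2 * (n + 1) / 12 - n1 * n2 / (n * (n - 1)) * ts
    if var_u <= 0:
        raise ValueError("Var(U) is zero, e.g. when all observations are tied")
    # Assumed continuity correction: shrink |U - M(U)| by 1/2, never past zero.
    diff = u - mean_u
    corrected = math.copysign(max(abs(diff) - 0.5, 0.0), diff)
    z = corrected / math.sqrt(var_u)
    return {"S1": s1, "S2": s2, "U": u, "M": mean_u, "Var": var_u, "z": z}
```

Tie groups are found by exact equality, both in `midranks` and in `Counter`. Floating-point values that differ only by rounding error are therefore treated as distinct.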
If no ties are present, every \(t_j=1\), so \(TS=0\) and the variance of \(U\) reduces to \(\frac{n_1n_2(n_1+n_2+1)}{12}\).
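The following small worked illustration uses hypothetical data, not data from the source. Take \(x=(1.1,\ 2.3,\ 3.5)\) and \(y=(0.7,\ 2.3,\ 4.0,\ 5.2)\), so that \(n_1=3\) and \(n_2=4\). The combined ordering is 0.7, 1.1, 2.3, 2.3, 3.5, 4.0, 5.2. The two values of 2.3 share the rank \((3+4)/2=3.5\), which gives one tie group with \(t_j=2\):
\[S_1=2+3.5+5=10.5,\qquad S_2=1+3.5+6+7=17.5\]
\[U=10.5-\frac{3(3+1)}{2}=4.5,\qquad M(U)=\frac{3\cdot 4}{2}=6\]
\[TS=\frac{2\cdot 1\cdot 3}{12}=\frac12,\qquad Var(U)=\frac{3\cdot 4\cdot 8}{12}-\frac{3\cdot 4}{7\cdot 6}\times\frac12=8-\frac17=\frac{55}{7}\approx 7.857\]
Without the tie correction, the variance would be 8. The next step assumes that the continuity correction moves \(U\) toward \(M(U)\):
\[z=\frac{4.5-6+\frac12}{\sqrt{55/7}}\approx -0.357\]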
For more details of the algorithm, refer to the NAG function nag_mann_whitney (g08amc).