17.5.4.2 Algorithms (Mann-Whitney Test)

Consider two independent samples, $x_1,x_2,\ldots,x_{n_1}$ and $y_1,y_2,\ldots,y_{n_2}$, of sizes $n_1$ and $n_2$, drawn from distributions with cumulative distribution functions $F(x)$ and $G(y)$, respectively.

The null hypothesis $H_0: F(x) = G(y)$ states that the two distributions are the same. It is tested against one of the following alternative hypotheses $H_1$:

$H_1: F(x) \neq G(y)$; or

$H_1: F(x) < G(y)$, the $x$'s tend to be greater than the $y$'s; or

$H_1: F(x) > G(y)$, the $x$'s tend to be less than the $y$'s.

The test procedure includes the following steps:

Combine the $x_i$ and $y_j$ into a single sample of size $n_1+n_2$.
Rank the combined sample in ascending order; tied observations receive the average of the ranks they would otherwise occupy. Let $r_{1i}$ be the rank assigned to $x_i$, for $i=1,2,\ldots,n_1$, and $r_{2j}$ the rank assigned to $y_j$, for $j=1,2,\ldots,n_2$.
Calculate the sum of ranks for each sample:
$S_1=\sum_{i=1}^{n_1}r_{1i}$ and $S_2=\sum_{j=1}^{n_2}r_{2j}$
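The test statistic $U$ is defined as follows:
$U=S_1-\frac{n_1(n_1+1)}{2}$
$U$ equals the number of pairs $(x_i, y_j)$ with $x_i > y_j$, tied pairs counting $\frac{1}{2}$, so large values of $U$ indicate that the $x$'s tend to be greater than the $y$'s.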
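The approximate Normal test statistic $z$ is calculated as:
$z=\frac{U-M(U)\pm\frac{1}{2}}{\sqrt{\mathrm{Var}(U)}}$
where $\pm\frac{1}{2}$ is a continuity correction (typically applied with the sign that moves $U$ toward $M(U)$), and the mean and variance of $U$ under $H_0$ are
$M(U)=\frac{n_1n_2}{2}$
and
$\mathrm{Var}(U)=\frac{n_1n_2(n_1+n_2+1)}{12}-\frac{n_1n_2}{(n_1+n_2)(n_1+n_2-1)}\times TS$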
where
$TS=\sum_{j=1}^{\tau}\frac{t_j(t_j-1)(t_j+1)}{12}$.
Here $\tau$ is the number of tied groups in the combined sample and $t_j$ is the number of observations in the $j$th tied group.
If no ties are present, $TS=0$ and the variance of $U$ reduces to $\frac{n_1n_2(n_1+n_2+1)}{12}$.
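The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the NAG g08amc implementation: the helper names `midranks` and `mann_whitney` and the `alternative` values (`"two-sided"`, `"greater"`, `"less"`) are hypothetical, only the normal approximation is computed (no exact small-sample p-values), and the continuity correction is always applied toward $M(U)$, which may differ from how a library treats one-sided tests.

```python
import math
from collections import Counter


def midranks(values):
    """Ascending 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2.0 + 1.0  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def mann_whitney(x, y, alternative="two-sided"):
    """Return (U, z, p) using the normal approximation (hypothetical helper, not a library API)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = midranks(combined)
    s1 = sum(ranks[:n1])              # S1: rank sum of the x sample
    u = s1 - n1 * (n1 + 1) / 2.0      # U = S1 - n1(n1+1)/2

    # t_j: number of observations sharing each distinct value (t = 1 adds 0 to TS).
    ts = sum(t * (t - 1) * (t + 1) / 12.0 for t in Counter(combined).values())
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 * (n + 1) / 12.0 - n1 * n2 / (n * (n - 1)) * ts
    if var_u <= 0:
        raise ValueError("variance of U is zero (all observations tied)")

    # Continuity correction: shift U half a unit toward its mean (assumed convention).
    diff = u - mean_u
    if diff > 0:
        diff -= 0.5
    elif diff < 0:
        diff += 0.5
    z = diff / math.sqrt(var_u)

    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal CDF at z
    if alternative == "greater":    # H1: F(x) < G(y), x's tend to be greater
        p = 1.0 - cdf
    elif alternative == "less":     # H1: F(x) > G(y), x's tend to be less
        p = cdf
    else:                           # H1: F(x) != G(y), two-sided
        p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, z, p
```

For example, with $x = (1.2, 3.4, 3.4)$ and $y = (2.0, 3.4, 5.1)$, the three values 3.4 form one tied group ($t = 3$) sharing rank 4, so $S_1 = 1 + 4 + 4 = 9$, $U = 9 - 6 = 3$, $M(U) = 4.5$, $TS = 2$, $\mathrm{Var}(U) = 5.25 - 0.3 \times 2 = 4.65$ and $z = (3 - 4.5 + 0.5)/\sqrt{4.65} \approx -0.464$.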

For more details on the algorithm, refer to the NAG function nag_mann_whitney (g08amc).