The procedure below draws on NAG algorithms.
Consider two independent samples $X$ and $Y$, of sizes $n_1$ and $n_2$, denoted $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$ respectively. Let $F(x)$ and $G(x)$ represent their respective, unknown distribution functions. Also let $S_1(x)$ and $S_2(x)$ denote the values of the sample empirical distribution functions.
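As an illustration of the definition above, an empirical distribution function can be sketched in a few lines of Python (a minimal sketch; the name `ecdf` is illustrative and not part of KS-test2):

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical distribution function S of a sample:
    S(x) is the fraction of observations less than or equal to x."""
    data = sorted(sample)
    n = len(data)
    def S(x):
        # bisect_right counts how many sorted observations are <= x
        return bisect_right(data, x) / n
    return S

S1 = ecdf([1.2, 3.4, 2.2, 5.0])
print(S1(2.2))  # 0.5: two of the four observations are <= 2.2
```

Sorting once and using binary search keeps each evaluation at $O(\log n)$.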
The null hypothesis $H_0$: $F(x) = G(x)$ is tested against one of the following alternatives:
$H_1$: $F(x) \neq G(x)$, for which the associated p-value is a two-tailed probability;
or
$H_2$: $F(x) > G(x)$, for which the associated p-value is an upper-tailed probability;
or
$H_3$: $F(x) < G(x)$, for which the associated p-value is a lower-tailed probability.
For the first case of $H_1$, the statistic $D_{n_1,n_2} = \sup_x |S_1(x) - S_2(x)|$ represents the largest absolute deviation between the two empirical distribution functions.
For the second case of $H_2$, the statistic $D^{+}_{n_1,n_2}$ represents the largest positive deviation between the empirical distribution function of the first sample and the empirical distribution function of the second sample, that is $D^{+}_{n_1,n_2} = \sup_x \{S_1(x) - S_2(x)\}$.
For the third case of $H_3$, the statistic $D^{-}_{n_1,n_2}$ represents the largest positive deviation between the empirical distribution function of the second sample and the empirical distribution function of the first sample, that is $D^{-}_{n_1,n_2} = \sup_x \{S_2(x) - S_1(x)\}$.
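All three deviation statistics can be computed directly from the two samples: since each empirical distribution function changes only at observed points, it suffices to evaluate both at every pooled observation. A minimal Python sketch (the name `ks_statistics` is illustrative):

```python
from bisect import bisect_right

def ks_statistics(x, y):
    """Return (D, D+, D-): the largest absolute deviation, the largest
    positive deviation S1 - S2, and the largest positive deviation S2 - S1
    of the two sample empirical distribution functions."""
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    d_plus = d_minus = 0.0
    for v in xs + ys:
        s1 = bisect_right(xs, v) / n1  # S1(v)
        s2 = bisect_right(ys, v) / n2  # S2(v)
        d_plus = max(d_plus, s1 - s2)    # candidate for D+
        d_minus = max(d_minus, s2 - s1)  # candidate for D-
    return max(d_plus, d_minus), d_plus, d_minus

d, d_plus, d_minus = ks_statistics([1, 2, 3], [2, 3, 4])
print(d, d_plus, d_minus)  # D = D+ = 1/3, D- = 0.0
```

Between pooled observations both step functions are constant, so scanning only the observed values is enough to attain the suprema.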
KS-test2 also returns the standardized statistic $Z = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\, D$, where $D$ may be $D_{n_1,n_2}$, $D^{+}_{n_1,n_2}$ or $D^{-}_{n_1,n_2}$, depending on the choice of the alternative hypothesis.
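The standardization is a simple rescaling of whichever deviation statistic was chosen; a short sketch (the name `standardized_z` is illustrative, not a KS-test2 routine):

```python
import math

def standardized_z(d, n1, n2):
    """Standardize a deviation statistic: Z = sqrt(n1*n2/(n1+n2)) * D,
    where D may be the two-sided or either one-sided deviation."""
    return math.sqrt(n1 * n2 / (n1 + n2)) * d

print(standardized_z(0.5, 100, 100))  # sqrt(50) * 0.5, about 3.536
```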
The distribution of the statistic $Z$ converges asymptotically to a distribution given by Smirnov as $n_1$ and $n_2$ increase. The probability, under the null hypothesis, of obtaining a value of the test statistic as extreme as that observed is computed.
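The limiting two-sided tail probability has the classical Smirnov series form $Q(z) = 2\sum_{j=1}^{\infty} (-1)^{j-1} e^{-2 j^2 z^2}$. The sketch below evaluates a truncation of this series; it illustrates the asymptotic distribution only and is not the NAG computation:

```python
import math

def smirnov_sf(z, terms=100):
    """Asymptotic two-sided tail probability P(Z > z) from Smirnov's
    alternating series; valid as n1 and n2 grow large."""
    if z <= 0:
        return 1.0  # the series form only applies for z > 0
    return 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * z * z)
                     for j in range(1, terms + 1))

print(smirnov_sf(1.36))  # about 0.049, near the familiar 5% critical value
```

The terms decay like $e^{-2 j^2 z^2}$, so a modest truncation is accurate for any $z$ bounded away from zero.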
If $\max(n_1, n_2) \le 2500$ and $n_1 n_2 \le 10000$, then an exact method given by Kim and Jennrich (1973) is used. Otherwise, the p-value is computed using the approximations suggested by Kim and Jennrich (1973).
Note that the method used is only exact for continuous theoretical distributions.
This method computes the two-sided probability. The one-sided probabilities are estimated by halving the two-sided probability. This is a good estimate for small $p$, that is $p \le 0.10$, but it becomes very poor for larger $p$.
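Putting the pieces together, the following end-to-end sketch computes the two-sided statistic, the standardized $Z$, the asymptotic two-sided probability via the Smirnov series, and the halving estimate for the one-sided case. It illustrates the formulas above under the asymptotic approximation; it is not the NAG exact method, and all names are illustrative:

```python
import math
from bisect import bisect_right

def ks_2samp_asymptotic(x, y):
    """Return (D, Z, two-sided p, one-sided estimate) for two samples,
    using the Smirnov series for the asymptotic two-sided probability."""
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    # Largest absolute deviation of the two empirical CDFs
    d = max(abs(bisect_right(xs, v) / n1 - bisect_right(ys, v) / n2)
            for v in xs + ys)
    z = math.sqrt(n1 * n2 / (n1 + n2)) * d
    if z == 0:
        p_two = 1.0  # the series form only applies for z > 0
    else:
        p_two = min(1.0, 2.0 * sum((-1) ** (j - 1)
                                   * math.exp(-2.0 * j * j * z * z)
                                   for j in range(1, 101)))
    # One-sided probability estimated by halving; good only for small p
    return d, z, p_two, p_two / 2.0

d, z, p_two, p_one = ks_2samp_asymptotic([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(d, round(p_two, 4))  # D = 1.0; the disjoint samples give a small p
```

For sample sizes inside the exact-method thresholds, the Kim and Jennrich exact probability would differ from this asymptotic value.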
For more details of the algorithm, please refer to nag_2_sample_ks_test (g08cdc).