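Assume we have response data measured at k levels of the factor, where $y_{ij}$ represents the value of the ith observation (i = 1, 2, ..., $n_j$) on the jth factor level (j = 1, 2, ..., k). The one-way ANOVA model can then be written as:
$$y_{ij} = \mu + \alpha_j + \varepsilon_{ij}, \quad j = 1, 2, \ldots, k; \; i = 1, 2, \ldots, n_j$$
where $\mu$ is the overall mean, $\alpha_j$ is the effect of the jth level, and $\varepsilon_{ij}$ is the random error term.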
ANOVA tests whether the means of two or more populations (levels) are equal. The null hypothesis is that the means of the different populations are the same, and the alternative hypothesis is that at least one population mean differs from the others. Mathematically, this is expressed as:
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$$
$$H_1: \mu_p \neq \mu_q \ \text{for some } p \text{ and } q, \quad p, q \in \{1, \ldots, k\}, \; p \neq q$$
where $\mu_j$ is the mean of the jth level. To test the hypothesis, the total sample variation is divided into variation between groups and variation within groups, and an F-test is then used to test whether these two variations differ.
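Algebraically, the total variation is partitioned as follows, and the mean square of each part is then used to estimate the corresponding variation:
$$\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(y_{ij}-\bar{y}_{..}\right)^2 = \sum_{j=1}^{k} n_j\left(\bar{y}_{.j}-\bar{y}_{..}\right)^2 + \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(y_{ij}-\bar{y}_{.j}\right)^2$$
where $\bar{y}_{.j}$ is the sample mean of the jth level and $\bar{y}_{..}$ is the overall sample mean. The left-hand term is called the "total sum of squares"; the second term is called the "sum of squares of treatments", which represents the variation between groups; and the third term is called the "sum of squares of error", which represents the variation within groups. The equation is commonly abbreviated to
$$SS_{Total} = SS_{Treatments} + SS_{Error}$$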
When $H_0$ is true, the sample data from all k levels are normally and independently distributed with a common mean $\mu$ and variance $\sigma^2$. The statistic
$$F = \frac{MS_{Treatments}}{MS_{Error}} = \frac{SS_{Treatments}/(k-1)}{SS_{Error}/(n-k)}$$
then follows an F distribution with k-1 and n-k degrees of freedom, where $MS_{Treatments}$ is the mean square for treatments and $MS_{Error}$ is the mean square for error, each formed by dividing the corresponding sum of squares by its degrees of freedom, and $n = \sum_{j} n_j$ is the total number of observations. Given a significance level $\alpha$, the null hypothesis should be rejected if the F statistic exceeds the critical value $F_{\alpha,\,k-1,\,n-k}$, the tabulated value of the F distribution with k-1 and n-k degrees of freedom at level $\alpha$, or equivalently, if the associated P value is less than the significance level.
The results of the analysis of variance are commonly presented in an ANOVA table:
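| Source of Variation | Degrees of Freedom (DF) | Sum of Squares (SS) | Mean Square (MS) | F Value | Prob > F |
|---|---|---|---|---|---|
| Model (Factor) | k-1 | $SS_{Treatments}$ | $MS_{Treatments} = SS_{Treatments}/(k-1)$ | $MS_{Treatments}/MS_{Error}$ | P value of the F test |
| Error | n-k | $SS_{Error}$ | $MS_{Error} = SS_{Error}/(n-k)$ | | |
| Total | n-1 | $SS_{Total}$ | | | |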
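The quantities in the ANOVA table can be computed directly from the sums of squares. The following is a minimal sketch using only the Python standard library; the function name `one_way_anova` and the sample data are hypothetical and chosen for illustration, and this is not the routine Origin itself calls. The P value column is left out because the standard library has no F distribution.

```python
from statistics import fmean

def one_way_anova(groups):
    """Return the ANOVA table entries for a list of samples (one list per level)."""
    k = len(groups)                                  # number of factor levels
    all_values = [y for g in groups for y in g]
    n = len(all_values)                              # total number of observations
    grand_mean = fmean(all_values)
    group_means = [fmean(g) for g in groups]

    # Between-groups variation: sum of squares of treatments
    ss_treatments = sum(len(g) * (m - grand_mean) ** 2
                        for g, m in zip(groups, group_means))
    # Within-groups variation: sum of squares of error
    ss_error = sum((y - m) ** 2
                   for g, m in zip(groups, group_means) for y in g)
    # Total variation, computed independently as a check on the partition
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)

    ms_treatments = ss_treatments / (k - 1)
    ms_error = ss_error / (n - k)
    return {
        "df_model": k - 1, "df_error": n - k, "df_total": n - 1,
        "ss_treatments": ss_treatments, "ss_error": ss_error, "ss_total": ss_total,
        "ms_treatments": ms_treatments, "ms_error": ms_error,
        "f_value": ms_treatments / ms_error,
    }

# Illustrative data only: three levels with three observations each
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(table["ss_treatments"], table["ss_error"], table["f_value"])
```

For this toy data the group means are 2, 3 and 6, so the treatment sum of squares is 26, the error sum of squares is 6, and the F value is 13 on 2 and 6 degrees of freedom.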
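The analysis of variance assumes that the different samples have equal variances, which is commonly called homogeneity of variance. The Levene test and the Brown-Forsythe test can be used to verify this assumption. Suppose we have k samples of response data, where $y_{ij}$ represents the value of the ith observation (i = 1, 2, ..., $n_j$) on the jth factor level (j = 1, 2, ..., k). The hypotheses of both the Levene test and the Brown-Forsythe test can be expressed as:
$$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$$
$$H_1: \sigma_p^2 \neq \sigma_q^2 \ \text{for at least one pair } (p, q), \; p \neq q$$
Define $z_{ij}$ in one of the following three ways, depending on the test:
$$z_{ij} = \left|y_{ij} - \bar{y}_{.j}\right| \quad \text{(Levene, absolute deviations)}$$
$$z_{ij} = \left(y_{ij} - \bar{y}_{.j}\right)^2 \quad \text{(Levene, squared deviations)}$$
$$z_{ij} = \left|y_{ij} - \tilde{y}_{.j}\right| \quad \text{(Brown-Forsythe)}$$
where $\bar{y}_{.j}$ is the mean and $\tilde{y}_{.j}$ is the median of the jth sample.
When $H_0$ holds, the test statistic
$$F = \frac{\sum_{j=1}^{k} n_j\left(\bar{z}_{.j}-\bar{z}_{..}\right)^2 / (k-1)}{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(z_{ij}-\bar{z}_{.j}\right)^2 / (n-k)}$$
will (approximately) follow an F distribution with k-1 and n-k degrees of freedom, where $\bar{z}_{.j}$ and $\bar{z}_{..}$ are the group means and the overall mean of the $z_{ij}$, respectively.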
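Both tests amount to a one-way ANOVA F statistic computed on transformed data. The sketch below illustrates this with the Python standard library, assuming the usual definitions of the transformed values (absolute or squared deviations from the group mean for Levene, absolute deviations from the group median for Brown-Forsythe). The function name `variance_test_statistic`, the `method` labels, and the data are hypothetical.

```python
from statistics import fmean, median

def variance_test_statistic(groups, method="levene_abs"):
    """F-type statistic for homogeneity of variance (one list of values per level).

    method: "levene_abs"     -> z = |y - group mean|
            "levene_sq"      -> z = (y - group mean)^2
            "brown_forsythe" -> z = |y - group median|
    """
    if method not in ("levene_abs", "levene_sq", "brown_forsythe"):
        raise ValueError(f"unknown method: {method}")

    # Centre each group at its mean (Levene) or its median (Brown-Forsythe)
    centers = [median(g) if method == "brown_forsythe" else fmean(g) for g in groups]
    if method == "levene_sq":
        z = [[(y - c) ** 2 for y in g] for g, c in zip(groups, centers)]
    else:
        z = [[abs(y - c) for y in g] for g, c in zip(groups, centers)]

    # One-way ANOVA F statistic on the transformed values z
    k = len(z)
    n = sum(len(g) for g in z)
    z_overall = fmean([v for g in z for v in g])
    z_means = [fmean(g) for g in z]
    between = sum(len(g) * (m - z_overall) ** 2 for g, m in zip(z, z_means))
    within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g)
    return (between / (k - 1)) / (within / (n - k))

# Illustrative data only: the spread grows from the first to the third level
data = [[1, 2, 3], [2, 4, 6], [0, 5, 10]]
for name in ("levene_abs", "levene_sq", "brown_forsythe"):
    print(name, variance_test_statistic(data, name))
```

The resulting statistic is compared with the F distribution with k-1 and n-k degrees of freedom, exactly as in the main ANOVA test. In this toy data every group mean equals its median, so the absolute-deviation Levene statistic and the Brown-Forsythe statistic coincide.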
Once an ANOVA has determined that at least one population mean differs significantly from the others, multiple means comparison compares all possible pairs of factor level means to determine which means differ significantly. Origin provides several methods for means comparison and uses the NAG function nag_anova_confid_interval (g04dbc) to perform them.
Two types of multiple means comparison methods are included in Origin:
The power analysis procedure calculates the actual power for the sample data, as well as the hypothetical power if additional sample sizes are specified.
The power of a one-way analysis of variance is a measure of its sensitivity. Power is the probability that the one-way ANOVA will detect differences in the population means when real differences exist. In terms of the null and alternative hypotheses, power is the probability that the test statistic F will be extreme enough to reject the null hypothesis when it should in fact be rejected (i.e. when the null hypothesis is not true).
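Power is defined by the equation:
$$\text{power} = 1 - \text{probf}(f,\ dfa,\ dfe,\ nc)$$
where probf( ) is the cumulative distribution function of the non-central F distribution, f is the critical value (deviate) of the F distribution with dfa and dfe degrees of freedom at the chosen significance level, and dfa and dfe are the model and error degrees of freedom, respectively. The noncentrality parameter is nc = SST/MSE, where SST is the sum of squares of the Model (the between-groups, or treatment, sum of squares) and MSE is the mean square of the Errors. The value of probf( ) is obtained using the NAG function nag_prob_non_central_f_dist (g01gdc). Please see the NAG documentation for more detailed information.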
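The power calculation can be sketched without NAG as follows. This is an illustrative stand-in, not the NAG routine: it assumes `probf` is the non-central F cumulative distribution function, evaluated here as a Poisson-weighted sum of regularized incomplete beta functions, and it finds the critical value by bisection. All function names other than `probf` are hypothetical; in practice a library routine (for example `scipy.stats.ncf`) would normally be used instead.

```python
import math

def _betacf(a, b, x):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    def guard(v):
        return v if abs(v) > tiny else tiny
    c = 1.0
    d = 1.0 / guard(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, 500):
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for term in (even, odd):
            d = 1.0 / guard(1.0 + term * d)
            c = guard(1.0 + term / c)
            h *= d * c
        if abs(d * c - 1.0) < 3e-14:   # last correction factor is ~1: converged
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def probf(f, dfa, dfe, nc=0.0):
    """CDF of the F distribution with (dfa, dfe) df and noncentrality nc."""
    if f <= 0.0:
        return 0.0
    x = dfa * f / (dfa * f + dfe)
    if nc == 0.0:                       # central F distribution
        return betainc(dfa / 2.0, dfe / 2.0, x)
    half, total, weight_sum, j = nc / 2.0, 0.0, 0.0, 0
    while weight_sum < 1.0 - 1e-12 and j < 2000:
        # Poisson(nc/2) weight for the j-th term of the series
        w = math.exp(-half + j * math.log(half) - math.lgamma(j + 1.0))
        total += w * betainc(dfa / 2.0 + j, dfe / 2.0, x)
        weight_sum += w
        j += 1
    return total

def f_critical(alpha, dfa, dfe):
    """Upper-alpha critical value of the central F distribution, by bisection."""
    lo, hi = 0.0, 1.0
    while probf(hi, dfa, dfe) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if probf(mid, dfa, dfe) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def anova_power(ss_model, ms_error, dfa, dfe, alpha=0.05):
    """power = 1 - probf(f, dfa, dfe, nc) with nc = SST / MSE."""
    nc = ss_model / ms_error
    f = f_critical(alpha, dfa, dfe)
    return 1.0 - probf(f, dfa, dfe, nc)

# Illustrative values only: SST = 26, MSE = 1, dfa = 2, dfe = 6
print(anova_power(26.0, 1.0, 2, 6))
```

When the model sum of squares is zero, nc is zero and the power reduces to the significance level, which is a convenient sanity check on the sketch.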
The above is a brief outline of the one-way analysis of variance algorithm. For details of the mathematical derivation, please refer to the corresponding part of the user's manual and the NAG documentation.