17.4.1.6 Algorithms (Three-Way ANOVA)

Theory of Three-Way ANOVA

Suppose N observations are associated with three factors, say, factor A with I levels, factor B with J levels and factor C with K levels.

Let \(y_{hijk}\,\!\) denote the hth observation at level i of factor A, level j of factor B and level k of factor C. The three-way ANOVA model can then be written as

\[y_{hijk}=\mu +\alpha _i+\beta _j+\gamma _k+(\alpha\beta)_{ij}+(\alpha\gamma)_{ik}+(\beta\gamma)_{jk}+(\alpha\beta\gamma)_{ijk}+\varepsilon _{hijk}\]

where \(\mu \,\!\) is the mean of the whole response data, \(\alpha _i\,\!\) is the deviation at level i of factor A, \(\beta _j\,\!\) is the deviation at level j of factor B, \(\gamma _k\,\!\) is the deviation at level k of factor C, \((\alpha\beta)_{ij}\,\!\) is the interaction term between factors A and B, \((\alpha\gamma)_{ik}\,\!\) is the interaction term between factors A and C, \((\beta\gamma)_{jk}\,\!\) is the interaction term between factors B and C, \((\alpha\beta\gamma)_{ijk}\,\!\) is the interaction term among factors A, B and C, and \(\varepsilon _{hijk}\,\!\) is the error term.

In three-way ANOVA, users can specify their own model. For example, they can exclude the term \((\alpha\beta)_{ij}\,\!\) (in which case the term \((\alpha\beta\gamma)_{ijk}\,\!\) is automatically excluded as well). The model then looks like this:

\[y_{hijk}=\mu +\alpha _i+\beta _j+\gamma _k+(\alpha\gamma)_{ik}+(\beta\gamma)_{jk}+\varepsilon _{hijk}\]
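The exclusion rule above can be sketched in a few lines of Python. This is an illustration only, not Origin's implementation: the function name `resolve_terms` and the term labels are hypothetical, and the sketch assumes by symmetry that dropping A*C or B*C removes A*B*C in the same way as dropping A*B.

```python
ALL_TERMS = ("A", "B", "C", "AB", "AC", "BC", "ABC")
TWO_WAY = ("AB", "AC", "BC")

def resolve_terms(requested):
    """Return the terms that remain in the model, in canonical order."""
    terms = set(requested) & set(ALL_TERMS)
    # A*B*C is kept only while all three two-way interactions are present
    # (assumption: the rule stated for A*B also holds for A*C and B*C).
    if not all(t in terms for t in TWO_WAY):
        terms.discard("ABC")
    return [t for t in ALL_TERMS if t in terms]

# A*B was left out of the request, so A*B*C is dropped as well.
reduced = resolve_terms(["A", "B", "C", "AC", "BC", "ABC"])
print(reduced)
```

Here the request omits A*B, so the resolved model contains only A, B, C, A*C and B*C, which matches the reduced model equation above.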

The sample variation of a specified model can be obtained through the so-called "design matrix" method. Taking the full model as an example, the procedure is briefly as follows:

The Degrees of Freedom (DF) for the whole model is \(df_{Model}=IJK-1\). The whole design matrix is \(X := X_{N\times df_{Model}} = [X_\mu |X_A |X_B |X_C |X_{AB} |X_{AC} |X_{BC} |X_{ABC}]\), where \(X_\mu\) is the sub-design-matrix for \(\mu\), usually a column of ones, and the other sub-design-matrices correspond to the terms named by their subscripts. Let \(X_{-*}\) denote X with the corresponding sub-design-matrix replaced by zeros, for instance, \(X_{-AB} = [X_\mu |X_A |X_B |X_C |0 |X_{AC} |X_{BC} |X_{ABC}]\).

Define

\[R_0 = Y^T X_{\mu}(X_{\mu}^T X_{\mu})^{-1}X_{\mu}^T Y\]

\[R_\mu = Y^T Y\]

\[R_{Model} = Y^T X(X^T X)^{-1}X^T Y\]

\[R_A = Y^T X_{-A}(X_{-A}^T X_{-A})^{-1}X_{-A}^T Y\]

\[R_B = Y^T X_{-B}(X_{-B}^T X_{-B})^{-1}X_{-B}^T Y\]

\[R_C = Y^T X_{-C}(X_{-C}^T X_{-C})^{-1}X_{-C}^T Y\]

\[R_{AB} = Y^T X_{-AB}(X_{-AB}^T X_{-AB})^{-1}X_{-AB}^T Y\]

\[R_{AC} = Y^T X_{-AC}(X_{-AC}^T X_{-AC})^{-1}X_{-AC}^T Y\]

\[R_{BC} = Y^T X_{-BC}(X_{-BC}^T X_{-BC})^{-1}X_{-BC}^T Y\]

\[R_{ABC} = Y^T X_{-ABC}(X_{-ABC}^T X_{-ABC})^{-1}X_{-ABC}^T Y\]

Then the sums of squares are

\[SS_A = R_{Model}-R_A\]

\[SS_B = R_{Model}-R_B\]

\[SS_C = R_{Model}-R_C\]

\[SS_{AB} = R_{Model}-R_{AB}\]

\[SS_{AC} = R_{Model}-R_{AC}\]

\[SS_{BC} = R_{Model}-R_{BC}\]

\[SS_{ABC} = R_{Model}-R_{ABC}\]

\[SS_{Error} = R_{\mu}-R_{Model}\]

\[SS_{Total} = R_{\mu}-R_{0}\]
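The design-matrix procedure can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not Origin's code: it assumes effect (sum-to-zero) coding for the sub-design-matrices, which the text does not specify; it assumes every cell contains data so that \(X^T X\) is invertible; and it drops the columns of an excluded term instead of zeroing them, which yields the same projection. All function names and the sample data are hypothetical.

```python
from itertools import product

TERMS = ("mu", "A", "B", "C", "AB", "AC", "BC", "ABC")

def effect_codes(level, n_levels):
    """Sum-to-zero coding with n_levels - 1 columns; the last level is coded -1."""
    if level == n_levels - 1:
        return [-1.0] * (n_levels - 1)
    return [1.0 if c == level else 0.0 for c in range(n_levels - 1)]

def design_row(i, j, k, n_levels):
    """Sub-design-matrix entries of one observation, keyed by term."""
    a, b, c = (effect_codes(lv, n) for lv, n in zip((i, j, k), n_levels))
    return {
        "mu": [1.0], "A": a, "B": b, "C": c,
        "AB": [x * y for x in a for y in b],
        "AC": [x * z for x in a for z in c],
        "BC": [y * z for y in b for z in c],
        "ABC": [x * y * z for x in a for y in b for z in c],
    }

def solve(m, rhs):
    """Solve m x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(m, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]

def fitted_ss(x, y):
    """R = Y^T X (X^T X)^{-1} X^T Y for a full-column-rank X given as a list of rows."""
    p = len(x[0])
    xtx = [[sum(r[u] * r[v] for r in x) for v in range(p)] for u in range(p)]
    xty = [sum(r[u] * yv for r, yv in zip(x, y)) for u in range(p)]
    beta = solve(xtx, xty)
    return sum(b * t for b, t in zip(beta, xty))

def three_way_ss(cells, y, n_levels):
    """Sums of squares of the full model from the R quantities defined above."""
    rows = [design_row(i, j, k, n_levels) for i, j, k in cells]

    def r_without(skip):
        # Dropping a term's columns gives the same projection as zeroing them.
        x = [sum((row[t] for t in TERMS if t != skip), []) for row in rows]
        return fitted_ss(x, y)

    r_model = r_without(None)
    r_mu = sum(v * v for v in y)      # R_mu = Y^T Y
    r_0 = sum(y) ** 2 / len(y)        # R_0: projection onto the column of ones
    ss = {t: r_model - r_without(t) for t in TERMS if t != "mu"}
    ss["Error"] = r_mu - r_model
    ss["Total"] = r_mu - r_0
    return ss

# Made-up balanced 2 x 2 x 2 data with two replicates per cell:
# an A effect of +/-2, a B effect of +/-1, and replicate noise of +/-1.
cells, y = [], []
for i, j, k in product(range(2), repeat=3):
    for noise in (1.0, -1.0):
        cells.append((i, j, k))
        y.append(10.0 + (2.0, -2.0)[i] + (1.0, -1.0)[j] + noise)

ss = three_way_ss(cells, y, (2, 2, 2))
print(ss)
```

For this balanced example the sketch gives \(SS_A = 64\), \(SS_B = 16\), \(SS_{Error} = 16\) and \(SS_{Total} = 96\), with all other terms equal to zero, so the individual sums of squares add up to the total. In unbalanced designs they generally do not.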


For the full model, the ANOVA table is summarized below:

Source of Variation | Degrees of Freedom (DF) | Sum of Squares (SS) | Mean Square (MS) | F Value | Prob > F
Factor A | \(I-1\) | \(SS_A\) | \(MS_A\) | \(MS_A / MS_{Error}\) | \(P\{F\geq F_{(I-1,df_e,\alpha )}\}\)
Factor B | \(J-1\) | \(SS_B\) | \(MS_B\) | \(MS_B / MS_{Error}\) | \(P\{F\geq F_{(J-1,df_e,\alpha )}\}\)
Factor C | \(K-1\) | \(SS_C\) | \(MS_C\) | \(MS_C / MS_{Error}\) | \(P\{F\geq F_{(K-1,df_e,\alpha )}\}\)
A*B | \((I-1)(J-1)\) | \(SS_{AB}\) | \(MS_{AB}\) | \(MS_{AB} / MS_{Error}\) | \(P\{F\geq F_{((I-1)(J-1),df_e,\alpha )}\}\)
A*C | \((I-1)(K-1)\) | \(SS_{AC}\) | \(MS_{AC}\) | \(MS_{AC} / MS_{Error}\) | \(P\{F\geq F_{((I-1)(K-1),df_e,\alpha )}\}\)
B*C | \((J-1)(K-1)\) | \(SS_{BC}\) | \(MS_{BC}\) | \(MS_{BC} / MS_{Error}\) | \(P\{F\geq F_{((J-1)(K-1),df_e,\alpha )}\}\)
A*B*C | \((I-1)(J-1)(K-1)\) | \(SS_{ABC}\) | \(MS_{ABC}\) | \(MS_{ABC} / MS_{Error}\) | \(P\{F\geq F_{((I-1)(J-1)(K-1),df_e,\alpha )}\}\)
Error | \(df_e = N-IJK\) | \(SS_{Error}\) | \(MS_{Error}\) | |
Total | \(N-1\) | \(SS_{Total}\) | | |
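The bookkeeping in the table can be checked with a short Python sketch. It uses the standard relation that each mean square is the sum of squares divided by its degrees of freedom. The function names and the numbers are illustrative assumptions, not part of Origin.

```python
def anova_df(I, J, K, N):
    """Degrees of freedom for every row of the full-model table."""
    df = {
        "A": I - 1, "B": J - 1, "C": K - 1,
        "AB": (I - 1) * (J - 1),
        "AC": (I - 1) * (K - 1),
        "BC": (J - 1) * (K - 1),
        "ABC": (I - 1) * (J - 1) * (K - 1),
    }
    df["Error"] = N - I * J * K
    df["Total"] = N - 1
    return df

def f_value(ss_term, df_term, ss_error, df_error):
    """F = MS_term / MS_Error, with each mean square equal to SS / DF."""
    return (ss_term / df_term) / (ss_error / df_error)

# Hypothetical design: I = 2, J = 3, K = 4 levels and N = 48 observations.
df = anova_df(2, 3, 4, 48)
model_df = sum(v for name, v in df.items() if name not in ("Error", "Total"))
print(model_df, df["Error"], df["Total"])

# Hypothetical sums of squares: SS = 64 on 1 DF against SS_Error = 16 on 8 DF.
print(f_value(64.0, 1, 16.0, 8))
```

The effect rows add up to \(IJK-1\) (23 here), and together with the error row (24) they add up to \(N-1\) (47), which is a quick consistency check on any full-model table.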

Multiple Means Comparisons

There are various methods for multiple means comparison in Origin, and we use the ocstat_dlsm_mean_comparison() function to perform means comparisons.

There are two types of multiple means comparison methods:

Single-step methods. These create simultaneous confidence intervals to show how the means differ. They include the Tukey-Kramer, Bonferroni, Dunn-Sidak, Fisher’s LSD and Scheffé methods.

Stepwise methods. These perform the hypothesis tests sequentially. They include the Holm-Bonferroni and Holm-Sidak tests.
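The difference between single-step and stepwise corrections can be illustrated with adjusted p-values. The sketch below is not the ocstat_dlsm_mean_comparison() implementation and does not build confidence intervals. It only applies the textbook Bonferroni, Sidak and Holm adjustments to a made-up list of raw pairwise p-values, and all function names are hypothetical.

```python
def bonferroni(pvals):
    """Single-step: every p-value is multiplied by the number of tests m."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]

def sidak(pvals):
    """Single-step: 1 - (1 - p)^m for each p-value."""
    m = len(pvals)
    return [1.0 - (1.0 - p) ** m for p in pvals]

def holm(pvals, sidak_step=False):
    """Step-down: the r-th smallest p-value is corrected for m - r + 1 tests."""
    m = len(pvals)
    order = sorted(range(m), key=lambda idx: pvals[idx])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        remaining = m - rank
        p = pvals[idx]
        if sidak_step:
            step = 1.0 - (1.0 - p) ** remaining   # Holm-Sidak
        else:
            step = min(1.0, remaining * p)        # Holm-Bonferroni
        running_max = max(running_max, step)      # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted

raw = [0.01, 0.04, 0.03]   # made-up raw p-values for three pairwise comparisons
print(bonferroni(raw))
print(holm(raw))
print(holm(raw, sidak_step=True))
```

Because the stepwise variants correct later comparisons for fewer remaining tests, their adjusted p-values are never larger than those of the corresponding single-step method.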

Power Analysis

The power analysis procedure calculates the actual power for the sample data, as well as the hypothetical power if additional sample sizes are specified.

The power of a three-way analysis of variance is a measure of its sensitivity. Power is the probability that the ANOVA will detect differences in the population means when real differences exist. In terms of the null and alternative hypotheses, power is the probability that the test statistic F is extreme enough to reject the null hypothesis when the null hypothesis is in fact false.

The Origin Three-Way ANOVA dialog can compute power for the Factor A, B and C sources. If interaction terms are selected in the model, Origin can also compute power for them.

Power is defined by the equation:

\[power=1-probf(f,df,dfe,nc)\,\!\]

where f is the deviate from the non-central F-distribution with df and dfe degrees of freedom, and nc = SS/MSE is the noncentrality parameter. SS is the sum of squares of the source (A, B, C, A*B, A*C, B*C, or A*B*C), MSE is the mean square of the error, df is the numerator degrees of freedom, and dfe is the error degrees of freedom. All values (SS, MSE, df, and dfe) are obtained from the ANOVA table. The value of probf( ) is obtained using the NAG function nag_prob_non_central_f_dist (g01gdc). See the NAG documentation for more detailed information.
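As a worked reading of these definitions, and assuming the usual convention that f is the upper-\(\alpha\) critical value of the central F distribution with df and dfe degrees of freedom (the text does not say how f is chosen, so this is an assumption), the computation for one source is:

\[f = F^{-1}_{df,\,dfe}(1-\alpha ),\qquad nc=\frac{SS}{MSE},\qquad power = 1-probf(f,df,dfe,nc)\]

For illustration with made-up numbers, a source with SS = 64 and MSE = 2 has nc = 32. A larger noncentrality parameter moves the distribution of the test statistic further above f and therefore raises the power.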

The above is a brief outline of the algorithm for three-way analysis of variance. For the detailed mathematical derivation, please refer to the corresponding part of the user's manual.

Levene test for Homogeneity of Variances

The Levene test uses the following statistic:

\[L = \frac{(N-k)\sum_{i=1}^{k}n_i(Z_i-Z)^2}{(k-1)\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Z_{ij}-Z_i)^2}\]

where

N is the number of observations, \(k = IJK\) is the number of subgroups, subgroup i contains \(n_i\) observations \((i=1,...,k)\), and \(Y_{ij}\) is the jth observation in subgroup i.

\[Z_{ij} = |Y_{ij}-T_i|\]

\[T_i = \frac{1}{n_i}\sum_{j=1}^{n_i}Y_{ij}\]

\[Z_i = \frac{1}{n_i}\sum_{j=1}^{n_i}Z_{ij}\]

\[Z = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{n_i}Z_{ij}\]

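The p-value is then \(1-F_{k-1,N-k}(L)\), where \(F_{k-1,N-k}\) is the cumulative distribution function of the F distribution with \(k-1\) and \(N-k\) degrees of freedom.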
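The statistic L can be computed directly from its definition. The following Python sketch is illustrative and not Origin's code: the function name `levene_statistic` and the sample data are made up, each inner list stands for one of the \(k\) subgroups, and Z is taken as the mean of all N values \(Z_{ij}\). The p-value step is left out because it needs an F-distribution routine.

```python
def levene_statistic(groups):
    """Levene statistic L using absolute deviations from each subgroup mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Z_ij = |Y_ij - T_i|, where T_i is the mean of subgroup i
    z = []
    for g in groups:
        t_i = sum(g) / len(g)
        z.append([abs(v - t_i) for v in g])
    z_means = [sum(zg) / len(zg) for zg in z]           # Z_i
    z_all = sum(sum(zg) for zg in z) / n_total          # Z, mean of all Z_ij
    between = sum(len(zg) * (zi - z_all) ** 2 for zg, zi in zip(z, z_means))
    within = sum((v - zi) ** 2 for zg, zi in zip(z, z_means) for v in zg)
    return (n_total - k) * between / ((k - 1) * within)

# Two made-up subgroups; the second is twice as spread out as the first.
L = levene_statistic([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
print(L)
```

For this small example the between-group term is 2/3 and the within-group term is 10/3, so L = 0.8.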