17.1.6.2 Interpreting Results and Chi-square of Cross Tabulation

1 Contingency Table
2 Chi-Square Tests
3 Fisher's Exact Table
4 Measures of Association
- 4.1 Measures for Nominal Variables
- 4.2 Measures for Ordinal Variables
5 Agreement Statistic
- 5.1 Kappa Test
- 5.2 Bowker's Test
6 Odds Ratio & Relative Risk
7 CMH Table
8 Mosaic Plot

Contingency Table

The Contingency Table gives information about the joint frequency distribution of the variables, including counts, percentages and residuals. Counts, Row%, Col% and Total% help the user compare levels across groups.

Residuals measure how far each observed cell count departs from the count expected if the row and column variables were independent. The closer a residual is to zero, the more consistent that cell is with no association between the row and column variables.

The adjusted residual is the most useful residual for comparing cells, because under independence it approximately follows the standard normal distribution N(0,1). If the value is larger than 1.96 or less than -1.96, the observed count is significantly larger or smaller than expected at the 0.05 level. The larger the absolute value, the stronger the evidence that the row and column variables are associated.
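The sketch below shows how expected counts and residuals can be computed for an r*c table, assuming the usual textbook definitions (raw residual O - E, standardized residual (O - E)/sqrt(E), and the Haberman adjusted residual). The function name `cell_residuals` is hypothetical, and Origin's output may label or scale these columns differently; the algorithm page is the authority.

```python
import math

def cell_residuals(table):
    """Expected counts and residuals for an r x c table of observed counts.

    Hypothetical helper using textbook definitions; not Origin's own code.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n            # expected count under independence
            raw = obs - exp                               # raw residual
            std = raw / math.sqrt(exp)                    # standardized (Pearson) residual
            # Adjusted residual: approximately N(0,1) under independence
            adj = raw / math.sqrt(exp * (1 - row_tot[i] / n) * (1 - col_tot[j] / n))
            out_row.append({"expected": exp, "residual": raw,
                            "standardized": std, "adjusted": adj})
        result.append(out_row)
    return result
```

For the table [[20, 10], [10, 20]], every expected count is 15 and the top-left adjusted residual is 5/sqrt(3.75), about 2.58, which exceeds 1.96.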

Chi-Square Tests

The Chi-Square Tests table provides results for testing the hypothesis that the row and column variables are independent.

The Chi-Square Tests table displays ChiSquare, DF and Prob > ChiSq (the p-value).

If Prob > ChiSq is less than the significance level, there is significant evidence, at that level, of an association between the row and column variables. Otherwise, there is no significant evidence of association between the row and column variables at that level.

Four tests are available.

Pearson Chi-Square:
It is the most widely used Chi-Square test. The test statistic is the sum, over all cells, of the squared deviations between observed and expected counts divided by the expected counts. In large samples it approximately follows a chi-squared distribution with (r-1)(c-1) degrees of freedom, where r and c are the numbers of rows and columns, and the p-value is obtained from that distribution.
Likelihood Ratio
The Likelihood Ratio test is based on the likelihood of the data under the null hypothesis of independence, and compares the goodness of fit of the null model with that of the alternative model. Its test statistic also approximately follows a chi-squared distribution, and it usually gives results similar to the Pearson Chi-Square.
Continuity Correction
Also referred to as the Yates Continuity Correction, it is available only for a 2*2 table in Origin. If the expected number of observations in any cell is too small (e.g., less than 5), the asymptotic chi-squared distribution is not accurate, the Pearson Chi-Square and Likelihood Ratio results cannot be trusted, and the Continuity Correction is recommended. It is similar to the Pearson Chi-Square, except that 0.5 is subtracted from each absolute difference between observed and expected counts before squaring, to compensate for approximating the discrete distribution of the counts with the continuous chi-squared distribution.
Linear Association
It is available only for numeric data. The Chi-Square tests above do not take the ordering of the rows or columns into account, but Linear Association does. It is based on the Pearson correlation coefficient between the row and column values, and it approximately follows a chi-squared distribution with 1 degree of freedom.

Notes: If the expected number of observations in any cell is too small (e.g., less than 5), the Pearson Chi-Square and Likelihood Ratio results cannot be trusted.
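The following sketch computes the four statistics described above using common textbook formulas: Pearson X^2, the likelihood-ratio G^2 = 2 * sum(O * ln(O/E)), the Yates-corrected statistic (with |O - E| - 0.5 floored at zero, one common convention), and the linear-by-linear statistic (N - 1) * r^2. The names `chi2_sf` and `chi_square_tests`, and the default scores 1..r and 1..c, are assumptions for illustration; Origin uses the numeric row and column values, which can be passed in as scores. The p-value helper uses the closed-form chi-squared tail for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-squared variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite series
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total

def chi_square_tests(table, row_scores=None, col_scores=None):
    """Return {test name: (statistic, df, p-value)} for an r x c count table."""
    r, c = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    df = (r - 1) * (c - 1)
    pearson = lr = yates = 0.0
    for i in range(r):
        for j in range(c):
            obs = table[i][j]
            exp = row_tot[i] * col_tot[j] / n
            pearson += (obs - exp) ** 2 / exp
            if obs > 0:                                   # 0 * ln(0) is taken as 0
                lr += 2.0 * obs * math.log(obs / exp)
            yates += max(0.0, abs(obs - exp) - 0.5) ** 2 / exp
    results = {"Pearson": (pearson, df, chi2_sf(pearson, df)),
               "Likelihood Ratio": (lr, df, chi2_sf(lr, df))}
    if r == 2 and c == 2:
        results["Continuity Correction"] = (yates, 1, chi2_sf(yates, 1))
    # Linear-by-linear association: (N - 1) * r^2, r = weighted Pearson correlation
    xs = row_scores if row_scores is not None else list(range(1, r + 1))
    ys = col_scores if col_scores is not None else list(range(1, c + 1))
    mx = sum(row_tot[i] * xs[i] for i in range(r)) / n
    my = sum(col_tot[j] * ys[j] for j in range(c)) / n
    sxy = sum(table[i][j] * (xs[i] - mx) * (ys[j] - my)
              for i in range(r) for j in range(c))
    sxx = sum(row_tot[i] * (xs[i] - mx) ** 2 for i in range(r))
    syy = sum(col_tot[j] * (ys[j] - my) ** 2 for j in range(c))
    corr = sxy / math.sqrt(sxx * syy)
    m2 = (n - 1) * corr ** 2
    results["Linear Association"] = (m2, 1, chi2_sf(m2, 1))
    return results
```

For [[20, 10], [10, 20]] this gives a Pearson statistic of 20/3, a Yates-corrected statistic of 5.4, and a linear association statistic of 59/9.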

Fisher's Exact Table

If the expected number of observations in any cell is too small (e.g., less than 5), the Chi-Square tests may not be appropriate and Fisher's exact test is recommended instead.

Three tests are available: left-sided, right-sided and two-sided. They indicate which A*B level combination is more likely to occur; see the Conclusion column for details. (A is the row variable and B is the column variable.)

Notes: Fisher's exact test is available only for a 2*2 table.
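With all margins fixed, the top-left count of a 2*2 table follows a hypergeometric distribution, which is what Fisher's exact test uses. The sketch below assumes the common convention that "left" and "right" refer to the top-left count being smaller or larger than expected, and that the two-sided p-value sums all tables no more probable than the observed one. The function name `fisher_exact_2x2` is hypothetical.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Left-, right- and two-sided Fisher exact p-values for [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    lo, hi = max(0, c1 - r2), min(r1, c1)          # feasible values of the top-left cell
    denom = comb(n, c1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(r1, x) * comb(r2, c1 - x) / denom

    probs = {x: prob(x) for x in range(lo, hi + 1)}
    p_obs = probs[a]
    left = sum(p for x, p in probs.items() if x <= a)
    right = sum(p for x, p in probs.items() if x >= a)
    # Small relative tolerance so tables tied with the observed one are included
    two = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return min(left, 1.0), min(right, 1.0), min(two, 1.0)
```

For [[3, 1], [1, 3]] the right-sided p-value is 17/70 and the two-sided p-value is 34/70.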

Measures of Association

See the introduction page for the situations in which each statistic should be used.

Measures for Nominal Variables

Phi
For a 2*2 table, Phi ranges over [-1,1]. For tables larger than 2*2, Phi ranges over [0,M] (see the algorithm page for M). A larger absolute value indicates a stronger association between the two variables.
Contingency coefficient
Its values range over [0,1). A larger value indicates a stronger association between the two variables.
Cramer's V
Its values range from 0 to 1. A larger value indicates a stronger association between the two variables.
Lambda
See the notes below for the meaning of C|R, R|C and Symmetric. A larger value indicates a stronger association.
Uncertainty Coefficient
See the notes below for the meaning of C|R, R|C and Symmetric. A larger value indicates a stronger association.

Notes:

C|R:
The row variable (R) is regarded as the independent variable and the column variable (C) as the dependent variable. The value indicates the proportional reduction in prediction error when R is used to predict C.
R|C
The column variable (C) is regarded as the independent variable and the row variable (R) as the dependent variable. The value indicates the proportional reduction in prediction error when C is used to predict R.
Symmetric:
The variables are not classified as independent and dependent. The value measures only the strength of association between the two variables; it cannot predict how one variable affects the other.
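The nominal measures above can be sketched from the Pearson chi-square, the row/column maxima (Goodman-Kruskal Lambda) and entropies (Uncertainty Coefficient), using common textbook formulas. This is an illustration under those assumptions, not Origin's implementation; the function name `nominal_measures` is hypothetical, and Origin's algorithm page may use different sign or range conventions.

```python
import math

def nominal_measures(table):
    """Nominal association measures for an r x c table of counts (textbook formulas)."""
    r, c = len(table), len(table[0])
    cols = list(zip(*table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in cols]
    n = sum(row_tot)

    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            exp = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - exp) ** 2 / exp

    if r == 2 and c == 2:
        (a, b), (c_, d) = table
        # Signed Phi for a 2*2 table, range [-1, 1]
        phi = (a * d - b * c_) / math.sqrt(row_tot[0] * row_tot[1] * col_tot[0] * col_tot[1])
    else:
        phi = math.sqrt(chi2 / n)
    contingency = math.sqrt(chi2 / (chi2 + n))
    cramers_v = math.sqrt(chi2 / (n * (min(r, c) - 1)))

    # Lambda: proportional reduction in error when predicting by the modal category
    sum_row_max = sum(max(row) for row in table)
    sum_col_max = sum(max(col) for col in cols)
    max_r, max_c = max(row_tot), max(col_tot)
    lambda_cr = (sum_row_max - max_c) / (n - max_c)
    lambda_rc = (sum_col_max - max_r) / (n - max_r)
    lambda_sym = (sum_row_max + sum_col_max - max_r - max_c) / (2 * n - max_r - max_c)

    # Uncertainty coefficient from entropies (the log base cancels in the ratios)
    def entropy(counts):
        return -sum(x / n * math.log(x / n) for x in counts if x > 0)

    h_r, h_c = entropy(row_tot), entropy(col_tot)
    h_rc = entropy([x for row in table for x in row])
    mutual = h_r + h_c - h_rc
    return {
        "Phi": phi, "Contingency coefficient": contingency, "Cramer's V": cramers_v,
        "Lambda C|R": lambda_cr, "Lambda R|C": lambda_rc, "Lambda Symmetric": lambda_sym,
        "Uncertainty C|R": mutual / h_c, "Uncertainty R|C": mutual / h_r,
        "Uncertainty Symmetric": 2 * mutual / (h_r + h_c),
    }
```

Lambda is undefined (division by zero) when a single category holds all observations of the predicted variable; a production version would need to guard that case.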

Measures for Ordinal Variables

Gamma
Gamma takes values from -1 to +1. A positive value means that higher values of one variable tend to occur with higher values of the other, while a negative value indicates the reverse relationship. The closer the value is to 0, the weaker the relationship.
Kendall's tau-b and tau-c
Similar to Gamma, with the same interpretation.
Somers' D
See the notes below for the meaning of C|R, R|C and Symmetric. Somers' D ranges from -1 to +1; a larger absolute value indicates a stronger association.

Notes:

C|R:
The row variable (R) is regarded as the independent variable and the column variable (C) as the dependent variable. The value indicates the strength of association when C is treated as depending on R.
R|C
The column variable (C) is regarded as the independent variable and the row variable (R) as the dependent variable. The value indicates the strength of association when R is treated as depending on C.
Symmetric:
The variables are not classified as independent and dependent. The value measures only the strength of association between the two variables; it cannot indicate how one variable affects the other.
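All of the ordinal measures are built from the numbers of concordant and discordant pairs of observations. The sketch below counts each untied pair once and applies common textbook formulas; it assumes rows and columns are listed in increasing order. The function name `ordinal_measures` is hypothetical, and this is not necessarily how Origin computes these values.

```python
import math

def ordinal_measures(table):
    """Gamma, Kendall's tau-b/tau-c and Somers' D for an ordered r x c table."""
    r, c = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    conc = disc = 0
    for i in range(r):
        for j in range(c):
            for k in range(i + 1, r):           # cells in a later row
                for jj in range(c):
                    if jj > j:                  # later row and later column: concordant
                        conc += table[i][j] * table[k][jj]
                    elif jj < j:                # later row, earlier column: discordant
                        disc += table[i][j] * table[k][jj]
    s = conc - disc
    untied_row = (n * n - sum(t * t for t in row_tot)) / 2   # pairs not tied on R
    untied_col = (n * n - sum(t * t for t in col_tot)) / 2   # pairs not tied on C
    m = min(r, c)
    return {
        "Gamma": s / (conc + disc),
        "Kendall tau-b": s / math.sqrt(untied_row * untied_col),
        "Kendall tau-c": 2 * m * s / (n * n * (m - 1)),
        "Somers D C|R": s / untied_row,
        "Somers D R|C": s / untied_col,
        "Somers D Symmetric": s / ((untied_row + untied_col) / 2),
    }
```

For [[10, 5], [5, 10]] there are 100 concordant and 25 discordant pairs, so Gamma is 0.6 while tau-b, tau-c and Somers' D are all 1/3, showing how Gamma ignores tied pairs.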

Agreement Statistic

See the introduction page for the situations in which each statistic should be used.

Kappa Test

The Kappa Test table displays the value of Kappa, its standard error (SE), lower confidence limit (LCL), upper confidence limit (UCL), Z value, Prob>Z (the p-value for a one-sided test of Kappa) and Prob>|Z| (the p-value for a two-sided test of Kappa).

The Kappa value indicates the level of agreement between the two raters:

<= 0: no agreement
0 - 0.39: poor agreement
0.40 - 0.59: fair agreement
0.60 - 0.74: good agreement
>= 0.75: excellent agreement
1: complete agreement

In addition, the Kappa Test table provides results for testing the hypothesis that Kappa equals zero.

If "Prob>Z" is less than the significance level, Kappa is significantly greater than zero at that level. Otherwise, there is no significant evidence at that level that Kappa is greater than zero.
If "Prob>|Z|" is less than the significance level, Kappa is significantly different from zero at that level. Otherwise, there is no significant evidence at that level that Kappa differs from zero.
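A minimal sketch of Cohen's Kappa and its Z test follows. It assumes the Z statistic divides Kappa by its asymptotic standard error under the null hypothesis (the Fleiss, Cohen and Everitt variance), which is a common choice; the SE shown in Origin's table for the confidence limits is typically a different, non-null asymptotic SE that is not computed here. The function name `kappa_test` is hypothetical.

```python
import math

def kappa_test(table):
    """Cohen's Kappa for a k x k agreement table (rows: rater 1, columns: rater 2)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_row = [sum(row) / n for row in table]
    p_col = [sum(col) / n for col in zip(*table)]
    p_obs = sum(table[i][i] for i in range(k)) / n       # observed agreement
    p_exp = sum(p_row[i] * p_col[i] for i in range(k))   # agreement expected by chance
    kappa = (p_obs - p_exp) / (1 - p_exp)
    # Asymptotic variance of Kappa under H0: Kappa = 0
    var0 = (p_exp + p_exp ** 2
            - sum(p_row[i] * p_col[i] * (p_row[i] + p_col[i]) for i in range(k))
            ) / ((1 - p_exp) ** 2 * n)
    z = kappa / math.sqrt(var0)
    prob_gt_z = 0.5 * math.erfc(z / math.sqrt(2))        # one-sided, "Prob>Z"
    prob_gt_abs_z = math.erfc(abs(z) / math.sqrt(2))     # two-sided, "Prob>|Z|"
    return kappa, z, prob_gt_z, prob_gt_abs_z
```

For [[20, 5], [10, 15]], observed agreement is 0.7 and chance agreement is 0.5, so Kappa is 0.4 ("fair agreement" on the scale above).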

Bowker's Test

The Bowker's Test table displays the Chi-Square value, its DF and "Prob>ChiSq" (the p-value for Bowker's test). The test checks whether the proportions in each pair of cells that are symmetric about the diagonal are equal ( $P_{ij}=P_{ji}$ ).

If "Prob>ChiSq" is less than the significance level, the frequency counts table is significantly asymmetric at that level, that is, $P_{ij} \ne P_{ji}$ for at least one pair of cells. Otherwise, at that level there is no significant evidence against symmetry ( $P_{ij}=P_{ji}$ ).
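Bowker's statistic sums (n_ij - n_ji)^2 / (n_ij + n_ji) over all pairs of cells above and below the diagonal. The sketch below assumes the textbook degrees of freedom k(k-1)/2 and skips pairs whose combined count is zero; conventions for such empty pairs vary between packages, so treat this as an illustration. The names `chi2_sf` and `bowker_test` are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total

def bowker_test(table):
    """Bowker's test of symmetry for a k x k table: (statistic, df, p-value)."""
    k = len(table)
    stat = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            pair = table[i][j] + table[j][i]
            if pair > 0:                        # empty pairs contribute nothing
                stat += (table[i][j] - table[j][i]) ** 2 / pair
    df = k * (k - 1) // 2
    return stat, df, chi2_sf(stat, df)
```

For a 2*2 table this reduces to McNemar's test; for [[10, 5], [15, 20]] the statistic is (5 - 15)^2 / 20 = 5 on 1 degree of freedom.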

Odds Ratio & Relative Risk

Odds Ratio & Relative Risk are available only for a 2*2 table. The odds of an event are the probability that it occurs divided by the probability that it does not occur; the Odds Ratio is the ratio of the odds of the event in one group to the odds of the event in a comparison group. The Relative Risk is the ratio of the probability (risk) of the event occurring in one group to the probability of the event occurring in the comparison group.

The Odds Ratio & Relative Risk table displays the value, lower confidence limit (LCL) and upper confidence limit (UCL). Suppose the Relative Risk is RR = P(a|b)/P(a|c). If RR = 1, the probability of outcome a is the same in b and c; if RR > 1, the probability of outcome a is greater in b than in c; otherwise, the probability of outcome a is smaller in b than in c.
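As an illustration, the sketch below computes the Odds Ratio and Relative Risk for a 2*2 table whose rows are the two groups and whose first column is the outcome of interest. It assumes log-scale (Woolf-type) 95% confidence limits, a common choice; Origin's algorithm page may use another interval method. The function name `odds_ratio_relative_risk` is hypothetical.

```python
import math

def odds_ratio_relative_risk(a, b, c, d, z=1.959963984540054):
    """2x2 table [[a, b], [c, d]]: rows are groups, column 1 is the outcome."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)     # SE of ln(OR)
    risk1, risk2 = a / (a + b), c / (c + d)                   # risk of the outcome per group
    rel_risk = risk1 / risk2
    se_log_rr = math.sqrt(b / (a * (a + b)) + d / (c * (c + d)))  # SE of ln(RR)

    def ci(est, se):
        # Symmetric interval on the log scale, transformed back
        return est * math.exp(-z * se), est * math.exp(z * se)

    return {"Odds Ratio": (odds_ratio, *ci(odds_ratio, se_log_or)),
            "Relative Risk": (rel_risk, *ci(rel_risk, se_log_rr))}
```

For [[20, 10], [10, 20]] the odds ratio is 4 and the relative risk is 2: the outcome is twice as likely in the first group, while its odds are four times as high.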

CMH Table

This table shows the results of the Cochran-Mantel-Haenszel (CMH) tests, which test whether there is any relationship between the row and column variables after controlling for the layer variable.

Conditional Independence Test

Conditional independence is tested with the Mantel-Haenszel statistic, which tests the hypothesis that there is no association between the row and column variables after controlling for the layer variable. The Conditional Independence Test table displays the Chi-Square value, its DF and "Prob>ChiSq" (the p-value for the Conditional Independence Test).

If "Prob>ChiSq" is less than the significance level, there is, at that level, a significant association between the row and column variables in at least one layer. Otherwise, at that level there is no significant evidence of association between the row and column variables after controlling for the layer variable.
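For the common case of a 2*2 table in each layer, the Mantel-Haenszel statistic compares the summed top-left counts with their summed expectations under conditional independence. The sketch below assumes that case and no continuity correction; tables larger than 2*2 require the generalized (matrix) CMH statistic, which is not shown. The function name `mantel_haenszel_test` is hypothetical.

```python
import math

def mantel_haenszel_test(strata):
    """strata: list of 2x2 tables [[a, b], [c, d]], one per layer level."""
    num = var = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        r1, r2, c1, c2 = a + b, c + d, a + c, b + d
        num += a - r1 * c1 / n                           # observed minus expected top-left count
        var += r1 * r2 * c1 * c2 / (n * n * (n - 1))     # hypergeometric variance
    stat = num ** 2 / var
    p = math.erfc(math.sqrt(stat / 2))                   # chi-squared tail with 1 df
    return stat, 1, p
```

With a single layer the statistic equals (N - 1)/N times the Pearson chi-square; for [[20, 10], [10, 20]] that is 59/9.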

Odds Ratio Homogeneity Tests

Homogeneity of the odds ratio is tested with the Breslow-Day statistic and Tarone's statistic. Both test the hypothesis that the odds ratio between the row and column variables is the same at each level of the layer variable.

The Odds Ratio Homogeneity Tests table displays the Chi-Square value, its DF and "Prob>ChiSq" (the p-value for the Odds Ratio Homogeneity Tests). For both the Breslow-Day and Tarone's statistics:

If "Prob>ChiSq" is less than the significance level, the odds ratio differs significantly among layers at that level. Otherwise, at that level there is no significant evidence that the odds ratio differs among layers.
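The Breslow-Day statistic compares each layer's top-left count with the count that, given that layer's margins, would reproduce the Mantel-Haenszel common odds ratio; that expected count is the admissible root of a quadratic. Tarone's statistic subtracts a small correction term. The sketch below assumes a 2*2 table per layer with no zero margins, and the function names are hypothetical; it follows the standard textbook definitions rather than Origin's code.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total

def mh_common_or(strata):
    """Mantel-Haenszel common odds ratio for a list of 2x2 tables."""
    num = sum(a * d / (a + b + c + d) for (a, b), (c, d) in strata)
    den = sum(b * c / (a + b + c + d) for (a, b), (c, d) in strata)
    return num / den

def breslow_day_tarone(strata):
    psi = mh_common_or(strata)
    bd = sum_dev = sum_var = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        r1, c1 = a + b, a + c
        lo, hi = max(0, r1 + c1 - n), min(r1, c1)
        # Expected top-left count A with fixed margins and odds ratio psi solves
        # (1 - psi) A^2 + (n - r1 - c1 + psi (r1 + c1)) A - psi r1 c1 = 0
        qa, qb, qc = 1 - psi, n - r1 - c1 + psi * (r1 + c1), -psi * r1 * c1
        if abs(qa) < 1e-12:
            A = -qc / qb
        else:
            root = math.sqrt(qb * qb - 4 * qa * qc)
            cands = [(-qb + root) / (2 * qa), (-qb - root) / (2 * qa)]
            A = next(x for x in cands if lo - 1e-9 <= x <= hi + 1e-9)
        var = 1 / (1 / A + 1 / (r1 - A) + 1 / (c1 - A) + 1 / (n - r1 - c1 + A))
        bd += (a - A) ** 2 / var
        sum_dev += a - A
        sum_var += var
    k = len(strata)
    tarone = bd - sum_dev ** 2 / sum_var                 # Tarone's correction
    return {"Breslow-Day": (bd, k - 1, chi2_sf(bd, k - 1)),
            "Tarone": (tarone, k - 1, chi2_sf(tarone, k - 1))}
```

When every layer has the same odds ratio, each expected count equals the observed count and both statistics are (numerically) zero.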

Common Odds Ratio

The common odds ratio across the levels of the layer variable is estimated with the Mantel-Haenszel estimator. The Common Odds Ratio table displays the estimate of the common odds ratio, "ln(estimate)" (the natural log of the estimated common odds ratio) and its standard error, and the lower confidence limit (LCL) and upper confidence limit (UCL).
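The sketch below computes the Mantel-Haenszel common odds ratio, its natural log and a 95% confidence interval. It assumes the Robins-Breslow-Greenland variance for ln(estimate), a widely used choice; Origin's algorithm page may use another formula. The function name `mh_common_odds_ratio` is hypothetical.

```python
import math

def mh_common_odds_ratio(strata, z=1.959963984540054):
    """Return (estimate, ln(estimate), SE of ln, LCL, UCL) for a list of 2x2 tables."""
    sum_r = sum_s = sum_pr = sum_ps_qr = sum_qs = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        r_k, s_k = a * d / n, b * c / n
        p_k, q_k = (a + d) / n, (b + c) / n
        sum_r += r_k
        sum_s += s_k
        sum_pr += p_k * r_k
        sum_ps_qr += p_k * s_k + q_k * r_k
        sum_qs += q_k * s_k
    estimate = sum_r / sum_s
    log_est = math.log(estimate)
    # Robins-Breslow-Greenland variance of ln(estimate)
    var_log = (sum_pr / (2 * sum_r ** 2)
               + sum_ps_qr / (2 * sum_r * sum_s)
               + sum_qs / (2 * sum_s ** 2))
    se = math.sqrt(var_log)
    return estimate, log_est, se, estimate * math.exp(-z * se), estimate * math.exp(z * se)
```

With a single layer the variance reduces to 1/a + 1/b + 1/c + 1/d, the familiar log odds ratio variance, which is a handy sanity check.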

Mosaic Plot

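A mosaic plot is divided into rectangles. The width of each column of rectangles is proportional to the number of observations in that level of the X variable, and the heights of the rectangles within a column are proportional to the proportions of the Y variable in that level of X, so the area of each rectangle is proportional to the corresponding cell count.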
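The layout rule can be sketched as follows: each level of X gets a column whose width is its share of all observations, and that column is split vertically by the conditional proportions of Y. The sketch places the tiles in a unit square with no gaps between them (real plots, including Origin's, usually add spacing), and the function name `mosaic_rectangles` is hypothetical.

```python
def mosaic_rectangles(table):
    """table[x][y] = count. Returns (x0, y0, width, height) per cell in a unit square."""
    n = sum(sum(counts) for counts in table)
    rects = []
    x0 = 0.0
    for counts in table:                    # one column per level of X
        col_n = sum(counts)
        width = col_n / n                   # share of X level among all observations
        y0 = 0.0
        for cnt in counts:                  # stack Y levels within the column
            height = cnt / col_n if col_n else 0.0   # P(Y level | X level)
            rects.append((x0, y0, width, height))
            y0 += height
        x0 += width
    return rects
```

Because width times height equals (column total / N) * (count / column total), each tile's area is count / N, which is why the area is proportional to the cell count.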