5.1.2 Cross Tabulation and Chi-square


Summary

Cross tabulation is particularly useful for analyzing categorical data. In these analyses, a contingency table displays the frequency distribution of two or more variables. Analyses based on the table can determine whether there is a significant relationship between the variables and assess the strength of that relationship.

Minimum Origin Version Required: Origin 2016 SR0

What you will learn

This tutorial will show you:

  1. How to perform the Cross Tabulation.
  2. How to interpret the results.

User Story

Our data are from the Montana Economic Outlook Poll conducted in May 1992, with accompanying demographics for 209 out of 418 poll respondents. We have data on seven variables: Age (under 35, 35-54, 55 and over), Sex (male, female), Financial Status (worse, same, or better than a year ago), etc. With the data, we want to learn:

  1. The frequency distribution of financial status in the three age groups, and whether men and women differ in that distribution.
  2. Whether there is a significant relationship between "Financial status" and "Age" for the male and female groups.
  3. The strength of the relationship.

Preparing Data for Analysis

  1. Open a new project or a new workbook. Import the data file \Samples\Statistics\MontanacOutlookPoll.dat
  2. We begin by sorting the categorical values.
    Crosstab0.png

To exclude missing values from the analysis, set the columns as categorical. Otherwise, the missing values will be kept as numeric values.

Performing Cross Tabulation and Chi-square

  1. Open the Cross Tabulation and Chi-square dialog by choosing the menu item Statistics: Descriptive Statistics: Cross Tabulation and Chi-square.
  2. Click on the Input tab. The data are in raw data mode, so select columns B, G, and C for Row, Column, and Layer, respectively.
    Crosstab1.png
  3. Click on the Statistics tab, clear the Expected Counts, Residuals, Standardized Residuals, and Adjusted Residuals check boxes, and accept the other default settings.
    Crosstab2.png
  4. On the Tests tab, select the Chi-Square Test check box. Expand the Measures of Association branch, and then select the Contingency Coefficients, Phi, and Cramer's V boxes (for measuring nominal association).
    Crosstab3.png
  5. Click on the Output tab and select the Mosaic Plot check box. Accept the other default settings and click OK.
    Crosstab44.png
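Outside of Origin, the counting behind a layered contingency table can be sketched in a few lines of Python. This is only an illustration of the idea, not what the dialog does internally. It assumes age is the row variable, financial status the column variable, and sex the layer variable; the function name `cross_tabulate`, the field names, and the records are hypothetical and are not the poll data.

```python
from collections import Counter


def cross_tabulate(records, row_key, col_key, layer_key):
    """Count (row, column) pairs separately within each layer.

    Records with a missing (None) value in any of the three fields are skipped.
    """
    tables = {}
    for rec in records:
        row = rec.get(row_key)
        col = rec.get(col_key)
        layer = rec.get(layer_key)
        if row is None or col is None or layer is None:
            continue
        tables.setdefault(layer, Counter())[(row, col)] += 1
    return tables


# Made-up raw records for illustration only; these are NOT the poll data.
records = [
    {"age": "under 35", "fin": "better", "sex": "female"},
    {"age": "under 35", "fin": "better", "sex": "female"},
    {"age": "55 and over", "fin": "worse", "sex": "female"},
    {"age": "35-54", "fin": "same", "sex": "male"},
    {"age": "35-54", "fin": None, "sex": "male"},  # missing value, skipped
]

tables = cross_tabulate(records, row_key="age", col_key="fin", layer_key="sex")
for layer, counts in tables.items():
    print(layer, dict(counts))
```

Each layer gets its own table of counts, which mirrors the separate female and male tables in the report sheet.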

Interpreting The Results

Go to the sheet Crosstab1.

Frequency Distribution

We can get frequency distribution information from the Mosaic Plot and the Contingency Table. The area of each rectangle in the Mosaic Plot is proportional to the percentage of the Y variable for each level of the X variable, so we can visually compare the frequency distribution of "Financial status" and "Age" for female, male, and total. The Contingency Table gives more specific information. Combining the Mosaic Plot with the Contingency Table, we learn:

  1. There is a major difference between younger and older women's views.
    Crosstab5.png Crosstab6.png
  2. Compared to women, men's views of their financial status show another interesting pattern:
    Crosstab7.png Crosstab8.png
  3. Regardless of the sex of respondents, there are some trends by age:
    Crosstab9.png Crosstab10.png
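The within-group percentages that such comparisons rely on can be computed directly from a table of counts. The following is a minimal sketch under stated assumptions: the helper `row_percentages` is hypothetical, and the counts are made up for illustration and do not come from the poll.

```python
def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for row_label, counts in table.items():
        total = sum(counts.values())
        result[row_label] = {
            col: (100.0 * n / total if total else 0.0)
            for col, n in counts.items()
        }
    return result


# Made-up counts for illustration only; these are NOT the poll results.
table = {
    "under 35": {"worse": 5, "same": 10, "better": 5},
    "35-54": {"worse": 4, "same": 12, "better": 4},
    "55 and over": {"worse": 10, "same": 8, "better": 2},
}

pct = row_percentages(table)
for age, shares in pct.items():
    print(age, {k: round(v, 1) for k, v in shares.items()})
```

Comparing the rows of `pct` side by side is the numerical counterpart of comparing rectangle sizes across age groups.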

Detecting Relationships Between Age and Financial Status

The Chi-Square Tests table shows the test results for the independence of the row and column variables. A Prob>ChiSq value below 0.05 indicates that the row and column variables, in this case age and financial status, are significantly related at the 0.05 level. Note the conclusions in the footnotes beneath the table. We conclude that:

Crosstab11.png
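For readers who want to see what the test computes, here is a hedged sketch of the Pearson chi-square test of independence. The function `chi_square_independence` is a hypothetical helper, the counts are made up (not the poll data), and the p-value uses a closed-form tail that is valid only for even degrees of freedom, which covers a 3×3 table (4 degrees of freedom). It also assumes every expected count is positive.

```python
import math


def chi_square_independence(observed):
    """Pearson chi-square test of independence for a table of counts.

    Returns (statistic, degrees of freedom, p-value). The p-value uses the
    closed-form chi-square tail that holds only for even degrees of freedom.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    if dof % 2 != 0:
        raise ValueError("this sketch only handles even degrees of freedom")

    half = stat / 2.0
    p_value = math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(dof // 2)
    )
    return stat, dof, p_value


# Made-up 3x3 counts (age group by financial status); NOT the poll data.
observed = [
    [5, 10, 5],
    [4, 12, 4],
    [10, 8, 2],
]

stat, dof, p = chi_square_independence(observed)
print(f"chi-square = {stat:.3f}, dof = {dof}, p = {p:.4f}")
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

In practice a statistics library would normally be used for the p-value; the closed form here just keeps the sketch free of dependencies.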

Assessing the Strength of the Relationship

The Measures of Association table helps in assessing the strength of the relationship between "Financial status" and "Age". As this is a 3×3 table (three levels for Age and three levels for Financial status), we can use the Contingency Coefficient to compare across layers. (See the introduction page for the differences among the three statistics.) From the table we can see:

Crosstab12.png
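The three nominal measures selected on the Tests tab are all simple functions of the chi-square statistic and the sample size. The sketch below uses the standard textbook definitions; the helper name `association_measures` and the input values are made up for illustration and are not results from this tutorial.

```python
import math


def association_measures(chi2, n, n_rows, n_cols):
    """Phi, contingency coefficient, and Cramer's V from a chi-square statistic.

    chi2   : Pearson chi-square statistic of the table
    n      : total number of observations in the table
    n_rows : number of row levels
    n_cols : number of column levels
    """
    phi = math.sqrt(chi2 / n)
    contingency = math.sqrt(chi2 / (chi2 + n))
    cramers_v = math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
    return {"phi": phi, "contingency": contingency, "cramers_v": cramers_v}


# Made-up inputs for illustration only; NOT values from the report sheet.
measures = association_measures(chi2=12.0, n=100, n_rows=3, n_cols=3)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

Note that for tables larger than 2×2, Phi is not bounded by 1, whereas the contingency coefficient and Cramer's V stay between 0 and 1.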