BIVARIATE TABLES AND CHI-SQUARE

I. Bivariate tables.
   A. Bivariate tables (also called cross-classification or cross-tabulation tables) present the joint frequency distributions of two variables.
   B. The table has rows representing the categories of one variable and columns representing the categories of another variable. The rows and columns intersect to form cells, which contain the joint frequencies of the two variables.
      1. Bivariate tables are described by the notation r x c, where r equals the number of rows and c equals the number of columns. For example, a table with two rows and two columns is a 2 x 2 table, a table with two rows and three columns is a 2 x 3 table, and so on.
      2. The number of cells in a table is equal to the number of rows multiplied by the number of columns.
   C. The row totals contain the total frequencies for the row variable, and the column totals contain the total frequencies for the column variable. These totals are called the marginal frequencies, or simply the marginals.
   D. Bivariate tables often contain percentages as well as frequencies. There are three kinds of percentages that may be reported.
      1. Row percentages are calculated by dividing the number of observations in a cell by the total number of observations in the row of that cell and multiplying by 100.
      2. Column percentages are calculated by dividing the number of observations in a cell by the total number of observations in the column of that cell and multiplying by 100.
      3. Total or overall percentages are calculated by dividing the number of observations in a cell by the total sample size and multiplying by 100.
   E. The distribution of one variable at a single level of the other variable is called a conditional distribution. There are conditional distributions of the row variable at each level of the column variable and conditional distributions of the column variable at each level of the row variable.

II. Chi-square test of independence.
   A. If the conditional distributions of one variable are the same proportionally at every level of the other variable, the two variables are considered independent. That is, the distribution of one variable does not affect the distribution of the other variable. Simply put, there is no relationship between the two variables.
   B. If the conditional distributions of one variable differ proportionally at different levels of the other variable, the two variables are considered dependent. That is, the distribution of one variable affects the distribution of the other variable. Simply put, there is a relationship between the two variables.
   C. The chi-square test of independence is designed to test whether two variables are independent. The steps are:
      1. State the null hypothesis: the two variables are independent.
      2. Choose a statistical test: the chi-square test of independence.
      3. Check assumptions.
         a. Sample is a random probability sample.
         b. Both variables are categorical variables and can be arranged in a bivariate table.
         c. Expected frequencies of each cell should be at least 5 in 2 x 2 tables. In 2 x 3 or larger tables, the expected frequencies should be at least 5 in at least 75% of the cells.
      4. Select an alpha level.
      5. Calculate the test statistic, determine the probability associated with the statistic, and make a decision about the null.
         a. Calculating chi-square.
         b. Finding the probability.
            i. To determine the probability, you must use the chi-square table. To use the table, you must know the degrees of freedom.
            ii. Calculating degrees of freedom.
         c. Deciding about the null.
   D. Limitations of chi-square.
      1. Chi-square is inappropriate when the expected frequencies for a large proportion of cells are less than 5. This can be a problem when sample size is small or table size is large.
      2. Chi-square is greatly affected by sample size. For a given pattern of percentages, the larger the sample size, the larger chi-square will be. In large samples, almost all relationships, even very weak ones, are found to be statistically significant.
      3. In 2 x 2 tables, chi-square is slightly biased. This can be corrected using Yates' correction for continuity.
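
Worked illustration. The calculations named in the outline (marginals, the three kinds of percentages, chi-square, degrees of freedom, and Yates' correction) can be sketched in a few lines of Python. This is a minimal sketch, not part of the original outline. The 2 x 2 table of counts is hypothetical and the function names are invented for the example. The outline does not spell out its formulas, so the sketch assumes the standard textbook ones: expected frequency = row total x column total / n; chi-square = sum over cells of (observed - expected)^2 / expected; degrees of freedom = (r - 1)(c - 1); and Yates' correction subtracts 0.5 from each |observed - expected| before squaring. The critical value 3.841 is the usual chi-square table entry for 1 degree of freedom at alpha = .05.

```python
# Hypothetical 2 x 2 table of observed frequencies (illustrative data only).
# Rows are levels of one variable; columns are levels of the other.
observed = [[30, 20],
            [20, 30]]


def marginals(table):
    """Return (row totals, column totals, total sample size)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals, sum(row_totals)


def row_percentages(table):
    """Each cell divided by its row total, multiplied by 100."""
    row_totals, _, _ = marginals(table)
    return [[100 * cell / rt for cell in row]
            for row, rt in zip(table, row_totals)]


def column_percentages(table):
    """Each cell divided by its column total, multiplied by 100."""
    _, col_totals, _ = marginals(table)
    return [[100 * cell / ct for cell, ct in zip(row, col_totals)]
            for row in table]


def total_percentages(table):
    """Each cell divided by the total sample size, multiplied by 100."""
    _, _, n = marginals(table)
    return [[100 * cell / n for cell in row] for row in table]


def expected_frequencies(table):
    """Expected counts under independence (assumed standard formula)."""
    row_totals, col_totals, n = marginals(table)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1), the assumed standard formula."""
    return (len(table) - 1) * (len(table[0]) - 1)


def chi_square(table, yates=False):
    """Sum of (observed - expected)^2 / expected over all cells.

    With yates=True, 0.5 is subtracted from each absolute difference
    first (clamped at zero, a choice made for this sketch).
    """
    expected = expected_frequencies(table)
    total = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            diff = abs(o - e)
            if yates:
                diff = max(diff - 0.5, 0.0)
            total += diff ** 2 / e
    return total


# Chi-square table value for df = 1 at alpha = .05.
CRITICAL_VALUE = 3.841

chi2 = chi_square(observed)
chi2_yates = chi_square(observed, yates=True)
df = degrees_of_freedom(observed)

# Same percentages, twice the sample size.
doubled = [[2 * cell for cell in row] for row in observed]
chi2_doubled = chi_square(doubled)

print("row percentages:", row_percentages(observed))
print("expected frequencies:", expected_frequencies(observed))
print("chi-square:", chi2, "df:", df)
print("reject null at .05:", chi2 > CRITICAL_VALUE)
print("chi-square with Yates' correction:", chi2_yates)
print("chi-square with doubled counts:", chi2_doubled)
```

With these hypothetical counts, the uncorrected statistic exceeds the critical value while the Yates-corrected statistic does not, and doubling every count doubles chi-square even though the percentages are unchanged. Both results illustrate the limitations listed under II.D.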