BIVARIATE TABLES AND CHI-SQUARE

  I.  Bivariate tables.
        A.  Bivariate tables (also called cross-classification or cross-
            tabulation tables) present the joint frequency distributions of 
            two variables. 
        B.  The table has rows representing the categories of one variable
            and columns representing the categories of another variable.
            The rows and columns intersect to form cells, each of which
            contains the joint frequency of one combination of categories.
              1.  Bivariate tables are described by the notation r x c,
                  where r equals the number of rows and c equals the number
                  of columns.  For example, a table with two rows and two
                  columns is a 2 x 2 table, a table with two rows and three
                  columns is a 2 x 3 table, and so on.
              2.  The number of cells in a table is equal to the number of 
                  rows multiplied by the number of columns. 
        C.  The row totals contain the total frequencies for the row 
            variable and column totals contain the total frequencies for 
            the column variable.  These totals are called the marginal 
            frequencies or simply the marginals. 
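
            The construction of a bivariate table and its marginals can be
            sketched in Python.  This is only an illustration: the data,
            the variable values, and the helper name cross_tabulate are
            hypothetical and do not come from these notes.

```python
from collections import Counter

# Hypothetical raw data: one (row value, column value) pair per respondent.
observations = (
    [("female", "yes")] * 30 + [("female", "no")] * 20
    + [("male", "yes")] * 10 + [("male", "no")] * 40
)

def cross_tabulate(pairs):
    """Return (row_labels, col_labels, table); table[i][j] is a cell frequency."""
    counts = Counter(pairs)
    row_labels = sorted({r for r, _ in pairs})
    col_labels = sorted({c for _, c in pairs})
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table

row_labels, col_labels, table = cross_tabulate(observations)

# An r x c table has r * c cells.
n_rows, n_cols = len(table), len(table[0])
n_cells = n_rows * n_cols

# Marginal frequencies: totals across each row and down each column.
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

print(col_labels)               # ['no', 'yes']
for label, row, total in zip(row_labels, table, row_totals):
    print(label, row, total)    # e.g. female [20, 30] 50
print(col_totals, grand_total)  # [60, 40] 100
```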
        D.  Bivariate tables often contain percentages as well as 
            frequencies.  There are three kinds of percentages that may be 
            reported. 
              1.  Row percentages are calculated by dividing the number of 
                  observations in a cell by the total number of 
                  observations in the row of that cell and multiplying by 
                  100. 
              2.  Column percentages are calculated by dividing the number 
                  of observations in a cell by the total number of 
                  observations in the column of that cell and multiplying 
                  by 100. 
              3.  Total or overall percentages are calculated by dividing 
                  the number of observations in a cell by the total sample 
                  size and multiplying by 100. 
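
            The three kinds of percentages can be illustrated with a short
            Python sketch.  The cell frequencies and the function names are
            hypothetical, chosen only to show the arithmetic.

```python
# Hypothetical 2 x 2 table of cell frequencies (rows x columns).
table = [[20, 30],
         [40, 10]]

def row_percentages(table):
    """Each cell divided by its row total, times 100."""
    return [[100 * cell / sum(row) for cell in row] for row in table]

def column_percentages(table):
    """Each cell divided by its column total, times 100."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100 * cell / col_totals[j] for j, cell in enumerate(row)]
            for row in table]

def total_percentages(table):
    """Each cell divided by the total sample size, times 100."""
    n = sum(sum(row) for row in table)
    return [[100 * cell / n for cell in row] for row in table]

print(row_percentages(table))     # [[40.0, 60.0], [80.0, 20.0]]
print(column_percentages(table))  # first column: about 33.3 and 66.7
print(total_percentages(table))   # [[20.0, 30.0], [40.0, 10.0]]
```

            Row percentages sum to 100 across each row, column percentages
            sum to 100 down each column, and total percentages sum to 100
            over the whole table.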
        E.  The distribution of one variable at a single level of the other
            variable is called a conditional distribution.  There are
            conditional distributions of the row variable at each level of
            the column variable and conditional distributions of the column
            variable at each level of the row variable.

 II.  Chi-square test of independence.
        A.  If the conditional distributions of one variable are the same 
            proportionally at every level of the other variable, the two 
            variables are considered independent.  That is, the distribution
            of one variable does not affect the distribution of the other 
            variable.  Simply put, there is no relationship between the two 
            variables. 
        B.  If the conditional distributions of one variable differ 
            proportionally at different levels of the other variable, the 
            two variables are considered dependent.  That is, the 
            distribution of one variable affects the distribution of the 
            other variable. Simply put, there is a relationship between the 
            two variables. 
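
            The comparison of conditional distributions described in A and
            B can be sketched as follows.  The two tables and the function
            names are hypothetical.  The check describes the sample only;
            it is not itself a significance test.

```python
def conditional_distributions(table):
    """Proportions of the column variable within each level of the row variable."""
    return [[cell / sum(row) for cell in row] for row in table]

def looks_independent(table, tol=1e-9):
    """True if every row has the same conditional distribution (within tol)."""
    dists = conditional_distributions(table)
    first = dists[0]
    return all(abs(p - q) <= tol
               for dist in dists[1:]
               for p, q in zip(dist, first))

# Hypothetical table in which each row splits 1/3 to 2/3: independent.
same_proportions = [[10, 20],
                    [30, 60]]

# Hypothetical table in which the rows split 40/60 and 80/20: dependent.
different_proportions = [[20, 30],
                         [40, 10]]

print(looks_independent(same_proportions))       # True
print(looks_independent(different_proportions))  # False
```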
        C.  The Chi-square test of independence is designed to test whether 
            two variables are independent.  The steps are: 
              1.  State the null hypothesis--the two variables are 
                  independent. 
              2.  Choose a statistical test--the chi-square test of 
                  independence. 
              3.  Check assumptions. 
                    a.  Sample is a random probability sample. 
                    b.  Both variables are categorical variables and can be 
                        arranged in a bivariate table. 
                    c.  Expected frequencies of each cell will be at least 
                        5 in 2x2 tables.  In 2x3 or larger tables, the 
                        expected frequencies of each cell should be at 
                        least 5 in at least 75% of the cells. 
              4.  Select an alpha level.
              5.  Calculate the test statistic, determine the probability 
                  associated with the statistic, and make a decision about 
                  the null. 
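                    a.  Calculating chi-square.  Chi-square is the sum,
                        over all cells, of (fo - fe)^2 / fe, where fo is
                        the observed frequency of a cell and fe is its
                        expected frequency under independence (row total
                        times column total, divided by the sample size).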
                    b.  Finding the probability.
                          i.  To determine the probability, you must use
                              the chi-square table.  To use the table, you
                              must know the degrees of freedom.
                         ii.  Calculating degrees of freedom.
                              df = (r - 1)(c - 1), where r is the number
                              of rows and c is the number of columns.
                    c.  Deciding about the null.  Reject the null
                        hypothesis if the probability is less than alpha;
                        otherwise, fail to reject it.
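
            The steps in C can be sketched in Python under stated
            assumptions.  The table is hypothetical, alpha is assumed to be
            .05, and the critical values are copied from a standard
            chi-square table for 1 to 4 degrees of freedom.  Comparing the
            statistic to a critical value is equivalent to comparing its
            probability to alpha.  The function names are illustrative.

```python
# Critical values of chi-square at alpha = .05 for df = 1..4,
# as listed in a standard chi-square table.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def expected_frequencies(table):
    """Expected cell frequency = row total * column total / sample size."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_frequencies(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)

def share_of_cells_at_least_5(table):
    """Proportion of cells whose expected frequency is 5 or more."""
    cells = [e for row in expected_frequencies(table) for e in row]
    return sum(e >= 5 for e in cells) / len(cells)

# Hypothetical observed frequencies.
table = [[20, 30],
         [40, 10]]

assumption_share = share_of_cells_at_least_5(table)  # 1.0: all cells pass
statistic = chi_square(table)                        # about 16.67
df = degrees_of_freedom(table)                       # 1
reject_null = statistic > CRITICAL_05[df]            # True

print(round(statistic, 2), df, reject_null)
```

            In practice, a statistics library such as SciPy
            (scipy.stats.chi2_contingency) would report the probability
            directly instead of requiring a table lookup.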
        D.  Limitations of chi-square. 
              1.  Chi-square is inappropriate when the expected frequencies
                  for a large proportion of cells are less than 5.  This
                  can be a problem when sample size is small or table size
                  is large.
              2.  Chi-square is greatly affected by sample size.  For the
                  same pattern of cell percentages, a larger sample
                  produces a larger chi-square.  In large samples, even
                  very weak relationships are often found to be
                  statistically significant.
              3.  In 2x2 tables, chi-square is slightly biased upward.
                  This can be corrected using Yates' correction for
                  continuity.
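
            Limitations 2 and 3 can be illustrated with a short sketch.
            The tables are hypothetical.  The correction shown is the
            classical form of Yates' correction, which subtracts 0.5 from
            each absolute difference between observed and expected
            frequencies before squaring; some software applies a slightly
            different variant, so treat this as an assumption.

```python
def expected_frequencies(table):
    """Expected cell frequency = row total * column total / sample size."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table, yates=False):
    """Chi-square statistic; yates=True applies the classical 0.5 correction."""
    correction = 0.5 if yates else 0.0
    expected = expected_frequencies(table)
    return sum((abs(o - e) - correction) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

# Sample size: a weak relationship (52% versus 48%) in a sample of 100 ...
small = [[26, 24],
         [24, 26]]
# ... and the same percentages in a sample of 10,000.
large = [[cell * 100 for cell in row] for row in small]

print(round(chi_square(small), 2))  # 0.16, below the critical value 3.841
print(round(chi_square(large), 2))  # 16.0, above the critical value 3.841

# Yates' correction: the corrected statistic is smaller in a 2x2 table.
table = [[20, 30],
         [40, 10]]
print(round(chi_square(table), 2))              # 16.67
print(round(chi_square(table, yates=True), 2))  # 15.04
```

            Multiplying every cell by 100 leaves the percentages unchanged
            but multiplies chi-square by 100, which is why a trivial
            difference becomes significant at the .05 level with 1 degree
            of freedom.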