LINEAR REGRESSION

  I.  Analyzing the relationship between two interval level
      (numeric) variables.
        A.  Two commonly used statistics measure the relationship
            between interval level variables: the linear regression
            coefficient and Pearson's correlation coefficient.  Both
            are measures of the linear relationship between two
            interval variables.
        B.  Both provide some measure of how strong a relationship is
            and the direction of the relationship (positive or
            negative).  Pearson's correlation coefficient is
            similar to the measures of association for categorical
            variables discussed earlier in the course.

 II.  Linear relationships and regression.
        A.  Two variables are linearly related if increases in one
            variable are accompanied by relatively consistent
            increases in the other (a positive linear relationship)
            or if increases in one variable are accompanied by
            relatively consistent decreases in the other (a negative
            linear relationship).
        B.  Linear relationships can be illustrated graphically using
            scattergrams to plot the scores of cases on the two
            variables simultaneously.  Scores on one variable (the
            independent variable) are represented by the horizontal
            axis and scores on the other variable (the dependent
            variable) are represented by the vertical axis.  Dots are
            used to represent the scores of individual respondents on
            the two variables.
        C.  If a linear relationship exists, the dots on a
            scattergram will line up in a relatively consistent
            pattern.  That does not necessarily mean they are exactly
            in line with each other, but that they cluster in a
            somewhat linear fashion.  If there is no linear
            relationship, the dots will be distributed randomly, or
            according to some other pattern.
        D.  If a linear relationship exists, a line can be drawn that
            represents the general linear pattern of the dots.  The
            goal is to draw the line so that the vertical distances
            between the dots and the line are, taken together, as
            small as possible.  This line is called the least-squares
            regression line because the sum of the squared vertical
            distances between each dot and the line (the errors) is
            the smallest possible.  It is represented by the formula
            Y = a + bX, where Y is the predicted value of the
            dependent variable, a is the Y intercept (the place where
            the regression line crosses the vertical axis), b is the
            slope of the line, and X is the score on the independent
            variable.
        E.  The Y intercept (a), sometimes called the constant, is
            the value of Y when X is zero.
              1.  Calculating the constant.
              2.  Interpreting the constant.
        F.  The slope of the line (b), sometimes called the
            regression coefficient, represents how many units the
            dependent variable (Y) changes, on average, when the
            independent variable (X) increases by one unit.
              1.  Calculating the slope.
              2.  Interpreting the slope.
                    a.  Negative and positive slopes.
                    b.  Magnitude of the slope.
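
A minimal sketch of how the constant (a) and the slope (b) can be
calculated, using the usual least-squares formulas: b is the sum of
cross-products of deviations divided by the sum of squared deviations
of X, and a is the mean of Y minus b times the mean of X.  The
function name and the five pairs of scores are made up for
illustration and are not taken from the course materials.

```python
def least_squares(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sum of squared deviations of X from its mean.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # slope (regression coefficient)
    a = mean_y - b * mean_x    # Y intercept (constant)
    return a, b


# Hypothetical scores for five cases (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares(x, y)
print(f"Y = {a:.2f} + {b:.2f}X")   # prints "Y = 2.20 + 0.60X"
```

For these scores the slope is positive: each one-unit increase in X is
accompanied, on average, by a 0.60-unit increase in Y, and the
predicted value of Y when X is zero is 2.20.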

III.  Testing hypotheses about regression coefficients.
        A.  State the null hypothesis.
        B.  Choose a statistical test.
        C.  Check the assumptions.
              1.  Random probability samples.
              2.  Two interval variables.
              3.  Distributions of both variables normal.
              4.  Linear relationship.
              5.  Distribution of dependent variable same at all
                  levels of independent variable (homoscedasticity).
              6.  Sample size sufficiently large.
        D.  Choose an alpha level.
        E.  Compute the test statistic and make a decision about the null.
              1.  Compute the statistic.
              2.  Determine the probability associated with the statistic.
              3.  Compare the probability to the alpha level.
              4.  Make a decision about the null.
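
The steps above can be sketched for the common case in which the null
hypothesis is that the population slope is zero and the test is a
t test with n - 2 degrees of freedom.  This is one plausible choice of
test, assumed here for illustration.  The scores are made up, and the
critical value is copied from a standard t table.  Comparing t to the
critical value leads to the same decision as comparing the
probability to the alpha level.

```python
import math


def slope_t_statistic(x, y):
    """Return (t, df) for testing the null hypothesis that the slope is zero."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Sum of squared errors around the regression line.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    # Standard error of the slope.
    se_b = math.sqrt(sse / df / sxx)
    return b / se_b, df


# Hypothetical scores for five cases (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

t, df = slope_t_statistic(x, y)

# Two-tailed critical value for alpha = .05 and df = 3, from a t table.
critical_t = 3.182
reject_null = abs(t) > critical_t

print(f"t = {t:.2f}, df = {df}, reject null: {reject_null}")
```

With only five cases the slope of 0.60 yields t of about 2.12, which
is smaller than the critical value, so the null hypothesis is not
rejected.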

 IV.  The correlation coefficient.
        A.  While the regression coefficient (b) provides an
            indication of the relationship between the variables, it
            has no upper or lower limits and its size is dependent on
            how the two variables are measured, making its use as a
            measure of association somewhat problematic.
        B.  Pearson's correlation coefficient (r), on the other hand,
            ranges from -1 to +1 (negative numbers indicate negative
            associations, positive numbers indicate positive
            associations), with values approaching +1 or -1
            indicating strong associations and values near 0
            indicating weak associations.
        C.  The correlation coefficient can be seen as a measure of
            how closely individual observations fall to the
            least-squares regression line.  The more tightly the
            observations cluster around the regression line, the
            larger the absolute value of the coefficient.
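
A short sketch of the contrast between b and r.  The same made-up
scores are analyzed twice, once as given and once with X re-expressed
in units 100 times smaller (as when meters are converted to
centimeters).  The function names and data are illustrative
assumptions.

```python
import math


def slope(x, y):
    """Least-squares regression coefficient b."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def pearson_r(x, y):
    """Pearson's correlation coefficient r."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for five cases (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# The same X scores expressed in a unit 100 times smaller.
x_rescaled = [xi * 100 for xi in x]

print(f"b: {slope(x, y):.3f} vs {slope(x_rescaled, y):.3f}")
print(f"r: {pearson_r(x, y):.3f} vs {pearson_r(x_rescaled, y):.3f}")
```

The slope drops from 0.600 to 0.006 when the unit of X changes, while
r stays at about 0.775 in both cases.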

  V.  The coefficient of determination.
        A.  The square of Pearson's correlation coefficient (r^2)
            is called the coefficient of determination and has a PRE
            (proportional reduction in error) interpretation.  It
            represents the proportional improvement in predicting the
            dependent variable when the least-squares regression line
            is used compared to when the mean of the dependent
            variable is used.
        B.  When the mean of the dependent variable is used to
            predict the value of the dependent variable, the sum of
            the squared errors represents the total amount of
            variation in the dependent variable (TSS).
        C.  When the least-squares regression line is used to
            predict the value of the dependent variable, the sum of
            squared errors represents the amount of variation in the
            dependent variable unexplained by the independent
            variable.
        D.  The difference between these two represents the amount of
            variation in the dependent variable explained by the
            independent variable.
        E.  The coefficient of determination represents the ratio of
            the amount of variation explained by the independent
            variable to the total amount of variation.  In other
            words, it is the proportion of the total variation in the
            dependent variable explained by the independent variable.
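
The decomposition described in this section can be sketched with a
few lines of code.  The scores are made up for illustration, and the
names (tss for total variation, sse for the variation left
unexplained by the regression line) are assumptions of this sketch.

```python
def sums_of_squares(x, y):
    """Return (tss, sse): squared errors around the mean and around the line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Errors when the mean of Y is used as the prediction (total variation).
    tss = sum((yi - mean_y) ** 2 for yi in y)
    # Errors when the regression line is used (unexplained variation).
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return tss, sse


# Hypothetical scores for five cases (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

tss, sse = sums_of_squares(x, y)
explained = tss - sse          # variation explained by X
r_squared = explained / tss    # coefficient of determination

print(f"TSS = {tss:.1f}, unexplained = {sse:.1f}, explained = {explained:.1f}")
print(f"r^2 = {r_squared:.2f}")
```

For these scores the total variation is 6.0, of which 2.4 is left
unexplained and 3.6 is explained, so r^2 is 0.60: using the
regression line reduces the squared prediction errors by 60 percent
compared with using the mean.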

 VI.  Testing hypotheses about correlation coefficients.
        A.  State the null hypothesis.
        B.  Choose a statistical test.
        C.  Check the assumptions.
              1.  Random probability samples.
              2.  Two interval variables.
              3.  Distributions of both variables normal.
              4.  Linear relationship.
              5.  Distribution of dependent variable same at all
                  levels of independent variable (homoscedasticity).
              6.  Sample size sufficiently large.
        D.  Select an alpha level.
        E.  Calculate the test statistic and make a decision about
            the null hypothesis.
              1.  Calculate the statistic.
              2.  Determine the probability.
              3.  Compare the probability to the alpha level.
              4.  Make a decision about the null.
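
A minimal sketch of these steps, assuming the null hypothesis is that
the population correlation is zero and that the test is the usual
t test, t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of
freedom.  The scores are made up, and the critical value is copied
from a standard t table.

```python
import math


def correlation_t_statistic(x, y):
    """Return (r, t, df) for testing the null hypothesis that r is zero."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return r, t, df


# Hypothetical scores for five cases (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r, t, df = correlation_t_statistic(x, y)

# Two-tailed critical value for alpha = .05 and df = 3, from a t table.
critical_t = 3.182
reject_null = abs(t) > critical_t

print(f"r = {r:.3f}, t = {t:.2f}, df = {df}, reject null: {reject_null}")
```

With two variables, this t value is the same as the t value obtained
when testing the regression coefficient on the same data, so the two
tests lead to the same decision.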