Langston, Research Methods, Notes 8 -- Survey Research
I.  Goals.
A.  Uses of survey data.
B.  Survey issues.
C.  What to do with the data.
D.  Survey exercise.
II.  Uses of survey data.  With a survey, you produce the observation of interest by asking a question.  You're still not manipulating anything, but your involvement is increasing.
A.  Research:  Find out attitudes, opinions or behaviors.  This can be as they are now or as they change over time.
B.  Practical:  Find the same stuff, but put it to use.  For example, politicians tailor their message based on issues people are focusing on.  Companies use product surveys to find out what you're buying and why you like what you do.
III.  Survey issues.
A.  Sampling:  You start with a population.  This is a theoretical entity that corresponds to everyone to whom you want to generalize.  It is almost impossible to list a complete population.  Instead, you use a frame, which is a list of members of the population.  If the population is Middle Tennessee State University students, it's easy to get a frame that contains everyone (a directory).  For something like the US population, you might need several sources for your frame.  Since your frame usually won't contain everyone, you have to be aware of selection biases (overrepresenting some element of the population).  You want a representative sample (one that accurately reflects the make-up of the population).  Once you have a frame, you take a sample from it and administer the survey to those members of the population.  How do you get the sample?
1.  Nonprobability sampling:  This means you can't say how likely any member of the population was to be in the sample (you can't assign a probability of membership to elements of the population).
+:  Easy.
-:  Worry about selection biases.  You might choose people to survey based on how you think they'll answer the questions, biasing the results.  Taking whoever is handy is also a source of bias (for example, phone polls might leave out people with cell phones or unlisted numbers).
3 kinds:
a.  Accidental:  I treat some group to whom I have easy access as the sample.  If I want Middle Tennessee State University students, and survey members of this class, that's an accidental sample.  Standing in front of the KUC and surveying the first 30 people who walk by is also accidental sampling.
b.  Purposive:  I want to know about a particular subset of the population, so I only sample those elements.  For example, if I want to identify study habits of “A” students, that's the only group sampled.  This is a biased sample, but it's not a problem because that's who I'm interested in.
c. Snowball sample: Sometimes, the people I want are not going to be reachable through advertisements, and they are rare enough that a regular sample may not include any of them. A snowball sample technique would be to find a key informant and then get that person to tell more people who meet the criteria about my study. The sample is like a snowball rolling downhill in that it grows through word of mouth.
2.  Probability sampling:  For any element of the population, you can specify exactly how likely that element is to be included in the sample.
+:  Representative samples.
-:  Hard.
3 kinds:
a.  Simple random:  Every element has an equal chance of being in the sample.  It's like throwing everyone's name in a hat and then drawing them out until you get the number of people you want.  It's the easiest of this type, but it doesn't guarantee a representative sample.
b.  Stratified random sampling:  If the population is not homogeneous (for example, we have two distinct genders), then you subdivide into strata (subgroups) and sample from within each stratum (group).  You can take equal numbers from each stratum, or you can do it proportional to the stratum's membership in the population.  For example, if I have 40 students, and 10 are male, then in a sample of four, I'd want one male and three females (so males are still 25%).  I would randomly sample one male from the males and three females from the females.  To the extent that you do a good job of identifying strata, this is the most representative kind of sampling.
c.  Cluster sampling:  Like simple random, but the elements are actually clusters of individuals.  For example, if I couldn't get MTSU's student directory, I could randomly sample from the course catalog, and survey all of the members of the chosen courses (clusters of students).
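The three kinds of probability sampling can be sketched in a few lines of Python.  This is only an illustration under stated assumptions:  the student labels, the three course groupings, and the function names are all made up for the sketch, and the frame mirrors the 40-student example (10 male, 30 female) from the stratified sampling description.

```python
import random

def simple_random_sample(frame, n, rng):
    """Names in a hat: every element of the frame has an equal chance."""
    return rng.sample(frame, n)

def stratified_sample(strata, n, rng):
    """Sample each stratum in proportion to its share of the frame.

    `strata` maps a stratum label to the list of its members.  Rounding
    can make the total differ from n when the shares don't divide evenly.
    """
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        k = round(n * len(members) / total)
        sample.extend(rng.sample(members, k))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters, then survey everyone in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

rng = random.Random(8)  # fixed seed only so the sketch is repeatable

# Hypothetical frame: 40 students, 10 male and 30 female.
males = [f"M{i}" for i in range(10)]
females = [f"F{i}" for i in range(30)]
frame = males + females

srs = simple_random_sample(frame, 4, rng)
strat = stratified_sample({"male": males, "female": females}, 4, rng)

# Hypothetical courses standing in for the course catalog.
courses = {"Course A": frame[:15], "Course B": frame[15:30], "Course C": frame[30:]}
clus = cluster_sample(courses, 1, rng)

print(srs)
print(strat)  # always one male and three females
print(len(clus), "students surveyed from one course")
```

Note that the simple random sample can come out with no males at all, while the stratified sample always keeps males at 25%.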
B.  Methodology:  How do you contact the sample?
1.  Mail survey:  Send it by mail.
+:  Easy.
-:  Potential response biases:  If only some of the people respond, they might differ from the people who didn't respond (for example, they might hold the most extreme views).
2.  Interview:  Face-to-face administration of the survey.
+:  More responses.
-:  Biases from interviewer (subtle or intentional), hard and expensive.
3.  Phone:  Call them.  Good compromise.
+:  Response rate.
-:  Still potential bias.
4. Internet:
+: Could be nice response rate, anonymous.
-: Also potential for low response rate.
C.  The questions.
1.  Choosing what to ask:  You need:
a.  Very specific questions that go directly to what you want to know.
b.  To project answers to see if responses to the questions can tell you what you want to know.
2.  Decide on a format:
a.  Closed:  Multiple choice, true/false.  Participants have to select from several possibilities.  Good by mail, internet, phone.
b.  Open-ended:  Essays and short answers.  These are more informative, but harder to score.  Good for interviews.
3.  Write a pool of questions (more than you need).
4.  Revise:  Try the questions on people with strong opinions to check for bias, narrow the pool.
5.  Pretest:  Get a subset of the sample to take the survey.  Interview them about the questions (what were they thinking, was it clear).
6.  Make instructions and a procedure for administering it.
You also need to be sensitive to the ordering of the questions:
a.  Go from general to specific to avoid fixing participants’ responses.
b.  If you have similar questions, mix the order so that you have several orderings over the whole set of surveys.
c.  Use filter questions to cut down on the work of respondents (like “Do you own a car?  If yes, then answer this set of car questions...”).
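The last two ordering ideas (mixing the order of similar questions and using a filter question) can be sketched as follows.  This is a minimal illustration, not a survey tool:  the function names and the placeholder question lists are assumptions made up for the sketch.

```python
import random

def make_orderings(similar_questions, n_versions, seed=0):
    """Return several shuffled orderings of a block of similar questions,
    so that no single order is used on every copy of the survey."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return [rng.sample(similar_questions, len(similar_questions))
            for _ in range(n_versions)]

def questions_for(owns_car, general_questions, car_questions):
    """Filter question: only car owners see the car questions."""
    return general_questions + (car_questions if owns_car else [])

# Placeholder question labels, not real survey items.
versions = make_orderings(["Q1", "Q2", "Q3", "Q4"], n_versions=3)
for version in versions:
    print(version)

print(questions_for(False, ["G1", "G2"], ["Car1", "Car2"]))  # ['G1', 'G2']
```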
D.  Design:  Time periods sampled.
1.  Cross sectional:  One time period.  Tells you opinion at the time you did the survey for the population you sampled.  Attempts to generalize beyond this are risky.
2.  Successive independent samples:  Give the same questions at several time periods to different samples of people.
+:  Easy way to assess changes with time.
-:  Different people mean changes could be due to sampling error.
3.  Longitudinal:  Give the same questions at several time periods to the same sample of people.
+:  Outstanding data for assessing change over time.
-:  Hard.  Mortality (people drop out; if they drop out for reasons related to what you're studying, that leads to biases).
IV.  What to do with the data.
A.  Descriptive statistics.  Everything applies.
B.  Chi square (contingency tables).
C.  Confidence interval:  I could make a statement like:  “We can be 95% confident that the interval 4 < μ < 6 contains the mean number of times MTSU students go to Nashville to shop.”
D.  Correlation:  Assess the strength of the relationship between two variables.  As an example, I can look at the relationship between the amount of time you spend studying and your exam score.  If it's strong, I might want to do an experiment to see if studying more causes better grades.  If there's no relationship, then there's no need to do additional research.  Some correlation stuff:
1.  Scatter plots:  You should always start by looking at a graph that plots one set of scores on the x-axis, and the other set on the y-axis.  This is like making a frequency distribution for descriptive statistics:  it lets you get a qualitative feel for the data.  This can help make sense of the numbers later.  What can you see?  Here are some idealized plots:

a.  All points on a line, goes up from left to right = perfect positive correlation.
b.  All points on a line, goes down from left to right = perfect negative correlation.
c.  No pattern to the dots (uniformly distributed) = absolutely no correlation.
d.  Realistic data:  A cloud of points that generally goes up from left to right = positive correlation (if it goes down it's a negative correlation).  The more clustered the points, the higher the correlation.
2.  What the numbers mean:  Correlation ranges from -1 to 1.  Perfect negative = -1.  Perfect positive = 1.  Absolutely no correlation = 0.  Anything in between is what you'll find in practice.  The closer to 1 or -1, the stronger the correlation.  Keep in mind that absolute strength matters, not the sign, so -.83 is stronger than .24.
3.  Interpreting:  Some general things:
a.  CORRELATION DOES NOT IMPLY CAUSATION.  Even if there appears to be a very strong relationship between shoe size and IQ, we wouldn't say one causes the other.  This interpretation will be VERY tempting as the semester progresses and we get into less obvious examples.  Do not fall for it.
b.  r (a correlation coefficient) is not very interpretable as it stands.  But, r^2 (r-squared) is.  It's the proportion of variance in the Y-scores associated with variance in the X-scores.  In other words, how much of the difference in the Y's is related to differences in the X's.  The bigger this is, the more knowing a person's X score will tell you about their Y score.  This leads us to:
c.  Predicting a person's performance based on what other people have done.  If you make a scatter plot of the data and draw a line through the points so that the distances from the points to the line are as small as they can be (technically, so that the sum of the squared vertical distances is minimized), you've made the regression line.  It's the line that best fits the points.  If you know the equation for the regression line and an X score, you can use that to predict a Y score.  A concrete example of this would be college admissions.  Based on thousands of past students’ ACT scores and college GPAs, I can compute the regression line for ACT and GPA.  Then, if you apply to my college and I know your ACT score (X), I can predict your GPA (Y) using my regression line.  That way, I can get some idea of how successful you'll be in college before you ever take a class.
Here's an example:  I collected five ACT scores and five GPAs.  They are:
ACT  GPA
19   2.4
21   3.2
17   2.0
29   3.7
20   2.8

First, compute the correlation:  r = .913.  This is a strong, positive correlation.  Graph the scatter plot to see details.
Then, compute r^2 = .834.  This is very high.  What it means is that 83% of the differences in GPA can be associated with differences in ACT.  We don't know how to explain the other 17% (if we had more possible factors, we could add them into the regression and find out, but that's multiple regression, and it's a bit beyond us here).  This is still correlation research, so we don't want to get carried away with causal statements.
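If you want to check r and r^2 yourself, here is a minimal Python sketch using the five ACT/GPA pairs from the example.  The function name pearson_r is just a label chosen for this sketch; the calculation is the standard Pearson correlation (how much X and Y vary together, divided by how much each varies on its own).

```python
from math import sqrt

# The five ACT scores and GPAs from the example.
act = [19, 21, 17, 29, 20]
gpa = [2.4, 3.2, 2.0, 3.7, 2.8]

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross products of deviations from the means.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)

r = pearson_r(act, gpa)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")  # r = 0.913, r^2 = 0.834
```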
Now for prediction:  I compute the regression line.  A line always takes the form

Y = mX + b

m is the slope; we'll compute that based on the data.  b is the y-intercept (the place where the line crosses the y-axis).  We'll also compute that based on the data.  X and Y define a point (in this case, the pair of scores corresponding to some individual's ACT score and GPA).  We know X (ACT).  The question is, what is Y?  What GPA should this person get based on their ACT?
I did the math.  Here's the regression line:

Y = 0.132X + 0.025

So, if I know a person's ACT score is 22, I can plug in for X and compute, and I get a predicted GPA of 2.93.  That person should probably be admitted, because it looks like they will be successful.  With an ACT of 13, I get a predicted GPA of 1.74.  That person will probably struggle, and I might want to pass on admitting them so someone else can have the spot.
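The regression line and the two predictions can be reproduced with a short Python sketch.  It assumes the usual least-squares formulas (slope = sum of cross products divided by the sum of squared X deviations; intercept = mean of Y minus slope times mean of X), and the names fit_line and predict_gpa are made up for the sketch.

```python
# The five ACT scores and GPAs from the example.
act = [19, 21, 17, 29, 20]
gpa = [2.4, 3.2, 2.0, 3.7, 2.8]

def fit_line(xs, ys):
    """Least-squares slope (m) and y-intercept (b) for Y = mX + b."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x  # the line passes through the two means
    return slope, intercept

slope, intercept = fit_line(act, gpa)

def predict_gpa(act_score):
    """Plug an ACT score (X) into the fitted line to get a predicted GPA (Y)."""
    return slope * act_score + intercept

print(f"Y = {slope:.3f}X + {intercept:.3f}")  # Y = 0.132X + 0.025
print(round(predict_gpa(22), 2))  # 2.93
print(round(predict_gpa(13), 2))  # 1.74
```

Keep in mind that an ACT of 13 is below the lowest score in the five data points (17), so that prediction extends the line past the data it was built from.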

To write it up, consider the following example:  For the results section below, students were looking for a relationship between magical ideation and paranormal belief.  They measured both using scales from the literature.  Based on other research, they expected a positive relationship (higher magical ideation should be associated with higher belief).  Here is a results section reporting the correlation.

“Magical ideation scores ranged from 26 to 79 (M = 48.6, SD = 9.8). Belief in the paranormal scores ranged from 1.0 to 6.0 (M = 2.7, SD = 1.2). There was a significant, positive correlation between magical ideation and paranormal belief, r = .59, p < .01. Higher magical ideation scores were associated with stronger paranormal belief.”
V.  Survey exercise.  As an exercise, we will look at the survey issues PowerPoint to see how the material presented above manifests itself in real surveys.

Research Methods Notes 8
Will Langston
