Research Methods, Notes 8 -- Survey Research
A. Uses of survey data.
B. Survey issues.
C. What to do with the data.
D. Survey exercise.
II. Uses of survey data. With a survey, you produce
the observation of interest by asking a question. You're still not
manipulating anything, but your involvement is increasing.
A. Research: Find out attitudes, opinions or
This can be as they are now or as they change over time.
B. Practical: Find the same stuff, but put it to use.
For example, politicians tailor their message based on the issues people
are focusing on. Companies use product surveys to find out what you're
buying and why you like what you do.
III. Survey issues.
A. Sampling: You start with a population. This is
a theoretical entity that corresponds to everyone to whom you want to
generalize. It is almost impossible to specify a complete
population. Instead, you use a frame, which is a list of members of the
population. If the population is Middle Tennessee State University students, it's
easy to get a frame that contains everyone (a directory). For something
like the US population, you might need several sources for your frame.
Since your frame won't contain everyone, you have to be aware of
biases (over-representing some element of the population). You want
a representative sample (one that accurately reflects the make-up of the
population). Once you have a frame, you take a sample from it and
administer the survey to those members of the population. How do
you get the sample?
1. Non probability sampling: This means you can't say how
likely any member of the population was to be in the sample (you can't
assign a probability of membership to elements of the population).
-: Worry about selection biases. You might choose people
to survey based on how you think they'll answer the questions, biasing
the results. Taking who is handy is also a source of bias (for example,
phone polls might leave out people with cell phones or unlisted numbers).
a. Accidental: I treat some group to whom I have easy access
as the sample. If I want Middle Tennessee State University students
and survey members of this class, that's an accidental sample. Standing
in front of the KUC and surveying the first 30 people who walk by is
also an accidental sample.
b. Purposive: I want to know about a particular subset
of the population, so I only sample those elements. For example,
if I want to identify study habits of “A” students, that's the only
group sampled. This is a biased sample, but it's not a problem because
that's who I'm interested in.
c. Snowball sample: Sometimes, the people I want are not going to be
reachable through advertisements, and they are rare enough that a
regular sample may not include any of them. A snowball sample technique
would be to find a key informant and then get that person to tell more
people who meet the criteria about my study. The sample is like a
snowball rolling downhill in that it grows through word of mouth.
2. Probability sampling: For any element of the population,
you can specify exactly how likely that element is to be included in
the sample.
+: Representative samples.
a. Simple random: Every element has an equal chance of
being in the sample. It's like throwing everyone's name in a hat
and then drawing them out until you get the number of people you need.
It's the easiest of this type, but it doesn't guarantee a representative
sample.
b. Stratified random sampling: If the population is not
homogeneous (for example, we have two distinct genders), then you divide it
into strata (subgroups) and sample randomly from within each stratum.
You can take equal numbers from each stratum, or you can do it in proportion
to the stratum's membership in the population. For example, if I
have 40 students, and 10 are male, then in a sample of four, I'd want one
male and three females (so males are still 25%). I would randomly
sample one male from the males and three females from the females.
To the extent that you do a good job of identifying strata, this is the
most representative kind of sampling.
c. Cluster sampling: Like simple random, but the elements
are actually clusters of individuals. For example, if I couldn't
get MTSU's student directory, I could randomly sample from the course
list and survey all of the members of the chosen courses (clusters of
students).
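The three probability sampling methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 40-person frame, and the course names are made up for the example, and the stratified version uses the proportional rule from the 40-student example (10 males, 30 females, sample of four).

```python
import random

def simple_random_sample(frame, n, rng):
    """Every element of the frame has an equal chance of selection."""
    return rng.sample(frame, n)

def proportional_stratified_sample(strata, n, rng):
    """Sample from each stratum in proportion to its share of the frame.

    `strata` maps a stratum name to the list of its members. Rounding can
    make the pieces sum to slightly more or less than n for awkward sizes.
    """
    total = sum(len(members) for members in strata.values())
    return {
        name: rng.sample(members, round(n * len(members) / total))
        for name, members in strata.items()
    }

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters, then survey everyone in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

rng = random.Random(8)  # fixed seed so the sketch is repeatable

# Hypothetical frame: 10 males and 30 females, as in the example above.
males = [f"M{i}" for i in range(10)]
females = [f"F{i}" for i in range(30)]
frame = males + females

# Hypothetical courses standing in for clusters of students.
courses = {"PSY-A": frame[:15], "PSY-B": frame[15:30], "PSY-C": frame[30:]}

srs = simple_random_sample(frame, 4, rng)
stratified = proportional_stratified_sample(
    {"male": males, "female": females}, 4, rng
)
clustered = cluster_sample(courses, 1, rng)
```

With these inputs the stratified sample always contains one male and three females, while the simple random sample of four can contain any mix.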
B. Methodology: How do you contact the sample?
1. Mail survey: Send it by mail.
-: Potential response biases: If only some of the people
respond, they might differ from the people who didn't respond (for example,
they might hold the most extreme views).
2. Interview: Face-to-face administration of the survey.
+: More responses.
-: Biases from interviewer (subtle or intentional), hard and
3. Phone: Call them. Good compromise.
+: Response rate.
-: Still potential bias.
+: Could be nice response rate, anonymous.
-: Also potential for low response rate.
C. The questions.
1. Choosing what to ask: You need:
a. Very specific questions that go directly to what you want
to know.
b. To project answers to see if responses to the questions can
tell you what you want to know.
2. Decide on a format:
a. Closed: Multiple choice, true/false. Participants
have to select from several possibilities. Good by mail,
b. Open-ended: Essays and short answers. These are
more informative, but harder to score. Good for interviews.
3. Write a pool of questions (more than you need).
4. Revise: Try the questions on people with strong opinions
to check for bias, narrow the pool.
5. Pretest: Get a subset of the sample to take the survey.
Interview them about the questions (what were they thinking, was it
6. Make instructions and a procedure for administering it.
You also need to be sensitive to the ordering of the questions:
a. Go from general to specific to avoid fixing participants’
b. If you have similar questions, mix the order so that you have
several orderings over the whole set of surveys.
c. Use filter questions to cut down on the work of respondents
(like “Do you own a car? If yes, then answer this set of car
questions.”).
D. Design: Time periods sampled.
1. Cross sectional: One time period. Tells you
at the time you did the survey for the population you sampled.
Attempts to generalize beyond this are risky.
2. Successive independent samples: Give the same questions
at several time periods to different samples of people.
+: Easy way to assess changes with time.
-: Different people mean changes could be due to sampling error.
3. Longitudinal: Give the same questions at several time
periods to the same sample of people.
+: Outstanding data for assessing change over time.
-: Hard to do, and mortality is a problem (people drop out; if they drop
out for reasons related to what you're studying, the results are biased).
IV. What to do with the data.
A. Descriptive statistics. Everything applies.
B. Chi square (contingency tables).
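As a rough sketch of what the chi-square computation on a contingency table involves, the Python below computes the statistic by hand. The 2 x 2 counts are invented for illustration (they are not data from these notes), and the function name is hypothetical.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Made-up survey counts: rows are two groups of respondents,
# columns are "yes" and "no" answers to one question.
observed = [[30, 10],
            [20, 40]]
stat, df = chi_square(observed)
```

For one degree of freedom, the critical value at the .05 level is 3.84, so a statistic larger than that would be reported as significant.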
C. Confidence interval: I could make a statement like
“We can be 95% confident that the interval 4 < μ < 6 contains the
mean number of times MTSU students go to Nashville to shop.”
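A statement like this comes from the sample mean plus or minus a margin of error. The sketch below shows one way to compute it, assuming a normal (z) critical value; with a sample as small as this one, a t critical value would normally be used and would give a somewhat wider interval. The trip counts and the function name are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(scores, confidence=0.95):
    """Confidence interval for a mean using a normal approximation."""
    n = len(scores)
    m = mean(scores)
    standard_error = stdev(scores) / sqrt(n)  # sample SD over root n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m - z * standard_error, m + z * standard_error

# Hypothetical answers to "How many times did you go to Nashville to shop?"
trips = [3, 4, 5, 5, 6, 7, 4, 5, 6, 5]
low, high = mean_confidence_interval(trips)
```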
D. Correlation: Assess the strength of the relationship
between two variables. As an example, I can look at the relationship
between the amount of time you spend studying and your exam scores.
If it's strong, I might want to do an experiment to see if studying
causes better grades. If there's no relationship, then there's no
need to do additional research. Some correlation stuff:
1. Scatter plots: You should always start by looking at
a graph that plots one set of scores on the x-axis, and the other set
on the y-axis. This is like making a frequency distribution for
statistics, it lets you get a qualitative feel for the data. This
can help make sense of the numbers later. What can you see?
Here are some idealized plots:
a. All points on a line, goes up from left to right = perfect
positive correlation.
b. All points on a line, goes down from left to right = perfect
negative correlation.
c. No pattern to the dots (uniformly distributed) = absolutely
no correlation.
d. Realistic data: A cloud of points that generally goes
up from left to right = positive correlation (if it goes down it's a negative
correlation). The more clustered the points, the higher the
correlation.
2. What the numbers mean: Correlation ranges from -1 to
1. Perfect negative = -1. Perfect positive = 1.
No correlation = 0. Anything in between is what you'll find; the
closer to 1 or -1, the stronger the correlation. Keep in mind that
absolute strength matters, not the sign, so -.83 is stronger than .24.
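To make the range of the numbers concrete, here is a minimal sketch of the Pearson correlation coefficient computed from deviation scores. The study-hours and exam-score values are invented so that the points fall exactly on a line; the function name is hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]                         # made-up study hours
up = pearson_r(hours, [52, 54, 56, 58, 60])     # line rising left to right
down = pearson_r(hours, [60, 58, 56, 54, 52])   # line falling left to right

# Strength is judged by absolute value, not sign.
stronger = max([-0.83, 0.24], key=abs)
```

Here `up` comes out as a perfect positive correlation, `down` as a perfect negative one, and `stronger` picks -.83 over .24.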
3. Interpreting: Some general things:
a. CORRELATION DOES NOT IMPLY CAUSATION. Even if there
appears to be a very strong relationship between shoe size and IQ, we
can't say one causes the other. This interpretation will be VERY tempting
as the semester progresses and we get into less obvious examples.
Do not fall for it.
b. r (a correlation coefficient) is not very interpretable as
it stands. But, r^2 (r-squared) is. It's the proportion of
variance in the Y-scores associated with variance in the X-scores. In
other words, how much of the difference in the Y's is related to
differences in the X's. The bigger this is, the more knowing a person's
X score will tell you about their Y score. This leads us to:
c. Predicting a person's performance based on what other people
have done. If you make a scatter plot for the data above and draw
a line through the points so that the distance from each point to the line
is as small as it can be, you've made the regression line. It's the
line that best fits the points. If you know the equation for the
regression line and an X score, you can use that to predict a Y score.
A concrete example of this would be college admissions. Based on
thousands of past students’ ACT scores and college GPAs, I can compute
the regression line for ACT and GPA. Then, if you apply to my school
and I know your ACT score (X), I can predict your GPA (Y) using my regression
line. That way, I can get some idea of how successful you'll be in
college before you ever take a class.
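One common way to find the best-fitting line is ordinary least squares, which minimizes the squared vertical distances from the points to the line. The sketch below assumes that is the fit intended here. The ACT and GPA values are invented for illustration and are not the data used elsewhere in these notes.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through the two means
    return slope, intercept

# Hypothetical past students: ACT scores (X) and college GPAs (Y).
act = [18, 20, 22, 24, 26]
gpa = [2.4, 2.6, 3.0, 3.1, 3.4]

slope, intercept = fit_line(act, gpa)
predicted = slope * 21 + intercept  # predicted GPA for an applicant with ACT 21
```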
Here's an example: I collected five ACT scores and five GPAs.
First, compute the correlation: r = .913. This is a strong
positive correlation. Graph the scatter plot to see details.
Then, compute r^2 = .834. This is very high. What it means
is that 83% of the differences in GPA can be associated with differences
in ACT. We don't know how to explain the other 17% (if we had other
factors, we could add them into the regression and find out, but that's
multiple regression, and it's a bit beyond us here). This is
correlation research, so we don't want to get carried away with causal
claims.
Now for prediction: I compute the regression line. A line
always takes the form
Y = mX + b
m is the slope, we'll compute that based on the data. b is the
y-intercept (the place where the line crosses the y-axis). We'll
also compute that based on the data. X and Y define a point (in this
case, the pair of scores corresponding to some individual's GPA and ACT
score). We know X (ACT). The question is, what is Y?
What GPA should this person get based on their ACT?
I did the math. Here's the regression line:
Y = 0.132X + 0.025
So, if I know a person's ACT score is 22, I can plug in for X and
compute, and I get a predicted GPA of 2.93. That person should
be admitted, because it looks like they will be successful. For
an ACT of 13, I get a predicted GPA of 1.74. That person will
struggle, so I might want to pass on an admission so someone else can have
the spot.
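The arithmetic in this example can be checked with a short Python sketch. The slope, intercept, and r come from the example above; the constant and function names are just labels chosen for the illustration.

```python
SLOPE = 0.132      # m in Y = mX + b
INTERCEPT = 0.025  # b in Y = mX + b

def predict_gpa(act):
    """Plug an ACT score into the regression line to get a predicted GPA."""
    return SLOPE * act + INTERCEPT

r = 0.913
r_squared = r ** 2          # proportion of GPA variance associated with ACT

gpa_22 = predict_gpa(22)    # 0.132 * 22 + 0.025
gpa_13 = predict_gpa(13)    # 0.132 * 13 + 0.025

print(round(r_squared, 3), round(gpa_22, 2), round(gpa_13, 2))
```

Rounded, these reproduce the values reported above: .834, 2.93, and 1.74.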
To write it up, consider the following example: For the results
section below, students were interested in the relationship between magical
ideation and paranormal belief. They measured both using scales from
the literature. Based on other research, they
expected a positive relationship (higher magical ideation should be
associated with higher belief). Here is a results section reporting the
correlation:
“Magical ideation scores ranged from 26 to 79 (M = 48.6, SD = 9.8).
Belief in the paranormal scores ranged from 1.0 to 6.0 (M = 2.7,
SD = 1.2). There was a
significant, positive correlation between magical ideation and
paranormal belief, r = .59, p < .01. Higher magical ideation
scores were associated with stronger paranormal belief.”
V. Survey exercise. As an exercise, we will look at
the survey issues PowerPoint to see how the material
above manifests itself in real surveys.