Langston, Research Methods, Notes 10 -- Within Participants Designs

I.  Goals.
A.  What are within participants designs?
B.  Why run within participants designs?
C.  Why not run within participants designs?
D.  Repeated measures designs.
E.  Analysis.

II.  What are within participants designs?  Last time, we looked at between participants two group designs.  Let's put it all together:

As you go down, the groups should become more and more like one another prior to treatment.  The next line will be for within participants designs where you know the groups are exactly equal because it's the same participant in every condition.
So far, we've been pretending that we only know about one type of design:  between participants designs.  In these designs, each participant sees one and only one experimental condition.  In within participants designs, each participant sees every experimental condition.
Consider our love experiment.  To make it within participants, we would have everybody be in love and not in love, and measure them in each condition.

III.  Why run within participants designs?  Several reasons:
A.  Equate the groups in your experiment.  Random assignment eliminates any systematic differences between the groups in your experiment.  It does nothing to eliminate chance differences.  But, if every participant is in every group, then there can't be any differences between the groups, because they're all the same people.
*B.  Efficiency.  You greatly reduce the number of participants that you need.  Consider our experiment.  We have two groups, so, taking 10 participants per condition as a rough estimate of an appropriate sample size, we need 20 participants to do the experiment.  But, if each participant participated in each condition, we could get 10 participants in each condition and still only run 10 participants.  That greatly reduces the amount of work that we have to do.  As you increase the number of groups, this becomes a big issue.
More important, it lets us do the kind of quality research we'd like to do.  Ten participants per cell is nowhere near as many as we'd like.  Instead, we might bump that up to thirty per cell.
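The arithmetic behind the efficiency argument can be sketched in a few lines.  This is only an illustration for a single independent variable; the function name participants_needed and its parameters are made up for this example.

```python
def participants_needed(conditions, per_cell, within):
    """Total participants required for a design with one independent variable."""
    # Within participants: the same people fill every cell.
    # Between participants: each cell needs its own set of people.
    return per_cell if within else conditions * per_cell


# Two conditions, at 10 and then 30 participants per cell.
for per_cell in (10, 30):
    between = participants_needed(2, per_cell, within=False)
    within = participants_needed(2, per_cell, within=True)
    print(f"{per_cell} per cell: between = {between}, within = {within}")
```

With two conditions the saving is half the participants; with more conditions the gap grows, since the between participants total is multiplied by the number of conditions.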
*C.  Statistical power.  Remember our basic equation:  between groups variation (you made) / within groups variation (error).  In a within participants design a portion of the within groups variation goes away, and we get a smaller term in the denominator.  That helps us to achieve our goal of maximizing between groups while minimizing within groups.

IV.  Why not run within participants designs?
A.  Sometimes your manipulation has a permanent impact on the participants.  We'll call these sorts of effects carry-over effects because the effect of previous manipulations carries over into new manipulations.  For example, if you want to use a surprise recall test in your experiment, you can only surprise your participants once.  Knowledge that a test is coming carries over into subsequent conditions.  Or, if you are trying various therapies for depression, you can't return participants to their original state after a particular therapy and try out another one.  The effect of the first therapy will carry over.  (This would prevent us from running the love experiment within participants.)
There are also some transient changes that you produce that still make it impractical to do within participants designs in some situations.  For example, if you want to measure the effect of caffeine on performance and you have levels of 0, 1, or 2 cups of coffee, you can run the order 0 -> 1 -> 2 cups, but not the order 2 -> 0 -> 1, because the caffeine from the first two cups would still be in the participant's system during the 0 cup condition.  If you can't run all possible orders, you have to worry about order effects, which we'll discuss in a moment.  You could introduce a sufficient waiting period between conditions (say one day for the coffee experiment), but that greatly reduces the efficiency aspect of running within participants designs.
Consider a more subtle kind of change in the participants.  In the late fifties, a group of researchers were investigating the duration of short term memory traces.  They would have participants memorize some information and then count backwards for a certain period of time before trying to recall.  What they discovered was that people's memories gradually declined to virtually nothing after about 18 seconds.  However, a different group of researchers worried that what was actually happening in these memory experiments was a phenomenon called build-up of proactive interference (once you've crammed a lot of similar sounding stuff into your head it gets harder to cram in more because the old items interfere).  To test this, they replicated the experiment, but each participant did only one trial with a particular delay period.  Obviously, they ended up running a lot more participants.  But, they found that memory traces will last well past 18 seconds.  So, a subtle change in the participants was actually responsible for the effect.  This kind of thing has to be looked for when using within participants designs.
B.  Order effects:  Whenever the order of the stimuli has an impact on the results of the experiment you have order effects.
1.  Some of the most common kinds:
a.  Practice effects:  The more times you do something the better you get at it.  Say we did our coffee experiment and the DV was performance on a pursuit rotor task (the participant tries to keep a pointer on the same spot on a rapidly spinning disk).  Since most people have never done this before, we might expect their performance to improve over time just from practice.  If this is confounded with a particular order (as in our 0 -> 1 -> 2 cups example) then you can't tell if changes are due to the condition or the order (practice).  This can work for your hypothesis (if we expect caffeine to improve performance) or against your hypothesis (if we expect caffeine to hurt performance).
b.  Fatigue effects:  The opposite of practice effects.  The longer participants do a task the more tired they get.  So, they don't have the ability/motivation to devote the same attention to tests at the end of the experiment as they had at the beginning.
2.  What do we do about order effects?
a.  Counterbalancing:  Simple definition:  Every possible ordering of conditions is presented to participants in the experiment.  This guarantees that every condition appears equally often in every position in the sequence and that every condition is preceded and followed by every other condition equally often.  You can have full counterbalancing, where every order is shown, or partial counterbalancing, where only a subset of orders is used, but the subset is carefully chosen to be representative.  For our purposes, when I say counterbalancing, I mean full counterbalancing.
Consider a simple experiment with two conditions, A and B.
This gives two possible orderings:  A -> B and B -> A.  We can randomly assign participants to an order so that we get equal numbers of participants in each order, and our experiment is then fully counterbalanced.
If there's an order effect, this should cancel it out.  Any time there's an effect of having A before B it's canceled by the effect of having B before A.
Counterbalancing eliminates shifts in sizes of effects provided one very important assumption is met.  Namely, that there are no differential order effects.  In other words, the change caused by having A before B is equal to the change caused by having B before A.  Look at the example above.  We assumed that going second always added a constant effect (say 10 units), regardless of which condition was first and which was second.  But, what if having A first increased B by 10, but having B first had no impact on A (the case if A was more tiring than B)?  Then, counterbalancing wouldn't cancel out the effects.  You have to carefully consider the context of your experiment to determine if effects like this exist.  If they do, you probably can't use a within participants design.
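To see why the assumption matters, here is a small sketch with made-up numbers.  The true scores (A = 50, B = 60) and the function condition_means are hypothetical; the 10 unit order effect is the one mentioned in the text.  Each condition's observed mean is averaged over the two orders, A -> B and B -> A.

```python
def condition_means(true_a, true_b, boost_a_second, boost_b_second):
    """Mean observed score per condition, averaged over orders A -> B and B -> A.

    boost_a_second: change added to A's score when A is run second (after B).
    boost_b_second: change added to B's score when B is run second (after A).
    """
    # Order A -> B: A goes first (no change), B goes second.
    a_in_ab, b_in_ab = true_a, true_b + boost_b_second
    # Order B -> A: B goes first (no change), A goes second.
    b_in_ba, a_in_ba = true_b, true_a + boost_a_second
    mean_a = (a_in_ab + a_in_ba) / 2
    mean_b = (b_in_ab + b_in_ba) / 2
    return mean_a, mean_b


# Hypothetical true scores: A = 50, B = 60, so the true difference is 10.
constant = condition_means(50, 60, boost_a_second=10, boost_b_second=10)
differential = condition_means(50, 60, boost_a_second=0, boost_b_second=10)

print("constant order effect:    ", constant, "difference =", constant[1] - constant[0])
print("differential order effect:", differential, "difference =", differential[1] - differential[0])
```

With a constant order effect both means go up by the same amount, so the difference between A and B stays at 10.  With the differential order effect only B is inflated, and the observed difference becomes 15 even though the experiment is fully counterbalanced.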
Why don't people always do full counterbalancing?  With two groups, it's easy because you only have two orders.  But, the more groups you have the more orders you get.  The rule to determine the number of orders is:

Number of orders = N!

Where N is the number of conditions (! is factorial, meaning multiply N * (N - 1) * (N - 2) * ... * 1).  To work some of these for you:  3 conditions require 6 orders, 4 conditions require 24 orders, and 6 conditions require 720 orders.  Keeping in mind that you need at least one participant per order for the thing to work out correctly, you can see how this can defeat the purpose of doing a within participants design.
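If you want to check the counts or list the orders yourself, a short sketch using Python's standard library will do it.  The helper name all_orders and the condition labels A, B, and C are just for illustration.

```python
from itertools import permutations
from math import factorial


def all_orders(conditions):
    """Every possible ordering of the conditions (full counterbalancing)."""
    return list(permutations(conditions))


# Number of orders = N!
for n in (2, 3, 4, 6):
    print(n, "conditions ->", factorial(n), "orders")

# List the orders for three conditions.
for order in all_orders(["A", "B", "C"]):
    print(" -> ".join(order))
```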
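b.  You can also use a Latin square to counterbalance if the number of groups is too large for full counterbalancing.  A Latin square uses only as many orders as there are conditions, arranged so that each condition appears exactly once in each position.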
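One simple way to build a Latin square is to rotate the list of conditions, as sketched below.  This is an assumption about which square to use, not the only option:  a rotated (cyclic) square puts every condition in every position once, but it does not balance which condition comes right before which.  The function name latin_square is made up for this example.

```python
def latin_square(conditions):
    """Cyclic Latin square: row i is the condition list rotated left by i places."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


# Four conditions need only 4 orders here, instead of the 24 for full counterbalancing.
square = latin_square(["A", "B", "C", "D"])
for order in square:
    print(" -> ".join(order))
```

Each row is one order to give to a participant (or to an equal-sized group of participants).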
c.  Randomization is your friend.  To avoid all the headaches, most people randomize condition orders (especially when you get over four or five conditions).  The more conditions you have and the more orders you use (the more participants you run) the better off you'll be.  The procedure is simple:  Make a new random order for each participant in the experiment.
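The randomization procedure can be sketched as follows.  The function name random_orders and the seed parameter are assumptions for this example; the seed is there only so the same set of orders can be reproduced later.

```python
import random


def random_orders(conditions, n_participants, seed=None):
    """Make a fresh random order of the conditions for each participant."""
    rng = random.Random(seed)
    # rng.sample returns a new shuffled list and leaves the original untouched.
    return [rng.sample(conditions, k=len(conditions)) for _ in range(n_participants)]


conditions = ["A", "B", "C", "D", "E"]
orders = random_orders(conditions, n_participants=10, seed=1)
for participant, order in enumerate(orders, start=1):
    print(participant, " -> ".join(order))
```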

V.  Repeated measures designs.  In a repeated measures design, you measure each condition more than one time for each participant.  For example, we have two conditions, and each participant produces three scores for each condition (they get three trials in love and three trials not in love).  Trial:  One observation in a repeated measures design.  The Stroop experiment worked like this because you had 50 trials each of words and boxes.
Why do these?  One way to improve statistical power is to collect more than one observation per participant per condition.  Imagine collecting three trials from each participant in each condition.  Then, instead of just having one sample from the participant, you have three.  If you take some measure of central tendency from those samples (say the mean) and use it as the participant's score, it will decrease variability.  How?  Let's say on one trial the participants are incompatible and don't fall in love.  Right away their concentration score will be affected because they aren't as in love as they're supposed to be.  If that's the only trial you get from the participant, then you'll get a different score than you should.  When you put that participant's score into the pot with other participants’ scores, the variability will be higher than it should be.  By collecting multiple observations you average out some of this chance variability and get more stable estimates (closer to the true value).
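A quick simulation shows the averaging idea at work.  Everything here is made up for illustration:  each simulated participant has a stable "true" score (mean 50, SD 5) and every trial adds chance noise (SD 10).  The function name simulate_scores is hypothetical.

```python
import random
import statistics


def simulate_scores(n_participants, n_trials, seed=0):
    """One score per participant: the mean of that participant's trials."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_participants):
        true_score = rng.gauss(50, 5)  # the participant's stable level
        # Each trial is the true score plus chance variability.
        trials = [true_score + rng.gauss(0, 10) for _ in range(n_trials)]
        scores.append(statistics.mean(trials))
    return scores


one_trial = simulate_scores(1000, n_trials=1)
three_trials = simulate_scores(1000, n_trials=3)
print("SD with 1 trial per participant: ", round(statistics.stdev(one_trial), 2))
print("SD with 3 trials per participant:", round(statistics.stdev(three_trials), 2))
```

The spread of scores across participants should come out smaller with three trials than with one, because part of the trial-to-trial noise is averaged away before the scores go into the pot.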
These are most common in perception type experiments where each trial requires the participant to make a very simple judgment and you can collect a lot of observations in a limited period of time.  Keep in mind that this can introduce a new set of order problems, but randomization can still bail you out.

Continuing on the theme of within participants designs:  The more data you collect from each participant the better; or the more work each participant does the less work you do.

VI.  Analysis.  For a two-group, within-participants design, you will use a dependent samples t-test for the analysis.  The computations are complex enough that it's worth letting a computer do it for you.  When you finish, here's a sample of how to write up the results:

“The data were analyzed using a dependent samples t-test.  The independent variable was amount of love, and the conditions were in love and not in love.  The dependent variable was concentration.  The mean concentration scores for people in love and not in love were 2.00 (0.71) and 4.80 (0.45) respectively.  With alpha = .05, the two population means were significantly different, t(8) = -7.57, estimated standard error = 0.37.”
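If you'd like to see what the computer is doing, here is a minimal sketch of the dependent samples t-test.  The scores below are hypothetical and are not the data behind the write-up above; the function name dependent_t is also made up.  The test works on the difference scores:  t is the mean difference divided by its estimated standard error, with degrees of freedom equal to the number of participants minus one.

```python
import math
import statistics


def dependent_t(scores_1, scores_2):
    """Dependent samples t-test; returns (t, degrees of freedom, standard error)."""
    if len(scores_1) != len(scores_2):
        raise ValueError("each participant needs one score in each condition")
    # One difference score per participant.
    diffs = [a - b for a, b in zip(scores_1, scores_2)]
    n = len(diffs)
    # Estimated standard error of the mean difference.
    se = statistics.stdev(diffs) / math.sqrt(n)
    t = statistics.mean(diffs) / se
    return t, n - 1, se


# Hypothetical concentration scores for six participants, in participant order.
in_love = [3, 2, 4, 3, 1, 2]
not_in_love = [5, 5, 6, 4, 4, 5]

t, df, se = dependent_t(in_love, not_in_love)
print(f"t({df}) = {t:.2f}, estimated standard error = {se:.2f}")
```

The two lists have to be in the same participant order, since the test pairs each person's scores.  If SciPy is installed, scipy.stats.ttest_rel(in_love, not_in_love) should give the same t along with a p-value.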

Research Methods Notes 10
Will Langston