Langston, Research Methods, Notes 3 -- Hypotheses
 
I.  Goals.
A.  Ideal Hypotheses.
B.  Logic of Hypothesis Testing.
C.  Induction and deduction.
D.  Wrap-up.
 
II.  Ideal Hypotheses.
A.  Gremlin example:  "What makes this watch go?"  Hypothesize a gremlin.  No matter how someone proposes to test it, bat the suggestion away.  This exercise should suggest:
B.  Properties of the ideal hypothesis:
1.  Produces testable implications.  Some procedure can be followed to verify whether or not the implications occur.  Two kinds:
a.  testable in practice:  We can do the experiment today to see if the implications are true of the world.
b.  testable in principle:  In a perfect world (with some technology we don't have at present) we could test the implications.
Example:  100 years ago, hypotheses about the surface of the moon were testable in principle, but now they're testable in practice.
2.  Incompatible with certain outcomes:  At least one of the potential observations I can collect will be incompatible with my hypothesis.  This is where the Gremlin example fails (no observation could rule it out).  Without this property, the hypothesis is essentially untestable (it's not really a test if no matter what you find it still works).
3.  Simplicity:  The hypothesis is based on few or no untested assumptions.  It should involve only one unknown, and the purpose of the experiment is to answer a question about that unknown.
a.  Untested assumptions could cause the hypothesis to fail because of the assumptions, even when the hypothesis really is correct.
One example of this comes from the medical literature.  A group of people in New Guinea called the Fore were dying of a mysterious neurological disease called Kuru.  People with the disease would lose motor control.
The questions were:  What's the disease agent, and how is it spread?  A genetic cause was originally suspected:  the disease clustered in women and children and tended to run in families.  An infectious agent was ruled out because viruses and bacteria produce signs like fever that indicate an infection has taken place, and people with Kuru showed no such signs of infection.
An alternative possibility for the spread of Kuru was cannibalism.  This alternative was ruled out because men didn't seem to get Kuru.  It was assumed that both men and women participated in cannibalism equally.
There are two untested assumptions.  The first is that disease agents always produce signs of infection.  The second is that everyone participates in cannibalism equally.  Both assumptions were wrong, and they caused the correct hypothesis to be ruled out.  Consider cannibalism first.  Men usually hunted and did not share the pigs they caught with the women and children, so the women and children were severely protein deprived.  The women were responsible for dressing bodies for burial.  Around the turn of the century, some woman discovered that human flesh was tasty, and eventually the women would eat entire corpses.  Choice bits were parceled out according to rank and kinship.  The only restriction was that they avoided the corpses of people who died of obvious disease; since the Fore thought Kuru was the result of sorcery rather than disease, the women generally did eat people who died of Kuru.  Steaming the brains in bamboo tubes was probably the most effective route of transmission.  (See Goodfield, J.  [1985].  Quest for the Killers.  New York:  Hill and Wang.)  There's also an article on the topic from the Straight Dope.
The point:  An untested assumption caused a correct hypothesis (cannibalism) to be rejected.  Fortunately, cannibalism stopped anyway, and Kuru is now almost extinct (fewer than six cases per year).
b.  Untested assumptions can also cause the hypothesis to be supported strictly because of the assumptions and not because it's correct.
Some examples of hypotheses that are not simple and what's wrong with them:
1)  "If people enjoy football then they are more likely to have violent personalities."  Problem:  "More likely."  If it fails, you can hide behind it, if it succeeds, that could be due to the definition of "more likely."  More simply:  Adding "more likely" makes the hypothesis harder to test and harder to falsify (you hurt the other two properties by including it).
2)  "If people see the full moon then their brains will secrete a chemical that will make them more violent."  Problem:  Is it the chemical secretion that you're testing, or the violence?  What if they're not more violent, does that mean you conclude chemicals aren't secreted?  If they are more violent, does that mean chemicals were secreted?  Figure out if the moon causes the secretion of brain chemicals, then test if those chemicals cause violence.
C.  Generating hypotheses:  Keep these principles in mind when generating hypotheses.  The attitude is to look for situations where the hypothesis can fail, and to make it produce clear and precise predictions about the world.  Note we're already seeing the foundations of the falsification approach.  Part of the reason we want to have our hypothesis be incompatible with certain outcomes is so we can set up situations where those outcomes are likely in an attempt to falsify the hypothesis.  Only one counter-example rules out a hypothesis, but it takes an infinite (or nearly infinite) number of positive instances to prove it true.
 
 
III.  Logic of hypothesis testing.
A.  Let's ease into this slowly.  Consider the following example.  You're on the vice squad, and you go into a party with the following rule in mind:  "Anyone who's under 21 can't be drinking alcohol."  Here are some people at the party:
 
Bob:  18 yrs. old
Carol:  drinking a Coke
Jerry:  43 yrs. old
Emily:  drinking beer
 
Which two people do you check to see if the rule above is being followed?  (Checking means looking at their age or their drink, whichever you don't already know.)
If you think Bob and Emily, you're exactly correct.  This same situation will underlie all of the logic of hypothesis testing.
Here's how it works:  We have a hypothesis and we want to test it.  We can set up a situation and perform a test, and we want that process to yield the most information about the hypothesis.  Otherwise, we're just wasting our time.  We'll consider each situation above in turn (we'll turn the rule into the hypothesis "If you're under 21 then you can't be drinking alcohol" and pretend we're doing an experiment to test it):
 
Bob:  A situation where someone who is 18 is drinking something.  He's under 21, so if we look at his drink and it's not alcohol, the hypothesis is supported.  If he is drinking alcohol, the hypothesis is false.  CHECK
Carol:  A situation where someone is drinking a Coke.  We can look at her age, but there's no point.  Our hypothesis doesn't say anything about the ages of people who aren't drinking alcohol.  DON'T CHECK
Jerry:  A situation where someone is 43 years old.  We can look at what he's drinking, but there's no point.  Our hypothesis is only about people under 21; it says nothing about people 21 and over, so checking Jerry won't tell us anything about our hypothesis.  DON'T CHECK
Emily:  A situation where someone is drinking beer.  This can address our hypothesis.  If Emily's under 21, then the hypothesis is wrong.  If she's 21 or older, the hypothesis is supported.  CHECK
 
B.  Now we'll make it more formal.  Some definitions:
The "if" part of a hypothesis is called the antecedent (p)
The "then" part of a hypothesis is called the consequent (q)
So, for our example:
p = "you're under 21"
q = "you can't be drinking alcohol"
A hypothesis "If p then q" leads to four situations we could set up:
1.  present p, look for q  (Bob)
2.  present not q, look for not p (Emily)
3.  present q, look for p  (Carol)
4.  present not p, look for not q (Jerry)
We'll consider each situation in turn with its formal name.  To check your comprehension, make sure you understand how the names map onto the situations.
1.  Present p, look for q.
This is called modus ponens.  It is also called confirmatory reasoning.  It's confirmatory because we're looking for instances where the hypothesis is correct.  (We're selecting people under 21 to see if they're not drinking).  We present p, look for q, present p, look for q, present p, look for q...
This can disconfirm if we present p and find not q.  So, if we select someone under 21 and they're drinking, the hypothesis is false.
Several problems with this approach (it's perfectly valid logically; these problems have to do with how fast we can accumulate information using the approach):
a.  We tend to be biased towards confirmation.  The situations we set up tend to be ones that will support the hypothesis.  This is a characteristic of the humans doing science, who like their hypotheses and want to find support for them.
One place where confirmation biases come up is stereotypes.  People with stereotypes tend to note instances that support the stereotype, and discount instances that do not support it.  Another example comes from reasoning experiments.  Here's a brain teaser:  I have a rule that generated this set of things:  a fire truck and an ambulance.  You generate additional examples, and I'll tell you whether or not my rule would have put them on the list.  When you think you know the rule (100% certain), tell it to me.  (My rule was "vehicles.")  Most people think "emergency vehicles," which is too specific.  Then, they test by thinking up more emergency vehicles.  The appropriate test would be a non-emergency vehicle.  That way, you give your first guess a chance to be proven wrong.
As an example of this, consider the hypothesis (now defunct) that brain size is an index of intelligence.  The implication of this for the researchers who studied it was that different races would have different brain sizes, and so could be ranked in terms of intelligence.  The idea was wrong, but confirmation biases played a role in perpetuating it a lot longer than the data would have supported.  The lecture contains more on this.  The controversy is from Gould, S. J.  (1981).  The Mismeasure of Man.  New York:  W. W. Norton and Co.
How might coincidences be a result of confirmation bias?  See the Skeptical Inquirer article on the topic.
The Bible code debate (examples presented in class) provides further illustrations of confirmation bias.
b.  We end up wading through a lot of crap because of the volume of research you have to do showing all of the contexts in which the hypothesis holds.  Each time you support it, I can say "yes, but how about this?" and you have to support it again in that situation...
c.  No matter how many times you support your hypothesis, there's always that element of doubt.  You've shown me 500,000,000 people under 21 not drinking alcohol, but the very next person you check could be the one that rules out the hypothesis.  No matter how many times you confirm, we'll never know for sure if the counter-example is just waiting to be found.
d.  Why modus ponens isn't much use in research:  if your hypothesis is something like "If my theory is true then I will find data to support it," you can't present p directly (you can't know the theory is true in order to set up the test).  All you can do is collect supporting data, and reasoning from that data back to the truth of the theory is affirming the consequent.
2.  Present not q, look for not p.
This is called modus tollens.  It's also called disconfirmatory reasoning.  We're looking for situations where the hypothesis will fail (find people who are drinking and see if they're under 21).
This has a good chance of disconfirming.  If we find one drinker under 21, then the hypothesis is ruled out.
This approach has a big advantage over modus ponens.  Using modus tollens, we only have to find one case where the hypothesis fails and we're through.  Using modus ponens, we have to find more cases where it succeeds than we could ever count.
3.  Present q, look for p.
This is called affirming the consequent.  It's NOT VALID.  Look closely at the hypothesis.  It says "when you see p you'll see q".  It does not say "when you see q you will see p."  But, that's the test you're performing here.  No matter what you find (present q and get p or not p), nothing happens to the hypothesis.  So, you're not learning anything here.
Aside:  The name makes sense.  The consequent is the second part of the hypothesis, and you're affirming it.  So, don't try to memorize these as arbitrary strings, think about the meaning.
4.  Present not p, look for not q.
This is called denying the antecedent.  It's NOT VALID.  Again, the hypothesis says "when you see p you'll see q."  It does not say "when you see not p you'll see not q."  You might see q without p, there's nothing in the hypothesis to rule that out.  So, looking at a case where p is missing is pointless.  Again, note that the name makes sense.
Here's a table that might help make this all clear.  (mind your p's and q's)
 
                          Present   Find    Outcome
Modus ponens              p         q       hypothesis is supported
                                    not q   hypothesis is not supported
Modus tollens             not q     not p   hypothesis is supported
                                    p       hypothesis is not supported
Affirming the consequent  q         not p   hypothesis is not affected
                                    p       hypothesis is not affected
Denying the antecedent    not p     q       hypothesis is not affected
                                    not q   hypothesis is not affected
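If it helps to see the table as a procedure, here is a minimal sketch in Python (not part of the original notes; the function name outcome() and its labels are invented purely for illustration).  It simply encodes the table above as a lookup from what you present and what you find to what happens to the hypothesis:

    # A minimal sketch that encodes the table above for a hypothesis of the
    # form "if p then q".  The function name and its labels are invented here.
    def outcome(presented, found):
        # presented and found are each one of "p", "not p", "q", "not q".
        table = {
            ("p", "q"):         "hypothesis is supported",      # modus ponens
            ("p", "not q"):     "hypothesis is not supported",  # modus ponens, counter-example
            ("not q", "not p"): "hypothesis is supported",      # modus tollens
            ("not q", "p"):     "hypothesis is not supported",  # modus tollens, counter-example
            ("q", "p"):         "hypothesis is not affected",   # affirming the consequent
            ("q", "not p"):     "hypothesis is not affected",   # affirming the consequent
            ("not p", "q"):     "hypothesis is not affected",   # denying the antecedent
            ("not p", "not q"): "hypothesis is not affected",   # denying the antecedent
        }
        return table[(presented, found)]

    # Party example:  p = "you're under 21", q = "you can't be drinking alcohol".
    print(outcome("p", "q"))          # check Bob's drink and find a Coke:  supported
    print(outcome("p", "not q"))      # check Bob's drink and find beer:  not supported
    print(outcome("not q", "not p"))  # check Emily's age and find she's 43:  supported
    print(outcome("q", "p"))          # check Carol's age:  not affected either way

Notice that only the modus ponens and modus tollens entries can ever come back "not supported"; the invalid forms never move the hypothesis, which is the same point the table makes.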
 
Let's have another example:  If it's a lemon, then it's sour.
p = "it's a lemon"
q = "it's sour"
Situations:
1.  present something sour, see if it's a lemon:  affirming the consequent, no information.  The hypothesis doesn't say anything about sour things; it says something about lemons.
2.  present a lemon, see if it's sour:  modus ponens.
3.  present something that's not a lemon (e.g., a tree), see if it's not sour:  denying the antecedent.  We didn't say anything about things that aren't lemons, only about lemons.
4.  present something not sour, see if it's not a lemon:  modus tollens.
Note the relation to the ideal hypothesis:  The two that are invalid set up a situation where every outcome is consistent with the hypothesis.  No matter what happens, you can't rule out the hypothesis.
C.  You may be asking yourself two questions about now:
1.  How am I supposed to remember all of this?  Cram two things in your head:
a.  present p = modus ponens.
b.  present not q = modus tollens.
Whenever you get a problem, label p and q in the hypothesis, then label all the p's, q's, not p's, and not q's in the questions.  Then, using a and b above, figure out what you've got.  Note:  If it's not one of the two you've memorized, the names are informative enough to reconstruct them.  If you see not p being presented, that's "not the antecedent" or "denying the antecedent."
2.  What am I supposed to take from all this?  Some situations are more informative than others.  It's to your advantage to set up experimental situations that yield useful information.  Furthermore, if you seek to rule out hypotheses, you can get away with even less work.  It takes only one counter-example and you're finished, but you can never provide enough confirmations.
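The classification trick from question 1 above can also be written as a tiny lookup, if that helps.  This is a sketch, not part of the notes; the function name name_of_test() is invented for illustration.

    # Classify a test by what it presents (the trick from question 1 above).
    # The function name is invented for illustration.
    def name_of_test(presented):
        # presented is one of "p", "not p", "q", "not q"
        names = {
            "p":     "modus ponens (valid)",
            "not q": "modus tollens (valid)",
            "q":     "affirming the consequent (invalid)",
            "not p": "denying the antecedent (invalid)",
        }
        return names[presented]

    # p = "you're under 21", q = "you can't be drinking alcohol"
    print(name_of_test("p"))      # checking Bob's drink
    print(name_of_test("not q"))  # checking Emily's age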
 
 
IV.  Induction and deduction.  Now we'll step back a level and look at the bigger picture.  Where do all of these hypotheses come from, and where are all of these experiments going?
A.  Induction:  Induce a rule from a set of specific examples.  So, I get a lemon, it's sour, get another lemon, it's sour, [repeat ad infinitum].  After a while, I can induce the rule "if it's a lemon, then it's sour."  Note:  this is closely tied to modus ponens.  Confirmatory reasoning research puts out all of these examples from which I can induce the rule.  It's also tied to exploratory research.  The reason I'm doing these experiments is I don't know the rule (I don't know enough to have a rule).  So, I try a few contexts and get some results, and then induce the rule.
B.  Deduction:  Once I have a rule, I can use logic to derive predictions from that rule that I can then test.  "If it's a lemon, then it's sour" --> "if I get something that's not sour, then it shouldn't be a lemon."  I can test this prediction.  Note how much that sounds like modus tollens.  Once we have a hypothesis, we can try to make it fail (or identify more precisely the conditions under which it holds).  This is related to hypothesis testing research.  When I know enough I can use deduction to derive hypotheses to test.
C.  The sequence is induction first to get hypotheses, and then deduction to test them.  Here's a picture:
 
[Figure:  the circle of knowledge.  Data leads to theory by induction; theory leads to predictions by deduction; predictions lead to new data.]
 
Here's a rough example.  I observe several people complaining about grades, and they all say the professor made the test too hard.  From this data I induce the rule "people attribute things to external sources."  That becomes my theory.  Using deduction, I predict that people with good grades will also credit the professor.  I collect some more data.  The people with good grades say it's because they studied.  This data leads me to induce a new theory:  "people attribute bad things to external sources, and good things to internal sources."  I predict that this will be true both for the self and for others.  I test this by asking people in a drive-through line why it's taking so long.  People waiting blame the delay on the cars in front of them ("people order such weird stuff at the drive-through").  But, when they're at the window, they blame it on the workers in the restaurant ("these people are so slow").  I use the data to revise the theory again:  "people attribute good things to internal sources and bad things to external sources for themselves, but attribute others' bad things to internal sources."  We could keep going on, but you get the idea.  Data leads to a theory, the theory leads to a prediction, the prediction leads to more data, ...
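Here's a toy sketch of that circle in Python, using the lemon rule from earlier (not part of the original notes; induce_rule() and deduce_test() are invented names, and the "induction" here is deliberately simplistic, proposing the rule only when every observed lemon has been sour):

    # Induction:  from specific observations, propose a general rule.
    def induce_rule(observations):
        # observations is a list of (thing, is_sour) pairs.
        lemons = [is_sour for (thing, is_sour) in observations if thing == "lemon"]
        if lemons and all(lemons):
            return "if it's a lemon, then it's sour"
        return None

    # Deduction:  from the rule, derive a risky (modus tollens) prediction to test.
    def deduce_test(rule):
        if rule == "if it's a lemon, then it's sour":
            return "present something that isn't sour and check that it isn't a lemon"
        return None

    data = [("lemon", True), ("lemon", True), ("lime", True), ("lemon", True)]
    rule = induce_rule(data)   # data leads to a theory
    test = deduce_test(rule)   # the theory leads to a prediction to try to falsify
    print(rule)
    print(test)

Collecting the new observation the test calls for starts the circle over:  the result either falsifies the rule or becomes more data to induce from.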
 
 
V.  Wrap-up.  The final spin on hypothesis stuff:
A.  Some situations are more informative than others.
B.  Falsification is better because it only has to happen once.
C.  There are sometimes limits on the kinds of causal statements you can make.
D.  There are different ways to propose hypotheses that are related to how much you know and what kinds of experiments you do.
 