Langston, Cognitive Psychology, Notes 9 -- Categorization

Note:  A lot of the demonstrations for this unit were derived from Reed's textbook or the instructor's manual by Reed and Pusateri.

I.  Goals.
A.  Where we are/themes.
B.  Kinds of categorization.
C.  Theories of categorization.
II.  Where we are/themes.  We're finally moving into higher cognition.  We're drifting away from describing parts (representation) and looking at process instead.  Note that this won't be perfect since representation will still determine process.  But, we're going that way.  This week's topic is categorization.  This is how you partition the continuous experience of the world into discrete things.  Categorizing does four things for you:
A.  Reduces complexity.  Instead of having to treat every experience as unique, we can put things in classes and save some work.  For example, if you get stung by a bee, you can avoid getting stung by a wasp if you can see the similarities.
B.  Allows identification.  Think about this:  How would you know what something was if you didn't categorize it?
C.  Reduces the need for constant learning.  You can deal with classes of things instead of tons of individuals.  That can help, but as we'll see with person perception, it can also lead to stereotypes.
D.  Allows for action.  If I know something belongs to the category of things that are likely to eat me, I can run away from it.
What we'll do in this unit is look at kinds of categories and how you learn them.  Then, we'll consider theories to explain categorization.  We'll wrap up by looking at some implications of categorizing. 
III.  Kinds of categorization.  Categories can be defined in a number of ways.  We will look at these in some detail.  How you define a category will impact how it is learned.
A.  Logical categories.  A category can be defined according to some rule.  For example, a dog might be defined by the conjunction of two features (has 4 legs and barks).  So, anything satisfying the rule is called a dog.  There are four kinds of logical rules.
1.  Conjunction:  Join two features with "and."  For example, has 4 legs and barks.
2.  Disjunction:  Join two features with "or."  If either feature is present, then it's a member.  For example, we might define a mammal as warm blooded or live birth.  That lets in platypuses, which would otherwise be out.
3.  Conditional:  If...then...  For example, to decide if something is a mammal you might say "if lays eggs, then warm blooded."  Again, this lets in platypuses, plus everything that doesn't lay eggs.
4.  Biconditional:  If and only if...then...  For mammals, we might say "if and only if lays eggs, then warm blooded."  This lets in platypuses, but rules out anything that's warm blooded but doesn't lay eggs (and anything that lays eggs but isn't warm blooded).  In other words, this is a lousy rule for mammals, but I couldn't think of another one.
Let's try the demonstration to make sure we all understand.

Demonstration:  I'm going to show you circles, squares, and triangles.  They can be large or small, and they can be red, green, or blue.  I will tell you a rule, and we'll look at each example.  The goal is to prove you understand the rule by classifying them all correctly.  For example, I might have the rule blue and large.  If I show you a small, blue square, you should say "no."  If I show you a large, blue square, you should say "yes."
If you're following along with the notes, don't read the answers.  Pay attention to the lecture at this point.
Conjunction:  Large and red.  (Numbers 7, 16, 17.)
Disjunction:  Square or green.  (Numbers 1, 2, 5, 6, 7, 8, 9, 12, 15, 18.)
Conditional:  If it's blue, then circle.  (Numbers 1, 3, 4, 5, 6, 7, 9, 11, 12, 13, 15, 16, 17, 18.)
Biconditional:  If and only if it's a triangle, then blue.  (Numbers 5, 6, 7, 10, 12, 13, 14, 15, 16, 18.)
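The four rules from the demonstration can be sketched as small predicates.  This is only an illustration:  the Stimulus class and the function names are made up for this sketch, and the stimuli in the usage lines are hypothetical (they are not the numbered items from the Appendix).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stimulus:
    """A hypothetical stimulus: one shape, one size, one color."""
    shape: str   # "circle", "square", or "triangle"
    size: str    # "large" or "small"
    color: str   # "red", "green", or "blue"


def conjunction(p: bool, q: bool) -> bool:
    # "p and q": both features must be present.
    return p and q


def disjunction(p: bool, q: bool) -> bool:
    # "p or q": either feature is enough.
    return p or q


def conditional(p: bool, q: bool) -> bool:
    # "if p then q": only fails when p is true and q is false.
    return (not p) or q


def biconditional(p: bool, q: bool) -> bool:
    # "if and only if p then q": p and q must agree.
    return p == q


def large_and_red(s: Stimulus) -> bool:
    return conjunction(s.size == "large", s.color == "red")


def square_or_green(s: Stimulus) -> bool:
    return disjunction(s.shape == "square", s.color == "green")


def if_blue_then_circle(s: Stimulus) -> bool:
    return conditional(s.color == "blue", s.shape == "circle")


def iff_triangle_then_blue(s: Stimulus) -> bool:
    return biconditional(s.shape == "triangle", s.color == "blue")


# Hypothetical examples, not the Appendix items.
print(large_and_red(Stimulus("square", "large", "red")))
print(if_blue_then_circle(Stimulus("square", "small", "red")))
print(iff_triangle_then_blue(Stimulus("circle", "small", "blue")))
```

Note how the conditional says "yes" to everything that isn't blue, which is why its answer list is so long.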

The Appendix has the images for this demonstration.

The rules vary in complexity.  Conjunction is the simplest and biconditional is the hardest.  If you look at people learning these rules, that's the order of difficulty you see.
What does all this mean?  The more complex the rule is that determines category membership, the harder it is to learn.
Criticism:  Real categories don't work like this.  They have continuous and probabilistic features, and there might not be a rule that tells things apart.  For example, think about "game."  What's the rule that defines games?  If I'm on top of my game, I should be able to shoot down any rule you propose.
B.  Natural categories.  Look at real world categorization.  You'll notice a few properties.
1.  Continuous variables.  For example, various colors (a continuous variable) all get grouped into one category (for example, yellow).
2.  Graded membership.  Which is a "better" mammal, a whale or a bear?  Which is a better even number, 4 or 106?
3.  Hierarchical organization.
a.  There is a superordinate level.  This is for general classes of things, like furniture.  At this level, the things in the category can be pretty dissimilar.
b.  There is a basic level.  For example, chairs.  The things in the category are all pretty similar.
c.  There is a subordinate level.  For example, living room chairs.  These things are even more similar.
The basic level seems to be the one where people prefer to work (Rosch).  Why?  It has to do with features.  At the superordinate level, there is little feature overlap.  What relates chairs to refrigerators, other than being furniture?  At the subordinate level, there is too much overlap.  Living room chairs are all very similar, so they are hard to distinguish from one another.  In fact, telling things apart at this level takes some expertise.  You can see this if we think about subordinate categories of trees, insects, or psychologists, depending on which one of these is your area.  I could do very well telling apart members of the subordinate category "cognitive psychologists."  I would do pretty badly on "beetles."
At the basic level, the amount of feature overlap is just right.  Chairs share a number of common features, so there is overlap.  But, it's still easy to tell chairs from trees, so there's not too much overlap.
Evidence for basic level categories?  Three sorts.
1.  List features.  I present you with some categories.  List all of the features of these categories that you possibly can.  Previous research indicates that superordinate categories only get a few features (for clothing, only two).  Basic level get a lot (pants got six).  Subordinate only get a few (Levi's only get one).  Let's demonstrate that here.

Demonstration:  Here are some categories.  List all of the features of these categories that you possibly can.  The features you list should be things that all members of the category share.
What should happen is that the superordinate only has a couple of features.  Then the basic adds a lot more.  Then the subordinate only adds a few to that.

What this demonstrates is that at a feature level, basic is where all of the action is.  That's where the most similar and dissimilar features come into play.  Note that "basic" is a little arbitrary, but counting features gives us a method for determining it.
2.  Identify category members.  If I ask you to verify that things are in a category, and measure the time, I get differences between the levels.  So, I might ask if a picture is a living-room chair, a chair, or furniture.  If I then show you a living-room chair, the answer to all is "yes," but the fastest responders are at the basic level.  Somehow, you start there and then compare more generally for superordinate or more specifically for subordinate.
3.  Typicality.  Some things are better members of the category than others.  If you ask people to rate typicality, you get pretty consistent patterns.  Rosch and Mervis (1975) had people make these ratings.  Let's see what we get.

Demonstration:  I've got members of two categories.  I want you to put a 1 by the thing that's most typical of the category up to a 5 by the thing that's the worst example of the category.
_____  Car 
_____  Elevator 
_____  Sled 
_____  Tractor 
_____  Train
_____  Jacket 
_____  Mittens 
_____  Necklace 
_____  Pajamas 
_____  Pants
When we compute averages for these, we should find really good agreement.  Originally, these ranked 1, 4, 10, 15, and 20.  Can you guess which were which?

What does this mean?  The basic level is where typicality comes from.  The typical ones share a lot of features with other members of the basic category and differ from other basic categories.  The atypical ones don't.
To test this, compute a measure of family resemblance.  For each item in a category, list all of its features.  Then, for each feature of an item, count the other members of the category that share the feature.  Add them all up, and you get resemblance.  For example, if you're looking at animals, and 4 legs was a feature of dogs, then you could count other members of the category animals that had 4 legs.  If there were 20, then the score would be 20.  Then, if barks was the next feature, and four animals barked, you would add in four to get 24 and so forth.
The final step is to correlate family resemblance and typicality.  They should be related based on what we've been saying.  They are (r's between .84 and .94, which are really high).
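The family resemblance count described above can be sketched in a few lines.  This follows the counting procedure as stated in these notes (for each feature of an item, count the other category members that share it, then add up).  The feature lists are invented for illustration and are not data from Rosch and Mervis.

```python
def family_resemblance(features_by_item):
    """Return a family resemblance score for every item in one category.

    features_by_item maps an item name to the set of features listed for it.
    """
    scores = {}
    for item, features in features_by_item.items():
        total = 0
        for feature in features:
            # Count the OTHER members of the category that share this feature.
            total += sum(
                1
                for other, other_features in features_by_item.items()
                if other != item and feature in other_features
            )
        scores[item] = total
    return scores


# Hypothetical feature lists for a few vehicles (made up for this sketch).
vehicles = {
    "car": {"wheels", "engine", "carries people", "steering"},
    "tractor": {"wheels", "engine", "steering"},
    "sled": {"carries people", "runners"},
    "elevator": {"carries people", "cables"},
}

scores = family_resemblance(vehicles)
for item in sorted(scores, key=scores.get, reverse=True):
    print(item, scores[item])
```

With these made-up lists, the car shares the most features with its neighbors, so it gets the highest score.  Those scores are what you would then correlate with the typicality ratings.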
C.  Goal directed categories.  There are some categories that have varying membership.  The defining characteristic is related to a particular goal.  For example, the category of things to rescue from the house in a fire.  Or things to eat on a diet.  The organization of these categories is around ideals.  For example, if your ideal vacation is a cruise, then membership in the category of good things to do on a vacation will be based on that.  These categories throw off everything that I've said previously (they're not logical and they don't follow the rules of natural categories).  What do they tell us about cognition?
IV.  Theories of categorization.  These are all going to boil down to some kind of feature analysis.  You might think back to pattern recognition and refresh your memory as to the pros and cons of using features.
A.  Classify by comparing to specific examples.  Basically, compare everything you have in memory to the thing you're trying to classify, and go with the category that contains the thing that provides the best match.  This sounds insane, but I'll present a model later that does something like this.  The simple version of this could lead to a lot of mistakes.  For example, you might call a whale a fish instead of a mammal because of the neighbors in the two categories.
B.  Classify by looking at the average distance.  Compare to everything in memory, compute a distance from each thing, average the distances for the things in each category, and go with the category that has the lowest average distance.  Again, it's a bit wacky.
C.  Feature counting.  Compare to everything in memory, feature by feature.  Count the number of matches.  Go with the category with the most matches.
D.  Prototypes.  Extract an average for each category.  That average is called a prototype.  It's kind of like the ideal member.  Then, compare the new thing to the prototype and see whether it matches.
Evidence:  Posner and Keele (1968).  Make a prototype out of dots, like a triangle.  Then, create category members by moving around the dots.  Move them a little bit, a medium amount, or a lot.  The amount of movement leads to variability (differences amongst category members).  The more you move the dots, the more variability there is.  Then, have people learn to classify the examples (don't show the prototype).  Two things happen:
1.  The prototype is recognized as a member of its category, even though it was never seen before.  Prototype recognition is usually the best, indicating that people have developed a representation of it, even though they never saw it.
2.  The amount of variability in training really matters.  If you trained with a highly variable set, you can categorize examples you never saw before.  This is true even if the new examples are really variable.  But, people trained with low variability are poor at classifying new examples that are highly variable.
So, categorizing appears to be based on knowing the prototype plus some estimate of how variable the category is.
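The four comparison strategies listed in A through D can be sketched side by side.  Everything here is an assumption made for illustration:  the binary feature coding (lives in water, has fins, breathes air, bears live young, has fur), the stored examples, the use of a simple mismatch count as the distance, and the function names.

```python
def mismatches(a, b):
    """Distance between two items: the number of features that differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def nearest_example(item, memory):
    # A. Go with the category holding the single best-matching example.
    return min(memory, key=lambda cat: min(mismatches(item, ex) for ex in memory[cat]))


def average_distance(item, memory):
    # B. Go with the category whose examples are closest on average.
    return min(
        memory,
        key=lambda cat: sum(mismatches(item, ex) for ex in memory[cat]) / len(memory[cat]),
    )


def feature_counting(item, memory):
    # C. Go with the category that collects the most feature matches.
    return max(
        memory,
        key=lambda cat: sum(len(item) - mismatches(item, ex) for ex in memory[cat]),
    )


def prototype(item, memory):
    # D. Average each category into a prototype, then compare to the prototypes.
    def distance_to_prototype(cat):
        examples = memory[cat]
        proto = [sum(column) / len(examples) for column in zip(*examples)]
        return sum(abs(x - p) for x, p in zip(item, proto))

    return min(memory, key=distance_to_prototype)


# Hypothetical memory. Features: water, fins, breathes air, live young, fur.
memory = {
    "fish": [(1, 1, 0, 0, 0), (1, 1, 0, 0, 0), (1, 1, 0, 1, 0)],
    "mammal": [(0, 0, 1, 1, 1), (0, 0, 1, 1, 1), (1, 0, 1, 1, 1)],
}
whale = (1, 1, 1, 1, 0)

for strategy in (nearest_example, average_distance, feature_counting, prototype):
    print(strategy.__name__, strategy(whale, memory))
```

With this made-up memory, all four strategies call the whale a fish.  That is the kind of mistake mentioned in A, and it shows that all four are really running on similarity.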

CogLab:  We'll look at the results of our prototype demonstration.

Let's look at a more developed model of the prototype findings, one that has a lot of the parts of an ideal theory:  Hintzman's trace model.
1.  We need to define two terms.  Schema:  A configuration of typical knowledge.  This can be a frame:  A representation of an object (like a house has walls, roof, windows, door, etc.), or a script:  A representation of a typical event (like going to a restaurant involves sitting down, ordering, eating, etc.).
Prototype:  The idealized representation of a concept.  This is the perfect exemplar of a concept (exemplar:  an example of an item in a particular category or concept).  For example, the prototypical bird is a robin (has wings, flies, sings, lays eggs, nests in trees, eats insects, etc.).  A non-prototypical bird would be a penguin or a chicken.  They have some of the properties, but not all of them.
2.  Traditionally, people assume that schemas and prototypes are abstracted as a result of experience.  Some executive process looks through all of the birds you've seen and figures out what the ideal bird is, then represents that as a prototype (this may be an automatic process).  Experimental evidence suggests this (see above).
In a typical experiment I make a random dot pattern.  This is my prototype.  Then, I vary a few dots to make exemplars of the category represented by that prototype.  I might make three prototypes with three exemplars of the first prototype, six of the second, and nine of the third.
Then, I have people classify the exemplars (they never get to see the prototype).  I say "Which category, 1, 2, or 3, does this belong to?"  At first, people miss them all since they don't know anything about the categories.  We keep going and I keep giving feedback until they can do all 18 perfectly.  Then I test to see if they "abstracted" the prototype.  Evidence that they have (as described by Hintzman):
a.  Prototype classification (ask people to assign the prototype to a category) is more stable over time than the exemplars they actually saw.  This suggests that the prototype was abstracted and is in memory.
b.  Old exemplars are classified better than new ones.
c.  Classification of unseen stimuli is best for prototypes followed by low-level distortions followed by high-level distortions.  This suggests that a prototype is in memory, and classification is made by comparing to it.
d.  Transfer is better with bigger categories (more exemplars shown).  Suggests that more variability in the input enables you to do a better job at zeroing in on the prototype.
3.  Hintzman's critique:
a.  The concepts of schema and prototype are very fuzzy and vague.  They don't really have any predictive power.  We need something more specific.
b.  A multiple trace model accounts for these findings plus others.
c.  We shouldn't postulate processes (prototype abstraction) or representations (multiple memory stores) if we don't have to have them.
4.  The model:
a.  Memory contains a trace of every experience.  The trace is made up of primitive features (what these are is not too well specified, but they're low level, and few in number relative to the number of traces in memory).
b.  A retrieval is sending a probe through all of these traces based on what's in working memory (what's currently activated or conscious out of all of the traces).  An echo comes back.
c.  The echo has intensity (overall similarity to the probe).  This is akin to familiarity.  It also has content (the sum of the contributions of all of the traces).  The content is akin to the memory.
For example, I see an exemplar and need to know what category it belongs to.  Some features are the exemplar, some are its category name.  I have the exemplar in working memory, send it through all of my traces, and get back an echo.  The content of the name part of the echo is my response (as in "Category 1").
Note that this memory is content addressable.  You don't have to know where in memory something is.  Instead, traces are activated based on their similarity to the trace in working memory (they're activated based on their content).
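The probe-and-echo retrieval described above can be sketched roughly as follows.  The notes don't give the math, so the details here are assumptions based on my reading of Hintzman's published model (MINERVA 2):  features coded as +1, -1, or 0 (blank), similarity as a normalized dot product, and activation as similarity cubed.  The traces, category codes, and function names are made up for illustration.

```python
def similarity(probe, trace):
    """Normalized dot product over features that are nonzero in either vector."""
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant


def echo(probe, traces):
    """Send the probe through every trace and return (intensity, content)."""
    # Cubing keeps the sign but lets close matches dominate (an assumption here).
    activations = [similarity(probe, trace) ** 3 for trace in traces]
    intensity = sum(activations)  # akin to familiarity
    content = [
        sum(a * trace[j] for a, trace in zip(activations, traces))
        for j in range(len(probe))
    ]  # activation-weighted sum of all traces, akin to the memory
    return intensity, content


# Hypothetical codes for the category names (last two features of each trace).
CATEGORY_1 = [1, -1]
CATEGORY_2 = [-1, 1]

# Six stored exemplars: six exemplar features plus the category name.
# No prototype is stored anywhere.
traces = [
    [-1, 1, 1, 1, 1, 1] + CATEGORY_1,
    [1, -1, 1, 1, 1, 1] + CATEGORY_1,
    [1, 1, -1, 1, 1, 1] + CATEGORY_1,
    [1, -1, -1, 1, 1, 1] + CATEGORY_2,
    [-1, -1, -1, -1, 1, 1] + CATEGORY_2,
    [-1, 1, -1, 1, 1, 1] + CATEGORY_2,
]

# Probe: a pattern with the name part left blank (zeros).
probe = [1, 1, 1, 1, 1, 1, 0, 0]
intensity, content = echo(probe, traces)

name_part = content[6:]
response = "Category 1" if name_part[0] > name_part[1] else "Category 2"
print(intensity, response)
```

The name part of the echo supplies the response, and nothing had to be looked up by location.  Every trace contributed according to how similar it was to the probe.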
5.  Does it work?
a.  A basic simulation shows that you can get it to recall something that looks like a prototype even with no prototype in memory (the echo is more correct than any of the exemplars).  So, it looks like it's abstracting, but it really isn't.
b.  Comparison to the classic human experiment yields almost the exact same pattern of results.
So, prototype models were winning, but when you get down to the nuts and bolts, there isn't really any need for a prototype to get evidence of one.  Similarity is sufficient, the math will make it work out like you need it to.
E.  Fuzzy-set theory.  A person named Zadeh has proposed that classic logic is too restrictive for categorization.  In classic logic, only two truth values are allowed, 0 (false) and 1 (true).  This is fine for a robin as a member of the category bird (true).  It's also OK for a robin as a member of the category fish (false).  But, a lot of ordinary categorization doesn't work like that.
One place it shows up is something called linguistic hedges.  I might say "loosely speaking, a whale is a fish."  The hedge is "loosely speaking" and it limits the extent to which I'm putting whales in the fish category.  Or, consider "technically speaking, a whale is a mammal."  Again, the hedge illustrates graded membership.  A whale is a mammal, but it's not a very good example.  A whale is not a fish, but it's pretty close.
Fuzzy logic allows for graded truth values.  A whale can be a mammal to degree 0.7.  In that case, a whale is a lot like a mammal, but it's not all the way in the category.  This solves some serious problems for categorization.  Consider the category "tall men."  Who is in the category?  The boundary between tall and short is fuzzy.  So, we need fuzzy logic to account for it.  A man who is four feet tall is in the category "tall men" to degree 0.0.  A man who is eight feet tall is in the category to degree 1.0.  Someone who is six feet tall is in between.  This also allows us to escape some paradoxes.  If you take a man with a full head of hair and pluck one hair, is he bald?  Now pluck another.  And so on.  Pulling one hair at a time doesn't seem to ever lead to calling someone bald, but at some point we'll be calling a rather bald guy "not bald."  If we had fuzzy values, we could say "bald to degree 0.6."
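As a small worked example, here is one way to turn the "tall men" idea into a graded membership function.  The endpoints (four feet and eight feet) come from the text.  The straight-line ramp in between is purely an assumption for this sketch, and the function name is made up.

```python
def tall_membership(height_feet, low=4.0, high=8.0):
    """Degree of membership in "tall men", from 0.0 to 1.0.

    At or below `low` the degree is 0.0, at or above `high` it is 1.0.
    In between, a linear ramp is assumed (the notes only say "in between").
    """
    if height_feet <= low:
        return 0.0
    if height_feet >= high:
        return 1.0
    return (height_feet - low) / (high - low)


for height in (4.0, 6.0, 7.0, 8.0):
    print(height, tall_membership(height))
```

Under the linear assumption, someone six feet tall is in the category to degree 0.5.  A different curve would give a different number, but the point is the same:  membership is a matter of degree.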
How does this fare when you try to test it?  First, we need to know a bit more logic.  There are two big relations:  conjunction (and) and disjunction (or).  Traditional logic has rules for handling these.  You usually use a truth table.  For AND:

A    B    A AND B
1    1    1
1    0    0
0    1    0
0    0    0

For OR:

A    B    A OR B
1    1    1
1    0    1
0    1    1
0    0    0

How do we do it for fuzzy logic?  One popular AND rule is minimum.  Figure out the degree of membership in the two categories separately, and then take the least, that's the degree of membership in the conjunction.
Test:  A guppy is a fish, a guppy is a pet, a guppy is a pet fish.  The degree of membership in "pet fish" shouldn't be larger than the smaller of the degrees for pet and fish.  There are some categories that violate this.  Storms, De Boeck, Van Mechelen, and Ruts (1998) had participants rate membership in various categories.  For example, people rated AIDS as a disease and as a cause of death (separately).  Then, they rated AIDS as a combination of disease and cause of death.  The combination was the highest.  This shouldn't happen under the minimum rule.  Some other examples:  Panties as underwear and pants, Noriega as a military man and a political leader, and a car crash as a murdering method and a suicide method.  They list more.  What this means is that the minimum rule (at least) doesn't apply to categorization.
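The minimum rule and the test for violating it can be sketched as follows.  The function names are made up, and the ratings in the usage lines are hypothetical numbers chosen to show the pattern.  They are not the values reported in the study.

```python
def fuzzy_and(degree_a, degree_b):
    """Minimum rule: membership in "A and B" is the smaller of the two degrees."""
    return min(degree_a, degree_b)


def violates_minimum_rule(rating_a, rating_b, rating_conjunction):
    """True if the rated conjunction is higher than the minimum rule allows."""
    return rating_conjunction > fuzzy_and(rating_a, rating_b)


# Hypothetical ratings on a 0-to-1 scale (invented for illustration).
as_disease = 0.8
as_cause_of_death = 0.7
as_both = 0.9

print(fuzzy_and(as_disease, as_cause_of_death))
print(violates_minimum_rule(as_disease, as_cause_of_death, as_both))
```

Whenever people rate the combination higher than either part, as in the pattern described above, the check comes out true and the minimum rule has failed for that item.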
That's not to say that other methods of conjunction won't work for fuzzy logic.  Fuzzy logic has some nice properties that other models don't.  As an exercise, you might ask how fuzzy logic could account for data from the other approaches.

Cognitive Psychology Notes 9
Will Langston
