Psychology, Notes 2 -- Pattern Recognition
II. The scheme. How do you know what you're looking at? (Hearing? Touching? Smelling?)
That's the question for this section. Keep in mind that our program here will be a move from inputs to outputs. We're going to try to work from the bottom (identifying stuff in the world) to the top (thinking). Remember that this is the basic model we're working with. This unit will be about the sensory store (and the pattern recognition needed to get the system going). There are three components to the scheme:
A. Input. This is primarily about sensory systems. Some very brief storage of information takes place, and attention selects some of that stuff for us to identify.
B. Identification: The first stage in getting to meaning. What are the parts of the thing (its features)? Or, if you're gestalt-minded, what is the overall gestalt of the thing? This could be done via low-level feature modules, or it could be more complicated. One thing we know: more information leads to better identification.
C. Attach meaning (recognize): This is the part where it gets interesting. What is the representation of meaning? What processes act on that representation? A lot of books just equate recognition with identification, but that's cheating. It's one thing to know the letters of a word; the meaning of the word is entirely different from its letters or even how the letters are arranged. You might be able to experience this if you repeat a word over and over. For example, you might never have thought about how strange a word like "over" is until you study it separate from its meaning.
We'll handle each of these stages in turn.
III. Input. As you should remember from our basic architecture, there's a brief sensory store that holds information for processing. Whenever we look at one of the boxes we'll generally ask: What's the capacity, what's the duration, what's the code, and how does information get removed? Sperling (1960) did an experiment working out the properties of this store. Here's his procedure: You look at a grid of letters presented for a very brief period of time (1/20 second):
G T F B
Q Z C R
K P S N
(Demonstration: the report task. We'll look at the class data.)
Then, I ask you to tell me the letters (whole-report technique). Or, I can play a tone. If you hear a high tone, report the top row; a middle tone, report the middle row; a low tone, report the bottom row. This is called the partial-report technique.
Using the partial report technique, Sperling found that people could
normally report about three letters per row. So, they can report
about 75% of the information in the display. That's the capacity
of your sensory store.
Then he manipulated the time before the tone. If you play the tone immediately, people get between 9 and 10 letters (multiply three per row by three rows and you get around 9). Then he delayed the tone. The delays were 150 msec, 300 msec, or 1 second. By one second, people were back to around 4.5 letters, which is the same amount as the whole-report technique gets. So, the information in the sensory store seems to decay within one second.
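Here's a tiny sketch of the arithmetic behind the partial-report estimate. The numbers are the approximate values from the lecture, not Sperling's exact data:

# Partial-report logic: if people can report about 3 of the 4 letters in
# whichever row gets cued, and they couldn't know in advance which row would
# be cued, then about 3 letters per row must have been available in the store.
rows, letters_per_row = 3, 4
reported_per_cued_row = 3                             # partial report, immediate tone

available = reported_per_cued_row * rows              # letters available overall
proportion = available / (rows * letters_per_row)     # share of the display

print(f"Estimated letters available: {available}")    # about 9
print(f"Proportion of the display: {proportion:.0%}") # about 75%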
You might be asking yourself why this is relevant. The world is relatively continuous, so if you miss something, you can look again. In other words, the duration and amount of information in the sensory store might just be an artifact of the situation, and in the real world, it's all irrelevant. This is a nice cognitive psychology puzzle: Either your mind works contrary to intuition (using brief snapshots of information for further processing even though the original persists in the environment) or a laboratory result has no relevance for real life. Your book has a nice discussion of how one might decide which of these views is correct. Some nice reaction papers could come from this.
Conclusion: The raw sensory data for pattern recognition is about
75% of the input, and lasts for less than one second. How do you
go about using this information?
IV. Identification. We want to see how much we can do based entirely on the inputs. Bottom-up processing is processing that is unaffected by higher cognitive processes (like knowing what you think you're looking at). It's driven by the stimulus. Top-down processes are coming from higher cognitive processes. If we can, we want to avoid talking about top-down stuff. If there's enough information in the input to do the work, then we don't need anything cognitive. That's antithetical to cognitive psychology, but it can at least show us where cognitive stuff is involved.
A. There are three ways that identification might work. I'm going to start with a list now, and later we'll discuss potential problems with each.
1. Template matching. You store a perfect copy of everything that you might encounter. Then, when you see something (say a letter), you compare that image to everything in memory and you take the best match. (A small code sketch of this idea appears after the list of problems below.)
a. Instance theories of memory (you store some trace of everything you encounter) work pretty well. For example, Hintzman (1986) built a model that recorded a trace of experience based on "transducable" properties. The details of an experience would include things like emotional tone, color, odor, etc. as well as primitive abstract relations like above, below, etc. Memory retrieval (or recognition) was based on an echo of the traces currently in memory. So, the echo is like a template, but it's abstract. If instance models can store a trace of every experience, the number problem with templates (discussed below) isn't as big a deal.
b. Perceptual priming effects (repetition priming). Jacoby (1983). People look at a list of words. They can do one of three tasks with those words. The condition we're interested in is reading the word. Then, they get either an explicit or implicit memory test. For explicit, the old words are mixed with new words, and people circle the ones they saw before. For the implicit test, people identify the words under constrained conditions (the words are presented really fast). As you can see in the results, the implicit task actually showed better performance. The point: You could easily store a template of every item that you encounter, because you'd have to store a copy of everything to produce repetition priming.
Problems with the template model:
a. Too much memory (too much variability). In spite of
what's above, there are a lot of things to store with this model.
Maybe too many.
b. Problems in matching (orientation, position, size, etc.).
c. Don't know what produces the match (two things can be close, but different; without analyzing features, how can you tell which is which?).
d. Can't produce two interpretations of the same thing (for example, ambiguous figures).
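To make the template idea concrete, here is a minimal sketch in Python. The tiny 3x3 "images" and the cell-by-cell matching score are illustrative assumptions, not a serious model:

# Templates: 3x3 binary patterns for two "letters."
templates = {
    "T": [[1, 1, 1],
          [0, 1, 0],
          [0, 1, 0]],
    "L": [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]],
}

def match_score(image, template):
    """Count how many cells agree between the input and a stored template."""
    return sum(image[r][c] == template[r][c] for r in range(3) for c in range(3))

def recognize(image):
    """Return the stored template that agrees with the input in the most cells."""
    return max(templates, key=lambda name: match_score(image, templates[name]))

# A slightly noisy "T" (one cell flipped) still matches the T template best.
noisy_t = [[1, 1, 1],
           [0, 1, 0],
           [0, 0, 0]]
print(recognize(noisy_t))   # T

Note that shifting or rotating the input by even one cell would wreck the match, which is exactly problem b above.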
2. Feature models. Everything can be broken down into a
set of features. This set has these properties:
a. The features are critical (they allow you to tell things apart
because different things have different features).
b. Features are the same when brightness, size, and perspective change.
c. The features yield a unique pattern for every input.
d. Reasonably small number of features.
If you do the feature part well, you get 2^N patterns that you can identify with N features. So, with 2 features, you can identify 4 things. With 8 features, you can identify 256 patterns. With 20 features, you can identify 1,048,576 patterns. (Note that a game of 20 questions should almost always be solved.)
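Just to check the arithmetic above:

for n in (2, 8, 20):
    print(f"{n} features -> {2 ** n:,} patterns")
# 2 features -> 4 patterns
# 8 features -> 256 patterns
# 20 features -> 1,048,576 patterns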
Gibson (1968) worked out a model for letters. This chart has the features (along the left) and the letters (across the top). With this set, you can distinguish typed, capital letters in English.
a. Confusion matrices. For example, if you compare G and W it takes on average 458 msec, but it takes 571 msec to compare R and P (R and P share more features). Note that this is also consistent with template matching. If you work out a complete matrix for all of the pairs, you can make an argument about which pairs are harder based on shared features. If we look at Gibson's table, we can probably do a little of that. (Gibson, Schapiro, & Yonas, 1968)
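Here is a sketch of the logic behind feature-based confusion predictions: letters that share more features should be harder (slower) to tell apart. The feature assignments below are made-up stand-ins, not Gibson's actual chart:

features = {
    "G": {"curve", "horizontal"},
    "W": {"diagonal", "symmetry"},
    "R": {"vertical", "curve", "diagonal"},
    "P": {"vertical", "curve"},
}

def shared_features(a, b):
    """Count the features two letters have in common."""
    return len(features[a] & features[b])

# More shared features -> predicted to be harder (slower) to discriminate.
print("G vs. W share", shared_features("G", "W"), "features (fast comparison)")
print("R vs. P share", shared_features("R", "P"), "features (slow comparison)")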
b. Cluster analyses. If you look at a bunch of comparisons and figure out what goes together, shared features seem to be an important determinant. For example, curves separate from straight lines, and C and G form a category. The complete cluster analysis is shown here. You can see how letters break down into smaller and smaller groups on the basis of shared features.
c. Face recognition. Caricatures that exaggerate distinctive features are recognized faster than faithful line drawings. (Rhodes, Brennan, & Carey, 1987)
d. Brain organization. Certain cells respond best to lines of a particular orientation and a particular length. Your brain appears to be looking for features. Two examples of this: In frogs, if you record from retinal ganglion cells (which bundle together to form the optic nerve), you can find four cell types: 1) stationary edges, 2) moving edges, 3) dimming, and 4) small, moving spots. Most of these cells are also location specific, so they want a particular thing in a particular spot. (Lettvin, et al., 1959)
Farther back in the cortex are cells that are not location or eye specific, but still look for particular features. For example, simple cells look for edges of a particular orientation, from vertical to horizontal and angles in between. (Hubel & Wiesel, 1963)
Both of the neural studies just discussed suggest that feature analysis is an integral component of brains doing recognition, so our cognitive models should probably take that into account.
Problems with feature models:
a. It's hard to get a set of features with the properties we
want. Letters are pretty easy, but what about things like desks and chairs or dogs and cats? Do we need custom sets for different kinds of things? Where do they come from?
b. It's hard to tell if these predictions are different from the predictions of template models.
c. When you ignore how features combine, you're making a big mistake. You need to know the relationships as well as the features.
3. Structure models (recognition by components). As the gestaltists say, the whole is greater than the sum of its parts. Analyzing the features may be a step, but putting them together is where the action is. The analysis involves grouping together things that go together (the general problem is figure-ground segregation: how do you tell the object from the background?). Here are five grouping principles:
a. Proximity: Stuff that's close together is part of the same thing.
b. Similarity: Stuff that looks alike goes together.
c. Continuity: The perceptual system prefers continuous interpretations to discrete interpretations.
d. Closure: Closed figures are preferred.
e. Connectedness: Stuff that's joined gets grouped as one thing.
a. Object recognition research shows that eliminating the edges isn't as bad as eliminating the vertices (the vertices show how the parts go together). So, features without relations are not much help.
V. Recognition/meaning. Assume we know the features and their relationships; are we done? Not really. Neuropsychological evidence suggests that meaning is separate from identification. One case study had a patient who could make out letters and letter features, but could no longer figure out what words meant. In prosopagnosia, a person loses the ability to recognize faces. Generally, face perception is fine; the faces just lack meaning. Other disorders also suggest that getting all the information for meaning and the meaning itself are separate. Let's look at some kinds of recognition that you do and see how you get meaning.
A. Letter recognition. A good place to start is with a survey of the inputs for letter recognizers. These could be printed letters or handwriting. I'm going to skip handwriting because it's so complicated. Let's look at some properties of machine print. (See Crist, W. B., & Lockhead, G. R. (1980). Making letters distinctive. Journal of Educational Psychology, 72, 483-493, for an example of research on this topic.)
1. Font characteristics:
A list of some basic properties:
a. Serif vs. sans-serif: A serif is the little mark on the tops and bottoms of vertical lines in fonts (the line on the bottom of this f). Originally, it was used when cutting letters in stone to prevent the stone from cracking, but it was preserved out of tradition. Luckily, serif turns out to be better for reading.
b. Weight difference: Some lines in a character are thicker than others.
c. Bias: The fonts can be on a bias (slanted) or they can be upright.
d. x-height: How much height is devoted to the body of
the characters (how tall the x is).
e. Spacing: Some characters are wider than others (i vs. e). Typewriters force these to take up the same amount of space, but it's not required in printing (proportional spacing is when each character takes up only the space it requires; compare "piece" in a fixed-width vs. a proportional font).
f. Proportions: How big is the x-height relative to the heights of ascenders and descenders (parts going above and below the body of the letter)? There is an optimal proportion for each font; generally, more x-height is better.
2. Impact of features. Fonts can have a huge influence on identification. As an example of the importance of features in letter identification, consider an experiment by Neisser on visual search. You scan for an 'X' in a field of 'Z's and 'N's or a field of 'O's and 'P's. 'X' is easier to see among letters with different features than among letters with similar features. Try it:
N N Z N Z N Z N Z
Z N Z Z N Z Z N N
N N N Z N X N Z N
N N Z N Z N Z N Z
Z N Z Z N Z Z N N
O O P O P O P O P
P O P P O P P P O
O O P P O X P O P
O O P O P O P O P
P O P P O P P P O
The X should "pop out" of the grid with dissimilar features. A similar process of analysis-by-features probably takes place in reading, making it very important to understand the features.
B. Word recognition. Once you have letters, you need to
recognize words. The letter features still apply, but now we add
some additional features.
1. Word envelope. If you outline the word, that's the word
envelope. This holistic feature may help in identification.
2. Spelling rules (orthography). If there are rules that govern spelling patterns, then knowledge of these rules can help the process of identifying words. The system for English has been worked out in detail.
a. Some of the big rules:
1) Avoid letter doubling.
2) VCV, VCCV, VC: A vowel before a consonant-vowel is long, a vowel before a consonant-consonant-vowel is short, and a vowel before a final consonant is short (see the sketch after this list).
a) To override VC, add a dummy 'e' to get VCV. Examples: "fin," "can" vs. "fine," "cane."
b) To override VCV, you have to double the consonant (e.g., "hoping" vs. "hopping").
3) Especially avoid doubling at the beginnings and endings of words.
a) Except for ff, ll, ss.
b) Except for 3-letter words (egg, inn, add, ebb). 18th-century editors decided to reserve 2-letter words for function words, so short content words got a doubled letter.
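Here is a small sketch of the vowel-length part of these rules in Python. The rule set and the test words are illustrative simplifications; real English spelling has plenty of exceptions:

VOWELS = set("aeiou")

def first_vowel_length(word):
    """Guess whether the first vowel is long or short from the letters after it."""
    word = word.lower()
    # Find the first vowel.
    i = next((k for k, ch in enumerate(word) if ch in VOWELS), None)
    if i is None:
        return "no vowel"
    rest = word[i + 1:]
    consonants = 0
    for ch in rest:
        if ch in VOWELS:
            break
        consonants += 1
    if consonants == 1 and len(rest) > 1:    # VCV -> long ("fine", "cane")
        return "long"
    if consonants >= 2:                      # VCCV -> short ("hopping")
        return "short"
    return "short"                           # final VC -> short ("fin", "can")

for w in ["fin", "fine", "can", "cane", "hopping", "hoping"]:
    print(w, first_vowel_length(w))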
Why do these features matter?
1. They give you hints at pronunciation even if you've never
seen the word before ("mabe," "mab," "mabing," "mabbing").
2. They help you to know what letters to expect in a particular position.
3. Example of orthography constraints: Word superiority effect. Letters are perceived better in words than alone or in nonwords. Imagine the following experiment (Reicher, 1969): A word (e.g., WORD) or a single letter (e.g., D) is flashed briefly and followed by a mask; then you choose which of two letters (D or K) was in that position. People can identify the letter better if it was in a word than if it was alone.
There are other examples of similar effects. For example, Huey showed that words can be perceived at distances that are too great to perceive the letters that make up the words. All of this suggests that words are the unit. But, you clearly have to see letters too.
(Demonstration: word superiority. We'll look at the class data.)
Note that the finding is paradoxical. The word helps you identify
the letters, but you should have to look at the letters before you can
identify the word.
Miller, Bruner, and Postman (1954) show how word superiority could be due in part to spelling rules. They made strings of letters that get closer and closer to English. A zero-order string is a random sequence of letters, unrelated to English spelling. A fourth-order string would be VERNALIT: all successive sets of four letters match English spelling. The closer to English a string is, the easier it is for people to remember it.
The best model for explaining word superiority is the interactive activation model (McClelland & Rumelhart, 1981). This model has three levels of nodes. Feature nodes take features as input. Letter nodes are activated by feature nodes. For example, if you have / and ), that would activate 'D', 'R', 'B', etc. Word nodes are activated by letter nodes. For example, if the letters 'W', 'O', and 'R' are activated, "WORD" and "WORK" would be coming on. Information from letters also feeds down to features. If you're pretty sure you have a 'K', then you can suppress all the curved features. Words also feed down to letters. If you think it's "WORD," you can knock out the 'K'. If you look at graphs of letter activations, you can see how this model produces word superiority. When you have other letters, they can activate a word, and the word can help you with the target letter. When the letter is by itself, it doesn't get this help.
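Here is a highly simplified sketch of the interactive activation idea: pools of letter and word nodes with bottom-up support and top-down feedback. The tiny vocabulary, weights, and update rule are illustrative assumptions, not McClelland and Rumelhart's actual parameters:

WORDS = ["WORD", "WORK", "WEAK"]

def clamp(x):
    return max(0.0, min(1.0, x))

def run(letter_evidence, steps=10, bottom_up=0.2, top_down=0.2, decay=0.05):
    """letter_evidence maps (position, letter) to feature-level input strength."""
    letters = {key: 0.0 for key in letter_evidence}
    words = {w: 0.0 for w in WORDS}
    for _ in range(steps):
        # Bottom-up: letters accumulate their feature evidence; words get
        # support from letters in the right positions.
        for key, ev in letter_evidence.items():
            letters[key] = clamp(letters[key] + bottom_up * ev - decay * letters[key])
        for w in words:
            support = sum(letters.get((i, ch), 0.0) for i, ch in enumerate(w))
            words[w] = clamp(words[w] + bottom_up * support - decay * words[w])
        # Top-down: active words feed activation back down to their letters.
        for w, act in words.items():
            for i, ch in enumerate(w):
                if (i, ch) in letters:
                    letters[(i, ch)] = clamp(letters[(i, ch)] + top_down * act)
    return letters

# Weak evidence for 'D' in the last position, with and without a word context.
in_word = {(0, "W"): 1.0, (1, "O"): 1.0, (2, "R"): 1.0, (3, "D"): 0.15}
alone = {(3, "D"): 0.15}

print("D activation in word context:", round(run(in_word)[(3, "D")], 2))
print("D activation alone:          ", round(run(alone)[(3, "D")], 2))

With the same weak evidence for the 'D', the letter ends up more active in the word context because the word node feeds activation back down; that's word superiority in miniature.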
C. Speech perception. This is a special problem for pattern recognition: You hear lots of speech, it's really hard to identify the sounds, and you do it anyway. How?
The first thing to ask is: How can we describe speech? Some terms: A phone is a sound. You can make around 4,096 different phones. However, in a database of (nearly) every human language, only 869 phones are used. Most of these are very rare (occurring only once or twice), with a smaller set (100 or so) accounting for most of the sounds used in languages. A phoneme is a sound that changes meaning. Languages cluster phones together to get a phoneme. The phones within a phoneme are called allophones. A phoneme conveys meaning in the sense that changing from one phoneme to another will change the meaning of the word being produced (as in going from "bit" to "pit": one sound changed and the meaning changed with it).
How do languages "choose" phonemic differences? The general plan is to maximize distinctiveness. Look at the plot of vowel space. These graphs plot the first frequency component of vowel sounds against the second. The line represents a boundary between sounds humans can produce and sounds that they can't. Inside the line is producible; outside isn't. If a language has just three vowel sounds, odds are it uses the three in the first picture. With five, it's likely to use the five in the second. Note how this spreads the vowels as far apart as possible in the space that humans can produce. This makes them easier to discriminate while listening to speech. The general principle is "ease of articulation is secondary to distinctiveness." For example, it's easier to say the vowel in "bit" than in "beet," but the one in "beet" is more discriminable, so it's much more likely to be used.
There are two ways to classify speech, and both have their special
features so we'll discuss both of them:
1. Articulatory phonetics: Articulatory phonetics describes speech sounds in terms of the vocal tract mechanisms used to produce them. The sounds come out of a system involving an air source (lungs), a sound source (larynx: vocal cords), and filters (pharynx: chambers in the throat, mouth, and nasal passages). Speech signals can be analyzed according to the contributions these parts make. The sound source can allow the sound to be either voiced (you make sound) or voiceless (no sound). The filters can disrupt the air flow by stopping it, causing turbulence, or modifying it. These disruptions can take place in several locations along the vocal tract. This leads to three dimensions along which sounds vary: voicing (voiced or voiceless), manner (how the air flow is disrupted), and place (where the air flow is disrupted). Overall, a sound is: air + voicing + manner + place.
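As a quick illustration of those three dimensions, here is a small table of consonants and a function that reports which dimensions distinguish a pair. The dictionary format is just an illustrative choice; the feature values are standard phonetics classifications:

consonants = {
    "p": ("voiceless", "stop", "bilabial"),
    "b": ("voiced",    "stop", "bilabial"),
    "t": ("voiceless", "stop", "alveolar"),
    "d": ("voiced",    "stop", "alveolar"),
    "s": ("voiceless", "fricative", "alveolar"),
    "z": ("voiced",    "fricative", "alveolar"),
}

def contrast(a, b):
    """Report which of the three dimensions distinguish two consonants."""
    names = ("voicing", "manner", "place")
    return [n for n, x, y in zip(names, consonants[a], consonants[b]) if x != y]

print(contrast("p", "b"))   # ['voicing']  -- the "pit" vs. "bit" difference
print(contrast("t", "s"))   # ['manner']
print(contrast("p", "t"))   # ['place']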
This matters because a prominent model of speech recognition is the motor theory. The idea is that you recognize speech by mentally working out how you would have to position your mouth to produce the sound. Obviously, knowing something about sound production would make a big difference for such a model.
2. Acoustic phonetics: The other way to characterize the
speech signal. We're no longer interested in describing how it's
produced. Instead, we want to know what is produced.
a. Primary methodology: Spectrogram: Plot the frequency components of a sound against duration and intensity. How does it work? Imagine a long row of tuning forks, each responding to a particular pitch. No two forks are alike, but the differences between each one are very small. Arrange these in order from highest to lowest pitch. Then, hook an electrode to each that sends a charge when it vibrates. Hook a pen to the other end of the electrode. When you pass a sound over the forks, each fork will only vibrate if its pitch is in the sound. So, there will only be marks on the paper corresponding to the pitches in the sound. This produces a spectrograph: a recording of the pitch (frequency) components of a sound.
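Here is a minimal sketch of the same idea in Python: a short-time Fourier transform standing in for the row of tuning forks. The toy signal and analysis settings are illustrative assumptions, not how real spectrograms are produced in the lab:

import numpy as np

def spectrogram(signal, sample_rate, window_size=256, hop=128):
    """Return (frequencies, magnitudes), where magnitudes is frequencies x time."""
    window = np.hanning(window_size)
    frames = []
    for start in range(0, len(signal) - window_size, hop):
        frame = signal[start:start + window_size] * window
        frames.append(np.abs(np.fft.rfft(frame)))   # energy at each frequency "fork"
    freqs = np.fft.rfftfreq(window_size, d=1.0 / sample_rate)
    return freqs, np.array(frames).T

# A toy "vowel": two steady frequency components (crude stand-ins for formants).
rate = 8000
t = np.arange(0, 0.5, 1.0 / rate)
toy = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 2200 * t)

freqs, spec = spectrogram(toy, rate)
strongest = freqs[spec.mean(axis=1).argmax()]
print("Strongest frequency band (Hz):", round(float(strongest)))   # near 300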
Some important points about spectrographs:
1) The sounds you make are complex. In other words, they're composed of many different frequencies. The spectrogram breaks them into these frequencies. Loosely speaking, each dark band is a frequency. Each band is called a formant. These are numbered from bottom to top. So, F1 (first formant) is the lowest, then F2, and so on.
Formant transitions are places where there's a sharp rise or fall in
a formant. Generally, these correspond to consonants. A
state is a place where there is little change in a formant. These
generally correspond to vowels.
2) The darker the band is, the higher the intensity (loudness)
of that sound.
3) As you go from left to right, you can see how the sounds change over time.
Now that we know how to read these, we can look at a topic in speech perception that makes it such a hard task.
b. Problems in perception: Context-conditioned variability. The phoneme is different (physically) depending on what's around it. For example, the second formant in "di" is totally different from "du," but hearers perceive both as having a /d/ at the beginning. The question is: How can two totally different physical stimuli be classed as the same thing? This problem is also called "lack of invariance." In order to classify sounds, they need to be constant (invariant). Since they're not, you suffer from lack of invariance.
D. How does context affect recognition? So far, it might sound like the process is data-driven (the inputs dictate what you perceive). But, recognition is also conceptually driven (your idea of what you're looking for affects recognition). Context is one place where this comes into play. Consider an experiment by Biederman: If you take a natural scene (like a kitchen) and a rearranged version of it (same stuff, but not in the usual arrangement), people recognize objects more rapidly in the properly arranged scene. Context (the other objects) can help when it's accurate because it directs attention to the right places and tells you what's there. There are lots of other context effects, and we'll return to this when we get to language.
E. So, what about meaning? Let's consider words because they're a little easier. What is the meaning of a word? Two simple ideas: The meaning is what the word refers to in the world, or the meaning is the mental image a word evokes. These are both problematic. Think about the meaning of the words "young girl." If they mean what they refer to, then they only mean something in the context of some particular young girl. You'd probably agree that that isn't correct. As for images, think of a cat. We're probably getting different images, but we can easily agree on the meaning. Therefore, the meaning must be more than the image. What is it?
How about a two-part model? The meaning is in propositions and models derived from them. A proposition is an idea unit. Take a sentence like "the star is right of the circle." You can make the necessary propositions. You also need a model that you can match to perceptual experience to see if the statement is true. With these two things, the meaning is specified. This is also probably wrong, but it's an idea.
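Here is a minimal sketch of that two-part idea: the sentence becomes a proposition, and the proposition is tested against a simple spatial "model" of the scene. The representation scheme is an illustrative assumption, not a worked-out theory:

# Proposition: (relation, argument1, argument2)
proposition = ("right-of", "star", "circle")

# A model of the scene: each object's horizontal position.
scene = {"star": 7, "circle": 3}

def true_in_model(prop, model):
    """Check whether the proposition holds in the scene model."""
    relation, a, b = prop
    if relation == "right-of":
        return model[a] > model[b]
    raise ValueError(f"unknown relation: {relation}")

print(true_in_model(proposition, scene))   # True: the star is right of the circle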
F. How is meaning grounded? A variant of the Chinese Room
problem is for you to imagine yourself just off the plane in
China. You have a Chinese dictionary. When you encounter a
sign, you look up its symbols in your dictionary. These take you
to more symbols, etc. At what point would you say you know the
meaning of the sign? Without grounding the symbols in some way,
the answer is you would probably never figure out the meaning.
How do we ground symbols? One hypothesis comes from research on
embodied cognition. Symbols are grounded in the relationships
between our bodies and the environment. The conceptual system can
start with simple relations and gradually build to complex
representations. We'll look at a couple of areas of research as
we think about this:
1. The action-sentence compatibility effect (ACE).
The basic point of this section is that it's not enough to identify the features or show that people are attending to the features. Recognition is more than just getting a list or accessing the correct symbol. For something to have meaning, it has to be more than an ungrounded symbol.