Psychology, Notes 8 -- Semantic Long Term Memory
A. Where we are/themes.
B. Traditional (symbolic) models.
C. Neural networks.
II. Where we are/themes. Last time we looked at
episodic memory. This time, we're looking at the way your
about the world is organized. As we discussed earlier, long-term
memory is generally assumed to be divided into autobiographical
(called episodic memory) and fact knowledge (called semantic
You first learn information in episodic memory, then organize it into
knowledge. The topic for this unit is the way that semantic
is organized. There are two sides to this problem:
A. What is the structure of the information in semantic
The first half of this lecture is devoted to "traditional" models, the
second half to neural networks. Traditional models do a great job
of explaining the structure. Using carefully controlled reaction
time studies, they can make and confirm predictions about
Unfortunately, there are several versions of these models, and they all
make similar predictions. Neural network models are less explicit
(in one sense), because they use distributed representations.
B. How does stuff get into semantic memory? What process
sifts through all of your episodes and decides what to abstract as
What process sorts those facts into a higher representation?
questions are especially relevant for traditional models, but they
been answered very well. Neural networks blur the distinction
episodic and semantic memories, incorporate learning of information,
make these questions less important.
So, the outline will be to look at traditional (symbolic) models and
try to address the two questions. Then we'll look at neural
and do the same thing.
III. Traditional (symbolic) models. A symbol is
something that stands for something else. For example, "MOON" is
a symbol for the thing in the sky that revolves around the Earth
any word is a symbol). You're so accustomed to seeing "moon" used
to refer to the moon that it's hard to separate the two, but there's
special about the word. It's a relatively arbitrary arrangement
letters that stands for the thing in the sky. It's easier to see
this in a foreign language. You know what "dog" stands for, so it
might look like it's the "right" word, but "caine" probably seems more
A symbolic model of a cognitive process assumes that you have symbols
in your head. They are discrete things (easily separable from one
another) that stand for things in the world. Cognition is merely
manipulating symbols, a lot like using language is manipulating
You arrange words in language to get a meaning ("the dog bit the man"
something different from "the man bit the dog"). You also arrange
symbols in your head to get a meaning. Our first set of models
A. Network models. (Collins and Quillian, 1970)
1. Structure. Your concepts are organized in a collection
of nodes and links between the nodes.
Nodes: Hold the concepts. For example, you have a RED node,
a FIRE TRUCK node, a FIRE node, etc..
Links: Connect the concepts. Links can be:
a. Superset/subset: ROBIN is a subset of BIRD, so these
nodes are linked.
b. Properties. Nodes for properties are connected to
nodes via labeled property links. For example, WINGS would be
to BIRD via a "has" link.
The nodes are organized in a hierarchy. Distance matters:
The farther apart two concepts are, the less related they are.
also have the concept of cognitive economy: Store things one time
at the highest place they apply. So, BREATHES is a property of
animals, store it at the ANIMAL node instead of with every animal in
network. The model included this because it was based on a
model, and computers don't have a lot of memory to spare. It's
clear if your brain has the same space limitations. So, this
not be an important property of the model. A simple network is
2. Evidence: Do the semantic verification task:
"a canary is a canary" vs. "a canary is a bird". The more links
have to travel, the longer it takes to verify. Examples:
|A canary is a canary.
A canary is a bird.
A canary is an animal.
|A canary is yellow.
A canary has wings.
A canary can breathe.
As you go down each list of questions, you add more links to travel
before you reach the answer. You get a pattern of response times
that corresponds to this order. The more links you travel, the
a. Hierarchy: "A horse is an animal" is faster than "a
horse is a mammal". That's not right if it's a hierarchy.
b. Answering no: To say no, the concepts must be far apart
(a bird has 4 legs). But, saying no is sometimes fastest of
Other times, people are really slow to say "no" ("a whale is a
c. Typicality: "A robin is a bird" is faster than "A
is a bird." A robin is a more typical bird than chicken, but
still one link from the bird concept, so there should be no difference
CogLab: We can look at
the results of our lexical decision exercise here and think about how
that might relate to semantic memory.
B. Feature models. (Rips, Shoben, and Smith, 1974)
1. Structure: Concepts are clusters of semantic
Two kinds of features:
a. Distinctive: The defining features. Sort of core
features (like wings for birds).
b. Characteristic: Typical features, not required ("can
fly" for birds).
A comparison involves a two step process. First, do a quick match
on all features. If they're really similar or really different,
"yes" or "no." If the amount of match is in the middle, compare
distinctive features. Then respond. The distinctive
involves an extra stage, and should take longer. Here's what the
model looks like schematically:
Some examples of some features:
Some types of questions:
|A robin is a bird.
||A robin is a fish.
||A whale is a mammal.
||A whale is a fish.
2. Evidence: Use the same verification task. We can
a. Typicality effect. More typical members of a category
are responded to faster because they only require one comparison step.
b. Answering "no." We can explain why some "no" questions
are harder than others. Also why some "no" questions are fastest
c. We also solve the hierarchy problems. It's all due to
similarity, not a hierarchy.
3. Problems: Getting the features is hard
The distinctive features are particularly hard to identify. Think
how many features of BIRD might not be required for something to still
be a bird.
C. Spreading activation. (Collins and Loftus, 1975)
1. Structure: It's a network model, but the length of the
links matters. The longer the link, the less related the concepts
are. When you search, you activate the two nodes involved, and
activation spreads until there is an intersection. The farther
go, the weaker the activation gets and the slower it travels.
can explain a lot of the findings that were hard for the original
2. Problem: Making your model really powerful reduces its
ability to predict. It's not as good when you make a model
CogLab: We'll discuss the
data from the false memory demonstration and think about how spreading
activation could be responsible for it.
D. Propositional models.
1. Structure: For propositions, the elements are idea units
(instead of features). To extract meaning, you get the basic idea
units and then their relationships. Let's do a sentence:
"Pat practiced from noon until dusk."
The propositions are:
(PRACTICE, A:PAT, S:NOON, G:DUSK)
So, we have this thing "Pat," and Pat was the agent of the verb
The source of the practice was "noon," and the goal was "dusk."
elements of the proposition are in all caps to indicate that they're
and not the things themselves.
To extract meaning, extract propositions. This can be a pretty
powerful system if you treat meaning as a grammar. The elements
propositions, the rules come from first-order predicate calculus.
2. Evidence: They can solve some very difficult
To illustrate, I'll use the problem of scope: How broadly do we
a quantifier in a sentence? Consider:
"Bilk is not available in all areas."
It could be that Bilk is available in some areas, but not all of them;
or, Bilk isn't available in all areas (it is available nowhere).
This kind of ambiguity is not syntactic (based on grammar) and it's not
lexical (based on word meanings). The problem comes from where
apply the "not." Let's use the standard approach to a grammar of
meaning to understand the sentence.
First-order predicate calculus: Rules to combine propositional
a. ¬: "It is not the case that."
all> x: "For
(I couldn't find the upside down capital A)
The "meanings" of the sentence:
all>x) (Bilk is available in
x) (Bilk is only available in some places).
all>x) ¬(Bilk is available in
x) (Bilk isn't available anywhere).
E. Schemas and scripts. (Bartlett, Schank)
1. Structure: A list of all of the stuff you know about
a concept arranged in some format. Scripts are most
They are your typical action sequences. For example, you might
a restaurant script that tells you the sequence of events at a
The exact structure is a matter of some debate.
a. For the use of script knowledge: Pompi and Lachman
had participants read a story like the following:
Chief Resident Jones adjusted his face mask while anxiously surveying
a pale figure secured to the long gleaming table before him. One
swift stroke of his small, sharp instrument and a thin red line
Then an eager young assistant carefully extended the opening as another
aide pushed aside glistening surface fat so that vital parts were laid
bare. Everyone present stared in horror at the ugly growth too
for removal. He now knew it was pointless to continue.
It's about an operation, but notice that the important words "doctor,"
"nurse," "scalpel," and "operation" were never used. However,
words are part of your operation script (your knowledge of what goes on
during an operation). When people were given a list of words to
they were likely to "remember" words that were part of the script, but
weren't presented (like "nurse").
A similar thing occurs if people are asked to remember stories where
script-typical events have been left out. People "remember" the
parts of the story.
b. How is script knowledge stored? Barsalou and Sewell
(1985) compared two versions: Scripts are organized around
concepts or they're organized in sequential order. Let's
Demonstration: Half the class follow the instruction
down everything you can think of related to the task of going to the
but generate from the most central action to the least central
The other half follow the instruction "write down everything you can
of related to the task of going to the doctor, but generate from the
action to the last action." Time for 20 seconds, count the number
of actions. The sequential group should get more.
The results indicate that the organization of scripts is based on
order and not importance.
F. What do these models have in common?
Abstraction. Some process has to go through what you know and
organize it into concepts (like pick out the distinctive features or
properties to a node). The final concepts are a construct made
inputs, not the inputs themselves. This hasn't been very well
IV. Neural networks. So far, we've been pretty
in our analysis of mind. We have symbols (representation) and
that manipulate those symbols (process). Neural networks get rid
of both. At best we have subsymbols (clusters of elements whose
operation stands for something else). Instead of a node holding
concept (as in Collins and Quillian) we have a network holding the
In fact, most of the action is the connections between the nodes.
The basic idea: Model learning using something that looks like a
brain (but not a lot like a brain). The idea is to hook up a lot
of simple processors and have learning emerge from their parallel
To get an idea, look at the real neuron and artificial neuron
The artificial neuron was based on the real neuron. The neuron
inputs, multiplies them by a weight, and sums them up. Then it
whether or not to fire. If the total input is greater than some
it fires. We can hook a bunch of these together to learn a
A. Perceptrons. A perceptron is a neuron that learns to
classify. Before we go on, let's make a concept to fit the
(we need two-valued features).
|Sweet = 1 or Not_sweet = 0
Edible = 1 or Not_Edible = 0
Edible = 1 or Not_Edible = 0
Our examples are:
For categorizing: Good_Fruit = 1, Not_Good_Fruit = 0
I've trained the perceptron below to classify the concept good_fruit
(anything with edible skin and seeds). This is its processing of
banana: It inputs (1,1,0) (for sweet, edible_seeds,
and outputs Not_Good_Fruit. How? It takes each input and
it by the weight on the link, then adds them up (1 X 0.0 + 1 X 0.25 + 0
X 0.25 = 0.25). If that sum is bigger than a threshold (0.4 in
case), then it outputs a 1, otherwise a 0. Check its
for the other examples (it should get them all correct).
Now, how does the perceptron learn to classify? Let's train it
on sweet_fruit (sweet is the only feature that needs to be on).
with all weights at 0.0, and input the first example:
It should call this a sweet_fruit, but it doesn't. So, we want
to adjust the weights to make it more likely to respond sweet_fruit
time. How? We use something called the delta rule (also
learning after Hebb, the person who invented it). The idea is to
make it a little more likely to answer correctly the next time it
the same problem.
* (overall teacher - overall output)
* node output
The learning rate is set to be slow enough to get the right answer
without skipping over it (0.25 for us). The overall teacher is
the response should have been, the overall output is what the response
was, and the node output is the output for a single node (equivalent to
its input). So, for the color, taste, and consistency features:
∆w = .25 * (1 - 0) * 1 = 0.25
∆w = .25 * (1 - 0) * 1 = 0.25
∆w = .25 * (1 - 0) * 0 = 0.00
And the perceptron after training on this example is (note, I've
put the new example in the input part):
(The weights have been adjusted by adding ∆w
to the old weights.) We're ready to try the next example.
should try each of the remaining examples. The net will classify
them all correctly once the weights reach the values (0.50, 0.25,
I suggest that you work with this until you achieve that set of
You can stop training after you've learned all of the examples, so
stop. The concept sweet_fruit is now in the connections in the
(it's learned the concept).
B. Some issues for where we are now:
1. We're still doing a lot of work to prepare the concept, but
it's more natural. For example, we could argue that perceptual
deliver the input coded into the 1's and 0's, and all we did is adjust
the weights, just like a brain does.
2. We didn't have to understand the classification ourselves
for it to learn. The net figured out how to represent the
That's the magic.
3. We still have a teacher, but you could characterize that as
the environment (as in you touch the stove and the environment tells
that was a mistake).
C. Example: Past tense of English verbs. A classic
test case because this acquisition process looks like learning a rule
memorizing exceptions. Early on, kids see a lot of irregulars and
learn them well. Then they're exposed to all of the verbs using
rule "add -ed for past tense" and they do poorly on irregulars.
they get good again. If a network can do this, it's strong
that rules aren't necessary. In fact, networks can learn this.
Plunkett and Marchman (1991) used a perceptron to learn English past
tense, but it was a lot more complicated than our perceptron above.
D. Generalize this: Perceptrons don't work on problems
that aren't linearly separable (you can draw a line between all of the
positive and negative examples). So, you need a more general
that can have internal representations that aren't connected to the
XOR. The classic problem for perceptrons (as in it's not linearly
separable, so they can't do it) is XOR (exclusive OR). You make a
truth table like this:
For categorizing: 1 = TRUE, 0 = FALSE.
Only say true when one or the other, but not both, of the input
is on. You're familiar with XOR from ordering in a
It would be odd to say "I'll have both" when offered the choice of soup
or salad. The perceptron will fail if you set it up for XOR, a
with a hidden layer will be fine. Demonstrate.
E. Back propagation.
Generalize the delta rule to overcome the problem of linear
What we need is an internal representation. So, we add a hidden
(a set of nodes between input and output). Here's a picture of a
more complicated network:
But, with an internal representation we don't know how to assess blame
when we're learning. If the network makes a mistake, is it the
from input to hidden layer or hidden layer to output? Back
gets you out of this problem. Instead of a step function (a
activation function), use a sigmoid. This is a curve which allows
you to do calculus. The mathematics allows you to assign blame at
all levels of the network. This is a bit beyond us here, but the
interested student could see me for more information.
Cognitive Psychology Notes 8
Back to Langston's Cognitive Psychology