Langston,
Psychology of Language, Notes 7 -- Syntax
I. Goals:
A. Introduction.
B. Formal grammars.
C. Parsing strategies.
II. Introduction. We've discussed decoding processes (getting from sounds and letters to words), but we still haven't done anything with the stuff we've decoded. I want to emphasize a couple of basic themes:
A. Language phenomena can be described by a grammar (a set of elements and rules for combining those elements). As an exercise, we should review the elements and rules we know about so far:
1. Perceiving speech.
2. Reading.
3. Meanings.
B. Language is frequently understood in spite of a lot of ambiguity. During comprehension, something has to be done to resolve ambiguity. You can take two approaches:
1. Brute force: Represent every possibility. For lexical access, this seems to be what happens. If someone reads "He dug with the spade," the "shovel" and "ace" meanings are equally activated, and the wrong meaning is quickly suppressed. There are good reasons for lexical access to work this way, and it seems to be the only process that uses brute force.
2. Immediacy assumption: Make a choice, try to go on, and if you get stuck, reevaluate. As we start to get into syntax, you'll see why this has to happen: there are too many possibilities to maintain all of them.
As an exercise, let's rehearse some of the ambiguities we've seen so far:
1. Perceiving speech.
2. Reading.
3. Meanings.
C. Now we come to the point where things get exciting. Language usually comes at us in sentences, and that's our next level of analysis. What is the meaning of a sentence? There are three parts: Syntax (the rules for combining words into sentences); Semantics (the rules for combining units of meaning); and Pragmatics (extra-linguistic knowledge that helps you interpret the content of a message). We'll address each in turn.
To start grammar, I want you to try a couple of exercises that will get at our basic themes:
1. Write down the meaning of "of." It should be hard because "of" fills an empty syntactic category rather than carrying a meaning of its own.
2. What is the meaning of this sentence: "The boy saw the statue in the park with the telescope." Could the statue have the telescope? Could the boy be in the park? This is a kind of syntactic ambiguity.
III. Formal grammars.
A. Word string grammars (finite state grammars): Early attempts to model sentences treated them as a string of associations. If you have the sentence "The boy saw the statue," "the" is the stimulus for "boy," "boy" is the stimulus for "saw," and so on. If a speaker is processing language, the initial input is the stimulus for the first word, which, when spoken, becomes the stimulus for the next word, and so on.
These ideas get tested with "sentences" of nonsense words. If you make people memorize "Vrom frug trag wolx pret" and then ask for associations (by a type of free association task), you get a pattern like this:
Cue | Report
vrom | frug
frug | trag
trag | wolx
...
It looks like people have a chain of associations. This is essentially the behaviorist approach to language.
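As a minimal sketch, assuming a plain Python dictionary can stand in for the chain of associations, a word-string grammar is nothing more than each word cueing the next:

# Word-string (finite state) grammar as a chain of word-to-word associations.
# Each word is the stimulus (cue) for the next; nothing else is represented.
CHAIN = {"vrom": "frug", "frug": "trag", "trag": "wolx", "wolx": "pret"}

def produce(first_word):
    """Follow the associative chain from the initial stimulus."""
    words = [first_word]
    while words[-1] in CHAIN:
        words.append(CHAIN[words[-1]])
    return " ".join(words)

print(produce("vrom"))   # vrom frug trag wolx pret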
Chomsky had three things to say in response to this:
1. Long distance dependencies are problematic. A long distance dependency is when something later in a sentence depends on something earlier. For example, verbs agree in number with nouns. If I say "The dogs that walked in the grass pee on trees," I have to hold in mind for five words that plural "dogs" takes the verb form "pee" and not "pees." Other forms of this are sentences like "If...then..." and "Either...or...": in order to know how to close them, you have to remember how you opened them. (A sketch of why a word chain fails at this follows these three points.)
2. Sentences have an underlying structure that can't be represented in a string of words. If you have people memorize a sentence like "Pale children eat cold bread," you get an entirely different pattern of association:
Cue | Report
pale | children
children | pale
eat | cold or bread
cold | bread
bread | cold
Why? "Pale children" is not a pair of words. It's a noun
phrase. The two words produce each other as associates because
they're
part of the same thing. To get a representation of a sentence,
you
need to use (at least) a phrase structure grammar. That's our
next
proposal.
3. Something can be a sentence with no associations. "Colorless green ideas sleep furiously" still works as a sentence, even though you probably have no association between "colorless" and "green."
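As a minimal sketch of the long-distance-dependency objection, assuming a bigram (adjacent word-pair) table stands in for the chain of associations, number agreement defeats a pure word chain:

# Two grammatical sentences and the word-pair associations they create.
sentences = ["the dog that walked in the grass pees on trees",
             "the dogs that walked in the grass pee on trees"]
bigrams = {}
for s in sentences:
    words = s.split()
    for cue, report in zip(words, words[1:]):
        bigrams.setdefault(cue, set()).add(report)

print(bigrams["grass"])   # {'pee', 'pees'} (order may vary) -- the local cue can't decide
# Choosing correctly requires remembering "dog" vs. "dogs" from several words back,
# which a chain of adjacent associations doesn't retain.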
B. Phrase structure grammars (a.k.a. surface structure grammars): These generate a structure called a phrase marker when parsing (analyzing the grammatical structure of the sentence). Parsing proceeds in the order that the words occur in the sentence, and the process is to successively group words into larger and larger units (to reflect the hierarchical structure of the sentence). For example:
(1) The television shows the boring program
The phrase marker is a labeled tree diagram that illustrates the structure of the sentence. The phrase marker is the result of the parsing.
What's the grammar? It's a series of rewrite rules. You have a unit on the left that's rewritten into the units on the right. For our grammar (what we need to parse the sentence above), the rules are:
P1: S -> NP VP
P2: VP -> V NP
P3: NP -> Det (Adj)* N
L1: N -> {television, program}
L2: Det -> {the}
L3: V -> {shows}
L4: Adj -> {boring}
The rules can be used to parse and to generate.
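Here is a minimal sketch in Python, assuming a dictionary of rewrite rules can stand in for the grammar above and limiting the optional (Adj)* to zero or one adjective; generate rewrites S until only words remain.

import random

RULES = {
    "S":   [["NP", "VP"]],                       # P1
    "VP":  [["V", "NP"]],                        # P2
    "NP":  [["Det", "N"], ["Det", "Adj", "N"]],  # P3, with zero or one Adj
    "N":   [["television"], ["program"]],        # L1
    "Det": [["the"]],                            # L2
    "V":   [["shows"]],                          # L3
    "Adj": [["boring"]],                         # L4
}

def generate(symbol):
    """Rewrite a symbol with the rules until only words (terminals) remain."""
    if symbol not in RULES:          # a terminal: already a word
        return [symbol]
    expansion = random.choice(RULES[symbol])
    words = []
    for part in expansion:
        words.extend(generate(part))
    return words

print(" ".join(generate("S")))       # e.g., "the television shows the boring program"

The same rule table could also drive a simple top-down parser, which is the sense in which one grammar both parses and generates.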
C. Transformational grammars: There are some constructions you can't handle with phrase structure grammars. For example, consider the sentences below:
(2) John phoned up the woman.
(3) John phoned the woman up.
Both sentences have the same verb ("phone up"). But the "up" is not always adjacent to the "phone." This phenomenon is called particle movement. You can't parse or generate these sentences with a simple phrase-structure grammar. Problems like this were part of the motivation for the development of transformational grammar. Chomsky is also responsible for this insight.
What is transformational grammar? It adds some concepts to our phrase-structure grammar (both technical and philosophical):
1. The notion of a deep structure: Sentences have two levels of analysis, surface structure and deep structure. The surface structure is the sentence that's produced. The deep structure is an intermediate stage in the production of a sentence; it contains the words and a basic grammatical structure.
2. Transformations: You get from the deep structure to the surface structure by passing it through a set of transformations (hence the name transformational grammar). These transformations allow you to map a deep structure onto many possible surface structures.
3. Expand the left side of the rewrite rules: Transformation rules can have more than one element on the left side. This is a technical point, but without it you couldn't do transformations.
Why do we need to talk about deep structure? It explains two otherwise difficult problems:
1. Two sentences with the same surface structure can have very different meanings. Consider:
(4a and 4b) Flying planes can be dangerous.
This can mean either "planes that are flying can be dangerous" or "the act of flying a plane can be dangerous." If you allow this surface structure to be produced by two entirely different deep structures, it's no problem.
2. Two sentences with very different surface structures can have the same meaning. Consider:
(5) Arlene is playing the tuba.
(6) The tuba is being played by Arlene.
These both mean the same thing, but how can a phrase structure grammar represent that? With a deep structure (and transformations) it's easy.
Let's get into this with Chomsky's (1957) Toy Transformational Grammar. His model has four basic stages: Phrase structure rules allow you to construct basic trees (what we've already seen). Then lexical insertion rules put in the words; that makes a deep structure. Then you apply some transformations. Finally, you go through a pronunciation stage and you have the surface structure, or the final sentence.
The rules (different for each stage):
a. Phrase structure rules:
P1: S -> NP VP
P2: NP -> Det N
P3: VP -> Aux V (NP)
P4: Aux -> C (M) (have en) (be ing)
b. Lexical insertion rules:
L1: Det -> {a, an, the}
L2: M -> {could, would, should, can, must, ...}
L3: C -> {ø, -s (singular subject), -past (abstract past marker), -ing (progressive), -en (past participle), ...}
L4: N -> {cookie, boy, ...}
L5: V -> {steal, ...}
c. Transformation rules:
Obligatory:
T1: Affix (C) V -> V Affix (affix hopping rule)
Optional:
T2: NP1 Aux V NP2 -> NP2 Aux be en V by NP1 (active -> passive transformation)
d. Morpho-phonological rules:
M1: steal -> /stil/
M2: be -> /bi/, etc.
How does this work? I've done an example for "The boy steals the cookie." (Tree given in class)
Produce the sentence "The boy steals the cookie." (Tree given in class)
Here's a harder one because it involves the passive. Produce "The cookie is stolen by the boy." (Tree given in class)
You try "The cookie could have been being stolen by the boy" as an exercise.
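To make the stages concrete, here is a minimal sketch in Python, assuming a flat word list can stand in for the deep-structure tree and a tiny spell-out dictionary can stand in for the morpho-phonological rules (the function names and the SPELL_OUT table are illustrative). It applies affix hopping (T1) alone to get the active sentence, or the passive transformation (T2) followed by T1 to get the passive:

def passive(deep):
    """T2: NP1 Aux V NP2 -> NP2 Aux be en V by NP1."""
    np1, aux, verb, np2 = deep
    return [np2, aux, "be", "en", verb, "by", np1]

def affix_hop(seq):
    """T1: an affix (C or en) hops onto the verb element that follows it."""
    out, i = [], 0
    while i < len(seq):
        if seq[i] in ("-s", "en") and i + 1 < len(seq):
            out.append(seq[i + 1] + "+" + seq[i])   # e.g., "be+-s"
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

SPELL_OUT = {"steal+-s": "steals", "be+-s": "is", "steal+en": "stolen"}  # toy morpho-phonology

deep = ["the boy", "-s", "steal", "the cookie"]    # deep structure: NP1 C V NP2
active = [SPELL_OUT.get(w, w) for w in affix_hop(deep)]
passivized = [SPELL_OUT.get(w, w) for w in affix_hop(passive(deep))]
print(" ".join(active))       # the boy steals the cookie
print(" ".join(passivized))   # the cookie is stolen by the boy

One deep structure, two surface structures, depending on whether the optional transformation is applied.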
Psychological evidence for transformations. Early studies rewrote sentences into transformed versions. For what follows, the base sentence is "The man was enjoying the sunshine."
Negative: "The man was not enjoying the sunshine."
Passive: "The sunshine was being enjoyed by the man."
Question: "Was the man enjoying the sunshine?"
Negative + Passive: "The sunshine was not being enjoyed by the man."
Negative + Question: "Was the man not enjoying the sunshine?"
Passive + Question: "Was the sunshine being enjoyed by the man?"
Negative + Passive + Question: "Was the sunshine not being enjoyed by the man?"
Have people read lots of these and measure reading time. The more transformations you have to undo, the longer it should take. That happens. Note, though, the problem of unconfounding reading time from the number of words in the sentence.
D. Semantic grammars: The PSG and transformational grammars parse sentences with empty syntactic categories. For example, NP doesn't mean anything; it's just a marker to hold a piece of information. These syntax models have some problems:
1. Not very elegant. The computations can be pretty complicated.
2. Overly powerful. Why this set of transformations? Why not transformations to go from "The girl tied her shoe" to "Shoe by tied is the girl"? There's no good reason to explain the particular set of transformations that people seem to use.
3. They ignore meaning. Chomsky's sentence "They are cooking apples" isn't ambiguous in a story about a boy asking a grocer why he's selling ugly, bruised apples. Sentences always come in a context that can help you understand them.
Semantic grammars are different in spirit. The syntactic representation of a sentence is based on the meaning of the sentence. For example, consider Fillmore's (1968) case-role grammar. Cases and roles are the parts each element of the sentence plays in conveying the meaning. You have roles like agent and patient, and cases like location and time. The verb is the organizing unit; everything else in the sentence is related to the verb. Consider a parse of:
(7) Father carved the turkey at the Thanksgiving dinner table with his new carving knife.
The nodes here have meaning. You build these structures as you read, and the meaning is in the structure. You can do things purely syntactic grammars can't. For example, consider:
(8) John strikes me as pompous.
(9) I regard John as pompous.
If you analyze these two syntactically, it's hard to see the relationship between the "me" in (8) and the "I" in (9). But there is a relationship: both are the experiencer of the action. One problem for case grammars is settling on the right number of cases.
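As a minimal sketch, assuming plain Python dicts can stand in for case frames and that these role labels are reasonable (they are illustrative choices, not Fillmore's official inventory), case-role representations of (7), (8), and (9) might look like this:

# Hypothetical case frames: the verb organizes the frame and every other
# constituent fills a role slot.
frame_7 = {
    "verb": "carve",
    "agent": "Father",
    "patient": "the turkey",
    "location": "the Thanksgiving dinner table",
    "instrument": "his new carving knife",
}

frame_8 = {"verb": "strike as", "stimulus": "John",
           "experiencer": "me", "attribute": "pompous"}
frame_9 = {"verb": "regard as", "stimulus": "John",
           "experiencer": "I", "attribute": "pompous"}

# The surface syntax of (8) and (9) differs, but the frames make the shared
# relationship explicit: "me" and "I" both fill the experiencer role.
print(frame_8["experiencer"], frame_9["experiencer"])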
E. Something to keep in mind: Competence vs. performance. People have pushed each of these grammars to generate and comprehend language (I've used a modified PSG to read and write scripts for Friday the 13th movies). But that's only demonstrating competence: being capable of solving the problem. It doesn't address performance: what people actually do. The question of what people actually do hasn't been settled. We might agree that some form of syntactic analysis has to take place, and we might agree that these grammars can achieve that, but that isn't saying we agree on performance.
IV. Parsing strategies. When people are reading text, how do they parse on the fly?
A. Some constraints on parsing: People have a very limited capacity working memory. This means that any processes we propose have to fit in that capacity. The problem of trying to parse with limited resources will be the driving force behind the strategies; they're all ways to minimize working memory load. One way to minimize load is to make the immediacy assumption: when people encounter ambiguity, they make a decision right away. This can cause problems if the decision is wrong, but it saves capacity in the meantime. Consider a seemingly unambiguous sentence like:
(10) John bought the flower for Susan.
It could be that John's giving it to Susan, but he could also be buying it for her as a favor. The idea is that you choose one interpretation right away. Why? Combinatorial explosion. If you have just four ambiguities in a sentence with two options each, you're maintaining 2^4 = 16 possible parses by the end. Your capacity is 7±2 items; you do the math.
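A quick sketch of that arithmetic (the ambiguity count and reading labels are just the hypothetical numbers from the example):

from itertools import product

# Four independent two-way ambiguities: brute force keeps every combination.
ambiguities = [("reading A", "reading B")] * 4
parses = list(product(*ambiguities))
print(len(parses))   # 16 -- already past a 7 +/- 2 item working memory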
To see what happens when you have to hold all of the information in a sentence in memory during processing, try to read:
(11) The plumber the doctor the nurse met called ate the cheese.
The problem is you can't decide on a structure until very late in the sentence, meaning you're holding it all in memory. If I complicate the sentence a bit but reduce memory load, it gets more comprehensible:
(12) The plumber that the doctor that the nurse met called ate the cheese.
Or, make it even longer but reduce memory load even more:
(13) The nurse met the doctor that called the plumber that ate the cheese.
Now that we've looked at the constraints supplied by working memory capacity and the immediacy assumption, let's look at the strategies.
B. Parsing strategies. There are two problems: the first is getting the clauses, the second is hooking them up.
1. Getting the clauses (NPs, VPs, PPs, etc.).
a. Constituent strategy: When you hit a function word, start a new constituent. Some examples:
Det: Start NP.
Prep: Start PP.
Aux: Start VP.
b. Content-word strategy: Once you have a constituent going, look for content words that fit in that constituent. An example:
Det: Look for Adj or N to follow.
(A sketch combining this with the constituent strategy appears after this list.)
c. Noun-verb-noun: Overall strategy for the sentence. The first noun is the agent, the verb is the action, and the second noun is the patient. Apply this model to all sentences as a first try. Why? It gets a lot of them correct, so you might as well make your first guess something that's usually right. We know people do this because of garden-path sentences (sentences that lead you down a path to the wrong interpretation). Example:
(14) The editor authors the newspaper hired liked laughed.
You want "authors" to be a verb, but when you find out it isn't, you have to go back and recompute.
d. Clausal: Make a representation of each clause, then discard the surface features. Evidence:
(15) Now that artists are working in oil prints are rare. (863 ms)
vs.
(16) Now that artists are working longer hours oil prints are rare. (794 ms)
In (15), "oil" is not in the last clause; in (16) it is. The access time for "oil" after reading the sentence is in parentheses. When it's not in the current clause, it takes longer, as if you've discarded it.
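Here is the promised sketch of the constituent and content-word strategies working together, assuming a small hand-picked table of function words (the word lists, category labels, and the chunk function are illustrative): a function word opens a new constituent, and the content words that follow attach to whatever is open.

FUNCTION_WORDS = {"the": "NP", "a": "NP",      # determiners open an NP
                  "in": "PP", "with": "PP",    # prepositions open a PP
                  "was": "VP", "is": "VP"}     # auxiliaries open a VP

def chunk(sentence):
    """Group words into constituents using the two strategies."""
    constituents = []
    for word in sentence.lower().split():
        if word in FUNCTION_WORDS:             # constituent strategy
            constituents.append([FUNCTION_WORDS[word], word])
        elif constituents:                     # content-word strategy
            constituents[-1].append(word)
        else:                                  # sentence-initial content word
            constituents.append(["?", word])
    return constituents

print(chunk("The boy was walking in the park"))
# [['NP', 'the', 'boy'], ['VP', 'was', 'walking'], ['PP', 'in'], ['NP', 'the', 'park']]

Notice that the constituents are just collected, not connected; hooking them together is the second problem, taken up next.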
2. Once you get the clauses, how do you hook them up? More strategies:
a. Late closure: The basic strategy is to attach new information under the current node. Consider the parse for:
(17) Tom said Bill ate the cake yesterday.
(We'll need some new rules in our PSG to pull it off; I'm skipping those to produce final phrase markers.)
(Late Closure and Not Late Closure trees given in class.)
According to late closure, "yesterday" modifies when Bill ate the cake, not when Tom said it (is that how you interpreted the sentence?). It could modify when Tom said it, but that would require going up a level in the tree to the main VP and modifying the decision you made about it (that it doesn't have an adverb). That's a huge memory burden (once you've parsed the first part of the sentence, you probably threw out that part of the tree to make room). So, late closure eases memory load by attaching where you're working, without backtracking.
Evidence: Have people read things like:
(18) Since J. always jogs a mile seems like a very short distance to him.
With eye-tracking equipment, you can see people slow down on "seems" to rearrange the parse, because they initially attach "a mile" to "jogs" when they shouldn't.
b. Minimal attachment: Make a phrase marker with the fewest nodes. It reduces load by minimizing the size of the trees produced. Consider:
(19) Ernie kissed Marcie and Joan...
(Minimal Attachment and Not Minimal Attachment trees given in class.)
The minimal attachment tree has 11 nodes vs. 13 for the other. It's also less complex. The idea is that if you can keep the whole tree in working memory (you don't have to throw out parts to make room), then you can parse more efficiently.
Evidence: Consider:
(20) The city council argued the mayor's position forcefully.
(21) The city council argued the mayor's position was incorrect.
In (21), minimal attachment encourages you to build the wrong tree, and you have to recompute.
3. Note that these are strategies. All of them help you meet the goal of keeping your working memory burden as small as possible. That doesn't mean this is all you do or that these strategies necessarily compete during processing.