Langston, Cognitive Psychology, Notes 11 -- Language--Structure
I.  Goals.
A.  Where we are/Themes.
B.  Syntax.
C.  Meaning.
D.  Influences.
II.  Where we are/Themes.  We've already looked at some parts of language.  When we did our ecological survey, we considered visual letter features, orthography (spelling patterns), auditory features, and some sound patterns.  When we looked at knowledge representation, we also looked at word meanings (semantic networks, some basic propositions).
Now, we're turning our attention to the real comprehension issues:  How is language understood, and what influences that understanding?
A.  We're going to break this into three basic categories:
1.  Syntax:  Word order information.  Sentences are more than arbitrary strings of words.  Some words have to go in some places, other words in other places.  Getting the order information out of the sentence may do a lot for your comprehension.
2.  Meaning:  Knowing the order is a good start, but meaning is more than that.  We'll look at propositions and mental models and other attempts to model the meaning of sentences.
3.  Influences:  A lot of factors influence the recovery of structure and meaning.  We'll look at some of the big ones.
B.  As we do this, two things are going to be really important:
1.  Grammars:  A grammar is a set of elements and the rules for combining those elements.  Grammars can be used to explain almost every aspect of language comprehension.
2.  Ambiguity:  At every level, language inputs are ambiguous.  Part of what's going to drive our models of language is an approach to deal with ambiguity.  You can use two approaches:
a.  Brute force:  Try to represent every possibility.  Not good given the limited capacity of working memory.
b.  Strategies:  Much more reasonable.  Use what you know about language from past experience and apply it to current problems.  If a particular sentence structure usually has one interpretation, go with that as your first try to understand it.
III.  Syntax.  What's a grammar for word order?  Some possibilities:
A.  Word string grammars (finite state grammars):  Early attempts to model sentences treated them as a string of associations.  If you have the sentence "The boy saw the statue," "the" is the stimulus for "boy," "boy" is the stimulus for "saw,"...  If a speaker is processing language, the initial input is the stimulus for the first word, which, when spoken, becomes the stimulus for the next word, ...
These ideas get tested with "sentences" of nonsense words.  If you make people memorize "Vrom frug trag wolx pret," and then ask for associations (by a type of free association task), you get a pattern like this:
Cue Report
It looks like people have a chain of associations.  This is essentially the behaviorist approach to language.
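A minimal sketch of the word-string idea, assuming the only thing stored is "each word is the stimulus for the next word."  The function names are made up for illustration.

```python
def build_chain(sentence):
    """Store each word as the cue (stimulus) for the word that follows it."""
    words = sentence.split()
    return {cue: response for cue, response in zip(words, words[1:])}


def produce(chain, first_word, max_words=20):
    """Follow the associations from the first word until none is left.

    max_words is a safety guard in case a chain loops back on itself.
    """
    output = [first_word]
    while output[-1] in chain and len(output) < max_words:
        output.append(chain[output[-1]])
    return output


chain = build_chain("Vrom frug trag wolx pret")
print(chain["frug"])                     # the response cued by "frug"
print(" ".join(produce(chain, "Vrom")))  # the whole string, one link at a time
```

Notice that the chain only ever knows about the immediately preceding word.  That's the weakness the next points go after.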
Chomsky had a couple of things to say in response to this:
1.  Long distance dependencies are problematic.  A long distance dependency is when something later in a sentence depends on something earlier.  For example, verbs agree in number with nouns.  If I say "The dogs that walked in the grass pee on trees," I have to hold in mind that plural "dogs" take the verb form "pee" and not "pees" for five words.  Other forms of this are sentences like "If...then..." and "Either...or..."  In order to know how to close them, you have to remember how you opened them.
2.  Sentences have an underlying structure that can't be represented in a string of words.  If you have people memorize a sentence like "Pale children eat cold bread," you get an entirely different pattern of association:
Cue Report
cold or bread 
Why?  "Pale children" is not a pair of words.  It's a noun phrase.  The two words produce each other as associates because they're part of the same thing.  To get a representation of a sentence, you need to use (at least) a phrase structure grammar.  That's our next proposal.
B.  Phrase structure grammars (a.k.a. surface structure grammars):  Generate a structure called a phrase marker when parsing (analyzing the grammatical structure of the sentence).  It proceeds in the order that the words occur in the sentence, and the process is to successively group words into larger and larger units (to reflect the hierarchical structure of the sentence).  For example:

 (1)  The television shows the boring program

The phrase marker is a labeled tree diagram that illustrates the structure of the sentence.  The phrase marker is a result of the parsing.

Phrase marker
What's the grammar?  It's a series of rewrite rules.  You have a unit on the left that's rewritten into the units on the right.  For our grammar (what we need to parse the sentence above) the rules are:
P1:  S -> NP + VP
P2:  VP -> {Vi or Vt + NP}
P3:  NP -> Det (+ Adj) + N
L1:  N -> {television, program}
L2:  Det -> {the}
L3:  Vt -> {shows}
L4:  Vi -> {}
L5:  Adj -> {boring}
The rules can be used to parse and to generate.
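Here's a rough sketch of how rules P1-P3 and L1-L5 could be used to parse sentence (1).  It's one simple way to do it (a small recursive-descent parser), not a claim about how people do it.  The function names and the nested-tuple format for the phrase marker are assumptions made for the example.

```python
LEXICON = {  # lexical rules L1-L5
    "N": {"television", "program"},
    "Det": {"the"},
    "Vt": {"shows"},
    "Vi": set(),
    "Adj": {"boring"},
}


def word(cat, words, i):
    """Match one word of category `cat` at position i, or return None."""
    if i < len(words) and words[i] in LEXICON[cat]:
        return (cat, words[i]), i + 1
    return None


def parse_np(words, i):
    """P3: NP -> Det (+ Adj) + N"""
    det = word("Det", words, i)
    if det is None:
        return None
    children, i = [det[0]], det[1]
    adj = word("Adj", words, i)          # the adjective is optional
    if adj is not None:
        children.append(adj[0])
        i = adj[1]
    noun = word("N", words, i)
    if noun is None:
        return None
    children.append(noun[0])
    return ("NP", *children), noun[1]


def parse_vp(words, i):
    """P2: VP -> Vi, or Vt + NP"""
    vt = word("Vt", words, i)
    if vt is not None:
        np = parse_np(words, vt[1])
        if np is not None:
            return ("VP", vt[0], np[0]), np[1]
    vi = word("Vi", words, i)
    if vi is not None:
        return ("VP", vi[0]), vi[1]
    return None


def parse_sentence(sentence):
    """P1: S -> NP + VP.  Returns a phrase marker, or None if the parse fails."""
    words = sentence.lower().split()
    np = parse_np(words, 0)
    if np is None:
        return None
    vp = parse_vp(words, np[1])
    if vp is None or vp[1] != len(words):
        return None
    return ("S", np[0], vp[0])


print(parse_sentence("The television shows the boring program"))
```

The nested tuples are the phrase marker written sideways:  Each tuple starts with its label and then lists its children.  A string the rules can't produce (like "shows the television") comes back as None.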
C.  Transformational grammars:  There are some constructions you can't handle with phrase structure grammars.  For example, consider the sentences below:
 (2)  John phoned up the woman.
 (3)  John phoned the woman up.
Both sentences have the same verb ("phone-up").  But, the "up" is not always adjacent to the "phone."  This phenomenon is called particle movement.  You can't parse or generate these sentences with a simple phrase-structure grammar.  Problems like this were part of the motivation for transformational grammar's development.  Chomsky is also responsible for this insight.
What is transformational grammar?  Add some concepts to our phrase-structure grammar (both technical and philosophical):
1.  The notion of a deep structure:  Sentences have two levels of analysis:  Surface structure and deep structure.  The surface structure is the sentence that's produced.  The deep structure is an intermediate stage in the production of a sentence.  It's got the words and a basic grammar.
2.  Transformations:  You get from the deep structure to the surface structure by passing it through a set of transformations (hence the name transformational grammar).  These transformations allow you to map a deep structure onto many possible surface structures.
3.  Expand the left side of the rewrite rules:  Transformation rules can have more than one element on the left side.  This is a technical point, but without it you couldn't do transformations.
Why do we need to talk about deep structure?  It explains two otherwise difficult problems:
1.  Two sentences with the same surface structure can have very different meanings.  Consider:
 (4a and 4b)  Flying planes can be dangerous.
This can mean either "planes that are flying can be dangerous" or "the act of flying a plane can be dangerous."  If you allow this surface structure to be produced by two entirely different deep structures, it's no problem.
2.  Two sentences with very different surface structures can have the same meaning.  Consider:
 (5)  Arlene is playing the tuba.
 (6)  The tuba is being played by Arlene.
These both mean the same thing, but how can a phrase structure grammar represent that?  With a deep structure (and transformations) it's easy.
Let's get into this with Chomsky's Toy Transformational Grammar.
1.  Outline:  When producing a sentence you pass the basic thought through four stages:
Transformational grammar stages
2.  Rules:  You have rules at each stage:
a.  Phrase structure rules:
P1:  S -> NP + VP
P2:  NP -> Det + N
P3:  VP -> Aux + V (+ NP)
P4:  Aux -> C (+ M) (+ have + en) (+ be + ing)
b.  Lexical insertion rules:
L1:  Det -> {a, an, the}
L2:  M -> {could, would, should, can, must, ...}
L3:  C -> {ø, -s (singular subject), -past (abstract past marker), -ing (progressive), -en (past participle), ...}
L4:  N -> {cookie, boy, ...}
L5:  V -> {steal, ...}
c.  Transformation rules:
T1:  Affix (C) + V -> V + Affix (affix hopping rule)
T2:  NP1 + Aux + V + NP2 -> NP2 + Aux + be + en + V + by + NP1 (active -> passive transformation)
d.  Morpho-phonological rules:
M1:  steal -> /stil/
M2:  be -> /bi/ etc.
3.  Some examples:
a.  Produce the sentence "The boy steals the cookie."
To see these, check someone's notes.  They're too complex to produce here.
b.  Produce "The cookie is stolen by the boy."
To see these, check someone's notes.  They're too complex to produce here.
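If you don't have the notes handy, here's a simplified sketch of the two derivations.  It is not the full treatment from class.  Assumptions:  The deep structure is handed over already built (the output of the phrase structure and lexical insertion rules), "be" counts as a verb for affix hopping, and the last step uses ordinary spelling instead of the phoneme strings in M1 and M2.  All function and variable names are made up.

```python
AFFIXES = {"-s", "-past", "-ing", "-en"}
VERBAL = {"steal", "be", "have"}   # assumption: be/have can host an affix too

SPELL_OUT = {  # assumed spelling stand-in for the morpho-phonological rules
    ("steal", "-s"): "steals",
    ("steal", "-en"): "stolen",
    ("be", "-s"): "is",
}

# Deep structure: NP1 + Aux + V + NP2, with Aux rewritten as C = -s
deep = {"NP1": ["the", "boy"], "Aux": ["-s"], "V": "steal", "NP2": ["the", "cookie"]}


def active(ds):
    """No T2: just line the pieces up in deep-structure order."""
    return ds["NP1"] + ds["Aux"] + [ds["V"]] + ds["NP2"]


def passive(ds):
    """T2: NP1 + Aux + V + NP2 -> NP2 + Aux + be + en + V + by + NP1"""
    return ds["NP2"] + ds["Aux"] + ["be", "-en", ds["V"], "by"] + ds["NP1"]


def affix_hop(string):
    """T1: Affix + V -> V + Affix, applied left to right."""
    out, i = [], 0
    while i < len(string):
        if string[i] in AFFIXES and i + 1 < len(string) and string[i + 1] in VERBAL:
            out.append((string[i + 1], string[i]))   # verb first, affix second
            i += 2
        else:
            out.append(string[i])
            i += 1
    return out


def spell_out(hopped):
    """Turn each verb + affix pair into a word and join the sentence."""
    words = [SPELL_OUT[x] if isinstance(x, tuple) else x for x in hopped]
    return " ".join(words).capitalize() + "."


print(spell_out(affix_hop(active(deep))))
print(spell_out(affix_hop(passive(deep))))
```

The point to take away:  Both sentences come from the same deep structure.  The only difference is whether T2 applies before the affixes hop.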
4.  Psychological evidence for transformations.  Note before we start that getting this evidence is problematic.  That said, here's some data.
a.  Early studies rewrote sentences into transformed versions.  For what follows, the base sentence is "the man was enjoying the sunshine."

Negative:  "The man was not enjoying the sunshine."
Passive:  "The sunshine was being enjoyed by the man."
Question:  "Was the man enjoying the sunshine?"
Negative + Passive:  "The sunshine was not being enjoyed by the man."
Negative + Question:  "Was the man not enjoying the sunshine?"
Passive + Question:  "Was the sunshine being enjoyed by the man?"
Negative + Passive + Question:  "Was the sunshine not being enjoyed by the man?"
I have you read lots of these and measure reading time.  The more transformations you have to undo, the longer it should take, and that's what happens.  Note the problem, though:  Transformed sentences also tend to have more words, so reading time is confounded with sentence length.
b.  Understanding negatives.  I present you with a long list of statements to verify.  Some are positive statements, some are negative statements.  Untransforming the negative statements should cause trouble.   
D.  Wrap-up.  There's more to this story.  These grammars also have things they can't handle.  We're stopping now because you have a pretty good idea how this works.  Let's look at the other side of the coin. 
IV.  Meaning.
A.  Syntax models have some problems:
1.  Not very elegant.  The computations can be pretty complicated.
2.  Overly powerful.  Why this set of transformations?  Why not transformations to go from "The girl tied her shoe" to "Shoe by tied is the girl"?  There's no good reason to explain the particular set of transformations that people seem to use.
3.  They ignore meaning.  Chomsky's sentence "They are cooking apples" isn't ambiguous in a story about a boy asking a grocer why he's selling ugly, bruised apples.  Sentences always come in a context that can help you understand them.  So, let's yak about meaning.  See the demonstration for an example of context influencing meaning.

Demonstration:  Someone volunteer to read this passage out loud.

"Cinderella was sad because she couldn't go to the dance that night.  There were big tears in her brown dress."

How about this passage?

"The young man turned his back on the rock concert stage and looked across the resort lake.  Tomorrow was the annual fishing contest and fishermen would invade the place.  Some of the best bass guitarists in the world would come to the spot.  The usual routine of the fishing resort would be disrupted by the festivities."

Was there a problem?  What caused it?  What did you do to fix it?
B.  Semantic grammars:  The PSG and transformational grammars parse sentences with empty syntactic categories.  For example, NP doesn't mean anything, it's just a marker to hold a piece of information.  Semantic grammars are different in spirit.  The syntactic representation of a sentence is based on the meaning of the sentence.  For example, consider Fillmore's (1968) case-role grammar.  Cases and roles are the part each element of the sentence plays in conveying the meaning.  You have roles like agent and patient, and cases like location and time.  The verb is the organizing unit.  Everything else in the sentence is related to the verb.  Consider a parse of:

 (7)  Father carved the turkey at the Thanksgiving dinner table with his new carving knife.

The structure is:

 Verb frame
The nodes here have meaning.  You build these structures as you read, and the meaning is in the structure.  You can do things purely syntactic grammars can't.  For example, consider:
 (8)  John strikes me as pompous.
 (9)  I regard John as pompous.
If you analyze these two syntactically, it's hard to see the relationship between the "me" in (8) and the "I" in (9).  But, there is a relationship.  Both are the experiencer of the action.  One problem is the number of cases.
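One way to picture a verb frame is as a record organized around the verb, with a slot for each case or role.  The sketch below is only an illustration.  The labels "instrument," "theme," and "attribute" are assumed for the example (the notes only name agent, patient, location, time, and experiencer), and "speaker" stands in for whoever "me" and "I" refer to.

```python
# Sentence (7): everything hangs off the verb.
carve = {
    "verb": "carve",
    "agent": "Father",
    "patient": "the turkey",
    "location": "the Thanksgiving dinner table",
    "instrument": "his new carving knife",   # assumed label
}

# Sentences (8) and (9): different word order, same role fillers.
strike = {"verb": "strike", "experiencer": "speaker",
          "theme": "John", "attribute": "pompous"}     # theme/attribute assumed
regard = {"verb": "regard", "experiencer": "speaker",
          "theme": "John", "attribute": "pompous"}


def shared_roles(frame_a, frame_b):
    """Roles that have the same filler in both frames (ignoring the verb)."""
    return {role: filler for role, filler in frame_a.items()
            if role != "verb" and frame_b.get(role) == filler}


print(carve["agent"], "carved", carve["patient"])
print(shared_roles(strike, regard))
```

The surface positions of "me" and "I" are different, but in the frames they fill the same slot.  That's the relationship a purely syntactic parse has trouble showing.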
C.  We can also look at meaning using propositions.  I defined propositions last time.  When we talk about reading and look at the Kintsch model, propositions will come into play.  I'll hold off until then. 
V.  Influences.  What determines how you attach meaning to language?  The process operates at several levels.
A.  Words.
1.  Frequency:  In general, the more frequently you encounter something the easier it is to access its meaning (true of words, objects, sounds, etc.).  To compute frequency we count the number of times a word appears in a representative sample of written English (or spoken English).  This has a huge impact on speed of understanding.
2.  Morphology:  Phonemes are groups of sounds that are treated as the same sound by a language.  Changing a phoneme will change the meaning (bit to pit).  But, the phoneme has no meaning (/b/ and /p/ alone don't mean anything).  Morphemes are the meaning units.  These are words, but they're also things like plural (-s), past tense (-ed), ...  The more morphemes you have in a word you're looking up, the slower it is.  It looks like the organization is root word (dog) plus morphological rules (-s = plural; -ed = past...).
3.  Syntactic category:  Nouns are faster than verbs.
4.  Priming:  Experience with related concepts will speed access (either a previous word that's related or a sentence context that's related).  The more context you have to tell you the meaning, the more help you should have looking it up.  This context effect sometimes happens and sometimes doesn't.
5.  Ambiguity:  If it's present, access is slower.  There can be syntactic ambiguity (pardon is a verb and a noun) and semantic ambiguity (our old pal bank).
B.  Lexical access:  Getting at the meaning.  Two possibilities:
1.  Search:  Automatic process that goes through a list arranged by frequency until it gets a match.  The process is not affected by context.  After access happens, you check the recovered meaning against the context.
Support:  Bimodal listening:  Listen to a sentence about a guy looking for bugs in a conference room before an important secret meeting.  After the sentence (or during) visually present a string of letters.  The person says if the string of letters makes a word.  The trick:  The strings that make words can be related to the meaning of "bugs," "SPY" or "ANT."  The context biases towards just one meaning.  But, soon after the word "bugs," both meanings appear to be activated.
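The search model in (1) can be sketched as a serial scan through a frequency-ordered list.  The counts below are invented for illustration (they are not from a real corpus), and the function name is made up.

```python
# Invented counts, for illustration only.
FREQUENCY = {"the": 1000, "dog": 120, "bank": 60, "javelin": 2}

# The list is arranged by frequency, most frequent first.
SEARCH_LIST = sorted(FREQUENCY, key=FREQUENCY.get, reverse=True)


def search(word):
    """Return how many entries were checked before a match, or None.

    Context plays no part here; it would only be consulted after access.
    """
    for steps, entry in enumerate(SEARCH_LIST, start=1):
        if entry == word:
            return steps
    return None


for w in ("the", "javelin", "frug"):
    print(w, search(w))
```

Frequent words are found after fewer comparisons, which is how a model like this would produce the frequency effect described under Words.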
2.  Direct access:  The memory is content-addressable, meaning you focus right in on what you're looking for instead of searching.  This model is influenced by context.
Support:  Neighborhood effects.  A neighborhood is all of the orthographically similar words to a target word.  For example, the neighborhood around game is:  gave, gape, gate, came, fame, tame, lame, same, name, gale.  Neighborhoods can be big or small.  The neighborhood around film is small:  file, fill, firm.  If neighborhood size affects access, it's evidence for context effects (kind of like word superiority).  Neighborhood size does affect access.
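To make "neighborhood" concrete, here is a sketch that assumes "orthographically similar" means same length with exactly one letter changed.  That definition fits the examples above, but it's an assumption.  The mini-lexicon is just the words from the examples plus one filler.

```python
LEXICON = {
    "game", "gave", "gape", "gate", "came", "fame", "tame", "lame",
    "same", "name", "gale", "film", "file", "fill", "firm", "dog",
}


def neighbors(target, lexicon):
    """Words of the same length that differ from target in exactly one letter."""
    return sorted(
        w for w in lexicon
        if len(w) == len(target)
        and sum(a != b for a, b in zip(w, target)) == 1
    )


print(len(neighbors("game", LEXICON)), neighbors("game", LEXICON))
print(len(neighbors("film", LEXICON)), neighbors("film", LEXICON))
```

With this lexicon, "game" gets the ten neighbors listed above and "film" gets three.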
As an aside, gesture also seems to affect lexical access.  Frick-Horbury and Guttentag (1998) showed that people whose hands were restricted retrieved fewer words given the definition than people who could move their hands.  So, for example, if I give you this definition "a slender shaft of wood tipped with iron and thrown for distance in an athletic field event," you should retrieve "javelin."  Somehow, holding a rod (which restricts hand movements) interferes with retrieval.  Think about what we said about semantic memory last time.  Can those models incorporate embodiment?
C.  Sentences.  Most of the action has to do with ambiguity.  You have to break the sentence up into its components, and then you have to link those components into some kind of structure.  To do this, you have some strategies.  People have a very limited capacity working memory.  This means that any processes we propose have to fit in that capacity.  The problem of trying to do it with limited resources will be the driving force behind the strategies.  They're ways to minimize working memory load.  One way to minimize load is to make the immediacy assumption:  When people encounter ambiguity they make a decision right away.  This can cause problems if the decision is wrong, but it saves capacity in the meantime.  Consider a seemingly unambiguous sentence like:
 (10)  John bought the flower for Susan.
It could be that John's giving it to Susan, but he could also be buying it for her as a favor.  The idea is that you choose one right away.  Why?  Combinatorial explosion.  If you have just four ambiguities in a sentence with two options each, you're maintaining 16 possible parses by the end.  Your capacity is 7±2 items, you do the math.  To see what happens when you have to hold all of the information in a sentence in memory during processing, try to read:
 (11)  The plumber the doctor the nurse met called ate the cheese.
The problem is you can't decide on a structure until very late in the sentence, meaning you're holding it all in memory.  If I complicate the sentence a bit but reduce memory load, it gets more comprehensible:
 (12)  The plumber that the doctor that the nurse met called ate the cheese.
Or, make it even longer but reduce memory load even more:
 (13)  The nurse met the doctor that called the plumber that ate the cheese.
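Going back to the combinatorial explosion:  If you want to do the math, here's a sketch of the brute force count.  It assumes every ambiguity is independent of the others, so the number of parses is just the product of the options.  The function name is made up.

```python
from itertools import product


def all_parses(options_per_ambiguity):
    """Brute force: every combination of choices across the ambiguities."""
    return list(product(*(range(n) for n in options_per_ambiguity)))


for k in range(1, 5):
    held = len(all_parses([2] * k))
    print(k, "ambiguities ->", held, "parses to hold")
```

With the immediacy assumption you commit at each ambiguity, so you're only ever holding one parse.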
Now that we've looked at the constraints supplied by working memory capacity and the immediacy assumption, let's look at the strategies.  Note:  These are strategies.  They don't always work, and you won't always use them.  But, they're a good solution to the brute force problem.
1.  Getting the clauses (NPs, VPs, PPs, etc.).
a.  Constituent strategy:  When you hit a function word, start a new constituent.  Some examples:
Det:  Start NP.
Prep:  Start PP.
Aux:  Start VP.
b.  Content-word strategy:  Once you have a constituent going, look for content words that fit in that constituent.  An example:
Det:  Look for Adj or N to follow.
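Strategies (a) and (b) can be sketched as a simple chunker:  Open a new constituent at each function word, and keep adding the words that follow to it.  The tiny function-word list and the example sentence are made up, and the "?" label for a constituent that doesn't start with a function word is an assumption.

```python
STARTS = {"Det": "NP", "Prep": "PP", "Aux": "VP"}

# Tiny assumed lexicon of function words.
FUNCTION_WORDS = {"the": "Det", "a": "Det", "on": "Prep", "in": "Prep",
                  "was": "Aux", "is": "Aux"}


def chunk(sentence):
    """Split a sentence into flat constituents, one per function word."""
    constituents = []
    for w in sentence.lower().split():
        category = FUNCTION_WORDS.get(w)
        if category in STARTS or not constituents:
            # Constituent strategy: a function word opens a new constituent.
            constituents.append((STARTS.get(category, "?"), [w]))
        else:
            # Content-word strategy: attach to the constituent in progress.
            constituents[-1][1].append(w)
    return constituents


for label, words in chunk("The dog was sleeping on the porch"):
    print(label, words)
```

The result is flat:  "on" opens a PP and "the" immediately opens a new NP.  Hooking "the porch" up under the PP is a separate problem.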
c.  Noun-verb-noun:  Overall strategy for the sentence.  First noun is agent, verb is action, second noun is patient.  Apply this model to all sentences as a first try.  Why?  It gets a lot of them correct.  So, you might as well make your first guess something that's usually right.  We know people do this because of garden-path sentences (sentences that lead you down a path to the wrong interpretation).  Example:
 (14)  The editor authors the newspaper hired liked laughed.

You want authors to be a verb, but when you find out it isn't, you have to go back and recompute.
d.  Clausal:  Make a representation of each clause, then discard the surface features.  For example:
 (15)  Now that artists are working in oil prints are rare.  (863 ms)
 (16)  Now that artists are working longer hours oil prints are rare.  (794 ms)
In (15), "oil" is not in the last clause; in (16) it is.  The access time for "oil" after reading the sentence is in parentheses.  When it's not in the current clause, it takes longer, as if you've discarded it.
2.    Once you get the clauses, how do you hook them up?  More strategies:
a.  Late closure:  The basic strategy is to attach new information under the current node.  Consider the parse for:
 (17)  Tom said Bill ate the cake yesterday.
(We'll need some new rules in our PSG to pull it off.  I'm skipping those to produce final phrase markers.  Again, the markers are too complex to reproduce here.)

According to late closure, "yesterday" modifies when Bill ate the cake, not when Tom said it (is that how you interpreted the sentence?)  It could modify when Tom said it, but that would require going up a level in the tree to the main VP and modifying the decision you made about it (that it hasn't got an adverb).  That's a huge memory burden (once you've parsed the first part of the sentence, you probably threw out that part of the tree to make room).  So, late closure eases memory load by attaching where you're working without backtracking.
Evidence:  Have people read things like:
 (18)  Since J. always jogs a mile seems like a very short distance to him.
With eye-tracking equipment, you can see people slow down on "seems" to rearrange the parse because they initially attach "a mile" to jogs when they shouldn't.
b.  Minimal attachment:   Make a phrase marker with the fewest nodes.  It reduces load by minimizing the size of the trees produced.  Consider:
 (19)  Ernie kissed Marcie and Joan...
The markers are too complex to reproduce here.
The minimal attachment tree has 11 nodes vs. 13 for the other (depending on how you count).  It's also less complex.  The idea is that if you can keep the whole tree in working memory (you don't have to throw out parts to make room), then you can parse more efficiently.
Evidence:  Consider:
 (20)  The city council argued the mayor's position forcefully.
 (21)  The city council argued the mayor's position was incorrect.
In (21) minimal attachment encourages you to make the wrong tree and you have to recompute.
3.  Ambiguity.  What does ambiguity do to the processing of sentences?  Loosely speaking, it makes things harder.  (Note:  In the next unit we'll look at individual differences in dealing with ambiguity.  This has a large impact on reading skill.)  MacKay (1966) asked participants to complete sentences.  They could be ambiguous, such as:
 (22)  Although he was continually bothered by the cold... (cold)
 (23)  Although Hannibal sent troops over a week ago... (over)
 (24)  Knowing that visiting relatives could be bothersome... (visiting relatives)
The ambiguous part of each sentence is in parentheses.  Completing these ambiguous versions took longer than completing non-ambiguous counterparts.  Note that ambiguity is really a part of any language understanding task.  If it weren't for ambiguity, comprehending language would be easy.

Cognitive Psychology Notes 11
Will Langston
