NLP
Ambiguity in natural language
-
e.g.
-
Do you sell (Sony laptops) and (disk drives)?
-
Do you sell (Sony (laptops and disk drives))?
-
-
ambiguity is resolved by humans using real world knowledge
-
generalised NLP requires AI-complete understanding of real world knowledge
-
hence we normally look for applications which work on limited domains, or where we can approximate full world knowledge
Information retrieval:
-
returning a set of documents in response to a user query
-
e.g. search engines
Information extraction
-
attempting to extract structured information from a set of documents
-
e.g. extraction of instances of corporate mergers, more formally MergerBetween(company1,company2,date) from an online news sentence
Question answering
-
attempt to find a specific answer to a specific question from a set of documents
Machine translation
-
automatic translation between languages
Deep NLP applications
-
ones which require meaning representation or elaborate syntactic representation
Components of deep NLP application
-
input preprocessing (to get input into segmented text)
-
morphological analysis
-
POS tagging (optional)
-
parsing
-
disambiguation (can be done in parsing)
-
context module: maintains context information, e.g. for anaphora resolution
-
text planning: decide what meaning to convey (not covered)
-
tactical generation: generates strings from the meaning representation; can use the same grammar and lexicon as the parser
-
morphological generation
-
output processing
Morphology
-
morphemes
-
affixes: morphemes which can only exist in conjunction with a stem (in English, prefixes and suffixes)
-
stems: morphemes which can exist on their own
-
circumfixes do not exist in English; there is some inflection by internal vowel change, e.g. sing, sang, sung, but it is no longer productive
-
i.e. doesn't apply to new words like “ping”
-
-
-
inflectional morphology
-
concerns properties such as gender, number, aspect, case
-
setting values of a fixed number of slots with simple values
-
close to being fully productive
-
-
derivational morphology
-
e.g. un-, re-, anti-, -ee
-
no limit to number of them that can be combined
-
is relatively productive
-
e.g. -ee applies to the verb text (textee) but not to snore (snoree sounds odd)
-
-
can change POS (plural -> pluralise, noun -> verb)
-
-
some stems and affixes can be individually ambiguous
-
un – ion – ise – ed
-
union – ise – ed
-
-
spelling rules, e.g. e-insertion
ε → e / {s, x, z} ^ _ s
-
-
the mapping is shown to the left of the slash
-
the context is shown to the right of the slash
-
where
-
^ indicates the affix boundary
-
ε is the empty string
-
_ indicates where the item which is being mapped is, in the context
-
i.e. here, the ε
-
-
-
applies equally to plural form of nouns and 3rd person singular present form of verbs
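The rule notation can be tried out with a small sketch. It assumes the rule being described is e-insertion (an "e" is inserted at the affix boundary when the stem ends in s, x or z and the affix is "s"); the function name `surface_form` and the use of a regular expression are illustrative choices, not part of the notes.

```python
import re

# Assumed rule: epsilon -> e / {s, x, z} ^ _ s
# "^" marks the affix boundary in the underlying form, e.g. "box^s".
# The lookbehind is the left context, the lookahead is the right context.
E_INSERTION = re.compile(r"(?<=[sxz])\^(?=s$)")


def surface_form(underlying: str) -> str:
    """Map an underlying form such as 'box^s' to its surface spelling."""
    # Insert "e" after the boundary wherever the context matches.
    with_e = E_INSERTION.sub("^e", underlying)
    # Finally erase the affix boundary marker.
    return with_e.replace("^", "")


for form in ["box^s", "cat^s", "miss^s"]:
    print(form, "->", surface_form(form))
```

The same function covers noun plurals and third person singular verbs, since the rule only looks at spelling, not at the category of the stem.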
-
-
stemming
-
used in IR
-
reduce morphologically complex terms to a canonical form
-
strip endings without using a lexicon
-
not necessarily the same thing as the lexical stem
-
-
Porter stemmer algorithm is commonly used
-
-
lexical information required for full, high precision morphological processing
-
affixes, plus associated information conveyed by the affix
-
irregular forms, with associated information similar to that for affixes
-
stems with syntactic categories (plus more information if derivational morphology is to be treated as productive)
-
e.g.
ed PAST_VERB
ed PSP_VERB
s PLURAL_NOUN
...
began PAST_VERB begin
begun PSP_VERB begin
-
-
normally irregular forms of English are the same for all senses of a word (run for athlete or nose)
-
exception: the washing was hung (not hanged) out to dry, vs. the murderer was hanged
-
-
assume the morphological analyser is split into two stages, e.g. for feed
-
-
-
the first stage produces feed and fee-ed
-
the second stage cuts out fee-ed using syntactic information
-
-
overgeneration: a system which generates some output which is invalid
-
finite state transducers
-
can be used for analysis and generation (two level morphology)
-
outputs “boxes” as
-
boxes
-
box^s
-
boxe^s
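The three candidate analyses can be reproduced with a few lines of ordinary code. This is a plain Python stand-in rather than a real transducer, it handles only the suffix -s, and it assumes the e-insertion contexts are stems ending in s, x or z; the name `analyses` is hypothetical.

```python
def analyses(surface: str) -> list[str]:
    """Return candidate segmentations of a surface form for the suffix -s."""
    results = [surface]  # the word might be an unanalysed stem
    if surface.endswith("s"):
        stem = surface[:-1]
        results.append(stem + "^s")  # plain suffixation, e.g. boxe^s
        if stem.endswith("e") and stem[:-1].endswith(("s", "x", "z")):
            results.append(stem[:-1] + "^s")  # undo e-insertion, e.g. box^s
    return results


print(analyses("boxes"))
print(analyses("cats"))
```

Like the transducer, the sketch overgenerates: it proposes boxe^s without knowing whether "boxe" is a stem, so a lexicon is still needed to filter the candidates.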
-
-
do not capture any internal structure of the word form
-
can make an FST which maps unionised into un-ion-ise-ed, but the output doesn't include the bracketing
-
needs to be (un – ((ion – ise) – ed))
-
-
-
other finite state techniques in NLP
-
grammars for simple dialogue systems (for anything of substantial complexity use CFGs)
-
dialogue models for Spoken Dialogue Systems (SDSs), e.g.:
-
-
probabilistic FSAs
-
augment FSA with transition probabilities
-
Prediction and POS tagging
-
Corpora
-
corpus: body of text collected for some purpose
-
balanced corpus: contains texts which represent different genres (newspapers, fiction, cooking recipes, scientific journals, etc.)
-
-
prediction
-
given a sequence of words, want to determine what is most probable to come next
-
N-gram: predict a word based on the previous (n-1) words
-
useful for
-
speech recognition
-
cannot tell a word by its sound alone
-
-
communication aids (systems for people who cannot speak due to some disability)
-
estimating entropy in a language
-
entropy indicates difficulty of prediction problem
-
e.g. system with only “yes” and “no” is easier than with unlimited vocabulary
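As a rough illustration of why the yes/no system is easier, the entropy of a distribution over next words can be computed directly. The sketch assumes Shannon entropy in bits and, purely for illustration, equally likely words; the 1024-word vocabulary is an invented comparison point.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


yes_no = [0.5, 0.5]               # two equally likely words
big_vocab = [1 / 1024] * 1024     # 1024 equally likely words (illustrative)

print(entropy(yes_no))     # 1 bit per word
print(entropy(big_vocab))  # 10 bits per word
```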
-
-
-
-
bigrams
-
probability of one word given the previous one, i.e. $P(w_n \mid w_{n-1})$
-
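the probability of a string of words is approximately
$P(w_1 \ldots w_n) \approx \prod_{k=1}^{n} P(w_k \mid w_{k-1})$
-
(approximate, because each word is assumed to depend only on the previous word)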
-
-
bigram probabilities are estimated from corpus counts $C$ by
$P(w_n \mid w_{n-1}) = \frac{C(w_{n-1}\, w_n)}{C(w_{n-1})}$
-
e.g.
-
good morning
good afternoon
good afternoon
it is very good
it is good
as a corpus:
<s> good morning <s> good afternoon <s> good afternoon <s> it is very good <s> it is good <s>
-
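| sequence | count | bigram probability |
| <s> | 5 | |
| <s> good | 3 | 0.6 |
| <s> it | 2 | 0.4 |
| good | 5 | |
| good morning | 1 | 0.2 |
| good afternoon | 2 | 0.4 |
| good <s> | 2 | 0.4 |
| morning | 1 | |
| morning <s> | 1 | 1 |
| afternoon | 2 | |
| afternoon <s> | 2 | 1 |
| it | 2 | |
| it is | 2 | 1 |
| is | 2 | |
| is very | 1 | 0.5 |
| is good | 1 | 0.5 |
| very | 1 | |
| very good | 1 | 1 |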
-
-
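this leads to unseen bigrams such as <s> very being given probability zero, so a sentence like “very good” is treated as impossible
-
usually smooth the estimates by making some assumption about rare/unseen data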
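The counts and probabilities for the five-sentence corpus can be recomputed with a short script. It is a minimal sketch using unsmoothed relative frequencies; the helper names are hypothetical, and a word is counted only where it starts a bigram, which is why <s> has a count of 5 rather than 6.

```python
from collections import Counter

sentences = ["good morning", "good afternoon", "good afternoon",
             "it is very good", "it is good"]

# Build the token stream with <s> as the sentence boundary marker.
tokens = ["<s>"]
for sentence in sentences:
    tokens += sentence.split() + ["<s>"]

bigram_counts = Counter(zip(tokens, tokens[1:]))
context_counts = Counter(tokens[:-1])  # the final <s> starts no bigram


def bigram_probability(prev, word):
    """Relative frequency estimate of P(word | prev); prev must occur in the corpus."""
    return bigram_counts[(prev, word)] / context_counts[prev]


def sequence_probability(words):
    """Product of bigram probabilities along the sequence."""
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= bigram_probability(prev, word)
    return p


print(bigram_probability("<s>", "good"))
print(bigram_probability("<s>", "very"))  # unseen bigram
print(sequence_probability(["<s>", "good", "<s>"]))
```

The unseen bigram comes out as exactly zero, which is the sparse data problem that smoothing is meant to address.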
-
-
-
POS tagging
-
can apply prediction techniques to this
-
some basic tags
-
NN1 singular noun
-
NN2 plural noun
-
PNP personal pronoun
-
VM0 modal auxiliary verb
-
VVB base form of verb (except infinitive)
-
VVI infinitive form of verb
-
PUN punctuation
-
-
stochastic POS tagging
-
use previous prediction techniques, but on the tag type rather than the word
-

-
-
evaluation factors
-
training and test data must be separate
-
if there has been much hand coding the test data must be unseen by the researchers
-
-
baselines
-
ceiling
-
error analysis
-
error rate for different problems will be distributed very unevenly
-
for a particular application some errors may be of more import than others
-
-
-
Parsing and generation
-
generative grammar: formally specified grammar which generates all and only the sentences in a natural language
-
equivalence of two grammars
-
weakly equivalent if they generate the same strings
-
strongly equivalent if they generate the same strings with the same bracketing
-
-
context free grammar
-
set of non-terminal symbols
-
set of terminal symbols
-
set of productions (each rewrites a single non-terminal as a sequence of non-terminals and terminals)
-
start symbol, which is a non-terminal
-
-
left-linear and right-linear grammars are weakly equivalent to regular grammars (ones implementable by FSAs)
-
lexical ambiguity: “they can fish”
-
structural ambiguity: “they fish in rivers in December”
-
can either attach the prepositional phrase “in December” to the NP “rivers” or to the VP “fish in rivers”
-
-
generation:
-
can be done with CFG
-
generates arbitrarily long sentences
-
-
chart parsing
-
the chart is a list of edges, of the form
-
[id, left_vertex, right_vertex, mother_category, daughters]
-
-
ambiguity is shown when there are multiple edges for the same span (left and right vertices)
-
algorithm
-
Parse:
  Initialise the chart (i.e., clear previous results)
  For each word word in the input sentence, let from be the left vertex, to be the right vertex and daughters be (word)
    For each category category that is lexically associated with word
      Add new edge from, to, category, daughters
  Output results for all spanning edges
  (i.e., ones that cover the entire input and which have a mother corresponding to the root category)
Add new edge from, to, category, daughters:
  Put edge in chart: [id,from,to, category,daughters]
  For each rule in the grammar of form lhs -> cat1 . . . catn−1,category
    Find set of lists of contiguous edges [id1,from1,to1, cat1,daughters1] . . . [idn−1,fromn−1,from, catn−1,daughtersn−1]
    (such that to1 = from2 etc)
    (i.e., find all edges that match a rule)
    For each list of edges
      Add new edge from1, to, lhs, (id1 . . . id)
      (i.e., apply the rule to the edges)
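The pseudocode above can be sketched in Python as follows. The grammar and lexicon are a small toy fragment assumed for illustration (chosen to cover the two example sentences used earlier); the function names are hypothetical, and no packing is done.

```python
# Toy grammar and lexicon (assumed for illustration only).
GRAMMAR = [
    ("S", ["NP", "VP"]),
    ("VP", ["VP", "PP"]),
    ("VP", ["V"]),
    ("VP", ["V", "NP"]),
    ("VP", ["V", "VP"]),
    ("NP", ["NP", "PP"]),
    ("PP", ["P", "NP"]),
]
LEXICON = {
    "they": ["NP"], "can": ["V"], "fish": ["V", "NP"],
    "in": ["P"], "rivers": ["NP"], "December": ["NP"],
}


def parse(words, root="S"):
    chart = []  # edges: (id, left_vertex, right_vertex, category, daughters)

    def left_sequences(cats, right):
        """Yield (leftmost vertex, edge ids) for contiguous edges matching cats and ending at right."""
        if not cats:
            yield right, []
            return
        for eid, left, r, cat, _ in list(chart):
            if r == right and cat == cats[-1]:
                for start, ids in left_sequences(cats[:-1], left):
                    yield start, ids + [eid]

    def add_edge(left, right, category, daughters):
        eid = len(chart)
        chart.append((eid, left, right, category, daughters))
        # Apply every rule whose rightmost daughter is the new edge's category.
        for lhs, rhs in GRAMMAR:
            if rhs[-1] == category:
                for start, ids in list(left_sequences(rhs[:-1], left)):
                    add_edge(start, right, lhs, tuple(ids + [eid]))

    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            add_edge(i, i + 1, category, (word,))

    # Spanning edges: cover the whole input and have the root category.
    return [e for e in chart if e[1] == 0 and e[2] == len(words) and e[3] == root]


for sentence in ["they can fish", "they fish in rivers in December"]:
    print(sentence, "->", len(parse(sentence.split())), "spanning edges")
```

With this toy grammar each sentence yields two spanning edges, one per reading; the daughters field of each edge holds the ids needed to read off the tree.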
-
-
packing
-
the above runs in exponential time when there are an exponential number of parses
-
packing modifies it to run in cubic time even with an exponential number of parses (enumerating every parse in the output is still exponential)
-
modifications
-
change the daughters element from being a list of edges to being a set of lists of edges
-
when about to add
-
-
-
[id, left_vertex, right_vertex, mother_category, daughters]
and there is an existing edge
[id_old, left_vertex, right_vertex, mother_category, daughters_old]
modify the old edge to give
[id_old, left_vertex, right_vertex, mother_category, daughters_old U daughters]
-
-
active parsing
-
modify the edge to
-
-
[id, left_vertex, right_vertex, mother_category, expected_category, daughters]
-
-
-
generally more efficient for parsing FS grammars
-
-
probabilistic CFGs
-
assumes probabilities of the rules and the lexical entries in a grammar are independent (incorrect, but generally effective)
-
can machine learn these probabilities from a corpus
-
use to rank parses and only return the most probable ones
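Under the independence assumption, the probability of a parse is simply the product of the probabilities of the rules and lexical entries it uses. The sketch below ranks the two readings of "they can fish"; every probability in it is invented for illustration, and trees are written as nested tuples.

```python
# Invented probabilities for illustration; each left-hand side sums to 1.
PROBS = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.5,
    ("VP", ("V", "VP")): 0.3,
    ("VP", ("V",)): 0.2,
    ("NP", ("they",)): 0.5,
    ("NP", ("fish",)): 0.5,
    ("V", ("can",)): 0.6,
    ("V", ("fish",)): 0.4,
}


def tree_probability(tree):
    """Multiply the probabilities of every rule and lexical entry in the tree."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return PROBS[(label, (children[0],))]  # lexical entry
    rhs = tuple(child[0] for child in children)
    p = PROBS[(label, rhs)]
    for child in children:
        p *= tree_probability(child)
    return p


# "they can fish": put fish into cans vs. be able to fish
canning = ("S", ("NP", "they"), ("VP", ("V", "can"), ("NP", "fish")))
able = ("S", ("NP", "they"), ("VP", ("V", "can"), ("VP", ("V", "fish"))))

ranked = sorted([canning, able], key=tree_probability, reverse=True)
for tree in ranked:
    print(tree_probability(tree), tree)
```

Which reading wins depends entirely on the numbers, which is why the probabilities need to be learned from parsed data rather than set by hand.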
-
-
-
reasons FSAs cannot be used to model NL syntax
-

-
if the depth of recursion is bounded then the language can be generated by an FSA, but it might need to be unbounded (this is undecided)
-
-
FSA grammars are highly redundant (hard to build/maintain)
-
FSA grammars don't have internal structure, and hence we can't build up good semantic representations
-
-
CFGs can be automatically compiled into approximately equivalent FSAs by putting bounds on the recursion
Parsing with constraint based grammars
-
deficiencies of CFGs
-
exponential blow up of rules to model complexity, e.g. singular and plural
-
S -> NP-sg VP-sg
S -> NP-pl VP-pl
VP-sg -> V-sg NP-sg
VP-sg -> V-sg NP-pl
VP-pl -> V-pl NP-sg
VP-pl -> V-pl NP-pl
NP-sg -> he
NP-sg -> fish
NP-pl -> fish
-
-
sub-categorisation cannot be captured without a similar multiplication of categories and rules
-
lexical property that, among other things, tells how many arguments a verb can have
-
e.g. “give” is ditransitive “A gave B to C” - B and C are the two arguments
-
-
long-distance dependencies are difficult to do
-
basic sentence: “Harry likes the witch”
-
modify to: “the witch who Harry likes GAP(NP)”
-
“the witch”, a NP, is moved to the front, and a relative pronoun is inserted
-
the gap is where the NP would normally appear, e.g.
-
which kids did you say _ were making all that noise?
-
NB, “what kid did you say _ were making all that noise” is ungrammatical, because agreement must hold across the gap
-
-
which problem did you say you don't understand _ ?
-
-
-
-
Feature structures – from the book
-
constraint based grammar – is a set of constraints which have to be unified/satisfied
-
types
-
a language is a system of linguistic entities (words, phrases, categories, sounds...)
-
a type is the class of such an entity
-
the properties employed in the type classification will be those that we wish to refer to in our description of the entities
-
| Type | Features/values | IST |
| entity | [name: string] [tel: number] | |
| organisation | [founders: list(individual)] | entity |
| university | [president: individual] | organisation |
| department | [chair: individual] | organisation |
| individual | [birthday: date] | entity |
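The type table can be read as a small inheritance hierarchy. The sketch below assumes that IST means immediate supertype and that a type inherits the features of its supertypes; the dictionary layout and the name `features_of` are illustrative choices.

```python
# Type hierarchy from the table: each type has an immediate supertype (IST)
# and introduces some features of its own.
TYPES = {
    "entity": {"ist": None, "features": {"name": "string", "tel": "number"}},
    "organisation": {"ist": "entity", "features": {"founders": "list(individual)"}},
    "university": {"ist": "organisation", "features": {"president": "individual"}},
    "department": {"ist": "organisation", "features": {"chair": "individual"}},
    "individual": {"ist": "entity", "features": {"birthday": "date"}},
}


def features_of(type_name):
    """Collect the features a type introduces plus those inherited via the IST chain."""
    features = {}
    while type_name is not None:
        entry = TYPES[type_name]
        for feature, value_type in entry["features"].items():
            features.setdefault(feature, value_type)
        type_name = entry["ist"]
    return features


print(features_of("university"))
print(features_of("individual"))
```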
-
combining FSs
-
can combine FSs if they satisfy constraints (such as types, values, etc.)
-
[tel 01494 769470 ]
is compatible with
[individual ]
[name: Austin ]
but not with
[tel 01223 123456 ]
-
(S (NP (D the) (NOM(N defendant))) (VP(V disappeared)))
-
-
phrases: S, NP, NOM, VP
-
words: D, N, V (not “the” or “defendant”)
-
-
vocabulary
-
agr = agreement
-
aux = auxiliary (verb)
-
comps = obj = complements (word coming after, like an object)
-
val = valence
-
nom = everything in a NP except the det
-
det = determiner
-
spr = subj = specifier (word coming before, like a determiner)
-
itr = intransitive
-
str = strict transitive
-
dtr = ditransitive
-
per = person
-
num = number
-
sg = singular
-
pl = plural
-
-
transitive verb: one which requires one or more (direct) objects
-
the committee named a new chairman
-
-
intransitive verb: one which takes no object
-
we would like to stay longer, but we must leave
-
-
note: aux and agr are not associated with words directly; instead, words have a feature head whose value is always a FS of type pos, and aux and agr are features of that value
V = [word ]
[head verb ]
IV = [word ]
[head verb ]
[val [ val_cat ] ]
[ [comps itr ] ]
TV = [word ]
[head verb ]
[val [ val_cat ] ]
[ [comps str ] ]
DTV = [word ]
[head verb ]
[val [ val_cat ] ]
[ [comps dtr ] ]
VP = [phrase ]
[head verb ]
[val [val_cat ] ]
[ [comps itr ] ]
[ [spr - ] ]
N = [word ]
[head noun ]
NP = [phrase ]
[head noun ]
[val [val_cat ] ]
[ [comps itr ] ]
[ [spr + ] ]
NOM= [phrase ]
[head noun ]
[val [val_cat ] ]
[ [comps itr ] ]
[ [spr - ] ]
D= [word ]
[head det ]
[val [comps itr ] ]
[ [spr + ] ]
[spr - ] means needs a specifier on the left
[spr + ] means doesn't need a specifier
-
-
can now refer to IV, DTV and TV as one class: words that are in [head verb]
-
-
assume that all expressions specify values for a feature called HEAD
-
POS of a phrase depends on the POS of one particular daughter, the head daughter
-
-
agreement with features
-
agr is a feature of certain types of pos
-
hence appears in the head value
-
-
mother node hence has same agr specification as its head daughter
-
values are from {1st, 2nd, 3rd}, {sg, pl}
-
e.g.
-
[agr-cat ]
[per 3rd ]
[num sg ]
-
Grammar rules












-
Head Feature Principle (HFP)
-
in any headed phrase the head value of the mother and the head value of the head daughter (denoted by H) must be identical
-
use to factor out properties common to all headed phrases
-
e.g
-






where


Simpler version:


-
complements
-
allow comps to be a list of (complex) FSs
-
e.g. < NP, NP> for hand (someone something)
-
<> indicates an intransitive verb (requires no complements/objects)
-
-
allow comps to be alternatives
-
e.g. <NP> | <S> for believe (the jury believed the evidence/the jury believed the man was telling the truth)
-
-
head-complement rule:
-



-
-
complements vs. modifiers
-
modifiers (e.g. PPs) further refine the description of a situation
-
complements refer to essential participants in a situation
-
-
-
specifiers
-
allow spr to be a list of (complex) FSs
-
e.g. [SPR <[head det]>]
-
-
head specifier rule
-



-
the valence principle: unless the rule says otherwise the mother's values for the val features (spr and comps) are identical to that of the head daughter
-
.... finish later?
Feature structures (from the notes)
-
FSs can be represented as single rooted DAGs
-
features are the labels on the arcs
-
nodes are either
-
atomic (no arcs coming out) or
-
complex (some arcs coming out)
-
-
can have re-entrant nodes (i.e. more than one arc coming into it)
-
represented by [i] or {i} in the feature structures
-
-
underspecified values (which can unify with anything) are represented by empty square brackets []
-
simple feature structure rules
-
the verb object (head complement) rule
-

-
-
-
or alternatively represented as
-
-

-
-
the subject verb (head specifier) rule
-

-
-
-
or alternatively represented as
-
-

-
-
with a root feature structure of
-

-
unification is the tool by which this is done
-
can be done in any order, and many different algorithms exist
-
-
formally
-
properties of FSs
-
connectedness and unique root
-
unique features – all labels exiting a node must have unique names
-
no cycles
-
values – a node with no exiting arcs is a value
-
finiteness
-
paths – sequences of features (arcs)
-
-
subsumption
-
FS1 subsumes FS2 iff
-
for every path P in FS1
-
there is a path P in FS2
-
if P has an atomic value t in FS1 then P has a value t in FS2
-
-
for every pair of paths P Q in FS1 which are re-entrant they are also re-entrant in FS2
-
-
informally FS2 has more information
-
-
unification
-
unification of FS1 and FS2 is FS3 such that
-
FS1 subsumes FS3
-
FS2 subsumes FS3
-
FS3 is the most general FS that satisfies the above two conditions
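Subsumption and unification can be sketched over nested dictionaries. This is a deliberately simplified model: atomic values are strings, the empty dictionary stands for the underspecified value [], and types and re-entrancy are not modelled (the "type" feature in the example is a stand-in). The function names are hypothetical.

```python
class UnificationFailure(Exception):
    """Raised when two feature structures carry incompatible information."""


def unify(fs1, fs2):
    """Return the most general structure containing the information in both inputs."""
    if fs1 == {}:          # [] unifies with anything
        return fs2
    if fs2 == {}:
        return fs1
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)
        for feature, value in fs2.items():
            result[feature] = unify(result[feature], value) if feature in result else value
        return result
    if fs1 == fs2:         # identical atomic values
        return fs1
    raise UnificationFailure(f"{fs1!r} clashes with {fs2!r}")


def subsumes(general, specific):
    """True if every path and atomic value in general is also present in specific."""
    if general == {}:
        return True
    if isinstance(general, dict):
        return isinstance(specific, dict) and all(
            f in specific and subsumes(v, specific[f]) for f, v in general.items())
    return general == specific


tel = {"tel": "01494 769470"}
person = {"type": "individual", "name": "Austin"}
combined = unify(tel, person)
print(combined)
print(subsumes(tel, combined), subsumes(person, combined))

try:
    unify(tel, {"tel": "01223 123456"})
except UnificationFailure as failure:
    print("failed:", failure)
```

The result is subsumed by both inputs, matching the definition above, and the clash between the two telephone numbers makes unification fail.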
-
-
-
-
head feature
-
this is the feature which passes the most important part of the feature structure up to the next level
-
e.g. in a VP it is the V part, as any objects (complements) or subjects (specifiers) are being attached on to the verb
-
includes the type, the agreement, etc.
-
-
-
filled
-
this is an atomic value, so it cannot unify with any complex FS
-
is used to indicate that e.g. a specifier/subject is not required
-
-
slightly more complicated FS


-
parsing
-
the lexical entry FS's and the root FS act as constraints on the parse
-
have to make a parse which is subsumed by all the relevant constraints
-
naïve implementation
-
do as chart parsing
-
when application of a grammar rule is checked
-
all FS's in the edges in the chart that correspond to the possible daughters have to be copied
-
the grammar rule FS has to be copied
-
attempt to unify the copied FS's with the daughter positions in the copy of the rule
-
if unification succeeds the copied structure is associated with a new edge on the chart
-
-
-
“the fish in the lake which is near the town swim”
-
fish (in the lake) (which is near the town)
-
if fish has its FS agr value set to sg in one parse attempt, then that FS cannot be used again for other parse attempts
-
hence copies have to be made when decisions like this are made
-
-
-
copying can be reduced by
-
doing efficient pre-tests so unification is only attempted when it is likely to succeed
-
sharing parts of FS's which aren't changed
-
taking advantage of linguistic locality principles which limit the need to distribute information through structures
-
-
-
templates

INTRANS_VERB
fish INTRANS_VERB
sleep INTRANS_VERB
snore INTRANS_VERB
-
-
the specific lexical entry may have additional information, which is resolved by unification
-
-
morphology
-
inflectional morphology
-

-
-
-
can do inflectional morphology simply by just unifying the stem with the affix
-
-
derivational
-
cannot do as simply since affixes can change, for example a nominal (noun) into a verbal
-
lemma to lemmatize (using -ize)
-
-
i.e. can convert feature structures into new structures
-
-
-
semantics
-
e.g. they like fish
-

-
-
lexical entries can be augmented with FS's to include predicate argument structures
-
NB, the syntactic argument positions are linked to semantic argument positions
-
e.g. “like”
-
the syntactic subject corresponds to the first argument position
-
the syntactic object corresponds to the second argument position
-
-
-
semantics and syntax are not always closely related
-
e.g. “it rains” (pleonastic pronouns)
-
-
semantics can be encoded as a form of typed lambda calculus
-
first order predicate calculus
-
in order to be expressive enough, FOPC is a minimum requirement
-
especially important are quantifiers, though they bring their own problems, such as scope ambiguity:
“Every man loves a woman”
-
-
difficulties with this include
-
acquiring detailed domain knowledge in FOPC
-
some reasoning needs to be probabilistic
-
-
-
Generation
-
tactical generation or realisation
-
given a semantic representation and FS grammar an output string can be made which represents the semantic meaning
-
can use chart generation, which is similar to chart parsing
-
building bidirectional grammars is hard – most permit many ungrammatical strings to be accepted
-
-
strategic generation or text planning
-
generating the required semantic representation
-
Meaning postulates
-
open class words: a word class that accepts the addition of new items, through such processes as compounding, derivation, coining, borrowing, etc.
-
e.g. most English-speaking people use basically the same prepositions and pronouns as their great-grandparents, but different nouns and verbs
-
-
open class predicate: predicates that relate to open class words
-
inference rules can be used to relate open class predicates
-
e.g. $\forall x\,[\mathit{bachelor}(x) \leftrightarrow \mathit{man}(x) \wedge \mathit{unmarried}(x)]$
-
this doesn't work well: is the Pope a bachelor?
-
-
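generally use a one-way implication rather than an equivalence, e.g. $\forall x\,[\mathit{bachelor}(x) \rightarrow \mathit{man}(x) \wedge \mathit{unmarried}(x)]$
-
again, acquiring a significant amount of knowledge as postulates is hard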
Hyponymy
-
hyponym: dog is a hyponym of animal
-
hypernym: animal is a hypernym of dog
-
hyponymy is the most important relationship in NLP
-
questions/problems
-
some classes of words cannot easily be categorised by hyponymy
-
abstract nouns, e.g. truth
-
adjectives
-
verbs: murder is a hyponym of kill, but verb hyponymy is not nearly so clear as for nouns
-
-
do differences in quantisation and individuation matter?
-
e.g. is chair a hyponym of furniture
-
-
is multiple inheritance allowed?
-
e.g. coin being a hyponym of both money and metal
-
-
-
can be used for
-
semantic classification for selectional constraints (e.g. the object of eat has to be edible)
-
shallow inference: X murdered Y implies X killed Y
-
word sense disambiguation
-
query expansion for information retrieval: if a search doesn't return enough results can replace an over specific term with a hypernym
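Two of these uses, shallow inference and query expansion, need little more than a lookup table. The sketch below assumes single inheritance (each word has at most one hypernym, which sidesteps the multiple inheritance question raised above); the tiny hierarchy, including "organism", is invented for illustration.

```python
# Toy hyponym -> hypernym table (single inheritance assumed).
HYPERNYM = {
    "dog": "animal",
    "cat": "animal",
    "animal": "organism",
    "murder": "kill",
}


def hypernyms(word):
    """All hypernyms of a word, nearest first."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def is_hyponym_of(word, ancestor):
    return ancestor in hypernyms(word)


def expand_query(terms):
    """Replace each over-specific term by its immediate hypernym, if it has one."""
    return [HYPERNYM.get(term, term) for term in terms]


# Shallow inference: X murdered Y implies X killed Y.
print(is_hyponym_of("murder", "kill"))
# Query expansion when "dog food" returns too few results.
print(expand_query(["dog", "food"]))
```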
-
Other lexical semantic relations
-
meronymy
-
a part of something
-
e.g. arm is a meronym of body (an arm is part of a body)
-
-
synonymy
-
two words with same or similar meaning
-
near synonyms convey nuances of meaning (thin, slender, skinny)
-
-
antonymy
-
opposite meaning
-
generally only relevant with respect to adjectives
-
Polysemy
-
this refers to a word having more than one sense
-
e.g. bank as a river bank and as a financial institution
-
homonymy: when the senses are completely unrelated
-
often the senses are related, e.g. bank as financial institution and the bank in a casino
-
-
it is difficult to determine whether a word is general or ambiguous
-
e.g. teacher is general in terms of gender rather than ambiguous
-
Collocation
-
can be defined as
-
groups of words which occur together more often than would be expected by chance
-
or restrictions on how words can be used together
-
-
useful for word sense disambiguation
-
“striped bass are common”
-
“bass guitars are common”
-
-
striped is a good indication that the sense of bass is a fish
-
co-occurrence (slightly different)
-
trout might co-occur (in a slightly larger window) with the word bass used in a fish sense
-
Word sense disambiguation
-
used to determine which of the multiple different meanings for a word is the correct one
-
generally depends upon
-
frequency
-
collocations
-
selectional restrictions/preferences
-
-
methods to do this include
-
supervised learning
-
requires a sense tagged corpus
-
time consuming to construct systematically
-
-
unsupervised learning
-
attempt to determine the clusters of usages in texts that correspond to senses
-
this is a good idea since word sense is fuzzy
-
however there hasn't been much success
-
-
machine readable dictionaries
-
use the internal data in dictionaries
-
also useful for finding selectional preference and collocation information
-
-
-
baseline for WSD is 'pick the most frequent' sense
-
Yarowsky's unsupervised learning approach
-
uses collocates and co-occurrences
-
a few seed collocates are chosen for each sense (by hand or from a machine readable dictionary)
-
these are used to accurately identify distinct senses
-
the sentences in which the disambiguated senses occur can then be used to learn other collocates automatically (producing a decision list)
-
the process is iterated and allows bad collocates to be overridden
-
algorithm, e.g. for plant
-
-
-
identify all occurrences of the word to be disambiguated in the corpus and store their contexts (surrounding 7 words or so)
-
identify some seeds which reliably disambiguate a few of these uses, e.g.
-
-
-
plant life
-
manufacturing plant
-
-
-
train a decision list classifier on the occurrences which have already been tagged with a sense
(initially, the occurrences which the seeds tag with these senses)
-
-
-
a decision list classifier takes a set of already classified examples and returns
-
criteria which distinguish them
-
a reliability metric
-
-
(the original seeds are generally at the top in terms of reliability)
-
e.g. animal within 10 words of plant means sense A with 6.27/10 reliability
-
-
-
apply the decision list classifier to the training set and add a sense to any occurrences which are classified using the classifier with greater than a given threshold reliability
-
go back to the step which trains the decision list classifier, and iterate
-
-
-
then the classifier can be applied to unseen data
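The bootstrapping loop can be sketched on toy data. This is not Yarowsky's exact method: the reliability score here is an assumed smoothed log ratio of counts, the contexts are invented word lists with the target word removed, and all function names are hypothetical.

```python
import math
from collections import Counter


def train_decision_list(contexts, labels, alpha=0.1):
    """Return (reliability, collocate, sense) rules, most reliable first."""
    counts = {"A": Counter(), "B": Counter()}
    for context, sense in zip(contexts, labels):
        if sense is not None:
            counts[sense].update(set(context))
    rules = []
    for word in set(counts["A"]) | set(counts["B"]):
        # Assumed score: smoothed log ratio of how often the word occurs with each sense.
        ratio = math.log((counts["A"][word] + alpha) / (counts["B"][word] + alpha))
        if ratio != 0:
            rules.append((abs(ratio), word, "A" if ratio > 0 else "B"))
    return sorted(rules, reverse=True)


def classify(rules, context, threshold=0.0):
    """Use the most reliable matching rule above the threshold, if any."""
    for reliability, word, sense in rules:
        if word in context and reliability > threshold:
            return sense
    return None


def bootstrap(contexts, seeds, threshold=1.0, max_iterations=10):
    # Tag the occurrences that contain a seed collocate.
    labels = [next((s for w, s in seeds.items() if w in c), None) for c in contexts]
    rules = []
    for _ in range(max_iterations):
        rules = train_decision_list(contexts, labels)
        new_labels = [classify(rules, c, threshold) or old
                      for c, old in zip(contexts, labels)]
        if new_labels == labels:   # nothing changed, so stop iterating
            break
        labels = new_labels
    return rules, labels


# Invented contexts for "plant" (sense A = living thing, sense B = factory).
contexts = [
    ["life", "animal"], ["manufacturing", "car"],
    ["animal", "species"], ["car", "equipment"],
    ["species", "flower"], ["equipment", "worker"],
]
rules, labels = bootstrap(contexts, {"life": "A", "manufacturing": "B"})
print(labels)
print(classify(rules, ["animal", "zoo"], threshold=1.0))
```

On this toy data the seeds label two contexts, the first round of training picks up "animal" and "car" as new collocates, and the second round reaches the remaining contexts through "species" and "equipment".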
-
Context
-
rhetorical relations
-
this is an implicit relation between sentences with no syntactic or semantic differences, but they can be interpreted in different ways
-
e.g. “Max fell. John pushed him”
-
-
types of relation which are applicable here
-
explanation, i.e. “Max fell because John pushed him”
-
narration, i.e. “Max fell and then John pushed him”
-
-
cue phrases
-
“because” and “and then” are cue phrases to indicate how to interpret it
-
-
-
coherence
-
discourses have to have connectivity to be coherent
-
Kim got into her car. Sandy likes apples
-
-
adding context can restore coherence
-
Kim got into her car. Sandy likes apples so Kim thought she'd go buy some
-
-
strategic generation needs to implement coherence, contrast
-
In trading yesterday: Dell was up 4.2%, Safeway was down 3.2%, HP was up 3.1%
-
Computer manufacturers gained in trading yesterday: Dell was up 4.2% and HP was up 3.1%. But retail stocks suffered: Safeway was down 3.2%
-
-
-
factors influencing discourse interpretation
-
cue phrases
-
these can completely remove the ambiguity
-
normally they don't (e.g. “and” can be used in VP or sentential conjunction)
-
-
punctuation, intonation and text structure
-
parenthetical material (information in parentheses) is generally explanation
-
a list is often narration, e.g. Max fell, John pushed him and Kim laughed
-
-
real world content
-
tense and aspect
-
Max fell. John had pushed him
-
Max was falling. John pushed him
-
-
-
discourse structure and summarization
-
in many relationships one phrase depends on the other, e.g. explanation
-
the main phrase is the nucleus and the subsidiary the satellite
-
-
it is possible to remove the subsidiary phrases and still have a reasonably coherent discourse
-
this is useful for doing automatic summarisation
-
we can't do this with narration as there is no subsidiary phrase
-
genre-specific structure (e.g. that of a scientific paper) can also be used
-
-
-
referring expressions
-
Niall Ferguson is prolific, well paid and a snappy dresser. Stephen Moss hated him – at least until he spent an hour being charmed in the historian's Oxford study.
-
the link between referring expressions is another discourse structure
-
terminology
-
referent: a real world entity to which some piece of text refers
-
referring expressions: bits of language used to perform reference
-
coreferring expressions: expressions which all refer to the same referent (e.g. above are “Niall Ferguson”, “him” and “the historian”)
-
-
antecedent: the text which is evoking a referent (e.g. “Niall Ferguson” is the antecedent of “him” and “the historian”)
-
anaphora: the phenomenon of referring to an antecedent (“him” and “the historian” are anaphoric because they refer to the previously introduced entity)
-
demonstratives (e.g. “this”) and pronouns are normally anaphoric
-
-
cataphora: pronouns which appear before their referent, e.g.
-
“Although she couldn't see any dogs, Kim was sure she'd heard barking”
-
-
-
entities are introduced or evoked by proper names or sometimes definite noun phrases (“the President of the United States”)
-
-
pronoun agreement
-
pronouns need to agree in number and gender with their antecedents
-
in cases where you could choose several (he/she/it for an animal) the choice has to be used consistently
-
-
complications
-
use of “they”
-
group nouns
-
“the team played well but now they are all very tired”
-
-
conjunctions
-
discontinuous sets
-
-
-
reflexives
-
reflexive pronouns need to be co-referential with a preceding argument of the same verb
“John cut himself shaving”
-
a non reflexive pronoun cannot be co-referential with a preceding argument of the same verb
“John cut him shaving”
-
this is called binding theory
-
-
pleonastic pronouns
-
pleonastic pronouns are semantically empty and don't refer, e.g.
“it is snowing”
-
pseudo-pleonastic pronouns
-
“they are digging up the street again”
-
doesn't refer in the normal way, but almost pleonastic
-
“they” = “the council”?
-
-
-
salience: the effects which cause particular pronoun referents to be preferred
-
recency:
-
more recent referents are preferred
-
only relatively recently referred to entities are accessible
-
-
grammatical role
-
subjects > objects > everything else
“Fred went to the mall with Bill. He bought a CD” more likely he = Fred
-
-
repeated mention
-
entities that have been mentioned (mentioning includes references which have already been disambiguated as referring to that entity) more frequently are preferred
-
-
parallelism
-
entities which share the same role as the pronoun in the same sort of sentence are preferred
“Bill went with Fred to the mall. Kim went with him to the shop”
-
-
coherence effects
-
the pronoun resolution may depend on the rhetorical/discourse relation which would be inferred if that pronoun resolution was chosen
-
“Bill likes Fred. He has a great sense of humour”
-
-
-
-
algorithm to resolve anaphora – Lappin and Leass
-
the algorithm relies on parsed text
-
the (discourse) model is
-
sets of referring NPs arranged into equivalence classes
-
each equivalence class has a global salience value
-
-
algorithm:
-
for each sentence
-
-
-
-
-
divide by two the global salience factors for each of the existing equivalence classes
-
identify referring NPs in the sentence (i.e. exclude pleonastic pronouns etc.)
-
calculate global salience factors for each NP
-
update the discourse model with the referents and their global salience scores
-
for each pronoun
-
-
-
-
-
collect potential referents (cut off is four sentences back)
-
filter referents according to binding theory and agreement constraints (i.e. remove referents that are plural when the pronoun is “it”)
-
calculate the per pronoun adjustments for each remaining referent
-
select the referent with the highest sum of
-
salience value for its equivalence class and
-
its per pronoun adjustment (salience factor), i.e. give additional weights for properties which would be true if the pronoun was assigned to this equivalence class
-
e.g. for parallelism
-
-
-
add the pronoun into the equivalence class for that referent
-
increment the salience factor by the non-duplicate salience factors pertaining to the pronoun
-
-
-
-
the weightings were determined experimentally
-
the global salience properties which are weighted are (most important first):
-
recency
-
subject
-
objects of existential sentences, e.g. “a cat” in “there is a cat in the garden”
-
direct object
-
indirect object
-
oblique complement
-
non embedded noun
-
-
the per pronoun salience properties which are weighted are:
-
cataphora (negative score)
-
same role (as another pronoun in a similar situation, i.e. parallelism)
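The salience bookkeeping can be sketched as a small class. This is a simplified illustration, not the full algorithm: the numeric weights are illustrative values rather than figures from these notes, each entity stands in for a whole equivalence class, and binding theory, the four-sentence cut-off and the cataphora penalty are not modelled. All names are hypothetical.

```python
# Illustrative weights (assumed); the notes say only that they were set experimentally.
WEIGHTS = {"recency": 100, "subject": 80, "existential": 70,
           "direct_object": 50, "indirect_object": 40, "oblique": 40}
SAME_ROLE_BONUS = 35  # per pronoun adjustment for parallelism (assumed value)


class DiscourseModel:
    def __init__(self):
        self.salience = {}    # entity -> global salience
        self.agreement = {}   # entity -> agreement features, e.g. ("sg", "m")
        self.last_role = {}   # entity -> most recent grammatical role
        self.credited = {}    # entity -> factors already counted this sentence

    def start_sentence(self):
        """Halve every existing salience value at each sentence boundary."""
        for entity in self.salience:
            self.salience[entity] /= 2
        self.credited = {}

    def mention(self, entity, role, agreement):
        """Add the non-duplicate salience factors for one referring NP."""
        credited = self.credited.setdefault(entity, set())
        for factor in ("recency", role):
            if factor not in credited:
                credited.add(factor)
                self.salience[entity] = self.salience.get(entity, 0) + WEIGHTS[factor]
        self.agreement[entity] = agreement
        self.last_role[entity] = role

    def resolve(self, role, agreement):
        """Pick the agreeing entity with the highest adjusted salience."""
        best, best_score = None, float("-inf")
        for entity, score in self.salience.items():
            if self.agreement[entity] != agreement:
                continue  # agreement filter
            if self.last_role[entity] == role:
                score += SAME_ROLE_BONUS
            if score > best_score:
                best, best_score = entity, score
        if best is not None:
            self.mention(best, role, agreement)  # the pronoun joins that class
        return best


# "Fred went to the mall with Bill. He bought a CD."
model = DiscourseModel()
model.start_sentence()
model.mention("Fred", "subject", ("sg", "m"))
model.mention("the mall", "oblique", ("sg", "n"))
model.mention("Bill", "oblique", ("sg", "m"))
model.start_sentence()
model.mention("a CD", "direct_object", ("sg", "n"))
referent = model.resolve("subject", ("sg", "m"))
print(referent, model.salience)
```

With these weights the subject of the first sentence stays ahead of the oblique after halving, so "he" resolves to Fred, in line with the grammatical role preference described earlier.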
-
-
importance of anaphora resolution
-
e.g. in web search, a page which uses the name “Niall Ferguson” once and then uses “he” many, many times to refer to him should have a higher score than one which only uses the name once and never refers to him again.
-
anaphora resolution can be done without the (proprietary) parser using POS tagging
-