8/20/2014
Thesis Defense, Aug 20th, 2014
Language Technologies Institute School of Computer Science
Carnegie Mellon University, USA
Paraphrase Pattern Acquisition by Diversifiable Bootstrapping
Carnegie Mellon
Hideki Shima
Thesis Committee: Teruko Mitamura, CMU (chair); Eric Nyberg, CMU; Eduard Hovy, CMU; Patrick Pantel, Microsoft Research
Need for capturing meaning equivalence in QA
Q. What did John Lennon die of?
John Lennon died of what
John Lennon was murdered with gunshots in 1980 …
Templates of natural-language expressions can bridge different surface forms with close meanings:
• X died of Y
• X was murdered with Y
Need for capturing meaning equivalence in QA
Q. What did John Lennon die of?
John Lennon died of what
John Lennon was murdered with gunshots
John Lennon's death by gunshots
John Lennon suffered a fatal gunshot wound
John Lennon fell victim to assassin's bullets
Chapman killed him with four gunshot wounds
… pumping four bullets into him, ending his life
: : :
• X died of Y
• X had died from Y
• X was murdered with Y
• X's death by Y
• killed X with Y
• X suffered a fatal Y
• X fell victim to Y
• pumping Y into X, ending his life
Paraphrasing is a common need in various applications
• Automatic Evaluation
  – In Machine Translation [Kauchak & Barzilay, 2006][Padó et al., 2009]
  – In Text Summarization [Zhou et al., 2006]
  – In Question Answering [Ibrahim et al., 2003][Dalmas, 2007]
• Text Summarization [Lloret et al., 2008][Tatar et al., 2009]
• Information Retrieval [Parapar et al., 2005][Riezler et al., 2007]
• Information Extraction [Romano et al., 2006]
• Question Answering [Harabagiu & Hickl, 2006][Dogdan et al., 2008]
• Collocation Error Correction [Dahlmeier and Ng, 2011]
Classification of Paraphrase Research (with Usage / Application)
(1) Paraphrase Recognition
  – word/phrase-level: <kill, murder> → {Y, N}; sentence-level: <S1, S2> → {Y, N}; document-level: <D1, D2> → {Y, N}
  – Usage: Question Answering, Text Summarization, Automatic Grading, Plagiarism Detection
(2) Paraphrase Generation
  – die → <decease, pass away, kick the bucket>
  – He had a lot of admiration for his job → He had plenty of admiration for his job
  – Usage: Query Expansion, Reference Expansion in Automatic Evaluation
(3) Paraphrase Extraction
  – {word, phrase, sentence}-level paraphrases, with/without variables, with/without structure
  – Examples: <writer, author>; <X wrote Y, X is the writer of Y>; <a lot of X, plenty of X>; <X buy Y, Y sell X> (structured: X = SUBJ and Y = FROM of “buy”; X = TO and Y = SUBJ of “sell”); sentence pairs <S1, S2>
  – Usage: resource for (1) and (2), e.g., a paraphrase dictionary or a sentence-aligned paraphrase corpus
Why not use existing lexical resources (e.g., WordNet)?
Limitations:
• Lack of coverage (e.g., phrasal expressions)
• Lack of context (e.g., prepositions)
Can we rewrite patterns using such knowledge to obtain more lexical variety? e.g., WordNet [Miller, 1995], FrameNet [Baker et al., 1998], Nomlex [Macleod et al., 1998], VerbNet [Kipper et al., 2006]
Why extract paraphrases? Because language expressions are diverse.
Type | Example paraphrases of “die”
Idioms | bite the dust, go west, give up the ghost, go to a better place, pay the ultimate price, buy the farm
Non-idiom phrases | suffer a fatal something; fall victim to something; pumping a bullet into the heart, ending one’s life
Religious euphemisms | be carried away by angels, answer God’s calling, go to heaven, reach nirvana
Euphemisms by profession | (author) write one’s final chapter; (dancer) dance one’s last dance; (gambler) cash in one’s chips
Military slang | go Tango Uniform, go T.U., turn one’s toes up, be KIA (killed in action), be KIFA (killed in flight accident), be DOW (died of wounds)
Physicians’ slang | be at room temperature, be bloodless, feel no pain, lose vital signs, wear a toe tag
Gangster slang | merc, merk, murk, snuff, smoke, bang, get a backdoor parole
(Quasi-)Paraphrase
• absolute synonym
• near-synonym
• expression with high semantic relatedness
• entailment / inference
• metaphor
• syntactic variation
• euphemism
• neologism
• slang / jargon
• expression with high semantic similarity
Outline
• Introduction
• Paraphrase Extraction
  – Vanilla Espresso (Baseline)
  – Espresso Extension (Baseline2)
  – Diversifiable Bootstrapping
  – Distributional Type Filtering
• Paraphrase Evaluation Metric: DIMPLE
• Experiment
  – Design
  – Evaluation Results
• Conclusion
Paraphrase extraction source corpora
– Bilingual parallel corpus [Callison-Burch, 2008; Kok and Brockett, 2010]
– Multiple translations [Barzilay & McKeown, 2001][Pang et al., 2003]
– Aligned news contents [Dolan et al., 2004][Dolan and Brockett, 2005][Quirk et al., 2004]
– Aligned definitions [Hashimoto et al., 2002]
– Huge monolingual corpora
  • 150GB [Bhagat & Ravichandran, 2008]
  • 4.5TB parsed corpus [Metzler & Hovy, 2011]
Thesis Contribution (1 of 4)
Problem
– Corpus restriction: previous work has special corpus requirements, e.g., a parallel corpus or a terabyte-scale corpus.
  • Not suitable for domain-specific paraphrase acquisition
  • Costly to build
Hypothesis & Proposed Solution
– It is possible to extract paraphrase templates from an unstructured monolingual corpus given seed instances → Bootstrap Paraphrase Learning
Bootstrap Instance/Pattern Learning
INPUT: monolingual plain corpus; seed instances
BOOTSTRAP LEARNING ALGORITHM: ESPRESSO [Pantel & Pennacchiotti, 2006]
OUTPUT: more instances; patterns
Seed instances:
X (killer) | Y (victim)
John Wilkes Booth | Abraham Lincoln
Nathuram Godse | Mahatma Gandhi
John Bellingham | Spencer Perceval
Dan White | Mayor George Moscone
El Sayyid Nosair | Meir Kahane
Mark David Chapman | John Lennon
Yigal Amir | Yitzhak Rabin
Mohammed Bouyeri | Theo van Gogh
Sirhan Sirhan | Robert F. Kennedy
Mijailo Mijailovic | Anna Lindh
Output patterns:
X, the assassin of Y
assassination of Y by X
X assassinated Y
the assassination of Y by X
of X, the assassin of Y
X assassinated Y in
: : :
Unlike many other bootstrapping works, the goal here is to acquire patterns, not instances.
The bootstrapping loop (1st iteration, 2nd iteration, …):
Seed Instances → Sentences → Extracted Patterns → Ranked Patterns → Sentences → Extracted Instances → Ranked Instances → next iteration
Search sentences by instances
Edwin Booth was brother of John Wilkes Booth, the assassin of Abraham Lincoln.
John Wilkes Booth, the assassin of Abraham Lincoln, was inspired by Brutus.
In 1969 Berman was part of the defense team of Sirhan Sirhan, the assassin of Robert F. Kennedy.
: : :
Instance mentions replaced with the slot variables X and Y:
Edwin Booth was brother of X, the assassin of Y.
X, the assassin of Y, was inspired by Brutus.
In 1969 Berman was part of the defense team of X, the assassin of Y.
: : :
Extract patterns from sentences
… brother of X, the assassin of Y .
X, the assassin of Y , was
…team of X, the assassin of Y .
Extracted Pattern: Longest Common Substring among retrieved sentences
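One possible reading of the longest-common-substring step, sketched in Python. Assumptions: sentences are tokenized into words and individual punctuation marks (a hypothetical tokenizer, chosen because punctuation is part of a pattern), and a pattern must contain both slot variables X and Y. The brute-force search tries substrings of the shortest sentence from longest to shortest; it is an illustration, not the thesis implementation.

```python
import re


def tokenize(sentence):
    # Words and single punctuation marks become separate tokens, so that
    # punctuation such as the comma in "X, the assassin of Y" can be part of a pattern.
    return re.findall(r"\w+|[^\w\s]", sentence)


def contains(seq, sub):
    """True if `sub` occurs as a contiguous run inside `seq`."""
    n = len(sub)
    return any(seq[i:i + n] == sub for i in range(len(seq) - n + 1))


def longest_common_pattern(sentences):
    """Longest token sequence shared by all sentences that contains both X and Y."""
    token_lists = [tokenize(s) for s in sentences]
    shortest = min(token_lists, key=len)
    # Try candidate lengths from longest to shortest; the first hit is the answer.
    for length in range(len(shortest), 1, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if "X" in candidate and "Y" in candidate and all(
                contains(tokens, candidate) for tokens in token_lists
            ):
                return candidate
    return None


sentences = [
    "Edwin Booth was brother of X, the assassin of Y.",
    "X, the assassin of Y, was inspired by Brutus.",
    "In 1969 Berman was part of the defense team of X, the assassin of Y.",
]
print(" ".join(longest_common_pattern(sentences)))  # X , the assassin of Y
```

Joining the tokens with spaces is only for display; an index-side representation could keep punctuation attached to the preceding word instead.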
Score and rank patterns
Rank patterns by their reliability r(p).
r(p) is based on an association measure between the pattern and each instance in the corpus.
Ranked patterns:
1. 0.422 X, the assassin of Y
2. 0.324 assassination of Y by X
3. 0.312 X assassinated Y
4. 0.231 the assassination of Y by X
5. 0.208 of X, the assassin of Y
: : :
Search sentences by pattern(s)
Still shot from the CCTV video footage showing Oguen Samast, the assassin of Hrant Dink.
Henry Bellingham is a descendant of John Bellingham, the assassin of Spencer Perceval.
Extract instances from sentences
Extracted instances: <Oguen Samast, Hrant Dink>, <John Bellingham, Spencer Perceval>
Score and rank instances
Rank instances by their reliability r(i) (scored analogously to pattern reliability).
Reliability: Scoring Patterns and Instances
Pattern reliability
Instance reliability
ESPRESSO [Pantel & Pennacchiotti, 2006]
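The reliability formulas themselves did not survive extraction. For reference, Espresso [Pantel & Pennacchiotti, 2006] defines them recursively as r_π(p) = (1/|I|) Σ_{i∈I} (pmi(i, p) / max_pmi) · r_ι(i) and r_ι(i) = (1/|P|) Σ_{p∈P} (pmi(i, p) / max_pmi) · r_π(p), with pmi(i, p) = log(|x, p, y| / (|x, *, y| · |*, p, *|)) and max_pmi the largest PMI over all patterns and instances. The Python sketch below implements those two equations over a precomputed PMI table; it omits the PMI discounting factor used in the original paper, and the toy PMI values are hypothetical.

```python
import math


def pmi(xpy, xsy, sps):
    """Espresso-style PMI from corpus counts |x,p,y|, |x,*,y| and |*,p,*|
    (the discounting factor of the original paper is omitted here)."""
    return math.log(xpy / (xsy * sps))


def pattern_reliability(p, instances, pmi_table, inst_rel, max_pmi):
    """r_pi(p): PMI-weighted average of the reliabilities of the instances."""
    total = sum(pmi_table.get((i, p), 0.0) / max_pmi * inst_rel[i] for i in instances)
    return total / len(instances)


def instance_reliability(i, patterns, pmi_table, pat_rel, max_pmi):
    """r_iota(i): PMI-weighted average of the reliabilities of the patterns."""
    total = sum(pmi_table.get((i, p), 0.0) / max_pmi * pat_rel[p] for p in patterns)
    return total / len(patterns)


# Hypothetical toy values: two seed instances (reliability 1.0) and one pattern.
seeds = {("John Wilkes Booth", "Abraham Lincoln"): 1.0, ("Yigal Amir", "Yitzhak Rabin"): 1.0}
p = "X, the assassin of Y"
new = ("John Bellingham", "Spencer Perceval")
pmi_table = {
    (("John Wilkes Booth", "Abraham Lincoln"), p): 4.0,
    (("Yigal Amir", "Yitzhak Rabin"), p): 2.0,
    (new, p): 4.0,
}
max_pmi = max(pmi_table.values())

r_p = pattern_reliability(p, list(seeds), pmi_table, seeds, max_pmi)
print(r_p)  # 0.75
print(instance_reliability(new, [p], pmi_table, {p: r_p}, max_pmi))  # 0.75
```

Seed instances start with reliability 1.0; each later iteration alternates the two functions, which is why the instance scores are described as analogous to the pattern scores.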
Convergence: When to stop the iteration?
• Stop after τ1 patterns have been extracted
• Stop when the average pattern score decreases by more than τ2 from the previous iteration
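A minimal sketch of the two stopping criteria. It assumes they act as alternative triggers and that the τ2 check uses the absolute drop in average pattern score between consecutive iterations; both are assumptions, and the function name is hypothetical. τ2 = 0.01 is the value listed later for the VANILLA setup.

```python
def should_stop(num_patterns, avg_score_history, tau1, tau2):
    """Return True when bootstrapping should stop.

    num_patterns      -- number of patterns extracted so far
    avg_score_history -- average pattern score after each iteration, oldest first
    """
    if num_patterns >= tau1:
        return True  # criterion 1: enough patterns extracted
    if len(avg_score_history) >= 2:
        drop = avg_score_history[-2] - avg_score_history[-1]
        if drop > tau2:
            return True  # criterion 2: average score fell by more than tau2
    return False


print(should_stop(40, [0.50, 0.48], tau1=100, tau2=0.01))   # True (drop of 0.02)
print(should_stop(40, [0.50, 0.495], tau1=100, tau2=0.01))  # False (drop of 0.005)
```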
Extending Vanilla Espresso
Instance extraction:
• POS-based [Justeson & Katz, 1995]: ((Adj|Noun)+ | ((Adj|Noun)* (Noun Prep)?) (Adj|Noun)*) Noun
• Sliding window + dictionary (YAGO2 [Hoffart et al., 2012])
Instance filtering:
• Pronouns
• Distributional type constraint
Specific pattern filtering
General pattern filtering
Sentence-based corpus
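The POS-based candidate filter can be sketched as a regular expression over a string with one letter per token. Assumptions: the input is already tagged with Penn Treebank tags, and the letter mapping and function names are illustrative rather than taken from the thesis.

```python
import re


def tag_letter(tag):
    # Map Penn Treebank tags to the letters of the Justeson & Katz filter:
    # A = adjective, N = noun, P = preposition, O = anything else.
    if tag.startswith("JJ"):
        return "A"
    if tag.startswith("NN"):
        return "N"
    if tag == "IN":
        return "P"
    return "O"


# ((A|N)+ | ((A|N)*(NP)?)(A|N)*) N, with the two alternatives swapped so that
# Python's ordered alternation tries the longer noun-preposition form first.
TERM = re.compile(r"(?:(?:[AN]*(?:NP)?)[AN]*|[AN]+)N")


def extract_terms(tagged):
    """tagged: list of (word, POS tag) pairs; returns candidate instance strings."""
    letters = "".join(tag_letter(tag) for _, tag in tagged)
    return [" ".join(w for w, _ in tagged[m.start():m.end()]) for m in TERM.finditer(letters)]


sentence = [("Bank", "NNP"), ("of", "IN"), ("America", "NNP"), ("reported", "VBD"),
            ("chronic", "JJ"), ("lung", "NN"), ("disease", "NN")]
print(extract_terms(sentence))  # ['Bank of America', 'chronic lung disease']
```

Because one letter stands for one token, match offsets in the letter string map directly back to token positions.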
Corpus preprocessing
Punctuation and symbols play an important role in a pattern, so they are indexed too.
Reliability from sentence-based corpus
(relation: died-in)
p = “X (d. Y”, i = <“Liu Bei”, “223”>
|x, p, y| = |Liu Bei (d. 223| = xcount( #1( Liu Bei lLPARENl d lPERIODl 223 ) ) = 20
|x, *, y| = xcount( #uw20( #1( Liu Bei ) #1( 223 ) ) ) = 36
|*, p, *| = |X (d. Y| = xcount( #1( lLPARENl d lPERIODl ) ) = 50347
Computing |x, p, y| is the core of the method in terms of both accuracy and speed.
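The counting step can be sketched as building the three query strings shown above; #1 is an exact ordered phrase and #uw20 an unordered window of 20 terms. Assumptions: punctuation placeholders follow the slide's naming (only lLPARENl and lPERIODl appear there; lRPARENl and lCOMMAl are hypothetical), patterns begin and end with a slot variable, and the function names are illustrative. The sketch only builds strings; it does not call any search engine.

```python
import re

# Placeholder tokens for punctuation. lLPARENl and lPERIODl appear on the slide;
# the other two entries are hypothetical additions in the same style.
PUNCT = {"(": "lLPARENl", ".": "lPERIODl", ")": "lRPARENl", ",": "lCOMMAl"}


def encode(text):
    """Tokenize into words and punctuation, replacing punctuation with placeholders."""
    return " ".join(PUNCT.get(t, t) for t in re.findall(r"\w+|[^\w\s]", text))


def count_queries(pattern, x, y, window=20):
    """Build the |x,p,y|, |x,*,y| and |*,p,*| count queries for one pattern/instance pair."""
    tokens = encode(pattern).split()
    if {tokens[0], tokens[-1]} != {"X", "Y"}:
        raise ValueError("sketch only handles patterns that start and end with a slot")
    filled = " ".join(encode(x) if t == "X" else encode(y) if t == "Y" else t for t in tokens)
    middle = " ".join(tokens[1:-1])  # the pattern text between the two slots
    return {
        "x,p,y": f"xcount( #1( {filled} ) )",
        "x,*,y": f"xcount( #uw{window}( #1( {encode(x)} ) #1( {encode(y)} ) ) )",
        "*,p,*": f"xcount( #1( {middle} ) )",
    }


for name, query in count_queries("X (d. Y", "Liu Bei", "223").items():
    print(name, "=", query)
```

Encoding the corpus with the same function at indexing time is what makes these exact-phrase queries match punctuation.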
Issue: Lack of Lexical Diversity
X, the assassin of Y
assassination of Y by X
X assassinated Y
the assassination of Y by X
of X, the assassin of Y
X assassinated Y in
Words participating in patterns are skewed!
[Figure: precision and recall by iteration (1–10)]
Paraphrases extracted for “killed” in various approaches
[Bannard & Callison-Burch, 2005]
[Bhagat & Ravichandran, 2008]
[Pasca & Dienes, 2005]
[Metzler & Hovy, 2011]
murdered killed in used wounded
died killed , made injured
beaten that killed involved arrested
been killed killed NN people found left
are killed NN born that killed
lost killed by done were killed
were killed were wounded in injured Involved
kill and wounding seen killing
have died dead , including taken claimed
, hundreds released shot dead
Paraphrases acquired by Metzler et al. [2011]: unique keywords from correct phrases
murder
die
kill
kill
dead
N/A kill
dead
Thesis Contribution (2 of 4)
Problem
– Lack of lexical diversity: guarding too strictly against semantic drift results in extracted patterns with poor lexical diversity
Hypothesis & Proposed Solution
– The lexical diversity of acquired paraphrases can be controlled with a model that interpolates relevance and dissimilarity → Diversifiable Bootstrapping
Diversifiable Bootstrapping [Shima & Mitamura, 2012]
$r'(p) = \lambda \, r(p) + (1 - \lambda) \, diversity(p)$
• r(p): original reliability score of a pattern
• diversity(p): how lexically different the pattern is from the other patterns originally ranked higher than it
• λ: interpolation parameter, 0 ≤ λ ≤ 1
By tuning the parameter λ, one can control the degree to which the acquired patterns are diversified.
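A minimal Python sketch of the re-ranking. The slide does not spell out diversity(p); here it is assumed, for illustration only, to be 1 minus the largest Jaccard overlap between the pattern's content words (slot variables and punctuation dropped) and those of any pattern originally ranked above it. The first five scores reuse the ranked list shown earlier; the sixth pattern's score is hypothetical.

```python
import re


def content_words(pattern):
    """Lower-cased words of a pattern, without the slot variables X and Y."""
    return {w for w in re.findall(r"[a-z]+", pattern.lower()) if w not in ("x", "y")}


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0


def diversify(scored, lam):
    """Re-rank patterns by r'(p) = lam * r(p) + (1 - lam) * diversity(p).

    scored -- dict mapping pattern -> original reliability r(p)
    """
    ranked = sorted(scored, key=scored.get, reverse=True)  # original ranking
    new_score = {}
    for k, p in enumerate(ranked):
        higher = ranked[:k]  # patterns originally ranked above p
        overlap = max((jaccard(content_words(p), content_words(q)) for q in higher), default=0.0)
        new_score[p] = lam * scored[p] + (1 - lam) * (1.0 - overlap)
    return sorted(new_score, key=new_score.get, reverse=True)


scored = {
    "X, the assassin of Y": 0.422,
    "assassination of Y by X": 0.324,
    "X assassinated Y": 0.312,
    "the assassination of Y by X": 0.231,
    "of X, the assassin of Y": 0.208,
    "X shot and killed Y": 0.150,  # hypothetical score
}
print(diversify(scored, lam=1.0))  # original order (no diversification)
print(diversify(scored, lam=0.5))
```

With λ = 0.5 the lexically new "X shot and killed Y" moves up to third place, while "of X, the assassin of Y", which only repeats the words of the top pattern, drops to last.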
Acquired Paraphrases: killed
X, the assassin of Y assassination of Y by X X assassinated Y the assassination of Y by X of X, the assassin of Y X assassinated Y in X, the man who assassinated Y Y's assassin, X of Y's assassin X of the assassination of Y by X X shot and killed Y Y was assassinated by X named X assassinated Y Y was shot by X X to assassinate Y
X, the assassin of Y X assassinated Y assassination of Y by X Y was shot by X X, who killed Y the assassination of Y by X X assassinated Y in X tells his version of Y X shoot Y X murdered Y Y's killer, X Y, at the theatre after X Y, push X to his breaking point X to assassinate Y of X, the assassin of Y
X, the assassin of Y X, who killed Y Y was shot by X X tells his version of Y X shoot Y X murdered Y Y's killer, X Y, at the theatre after X Y, push X to his breaking point X assassinated Y assassination of Y by X X to assassinate Y X kills Y of X shooting Y X assassinated Y in
λ = 1 (no diversification) | λ = 0.7 | λ = 0.3 (the three lists above, in order)
[Shima & Mitamura, 2012]
Acquired Paraphrases: died-of
X died of Y X died of Y in X died of Y on X died of lung Y X died of lung Y in X died of lung Y on X died of Y in the X died of Y at X died of stomach Y X died of natural Y X died of breast Y in X died of a Y X died of Y in his X passed away from Y X died of a Y in
X died of Y in X died of Y X's death from Y X passed away from Y Y of X, news Y of X, a former that X was suffering from Y the suspected Y of X X to breast Y in X was diagnosed with ovarian Y X dies of Y X was dying of Y X died of lung Y X died of Y on X died of lung Y in
X died of Y in X's death from Y X passed away from Y Y of X, news Y of X, a former that X was suffering from Y the suspected Y of X X succumbed to lung Y X to breast Y in X was diagnosed with ovarian Y X dies of Y X was dying of Y X died of Y X's death from Y in X died of lung Y
λ = 1 | λ = 0.7 | λ = 0.3 (the three lists above, in order)
[Shima & Mitamura, 2012]
Acquired Paraphrases: was-led-by
Y came to power in X in Y came to power in X Y to power in X Y came to power in X in the when Y came to power in X in when Y came to power in X Y took power in X Y rose to power in X after Y came to power in X Y became chancellor of X Y came to power in X and Y seized power in X Y gained power in X to power of Y in X Y's rise to power in X
Y came to power in X Y to power in X regime of Y in X Y came to power in X in Y to power in X in Y became chancellor of X the rise of Y in X X's dictator Y X's president Y Y took control of X Y, who ruled X Y's success and X's saviour Y declared that X had X's leader Y government of Y in X
Y came to power in X in regime of Y in X X's dictator Y Y became chancellor of X X's president Y the rise of Y in X X's leader Y Y, who ruled X Y took control of X government of Y in X X, led by Y quisling had visited Y in X to flee X after Y Y in X the year before X, under the leadership of Y
λ = 1 | λ = 0.7 | λ = 0.3 (the three lists above, in order)
[Shima & Mitamura, 2012]
Semantic Drift Problem
Example (has-sister relation):
“gave birth to X and Y”
<rock, roll>
“annual X and Y hall of fame”
• “lexicon's intended meaning shifts into another category during bootstrapping” [Curran et al., 2007]
• “semantic drift often occurs when ambiguous or erroneous terms and/or patterns are introduced into and then dominate the iterative process” [McIntosh 2009]
Thesis Contribution (3 of 4)
Problem
– Semantic drift: bootstrapped pattern-instance learning is easily derailed by ambiguous or erroneous items
Hypothesis & Proposed Solution
– The risk of semantic drift introduced by diversification can be mitigated by a distributional type constraint.
Overview: Distributional Type Constraint
Initial seed instances (X, Y): <Elvis Presley, heart attack>, <Bob Marley, cancer>, <John Lennon, shot dead>, <Marilyn Monroe, drug overdose>
Extracted instance candidates (X, Y): <Linda McCartney, breast cancer>, <Los Alamos, radiation exposure>, <Peter Turkel, car accident>, <Jim Morrison, 1971>
→ Distributional Type Extractor (weight: type frequency × inverse corpus type frequency)
Type vector of the seeds: 44.7 physical condition, 34.9 condition, 34.9 illness, 30.1 ill health, 29.9 pathological state, 29.1 state, 20.9 crisis, 20.8 emergency, 20.4 juncture, …, 0.0 entity, 0.0 abstraction, 0.0 attribute
Type vector of each candidate (example): 2.2 pathological state, 2.1 illness, 2.0 malignant tumor, 1.9 cancer
→ Vector Space Similarity Calculation → score in [0.0, 1.0]
Distributional Type Constraint: Pros and Cons
Pros: a soft constraint can be defined from the seed instances (instead of an ontological hard constraint).
– Associating a relation argument with a single ontology node is sometimes difficult
– Example: cause-of-death
  • disease or health problem (Motor Neurone Disease; alcohol overdose; starvation)
  • accident (traffic accident; lawn mower; fight; fire)
  • indirect cause of death (overwork; curse; shame)
Cons: robustness
– Errors in type extraction
– Coverage of words/phrases, e.g., “week-long series of air raid”; “well-aimed rifle shots”
Source of types: YAGO2 DB
Type resource: YAGO2 [Hoffart et al., 2012]
– 9.8 million entities from Wikipedia, GeoNames, and WordNet
– each entity is linked with WordNet synsets
Cf. WordNet 3.0 contains 155K words (nouns: 118K) and 118K synsets (nouns: 82K) and lacks coverage of proper nouns.
Example Types
Exhaustive set of types associated with “heart attack” in YAGO2.
Type Vector Weights
cf. $\mathrm{tfidf} = tf \times \log\left(\frac{D}{1 + df}\right)$
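The type-weighting formula itself was not recoverable; the overview only says type frequency times inverse corpus type frequency and cites tf-idf for comparison. The sketch below assumes the weight takes exactly that tf-idf form, with D the number of entities in the type resource and df(t) the number of entities carrying type t. The type assignments and counts are hypothetical stand-ins for YAGO2 lookups.

```python
import math
from collections import Counter


def type_vector(instances, types_of, type_df, num_entities):
    """Weight each type by (frequency among the instances) * log(D / (1 + df))."""
    tf = Counter(t for inst in instances for t in types_of.get(inst, ()))
    return {t: f * math.log(num_entities / (1 + type_df.get(t, 0))) for t, f in tf.items()}


# Hypothetical type assignments and corpus statistics (not real YAGO2 numbers).
types_of = {
    "heart attack": ["illness", "condition", "entity"],
    "cancer": ["illness", "tumor", "entity"],
}
type_df = {"illness": 9, "condition": 99, "tumor": 9, "entity": 999}
vec = type_vector(["heart attack", "cancer"], types_of, type_df, num_entities=1000)
print(vec)
```

A type carried by nearly every entity (here "entity") gets weight 0, which is consistent with the 0.0 weights for entity, abstraction, and attribute in the overview.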
Type similarity calculation
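The similarity formula was also lost in extraction. Cosine similarity is a standard choice that yields values in [0.0, 1.0] for non-negative weights, which matches the range given in the overview; treat it as an assumption. The example vectors reuse the weights listed in the overview, truncated to the types shown there; the "year" vector is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse, non-negative type vectors (dict type -> weight)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


seed_vec = {"physical condition": 44.7, "condition": 34.9, "illness": 34.9,
            "ill health": 30.1, "pathological state": 29.9, "state": 29.1,
            "crisis": 20.9, "emergency": 20.8, "juncture": 20.4}
candidate_vec = {"pathological state": 2.2, "illness": 2.1,
                 "malignant tumor": 2.0, "cancer": 1.9}
year_vec = {"year": 1.0}  # hypothetical vector for a candidate such as "1971"

print(round(cosine(seed_vec, candidate_vec), 3))
print(cosine(seed_vec, year_vec))  # 0.0: no shared types
```

A candidate such as <Jim Morrison, 1971> would then be filtered out because its Y argument shares no types with the seeds.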
Paraphrase Evaluation: Which set of paraphrases is better? Relation: “killed”
A (100% precision): X died from Y; X died of Y; X who died of Y; X who died from Y; X was dying of Y; X had died of Y; X dying of Y
B (100% precision): X died of Y; X died from Y; X was murdered with Y; X's death by Y; X suffered a fatal Y; Y killed X; X fell victim to Y
Are A & B really equally valuable?
Thesis Contribution (4 of 4)
Problem
– Lack of an evaluation metric: neither precision nor recall rewards lexical diversity
Hypothesis & Proposed Solution
– An evaluation metric that rewards lexically diverse paraphrases is effective for paraphrase evaluation → DIMPLE metric (DIversity-aware Metric for Pattern Learning Experiments)
Traditional metrics
Relation: “killed”
Expected Precision (EP) [Bannard and Callison-Burch, 2005; Callison-Burch, 2008; Kok and Brockett, 2010; Metzler et al., 2011]
Output | Judge1 | Judge2 | Judge3 | Avg
“kill” | 1 | 1 | 1 | 1
“killed,” | 1 | 1 | 1 | 1
“of” | 0 | 0 | 0 | 0
“death” | 1 | 0 | 1 | 2/3
“murdered” | 1 | 1 | 1 | 1
$EP = \frac{1}{k} \sum_{i=1}^{k} avg_i$
Traditional metrics: Expected Precision + Redundancy (EPR) [Metzler et al., 2011]
Relation: “killed”
Output | Judge1 | Judge2 | Judge3 | Avg
“kill” | 0 | 0 | 0 | 0
“killed,” | 0 | 0 | 0 | 0
“of” | 0 | 0 | 0 | 0
“death” | 1 | 0 | 1 | 2/3
“murdered” | 1 | 1 | 1 | 1
$EPR = \frac{1}{k} \sum_{i=1}^{k} avg_i$
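Both metrics average, over the k outputs, the fraction of judges who accepted each output. A small Python sketch using the judge tables above (the function name is illustrative); exact fractions keep the 2/3 entry readable.

```python
from fractions import Fraction


def expected_precision(judgments):
    """Mean over output items of the fraction of judges who accepted the item.

    judgments -- one list of 0/1 votes (one vote per judge) for each output item
    """
    avgs = [Fraction(sum(votes), len(votes)) for votes in judgments]
    return sum(avgs) / len(avgs)


# Votes from Judge1..Judge3 for "kill", "killed,", "of", "death", "murdered".
ep_votes = [[1, 1, 1], [1, 1, 1], [0, 0, 0], [1, 0, 1], [1, 1, 1]]
epr_votes = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [1, 0, 1], [1, 1, 1]]
print(expected_precision(ep_votes))   # 11/15
print(expected_precision(epr_votes))  # 1/3
```

Under EPR the near-duplicates “kill” and “killed,” earn nothing, which lowers the score from 11/15 to 1/3.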
Cumulative Gain (for Information Retrieval evaluation)
Input: “killed”
Output | gain
“kill” | 2
“killed,” | 1
“of” | 0
“death” | 2
“murdered” | 3
$CG_k = \sum_{i=1}^{k} (2^{gain_i} - 1)$ Cumulative Gain [Järvelin & Kekäläinen, 2002; Kekäläinen, 2005]
Analogous IR setting:
Query result | relevance
doc1 | fairly relevant
doc2 | marginally relevant
doc3 | irrelevant
doc4 | fairly relevant
doc5 | highly relevant
DIMPLE metric [Shima & Mitamura, 2011]
Relation: “killed”
Output | Q | D | gain
“kill” | 1 | 2 | 2
“killed,” | 1 | 1 | 1
“of” | 0 | 1 | 0
“death” | 2/3 | 3 | 2
“murdered” | 1 | 3 | 3
$gain_i = Q_i \times D_i$ (Q: Quality, D: Diversity)
Cumulative gain over the DIMPLE gains:
Output | gain | 2^gain − 1
“kill” | 2 | 3
“killed,” | 1 | 1
“of” | 0 | 0
“death” | 2 | 3
“murdered” | 3 | 7
$CG_5 = \sum_{i=1}^{5} (2^{gain_i} - 1) = 3 + 1 + 0 + 3 + 7 = 14$
$DIMPLE_5 = \frac{3 + 1 + 0 + 3 + 7}{7 + 7 + 7 + 7 + 7} = \frac{14}{35} = 0.4$
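The worked example can be reproduced with a short Python sketch. It reads the DIMPLE value as CG_k divided by k × (2^3 − 1), i.e., the gain every output would earn with Q = 1 and the maximum D = 3; treat that reading, and the function name, as assumptions.

```python
def dimple(quality, diversity, max_d=3):
    """Return (CG_k, DIMPLE_k) for k outputs with per-item quality Q_i and diversity D_i."""
    gains = [q * d for q, d in zip(quality, diversity)]  # gain_i = Q_i * D_i
    cg = sum(2 ** g - 1 for g in gains)                   # cumulative gain
    ideal = len(gains) * (2 ** max_d - 1)                 # every item with Q = 1, D = max_d
    return cg, cg / ideal


Q = [1, 1, 0, 2 / 3, 1]   # "kill", "killed,", "of", "death", "murdered"
D = [2, 1, 1, 3, 3]
cg, score = dimple(Q, D)
print(round(cg, 6), round(score, 6))  # 14.0 0.4
```

The exponential gain is what lets one highly diverse correct output (“murdered”, gain 3) outweigh several redundant ones.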
Evaluating DIMPLE
Relation: “killed” (outputs: “kill”, “killed,”, “of”, “death”, “murdered”)
Intrinsic metrics: EP, EPR, DIMPLE
Extrinsic tasks: MSRPC, RTE, CQAE
Correlation between the two, computed over 40 sets of paraphrases (= 10 verbs × 4 paraphrase generation algorithms)
• MSRPC: The Microsoft Research Paraphrase Corpus [Dolan et al., 2005]
• RTE: Recognizing Textual Entailment dataset from PASCAL/TAC RTE1-4
• CQAE: Complex Question Answering Evaluation from 6 past TREC QA tracks
Evaluating DIMPLE: Result
Correlation (Pearson's r) between each intrinsic metric and extrinsic task performance:
Dataset | EP | EPR | DIMPLE
MSRPC | 0.19 | 0.37 | *0.52
RTE | 0.29 | *0.38 | *0.58
CQAE | *0.47 | *0.55 | *0.70
*: statistically significant (null hypothesis: “there is no correlation”; p < 0.01)
List of Relations
Seed source:
N: NELL [Carlson et al., 2010]
E: Ephyra [Schlaefer et al., 2006]
Paraphrase Extractors
Extraction algorithm | Description | Iteration | Corpus
CPL | Baseline 1: Coupled Pattern Learner from NELL [Carlson et al., 2010] | 860th | ClueWeb09 (25TB, 500m pages; includes Wikipedia)
VANILLA | Baseline 2: Espresso [Pantel & Pennacchiotti, 2006] w/o web | 10th and/or convergence at τ2 = 0.01 | Wikipedia (7GB, 2.1m pages, 50m sentences)
BPL | Baseline 3: Extended Espresso; Bootstrap Paraphrase Learner (λ = 1.0) | same as VANILLA | same as VANILLA
D-BPL | Proposed: BPL with Diversification (λ = 0.75) | same as VANILLA | same as VANILLA
Gold standard labels for patterns
Label | Meaning
M | Matched (high certainty)
O | Matched & out-of-dictionary
I | Inconclusive: depends on the context (medium certainty)
R | Related (no or very small certainty)
A | Antonym
W | Wrong
M, O, and I count as correct; R, A, and W count as incorrect.
Judging: M, O, R
Relation: died-of
(M) X died of Y
(M) X passes to Y
(M) X perished in Y
(M) X succumbed to Y
(O) X fell victim to Y
(O) X was terminally ill with Y
(O) X suffered a fatal Y
(R) X was diagnosed with Y
(M) X was diagnosed with Y, and died
Evaluation metrics on paraphrase patterns: DIMPLE, Precision, Recall
Label Distribution
Outline of Experiments
• Effect of diversification
• Effect of type-based filtering
Main Results (Metric: DIMPLE)
[Figure: DIMPLE by iteration (1–10) for D-BPL, BPL, and VANILLA]
p-values:
CPL & D-BPL: 0.042
VANILLA & D-BPL: 0.023
BPL & D-BPL: 0.048
Main Results (Metric: Precision)
[Figure: precision by iteration (1–10) for D-BPL, BPL, and VANILLA]
Main Results (Metric: Recall)
[Figure: recall by iteration (1–10) for D-BPL, BPL, and VANILLA]
Overall results (11 relations; macro-avg)
[Figure panels: precision, DIMPLE, recall, and number of distinct keywords by iteration (1–10) for D-BPL, BPL, and VANILLA]
Effect of type-constraint
[Figure: precision by iteration (1–10) for D-BPL(+) with type scoring and D-BPL(−) without type scoring]
Example: LEADER(X:person, Y:organization)
Example: LEADER(X:person, Y:organization), top 15 paraphrase patterns (columns: VANILLA, BPL, D-BPL)
Y, President of X Y, president of X Y, president of X Y, president of X Y, former president of X Y's regime in X X - Y, President president of X, Y Y's government in X Y, former president of X X's president Y X an leader Y president of X, Y Y, the president of X X an dictator Y President of X, Y Y (president of X Y to X to face trial X n president Y president Y of X Y (captain general, X X's President Y X's president, Y Y from power in X
X's president Y Y, the current president of X X, led by Y Y, the president of X Y is elected president of X banned in X during Y
Y (president of X X ian president Y invaded and annexed by X (under Y
president Y of X Y - president of X war against Y's X President Y of X Y, the former president of X Y to the presidency of X Y - former president of X Y becomes president of X Y is made premier of X X's president, Y Y, current president of X unification with Y's X
Example: LEADER(X:person, Y:organization) (selected)
Example: person_graduated_school (X: person, Y:org)
Example: person_graduated_school (X: person, Y: org): patterns by CPL/NELL (top 100)
Example: person_graduated_school (X: person, Y: org), top 15 paraphrase patterns (columns: VANILLA, BPL, D-BPL)
High School, X attended Y X graduated from Y X graduated from Y high school, X attended Y X is a graduate of Y attended Y, where X School, X attended Y X has taught at Y X has taught at Y school, X attended Y attended Y, where X Y, where X majored
X attended Y Y, where X majored X received his undergraduate degree from Y
X graduated from Y X attended Y X joined the faculty at Y graduating from high school, X attended Y X taught at Y X studied at Y high school, X attended Y where he played
X received his undergraduate degree from Y
X was a visiting professor at Y
X attended Y where he played X studied at Y Y, where X earned Y, where X majored X graduated at Y X accepted a position at Y attended Y, where X Y, where X graduated X then went to Y X taught at Y high school, X attended Y Y, where X was a member
X has taught at Y X joined the faculty at Y X played college football for Y
X is a graduate of Y X graduated with honors from Y X is a graduate of Y
X received his undergraduate degree from Y X was graduated from Y science at Y, where X
Summary of Contributions
Limitations in the state of the art, with the confirmed hypothesis and supporting evidence for each:
• Corpus restriction
  – Confirmed hypothesis: It is possible to extract paraphrase templates from an unstructured monolingual corpus given seed instances.
  – Supporting evidence: BPL & D-BPL outperform the baselines in precision, recall, and number of distinct keywords.
• Lack of lexical diversity
  – Confirmed hypothesis: The lexical diversity of acquired paraphrases can be controlled with a model of relevance-dissimilarity interpolation.
  – Supporting evidence: A statistically significant difference (p < 0.05) in DIMPLE was observed between diversifiable bootstrapping and the baselines.
• Semantic drift
  – Confirmed hypothesis: The risk of semantic drift from diversification can be mitigated by a distributional type restriction.
  – Supporting evidence: When type-based instance filtering is enabled, precision stays consistently above the baseline and does not drop steeply.
• Lack of an evaluation metric
  – Confirmed hypothesis: A cumulative-gain-style metric that rewards lexically diverse paraphrases is effective.
  – Supporting evidence: DIMPLE correlates with paraphrase recognition task performance, with Pearson's r of +0.5 to +0.7 and statistical significance (p < 0.01).
Future Work
• Co-reference resolution
  – full name : last name only : pronoun = 1 : 5.8 : 6.7 (Wikipedia)
  – Data sparseness when calculating reliability
  – Generate and add sentences that replace a reference with its referent (avoiding double counting)
• Corpus-specific paraphrase extraction
  – Medical, legal, sports, etc.
• Robust vector representation for type scoring
  – YAGO covers 9.8M entities, but coverage is still an issue
• DIMPLE's “Q” (Quality) based on O/M/I labels
• Extrinsic evaluation (QA)
• Feature-based trainable scorer
  – Using multiple features (POS sequence, context vector, type vector, dictionary feature)
  – Optimize w.r.t. different application needs / labels
Wrap-up: immediate tasks
• Complete annotating all 15 relations
• Calculate inter-annotator agreement
  – Compare fine- vs. coarse-grained labels ({M, O, I} vs. {R, A, W})
  – By Cohen's Kappa
• Related work, especially Coupled Pattern Learner (NELL)
• Analysis of precision vs. average reliability
• Release D-BPL code + evaluation tool + annotated data
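For the planned agreement study, Cohen's kappa can be computed on both the fine-grained labels and the coarse correct/incorrect split ({M, O, I} vs. {R, A, W}). A minimal Python sketch; the two annotators' labels are made up for illustration.

```python
from collections import Counter

COARSE = {"M": "correct", "O": "correct", "I": "correct",
          "R": "incorrect", "A": "incorrect", "W": "incorrect"}


def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences of equal length."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two annotators for six patterns.
ann1 = ["M", "O", "R", "W", "M", "I"]
ann2 = ["M", "M", "R", "A", "O", "R"]
fine = cohens_kappa(ann1, ann2)
coarse = cohens_kappa([COARSE[x] for x in ann1], [COARSE[x] for x in ann2])
print(round(fine, 3), round(coarse, 3))  # 0.172 0.667
```

Collapsing to two classes removes disagreements such as M vs. O, so coarse-grained kappa is typically higher, as in this toy example.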
Web-based Experiment Management Tool
Conclusion
Developed a paraphrase extraction algorithm that can acquire lexically diverse binary-relation paraphrase templates, given a relatively small number of seed instances for a certain relation and an unstructured monolingual corpus.
– Diversification is effective: a statistically significant difference in DIMPLE was observed between Diversifiable Bootstrapping (D-BPL) and the two baseline algorithms (D-BPL without diversification and vanilla Espresso).
– Distributional type scoring is effective: when it is enabled, precision drops less steeply in early iterations, suggesting that semantic drift is mitigated.
Questions?