
8/20/2014

Thesis Defense, Aug 20th, 2014

Language Technologies Institute, School of Computer Science

Carnegie Mellon University, USA

Paraphrase Pattern Acquisition by Diversifiable Bootstrapping

Carnegie Mellon

Hideki Shima

Thesis Committee: Teruko Mitamura, CMU (chair); Eric Nyberg, CMU; Eduard Hovy, CMU; Patrick Pantel, Microsoft Research


Need for capturing meaning equivalence in QA

Q. What did John Lennon die of?

John Lennon died of what

John Lennon was murdered with gunshots

in 1980 …

Templates of natural language expressions can bridge different surface forms with close meanings:

• X died of Y
• X was murdered with Y

Need for capturing meaning equivalence in QA

Q. What did John Lennon die of?

John Lennon died of what

John Lennon was murdered with gunshots

John Lennon's death by gunshots

John Lennon suffered a fatal gunshot wound

John Lennon fell victim to assassin's bullets

Chapman killed him with four gunshot wounds

… pumping four bullets into him, ending his life

: : :

• X died of Y • X had died from Y • X was murdered with Y • X's death by Y

• killed X with Y • X suffered a fatal Y • X fell victim to Y • pumping Y into X, ending his life


Paraphrasing is a common need in various applications

Automatic Evaluation
– In Machine Translation [Kauchak & Barzilay, 2006][Padó et al., 2009]
– In Text Summarization [Zhou et al., 2006]
– In Question Answering [Ibrahim et al., 2003][Dalmas, 2007]

Text Summarization [Lloret et al., 2008][Tatar et al., 2009]

Information Retrieval [Parapar et al., 2005][Riezler et al., 2007]

Information Extraction [Romano et al., 2006]

Question Answering [Harabagiu & Hickl, 2006][Dogdan et al., 2008]

Collocation Error Correction [Dahlmeier and Ng, 2011]

Classification of Paraphrase Research (Usage / Application)

(1) Paraphrase Recognition
• <kill, murder> {Y, N} (word/phrase-level); <S1, S2> {Y, N} (sentence-level); <D1, D2> {Y, N} (document-level)
• Applications: Question Answering; Text Summarization; Automatic Grading; Plagiarism Detection

(2) Paraphrase Generation
• die → <decease, pass away, kick the bucket>
• He had a lot of admiration for his job → He had plenty of admiration for his job
• Applications: Query Expansion; Reference Expansion in Automatic Evaluation

(3) Paraphrase Extraction
• {word, phrase, sentence}-level paraphrases
• with/without variables
• with/without structure (diagram labels: SUBJ, FROM, TO, SUBJ)
• Resource for (1) and (2): paraphrase dictionary (<writer, author>); sentence-aligned paraphrase corpus (<S1, S2>)

Why not use existing lexical resources (e.g., WordNet)?

Limitations:

• Lack of coverage (e.g., phrasal expressions)

• Lack of context (prepositions, etc.)

Can we rewrite patterns with knowledge from such resources to obtain more lexical variety? E.g., WordNet [Miller, 1995], FrameNet [Baker et al., 1998], Nomlex [Macleod et al., 1998], VerbNet [Kipper et al., 2006]

Why extract paraphrases? It’s because language expressions are diverse

Type: Example paraphrases of “die”

Idioms: bite the dust, go west, give up the ghost, go to a better place, pay the ultimate price, buy the farm

Non-idiom phrase: suffer a fatal something; fall victim to something; pumping a bullet into the heart, ending one’s life

Religious euphemism: be carried away by angels, answer God’s calling, go to heaven, reach nirvana

Euphemism by profession: (author) write one’s final chapter, (dancer) dance one’s last dance, (gambler) cashed in their chips

Slang in military: go Tango Uniform, go T.U., turn one’s toes up, be KIA (killed in action), be KIFA (killed in flight accident), be DOW (died of wounds)

Slang in physician: be at room temperature, be bloodless, feel no pain, lose vital signs, wear a toe tag

Slang in gangsters: merc, merk, murk, snuff, smoke, bang, get a backdoor parole

(Quasi-)Paraphrase

absolute synonym
near-synonym
expression with high semantic similarity
expression with high semantic relatedness
entailment / inference
metaphor
syntactic variation
euphemism
neologism
slang / jargon

Outline

Introduction

Paraphrase Extraction
– Vanilla Espresso (Baseline)
– Espresso Extension (Baseline2)
– Diversifiable Bootstrapping
– Distributional Type Filtering

Paraphrase Evaluation Metric: DIMPLE

Experiment
– Design
– Evaluation Results

Conclusion

Paraphrase extraction source corpora

– Bilingual parallel corpus [Callison-Burch, 2008; Kok and Brockett, 2010]
– Multiple translations [Barzilay & McKeown, 2001][Pang et al., 2003]
– Aligned news contents [Dolan et al., 2004][Dolan and Brockett, 2005][Quirk et al., 2004]
– Aligned definitions [Hashimoto et al., 2002]
– Huge monolingual corpora
• 150GB [Bhagat & Ravichandran, 2008]
• 4.5TB parsed corpus [Metzler & Hovy, 2011]

Thesis Contribution (1 of 4)

Problem

– Corpus Restriction: previous work has special corpus requirements, e.g., a parallel corpus or a terabyte-scale corpus.

• Not suitable for domain-specific paraphrase acquisition

• Costly to build

Hypothesis & Proposed Solution

– It is possible to extract paraphrase templates from an unstructured monolingual corpus given seed instances → Bootstrap Paraphrase Learning

Bootstrap Instance/Pattern Learning

[Diagram: INPUT (monolingual plain corpus, seed instances) → BOOTSTRAP LEARNING ALGORITHM → OUTPUT (more instances, patterns)]

ESPRESSO [Pantel & Pennacchiotti, 2006]

Seed instances <X (killer), Y (victim)>:

<John Wilkes Booth, Abraham Lincoln>
<Mark David Chapman, John Lennon>
<Nathuram Godse, Mahatma Gandhi>
<Yigal Amir, Yitzhak Rabin>
<John Bellingham, Spencer Perceval>
<Mohammed Bouyeri, Theo van Gogh>
<Dan White, Mayor George Moscone>
<Sirhan Sirhan, Robert F. Kennedy>
<El Sayyid Nosair, Meir Kahane>
<Mijailo Mijailovic, Anna Lindh>

Output patterns:

X, the assassin of Y

assassination of Y by X

X assassinated Y

the assassination of Y by X

of X, the assassin of Y

X assassinated Y in

: : :

Unlike many other bootstrapping works, the goal is to acquire patterns, not instances.


Bootstrap Instance/Pattern Learning

[Diagram, 1st iteration: Seed Instances → Sentences → Extracted Patterns → Ranked Patterns → Sentences → Extracted Instances → Ranked Instances; the ranked instances feed the 2nd iteration, and so on]
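The instance-to-pattern-to-instance loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: scoring and ranking are left out, a pattern is simply the raw text between the two instance terms (rather than a longest common substring), and the `NAME` regex, the function names, and the two-sentence corpus are all hypothetical choices made for the example.

```python
import re

# Toy stand-in for a real instance extractor: a run of capitalized words.
NAME = r"([A-Z]\w*(?: [A-Z]\w*)*)"

def pattern_from(sentence, x, y):
    """Return the span from one term to the other, with the terms replaced by X and Y."""
    i, j = sentence.find(x), sentence.find(y)
    if i < 0 or j < 0:
        return None
    if i < j:
        return "X" + sentence[i + len(x):j] + "Y"
    return "Y" + sentence[j + len(y):i] + "X"

def instances_from(sentence, pattern):
    """Match a pattern such as 'X, the assassin of Y' and return (x, y) pairs."""
    first, middle = pattern[0], pattern[1:-1]
    pairs = []
    for m in re.finditer(NAME + re.escape(middle) + NAME, sentence):
        a, b = m.group(1), m.group(2)
        pairs.append((a, b) if first == "X" else (b, a))
    return pairs

def one_iteration(corpus, seeds):
    """Instances -> patterns -> instances (scoring and ranking left out)."""
    patterns = set()
    for x, y in seeds:
        for sentence in corpus:
            p = pattern_from(sentence, x, y)
            if p is not None:
                patterns.add(p)
    instances = set()
    for p in patterns:
        for sentence in corpus:
            instances.update(instances_from(sentence, p))
    return patterns, instances

corpus = [
    "John Wilkes Booth, the assassin of Abraham Lincoln, was inspired by Brutus.",
    "Henry Bellingham is a descendant of John Bellingham, the assassin of Spencer Perceval.",
]
seeds = [("John Wilkes Booth", "Abraham Lincoln")]
patterns, instances = one_iteration(corpus, seeds)
```

Starting from a single seed, the sketch learns one pattern and uses it to pick up a second instance from the other sentence; a real run would rank both sets by reliability before the next iteration.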

Search sentences by instances (Seed Instances → Sentences)

Edwin Booth was brother of John Wilkes Booth, the assassin of Abraham Lincoln.

John Wilkes Booth, the assassin of Abraham Lincoln, was inspired by Brutus.

In 1969 Berman was part of the defense team of Sirhan Sirhan, the assassin of Robert F. Kennedy.

: : :

Search sentences by instances: the seed instances in the retrieved sentences are replaced with X and Y

Edwin Booth was brother of X, the assassin of Y.

X, the assassin of Y, was inspired by Brutus.

In 1969 Berman was part of the defense team of X, the assassin of Y.

: : :

Extract patterns from sentences (Sentences → Extracted Patterns)

… brother of X, the assassin of Y .

X, the assassin of Y , was

…team of X, the assassin of Y .

Extracted Pattern: Longest Common Substring among retrieved sentences
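A minimal sketch of the longest-common-substring idea, working on whitespace-separated tokens. It assumes punctuation has already been split into its own tokens; the function names are hypothetical, and the brute-force search is chosen for readability rather than speed.

```python
def contains(tokens, ngram):
    """True if ngram occurs as a contiguous run inside tokens."""
    n = len(ngram)
    return any(tokens[i:i + n] == ngram for i in range(len(tokens) - n + 1))

def longest_common_token_span(sentences):
    """Longest contiguous token sequence shared by every sentence."""
    tokenized = [s.split() for s in sentences]
    shortest = min(tokenized, key=len)
    # Try candidate spans of the shortest sentence, longest first.
    for n in range(len(shortest), 0, -1):
        for start in range(len(shortest) - n + 1):
            candidate = shortest[start:start + n]
            if all(contains(tokens, candidate) for tokens in tokenized):
                return " ".join(candidate)
    return ""

sentences = [
    "Edwin Booth was brother of X , the assassin of Y .",
    "X , the assassin of Y , was inspired by Brutus .",
    "In 1969 Berman was part of the defense team of X , the assassin of Y .",
]
pattern = longest_common_token_span(sentences)
```

On the three example sentences the shared span is the pattern `X , the assassin of Y`.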

Score and rank patterns (Extracted Patterns → Ranked Patterns)

Rank patterns by reliability: r(p).

r(p) is based on an association measure between the pattern and each instance in the corpus.

Ranked patterns with reliability scores:

1. 0.422 X, the assassin of Y
2. 0.324 assassination of Y by X
3. 0.312 X assassinated Y
4. 0.231 the assassination of Y by X
5. 0.208 of X, the assassin of Y

: : :

Search sentences by pattern(s) (Ranked Patterns → Sentences)

Still shot from the CCTV video footage showing Oguen Samast, the assassin of Hrant Dink.

Henry Bellingham is a descendant of John Bellingham, the assassin of Spencer Perceval.

Extract instances from sentences (Sentences → Extracted Instances)

Still shot from the CCTV video footage showing Oguen Samast, the assassin of Hrant Dink.

Henry Bellingham is a descendant of John Bellingham, the assassin of Spencer Perceval.

Score and rank instances (Extracted Instances → Ranked Instances)

Rank instances by reliability: r(i)

(similar to pattern reliability scoring)

Reliability: Scoring Patterns and Instances

Pattern reliability

Instance reliability

ESPRESSO [Pantel & Pennacchiotti, 2006]
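The formulas on this slide did not survive in the transcript. As a hedged sketch, the block below follows the reliability definition published in the cited Espresso paper as I understand it: a pattern's reliability is the average, over instances, of the PMI between instance and pattern (normalized by the maximum PMI) times the instance's reliability, and instance reliability is defined symmetrically. The function names and the toy PMI values are assumptions made for illustration.

```python
def pattern_reliability(pattern, instances, pmi, r_inst, max_pmi):
    """Average of (pmi(i, p) / max_pmi) * r(i) over all instances."""
    total = sum(pmi.get((i, pattern), 0.0) / max_pmi * r_inst[i] for i in instances)
    return total / len(instances)

def instance_reliability(instance, patterns, pmi, r_pat, max_pmi):
    """Average of (pmi(i, p) / max_pmi) * r(p) over all patterns."""
    total = sum(pmi.get((instance, p), 0.0) / max_pmi * r_pat[p] for p in patterns)
    return total / len(patterns)

# Hypothetical PMI values for two instances (i1, i2) and two patterns (p1, p2).
pmi = {("i1", "p1"): 2.0, ("i2", "p1"): 1.0, ("i1", "p2"): 4.0}
max_pmi = max(pmi.values())
r_inst = {"i1": 1.0, "i2": 1.0}  # seed instances start fully reliable

r_pat = {p: pattern_reliability(p, ["i1", "i2"], pmi, r_inst, max_pmi) for p in ["p1", "p2"]}
r_i1 = instance_reliability("i1", ["p1", "p2"], pmi, r_pat, max_pmi)
```

The two scores feed each other across iterations: pattern scores are computed from instance scores, and the next round of instance scores is computed from those pattern scores.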

Convergence: When to stop the iteration?

Until τ1 patterns have been extracted

Until the average pattern score decreases by more than τ2 from the previous iteration
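The two stopping conditions can be written as a small check. This is a sketch that assumes either condition alone ends the loop (the slide does not say how they combine); `should_stop` and its argument names are hypothetical.

```python
def should_stop(num_patterns, avg_scores, tau1, tau2):
    """avg_scores holds the average pattern score of each iteration so far."""
    # Condition 1: enough patterns have been extracted.
    if num_patterns >= tau1:
        return True
    # Condition 2: the average score dropped by more than tau2 since the last iteration.
    if len(avg_scores) >= 2 and avg_scores[-2] - avg_scores[-1] > tau2:
        return True
    return False

small_drop = should_stop(5, [0.40, 0.395], tau1=100, tau2=0.01)
large_drop = should_stop(5, [0.40, 0.38], tau1=100, tau2=0.01)
enough_patterns = should_stop(100, [0.40], tau1=100, tau2=0.01)
```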


Extending Vanilla Espresso

Instance Extraction:

POS-based [Justeson & Katz, 1995]: ((Adj|Noun)+ | ((Adj|Noun)* ((Noun)(Prep))?) (Adj|Noun)*) Noun

Sliding window + dictionary (YAGO2 [Hoffart et al., 2012])

Instance Filtering

Pronouns

Distributional type constraint

Specific pattern filtering

General pattern filtering

Sentence-based corpus
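The POS-based instance pattern above can be tried out by mapping each token's tag to a single letter and running an ordinary regular expression over the tag string. This is only a sketch: the one-letter tag scheme (A = adjective, N = noun, P = preposition, O = other), the function name, and the hand-tagged examples are assumptions, and a real system would get tags from a POS tagger.

```python
import re

# ((Adj|Noun)+ | ((Adj|Noun)* (Noun Prep)?) (Adj|Noun)*) Noun, over one-letter tags.
TERM = re.compile(r"(?:[AN]+|[AN]*(?:NP)?[AN]*)N")

def candidate_terms(tagged):
    """tagged: list of (word, tag) pairs with tags in {A, N, P, O}."""
    tags = "".join(tag for _, tag in tagged)
    words = [word for word, _ in tagged]
    # Each regex match over the tag string maps back to a span of words.
    return [" ".join(words[m.start():m.end()]) for m in TERM.finditer(tags)]

example1 = [("John", "N"), ("Wilkes", "N"), ("Booth", "N"), ("shot", "O"),
            ("the", "O"), ("president", "N")]
example2 = [("the", "O"), ("assassin", "N"), ("of", "P"),
            ("Abraham", "N"), ("Lincoln", "N")]
terms1 = candidate_terms(example1)
terms2 = candidate_terms(example2)
```

Note that the optional `Noun Prep` part lets a single candidate span a preposition, which is why the second example comes back as one phrase rather than two names.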


Corpus preprocessing

Punctuation and symbols play an important role in a pattern.

Let's index them too.

Reliability from sentence-based corpus

(relation: died-in)

p = “X (d. Y”, i = <“Liu Bei”, “223”>

|x, p, y| = | Liu Bei (d. 223 | = xcount( #1( Liu Bei lLPARENl d lPERIODl 223 ) ) = 20

|x, *, y| = xcount( #uw20( #1( Liu Bei ) #1( 223 ) ) ) = 36

|*, p, *| = | X (d. Y | = xcount( #1( lLPARENl d lPERIODl ) ) = 50347

Computing |x, p, y| is the core step for both accuracy and speed.
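The three counts above are the inputs to the association measure behind the reliability scores. As a sketch, assuming the pointwise mutual information form used in the cited Espresso paper, pmi(i, p) = log(|x, p, y| / (|x, *, y| · |*, p, *|)), and leaving out any discounting factor, the example counts combine like this (`pmi` is a hypothetical helper name):

```python
import math

def pmi(count_xpy, count_xy, count_p):
    """log( |x,p,y| / (|x,*,y| * |*,p,*|) ) from raw corpus counts."""
    return math.log(count_xpy / (count_xy * count_p))

# Counts from the died-in example: p = "X (d. Y", i = <"Liu Bei", "223">
value = pmi(count_xpy=20, count_xy=36, count_p=50347)
```

With raw counts the value is negative; only its size relative to the maximum PMI matters once it is normalized inside the reliability score.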


Issue: Lack of Lexical Diversity


X, the assassin of Y

assassination of Y by X

X assassinated Y

the assassination of Y by X

of X, the assassin of Y

X assassinated Y in

Words participating in patterns are skewed!

[Figure: precision and recall (0.1 to 0.5) by iteration (1 to 10)]

Paraphrases extracted for “killed” in various approaches

[Bannard & Callison-Burch, 2005]

[Bhagat & Ravichandran, 2008]

[Pasca & Dienes, 2005]

[Metzler & Hovy, 2011]

murdered killed in used wounded

died killed , made injured

beaten that killed involved arrested

been killed killed NN people found left

are killed NN born that killed

lost killed by done were killed

were killed were wounded in injured Involved

kill and wounding seen killing

have died dead , including taken claimed

, hundreds released shot dead

Paraphrases acquired by Metzler et al., [2011]

unique

keywords

from correct

phrases

murder

die

kill

dead

N/A kill

dead


Thesis Contribution (2 of 4)

Problem

– Lack of Lexical Diversity: suppressing semantic drift too aggressively results in extracted patterns with poor lexical diversity

Hypothesis & Proposed Solution

– Lexical diversity of acquired paraphrases can be controlled with a model of relevance-dissimilarity interpolation → Diversifiable Bootstrapping

Diversifiable Bootstrapping [Shima & Mitamura, 2012]

$r'(p) = \lambda \cdot r(p) + (1 - \lambda) \cdot diversity(p)$

r(p): original reliability score of a pattern

diversity(p): how lexically different the pattern is from the other patterns originally ranked higher than it

λ: interpolation parameter, 0 ≤ λ ≤ 1

By tweaking the parameter λ, the acquired patterns can be diversified to a degree one can control.
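The interpolation r'(p) = λ · r(p) + (1 − λ) · diversity(p) can be sketched as a re-ranking step. The slides do not spell out how diversity(p) is computed, so this example assumes one plausible choice: one minus the highest Jaccard word overlap with any pattern originally ranked above p, with no stemming. The function names are hypothetical; the patterns and scores are the five ranked patterns shown earlier.

```python
import re

def content_words(pattern):
    """Lowercased words of a pattern, without the X and Y slots."""
    return set(re.findall(r"[a-z]+", pattern.lower())) - {"x", "y"}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def diversify(ranked, lam):
    """ranked: (pattern, r) pairs sorted by r, best first. Returns pairs re-sorted by r'."""
    rescored = []
    for idx, (pattern, r) in enumerate(ranked):
        words = content_words(pattern)
        overlaps = (jaccard(words, content_words(p)) for p, _ in ranked[:idx])
        diversity = 1.0 - max(overlaps, default=0.0)  # assumed definition
        rescored.append((pattern, lam * r + (1 - lam) * diversity))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

ranked = [
    ("X, the assassin of Y", 0.422),
    ("assassination of Y by X", 0.324),
    ("X assassinated Y", 0.312),
    ("the assassination of Y by X", 0.231),
    ("of X, the assassin of Y", 0.208),
]
no_diversification = [p for p, _ in diversify(ranked, lam=1.0)]
diversified = [p for p, _ in diversify(ranked, lam=0.3)]
```

With λ = 1 the original order is kept; with λ = 0.3 the lexically distinct "X assassinated Y" moves above "assassination of Y by X", and the near-duplicate "of X, the assassin of Y" stays last.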

Acquired Paraphrases: killed

λ = 1 (no diversification): X, the assassin of Y; assassination of Y by X; X assassinated Y; the assassination of Y by X; of X, the assassin of Y; X assassinated Y in; X, the man who assassinated Y; Y's assassin, X; of Y's assassin X; of the assassination of Y by X; X shot and killed Y; Y was assassinated by X; named X assassinated Y; Y was shot by X; X to assassinate Y

[Shima & Mitamura, 2012]

Acquired Paraphrases: killed

λ = 0.7: X, the assassin of Y; X assassinated Y; assassination of Y by X; Y was shot by X; X, who killed Y; the assassination of Y by X; X assassinated Y in; X tells his version of Y; X shoot Y; X murdered Y; Y's killer, X; Y, at the theatre after X; Y, push X to his breaking point; X to assassinate Y; of X, the assassin of Y

λ = 0.3: X, the assassin of Y; X, who killed Y; Y was shot by X; X tells his version of Y; X shoot Y; X murdered Y; Y's killer, X; Y, at the theatre after X; Y, push X to his breaking point; X assassinated Y; assassination of Y by X; X to assassinate Y; X kills Y; of X shooting Y; X assassinated Y in

[Shima & Mitamura, 2012]

Acquired Paraphrases: died-of

λ = 1: X died of Y; X died of Y in; X died of Y on; X died of lung Y; X died of lung Y in; X died of lung Y on; X died of Y in the; X died of Y at; X died of stomach Y; X died of natural Y; X died of breast Y in; X died of a Y; X died of Y in his; X passed away from Y; X died of a Y in

λ = 0.7: X died of Y in; X died of Y; X's death from Y; X passed away from Y; Y of X, news; Y of X, a former; that X was suffering from Y; the suspected Y of X; X to breast Y in; X was diagnosed with ovarian Y; X dies of Y; X was dying of Y; X died of lung Y; X died of Y on; X died of lung Y in

λ = 0.3: X died of Y in; X's death from Y; X passed away from Y; Y of X, news; Y of X, a former; that X was suffering from Y; the suspected Y of X; X succumbed to lung Y; X to breast Y in; X was diagnosed with ovarian Y; X dies of Y; X was dying of Y; X died of Y; X's death from Y in; X died of lung Y

[Shima & Mitamura, 2012]

Acquired Paraphrases: was-led-by

λ = 1: Y came to power in X in; Y came to power in X; Y to power in X; Y came to power in X in the; when Y came to power in X in; when Y came to power in X; Y took power in X; Y rose to power in X; after Y came to power in X; Y became chancellor of X; Y came to power in X and; Y seized power in X; Y gained power in X; to power of Y in X; Y's rise to power in X

λ = 0.7: Y came to power in X; Y to power in X; regime of Y in X; Y came to power in X in; Y to power in X in; Y became chancellor of X; the rise of Y in X; X's dictator Y; X's president Y; Y took control of X; Y, who ruled X; Y's success and X's; saviour Y declared that X had; X's leader Y; government of Y in X

λ = 0.3: Y came to power in X in; regime of Y in X; X's dictator Y; Y became chancellor of X; X's president Y; the rise of Y in X; X's leader Y; Y, who ruled X; Y took control of X; government of Y in X; X, led by Y; quisling had visited Y in X; to flee X after Y; Y in X the year before; X, under the leadership of Y

[Shima & Mitamura, 2012]


Semantic Drift Problem

(has-sister relation) “gave birth to X and Y” → <rock, roll> → “annual X and Y hall of fame”

“lexicons intended meaning shifts into another category during bootstrapping” [Curran et al., 2007]

“semantic drift often occurs when ambiguous or erroneous terms and/or patterns are introduced into and then dominate the iterative process” [McIntosh 2009]

Thesis Contribution (3 of 4)

Problem

– Semantic Drift: bootstrap pattern-instance learning can easily go wrong once an ambiguous or erroneous item is introduced

Hypothesis & Proposed Solution

– Semantic drift risk from diversification can be mitigated by a distributional type constraint.

Overview: Distributional Type Constraint

Initial seed instances <X, Y>: <Elvis Presley, heart attack>; <Bob Marley, cancer>; <John Lennon, shot dead>; <Marilyn Monroe, drug overdose>

Extracted instance candidates <X, Y>: <Linda McCartney, breast cancer>; <Los Alamos, radiation exposure>; <Peter Turkel, car accident>; <Jim Morrison, 1971>

Distributional Type Extractor (weight: type frequency * inverse corpus type frequency)

Type weights for the seed instances: 44.7 physical condition; 34.9 condition; 34.9 illness; 30.1 ill health; 29.9 pathological state; 29.1 state; 20.9 crisis; 20.8 emergency; 20.4 juncture; 0.0 entity; 0.0 abstraction; 0.0 attribute

Type weights for each candidate, e.g.: 2.2 pathological state; 2.1 illness; 2.0 malignant tumor; 1.9 cancer

Vector Space Similarity Calculation → score in [0.0, 1.0]

Distributional Type Constraint: Pros and Cons

Pros: Can define a soft constraint from seed instances (instead of an ontological hard constraint).
– Associating with one ontology node is sometimes difficult
– Example: cause-of-death
• disease or health problem (Motor Neurone Disease; alcohol overdose; starvation)
• accident (traffic accident; lawn mower; fight; fire)
• indirect cause of death (overwork; curse; shame)

Cons: robustness
– Errors of type extraction
– Coverage of words/phrases, e.g., week-long series of air raid; well-aimed rifle shots

Source of types: YAGO2 DB

Type resource: YAGO2[Hoffart et al., 2012]

– 9.8 million entities from Wikipedia, GeoNames, and WordNet.

– each entity is linked with WordNet synsets

Cf. WordNet 3.0 contains 155K words (nouns: 118K) and 118K synsets (nouns: 82K), and lacks coverage of proper nouns.

Example Types

Exhaustive set of types associated with “heart attack” in YAGO2.

Type Vector Weights

cf: tfidf = tf * log( D / (1 + df) ).

Type similarity calculation
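The formula for this slide is not in the transcript. The sketch below shows one way the pieces described so far could fit together: type vectors weighted as type frequency times log(D / (1 + corpus type frequency)), compared with cosine similarity. Cosine is an assumption here (the slides only say "vector space similarity" with a score in [0.0, 1.0]), and the function names, type lists, and corpus frequencies are made up for illustration.

```python
import math
from collections import Counter

def type_vector(type_lists, corpus_type_freq, num_entities):
    """Weight each type by tf * log(D / (1 + df))."""
    tf = Counter(t for types in type_lists for t in types)
    return {t: n * math.log(num_entities / (1 + corpus_type_freq[t]))
            for t, n in tf.items()}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

# Hypothetical corpus statistics: "entity" is attached to almost everything.
corpus_type_freq = {"illness": 9, "cancer": 4, "entity": 999, "year": 49}
D = 1000

seed_vec = type_vector([["illness", "entity"], ["illness", "cancer", "entity"]],
                       corpus_type_freq, D)
good = type_vector([["illness", "cancer", "entity"]], corpus_type_freq, D)
bad = type_vector([["year", "entity"]], corpus_type_freq, D)

good_score = cosine(seed_vec, good)
bad_score = cosine(seed_vec, bad)
```

A candidate typed like the seeds scores close to 1, while a candidate that shares only the near-universal type "entity" scores 0, because that type receives zero weight. The score stays within [0, 1] as long as no type is more frequent than D − 1, which keeps all weights non-negative.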


Paraphrase Evaluation: Which set of paraphrases is better? Relation: “killed”

Set A (100% precision): X died from Y; X died of Y; X who died of Y; X who died from Y; X was dying of Y; X had died of Y; X dying of Y

Set B (100% precision): X died of Y; X died from Y; X was murdered with Y; X's death by Y; X suffered a fatal Y; Y killed X; X fell victim to Y

Are A & B really equally valuable?

Thesis Contribution (4 of 4)

Problem

– Lack of Evaluation Metric: precision or recall does not reward lexical diversity

Hypothesis & Proposed Solution

– An evaluation metric that rewards lexically diverse paraphrases is effective for paraphrase evaluation → DIMPLE Metric (DIversity-aware Metric for Pattern Learning Experiments)

Traditional metrics

Relation: “killed”

Expected Precision [Bannard and Callison-Burch, 2005; Callison-Burch, 2008; Kok and Brockett, 2010; Metzler et al., 2011]

$EP_k = \frac{1}{k} \sum_{i=1}^{k} avg_i$

Output: “kill”; “killed, ”; “of”; “death”; “murdered”

[Table columns: Output, Judge1, Judge2, Judge3, Avg (e.g., 2/3)]

Traditional metrics

Relation: “killed”

Expected Precision + Redundancy [Metzler et al., 2011]

$EPR_k = \frac{1}{k} \sum_{i=1}^{k} avg_i$

Output: “kill”; “killed, ”; “of”; “death”; “murdered”

[Table columns: Output, Judge1, Judge2, Judge3, Avg (e.g., 2/3)]

Cumulative Gain (for Information Retrieval evaluation)

Cumulative Gain [Järvelin & Kekäläinen, 2002; Kekäläinen, 2005]

$CG_k = \sum_{i=1}^{k} (2^{gain_i} - 1)$

Information retrieval: Query → doc1 (fairly relevant); doc2 (marginally relevant); doc3 (irrelevant); doc4 (fairly relevant); doc5 (highly relevant)

Paraphrase: Input “killed” → Output “kill”; “killed, ”; “of”; “death”; “murdered”, each with a gain

DIMPLE metric [Shima & Mitamura, 2011]

Relation: “killed”

Output: “kill”; “killed, ”; “of”; “death”; “murdered”

[Table columns: Output, judged value (e.g., 2/3), gain, 2^gain-1]

$gain_i = Q_i \times D_i$ (Q: Quality, D: Diversity)

Cumulative Gain [Järvelin & Kekäläinen, 2002; Kekäläinen, 2005]

$CG_k = \sum_{i=1}^{k} (2^{gain_i} - 1)$

$CG_5 = \sum_{i=1}^{5} (2^{gain_i} - 1) = 3 + 1 + 0 + 3 + 7 = 14$

$DIMPLE_5 = \frac{3 + 1 + 0 + 3 + 7}{7 + 7 + 7 + 7 + 7} = 0.4$
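The worked example can be reproduced in a few lines. This sketch assumes the normalizer is k times the largest possible per-item value, which matches the 7 + 7 + 7 + 7 + 7 denominator on the slide; the gains 2, 1, 0, 2, 3 are read back from the 2^gain − 1 values 3, 1, 0, 3, 7, and the function names are hypothetical.

```python
def cumulative_gain(gains):
    """CG_k = sum over the top-k items of (2^gain - 1)."""
    return sum(2 ** g - 1 for g in gains)

def dimple(gains, max_gain):
    """CG_k divided by the best achievable CG_k (every item at max_gain)."""
    ideal = len(gains) * (2 ** max_gain - 1)
    return cumulative_gain(gains) / ideal

gains = [2, 1, 0, 2, 3]  # one gain per output, in ranked order
cg5 = cumulative_gain(gains)
dimple5 = dimple(gains, max_gain=3)
```

Because gain is the product of a quality and a diversity term, a correct pattern that adds no new wording contributes little, which is how the metric rewards lexically diverse output.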


Evaluating DIMPLE

Relation: “killed”

Output: “kill”; “killed, ”; “of”; “death”; “murdered”

Intrinsic: EP, EPR, DIMPLE

Extrinsic: MSRPC, RTE, CQAE

Correlation between intrinsic and extrinsic scores over 40 sets of paraphrases (= 10 verbs x 4 paraphrase generation algorithms)

• MSRPC: The Microsoft Research Paraphrase Corpus [Dolan et al., 2005]

• RTE: Recognizing Textual Entailment dataset from PASCAL/TAC RTE1-4.

• CQAE: Complex Question Answering Evaluation from 6 past TREC QA tracks.

Evaluating DIMPLE: Result

Dataset EP EPR DIMPLE

MSRPC 0.19 0.37 *0.52

RTE 0.29 *0.38 *0.58

CQAE *0.47 *0.55 *0.70

*: statistically significant (null hypothesis tested: “there is no correlation”; p-value < 0.01)


List of Relations


Seed source:

N: NELL[Carlson et al., 2010]

E: Ephyra [Schlaefer et al., 2006]


Paraphrase Extractors

Extraction algorithm: description; iteration; corpus

CPL: Baseline 1: Coupled Pattern Learner from NELL [Carlson et al., 2010]; iteration: 860th; corpus: ClueWeb09 (25TB, 500m pages, includes Wikipedia)

VANILLA: Baseline 2: Espresso [Pantel & Pennacchiotti, 2006] w/o web; iteration: 10th and/or convergence at τ2=0.01; corpus: Wikipedia (7GB, 2.1m pages, 50m sentences)

BPL: Baseline 3: Extended Espresso; Bootstrap Paraphrase Learner (λ=1.0)

D-BPL: Proposed: BPL with Diversification (λ=0.75)

Gold standard labels for patterns

Labels counted as correct:

M: Matched (high certainty)
O: Matched & Out-of-dictionary
I: Inconclusive; depends on the context (medium certainty)

Labels counted as incorrect:

R: Related (no or very small certainty)
A: Antonym
W: Wrong

Judging: M, O, R (relation: died-of)

(M) X died of Y

(M) X passes to Y

(M) X perished in Y

(M) X succumbed to Y

(O) X fell victim to Y

(O) X was terminally ill with Y

(O) X suffered a fatal Y

(R) X was diagnosed with Y

(M) X was diagnosed with Y, and died

Evaluation metrics on paraphrase patterns

DIMPLE

Precision

Recall

Label Distribution


Outline of Experiments

Effect of diversification

Effect of type-based filtering

Main Results (Metric: DIMPLE)

[Figure: DIMPLE (0.02 to 0.12) by iteration (1 to 10) for D-BPL, BPL, and VANILLA]

p-values

CPL & D-BPL: 0.042

VANILLA & D-BPL: 0.023

BPL & D-BPL: 0.048

Main Results (Metric: Precision)

[Figure: Precision (0.1 to 0.8) by iteration (1 to 10) for D-BPL, BPL, and VANILLA]

Main Results (Metric: Recall)

[Figure: Recall (0.1 to 0.6) by iteration (1 to 10) for D-BPL, BPL, and VANILLA]

Overall results (11 relations; macro-avg)

[Figure, four panels: Precision, DIMPLE, Recall, and Number of Distinct Keywords by iteration (1 to 10) for D-BPL, BPL, and VANILLA]

Effect of type-constraint

[Figure: Precision (0.1 to 0.8) by iteration (1 to 10) for D-BPL(+), with type scoring, and D-BPL(-), without type scoring]

Example: LEADER(X:person, Y:organization)


Y, President of X Y, president of X Y, president of X Y, president of X Y, former president of X Y's regime in X X - Y, President president of X, Y Y's government in X Y, former president of X X's president Y X an leader Y president of X, Y Y, the president of X X an dictator Y President of X, Y Y (president of X Y to X to face trial X n president Y president Y of X Y (captain general, X X's President Y X's president, Y Y from power in X

X's president Y Y, the current president of X X, led by Y Y, the president of X Y is elected president of X banned in X during Y

Y (president of X X ian president Y invaded and annexed by X (under Y

president Y of X Y - president of X war against Y's X President Y of X Y, the former president of X Y to the presidency of X Y - former president of X Y becomes president of X Y is made premier of X X's president, Y Y, current president of X unification with Y's X

VANILLA BPL D-BPL

Example: LEADER(X:person, Y:organization) Top 15 paraphrases patterns

Example: LEADER(X:person, Y:organization) (selected)

Example: person_graduated_school (X: person, Y:org)

Patterns by CPL/NELL (top 100)

VANILLA BPL D-BPL

High School, X attended Y X graduated from Y X graduated from Y high school, X attended Y X is a graduate of Y attended Y, where X School, X attended Y X has taught at Y X has taught at Y school, X attended Y attended Y, where X Y, where X majored

X attended Y Y, where X majored X received his undergraduate degree from Y

X graduated from Y X attended Y X joined the faculty at Y graduating from high school, X attended Y X taught at Y X studied at Y high school, X attended Y where he played

X received his undergraduate degree from Y

X was a visiting professor at Y

X attended Y where he played X studied at Y Y, where X earned Y, where X majored X graduated at Y X accepted a position at Y attended Y, where X Y, where X graduated X then went to Y X taught at Y high school, X attended Y Y, where X was a member

X has taught at Y X joined the faculty at Y X played college football for Y

X is a graduate of Y X graduated with honors from Y X is a graduate of Y

X received his undergraduate degree from Y X was graduated from Y science at Y, where X

Top 15 paraphrases patterns


Summary of Contributions

Limitation in the state of the art: Corpus restriction
Confirmed hypothesis: It is possible to extract paraphrase templates from an unstructured monolingual corpus given seed instances.
Supporting evidence: BPL & D-BPL outperform the baselines in precision, recall and number of distinct keywords.

Limitation in the state of the art: Lack of lexical diversity
Confirmed hypothesis: Lexical diversity of acquired paraphrases can be controlled with a model of relevance-dissimilarity interpolation.
Supporting evidence: A statistically significant difference (p < 0.05) in DIMPLE was observed between the diversifiable bootstrapping and the baselines.

Limitation in the state of the art: Semantic drift
Confirmed hypothesis: Semantic drift risk from diversification can be mitigated by a distributional type restriction.
Supporting evidence: When type-based instance filtering is enabled, precision is constantly above the baseline and does not drop steeply.

Limitation in the state of the art: Lack of evaluation metric
Confirmed hypothesis: A cumulative-gain style metric that rewards lexically diverse paraphrases is effective.
Supporting evidence: DIMPLE correlates with paraphrase recognition task performance, with a statistically significant (p < 0.01) Pearson's r of +0.5 to +0.7.

Future Works

Co-reference resolution
– full name : last name only : pronoun = 1 : 5.8 : 6.7 (Wikipedia)
– Data-sparseness issue when calculating reliability
– Generate and add sentences replacing a reference with its referent (avoid double counting)

Corpus-specific paraphrase extraction
– Medical, Legal, Sports, etc.

Robust vector representation for type-scoring
– YAGO covers 9.8M entities, but there is still a coverage issue

DIMPLE's "Q" (Quality) by O/M/I labels

Extrinsic evaluation (QA)

Feature-based trainable scorer
– Using multiple features (pos seq, context vector, type vector, dict feature)
– Optimize w.r.t. different application needs / labels

Wrap up: immediate tasks to do

Complete annotating all 15 relations

Calculate inter-annotator agreement
– Compare fine- vs coarse-grain ({M, O, I} vs {R, A, W})
– By Cohen's Kappa

Related works
– Especially, Coupled Pattern Learner (NELL)

Analysis of Precision vs Avg Reliability

Release D-BPL code + evaluation tool + annotated data

Web-based Experiment Management Tool

Conclusion

Developed a paraphrase extraction algorithm that can acquire lexically diverse binary-relation paraphrase templates, given a relatively small number of seed instances for a certain relation and an unstructured monolingual corpus.

– Diversification is effective: a statistically significant difference in DIMPLE was observed between the Diversifiable Bootstrapping (D-BPL) and the two baseline algorithms (D-BPL without diversification and vanilla Espresso).

– Distributional type scoring is effective: when enabled, the precision drop became less steep in early iterations, suggesting semantic drift is mitigated.

Questions?
