Ulrich Heid, IMS-CL, Universität Stuttgart

Workshop on multilingual semantic annotation, Saarbrücken, 26-27 June 2006. Comments on Emanuele Pianta: Exploiting Parallel Texts to leverage the manual annotation bottleneck: the MultiSemCor case.

Page 1: Ulrich Heid, IMS-CL, Universität Stuttgart

Workshop on multilingual semantic annotation, Saarbrücken, 26-27 June 2006

Comments on Emanuele Pianta: Exploiting Parallel Texts to leverage the manual annotation bottleneck: the MultiSemCor case

Ulrich Heid, IMS-CL, Universität Stuttgart

Page 2: Ulrich Heid, IMS-CL, Universität Stuttgart

The methodology: transfer of annotations

• It does around 75% of the annotation work

• It produces

– an annotated TL corpus (POS, lemma, sense)

– an annotated parallel corpus
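
The transfer step can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `Token` class, the `transfer_annotations` function, the alignment pairs, the UD-style POS tags and the sense labels are hypothetical and are not MultiSemCor's actual data format or code. The sketch assumes a word alignment is already available and copies the sense of each source token onto its aligned target token; target tokens without a link stay untagged and are returned for manual annotation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    form: str
    pos: str
    lemma: str
    sense: Optional[str] = None  # hypothetical sense label, e.g. a synset id


def transfer_annotations(source, target, alignment):
    """Copy sense labels from source tokens to aligned target tokens.

    alignment is a list of (source_index, target_index) pairs.
    Returns the indices of target tokens that received no sense.
    """
    for s_idx, t_idx in alignment:
        if source[s_idx].sense is not None:
            target[t_idx].sense = source[s_idx].sense
    return [i for i, tok in enumerate(target) if tok.sense is None]


# Toy data: sense labels and alignment are invented for illustration.
source = [
    Token("dogs", "NOUN", "dog", sense="dog.n.01"),
    Token("bark", "VERB", "bark", sense="bark.v.01"),
]
target = [
    Token("i", "DET", "il"),
    Token("cani", "NOUN", "cane"),
    Token("abbaiano", "VERB", "abbaiare"),
]
alignment = [(0, 1), (1, 2)]

untagged = transfer_annotations(source, target, alignment)
print(untagged)                    # target indices left for manual work
print([t.sense for t in target])   # senses after transfer
```

In this toy case only the unaligned determiner is left without a sense; in a real corpus the untagged remainder is the part that still needs manual work.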

Page 3: Ulrich Heid, IMS-CL, Universität Stuttgart

Transfer of annotations: required infrastructure

• "Controlled" translation: sentence-wise, POS-preserving where possible

• Multiword recognition

• Parallel WordNets: Princeton WordNet and a target-language WordNet

Problems could arise:

• with "free" translations (cf. Translation Memories)

• with more "deviant" WordNets, e.g. GermaNet

Page 4: Ulrich Heid, IMS-CL, Universität Stuttgart

Analysing the transfer result

Systematic cases of non-alignment:

• lack of "cross-linguistic synonymy"

• translation not 1:1

– not POS-preserving: coexist - coesistenza

– 1:2: successfully - con successo

• Do we get the same problems as those discussed as "divergences"/"mismatches" in MT?

• Would a marking of chunks in SL/TL help?

• Would a morphology system help?

Page 5: Ulrich Heid, IMS-CL, Universität Stuttgart

Towards relaxing the conditions on the infrastructure

• To get the system to work under suboptimal conditions

– Would the integration of morphological relations across POS be useful? (Yes for alignment, no for WN synset transfer)

– Could the system be made "aware" of transfer problems (and signal these to the user)?

• Test with e.g. Acquis Communautaire?

• Test with Germanic languages?
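
One way to picture a system that is "aware" of transfer problems is a check that labels each alignment link before any sense is copied. The sketch below is an assumption for illustration only: `classify_links`, its labels and the UD-style POS tags are hypothetical and do not describe the MultiSemCor implementation. The toy input is a list of token pairs rather than a real sentence, reusing the two examples from the slides: coexist - coesistenza (not POS-preserving) and successfully - con successo (1:2).

```python
from collections import defaultdict


def classify_links(source_pos, target_pos, alignment):
    """Label each source token by the kind of alignment link it has.

    source_pos / target_pos: lists of POS tags, one per token.
    alignment: list of (source_index, target_index) pairs.
    Returns {source_index: label}.
    """
    targets = defaultdict(list)
    for s_idx, t_idx in alignment:
        targets[s_idx].append(t_idx)

    labels = {}
    for s_idx in range(len(source_pos)):
        linked = targets.get(s_idx, [])
        if not linked:
            labels[s_idx] = "unaligned"
        elif len(linked) > 1:
            labels[s_idx] = "one-to-many"
        elif source_pos[s_idx] != target_pos[linked[0]]:
            labels[s_idx] = "pos-changing"
        else:
            labels[s_idx] = "ok"
    return labels


# Source tokens: coexist (VERB), successfully (ADV)
# Target tokens: coesistenza (NOUN), con (ADP), successo (NOUN)
source_pos = ["VERB", "ADV"]
target_pos = ["NOUN", "ADP", "NOUN"]
alignment = [(0, 0), (1, 1), (1, 2)]

labels = classify_links(source_pos, target_pos, alignment)
for s_idx, label in labels.items():
    if label != "ok":
        print(f"source token {s_idx}: {label}, check before transfer")
```

Links labelled anything other than "ok" could be shown to the annotator instead of being transferred silently; whether a POS-changing link should still receive a synset is a policy decision the sketch leaves open.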