26/27 June 2006, Saarbrücken
Workshop on multilingual semantic annotation, Saarbrücken, 26/27-06-06
Comments on Emanuele Pianta: Exploiting Parallel Texts
to leverage the manual annotation bottleneck:
the MultiSemCor case
Ulrich Heid,
IMS-CL, Universität Stuttgart
The methodology: transfer of annotations
• It does around 75% of the annotation work
• It produces
– an annotated TL corpus (POS, lemma, sense)
– an annotated parallel corpus
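The transfer step can be illustrated with a minimal sketch, assuming tokenised sentences and a word alignment are already available. The `Token` class, the `transfer_senses` function and the sense labels are hypothetical names chosen for illustration, not part of MultiSemCor.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    form: str
    lemma: str
    pos: str
    sense: Optional[str] = None  # sense label; None means not annotated


def transfer_senses(source: List[Token], target: List[Token],
                    alignment: List[Tuple[int, int]]) -> float:
    """Copy sense labels from SL tokens to their aligned TL tokens.

    alignment holds (source_index, target_index) pairs.
    Returns the share of sense-annotated SL tokens whose label was transferred.
    """
    transferred = 0
    for s_idx, t_idx in alignment:
        src = source[s_idx]
        if src.sense is not None and target[t_idx].sense is None:
            target[t_idx].sense = src.sense
            transferred += 1
    annotated = sum(1 for tok in source if tok.sense is not None)
    return transferred / annotated if annotated else 0.0


# Toy example with made-up sense labels (assumption, not real synset ids).
english = [Token("The", "the", "DET"),
           Token("dog", "dog", "NOUN", "SENSE-dog-1"),
           Token("barks", "bark", "VERB", "SENSE-bark-1")]
italian = [Token("Il", "il", "DET"),
           Token("cane", "cane", "NOUN"),
           Token("abbaia", "abbaiare", "VERB")]
coverage = transfer_senses(english, italian, [(0, 0), (1, 1), (2, 2)])
print(coverage, [tok.sense for tok in italian])
```

Whatever is not covered by an alignment link remains unannotated in the TL and is left for manual annotation.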
Transfer of annotations: required infrastructure
• "Controlled" translation: sentence by sentence, POS-preserving where possible
• Multiword recognition
• Parallel WordNets: Princeton WordNet ↔ target-language WordNet
Problems could arise:
• with "free" translations (cf. translation memories)
• with more "deviant" WordNets, e.g. GermaNet
Analysing the transfer result
Systematic cases of non-alignment:
• lack of "cross-linguistic synonymy"
• translation not 1:1
– not POS-preserving: coexist - coesistenza
– 1:2: successfully - con successo
• Do we get the same problems as those discussed as "divergences"/"mismatches" in MT?
• Would a marking of chunks in SL/TL help?
• Would a morphology system help?
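The systematic cases listed above could be detected mechanically once an alignment exists. The following is a minimal sketch, assuming POS tags use the same tagset in SL and TL; the function name `classify_links` and the labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def classify_links(source_pos: List[str], target_pos: List[str],
                   alignment: List[Tuple[int, int]]) -> Dict[int, str]:
    """Label every SL token as '1:1', '1:n', 'pos-change' or 'unaligned'."""
    targets = defaultdict(list)
    for s_idx, t_idx in alignment:
        targets[s_idx].append(t_idx)

    labels = {}
    for s_idx, pos in enumerate(source_pos):
        linked = targets.get(s_idx, [])
        if not linked:
            labels[s_idx] = "unaligned"
        elif len(linked) > 1:
            labels[s_idx] = "1:n"          # e.g. successfully - con successo
        elif target_pos[linked[0]] != pos:
            labels[s_idx] = "pos-change"   # e.g. coexist - coesistenza
        else:
            labels[s_idx] = "1:1"
    return labels


# successfully (ADV) aligned to "con successo" (PREP + NOUN)
print(classify_links(["ADV"], ["PREP", "NOUN"], [(0, 0), (0, 1)]))
# coexist (VERB) aligned to coesistenza (NOUN)
print(classify_links(["VERB"], ["NOUN"], [(0, 0)]))
```

Counting these labels over a corpus would show how much of the non-alignment falls into each class, which bears on the question of whether chunking or a morphology system would help.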
Towards relaxing the conditions on the infrastructure
• To get the system to work under suboptimal conditions
– Would the integration of morphological relations across POS be useful? (Yes for alignment, no for WN synset transfer)
– Could the system be made "aware" of transfer problems (and signal these to the user)?
• Test with e.g. Acquis Communautaire?
• Test with Germanic languages?
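The two questions on morphological relations and on signalling transfer problems can be pictured together in a small sketch. It assumes a table of cross-POS derivations in the SL and a bilingual lexicon; both tables, their contents and the function `check_link` are hypothetical toy data, not an existing resource or API.

```python
from typing import Optional, Tuple

# Toy, hypothetical resources; a real system would draw on a morphology component.
DERIVATIONS = {("coexist", "VERB"): {("coexistence", "NOUN")}}       # SL cross-POS relations
TRANSLATIONS = {("coexistence", "NOUN"): {("coesistenza", "NOUN")}}  # SL -> TL lexicon


def check_link(src: Tuple[str, str],
               tgt: Tuple[str, str]) -> Tuple[bool, bool, Optional[str]]:
    """src and tgt are (lemma, POS) pairs for a link proposed by the aligner.

    Returns (keep_alignment, transfer_synset, message_for_user).
    """
    if src[1] == tgt[1]:
        # Same POS: accept the link and transfer the synset.
        return True, True, None
    for related in DERIVATIONS.get(src, set()):
        if tgt in TRANSLATIONS.get(related, set()):
            # Cross-POS relation found: good enough for alignment,
            # but the synset is POS-specific, so it is not transferred.
            msg = (f"{src[0]}/{src[1]} aligned to {tgt[0]}/{tgt[1]} "
                   f"via {related[0]}; synset not transferred")
            return True, False, msg
    return False, False, f"no link found between {src[0]}/{src[1]} and {tgt[0]}/{tgt[1]}"


print(check_link(("coexist", "VERB"), ("coesistenza", "NOUN")))
```

The returned message is one simple way of making the system "aware" of a transfer problem: the alignment is kept, the synset transfer is withheld, and the case is reported to the user.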