Old Lithuanian Digital: Corpus of Kristijonas Donelaitis ... 2020/Gelumbeckaite_CorDon_2019.pdf ·...


Page 1:

03.01.2020

Old Lithuanian Digital:

Corpus of Kristijonas Donelaitis (1714–1780) [CorDon]

Linguistic Annotation

Prof. Dr. Jolanta Gelumbeckaitė
Institut für Empirische Sprachwissenschaft, Goethe-Universität Frankfurt am Main
Email: [email protected]

MAKING OLD LITHUANIAN TEXTS USABLE FOR RESEARCH (November 28–29, 2019)


Page 3:

A comprehensive, deeply annotated diachronic reference corpus of Old Lithuanian

Referenzcorpus Altlitauisch

Senosios lietuvių kalbos korpusas

(Lith. sliekas “earthworm”)

Cooperation (2012–2014, No. VAT-42/2012):

• Goethe-University of Frankfurt am Main
• Institute of Lithuanian Language
• Institute of Lithuanian Literature and Folklore
• University of Pisa

Page 6: Old Lithuanian (ca. 1520–1800)

“Dzūkian prayers” (Pater noster, Ave Maria, Credo) in: Nicolaus de Blony, Tractatus sacerdotalis (Straßburg: Martin Flach, 1503)

Christian Gottlieb Mielcke (1732–1807), Anfangs=Gründe einer Littauischen Sprach=Lehre (Königsberg: Hartungsche Hofbuchdruckerei, 1800)

Vilniaus universitetas, shelfmark: VUB RS II–3006

ca. 10 million word tokens

Pages 8–9: First Lithuanian text in TITUS (Thesaurus Indogermanischer Text- und Sprachmaterialien)

Page 10:

Institute of Lithuanian Literature and Folklore (www.llti.lt)

Kristijonas Donelaitis (1714–1780): PL, WD, F

Shelfmark: F1-5259 (Msc. A 120a-f. fol.)

Page 11:

• DM (Metai), 1765–1775: DM PL (Pavasario linksmybės), DM WD (Vasaros darbai), DMN RG (Rudenio gėrybės), DMN ZR (Žiemos rūpesčiai)
• DMRh 1818: first edition by Ludwig J. Rhesa
• DMSch 1865: edition by August Schleicher
• DMN 1869: edition by Georg H. F. Nesselmann

Page 12:

• DM F: Fortsetzung (“Continuation”, 29 verses)
• DPP: Pričkaus pasaka apie lietuvišką svodbą (“Fritz’s Tale of the Lithuanian Wedding”)
• DP: Pasakos (fables):
  DP LG: Lapės ir gandro čėsnis (“The Feast of the Vixen and the Stork”)
  DP RJ: Rudikis jomarkininks (“The Cur at the Fair”)
  DP ŠD: Šuo didgalvis (“The Big-Mouthed Dog”)
  DP PŠ: Pasaka apie šūdvabalį (“Fable of the Dung Beetle”)
  DP VP: Vilks provininks (“The Wolf as Judge”)
  DP ĄG: Ąžuols gyrpelnys (“The Boastful Oak”)

Page 13:

• Ludwig J. Rhesa: DMRh 1818, first edition of Metai with a German translation; DPRh 1824, first edition of Pasakos (without translation)
• August Schleicher: DMSch, DPSch 1865, edition without translation; DPPSch 1865, first edition of Pričkaus pasaka
• Georg H. F. Nesselmann: DMN, DPN, DPPN 1869, edition with a German translation

Nesselmann’s edition differs least from the original.

Page 14: Corpus Structure

o digitisation of the texts (and structural annotation)

o palaeographic or typographic, and textological annotation

o lexical annotation:

• transliteration

• standardisation

• lemmatisation

• glossing

o grammatical annotation

o annotation of quotations

o alignment of the annotated texts with facsimile reproductions of the original, with each other, and with their translation source texts (or translations into other languages)
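The token-level layers in the list above can be pictured as one record per word token. The following is a minimal sketch under stated assumptions. The class name, the field names and the `missing_layers` helper are all hypothetical; they are not CorDon's actual data format. The sketch also leaves out the palaeographic, quotation and alignment layers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical names of the lexical and grammatical layers that a token
# still needs after digitisation (not CorDon's own field names).
LAYERS = ("transliteration", "standardisation", "lemma",
          "gloss_lt", "gloss_en", "lemma_pos", "form_pos")


@dataclass
class Token:
    """One word token with the token-level annotation layers (sketch)."""
    diplomatic: str                  # digitised form exactly as in the source
    reference: str                   # structural reference, e.g. "DM WD 16r 5(87)"
    language: str = "olt"            # language code: olt, lat, ger or gre
    transliteration: Optional[str] = None
    standardisation: Optional[str] = None
    lemma: Optional[str] = None
    gloss_lt: Optional[str] = None   # Lithuanian gloss of the lemma
    gloss_en: Optional[str] = None   # English (and/or German) gloss
    lemma_pos: Optional[str] = None  # part of speech of the lemma
    form_pos: Optional[str] = None   # part of speech of the actual word form
    features: Dict[str, str] = field(default_factory=dict)

    def missing_layers(self) -> List[str]:
        """Names of the layers that are still empty, in annotation order."""
        return [name for name in LAYERS if getattr(self, name) is None]


tok = Token(diplomatic="ſʒŏkĭnėjant", reference="DM WD 16r 5(87)")
tok.transliteration = "šokinėjant"   # illustrative value only
print(tok.missing_layers())
# ['standardisation', 'lemma', 'gloss_lt', 'gloss_en', 'lemma_pos', 'form_pos']
```

Keeping each layer as its own field, rather than one combined tag, matches the slide's separation of the layers. It also makes it easy to see which steps of the semi-manual workflow are still open for a given token.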

Page 15: Digitisation

Donelaitis 1977 = TITUS: ſʒŏkĭnėjánt

DM WD 16r 5(87): ſʒŏkĭnėjant

Donelaitis 1977 = TITUS: ſŭgăbe ſim

DM PL 7v 24(406): ſŭgăbe ſim
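The first pair above differs only in an acute accent (ſʒŏkĭnėjánt vs. ſʒŏkĭnėjant). Such differences are easy to miss by eye. One way to surface them is to decompose both strings canonically (Unicode NFD), so that every accent becomes a separate combining code point, and then align the two strings. This is a sketch of that idea using only the Python standard library. The function names are hypothetical, and the sketch is not presented as the project's own checking procedure.

```python
import difflib
import unicodedata
from typing import List, Tuple


def diacritic_diff(a: str, b: str) -> List[Tuple[str, str, str]]:
    """Align two transcriptions after canonical decomposition (NFD).

    NFD splits letters such as 'á' into base letter + combining mark, so an
    added or lost accent shows up as one inserted or deleted code point.
    Returns (opcode, text_in_a, text_in_b) for every non-equal span.
    """
    da = unicodedata.normalize("NFD", a)
    db = unicodedata.normalize("NFD", b)
    matcher = difflib.SequenceMatcher(a=da, b=db, autojunk=False)
    return [
        (tag, da[i1:i2], db[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def describe(segment: str) -> str:
    """Unicode names of the characters, e.g. 'COMBINING ACUTE ACCENT'."""
    return ", ".join(unicodedata.name(ch, hex(ord(ch))) for ch in segment)


if __name__ == "__main__":
    edition = "ſʒŏkĭnėjánt"     # Donelaitis 1977 = TITUS
    manuscript = "ſʒŏkĭnėjant"  # DM WD 16r 5(87)
    for tag, x, y in diacritic_diff(edition, manuscript):
        print(tag, describe(x), "->", describe(y) or "(nothing)")
    # delete COMBINING ACUTE ACCENT -> (nothing)
```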

Page 17: Textological annotation

DM PL 10r 37(622)

Page 18:

DM PL 10r 37(622)

DMRh 1818
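The slides cite passages with compact references such as "DM PL 10r 37(622)" or "DMN ZR 98 23–24(356–357)". This sketch parses them. It rests on an assumed reading of the format that the slides do not spell out: source siglum, part, folio with recto/verso side (or a plain page number in the printed editions), line(s) on that page, and verse number(s) in parentheses. All names are hypothetical. Verse-only references such as "DMN WD 708" are deliberately not covered.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed structure: <source> <part> <folio|page>[r|v] <line(s)>(<verse(s)>)
REFERENCE = re.compile(
    r"^(?P<source>D\w*)\s+(?P<part>\w{1,2})\s+"
    r"(?P<leaf>\d+)(?P<side>[rv])?\s+"
    r"(?P<lines>\d+(?:[–-]\d+)?)\((?P<verses>\d+(?:[–-]\d+)?)\)$"
)


@dataclass(frozen=True)
class Reference:
    source: str              # e.g. "DM" (manuscript) or "DMN" (Nesselmann 1869)
    part: str                # e.g. "PL", "WD", "ZR"
    leaf: int                # folio (manuscript) or page (edition)
    side: Optional[str]      # "r" or "v" for folios, None for printed pages
    lines: Tuple[int, int]   # first and last line on the leaf/page
    verses: Tuple[int, int]  # first and last verse number


def _span(text: str) -> Tuple[int, int]:
    """Turn "23–24" or "37" into an inclusive (start, end) pair."""
    start, _, end = text.replace("–", "-").partition("-")
    return int(start), int(end or start)


def parse_reference(text: str) -> Reference:
    match = REFERENCE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised reference: {text!r}")
    return Reference(
        source=match["source"],
        part=match["part"],
        leaf=int(match["leaf"]),
        side=match["side"],
        lines=_span(match["lines"]),
        verses=_span(match["verses"]),
    )


print(parse_reference("DM PL 10r 37(622)"))
print(parse_reference("DMN ZR 98 23–24(356–357)"))
```

Parsing both the manuscript reference and the edition reference into the same structure makes it straightforward to align a verse in DM with the corresponding verse in an edition such as DMRh.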

Page 19: Lexical annotation

• transliteration into Standard Lithuanian (in a historical lexicon; phonotactic and orthographic peculiarities)

• standardisation: normalised actual word form (in Standard Lithuanian; common lexical/morphosyntactic base)

• lemmatisation: main word form and its accentuation in a historical lexicon

• glossing of the lemma in Lithuanian and in English (and/or German), taking into account its meanings in the given context

• language encoding (olt = Old Lithuanian, lat = Latin, ger = German, gre = Greek)
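To illustrate what the transliteration step does, here is a deliberately tiny rule set. The rules are assumptions chosen for illustration: ſʒ is rendered as š, long ſ as s, and the breve over vowels is dropped. The real transliteration has to handle many more orthographic and phonotactic peculiarities, and the function name is hypothetical.

```python
import unicodedata

COMBINING_BREVE = "\u0306"

# Hypothetical, minimal mapping; order matters ("ſʒ" before "ſ").
GRAPHEME_RULES = [("ſʒ", "š"), ("ſ", "s")]


def transliterate(form: str) -> str:
    """Map a few source spellings onto modern Lithuanian letters (sketch)."""
    # Decompose so that breves become separate, removable code points;
    # 'ſ' and 'ʒ' have no canonical decomposition and stay unchanged.
    decomposed = unicodedata.normalize("NFD", form)
    decomposed = decomposed.replace(COMBINING_BREVE, "")
    for old, new in GRAPHEME_RULES:
        decomposed = decomposed.replace(old, new)
    # Recompose, e.g. 'e' + COMBINING DOT ABOVE becomes 'ė' again.
    return unicodedata.normalize("NFC", decomposed)


print(transliterate("ſʒŏkĭnėjant"))  # DM WD 16r 5(87)
```

Working on the NFD form keeps the rules independent of whether the digitised text stores ŏ, ĭ or ė as precomposed characters or as base letter plus combining mark.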

Page 20: Lexical annotation

Jonas Kabelka, Kristijono Donelaičio raštų leksika, Vilnius: Mintis, 1964.

Georg H. F. Nesselmann (1851); Friedrich Kurschat (1883)

Page 21: Grammatical annotation

o hierarchical grammatical description, predominantly restricted to morphology:

• part-of-speech (POS) tagging:

  POS tagging of the lemma

  POS tagging of the actual word form

• morphological information:

  unalterable (inherent) morphological categories of the lemma

  unalterable (inherent) morphological categories of the actual word form

  inflectional morphological characteristics of the actual word form
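The two-level description above (lemma vs. actual word form, inherent vs. inflectional categories) can be sketched as a small record that is rendered into one tag string. The schema, the tag names and the string format are hypothetical. The example values for rūpesčiu are illustrative: the instrumental singular follows the 1977 edition and Kabelka 1964 as cited on the slides, and the masculine gender of the lemma rūpestis is added for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GrammaticalTag:
    """Lemma-level and word-form-level description (hypothetical schema)."""
    lemma_pos: str
    lemma_inherent: Dict[str, str] = field(default_factory=dict)
    form_pos: str = ""  # empty means: same class as the lemma
    form_inherent: Dict[str, str] = field(default_factory=dict)
    form_inflection: Dict[str, str] = field(default_factory=dict)

    def flat(self) -> str:
        """Render as 'lemma POS|lemma features|form POS|form features|inflection'."""
        def fmt(features: Dict[str, str]) -> str:
            return ".".join(f"{k}={v}" for k, v in sorted(features.items()))

        return "|".join([
            self.lemma_pos,
            fmt(self.lemma_inherent),
            self.form_pos or self.lemma_pos,
            fmt(self.form_inherent),
            fmt(self.form_inflection),
        ])


# rūpesčiu, lemma rūpestis (values illustrative)
tag = GrammaticalTag(lemma_pos="NN", lemma_inherent={"gender": "masc"},
                     form_inflection={"case": "ins", "number": "sg"})
print(tag.flat())  # NN|gender=masc|NN||case=ins.number=sg
```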

Page 22:

https://software.sil.org/toolbox/

• semi-automated (semi-manual), human-controlled annotation

• seven dictionaries are used in the Toolbox environment

• the dictionaries are supplemented in the course of annotation
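The workflow in these bullets can be summarised as follows. Dictionaries propose an analysis, a human confirms or corrects it, and analyses of unknown forms flow back into a dictionary. The sketch below imitates that loop in plain Python under simplifying assumptions: a lexicon is just a form-to-lemma mapping, and `confirm` stands in for the annotator. It does not reproduce how SIL Toolbox itself stores or looks up its dictionaries, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Lexicon = Dict[str, str]  # word form -> lemma (hypothetical, simplified)


def suggest(form: str, lexicons: List[Tuple[str, Lexicon]]) -> Optional[Tuple[str, str]]:
    """Return (lexicon name, lemma) from the first lexicon that lists the form."""
    for name, lexicon in lexicons:
        if form in lexicon:
            return name, lexicon[form]
    return None


def annotate(
    forms: List[str],
    lexicons: List[Tuple[str, Lexicon]],
    confirm: Callable[[str, Optional[str]], str],
) -> List[Tuple[str, str]]:
    """Semi-automatic pass: lexicons suggest, `confirm` (the human) decides.

    Analyses of forms that no lexicon knew are written to the last lexicon
    in the list (the project lexicon), so the resources grow during work.
    """
    _, project = lexicons[-1]
    analysed = []
    for form in forms:
        hit = suggest(form, lexicons)
        lemma = confirm(form, hit[1] if hit else None)
        if hit is None:
            project[form] = lemma  # supplementation
        analysed.append((form, lemma))
    return analysed


lexicons = [("dictionary_1", {"šokinėjant": "šokinėti"}), ("project", {})]
manual = {"bambos": "bamba"}  # what the annotator would type in
result = annotate(["šokinėjant", "bambos"], lexicons,
                  lambda form, suggestion: suggestion or manual[form])
print(result)                       # [('šokinėjant', 'šokinėti'), ('bambos', 'bamba')]
print(suggest("bambos", lexicons))  # ('project', 'bamba')
```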


Page 27: Grammatical annotation

Distinguishing the grammatical class of the lemma from that of the actual word form in a given text enables us to indicate changes in grammatical class (nominalisation, adverbialisation, and the turning of some nouns into adpositions), e.g.: aukščiau (preposition, APPR) bambos ‘above the navel’ (lemma: aukštai, ADV)
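Because both classes are stored, class changes such as aukščiau (lemma aukštai ADV, used as APPR) can be retrieved with a simple comparison. This is a minimal sketch with a hypothetical record type. The aukščiau analysis comes from the slide. The tags for bambos are assumed for the example.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Analysis(NamedTuple):
    form: str
    form_pos: str   # class of the actual word form in this context
    lemma: str
    lemma_pos: str  # class of the lemma in the lexicon


def class_changes(tokens: Iterable[Analysis]) -> List[Analysis]:
    """Tokens whose word-form class differs from the class of their lemma."""
    return [t for t in tokens if t.form_pos != t.lemma_pos]


def transitions(tokens: Iterable[Analysis]) -> Counter:
    """Count (lemma class, word-form class) pairs among the changed tokens."""
    return Counter((t.lemma_pos, t.form_pos) for t in class_changes(tokens))


sample = [
    Analysis("aukščiau", "APPR", "aukštai", "ADV"),  # from the slide
    Analysis("bambos", "NN", "bamba", "NN"),         # hypothetical tags
]
print([t.form for t in class_changes(sample)])  # ['aukščiau']
print(transitions(sample))                      # Counter({('ADV', 'APPR'): 1})
```

Counting transitions over a whole text gives a quick overview of which kinds of class change (e.g. ADV → APPR) are frequent in Donelaitis.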

Page 28:

Donelaitis: <i > / <y> / <in>ti

Kabelka 1964, LKŽ (Lietuvių kalbos žodynas): i ti > yti

1977 = TITUS: yti, inti > inti

Page 29: -(i)áus vs. -(i)aus

DM PL 5v 16–20(224–228)

Page 30: -(i)áus vs. -(i)aus

Page 31: -(i)áus vs. -(i)aus

• Kabelka 1964, LKŽ: paskiaus

• Nesselmann 1869: paskiáus

• DMN ZR 98 23–24(356–357)

Page 32: -(i)áus vs. -(i)aus

• DMN PL 121 20(82)

Page 33: -(i)áus vs. -(i)aus

• DM WD 15v 7–9(49–50)

Page 34:

DM WD 17v 20–24(231–235)
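The -(i)áus vs. -(i)aus slides contrast forms that differ only in a stress mark, e.g. paskiaus (Kabelka 1964, LKŽ) vs. paskiáus (Nesselmann 1869). To compare such readings across sources, it helps to group variants under an accentless key while keeping each accented spelling. This is a sketch under one assumption: only the acute, grave and tilde marks count as accents, while ogonek, caron and dot above (ą, š, ė) are kept as letters. The function names are hypothetical.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Combining marks treated as accents (assumption): acute, grave, tilde.
ACCENT_MARKS = {"\u0301", "\u0300", "\u0303"}


def strip_accents(form: str) -> str:
    """Remove accent marks but keep letters such as ą, š, ė."""
    decomposed = unicodedata.normalize("NFD", form)
    kept = "".join(ch for ch in decomposed if ch not in ACCENT_MARKS)
    return unicodedata.normalize("NFC", kept)


def group_variants(forms: Iterable[str]) -> Dict[str, List[str]]:
    """Group spellings that differ only in accent marks under one key."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for form in forms:
        nfc = unicodedata.normalize("NFC", form)
        groups[strip_accents(nfc)].add(nfc)
    return {key: sorted(variants) for key, variants in groups.items()}


print(group_variants(["paskiaus", "paskiáus"]))
# {'paskiaus': ['paskiaus', 'paskiáus']}
```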

Page 35:

DM WD 17v 23(234)

• 1977, Kabelka 1964: rūpesčiu (instrumental singular)

• LKŽ:

Page 36:

„DIGITALE HUMANITIS“

Page 37:

Nuoširdžiai dėkoju Jums už dėmesį! Thank you very much for your attention!

Vielen herzlichen Dank für Ihre Aufmerksamkeit!

DM WD 23r 32(708)

DMN WD 708