PATR II Compiler
Prolog Advanced Course, Summer Semester 2000
Heinrich-Heine-Universität Düsseldorf
Christof Rumpf
22.05.2000 PATR II Compiler 2
Notation Conventions
• Instantiation mode of arguments
  – Blue: input arguments
  – Red: output arguments
• Cut
  – red cut !
  – green cut !
• Predicate definitions
  – complete
  – continued
Directives
% external resources
:- [tokenize]. % load tokenizer
% operators
:- op(510, xfy, :).        % attr:val
:- op(600, xfx, ===).      % path equation
:- op(1100, xfx, '--->').  % syntax rule, lexical entry
:- op(1200, xfx, '::').    % description annotation
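The priorities decide how a compiled clause is grouped: `::` (1200) is the principal functor, `--->` (1100) groups the rule, and `===` (600) binds each path equation. The sketch below checks this by parsing a compiled rule and taking it apart. It assumes SWI-Prolog rather than the Arity Prolog of the course, and it leaves `:` at SWI-Prolog's built-in declaration `op(200, xfy, :)` (there `:` doubles as the module qualifier), which also binds tighter than `===`. `rule_parts/4` is a hypothetical helper.

```prolog
% Sketch for SWI-Prolog (assumption, not the original Arity setup).
% ':' keeps SWI-Prolog's built-in op(200, xfy, :), which already binds
% tighter than '===', so A:head:subject === B:head groups as
% ===(A:(head:subject), B:head).
:- op(600, xfx, ===).      % path equation
:- op(1100, xfx, '--->').  % syntax rule, lexical entry
:- op(1200, xfx, '::').    % description annotation

% rule_parts(+String, -LHS, -RHS, -Descr): hypothetical helper that
% parses the text of a compiled rule and splits it at '::' and '--->'.
rule_parts(String, LHS, RHS, Descr) :-
    term_string(Term, String),
    Term = (LHS ---> RHS :: Descr).
```

Because `,` (1000) lies below `--->` (1100), the daughters `B, C` end up together as the right-hand side instead of splitting the rule.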
Three Compiler Components
• Tokenizer
  – Input: PATR II grammar
  – Output: token lines
• Preprocessor
  – Input: token lines
  – Output: token sentences
• Syntax compiler
  – Input: token sentences
  – Output: Prolog clauses
compile_grammar(File):-
    clear_grammar,
    tokenize_file(File),
    read_sentences,
    compile_sentences.
Tokenizer Input
; Shieb1.ptr
; Sample grammar one from Shieber 1986


; Grammar Rules
; ------------------------------------------------------------

Rule {sentence formation}
  S --> NP VP:
 <S head> = <VP head>
 <VP head subject> = <NP head>.

Rule {trivial verb phrase}
  VP --> V:
 <VP head> = <V head>.

; Lexicon
; ----------------------------------------------------------------

Word uther:
 <cat> = NP
 <head agreement gender> = masculine
 <head agreement person> = third
 <head agreement number> = singular.
Tokenizer Output = Preprocessor Input
line(1,[o($;$),b(1),u($Shieb1$),o($.$),l($ptr$)]).
line(2,[o($;$),b(1),u($Sample$),b(1),l($grammar$),b(1),l($one$),b(1),l($from$),b(1), ...
line(3,[ ]).
line(4,[ ]).
line(5,[o($;$),b(1),u($Grammar$),b(1),u($Rules$)]).
line(6,[o($;$),b(1),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$), ...
line(7,[ ]).
line(8,[u($Rule$),b(1),o(${$),l($sentence$),b(1),l($formation$),o($}$)]).
line(9,[b(2),u($S$),b(1),o($-$),o($-$),o($>$),b(1),u($NP$),b(1),u($VP$),o($:$)]).
line(10,[b(1),o($<$),u($S$),b(1),l($head$),o($>$),b(1),o($=$),b(1),o($<$),u($VP$),b(1), ...
line(11,[b(1),o($<$),u($VP$),b(1),l($head$),b(1),l($subject$),o($>$),b(1),o($=$),b(1), ...
line(12,[b(1)]).
line(13,[u($Rule$),b(1),o(${$),l($trivial$),b(1),l($verb$),b(1),l($phrase$),o($}$)]).
line(14,[b(2),u($VP$),b(1),o($-$),o($-$),o($>$),b(1),u($V$),o($:$)]).
line(15,[b(1),o($<$),u($VP$),b(1),l($head$),o($>$),b(1),o($=$),b(1),o($<$),u($V$),b(1), ...
...
line(41,[b(1),o($<$),l($head$),b(1),l($subject$),b(1),l($agreement$),b(1),l($number$), ...
line(42,[eof]).
Preprocessor Output = Compiler Input
sentence( 1,11,[u($Rule$),o(${$),l($sentence$),l($formation$),o($}$),...
sentence(12,15,[u($Rule$),o(${$),l($trivial$),l($verb$),l($phrase$),o($}$),...
sentence(16,24,[u($Word$),l($uther$),o($:$),o($<$),l($cat$),o($>$),o($=$),...
sentence(25,30,[u($Word$),l($knights$),o($:$),o($<$),l($cat$),o($>$),o($=$),...
sentence(31,36,[u($Word$),l($sleeps$),o($:$),o($<$),l($cat$),o($>$),o($=$),...
sentence(37,41,[u($Word$),l($sleep$),o($:$),o($<$),l($cat$),o($>$),o($=$),...
sentence(42,42,[eof]).
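The preprocessor removes comments and blanks and joins the tokens of each period-terminated sentence, which may span several lines, into a single list. Each sentence(N,M,Tokens) fact also records the range of source lines N to M it was read from, which the compiler uses in its error messages. The compiler proper can then concentrate on the essentials.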
Preprocessor: Main Loop
read_sentences:-
    abolish(cnt/1),
    write('preprocessing...'), nl,
    repeat,
    count(I),
    read_sentence(N,M,S),
    assert(sentence(N,M,S)),
    put(13), tab(3),        % carriage return: overwrite the progress line
    write(I), write(' sentences preprocessed'),
    S = [eof], !, nl.
read_sentence(N,M,S):-
    retract(line(N,L)),
    read_sentence(L,N,M,S), !.
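Backtracking: unless S = [eof], the final test fails and execution backtracks into repeat, which reads the next sentence.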
Preprocessor: Reading a Sentence
read_sentence([eof],N,N,[eof]):- !.        % end of file
read_sentence([o($.$)|_],N,N,[]):- !.      % end of sentence
read_sentence([o($;$)|_],N,M,S):- !,       % skip comment
    N1 is N+1,
    retract(line(N1,L)),                   % next line
    read_sentence(L,N1,M,S).
read_sentence([],N,M,S):- !,               % end of line
    N1 is N+1,
    retract(line(N1,L)),                   % next line
    read_sentence(L,N1,M,S).
read_sentence([b(_)|T1],N,M,T2):- !,       % skip blanks
    read_sentence(T1,N,M,T2).
read_sentence([H|T1],N,M,[H|T2]):-         % collect tokens
    read_sentence(T1,N,M,T2).
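The same sentence assembly can also be written without assert/retract by threading the not-yet-consumed lines through the recursion. This is a sketch, not the course's code: it assumes SWI-Prolog, token texts stored as atoms (o(';') instead of o($;$)), and a final line [eof] as in the tokenizer output above. `lines_sentences/2` and `sentence_tokens/6` are hypothetical names.

```prolog
% lines_sentences(+Lines, -Sentences): Lines is a list of line(N, Tokens),
% Sentences a list of sentence(N, M, Tokens) as built by read_sentences/0.
lines_sentences([], []).
lines_sentences([line(N, Toks)|Lines], [sentence(N, M, S)|Ss]) :-
    sentence_tokens(Toks, N, M, S, Lines, Rest),
    (   S == [eof]
    ->  Ss = []
    ;   lines_sentences(Rest, Ss)
    ).

% sentence_tokens(+Tokens, +N, -M, -Sentence, +LinesIn, -LinesOut)
% mirrors read_sentence/4; LinesIn/LinesOut replace retract(line(...)).
sentence_tokens([eof|_], N, N, [eof], Ls, Ls) :- !.         % end of file
sentence_tokens([o('.')|_], N, N, [], Ls, Ls) :- !.         % end of sentence
sentence_tokens([o(';')|_], _, M, S, [line(N1, L)|Ls0], Ls) :- !,
    sentence_tokens(L, N1, M, S, Ls0, Ls).                  % skip comment
sentence_tokens([], _, M, S, [line(N1, L)|Ls0], Ls) :- !,
    sentence_tokens(L, N1, M, S, Ls0, Ls).                  % end of line
sentence_tokens([b(_)|T], N, M, S, Ls0, Ls) :- !,           % skip blanks
    sentence_tokens(T, N, M, S, Ls0, Ls).
sentence_tokens([H|T], N, M, [H|S], Ls0, Ls) :-             % collect token
    sentence_tokens(T, N, M, S, Ls0, Ls).
```

Compared with the database version, this one is easy to test in isolation, at the price of keeping all token lines in memory at once.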
Compiler: Main Loop
compile_sentences:-
    abolish(cnt/1),
    write('compiling...'), nl,
    retract(sentence(N,M,S)),
    compile_sentence((N,M),C,S,[]),
    assert(C),
    count(I), put(13), tab(3), write(I), write(' sentences compiled'),
    S = [eof], !,
    nl.
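Backtracking: here retract/1 itself is the choice point. When S = [eof] fails, or when compile_sentence/4 fails on an erroneous sentence, the next sentence/3 fact is retracted and compiled.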
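Both main loops call count/1, which is not defined on these slides; the abolish(cnt/1) calls suggest a counter stored in cnt/1. The following is one plausible reconstruction (an assumption), together with a miniature failure-driven loop in the style of compile_sentences/0, in which retract/1 supplies a new solution each time the final test fails. It assumes SWI-Prolog; `item/1` and `process_all/0` are hypothetical, and retractall/1 stands in for abolish/1 so that cnt/1 stays declared dynamic.

```prolog
:- dynamic cnt/1, item/1.

% count(-I): hypothetical counter. Yields 1 after cnt/1 has been
% cleared, then 2, 3, ... on later calls. Side effects survive
% backtracking, which is what the failure-driven loops rely on.
count(I) :-
    (   retract(cnt(I0))
    ->  I is I0 + 1
    ;   I = 1
    ),
    assertz(cnt(I)).

% process_all: failure-driven loop. retract/1 removes one item per
% solution; unless the item is eof, the final test fails and Prolog
% backtracks into retract/1 for the next item.
process_all :-
    retractall(cnt(_)),
    retract(item(X)),
    count(I),
    format("~w: ~w~n", [I, X]),
    X == eof, !.
```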
Compiler: Sentence Types
% compile_sentence(Position,Clause,Sentence,Rest)
compile_sentence(_,C) --> [eof], !, {C = finished}.
compile_sentence(_,C) --> syntax_rule(C), !.
compile_sentence(_,C) --> lex_entry(C), !.
compile_sentence(_,C) --> template(C), !.
compile_sentence(P,_,_,_):-
    P = (N,M), nl,
    write(' error in sentence between lines '),
    write(N), write(' and '), write(M), nl,
    fail.
Syntax Rules
syntax_rule(C) --> rs('Rule'), !, syntax_rule_cont(C).
syntax_rule_cont((Expansion :: Descr)) -->
rule_name,
sr_expansion(Expansion,Sugar),
rs(:), !,
sr_path_equations(Equations,Sugar),
{sr_sugar_cats(Sugar,Equations,Descr)}.
Reserved Symbols
rs(=) --> [o($=$)], !.
rs(:) --> [o($:$)], !.
rs(<) --> [o($<$)], !.
rs(>) --> [o($>$)], !.
rs('{') --> [o(${$)], !.
rs('}') --> [o($}$)], !.
rs('Rule') --> [u($Rule$)], !.
rs('Word') --> [u($Word$)], !.
rs('Let') --> [u($Let$)], !.
rs('be') --> [l($be$)], !.
rs('-->') --> [o($-$),o($-$),o($>$)], !.
Alternative: define a separate predicate for each reserved symbol, e.g. colon instead of rs(:).
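To see both styles side by side, the sketch below runs a few of the reserved-symbol rules and the suggested per-symbol alternative through phrase/2. It assumes SWI-Prolog and tokens holding atoms instead of Arity Prolog `$...$` strings; `colon//0` and `arrow//0` are hypothetical names in the spirit of the slide's suggestion.

```prolog
% Tokens hold atoms here (assumption), e.g. o('=') instead of o($=$).
rs('=')    --> [o('=')], !.
rs(':')    --> [o(':')], !.
rs('<')    --> [o('<')], !.
rs('>')    --> [o('>')], !.
rs('-->')  --> [o('-'), o('-'), o('>')], !.
rs('Rule') --> [u('Rule')], !.

% The alternative from the slide: one nonterminal per reserved symbol.
colon --> [o(':')].
arrow --> [o('-'), o('-'), o('>')].
```

The table form keeps every reserved symbol in one place; the per-symbol form avoids passing the symbol as an argument, so call sites such as `colon` read a little more directly.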
Further Terminal Symbols
uatom(A) --> [u(S)], {atom_string(A,S)}.
latom(A) --> [l(S)], {atom_string(A,S)}.
satom(A) --> [s(S)], {atom_string(A,S)}.
int(I) --> [i(I)].
atom(A) --> uatom(A), !.
atom(A) --> latom(A), !.
atom(A) --> satom(A), !.
atomic(A) --> atom(A), !.
atomic(A) --> int(A), !.
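In SWI-Prolog (an assumption; the slides target Arity Prolog), double-quoted text is a string by default and atom_string/2 converts between atoms and strings, so the terminal rules carry over almost unchanged if the tokenizer produced tokens such as u("NP") instead of u($NP$). The s/1 token class is not described on the slides and is kept only for completeness.

```prolog
% Terminal symbols over string tokens (SWI-Prolog, assumption).
uatom(A) --> [u(S)], { atom_string(A, S) }.   % capitalised word, e.g. NP
latom(A) --> [l(S)], { atom_string(A, S) }.   % lower-case word
satom(A) --> [s(S)], { atom_string(A, S) }.   % s/1 tokens (class not shown)
int(I)   --> [i(I)].                          % integer token

atom(A) --> uatom(A), !.
atom(A) --> latom(A), !.
atom(A) --> satom(A), !.

atomic(A) --> atom(A), !.
atomic(A) --> int(A), !.
```

For example, `phrase(atomic(A), [u("NP")])` binds A to the atom 'NP', while `phrase(atomic(B), [i(3)])` binds B to 3.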
Rule Names
rule_name --> rs('{'), !,                          % start of rule name
    curley_braces_terminated_string.
rule_name --> [].                                  % rule names are optional
curley_braces_terminated_string --> rs('}'), !.    % end of rule name
curley_braces_terminated_string --> [_],           % read any symbol
    curley_braces_terminated_string.
Rule names are skipped; they are not carried over into the Prolog representation of the rules.
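Using phrase/3 makes the skipping visible: whatever follows the closing brace is left in the remainder, and a rule without a name leaves the input untouched. This sketch assumes SWI-Prolog and tokens holding atoms (o('{') rather than o(${$)); only the two brace symbols of rs//1 are included.

```prolog
% Tokens hold atoms here (assumption).
rs('{') --> [o('{')], !.
rs('}') --> [o('}')], !.

rule_name --> rs('{'), !,                        % start of rule name
    curley_braces_terminated_string.
rule_name --> [].                                % rule names are optional

curley_braces_terminated_string --> rs('}'), !.  % end of rule name
curley_braces_terminated_string --> [_],         % skip any token
    curley_braces_terminated_string.
```

For `{sentence formation} S`, the call `phrase(rule_name, Tokens, Rest)` leaves only the token for S in Rest.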
Rule Expansion
sr_expansion((LHS ---> RHS),[LSugar|RSugar]) -->
    sr_lhs(LHS,LSugar),
    rs('-->'),
    sr_rhs(RHS,RSugar).
sr_lhs(LHS,Sugar) --> fsd(LHS,Sugar).
sr_rhs(RHS,Sugar) --> ne_fsd_seq(RHS,Sugar).
ne_fsd_seq((FSD,FSDs),[Sugar|Sugars]) --> fsd(FSD,Sugar), ne_fsd_seq(FSDs,Sugars).
ne_fsd_seq(FSD,[Sugar]) --> fsd(FSD,Sugar).
fsd(Var,(FSD,Var)) --> uatom(FSD).
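Running the expansion on `S --> NP VP` shows what the Sugar list contains: one (Category, Variable) pair per constituent, in order, where each variable also appears in the compiled rule. This sketch assumes SWI-Prolog and tokens holding atoms; uatom//1 is simplified to read the atom directly from a u/1 token.

```prolog
:- op(1100, xfx, '--->').

% Tokens hold atoms here (assumption).
rs('-->') --> [o('-'), o('-'), o('>')], !.
uatom(A)  --> [u(A)].

sr_expansion((LHS ---> RHS), [LSugar|RSugar]) -->
    sr_lhs(LHS, LSugar),
    rs('-->'),
    sr_rhs(RHS, RSugar).

sr_lhs(LHS, Sugar) --> fsd(LHS, Sugar).
sr_rhs(RHS, Sugar) --> ne_fsd_seq(RHS, Sugar).

ne_fsd_seq((FSD, FSDs), [Sugar|Sugars]) -->
    fsd(FSD, Sugar), ne_fsd_seq(FSDs, Sugars).
ne_fsd_seq(FSD, [Sugar]) --> fsd(FSD, Sugar).

% A category symbol becomes a fresh variable; the pair is kept as sugar.
fsd(Var, (Cat, Var)) --> uatom(Cat).
```

The result for `S --> NP VP` is the rule `X0 ---> X1, X2` together with `[('S',X0), ('NP',X1), ('VP',X2)]`, which the path equations later consult to map category names to variables.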
Syntax Rules: Path Equations
sr_path_equations((E,Es),Sugar) -->
    sr_path_equation(E,Sugar),
    sr_path_equations(Es,Sugar).
sr_path_equations(E,Sugar) --> sr_path_equation(E,Sugar).
sr_path_equation((LHS === RHS),Sugar) -->
    sr_path(LHS,Sugar),
    rs(=),
    sr_val(RHS,Sugar).
sr_val(V,Sugar) --> sr_path(V,Sugar).
sr_val(V,_) --> atomic(V).
Syntax Rules: Paths
sr_path(Var,Sugar) --> rs(<), fsd(FSD), rs(>), {member((FSD,Var),Sugar)}, !.
sr_path(Var:P,Sugar) --> rs(<), fsd(FSD), ne_feature_seq(P), rs(>), {member((FSD,Var),Sugar)}, !.
ne_feature_seq(F) --> feature(F).
ne_feature_seq(F:P) -->
    feature(F), ne_feature_seq(P).
fsd(FSD) --> uatom(FSD).
feature(F) --> atomic(F).
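The member/2 lookup is where the category name in a path is replaced by the variable introduced in the rule expansion. The sketch below runs the path rules against a hand-written Sugar list. It assumes SWI-Prolog, tokens holding atoms, and features restricted to lower-case words (the slide's feature//1 accepts any atomic//1).

```prolog
:- use_module(library(lists)).   % member/2

% Tokens hold atoms here (assumption).
rs('<')    --> [o('<')], !.
rs('>')    --> [o('>')], !.
uatom(A)   --> [u(A)].
latom(A)   --> [l(A)].
fsd(FSD)   --> uatom(FSD).
feature(F) --> latom(F).         % simplified from atomic//1

% <Cat> alone denotes the variable of Cat; <Cat f1 ... fn> builds Var:f1:...:fn.
sr_path(Var, Sugar) -->
    rs('<'), fsd(FSD), rs('>'),
    { member((FSD, Var), Sugar) }, !.
sr_path(Var:P, Sugar) -->
    rs('<'), fsd(FSD), ne_feature_seq(P), rs('>'),
    { member((FSD, Var), Sugar) }, !.

ne_feature_seq(F)   --> feature(F).
ne_feature_seq(F:P) --> feature(F), ne_feature_seq(P).
```

With the Sugar list `[('S',X0), ('VP',X2)]`, the path `<VP head subject>` comes out as `X2:head:subject`, and `<S>` as X0 itself.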
Syntactic Sugar
sr_sugar_cats([(Cat,Var)|Sugar],Equations,((Var:cat === Cat),Descr)):-
sr_sugar_cats(Sugar,Equations,Descr).
sr_sugar_cats([],Descr,Descr).
Rule {sentence formation}
  S --> NP VP:
  <S head> = <VP head>
  <VP head subject> = <NP head>.
is shorthand for:
Rule {sentence formation}
  X0 --> X1 X2:
  <X0 cat> = S
  <X1 cat> = NP
  <X2 cat> = VP
  <X0 head> = <X2 head>
  <X2 head subject> = <X1 head>.
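Applied to the example rule, sr_sugar_cats/3 prepends one cat equation per category, in left-to-right order, in front of the rule's own equations. The check below runs the slide's two clauses unchanged in SWI-Prolog (an assumption about the dialect), with the Sugar list and the equations written out by hand as the parser would produce them. `show_descr/1` is a hypothetical helper for printing the result with readable variable names.

```prolog
:- op(600, xfx, ===).

% sr_sugar_cats(+Sugar, +Equations, -Descr): clauses as on the slide.
sr_sugar_cats([(Cat, Var)|Sugar], Equations, ((Var:cat === Cat), Descr)) :-
    sr_sugar_cats(Sugar, Equations, Descr).
sr_sugar_cats([], Descr, Descr).

% show_descr(+Descr): hypothetical helper; prints Descr with its
% variables named A, B, C, ... without binding them permanently.
show_descr(Descr) :-
    \+ \+ ( numbervars(Descr, 0, _),
            write_term(Descr, [numbervars(true), quoted(true)]),
            nl ).
```

The resulting description matches the compiler output for the sentence formation rule: first the three cat equations, then the two head equations.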
Lexical Entries
lex_entry(C) --> rs('Word'), !, lex_entry_cont(C).
lex_entry_cont((FS ---> L :: Descr)) -->
    lexeme(L),
    rs(:), !,
    lex_definition(FS, Descr).
lexeme(L) --> atom(L).
Lexicon: Feature Structures
lex_definition(FS,(LDef,LDefs)) --> lexdef(FS,LDef),lex_definition(FS,LDefs).
lex_definition(FS,LDef) --> lexdef(FS,LDef).
lexdef(FS,LDef) --> template_name(FS,LDef), !.
lexdef(FS,LDef) --> lex_path_equation(FS,LDef), !.
Lexicon: Path Equations
lex_path_equation(FS, (LHS === RHS)) --> lex_path(FS, LHS), rs(=), !,lex_val(FS, RHS).
lex_path(FS,FS:P) --> rs(<), ne_feature_seq(P), rs(>), !.
lex_val(FS,V) --> lex_path(FS,V).
lex_val(_,V) --> atomic(V).
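Unlike syntax rules, lexical paths carry no category name: every path is rooted in the single variable FS of the entry. The sketch below parses two equations of the uther entry. It assumes SWI-Prolog and tokens holding atoms, and it leaves out template references, so lexdef//2 is reduced to lex_path_equation//2.

```prolog
:- op(600, xfx, ===).

% Tokens hold atoms here (assumption).
rs('=') --> [o('=')], !.
rs('<') --> [o('<')], !.
rs('>') --> [o('>')], !.
atomic(A)  --> [u(A)], !.
atomic(A)  --> [l(A)], !.
feature(F) --> atomic(F).

ne_feature_seq(F)   --> feature(F).
ne_feature_seq(F:P) --> feature(F), ne_feature_seq(P).

% Simplified: only path equations, no template names.
lex_definition(FS, (D, Ds)) --> lex_path_equation(FS, D), lex_definition(FS, Ds).
lex_definition(FS, D)       --> lex_path_equation(FS, D).

lex_path_equation(FS, (LHS === RHS)) -->
    lex_path(FS, LHS), rs('='), !, lex_val(FS, RHS).
lex_path(FS, FS:P) --> rs('<'), ne_feature_seq(P), rs('>'), !.
lex_val(FS, V) --> lex_path(FS, V).
lex_val(_, V)  --> atomic(V).
```

For `<cat> = NP <head agreement number> = singular`, the description becomes `FS:cat === 'NP', FS:head:agreement:number === singular`, matching the shape of the compiler output.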
Templates
template(C) --> rs('Let'), !, template_cont(C).
template_cont((N :- TDef)) -->
    template_name(FS,N),
    rs('be'),
    template_definition(FS,TDef),
    {assert(template(N))}.
Templates: Head & Body
template_name(FS,N) -->
    atom(A),
    {N =.. [A,FS]}.
template_definition(FS,TDef) -->
    lex_definition(FS,TDef).
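A template thus compiles to an ordinary one-argument Prolog predicate whose name is the template name and whose body is the compiled description. The sketch below builds and asserts such a clause for a template called third. It assumes SWI-Prolog, a template name given as a single lower-case token l(Name), and an already compiled body; `compile_template/4` is a hypothetical helper that combines what template_cont//1 and assert(C) in compile_sentences/0 do.

```prolog
:- op(600, xfx, ===).
:- dynamic template/1.

% Simplified: the template name is one lower-case token l(Name).
template_name(FS, N) --> [l(A)], { N =.. [A, FS] }.   % N = A(FS)

% compile_template(+NameTokens, ?FS, +TDef, -Clause): hypothetical helper.
% TDef is the compiled template body over the variable FS; the result
% is the clause N :- TDef, and N is recorded in template/1.
compile_template(NameTokens, FS, TDef, (N :- TDef)) :-
    phrase(template_name(FS, N), NameTokens),
    assertz(template(N)).
```

Once asserted, the clause can be inspected with clause/2; a lexical entry that mentions the template (lexdef//2 via template_name//2) compiles to a call like `third(FS)` on its own feature structure variable.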
Deleting a Grammar
clear_templates:-
    template(T),
    T =.. [F,_],
    abolish(F/1),
    fail.
clear_templates:- abolish(template/1).
clear_grammar:-
    abolish('::'/2),
    abolish(line/2),
    abolish(sentence/3),
    clear_templates.
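In SWI-Prolog (assumed here), calling a predicate that has no definition, such as template/1 before any template exists or after abolish/1 has removed it, by default raises an existence error instead of failing, so clear_templates/0 as written would stop at its first goal. A variant that keeps the bookkeeping predicates dynamic and only removes their clauses could look like this; `clear_templates_swi/0` and `clear_grammar_swi/0` are hypothetical names.

```prolog
:- op(1200, xfx, '::').
:- dynamic template/1, line/2, sentence/3, (::)/2.

% clear_templates_swi: abolish every template predicate F/1 recorded
% in template/1 and remove the records themselves.
clear_templates_swi :-
    forall(retract(template(T)),
           ( functor(T, F, 1), abolish(F/1) )).

% clear_grammar_swi: remove compiled clauses, token lines and sentences,
% keeping their dynamic declarations so the call can be repeated.
clear_grammar_swi :-
    retractall((_ :: _)),
    retractall(line(_, _)),
    retractall(sentence(_, _, _)),
    clear_templates_swi.
```

Since retractall/1 leaves the dynamic declarations in place, the clearing predicates can safely be called any number of times, including before the first grammar has been compiled.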
Compiler Output
A ---> B , C :: A : cat === 'S', B : cat === 'NP', C : cat === 'VP', A : head === C : head, C : head : subject === B : head.
A ---> uther :: A : cat === 'NP', A : head : agreement : gender === masculine, A : head : agreement : person === third, A : head : agreement : number === singular.
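Since clear_grammar/0 abolishes '::'/2, the compiled rules and lexical entries end up as facts of ::/2. The sketch below, for SWI-Prolog (an assumption), stores a few such facts and looks up a lexical entry; `lexical_description/3` is a hypothetical helper. It asks for the right-hand side with a fresh variable and tests atom/1 before comparing, because a rule like `VP --> V` compiles to a variable right-hand side that would otherwise unify with any word.

```prolog
:- op(600, xfx, ===).
:- op(1100, xfx, '--->').
:- op(1200, xfx, '::').
:- dynamic((::)/2).

% lexical_description(?Word, -FS, -Descr): hypothetical lookup of the
% description compiled for Word. The goal needs parentheses because
% '::' has priority 1200, as high as ':-' itself.
lexical_description(Word, FS, Descr) :-
    ((FS ---> W) :: Descr),
    atom(W),        % lexical entries have an atom on the right
    W = Word.
```

Querying `lexical_description(uther, FS, D)` after asserting the compiled facts binds D to the uther description over FS, while syntax rules such as `A ---> B, C` are skipped.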
Resources
• Grammars, PATR II / Prolog
  – shieb1.ptr / shieb1.ari
  – shieb2.ptr / shieb2.ari
  – shieb3.ptr / shieb3.ari
  – shieb4.ptr / shieb4.ari
• Tokens
  – shieb1.tok (tokenizer)
  – shieb1.snt (preprocessor)
• PATR II interpreter
  – patrlcl.ari: left-corner with linking
  – patrlclc.ari: left-corner with linking and syntax trees
  – patr-ii.ari: DCG
• PATR II compiler
  – patrcomp.ari
  – patr-ii.ari: DCG
Open Problems and Extensions
• Syntactic sugar of the form VP_1 VP_2 X
• Lexical rules
• Templates in syntax rules
• Negation and disjunction
• Default inheritance (priority union)
• ...
References
• Shieber, Stuart (1986): An Introduction to Unification-based Approaches to Grammar. CSLI Lecture Notes.
• Gazdar, Gerald & Chris Mellish (1989): Natural Language Processing in Prolog. Addison Wesley.
• Covington, Michael A. (1994): Natural Language Processing for Prolog Programmers. Chap. 6: Parsing Algorithms. Prentice-Hall.