Post on 12-Jan-2016
© 2003 Phase Forward Incorporated. All rights reserved. Proprietary and Confidential.
XML Workshop
XML: a standard format for the exchange of electronic data in the pharmaceutical industry?
Joerg Dillert
Senior Consultant
March 30, 2004
0. General
3
The workshop …
is in "Germisch" (a mix of German and English)!
4
A few rules
9.00 – 16.30
Breaks: 15, 60, 15
Mobile phones off or on vibrate!
Restrooms
Emergency exits
Questions: any time, please
1. Introduction
What exactly is XML?
How did it come about?
6
Mobile phones, smartphones and PDAs with built-in SyncML support (model, vendor, device type, availability)
Alcatel: ot715
Motorola: A830, A835, V600, E390
Nokia: 7250, 6800, 3650, 6220, 9210i, 7650
Samsung: SGH-D700
Siemens: S55, SL55, M55, SX1
Sony Ericsson: T68i, T610, P800, Z1010, PEG-NZ90
PDAs: Sony PEG-NX70V, PEG-T675C, PEG-T625C
7
We live in the age of buzzwords
B2B, B2M, E2B
DIA, EMEA, FDA
XML, DTD, XSL, SVG
The (computer) industry gives us many new words every week
Take a look at your own workplace: what are your buzzwords? (SOPs, DCFs, …)
8
... do you know this German band?
MFG
Smudo
9
First documented …
SGML: Standard Generalized Markup Language
ISO 8879
since 1986
10
The SGML family of markup languages – more buzzwords!!
GML Generalized Markup Language
Goldfarb, Mosher and Lorie, IBM, 1969
IBM Document Composition Facility DCF (Script)
SGML Standard Generalized Markup Language
Content attributes. ISO-8879 first published in 1986
HTML HyperText Markup Language
Functional attributes: hyperlink, frame
Based on hyperdocument standard definitions
CALS Continuous Acquisition and Life-cycle Support
Based on DoD MIL-M-28001B standard definitions
XML eXtensible Markup Language
(Founding father: Dr. Charles F. Goldfarb, IBM)
11
1986
SGML developed at the IBM labs in Almaden
Charles Goldfarb
ISO standard
Revised in 1990; the goal was a universally applicable markup language for documents
12
A brief history of SGML
The Evolution of Markup Languages
Plain text
Font attributes: Bold, underline, italics, font size
Document structure attributes: Heading level, index term
Document content attributes: Patient age, dosage unit
13
1990
Development of HTML began at the CERN nuclear research centre in Geneva
first draft 1993, the birth of the Web
revised in 1995 as HTML version 2.0
14
1994
To prevent uncontrolled growth, the World Wide Web Consortium (W3C) was founded
primary task: further development of HTML
15
1998
The W3C recognised that HTML could not meet the challenges of the future
A golden mean was to be found between too much markup (SGML) and too little (HTML)
The outcome of this search: XML
adopted as an official standard in 1998
16
2001
The W3C adopts the first version of XSL (Extensible Stylesheet Language) as the most important extension
it provides rules for transforming XML documents and a vocabulary for formatting them
2002: working draft of XHTML version 2.0, a break with HTML 4.0 and XHTML 1.0; no backward compatibility
17
The big advantage of XML
You have flexibility: you can define your own tags
The parser needs only the DTD / schema to check the correctness of your file
Readable for everyone
Vendor independent (no vendor can impose its own definitions, standards or undocumented formats)
License free
... all three types are in ASCII format! (American Standard Code for Information Interchange)
18
XML and data interchange
This kind of data interchange is standard in other industries and is called B2B
The Germans' favourite spare-time object ...
• is produced just in time
19
What we've learned from other industries ...
[Diagram: Car producer (System X) and Supplier (System Y), each connected through a B2B server]
Messages exchanged: Request, Order, Delivery, Proposal, Order confirmation, Delivery confirmation, Invoicing
20
XML is ...
The data interchange and document format for now and in the future (E2B, CDISC, CTD)
Is in practical use in many industries
• E.g. car production industry
EVERY system can communicate with another
You need only ONE interface per system
2. XML
eXtensible Markup Language
22
XML
XML - eXtensible Markup Language
• It is a subset of SGML
• It focuses on content (sometimes also structure)
• The XML file contains the DATA
• The data is delimited by TAGs
• Example:
<messagetype>ICSR</messagetype>
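The tag/data split in the example above can be tried out with a few lines of Python. This is only a minimal sketch using the standard-library parser; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The one-element example from the slide, parsed with the standard library.
element = ET.fromstring("<messagetype>ICSR</messagetype>")

tag_name = element.tag   # the tag that delimits the data
content = element.text   # the data itself

print(tag_name, content)
```

The parser returns the tag name and the character data separately, which is the point of the slide: the markup describes the content, the content is the data.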
23
An example as a graphic
24
... and as XML structure
<ANA106>
  <ScreeningVisit>
    <InclusionCriteria>
      <Inc1>YES</Inc1>
      <Inc2>YES</Inc2>
      <Inc3>YES</Inc3>
      <Inc4>YES</Inc4>
    </InclusionCriteria>
    <ExclusionCriteria>
      <Excl1>NO</Excl1>
      <Excl2>NO</Excl2>
      <Excl3>NO</Excl3>
      <Excl4>NO</Excl4>
    </ExclusionCriteria>
    <DemographicsInvestigator>
      <Sex>Male</Sex>
      <DoB>07/26/1966</DoB>
      <Smoke>Yes</Smoke>
      <InvNo>128</InvNo>
    </DemographicsInvestigator>
    <!-- ... more pages (sections) -->
  </ScreeningVisit>
  <Visit1>
    <!-- ... more blocks -->
  </Visit1>
  <!-- ... more visits -->
</ANA106>
25
A simple XML structure
<?xml version="1.0" encoding="ISO-8859-1"?>
<DVMDTagung>
  <Workshop>
    <event>
      <stadt>Ulm</stadt>
      <ort>MedSchule</ort>
    </event>
  </Workshop>
</DVMDTagung>
26
Attributes
<?xml version="1.0" encoding="ISO-8859-1"?>
<DVMDTagung>
  <Workshop name="XML in der Pharmazeutischen Industrie" Leiter="Joerg Dillert">
    <event datum="31.03.2004">
      <stadt>Ulm</stadt>
      <ort>MedSchule</ort>
    </event>
    <event datum="25.06.2004">
      <stadt>Berlin</stadt>
      <ort>PFOffice</ort>
    </event>
  </Workshop>
</DVMDTagung>
27
Attributes
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- for comments -->
<DVMDTagung>
  <Workshop name="XML in der Pharmazeutischen Industrie" Leiter="Joerg Dillert">
    <event datum="31.03.2004">
      <stadt>Ulm</stadt>
      <ort>MedSchule</ort>
    </event>
    <event datum="25.06.2004">
      <stadt>Berlin</stadt>
      <ort>PFOffice</ort>
    </event>
  </Workshop>
</DVMDTagung>
see in IE
see in XML Notepad
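Besides viewing the file in a browser or editor, the attributes can be read programmatically. The following is a small sketch with Python's standard library; the XML is the workshop example from the slide (without the XML declaration), and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

WORKSHOP_XML = """
<DVMDTagung>
  <Workshop name="XML in der Pharmazeutischen Industrie" Leiter="Joerg Dillert">
    <event datum="31.03.2004"><stadt>Ulm</stadt><ort>MedSchule</ort></event>
    <event datum="25.06.2004"><stadt>Berlin</stadt><ort>PFOffice</ort></event>
  </Workshop>
</DVMDTagung>
""".strip()

root = ET.fromstring(WORKSHOP_XML)
workshop = root.find("Workshop")

# Attributes live in the start tag and are read with get().
leader = workshop.get("Leiter")

# Child element content is read with findtext().
events = [(e.get("datum"), e.findtext("stadt")) for e in workshop.findall("event")]

print(leader, events)
```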
28
Characters
Character set
• Characters that may be represented in XML document
– e.g., ASCII character set
• Letters of English alphabet
• Digits (0-9)
• Punctuation characters, such as !, - and ?
29
Character Set
XML documents may contain
• Carriage returns
• Line feeds
• Unicode characters
– Enables computers to process characters for several languages
30
Characters vs. Markup
XML must differentiate between
• Markup text
– Enclosed in angle brackets (< and >)
• e.g., Child elements
• Character data
– Text between start tag and end tag
• e.g., Fig. 5.1, line 7: Welcome to XML!
31
White Space, Entity References and Built-in Entities
Whitespace characters
• Spaces, tabs, line feeds and carriage returns
– Significant (preserved by application)
– Insignificant (not preserved by application)
• Normalization
– Whitespace collapsed into single whitespace character
– Sometimes whitespace removed entirely
<markup>This   is   character   data</markup>
after normalization, becomes
<markup>This is character data</markup>
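Whitespace normalization as described above can be sketched in a few lines. This is an illustration of the idea, not what any particular XML processor does internally; the function name is hypothetical.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs and line breaks into single spaces
    and drop leading/trailing whitespace entirely."""
    return " ".join(text.split())


raw = "  This   is \t character\n   data  "
normalized = normalize_whitespace(raw)
print(repr(normalized))
```

Whether whitespace is significant is decided by the application, so a step like this should only be applied where the data model allows it.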
32
White Space, Entity References and Built-in Entities (cont.)
XML-reserved characters
• Ampersand (&)
• Left-angle bracket (<)
• Right-angle bracket (>)
• Apostrophe (')
• Double quote (")
Entity references
• Allow XML-reserved characters to be used in character data
– Begin with ampersand (&) and end with semicolon (;)
• Prevent character data from being misinterpreted as markup
33
White Space, Entity References and Built-in Entities (cont.)
Built-in entities
• Ampersand (&amp;)
• Left-angle bracket (&lt;)
• Right-angle bracket (&gt;)
• Apostrophe (&apos;)
• Quotation mark (&quot;)
• Mark up characters "<>&" in element message
<message>&lt;&gt;&amp;</message>
see in IE
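The built-in entities can be produced and resolved with Python's standard library. A short sketch follows; `escape` handles `&`, `<` and `>` by default, and the quote characters are passed in explicitly as an extra mapping.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "<>&"

# Replace the reserved characters by entity references.
escaped = escape(raw)
document = f"<message>{escaped}</message>"

# The parser resolves the entity references back to the characters.
parsed = ET.fromstring(document).text

# Apostrophe and quotation mark need an explicit mapping.
quoted = escape("'\"", {"'": "&apos;", '"': "&quot;"})

print(escaped, parsed, quoted)
```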
34
Using Unicode in an XML Document
XML Unicode support
• e.g., displays Arabic words
– Arabic characters
• represented by entity references for Unicode characters
35
XML document that contains Arabic words
<?xml version = "1.0"?>
<welcome><from>دايتَلأند</from>
<subject>أهلاً بكمفيِعالم </subject>
</welcome>
see in IE
36
Markup
XML element markup
• Consists of
– Start tag
– Content
– End tag
• All elements must have corresponding end tag
<img src = "img.gif">
is correct in HTML, but not XML
• XML requires end tag or forward slash (/) for termination
<img src = "img.gif"></img>
or
<img src = "img.gif"/>
is correct XML syntax
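The difference can be checked with any XML parser. Below is a minimal sketch using Python's standard library; the helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


html_style = is_well_formed('<img src="img.gif">')        # no end tag
with_end_tag = is_well_formed('<img src="img.gif"></img>')
self_closing = is_well_formed('<img src="img.gif"/>')

print(html_style, with_end_tag, self_closing)
```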
37
Markup (cont.)
Elements
• Define structure
• May (or may not) contain content
– Child elements, character data, etc.
Attributes
• Describe elements
– Elements may have associated attributes
• Placed within element’s start tag
• Values are enclosed in quotes
– Element car contains attribute doors, which has value "4"
<car doors = "4"/>
38
Markup (cont.)
Processing instruction (PI)
• Passed to application using XML document
• Provides application-specific document information
• Delimited by <? and ?>
39
<?xml version = "1.0"?>

<!-- Fig. 5.5 : usage.xml -->
<!-- Usage of elements and attributes -->

<?xml-stylesheet type = "text/xsl" href = "usage.xsl"?>

<book isbn = "999-99999-9-X">
   <title>Deitel&apos;s XML Primer</title>

   <author>
      <firstName>Paul</firstName>
      <lastName>Deitel</lastName>
   </author>

   <chapters>
      <preface num = "1" pages = "2">Welcome</preface>
      <chapter num = "1" pages = "4">Easy XML</chapter>
      <chapter num = "2" pages = "2">XML Elements?</chapter>
      <appendix num = "1" pages = "9">Entities</appendix>
   </chapters>

   <media type = "CD"/>
</book>
PI discussed later
40
CDATA Sections
CDATA sections
• May contain text, reserved characters and whitespace
– Reserved characters need not be replaced by entity references
• Not processed by XML parser
• Commonly used for scripting code (e.g., JavaScript)
• Begin with <![CDATA[
• Terminate with ]]>
see in IE
41
CDATA
<?xml version = "1.0"?>

<!-- Fig. 5.7 : cdata.xml -->
<!-- CDATA section containing C++ code -->

<book title = "C++ How to Program" edition = "3">

   <sample>
      // C++ comment
      if ( this-&gt;getX() &lt; 5 &amp;&amp; value[ 0 ] != 3 )
         cerr &lt;&lt; this-&gt;displayError();
   </sample>

   <sample>
      <![CDATA[

      // C++ comment
      if ( this->getX() < 5 && value[ 0 ] != 3 )
         cerr << this->displayError();
      ]]>
   </sample>

   C++ How to Program by Deitel &amp; Deitel
</book>
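That a CDATA section and entity references carry the same character data can be verified with a parser. This is a small sketch with Python's standard library; the C++ line is taken from the example, the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

CODE = "if ( this->getX() < 5 && value[ 0 ] != 3 )"

# Variant 1: reserved characters replaced by entity references.
with_entities = (
    "<sample>if ( this-&gt;getX() &lt; 5 &amp;&amp; value[ 0 ] != 3 )</sample>"
)
# Variant 2: the text is wrapped in a CDATA section and left untouched.
with_cdata = "<sample><![CDATA[" + CODE + "]]></sample>"

text_from_entities = ET.fromstring(with_entities).text
text_from_cdata = ET.fromstring(with_cdata).text

# Variant 3: neither escaping nor CDATA; the parser rejects it.
try:
    ET.fromstring("<sample>" + CODE + "</sample>")
    unescaped_ok = True
except ET.ParseError:
    unescaped_ok = False

print(text_from_entities == text_from_cdata, unescaped_ok)
```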
42
CDATA (cont.)
43
XML Namespaces
Naming collisions
• Two different elements have same name
<subject>Math</subject>
<subject>Thrombosis</subject>
Namespaces
• Differentiate elements that have same name
<school:subject>Math</school:subject>
<medical:subject>Thrombosis</medical:subject>
– school and medical are namespace prefixes
• Prepended to element and attribute names
• Tied to uniform resource identifier (URI)
– Series of characters for differentiating names
44
XML Namespaces (cont.)
Creating namespaces
• Use xmlns keyword
xmlns:text = "urn:deitel:textInfo"
xmlns:image = "urn:deitel:imageInfo"
– Creates two namespace prefixes text and image
– urn:deitel:textInfo is URI for prefix text
– urn:deitel:imageInfo is URI for prefix image
• Default namespaces
– Child elements of this namespace do not need prefix
xmlns = "urn:deitel:textInfo"
45
XML namespace - no default
<?xml version = "1.0"?>

<!-- Fig. 5.8 : namespace.xml -->
<!-- Namespaces -->

<directory xmlns:text = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">

   <text:file filename = "book.xml">
      <text:description>A book list</text:description>
   </text:file>

   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>

</directory>
46
XML namespace with default
<?xml version = "1.0"?>

<!-- Fig. 5.9 : defaultnamespace.xml -->
<!-- Using Default Namespaces -->

<directory xmlns = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">

   <file filename = "book.xml">
      <description>A book list</description>
   </file>

   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>

</directory>
default: file and description belong to the default namespace
needs full name: the image elements must carry the image prefix
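How a parser sees the default namespace can be illustrated with Python's standard library. In this sketch the prefix-to-URI mapping `ns` is chosen by the reader of the document; the prefix `text` used for lookups is an assumption of the example and does not have to appear in the file.

```python
import xml.etree.ElementTree as ET

DIRECTORY_XML = """
<directory xmlns="urn:deitel:textInfo" xmlns:image="urn:deitel:imageInfo">
  <file filename="book.xml"><description>A book list</description></file>
  <image:file filename="funny.jpg">
    <image:description>A funny picture</image:description>
    <image:size width="200" height="100"/>
  </image:file>
</directory>
""".strip()

root = ET.fromstring(DIRECTORY_XML)

# Prefixes used for lookups; only the URIs have to match the document.
ns = {"text": "urn:deitel:textInfo", "image": "urn:deitel:imageInfo"}

text_files = [f.get("filename") for f in root.findall("text:file", ns)]
image_files = [f.get("filename") for f in root.findall("image:file", ns)]

# Internally every element name is qualified by its namespace URI.
root_tag = root.tag

print(text_files, image_files, root_tag)
```

The two `file` elements no longer collide: one belongs to `urn:deitel:textInfo` through the default namespace, the other to `urn:deitel:imageInfo` through its prefix.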
3. DTDs and Schemas
48
DTD / Schema
DTD: Document Type Definition
• Sometimes also called the 'Document Type Description'
Today you have Schemas
• Schemas are more detailed
• They are themselves written in XML
Contains the 'grammar' of an XML document
• The parser (a program) checks the correctness of the XML file based on the DTD / Schemas
– numerics
– characters ...
49
Parsing / Validating
[Diagram: the XML file is checked against the DTD / Schemas, yielding a correct (well-formed) file]
50
DTDs vs. Schemas
They describe the basic structure of documents of a particular type
They can be specified either with DTDs (Document Type Definitions) or with XML Schemas
DTDs were adopted from SGML and are part of XML 1.0.
XML Schema is a separate W3C standard.
51
DTDs vs. Schemas (2)
XML Schemas are more expressive
DTDs are more compact and more readable
DTDs are sufficient for specifying text documents
XML Schemas are better suited to specifying data.
4. DTDs
53
What might the DTD look like?
<BookStore>
<Book>
<Title>My Life and Times</Title>
<Author>Paul McCartney</Author>
<Date>July, 1998</Date>
<ISBN>94303-12021-43892</ISBN>
<Publisher>McMillin Publishing</Publisher>
</Book>
</BookStore>
A BookStore should contain at least one book.
The ISBN should be optional; all other child elements should be mandatory.
54
The DTD for the BookStore
<!ELEMENT BookStore (Book+)>
<!ELEMENT Book (Title, Author, Date, ISBN?, Publisher)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT ISBN (#PCDATA)>
<!ELEMENT Publisher (#PCDATA)>
resembles a regular grammar
55
Declaration
<!ELEMENT BookStore (Book+)>
declares the element BookStore
BookStore has at least one child element Book.
Apart from Book, BookStore must not have any child elements.
+ denotes n repetitions of the preceding element with n >= 1.
* denotes n repetitions with n >= 0.
<BookStore>
<Book> </Book>
<Book> </Book>
</BookStore>
56
Declaration (2)
<!ELEMENT Book (Title, Author, Date, ISBN?, Publisher)>
declares the element Book
Title, Author, Date, ISBN and Publisher appear (in this order) as child elements of Book.
Apart from these, Book must not have any other child elements.
, denotes a sequence of elements.
? means that the preceding element is optional in the sequence.
<Book><Title>…</Title><Author>…</Author><Date>…</Date><ISBN>…</ISBN><Publisher>…</Publisher></Book>
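Because a content model resembles a regular grammar, it can be mimicked with a regular expression over the sequence of child element names. The sketch below is a hand-written illustration of that idea, not a DTD validator; the names `BOOK_MODEL` and `matches_book_model` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# (Title, Author, Date, ISBN?, Publisher) written as a regular expression
# over child element names, each name followed by one space.
BOOK_MODEL = re.compile(r"Title Author Date (ISBN )?Publisher ")


def matches_book_model(book_xml: str) -> bool:
    """Check the order and occurrence of the children of one Book element."""
    book = ET.fromstring(book_xml)
    child_names = "".join(child.tag + " " for child in book)
    return BOOK_MODEL.fullmatch(child_names) is not None


with_isbn = matches_book_model(
    "<Book><Title/><Author/><Date/><ISBN/><Publisher/></Book>")
without_isbn = matches_book_model(
    "<Book><Title/><Author/><Date/><Publisher/></Book>")
wrong_order = matches_book_model(
    "<Book><Author/><Title/><Date/><Publisher/></Book>")

print(with_isbn, without_isbn, wrong_order)
```

The comma becomes plain concatenation and `?` keeps its meaning; `+` and `*` would carry over to the regular expression in the same way.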
57
Recursive representation
<!ELEMENT BookStore (Book | (Book, BookStore))>
recursive declaration of BookStore
BookStore consists either of exactly one child element Book or has two child elements, namely Book and BookStore
| denotes choice (disjunction)
Note: this recursive declaration is not equivalent to the previous one:
• <!ELEMENT BookStore (Book+)>
58
Declaration of the elements
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT ISBN (#PCDATA)>
<!ELEMENT Publisher (#PCDATA)>
<Title>My Life and Times</Title>
<Author>Paul McCartney</Author>
<Date>July, 1998</Date>
<ISBN>94303-12021-43892</ISBN>
<Publisher>McMillin Publishing</Publisher>
#PCDATA denotes unstructured text without reserved symbols such as "<".
59
Predefined data types
Three predefined (built-in) data types are available for specifying element content:
• #PCDATA: unstructured text without reserved symbols.
• EMPTY: the element content is empty. The element can, however, have attributes.
<!ELEMENT br EMPTY> <br/>
• ANY: arbitrary XML structures
<!ELEMENT title ANY>
Note: common data types such as INTEGER or FLOAT are not available.
60
Declaration of attributes
<!ATTLIST BookStore version CDATA "1.0">
The element BookStore has an attribute version.
Apart from version, BookStore has no other attributes.
The attribute version is of type string (CDATA).
"1.0" is the default value: the attribute may be omitted, and the parser then supplies "1.0".
#IMPLIED: the attribute is optional and has no default value (it cannot be combined with a default).
#REQUIRED: the attribute is mandatory.
#FIXED: the attribute always has the same value.
61
Declaration of attributes (2)
<!ATTLIST Author gender (male | female) "female">
The element Author has an attribute gender and no other attributes.
The attribute has either the value male or the value female (enumeration type).
"female" is the default value of gender.
62
Data types for attributes
Besides strings (CDATA) and enumeration types, essentially the following data types are available:
NMTOKEN: a string that follows the XML naming conventions
NMTOKENS: a list of such names, separated by spaces
ID: an identifier that follows the XML naming conventions and is unique within the document.
IDREF: a reference to a unique identifier
IDREFS: a list of such references
63
ID/IDREF
<!ATTLIST Author
key ID #IMPLIED
keyref IDREF #IMPLIED>
Author can have an attribute key.
The attribute key must be unique: attributes of type ID must never share the same value.
Author can also have an attribute keyref.
The value of keyref must be a valid reference: it must exist as the value of an attribute of type ID.
64
ID/IDREF (2)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE BookStore SYSTEM "BookstoreWithAttributes.dtd">
<BookStore>
<Book>
<Title>Text</Title>
<Author key="k1">Text</Author>
<Date>Text</Date>
<Publisher>Text</Publisher>
</Book>
<Book>
<Title>Text</Title>
<Author keyref="k1"/>
<Date>Text</Date>
<Publisher>Text</Publisher>
</Book>
</BookStore>
k1 must be unique: no other attribute of type ID may have k1 as its value.
k1 must exist: an attribute of type ID must have the value k1.
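The two ID/IDREF rules can be spelled out as a hand-written check. Python's standard-library parser does not validate against a DTD, so the sketch below simply assumes that `key` is the ID attribute and `keyref` the IDREF attribute; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_id_references(xml_text, id_attr="key", idref_attr="keyref"):
    """Return (duplicate IDs, references without a matching ID)."""
    root = ET.fromstring(xml_text)
    ids = Counter(
        e.get(id_attr) for e in root.iter() if e.get(id_attr) is not None
    )
    duplicates = sorted(value for value, count in ids.items() if count > 1)
    dangling = sorted({
        e.get(idref_attr)
        for e in root.iter()
        if e.get(idref_attr) is not None and e.get(idref_attr) not in ids
    })
    return duplicates, dangling


GOOD = ('<BookStore><Book><Author key="k1">Text</Author></Book>'
        '<Book><Author keyref="k1"/></Book></BookStore>')
DANGLING = '<BookStore><Book><Author keyref="k2"/></Book></BookStore>'
DUPLICATE = ('<BookStore><Book><Author key="k1"/></Book>'
             '<Book><Author key="k1"/></Book></BookStore>')

print(check_id_references(GOOD))
print(check_id_references(DANGLING))
print(check_id_references(DUPLICATE))
```

A validating parser performs both checks automatically once the attributes are declared as ID and IDREF in the DTD.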
65
Well-formed and valid
An XML document is called well-formed if it conforms to the syntactic rules of the corresponding W3C standard.
An XML document is called valid with respect to a document type definition if
• 1. the root element of the XML document is declared in the DTD and
• 2. the root element has exactly the structure specified in the DTD.
66
Disadvantage of DTDs
The order of child elements is fixed:
<!ELEMENT Book (Title, Author)>
this makes the structure of XML documents very rigid
To guarantee order independence, all permutations must be listed explicitly:
<!ELEMENT Book ((Title, Length) | (Length, Title))>
For n elements there are n! different permutations.
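How quickly the number of alternatives grows can be seen by generating the content model. The helper `any_order_model` below is a hypothetical illustration, not part of any DTD tool.

```python
from itertools import permutations
from math import factorial


def any_order_model(names):
    """Build a DTD content model that allows the given children in any order."""
    alternatives = ["(" + ", ".join(p) + ")" for p in permutations(names)]
    return "(" + " | ".join(alternatives) + ")"


model = any_order_model(["Title", "Length"])
print("<!ELEMENT Book " + model + ">")

# Number of alternatives for 5 children: 5! = 120
five = any_order_model(["Title", "Author", "Date", "ISBN", "Publisher"])
print(five.count("|") + 1, factorial(5))
```

With five child elements the declaration already needs 120 alternatives, which is why order independence is impractical in a DTD.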
67
Even more disadvantages
no XML syntax, so dedicated parsers are needed
hardly any predefined data types, especially for element content
no user-defined data types
no namespaces: existing DTDs can only be combined if there are no name conflicts!
no inheritance hierarchies, not object-oriented
68
Elements Summary Overview
Tag definitions
– <!ELEMENT to - - (#PCDATA)>
• PCDATA = Parsed Character Data
Comments
– <!-- comment text -->
Tag order
– Fixed order
• <!ELEMENT head - - (to , from)>
INVALID: <head><from>S. Ender</from><to>R. Eceiver</to></head>
– Any order
• <!ELEMENT head - - (to & from)>
VALID: <head><from>S. Ender</from><to>R. Eceiver</to></head>
Note: the "- -" tag-omission indicators and the "&" connector are SGML DTD syntax; XML DTDs support neither.
69
(cont.)
Tag occurrence
– One occurrence
• <!ELEMENT head - - (to)>
– One or more occurrences
• <!ELEMENT head - - (to+)>
– Zero or one occurrence
• <!ELEMENT head - - (to?)>
– Zero or more occurrences
• <!ELEMENT head - - (to*)>
5. XML Schemas
71
XML Schemas
Like a DTD, an XML Schema defines the basic structure of XML documents of a particular type.
DTDs were developed to describe structured (human-readable) text documents.
For describing documents/data exchanged between computers, however, they are not expressive enough.
That is why XML Schemas were developed.
72
Advantages
As in many programming languages, a large number of predefined data types is available.
User-defined data types are also possible.
no separate syntax: XML Schemas are themselves XML documents
object-oriented, allow inheritance hierarchies
use namespaces
Order-independent structures can be defined easily.
73
Data types: Schema vs. DTD
<location>
<latitude>32.904237</latitude>
<longitude>73.620290</longitude>
<uncertainty units="meters">2</uncertainty>
</location>
A location consists of the latitude, the longitude and a measure of the uncertainty of both values.
A latitude is a decimal number between -90 and +90.
A longitude is a decimal number between -180 and +180.
The measure of uncertainty is a non-negative number.
The uncertainty is given either in meters or in feet.
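These constraints are the kind of rule a schema can express and a DTD cannot. The sketch below checks them by hand in Python; it is not schema validation. The function name is hypothetical, and the unit value "feet" is an assumed spelling (the example only shows "meters").

```python
import xml.etree.ElementTree as ET

ALLOWED_UNITS = ("meters", "feet")  # "feet" is an assumed spelling


def check_location(xml_text):
    """Return a list of violated constraints for one <location> element."""
    location = ET.fromstring(xml_text)
    latitude = float(location.findtext("latitude"))
    longitude = float(location.findtext("longitude"))
    uncertainty = location.find("uncertainty")

    errors = []
    if not -90 <= latitude <= 90:
        errors.append("latitude out of range")
    if not -180 <= longitude <= 180:
        errors.append("longitude out of range")
    if float(uncertainty.text) < 0:
        errors.append("uncertainty is negative")
    if uncertainty.get("units") not in ALLOWED_UNITS:
        errors.append("unknown unit")
    return errors


GOOD = ('<location><latitude>32.904237</latitude>'
        '<longitude>73.620290</longitude>'
        '<uncertainty units="meters">2</uncertainty></location>')
BAD = ('<location><latitude>95</latitude><longitude>73.6</longitude>'
       '<uncertainty units="yards">-1</uncertainty></location>')

print(check_location(GOOD))
print(check_location(BAD))
```

In an XML Schema the same rules would be declared once, as restrictions on decimal types and an enumeration for the unit, and enforced by any validating parser.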
74
Our DTD as a schema
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.books.org"
xmlns="http://www.books.org"
elementFormDefault="qualified">
<xsd:element name="BookStore">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Book" maxOccurs="unbounded">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Title" type="xsd:string"/>
<xsd:element name="Author" type="xsd:string"/>
<xsd:element name="Date" type="xsd:string"/>
<xsd:element name="ISBN" type="xsd:string" minOccurs="0"/>
<xsd:element name="Publisher" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
75
Some element types
Data type      Example                        Explanation
string                                        simple text
boolean        true, false, 1, 0
decimal        1.23, 12345.66678
float          -1E4                           32 bit
double         -1E4                           64 bit
duration       P1Y2M3DT10H30M                 1 year, 2 months, 3 days, 10 hours, 30 minutes
dateTime       2004-03-22T13:22:22            can be split into date and time
hexBinary      0FB7                           hex digits (0-9, a-f, A-F)
base64Binary   xFhbhg/bsEg=                   Base64-encoded files
anyURI         http://www.phaseforward.com
QName          x:element1                     anyURI:NCName
ID, IDREF      NCName                         only as data type of attributes
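Several of these lexical forms map directly onto types in a programming language. The sketch below reads a few of the example values with Python's standard library; it illustrates the value spaces only and is not a schema processor. The decimal is written with a decimal point, as XML Schema requires.

```python
import base64
from datetime import datetime
from decimal import Decimal

decimal_value = Decimal("12345.66678")            # decimal

timestamp = datetime.fromisoformat("2004-03-22T13:22:22")   # dateTime
date_part = timestamp.date()                      # separable into date ...
time_part = timestamp.time()                      # ... and time

hex_bytes = bytes.fromhex("0FB7")                 # hexBinary
b64_bytes = base64.b64decode("xFhbhg/bsEg=")      # base64Binary

print(decimal_value, date_part, time_part, hex_bytes, len(b64_bytes))
```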
76
DTD <-> Schema
Every DTD can be translated into an equivalent XML Schema.
Tools such as XML Spy offer this functionality.
Conversely, there are XML Schemas for which no equivalent DTD exists.
XML Schemas are therefore more expressive than DTDs.
77
XML Schemas are themselves XML documents
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.books.org"
xmlns="http://www.books.org"
elementFormDefault="qualified">
….
</xsd:schema>
An XML Schema is an XML document.
The root element of a schema is always "schema" from the W3C namespace "XMLSchema".
The latter is also called the schema of schemas.
78
ELEMENT
<!ELEMENT BookStore (Book+)>
<xsd:element name="BookStore">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="Book" type="BookType" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
The default values of minOccurs and maxOccurs are both 1, so minOccurs="1" can be omitted!
79
<!ELEMENT Book (Title, Author+, Date, ISBN?, Publisher)>
<xsd:complexType name="BookType">
<xsd:sequence>
<xsd:element name="Title" type="xsd:string"/>
<xsd:element name="Author" type="xsd:string" maxOccurs="unbounded" />
<xsd:element name="Date" type="xsd:string"/>
<xsd:element name="ISBN" type="xsd:string" minOccurs="0" />
<xsd:element name="Publisher" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
80
Complete schema
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.books.org"
xmlns="http://www.books.org"
elementFormDefault="qualified">
<xsd:element name="BookStore">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Book" maxOccurs="unbounded">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Title" type="xsd:string"/>
<xsd:element name="Author" type="xsd:string"/>
<xsd:element name="Date" type="xsd:string"/>
<xsd:element name="ISBN" type="xsd:string" minOccurs="0"/>
<xsd:element name="Publisher" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
81
Software for validation
Xerces by Apache (API)
• http://www.apache.org/xerces-j/index.html
MSXML (API)
• http://www.microsoft.com
XML Spy (GUI)
• http://www.xmlspy.com
http://www.w3.org/XML/Schema#Tools
6. XSL(T)
eXtensible Stylesheet Language
83
XSL
Rules for transforming documents
not only to HTML
84
Transformation
[Diagram: the correct (well-formed) XML file, checked against the DTD / Schemas, is transformed by XSL(T) into (X)HTML, SVG, XML, XLS, TXT, …]
7. SVG
Scalable Vector Graphics
87
SVG
are graphics based on XML
special viewers required (browser plug-in, Adobe SVG Viewer)
8. XPath, XQuery
89
XPath
XPath (XML Path Language) is a query language for addressing parts of an XML document. XPath 1.0 forms the basis of the patterns used in XSLT and of XPointer. The next version, XPath 2.0, will also form the basis of the XML query language XQL / XQuery / XML Query.
Anatomy of XPath
Similar to the DOM, XPath views XML documents as a tree of nodes. XPath distinguishes the following node types:
• Document root
• Elements
• Attributes
• Text
• Namespaces
• Processing instructions
• Comments
The syntax is modelled on UNIX path notation.
90
XPath example
Example
<dok>
  <!-- ein XML-Dokument -->
  <kap title="Nettes Dokument">
    <pa>Ein Absatz</pa>
    <pa>Noch ein Absatz</pa>
    <pa>Und noch Absatz</pa>
    <pa>Nett, oder?</pa>
  </kap>
  <kap title="Zweites Kapitel">
    <pa>Ein Absatz</pa>
  </kap>
</dok>
Examples of XPath expressions:
//pa selects all pa elements at all levels
kap[@title="Nettes Dokument"]/pa selects all paragraphs of the chapter "Nettes Dokument".
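Both expressions can be tried with Python's standard library, which supports a limited subset of XPath. In that subset, paths are evaluated relative to an element, so `//pa` is written as `.//pa`. This is a sketch; a full XPath 1.0 engine would accept the expressions exactly as written above.

```python
import xml.etree.ElementTree as ET

DOC = """<dok>
  <!-- ein XML-Dokument -->
  <kap title="Nettes Dokument">
    <pa>Ein Absatz</pa><pa>Noch ein Absatz</pa>
    <pa>Und noch Absatz</pa><pa>Nett, oder?</pa>
  </kap>
  <kap title="Zweites Kapitel"><pa>Ein Absatz</pa></kap>
</dok>"""

root = ET.fromstring(DOC)

# All pa elements at all levels.
all_paragraphs = root.findall(".//pa")

# Only the paragraphs of the chapter with the given title attribute.
first_chapter = root.findall("kap[@title='Nettes Dokument']/pa")

print(len(all_paragraphs), [p.text for p in first_chapter])
```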
91
see CD
XPath documentation
9. Technical applications
Web services, data interchange, etc.
93
XML is the standard for everything
Webservices
Programming environments
Data Interchange
Systems
94
Webservices
Web standard
• a standardized interface, API
based on XML
• independent, readable
several security issues fixed in the meantime
new thinking about 'firewalling'
• it's now content-oriented
95
Programming environment
.NET
• Visual Studio .NET
Java
• Eclipse
(N)Ant: build tool for both worlds
96
Data Interchange
B2B
E2B
CDISC
eCTD
97
Systems
MathML
ChemML
Phase Forward's MedML
Open Office
98
MathML
99
Example: SyncML
100
Mobile phones, smartphones and PDAs with built-in SyncML support (model, vendor, device type, availability)
Alcatel: ot715
Motorola: A830, A835, V600, E390
Nokia: 7250, 6800, 3650, 6220, 9210i, 7650
Samsung: SGH-D700
Siemens: S55, SL55, M55, SX1
Sony Ericsson: T68i, T610, P800, Z1010, PEG-NZ90
PDAs: Sony PEG-NX70V, PEG-T675C, PEG-T625C
10. E2B
102
Welcome to the Electronic World for the Safety of the Circle of Life!
10.1. ICH and where the name ‘E2B’ came from
104
E2B – who is working with these standards?
Who is currently working with E2B?
Who knows the content of E2B?
105
Introduction to ICH and Efficacy Topics
ICH
International Conference on Harmonisation
of Technical Requirements for Registration of Pharmaceuticals
for Human Use
106
Introduction to ICH and Efficacy Topics
ICH Representation
• Countries and Regions directly represented at ICH are the European Union, the USA and Japan.
• Canada, other European countries and other countries are represented by observers:
• Canada: Drugs Directorate, Health Canada
• Other European countries: European Free Trade Area (EFTA), represented at the ICH by Switzerland
• Other countries: World Health Organisation (WHO)
Additional observer: International Federation of Pharmaceutical Manufacturers Associations (IFPMA)
107
Introduction to ICH and Efficacy Topics
ICH Structure
Three areas (European Union, USA, Japan)
- Regulatory bodies for each area
- Research-based industry for each area
Six parties:
• EU: European Commission; European Federation of Pharmaceutical Industries and Associations
• Japan: Ministry of Health, Labor and Welfare; Japanese Pharmaceutical Manufacturers Association
• USA: Food & Drug Administration; Pharmaceutical Research and Manufacturers of America
108
Introduction to ICH and Efficacy Topics
ICH Process
Step 1: Consensus building
Step 2: Start of Regulatory Action
Step 3: Regulatory Consultation
Step 4: Adoption of a Tripartite Harmonised Text
Step 5: Implementation
109
The ICH website
110
Where the name comes from ...
Structure of the ICH documents:
Q – Quality topics
"Those relating to chemical and pharmaceutical Quality Assurance"
S – Safety topics
"Those relating to in vitro and in vivo pre-clinical studies"
E – Efficacy topics
"Those relating to clinical studies in human subjects"
M – Multidisciplinary topics
"Topics covering more than one area"
111
Where the name comes from ... (cont.)
all EFFICACY documents
E1 – Exposure
E2 – Clinical Safety
E3 – Study reports
E4 – Dose response
E5 – Ethnic factors
E6 – GCP
E7 – Special population
E8, 9, 10 – Clinical Trial Design
E11 – Pediatrics
E12 – Therapeutics Categories
112
Where the name comes from ... (cont.)
all CLINICAL SAFETY documents
E2A – Definitions and Standards for Expedited Reporting
E2B – Data Elements for Transmission of ADR reports
E2B(M) – Data Elements for Transmission of Individual Case Safety Reports
• Maintenance, last release February 2001
E2C – Periodic Safety Update Reports
113
Where the name comes from ... (cont.)
Multidisciplinary Topics
M1 – Medical Terminology (MedDRA)
M2 – Electronic Standards for Transmission of Regulatory Information (ESTRI)
M3 – Timing of Pre-clinical Studies in Relation to Clinical Trials
M4 – The Common Technical Document
114
E2B – E2B(M)
E2B contains the definition of all required data elements
The first definition was found to be successful and only minor revisions have been made
• (language problems, MedDRA versioning)
Pilot studies demonstrated successful usage
M2 is another part of the ICH documents ...
115
M2 Electronic transmission of ICSR Message Specification
M2 is part of ‘Multi-disciplinary Topics’ in the ICH structure
It is the technical description of E2B(M) and defines the interchange format
It exists in three different versions
• 1.0
• 2.0
• 2.1 -> last release Nov. 2000, final version Feb. 2001
From the technical point of view: ICH ICSR DTD Version 2.1
116
Interchange with E2B
... is an accepted international standard for interchanging ICSR (Individual Case Safety Report) data between the interested groups
Interchange
• from authority to authority
• from pharmaceutical industry to authorities
• inside a company
• B2B between companies (e.g. pharmaceutical companies with CROs)
10.2. The content of an ICSR document
118
The ICH-E2B/M2 Message Structure
M ICH ICSR Message Header
A Information on the Safety Report
A.1 Identification of the Case Safety Report
A.2 Primary Source of Information
A.3.1 Sender
A.3.2 Receiver
B Information on the Case
B.1 Patient Characteristics
B.2 Reaction(s) / Event(s)
B.3 Results of Tests and Procedures
B.4 Drug(s) Information
B.5 Narrative Case Summary and Other Information
119
E2B – example chapter A.1
IDENTIFICATION OF CASE RECORDS
Total of 14 different fields
Examples:
Unique Sender-ID, country information (primary source & where the case came up), date, report type, seriousness, linked with another case
120
E2B – example field A.1.4
TYPE OF REPORT
Spontaneous report
Report from study
Other (... unclear from the literature report ...)
Not available to sender (unknown)
121
Repeating records
In every Safety Report some repeating items are possible:
• example:
– Events (B.2)
– Drugs (B.4)
– Primary source (A.2)
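A minimal sketch of what repeating records mean for a reader of the file: the same element simply occurs several times. The report below is invented and heavily shortened, and the element names (`primarysource`, `patient`, `reaction`, `drug`, `primarysourcereaction`, `medicinalproduct`) are assumed to follow the lowercase naming of the ICSR DTD; check them against the DTD before relying on them.

```python
import xml.etree.ElementTree as ET

# Invented, shortened report. Element names are assumptions that follow
# the lowercase naming style of the ICSR DTD.
SAMPLE = """
<safetyreport>
  <primarysource><reportercountry>US</reportercountry></primarysource>
  <primarysource><reportercountry>DE</reportercountry></primarysource>
  <patient>
    <reaction><primarysourcereaction>Headache</primarysourcereaction></reaction>
    <reaction><primarysourcereaction>Nausea</primarysourcereaction></reaction>
    <drug><medicinalproduct>Drug A</medicinalproduct></drug>
  </patient>
</safetyreport>
"""


def count_repeating(xml_text):
    """Count the repeatable blocks A.2, B.2 and B.4 in one report."""
    root = ET.fromstring(xml_text)
    return {
        "primarysource": len(root.findall("primarysource")),  # A.2
        "reaction": len(root.findall("patient/reaction")),    # B.2
        "drug": len(root.findall("patient/drug")),            # B.4
    }


print(count_repeating(SAMPLE))
```

A receiving database therefore needs one-to-many tables (or similar) for these three blocks rather than single columns.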
122
safetyreport area (partial)
123
patient area (partial)
124
reaction area
125
drug area (partial)
10.3. E2B(M) and M2 – description of the model
127
The technical description
Is found in the M2 document
Description of the:
• Safety report
• Acknowledgement message
Contains a description of how to write an ICSR
128
problem?
129
language attribute and the DCL problem
5 character sets (fonts)
Defined in a DCL file
• Declaration language file, which is SGML-based
MHLW – two files for E2B submission
130
MedDRA
Is a basic requirement for E2B
Which version?
How is it implemented in an E2B file?
LLT or PT?
See the Note for Guidance on Regulatory Electronic Transmission of ICSRs in Pharmacovigilance
131
Problem: Unit lists (code lists)
Defined in the E2B(M) document
• Mass, Volume, Radioactivity, Other
• Interval list
• Route of Administration list
132
Problem: More than one transmission of an ICSR
Unique Case ID
Solution in Appendix of the E2B(M) document
133
The steps that take the most time:
Map your data!
(Data, Codelists)
134
Clintrace E2b Data Mapping Principles
Mapping philosophy
• Mapping provides increased flexibility (particularly in view of the upcoming E2b changes)
• Mapping allows for addressing field length issues
– Note: The standard schema of your drug safety system includes many more data items than the E2b definition
135
Clintrace E2b Data Mapping Principles (cont.)
Depending on the type of item, several mapping methods can be distinguished:
• Direct: data directly transferable. This can be numerical (numbers, dates) or textual (any form of alphanumeric data)
• Numeric: data transferable after numeric conversion. Example: weight quantity + unit -> weight (in kg)
• Text: data transferable after text conversion. Example: truncation or concatenation
• Coded: data transferable after a code conversion. Example: sex
• Fixed: standard codes such as date formats. Example: code 102 for complete dates
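The five mapping methods can be sketched as five small functions. This is only an illustration under stated assumptions: the function names, the source-system codes (`"M"`, `"F"`, the unit labels) and the conversion table are hypothetical, the target sex codes 1 and 2 are assumed, and date format code 102 is taken to mean a complete date written as CCYYMMDD, as in the sample report.

```python
from datetime import date

# Hypothetical source-system codes; only the target side is meant to
# follow E2B (assumption: 1 = male, 2 = female).
SEX_TO_E2B = {"M": "1", "F": "2"}
# Hypothetical unit table for the numeric conversion to kilograms.
UNIT_TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def map_direct(value):
    """Direct: transfer the value unchanged (as text)."""
    return str(value)


def map_numeric_weight(quantity, unit):
    """Numeric: weight quantity + unit -> weight in kg."""
    return round(quantity * UNIT_TO_KG[unit], 2)


def map_text(value, max_len):
    """Text: truncate to the field length of the target item."""
    return value[:max_len]


def map_coded_sex(code):
    """Coded: translate a source code into the target code."""
    return SEX_TO_E2B[code]


def map_fixed_date(d):
    """Fixed: complete date as format code 102 plus CCYYMMDD value."""
    return "102", d.strftime("%Y%m%d")


print(map_numeric_weight(2500, "g"))
print(map_fixed_date(date(2001, 11, 19)))
```

Keeping each method as a separate function makes it easy to attach exactly one method to each item in a mapping table.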
136
E2b Mapping – Example: Case ID format
An automatic Case ID Generation algorithm can be implemented in order to create Case IDs directly compliant with E2b recommendations.
Alternatively the mapping can be used to transmit the compliant Case ID.
Case ID format for E2b is: country code (2 characters)-company or regulator name-report number
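A minimal sketch of such a Case ID generator, assuming the country code is a two-letter ISO 3166 code and that the three parts are joined with hyphens. The function name and the organisation name `EXAMPLECO` are hypothetical; the report number is borrowed from the sample report in this workshop.

```python
import re


def build_case_id(country, organisation, report_number):
    """Join country code, company/regulator name and report number."""
    country = country.strip().upper()
    # Assumption: the country code is a two-letter ISO 3166 code.
    if not re.fullmatch(r"[A-Z]{2}", country):
        raise ValueError("country code must consist of two letters")
    parts = [country, organisation.strip(), str(report_number).strip()]
    if any(not part for part in parts):
        raise ValueError("all three parts of the Case ID are required")
    return "-".join(parts)


print(build_case_id("us", "EXAMPLECO", "KAL96.00180"))
```

Whether the ID is generated like this or produced by the mapping, the same ID has to be reused for every later transmission of the same case.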
137
E2B Mapping: Codelists
Example: Reporter’s Qualification
E2B codes:
1 = Physician
2 = Pharmacist
3 = Other Health Professional
4 = Lawyer
5 = Consumer or Other Non Health Professional

Reporter's Type
Code   Value                 Label
HP     Health Professional   Health Professional
SI     Study Investigator    Study Investigator
CN     Consumer              Consumer
UNK    Unknown               Unknown

Reporter's Occupation
Code         Value        Label
PHARMACIST   Pharmacist   Pharmacist
LAWYER       Lawyer       Lawyer

Allowed Combinations   Map to E2B code
HP/nn                  1 (Physician)
HP/empty               1 (Physician)
SI/nn                  1 (Physician)
SI/empty               1 (Physician)
HP/PHARMACIST          2 (Pharmacist)
HP/OTHER_HP            3 (Other Health Professional)
CN/LAWYER              4 (Lawyer)
CN/OTHER_NON_HP        5 (Consumer or Other Non HP)
CN/empty               5 (Consumer or Other Non HP)
Implementation proposal
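The combination table can be sketched as a lookup on the pair (reporter's type, reporter's occupation). One point is an interpretation rather than something the slide states: "nn" is read here as "any occupation without its own rule", so HP and SI fall back to code 1. The function name is hypothetical.

```python
E2B_QUALIFICATION = {
    1: "Physician",
    2: "Pharmacist",
    3: "Other Health Professional",
    4: "Lawyer",
    5: "Consumer or Other Non Health Professional",
}

# Explicit (type, occupation) pairs taken from the combination table.
EXPLICIT = {
    ("HP", "PHARMACIST"): 2,
    ("HP", "OTHER_HP"): 3,
    ("CN", "LAWYER"): 4,
    ("CN", "OTHER_NON_HP"): 5,
    ("CN", ""): 5,
}


def map_reporter_qualification(reporter_type, occupation=""):
    """Return the E2B qualification code for an allowed combination."""
    key = (reporter_type, occupation or "")
    if key in EXPLICIT:
        return EXPLICIT[key]
    # Assumption: "nn" and "empty" cover every other occupation value
    # for health professionals and study investigators.
    if reporter_type in ("HP", "SI"):
        return 1
    raise ValueError("combination not allowed: %s/%s" % key)


code = map_reporter_qualification("HP", "PHARMACIST")
print(code, E2B_QUALIFICATION[code])
```

Combinations that are not in the table (for example type UNK) raise an error in this sketch, so they surface during mapping instead of producing a wrong code.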
138
After sending an ICSR you ...
will receive an ACKNOWLEDGMENT Message:
• 1. The acknowledgement of receipt
• 2. The verification of correctness of the ICSR
Also sent out as XML (SGML)
139
The acknowledgement message
Split into
• ICHICSR MESSAGE HEADER
• ACKNOWLEDGMENT
– Message acknowledgment (once)
– Report acknowledgment (once, many, none)
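The cardinalities above (one message acknowledgment, zero or more report acknowledgments) can be sketched as a small data structure. All class and field names are hypothetical and are not taken from the acknowledgment DTD; only the once/many structure comes from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MessageAcknowledgment:
    message_id: str   # hypothetical field: the acknowledged message


@dataclass
class ReportAcknowledgment:
    report_id: str    # hypothetical field: the acknowledged report
    accepted: bool    # hypothetical field: result of the verification


@dataclass
class Acknowledgment:
    # Exactly one message acknowledgment per acknowledgment message.
    message: MessageAcknowledgment
    # None, one or many report acknowledgments.
    reports: List[ReportAcknowledgment] = field(default_factory=list)


def rejected_reports(ack):
    """Return the IDs of all reports that were not accepted."""
    return [r.report_id for r in ack.reports if not r.accepted]


ack = Acknowledgment(
    MessageAcknowledgment("MSG-1"),
    [ReportAcknowledgment("R-1", True), ReportAcknowledgment("R-2", False)],
)
print(rejected_reports(ack))
```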
140
Requirements for your Tracking System
You must be able to react to error conditions quickly
• transport/communication error
– server down
– broken line
It should support more than the standard messages
• it should also include messages for non-transmission, other possible errors, etc.
141
What does an ICSR file look like?
XML Notepad
The model
XML in IE
10.4. The Electronic Journey
143
The Electronic Journey of your ICSR
[Figure: flow diagram. Sender: Application Software, Sender DB, SGML Parser (DTD ICH E2B, Database Definition), Gateway Encryption. Receiver: Gateway Decryption, SGML Parser (DTD ICH E2B, Database Definition), Receiver DB, Application Software. Return path: Gateway Ack, ICSR ACK Message.]
Sample ICSR file shown in the diagram:
<ichicsr lang="en">
  <safetyreport>
    <safetyreportid>KAL96.00180</safetyreportid>
    <primarysourcecountry>US</primarysourcecountry>
    <transmissiondateformat>102</transmissiondateformat>
    <transmissiondate>20011119</transmissiondate>
    <reporttype>2</reporttype>
    <serious>1</serious>
    <seriousnessdeath>2</seriousnessdeath>
    <seriousnesslifethreatening>2</seriousnesslifethreatening>
    <seriousnesshospitalization>1</seriousnesshospitalization>
    <seriousnessdisabling>2</seriousnessdisabling>
    <seriousnessother>2</seriousnessother>
    <receivedateformat>102</receivedateformat>
    <receivedate>19960318</receivedate>
    <receiptdateformat>102</receiptdateformat>
    <companynumb>KAL96.00180</companynumb>
    <primarysource>
      <reportertitle>Dr.</reportertitle>
      <reportergivename>Glenn</reportergivename>
      <reportermiddlename>H.</reportermiddlename>
      <reporterfamilyname>Wilson</reporterfamilyname>
      <reportercity>Mossville</reportercity>
      <reporterstate>TX</reporterstate>
      <reporterpostcode>71100</reporterpostcode>
      <reportercountry>US</reportercountry>
      .....
</ichicsr>
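On the receiving side, the application software has to pull individual fields out of the parsed report. The sketch below does this for a shortened, well-formed copy of the sample report, using Python's standard XML parser instead of an SGML parser. It assumes that the value 1 means "yes" and 2 means "no" in the seriousness fields; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of the sample report from the diagram.
SAMPLE = """<ichicsr lang="en">
  <safetyreport>
    <safetyreportid>KAL96.00180</safetyreportid>
    <primarysourcecountry>US</primarysourcecountry>
    <reporttype>2</reporttype>
    <serious>1</serious>
    <seriousnessdeath>2</seriousnessdeath>
    <seriousnesshospitalization>1</seriousnesshospitalization>
  </safetyreport>
</ichicsr>"""

YES = "1"  # assumption: 1 = yes, 2 = no


def summarize(xml_text):
    """Extract the report ID, country and seriousness criteria."""
    report = ET.fromstring(xml_text).find("safetyreport")
    # Collect every seriousness* element that is flagged with "yes".
    criteria = [
        child.tag[len("seriousness"):]
        for child in report
        if child.tag.startswith("seriousness") and child.text == YES
    ]
    return {
        "id": report.findtext("safetyreportid"),
        "country": report.findtext("primarysourcecountry"),
        "serious": report.findtext("serious") == YES,
        "criteria": criteria,
    }


print(summarize(SAMPLE))
```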
10.5. Links
145
Useful links
http://www.ich.org
http://www.ifpma.org/
http://www.fda.gov/cder/
http://www.eudravigilance.org/
http://www.fda.gov/cder/m2/
http://www.bfarm.de
http://www.meddramsso.com
11. CDISC
147
Clinical Data Interchange Standards Consortium
CDISC is an open, multidisciplinary, non-profit organization committed to the development of worldwide industry standards
to support the electronic acquisition, exchange, submission and archiving of clinical trials data and metadata for medical
and biopharmaceutical product development.
The CDISC mission is to lead the development of global, vendor-neutral, platform-independent standards to improve data
quality and accelerate product development in our industry.
148
The Current State of Data Transfer
[Figure: diagram with the nodes Lab, Medical Records, CCRF, CRO, Pharma, Operational Data (twice), Submission Data, Submission]
149
Current State: Costly and Time-consuming
[Figure: connections between Pharma 1 ... N, CRO 1 ... M, Investigator 1 ... K, Lab 1 ... L, and Regulatory]
150
Standards to Enable Seamless Data Flow ... from Patient to Reviewers
[Figure: diagram with the labels CRO, EDC, LAB, ECG, ODM, LAB, Operational Database, SDS, ADaM, eSubmission for Regulatory Review]
151
Benefits of Standardization in our Industry
Increase efficiencies of performing clinical trials
Improve link between healthcare delivery and clinical research/trials
Facilitate ‘business’ processes among investigators, biopharmaceutical companies, CROs, technology vendors, clinical laboratories
Provide means for long-term archive of electronic clinical trial data and digital signatures
Facilitate reviews of regulatory submissions
152
CDISC models
ODM = Operational Data Model
SDM = Submissions Data Model
[Figure: flow diagram with the following boxes and labels]
Data Sources
• Site CRFs
• Laboratories
• Contract Research Organizations
• Development Partners
• Discovery Data
Operational Database
• Study Data
• Audit Trail
• Metadata
Submission Data
• CRT Datasets
• Analysis datasets
• Metadata
Archival Database
Labels on the connections: ODM, SDM, ODM; Pharma Industry, Regulatory
153
CDISC
Sponsors and Members join on a corporate basis (>120 Companies, including all of the top 20 global pharmaceutical companies)
54 members of the Industry Advisory Board, one from each Corporate Sponsor
9 Board members (from AstraZeneca, Aventis, Lilly, Merck, First Consulting Group, SAS, Sanofi-Synthelabo, PAREXEL Europe, and an HL7 liaison)
Active groups (~ 40 companies each) in Europe and Japan; group initiated in India
3.75 FTE Operations staff
4 Team facilitators (~3 days/month)
154
Just one section: ODM
<xs:element name="ODM">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Study" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="AdminData" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="ReferenceData" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="ClinicalData" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Association" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="ds:Signature" minOccurs="0" maxOccurs="unbounded"/>
      <xs:any namespace="##other" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="Description" type="text"/>
    <xs:attribute name="FileType" type="FileType" use="required"/>
    <xs:attribute name="Granularity" type="Granularity"/>
    <xs:attribute name="Archival" type="YesOnly"/>
    <xs:attribute name="FileOID" type="oid" use="required"/>
    <xs:attribute name="CreationDateTime" type="datetime" use="required"/>
    <xs:attribute name="PriorFileOID" type="oidref"/>
    <xs:attribute name="AsOfDateTime" type="datetime"/>
    <xs:attribute name="ODMVersion" type="text"/>
    <xs:attribute name="Originator" type="text"/>
    <xs:attribute name="SourceSystem" type="text"/>
    <xs:attribute name="SourceSystemVersion" type="text"/>
    <xs:attribute name="Id" type="xs:ID"/>
    <xs:anyAttribute namespace="##other"/>
  </xs:complexType>
  <xs:unique name="UC-O-1">
    <xs:selector xpath="Study"/>
    <xs:field xpath="@OID"/>
  </xs:unique>
</xs:element>
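Two rules from this schema fragment can be checked by hand without a schema validator: the three attributes marked `use="required"` and the uniqueness constraint UC-O-1 on `Study/@OID`. The sketch below is a simplification under stated assumptions: it ignores XML namespaces, the function name is hypothetical, and the sample files and the `FileType` value `Snapshot` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Attributes marked use="required" in the schema fragment.
REQUIRED_ATTRIBUTES = ("FileType", "FileOID", "CreationDateTime")


def check_odm_root(xml_text):
    """Return a list of problems found on the ODM root element."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "ODM":
        problems.append("root element is not ODM")
    for name in REQUIRED_ATTRIBUTES:
        if name not in root.attrib:
            problems.append("missing required attribute " + name)
    # UC-O-1: the OID of each Study must be unique within the file.
    oids = [study.get("OID") for study in root.findall("Study")]
    for oid in sorted({o for o in oids if o and oids.count(o) > 1}):
        problems.append("duplicate Study OID " + oid)
    return problems


# Invented sample files (namespaces omitted for brevity).
GOOD = ('<ODM FileType="Snapshot" FileOID="F.001" '
        'CreationDateTime="2004-03-30T09:00:00"><Study OID="S.1"/></ODM>')
BAD = '<ODM FileOID="F.002"><Study OID="S.1"/><Study OID="S.1"/></ODM>'

print(check_odm_root(GOOD))
print(check_odm_root(BAD))
```

In practice a validating parser would apply the complete schema; this sketch only makes the two rules visible.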
155
Major sections of the model
156
Study
157
Metadata are described in the structure itself!
158
Clinical data
159
Subject data
use XML Spy
160
Links
www.cdisc.org
12. HL7
13. Is it the future, or already possible today?
163
If you have a serious adverse event during a study ...
[Figure: SAE reporting flow. Participants: CRO / Investigator (runs a study, EDC or paper), Sponsor (Study coordinator, Drug safety), Authority (FDA, EMEA). Systems: CIS, B2B server, ClinTrial, ClinTrace. Labels on the connections: "yes, SAE", SAE, eMail, SMS, Acknowledgement, Documentation.]
164
The (possible) use of XML extracts
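[Figure: the same flow annotated with XML formats. Participants: CRO / Investigator, Sponsor (Study coordinator, Drug safety), Authority (FDA, EMEA). Systems: CIS, B2B server, ClinTrial, ClinTrace. Formats on the connections: CDISC, WML, MedML (twice), E2B. Further label: Documentation.]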
165
Links
www.w3c.org
www.xml.com
xml.apache.org
… a Google search for the three letters returned
approx. 36,400,000 hits (and that within seconds) …
166
Questions and Answers
167
THANK YOU!
Joerg Dillert
Senior Consultant
Doebelner Str. 4
12627 Berlin
Tel.: +49 - (0) 30 99 28 29 11
Fax: +49 - (0) 30 99 28 29 12
mobile: +49 – (0) 170 448 04 45
jdillert@phaseforward.com
Nothing beats experience.