
Thomas Lengauer, Dietmar Schomburg, Michael S. Waterman (editors):

Molecular Bioinformatics

Dagstuhl-Seminar-Report; 46

07.09.-11.09.92 (9237)

ISSN 0940-1121

Copyright © 1992 by IBFI GmbH, Schloß Dagstuhl, W-6648 Wadern, Germany

Tel.: +49-6871 - 2458

Fax: +49-6871 - 5942

The Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI) is a non-profit limited company (GmbH). It regularly holds scientific seminars which, upon application by the seminar organizers and review by the scientific directorate, are conducted with personally invited guests.

Responsible for the program: Prof. Dr.-Ing. José Encarnação, Prof. Dr. Winfried Görke, Prof. Dr. Theo Härder, Dr. Michael Laska, Prof. Dr. Thomas Lengauer, Prof. Ph. D. Walter Tichy, Prof. Dr. Reinhard Wilhelm (scientific director)

Shareholders: Universität des Saarlandes, Universität Kaiserslautern, Universität Karlsruhe, Gesellschaft für Informatik e.V., Bonn

Funding bodies: the federal states of Saarland and Rheinland-Pfalz

Ordering address: Geschäftsstelle Schloß Dagstuhl, Informatik, Bau 36, Universität des Saarlandes, W - 6600 Saarbrücken, Germany

Tel.: +49 -681 - 302 - 4396

Fax: +49 -681 - 302 - 4397

e-mail: [email protected]�sb.de

Molecular Bioinformatics

Thomas Lengauer, Dietmar Schomburg, Michael Waterman (editors)

Dagstuhl-Seminar-Report, September 7 - 11, 1992 (9237)

Workshop on Molecular Bioinformatics

Organizers:

Thomas Lengauer (GMD, Schloss Birlinghoven/University of Bonn),

Dietmar Schomburg (GBF, Braunschweig),

Michael S. Waterman (USC, Los Angeles)

September 7 - 11, 1992

Molecular Bioinformatics is a notion that one can assign to an area in applied computer science which is rapidly gaining significance. Roughly, this area is concerned with the development of methods and tools for analyzing, understanding, reasoning about and, eventually, designing large biomolecules such as DNA, RNA, and proteins with the aid of the computer.

With the knowledge in molecular biology increasing at an explosive rate, and data on genomes and their products being collected at tremendous speeds, Molecular Bioinformatics becomes an important challenge to applied computer scientists.

Against this background, the Dagstuhl Seminar on Molecular Bioinformatics brought together experts from all over the world who are working on algorithmic issues in this field. The workshop was interdisciplinary, with people from molecular biology, computer science, and applied mathematics attending. The workshop focused on the following topics:

• Alignment of biomolecular sequences (DNA, RNA, proteins),

• Modeling of large biomolecules, including the prediction and analysis of secondary and higher-level structure as well as spatial conformations (folding),

• Molecular dynamics and simulations of interactions between biomolecules,

• Interpreting nucleotide sequences and their role in gene regulation,

• Reading genomic sequences.

Besides the presentations, there were two organized evening discussion sessions on the topics PAM matrices and Computer-Aided Drug Design.

The experiment of bringing together researchers with widely varying backgrounds to discuss an exciting new interdisciplinary field was successful. The attendees held lively and often controversial discussions, developed a sense of identity for the new field during the workshop, and went back home with new insights, problems and ideas. For the German researchers, the workshop was an ideal preparation for forming cooperations within the new funding program Molecular Bioinformatics that had just been announced by the BMFT (German Ministry for Research and Technology). A few participants evaluated the workshop as their "most productive workshop experience".

We are especially grateful to the Dagstuhl office and team for their excellent organization in preparing and conducting the workshop, as well as their always engaged and personal support in all matters. The cordial atmosphere in the Schloß was an essential ingredient of the success of this workshop. We are also grateful to NSF for providing a grant for financial support of intercontinental travel of the U.S. based participants of this workshop.

Program

Monday, September 7

Lengauer          Welcome and Introductory Remarks

Morning Session: Sequence Alignment
Chair: Gad Landau

Apostolico        Structuring Sequence Data Banks for Instantaneous Parallel Search
Manber            Approximate String Matching: Applications
Chang             Approximate Matching with Constant Fraction Error
Giegerich         Embedding Sequence Analysis in a Functional Programming Environment
Naor              Representing Suboptimal Alignments

Afternoon Session
Chair: Gaston Gonnet

Waterman          Parametric Sequence Alignment
Vingron           Interpretation of Parametric Alignment Plots
Blum              On Locally Optimal Alignments in Genetic Sequences
Manber            Approximate String Matching: Algorithms

Tuesday, September 8

Morning Session: Physical Mapping
Chair: Gene Lawler

Karp              The Physical Mapping Problem
Shamir            Physical Mapping of DNA and Interval Graphs
Chang             Physical Mapping in Practice

Sequences and Trees

Gonnet            Towards the Best Possible Theory and Algorithms for Sequence Alignment
Warnow            New Problems in Evolutionary Tree Construction

Afternoon Session: Secondary and Tertiary Structure
Chair: Tim Havel

Schomburg         Computer-Aided Design of Proteins With New Properties
Sander
Selbig

Wednesday, September 9

Morning Session: Tertiary Structure
Chair: Chris Sander

Havel             Distance Geometry and Homology Modeling
Crippen           Prediction of the Tertiary Structure of Globular Proteins over Discrete Conformational Spaces
Chan              A Lattice Enumeration Approach to Protein Folding

Secondary Structure

Koch              Graph Theoretical Description of Sheet Topologies
Miyano            Machine Discovery by Decision Trees over Regular Patterns

Thursday, September 10

Morning Session: Sequences
Chair: Udi Manber

Dress             Is Darwinism a Falsifiable Theory? Methods in Sequence Analysis
Sankoff           Climbing a Tree Through the Window
von Haeseler      Reconstructing Phylogenetic Trees
Gonnet            Text Searching Algorithms in Darwin

Afternoon Session
Chair: Temple Smith

Dress             Statistical Geometry in Sequence Space
Schmidt           Secondary Structure Problems in Coiled-coil Proteins
Zuker             RNA Secondary Structure Modeling

Molecular Graphics

Taylor            Delegate Analysis from a Macromolecular Graphics Interface: Examples from Sequence Analysis

Evening Session: Problem Areas for Interaction Between Biochemistry and Computer Science

Havel             NMR Spectroscopy for Determination of Protein Structure
Crippen           Methods of Drug Design

Friday, September 11

Morning Session: Tertiary Structure
Chair: Gordon Crippen

Schulze-Kremer    Applications of Artificial Intelligence and Machine Learning to Protein Structure Analysis
Sippl             Hide and Seek on a Polyprotein
Kaden             Double-Point Chains in Proteins and Inverse Protein Folding
Schneider         Sequence-Structure Relationships in the Twilight Zone

Structuring Sequence Data Banks for Instantaneous Parallel Searching

Alberto Apostolico, University of Padua

Current molecular sequence data banks consist mainly of raw sequences with some annotation. While such basic information will hardly ever be forfeited in the future, auxiliary data structures are being gradually introduced and studied which facilitate various kinds of searches and comparisons. For some such manipulations, serial computation is inadequate, so that efficient parallel methods are sought. We present a data structure that supports constant-time implementation on a CRCW PRAM of exact searches for arbitrary patterns into arbitrary substrings of a sequence data bank.

On the Accurate Notion of Locally Optimal Alignments and Subalignments in Genetic Sequences

Norbert Blum, University of Bonn

We review old and new results with respect to locally optimal alignments and subalignments in genetic sequences. The main theorem is that the subgraph of the edit graph, containing exactly the locally optimal subalignments, can be computed in $O(nm \log(n + m))$ time, in the case that the underlying cost functions are concave.

A Lattice Enumeration Approach to Protein Folding

Hue Sun Chan, University of California at San Francisco (UCSF)

To study the protein folding problem, we use exhaustive sequence and conformational enumerations to study copolymer chains configured on lattices. These model molecules are short self-avoiding chains of hydrophobic (H) and polar (P) monomers. This simple model shows that under folding conditions, a significant fraction of H/P copolymers exhibit protein-like behavior such as high compactness, a considerable amount of secondary structure, and low degeneracy of the lowest energy state. We also explore the folding kinetics of those H/P copolymers which have unique native structures. Under folding conditions, these model protein molecules collapse quickly to an ensemble of relatively compact conformations, and then re-arrange much more slowly as they seek their unique native states. Folding time of the model molecules is strongly sequence-dependent, because the arrangement of H's and P's along the sequence determines the energetic landscape of the chain's conformational space. The fastest folding sequences are those whose native structures are most accessible and least protected by energy barriers. (This is joint work with Ken A. Dill at UCSF.)
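The exhaustive conformational enumeration described above can be illustrated with a small sketch. It assumes a two-dimensional square lattice and an energy equal to minus the number of H-H contacts between monomers that are not chain neighbours; the abstract specifies neither the lattice nor the energy function, so both are assumptions, and the names `conformations`, `hh_contacts` and `ground_state` are hypothetical. Mirror images are counted as separate conformations.

```python
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def conformations(n):
    """Yield all self-avoiding chains of n monomers on the square lattice.

    The first bond is fixed to (0,0)->(1,0) to remove rotated copies.
    """
    def extend(path, occupied):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                yield from extend(path, occupied)
                path.pop()
                occupied.remove(nxt)

    if n == 1:
        yield ((0, 0),)
        return
    start = [(0, 0), (1, 0)]
    yield from extend(start, set(start))


def hh_contacts(seq, path):
    """Count H-H pairs that are lattice neighbours but not chain neighbours."""
    pos = {p: i for i, p in enumerate(path)}
    count = 0
    for i, (x, y) in enumerate(path):
        if seq[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # look right and up: each edge once
            j = pos.get((x + dx, y + dy))
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                count += 1
    return count


def ground_state(seq):
    """Return (maximum H-H contacts, number of conformations reaching it)."""
    best, degeneracy = -1, 0
    for path in conformations(len(seq)):
        c = hh_contacts(seq, path)
        if c > best:
            best, degeneracy = c, 1
        elif c == best:
            degeneracy += 1
    return best, degeneracy
```

Calling `ground_state` on every H/P string of a given short length gives the kind of degeneracy statistics the abstract refers to; the enumeration grows exponentially with chain length, so this is only practical for short chains.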

Physical Mapping in Practice

William I. Chang, Cold Spring Harbor Laboratory, N.Y. USA

Close collaboration between biologists (D. Beach Lab) and mathematical scientists (T. Marr Lab) at C.S.H.L. resulted in the fastest mapping to date of an entire genome, S. pombe (fission yeast, a model organism for the study of the cell cycle). Contig assembly and error analysis are done using a branch-and-bound algorithm that finds the globally optimal linear arrangement of anchors, minimizing the cost of inconsistencies in the data (false positives, false negatives, repetitive sequences). Carried out in conjunction with experiments, this analysis is used to resolve inconsistencies (by repeating experiments) and to direct further experiments. High confidence in the partially constructed map makes possible reduced laboratory work and an accelerated rate of progress.

Sublinear Approximate Matching with Constant Fraction Error

William I. Chang, Cold Spring Harbor Laboratory, N.Y. USA

Pattern matching is a classical problem of computer science, and approximate matching of sequences is motivated by molecular biology. Given a database of size n and a pattern of size m over a b-letter alphabet, we wish to find all locations in the database where the pattern occurs with at most k differences (substitutions, insertions, or deletions). There exist constants $\rho_b$ such that for $k < \rho_b m$, k differences matching has average case complexity $O((n/m)(k + \log_b m))$. This algorithm requires space polynomial in the size of the pattern and can be generalized to other distance as well as similarity measures.
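To make the k differences problem concrete, the following sketch solves it with the classical column-wise dynamic program in O(nm) time. This is a plain baseline for the problem statement only, not the sublinear average-case algorithm of the abstract; the function name `k_differences` is hypothetical.

```python
def k_differences(text, pattern, k):
    """Return 1-based end positions in text where pattern occurs with at
    most k substitutions, insertions, or deletions."""
    m = len(pattern)
    # col[i] = fewest differences between pattern[:i] and any substring
    # of text that ends at the current text position
    col = list(range(m + 1))
    hits = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # col[0] stays 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            new = min(prev_diag + cost,  # match or substitution
                      col[i] + 1,        # extra character in the text
                      col[i - 1] + 1)    # pattern character left out
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            hits.append(j)
    return hits
```

Only one column of the table is kept, so the space used is linear in the pattern length.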

A Contact Potential that Recognizes the Correct Folding of Proteins

V.N. Maiorov and G.M. Crippen, University of Michigan

We have devised a continuous function of interresidue contacts in globular proteins such that the X-ray crystal structure has a lower function value than that of thousands of protein-like alternative conformations. From a training set of 37 proteins and a total of 10,000 alternatives, the potential satisfies altogether 73 proteins vs. their 530,000 alternatives. In addition, another 95 highly homologous protein crystal structures are correctly treated. While the potential is intended primarily to select the native out of a large choice of rather similar or very dissimilar conformers, it can also indicate approximately whether the native is one of the choices.

Is Darwinism a Falsifiable Theory?

Andreas Dress, Univ. Bielefeld; H.J. Bandelt, Univ. Hamburg

Though conceived as a theory describing and explaining what has happened rather than predicting what will happen, there is one prediction following from Darwin's (as well as from Lamarck's) theory which can be tested: the claim that all living species can be grouped in a sensible and consistent way into one big phylogenetic tree. Standard tree reconstruction methods - successful as they may be - are not fit to properly put this claim to a test, because all of them presuppose that what they are searching for actually is a tree structure. Hence an alternative to these methods is suggested. One looks for all decompositions $X = A \cup B$ of the collection X of species under consideration into two non-empty, disjoint subsets A and B such that for any species a, a' in A and b, b' in B the four species tree with a next to a' and b next to b' is not the least probable of the three (non-degenerate) tree structures one can define on our four species. One then can visualise the resulting family of decompositions, of which - by some abstract combinatorial arguments - there cannot be more than $\binom{n}{2}$ on an n-set X, by a netted diagram which simultaneously represents - through systems of pairwise parallel edges - all found decompositions. For biological data these diagrams turn out to be almost always almost treelike - thus corroborating Darwin's ideas - while for, say, psychological data on colour similarity, the famous colour circle will be reproduced.

Statistical Geometry in Sequence Space

Andreas Dress, Univ. Bielefeld, with Manfred Eigen and Ruthild Winkler-Oswatitsch, MPI Göttingen

A statistical method of comparative sequence analysis that combines horizontal and vertical correlations among aligned sequences can be based on the analysis mainly of quartet combinations of sequences, considered as geometric four-point configurations in sequence space. Numerical invariants related to relative internal segment length are assigned to each such configuration and statistical averages of these invariants are established. They can be used for internal calibration of the topology of divergence and for quantitative determination of the noise level. Comparison with computer simulations reveals the high sensitivity of assignment of basic topologies even if much randomized. In addition, these procedures can be checked by vertical analysis of the aligned sequences to allow the study of divergencies with positionally varying substitution probabilities.

Embedding Sequence Analysis in the Functional Programming Paradigm

Robert Giegerich, University of Bielefeld

Compositionality and extensibility of analysis algorithms are important for close investigations of complex secondary structures in biosequences. Since these two properties belong to the main virtues of functional programs, a case study was performed to evaluate the viability of embedding sequence analysis in the functional paradigm. Its first part shows how an advanced pattern matching language can be implemented in a concise, transparent, and extensible way. Its second part reports on an implementation of lazy position trees, including efficiency measurements performed with three current functional language systems.

A Formal Method for the Evaluation and Comparison of a Class of Aligning Algorithms

Gaston H. Gonnet, ETH Zürich

Two aspects are considered to be the essential measures of an alignment algorithm: (a) how well it discriminates between homologous sequences and random sequences, and (b) when aligning homologous sequences, how many errors (misaligned positions) it makes. The methodology used for comparing algorithms and their associated scoring matrices is based on simulating evolution, creating new sequences from a given one and then aligning the evolved sequence against the original one. Since we know the results of the evolution, it is directly measurable how many errors were made in the alignment. Since we can obviously produce random sequences, we can also test the discrimination of the algorithm. A couple of observations make the simulation of evolution possible. Since we restrict ourselves to the class of algorithms which are based on dynamic programming, we can simulate evolution as a Markovian process. The DP algorithms will ignore any relation between amino acids when they are compared, and will assign a cost which is constant and depends only on these amino acids. Something similar happens for insertions/deletions. The mathematically exact evaluation of the results is only possible for dynamic programming algorithms without deletions. For the complete algorithms we have to content ourselves with Monte Carlo simulation results. This is work in progress.

The Darwin System

Gaston H. Gonnet, ETH Zürich

Darwin is an interactive and programmable system for doing computations in molecular biology. Darwin is a descendant of the Maple system for doing computer algebra and shares its syntax and various design philosophies. Darwin is particularly strong in text handling and in sequence comparison. Darwin uses Pat indices as an underlying structure (Pat indices are Patricia tree implementations of suffix trees) for doing: all against all sequence alignment of a database in $O(N^{\alpha})$, $\alpha < 2$; one against all sequence alignment in $O(N^{\alpha})$, $\alpha < 1$; longest repetition searching; most frequent k-grams and, of course, exact matches in $O(\log N)$. Darwin also has approximate text searching (Levenshtein's distance) as a primitive. This is implemented as a partially defined DFA. In this way the algorithm adapts to the searching string and in practice is remarkably fast. The system is implemented by a kernel in C and libraries (some contributed by users) in Darwin itself. It runs on various Unix workstations and is distributed by e-mail at no cost. In total Darwin has more than 150 functions and commands, so this is clearly a very partial description of the system. We have been doing all our computations since 1990 in this system exclusively.
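The logarithmic exact-match search over a suffix index can be sketched as follows. This is not Darwin code and does not use Patricia trees: as a stand-in it assumes a plain sorted array of suffix start positions, built naively, and searches it with two binary searches. The names `build_suffix_array` and `find_all` are hypothetical.

```python
def build_suffix_array(text):
    """Sort all suffix start positions lexicographically (naive build)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, sa, pattern):
    """Return the sorted start positions of pattern in text.

    Uses O(log n) suffix comparisons; each comparison looks at no more
    than len(pattern) characters.
    """
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])
```

For example, with `text = "ACGTACGTAC"` and `sa = build_suffix_array(text)`, the call `find_all(text, sa, "GT")` locates both occurrences of the pattern.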

Reconstruction of Phylogenetic Trees

Arndt v. Haeseler, Univ. Munich

For a set $X = \{S_1, \ldots, S_n\}$ of n aligned sequences we want to reconstruct a phylogenetic tree T displaying the evolutionary relationship of the sequences. Using any dissimilarity measure $\delta : X \times X \to \mathbb{R}$, we define a neighbour relation $\|_\delta$ (with respect to $\delta$) for each quartet of sequences. We say $S_1$ and $S_2$ are neighbours with respect to $S_3$ and $S_4$ ($S_1 S_2 \,\|_\delta\, S_3 S_4$) if and only if $\delta(S_1, S_2) + \delta(S_3, S_4) < \min\{\delta(S_1, S_3) + \delta(S_2, S_4),\ \delta(S_1, S_4) + \delta(S_2, S_3)\}$. A similar definition of neighbourliness is made for binary unrooted trees T. We propose a method to find a tree T among all tree topologies for which the number of quartets that fulfil both relations $\|_\delta$ and $\|_T$ is maximal. Finally, we discuss some examples from our studies of rRNA and tRNA sequences.
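The neighbour relation for a single quartet can be written down directly. The sketch below assumes the Hamming distance as the dissimilarity measure (the abstract allows any measure); the names `hamming` and `quartet_split` are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def quartet_split(s1, s2, s3, s4, delta=hamming):
    """Return the pairing whose distance sum is strictly smallest.

    The result is a pair of index pairs, e.g. ((0, 1), (2, 3)) when s1 and
    s2 are neighbours with respect to s3 and s4, or None if no pairing is
    strictly better than both others.
    """
    seqs = (s1, s2, s3, s4)
    pairings = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
    sums = [delta(seqs[a], seqs[b]) + delta(seqs[c], seqs[d])
            for (a, b), (c, d) in pairings]
    best = min(sums)
    if sums.count(best) > 1:  # the strict inequality fails on ties
        return None
    return pairings[sums.index(best)]
```

Counting, over all quartets, how often this relation agrees with the neighbour relation induced by a candidate tree gives the quantity that the proposed method maximizes.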

Homology Modeling and Distance Geometry

Timothy F. Havel, Harvard Medical School

Distance geometry is a geometric model of molecules, wherein the structure is defined in terms of distance and chirality constraints. These are, respectively, lower and upper bounds on the interatomic distances, and the orientations of selected rigid and asymmetric tetrahedra of atoms. Distance geometry calculations are designed to reveal the geometric structure of the set of all conformations (spatial arrangements of the atoms) consistent with this information. The most important of these calculations involves computing a conformational ensemble, i.e. a set of conformations satisfying the constraints, but otherwise random. By analysing such an ensemble to discover new geometric properties that are uniformly present in all its members and hence are, with high probability, necessary consequences of the geometric constraints used as input, these calculations provide us with a crude but effective method of geometric reasoning. Distance geometry calculations have found numerous applications in chemistry and biology, most notably methods of determining structure from e.g. NMR data, exploring conformation space, and generating coordinates from connectivity tables. In this lecture, a new application is introduced, which obtains the distance and chirality constraints sufficient to determine a protein structure from alignments of its sequence with homologous proteins of known structure. As an example, I have predicted the structure of the Flavodoxin from E. coli using as homologues the crystal structures of the Flavodoxins from A. nidulans, C. beijerinckii, C. crispus and D. vulgaris. The complete results of this study will be found in a forthcoming issue of the journal Molecular Simulation.

Double Point Chains in Proteins and Inverse Protein Folding

Frieder Kaden, GMD, St. Augustin

How many possibilities are there to walk through the structure of a given protein if not only the main chain steps are allowed but also steps between residues that are far apart in the sequence but whose geometric distance is almost like that of backbone neighbours, and each residue is visited exactly once? The main chain of the protein corresponds to its ordinary sequence. Other walks through the protein lead to modified sequences that can be applied to find alignments to new protein sequences in the sense of the inverse problem of protein folding. In a suitable graph the above question appears as the NP-complete problem of finding all Hamiltonian paths. The problem is transformed into a double point problem that can be solved by a method which is a three-dimensional generalization of the formalism of L. Kauffman presented in his Formal Knot Theory in 1983.
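The graph formulation of the question can be illustrated by brute force on a toy example. The sketch below enumerates Hamiltonian paths by simple backtracking; it is not the double point method of the talk, and the function name `hamiltonian_paths` and the adjacency-dictionary input are assumptions made for illustration.

```python
def hamiltonian_paths(adj, start):
    """Yield every path from start that visits each vertex exactly once.

    adj maps each residue index to the set of residues reachable in one
    step (chain neighbours plus spatially close residues).
    """
    n = len(adj)
    path, seen = [start], {start}

    def walk():
        if len(path) == n:
            yield tuple(path)
            return
        for v in sorted(adj[path[-1]]):
            if v not in seen:
                path.append(v)
                seen.add(v)
                yield from walk()
                path.pop()
                seen.remove(v)

    yield from walk()
```

For a chain of four residues 0-1-2-3 with one additional spatial contact between residues 0 and 3, starting at residue 0 gives the ordinary sequence 0, 1, 2, 3 and one modified sequence 0, 3, 2, 1. The running time is exponential in general, in line with the hardness noted above.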

Physical Mapping of Chromosomes

Richard M. Karp, University of California, Berkeley, CA, USA

We present several algorithms for reassembling the overlap structure of clones on a chromosome, given various kinds of fingerprint data for the clones.

Graph Theoretical Description of Sheet Topologies

Ina Koch, GMD, St. Augustin

My talk was about the usage of graph theoretical descriptions of proteins at different structure levels. We defined the protein graph, which describes the protein structure at the residue level, and the beta graph, which describes sheet topologies. At the protein graph level we derived patterns, which can be divided into sequentially short-range patterns (describing helical and turn structures) and long-range patterns (describing supersecondary structures). We matched the long-range patterns against certain sheet topologies, chosen from a set of non-homologous proteins, in order to find relations between patterns and topologies. First results show that there is a quite different amino acid distribution in the patterns which match certain topologies.

Approximate Pattern Matching: New Algorithms and Applications

Udi Manber, University of Arizona

We described a tool for approximate pattern matching, called agrep. Agrep can search, very fast, for complicated patterns including arbitrary regular expressions and allowing insertions, deletions, and/or substitution errors. Examples of the use of agrep were shown, and two of the five new algorithms that agrep uses were presented. Preliminary ideas about fast approximate pattern matching in a preprocessed library of patterns, using the triangle inequality to prune the search tree, were also discussed.

Machine Discovery by Decision Trees over Regular Patterns

Satoru Miyano, Kyushu University

We describe a machine-learning system that produces hypotheses from positive and negative examples, and report some experiments on protein data using PIR and GenBank. This learning system is developed with a learning algorithm for decision trees over regular patterns, which we newly devised for this research. In the experiments on transmembrane domain identification, the system discovered very simple hypotheses with very high accuracy from a small number of positive and negative data. These hypotheses show that negative motifs, that is, motifs of negative data, play a key role in such identification. In these experiments, we classified the 20 symbols of amino acid residues according to the hydropathy indices due to Kyte and Doolittle. We call such a transformation of symbols an indexing. We observed that the indexing by the hydropathy indices is important in making the learning algorithm efficient and accurate. This observation inspired us to try to discover such an indexing itself without any help of biological knowledge but just by a learning algorithm with data. We succeeded in it by combining the above learning algorithm and the local search technique for finding indexings. We also report some experiments on signal peptides. (This work is with S. Arikawa, S. Kuhara, S. Shimozono, A. Shinohara and T. Shinohara.)

Representing Suboptimal Alignments of Biological Sequences

Dalit Naor, Stanford Univ., USA

The optimal alignment between a pair of biological sequences that minimizes the edit-distance may not necessarily reflect the correct biological alignment, that is, the alignment based on structural or evolutionary changes. However, in many cases the edit-distance alignment is a good approximation to the biological alignment. Suboptimal alignments are alignments whose scores lie within the neighbourhood of the optimum, and they were suggested as alternatives to the optimal one. We study the combinatorial nature of suboptimal alignments and give a compact representation of them. We define a canonical set of alignments, and argue that they can be viewed as the essential ones. We currently test this hypothesis with protein sequences.

Algorithmic Aspects of Protein Structure

Chris Sander, EMBL, Heidelberg

Proteins are beautiful and complicated three-dimensional structures. Their shape and biological function is coded by genetic information. There are thousands of biologically distinct protein classes. Physicists view proteins as polymer chains and try to understand the structural principles common to all. Biologists try to understand how genetic information is translated into the highly individualistic biological role of a protein. Computer scientists see proteins as graphs or space curves and try to develop algorithms that simplify the enormous combinatorial complexity of picking out the correct structure for a given genetic sequence. (During the workshop we proved that the protein folding problem is SO-hard, U. Manber and C. Sander, unpublished.)

Climbing a Tree Through the Window

David Sankoff, University of Montreal

The method of nearest-neighbour interchange effects local improvements in a binary tree by replacing a 4-subtree by one of its two alternatives if this improves the objective function. We extend this to k-subtrees in order to reduce the number of local optima. Possible sequences of k-subtrees to be examined are produced by moving a window over the tree, incorporating one edge at a time while deactivating another. The direction of this movement is chosen according to a hill-climbing strategy. The algorithm includes a backtracking component. Series of simulations of molecular evolution data/parsimony analysis are carried out for k = 4, ..., 8, contrasting the hill-climbing strategy to one based on a random choice of next window, and comparing two stopping rules. Increasing window size k is found to be the most effective way of improving the local optimum, followed by the choice of hill-climbing over the random strategy. A suggestion for achieving higher values of k is based on a recursive use of the hill-climbing strategy.
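The basic 4-subtree move (the case k = 4) can be sketched in a few lines. The representation below, a pair of pairs ((A, B), (C, D)) for the four subtrees around one internal edge, and the names `nni_neighbours` and `nni_step` are assumptions made for illustration; the objective function is passed in by the caller and lower values are taken to be better.

```python
def nni_neighbours(quartet):
    """Return the two alternative arrangements of a 4-subtree.

    quartet is ((A, B), (C, D)): subtrees A and B hang on one end of the
    internal edge, C and D on the other. B is swapped with C or with D.
    """
    (a, b), (c, d) = quartet
    return [((a, c), (b, d)), ((a, d), (c, b))]


def nni_step(quartet, score):
    """Return the best of the current arrangement and its two alternatives.

    min() keeps the first minimum, so the current arrangement is retained
    unless an alternative is strictly better.
    """
    return min([quartet] + nni_neighbours(quartet), key=score)
```

Repeating `nni_step` over the internal edges of a tree until nothing changes gives a plain hill-climbing search that stops in a local optimum, which is the situation the larger windows are meant to improve.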

Secondary Structure Problems in Coiled-Coil Proteins

Jeanette P. Schmidt, Polytechnic Univ., Brooklyn, N.Y.

We describe efficient computational tools for the examination of a class of proteins that fold as alpha-helical coiled-coil proteins. In particular we detect whether a sequence of amino acids contains the 7-residue periodicity of amino acids necessary to promote a coiled-coil conformation. A penalty matrix is used which determines the quality of the fit of a given amino acid at a given position of the heptad. A simple and efficient algorithm is presented, which can detect irregular periodic repetitions of any constant size pattern in an arbitrary text, in time that is linear in the size of the text (where gaps are allowed and the pattern is specified by an arbitrary penalty matrix). The algorithm is used to align proteins (presumed to be coiled-coil) to the 7 positions of the heptad. The alignment is currently used to compare the outer surface of one coiled-coil protein to the outer surface of a second coiled-coil protein. (Joint work with V. Fischetti, G. Landau and P. Sellers.)

Sequence-Structure Relationship in the Twilight Zone

Reinhard Schneider, EMBL, Heidelberg

The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 25000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. This empirically derived threshold curve for structural similarity was discussed. We first quantify the relation between sequence similarity, structure similarity and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability and sequence profile. Tertiary structures of the aligned sequences are implied, but not modelled explicitly. The results are useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues and in modelling three-dimensional detail by homology. The results of a comprehensive sequence analysis of the 182 predicted open reading frames of yeast chromosome III were presented and discussed. When the results of database similarity searches are pooled with prior knowledge, a likely function can be assigned to 42% of the proteins, and a predicted 3-D structure to a third of these (14%). The function of the remaining 58% remains to be determined. An outlook for other genome projects was given. In our opinion, development in the area of protein sequence analysis should focus on three major areas: (1) improved algorithms for the detection of real, but difficult to catch, homologies and for the direct prediction of structure and function; (2) the integration of heterogeneous tools into an overall working environment with facile exchange of information between tasks (sequence analysis workbench); (3) direct, on-line, on-desk availability of all relevant information in the biological literature.

Computer-Aided Design of Proteins With New Properties

Dietmar Schomburg, GBF Braunschweig / CAPE

With the increase of knowledge of protein 3D structure, the latest developments in molecular graphics and force field calculations, and a good understanding of structure-function correlations, the rational design of proteins with new desired properties has recently become possible. A few examples were given in the lecture. Still, computer-based methods such as sequence alignment, 3D structure prediction, and docking prediction urgently need improvement. Examples of new developments at CAPE, the German Centre of Applied Protein Engineering, were presented in the lecture.

Applications of Artificial Intelligence and Machine Learning to Protein Structures

Steffen Schulze-Kremer, Brainware GmbH, Berlin

The IPSA method aims at learning patterns of supersecondary structure from a set of known proteins. This involves selecting a list of properties of secondary structures (topological, geometrical and chemophysical), setting up a database, and running learning-from-observation programs on that database. So far, pairs of α-helices and pairs of an α-helix and a β-strand have been classified and described. Among the classes generated there are some with three secondary structures in exact agreement, although only information on two secondary structures was given, and classes that were formed completely by long-range interactions. Another AI application in biochemistry was shown, the use of genetic algorithms. It involves a torsion angle representation using standard bond lengths and angles; the operators SELECT, MUTATE and CROSSOVER; and a very simple fitness function of the form E = Etor + Evdw + Esmt. Although no conformation generated resembled the native structure (of Crambin), the genetic algorithm produced individuals with very low fitness values. Work is going on to improve the fitness function.
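The genetic algorithm loop described above can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the function names, the tournament selection, the elitism step and, above all, the toy energy, which is a single threefold torsion term and not the fitness function used in the talk.

```python
import math
import random


def toy_energy(angles):
    # Hypothetical stand-in for E: one threefold torsion term per angle.
    # It is always >= 0 and is NOT the fitness function of the talk.
    return sum(1.0 + math.cos(3.0 * a) for a in angles)


def select(population, rng, k=3):
    # SELECT: tournament selection, the lowest energy of k random picks wins.
    return min(rng.sample(population, k), key=toy_energy)


def mutate(angles, rng, rate=0.1, step=0.5):
    # MUTATE: perturb each torsion angle with probability `rate`.
    return [a + rng.gauss(0.0, step) if rng.random() < rate else a
            for a in angles]


def crossover(a, b, rng):
    # CROSSOVER: one-point crossover of two torsion angle vectors.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(n_angles=10, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
           for _ in range(pop_size)]
    best_start = min(map(toy_energy, pop))
    for _ in range(generations):
        elite = min(pop, key=toy_energy)  # keep the best individual unchanged
        children = [mutate(crossover(select(pop, rng), select(pop, rng), rng), rng)
                    for _ in range(pop_size - 1)]
        pop = [elite] + children
    return best_start, min(map(toy_energy, pop))
```

Because the best individual is carried over unchanged, the lowest energy in the population can never increase from one generation to the next. As the abstract notes, a low value of a simple fitness function does not by itself imply a native-like conformation.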

Automatic Derivation of Patterns to Predict Protein Structure

Joachim Selbig, GMD, St. Augustin

Pattern-based heuristic prediction of protein structure rests upon some form of local homology, where the homology information is extracted from structure databases in a generic form by certain generalization principles. Taking into account sequentially long-range interactions requires an appropriate representation of protein structure. In particular, this holds when the patterns are generated automatically by machine learning methods. Although pattern generation by hand on the basis of biophysical principles is very successful, machine learning methods may be used to search the available data systematically and thus to improve our understanding of the principles of protein folding. Learning has mainly been viewed as inducing general concept descriptions from a learning set subdivided into classes. One of the important dimensions for characterizing learning systems is the type of representation language used to describe the elements of the learning set and the concepts. In our approach the data about the spatial structure of proteins determined by X-ray crystallography are transformed into a graph description, which makes it possible to define a multitude of patterns for describing structural elements and which may be understood as a discrete form of the contact map.
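As a rough illustration of a graph description understood as a discrete contact map, the following Python sketch turns residue coordinates into a set of contact edges. The distance cutoff, the minimum sequence separation, the use of one point per residue and the function name are all assumptions chosen for the example; the abstract does not specify how the graph is built.

```python
import math


def contact_graph(coords, cutoff=8.0, min_separation=3):
    """Return the set of residue pairs (i, j), i < j, that are in contact.

    coords: one (x, y, z) point per residue (for example C-alpha positions).
    cutoff and min_separation are illustrative values, not from the source.
    """
    edges = set()
    n = len(coords)
    for i in range(n):
        # Skip sequence neighbours, which are trivially close in space.
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.add((i, j))
    return edges


# A toy hairpin: residues 0-2 run one way, residues 3-5 run back.
hairpin = [(0, 0, 0), (4, 0, 0), (8, 0, 0), (8, 4, 0), (4, 4, 0), (0, 4, 0)]
contacts = contact_graph(hairpin)
```

Patterns such as pairs of contacting secondary structure elements can then be phrased as subgraphs of this edge set, which is the kind of discrete input that machine learning programs can search.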

Physical Mapping of DNA and Interval Graphs

Ron Shamir, Tel Aviv University, Israel

A fundamental problem in temporal reasoning is to determine the consistency of a set of events, where for each pair of events a set of possible atomic relations (precedence, overlap, containment etc.) is prescribed. Events are assumed to be intervals on the real line. We study the problem for a simplified model, where the only atomic relations are precedence and intersection. By restricting the input to a subset of the power set of atomic relations one gets a variety of interesting combinatorial problems. We give NP-hardness results or polynomial algorithms for a variety of such restricted problems. In the DNA physical mapping problem, the chromosome corresponds to the time line and the fragments of the DNA are the events. A simplified model for the biological problem is shown to be equivalent to one of the NP-hard restrictions of the general model. Ways to exploit the many polynomial restrictions in order to expedite physical map assembly are suggested. (Joint work with M.C. Golumbic, IBM Haifa, Israel, and in part with H. Kaplan, Tel Aviv University, Israel)
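The simplified model can be made concrete with a small Python sketch. It only checks whether a given placement of fragments on the line satisfies a set of pairwise constraints; finding such a placement is the hard part discussed in the abstract. The relation names, the data layout and the function names are assumptions made for illustration.

```python
def relation(a, b):
    """Atomic relation between closed intervals a = (start, end) and b."""
    if a[1] < b[0]:
        return "precedes"
    if b[1] < a[0]:
        return "follows"
    return "intersects"


def satisfies(intervals, constraints):
    """Check a candidate placement against the allowed relations per pair.

    intervals:   dict mapping a fragment name to its (start, end) interval
    constraints: dict mapping a pair (x, y) to a set of allowed relations
    """
    return all(relation(intervals[x], intervals[y]) in allowed
               for (x, y), allowed in constraints.items())


# Three hypothetical DNA fragments placed along the chromosome.
placement = {"A": (0, 5), "B": (3, 8), "C": (9, 12)}
constraints = {
    ("A", "B"): {"intersects"},
    ("A", "C"): {"precedes"},
    ("B", "C"): {"precedes", "intersects"},
}
consistent = satisfies(placement, constraints)
```

Each constraint set is an element of the power set of atomic relations, so restricting which sets may occur in the input yields the family of restricted problems mentioned above.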

Hide and Seek on a Polyprotein

Manfred Sippl, University Salzburg

The set of experimentally determined protein structures is used to derive a knowledge-based force field, which in turn allows calculation of conformational energies of proteins. The final goal is the computational determination of protein structures using the knowledge-based force field. The development of force fields depends on useful techniques for assessing the performance of the force field at each stage of development. A necessary condition is that the native fold of a protein has the lowest energy compared to a number of non-native decoys. The current version of the force field is able to identify all native folds in our database (160 individual chains) among approximately 40,000 alternatives. At the current state of development the force field can be used to validate experimentally determined structures, to detect native-like folds for sequences of unknown structure in a database of known folds using sequence-structure alignment techniques, and to build models of entire folds from ensembles of overlapping fragments.
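The necessary condition stated above, that the native fold has the lowest energy among the decoys, can be checked with a few lines of Python once energies are available. The function name and the example energies are hypothetical, and the z-score is an additional summary that is not mentioned in the abstract.

```python
import statistics


def native_rank_and_zscore(native_energy, decoy_energies):
    """Rank of the native fold (1 = lowest energy) and its z-score.

    The z-score measures how far the native energy lies below the mean
    decoy energy, in units of the decoy standard deviation.
    """
    rank = 1 + sum(e < native_energy for e in decoy_energies)
    mean = statistics.fmean(decoy_energies)
    spread = statistics.pstdev(decoy_energies)
    return rank, (native_energy - mean) / spread


# Made-up energies, for illustration only.
rank, z = native_rank_and_zscore(-10.0, [-2.0, 0.0, 2.0])
```

A rank of 1 for every chain in the test set corresponds to the statement that all native folds are identified; a strongly negative z-score indicates that the native fold is well separated from the decoys.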

The Prediction of Some Protein Structural Features

Temple F. Smith, Boston University

Using an alignment of some 127 proteins from the PDB with their close homologs, a number of statistical measures were obtained. These included the conditional probabilities P(S_k | S_{k-1}), P(S_k | a_k), ..., P(S_k | a_k, a_{k-1}, S_{k-1}). From these the missing or Shannon information was calculated, given that there are only eight structural states. These data suggest that standard secondary structure prediction can do no better than 65%, which is in accord with experience. In addition, these data suggest that the use of the variability at aligned homologous positions provides the largest reduction in missing information, with an upper limit of 86% on predictions fully exploiting such information. Finally, if secondary structure prediction is done in the context of our understanding of realizable tertiary structures, much higher values may be possible. This was tested by modeling the tertiary 3-D information of a set of seven domain classes using discrete state-space models. These models condition the primary and secondary structure on allowed tertiary structures and appear to increase the predictability to near 95%.
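The notion of missing information can be illustrated by computing the Shannon entropy of the structural state with and without conditioning on the residue. The sketch below is a generic calculation on made-up observations; the function names and the toy data are assumptions, and it does not reproduce the statistics or the percentages reported in the talk.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as a Counter."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total)
                for c in counts.values() if c)


def conditional_entropy(pairs):
    """H(S | a) in bits for observations given as (a, S) pairs."""
    pairs = list(pairs)
    by_condition = {}
    for condition, state in pairs:
        by_condition.setdefault(condition, Counter())[state] += 1
    n = len(pairs)
    return sum(sum(c.values()) / n * entropy(c)
               for c in by_condition.values())


# Made-up (amino acid, structural state) observations.
observations = [("A", "H"), ("A", "H"), ("G", "E"), ("G", "C")]
h_state = entropy(Counter(state for _, state in observations))
h_given_residue = conditional_entropy(observations)
```

With eight structural states the entropy can be at most log2(8) = 3 bits; the difference between the unconditional and the conditional entropy is the reduction in missing information that knowledge of the residue provides.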

Interpretation of Parametric Alignment Plots

Martin Vingron, Univ. of Southern California

Based on the methods presented by M. Waterman (see his abstract in this report), the patterns arising for parametric alignments were analyzed. The logic behind the plot was first exemplified by the comparison of random sequences. The most striking feature there is the clear reflection of the statistical features of the alignment score in the combinatorial structure of the plots. When moving to the comparison of real biological sequences, these features can be found again, though in a somewhat distorted form. Nevertheless, we could highlight certain patterns which seem linked to biologically correct alignments and which might aid in their recognition.

New Problems in Evolutionary Tree Construction

Tandy J. Warnow, Sandia National Labs, USA

Classical models for constructing trees from discrete data use either distance matrices or qualitative characters. The complexity of these problems is discussed, and the flaws in these models are examined. Several new models for tree construction are then presented which may permit efficient algorithms to be discovered, and which avoid some of the inherent limitations of the classical formulations.

Parametric Sequence Alignment

Michael Waterman, Univ. Southern California

Dynamic programming algorithms for optimal alignments of two nucleic acid or protein sequences require setting penalty parameters. While the choice of these parameters greatly influences the quality of the resulting alignments, this choice has been made in an ad hoc fashion. In this talk we present an algorithm to find optimal alignment scores for all choices of the penalty parameters when the score is linear in the penalty parameters. In addition, some statistical theory of the asymptotic growth of alignment scores with the length of random sequences is presented and related to the parametric sequence alignments.
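To make the linear dependence on the penalty parameters concrete, the sketch below computes an optimal global alignment for one fixed choice of parameters and reports how many matches, mismatches and indels it uses. The scoring scheme (one point per match, a penalty mu per mismatch and delta per indel) and the function name are assumptions. The sketch evaluates single parameter points only; it is not the algorithm of the talk, which covers all parameter choices at once.

```python
def align(s, t, mu, delta):
    """Optimal global alignment of s and t for fixed penalties.

    Returns (score, matches, mismatches, indels), where
    score = matches - mu * mismatches - delta * indels.
    """
    # prev[j] describes the best alignment of s[:i-1] with t[:j].
    prev = [(-delta * j, 0, 0, j) for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [(-delta * i, 0, 0, i)]
        for j in range(1, len(t) + 1):
            sc, m, x, g = prev[j - 1]
            if s[i - 1] == t[j - 1]:
                diagonal = (sc + 1, m + 1, x, g)
            else:
                diagonal = (sc - mu, m, x + 1, g)
            sc, m, x, g = prev[j]
            up = (sc - delta, m, x, g + 1)      # s[i-1] against a gap
            sc, m, x, g = cur[j - 1]
            left = (sc - delta, m, x, g + 1)    # t[j-1] against a gap
            cur.append(max(diagonal, up, left))
        prev = cur
    return prev[-1]


# The same pair of sequences under two different penalty choices.
cheap_mismatch = align("ACGT", "AGCT", mu=0.1, delta=2.0)
cheap_indel = align("ACGT", "AGCT", mu=2.0, delta=0.1)
```

For a fixed alignment the score is a linear function of (mu, delta), so each optimal alignment is optimal on a convex region of the parameter plane. In the example, the first parameter choice favours two mismatches and the second favours two indels.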

RNA Secondary Structure Modeling

Michael Zuker, NRC, Ottawa, Canada

RNA secondary structure modeling differs fundamentally from conventional atomic resolution modeling. By borrowing discrete optimization methods used in sequence alignment, it can unfailingly predict minimum free energy as well as close to optimal foldings. The definition of secondary structure and the recursion for computing optimal foldings are given. The dynamic programming fill algorithm runs in time O(n^4) as presented, where n is the sequence size. A stopping rule is introduced that limits a backtracking step in the search for a best interior or bulge loop closed by a given base pair. It is conjectured that the expected depth of the backtracking search is bounded, resulting in an overall O(n^3) performance. It is shown how to predict suboptimal foldings by executing the fill algorithm on two ligated copies of the same sequence. Base pairs that can participate in optimal and close to optimal foldings are displayed as points in triangular arrays called energy dot plots. The energy dot plot for the entire 4217 base genome of the bacteriophage Qβ reveals distinct structural domains that correspond well to what is observed by electron microscopy. A detailed model for the central region agrees well with data from enzyme cleavage and chemical modification experiments, and is further supported by studies on mutant phages. A cluttered region in the dot plot contains base pairs of a slightly suboptimal alternate folding that is observed by electron microscopy.
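The flavour of the fill recursion can be shown with a deliberately simplified Python sketch. It maximizes the number of base pairs (a Nussinov-style recursion) instead of minimizing free energy, so it is a stand-in for, not an implementation of, the energy model described above. The allowed pairs, the minimum hairpin loop size and the function name are assumptions.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence.

    Simplified stand-in for energy minimization: counts pairs only.
    min_loop is the minimum number of unpaired bases in a hairpin loop.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]              # base j stays unpaired
            for k in range(i, j - min_loop):    # base j pairs with base k
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0


hairpin_pairs = max_base_pairs("GGGAAAUCC")
```

This recursion takes O(n^3) time because each of the O(n^2) table entries scans O(n) pairing partners. An energy-based fill additionally has to search over interior and bulge loops closed by each base pair, which is the step that the stopping rule in the abstract is meant to bound.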


Dagstuhl-Seminar 9237: List of Participants (update: 17.09.92)

Alberto Apostolico
Purdue University
Computer Science Department
West Lafayette IN 47907
USA
tel.: (317) 494 6015

Norbert Blum
Universität Bonn
Institut für Informatik
Römerstr. 164
W-5300 Bonn 1
tel.: +49-228-550-250
fax.: +49-228-550-4 40

Hue Sun Chan
University of Calif. at San Francisco
Dept. of Pharmaceutical Chemistry
Box 12 04
San Francisco CA 94143
USA
tel.: +1-415-476-89 10
fax.: +1-415-476-15 08

William I. Chang
Cold Spring Harbor Laboratory
100 Bungtown Road
Cold Spring Harbor NY 11724-2202
USA
tel.: +1-516-367-88 66
fax.: +1-516-367-84 61

Gordon Crippen
The University of Michigan
College of Pharmacy
Ann Arbor MI 48109-1095
USA
tel.: +1-313-763-97 22

Maxime Crochemore
Université Paris VII
LITP
Institut Blaise Pascal
2 Place Jussieu
F-75251 Paris Cedex 05
France
tel.: +33-1-44.27.68.47
fax.: +33-1-44.27.68.49

Andreas Dress
Universität Bielefeld
Fakultät Mathematik
Postfach 86 40
W-4800 Bielefeld
tel.: +49-521-106-50 34/20
fax.: +49-521-106-47 43

Robert Giegerich
Universität Bielefeld
Technische Fakultät
Postfach 86 40
W-4800 Bielefeld
tel.: +49-521-106 2913

Gaston Gonnet
ETH Zürich
Informatik
ETH-Zentrum
CH-8092 Zürich
Switzerland
tel.: +41-254-74 70
fax.: +41-1-262-39 73

Timothy Havel
Harvard Medical School
BCMP
240 Longwood Avenue
Boston MA 02115
USA
havel@ptolemy.med.harvard.edu

Arndt von Haeseler
Universität München
Zoologisches Institut der Uni. München
Luisenstraße 14
W-8000 München 2
tel.: +49-89-5 90 23 27
fax.: +49-89-5 90 24 74

Ralf Hofestädt
Universität Koblenz - Landau
Fachbereich Informatik
Rheinau 3 - 4
W-5400 Koblenz
tel.: +49-261-9119-431

Frieder Kaden
Gesellschaft für Mathematik und Datenverarbeitung mbH
Schloß Birlinghoven
Postfach 12 40
W-5205 St. Augustin 1
Frieder.Kaden@gmd.de
tel.: +49-2241-14 27 86
fax.: +49-2241-14 21 02

Richard Karp
University of California at Berkeley
Computer Science Division
589 Evans Hall
Berkeley CA 94720
USA
tel.: +1-510-642-15 59
fax.: +1-510-642-57 75

Marek Karpinski
Universität Bonn
Institut für Informatik
Römerstr. 164
W-5300 Bonn 1
Germany
tel.: +49-228-550-2 24
fax.: +49-228-550-4 40

Peter Kleinschmidt
Universität Passau
Fachbereich Mathematik & Informatik
Innstraße 33
W-8390 Passau
tel.: +49-851-50 93 39
fax.: +49-851-50 91 71

Ina Koch
Gesellschaft für Mathematik und Datenverarbeitung mbH
Schloß Birlinghoven
Postfach 12 40
W-5205 St. Augustin 1
tel.: +49-2241-141-28 36
fax.: +49-2241-142-8 89

Hans-Peter Kriegel
Universität München
Institut für Informatik
Leopoldstraße 11B
W-8000 München 40
tel.: +49-89-2180-52 68
fax.: +49-89-2180-52 46

Gad Landau
Polytechnic University
Dept. of CS
Six MetroTech Center
Brooklyn NY 11201
USA
tel.: +1-718-260-31 54
fax.: +1-718-260-39 06

Eugene Lawler
University of California at Berkeley
Computer Science Division
591 Evans Hall
Berkeley CA 94720
tel.: +1-510-642-40 19
fax.: +1-510-642-57 75

Thomas Lengauer
Gesellschaft für Mathematik und Datenverarbeitung mbH
Schloß Birlinghoven
Postfach 12 40
W-5205 St. Augustin 1
tel.: 02241-14 27 77
fax.: 02241-14 21 02

Udi Manber
University of Arizona
Department of Computer Science
Tucson AZ 85721
USA
tel.: +1-602-621-66 13
fax.: +1-602-621-42 46

Saira Mian
University of California at Santa Cruz
Sinsheimer Laboratories
Santa Cruz CA 95064
USA
tel.: +1-408-459-27 00
fax.: +1-408-459-31 39

Satoru Miyano
Kyushu University 33
Research Institute of Fundamental Information Science
Postal No. 812
Fukuoka
Japan
tel.: +81-92-641-11 01 ext: 44 71
fax.: +81-92-611-26 68

Dalit Naor
Stanford University Medical Center
Department of Biochemistry
Stanford CA 94305-5307
USA
tel.: +1-415-723-59 76
fax.: +1-415-723-67 83

Hartmut Noltemeier
Universität Würzburg
Lehrstuhl für Informatik
Am Hubland
W-8700 Würzburg
tel.: +49-931-888-50 55
fax.: +49-931-888-46 00

Chris Sander
European Molecular Biology Laboratory (EMBL)
Meyerhofstraße 1
W-6900 Heidelberg
tel.: +49-6221-38 73 61
fax.: +49-6221-38 73 06

David Sankoff
Université de Montreal
Centre de Recherches Mathematiques
Montreal H3C 3J7
Canada
tel.: +1-514-343-7574
fax.: +1-514-343-2254

Jeanette P. Schmidt
Polytechnic University
Computer Science Department
6 Metrotech Center
Brooklyn NY 11201
USA
tel.: +1-718-260 3502
fax.: +1-718-260-39 06

Reinhard Schneider
European Molecular Biology Laboratory (EMBL)
Meyerhofstraße 1
W-6900 Heidelberg
tel.: +49-6221-38 73 05
fax.: +49-6221-38 73 06

Dietmar Schomburg
Ges. für biotechnologische Forschung mbH (GBF)
Mascheroder Weg 1
W-3300 Braunschweig
Germany
tel.: +49-531-61 81-3 50
fax.: +49-531-61 81-3 55

Steffen Schulze-Kremer
Brainware GmbH
Gustav-Meyer-Allee 25
W-1000 Berlin
tel.: +49-30-46 330-40
fax.: +49-30-469-46 49

Joachim Selbig
Gesellschaft für Mathematik und Datenverarbeitung mbH
Schloß Birlinghoven
Postfach 12 40
W-5205 St. Augustin 1
tel.: +49-2241-14-27 92
fax.: +49-2241-14-21 02

Ron Shamir
Tel Aviv University
Dept. of Computer Science
Tel Aviv
Israel
tel.: +972-3-640-93 73
fax.: +972-3-640-93 57

Manfred J. Sippl
Universität Salzburg
Center for Applied Molecular Engineering
Jakob Haringer Str.
A-5020 Salzburg
Austria
tel.: +43-662-8044-57 97

Hans-Werner Six
Fernuniversität-GH-Hagen
Praktische Informatik III
Feithstraße
W-5800 Hagen 1
six@fernuni-hagen.de
tel.: +49-2331-987-29 64
fax.: +49-2331-987-3 17

Temple Smith
Boston University
BMERC
36 Cummington Street
Boston MA
tel.: +1-617-353-71 23
fax.: +1-617-353-70 20

Leslie A. Taylor
UCSF School of Pharmacy
Computer Graphics Lab
Box 0446 Room S926
513 Parnassus Avenue
San Francisco CA 94143-0446
USA
tel.: +1-415-476-5379
fax.: +1-415-502-1755

Martin Vingron
University of Southern California
Department of Mathematics
DRB 287 - University Park
Los Angeles CA 90089-1113
USA
tel.: +1-213-740-24 10
fax.: +1-213-740-24 37

Tandy Warnow
Princeton University
Department of Computer Science
35 Olden Street
Princeton NJ 08544
USA

Michael Waterman
University of Southern California
Department of Mathematics
DRB 287 - University Park
Los Angeles CA 90089-1113
USA
tel.: +1-213-740-24 08

Peter Widmayer
ETH Zürich
Department of Computer Science
ETH-Zentrum
CH-8092 Zürich
Switzerland
tel.: +41-1-254-74 00

Ralf Zimmer
Gesellschaft für Mathematik und Datenverarbeitung mbH
Schloß Birlinghoven
Postfach 12 40
W-5205 St. Augustin 1
tel.: +49-2241-14 28 18
fax.: +49-2241-14 26 18

Michael Zuker
National Research Council
Inst. for Biological Sciences M-54
Ottawa Ontario K1A 0R6
Canada
tel.: +1-613-993-48 30

Recently published and planned titles:

K. Compton, J.E. Pin, W. Thomas (editors): Automata Theory: Infinite Computations, Dagstuhl-Seminar-Report; 28; 6.-10.1.92 (9202)

H. Langmaack, E. Neuhold, M. Paul (editors): Software Construction - Foundation and Application, Dagstuhl-Seminar-Report; 29; 13.-17.1.92 (9203)

K. Ambos-Spies, S. Homer, U. Schöning (editors): Structure and Complexity Theory, Dagstuhl-Seminar-Report; 30; 3.-7.02.92 (9206)

B. Booß, W. Coy, J.-M. Pflüger (editors): Limits of Modelling with Programmed Machines, Dagstuhl-Seminar-Report; 31; 10.-14.2.92 (9207)

N. Habermann, W.F. Tichy (editors): Future Directions in Software Engineering, Dagstuhl-Seminar-Report; 32; 17.2.-21.2.92 (9208)

H. Cole, E.W. Mayr, F. Meyer auf der Heide (editors): Parallel and Distributed Algorithms, Dagstuhl-Seminar-Report; 33; 2.3.-6.3.92 (9210)

P. Klint, T. Reps, G. Snelting (editors): Programming Environments, Dagstuhl-Seminar-Report; 34; 9.3.-13.3.92 (9211)

H.-D. Ehrich, J.A. Goguen, A. Sernadas (editors): Foundations of Information Systems Specification and Design, Dagstuhl-Seminar-Report; 35; 16.3.-19.3.92 (9212)

W. Damm, Ch. Hankin, J. Hughes (editors): Functional Languages: Compiler Technology and Parallelism, Dagstuhl-Seminar-Report; 36; 23.3.-27.3.92 (9213)

Th. Beth, W. Diffie, G.J. Simmons (editors): System Security, Dagstuhl-Seminar-Report; 37; 30.3.-3.4.92 (9214)

C.A. Ellis, M. Jarke (editors): Distributed Cooperation in Integrated Information Systems, Dagstuhl-Seminar-Report; 38; 5.4.-9.4.92 (9215)

J. Buchmann, H. Niederreiter, A.M. Odlyzko, H.G. Zimmer (editors): Algorithms and Number Theory, Dagstuhl-Seminar-Report; 39; 22.06.-26.06.92 (9226)

E. Börger, Y. Gurevich, H. Kleine-Büning, M.M. Richter (editors): Computer Science Logic, Dagstuhl-Seminar-Report; 40; 13.07.-17.07.92 (9229)

J. von zur Gathen, M. Karpinski, D. Kozen (editors): Algebraic Complexity and Parallelism, Dagstuhl-Seminar-Report; 41; 20.07.-24.07.92 (9230)

F. Baader, J. Siekmann, W. Snyder (editors): 6th International Workshop on Unification, Dagstuhl-Seminar-Report; 42; 29.07.-31.07.92 (9231)

J.W. Davenport, F. Krückeberg, R.E. Moore, S. Rump (editors): Symbolic, algebraic and validated numerical Computation, Dagstuhl-Seminar-Report; 43; 03.08.-07.08.92 (9232)

R. Cohen, R. Kass, C. Paris, W. Wahlster (editors): Third International Workshop on User Modeling (UM'92), Dagstuhl-Seminar-Report; 44; 10.-13.8.92 (9233)

R. Reischuk, D. Uhlig (editors): Complexity and Realization of Boolean Functions, Dagstuhl-Seminar-Report; 45; 24.08.-28.08.92 (9235)

Th. Lengauer, D. Schomburg, M.S. Waterman (editors): Molecular Bioinformatics, Dagstuhl-Seminar-Report; 46; 07.09.-11.09.92 (9237)

V.R. Basili, H.D. Rombach, R.W. Selby (editors): Experimental Software Engineering Issues, Dagstuhl-Seminar-Report; 47; 14.-18.09.92 (9238)

Y. Dittrich, H. Hastedt, P. Schefe (editors): Computer Science and Philosophy, Dagstuhl-Seminar-Report; 48; 21.09.-25.09.92 (9239)

R.P. Daley, U. Furbach, K.P. Jantke (editors): Analogical and Inductive Inference 1992, Dagstuhl-Seminar-Report; 49; 05.10.-09.10.92 (9241)

E. Novak, St. Smale, J.F. Traub (editors): Algorithms and Complexity for Continuous Problems, Dagstuhl-Seminar-Report; 50; 12.10.-16.10.92 (9242)

J. Encarnacao, J. Foley (editors): Multimedia - System Architectures and Applications, Dagstuhl-Seminar-Report; 51; 02.11.-06.11.92 (9245)

F.J. Rammig, J. Staunstrup, G. Zimmermann (editors): Self-Timed Design, Dagstuhl-Seminar-Report; 52; 30.11.-04.12.92 (9249)

B. Courcelle, H. Ehrig, G. Rozenberg, H.J. Schneider (editors): Graph-Transformations in Computer Science, Dagstuhl-Seminar-Report; 53; 04.01.-08.01.93 (9301)

A. Arnold, L. Priese, R. Vollmar (editors): Automata Theory: Distributed Models, Dagstuhl-Seminar-Report; 54; 11.01.-15.01.93 (9302)

W.S. Cellary, K. Vidyasankar, G. Vossen (editors): Versioning in Data Base Management Systems, Dagstuhl-Seminar-Report; 55; 01.02.-05.02.93 (9305)

B. Becker, R. Bryant, Ch. Meinel (editors): Computer Aided Design and Test, Dagstuhl-Seminar-Report; 56; 15.02.-19.02.93 (9307)

M. Pinkal, R. Scha, L. Schubert (editors): Semantic Formalisms in Natural Language Processing, Dagstuhl-Seminar-Report; 57; 23.02.-26.02.93 (9308)

H. Bibel, K. Furukawa, M. Stickel (editors): Deduction, Dagstuhl-Seminar-Report; 58; 08.03.-12.03.93 (9310)

H. Alt, B. Chazelle, E. Welzl (editors): Computational Geometry, Dagstuhl-Seminar-Report; 59; 22.03.-26.03.93 (9312)

J. Pustejovsky, H. Kamp (editors): Universals in the Lexicon: At the Intersection of Lexical Semantic Theories, Dagstuhl-Seminar-Report; 60; 29.03.-02.04.93 (9313)

W. Straßer, F. Wahl (editors): Graphics & Robotics, Dagstuhl-Seminar-Report; 61; 19.04.-22.04.93 (9316)

C. Beeri, A. Heuer, G. Saake, S.D. Urban (editors): Formal Aspects of Object Base Dynamics, Dagstuhl-Seminar-Report; 62; 26.04.-30.04.93 (9317)
