201607 brisbane rost1 evolteaches -...
-
© Burkhard Rost 1
Evopro
Burkhard Rost TUM Munich
Comp Biol @ INF & IAS & WZW
Columbia U NYC - Biochemistry
org
-
© Burkhard Rost
I. Introduction
Evolution teaches protein prediction
-
© Burkhard Rost 3
Predict protein function
-
© Burkhard Rost
Protein function
Intuitive but not well-defined:
• chemical: how atom bound?
• biochemical: transferase
• cellular: (kinase) cell cycle
• developmental: time, regulatory
• physiological: related to disease
• genetic: dominant/recessive
Protein function as action: Function = anything that happens to or through a protein
-
© Burkhard Rost 5
www.rostlab.org
© Burkhard Rost 55
Edda Kloppmann
Andrea Schafferhans
Esmeralda Vicedo
Guy Yachdav
Yannick Mahlich
Inga Weise
Juan Miguel Cejuela
Tatyana Goldberg
Tobias Hamp
Maximilian Hecht
Thomas Hopf
Tim Karl
Lothar Richter
Jonas Reeb
Michael Bernhofer
Martin Steinegger
Carsten Uhlig
Ashish Baghudana
Madhukar SP Shankar
Alexandru Buff
Syeda Tanzeem H Charu
Joel Daon
Christian Dallago
Zosia Gasik
Caroline Gergen
Yulia Gembarzhevskaya
Yichun Lin
Maximilian Miller
Dimitrij Nechaev
Jade Martins
Venkata P Satagopam
Sven Punga
Theresa Wirth
Sebastian Wilzbach
Monika Varshney
-
© Burkhard Rost
Goal of protein prediction
Epstein & Anfinsen, 1961: sequence uniquely determines structure
INPUT: protein sequence
OUTPUT: 3D structure and function
-
© Burkhard Rost
Prediction in terms of energy landscapes
Need to know history to predict!
• Point mutation
• Binding (Substrate/Protein)
• Environmental change (DNA close/pH)
-
© Burkhard Rost 8
Evolution is history!
Chris Sander & Reinhard Schneider 1991 Proteins 9:56-68
B Rost 1999 Prot Engin 12:85-94
-
© Burkhard Rost 9
SH3 (Src-homology 3) domain: one domain of proteins such as Src tyrosine kinase (STK)
Alignment, positions 1-50:
fyn_human  VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYI
yrk_chick  VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYI
fgr_human  VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCI
yes_chick  VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYI
src_avis2  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_aviss  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_avisr  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_chick  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
stk_hydat  VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYI
src_rsvpa  .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYI
hck_human  ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYI
blk_mouse  ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYV
hck_mouse  .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYI
lyn_human  ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFI
lck_human  ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFI
ss81_yeast .....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGII
abl_mouse  ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWV
abl1_human ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWV
src1_drome ..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLI
mysd_dicdi .....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKV
yfj4_yeast ....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIF
abl2_human ..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWV
tec_human  .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYI
abl1_caeel ..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWV
txk_human  .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLI
yha2_yeast VRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIF
abp1_sacex .....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
-
© Burkhard Rost 10
Evolution improves prediction
Evolutionary profile implicitly captures the history of an individual protein!
[Figure labels: fly, chicken, rat, mouse, human]
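A minimal sketch of what an "evolutionary profile" can look like in code, assuming the simplest possible definition: the relative frequency of each residue in each alignment column, with gaps ignored. The four rows are the first ten columns of sequences from the SH3 alignment slide; the function name `column_profile` is an illustrative assumption, not part of any method named in the talk.

```python
from collections import Counter

# First ten alignment columns of four rows copied from the SH3 slide.
ALIGNMENT = [
    "VTLFVALYDY",  # fyn_human
    "VTLFIALYDY",  # yrk_chick
    "..IVVALYDY",  # hck_human
    "..LFVALYDF",  # abl_mouse
]


def column_profile(alignment):
    """Return, per column, the relative frequency of each residue ('.' gaps ignored)."""
    profile = []
    for column in zip(*alignment):
        counts = Counter(res.upper() for res in column if res != ".")
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile


profile = column_profile(ALIGNMENT)
for position, freqs in enumerate(profile, start=1):
    # Print residues of each column, most frequent first.
    print(position, sorted(freqs.items(), key=lambda kv: -kv[1]))
```

Fully conserved columns (such as the A of "ALYDY") end up with a single entry of frequency 1.0, while variable columns spread their weight over several residues.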
-
© Burkhard Rost 11
Using evolution to predict structure
[Figure: Sequence, MaxHom / PSI-BLAST, Filter, PHDsec (1993)]
60% -> 72% / 77%
B Rost 1996 Meth Enzymol 266:525-539
-
© Burkhard Rost
Q9P2H0
Exciting projects
• Interactions & networks: protein-protein / protein-([DR]NA)
• LOCtree etc: predict localization
• Predict enzymatic activity & flexibility
• Protein disorder
• Predict membrane regions, epitopes, ...
• Improve alignment methods
• SNP-pipeline: predict nsSNP effects
• PredictProtein: web service since 1992
• NESG & NYCOMPS: structural genomics
-
© Burkhard Rost
Machine learning=
understanding in 21st century!
-
© Burkhard Rost
Machine learning: easy to make believe
-
© Burkhard Rost
Prediction is the acid test
for understanding
15
-
© Burkhard Rost
Prediction is the acid test for understanding...
also when prediction is based on machine learning?
-
© Burkhard Rost
Machine learning = black magic
[Image: © Wikipedia. Unknown artist - User scan of Sadie, Stanley, ed. (1992). The New Grove Dictionary of Opera 2: 132. London: Macmillan]
Black box image is nonsense
-
© Burkhard Rost
complexity matches problem
-> rules not simple
18
-
© Burkhard Rost
ML extracts truth IF {
 1. cross-validation right
 2. data set right
}
-
© Burkhard Rost 20
Train
Test
Cross-validation: hide data under table
-
© Burkhard Rost 21
WEKA-like cross-validation
Train
Test
-
© Burkhard Rost 22
3-way cross-validation
Train
Test
Cross-Train
-
© Burkhard Rost 23
Family clustering
No two from same group in train & test|cross-train
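The family-clustering rule can be sketched as follows, assuming proteins have already been grouped into families by some clustering step that is not shown here. Whole families are assigned to train, cross-train or test, so no family contributes to more than one set. The function name, the split fractions and the toy family ids are illustrative assumptions.

```python
import random


def family_split(families, fractions=(0.6, 0.2), seed=0):
    """Assign whole families to train / cross-train / test.

    `families` maps a family id to its member protein ids. `fractions` gives
    the share of families for train and cross-train; the rest goes to test.
    """
    ids = sorted(families)
    random.Random(seed).shuffle(ids)
    n_train = round(fractions[0] * len(ids))
    n_cross = round(fractions[1] * len(ids))
    parts = {
        "train": ids[:n_train],
        "cross_train": ids[n_train:n_train + n_cross],
        "test": ids[n_train + n_cross:],
    }
    # Expand each family back into its member proteins.
    return {name: [p for fam in fams for p in families[fam]]
            for name, fams in parts.items()}


# Hypothetical toy data: 10 families with 3 proteins each.
families = {f"fam{i}": [f"fam{i}_p{j}" for j in range(3)] for i in range(10)}
split = family_split(families)
for name, members in split.items():
    print(name, len(members))
```

Splitting at the level of families rather than proteins is what keeps similar sequences from sitting on both sides of the train/test boundary.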
-
© Burkhard Rost
Still not enough: exploit “prerelease” data (latest/hottest)
24
-
© Burkhard Rost 25
Chris Sander
Dana-Farber - Harvard
-
© Burkhard Rost 26
et al
-
© Burkhard Rost 27
-
© Burkhard Rost
Prediction is the acid test for understanding
28
-
© Burkhard Rost
Learning from machine learning
-
© Burkhard Rost 30
Different interfaces, different physics?
PD Kwong, R Wyatt, J Robinson, RW Sweet, J Sodroski & WA Hendrickson (1998) Nature 393, 648-659.
PD Kwong, R Wyatt, S Majeed, J Robinson, RW Sweet, J Sodroski & WA Hendrickson (2000) Structure 8, 1329-1339.
[Figure: HIV gp120 / CD4 / FAB; labels: gp120, CD4, antibody-1, antibody-2]
Do 6 types of interfaces differ in sequence?
• Internal (inter-domain and intra-domain)
• External homomers (permanent/transient)
• External heteromers (permanent/transient)
Y Ofran & B Rost (2003) J Mol Biol 325, 377-87
-
© Burkhard Rost 31
Interface types differ in composition
Y Ofran & B Rost (2003) J Mol Biol 325, 377-87
[Figure labels: gp120, CD4, antibody-1, antibody-2]
They obviously differ! But are these differences meaningful?
Yanay Ofran
-
© Burkhard Rost
Are these differences statistically significant?
Chi-square test:
• known problem: small data sets
• here: millions of points
all differences < 10^-300
-> SIGNIFICANT
Y Ofran & B Rost 2005 unpublished
Yanay Ofran
-
© Burkhard Rost
Are these differences statistically significant?
Chi-square test:
• known problem: small data sets
• here: millions of points
all differences < 10^-300
-> SIGNIFICANT
… unfortunately also:
• proteins [a-b] vs [c-d]
• 1 vs 2 authors
• random subsets
• ...
Y Ofran & B Rost 2005 unpublished
Yanay Ofran
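A small numeric sketch of why "significant" stops being informative with millions of points. It assumes a 2x2 Pearson chi-square test and invented counts, not the analysis behind the slide. The same 10.0% vs 10.1% difference in composition is far from significant with 1,000 residues per set and clearly significant with 10 million.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


CRITICAL_5_PERCENT = 3.841  # chi-square critical value for df = 1, alpha = 0.05

# Hypothetical counts: one residue type makes up 10.0% of set 1 and 10.1% of set 2.
small = chi_square_2x2(100, 900, 101, 899)                          # 1,000 residues per set
large = chi_square_2x2(1_000_000, 9_000_000, 1_010_000, 8_990_000)  # 10 million per set

print(f"small sample: chi2 = {small:.4f}, significant: {small > CRITICAL_5_PERCENT}")
print(f"large sample: chi2 = {large:.1f}, significant: {large > CRITICAL_5_PERCENT}")
```

For fixed proportions the statistic grows linearly with the number of data points, so with enough data almost any split of the data, including random subsets, comes out "significant".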
-
© Burkhard Rost 34
Find-self test (statistical significance)
Y Ofran & B Rost 2005 unpublished
Yanay Ofran
-
© Burkhard Rost 35
Find-self test on six types of interfaces
Y Ofran & B Rost (2003) J Mol Biol 325, 377-87
[Figure labels: gp120, CD4, antibody-1, antibody-2]
Yanay Ofran
-
© Burkhard Rost 36
Predict PPI interfaces from sequence alone
-
© Burkhard Rost 37
Alignment information
[Figure: the alignments of a protein are converted into a profile table (columns in the order GSAPD NTEKQ CVHIR LMYFW). The profile feeds the input layer, followed by the first or hidden layer and the second or output layer with units H, E, L; the maximal unit is picked as the current prediction. One group of input units corresponds to the 21*3 bits coding for the profile of one residue.]
B Rost (1996) Methods Enzymol 266:525-39
B Rost (2001) J Struct Biol 134: 204-18
-
© Burkhard Rost 38
More complex system to predict structure
[Figure: Sequence, PSI-BLAST, Filter, PROFsec / PROFacc (1999)]
B Rost (1996) Methods Enzymol 266:525-39
B Rost (2001) J Struct Biol 134: 204-18
-
© Burkhard Rost
PredictProtein
• 1992-now
• >200 packages in Debian
• >100k users registered from >110 nations
• >500 users/day, >12k users/month
• >57M predictions in 2012
Laszlo Kajan
Guy Yachdav
B Rost & C Sander (1992) Nature 360:540
L Kajan, G Yachdav, et al. & B Rost (2013) Biomed Res Int
G Yachdav et al & B Rost (2014) NAR 42:W337-43
-
© Burkhard Rost 40
PP interfaces predicted from sequence
InteractSites
Y Ofran & B Rost 2003 FEBS Lett 544:236-9
Y Ofran & B Rost 2007 Bioinformatics e13-16
Yanay Ofran
-
© Burkhard Rost
PPI hot spots?
41
-
© Burkhard Rost
Interaction HOT SPOTS
Residues that are essential for protein-protein interactions.
Operational definition:
• 1. residue is in the interface
• 2. mutation of the residue knocks out the interaction
-
© Burkhard Rost 43
PP interfaces predicted from sequence
Very strong = hot spots?
InteractSites
Y Ofran & B Rost 2003 FEBS Lett 544:236-9
Y Ofran & B Rost 2007 Bioinformatics e13-16
Yanay Ofran
-
© Burkhard Rost
Strength of prediction reflects reliability?
[Figure: output values 0.6 / 0.4 (weak) and 0.9 / 0.1 (strong)]
Y Ofran and B Rost (2003) FEBS Letters 544: 236-9
Y Ofran and B Rost (2007) PLoS Comp Biol 3: e119
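One way to turn "strength of prediction" into a number, assuming strength is measured as the gap between the two highest output values. That definition is an assumption made for this sketch; the slide only shows the two example output pairs.

```python
def prediction_strength(outputs):
    """Gap between the two highest output values (assumed measure of strength)."""
    top, second = sorted(outputs, reverse=True)[:2]
    return top - second


weak = prediction_strength([0.6, 0.4])    # outputs close together
strong = prediction_strength([0.9, 0.1])  # outputs far apart
print(f"weak: {weak:.1f}, strong: {strong:.1f}")
```

Whether a larger gap really means a more reliable prediction is the question the slide asks; it has to be checked on data not used for training.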
-
© Burkhard Rost 45
Prediction of hot spots for CD4
• alanine scan for V1 domain of CD4 (bound to gp120): A Ashkenazi et al. & DJ Capon (1990) PNAS 87, 7150
• structure: PD Kwong et al. & WA Hendrickson (2000) Structure 8, 1329-1339.
[Figure legend: red: observed; purple: predicted]
method: Y Ofran & B Rost 2007 PLoS CB 3:e119
Yanay Ofran
-
© Burkhard Rost 46
Hot spots prediction requires full information
Y Ofran & B Rost 2007 PLoS CB 3:e1194
Yanay Ofran
-
© Burkhard Rost 47
Connect micro- and macro-level?
micro level: residues
macro level: networks
[Figure axes: RIGHT: more hotspots; UP: more partners]
Yanay Ofran
-
© Burkhard Rost
Date- and Party-hubs
Hubs: promiscuous proteins
Date/Party hubs: notation introduced by Marc Vidal
JD Han et al. & M Vidal 2004 Nature 430:88-93
• Date hubs: interactions at different times/same location?
• Party hubs: interactions at same time/different location
-
© Burkhard Rost 49
More hotspots -> more party-hub like!
[Figure: bar chart for Non-hubs, Party hubs, Date hubs; y-axis from 0.0 to 0.4; numbers shown: 9, 26, 42]
micro: more hotspots
macro: more partners
Y Ofran, A Schlessinger & B Rost (2008) unpublished
A Feiglin, S Ashkenazi, A Schlessinger, B Rost and Y Ofran (2014) Mol Biosyst 10: 787-94
Yanay Ofran
-
© Burkhard Rost
secondary structure prediction
50
-
© Burkhard Rost
Secondary structure prediction: 1.+2. Generation
single residues (1. generation)
• Chou-Fasman, GOR: 1957-70/80, 50-55% accuracy
segments (2. generation)
• GORIII: 1986-92, 55-60% accuracy
problems
• < 100% argument: 65% max
• < 40% argument: strand non-local
[Figure label: residues i and i+3]
-
© Burkhard Rost
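Neural Network for secondary structure
[Figure: input units per residue over the alphabet ACDEFGHIKLMNPQRSTVWY. and output units H, E, L; example window with states: D (L), R (E), Q (E), G (E), F (E), V (E), P (E), A (H), A (H), Y (H), V (E), K (E), K (E)]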
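The input coding suggested by the figure can be sketched as a one-hot encoding over the 21-letter alphabet shown on the slide (20 amino acids plus "." as spacer). The function name, the window half-width of 6 and the use of "." for positions past the sequence ends are assumptions made for illustration.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY."  # 20 amino acids plus '.' as spacer, as on the slide


def encode_window(sequence, center, half_width=6):
    """One-hot encode the residues around `center`; positions past either end use '.'."""
    vector = []
    for pos in range(center - half_width, center + half_width + 1):
        residue = sequence[pos] if 0 <= pos < len(sequence) else "."
        vector.extend(1 if residue == letter else 0 for letter in ALPHABET)
    return vector


window = "DRQGFVPAAYVKK"  # the 13 residues listed in the figure
x = encode_window(window, center=6)
print(len(x), sum(x))  # 13 positions * 21 units, exactly one unit set per position
```

With a single sequence each position switches on exactly one unit; with alignment information the 0/1 entries would be replaced by profile frequencies.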
-
© Burkhard Rost 53
NN sec str: training dynamics
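[Figure: performance (0 to 1) over training time (steps 1 to 10) for Other, Strand, Helix; time: 1 step = 20,000 training samples]
$$E^{\mu} = \sum_i \left( o_i^{\mu} - d_i^{\mu} \right)^2$$
$$\Delta J^{\mu} \propto -\frac{\partial E^{\mu}\{J\}}{\partial J}$$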
-
© Burkhard Rost 54
Balanced training: dynamics
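[Figure: performance (0 to 1) over training steps 1 to 10 for Other, Strand, Helix; two panels: unbalanced and balanced]
$$E^{\mu} = \sum_i \left( o_i^{\mu} - d_i^{\mu} \right)^2$$
$$\Delta J^{\mu} \propto -\frac{\partial E^{\mu}\{J\}}{\partial J}$$
train:
$$E = \sum_{\mu=\alpha,\beta,L} \sum_i \left( o_i^{\mu} - d_i^{\mu} \right)^2$$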
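Balanced training can be sketched as a sampling scheme, assuming "balanced" means that helix, strand and other are presented equally often during training regardless of how often they occur in the data. The function name, the class proportions and the batch size below are invented for illustration.

```python
import random
from collections import Counter


def balanced_batches(samples, batch_size, n_batches, seed=0):
    """Yield batches in which every class is drawn equally often.

    `samples` is a list of (features, label) pairs. Sampling is with
    replacement, so rare classes are presented as often as frequent ones.
    """
    rng = random.Random(seed)
    by_class = {}
    for features, label in samples:
        by_class.setdefault(label, []).append((features, label))
    labels = sorted(by_class)
    per_class = batch_size // len(labels)
    for _ in range(n_batches):
        batch = [rng.choice(by_class[label]) for label in labels for _ in range(per_class)]
        rng.shuffle(batch)
        yield batch


# Hypothetical unbalanced toy data: 32 helix (H), 21 strand (E), 47 other (L).
data = [(i, "H") for i in range(32)] + [(i, "E") for i in range(21)] + [(i, "L") for i in range(47)]
batch = next(balanced_batches(data, batch_size=30, n_batches=1))
print(Counter(label for _, label in batch))
```

The overall accuracy may drop slightly under such a scheme, as in the comparison table of the talk, while the rarer classes are learned better.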
-
© Burkhard Rost
method       overall accuracy   comparison
unbalanced   62%                data bank distribution
balanced     60%                33:33:33
[Figure: pie charts for helix, strand, other; full pie: all correctly predicted residues]
-
© Burkhard Rost
Machine learning
2nd major challenge: data set preparation
-
© Burkhard Rost
ML 1 - PPI
physical Protein-Protein Interactions
-
© Burkhard Rost 58
Physical interaction NOT association
PD Kwong, R Wyatt, J Robinson, RW Sweet, J Sodroski & WA Hendrickson (1998) Nature 393, 648-659.
PD Kwong, R Wyatt, S Majeed, J Robinson, RW Sweet, J Sodroski & WA Hendrickson (2000) Structure 8, 1329-1339.
[Figure: HIV gp120 / CD4 / FAB; pairs labelled YES, YES, NO]
-
© Burkhard Rost
Predict PPI: best way: use evolutionary information
-
© Burkhard Rost 60
PPI pairs more conserved for paralogs than for orthologs
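[Figure: interaction A-B compared with the homologous pairs A’-B’ and A’’-B’’ (interaction unknown, marked "?"); worm vs non-worm; axis from more similar to less similar]
S Mika & B Rost 2006 PLoS Genetics 2:e29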
-
© Burkhard Rost
PPI FACT 1:
more conserved within than between organisms
“paralogs” more conserved than “orthologs”
S Mika & B Rost 2006 PLoS Genetics, Vol 2, e29
-
© Burkhard Rost 62
Order
-
© Burkhard Rost
Does measuring the interaction between A and B twice result in the same interface?
Tobias Hamp
-
© Burkhard Rost 64
Not homology-based inference, but details!
[Figure: A-B1: experimental structure 1; A-B2: experimental structure 2; A identical, B identical; are interfaces 1 and 2 identical?]
T Hamp & B Rost 2012 PLoS Comp Biol 8:e1002623
Tobias Hamp
-
© Burkhard Rost 65
Mostly the same but many differ
T Hamp & B Rost 2012 PLoS Comp Biol 8:e1002623
Tobias Hamp
-
© Burkhard Rost 66
Many examples for alternative interfaces
T Hamp & B Rost 2012 PLoS Comp Biol 8:e1002623
Tobias Hamp
-
© Burkhard Rost
FACT 2:
Protein-Protein interaction interfaces vary a LOT LOT LOT
T Hamp & B Rost 2012 PLoS Comp Biol 8:e1002623
Tobias Hamp
-
© Burkhard Rost
Predicting pairs of protein-protein
interactions (PPIs)
68
-
© Burkhard Rost
PPI challenge
machine learning MUCH more
-
© Burkhard Rost 70
Family clustering
No two from same group in train & test|cross-train
-
© Burkhard Rost 71
Family clustering
No two from same group in train & test|cross-train
B
A’ B’?
A
-
© Burkhard Rost
PPI sampling needs to consider proteins
case 1: both used before, i.e. training contained A ∧ B, NOT the interaction AB
case 2: either used for training, i.e. train on A | B
case 3: neither A nor B used before
Edward Marcotte, Univ Texas Austin
Yungki Park, SUNY Buffalo
Y Park & EM Marcotte (2012) Nature Meth 9: 1134-1136
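The three cases can be sketched as a small classifier for test pairs, assuming the training set is given as a list of protein pairs. The labels C1, C2 and C3 follow the notation used on the later performance slides; the function name and the protein ids are hypothetical.

```python
def pair_class(pair, training_pairs):
    """Return 'C1', 'C2' or 'C3' for a test pair, following the three cases above."""
    seen = {protein for training_pair in training_pairs for protein in training_pair}
    known = sum(protein in seen for protein in pair)
    return {2: "C1", 1: "C2", 0: "C3"}[known]


training_pairs = [("A", "C"), ("B", "D")]  # hypothetical training interactions

print(pair_class(("A", "B"), training_pairs))  # both proteins seen in training, pair itself not
print(pair_class(("A", "X"), training_pairs))  # only one protein seen in training
print(pair_class(("X", "Y"), training_pairs))  # neither protein seen in training
```

Reporting performance separately per class shows how much of the apparent accuracy comes from proteins the method has already seen.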
-
© Burkhard Rost 73
Reduced performance for new proteins
[Figure: performance of PIPE2 and SIGPROD, each for C1 (AB in training) and C3 (AB NOT in training)]
SIGPROD: S Martin, D Roe & JL Faulon (2005) Bioinformatics 21:218-26
PIPE2: S Pitre et al & A Golshani (2008) NAR 36:4286-94
T Hamp & B Rost 2015 Bioinformatics 31: 1521-5
Tobias Hamp
-
© Burkhard Rost
Sequence similarity only for PPIs, i.e. positives, NOT enough!
T Hamp & B Rost 2015 Bioinformatics 31: 1521-5
Tobias Hamp
-
© Burkhard Rost
we HAVE to also consider negative PPIs for cross-validation
T Hamp & B Rost 2015 Bioinformatics 31: 1521-5
Tobias Hamp
-
© Burkhard Rost
Missing experimental data:
-> take all we have?
NO, avoid bias!!
T Hamp & B Rost 2015 Bioinformatics 31: 1521-5
Tobias Hamp
-
© Burkhard Rost
Cross-validation challenge squared for PPIs
T Hamp & B Rost 2015 Bioinformatics 31: 1521-5
Tobias Hamp
-
© Burkhard Rost
PPI from sequence through SVM profile kernel
[Figure: PIPE2, SIGPROD, p4i and p4i_filtered (marked new); C1: proteins have known PPIs; C3: no PPI known]
• SIGPROD: S Martin, D Roe & JL Faulon (2005) Bioinformatics 21:218-26
• PIPE2: S Pitre et al & A Golshani (2008) NAR 36:4286-94
T Hamp & B Rost 2015 Bioinformatics 31: 1945-50
T Hamp & B Rost 2015 Bioinformatics 31: 1521-5
Tobias Hamp
-
© Burkhard Rost 79
Predict subcellular localization: LOCtree 2: 18 classes!
k-mer profile kernel SVM
T Goldberg, T Hamp & B Rost (2012) submitted
Tatyana Goldberg
Tobias Hamp
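To give a feeling for k-mer kernels, here is a sketch of the plain spectrum kernel, which counts k-mers shared by two sequences. This is a deliberately simpler stand-in: the profile kernel named on the slide additionally uses evolutionary profiles, which this sketch does not attempt. The function names and the choice k=3 are assumptions; the two toy sequences are fragments from the SH3 alignment slide.

```python
from collections import Counter


def kmer_counts(sequence, k=3):
    """Count all overlapping k-mers of a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def spectrum_kernel(seq_a, seq_b, k=3):
    """Dot product of the two k-mer count vectors."""
    a, b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    return sum(count * b[kmer] for kmer, count in a.items())


# Fragments of fyn_human and abl_mouse from the alignment slide.
print(spectrum_kernel("ALYDYEAR", "ALYDFVAS"))
```

A kernel value of this kind can be handed to an SVM as the similarity between two proteins, so no fixed-length feature vector has to be designed by hand.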
-
© Burkhard Rost
we can predict PPI from sequence alone!
T Hamp & B Rost 2015 Bioinformatics 31: 1945-50
Tobias Hamp
-
© Burkhard Rost
BUT
81
-
© Burkhard Rost
SVM kernel: predict PPI from sequence
But 1: 1998-2015
Tobias Hamp
-
© Burkhard Rost
SVM kernel: predict PPI from sequence
But 2: method 41...
Tobias Hamp
-
© Burkhard Rost 84
Positives (A-B) from PDB
PD Kwong, R Wyatt, J Robinson, RW Sweet, J Sodroski & WA Hendrickson (1998) Nature 393, 648-659.PD Kwong, R Wyatt, S Majeed, J Robinson, RW Sweet, J Sodroski & WA Hendrickson (2000) Structure 8, 1329-1339.
HIV gp120 / CD4 / FAB
-
© Burkhard Rost
Negatives:not observed
(human ~ 1.9M)
85
-
© Burkhard Rost
Result: method 41:
very very good
86
-
© Burkhard Rost
only problem of method 41:
predicts whether a protein is from PDB or SwissProt!
Tobias Hamp
-
© Burkhard Rost 88
Draft network of cell-to-cell communication in human
RIKEN CLST & PERKINS
Rost Lab at TUM & LCSB
Jordan A. Ramilowski1, Tatyana Goldberg2, Jayson Harshbarger1, Edda Kloppmann2, Marina Lizio1, Venkata P. Satagopam3, Masayoshi Itoh1,4, Hideya Kawaji1,4, Piero Carninci1, Burkhard Rost2 & Alistair R.R. Forrest1,5
1. RIKEN CLST, 2. TUM, 3. LCSB, 4. RIKEN PMI, 5. PERKINS
Nature COMMUNICATIONS 2015 6: 7866
Receptors cell-type specific
& often evolve before their receptors
-
© Burkhard Rost
No annotation in mammals without
disordered proteins
(IUP: Intrinsically Unstructured Proteins)
-
© Burkhard Rost 90
Order
-
© Burkhard Rost 91
Natively unstructured regions: induced fit
HJ Dyson & PE Wright 2005 Nat Rev Mol Cell Biol 6:197-208
-
© Burkhard Rost 92
Types of natively unstructured regions
HJ Dyson & PE Wright 2005 Nat Rev Mol Cell Biol 6:197-208
-
© Burkhard Rost
Keith Dunker - Indiana Univ
Shapers and Shakers
CV
• BS Chemistry UC Berkeley
• MS Physics Univ Wisconsin
• PhD Biophysics Univ Wisconsin
• PD (CompBiol) Yale
• Indiana Univ
Publications (2015/06 GoogleScholar)
• > 200 publications
• 2x >1,000
• 40x >200
• H-index 78
www.youtube.com
Pedro Romero, Zoran Obradovic, Charles R Kissinger, J Ernest Villafranca, Ethan Garner, Stephen Guilliot, A Keith Dunker (1998) Thousands of proteins likely to have long disordered regions. Pac Symp Biocomput 3, 437-448
-
© Burkhard Rost 94
Dunker-hypothesis
Residues not visible in 3D structures share disorder
A Keith Dunker Indiana Univ
www.youtube.com
-
© Burkhard Rost
NORS: no regular secondary structure
< 5% helix or strand over > 70 residues
J Liu, H Tan & B Rost 2002 J Mol Biol 322:53-64
Jinfeng Liu
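The NORS criterion can be sketched as a sliding-window scan over a secondary-structure string, assuming three-state labels with 'H' for helix, 'E' for strand and anything else counted as non-regular. The thresholds are taken from the slide (70 residues, 5%); the function name and the synthetic example string are assumptions.

```python
def nors_windows(sec_str, min_length=70, max_fraction=0.05):
    """Return start indices of windows of `min_length` residues in which less
    than `max_fraction` of the residues are helix ('H') or strand ('E')."""
    regular = [1 if state in ("H", "E") else 0 for state in sec_str]
    starts = []
    count = sum(regular[:min_length])  # regular residues in the first window
    for start in range(len(sec_str) - min_length + 1):
        if start > 0:
            # Slide the window by one residue: add the new one, drop the old one.
            count += regular[start + min_length - 1] - regular[start - 1]
        if count / min_length < max_fraction:
            starts.append(start)
    return starts


# Synthetic example: 30 helix residues, 80 without regular structure, 30 strand residues.
sec = "H" * 30 + "L" * 80 + "E" * 30
hits = nors_windows(sec)
print(hits[0], hits[-1])
```

A protein with at least one such window would count as containing a NORS region under these assumed settings.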
-
© Burkhard Rost 96
Types of NORS in PDB
J Liu, H Tan & B Rost 2002 J Mol Biol 322:53-64Jinfeng Liu
-
© Burkhard Rost
Predict NORS (no regular secondary structure)
less than 5% helix or strand over > 70 residues
machine learning:
• true: all predictions in entire proteomes
• false: the whole PDB
A Schlessinger, J Liu and B Rost (2007) PLoS Comput Biol 3: e140
Avner Schlessinger
-
© Burkhard Rost 98
More hotspots -> more party-hub like!
[Figure: bar chart for Non-hubs, Party hubs, Date hubs; y-axis from 0.0 to 0.4; numbers shown: 9, 26, 42]
micro: more hotspots
macro: more partners
Y Ofran, A Schlessinger & B Rost submitted
-
© Burkhard Rost 99
More unstructured -> more date-hub like!
[Figure: bars for Non-hubs, Party hubs, Date hubs]
micro: more hotspots
macro: more partners
A Schlessinger, J Liu and B Rost (2007) PLoS Comput Biol 3: e140
-
© Burkhard Rost 100
Ucon: unstructured regions from contact prediction
A Schlessinger, M Punta and B Rost (2007) Bioinformatics 23: 2376-84
Avner Schlessinger
Marco Punta
-
© Burkhard Rost 101
Secondary structure (helix, strand) robust under random mutation,
disorder not
C Schaefer & B Rost 2010 Bioinformatics 26:625-31
Christian Schaefer
-
© Burkhard Rost 102
Eukaryotes dominate disorder (4-10x)
A Schlessinger et al & B Rost 2011 Curr Opin Struc Biol 21:412-8
7-13%
36-43%
Esmeralda Vicedo
Avner Schlessinger
-
© Burkhard Rost
Proteome disorder content more similar to habitat than family
[Figure labels: habitat, evolution]
E Vicedo, A Schlessinger & B Rost (2015) PLoS One 10:E0133990
Avner Schlessinger
Esmeralda Vicedo
-
© Burkhard Rost 104
Quick reaction to heat stress: duplicate chromosome
Chromosomal duplication is a transient evolutionary solution to stress
AH Yona et al & Y Pilpel & O Dahan (2012) PNAS 109:21010-5
E Vicedo, Z Gasik, YA Dong, T Goldberg & B Rost (2015) F1000Res 4:1222
Orna Dahan
Yitzhak Pilpel
-
© Burkhard Rost 105
AH Yona et al & Y Pilpel & O Dahan (2012) PNAS 109:21010-5
E Vicedo, Z Gasik, YA Dong, T Goldberg & B Rost (2015) F1000Res 4:1222
Quick reaction to heat stress: avoid disorder
Esmeralda Vicedo
Orna Dahan
Yitzhak Pilpel
Zosia Gasik
-
© Burkhard Rost 106
AH Yona et al & Y Pilpel & O Dahan (2012) PNAS 109:21010-5
E Vicedo, Z Gasik, YA Dong, T Goldberg & B Rost (2015) F1000Res 4:1222
Quick reaction to heat stress: avoid disorder
Esmeralda Vicedo
Orna Dahan
Yitzhak Pilpel
Zosia Gasik
-
© Burkhard Rost 107
Structural universe
B Rost 1998 Structure 6:259-263
-
© Burkhard Rost
Attrition in structure determination
Date: 2008-11-18
B Rost unpublished
-
© Burkhard Rost 109
Close to known structure -> more success
[Figure: x-axis: PIDE (percentage pairwise sequence identity), 0 to 100; y-axis: Success (in PDB / cloned), 0.0 to 0.12]
-
© Burkhard Rost 110
We cannot only do prokaryotes!
-
© Burkhard Rost 111
NYCOMPS (New York Consortium On Membrane Protein Structure)
Bacterium reveals how plants work
STRUCTURAL BIOLOGY: A peep through anion channels
The crystal structure of a protein channel provides clues about the mechanisms that control the closure of pores found in the epidermis of plant leaves. Excitingly, the protein channel folds in a way never seen before. See Article p.1074
ARTICLE doi:10.1038/nature09487
Homologue structure of the SLAC1 anion channel for closing stomata in leaves
Yu-hang Chen, Lei Hu, Marco Punta, Renato Bruni, Brandan Hillerich, Brian Kloss, Burkhard Rost, James Love, Steven A. Siegelbaum & Wayne A. Hendrickson
Nature 28 Oct 2010 Vol 467 p. 1074-80
Wayne Hendrickson Columbia Univ, NYC
-
© Burkhard Rost 112
NYCOMPS pipeline stages
NYCOMPS ALL 9 PSI membrane
E Kloppmann et al (2012) Curr Op Struc Biol 22:1-7
Wayne Hendrickson
Edda Kloppmann
Marco Punta
-
© Burkhard Rost 113
11 medically important TMH predicted
• OCTN1: Crohn's disease, rheumatoid arthritis
• Adiponectin receptor 1: diabetes, obesity, cancer
• MT-ND1: LHON, MELAS, Alzheimer, Parkinson
© Thomas Hopf - TUM & Debbie Marks - HMS
TA Hopf et al & C Sander & DS Marks (2012) Cell 10 May doi: 10.1016
Chris Sander, Sloan Kettering NYC
Thomas Hopf, TUM Munich
Debora Marks, Harvard Medical
-
© Burkhard Rost
Dark proteome
-
© Burkhard Rost 115
Unexpected ... dark proteome
N Perdigao et al & S O’Donoghue (2015) Unexpected features of the dark proteome. PNAS 112: 15898–903
Sean O’Donoghue CSIRO & Garvan Inst
Sydney
-
© Burkhard Rost
MetaStudent: how good is the Methods section?
-
© Burkhard Rost
Homology-based inference
[Figure: A and A’ with similarity > X]
A’ sequence similar enough to A -> A’ and A: same function
C Sander & R Schneider 1991 Proteins 9:56-68
Tobias Hamp
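Homology-based inference can be sketched in a few lines, assuming similarity is measured as percentage sequence identity over pre-aligned sequences and X is a user-chosen threshold. The function names, the threshold values and the annotation strings are hypothetical; the sequences are fragments from the SH3 alignment slide.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical residues over positions aligned without gaps ('.')."""
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "." and b != "."]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def infer_function(aligned_query, annotated, threshold):
    """Transfer the annotation of the most similar protein if its similarity > threshold."""
    best_function, best_pide = None, 0.0
    for aligned_seq, function in annotated:
        pide = percent_identity(aligned_query, aligned_seq)
        if pide > best_pide:
            best_function, best_pide = function, pide
    return best_function if best_pide > threshold else None


# Hypothetical annotations attached to two aligned fragments.
annotated = [("VTLFIALYDY", "function 1"), ("..LFVALYDF", "function 2")]
print(infer_function("VTLFVALYDY", annotated, threshold=80))
print(infer_function("VTLFVALYDY", annotated, threshold=95))
```

The whole difficulty sits in choosing X: set it too low and wrong annotations are transferred, set it too high and most proteins stay without annotation.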
-
© Burkhard Rost
Year 2011: 1st Critical Assessment of Function Annotations (CAFA)
• Community effort to measure the current state of the art in function prediction
• Independent assessors, supported by renowned principal investigators
118
Critical Assessment
Tobias Hamp
-
© Burkhard Rost 119
Course: Protein Prediction II: Function
[Figure: A and A’ with similarity > X]
A’ sequence similar enough to A -> A’ and A: same function
C Sander & R Schneider 1991 Proteins 9:56-68
Tobias Hamp
-
© Burkhard Rost 120
Course: Protein Prediction II: Function
3 Teams
Tobias Hamp
-
© Burkhard Rost 121
Course: Protein Prediction II: Function
Tobias Hamp
Tobias Hamp
-
© Burkhard Rost
Conclusion
-
© Burkhard Rost
Machine learning IS modern understanding
major challenges:
• cross-validation
• data set preparation
rule-of-thumb: spend 90% on data & validation, 10% on method, write-up, publication
-
© Burkhard Rost 124
Evolution teaches protein prediction
-
© Burkhard Rost 125
got.show: prediction of odds to survive
TUM course: JavaScript + DataMining
Guy Yachdav
Tatyana Goldberg
Christian Dallago
• > 10 TV shows
• > 5 radio shows
• > 600 printed newspapers
• > 1.2 billion potential readers
-
© Burkhard Rost 126
Thanks & Bye
$$ NIH €€ AvH
Andrea Schafferhans
Lothar Richter
Tim Karl
Laszlo Kajan
Tatyana Goldberg
Yannick Mahlich
Guy Yachdav
Maximilian Hecht
Edda Kloppmann
Marlena Drabik
Chris Schäfer
Yana Bromberg, Rutgers U
Janet Kelso, MPI Leipzig
Sean O’Donoghue, CSIRO Sydney
Avner Schlessinger, Mount Sinai
Yanay Ofran, Bar-Ilan U
Michal Linial, Hebrew U
Karima Djabali, TUM
Reinhard Schneider, U Luxembourg
Chris Sander, Sloan Kettering
Lena Rost
BIS
Stephan Kramer, Mainz U
Tobias Hamp
Mikael Bodén, UQ Brisbane
Nicholas Hamilton, UQ IMB Brisbane
Mark Ragan, UQ IMB Brisbane