201607 brisbane rost1 evolteaches -...

© Burkhard Rost 1 Evopro Burkhard Rost TUM Munich Comp Biol @ INF & IAS & WZW Columbia U NYC - Biochemistry


  • © Burkhard Rost 1

    Evopro

    Burkhard Rost TUM Munich

    Comp Biol @ INF & IAS & WZW

    Columbia U NYC - Biochemistry

    org

  • © Burkhard Rost

    I. Introduction: Evolution teaches protein prediction

  • © Burkhard Rost 3

    Predict protein function

  • © Burkhard Rost

    Protein function

    Intuitive but not well-defined:
    • chemical: how atom bound?
    • biochemical: transferase
    • cellular: (kinase) cell cycle
    • developmental: time, regulatory
    • physiological: related to disease
    • genetic: dominant/recessive

    Protein function as action: Function = anything that happens to or through a protein

  • © Burkhard Rost 5

    www.rostlab.org


    Edda Kloppmann

    Andrea Schafferhans

    Esmeralda Vicedo

    Guy Yachdav

    Yannick Mahlich

    Inga Weise

    Juan Miguel Cejuela

    Tatyana Goldberg

    Tobias Hamp

    Maximilian Hecht

    Thomas Hopf

    Tim Karl

    Lothar Richter

    Jonas Reeb

    Michael Bernhofer

    Martin Steinegger

    Carsten Uhlig


    Ashish Baghudana

    Madhukar SP Shankar

    Alexandru Buff

    Syeda Tanzeem H

    Charu

    Joel Daon

    Christian Dallago


    Zosia Gasik

    Caroline Gergen

    Yulia Gembarzhevskaya

    Yichun Lin

    Maximilian Miller

    Dimitrij Nechaev

    Jade Martins


    Venkata P Satagopam

    Sven Punga

    Theresa Wirth

    Sebastian Wilzbach

    Monika Varshney

  • © Burkhard Rost

    3D structure and function

    Epstein & Anfinsen, 1961: sequence uniquely determines structure

    INPUT: protein sequence OUTPUT:

    6

    Goal of protein prediction

  • © Burkhard Rost

    Need to know history to predict!

    • Point mutation
    • Binding (Substrate/Protein)
    • Environmental change (DNA close/pH)

    7

    Prediction in terms of energy landscapes

  • © Burkhard Rost 8

    Evolution is history!

    Chris Sander & Reinhard Schneider 1991 Proteins 9:56-68
    B Rost 1999 Prot Engin 12:85-94

  • © Burkhard Rost 9

               1                                                   50
    fyn_human  VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYI
    yrk_chick  VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYI
    fgr_human  VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCI
    yes_chick  VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYI
    src_avis2  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
    src_aviss  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
    src_avisr  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
    src_chick  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
    stk_hydat  VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYI
    src_rsvpa  .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYI
    hck_human  ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYI
    blk_mouse  ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYV
    hck_mouse  .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYI
    lyn_human  ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFI
    lck_human  ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFI
    ss81_yeast .....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGII
    abl_mouse  ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWV
    abl1_human ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWV
    src1_drome ..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLI
    mysd_dicdi .....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKV
    yfj4_yeast ....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIF
    abl2_human ..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWV
    tec_human  .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYI
    abl1_caeel ..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWV
    txk_human  .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLI
    yha2_yeast VRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIF
    abp1_sacex .....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF

    SH3 (Src-homology 3) domain: one domain of proteins such as Src tyrosine kinase (STK)

  • © Burkhard Rost 10

    Evolution improves prediction

    Evolutionary profile implicitly captures history of an individual protein!

    fly

    chicken

    rat

    mouse

    human
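    A minimal sketch of what an "evolutionary profile" can look like in code, assuming the simplest possible definition: per alignment column, the relative frequency of each residue. The four rows are the first 20 columns of SH3 sequences from the alignment slide; the function name `column_profile` and the choice to ignore gap characters are illustrative assumptions, not the method behind the talk's tools.

```python
from collections import Counter

# First 20 alignment columns of four SH3 rows from the alignment slide.
ALIGNMENT = {
    "fyn_human": "VTLFVALYDYEARTEDDLSF",
    "yrk_chick": "VTLFIALYDYEARTEDDLSF",
    "hck_human": "..IVVALYDYEAIHHEDLSF",
    "lck_human": "..LVIALHSYEPSHDGDLGF",
}


def column_profile(rows):
    """Return, per alignment column, the relative frequency of each residue.

    Gap characters ('.') are ignored (an assumption of this sketch);
    a column that contains only gaps yields an empty dict.
    """
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned rows must have the same length")
    profile = []
    for col in range(length):
        residues = [row[col].upper() for row in rows if row[col] != "."]
        counts = Counter(residues)
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profile


profile = column_profile(list(ALIGNMENT.values()))
for position, column in enumerate(profile, start=1):
    print(position, column)
```

    Fully conserved columns collapse to a single residue with frequency 1.0, while variable columns spread the weight over several residues; that spread is the information a single sequence cannot provide.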

  • © Burkhard Rost 11

    Using evolution to predict structure

    Sequence PSI-BLAST Filter

    PHDsec 1993

    B Rost 1996 Meth Enzymol 266:525-539

    60% -> 72% /77%

    MaxHom

    S

  • © Burkhard Rost

    Exciting projects

    Q9P2H0

    • Interactions & networks: protein-protein / protein-([DR]NA)
    • LOCtree etc: predict localization
    • Predict enzymatic activity & flexibility
    • Protein disorder
    • Predict membrane regions, epitopes
    • Improve alignment methods
    • SNP-pipeline: predict nsSNP effects
    • PredictProtein: web service since 1992
    • NESG & NYCOMPS: structural genomics

  • © Burkhard Rost

    Machine learning=

    understanding in 21st century!

  • © Burkhard Rost

    Machine learning: easy to make believe

  • © Burkhard Rost

    Prediction is the acid test

    for understanding

    15

  • © Burkhard Rost

    Prediction is the acid test for understanding...

    also when prediction based on

    machine learning?

  • © Burkhard Rost

    Machine learning = black magic

    Black box image is nonsense

    Image: © Wikipedia. Unknown artist - User scan of Sadie, Stanley, ed. (1992). The New Grove Dictionary of Opera 2: 132. London: Macmillan

  • © Burkhard Rost

    complexity matches problem

    -> rules not simple

    18

  • © Burkhard Rost

    ML extracts truth IF:
    • 1. cross-validation right
    • 2. data set right

  • © Burkhard Rost 20

    Train

    Test

    Cross-validation: hide data under table

  • © Burkhard Rost 21

    WEKA-like cross-validation

    Train

    Test

  • © Burkhard Rost 22

    3-way cross-validation

    Train

    Test

    Cross-Train

  • © Burkhard Rost 23

    Family clustering

    No two from same group in train & test|cross-train
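    The family-clustering rule above can be sketched as follows, assuming each protein already carries a family (cluster) label. Whole families are assigned to train, cross-train, or test, so no two members of one family end up on different sides. The function name, the 60/20/20 fractions, and the toy family labels are hypothetical choices for illustration only.

```python
import random
from collections import defaultdict


def split_by_family(protein_to_family, fractions=(0.6, 0.2, 0.2), seed=0):
    """Assign whole families to train / cross-train / test.

    protein_to_family: dict mapping protein id -> family id.
    fractions: share of FAMILIES (not proteins) per set; hypothetical defaults.
    """
    families = defaultdict(list)
    for protein, family in protein_to_family.items():
        families[family].append(protein)

    family_ids = sorted(families)
    random.Random(seed).shuffle(family_ids)

    n_train = int(round(fractions[0] * len(family_ids)))
    n_cross = int(round(fractions[1] * len(family_ids)))
    parts = {
        "train": family_ids[:n_train],
        "cross_train": family_ids[n_train:n_train + n_cross],
        "test": family_ids[n_train + n_cross:],
    }
    return {
        name: sorted(p for fam in fams for p in families[fam])
        for name, fams in parts.items()
    }


# Toy data: 10 families with 3 proteins each (made-up identifiers).
example = {f"p{f}_{i}": f"fam{f}" for f in range(10) for i in range(3)}
sets = split_by_family(example)
for name, proteins in sets.items():
    print(name, len(proteins))
```

    Splitting at the level of families rather than proteins is the point of the sketch: a random per-protein split would let close relatives of training proteins leak into the test set.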

  • © Burkhard Rost

    Still not enough: exploit “prerelease” data (latest/hottest)

    24

  • © Burkhard Rost 25

    Chris Sander

    Dana-Farber - Harvard

  • © Burkhard Rost 26

    et al

  • © Burkhard Rost 27

  • © Burkhard Rost

    Prediction is the acid test for understanding

    28

  • © Burkhard Rost

    Learning from machine learning

  • © Burkhard Rost 30

    Different interfaces, different physics?

    PD Kwong, R Wyatt, J Robinson, RW Sweet, J Sodroski & WA Hendrickson (1998) Nature 393, 648-659.
    PD Kwong, R Wyatt, S Majeed, J Robinson, RW Sweet, J Sodroski & WA Hendrickson (2000) Structure 8, 1329-1339.

    [Figure: HIV gp120 / CD4 / FAB; labels: gp120, CD4, antibody-1, antibody-2]

    Do 6 types of interfaces differ in sequence?

    • Internal (inter-domain and intra-domain)
    • External homomers (permanent/transient)
    • External heteromers (permanent/transient)

    Y Ofran & B Rost (2003) J Mol Biol 325, 377-87

  • © Burkhard Rost 31

    Interface types differ in composition

    Y Ofran & B Rost (2003) J Mol Biol 325, 377-87

    [Figure labels: gp120, CD4, antibody-1, antibody-2]

    They obviously differ! But, are these differences meaningful?

    Yanay Ofran

  • © Burkhard Rost

    Are these differences statistically significant?

    Chi-square test: known problem: small data sets; here millions of points

    all differences: P < 10^-300

    -> SIGNIFICANT

    Y Ofran & B Rost 2005 unpublished

    Yanay Ofran

  • © Burkhard Rost

    Are these differences statistically significant?

    Chi-square test: known problem: small data sets; here millions of points

    all differences: P < 10^-300

    -> SIGNIFICANT

    … unfortunately also:
    • proteins [a-b] vs [c-d]
    • 1 vs 2 authors
    • random subsets ...

    Y Ofran & B Rost 2005 unpublished

    Yanay Ofran
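    A small worked example of why "significant" stops being informative with millions of data points, assuming a plain Pearson chi-square test on a 2x2 table. The residue frequencies (8.0% versus 8.2%) and the group sizes are invented for illustration; only the critical value 3.841 (1 degree of freedom, 5% level) is a standard constant.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = 0.0
    for observed, row, col in (
        (a, row1, col1), (b, row1, col2), (c, row2, col1), (d, row2, col2)
    ):
        expected = row * col / n
        stat += (observed - expected) ** 2 / expected
    return stat


# Standard chi-square critical value: 1 degree of freedom, alpha = 0.05.
CRITICAL_5_PERCENT = 3.841


def table(n_per_group, freq_a=0.080, freq_b=0.082):
    """Counts (hit, miss) for two groups with made-up residue frequencies."""
    hits_a = round(n_per_group * freq_a)
    hits_b = round(n_per_group * freq_b)
    return hits_a, n_per_group - hits_a, hits_b, n_per_group - hits_b


small = chi_square_2x2(*table(1_000))        # 1,000 residues per group
large = chi_square_2x2(*table(10_000_000))   # 10 million residues per group

print(f"n=1e3 per group: chi2={small:.3f}, significant={small > CRITICAL_5_PERCENT}")
print(f"n=1e7 per group: chi2={large:.1f}, significant={large > CRITICAL_5_PERCENT}")
```

    The same 0.2 percentage-point difference is far from significant at a thousand residues and overwhelmingly significant at ten million, because the statistic grows linearly with sample size at fixed proportions. This is consistent with the slide's observation that even random subsets come out as "significantly" different.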

  • © Burkhard Rost 34

    Find-self test (statistical significance)

    Y Ofran & B Rost 2005 unpublished

    Yanay Ofran

  • © Burkhard Rost 35

    Find-self test on six types of interfaces

    Y Ofran & B Rost (2003) J Mol Biol 325, 377-87

    [Figure labels: gp120, CD4, antibody-1, antibody-2]

    Yanay Ofran

  • © Burkhard Rost 36

    Predict PPI interfaces from sequence alone

  • © Burkhard Rost

    Alignment information

    [Figure: neural network using alignment information. Panels: Protein (example window :GYIY DPEDGDPDDGVNP GTDF:), Alignments (four aligned sequences), profile table (columns ordered GSAPD NTEKQ CVHIR LMYFW). Network: input layer, first or hidden layer (J1), second or output layer (J2) with units s0 s1 s2 for H, E, L; pick maximal unit => current prediction. Each input block corresponds to the 21*3 bits coding for the profile of one residue.]

    B Rost (1996) Methods Enzymol 266:525-39
    B Rost (2001) J Struct Biol 134: 204-18

  • © Burkhard Rost 38

    More complex system to predict structure

    B Rost (1996) Methods Enzymol 266:525-39
    B Rost (2001) J Struct Biol 134: 204-18

    Sequence PSI-BLAST Filter

    PROFsec, PROFacc

    1999

  • © Burkhard Rost

    PredictProtein

    • 1992-now
    • >200 packages in Debian
    • >100k users registered from >110 nations
    • >500 users/day, >12k users/month
    • >57M predictions in 2012

    Laszlo Kajan

    Guy Yachdav

    B Rost & C Sander (1992) Nature 360:540
    L Kajan, G Yachdav, et al. & B Rost (2013) Biomed Res Int
    G Yachdav et al & B Rost (2014) NAR 42:W337-43

  • © Burkhard Rost 40

    PP interfaces predicted from sequence

    Y Ofran & B Rost 2003 FEBS Lett 544:236-9
    Y Ofran & B Rost 2007 Bioinformatics e13-16

    InteractSites

    Yanay Ofran

  • © Burkhard Rost

    PPI hot spots?

    41

  • © Burkhard Rost

    Interaction HOT SPOTS

    Residues that are essential for protein-protein interactions.

    Operational definition:
    • 1. residue in the interface
    • 2. mutation of the residue knocks out interaction

  • © Burkhard Rost 43

    PP interfaces predicted from sequence

    Very strong =

    hot spots ?

    Y Ofran & B Rost 2003 FEBS Lett 544:236-9
    Y Ofran & B Rost 2007 Bioinformatics e13-16

    InteractSites

    Yanay Ofran

  • © Burkhard Rost

    Strength of prediction reflects reliability?

    [Figure: example prediction strengths 0.6, 0.4, 0.9, 0.1 on a scale from weak to strong]

    Y Ofran and B Rost (2003) FEBS Letters 544: 236-9
    Y Ofran and B Rost (2007) PLoS Comp Biol 3: e119

  • © Burkhard Rost 45

    Prediction of hot spots for CD4

    • alanine scan for V1 domain of CD4 (bound to gp120) (A Ashkenazi et al. & DJ Capon (1990) PNAS 87, 7150)

    red: observed

    • structure: PD Kwong et al. & WA Hendrickson (2000) Structure 8, 1329-1339.

    purple: predicted

    Y Ofran & B Rost 2007 PLoS CB 3:e119

    Yanay Ofran

  • © Burkhard Rost 46

    Hot spots prediction requires full information

    Y Ofran & B Rost 2007 PLoS CB 3:e119

    Yanay Ofran

  • © Burkhard Rost 47

    Connect micro- and macro-level?

    micro level: residues (RIGHT: more hotspots)

    macro level: networks (UP: more partners)

    Yanay Ofran

  • © Burkhard Rost

    Hubs: promiscuous proteins

    Date- and Party-hubs

    Date/Party hubs: notation introduced by Marc Vidal
    JD Han et al. & M Vidal 2004 Nature 430:88-93

    • Date hubs: interactions at different times/same location?
    • Party hubs: interactions at same time/different location

  • © Burkhard Rost 49

    More hotspots -> more party-hub like!

    [Figure: bar chart for Non-hubs, Party hubs, Date hubs; y-axis 0.0 to 0.4; bar labels 9, 26, 42]

    micro: more hotspots

    macro: more partners

    Y Ofran, A Schlessinger & B Rost (2008) unpublished
    A Feiglin, S Ashkenazi, A Schlessinger, B Rost and Y Ofran (2014) Mol Biosyst 10: 787-94

    Yanay Ofran

  • © Burkhard Rost

    secondary structure prediction

    50

  • © Burkhard Rost

    Secondary structure prediction: 1. + 2. generation

    single residues (1. generation)
    • Chou-Fasman, GOR, 1957-70/80: 50-55% accuracy

    segments (2. generation)
    • GORIII, 1986-92: 55-60% accuracy

    problems
    • < 100% argument: 65% max
    • < 40% argument: strand non-local

    [Figure label: residues i and i+3]

  • © Burkhard Rost

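    Neural Network for secondary structure

    [Figure: each residue in the input window is coded over the alphabet ACDEFGHIKLMNPQRSTVWY.; output units H, E, L. Example window with observed states: D (L), R (E), Q (E), G (E), F (E), V (E), P (E), A (H), A (H), Y (H), V (E), K (E), K (E)]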
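    A minimal sketch of how a sequence window could be turned into network input, assuming one unit per symbol of the 21-letter alphabet shown on the slide and a window of 13 residues (the number of residues listed in the example). Treating "." as a spacer for positions beyond the sequence ends, and the name `encode_window`, are assumptions of this sketch. Profile-based input would replace the 0/1 values with per-residue frequencies.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY."  # alphabet as printed on the slide
WINDOW = 13                          # assumption: matches the 13 residues shown


def encode_window(sequence, center, window=WINDOW):
    """One-hot encode the residues around `center` into a flat 0/1 vector.

    Positions outside the sequence are coded as '.' (assumed spacer symbol).
    """
    half = window // 2
    vector = []
    for pos in range(center - half, center + half + 1):
        symbol = sequence[pos] if 0 <= pos < len(sequence) else "."
        units = [0] * len(ALPHABET)
        units[ALPHABET.index(symbol)] = 1
        vector.extend(units)
    return vector


example = "DRQGFVPAAYVKK"            # the 13 residues listed on the slide
x = encode_window(example, center=6)  # window centred on P
print(len(x), sum(x))
```

    The network then predicts the state (H, E, or L) of the central residue only; sliding the window by one position yields the prediction for the next residue.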

  • © Burkhard Rost 53

    NN sec str: training dynamics

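    [Figure: performance (y-axis, 0 to 1) versus time (x-axis, steps 1 to 10; 1 step = 20,000 training samples) for Other, Strand, Helix]

    $E^{\mu} = \sum_i \left( o_i^{\mu} - d_i^{\mu} \right)^2$

    $\Delta J^{\mu} \propto -\frac{\partial E^{\mu}\{J\}}{\partial J}$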
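    The update rule on this slide is gradient descent on a squared error. A minimal sketch, assuming a single layer of linear output units (o = J x) so that the gradient can be written by hand; the real networks have a hidden layer and non-linear units. The learning rate, the toy input x, and the desired output d are made-up values.

```python
def output(J, x):
    """Linear output units: o_i = sum_j J[i][j] * x[j]."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in J]


def error(J, x, d):
    """E = sum_i (o_i - d_i)^2 for one training pattern."""
    return sum((o - di) ** 2 for o, di in zip(output(J, x), d))


def gradient_step(J, x, d, rate=0.1):
    """Delta J = -rate * dE/dJ, with dE/dJ[i][j] = 2 * (o_i - d_i) * x[j]."""
    o = output(J, x)
    return [
        [w - rate * 2.0 * (oi - di) * xj for w, xj in zip(row, x)]
        for row, oi, di in zip(J, o, d)
    ]


J = [[0.0, 0.0, 0.0] for _ in range(3)]  # 3 output units (think H, E, L)
x = [1.0, 0.0, 1.0]                      # toy input pattern
d = [1.0, 0.0, 0.0]                      # desired output: first class

errors = [error(J, x, d)]
for _ in range(20):
    J = gradient_step(J, x, d)
    errors.append(error(J, x, d))

print([round(e, 4) for e in errors[:5]])
```

    Each step moves the connections J against the gradient of the error, so the error for this pattern shrinks step by step.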

  • © Burkhard Rost 54

    Balanced training: dynamics

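    [Figure: performance (0 to 1) versus training steps 1 to 10 for Other, Strand, Helix; two panels: unbalanced and balanced]

    $E^{\mu} = \sum_i \left( o_i^{\mu} - d_i^{\mu} \right)^2$

    $\Delta J^{\mu} \propto -\frac{\partial E^{\mu}\{J\}}{\partial J}$

    train:

    $E = \sum_{\mu=\alpha,\beta,L} \sum_i \left( o_i^{\mu} - d_i^{\mu} \right)^2$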
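    One way to read "balanced training" is that the network sees examples of each class equally often, regardless of how frequent the class is in the database. The sketch below contrasts that with drawing examples at their natural frequency. The class proportions (32% helix, 21% strand, 47% other) are invented for illustration, and both function names are hypothetical.

```python
import random
from collections import Counter


def unbalanced_stream(samples, n, rng):
    """Draw n training examples at their natural (data bank) frequency."""
    return [rng.choice(samples) for _ in range(n)]


def balanced_stream(samples, n, rng):
    """Draw n training examples, cycling through the classes in turn."""
    by_class = {}
    for features, label in samples:
        by_class.setdefault(label, []).append((features, label))
    classes = sorted(by_class)
    return [rng.choice(by_class[classes[i % len(classes)]]) for i in range(n)]


# Toy data with a made-up class distribution: 32 helix, 21 strand, 47 other.
samples = (
    [(i, "H") for i in range(32)]
    + [(i, "E") for i in range(21)]
    + [(i, "L") for i in range(47)]
)

rng = random.Random(0)
balanced = Counter(label for _, label in balanced_stream(samples, 3000, rng))
unbalanced = Counter(label for _, label in unbalanced_stream(samples, 3000, rng))
print("balanced:  ", dict(balanced))
print("unbalanced:", dict(unbalanced))
```

    Balanced presentation prevents the most frequent class from dominating the error, at the price of a prior that no longer matches the database distribution.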

  • © Burkhard Rost

    [Figure: pie charts of correctly predicted residues by class (helix, strand, other); full pie: all correctly predicted residues]

    method       overall accuracy   comparison
    unbalanced   62%                data bank distribution
    balanced     60%                33:33:33

  • © Burkhard Rost

    Machine learning, 2nd major challenge: data set preparation

  • © Burkhard Rost

    ML 1 - PPI: physical Protein-Protein Interactions

  • © Burkhard Rost 58

    Physical interaction NOT association

    PD Kwong, R Wyatt, J Robinson, RW Sweet, J Sodroski & WA Hendrickson (1998) Nature 393, 648-659.
    PD Kwong, R Wyatt, S Majeed, J Robinson, RW Sweet, J Sodroski & WA Hendrickson (2000) Structure 8, 1329-1339.

    HIV gp120 / CD4 / FAB

    YES

    YES

    NO

  • © Burkhard Rost

    Predict PPI: best way: use evolutionary information

  • © Burkhard Rost 60

    PPI pairs more conserved for paralogs than for orthologs

    [Figure: interaction A-B compared with homologous pairs A'-B' and A''-B''; labels: worm, non-worm, more similar, less similar]

    S Mika & B Rost 2006 PLoS Genetics 2:e29

  • © Burkhard Rost

    PPI FACT 1: more conserved within than between organisms

    “paralogs” more conserved than “orthologs”

    S Mika & B Rost 2006 PLoS Genetics, Vol 2, e29

  • © Burkhard Rost 62

    Order

  • © Burkhard Rost

    Does measuring the interaction between A and B twice result in the same interface?

    Tobias Hamp

  • © Burkhard Rost 64

    Not homology-based inference, but details!

    A-B1: experimental structure 1
    A-B2: experimental structure 2

    Proteins A and B are identical in both structures: are interfaces 1 and 2 identical?

    T Hamp & B Rost 2012 PLoS Comp Biol 8:e1002623

    Tobias Hamp

  • © Burkhard Rost 65

    Mostly the same but many differ

    T Hamp & B Rost 2012 PLoS Comp Biol 8:e1002623

    Tobias Hamp

  • © Burkhard Rost 66

    Many examples for alternative interfaces

    T Hamp & B Rost 2012 PLoS Comp Biol 8:e1002623

    Tobias Hamp

  • © Burkhard Rost

    FACT 2: Protein-Protein interaction interfaces vary a LOT LOT LOT

    T Hamp & B Rost 2012 PLoS Comp Biol 8:e1002623

    Tobias Hamp

  • © Burkhard Rost

    Predicting pairs of protein-protein

    interactions (PPIs)

    68

  • © Burkhard Rost

    PPI challenge machine learning MUCH more

  • © Burkhard Rost 70

    Family clustering

    No two from same group in train & test|cross-train

  • © Burkhard Rost 71

    Family clustering

    No two from same group in train & test|cross-train

    B

    A’ B’?

    A

  • © Burkhard Rost

    case 1: both used before: i.e. training contained A ∧ B NOT interaction AB

    case 2: either used for training i.e. train on A | B

    case 3: neither A nor B used before

    72

    PPI sampling needs to consider proteins

    Edward Marcotte

    Univ Texas Austin

    Yungki Park

    SUNY Buffalo

    Y Park & EM Marcotte (2012) Nature Meth 9: 1134-1136

    A B
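    The three cases above can be sketched as a small classifier for test pairs, assuming the training set is given as a list of protein pairs. The labels C1 and C3 appear on the following slides; using C2 for the middle case is an assumption here, and the function name `pair_class` is hypothetical.

```python
def pair_class(pair, training_pairs):
    """Classify a test pair by how many of its proteins occur in training.

    C1: both proteins seen in training (the pair itself need not be there)
    C2: exactly one protein seen in training (label assumed, see lead-in)
    C3: neither protein seen in training
    """
    seen = {protein for training_pair in training_pairs for protein in training_pair}
    n_seen = sum(protein in seen for protein in pair)
    return {2: "C1", 1: "C2", 0: "C3"}[n_seen]


# Toy training set with made-up protein names.
training = [("A", "C"), ("B", "D")]

for test_pair in [("A", "B"), ("A", "X"), ("X", "Y")]:
    print(test_pair, pair_class(test_pair, training))
```

    Reporting performance separately per class shows how much of a method's accuracy depends on having seen the individual proteins before.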

  • © Burkhard Rost 73

    Reduced performance for new proteins

    PIPE2 C1 (AB in training)

    C3 (AB NOT in training)

    SIGPROD

    C1 (AB in training)

    C3 (AB NOT in training)

    SIGPROD: S Martin, D Roe & JL Faulon (2005) Bioinformatics 21:218-26
    PIPE2: S Pitre et al & A Golshani (2008) NAR 36:4286-94

    T Hamp & B Rost 2015 Bioinformatics 31: 1521-5

    Tobias Hamp

  • © Burkhard Rost

    Sequence similarity only for PPIs, i.e. positives, NOT enough!

    T Hamp & B Rost 2015 Bioinformatics 31: 1521-5

    Tobias Hamp

  • © Burkhard Rost

    we HAVE to also consider negative PPIs

    for cross-validation

    T Hamp & B Rost 2015 Bioinformatics 31: 1521-5

    Tobias Hamp

  • © Burkhard Rost

    Missing experimental data:

    -> take all we have?

    NO avoid bias!!

    T Hamp & B Rost 2015 Bioinformatics 31: 1521-5

    Tobias Hamp

  • © Burkhard Rost

    Cross-validation challenge squared

    for PPIs

    T Hamp & B Rost 2015 Bioinformatics 31: 1521-5

    Tobias Hamp

  • © Burkhard Rost

    PPI from sequence through SVM profile kernel

    [Figure: comparison of PIPE2, SIGPROD, p4i and p4i_filtered, with entries marked new; C1: proteins have known PPIs, C3: no PPI known]

    • SIGPROD: S Martin, D Roe & JL Faulon (2005) Bioinformatics 21:218-26

    • PIPE2: S Pitre et al & A Golshani (2008) NAR 36:4286-94

    T Hamp & B Rost 2015 Bioinformatics 31: 1945-50
    T Hamp & B Rost 2015 Bioinformatics 31: 1521-5

    Tobias Hamp

  • © Burkhard Rost 79

    Predict subcellular localization: LOCtree 2: 18 classes!

    k-mer profile kernel SVM

    T Goldberg, T Hamp & B Rost (2012) submitted

    Tatyana Goldberg

    Tobias Hamp

  • © Burkhard Rost

    we can predict PPI from sequence

    alone!

    T Hamp & B Rost 2015 Bioinformatics 31: 1945-50

    Tobias Hamp

  • © Burkhard Rost

    BUT

    81

  • © Burkhard Rost

    SVMkernel predict PPI from sequence

    But 1: 1998-2015

    Tobias Hamp

  • © Burkhard Rost

    SVMkernel predict PPI from sequence

    But 2: method 41...

    Tobias Hamp

  • © Burkhard Rost 84

    Positives (A-B) from PDB

    PD Kwong, R Wyatt, J Robinson, RW Sweet, J Sodroski & WA Hendrickson (1998) Nature 393, 648-659.PD Kwong, R Wyatt, S Majeed, J Robinson, RW Sweet, J Sodroski & WA Hendrickson (2000) Structure 8, 1329-1339.

    HIV gp120 / CD4 / FAB

  • © Burkhard Rost

    Negatives: not observed

    (human ~ 1.9M)

    85

  • © Burkhard Rost

    Result: method 41:

    very very good

    86

  • © Burkhard Rost

    only problem of method 41:

    predicts whether protein is from PDB or SwissProt!

    Tobias Hamp

  • © Burkhard Rost 88

    Draft network of cell-to-cell communication in human

    RIKEN CLST & PERKINS; Rost Lab at TUM & LCSB

    Jordan A. Ramilowski1, Tatyana Goldberg2, Jayson Harshbarger1, Edda Kloppmann2, Marina Lizio1, Venkata P. Satagopam3, Masayoshi Itoh1,4, Hideya Kawaji1,4, Piero Carninci1, Burkhard Rost2 & Alistair R.R. Forrest1,5

    1. RIKEN CLST, 2. TUM, 3. LCSB, 4. RIKEN PMI, 5. PERKINS

    Nature COMMUNICATIONS 2015 6: 7866

    Receptors cell-type specific

    & often evolve before their receptors

  • © Burkhard Rost

    No annotation in mammals without

    disordered proteins

    (IUP: Intrinsically Unstructured Proteins)

  • © Burkhard Rost 90

    Order

  • © Burkhard Rost 91

    Natively unstructured regions: induced fit

    HJ Dyson & PE Wright 2005 Nat Rev Mol Cell Biol 6:197-208

  • © Burkhard Rost 92

    Types of natively unstructured regions

    HJ Dyson & PE Wright 2005 Nat Rev Mol Cell Biol 6:197-208

  • © Burkhard Rost

    Shapers and Shakers: Keith Dunker - Indiana Univ

    CV
    • BS Chemistry UC Berkeley
    • MS Physics Univ Wisconsin
    • PhD Biophysics Univ Wisconsin
    • PD (CompBiol) Yale
    • Indiana Univ

    Publications (2015/06 GoogleScholar)
    • > 200 publications
    • 2x >1,000
    • 40x >200
    • H-index 78

    Pedro Romero, Zoran Obradovic, Charles R Kissinger, J Ernest Villafranca, Ethan Garner, Stephen Guilliot, A Keith Dunker (1998) Thousands of proteins likely to have long disordered regions. Pac Symp Biocomput 3, 437-448

  • © Burkhard Rost 94

    Dunker-hypothesis

    Residues not visible in 3D structures share disorder

    A Keith Dunker Indiana Univ

    www.youtube.com

  • © Burkhard Rost 95

    NORS: no regular secondary structure

    < 5% helix or strand over > 70 residues

    J Liu, H Tan & B Rost 2002 J Mol Biol 322:53-64

    Jinfeng Liu
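    The NORS criterion (less than 5% helix or strand over more than 70 consecutive residues) can be sketched as a sliding-window scan over a secondary structure string. This is a simplified reading of the definition, not the published method: the input is assumed to be a per-residue string of H (helix), E (strand), and L (other), the window is fixed at 71 residues, and overlapping hits are merged.

```python
def nors_regions(sec_str, min_len=71, max_fraction=0.05):
    """Return merged (start, end) ranges, end exclusive, of NORS-like regions.

    A window of `min_len` residues qualifies if fewer than `max_fraction`
    of its residues are helix (H) or strand (E). Window length and the
    merging of overlapping windows are assumptions of this sketch.
    """
    structured = [1 if state in "HE" else 0 for state in sec_str]
    regions = []
    if len(sec_str) < min_len:
        return regions
    count = sum(structured[:min_len])
    for start in range(len(sec_str) - min_len + 1):
        if start > 0:
            # Slide the window: add the new last residue, drop the old first one.
            count += structured[start + min_len - 1] - structured[start - 1]
        if count / min_len < max_fraction:
            end = start + min_len
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Toy example: 30 helix residues, 80 without regular structure, 30 strand residues.
example = "H" * 30 + "L" * 80 + "E" * 30
print(nors_regions(example))
```

    Because a window may contain up to 5% helix or strand, the reported region extends a few residues into the flanking helix and strand segments.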

  • © Burkhard Rost 96

    Types of NORS in PDB

    J Liu, H Tan & B Rost 2002 J Mol Biol 322:53-64

    Jinfeng Liu

  • © Burkhard Rost

    less than 5% helix or strand over > 70 residues

    machine learning:
    • true: all predictions in entire proteomes
    • false: the whole PDB

    A Schlessinger, J Liu and B Rost (2007) PLoS Comput Biol 3: e140

    Predict NORS (no regular secondary structure)

    Avner Schlessinger

  • © Burkhard Rost 98

    More hotspots -> more party-hub like!

    [Figure: bar chart for Non-hubs, Party hubs, Date hubs; y-axis 0.0 to 0.4; bar labels 9, 26, 42]

    micro: more hotspots

    macro: more partners

    Y Ofran, A Schlessinger & B Rost submitted

  • © Burkhard Rost 99

    More unstructured -> more date-hub like!

    [Figure categories: Non-hubs, Party hubs, Date hubs]

    micro: more hotspots

    macro: more partners

    A Schlessinger, J Liu and B Rost (2007) PLoS Comput Biol 3: e140

  • © Burkhard Rost 100

    Ucon: unstructured regions from contact prediction

    A Schlessinger, M Punta and B Rost (2007) Bioinformatics 23: 2376-84

    Avner Schlessinger

    Marco Punta

  • © Burkhard Rost 101

    Secondary structure (helix, strand) robust under random mutation, disorder not

    C Schaefer & B Rost 2010 Bioinformatics 26:625-31

    Christian Schaefer

  • © Burkhard Rost 102

    Eukaryotes dominate disorder (4-10x)

    A Schlessinger et al & B Rost 2011 Curr Opin Struc Biol 21:412-8

    7-13%

    36-43%

    Esmeralda Vicedo

    Avner Schlessinger

  • © Burkhard Rost 103

    Proteome disorder content more similar to habitat than family

    [Figure labels: habitat, evolution]

    E Vicedo, A Schlessinger & B Rost (2015) PLoS One 10:E0133990

    Esmeralda Vicedo

    Avner Schlessinger

  • © Burkhard Rost 104

    Quick reaction to heat stress: duplicate chromosome

    Chromosomal duplication is a transient evolutionary solution to stress

    AH Yona et al & Y Pilpel & O Dahan (2012) PNAS 109:21010-5
    E Vicedo, Z Gasik, YA Dong, T Goldberg & B Rost (2015) F1000Res 4:1222

    Orna Dahan

    Yitzhak Pilpel

  • © Burkhard Rost 105

    Quick reaction to heat stress: avoid disorder

    AH Yona et al & Y Pilpel & O Dahan (2012) PNAS 109:21010-5
    E Vicedo, Z Gasik, YA Dong, T Goldberg & B Rost (2015) F1000Res 4:1222

    Esmeralda Vicedo

    Orna Dahan

    Yitzhak Pilpel

    Zosia Gasik

  • © Burkhard Rost 107

    Structural universe

    B Rost 1998 Structure 6:259-263

  • © Burkhard Rost

    B Rost unpublished

    Attrition in structure determination

    Date: 2008-11-18

  • © Burkhard Rost 109

    Close to known structure -> more success

    [Figure: success (in PDB / cloned), y-axis 0.0 to 0.12, versus PIDE (percentage pairwise sequence identity), x-axis 0 to 100]

  • © Burkhard Rost 110

    We cannot only do prokaryotes!

  • © Burkhard Rost 111

    NYCOMPS (New York Consortium On Membrane Protein Structure)

    STRUCTURAL BIOLOGY

    A peep through anion channels

    The crystal structure of a protein channel provides clues about the mechanisms that control the closure of pores found in the epidermis of plant leaves. Excitingly, the protein channel folds in a way never seen before. See Article p.1074

    ARTICLE doi:10.1038/nature09487

    Homologue structure of the SLAC1 anion channel for closing stomata in leaves

    Yu-hang Chen, Lei Hu, Marco Punta, Renato Bruni, Brandan Hillerich, Brian Kloss, Burkhard Rost, James Love, Steven A. Siegelbaum & Wayne A. Hendrickson

    Nature 28 Oct 2010 Vol 467 p. 1074-80

    Bacterium reveals how plants work

    Wayne Hendrickson Columbia Univ, NYC

  • © Burkhard Rost 112

    NYCOMPS pipeline stages

    NYCOMPS ALL 9 PSI membrane

    E Kloppmann et al (2012) Curr Op Struc Biol 22:1-7

    Wayne Hendrickson

    Edda Kloppmann

    Marco Punta

  • © Burkhard Rost 113

    11 medically important TMH predicted

    • OCTN1: Crohn's disease, rheumatoid arthritis
    • Adiponectin receptor 1: diabetes, obesity, cancer
    • MT-ND1: LHON, MELAS, Alzheimer, Parkinson

    © Thomas Hopf - TUM & Debbie Marks - HMS

    TA Hopf et al & C Sander & DS Marks (2012) Cell 10 May doi: 10.1016

    Chris Sander Sloan Kettering NYC

    Thomas Hopf TUM Munich

    Debora Marks Harvard Medical

  • © Burkhard Rost

    Dark proteome

  • © Burkhard Rost 115

    Unexpected ... dark proteome

    N Perdigao et al & S O’Donoghue (2015) Unexpected features of the dark proteome. PNAS 112: 15898–903

    Sean O’Donoghue CSIRO & Garvan Inst

    Sydney

  • © Burkhard Rost

    MetaStudent: how good is the Methods section?

  • © Burkhard Rost

    Homology-based inference

    A' sequence similar enough to A (similarity > X) -> A' and A: same function

    C Sander & R Schneider 1991 Proteins 9:56-68

    Tobias Hamp
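    Homology-based inference as stated above can be sketched in a few lines, assuming sequences that are already aligned to equal length and using percentage sequence identity as the similarity measure. The threshold X is not given on the slide; the value 40 below, the labels, and the function names are purely illustrative. The two SH3 fragments are taken from the alignment slide.

```python
def percent_identity(a, b):
    """Percentage of identical residues over aligned, non-gap positions."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    aligned = [(x, y) for x, y in zip(a, b) if x != "." and y != "."]
    if not aligned:
        return 0.0
    return 100.0 * sum(x == y for x, y in aligned) / len(aligned)


def infer_function(query, annotated, threshold):
    """Transfer the label of the most similar annotated sequence above threshold.

    Returns None when no annotated sequence is similar enough.
    """
    best_label, best_score = None, threshold
    for sequence, label in annotated:
        score = percent_identity(query, sequence)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


fyn = "VTLFVALYDYEARTEDDLSF"       # fyn_human, first 20 alignment columns
yrk = "VTLFIALYDYEARTEDDLSF"       # yrk_chick, first 20 alignment columns
annotated = [(yrk, "toy label: SH3-like")]  # hypothetical annotation

print(percent_identity(fyn, yrk))
print(infer_function(fyn, annotated, threshold=40.0))
print(infer_function("A" * 20, annotated, threshold=40.0))
```

    The whole approach stands or falls with the choice of X: set too low, functions are transferred between unrelated proteins; set too high, most proteins remain without annotation.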

  • © Burkhard Rost

    Year 2011: 1st Critical Assessment of Function Annotations (CAFA)

    • Community effort to measure the current state of the art in function prediction

    • Independent assessors, supported by renowned principal investigators

    118

    Critical Assessment

    Tobias Hamp

  • © Burkhard Rost 119

    Course: Protein Prediction II: Function

    A' sequence similar enough to A (similarity > X) -> A' and A: same function

    C Sander & R Schneider 1991 Proteins 9:56-68

    Tobias Hamp

  • © Burkhard Rost 120

    Course: Protein Prediction II: Function

    3 Teams

    Tobias Hamp

  • © Burkhard Rost 121

    Course: Protein Prediction II: Function

    Tobias Hamp


  • © Burkhard Rost

    Conclusion

  • © Burkhard Rost

    Machine learning IS modern understanding

    major challenges:
    • cross-validation
    • data set preparation

    rule of thumb: spend 90% on data & validation, 10% on method, write-up, publication

  • © Burkhard Rost 124

    Evolution teaches protein prediction

  • © Burkhard Rost 125

    got.show: prediction of odds to survive

    TUM course: JavaScript + DataMining

    Guy Yachdav

    Tatyana Goldberg

    Christian Dallago

    • > 10 TV shows
    • > 5 radio shows
    • > 600 printed newspapers
    • > 1.2 billion potential readers

  • © Burkhard Rost 126

    Thanks & Bye

    $$ NIH €€ AvH

    Andrea Schafferhans

    Lothar Richter

    Tim Karl

    Laszlo Kajan
    Tatyana Goldberg
    Yannick Mahlich
    Guy Yachdav
    Maximilian Hecht
    Edda Kloppmann
    Marlena Drabik
    Chris Schäfer
    Yana Bromberg, Rutgers U
    Janet Kelso, MPI Leipzig
    Sean O’Donoghue, CSIRO Sydney
    Avner Schlessinger, Mount Sinai
    Yanay Ofran, Bar-Ilan U
    Michal Linial, Hebrew U
    Karima Djabali, TUM
    Reinhard Schneider, U Luxembourg
    Chris Sander, Sloan Kettering
    Lena Rost, BIS
    Stephan Kramer, Mainz U
    Tobias Hamp

    Mikael Bodén

    UQ Brisbane

    UQ IMB Brisbane

    Nicholas Hamilton

    Mark Ragan