Post on 05-Jun-2018
Concepts of molecular analysis I: Big Data and Knowledge Mining
SAMO Interdisciplinary Workshop on Molecular Analysis in Clinical Practice
Hotel Hermitage, Lucerne, October 21, 2016
Peter J. Wild
Systems Pathology
Institute of Pathology and Molecular Pathology
FAZ, 15 June 2016: 2 terabytes per patient
«Big data can help save lives. But only if the flood of information can be managed»
Disclosures
• Participation in advisory boards or speakers bureau (compensated):
Thermo Fisher Scientific, Roche Diagnostics, Sophia Genetics SA, Myriad
Genetics, Ventana Medical Systems, Life Technologies, Astellas Pharma AG,
Merck AG, Sanofi-Aventis, Janssen-Cilag AG, Astra Zeneca.
• Research Support:
Gilead, Astra Zeneca, Ventana Medical Systems
Big Data = Automation of Experience
The term “Big Data” is on everyone’s lips, but not everyone understands the same thing by it.
Donald Kossmann: “My favourite definition of big data is the ‘automation of experience.’ Essentially, this means that you learn from the past with an eye on the future and avoid making the same mistake twice.”
Globe magazine, ETHZ, June 2014
Joachim Buhmann: “Machine learning algorithms search data sets for patterns and characteristic structures. Typical tasks are the classification of data, ...”
Outline
1. Big Data in Molecular Analysis
2. Examples for Knowledgebase Mining
3. ZurichCancerMaps
KRAS Mutation
Level | Normal (wild type) | Missense mutation KRAS c.34G>T (p.G12C)
DNA | ... GCT GGT GGC ... | ... GCT TGT GGC ...
RNA | |
Protein | ... A - G - G ... (glycine, G) | ... A - C - G ... (cysteine, C)
Function | Normal | Activation
Example: c.34G>T (G12C), Exon 2
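The codon change on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming only the three codons shown on the slide (codons 11 to 13 of the KRAS coding sequence, i.e. cDNA positions 31 to 39); the function names `translate` and `apply_substitution` are illustrative and not part of any tool mentioned in the talk.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acids."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def apply_substitution(segment: str, segment_start: int, pos: int,
                       ref: str, alt: str) -> str:
    """Apply a single-base substitution given in 1-based cDNA coordinates.

    segment_start is the cDNA position of the first base in `segment`.
    """
    idx = pos - segment_start
    if segment[idx] != ref:
        raise ValueError(f"expected {ref} at c.{pos}, found {segment[idx]}")
    return segment[:idx] + alt + segment[idx + 1:]


# Codons 11-13 as printed on the slide (assumed to start at c.31).
WILD_TYPE = "GCTGGTGGC"
SEGMENT_START = 31

mutant = apply_substitution(WILD_TYPE, SEGMENT_START, 34, "G", "T")
codon_number = (34 - 1) // 3 + 1  # c.34 is the first base of codon 12

print(translate(WILD_TYPE), "->", translate(mutant), "at codon", codon_number)
```

Position c.34 falls on the first base of codon 12, so GGT (glycine) becomes TGT (cysteine), which is exactly the p.G12C notation used on the slide.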
Evidence | Indication | Alteration | Drug(s)
FDA Approved Labels | Breast Cancer | ERBB2 amplification | pertuzumab, trastuzumab
FDA Approved Labels | Colorectal Cancer | KRAS mutation | cetuximab, panitumumab (contraindicated)
FDA Approved Labels | Gastric Cancer | ERBB2 amplification | trastuzumab
FDA Approved Labels | Melanoma | BRAF mutation | dabrafenib, trametinib, vemurafenib
FDA Approved Labels | Non-Small Cell Lung Cancer | ALK fusion | ceritinib, crizotinib
FDA Approved Labels | Non-Small Cell Lung Cancer | EGFR mutation | afatinib, erlotinib
NCCN Guidelines | Gastrointestinal Stromal Tumor | PDGFRA mutation | dasatinib
NCCN Guidelines | Colorectal Cancer | NRAS mutation | cetuximab, panitumumab (contraindicated)
NCCN Guidelines | Melanoma | KIT mutation | imatinib mesylate
NCCN Guidelines | Non-Small Cell Lung Cancer | BRAF mutation | dabrafenib, vemurafenib
NCCN Guidelines | Non-Small Cell Lung Cancer | ERBB2 mutation | afatinib
NCCN Guidelines | Non-Small Cell Lung Cancer | MET amplification | crizotinib
NCCN Guidelines | Non-Small Cell Lung Cancer | RET fusion | cabozantinib
NCCN Guidelines | Non-Small Cell Lung Cancer | ROS1 fusion | crizotinib
12 different alterations aligned to 14 different approved therapies
Many Alterations Already Aligned to Therapies
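The table above is essentially a small knowledge base, and the summary count can be checked by encoding it as data. The following is a rough sketch: the rows are transcribed from the slide, while the `Entry` structure and the `lookup` helper are assumptions made for illustration and do not reflect how any commercial knowledgebase is implemented.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    evidence: str            # "FDA" or "NCCN", as grouped on the slide
    indication: str
    alteration: str
    drugs: tuple
    contraindicated: bool = False


# Rows transcribed from the slide (status as presented in the 2016 talk).
NSCLC = "Non-Small Cell Lung Cancer"
ENTRIES = [
    Entry("FDA", "Breast Cancer", "ERBB2 amplification", ("pertuzumab", "trastuzumab")),
    Entry("FDA", "Colorectal Cancer", "KRAS mutation", ("cetuximab", "panitumumab"), True),
    Entry("FDA", "Gastric Cancer", "ERBB2 amplification", ("trastuzumab",)),
    Entry("FDA", "Melanoma", "BRAF mutation", ("dabrafenib", "trametinib", "vemurafenib")),
    Entry("FDA", NSCLC, "ALK fusion", ("ceritinib", "crizotinib")),
    Entry("FDA", NSCLC, "EGFR mutation", ("afatinib", "erlotinib")),
    Entry("NCCN", "Gastrointestinal Stromal Tumor", "PDGFRA mutation", ("dasatinib",)),
    Entry("NCCN", "Colorectal Cancer", "NRAS mutation", ("cetuximab", "panitumumab"), True),
    Entry("NCCN", "Melanoma", "KIT mutation", ("imatinib mesylate",)),
    Entry("NCCN", NSCLC, "BRAF mutation", ("dabrafenib", "vemurafenib")),
    Entry("NCCN", NSCLC, "ERBB2 mutation", ("afatinib",)),
    Entry("NCCN", NSCLC, "MET amplification", ("crizotinib",)),
    Entry("NCCN", NSCLC, "RET fusion", ("cabozantinib",)),
    Entry("NCCN", NSCLC, "ROS1 fusion", ("crizotinib",)),
]


def lookup(indication: str, alteration: str) -> list:
    """Return all table rows matching an indication/alteration pair."""
    return [e for e in ENTRIES
            if e.indication == indication and e.alteration == alteration]


distinct_alterations = {e.alteration for e in ENTRIES}
distinct_drugs = {drug for e in ENTRIES for drug in e.drugs}

print(len(distinct_alterations), "alterations,", len(distinct_drugs), "therapies")
for hit in lookup("Colorectal Cancer", "KRAS mutation"):
    status = "contraindicated" if hit.contraindicated else "indicated"
    print(hit.evidence, ", ".join(hit.drugs), status)
```

Counting distinct values in the transcribed rows gives 12 alterations and 14 therapies, matching the figure stated on the slide.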
Alteration | Indication | Investigational drug(s)
AKT1 mutation | Multiple | MK-2206, MSC-2363318A
CCND1 amplification | Multiple | palbociclib
CDK4 amplification, mutation | Melanoma, NSCLC | palbociclib
CDK6 amplification | NSCLC | palbociclib
DDR2 mutation | Multiple | crizotinib + dasatinib
KRAS mutation | Multiple | various MEKi combinations
ERBB3 mutation | Multiple | neratinib
FGFR1-4 mutation, amplification, fusion | Multiple | BGJ-398, JNJ-42756493
GNA11 mutation | Melanoma | vorinostat
GNAQ mutation | Melanoma | vorinostat
HRAS mutation | Multiple | binimetinib + panitumumab, BVD-523
IDH1 mutation | Multiple | AG-120
KIT amplification | Melanoma | dasatinib
NRAS mutation | Multiple | various MEKi combinations
MET mutation | Multiple | AMG-337, crizotinib, INCB-028060
MTOR mutation | Multiple | MSC-2363318A
MYCN amplification | Multiple | GSK-525762
PDGFRA amplification | Glioblastoma | nilotinib, sorafenib
PIK3CA mutation | Multiple | various PI3K pathway combinations
PPARG fusion | Thyroid Cancer | pioglitazone
PTCH1 mutation | Multiple | vismodegib
RET mutation | NSCLC, Thyroid Cancer | ponatinib, sunitinib
SMO mutation | Multiple | vismodegib
STK11 mutation | Multiple | MSC-2363318A
More Therapies are Under Investigation...
Today’s Challenges
• Growing number of oncology biomarkers with clinical utility
• Multiple, global sources of information are not standardized
• Variants identified via NGS need to be quickly and accurately associated with actionable information
Molecular Biomarkers in Oncology
Current methods are time intensive and require extensive research of multiple sources to map actionable information to variants
OncoPortal (Sophia Genetics)
Oncomine Knowledge Base (Thermo Fisher)
Science 2016
Outline
1. Big Data in Molecular Analysis
2. Examples for Knowledgebase Mining
3. ZurichCancerMaps
The world’s largest curated cancer genomic database, gathered from public sources, peer-reviewed literature, and published clinical trials
Hovelson et al., Neoplasia 2015
The Oncomine Knowledgebase
Solid Tumor Variant Map
Hovelson et al., Neoplasia 2015
Data analysis performed using the Oncomine Knowledgebase
Oncomine Focus Assay – Gene List
Detects variants in 52 solid tumor genes that are associated with current oncology drugs and backed by published evidence
Oncomine Knowledgebase Reporter (OKR)
Oncomine™ Knowledgebase Reporter*
Ion Reporter™ Workflow*
Lab Generated Report*
Research Laboratory
Sources:
• US FDA labels
• US NCCN Guidelines
• EMA labels
• ESMO Guidelines
• Global clinical trials
Genetic variants
Associated published evidence
For Research Use Only. Not for use in diagnostic procedures.
23 Cancer types
69 Countries with enrolling trials
4 Sources for labels and guidelines
Generating a Custom Report
The Report: 1 Variant Summary
Variant summary: shows all gene variants with associated information in the
report and the cancer type information by source as well as global clinical trial
status.
The Report: 2 Relevant Therapy Summary
For each gene variant, published therapies from
each source are given an evidence label.
In this cancer type
In other cancer type
In this cancer type and other cancer type
Contraindicated
Both for use and contraindicated
No evidence
Global clinical trials are also labeled.
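The six evidence labels listed above can be read as a simple classification rule. The sketch below is one possible reading, not the actual logic of the Oncomine Knowledgebase Reporter: the input format (pairs of cancer type and a contraindication flag) and the precedence between labels are assumptions.

```python
from enum import Enum


class EvidenceLabel(Enum):
    THIS = "In this cancer type"
    OTHER = "In other cancer type"
    THIS_AND_OTHER = "In this cancer type and other cancer type"
    CONTRAINDICATED = "Contraindicated"
    BOTH = "Both for use and contraindicated"
    NONE = "No evidence"


def label_evidence(records, sample_cancer_type):
    """Assign one label to a therapy from a single source.

    records: iterable of (cancer_type, contraindicated) pairs, a hypothetical
    input format chosen for this sketch.
    """
    for_use = [ct for ct, contra in records if not contra]
    against = [ct for ct, contra in records if contra]

    # Assumed precedence: any contraindication is reported first.
    if for_use and against:
        return EvidenceLabel.BOTH
    if against:
        return EvidenceLabel.CONTRAINDICATED
    if not for_use:
        return EvidenceLabel.NONE

    in_this = sample_cancer_type in for_use
    in_other = any(ct != sample_cancer_type for ct in for_use)
    if in_this and in_other:
        return EvidenceLabel.THIS_AND_OTHER
    return EvidenceLabel.THIS if in_this else EvidenceLabel.OTHER


# Illustrative inputs only; they are not taken from a real report.
print(label_evidence([("Melanoma", False)], "Melanoma").value)
print(label_evidence([("Colorectal Cancer", True)], "Colorectal Cancer").value)
print(label_evidence([], "Melanoma").value)
```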
The Report: 3 Current Source Information
For each gene variant, a summary of each
therapy is given with:
Cancer type
Label date
Class
Indication and usage summary for FDA
labels
Reference
The Report: 4 Global Clinical Trial Information
For each gene variant, a summary of open
global clinical trials with:
Summary: Trial identifier, Trial title
Cancer type
Class
Other identifiers
Population segments
Phase
Published therapies
Countries
US States
Contact information
Example: EGFR del exon 19
• 6/2015: Biopsy of the right pleura due to recurrent effusions with advanced
adenocarcinoma of lung
• Sanger sequencing: EGFR deletion in exon 19 (p.E746_A750del)
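Variant names such as p.E746_A750del, p.T790M and p.G12C follow a regular pattern, which makes them straightforward to parse before looking them up in a knowledge base. The snippet below is a minimal sketch that handles only the two one-letter forms appearing in this talk (single substitutions and deletions); it is not a full HGVS parser, and the names `ProteinVariant` and `parse_protein_variant` are made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class ProteinVariant(NamedTuple):
    kind: str                 # "substitution" or "deletion"
    ref: str                  # first (or only) reference residue
    start: int                # position of that residue
    end_ref: Optional[str]    # last deleted residue, for range deletions
    end: Optional[int]        # position of the last deleted residue
    alt: Optional[str]        # new residue, for substitutions


# One-letter amino acid codes only; three-letter forms are not handled here.
_SUBSTITUTION = re.compile(r"^p\.([A-Z])(\d+)([A-Z])$")
_DELETION = re.compile(r"^p\.([A-Z])(\d+)(?:_([A-Z])(\d+))?del$")


def parse_protein_variant(text: str) -> ProteinVariant:
    """Parse a protein-level variant string such as 'p.T790M'."""
    match = _SUBSTITUTION.match(text)
    if match:
        ref, pos, alt = match.groups()
        return ProteinVariant("substitution", ref, int(pos), None, None, alt)
    match = _DELETION.match(text)
    if match:
        ref, start, end_ref, end = match.groups()
        return ProteinVariant("deletion", ref, int(start), end_ref,
                              int(end) if end else None, None)
    raise ValueError(f"unsupported variant notation: {text!r}")


exon19_del = parse_protein_variant("p.E746_A750del")
deleted_residues = exon19_del.end - exon19_del.start + 1

print(exon19_del.kind, "of", deleted_residues, "residues")
print(parse_protein_variant("p.T790M"))
```

For the exon 19 deletion in this case, the parsed range 746 to 750 corresponds to five deleted amino acids.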
53 y/o male pt. with lung adenocarcinoma
Initial diagnosis: 07/2015
PR 3 months later: 10/2015
Oligo-progressive disease: 6/2016
Therapy with Afatinib
Whole body PET scan
Mechanisms of drug resistance to EGFR tyrosine kinase inhibitors in EGFR-mutant NSCLC
EGFR p.T790M (50%)
Sharma et al., Nature Rev. Cancer 2007
Cortot, Jänne. ERR 2014
Case: Liquid Biopsy from cfDNA
Resistance mechanism?
Osimertinib in EGFR-TKI-resistant, EGFR p.T790M-positive NSCLC
Jänne et al, NEJM 2015
53 y/o male pt. with lung adenocarcinoma
Outline
1. Big Data in Molecular Analysis
2. Examples for Knowledgebase Mining
3. ZurichCancerMaps
The concept of ZurichCancerMaps, or how do we get from Big Data to PM (precision medicine)?
Combine big medical data & cancer genomics data to model patients, predict
outcomes, optimize treatment and design clinical trials.
Gunnar Rätsch
etc.
The problem
• To date, large amounts of molecular, image and clinical data are stored in unstructured form and are not accessible.
• Quantifying the molecular make-up of a particular specimen to gain clinically important insights is a central component of Precision Medicine.
• Clinical specimens are unique, finite and cannot be reproduced.
ZurichCancerMaps
Definition
Generation of a Digital Biobank of clinical specimens where the genomic and expressed transcriptomic, proteomic and metabolomic information is recorded in searchable digital files that are stored in a database, along with clinical metadata
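To make the definition concrete, the following sketch shows one way searchable omics files and clinical metadata could sit side by side in a database. It uses an in-memory SQLite database purely for illustration; the schema, table names, specimen identifiers and file paths are all hypothetical placeholders and say nothing about how ZurichCancerMaps is actually built.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE specimen (
    specimen_id   TEXT PRIMARY KEY,
    diagnosis     TEXT NOT NULL,      -- clinical metadata
    specimen_type TEXT NOT NULL       -- e.g. tissue or liquid
);
CREATE TABLE omics_file (
    file_id     INTEGER PRIMARY KEY,
    specimen_id TEXT NOT NULL REFERENCES specimen(specimen_id),
    layer       TEXT NOT NULL CHECK (layer IN
                ('genomic', 'transcriptomic', 'proteomic', 'metabolomic')),
    path        TEXT NOT NULL         -- location of the digital file
);
""")

# Placeholder rows, invented for this example only.
conn.executemany(
    "INSERT INTO specimen VALUES (?, ?, ?)",
    [
        ("SPEC-001", "lung adenocarcinoma", "tissue"),
        ("SPEC-002", "lung adenocarcinoma", "liquid"),
        ("SPEC-003", "renal cell carcinoma", "tissue"),
    ],
)
conn.executemany(
    "INSERT INTO omics_file (specimen_id, layer, path) VALUES (?, ?, ?)",
    [
        ("SPEC-001", "genomic", "/data/SPEC-001.vcf"),
        ("SPEC-001", "proteomic", "/data/SPEC-001.proteome"),
        ("SPEC-002", "genomic", "/data/SPEC-002.vcf"),
        ("SPEC-003", "proteomic", "/data/SPEC-003.proteome"),
    ],
)


def specimens_with_layer(connection, diagnosis, layer):
    """Find specimens of a given diagnosis that have data for one omics layer."""
    rows = connection.execute(
        """
        SELECT DISTINCT s.specimen_id
        FROM specimen AS s
        JOIN omics_file AS f ON f.specimen_id = s.specimen_id
        WHERE s.diagnosis = ? AND f.layer = ?
        ORDER BY s.specimen_id
        """,
        (diagnosis, layer),
    )
    return [row[0] for row in rows]


print(specimens_with_layer(conn, "lung adenocarcinoma", "proteomic"))
print(specimens_with_layer(conn, "lung adenocarcinoma", "genomic"))
```

Keeping clinical metadata and file references in separate tables lets a single specimen carry several omics layers, so a cohort can be selected by diagnosis and data type in one query.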
Data Warehouse
Source systems:
• Clinical information system (KISIM)
• LIMS (Molis*, Patho-/Dermapro****)
• Study system (Secutrial)
• PACS (Impax; radiology images)
• General images (Synedra / Histo-DB)
• Cancer registry (not only USZ; survival data)
Biobanks:
• Liquid * (blood, urine, saliva, cerebrospinal fluid, effusions)
• Tissue & Cell ** (tissue and cells cultured from tumors)
• Reproduction *** (stem cells, oocytes, sperm)
Processing:
• Sample
• Sequencing
• Measurements
• Raw files
• Processing
• Input files (including VCF)
Research Data Service Center (Oracle TRC)
Provision of anonymized and aggregated information
Separate, sealed-off zone for external parties
* Molis: for liquid diagnostics and possibly a liquid biobank
** SLIMS: possible solution for the tissue & cell biobank
*** RURO: already in use for sample storage
**** Patho-/Dermapro: for tissue diagnostics (no biobank functionality)
Data Visualization cBioPortal (The Hyve, Cambridge)
EGFR p.T790M
2008: “Computational Pathology”
Fuchs, Wild, et al., MICCAI 2008
Raman et al., BMC Bioinformatics 2010
Schüffler et al., J Pathol Inform 2013
Rupp et al., J Pathol Inform 2016
Zhong et al., Sci Rep 2016
Zerhouni et al., Proc. of SPIE 2016
Detects cancer cell nuclei of renal cell carcinoma and predicts immunohistochemical staining of Ki-67 on TMAs
Total no. of digital slide scans since 2007 at USZ
[Figure: scans per month, December 2007 to May 2016; y-axis ticks from 500 to 2000]
2011-2013: TMARKER software
P.J. Schüffler, et al. J Pathol Inform 2013
http://www.nexus.ethz.ch/equipment_tools/software/tmarker.html
• Generic
• Integrative
• Open-source
PrECISE Project
Team
Maria Rodriguez Martinez – IBM
Heinz Köppl – TU Darmstadt
Pavel Sumazin – Baylor College
Zsolt Torok – Astrid Bio Technologies Kft.
Julio Saez-Rodriguez – EBI
Rudolf Aebersold – ETHZ
Laurence Calzone – Institut Curie
Walter Koch – Technikon
Peter Wild – UZH/USZ
Zurich - Basel Alliance
Impact
• The project will democratize PM research because it will support a multitude of research projects with unique data resources and enable in silico research (e.g. search for drug resistance patterns across different cancer cohorts)
• The project will be a data and knowledge hub for many research projects expected to be funded through national and international PM programs
• The project is presently unique and will strengthen the standing of Swiss science in the field
Identifying Personal Genomes by Surname Inference
Gymrek et al., Science 2013
Surnames can be recovered from personal genomes by profiling short tandem repeats on the Y chromosome (Y-STRs) and querying recreational genetic genealogy databases.
Craig Venter
Acknowledgements
Ruedi Aebersold, Gunnar Rätsch, Bernd Wollscheid, Tiannan Guo, Yasuo Uchida, Alex Ebhardt, Joachim Buhmann, Niko Beerenwinkel, Christian Beisel, Manfred Claassen, Christian Stirnimann, Wilhelm Krek
Qing Zhong, Markus Rechsteiner, Christine Fritz, Nadejda Valtcheva, Ulrich Wagner, Vanessa Frey, Annette Bohnert, Nadezhda Velizheva, Kathrin Oehl, Elisa Bellini, Malamati Koletou, Niels Rupp, Jan Rüschoff, Lorenz Buser, Simone Brandt, Dario Vischi, Livia Baldini, Christian Fankhauser, Ailsa Christiansen, Nathan Eschbacher, Laura De Vargas Roditi, Nora Toussaint
Silke Gillessen, Markus Joerger, Wolfram Jochum, Aurelius Omlin, Arnoud Templeton, Christian Rothermund
Bernd Bodenmiller, Ian Frew, Andrea Jacobs, Lukas Pelkmans, Markus Hermann, Christian von Mering
Thomas Fuchs, Peter J. Schüffler
Holger Moch, Peter Schraml, Norbert Wey, Andre Wethmar, Monika Bieri, Aurelia Noske, Andre Fitsche, Chrissie Mittmann, Simone Brandt, Verena Tischler, Susanne Dettwiler, Martina Storz
Markus Manz, Stefan Balabanov
Matthias Guckenberger
Tullio Sulser
Roger Stupp, Alessandra Curioni, Thomas Winder, Christian Britschgi
Martin Matter, Emmanuel Eschmann
Andreas Wicki, Luigi Terracciano
Systems Pathology Order Sheet
Oncomine Focus Assay & Oncomine Knowledgebase Reporter Workflow
• Low Sample Input: FFPE material including fine needle aspirates, needle biopsy (10 ng)
• Prepare Library: Oncomine Focus Assay
• Prepare Template: Ion OneTouch 2 System, Ion OneTouch™ Select Template 200 Kit
• Run Sequence: Ion PGM System, Ion Select™ 318 Chip
• Analyze Data: Ion Reporter Software, Oncomine Knowledgebase Reporter
EGFR p.T790M in liquid biopsies
[Figure: plot with x-axis positions 1 to 22 and y-axis ticks from 0.00 to 14.00]
53 y/o male pt. with lung adenocarcinoma