Development of a flexible bioinformatics platform for the analysis of mass spectrometry data
Sebastian Gibb
Institut für Medizinische Informatik, Statistik und Epidemiologie (IMISE), Universität Leipzig
Since December 2014: Klinik für Anästhesiologie, Universitätsmedizin Greifswald
22 July 2015
Sebastian Gibb, MALDIquant, 2015-07-22

Early detection of diseases

How MALDI/TOF MS works
[Figure: schematic of a MALDI/TOF mass spectrometer. A LASER ionizes the sample in the ion source; the ions are accelerated by the voltage U and travel the flight path (s) through the mass analyzer to the detector. Three ions with masses m3 > m2 > m1 are shown; the resulting spectrum plots intensity against m/z with peaks ordered m1 < m2 < m3.]
J. H. Gross. Mass Spectrometry: A Textbook. Springer, 2004. URL http://www.springer.com/chemistry/analytical+chemistry/book/978-3-642-10709-2

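The ordering m1 < m2 < m3 in the spectrum follows from energy conservation in an idealised linear TOF instrument: an ion of charge z·e accelerated through the voltage U gains z·e·U = ½·m·v², so its time over the flight path s is t = s·sqrt(m / (2·z·e·U)). The following is a minimal R sketch of that relation only. The function name `flightTime` and the default values for s and U are illustrative assumptions, not settings from the talk, and effects such as initial ion velocity or a reflectron are ignored.

```r
# Idealised linear TOF: z*e*U = 1/2*m*v^2  =>  t = s * sqrt(m / (2*z*e*U)).
# s (flight path in m) and U (acceleration voltage in V) are illustrative
# assumptions, not instrument settings from the slides.
flightTime <- function(mz, s = 1.0, U = 20000) {
  e <- 1.602176634e-19    # elementary charge in C
  u <- 1.66053906660e-27  # unified atomic mass unit in kg
  s * sqrt(mz * u / (2 * e * U))
}

mz <- c(1000, 4000, 9000)     # m/z values in Da per elementary charge
tof <- flightTime(mz)         # flight times in seconds
print(signif(tof * 1e6, 4))   # flight times in microseconds
```

Because t grows with the square root of m/z, a fourfold mass gives twice the flight time, which is why heavier ions reach the detector later.
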
*omics analysis
Technology-dependent:
[Figure, panels A–F: mass spectra over m/z 2000–10000 with detail views of m/z 4180–4240; labeled peaks at m/z 1206.796, 1466.027, 1545.929, 3262.745, 4209.948, 4644.373, 5336.871, 5904.68, 7766.411 and 9290.533.]
⇓
Technology-independent:
[Figure: clustering of the samples (control, cancer) and "The 10 Top Ranking Features", t-scores (centroid vs. pooled mean) for cancer and control.]

Transparent software
“The analysis and interpretation of the enormous volumes of proteomic data remains an unsolved challenge, ... Therefore, the development of transparent tools for the analysis of proteomic data ... is a key challenge. ... The development of such proteomics tools is still in its infancy.”
Ruedi Aebersold & Matthias Mann, Nature 2003
R. Aebersold and M. Mann. Mass spectrometry-based proteomics. Nature, 422:198–207, Mar 2003. URL http://dx.doi.org/10.1038/nature01511

Open source software/reproducibility
• Reproducibility.
• Transparency/documentation.
• Flexibility and extensibility.
• Innovation.
• Understanding of data and analysis.

Own contributions
• MALDIquant
  • State-of-the-art methods for the analysis of 2D-MS data.
  • Mapping of individual workflows.
  • Handling of spectra with different resolutions and of biological/technical replicates.
  • Extensive documentation and example analyses.
  • Automated tests.
• MALDIquantForeign
  • Import of 11 different file formats.
  • Export to 5 different file formats.
  • readBrukerFlexData and readMzXmlData.

Functions
MALDIquant is free software (GPLv3). Functions in MALDIquant 1.12:
• Intensity transformation.
• Intensity smoothing.
• Baseline correction.
• Peak detection.
• Warping/peak alignment.
• Peak binning.
• Peak filtering.
• Calibration.
• Multiple plotting methods.
• Peak labeling.
• Handling biological/technical replicates.
• Handling different resolutions.
• Merging mass spectra/peaks.
• Handling of MSI data.
• Fast (parallel support).
• Modular: easy to customize.

Application example: Fiedler et al. 2009
Serum Peptidome Profiling Revealed Platelet Factor 4 as a Potential Discriminating Peptide Associated with Pancreatic Cancer
G. M. Fiedler, A. B. Leichtle, J. Kase et al.
Clin Cancer Res June 1, 2009 15:3812-3819
“Two significant peaks (m/z 3884; 5959) achieved a sensitivity of 86.3% and a specificity of 97.6% for the discrimination of patients and healthy controls ...”
“MALDI-TOF MS-based serum peptidome profiling allowed the discovery and validation of platelet factor 4 [m/z 3884, 7767; S.G.] as a new discriminating marker in pancreatic cancer.”

Import of raw data
Workflow: Raw Data → Data Import → Smoothing → Baseline Correction → Intensity Calibration → Peak Detection → Peak Alignment → Peak Binning → Feature Matrix → Post Processing → Result
[Figure: raw spectrum Pankreas_HB_L_061019_G10.M19; x-axis: mass (2000–10000), y-axis: intensity (0 to 1e+05).]

Transformation of intensities
[Figure: spectrum Pankreas_HB_L_061019_G10.M19 after the intensity transformation; x-axis: mass (2000–10000), y-axis: intensity (0–300).]

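The slide does not state why the intensities are transformed. A common motivation, assumed here, is variance stabilisation: for count-like signals the variance grows with the mean, and a square-root transformation makes it roughly constant. The base-R sketch below illustrates this with simulated Poisson counts; the simulated data and the variable names are hypothetical and are not part of the analysis shown in the talk.

```r
# Simulated count data at a low and a high signal level (hypothetical data).
set.seed(1)
low  <- rpois(10000, lambda = 100)
high <- rpois(10000, lambda = 10000)

# Ratio of the variances at the two signal levels, before and after sqrt().
rawRatio  <- var(high) / var(low)              # variance grows with the mean
sqrtRatio <- var(sqrt(high)) / var(sqrt(low))  # variance is roughly constant
print(c(raw = rawRatio, sqrt = sqrtRatio))
```

On the raw scale the high-intensity region is far noisier than the low-intensity region; after the square root both regions have comparable variance, so later steps can treat them alike.
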
Baseline correction
[Figure: spectrum with the baseline estimated by SNIP; x-axis: mass (2000–10000), y-axis: intensity.]
[Figure: spectrum Pankreas_HB_L_061019_G10.M19 after baseline correction; x-axis: mass (2000–10000), y-axis: intensity.]

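The idea behind a SNIP-style baseline estimate can be sketched in a few lines of base R: for a shrinking distance p, every point is clipped to the mean of its two neighbours at distance p whenever that mean is lower, so narrow peaks are cut away while the slowly varying background remains. This is a simplified sketch under stated assumptions, not the MALDIquant implementation; the function name `snipBaseline`, the number of iterations and the synthetic spectrum are hypothetical.

```r
# Simplified SNIP: for distances p = iterations, ..., 1 clip every point to
# the mean of its neighbours at distance p whenever that mean is lower.
snipBaseline <- function(y, iterations = 20) {
  n <- length(y)
  for (p in seq(iterations, 1)) {
    if (2 * p >= n) next               # window wider than the signal
    i <- (p + 1):(n - p)               # points with both neighbours available
    y[i] <- pmin(y[i], (y[i - p] + y[i + p]) / 2)
  }
  y
}

# Synthetic spectrum (hypothetical): decaying baseline plus one narrow peak.
x <- 1:500
trueBaseline <- 100 * exp(-x / 200)
y <- trueBaseline + 80 * exp(-0.5 * ((x - 250) / 3)^2)

estimated <- snipBaseline(y, iterations = 20)
corrected <- y - estimated             # baseline-corrected intensities
```

The number of iterations controls the widest structure that is treated as a peak: it should exceed the half-width of the broadest peak, otherwise part of that peak ends up in the baseline.
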
Calibration of intensities
[Figure: two mass ranges before and after intensity calibration; x-axis: mass, y-axis: intensity. m/z 5200–5400: Δ = 16 % before, Δ = 6 % after. m/z 5800–6000: Δ = 23 % before, Δ = 3 % after.]

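A total ion current (TIC) style calibration can be sketched as dividing each spectrum by its summed intensity, so that spectra of the same sample recorded with different overall signal become comparable. The base-R sketch below is a simplified illustration; `calibrateTic` and the toy intensities are hypothetical, and the package may define the TIC differently (for example as an area over the m/z axis).

```r
# Scale a spectrum so that its intensities sum to 1 (simplified TIC scaling).
calibrateTic <- function(intensity) intensity / sum(intensity)

shape <- c(1, 4, 10, 4, 1)   # hypothetical peak shape
a <- 100 * shape             # one measurement of the sample
b <- 140 * shape             # same sample, 40 % more total signal

rawDifference <- max(abs(a - b) / b)   # relative difference before scaling
calibratedDifference <- max(abs(calibrateTic(a) - calibrateTic(b)))
print(c(raw = rawDifference, calibrated = calibratedDifference))
```

In this toy case the two measurements differ only by a global factor, so the difference vanishes completely; in real spectra a residual difference remains, as the Δ values in the figure indicate.
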
Identification of features
[Figure: peak detection in spectrum Pankreas_HB_L_061019_D9.G18, m/z 3800–4800; x-axis: mass, y-axis: intensity. Legend: accepted maxima, rejected maxima, noise threshold. Labeled peaks: m/z 3883.66, 4092.263, 4210.747, 4467.658 and 4645.361.]

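The legend (accepted maxima, rejected maxima, noise threshold) suggests a two-step rule: find local maxima in a sliding window, then keep only those above a noise threshold. The base-R sketch below illustrates such a rule under stated assumptions: the noise level is a single MAD estimate for the whole spectrum, and the function name `detectLocalMaxima`, the window size, the signal-to-noise ratio and the synthetic spectrum are all hypothetical choices, not the settings used in the talk.

```r
# Local maxima in a sliding window, split by a noise threshold.
detectLocalMaxima <- function(intensity, halfWindowSize = 5, SNR = 3) {
  n <- length(intensity)
  threshold <- SNR * mad(intensity)   # robust, spectrum-wide noise estimate
  isMax <- vapply(seq_len(n), function(i) {
    window <- intensity[max(1, i - halfWindowSize):min(n, i + halfWindowSize)]
    intensity[i] == max(window)       # highest point of its own window
  }, logical(1))
  list(accepted = which(isMax & intensity > threshold),
       rejected = which(isMax & intensity <= threshold),
       threshold = threshold)
}

# Synthetic spectrum (hypothetical): a bounded, deterministic ripple standing
# in for noise, plus two peaks at positions 100 and 200.
x <- 1:300
ripple <- 0.5 + 0.5 * sin(1.7 * x)
intensity <- ripple +
  30 * exp(-0.5 * ((x - 100) / 3)^2) +
  20 * exp(-0.5 * ((x - 200) / 3)^2)

peaks <- detectLocalMaxima(intensity, halfWindowSize = 5, SNR = 3)
print(peaks$accepted)
```

A larger window suppresses shoulder peaks, and a larger signal-to-noise ratio rejects more small maxima; both are tuning parameters that depend on the resolution of the instrument.
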
Calibration of m/z values
[Figure: unwarped spectra in the ranges m/z 4180–4240 and m/z 9200–9400; x-axis: mass, y-axis: intensity.]
[Figure: four panels showing the mass difference of each sample to the reference; x-axis: mass (2000–8000), y-axis: difference (−6 to 6). Sample 1 vs reference (matched peaks: 94/94), sample 2 vs reference (matched peaks: 92/94), sample 3 vs reference (matched peaks: 90/94), sample 4 vs reference (matched peaks: 94/94).]
[Figure: the same two mass ranges, unwarped (top) and after warping with MALDIquant (bottom); x-axis: mass, y-axis: intensity.]

Calibration of m/z values
[Figure: detected peaks in Pankreas_HB_L_061019_G10.M20 in the ranges m/z 4180–4240 and m/z 9200–9400; x-axis: mass, y-axis: intensity.
Top row, individual peak positions in four spectra: 4193.201 and 4209.934; 4194.806 and 4210.076; 4193.367 and 4210.094; 4192.13 and 4209.903; in the upper range 9290.961, 9291.706, 9291.051 and 9290.646.
Bottom row, identical positions in all four spectra: 4193.764 and 4209.985; in the upper range 9290.51.]

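The step from slightly different peak positions to identical ones can be sketched as a binning rule: sort all peak masses, start a new bin wherever the relative gap to the previous mass exceeds a tolerance, and replace every mass by the mean of its bin. The base-R sketch below is a simplified variant under stated assumptions, not the binning algorithm of the package; `binMasses` and the tolerance are hypothetical. The input values are the eight individual positions around m/z 4193 and 4210 from the figure.

```r
# Gap-based binning: masses closer than `tolerance` (relative) share a bin
# and are replaced by the bin mean.
binMasses <- function(mass, tolerance = 0.002) {
  o <- order(mass)
  m <- mass[o]
  # a new bin starts wherever the relative gap to the previous mass is large
  newBin <- c(TRUE, diff(m) / m[-length(m)] > tolerance)
  binned <- ave(m, cumsum(newBin))   # bin mean for every sorted mass
  binned[order(o)]                   # back to the input order
}

mass <- c(4193.201, 4194.806, 4193.367, 4192.13,
          4209.934, 4210.076, 4210.094, 4209.903)
binned <- binMasses(mass, tolerance = 0.002)
print(unique(binned))
```

The bin means computed here use only these four spectra, so they differ slightly from the positions in the figure, which presumably come from the complete data set.
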
Feature matrix
         1011.95683040433   1020.93287697588   ...
HC056    0.0001662288       0.0010911416       ...
HC001    0.0001865509       0.0007400781       ...
HC002    0.0001690267       0.0006584539       ...
HC008    0.0001516142       0.0002370617       ...
HC011    0.0001514336       0.0004122820       ...
HC033    0.0001457631       0.0006348643       ...
HC049    0.0001615118       0.0004522258       ...
HC050    0.0001637069       0.0011229157       ...
HC054    0.0002076213       0.0005488659       ...
HC055    0.0001860750       0.0005867420       ...
HC122    0.0001261214       0.0005200141       ...
HC057    0.0001386024       0.0003449330       ...
...      ...                ...                ...

Workflow in R

library("MALDIquant")
library("MALDIquantForeign")

# Import the raw spectra.
spectra <- import("fiedler2009spectra.tar.gz")

# Preprocessing: transformation, smoothing, baseline correction, calibration.
spectra <- transformIntensity(spectra, method="sqrt")
spectra <- smoothIntensity(spectra, method="SavitzkyGolay")
spectra <- removeBaseline(spectra, method="SNIP")
spectra <- calibrateIntensity(spectra, method="TIC")

# Peak detection.
peaks <- detectPeaks(spectra)

# Alignment (warping) of spectra and peaks, followed by peak binning.
warpingFunctions <- determineWarpingFunctions(peaks)
spectra <- warpMassSpectra(spectra, warpingFunctions)
peaks <- warpMassPeaks(peaks, warpingFunctions)
peaks <- binPeaks(peaks)

# Feature matrix: one row per spectrum, one column per binned peak.
featureMatrix <- intensityMatrix(peaks, spectra)

Results/classification
[Figure: "The 10 Top Ranking Features", t-scores (centroid vs. pooled mean) for cancer and control, scale −5 to 5. Features (mz): 1866.16591692443, 5945.5697657874, 2022.94475790442, 5906.17351903972, 5864.49053296298, 8989.20382965523, 4494.80267780907, 8868.2678310697, 4468.06600951353, 8936.97236585095.]
M. Ahdesmäki and K. Strimmer. Feature selection in omics prediction problems using cat scores and false nondiscovery rate control. The Annals of Applied Statistics, 4(1):503–519, Mar 2010. doi: 10.1214/09-AOAS277. URL http://dx.doi.org/10.1214/09-AOAS277

Results/comparison of spectra
[Figure: two spectra over m/z 2000–10000; x-axis: mass, y-axis: intensity (0 to 0.004). Top: control, Pankreas_HB_L_061019_B4. Bottom: cancer, Pankreas_HB_L_061019_D5. Labeled peaks in both panels: m/z 3884.03, 4468.07, 5906.17, 7768.08 and 8936.97.]

Results/clustering
[Figure: clustering of the samples; legend: control, cancer. Sample order: HC002, HC050, HC055, HC062, HC033, HC064, HC001, HC049, HC011, HC118, HP424, HC066, HC120, HC057, HP419, HC008, HC054, HC122, HC067, HC056, HC059, HC119, HP120, HP410, HP208, HP321, HP438, HP413, HP416, HP151, HP161, HP393, HP150, HP402, HP417, HP262, HP425, HP121, HP212, HP429.]

Results/biological relevance
• Complement C3 (CO3_HUMAN)
• Pancreatic Progenitor Cell Differentiation and Proliferation Factor-Like Protein (PDPFL_HUMAN)
Complete analysis at: http://strimmerlab.org/software/maldiquant/
[Figure: two panels, control and cancer, for m/z 8750–9000; x-axis: mass, y-axis: intensity (0 to 0.0020). Legend: 1st Quantile, Median, 3rd Quantile.]

Publication
MALDIquant: a versatile R package for the analysis of mass spectrometry data
S. Gibb, K. Strimmer - Bioinformatics, 2012
> 50 publications: antibiotic resistance of bacteria, species identification (bacteria, insects), MSI, profiling/early detection of diseases, activation of the immune system, determination of drug levels, ...
Software: MSnbase, MSI.R, Mass-Up
Availability: CRAN, RforProteomics, MASSyPup, Debian, Ubuntu

Summary
MALDIquant
• Free, transparent software for 2D-MS data.
• Reproducible analyses.
• Extensive documentation and example analyses.
• Flexible range of applications.
• Wide variety of uses.
Contact: mail@sebastiangibb.de
MALDIquant software: http://strimmerlab.org/software/maldiquant/

Acknowledgements
G. M. Fiedler and A. B. Leichtle: discussions, example data (Universitätsinstitut für Klinische Chemie, Universitätsspital Bern)
Katrin Uhlmann and Ralph Feltens: discussions, example data (Department Proteomik, Helmholtz-Zentrum für Umweltforschung (UFZ) Leipzig)
Korbinian Strimmer: supervision of the dissertation (Epidemiology and Biostatistics, School of Public Health, Imperial College London)

Mass Spectrometry Imaging
[Figure: mass spectrometry image with the m/z ranges 3364.079 ± 0.5 and 3510.382 ± 0.5.]
This dataset was kindly provided by Dr. Adrien Nyakas (adrien.nyakas@dcb.unibe.ch; http://dx.doi.org/10.6084/m9.figshare.735961).