Building Real-Time Proteome Simulations with Mass Spectrometry, Live-Seq, Raman Imaging, and Machine Learning

Dara O’Boyle · science

Table of Figures

  1. Figure 1. Database for Real-Time Proteomics.
  2. Figure 2. Mass Spectrometry for Proteomics.
  3. Figure 3. Spectra and Protein Inference.
  4. Figure 4. SCoPE2 Protocol.
  5. Figure 5. Secondary Omic Domains.

List of Abbreviations

Abbreviation    Meaning
MS              Mass Spectrometry
HPLC / LC       High-Performance Liquid Chromatography / Liquid Chromatography
HCD             Higher-energy Collisional Dissociation
ESI             Electrospray Ionisation
TIMS            Trapped-Ion Mobility Spectrometry
FAIMS           High-Field Asymmetric Waveform Ion Mobility Spectrometry
CCS             Collision Cross Section
TOF             Time-of-Flight
MS1 / MS2       First / second stage MS run
SCoPE2          Single-Cell Proteomics by Mass Spectrometry 2
TMT-pro         Tandem Mass Tag (18-plex)
AQUA            Absolute QUAntification heavy peptide standard
MARQUIS         Multiplex Absolute Re-QUantification Using Internal Standards
iBAQ            Intensity-Based Absolute Quantification
Bayesian iBAQ   iBAQ weighted with Bayesian priors
RNA-velocity    Spliced/unspliced RNA ratios used to predict future states
Slingshot       Trajectory inference (pseudotime)
t-SNE           t-distributed Stochastic Neighbour Embedding
d::pPop         Deep-learning model for peptide detectability
DeepMass        Deep-learning model for ionisation efficiency
PASEF           Parallel Accumulation–Serial Fragmentation
Live-Seq        Force-microscopy cytoplasmic sampling for scRNA-seq
DropMap         Droplet assay for secreted proteins
MEFISTO         Method for the Functional Integration of Spatial and Temporal Omics
ODE             Ordinary Differential Equation
CRL             Causal Representation Learning
PTM             Post-Translational Modification
GSEA            Gene-Set Enrichment Analysis

Summary

Goal: To develop in-silico single-cell proteomic simulations to facilitate virtual drug testing and hypothesis generation.

Problem: No technology currently measures thousands of protein copy numbers in the same cell at multiple time-points, yet such data are needed to train a real-time simulator.

Solution: A modified SCoPE2 single-cell mass-spectrometry protocol paired with Live-Seq and Raman imaging. Machine learning is split into (1) a translational model converting Raman/Live-Seq to proteome snapshots, and (2) a dynamics model predicting proteome evolution through time.

Real-Time Proteomic Simulation

With a database comprising thousands of quantitative protein measurements at the single-cell level across timepoints, it may be possible to use machine learning to predict how a cell’s proteome evolves (Fig. 1). …

Brief Description of Mass Spectrometry for Proteomics

First, the sample is homogenised … peptides are separated by HPLC, ionised by ESI, analysed by MS1/MS2, etc. (Fig. 2, Fig. 3).

Figure 1. Database for Real-Time Proteomics. Schematic of a multi-omic database needed for real-time proteomic simulation.
Figure 2. Mass Spectrometry for Proteomics. Annotated MS setup: HPLC, ESI, quadrupole, collision cell, TIMS/FAIMS, TOF/Orbitrap.
Figure 3. Spectra and Protein Inference. Chromatogram, MS1, MS2 and protein inference workflow (SEQUEST) diagrams.
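
To make the MS1 step more concrete, the sketch below computes the mass-to-charge ratio at which an unmodified peptide would be expected after ESI adds one or more protons. It uses standard monoisotopic residue masses; the function name `peptide_mz` and the example sequence are illustrative choices, and PTMs, isotopes and labels such as TMT are deliberately left out.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added for the free N- and C-termini
PROTON = 1.007276   # mass of the proton carried by each positive charge


def peptide_mz(sequence: str, charge: int) -> float:
    """Return the monoisotopic m/z of an unmodified peptide ion [M + zH]z+."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge


# The same peptide appears at different m/z values depending on its charge.
mz_1 = peptide_mz("PEPTIDE", 1)
mz_2 = peptide_mz("PEPTIDE", 2)
print(f"1+: {mz_1:.4f}  2+: {mz_2:.4f}")
```

Because one peptide can appear at several charge states, search engines consider each precursor's charge when matching MS2 spectra back to sequences.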

Using Mass Spectrometry to Generate the Database

For the required database, SCoPE2 currently offers quantification of ~1,000 proteins across thousands of cells (Fig. 4). …

Figure 4. SCoPE2 Protocol. SCoPE2 workflow: TMT-pro multiplexing with carrier channel, global reference, and single cells.
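
As a rough illustration of how the channels in one multiplexed set relate to each other, the sketch below divides each single-cell reporter-ion intensity by the reference channel of the same set, which is what makes peptides comparable across sets. The channel names, the intensities and the function `normalise_to_reference` are hypothetical, and the published SCoPE2 pipeline applies further filtering and normalisation steps that are not shown here.

```python
from typing import Dict, Optional


def normalise_to_reference(
    reporter: Dict[str, float],
    reference_channel: str = "reference",
    carrier_channel: str = "carrier",
) -> Dict[str, Optional[float]]:
    """Express single-cell reporter intensities relative to the reference.

    Channel names are illustrative assumptions, not SCoPE2 identifiers.
    The carrier channel is skipped: it aids identification, not quantification.
    """
    ref = reporter[reference_channel]
    if ref <= 0:
        raise ValueError("reference channel must have a positive intensity")
    ratios: Dict[str, Optional[float]] = {}
    for channel, intensity in reporter.items():
        if channel in (reference_channel, carrier_channel):
            continue
        # A zero reading is recorded as missing rather than as true absence.
        ratios[channel] = intensity / ref if intensity > 0 else None
    return ratios


# Made-up reporter-ion intensities for one peptide in one TMT-pro set.
peptide = {
    "carrier": 2.0e6,
    "reference": 5.0e4,
    "cell_1": 1.0e4,
    "cell_2": 0.0,
    "cell_3": 2.5e4,
}
relative = normalise_to_reference(peptide)
print(relative)
```

Keeping missing values as `None` instead of zero matters later on, because dropout in single-cell data is not evidence that a protein is absent.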

Ionisation Bias

Most MS proteomics is sample-to-sample relative … AQUA vs label-free approaches, proteomic ruler, DeepMass, MARQUIS ladder, d::pPop detectability, instrument choices, and a mixed strategy for absolute quantification. …
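
Two of the label-free ideas named above can be written down in a few lines. The sketch below is a minimal version under simplifying assumptions: iBAQ is taken as summed peptide intensity divided by the number of theoretically observable peptides, and the proteomic ruler scales a protein's signal against the total histone signal and the DNA mass per cell. All numeric inputs, including the 6.5 pg DNA mass, are illustrative values and not measurements from this project.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def ibaq(peptide_intensities, n_theoretical_peptides):
    """iBAQ: summed peptide intensity per theoretically observable peptide."""
    if n_theoretical_peptides <= 0:
        raise ValueError("need at least one theoretically observable peptide")
    return sum(peptide_intensities) / n_theoretical_peptides


def ruler_copies_per_cell(protein_signal, histone_signal, dna_mass_pg, mol_weight_da):
    """Proteomic ruler: convert relative MS signal into copies per cell.

    Assumes total histone mass per cell roughly equals DNA mass per cell,
    so the histone signal acts as an internal mass standard.
    """
    if histone_signal <= 0 or mol_weight_da <= 0:
        raise ValueError("histone signal and molecular weight must be positive")
    protein_mass_g = (protein_signal / histone_signal) * dna_mass_pg * 1e-12
    return protein_mass_g * AVOGADRO / mol_weight_da


# Illustrative numbers only.
ibaq_value = ibaq([2e5, 3e5, 5e5], n_theoretical_peptides=10)
copies = ruler_copies_per_cell(
    protein_signal=1e6,     # summed MS intensity of the protein of interest
    histone_signal=1e8,     # summed MS intensity of all histones
    dna_mass_pg=6.5,        # assumed DNA mass per cell, in picograms
    mol_weight_da=50_000,   # molecular weight of the protein, in daltons
)
print(f"iBAQ = {ibaq_value:.0f}, about {copies:.2e} copies per cell")
```

Neither estimate corrects for peptide-specific ionisation efficiency, which is the gap the detectability and ionisation models listed above are meant to address.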

Redundant Proteins

Bottom-up MS leaves protein groups ambiguous; adding targeted top-down analysis of a portion of the carrier material (~10%) could resolve isoforms and PTMs and estimate their proportions. …
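
One simple way to use such proportions is sketched below: a protein group's bottom-up quantity is split across proteoforms according to their relative top-down signal in the carrier. This rests on a strong assumption, namely that each single cell shares the carrier's proteoform proportions. The function name `apportion_group` and the numbers are hypothetical.

```python
from typing import Dict


def apportion_group(group_copies: float, proteoform_signal: Dict[str, float]) -> Dict[str, float]:
    """Split a protein-group quantity by top-down proteoform proportions.

    Assumes the proportions measured in carrier material also hold for the
    single cell being quantified (an assumption, not an established result).
    """
    total = sum(proteoform_signal.values())
    if total <= 0:
        raise ValueError("top-down signals must sum to a positive value")
    return {
        name: group_copies * signal / total
        for name, signal in proteoform_signal.items()
    }


# Illustrative values: a group quantified at 12,000 copies by bottom-up MS,
# with two isoforms seen at a 3:1 signal ratio in the top-down carrier run.
split = apportion_group(12_000, {"isoform_A": 3.0, "isoform_B": 1.0})
print(split)
```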

Sample Destruction

Proteomic MS destroys the cells it measures, so longitudinal signals must come from a hybrid secondary domain: Live-Seq anchor points plus continuous Raman imaging, with DropMap as an option (Fig. 5). …

Figure 5. Secondary Omic Domains. Panels: Live-Seq concept and output; DropMap secretome time-series; Raman 3-D spectral cube.

Applying Machine Learning to Generate a Real-Time Simulation

Use a shared-encoder translational model (Raman + Live-Seq → latent Z → proteome P), then a dynamics model (P_t → P_{t+Δ}) augmented with CRL and mechanistic priors (turnover ODEs, stoichiometry). Validate on synthetic data first (sparsity, dropout, ionisation bias). …
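
A synthetic benchmark of the kind mentioned above could be generated roughly as follows. The sketch assumes the simplest turnover model, dP/dt = k_syn − k_deg·P with constant rates, integrates it with forward Euler steps, and then corrupts each snapshot with a per-protein multiplicative bias and random dropout. The rate constants, the log-normal bias and the 30% dropout rate are invented for illustration; no learned model is involved.

```python
import math
import random


def turnover_step(p, k_syn, k_deg, dt):
    """One forward-Euler step of dP/dt = k_syn - k_deg * P for each protein."""
    return [pi + dt * (ks - kd * pi) for pi, ks, kd in zip(p, k_syn, k_deg)]


def simulate(p0, k_syn, k_deg, dt, n_steps):
    """Return the ground-truth trajectory as a list of proteome snapshots."""
    trajectory = [list(p0)]
    for _ in range(n_steps):
        trajectory.append(turnover_step(trajectory[-1], k_syn, k_deg, dt))
    return trajectory


def corrupt(snapshot, bias, dropout_prob, rng):
    """Apply per-protein ionisation bias, then drop values at random.

    Missing values are returned as None rather than zero.
    """
    return [
        None if rng.random() < dropout_prob else b * p
        for p, b in zip(snapshot, bias)
    ]


rng = random.Random(0)           # fixed seed so the benchmark is reproducible
k_syn = [100.0, 20.0, 5.0]       # synthesis rates, copies per hour (made up)
k_deg = [0.1, 0.05, 0.5]         # degradation rates, per hour (made up)
p0 = [0.0, 1000.0, 10.0]         # initial copy numbers (made up)

truth = simulate(p0, k_syn, k_deg, dt=0.1, n_steps=1000)     # 100 simulated hours
steady_state = [ks / kd for ks, kd in zip(k_syn, k_deg)]     # analytic limit k_syn / k_deg

bias = [math.exp(rng.gauss(0.0, 0.5)) for _ in p0]           # log-normal ionisation bias
observed = [corrupt(snapshot, bias, 0.3, rng) for snapshot in truth]

print("final truth:", [round(x, 1) for x in truth[-1]])
print("steady state:", steady_state)
```

Because the ground truth and the steady state k_syn / k_deg are known exactly, a dynamics model trained on `observed` can be scored against `truth`. The same ODE term can also serve as a mechanistic prior that penalises predictions violating turnover kinetics.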

Equipment

  • Mass Spec: UCD Conway Proteomics Core (Orbitrap Exploris 480; timsTOF Pro). Orbitrap Eclipse Astral available at University of Birmingham (UK).
  • Raman: UCD Spectral Imaging Research Group (Renishaw inVia).
  • Live-Seq: FluidFM setup (would need local installation).
  • Compute: ICHEC GPU partitions for SFI-funded projects; dedicated storage for Raman cubes.

Figure References

  • Figure 2: A [7]; B [8]; C [9]; D [10]; E [11]; F [12]; G [13]
  • Figure 3: A [14]; B [15]; C [15]; D [14]
  • Figure 4: [16]
  • Figure 5: A [17]; B [17]; C [18]; D [18]; E [19]

References

  1. Kettenbach AN, Rush J, Gerber SA. Nat Protoc. 2011.
  2. Zimmer D et al. Front Plant Sci. 2018.
  3. Kobayashi-Kirschvink KJ et al. Nat Biotechnol. 2024.
  4. Pavillon N, Smith NI. Sci Rep. 2023.
  5. UCD Conway Proteomics Core. Online.
  6. UCD Spectral Imaging Research Group. Online.
  7. Sinha A, Mann M. The Biochemist. 2020.
  8. Murray K. Online.
  9. Thomas RJ. Online.
  10. Bruker Daltonics. 2018. Online.
  11. Kumar M. 2021. Online.
  12. Griffiths RL et al. Anal Chem. 2020.
  13. Q Exactive Hybrid Q-Orbitrap. Online.
  14. Shuken SR. J Proteome Res. 2023.
  15. Johnson PE et al. J AOAC. 2011.
  16. Specht H et al. Genome Biol. 2021.
  17. Chen W et al. Nature. 2022.
  18. Bounab Y et al. Nat Protoc. 2020.
  19. Bovenkamp D et al. Molecules. 2019.