Artificial Intelligence and Machine Learning in Single-Cell Multiomics: Emerging Applications

 

Artificial intelligence (AI) and machine learning (ML) are rapidly reshaping the field of single-cell multiomics, with new tools enabling faster, more accurate, and more scalable biological insight. Recent advances extend far beyond basic clustering and dimensionality reduction and deliver real impact in multimodal integration, data harmonization, trajectory inference, and disease modeling.

 

Here are a few standout use cases highlighting the next frontier of AI and ML in single-cell biology: 

 

 

1. The Virtual Cell Challenge (Arc Institute)
Benchmarking ML prediction of perturbation effects in single cells

Led by the Arc Institute, this open challenge tasks participants with building models to predict cellular responses to genetic perturbations. The dataset includes CRISPR-induced knockdowns across hundreds of genes, with matched transcriptomic profiles. The goal is to train AI systems to infer complex cellular behavior from sparse or partial input signals, thus laying the groundwork for “virtual cell” modeling.1

 

 

2. Transfer Learning for Cell Type Annotation
Reuse knowledge from public atlases to label new datasets 

Tools like scArches use transfer learning to map new single-cell datasets onto large reference atlases. This reduces manual labeling and improves consistency across studies. Particularly useful for clinical or low-sample-size contexts, these models fine-tune on small datasets while preserving broader biological structure.2
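The label-transfer idea can be illustrated with a toy sketch. This is not the scArches API. It assumes the query cells have already been embedded in the same latent space as the reference (the step a transfer-learning model would learn) and simply assigns each query cell the label of the nearest reference centroid. All coordinates, labels, and function names are hypothetical.

```python
from collections import defaultdict
from math import dist


def centroids(embeddings, labels):
    """Compute the mean embedding for each cell-type label in the reference."""
    groups = defaultdict(list)
    for vec, lab in zip(embeddings, labels):
        groups[lab].append(vec)
    return {
        lab: tuple(sum(col) / len(vecs) for col in zip(*vecs))
        for lab, vecs in groups.items()
    }


def transfer_labels(query, ref_centroids):
    """Assign each query cell the label of its closest reference centroid."""
    return [
        min(ref_centroids, key=lambda lab: dist(q, ref_centroids[lab]))
        for q in query
    ]


# Hypothetical 2-D latent coordinates for a labeled reference atlas.
ref_embeddings = [(0.0, 0.1), (0.2, -0.1), (5.0, 5.2), (4.8, 4.9)]
ref_labels = ["T cell", "T cell", "B cell", "B cell"]

# Hypothetical unlabeled query cells, assumed to be mapped into the same space.
query_embeddings = [(0.3, 0.2), (4.5, 5.0)]

predicted = transfer_labels(query_embeddings, centroids(ref_embeddings, ref_labels))
print(predicted)
```

In practice the hard part is learning an embedding in which reference and query cells are comparable despite batch effects. The nearest-centroid step here only stands in for the final annotation.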

 

 

3. Multimodal Data Integration
Learning joint representations across RNA, ATAC, and protein 

Probabilistic generative models like totalVI and MultiVI learn a common latent space across modalities: totalVI jointly models RNA and cell surface protein measurements, while MultiVI integrates transcriptomic and chromatin accessibility data. These tools are essential for projects using CITE-seq or other multiomic assays, enabling improved clustering, imputation, and downstream inference.3,4

 

 

4. Semi-Supervised Cell Type Inference
Bridging labeled and unlabeled data using models like scANVI

scANVI (single-cell Annotation using Variational Inference) is an ML model designed to analyze single-cell RNA sequencing data. It integrates labeled and unlabeled data, allowing researchers to perform cell type annotation even when only a subset of the cells has known labels. This makes it especially useful when curated labels are limited or when exploring new tissues and disease states. Its scalability and adaptability also suit complex scenarios, such as hierarchical cell type structures or datasets with varying compositions.5,6

 

 

5. Trajectory and Perturbation Modeling
Using ML to uncover lineage relationships or gene regulatory effects

Emerging tools like DeepVelo and scGen allow for dynamic modeling of time-course or perturbation-based data. DeepVelo uses neural networks to infer RNA velocity-informed trajectories, while scGen simulates perturbation responses in silico, which is useful for understanding gene function and treatment effects.7,8
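In silico perturbation prediction of the kind associated with scGen is often explained as vector arithmetic in a learned latent space: estimate the average shift between control and perturbed cells in one condition, then apply that shift to control cells from another. The sketch below shows only that arithmetic. The encoder and decoder are omitted, the latent vectors are made up, and the function names are hypothetical rather than taken from the scGen package.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def perturbation_shift(control, perturbed):
    """Difference between the mean perturbed and mean control latent vectors."""
    mean_ctrl = mean_vector(control)
    mean_pert = mean_vector(perturbed)
    return [p - c for p, c in zip(mean_pert, mean_ctrl)]


def apply_shift(cells, shift):
    """Move each latent vector by the estimated perturbation shift."""
    return [[x + d for x, d in zip(cell, shift)] for cell in cells]


# Hypothetical latent vectors for cell type A, observed in both conditions.
control_a = [[1.0, 2.0], [3.0, 2.0]]
perturbed_a = [[2.0, 5.0], [4.0, 5.0]]

# Hypothetical latent vectors for cell type B, observed only as controls.
control_b = [[0.0, 0.0], [1.0, 1.0]]

shift = perturbation_shift(control_a, perturbed_a)
predicted_b = apply_shift(control_b, shift)
print(shift, predicted_b)
```

A real model would decode the shifted latent vectors back into gene expression space. The sketch stops at the latent representation.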

 

 

6. Spatial Transcriptomics Meets Computer Vision
ML models combine gene expression and tissue architecture

Methods such as Tangram align single-cell transcriptomes with spatially resolved gene expression maps and can relate the result to histological images. This intersection of ML and computer vision places biological signals in their tissue context, opening new avenues in developmental biology, neuroscience, and pathology.9

 

 

7. TranscriptFormer
Generative Model for Cellular Diversity Exploration Across Species

TranscriptFormer, by the Chan Zuckerberg Initiative, is a family of transformer-based models trained on up to 112 million cells spanning 1.53 billion years of evolutionary history. To train the models, a corpus of scRNA-seq cell atlases from 8 animal species, a fungus, and a protist was compiled and integrated with the CZ CELLxGENE (CxG) database, which contains harmonized scRNA-seq data from human and mouse, for a total of 12 species.10

 

 

8. scGPT
Using Genes as Tokens

scGPT is a foundation model trained on over 33 million human cells, treating genes as tokens analogous to words in language models. After pretraining, scGPT can be fine-tuned for tasks such as batch integration, multiomic alignment, cell type annotation, perturbation effect prediction, and inference of gene regulatory networks.11
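A simplified sketch can make the "genes as tokens" analogy concrete. Each expressed gene becomes a token id, paired with a discretized expression value, and cells are padded to a fixed length much like sentences. The vocabulary, bin edges, and padding scheme below are assumptions chosen for illustration and are not scGPT's actual preprocessing.

```python
def build_vocab(genes):
    """Map each gene symbol to an integer id; id 0 is reserved for padding."""
    return {gene: i for i, gene in enumerate(sorted(genes), start=1)}


def bin_value(count, edges):
    """Return a 1-based bin index: the first bin whose upper edge is >= count."""
    for i, edge in enumerate(edges, start=1):
        if count <= edge:
            return i
    return len(edges) + 1


def tokenize_cell(counts, vocab, edges, max_len):
    """Turn one cell's counts into (gene id, value bin) pairs padded to max_len."""
    pairs = sorted(
        (vocab[gene], bin_value(count, edges))
        for gene, count in counts.items()
        if count > 0 and gene in vocab
    )
    pairs = pairs[:max_len]
    return pairs + [(0, 0)] * (max_len - len(pairs))


# Hypothetical four-gene vocabulary and raw counts for a single cell.
vocab = build_vocab(["CD3E", "CD19", "MS4A1", "GNLY"])
cell_counts = {"CD3E": 12, "GNLY": 1, "CD19": 0, "MS4A1": 3}

# Hypothetical bin edges: counts <= 1, <= 5, <= 20, and anything larger.
tokens = tokenize_cell(cell_counts, vocab, edges=[1, 5, 20], max_len=5)
print(tokens)
```

Unexpressed genes are dropped here, so the sequence length depends on how many genes a cell expresses. Padding to a fixed length is one common way to batch such sequences.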

 

 

9. scBaseCamp
Benchmarking Platform for Machine Learning in Single-Cell Biology

Developed by the Arc Institute, scBaseCamp is an open-source platform for benchmarking machine learning methods in single-cell biology. It provides curated datasets, pre-defined tasks, and standardized evaluation metrics, helping the community compare model performance in areas like cell type prediction, perturbation response, and modality alignment. By lowering the barrier for rigorous ML experimentation, scBaseCamp supports transparency, reproducibility, and accelerated progress in this space. It complements efforts like the Virtual Cell Challenge and reinforces the growing role of community-wide benchmarking in AI for biology.12

 

Final Thoughts

As experimental platforms grow increasingly complex, capturing multiple omic layers across space and time, AI will become essential for interpretation. The field is shifting from tools that merely describe cellular heterogeneity to models that predict, simulate, and hypothesize. 


BD Biosciences offers end-to-end tools for single-cell multiomics, from sample preservation, cell capture, and library prep to data analysis.

References

  1. Roohani YH, Hua TJ, Tung PY, Bounds LR, Yu FB, Dobin A, Teyssier N, Adduri A, Woodrow A, Plosky BS, Mehta R, Hsu B, Sullivan J, Ricci-Tam C, Li N, Kazaks J, Gilbert LA, Konermann S, Hsu PD, Goodarzi H, Burke DP. Virtual Cell Challenge: Toward a Turing test for the virtual cell. Cell. 2025; Jun 26;188(13):3370-3374. doi: 10.1016/j.cell.2025.06.008.
  2. Lotfollahi, M., Naghipourfar, M., Luecken, M.D. et al. Mapping single-cell data to reference atlases by transfer learning. Nat Biotechnol. 2022;40, 121–130. https://doi.org/10.1038/s41587-021-01001-7
  3. Gayoso, A., Steier, Z., Lopez, R. et al. Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nat Methods. 2021; 18, 272–282. https://doi.org/10.1038/s41592-020-01050-x
  4. Ashuach, T., Gabitto, M.I., Koodli, R.V. et al. MultiVI: deep generative model for the integration of multimodal data. Nat Methods. 2023; 20, 1222–1231. https://doi.org/10.1038/s41592-023-01909-9
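  5. Gayoso A, Lopez R, Xing G, Boyeau P, Valiollah Pour Amiri V, et al. A Python library for probabilistic analysis of single-cell omics data. Nat Biotechnol. 2022; Feb 07. doi: 10.1038/s41587-021-01206-w
  6. Virshup I, Bredikhin D, Heumos L, Palla G, Sturm G, et al. The scverse project provides a computational ecosystem for single-cell omics data analysis. Nat Biotechnol. 2023; Apr 10. doi: 10.1038/s41587-023-01733-8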
  7. Cui, H., Maan, H., Vladoiu, M.C. et al. DeepVelo: deep learning extends RNA velocity to multi-lineage systems with cell-specific kinetics. Genome Biol 25, 27 (2024). https://doi.org/10.1186/s13059-023-03148-9
  8. Lotfollahi, M., Wolf, F.A. & Theis, F.J. scGen predicts single-cell perturbation responses. Nat Methods. 2019; 16, 715–721. https://doi.org/10.1038/s41592-019-0494-8
  9. Biancalani, T., Scalia, G., Buffoni, L. et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat Methods. 2021; 18, 1352–1362. https://doi.org/10.1038/s41592-021-01264-7
  10. Pearce JD, Simmonds SE, Mahmoudabadi G, Krishnan L, Palla G, et al. A cross-species generative cell atlas across 1.5 billion years of evolution: the TranscriptFormer single-cell model. bioRxiv 2025.04.25.650731. doi: https://doi.org/10.1101/2025.04.25.650731
  11. Cui, H., Wang, C., Maan, H. et al. scGPT: Toward building a foundation model for single-cell multi-omics using generative AI. Nat Methods. 2024; 21, 1470–1480. https://doi.org/10.1038/s41592-024-02201-0
  12. Youngblut ND, Carpenter C, Prashar J, Ricci-Tam C, Ilango R, et al. scBaseCamp: an AI agent-curated, uniformly processed, and continually expanding single cell data repository. bioRxiv 2025.02.27.640494. doi: https://doi.org/10.1101/2025.02.27.640494