--- Navigating the multidimensional universe of systems biology ---
I am a final-year PhD student in Systems Biology at Columbia University, advised by Raul Rabadan and David Knowles. My research focuses on developing statistical and computational methods to understand the complexities of human diseases from large-scale sequencing data. I am particularly interested in RNA biology and the applications of single-cell and spatial omics technologies. My most recent work (SPLISOSM, 2025) has uncovered a hidden layer of tissue regulation through spatially variable RNA processing in the brain and glioblastoma. I also work on kernel-based statistical machine learning and subspace learning problems.
Previously, I received my Bachelor’s degrees in Biology and Mathematics from Peking University, where I worked with Cheng Li on methods for single-cell data to study aging. I have also interned at the University of Chicago and Harvard Medical School and briefly worked as a bioinformatics engineer at a precision medicine startup during the pandemic.
Some other research directions I am excited about include:
Spatiotemporal tissue dynamics: The interaction between cancer cells and surrounding tissue affects immune response and treatment. Spatial multi-omics across samples and time may help unravel this complex ecosystem.
RNA in cancer therapy: Splicing dysregulation generates oncogenic isoforms and targetable neoantigens. Identifying these can open new therapeutic possibilities.
Aging: Aging involves molecular changes that can lead to cancer. Studying the link between RNA biology, aging, and cancer offers insights into prevention and treatment.
Transcript diversity, including splicing and alternative 3′ end usage, is crucial for cellular identity and adaptation, yet its spatial coordination remains poorly understood. Here, we present SPLISOSM (SpatiaL ISOform Statistical Modeling), a computational framework for detecting isoform-resolution patterns from spatial transcriptomics data. SPLISOSM leverages multivariate testing to account for spot- and isoform-level dependencies, demonstrating robust and theoretically grounded performance on sparse data. In the mouse brain, we identify over 1,000 spatially variable transcript diversity events, primarily in synaptic signaling pathways linked to neuropsychiatric disorders, and uncover both known and novel regulatory relationships with region-specific RNA binding proteins. We further show that these patterns are evolutionarily conserved between mouse and human prefrontal cortex. Analysis of human glioblastoma highlights pervasive transcript diversity in antigen presentation and adhesion genes associated with specific microenvironmental conditions. Together, we present a comprehensive spatial splicing analysis in the brain under normal and neoplastic conditions.
Disentangling Interpretable Factors with Supervised Independent Subspace Principal Component Analysis
Jiayu Su†, David A. Knowles†, and Raul Rabadan†
In Advances in Neural Information Processing Systems, 2024
The success of machine learning models relies heavily on effectively representing high-dimensional data. However, ensuring data representations capture human-understandable concepts remains difficult, often requiring the incorporation of prior knowledge and decomposition of data into multiple subspaces. Traditional linear methods fall short in modeling more than one space, while more expressive deep learning approaches lack interpretability. Here, we introduce Supervised Independent Subspace Principal Component Analysis (sisPCA), a PCA extension designed for multi-subspace learning. Leveraging the Hilbert-Schmidt Independence Criterion (HSIC), sisPCA incorporates supervision and simultaneously ensures subspace disentanglement. We demonstrate sisPCA’s connections with autoencoders and regularized linear regression and showcase its ability to identify and separate hidden data structures through extensive applications, including breast cancer diagnosis from image features, learning aging-associated DNA methylation changes, and single-cell analysis of malaria infection. Our results reveal distinct functional pathways associated with malaria colonization, underscoring the essentiality of explainable representation in high-dimensional data analysis.
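As a rough illustration of the dependence measure named in the abstract above, here is a minimal sketch of the standard biased empirical HSIC estimator, tr(KHLH)/(n-1)^2, with linear kernels. This is not the sisPCA implementation: the function names `center` and `hsic_linear`, the choice of linear kernels, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def center(K):
    """Double-center a kernel matrix: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def hsic_linear(X, Y):
    """Biased empirical HSIC between samples X (n x p) and Y (n x q).

    Uses linear kernels K = X X^T and L = Y Y^T (an illustrative choice).
    With linear kernels this equals the squared Frobenius norm of the
    centered cross-product X_c^T Y_c, divided by (n - 1)^2.
    """
    n = X.shape[0]
    K = X @ X.T
    L = Y @ Y.T
    return np.trace(center(K) @ center(L)) / (n - 1) ** 2

# Toy usage on synthetic data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y_dep = X + 0.1 * rng.normal(size=(200, 3))   # linearly dependent on X
Y_ind = rng.normal(size=(200, 3))             # drawn independently of X
hsic_dep = hsic_linear(X, Y_dep)
hsic_ind = hsic_linear(X, Y_ind)
```

In a multi-subspace setting, a quantity of this kind could in principle serve both roles the abstract describes: rewarded between a subspace and its supervision target, and penalized between pairs of subspaces to encourage disentanglement.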
Smoother: a unified and modular framework for incorporating structural dependency in spatial omics data
Jiayu Su†, Jean-Baptiste Reynier, Xi Fu, and 8 more authors
Spatial omics technologies can help identify spatially organized biological processes, but existing computational approaches often overlook structural dependencies in the data. Here, we introduce Smoother, a unified framework that integrates positional information into non-spatial models via modular priors and losses. In simulated and real datasets, Smoother enables accurate data imputation, cell-type deconvolution, and dimensionality reduction with remarkable efficiency. In colorectal cancer, Smoother-guided deconvolution reveals plasma cell and fibroblast subtype localizations linked to tumor microenvironment restructuring. Additionally, joint modeling of spatial and single-cell human prostate data with Smoother allows for spatial mapping of reference populations with significantly reduced ambiguity.
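To give a flavor of what a spatial loss added to a non-spatial model can look like, the sketch below builds a k-nearest-neighbor graph from spot coordinates and computes a graph-Laplacian smoothness penalty on per-spot values. This is a generic, textbook-style regularizer written under my own assumptions; it is not the Smoother API, and `knn_laplacian`, `spatial_penalty`, and the parameter `k` are hypothetical names.

```python
import numpy as np

def knn_laplacian(coords, k=4):
    """Unnormalized graph Laplacian L = D - W of a symmetrized kNN graph."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    # Pairwise squared Euclidean distances between spots.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # exclude self-neighbors
    nbrs = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)  # make the neighbor relation symmetric
    return np.diag(W.sum(axis=1)) - W

def spatial_penalty(Z, L):
    """tr(Z^T L Z): sum over graph edges of ||z_i - z_j||^2."""
    return float(np.trace(Z.T @ L @ Z))

# Toy usage: the penalty would be added to a model's own loss with a weight.
rng = np.random.default_rng(0)
coords = rng.uniform(size=(50, 2))   # hypothetical spot positions
Z = rng.normal(size=(50, 5))         # hypothetical per-spot embeddings
L = knn_laplacian(coords, k=4)
penalty = spatial_penalty(Z, L)
```

The penalty is zero when all spots share the same values and grows as neighboring spots disagree, which is one simple way positional information can be attached to an otherwise non-spatial objective.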
A transcriptome-based single-cell biological age model and resource for tissue-specific aging measures
Shulin Mao*, Jiayu Su*, Longteng Wang, and 3 more authors
Accurately measuring biological age is crucial for improving healthcare for the elderly population. However, the complexity of aging biology poses challenges in how to robustly estimate aging and interpret the biological significance of the traits used for estimation. Here we present SCALE, a statistical pipeline that quantifies biological aging in different tissues using explainable features learned from literature and single-cell transcriptomic data. Applying SCALE to the “Mouse Aging Cell Atlas” (Tabula Muris Senis) data, we identified tissue-level transcriptomic aging programs for more than 20 murine tissues and created a multitissue resource of mouse quantitative aging-associated genes. We observe that SCALE correlates well with other age indicators, such as the accumulation of somatic mutations, and can distinguish subtle differences in aging even in cells of the same chronological age. We further compared SCALE with other transcriptomic and methylation “clocks” in data from aging muscle stem cells, Alzheimer’s disease, and heterochronic parabiosis. Our results confirm that SCALE is more generalizable and reliable in assessing biological aging in aging-related diseases and rejuvenating interventions. Overall, SCALE represents a valuable advancement in our ability to measure aging accurately, robustly, and interpretably in single cells.
Single-cell transcriptome profiling reveals neutrophil heterogeneity in homeostasis and infection
Xuemei Xie*, Qiang Shi*, Peng Wu, and 14 more authors
The full neutrophil heterogeneity and differentiation landscape remains incompletely characterized. Here, we profiled >25,000 differentiating and mature mouse neutrophils using single-cell RNA sequencing to provide a comprehensive transcriptional landscape of neutrophil maturation, function and fate decision in their steady state and during bacterial infection. Eight neutrophil populations were defined by distinct molecular signatures. The three mature peripheral blood neutrophil subsets arise from distinct maturing bone marrow neutrophil subsets. Driven by both known and uncharacterized transcription factors, neutrophils gradually acquire microbicidal capability as they traverse the transcriptional landscape, representing an evolved mechanism for fine-tuned regulation of an effective but balanced neutrophil response. Bacterial infection reprograms the genetic architecture of neutrophil populations, alters dynamic transitions between subpopulations and primes neutrophils for augmented functionality without affecting overall heterogeneity. In summary, these data establish a reference model and general framework for studying neutrophil-related disease mechanisms, biomarkers and therapeutic targets at single-cell resolution.
I try to avoid social media as much as possible, so email is always the best way to reach me: {uni} at cumc.columbia.edu; uni = js5756. If you don't hear back within a few days (possibly due to aggressive spam rules), you can reach me on LinkedIn.