Research
My research sits at the intersection of computational genomics, statistical genetics, and clinical translation. I build end-to-end pipelines, develop and evaluate ML models, and apply statistical methods to large-scale biological and clinical datasets, with a focus on turning rigorous analytical work into clinically meaningful outputs. Full list of publications.
Long-Read Sequencing and Clinical Genomics
Datasets: ONT, PacBio, 1000 Genomes ONT Consortium, GIAB, UW clinical cohorts
Methods: Haplotype phasing, de novo assembly, variant calling, benchmarking, identity-by-descent detection, machine learning
- Developed PhaseQuality, a Random Forest-based confidence stratification system for phasing errors across ~140M variant pairs and two sequencing platforms
- Benchmarked phasing and de novo assembly algorithms to characterize failure modes in clinically relevant genomic loci
- Developed a carrier screening workflow for Spinal Muscular Atrophy using long-read de novo assemblies to resolve SMN1 copy number at the haplotype level
- Evaluated long-read phasing for identity-by-descent detection in familial cohorts
Publications:
- Damaraju N* et al. “Determinants of haplotype phasing accuracy in long-read human genome sequencing.” Under review. bioRxiv: 10.64898/2026.05.04.722832
- Gustafson JA*, Gibson SB*, Damaraju N* et al. “High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation.” Genome Research (2024).
- Paschal CR* et al. “Concordance of Whole-Genome Long-Read Sequencing with Standard Clinical Testing for Prader-Willi and Angelman Syndromes.” Journal of Molecular Diagnostics (2025).
Statistical Genetics and Predictive Modeling
Datasets: UK Biobank, Garbhini cohort (North India), population-scale genetic datasets
Methods: Polygenic risk scores, penalized regression, shrinkage methods, ML classification, neural networks
- Developed a penalized-regression-based shrinkage method, using UK Biobank data, to improve polygenic risk score prediction across traits spanning the heritability spectrum
- Developed and validated India-specific gestational age prediction models (Garbhini-GA1 and Garbhini-GA2), evaluating both classical ML and neural network architectures; the models were benchmarked against international dating standards and externally validated in an independent cohort
Publications:
- Gadekar VP*, Damaraju N* et al. “Development and external validation of Indian population-specific Garbhini-GA2 model for estimating gestational age.” The Lancet Regional Health – Southeast Asia (2024).
- Vijayram R*, Damaraju N* et al. “Comparison of first trimester dating methods for gestational age estimation and their implication on preterm birth classification in a North Indian cohort.” BMC Pregnancy and Childbirth (2021).
Machine Learning for Biological Discovery
Datasets: RNA-seq (multi-cohort), whole exome sequencing, transcriptomic data
Methods: Random forests, CNNs, meta-analysis, classifier evaluation, feature selection, generalized linear models
- Derived a diagnostic gene expression classifier for Systemic Lupus Erythematosus using a multi-cohort RNA-seq meta-analysis framework, optimizing sensitivity, specificity, and AUC with the goal of developing a clinical diagnostic device
- Built an AWS-deployed pipeline linking copy number variation to immunotherapy response across multiple tumor types using whole exome sequencing data
- Built generalized linear models to identify conserved associations between repeat elements and alternative splicing across primate genomes
- Designed a bioinformatics pipeline to quantify differential circRNA expression across EBV infection states in human B-cells
Translational and Real-World Data
Datasets: Clinical EHR records (Mozambique), pharmacovigilance cohorts, clinical genomics cohorts
Methods: Pharmacovigilance, statistical evaluation pipelines, health economics, meta-analysis
- Developed statistical evaluation pipelines for two international drug safety monitoring programs in Mozambique across HIV-positive and ART patient cohorts
- Conducting a meta-analysis and cost-effectiveness analysis of long-read sequencing as a first-line diagnostic test for rare genetic diseases
Publications:
- Mussá M, …, Damaraju N, Stergachis A. “Active safety monitoring of Isoniazid and Rifapentine (3HP) among ART patients in Mozambique.” ISoP Africa Chapter Meeting (2024).
- Wood K*, Damaraju N* et al. “Exposomics in Practice: Multidisciplinary Perspectives on Environmental Health and Risk Assessment.” Integrated Environmental Assessment and Management (2024).
- Damaraju N* et al. “Diagnostic yield of long-read sequencing as a first-line test for genetic diseases.” In preparation.
* denotes equal contribution