Research

My research sits at the intersection of computational genomics, statistical genetics, and clinical translation. I build end-to-end pipelines, develop and evaluate ML models, and apply statistical methods to large-scale biological and clinical datasets, with a focus on turning rigorous analytical work into clinically meaningful outputs. Full list of publications.


Long-Read Sequencing and Clinical Genomics

Datasets: ONT, PacBio, 1000 Genomes ONT Consortium, GIAB, UW clinical cohorts
Methods: Haplotype phasing, de novo assembly, variant calling, benchmarking, identity-by-descent detection, machine learning

  • Developed PhaseQuality, a Random Forest-based confidence stratification system for phasing errors across ~140M variant pairs and two sequencing platforms
  • Benchmarked phasing and de novo assembly algorithms to characterize failure modes in clinically relevant genomic loci
  • Developed a carrier screening workflow for Spinal Muscular Atrophy using long-read de novo assemblies to resolve SMN1 copy number at the haplotype level
  • Evaluated long-read phasing for identity-by-descent detection in familial cohorts

Publications:

  • Damaraju N* et al. “Determinants of haplotype phasing accuracy in long-read human genome sequencing.” Under review. bioRxiv: 10.64898/2026.05.04.722832
  • Gustafson JA*, Gibson SB*, Damaraju N* et al. “High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation.” Genome Research (2024).
  • Paschal CR* et al. “Concordance of Whole-Genome Long-Read Sequencing with Standard Clinical Testing for Prader-Willi and Angelman Syndromes.” Journal of Molecular Diagnostics (2025).

Statistical Genetics and Predictive Modeling

Datasets: UK Biobank, Garbhini cohort (North India), population-scale genetic datasets
Methods: Polygenic risk scores, penalized regression, shrinkage methods, ML classification, neural networks

  • Developed a penalized regression-based shrinkage method to improve polygenic risk score prediction across traits spanning the heritability spectrum using UK Biobank data
  • Developed and validated India-specific gestational age prediction models (Garbhini-GA1 and GA2), evaluating classical ML and neural network architectures; the models were benchmarked against international dating standards and externally validated in an independent cohort

Publications:

  • Gadekar VP*, Damaraju N* et al. “Development and external validation of Indian population-specific Garbhini-GA2 model for estimating gestational age.” The Lancet Regional Health – Southeast Asia (2024).
  • Vijayram R*, Damaraju N* et al. “Comparison of first trimester dating methods for gestational age estimation and their implication on preterm birth classification in a North Indian cohort.” BMC Pregnancy and Childbirth (2021).

Machine Learning for Biological Discovery

Datasets: RNA-seq (multi-cohort), whole exome sequencing, transcriptomic data
Methods: Random forests, CNNs, meta-analysis, classifier evaluation, feature selection, generalized linear models

  • Derived a diagnostic gene expression classifier for Systemic Lupus Erythematosus using a multi-cohort RNA-seq meta-analysis framework, optimizing for sensitivity, specificity, and AUC toward a clinical diagnostic device
  • Built an AWS-deployed pipeline linking copy number variation to immunotherapy response across multiple tumor types using whole exome sequencing data
  • Built generalized linear models to identify conserved associations between repeat elements and alternative splicing across primate genomes
  • Designed a bioinformatics pipeline to quantify differential circRNA expression across EBV infection states in human B-cells

Translational and Real-World Data

Datasets: Electronic health records (Mozambique), pharmacovigilance cohorts, clinical genomics cohorts
Methods: Pharmacovigilance, statistical evaluation pipelines, health economics, meta-analysis

  • Developed statistical evaluation pipelines for two international drug safety monitoring programs in Mozambique across HIV-positive and ART patient cohorts
  • Conducting a meta-analysis and cost-effectiveness analysis of long-read sequencing as a first-line diagnostic test for rare genetic diseases

Publications:

  • Mussá M, …, Damaraju N, Stergachis A. “Active safety monitoring of Isoniazid and Rifapentine (3HP) among ART patients in Mozambique.” ISoP Africa Chapter Meeting (2024).
  • Wood K*, Damaraju N* et al. “Exposomics in Practice: Multidisciplinary Perspectives on Environmental Health and Risk Assessment.” Integrated Environmental Assessment and Management (2024).
  • Damaraju N* et al. “Diagnostic yield of long-read sequencing as a first-line test for genetic diseases.” In preparation.

* denotes equal contribution