Himel Mallick, PhD, FASA

Principal Investigator

Cornell University


Himel is a Principal Investigator and Tenure-track Faculty at Cornell University’s Department of Population Health Sciences and an Adjunct Faculty of Statistics and Data Science at Bowers College of Computing and Information Science.

His group at Cornell develops computational methods, software, and data products to generate and validate testable hypotheses that accelerate data-driven discovery. Much of his research has focused on reverse translational efforts aiming to integrate vastly different kinds of biological data by leveraging a combination of machine learning, systems biology, and omics data science techniques to enable target identification and biomarker discovery across a range of indications.

A recipient of the IISA ECASDS award, Himel is a Fellow of the American Statistical Association (FASA) and an elected member of the International Statistical Institute (ISI).

Curriculum Vitae: CV, Resume | Researcher Profile: ResearchGate, WOS


  • AI/ML/Statistics
  • Bayesian Biostatistics
  • Computational Metagenomics
  • Statistical Bioinformatics
  • Omics Data Science


  • Postdoctoral Fellowship in Computational Biology and Bioinformatics, 2019

    Harvard University and Broad Institute

  • PhD in Biostatistics, 2015

    University of Alabama at Birmingham

  • MSc in Statistics, 2009

    Indian Institute of Technology Kanpur

  • BSc in Statistics, 2007

    University of Calcutta



Statistical computing for reproducible research


Exploratory data analysis and visualization


Version control for scientific workflows

Machine Learning

Development and deployment of ML models

Bayesian Inference

Scalable Bayes and uncertainty quantification

Applied Data Science

Analysis and interpretation of biological data



Principal Investigator

Cornell University

Apr 2023 – Present New York, NY
Building and managing a cross-disciplinary research group of computational scientists that applies techniques from AI/ML/Statistics, Systems Biology, and Omics Data Science to create reverse translational solutions for target identification and biomarker discovery, with the aim of advancing the understanding of complex biological processes, facilitating new treatments and cures, and improving public health outcomes

Senior Scientist/Associate Principal Scientist, Biostatistics

Merck Research Laboratories

Mar 2019 – Apr 2023 Rahway, NJ

Led multiple data science projects providing end-to-end bioinformatics and biostatistics support across all stages of biomarker discovery and development, including assay quality control, drafting statistical analysis plans (SAPs), and performing statistical analyses

Supported Merck’s internal efforts in the single-cell and spatial transcriptomics initiatives for translational oncology studies

Liaised with digital pathology scientists and bioinformaticians on crucial statistical analyses for both exploratory and clinical development purposes

Applied machine learning classification to the discovery of clinically actionable biomarkers from integrated multi-omics profiles to enable better disease outcome prediction and patient stratification

Secured funding to initiate and execute multiple academic collaborations to conduct research on various 5-year strategic initiatives specified by the organization

Hired and co-mentored four Ph.D. summer interns in developing and implementing sophisticated statistical and AI/ML approaches for enabling personalized medicine

Published research across several disease areas (infectious diseases, oncology, and microbiology) that utilized innovative statistical methodology and systems biology techniques spanning a wide range of translational applications


Postdoctoral Associate, Computational Biology and Bioinformatics

Harvard University and Broad Institute

Oct 2015 – Mar 2019 Cambridge, MA

Lead developer of MaAsLin 2, an R/Bioconductor package for associating microbial multi-omic data with arbitrarily complex clinical metadata (>10K official downloads)

Lead developer of MelonnPan, a computational method to predict metabolite profiles from metagenomic sequencing data using concepts from machine learning and ecology, implemented in R (>100 citations)

Contributed to grant writing, manuscript preparation, interdisciplinary collaborations, and teaching and mentoring of graduate and undergraduate students and trainees


Intern, Biostatistics

Mayo Clinic

May 2015 – Aug 2015 Rochester, MN
Developed Bayesian adaptive trial designs for clinical trials utilizing surrogate endpoints in the presence of biomarkers

Intern, Biostatistics

Novartis Pharmaceuticals Corporation

May 2014 – Aug 2014 East Hanover, NJ
Developed novel Bayesian methods to conduct heterogeneity of treatment effect (HTE) analyses in phase III clinical trials

Intern, Biostatistics

University of Arkansas for Medical Sciences

Jun 2013 – Aug 2013 Little Rock, AR
Developed methods for detecting maternal-fetal gene-gene interactions associated with obstructive heart defects in newborns from mother-offspring paired genetic data

Research Assistant, Biostatistics

University of Alabama at Birmingham

Aug 2010 – Apr 2015 Birmingham, AL

Developed Bayesian machine learning methods for high-dimensional feature selection in personalized medicine applications

Developed and validated risk prediction models for assessing short-term mortality in obese adults

Conducted high-dimensional predictive modeling of zero-inflated count phenotypes to identify genetic susceptibility markers in Rheumatoid Arthritis patients


Intern, Biostatistics

Indian Statistical Institute

May 2008 – Jul 2008 Kolkata, India
Performed research on non-linear statistical modeling of cross-sectional growth curve data using the Preece-Baines growth model

Recent Posts

A re-visit to the famous Jeff Leek commentary

In 2013, I came across this famous blog post by Jeff Leek. Little did I know that I would revisit this post 9 years down the line. As …

Yearender: Looking back at 2021

Every year a set of memes is circulated widely on social media, the 2022 version of which goes along the lines of: My goal for 2022 is …

Year-end reflection: My 2020 highlights

The last day of 2020 is finally here. To bring this decade to a close, I thought I would take a moment to share some of the highlights …

A paper that changed my life - The Bayesian LASSO

As tough as this year has been, it goes without saying that 2020 is a particularly good year to be thankful for science, which happens …

Student paper awards and travel grants - A resource for data science graduate students and postdocs

This post is motivated by the growing list of awesome public repositories that curate a list of resources dedicated to a specific topic …


  • 402 E 67th St, New York, NY 10065