Interpretable miRNA-based Prediction Model for Early Detection of Pancreatic Cancer: Development and Cross-platform Validation

Yanfei Zhu, Linglin Zhu, Yumei Liu, Yongshuo Ji, Junqiu Zhu, Hong Zhao

Abstract

Pancreatic cancer remains one of the most lethal malignancies, largely due to delayed diagnosis. Although microRNA (miRNA) biomarkers show promise, many previous studies lack cross-platform validation and model interpretability, limiting clinical applicability.

Introduction

Pancreatic cancer remains one of the most lethal malignancies worldwide, with a 5-year survival rate persistently below 10% and projections indicating that it will become the second leading cause of cancer-related mortality by 2030. This poor prognosis stems primarily from the lack of effective early detection strategies: approximately 85% of patients present with locally advanced or metastatic disease at diagnosis, when curative interventions are no longer feasible.

Materials and Methods

This study aimed to develop and externally validate a miRNA-based diagnostic prediction model for pancreatic cancer using a structured machine learning framework. Publicly available datasets were divided into independent training and validation cohorts. All feature selection and model development procedures were conducted exclusively within the training cohort to prevent information leakage.
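The leakage safeguard described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the scoring rule (absolute standardized mean difference), the helper names `score_features`, `select_top_k`, and `apply_selection`, and the toy expression values are all assumptions chosen for illustration. The point it shows is only that the feature ranking is computed from the training cohort, and the validation cohort is then reduced to the same columns without ever contributing to the ranking.

```python
from statistics import mean, pstdev


def score_features(X, y):
    """Score each feature by the absolute case/control mean difference,
    scaled by the feature's overall standard deviation (an assumed,
    illustrative criterion)."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        cases = [v for v, label in zip(column, y) if label == 1]
        controls = [v for v, label in zip(column, y) if label == 0]
        spread = pstdev(column) or 1.0  # constant feature: avoid dividing by zero
        scores.append(abs(mean(cases) - mean(controls)) / spread)
    return scores


def select_top_k(X_train, y_train, k):
    """Rank features using the TRAINING cohort only; return their column indices."""
    scores = score_features(X_train, y_train)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def apply_selection(X, selected):
    """Keep only the previously selected columns; no statistics are recomputed."""
    return [[row[j] for j in selected] for row in X]


# Toy expression matrices (rows = samples, columns = hypothetical miRNAs).
X_train = [
    [5.0, 1.0, 2.0],
    [5.2, 0.9, 2.1],
    [1.0, 1.1, 2.0],
    [1.1, 1.0, 1.9],
]
y_train = [1, 1, 0, 0]  # 1 = case, 0 = control
X_valid = [[4.8, 1.0, 2.0], [0.9, 1.2, 2.2]]

selected = select_top_k(X_train, y_train, k=1)
X_train_sel = apply_selection(X_train, selected)
X_valid_sel = apply_selection(X_valid, selected)  # validation labels are never used
```

In this toy example only the first column separates cases from controls, so it is the one retained. Because `select_top_k` never receives validation samples or labels, the validation cohort cannot influence which features are chosen.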

Discussion

In this study, we developed and validated an interpretable 20-miRNA signature for pancreatic cancer diagnosis using a Random Forest machine learning approach. The signature demonstrated reproducible diagnostic performance across independent validation cohorts (n = 767), achieving a cross-validation AUC of 0.87 and external validation AUCs ranging from 0.78 to 0.83 across independent datasets (TCGA-PAAD, GTEx, GSE59856). Explainable AI analysis via SHAP identified key contributing miRNAs, and pathway enrichment analysis suggested involvement in cancer hallmark processes, including cell proliferation, apoptosis evasion, and metabolic reprogramming.
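For readers less familiar with the AUC metric reported here, the following minimal sketch computes it from its pairwise definition: the probability that a randomly chosen case receives a higher model score than a randomly chosen control, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions for illustration only; they are not the study's data or code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of case and
    control scores (equivalent to the Mann-Whitney U statistic divided
    by the number of case/control pairs)."""
    cases = [s for s, label in zip(scores, labels) if label == 1]
    controls = [s for s, label in zip(scores, labels) if label == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")

    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # ties count as half a win
    return wins / (len(cases) * len(controls))


# Toy predicted probabilities (1 = case, 0 = control).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

In the toy data, eight of the nine case/control pairs are ranked correctly, so the function returns 8/9. The quadratic loop is adequate for cohorts of a few hundred samples; a rank-based formulation would scale better for larger ones.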

Citation: Zhu Y, Zhu L, Liu Y, Ji Y, Zhu J, Zhao H (2026) Interpretable miRNA-based prediction model for early detection of pancreatic cancer: Development and cross-platform validation. PLoS One 21(5): e0348699. https://doi.org/10.1371/journal.pone.0348699

Editor: Sharif Moradi, Royan Institute for Stem Cell Biology and Technology, Islamic Republic of Iran

Received: November 12, 2025; Accepted: April 20, 2026; Published: May 4, 2026.

Copyright: © 2026 Zhu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All data used in this study are publicly available from established repositories. The miRNA expression datasets GSE59856, GSE85589, and GSE128508 are available from the Gene Expression Omnibus (GEO) database:
GSE59856: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE59856
GSE85589: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85589
GSE128508: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE128508
Pancreatic cancer miRNA sequencing data from TCGA-PAAD were obtained through the UCSC Xena platform: https://xenabrowser.net/datapages/?dataset=TCGA.PAAD.sampleMap/miRNA_HiSeq_gene
Normal pancreatic tissue miRNA expression data from GTEx were accessed via UCSC Xena: https://xenabrowser.net/datapages/?dataset=gtex_RNAseq_gene_miRNA&host=gtex.xenahubs.net

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.