Frontiers

Data Sheet 1_A machine-learning approach for pancreatic neoplasia classification based on plasma extracellular vesicles.pdf

dataset
posted on 2025-04-25, 04:07 authored by Ioanna Angelioudaki, Angeliki Iosif, Konstadina Kourou, Alexandros-Georgios Tzingounis, Vassiliki Kigka, Androniki-Maria Skreka, Myrto Costopoulos, Nikolaos Memos, Agapi Kataki, Manousos M. Konstadoulakis, Dimitrios I. Fotiadis
Introduction

Pancreatic cancer (PC) is a lethal disease developing from either exocrine or endocrine cells. Efforts to assist early diagnosis focus on liquid biopsy methods, and especially on the detection of extracellular vesicles (EVs) secreted by cancer cells into their microenvironment and accumulating in the systemic circulation. Multiple studies explore how EV size, surface biomarkers, or content determine their role in the recipient cell’s gene expression, metabolism, and behavior, and thereby affect cancer development. This study aimed to develop a machine learning (ML)-driven pipeline that uses clinical variables and EV-based features measured in patients’ plasma to predict the presence of pancreatic tumors of different nature (exocrine/endocrine), as compared with patients with benign lesions or age-matched non-oncological patients.

Methods

All available plasma samples (N=126) and variables were collected prior to surgery. EVs were detected and characterized by flow cytometry-immunostaining. Data including EV size and a unique set of biomarkers (CD45, CD63 and EphA2) were combined with hematological/biochemical data and processed under two use cases, each formulated as a 3-class classification problem for patient risk stratification. The first use case aimed to classify patients as having benign lesions, exocrine neoplasms, or endocrine neoplasms. The second use case aimed to distinguish patients with exocrine or endocrine neoplasms from non-oncological patients. Various ML methods were applied, including Logistic Regression, Random Forest, Support Vector Machines, and Extreme Gradient Boosting (XGBoost). Evaluation metrics, such as the area under the receiver operating characteristic curve (AUC-ROC), were computed, and Shapley values were used to determine the features with the greatest impact on the discrimination of outcome groups.
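As an illustration of how an AUC-ROC can be obtained for a 3-class problem such as the ones described above, the following minimal sketch computes a one-vs-rest, macro-averaged AUC from predicted class probabilities. The averaging scheme is an assumption (the abstract does not state which multi-class variant was used), the function names `binary_auc` and `macro_ovr_auc` are hypothetical, and the toy labels and probabilities are invented for illustration only; they are not study data.

```python
from typing import List, Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(probs: List[List[float]], y_true: List[int], n_classes: int) -> float:
    """One-vs-rest AUC per class, averaged with equal class weights."""
    aucs = []
    for k in range(n_classes):
        scores = [row[k] for row in probs]            # predicted P(class k)
        labels = [1 if y == k else 0 for y in y_true]  # class k vs. the rest
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / n_classes


# Hypothetical toy example (NOT study data).
# Assumed label coding: 0 = benign, 1 = exocrine, 2 = endocrine.
y_true = [0, 0, 1, 1, 2, 2]
probs = [
    [0.70, 0.20, 0.10],
    [0.50, 0.30, 0.20],
    [0.20, 0.60, 0.20],
    [0.40, 0.35, 0.25],
    [0.10, 0.30, 0.60],
    [0.30, 0.40, 0.30],
]

print(round(macro_ovr_auc(probs, y_true, n_classes=3), 3))  # 0.958
```

In this toy example the benign and endocrine classes are ranked perfectly (AUC 1.0 each), while one endocrine sample outranks an exocrine sample on the exocrine score (AUC 0.875), giving a macro average of about 0.958. The pairwise formulation is quadratic in the number of samples, which is acceptable for a cohort of this size.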

Results

Analyses identified hematological and biochemical features among the significant predictors. Models based on plasma EV subpopulations demonstrated substantial accuracy and AUC-ROC values: the Random Forest and XGBoost algorithms scored over 0.90 in accuracy, reaching 0.96 ± 0.03 in the first use case and 0.93 ± 0.04 in the second.

Discussion

By leveraging advanced ML-driven analytical approaches and integrating diverse data types, this study achieved substantial accuracy, assisting patient risk estimation and supporting the feasibility of early detection of pancreatic cancer. Going beyond currently used biomarkers such as CEA or CA19.9, EV-based features add value by offering increased diagnostic capacity.

    Frontiers in Oncology
