Machine learning classification of Plasmodium falciparum virulence genes using genomic differentiation scores and boosting algorithms

Authors

  • Hussein KHT, Department of Biology, College of Education for Pure Sciences, Tikrit University, Iraq

DOI:

https://doi.org/10.38029/babcockuniv.med.j.v9i1.1226

Keywords:

Plasmodium falciparum, Machine Learning, Virulence Genes, Genomic Differentiation, LightGBM, SHAP, Bioinformatics

Abstract

Objective: This study aims to identify virulence-associated genes in Plasmodium falciparum by applying machine learning models to genomic differentiation features, to aid in the discovery of novel therapeutic targets.

Methods: We used a dataset of 5,561 P. falciparum genes, each labelled by membership in known virulence gene families (VAR, RIF, EPF, RESA). Three genomic differentiation scores served as input features: Global Differentiation, Local Differentiation, and Distance to Higher Local Differentiation. We evaluated five classifiers: Random Forest, Gradient Boosting, Support Vector Machine, XGBoost, and LightGBM. To handle class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied strictly within stratified 5-fold cross-validation folds, alongside hyperparameter tuning. Performance was assessed using accuracy, precision, recall (sensitivity), F1-score, and Area Under the Precision-Recall Curve (AUC-PR).
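Applying SMOTE inside the cross-validation folds means that synthetic minority samples are generated from the training portion of each fold only, so the held-out genes stay untouched. The following is a minimal sketch of that idea in plain Python, not the authors' pipeline: the function names, the toy data, and the assumption that label 1 marks the minority (virulence) class are all hypothetical. In practice a library implementation such as imbalanced-learn's SMOTE combined with scikit-learn's StratifiedKFold would normally be used instead.

```python
import random

def stratified_folds(labels, k, rng):
    """Split sample indices into k folds, preserving the class ratio per fold."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def smote(minority, n_new, k_neighbors, rng):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist2(base, p))[:k_neighbors]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic

def cv_splits_with_smote(X, y, k=5, k_neighbors=5, seed=0):
    """Yield (X_train, y_train, test_idx) per fold.
    Only the training part is oversampled; test samples are never altered.
    Assumes label 1 is the minority class (hypothetical convention)."""
    rng = random.Random(seed)
    folds = stratified_folds(y, k, rng)
    for f in range(k):
        test_idx = folds[f]
        train_idx = [i for g in range(k) if g != f for i in folds[g]]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        minority = [x for x, lab in zip(X_tr, y_tr) if lab == 1]
        n_new = (len(y_tr) - len(minority)) - len(minority)
        if n_new > 0 and len(minority) > 1:
            new_points = smote(minority, n_new, k_neighbors, rng)
            X_tr = X_tr + new_points
            y_tr = y_tr + [1] * len(new_points)
        yield X_tr, y_tr, test_idx

# Toy data (invented for illustration): 90 negatives, 10 positives,
# three features standing in for the three differentiation scores.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(90)] + \
    [[data_rng.gauss(2, 1) for _ in range(3)] for _ in range(10)]
y = [0] * 90 + [1] * 10

for X_tr, y_tr, test_idx in cv_splits_with_smote(X, y):
    # Report held-out size and the class counts after oversampling.
    print(len(test_idx), sum(y_tr), len(y_tr) - sum(y_tr))
```

A classifier would then be fitted on each oversampled training set and scored on the corresponding untouched test indices. Oversampling before splitting would instead let synthetic copies of test-set genes leak into training and inflate the reported metrics.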

Results: LightGBM achieved the highest performance with a test accuracy of 85.14% ± 1.2% and an AUC-PR of 0.87 ± 0.02, significantly outperforming the next best model, XGBoost (p = 0.018). Feature importance analysis via SHAP (SHapley Additive exPlanations) identified the Local Differentiation score as the most predictive feature.
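SHAP attributes a model's prediction for one gene to its input features using Shapley values. With only three features, these values can be computed exactly by enumerating every feature subset, which makes the idea easy to illustrate. The sketch below is an assumption-laden toy, not the study's analysis: `toy_score` is a hypothetical scoring function with invented coefficients (they do not reflect the reported results), and "absent" features are replaced by a single baseline vector rather than by the tree-based estimation that SHAP applies to boosted models.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x.
    Features outside a coalition are set to their baseline value."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                members = set(coalition)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi

def toy_score(features):
    """Hypothetical model over (global, local, distance) scores.
    Coefficients are invented purely for illustration."""
    g, l, d = features
    return 0.2 * g + 0.6 * l - 0.1 * d + 0.3 * g * l

x = [1.0, 2.0, 0.5]          # one hypothetical gene
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = shapley_values(toy_score, x, baseline)

names = ["Global Differentiation", "Local Differentiation",
         "Distance to Higher Local Differentiation"]
for name, contribution in zip(names, phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the prediction for the gene and the prediction at the baseline, and the interaction term is shared equally between the two features involved. A global ranking like the one reported is commonly obtained by averaging the absolute per-gene contributions of each feature.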

Conclusion: Boosting algorithms, particularly LightGBM, are highly effective for classifying virulence genes based on genomic differentiation patterns. This approach provides a scalable, data-driven method for prioritising candidate virulence factors in P. falciparum for functional validation.

Published

2026-04-01

How to Cite

Hussein, K. (2026). Machine learning classification of Plasmodium falciparum virulence genes using genomic differentiation scores and boosting algorithms. Babcock University Medical Journal, 9(1), 230–235. https://doi.org/10.38029/babcockuniv.med.j.v9i1.1226

Section

Basic Medical Research