Genomic Data Machined: The Random Forest Algorithm for Discovering Breast Cancer Biomarkers
Year: 2023
Published in: Springer

Advanced data analysis tools and bioinformatics are essential for uncovering the nature of breast cancer, which is the leading cause of cancer death among women. The goal of this study is to identify potential genomic biomarkers that have a significant impact on four prognostic factors: tumour size, lymph node involvement, metastasis, and overall survival status. The Random Forest algorithm has been trained on data from The Cancer Genome Atlas Breast Cancer dataset, which contains the expression values of 19,737 genes. In order to obtain the optimal learning model, the process has been repeated 20 times for each indicator, and only the genes with a p-value < 0.05 were taken into further consideration. Several performance metrics (e.g., F1 score) were calculated to check the algorithm's reliability. As a result, 97 and 7 genes were included in the extended and final databases, respectively. The chosen genes have been proven to play a critical role in cancer-related pathways, such as Toll-like receptor and NF-κB, and have effects on cell proliferation, tumour formation, and angiogenesis. Thus, this study demonstrates the potential of machine learning analyses for biomedical purposes and provides machine-generated insights into breast cancer development, setting the groundwork for further in vitro examinations to validate the prognostic potential of these biomarkers.
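
The abstract names the F1 score as one of the metrics used to check the classifier's reliability. As a minimal sketch of how that metric is computed for a binary prognostic label, the snippet below derives it from true positives, false positives, and false negatives. The function name `f1_score` and the label vectors are hypothetical illustrations, not data or code from the study, and the study may have used a library implementation or a multi-class variant instead.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is reported as 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (assumption): 1 = event observed, 0 = no event.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

With these example labels there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75 and the F1 score is 0.75. In a repeated-training setup like the one described, such a score could be computed on held-out samples for each of the 20 runs and then compared across runs.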