Developing an Accurate Predictive Model for Early Detection of Breast Cancer: An Analysis of Decision Tree Algorithms and Hyperparameter Optimization with SMOTE
Abstract
Breast cancer is a global health problem for which early detection improves patient survival. A major challenge in building diagnostic systems is that medical data are often imbalanced, with cancer cases frequently forming the minority class. This study aims to develop an accurate predictive model for early breast cancer detection by analyzing the performance of decision tree algorithms and optimizing their key hyperparameters, while addressing class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE). Experiments were conducted on a breast cancer dataset by varying whether oversampling was applied, the number of neighbors, the number of trees, the split criterion, the maximum tree depth, the pruning strategy, the decision-making strategy, and parallel execution. Model performance was evaluated with several metrics, including accuracy, the Kappa value, and the ability to detect the minority class. The results show that applying oversampling significantly improves the identification of cancer cases. The optimal configuration, which combines a particular split criterion with a larger number of trees, yielded consistent diagnostic performance. Tuning tree depth and pruning was crucial to avoiding model misfit, and parallel execution sped up computation. The resulting model achieved an accuracy of 81.20% and a Kappa value of 0.622. This study underscores the importance of hyperparameter optimization and imbalanced-data handling for improving the accuracy of early breast cancer detection, supporting the development of more reliable diagnostic tools.
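The abstract does not give implementation details, so the following is only a minimal sketch, in plain Python, of two ideas it relies on: generating synthetic minority samples by interpolating toward nearest minority neighbors (the core step of SMOTE), and computing accuracy, Cohen's Kappa, and minority-class recall. The function names `smote_oversample` and `evaluate`, the default `k=5`, and the toy data are illustrative assumptions, not the study's actual pipeline or settings.

```python
import math
import random


def smote_oversample(minority, k=5, n_new=None, seed=0):
    """Create synthetic minority samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbors.
    Assumes `minority` holds at least two numeric feature tuples."""
    rng = random.Random(seed)
    n_new = len(minority) if n_new is None else n_new
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbors of `base` inside the minority class (excluding itself)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, Cohen's kappa, recall on the positive class)."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected by chance, from the marginal label frequencies
    expected = sum(
        (y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels
    )
    # Kappa is undefined when expected == 1; that case is not handled here
    kappa = (accuracy - expected) / (1 - expected)
    true_pos = sum(
        t == positive and p == positive for t, p in zip(y_true, y_pred)
    )
    recall = true_pos / y_true.count(positive)
    return accuracy, kappa, recall


# Toy usage with made-up data (not from the study)
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_oversample(minority, k=2, n_new=6)

y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
accuracy, kappa, recall = evaluate(y_true, y_pred)
```

Because every synthetic point lies on a segment between two real minority samples, oversampling of this kind adds variety without leaving the region the minority class already occupies. Reporting Kappa and minority recall alongside accuracy matters on imbalanced data, where a model can score high accuracy while missing most positive cases.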