A comprehensive evaluation of ensemble learning methods and decision trees for predicting trauma patient discharge status using real-world data

Document Type: Original Article

Authors

1 Department of Health Information Management and Technology, Kashan University of Medical Sciences, Kashan, Iran & Research Centre for Health Information Management, Kashan University of Medical Sciences, Kashan, Iran

2 Medical Informatics Department, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran

3 Trauma Research Center, Kashan University of Medical Sciences, Kashan, Iran

Abstract

Background: Trauma registries collect and document data on acute injury care in hospitals. The goal of trauma care systems is to reduce the occurrence of injury and improve the survival of trauma patients.
Objectives: This study used data from the Kashan trauma registry to predict the discharge status of trauma patients with machine learning.
Methods: After preprocessing, 3930 records from the Kashan Trauma Centre Registry were analysed. Decision trees of varying complexity were built and evaluated using three splitting criteria: information gain, Gini index, and gain ratio. Bagging, boosting, and stacking ensembles were then built on decision trees of varying depths and splitting criteria. Predictive performance was evaluated using metrics including accuracy, precision, recall, and the area under the receiver operating characteristic curve (AUC). The ensemble techniques were compared with decision trees configured under various parameter settings to assess how well each predicts trauma patients' discharge status.
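
The three splitting criteria named in the Methods can be illustrated with a small worked example. The sketch below is not the authors' code: it uses the textbook definitions of entropy, Gini impurity, and gain ratio, and the toy labels ("recovered", "died") and the candidate split are hypothetical values chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_scores(parent, children):
    """Score one candidate split under the three criteria."""
    n = len(parent)
    weights = [len(child) / n for child in children]
    # Information gain: parent entropy minus weighted child entropy.
    info_gain = entropy(parent) - sum(
        w * entropy(child) for w, child in zip(weights, children)
    )
    # Gini criterion: decrease in Gini impurity produced by the split.
    gini_decrease = gini(parent) - sum(
        w * gini(child) for w, child in zip(weights, children)
    )
    # Gain ratio: information gain normalised by the split information.
    split_info = -sum(w * log2(w) for w in weights if w > 0)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    return {
        "information_gain": info_gain,
        "gini_decrease": gini_decrease,
        "gain_ratio": gain_ratio,
    }


# Hypothetical toy node: 8 "recovered" and 2 "died" records.
parent = ["recovered"] * 8 + ["died"] * 2
# Hypothetical binary split of that node into two child nodes.
left = ["recovered"] * 6
right = ["recovered"] * 2 + ["died"] * 2

scores = split_scores(parent, [left, right])
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

For this toy split the information gain is about 0.322 bits, the Gini decrease is 0.120, and the gain ratio is about 0.332. A tree builder would compute such scores for every candidate split and keep the best one under the chosen criterion.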
Results: The stacking model demonstrated superior predictive performance compared with individual decision trees, bagging, and boosting. Its base level combined decision trees (depth = 5) built with information gain, gain ratio, and the Gini index with a k-nearest neighbours classifier (KNN, k = 12, Euclidean distance), and logistic regression served as the meta-classifier.
Conclusion: Although decision trees are straightforward algorithms and ensemble methods are more time-consuming and computationally complex, this study indicates that stacking outperforms bagging, boosting, and single decision trees configured with a variety of parameters.
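
The stacking configuration described in the Results can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the study's implementation: the abstract does not name the software used, the data here are synthetic (generated with `make_classification`, not registry records), the target is assumed to be binary, and the 70/30 split, class weights, and `cv=5` are arbitrary choices. scikit-learn's `DecisionTreeClassifier` offers no gain ratio criterion, so only the entropy (information gain) and Gini trees are included.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): same row count as the abstract,
# but the features and the binary outcome are artificial.
X, y = make_classification(
    n_samples=3930,
    n_features=20,
    n_informative=8,
    weights=[0.85, 0.15],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Base level: depth-5 trees with two splitting criteria, plus KNN (k=12).
# A gain ratio tree is omitted because scikit-learn does not provide one.
base_learners = [
    ("tree_entropy", DecisionTreeClassifier(
        criterion="entropy", max_depth=5, random_state=0)),
    ("tree_gini", DecisionTreeClassifier(
        criterion="gini", max_depth=5, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=12, metric="euclidean")),
]

# Meta level: logistic regression trained on cross-validated
# predictions of the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

pred = stack.predict(X_test)
proba = stack.predict_proba(X_test)[:, 1]

# The four evaluation metrics listed in the Methods.
metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "precision": precision_score(y_test, pred),
    "recall": recall_score(y_test, pred),
    "auc": roc_auc_score(y_test, proba),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The numbers printed here describe the synthetic data only and say nothing about the registry results. Fitting each base learner alone on the same split and comparing the same four metrics would reproduce the kind of comparison the abstract reports.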

