Abstract

Heart diseases (HDs) are among the most dangerous diseases affecting humans and can lead to death. Failure to detect heart problems in their early stages is a major factor in the increasing number of deaths caused by heart disease (HD). According to statistics from the World Health Organization (WHO), approximately 31% of all deaths worldwide are due to HDs. Many researchers have analyzed the factors that contribute to the incidence of HDs and used them to build machine learning (ML) models for the early prediction of patients with heart problems. However, these models suffer from several problems, including a lack of data and reliance on small datasets during training, and they need to be improved to produce accurate results.

To develop reliable models for the early prediction of HD patients and help reduce the number of deaths from this disease, our work is divided into two steps. The first step analyzes the factors that contribute to HDs and uses them to build ML models for the early prediction of patients with heart problems. The second step improves these models, which suffer from problems such as high data dimensionality and reliance on small training datasets. With these improvements, researchers will be able to predict HD in its early stages more accurately and thus reduce the deaths resulting from this disease.

The contribution of this study is divided into two parts.

Part one: The first part proposes a practical approach for predicting HDs using five different datasets from the UCI repository. The proposed approach includes two main processes: data preparation and prediction.
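The two-process structure named above (data preparation followed by prediction) can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration: the toy records and their column meanings are hypothetical, the correlation-based filter is only a stand-in for a feature selection step, and the nearest-centroid classifier is only a stand-in for a prediction technique. None of these is claimed to be a method evaluated in this work.

```python
from statistics import mean, pstdev

# --- Data preparation -------------------------------------------------------
def fit_scaler(rows):
    """Per-column (mean, std) from training data; constant columns get std 1."""
    return [(mean(c), pstdev(c) or 1.0) for c in zip(*rows)]

def transform(rows, stats, keep=None):
    """Z-score the rows, optionally keeping only the column indices in `keep`."""
    keep = range(len(stats)) if keep is None else keep
    return [[(row[j] - stats[j][0]) / stats[j][1] for j in keep] for row in rows]

def select_top_k(z_rows, labels, k):
    """Stand-in filter: keep the k standardized columns most correlated with the label."""
    y_mean, y_std = mean(labels), pstdev(labels) or 1.0

    def abs_corr(col):
        # Columns have mean 0 and std 1, so correlation = covariance / label std.
        return abs(mean(x * (y - y_mean) for x, y in zip(col, labels))) / y_std

    cols = list(zip(*z_rows))
    ranked = sorted(range(len(cols)), key=lambda j: abs_corr(cols[j]), reverse=True)
    return sorted(ranked[:k])

# --- Prediction -------------------------------------------------------------
def fit_centroids(rows, labels):
    """Stand-in classifier: one mean vector per class."""
    return {
        label: [mean(c) for c in zip(*[r for r, y in zip(rows, labels) if y == label])]
        for label in set(labels)
    }

def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])),
    )

# Hypothetical toy records: [age, resting blood pressure, unrelated noise].
X_train = [[63, 150, 5], [58, 145, 2], [61, 160, 4],
           [35, 118, 1], [41, 120, 5], [38, 115, 3]]
y_train = [1, 1, 1, 0, 0, 0]  # 1 = heart disease, 0 = no heart disease
X_test = [[60, 155, 1], [36, 117, 4]]

stats = fit_scaler(X_train)
keep = select_top_k(transform(X_train, stats), y_train, k=2)
centroids = fit_centroids(transform(X_train, stats, keep), y_train)
preds = [predict(centroids, row) for row in transform(X_test, stats, keep)]
print(keep, preds)
```

The scaler statistics and the selected columns are computed on the training records only and then reused for the unseen records, so the prediction step never sees information from the data it is asked to classify.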
Three feature selection methods, including Principal Component Analysis (PCA), are used in the data preparation process to identify the most relevant features in the data. Fourteen different prediction techniques are applied to the preprocessed datasets and evaluated using four metrics: accuracy, precision, recall, and F1 score. The experimental results indicate that the LASSO method