Using some data mining approaches with application on insurance data /
Shaban, Dalia Sherif.
Preparation Committee
Researcher / Dalia Sherif Shaban
Supervisor / Zohdy Mohamed Nofal
Supervisor / Aya Shehata Mahmoud
Examiner / Salah Mahdy Mohamed
Data Mining. Insurance Data processing.
Publication date
Number of pages
148 p.
Statistics and Probability
Date of approval
Place of approval
Benha University - Faculty of Commerce - Statistics
Only 14 pages are available for public view

The insurance industry faces many problems while pursuing several objectives: fraud is one of its main problems, and providing quality insurance services is one of its main objectives. This study analyzes two data sets with data mining techniques, since data mining is valuable both for predicting fraud and for improving the quality of the insurance services provided. A variety of data mining classification techniques are used to identify and predict the target class: car insurance (the target of the first data set) and fraud_reported (the target of the second data set).

The data was cleaned and pre-processed by removing duplicates, filling in missing values, handling categorical variables with label encoding, and detecting outliers. The data was then split into training and test sets. Next, the features were scaled by standardization and class-balancing techniques were applied. Finally, several data mining models were fitted and evaluated.

For the first data set (car insurance data), the best-performing model is Random Forest with undersampling, which gives 83.111% accuracy, 83.035% recall, 74.598% precision, 78.591% F1_score and 64.965% MCC. Among the proposed new models, the best is GRA, with 82.444% accuracy, 76.724% recall, 77.616% precision, 77.167% F1_score, 62.911% MCC and 81% AUC in the soft-classifier case after applying SMOTE. For the insurance fraud detection data, the best model is AdaBoost after applying SMOTE, with 92% accuracy, 73.170% recall, 81.081% precision, 76.923% F1_score, 72.238% MCC and 85% AUC. Among the proposed new models, the best is AGR, which with a hard classifier gives 89.333% accuracy, 68.292% recall, 71.794% precision, 70% F1_score, 63.547% MCC and 81% AUC.
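
As a reading aid, the accuracy, recall, precision, F1_score and MCC values quoted above can all be derived from the four counts of a binary confusion matrix. The following is a minimal Python sketch of those standard formulas, not the code used in the thesis. The function name `binary_metrics` and the counts in the example are hypothetical: the counts are not taken from the thesis and were chosen only so that the results land close to the AdaBoost figures quoted in the abstract.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp, fp, fn, tn are the numbers of true positives, false positives,
    false negatives and true negatives. Each ratio falls back to 0.0
    when its denominator is zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # Matthews correlation coefficient (MCC)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (NOT reported in the thesis).
scores = binary_metrics(tp=30, fp=7, fn=11, tn=177)
for name, value in scores.items():
    print(f"{name}: {value:.3%}")
```

MCC is reported alongside accuracy because it uses all four cells of the confusion matrix, which makes it more informative than accuracy alone when the classes are imbalanced, as is typical for fraud data.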