Research on Loan Default Prediction and Influencing Factors Based on Catboost-Shap Value

Year:
2023
Type of Publication:
Article
Keywords:
Loan Default Prediction, Embedded Feature Selection, CatBoost, SHAP Value
Authors:
Ningxin Bi; Zhezhe Cai; Feiyue Du; Surui Xiang; Haoran Yang
Journal:
IJAIM
Volume:
12
Number:
3
Pages:
1-11
Month:
November
ISSN:
2320-5121
Abstract:
Personal credit is a main line of business for banks and other financial institutions, and with the development of the Internet it has gradually become a core segment for Internet finance companies. While personal credit brings in revenue, it also carries a substantial risk of default. How to better apply new technologies such as machine learning to credit, in order to control risk and improve returns, has become a focus for both traditional financial institutions and Internet finance companies. Against this background, this paper conducts an empirical analysis on real lending data, constructs an ensemble learning model that can accurately predict whether a user will default, and studies the factors influencing default risk based on that model, in an attempt to reconcile the prediction accuracy of machine learning models with their interpretability. First, the dataset is preprocessed to handle missing values and anomalies, and class imbalance is addressed with the SMOTE algorithm. For feature engineering, features are selected with an embedded feature selection method. Next, a loan default prediction model is built on CatBoost and evaluated; CatBoost scores higher on all evaluation indicators, which indicates better performance and suggests that the model can be used to predict loan defaults accurately. Finally, SHAP values are used to study the factors influencing user default risk, both for individual samples and for the model as a whole, so as to analyse the default-risk factors of key users and of the platform overall and to provide a reference for commercial banks and other lending platforms when they offer credit products to borrowers.
Full text: IJAIM_670_FINAL.pdf
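
The abstract's final step, attributing default risk to features for individual samples and for the model as a whole, rests on Shapley values. The following is a minimal sketch of that idea only, not the authors' code: the paper applies the SHAP method to a CatBoost model, whereas this example computes exact Shapley values by brute force for a hypothetical three-feature scoring function. The names `toy_risk_score`, `shapley_values`, the feature names, and the data are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def toy_risk_score(features):
    """Hypothetical default-risk score; stands in for a trained model."""
    debt_ratio, late_payments, account_age = features
    # account_age is deliberately unused, so its attribution should be zero.
    return 0.5 * debt_ratio + 0.3 * late_payments + 0.2 * debt_ratio * late_payments


def shapley_values(predict, x, background):
    """Exact Shapley values for one sample x.

    Features outside a coalition take their values from each background row,
    and the resulting predictions are averaged.
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            mixed = [x[i] if i in coalition else row[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                contribution += weight * (value(members | {i}) - value(members))
        phi.append(contribution)
    return phi


# Made-up rows: [debt_ratio, late_payments, account_age]
background = [[0.2, 0, 5], [0.4, 1, 2], [0.6, 2, 8], [0.8, 3, 1]]

# Local view: one attribution vector per sample.
all_phi = [shapley_values(toy_risk_score, row, background) for row in background]

# Global view: mean absolute attribution per feature across samples.
global_importance = [
    sum(abs(phi[i]) for phi in all_phi) / len(all_phi) for i in range(3)
]

print("local attributions for last sample:", all_phi[-1])
print("global importance:", global_importance)
```

Each per-sample vector explains one borrower's score relative to the average score over the background rows, while the mean absolute attribution gives a model-wide ranking of features. Brute-force enumeration grows exponentially with the number of features, so it is only practical for toy cases like this one.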
