CGrant109/Supervised-Machine-Learning-Homework---Predicting-Credit-Risk

Supervised Machine Learning Homework - Predicting Credit Risk

In this assignment, you will be building a machine learning model that attempts to predict whether a loan will be approved or not.

Lending services companies allow individual investors to partially fund personal loans as well as buy and sell notes backing the loans on a secondary market.

You will be using this data to create machine learning models to classify the risk level of given loans. Specifically, you will be comparing the Logistic Regression model and Random Forest Classifier.

Instructions

Retrieve the data.

The data is located in the Resources folder.

  • lending_data.csv

Import the data using Pandas.

Consider the models

You will be creating and comparing two models on this data: a logistic regression and a random forest classifier. Before you create, fit, and score the models, make a prediction as to which model you think will perform better. You do not need to be correct! Write down (in markdown cells in your Jupyter Notebook or in a separate document) your prediction, and provide justification for your educated guess.

Fit a LogisticRegression model and RandomForestClassifier model

Create a LogisticRegression model, fit it to the data, and print the model's score. Do the same for a RandomForestClassifier. You may choose any starting hyperparameters you like. Which model performed better? How does that compare to your prediction? Write down your results and thoughts.
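The fit-and-score step could look roughly like the sketch below. It is only an illustration: the column layout of lending_data.csv is not described here, so the example uses a synthetic, imbalanced dataset from scikit-learn's `make_classification` as a stand-in, and the hyperparameters are arbitrary starting values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Resources/lending_data.csv (assumption: the real
# file would be loaded with pandas and split into features X and target y).
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=5,
    weights=[0.9, 0.1], random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=1
)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=100, random_state=1),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # For classifiers, .score() returns the mean accuracy.
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name}: train={scores[name][0]:.3f}, test={scores[name][1]:.3f}")
```

Comparing the training score with the test score for each model is a quick way to spot overfitting before deciding which model performed better.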

Unit 19 - Supervised Machine Learning Homework Rubric

Loan Approval Dataset (2022). Data generated by Trilogy Education Services, a 2U, Inc. brand, and is intended for educational purposes only.

© 2022 Trilogy Education Services, a 2U, Inc. brand. All Rights Reserved.

Credit_Risk_Analysis

Using Supervised Machine Learning to Predict Credit Risk

Overview of Analysis

This project consists of three technical analysis deliverables and a written report.

Deliverable 1: Use Resampling Models to Predict Credit Risk

Deliverable 2: Use the SMOTEENN Algorithm to Predict Credit Risk

Deliverable 3: Use Ensemble Classifiers to Predict Credit Risk

Deliverable 4: A Written Report on the Credit Risk Analysis (README.md)

Credit risk is an inherently unbalanced classification problem, as good loans easily outnumber risky loans. Therefore, we needed to employ different techniques to train and evaluate models with unbalanced classes.

Using the credit card credit dataset from LendingClub, a peer-to-peer lending services company, we’ll oversample the data using the RandomOverSampler and SMOTE algorithms, and undersample the data using the ClusterCentroids algorithm. Then, we’ll use a combinatorial approach of over- and undersampling using the SMOTEENN algorithm. Next, we’ll compare two new machine learning models that reduce bias, BalancedRandomForestClassifier and EasyEnsembleClassifier, to predict credit risk.

Data source:

  • (1) credit_risk_ensemble_starter_code, (2) credit_risk_resampling_starter_code, (3) LoanStats_2019Q1
  • Python 3.9.10, Jupyter Lab 4.6, Visual Studio Code 1.71.2

Methodology

D1: Use Resampling Models to Predict Credit Risk

Using the imbalanced-learn and scikit-learn libraries, we evaluated three machine learning models by using resampling to determine which was better at predicting credit risk. First, we used the oversampling RandomOverSampler and SMOTE algorithms, and then we used the undersampling ClusterCentroids algorithm. Using these algorithms, we resampled the dataset, viewed the count of the targeted classes, trained a logistic regression classifier, calculated the balanced accuracy score, generated a confusion matrix, and generated a classification report.
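To make the idea of random oversampling concrete, here is a minimal hand-rolled sketch in plain Python. It is not the imbalanced-learn `RandomOverSampler` used in the project; the function name `random_oversample` and the toy data are hypothetical and only illustrate how duplicating minority-class rows balances the class counts.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_res.append(rng.choice(rows))
            y_res.append(label)
    return X_res, y_res

# Toy data (hypothetical): 8 low_risk loans and 2 high_risk loans.
X = [[i] for i in range(10)]
y = ["low_risk"] * 8 + ["high_risk"] * 2

X_res, y_res = random_oversample(X, y)
print(Counter(y), "->", Counter(y_res))
```

Resampling of this kind is normally applied to the training split only, so that the test set keeps the original class balance.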

D2: Use the SMOTEENN algorithm to Predict Credit Risk

Using the imbalanced-learn and scikit-learn libraries, we used a combinatorial approach of over- and undersampling with the SMOTEENN algorithm to determine if the results from the combinatorial approach were better at predicting credit risk than the resampling algorithms from Deliverable 1. Using the SMOTEENN algorithm, we resampled the dataset, viewed the count of the targeted classes, trained a logistic regression classifier, calculated the balanced accuracy score, generated a confusion matrix, and generated a classification report.

D3: Use Ensemble Classifiers to Predict Credit Risk

Using the imblearn.ensemble module, we trained and compared two different ensemble classifiers, BalancedRandomForestClassifier and EasyEnsembleClassifier, to predict credit risk and evaluated each model. Using both algorithms, we resampled the dataset, viewed the count of the targeted classes, trained the ensemble classifier, calculated the balanced accuracy score, generated a confusion matrix, and generated a classification report.

D1. For all three algorithms, the following have been completed:

RandomOverSampler

  • Balanced accuracy score: 0.832
  • Precision: high_risk 0.03, low_risk 1.00
  • Recall: high_risk 0.82, low_risk 0.84
  • F1: high_risk 0.06, low_risk 0.91

An accuracy score for the model is calculated:

Figure (1.1) RandomOverSampler balanced accuracy report

A confusion matrix has been generated:

Figure (1.2) RandomOverSampler matrix

An imbalanced classification report has been generated:

Figure (1.3) RandomOverSampler imbalanced classification report

SMOTE

  • Balanced accuracy score: 0.884
  • Recall: high_risk 0.82, low_risk 0.87
  • F1: high_risk 0.07, low_risk 0.93

Figure (1.4) SMOTE balanced accuracy report

Figure (1.5) SMOTE confusion matrix

An imbalanced classification report has been generated:

Figure (1.6) SMOTE imbalanced classification report

ClusterCentroids

Figure (1.7) ClusterCentroids balanced accuracy report

Figure (1.8) ClusterCentroids confusion matrix

Figure (1.9) ClusterCentroids imbalanced classification report

D2. The combinatorial SMOTEENN algorithm does the following:

Figure (1.10) SMOTEENN balanced accuracy report

Figure (1.11) SMOTEENN matrix

Figure (1.12) SMOTEENN imbalanced classification report

D3. The algorithm does the following:

BalancedRandomForestClassifier

  • Balanced accuracy score: 0.759
  • Recall: high_risk 0.63, low_risk 0.88
  • F1: high_risk 0.06, low_risk 0.94

An accuracy score for the model is calculated:

Figure (1.13) BalancedRandomForestClassifier balanced accuracy report

A confusion matrix has been generated:

Figure (n) BalancedRandomForestClassifier matrix

Figure (1.14) BalancedRandomForestClassifier imbalanced classification report

The features are sorted in descending order by feature importance:

Figure (1.15) Features sorted in descending order

EasyEnsembleClassifier

  • Balanced accuracy score: 0.932
  • Recall: high_risk 0.92, low_risk 0.94
  • F1: high_risk 0.16, low_risk 0.97

An accuracy score for the model is calculated:

Figure (1.16) EasyEnsembleClassifier balanced accuracy report

A confusion matrix has been generated:

Figure (1.17) EasyEnsembleClassifier matrix

Figure (1.18) EasyEnsembleClassifier imbalanced classification report

The recall (sensitivity) values for the high_risk and low_risk classes are in line with each other for most of the models. However, the precision for predicting high risk is much lower than it is for predicting low risk. The lower precision for high risk is reflected in the low F1 score.

In this scenario, the sensitivity is very high, while the precision is very low. Clearly, this is not a useful algorithm, so let’s take a look at the F1 value. A pronounced imbalance between sensitivity and precision will yield a low F1 score.
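As a quick check of how the F1 score punishes that imbalance, the sketch below recomputes the high_risk F1 for RandomOverSampler from its reported precision (0.03) and recall (0.82), and the balanced accuracy as the mean of the two reported recalls. The helper names are illustrative, and because the inputs are rounded values from the report, the results agree with the reported scores only up to rounding.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def balanced_accuracy(recalls):
    """Balanced accuracy is the unweighted mean of the per-class recalls."""
    return sum(recalls) / len(recalls)

# RandomOverSampler, high_risk class: precision 0.03, recall 0.82.
f1_high = f1_score(0.03, 0.82)

# Per-class recalls: high_risk 0.82, low_risk 0.84.
bal_acc = balanced_accuracy([0.82, 0.84])

print(f"F1 (high_risk): {f1_high:.2f}")
print(f"Balanced accuracy: {bal_acc:.2f}")
```

Because F1 is a harmonic mean, it stays close to the smaller of the two inputs, so a precision of 0.03 keeps the score near zero even with a recall above 0.8.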

The F1 values for our models:

RandomOverSampler - 0.06

SMOTE - 0.07

ClusterCentroids - 0.07

SMOTEENN - 0.07

BalancedRandomForestClassifier - 0.06

EasyEnsembleClassifier - 0.16

The accuracy scores for our models:

RandomOverSampler - 83.2

SMOTE - 88.4

ClusterCentroids - 88.4

SMOTEENN - 88.4

BalancedRandomForestClassifier - 75.9

EasyEnsembleClassifier - 93.2

To summarize our results, we’ll focus on our targeted class (high_risk) across the 6 models:

RandomOverSampler performed the worst, with an F1 value of 0.06 (higher imbalance) and an accuracy score of 83.2.

EasyEnsembleClassifier performed the best, with an F1 value of 0.16 (less imbalance) and an accuracy score of 93.2.

In general, the models were not very good at predicting high risk, since the F1 values for most models were between 0.06 and 0.07. However, of our 6 models we would recommend the EasyEnsembleClassifier, since it did a better job classifying the data, improved the F1 value from 0.06 to 0.16, and had the best accuracy score at 93.2%.

For future evaluations, we may want to explore the use of a precision-recall curve to compare model performance on imbalanced data sets.
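A precision-recall curve is just the set of (precision, recall) pairs obtained by sweeping the decision threshold over the predicted scores. The plain-Python sketch below shows the idea on made-up scores; the function name and data are hypothetical, and in practice scikit-learn's `precision_recall_curve` would usually be used instead.

```python
def precision_recall_points(y_true, scores):
    """Return (threshold, precision, recall) for each distinct score,
    from the highest threshold to the lowest."""
    total_pos = sum(y_true)
    points = []
    for t in sorted(set(scores), reverse=True):
        predicted = [s >= t for s in scores]
        tp = sum(1 for p, y in zip(predicted, y_true) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, y_true) if p and y == 0)
        points.append((t, tp / (tp + fp), tp / total_pos))
    return points

# Toy example (hypothetical): 1 = high_risk, 0 = low_risk.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.45, 0.6, 0.7, 0.9]

points = precision_recall_points(y_true, scores)
for threshold, precision, recall in points:
    print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Unlike accuracy, neither axis of this curve depends on the number of true negatives, which is why it is often preferred when the positive class is rare.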

scikit-learn

imbalanced-learn

Supervised_Machine_Learning

Credit risk evaluator machine learning: supervised, predicting credit risk.

Build a machine learning model that attempts to predict whether a loan from LendingClub will become high risk or not.

Machine Learning

LendingClub is a peer-to-peer lending services company that allows individual investors to partially fund personal loans as well as buy and sell notes backing the loans on a secondary market. LendingClub offers their previous data through an API. This project uses that data to create machine learning models that classify the risk level of given loans. Specifically, it compares the Logistic Regression model and the Random Forest Classifier.

Preprocessing

  • Convert categorical data to numeric
  • Consider the models
  • Create and compare two models on this data: a logistic regression, and a random forest classifier.
  • Fit a LogisticRegression model and RandomForestClassifier model

After scaling, the Random Forest Classifier model performed better than the scaled Logistic Regression model, as previously predicted. The initial Random Forest Classifier training score was 0.79 and the test score was 1.0.

Credit Score

LendingClub (2019-2020) Loan Stats. Retrieved from: https://resources.lendingclub.com/

https://medium.com/@amirmehrbakhsh/credit-scores-2-0-how-ai-and-machine-learning-will-revamp-how-we-evaluate-credit-worthiness-f97d5e1e6de1

https://medium.com/henry-jia/how-to-score-your-credit-1c08dd73e2ed

A Concern of Predicting Credit Recovery on Supervised Machine Learning Approaches

S&P Global Market Intelligence

Machine Learning and Credit Risk Modelling

  • 30 Nov, 2020
  • Authors: Luka Vidovic, Lei Yue
  • Theme: Credit Analysis
  • Segment: Banking, Corporations, Insurance, Investment Banking, Private Equity, Professional Services
  • Tags: Credit Analytics, AI, Credit Risk, Machine Learning

Machine Learning (ML) algorithms leverage large datasets to determine patterns and construct meaningful recommendations. Likewise, credit risk modelling is a field with access to a large amount of diverse data where ML can be deployed to add analytical value. In the following analysis, we explore how various ML techniques can be used for assessing probability of default (PD) and compare their performance in a real-world setting.

Machine Learning in Finance

A recent publication by the Bank of England (BoE) and the Financial Conduct Authority (FCA) reports the results of a survey on the use of ML in United Kingdom (UK) financial services. [1]    Results show that two-thirds of respondents use ML in some form. The use cases have passed the development stage and are starting to enter into the deployment stage. The banking and insurance sectors are advanced with respect to deployment, and ML is most often used in anti-money laundering and fraud-detection applications. The survey also notes that ML may amplify existing model risk, while validation frameworks still need to evolve to cope with the complexity of ML applications.

As ML is becoming more represented and influential in finance, it is important to recognize its benefits and drawbacks to prudently evaluate its performance. ML models have the potential to uncover subtle relationships, capture various nonlinearities, and process unstructured data. For example, applications such as fraud-detection analysis or textual data analytics benefit from not needing to predefine structure, that is, the theory behind finding patterns and extracting meaningful outputs. ML can do this without the need for humans to derive theoretical models with accompanied assumptions, and the data is empirically driving the ML model.

However, ML may still contain assumptions, such as the dataset does not contain. This can pose a significant challenge when analyzing noisy historical financial data and may lead to poor model performance. Imposing constraints on the model to control for model biases or counterintuitive behavior can also be an onerous task for some ML techniques. In addition, decomposing ML models can be complicated, thus creating issues when there is a need to explain the model’s functionality in detail. [2] [3] [4] [5]

We analyze the performance of selected ML algorithms for the prediction of PD. To make this analysis relevant and material, we use a real-world example of constructing a default prediction model for private companies. To that end, we collected a global sample of private companies across various industries. [6]   Private companies are a particularly relevant example for our analysis for a number of reasons. The universe of private companies is large and highly heterogeneous, as it includes large international corporations, as well as local small- and medium-sized enterprises. The composition of a global sample captures companies from various macroeconomic environments, thus introducing additional macroeconomic risk components. Additionally, private companies tend to publish limited and infrequent financial disclosures, which reduces the scope of available information.

The characteristics of private companies create a need for a default prediction model to be well designed in order to capture the heterogeneity of private companies and achieve good performance under the data availability constraints. We leverage the S&P Capital IQ platform to collect annual financials for private companies globally from 2002 to 2016. Our final sample includes a total of 52,500 observations, of which 8,200 companies have defaulted.

Feature Engineering: We ‘pre-treat’ the financial data by calculating relevant financial ratios to express various risk dimensions, such as profitability, leverage, and efficiency. We also include a Country Risk Score (CRS) and Industry Risk Score (IRS) as additional variables to help the model capture systemic risk components of various countries and industries. We also standardize the ratios to make them comparable and limit the impact of outliers, thus enabling the algorithms to achieve better performance.
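The article does not say which standardization or outlier treatment was applied. As one plausible illustration only, the sketch below clips a ratio to empirical quantiles (winsorizing) and then converts it to z-scores. The 5%/95% cut-offs, the function names, and the sample leverage ratios are all assumptions made for the example.

```python
import statistics

def winsorize(values, lower_pct=0.05, upper_pct=0.95):
    """Clip values to empirical quantiles picked with a simple floor-rank rule."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower_pct * (n - 1))]
    hi = ordered[int(upper_pct * (n - 1))]
    return [min(max(v, lo), hi) for v in values]

def standardize(values):
    """Convert values to z-scores (zero mean, unit population std)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical leverage ratios with one extreme outlier (9.0).
leverage = [0.2, 0.3, 0.25, 0.4, 0.35, 0.3, 0.45, 0.5, 0.28, 9.0]

clipped = winsorize(leverage)
z_scores = standardize(clipped)
print(max(clipped), [round(z, 2) for z in z_scores])
```

Clipping before standardizing keeps a single extreme observation from inflating the standard deviation and compressing every other company's score toward zero.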

Variable Selection: To account for the limited availability of private company financial data, we only use ratios that have sufficiently good coverage across the S&P Capital IQ platform, while also ensuring the representation of relevant risk dimensions. Such parsimonious construction simplifies the use of the model in deployment, as it requires fewer inputs and less data handling, and increases the model coverage. This is especially important for private companies, where financial data is generally more infrequent and less comprehensive. Table 1 contains the final list of selected variables used to train the PD model with various ML algorithms.

Table 1: List of variables used to train PD models for private companies

Source: S&P Global Market Intelligence. As of January 21 2020. For illustrative purposes only.  

In-Sample and Out-of-Sample Analysis: We split the dataset of private companies into two samples to help assess the performance of the model in a real-world deployment. The in-sample portion (90%) represents our training dataset and is used to develop the model, while the out-of-sample portion (10%) is used to evaluate the model. We also make sure that the two datasets are similar with respect to the default rate and other descriptive properties (such as industry sectors and revenue size).
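One simple way to keep the default rate similar in both samples is a stratified split, sketched below in plain Python. The article does not state how its split was produced, so the function `stratified_split`, the toy labels, and the seed are hypothetical. Only the 90%/10% proportion comes from the text.

```python
import random

def stratified_split(labels, test_fraction=0.10, seed=0):
    """Return (train_idx, test_idx) so that each class contributes
    roughly the same share to both parts."""
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

def default_rate(labels, idx):
    """Share of defaulted observations among the given indices."""
    return sum(labels[i] for i in idx) / len(idx)

# Toy labels (hypothetical): 1 = defaulted, 0 = did not default.
labels = [1] * 160 + [0] * 840
train_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(test_idx))
print(default_rate(labels, train_idx), default_rate(labels, test_idx))
```

The same grouping idea extends to other descriptive properties, such as industry sector or revenue band, by stratifying on a combined key instead of the default flag alone.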

Different ML Algorithms

There are several ML algorithms available, and selecting the optimal algorithm is not straightforward. Algorithm selection depends on various factors, such as data type and features, transparency and interpretability, and model performance characteristics. We selected the following classification and regression algorithms for further analysis:

  • Altman Z-score: The Z-score is an established model that leverages a linear combination of financial ratios to estimate the likelihood of financial distress. The model is based on the discriminant analysis technique to optimize model parameters.
  • Logistic regression: A logistic regression is a statistical model that uses a logit function to model a binary dependent variable. It is a classical and widely used technique to model the PD. The optimization function usually tends to include a regularization term (e.g., lasso, elastic net, or ridge) to limit the overfitting.
  • Support Vector Machine (SVM): An SVM is similar to logistic regression and constructs a hyperplane (a multidimensional surface) to separate two classes in the dataset. Inputs are transformed using a kernel function, allowing SVM to model nonlinear classification problems. However, by using a nonlinear kernel, the SVM becomes a black box because each prediction is not easily attributable to an individual variable.
  • Naïve Bayes: Naïve Bayes is a classification technique that utilizes Bayes' theorem with an assumption of independence among predictors. Although this assumption is often violated in practice, naïve Bayes still tends to perform well. The technique is relatively robust and easy to implement; however, strong violations of the independence assumptions and nonlinear classification problems can lead to poor performance.
  • Decision Tree: A decision tree model produces a flow chart structure where model prediction is obtained through a sequence of nodes and branches. While decision trees are a highly flexible tool, their usability may be hindered by poor out-of-sample performance as a result of overfitting. Various techniques exist to reduce overfitting by controlling the size of decision trees, such as pruning. We opted to contain the tree size by setting a limit of 50 observations per node.
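To illustrate what "a linear combination of financial ratios" means for the first model in the list, the sketch below computes the commonly cited original Altman Z-score (1968, publicly traded manufacturers). The article does not state which Z-score variant or coefficients were used for its private-company sample, so the coefficients and the input figures here are assumptions for illustration only.

```python
def altman_z(working_capital, retained_earnings, ebit,
             market_value_equity, sales, total_assets, total_liabilities):
    """Original Altman Z-score as a weighted sum of five financial ratios."""
    x1 = working_capital / total_assets            # liquidity
    x2 = retained_earnings / total_assets          # cumulative profitability
    x3 = ebit / total_assets                       # operating profitability
    x4 = market_value_equity / total_liabilities   # leverage
    x5 = sales / total_assets                      # asset turnover
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5

# Hypothetical company figures (same currency unit throughout).
z = altman_z(
    working_capital=50, retained_earnings=100, ebit=40,
    market_value_equity=300, sales=500,
    total_assets=400, total_liabilities=200,
)
print(f"Z-score: {z:.2f}")
```

A lower Z-score indicates a higher likelihood of financial distress, so the score can be used directly to rank companies when computing ROC-based statistics.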

We tested the performance of the described ML algorithms using our global sample of private companies and the accompanying variables listed in Table 1. We implemented the analysis using Statistics and ML Toolbox™ functions in MATLAB®, and applied default algorithm settings to train the PD models and calculate their performance statistics. [7]

We evaluated the ML models using the receiver operating characteristics (ROC) curve and corresponding area under the curve (AUC). Table 2 shows the in-sample and out-of-sample AUC performance statistics. In-sample, the decision tree model exhibits superior performance with a near-perfect classification of defaulted and non-defaulted companies. Logistic regression and SVM are similar techniques and exhibit equally excellent performance, while the other two approaches demonstrate good or fair performance. [8]  

Out-of-sample AUC, however, demonstrates a more realistic measure of the model’s performance in a real-world situation. While the decision tree method still shows the best performance, it is only marginally better than logistic regression. It is worth noting that the performance of the decision tree deteriorates considerably out-of-sample compared to in-sample, indicating lower reliability of this method in a real-world application. In comparison, the other approaches exhibit more consistent performance.
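For readers who want to see what the AUC measures, the sketch below computes it directly from its probabilistic definition: the chance that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter, with ties counted as one half. The data are made up and the function is a plain-Python illustration, not the MATLAB routine used in the study.

```python
def auc(y_true, scores):
    """AUC as the share of (defaulter, non-defaulter) pairs that are
    ranked correctly, counting ties as half a correct pair."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy out-of-sample data (hypothetical): 1 = defaulted, 0 = did not default.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
pd_scores = [0.9, 0.6, 0.3, 0.5, 0.2, 0.1, 0.1, 0.05]

auc_model = auc(y_true, pd_scores)
auc_perfect = auc(y_true, [1, 1, 1, 0, 0, 0, 0, 0])
print(f"model AUC={auc_model:.3f}, perfect ranking AUC={auc_perfect:.3f}")
```

Running the same function on the training sample and on the held-out sample gives the in-sample and out-of-sample figures that the comparison above relies on.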

Table 2: AUC using various ML models

Source: S&P Global Market Intelligence. As of January 21 2020. For illustrative purposes only.

In Figure 1, we depict the out-of-sample ROC curves for the analyzed ML models. While two models may have the same AUC, the shape of corresponding ROC curves may be very different. For example, the decision tree and logistic regression have very similar out-of-sample AUCs, but their corresponding ROC curves are very distinct and cross at the low false positive rate and the high true positive rate. This reflects the Type I error and Type II error characteristics of the two models. [9] The decision tree outputs are rather binary, i.e., producing PD estimates of either 0% or 100%, resulting in a more abrupt shape. The logistic regression, however, produces much more granular and continuous estimates of PD, resulting in a much smoother shape of the ROC curve.

Selection of the optimal model also depends on the use case. For example, Type I error is more relevant when the goal is to minimize the incorrect classification of borrowers as creditworthy. Type II error, on the other hand, is more relevant when the goal is to minimize denying a loan to a creditworthy customer. If users focus on identifying defaults among the worst companies, they might prefer the decision tree model. However, those interested in good overall performance and differentiation among low-, medium-, and high-risk companies might favor the logistic regression model. [10]
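The trade-off between the two error types can be made concrete with a small sketch. It follows the definitions in footnote [9]: Type I error is a low PD assigned to an obligor that defaults, and Type II error is a high PD assigned to an obligor that does not default. The PD cut-offs, the function name, and the data are hypothetical.

```python
def error_rates(defaulted, pd_estimates, cutoff):
    """Return (type_1, type_2) for a given PD cut-off.

    type_1: share of defaulters whose PD is below the cut-off.
    type_2: share of non-defaulters whose PD is at or above the cut-off.
    """
    d = [p for p, y in zip(pd_estimates, defaulted) if y == 1]
    nd = [p for p, y in zip(pd_estimates, defaulted) if y == 0]
    type_1 = sum(p < cutoff for p in d) / len(d)
    type_2 = sum(p >= cutoff for p in nd) / len(nd)
    return type_1, type_2

# Toy portfolio (hypothetical): 4 defaulters followed by 6 non-defaulters.
defaulted = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pd_estimates = [0.40, 0.22, 0.08, 0.03, 0.15, 0.06, 0.04, 0.02, 0.01, 0.01]

strict = error_rates(defaulted, pd_estimates, cutoff=0.05)
lenient = error_rates(defaulted, pd_estimates, cutoff=0.20)
print("cut-off 5%: ", strict)
print("cut-off 20%:", lenient)
```

Raising the cut-off lets more borrowers through, which lowers the Type II error but raises the Type I error, so the preferred operating point depends on which mistake is more costly for the use case.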

Figure 1: Out-of-sample ROC curve for various ML models

In addition to model performance, transparency and interpretability also play a vital role in the model evaluation. Namely, understanding drivers and the sensitivity of model predictions to changes in the input is an important aspect of model usability. In that respect, logistic regression is preferred to SVM as it is more straightforward to analyze and interpret. The logistic regression also enables users to incorporate various constraints easily, thus making this technique highly controllable and adaptable.

S&P Global Market Intelligence’s Approach

At S&P Global Market Intelligence, we developed PD Model Fundamentals (PDFN) - Private Corporates, a statistical model that produces PD values for all private companies globally. The model is based on the maximum expected utility (MEU) theory and employs a logistic regression algorithm with ridge (Tikhonov) regularization. [11] [12]    The methodology includes a number of data handling techniques to support robust treatment of financial ratios and management of extreme values. The process of variable selection leverages a k-fold Greedy Forward Approach to support a good out-of-sample and out-of-time performance. The transparent, ‘glass-box’ model structure of PDFN - Private Corporates enables users to understand the model behavior and easily analyze sensitivity and contributions of model inputs.
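The general technique named here, logistic regression with a ridge (L2) penalty, can be sketched in a few lines of plain Python. This is not the PDFN implementation and does not reproduce the MEU framework or the variable-selection procedure. The gradient-descent settings, the function names, and the two-feature toy data are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_ridge_logistic(X, y, lam=0.1, lr=0.5, epochs=2000):
    """Minimize mean log-loss + (lam / 2) * ||w||^2 by gradient descent.
    The intercept b is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + lam * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_pd(w, b, x):
    """Predicted probability of default for one observation."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy standardized ratios (hypothetical): [leverage, profitability]; 1 = default.
X = [[1.2, -0.8], [0.9, -1.1], [1.5, -0.3], [0.4, -0.9],
     [-0.7, 0.9], [-1.1, 0.4], [-0.3, 1.2], [-0.9, 0.8]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w_weak, b_weak = fit_ridge_logistic(X, y, lam=0.01)
w_strong, b_strong = fit_ridge_logistic(X, y, lam=1.0)
print("weak penalty:  ", [round(v, 2) for v in w_weak])
print("strong penalty:", [round(v, 2) for v in w_strong])
```

A stronger penalty shrinks the coefficients toward zero, which trades some in-sample fit for more stable out-of-sample behavior while keeping each input's contribution easy to read off from its coefficient.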

Figure 2 shows an example of PDFN - Private Corporates outputs for Neiman Marcus Group, Inc. (‘Neiman Marcus’), an omni-channel luxury fashion retailer primarily located in the U.S. Based on the latest available financial data, the company’s PD of 4.1% implies a credit score of ‘b’. [13]   The in-depth analysis of the model drivers reveals that the retailer is highly risky from a financial and business point of view. The contribution analysis shows that low profitability and high debt are the main drivers of the PD estimate. The sensitivity metrics indicate that Neiman Marcus’s credit score is highly sensitive to any adverse changes in industry and country risk factors.

Figure 2: PDFN - Private Corporates outputs for Neiman Marcus Group, Inc.

Note: Industry median calculated based on a sample of department stores in the U.S.

Source: S&P Global Market Intelligence, as of January 21 2020. For illustrative purposes only.

A prudent approach includes reviewing and assessing various techniques for the problem at hand. While all presented models could be further refined and optimized to achieve better performance, the knowledge of the end application should also be factored into the decision-making process. In a real-world environment, this includes taking into account data availability limitations, model transparency requirements, the granularity of model outputs, and ease-of-use.

[1] Bank of England, Financial Conduct Authority: “Machine learning in UK financial services”, October 2019.

[2] Bazarbash, M.: “Fintech in Financial Inclusion: Machine Learning Applications in Assessing Credit Risk”, IMF Working Paper, 2019.

[3] Bracke, P., Datta, A., Jung, C. and Sen, S.: “Machine learning explainability in finance: an application to default risk analysis”, Staff Working Paper No. 816, Bank of England, 2019.

[4] Rasekhschaffe, C. K. and Jones, C. R.: “Machine Learning for Stock Selection”, Financial Analysts Journal, 2019.

[5] Addo, M. P., Guegan, D., Hassani, B.: “Credit Risk Analysis Using Machine and Deep Learning Models”, Risks, 2018.

[6] Financial sector is excluded from the analysis.

[7] MATLAB and Statistics and Machine Learning Toolbox 2019b, The MathWorks, Inc., Natick, Massachusetts, U.S.

[8] Typically, AUC values between 70% and 80% are considered fair, values between 80% and 90% are considered a sign of good discriminatory power, and values above 90% are considered excellent.

[9] Type I error (false positive rate) is the probability of assigning a low PD to an obligor that will default. Type II error (false negative rate) is the probability of assigning a high PD to an obligor that will not default.

[10] Stein, M. R.: “Benchmarking default prediction models: pitfalls and remedies in model validation”, Journal of Risk Model Validation, 2007.

[11] Friedman, C. and Sandow, S.: “Learning Probabilistic Models: An Expected Utility Maximization Approach”, Journal of Machine Learning Research, 4, 2003.

[12] S&P Market Intelligence: “PD Model Fundamentals - Private Corporates”, White Paper, 2018.

[13] S&P Global Ratings does not contribute to or participate in the creation of credit scores generated by S&P Global Market Intelligence. Lowercase nomenclature is used to differentiate S&P Global Market Intelligence PD scores from the credit ratings used by S&P Global Ratings.

  • Open access
  • Published: 01 February 2024

A machine learning-based credit risk prediction engine system using a stacked classifier and a filter-based feature selection method

  • Ileberi Emmanuel,
  • Yanxia Sun &
  • Zenghui Wang

Journal of Big Data, volume 11, Article number: 23 (2024)

Credit risk prediction is a crucial task for financial institutions. The technological advancements in machine learning, coupled with the availability of data and computing power, have given rise to more credit risk prediction models in financial institutions. In this paper, we propose a stacked classifier approach coupled with a filter-based feature selection (FS) technique to achieve efficient credit risk prediction using multiple datasets. The proposed stacked model includes the following base estimators: Random Forest (RF), Gradient Boosting (GB), and Extreme Gradient Boosting (XGB). Furthermore, the estimators in the Stacked architecture were linked sequentially to extract the best performance. The filter-based FS method that is used in this research is based on information gain (IG) theory. The proposed algorithm was evaluated using the accuracy, the F1-Score and the Area Under the Curve (AUC). Furthermore, the Stacked algorithm was compared to the following methods: Artificial Neural Network (ANN), Decision Tree (DT), and k-Nearest Neighbour (KNN). The experimental results show that the stacked model obtained AUCs of 0.934, 0.944 and 0.870 on the Australian, German and Taiwan datasets, respectively. These results, in conjunction with the accuracy and F1-score metrics, demonstrated that the proposed stacked classifier outperforms the individual estimators and other existing methods.

Introduction

One of the earliest applications of machine learning was for the prediction of credit risk, which uses financial data to predict the risk of customers defaulting on a loan, credit card, or other lending service [ 1 ]. Credit risk prediction is a challenge for financial institutions, and several research works have attempted to address this problem [ 2 ]. The proper utilization of credit risk prediction tools can lead to increased profitability for financial institutions. Credit card and loan applications are two areas where this can be applied. Creditors who have been unable to adequately predict the credit risk of potential clients have had severe losses. Hence, proper risk assessment is crucial for the survival of these financial institutions [ 3 ].

Credit risk prediction has been a trending topic for the past few decades; credit card default prediction is among the most crucial tasks facing creditors. This is because non-default transactions considerably outnumber default transactions [ 4 ]. Therefore, the datasets used for credit risk prediction can be considered to have a class imbalance problem. Prior studies have shown that class imbalance can lead to poor classification performance of machine learning (ML) models and result in model bias towards a specific class at inference time [ 5 ]. In the literature, several techniques have been proposed to solve the class imbalance problem, and they can be classified into three groups: ensemble learning, cost-sensitive learning, and re-sampling methods. Among these three methods, ensemble learning has been widely studied [ 6 ]. Ensemble learners perform better than a single model since they combine the advantages of several base learners. Furthermore, ensemble models can be divided into two groups: classifier ensemble and hybrid classifier. The former implies an ensemble model that combines an attribute selection technique or hyperparameter tuning prior to the classification whereas the latter combines numerous classifiers that run side by side [ 7 ].

Moreover, the datasets that are used to build credit risk prediction systems may possess a large feature space [ 16 ]. This can lead to increased complexity while training machine learning models [ 37 ]. It is therefore vital to implement a feature selection (FS) algorithm that can alleviate the issue of a growing feature space. FS algorithms are categorized as follows: filter, wrapper, and hybrid. A filter-based FS method makes its decision based on the intrinsic nature of the dataset and is therefore independent of the estimator that is used. A wrapper-based FS method selects an optimal subset of features based on the performance obtained using an estimator. Finally, hybrid FS algorithms combine the filter-based and wrapper-based methodologies [ 8 , 9 ].

In this research, we implement a filter-based FS method that uses Information Gain (IG) [ 28 ]. IG is derived from Information Theory [ 29 ]. The filter-based FS technique is selected because it is computationally less expensive than the wrapper-based and hybrid approaches [ 10 ].

Furthermore, we develop a multilevel ensemble-based model using the stacking method. Stacking, or stacked generalization, is a technique that stacks the outputs of individual algorithms and uses a single classifier for the final prediction. This method exploits the effectiveness of each individual classifier within the stack and uses their results as the input to the final estimator [ 17 ]. The structure of the stack includes the following algorithms: Gradient Boosting [ 18 ], Random Forest [ 21 ], and Extreme Gradient Boosting [ 19 ].

The major contributions of this research are as follows:

  • An IG filter-based FS method is implemented on multiple credit-risk datasets. This algorithm ensures that only the best attributes are selected before the modelling process.
  • We implement a Stacked model using GB, RF, and XGB. To achieve the best performance, the Stacked model was built sequentially. Furthermore, we compare the performance of the Stacked model against individual estimators.

The remaining part of this paper is structured as follows. The "Related work" section presents a review of related works. In the "Machine learning methods" section, we provide a background of the various machine learning algorithms used in this research. The "Datasets" section provides an overview of the datasets. The "Research methodology" section presents the methodology that was followed in this research, including the feature selection method and the proposed credit risk prediction framework. The "Experimental setup and performance metrics" section provides the details about the experimental settings. The "Results and discussions" section discusses the results, and the final section concludes this paper.

Related work

Pandey et al. [ 11 ] conducted a credit risk analysis using machine learning classifiers. In this analysis, the authors considered several methods including Artificial Neural Network (ANN), k-Nearest Neighbour (KNN), and Naive Bayes (NB). To evaluate the performance of the ML models, the authors used the German credit risk dataset, and accuracy was considered as the main performance metric. The results demonstrated that the ANN, NB, and KNN obtained accuracies of 77.45%, 77.20%, and 72.20%, respectively. Although these results represent a step in the right direction, the authors did not evaluate their models using additional metrics such as the F1-Score and the Area Under the Curve (AUC) score.

Zhang et al. [ 12 ] presented a credit scoring algorithm using an adaptive support vector machine (AdaSVM). This method was assessed on the Australian credit risk dataset and evaluated using accuracy. The results demonstrated that the AdaSVM obtained an accuracy of 80%. This paper did not go further in evaluating the quality of classification with additional metrics such as precision and recall.

Nasser and Maryam [ 13 ] developed a customer credit risk assessment system using Artificial Neural Networks (ANNs). In this research, the authors considered learning methods such as Gradient Descent (GD). Accuracy was the main performance metric used to assess the effectiveness of the proposed method. Furthermore, the authors used the Australian, Japanese, and German credit risk datasets. The outcome of the experiments demonstrated that the ANN-GD obtained accuracies of 78.11%, 76.87%, and 68.26% for each dataset, respectively.

Hsu et al. [ 14 ] implemented an enhanced recurrent neural network (RNN) for combining static and dynamic attributes for credit card default prediction. The method was evaluated using the Taiwan credit risk dataset. To enhance the RNN, the authors used Gated Recurrent Units (GRUs) as the base nodes. The outcome of the numerical experiments showed that the RNN model achieved an AUC of 0.782 and a lift index of 0.659.

In [ 15 ], the authors presented a strategy that integrates supervised learning with unsupervised learning for credit risk assessment. In this work, the researchers used datasets such as the German dataset to assess the effectiveness of their proposed algorithms. Additionally, metrics such as accuracy and AUC were used to assess the performance of the methods. In the instance of the cluster-based approach, the KNN achieved an accuracy of 76.80% and an AUC of 0.788. The RF achieved an accuracy of 72.10% and an AUC of 0.811. The ANN obtained an accuracy of 78.6% and an AUC of 0.843. Finally, the cluster-based consensus (combined model) obtained an accuracy of 80.8%.

Ha et al. [ 16 ] implemented an improved credit risk prediction model for online peer-to-peer (P2P) lending systems using a feature selection (FS) method and deep learning (DL). In this study, the first step consisted of preprocessing the data. The second step involved feature selection using Restricted Boltzmann Machines (RBMs). In the third step, the authors implemented the modeling process using machine learning (ML) methods such as Linear Discriminant Analysis (LDA), Artificial Neural Networks (ANN), k-Nearest Neighbors (KNN), and Random Forest (RF). These models were evaluated on various datasets, including the Australian and German credit risk datasets. Accuracy was the primary performance metric considered in the experiments. For the German dataset, the results showed the following accuracies: 76.50%, 75.8%, 67.10%, and 67.72% for LDA, ANN, KNN, and RF, respectively. For the Australian dataset, LDA, ANN, KNN, and RF achieved the following accuracies: 85.80%, 71.45%, 65.94%, and 67.72%, respectively. Although these results demonstrated some improvements compared to existing methods, the authors did not consider additional metrics such as precision, recall, and AUC.

Machine learning methods

This section provides an overview of the machine learning methods that were considered in this paper.

The RF algorithm computes its predictions by using a group of n Decision Trees (DTs) [ 20 ]. A DT is a supervised ML technique that is used for classification and regression problems. A DT has the following categories of nodes: root node, decision node, and leaf node. The root node represents the initial state of the DT. A decision node represents a splitting point in the DT. A leaf node holds the final decision of the DT. The RF algorithm computes its predictions by majority vote [ 21 ] as follows: let \(RF = \{f(X, d_i)\}\), \(i = 1, \ldots, n\), where X represents an input vector and \(d_i\) is the i-th DT in the forest. Each \(d_i\) casts one vote for a class, and the class with the most votes represents the prediction.
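The majority vote described above can be illustrated with a minimal sketch. The three "trees" below are hypothetical hand-written rules rather than trained DTs, and the feature layout `(income, debt_ratio)` is an assumption made only for this example.

```python
from collections import Counter

def majority_vote(trees, x):
    """Return the class that receives the most votes from the trees for input x."""
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-ins for trained decision trees; x = (income, debt_ratio).
trees = [
    lambda x: "default" if x[1] > 0.5 else "repaid",
    lambda x: "default" if x[0] < 20_000 else "repaid",
    lambda x: "default" if x[1] > 0.8 else "repaid",
]

# Two of the three trees vote "default" for this applicant.
prediction = majority_vote(trees, (15_000, 0.6))
```

Using an odd number of trees avoids ties in a two-class problem.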

The K-Nearest Neighbor (KNN) technique is a supervised ML method used for classification and regression tasks. The KNN approach uses the standard Euclidean distance (ED) to compute the distance between data points as follows [ 22 ]: let n and m be two data points in space Q; the distance between n and m, D(n, m), is computed using the expression in (3).

\[ D(n, m) = \sqrt{\sum_{i=1}^{t} (n_i - m_i)^2} \qquad (3) \]

where \(n_i\) and \(m_i\) are the i-th coordinates of n and m, and t is the number of coordinates (attributes) of each data point in Q. The KNN approach estimates a prediction for a point \(n_0\) in Q by computing the ED between \(n_0\) and its k closest data points within Q. As a result, \(n_0\) is assumed to be like its neighbors [ 23 ].
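As an illustration of the KNN procedure, the following sketch computes the Euclidean distance between points and labels a query point by the majority class of its k nearest neighbours. The toy points and the labels "good" and "bad" are invented for this example.

```python
import math
from collections import Counter

def euclidean(n, m):
    """Euclidean distance between two points with the same number of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(n, m)))

def knn_predict(train, query, k=3):
    """train is a list of (point, label) pairs; return the majority label of the k nearest."""
    nearest = sorted(train, key=lambda pair: euclidean(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy, made-up training points in a two-attribute space.
train = [
    ((1.0, 1.0), "good"),
    ((1.2, 0.8), "good"),
    ((0.9, 1.1), "good"),
    ((4.0, 4.2), "bad"),
    ((4.1, 3.9), "bad"),
]

label = knn_predict(train, (1.1, 1.0), k=3)
```

Because the distance sums raw coordinate differences, attributes on larger scales dominate unless the data is normalized first.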

An Artificial Neural Network (ANN) is another type of ML algorithm that is used for classification and regression tasks. In this research, we used feed forward ANNs. ANNs are built using Artificial Neurons (ANs). An AN processes information from its input and forwards it to its output. Moreover, an AN is designed to solve both linear and non-linear problems. This is achieved by using different types of activation functions such as the Sigmoid, \(\sigma(x) = \frac{1}{1+e^{-x}}\); the Rectified Linear Unit (ReLU), \(f(x) = \max(0, x)\); or the hyperbolic tangent in (2), \(\tanh(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}\).
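A short sketch of these activation functions and of a single artificial neuron follows. The weights, bias, and inputs are arbitrary values chosen for illustration, not parameters from this study.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Rectified Linear Unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)

def tanh(x):
    """Hyperbolic tangent written out from its exponential definition."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def neuron(inputs, weights, bias, activation):
    """One artificial neuron: weighted sum of the inputs plus bias, then activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Arbitrary example values: the weighted sum is 0.5 - 0.5 + 0.0 = 0.0.
output = neuron([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid)
```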

Gradient boosting (GB) is a technique used to build regression and classification models by improving the learning process of the final model. In the GB algorithm, a meta-learner is built by using a group of weak estimators such as DTs. Each estimator is added to the base group in a sequential manner. The aim of this process is to optimize the performance of the ensemble model by rectifying the mistakes made by the previous meta-learner [ 18 ]. This can be mathematically expressed as follows:

\[ g(x) = \sum_{n=1}^{t} \theta_n h_n(x) \]

where g represents the ensemble, t is the total number of estimators, \(h_n\) represents a single learner, and \(\theta_n\) is a tunable parameter.
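The sequential construction can be sketched for a one-dimensional regression problem with squared-error loss. This is a simplified illustration, not the GB implementation used in this study: the weak learners are single-split stumps, every weight is set to the same constant `theta` (an assumption), and each new stump is fitted to the residuals left by the current ensemble.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    for thr in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right) if right else 0.0
        err = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda x: lmean if x <= thr else rmean

def gradient_boost(xs, ys, t=20, theta=0.5):
    """Build g(x) as a sum of t stumps, each scaled by the same weight theta."""
    learners = []
    preds = [0.0] * len(xs)
    for _ in range(t):
        # For squared error, the residuals are the negative gradient of the loss.
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        learners.append(h)
        preds = [p + theta * h(x) for p, x in zip(preds, xs)]
    return lambda x: sum(theta * h(x) for h in learners)

# Toy step-function data: the target jumps from 0 to 1 between x = 3 and x = 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
g = gradient_boost(xs, ys)
```

Each round halves the remaining error on this toy data, which shows how later learners correct the mistakes of the earlier ones.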

In this research, we selected feed forward ANNs (FFNNs) because of their simplicity and training efficiency. FFNNs are generally simpler in structure than Generative Adversarial Networks (GANs). This simplicity is evident in their operational mechanics: FFNNs pass inputs through hidden layers to outputs, using weights and biases followed by an activation function. This single forward pass makes FFNNs inherently less complex and more efficient to train than GANs, which require training two networks simultaneously (a generator and a discriminator). This complexity in GANs can lead to longer training times and increased computational cost [ 38 ].

Furthermore, we selected ANNs because of their low computational cost and high scalability. From a computational standpoint, ANNs are generally more cost-effective. They require less computational power due to their simpler architecture, which also makes them more scalable for handling the large datasets typical in credit risk analysis. In contrast, the dual-network structure of GANs demands more computational resources, leading to higher costs, especially when scaling up to extensive datasets [ 39 ]. Additionally, we used ANNs because of their model stability and predictive accuracy, as explained in [ 40 ].

Finally, it must be noted that Generative Adversarial Networks (GANs) [ 36 ] or a Transformer-based architecture could be considered in lieu of ANNs. However, GANs and Transformers are computationally expensive to train and require long training times. Moreover, GANs are better suited for tasks that involve data generation or more complex scenarios where adversarial training is beneficial.

Datasets

All the datasets used in this work were obtained from the University of California, Irvine (UCI) machine learning repository. The Australian credit approval dataset [ 25 ] contains 690 instances and 14 attributes; in this dataset, there are 307 creditworthy clients and 383 defaulting clients. The German credit dataset [ 26 ] comprises 1000 cases and 20 features, with 700 creditworthy clients and 300 defaulting clients. Meanwhile, the Taiwan default of credit clients dataset [ 27 ] contains 30000 instances and 24 attributes, with 23364 creditworthy clients and 6636 defaulting clients. The German and Taiwan datasets are highly imbalanced, whereas the Australian credit dataset is relatively balanced. A summary of the number of features and instances in these datasets is provided in Table 1 . The details about the nature of features in each dataset are provided in Tables 2 , 3 , 4 . Moreover, these datasets are mostly made up of financial records and personal information, which were encoded for confidentiality reasons.
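The class balance described above can be checked with simple arithmetic on the reported counts. The sketch below computes the share of the minority class in each dataset; the helper name `minority_share` is introduced here for illustration only.

```python
# Class counts as reported in the text.
datasets = {
    "Australian": {"creditworthy": 307, "defaulting": 383},
    "German": {"creditworthy": 700, "defaulting": 300},
    "Taiwan": {"creditworthy": 23364, "defaulting": 6636},
}

def minority_share(counts):
    """Fraction of instances that belong to the smaller class."""
    return min(counts.values()) / sum(counts.values())

shares = {name: round(minority_share(counts), 3) for name, counts in datasets.items()}
totals = {name: sum(counts.values()) for name, counts in datasets.items()}
```

A share close to 0.5 indicates a balanced dataset; the further below 0.5, the stronger the imbalance.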

Research methodology

Feature selection

In this research, a feature selection method is applied to pick the most relevant attributes that will be used in the classification process. The IG-FS in Fig. 1 ranks the attributes using a method based on Information Gain (IG) [ 28 ], which is derived from Information Theory [ 29 ]. IG-FS computes the IG of each attribute in relation to the class attribute. In contrast with standard correlation methods such as the Pearson Linear Correlation Coefficient [ 30 ], which is only capable of establishing linear relationships between attributes, IG can uncover nonlinear relationships as well. The IG is mathematically computed as follows:

\[ IG(X \mid Y) = H(X) - H(X \mid Y) \]

where \(H(X)\) is the entropy of X and \(H(X \mid Y)\) is the conditional entropy of X given Y.

Fig. 1: The proposed credit risk prediction framework

Therefore, a feature A is more strongly correlated to feature B than a feature V is if \(IG(A \mid B) > IG(V \mid B)\). Algorithm 1 shows the implementation of the IG ranking algorithm that was used to reduce the number of features in each of the datasets. In the ranking algorithm, X is the original set of features and \(X_{ranked}\) represents the subset of features that is selected using the IG method. The selected attributes are loaded into \(X_{ranked}\) using a threshold value, \(IG_{thresh}\). This value can be changed as required. C is the target feature (the class).

Algorithm 1: IG-FS ranking algorithm
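Since the listing of Algorithm 1 is not reproduced in this text, the following is only a sketch of how an IG-based ranking with a threshold might look. It assumes discrete attribute values, the standard definition IG = H(C) - H(C | A), and a "greater than or equal to" comparison against the threshold; the feature names and values are invented.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())

def information_gain(feature, target):
    """IG of the class given a discrete feature: H(C) - H(C | feature)."""
    total = len(target)
    conditional = 0.0
    for value in set(feature):
        subset = [t for f, t in zip(feature, target) if f == value]
        conditional += len(subset) / total * entropy(subset)
    return entropy(target) - conditional

def ig_rank(X, C, ig_thresh):
    """Rank features by IG (highest first) and keep those with IG >= ig_thresh."""
    scores = {name: information_gain(column, C) for name, column in X.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score) for name, score in ranked if score >= ig_thresh]

# Invented toy data: "has_job" determines the class, "region" carries no information.
C = [1, 1, 0, 0]
X = {"has_job": [1, 1, 0, 0], "region": ["a", "b", "a", "b"]}
X_ranked = ig_rank(X, C, ig_thresh=0.1)
```

Continuous attributes would need to be discretized (or handled with a mutual-information estimator) before this computation applies.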

Proposed credit risk prediction framework

The proposed credit risk prediction framework is depicted in Fig. 1 . This architecture includes two main phases, namely, the data processing phase (phase 1) and the modelling phase (phase 2). In the first phase, the full credit risk dataset is normalized and processed using the IG-based FS method. Moreover, the full dataset is split into a training subset and a testing subset. In the modelling phase, the following individual classifiers are considered: RF, KNN, ANN, GB, and XGB. The proposed stacked classifier is built using the GB, XGB, and RF estimators. Once phase 1 is completed, each of the estimators in phase 2 is trained and tested using the training and testing sets generated in phase 1. The evaluation process is conducted using the accuracy, the F1-score, and the Area Under the ROC Curve (AUC), as explained in the "Experimental setup and performance metrics" section. The Compare Results block compares the metrics generated by each classifier and forwards the results to the Select Best Classifier block for model selection.

Experimental setup and performance metrics

The experiments were implemented on Google Colab [ 31 ]. The compute specifications are as follows: Intel(R) Xeon(R), 2.30GHz, 2 Cores. The ML framework used in this research is Scikit-Learn [ 32 ].

Performance metrics are important factors to consider when evaluating the performance of classifiers. In this work, the following performance metrics are considered: accuracy, F1-score, and Area Under the ROC Curve [ 33 , 34 , 35 ]. These metrics are computed using the true positive (TP), true negative (TN), false positive (FP), and false negative (FN):

TP: Instances (data points) correctly predicted as positive.

TN: Instances correctly predicted as negative.

FP: Instances incorrectly predicted as positive (also known as Type I error).

FN: Instances incorrectly predicted as negative (also known as Type II error).

The accuracy is the ratio of correctly predicted instances to all instances; it is, however, not an effective metric for evaluating classifier performance when the data is imbalanced, since it is sensitive to the distribution of the data. The F1-score is a more effective performance metric that represents the harmonic mean of the precision and sensitivity (recall) of the classifier. The AUC summarizes the tradeoff between the true positive rate (TPR) and the false positive rate (FPR), and it is an indication of the model's ability to classify positive samples correctly. The mathematical representations of the performance metrics are shown below:

\[ Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \]

\[ Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN} \]

\[ F1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall} \]
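The metrics described above can be computed directly from the confusion-matrix counts. The sketch below also includes a rank-based AUC (the probability that a randomly chosen positive is scored above a randomly chosen negative), which is one standard way to compute it; all counts and scores are invented for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def auc(scores, labels):
    """Rank-based AUC: fraction of (positive, negative) pairs ordered correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A classifier that always predicts "negative" on 90 negatives and 10 positives.
always_negative = classification_metrics(tp=0, tn=90, fp=0, fn=10)
# A classifier that finds 6 of the 10 positives at the cost of 5 false alarms.
better = classification_metrics(tp=6, tn=85, fp=5, fn=4)
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The first case shows why accuracy alone is misleading on imbalanced data: it reaches 0.90 while the F1-score is 0.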

Results and discussions

This section discusses the results that were obtained after conducting the experiments in a simulated environment.

Table 5 shows the number of features that were selected using IG-FS. In the instance of the Australian dataset, 9 features were selected. For the German dataset, 13 features were selected. In the case of the Taiwan dataset, 17 attributes were picked. These selected features are used for the experiments presented in this proposed study.

Table 6 shows the results that were obtained using the Australian dataset. The structure and hyperparameters of the Stacked model used in this experiment are shown in Fig. 2 .

Fig. 2: Structure and hyperparameters of the Stacked model in Table 6

In this instance, the model that achieved the highest accuracy is the RF model with an accuracy of 87.68%. The model that underperformed in comparison to the other estimators is the KNN method with an accuracy of 70.28%, an F1-Score of 60.19%, and an AUC of 0.683. The Stacked model achieved the best overall results with an accuracy of 86.23%, an F1-Score of 84.58%, and an AUC of 0.934. These results demonstrate that using a Stacked approach substantially improves the F1-Score and the AUC.

Table 7 outlines the results that were achieved using the German dataset; the structure and hyperparameters of the Stacked model in Table 7 are shown in Fig. 3 . The model that performed the best is the Stacked algorithm with an accuracy of 82.80%, an F1-Score of 86.35%, and an AUC of 0.944. Moreover, the Stacked model outperformed all other methodologies in terms of overall performance. In contrast, the model that underperformed is the KNN method with an accuracy of 68.40%, an F1-Score of 48.82%, and an AUC of 0.547. In terms of accuracy, the other models that performed well are the RF, GB, XGB, ANN, and DT with the following scores, respectively: 75.20%, 72.40%, 74.80%, and 73.60%. Table 8 shows the results that were obtained using the Taiwan dataset.

Fig. 3: Structure and hyperparameters of the Stacked model in Table 7

In terms of accuracy, the method that performed best is the RF with an accuracy of 87%. In terms of overall performance, the Stacked algorithm achieved an accuracy of 85.80%, an F1-Score of 51.35%, and an AUC of 0.870. The experiments on the Taiwan dataset demonstrated the same pattern that was observed on the Australian and German datasets. The Stacked methodology produced results that are superior to those of individual estimators.

In comparison to the research proposed in [ 11 ] using the German dataset, the proposed Stacked model outperformed the ANN, NB, and KNN by the following accuracy margins, respectively: 5.35, 5.6, and 10.6 percentage points. The research in [ 12 ] considered the AdaSVM and achieved an accuracy of 80% on the Australian dataset. In contrast, our proposed Stacked model obtained an accuracy that is 6.23 percentage points higher than the AdaSVM. The research in [ 13 ] used ANN-GD on the Australian and German datasets and obtained accuracies of 78.11% and 68.26%. In comparison to the ANN-GD, the Stacked model obtained the following superior results using the same datasets: 86.23% and 82.80%. Furthermore, the researchers in [ 14 ] used RNNs and obtained an AUC of 0.782 using the Taiwan dataset. In contrast, the Stacked model obtained an AUC of 0.870 on the same dataset. This represents an increase of 0.088. Additionally, the researchers in [ 15 ] applied the KNN, RF, and ANN to credit risk datasets such as the German dataset and obtained accuracies of 76.80%, 72.10%, and 78.6%, respectively. In terms of AUC, the KNN, RF, and ANN achieved 0.788, 0.811, and 0.843, respectively. In contrast, the Stacked method obtained much higher performance results, as shown in Table 8 . The structure and the hyperparameters of the Stacked model are depicted in Fig. 4 .

Fig. 4: Structure and hyperparameters of the Stacked model in Table 8
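The margins quoted in the comparison with prior work follow from simple subtraction of the reported figures, as the short check below illustrates. The dictionary layout and variable names are chosen here for illustration; all numbers are those stated in the text.

```python
# Reported results of the Stacked model.
stacked_accuracy = {"Australian": 86.23, "German": 82.80}
stacked_auc_taiwan = 0.870

# Reported accuracies of prior work on the German dataset [11].
prior_german = {"ANN": 77.45, "NB": 77.20, "KNN": 72.20}
# Reported AdaSVM accuracy on the Australian dataset [12] and RNN AUC on Taiwan [14].
adasvm_australian = 80.0
rnn_auc_taiwan = 0.782

# Differences in percentage points (accuracy) and in absolute AUC.
german_margins = {
    name: round(stacked_accuracy["German"] - acc, 2) for name, acc in prior_german.items()
}
australian_margin = round(stacked_accuracy["Australian"] - adasvm_australian, 2)
auc_margin = round(stacked_auc_taiwan - rnn_auc_taiwan, 3)
```

These are absolute differences in percentage points, not relative improvements.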

Conclusion

This research presented the development and implementation of an ML-based credit risk prediction model. This method was implemented using an FS method based on IG in conjunction with a stacking algorithm. These processes were implemented on the Australian, German, and Taiwan datasets. The accuracy, the F1-Score, and the AUC were the performance metrics that were used to evaluate the performance of the proposed method. To put the experimental process into context, the following additional ML methods were considered: RF, GB, XGB, KNN, ANN, and DT. The outcome of the numerical experiments demonstrated that the proposed Stacked algorithm achieved an accuracy of 86.23%, an F1-Score of 84.58%, and an AUC of 0.934 in the instance of the Australian dataset. With regard to the German dataset, the Stacked method obtained an accuracy of 82.80%, an F1-Score of 86.35%, and an AUC of 0.944. Finally, for the Taiwan dataset, the Stacked method achieved an accuracy of 85.80%, an F1-Score of 51.35%, and an AUC of 0.870. These results were superior to those obtained by individual estimators and other existing algorithms. In future work, we aim to investigate feature selection and augmentation techniques further, with the objective of improving the performance of the proposed machine learning model. We also plan to investigate the applicability and efficacy of transformer-based architectures, which have recently gained prominence in domains such as text generation and classification, for credit risk prediction.

Availability of data and materials

Available upon request.

References

Moradi S, Mokhatab RF. A dynamic credit risk assessment model with data mining techniques: evidence from Iranian banks. Financ Innov. 2019;5(1):15.

Rehman ZU, Muhammad N, Sarwar B, Raz MA. Impact of risk management strategies on the credit risk faced by commercial banks of Balochistan. Financ Innov. 2019;5(1):44.

Khemakhem S, Boujelbene Y. Predicting credit risk on the basis of financial and non-financial variables and data mining. Rev Acc Financ. 2018;17(3):316–40.

Dornadula VN, Geetha S. Credit card fraud detection using machine learning algorithms. Procedia Computer Science. 2019;165:631–41.

García V, Marqués AI, Sánchez JS. Improving Risk Predictions by Preprocessing Imbalanced Credit Data. Neural Information Processing. 2012;67:68–75.

Song Y, Peng Y. A MCDM-Based Evaluation Approach for Imbalanced Classification Methods in Financial Risk Prediction. IEEE Access. 2019;7:84897–906.

Guo S, He H, Huang X. A multi-stage self-adaptive classifier ensemble model with application in credit scoring. IEEE Access. 2019;7:78549–59.

Liu H, Yu L. Toward integrating feature selection algorithms for classification and clustering. IEEE Tran Knowl Data Eng. 2005;17(4):491–502.

Tang PS, Tang XL, Tao ZY, Li JP (2014) Research on feature selection algorithm based on mutual information and genetic algorithm. 11th Int. Comput. Conf. Wavelet Active Media Tech. Inf. Processing (ICCWAMTIP) IEEE, 403–406.

Liu C, Wang Q, Zhao Q, Shen X, Konan M. A new feature selection method based on a validity index of feature subset. Pattern Recogn Lett. 2017;92:1–8.

Pandey TN, Jagadev AK, Mohapatra SK, Dehuri S (2017) Credit risk analysis using machine learning classifiers. In: 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS) (pp. 1850–1854). IEEE.

Zhang L, Hui X, Wang L (2009) Application of adaptive support vector machines method in credit scoring. In: International Conference on Management Science and Engineering, 1410–1415.

Mohammadi N, Zangeneh M. Customer credit risk assessment using artificial neural networks. IJ Information Technol Computer Science. 2016;8(3):58–66.

Hsu TC, Liou ST, Wang YP, Huang YS, Che-Lin (2019) Enhanced Recurrent Neural Network for Combining Static and Dynamic Features for Credit Card Default Prediction. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1572–1576.

Bao W, Lianju N, Yue K. Integration of unsupervised and supervised machine learning algorithms for credit risk assessment. Expert Syst Appl. 2019;128:301–15.

Ha VS, Lu DN, Choi GS, Nguyen HN, Yoon B (2019) Improving credit risk prediction in online peer-to-peer (P2P) lending using feature selection with deep learning. In: 21st International Conference on Advanced Communication Technology, 511–515.

Chen C, Zhang Q, Yu B, Yu Z, Lawrence PJ, Ma Q, Zhang Y. Improving protein-protein interactions prediction accuracy using XGBoost feature selection and stacked ensemble classifier. Comput Biol Med. 2020;123: 103899.

Chakrabarty N, Kundu T, Dandapat S, Sarkar A, Kole DK (2019) Flight arrival delay prediction using gradient boosting classifier. In: Emerging technologies in data mining and information security, 651-659

Weldegebriel HT, Liu H, Haq AU, Bugingo E, Zhang D. A new hybrid convolutional neural network and eXtreme gradient boosting classifier for recognizing handwritten Ethiopian characters. IEEE Access. 2019;8:17804–18.

Liang J, Qin Z, Xiao S, Ou L, Lin X. Efficient & secure decision tree classification for cloud-assisted online diagnosis services. IEEE Trans Dependable Secure Comput. 2019;18(4):1632–44.

Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.

Trstenjak B, Mikac S, Donko D. KNN with TF-IDF based framework for text categorization. Procedia Eng. 2014;69:1356–64.

Tan S. An effective refinement strategy for KNN text classifier. Expert Syst Appl. 2006;3(2):290–8.

Kasongo SM, Sun Y. A deep learning method with filter based feature engineering for wireless intrusion detection system. IEEE access. 2019;7:38597–607.

“UCI Machine Learning Repository: Stat-log (Australian Credit Approval) Data Set.” http://archive.ics.uci.edu/ml/datasets/statlog+(australian+credit+approval) (accessed Oct. 31, 2020).

“UCI Machine Learning Repository: Stat-log (German Credit Data) Data Set.” https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data) (accessed Oct. 31, 2020).

“UCI Machine Learning Repository: default of credit card clients Data Set.” https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients (accessed Mar. 14, 2020).

Gao Z, Xu Y, Meng F, Qi F, Lin Z (2014) Improved information gain-based feature selection for text categorization. Int. Conf. Wireless Commun. Vehicular Technol. Inform Theory and Aerosp. Electron. Sys. (VITAE) IEEE, 1–5.

Shannon CE. A mathematical theory of communication. ACM SIGMOBILE. 2001;5(1):3–55.

Zhou H, Deng Z, Xia Y, Fu M. A new sampling method in particle filter based on pearson correlation coefficient. Neurocomputing. 2016;216:208–15.

Google Colab [Online]. Available: https://colab.research.google.com/

Scikit-learn : machine learning in Python. https://scikit-learn.org/stable/

Ileberi E, Sun Y, Wang Z. A machine learning based credit card fraud detection using the GA algorithm for feature selection. J Big Data. 2022;9:24.

Lipton ZC, Elkan C, Narayanaswamy B (2014) Thresholding Classifiers to Maximize F1 Score. arXiv:1402.1892 [cs, stat], May 2014, Accessed: Nov. 01, 2020. http://arxiv.org/abs/1402.1892

Muschelli J. ROC and AUC with a binary predictor: a potentially misleading metric. J Classif. 2020;37(3):696–708.

Creswell A, White T, Dumoulin V, Arulkumaran K, Sengupta B, Bharath AA. Generative adversarial networks: An overview. IEEE Signal Process Mag. 2018;35(1):53–65.

Zhao T, Zheng Y, Wu Z. Feature selection-based machine learning modeling for distributed model predictive control of nonlinear processes. Computers Chem Eng. 2023;169:108074.

Edmond C, Girsang AS. Classification performance for credit scoring using neural network. Int J. 2020;2020(8):5.

Laudani A, Lozito GM, Fulginei FR, Salvini A. On training efficiency and computational costs of a feed forward neural network: A review. Comput Intell Neurosci. 2015;2015(2015):83.

Stoffel M, Bamer F, Markert B. (2019). Stability of feed forward artificial neural networks versus nonlinear structural models in high speed deformations: A critical comparison. Arch Mech. 2019;71(2):34

Acknowledgements

This work was supported in part by the South African National Research Foundation under Grants 137951, 141951 and Grant 132797, and in part by the South African National Research Foundation Incentive under Grant 132159.

University of Johannesburg.

Author information

Yanxia Sun and Zenghui Wang contributed equally to this work.

Authors and Affiliations

Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg, South Africa

Ileberi Emmanuel & Yanxia Sun

Department of Electrical Engineering, University of South Africa, Johannesburg, South Africa

Zenghui Wang

Contributions

IE wrote the algorithms and methods related to this research and he interpreted the results. YS and ZW provided guidance in terms of validating the obtained results. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Ileberi Emmanuel .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Emmanuel, I., Sun, Y. & Wang, Z. A machine learning-based credit risk prediction engine system using a stacked classifier and a filter-based feature selection method. J Big Data 11 , 23 (2024). https://doi.org/10.1186/s40537-024-00882-0

Received : 04 October 2022

Accepted : 19 January 2024

Published : 01 February 2024

DOI : https://doi.org/10.1186/s40537-024-00882-0

  • Machine learning
  • Credit risk

  • Open access
  • Published: 15 November 2020

Machine learning predictivity applied to consumer creditworthiness

  • Maisa Cardoso Aniceto 1 ,
  • Flavio Barboza   ORCID: orcid.org/0000-0002-3449-5297 2 &
  • Herbert Kimura 1  

Future Business Journal, volume 6, Article number: 37 (2020)

Credit risk evaluation has a relevant role to financial institutions, since lending may result in real and immediate losses. In particular, default prediction is one of the most challenging activities for managing credit risk. This study analyzes the adequacy of borrower’s classification models using a Brazilian bank’s loan database, and exploring machine learning techniques. We develop Support Vector Machine, Decision Trees, Bagging, AdaBoost and Random Forest models, and compare their predictive accuracy with a benchmark based on a Logistic Regression model. Comparisons are analyzed based on usual classification performance metrics. Our results show that Random Forest and Adaboost perform better when compared to other models. Moreover, Support Vector Machine models show poor performance using both linear and nonlinear kernels. Our findings suggest that there are value creating opportunities for banks to improve default prediction models by exploring machine learning techniques.

Introduction

Consumer spending is one of the main drivers of macroeconomic conditions and systemic risk [ 15 ]. Therefore, the analysis of credit granting to consumers becomes relevant [ 12 , 24 ], since individuals may eventually seek loans to meet their consumption needs. In addition, the credit market size demonstrates its importance, as mentioned by Khandani et al [ 15 ] (above USD $13.63 trillion for Americans in 2008), and Li et al [ 19 ] (more than 12% of Chinese GDP, excluding mortgages in 2017).

Luo, Pławiak et al, and Twala [ 20 , 25 , 30 ] established that credit risk assessment is an important issue in financial risk management, because banks should make important decisions about whether or not to make a loan to a counterparty. In this context, Assef et al [ 1 ] suggest that one of the main problems in finance involves the prediction of bankruptcy or default.

Due to the large number of potential borrowers, it is necessary to use models and algorithms that avoid human failures in the analysis of credit application in consumer lending [ 15 ]. In fact, Twala [ 30 ] indicated that many of the world’s largest banks have developed sophisticated automated systems to model credit risk, giving crucial information to decision making.

Within the context of credit risk research using machine learning techniques, there are several studies that seek to analyze the adequacy of the models in specific databases [ 1 , 25 , 35 ]. However, the literature has not yet identified techniques that consistently lead to higher credit prediction accuracy [ 10 ]. Vieira et al [ 31 ] examined the performance of some of the most promising techniques, such as Support Vector Machines (SVM, which builds a separating boundary that seeks to maximize the distance between instances from different groups), Decision Trees (DT, which classify instances by sorting them through sub-trees, from the root to some leaf), Bagging (or Bootstrap aggregating, which takes n bootstrap samples from the full sample, builds a classifier on each, and uses a majority vote of these classifiers to classify each instance), AdaBoost (adaptive boosting, which is similar to bagging but weights each vote according to its quality), and Random Forest (RF, which classifies by majority vote over a multitude of decision trees).

Studies with different datasets have been conducted, exploring diverse types of credit operations in distinct institutions or countries. For instance, some credit data are made available in the UCI Repository of Machine Learning Databases, allowing researchers to evaluate classification results in different contexts. Lei et al, Shen et al, Yeh and Lien [ 18 , 26 , 34 ] investigated a Taiwanese credit card database, and Twala [ 30 ] analyzed credit operations in the United States, Germany and Australia. The latter two countries are also studied by Damrongsakmethee and Neagoe, Feng et al, Kamalloo and Saniee Abadeh, Kozodoi et al, Moula et al, Shi et al, Siami et al and Xiao et al [ 9 , 12 , 14 , 16 , 22 , 27 , 28 , 33 ].

Outside this repository, Feng et al [ 12 ] also examined Chinese credit data, as did Li et al [ 19 ] and Moula et al [ 22 ]. In Latin America, Assef et al and Vieira et al [ 1 , 31 ] analyzed a dataset from a Brazilian bank, and Morales et al [ 21 ] explored Peruvian microfinance data. Besides that, numerous other cases can be cited, such as [ 7 ] (France), [ 18 ] (Nigeria), [ 23 ] (Greece), [ 8 , 11 ] (UK), and [ 20 ] (61 countries).

Our dataset comprises low-income borrowers from a large financial institution in Brazil. Due to confidentiality issues, some information such as the name of the bank or credit spreads of loans cannot be disclosed. Data are restricted and not publicly available. We had access to more than 250,000 low-income individuals with low-value line of credit (up to BRL 10,000 or USD 6,020). In particular, the borrowers are from all 5 regions of the country. Most borrowers are from the Southeast (50%), which is the largest financial region, whereas 18% of the borrowers are from the South, 17% from the Northeast, 10% from the Midwest, and 5% from the Northern regions.

The borrowers’ ages range from 18 to 96 years (87% are in the 20–60 age group). The majority of the borrowers have a low level of education (98% did not complete elementary school). However, almost 50% of individuals own their houses, whereas 19% still live with their parents, 16% are in a different housing condition, 14% live in a rented house and only 4% have the property financed. With regard to marital status, 40% are married, 39% are single, and the rest are separated, divorced, or widowed.

Given the characteristics of the borrowers and the type of line of credit under analysis, the portfolio is comprised of loans with high probability of default. The records indicate 48% of bad payers. To the best of our knowledge, we did not find in the literature of credit risk analysis, another actual database with this level of default. Therefore, our study may contribute to the literature by investigating machine learning models applied to the credit risk assessment of high default portfolios.

According to the Central Bank of Brazil, in 2007, when the data of our study begins, the government bond rate was 11.25% a year and individuals paid, on average, an annual 43.9% interest rate for personal loans [ 5 ]. Since our dataset comprises high default borrowers, the credit spread of the loans in this financial institution is even higher. Therefore, although default rates are high, financial institutions may not lose money, since the interest that good borrowers pay offsets default losses. This characteristic of the dataset, from a practical perspective, differs from other studies, since we analyze a high default portfolio, depicting the unusual context of the Brazilian financial system. Lines of credit for low-income individuals are scarce, implying that even good borrowers are subject to very high interest rates, to compensate for the high default rates of bad borrowers.

In addition, since the portfolio of loans that we study is from a state-owned financial institution, political interference may direct financial resources to low-income families aiming to achieve social goals of governments.

Under these constraints, the bank has to establish mechanisms that, at the same time, comply with its social role and safeguard its financial soundness. Finally, despite the high default rate, the volume of these high risk personal loans is relatively small in comparison to the overall credit portfolio.

In this paper, we assess machine learning techniques to classify individuals into groups of defaulters and non-defaulters. According to Khandani et al [ 15 ], machine learning procedures refer to a set of algorithms developed to recognize patterns using computational algorithms. Moreover, these tools have been widely employed in credit applications [ 12 , 18 , among others], as underpinned by Dastile et al [ 10 ].

We analyze borrower classification using a database of consumer loans from a credit portfolio of a major Brazilian bank. Therefore, our study contributes to a broad literature on the use of machine learning algorithms in credit risk analysis, bringing the case of a dataset of loans of a high risk credit portfolio from an emerging country. We investigate an unusual credit portfolio, due to its high default rate. It is important to highlight that other papers have studied Brazilian datasets using machine learning, such as Assef et al [ 1 ], who explored 6,000 firms that applied for loans, and Vieira et al [ 31 ], who investigated mortgages for low-income borrowers. However, although some papers analyze emerging countries, most published papers focus on developed countries, whose data are usually more available to researchers.

We compare results from calibration and validation samples of different classification techniques, with emphasis on Support Vector Machines and Ensemble Methods, such as Decision Trees, Bagging, AdaBoost, and Random Forest. We contrast the performance of all models and discuss different metrics of adequacy for evaluating the classifications, i.e., ROC (Receiver Operating Characteristic) Curve, Sensitivity, Specificity and AUC (Area under the ROC Curve). These metrics are examined by other papers [e.g. 13 , 19 , 20 , 21 , 22 , 25 ] and are widely used to assess the performance of classification methods [ 10 ]. The findings are compared with previous results published in the literature.

In this context, the article aims to contribute to the literature, still under development, on the adequacy of machine learning techniques for the phenomena related to the classification of observations, more particularly for credit risk analysis, as studied by Assef et al., Crone and Finlay, Pławiak et al., Shi et al., Xiao et al., Yeh and Lien [ 1 , 8 , 25 , 27 , 33 , 34 ], among others.

The paper is structured as follows. In the next section, we briefly present machine learning techniques used in the context of classifications for credit risk analysis. Next, we discuss the concept underlying the machine learning techniques used in this study and the characteristics of the credit data from a large Brazilian bank. We examine the results generated by different classification approaches. Finally, we present the main considerations of the research and describe some limitations of the study.

Theoretical background

One of the first studies to apply machine learning techniques in credit risk was Davis et al. [ 11 ]. In the article, the authors tested a series of algorithms for assessing credit default risk, integrating two models: (1) a general computational model based on a selection process and a pairing procedure, and (2) an artificial neural network (ANN) connective model. Although the results are limited by the small number of observations of the database and the characteristics of the techniques tested, the study supports the relevance of the use of machine learning tools for credit analysis. Another early study, from [ 2 ], proposed an attribute selection metric for constructing models that substantially decrease the non-monotonicity problem of decision trees, without compromising the accuracy of classification.

The study from [ 13 ] uses classification and regression tree (CART) and artificial neural networks (ANN) and compares with k-nearest neighbor (KNN) models in a dataset of mortgage loans. Shi et al. [ 27 ] discuss a credit scoring model based on SVM and RF for credit risk assessment, establishing a score for the ranking of importance of a given characteristic. The authors analyze the proposed SVM model, comparing with traditional SVM models, in datasets from German and Australian credit transactions.

Another stream of studies explores machine learning techniques that use accounting and market data for rating analysis. The study from [ 23 ] established a credit risk classification model through SVM that combines accounting data with the approach based on the options pricing model. Considering a larger set of different rating groups, Zhong et al. [ 35 ] conducted a comprehensive comparative study on the effectiveness of four learning algorithms, Backpropagation (BP), Extreme Learning Machine (ELM), Incremental Extreme Learning Machine (I-ELM), and SVM, where the suggested SVM model outperforms ANNs.

More recently, Luo [ 20 ] investigated the classification accuracy of five different models: ANN, Support Vector Machines (SVM), Random Forest (RF), Naïve Bayes and logistic regression (LR). The author, using data from publicly listed companies with headquarters in various countries and from different industries, concluded that RF was the best classifier.

ANN is one of the first machine learning techniques to be used in credit risk assessment [ 10 ] and is still widely used. For instance, Luo [ 20 ] examined the rating accuracy of five techniques, including ANN, in a single structure, combined with bagging. In the study, RF was considered the best algorithm, by presenting error rates over to 5%. ANN proved to be the second best classifier as error rate for default companies decrease for 22.6%.

Another work that compared ANN with traditional techniques explored credit classification performance, contrasting Multilayer Perceptron (MLP) and LR [ 1 ]. Their findings showed that MLP correctly predicts defaults, temporary defaults, and non-defaults in 74.7%, 91.4% and 74.6% of cases, respectively, whereas LR achieved 88.9% accuracy for the temporary class and around 72% for defaults and non-defaults.

Damrongsakmethee and Neagoe [ 9 ] also describe the case of a successful application of ANN for credit risk assessment. The authors concluded that ANNs were more accurate in the analysis of both German and Australian credit data, reaching an overall accuracy of 81.2% and 90.85%, against 78.67% and 89% from a mixed model (decision tree with AdaBoost). However, their study neither discussed error rates nor evaluated the significance of the difference in model accuracies.

One of the first articles to use Decision Trees (DT) in the credit risk assessment was [ 2 ]. In fact, the author analyzes monotonicity in machine learning algorithms in several empirical applications including the classification of bonds. Crone and Finlay [ 8 ] find that a decision tree based algorithm, CART, presented the worst prediction power for credit scoring in a database from UK, when compared to LDA, LR and ANN. The authors also noticed that each technique was differently affected by an increase in the sample size.

C4.5, another DT-based technique, has also been studied on credit data. For instance, Damrongsakmethee and Neagoe [ 9 ] compared it with AdaBoost and MLP, in some cases combining models. The results revealed that MLP was more accurate than the others in both German and Australian credit datasets.

However, other studies show DT models may present superior results. For instance, Moula et al. [ 22 ] investigated the performance of six techniques (among them CART and SVM) in six credit databases. The results showed that CART outperformed the others in the Japanese, Chinese and Kaggle credit databases, providing lower levels of Type I and Type II errors. In addition, Li et al. [ 19 ] developed a hybrid model with a DT structure and increased the prediction accuracy for a Chinese dataset.

SVM is a technique widely tested in academia and on various datasets [ 23 ]. In the credit risk context, we can cite [ 18 , 22 , 23 , 24 , 25 , 27 , 31 , 33 , 35 , among others].

To measure the default probability of Greek non-listed companies, Niklis et al. [ 23 ] applied SVM and obtained “positive preliminary results”. More recently, Pławiak et al. [ 25 ] asserted their best result for German Credit data was better than [ 9 ], by using a deep learning structure where SVM is inserted as a learner.

In addition to the techniques previously discussed, within the context of machine learning, there are still several mechanisms that can be used in credit analysis, for example, ensemble methods. Two traditional ensemble algorithms are Bagging and Boosting.

Bagging (Bootstrap Aggregating), proposed by Breiman [ 3 ], is based on bootstrap samples that aggregate or combine individual predictors to establish a better final predictor. The author verified that the variance of the combined predictor is less than or equal to the variance of any individual predictor used.

Another paper that showed the superiority of the ensemble classifiers was [ 32 ]. The authors performed a comparative evaluation of the performance of three ensemble methods, Bagging, Boosting, and Stacking, from four learning-base mechanisms, Logistic Regression, Decision Trees, Artificial Neural Networks and Support Vector Machines. The experimental results show that the three methods can substantially improve learning from the base functions. More specifically, Bagging performs better than Boosting. Stacking and Bagging DT obtained better results in terms of the three performance indicators, mean accuracy, type I error and type II error.

Tsai et al. [ 29 ] conducted a study comparing ensemble classifiers built from three widely used classification techniques, MLP, SVM and DT. For the analysis, a set of bankruptcy data from Taiwan was used, and the result of the research demonstrates that the performance of the ensemble DT classifiers is superior to other ensemble methods. The authors mentioned that the average computational cost of the DT ensemble in Boosting is relatively low, being more efficient than SVM by Bagging, and than ensemble MLP by Bagging and Boosting.

The experimental results showed that the Boosting DT ensemble method composed of 80-100 classifiers shows a better performance [ 29 ]. Therefore, Boosting DT can be considered as the starting ensemble technique in future classifier-related studies.

Artificial intelligence techniques from other areas of knowledge, such as evolutionary computation and biology, are also applied in credit analysis. Using algorithms inspired by biology, Kamalloo and Saniee Abadeh [ 14 ] proposed a classifier that uses principles of the immune system and fuzzy rules to predict default. In this approach, the concept of immunological learning in cloning processes is explored.

Other studies using machine learning focus on several different topics, such as [ 24 ] that integrated genetic algorithm with neural networks. The study focused on the identification of an ideal subset of variables that allowed the increase in the classification accuracy and the scalability of the model for credit risk analysis.

Moreover, considering the diversity of machine learning methods, it is important to note that, according to Dastile et al. and Galindo and Tamayo [ 10 , 13 ], algorithms for credit risk analysis vary substantially in their structure, approach and rationale, but can be classified into some groups, which we organized in the following subsections.

It is important to highlight that our study is essentially exploratory and descriptive since we are not concerned with the discussion of the theoretical framework that supports the choice of explanatory variables. In fact, machine learning techniques applied in credit risk assessment are more data-driven, rather than directed to hypothesis testing.

However, the study is indirectly supported by a theoretical background as we rely on the analysis of variables that are commonly used in traditional statistical models to assess credit risk. More specifically, logistic regression models imply an underlying cause and effect relationship, where the independent variables, based on a theoretical framework, explain default. Therefore, by using explanatory variables that are compatible with a logistic regression approach, we follow a theoretical foundation already discussed in the literature about the potential determinants of default. In this context, as in Twala [ 30 ] and Vieira et al. [ 31 ], for instance, we use similar explanatory variables and also logistic regression to compare results of prediction of default from machine learning models.

Based on real-world data, we developed models based on machine learning techniques to predict default in a credit line and then compared the performance of these models with that of the logit model, which is usually applied to this task. This section presents database details (variables and basic information), prediction methods, and also the performance metrics that are the basis for the analysis.

We use a database from a large Brazilian financial institution of 124,624 consumer loans with a tenor of 24 months and monthly repayments. A delay of 2 months in repaying the loan implies default, since this is the criterion used by the financial institution to classify customers. Together, tenor and time to default compose the level of risk of this operation. Based on that, the Central Bank of Brazil defines the rules and the limits for the interest rates. In particular, the credit portfolio has a high level of credit risk, reflecting not only the characteristics of the loan but also the Brazilian economic context.

The default rate of the portfolio is almost 48%. Therefore, one contribution of the paper is to explore the use of machine learning techniques in a portfolio of loans with a high probability of default, which is unconventional and unusual. In a more stable economic environment, it is not likely that a credit portfolio would have such a default rate. Such level of default implies very high interest rates, which is usual in the Brazilian financial market. For instance, interest rates from major Brazilian banks for personal loans in May/2020 were 41.83% a year [ 6 ].

The credit data refers to loans from September 2007 to January 2010. This was the period determined by the bank for the data to be used. We gather data for variables as depicted in Table  1 . Although the data are not recent, we highlight that the paper focuses on the study of the applicability of the machine learning models in high risk credit portfolios. Many studies, especially those that explore the UCI Repository of Machine Learning Databases, use more outdated data and a smaller number of variables [ 9 ].

The volume of the loans differs considerably for each transaction, ranging from USD 55 to USD 6,020. The mean, median, and standard deviation of the loans are, respectively, 1,192.63, 722.41, and 1,134.73 (USD). The transaction is a generic line of credit, without a specific destination of the borrowed money. The borrower has a pre-approved line of credit that can be used for general expenditures.

The borrower has an average age of 42 years and average monthly gross income of USD1,190. The borrowers have, on average, a checking account in the bank for 51 months and a savings account for 63 months. The average balance in the checking account of the borrower is USD393. Among defaulted borrowers, half of them enter this credit status in 386 days, i.e., approximately 1 year after the beginning of the contract.

Table 1 depicts the variables in the database of our study. Many authors, e.g. [ 14 , 18 , 21 , 26 , 27 , 28 ] and [ 34 ], use similar variables, such as income, past loans, savings amount, marital status, type of job, and number of dependents to analyze credit risk with machine learning techniques. Moreover, most of these variables are also available in the German and Australian credit data.

The complete database was divided in two random samples: (i) the training or learning sample with 70% (87,237 loans), and (ii) the test or validation sample with 30% (37,387 loans). Both samples, training and testing, have similar characteristics, and a default rate of 47.8% and 48.0%, respectively.
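As a rough illustration of the 70/30 random partition described above, the following Python sketch splits row indices into a training and a test set. The authors worked in R and do not show their code, so the function name, the seed, and the rounding rule are assumptions made here for illustration only.

```python
import random

def split_indices(n_rows, train_fraction=0.7, seed=42):
    """Return (train_idx, test_idx), a random partition of range(n_rows)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)      # the seed value is an arbitrary assumption
    n_train = round(n_rows * train_fraction)  # size of the training sample
    return indices[:n_train], indices[n_train:]

# 124,624 is the number of loans reported in the text
train_idx, test_idx = split_indices(124_624)
print(len(train_idx), len(test_idx))  # 87237 37387
```

With this rounding rule the sample sizes match the 87,237 and 37,387 loans reported in the text; whether each sample reproduces the reported default rates depends on the actual data.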

Our aim is to compare the classification of borrowers using different models, including machine learning techniques. Thus, we do not focus on theoretical explanations to justify whether a variable positively or negatively affects default. Instead, we seek to identify predictive models that can be generated by algorithms, based on real data, without worrying about theoretical arguments for the inclusion of an explanatory variable on borrower’s default.

We proceed by presenting a brief overview of the classification techniques used in the paper.

Decision Trees

Decision Trees follow the structure of an upside down tree, dividing data into branches. The model comprises a series of logical decisions, similar to a flowchart, with nodes indicating a decision to be made on an attribute. The branches reflect the choice of the decisions [ 17 ].

The nodes in each branch represent both classes and class distributions. The largest node in a tree is the root node with the highest information gain [ 29 ]. After the first node, one of the subsequent nodes with the highest information gain is then chosen to be tested as a potential element for the next node. This process continues until all variables are compared or there are no remaining variables in which the samples can be divided. Then the tree ends in nodes that show the path regarding a combination of decisions, comparing classes or class distributions.
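The node-selection rule described above can be sketched in Python as follows. This is a minimal illustration of entropy-based information gain for categorical attributes, not the implementation used in the study; the toy attributes `housing` and `married` and their values are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on a categorical attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: 1 = default, 0 = no default
housing = ["own", "own", "rent", "rent", "own", "rent"]
married = ["yes", "no", "yes", "no", "yes", "no"]
default = [0, 0, 1, 1, 0, 1]

gains = {
    "housing": information_gain(housing, default),
    "married": information_gain(married, default),
}
root = max(gains, key=gains.get)  # attribute with the highest gain becomes the root
print(gains, root)
```

In this toy example `housing` separates the two classes perfectly, so it has the highest gain and would be chosen for the root node; the same computation is then repeated within each branch.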

Random Forest

According to Lantz [ 17 ], the Random Forest method, which is based on Decision Tree sets, combines versatility and power in a single machine learning approach. The method uses only a small random part of the complete set of observations, and can handle large data sets, where the so-called “curse of dimensionality” can cause other models to fail.

This approach combines the basics of bagging with random selection of characteristics to add diversity to decision tree models. After a random forest is generated, the model combines predictions from the trees following a procedure based on the number of votes [ 10 , 30 ].

Based on Breiman’s description [ 4 ], Random Forest is a classifier consisting of a collection of tree-structured classifiers \(\{h(x, \Theta _k), k = 1, \ldots \}\) where the \(\Theta _k\) are independent and identically distributed random vectors, and each tree casts a single vote for the most likely class for the input x .
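The voting step in this definition can be illustrated with a short Python sketch. It shows only how the individual votes are combined; the three "trees" below are hand-written stand-ins with hypothetical rules and feature names, not trees grown from bootstrap samples with random feature selection.

```python
from collections import Counter

def forest_predict(trees, x):
    """Each tree casts one vote for a class; the most voted class wins."""
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-in trees: each maps a borrower record to 1 (default) or 0
trees = [
    lambda x: int(x["income"] < 1000),
    lambda x: int(x["age"] < 25),
    lambda x: int(x["balance"] < 100),
]

borrower = {"income": 800, "age": 40, "balance": 50}
print(forest_predict(trees, borrower))  # 1 (two of the three trees vote for default)
```

Using an odd number of trees avoids ties in a two-class problem.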

Support Vector Machines

The aim of an SVM is to create a hyperplane that partitions the data into reasonably homogeneous groups [ 17 ]. This technique separates a set of training vectors into two different classes: \((x_1, y_1), (x_2, y_2), ..., (x_m, y_m)\) , where \(x_i \in R^d\) denotes characteristic vectors in a d -dimensional space and \(y_i \in \{-1, 1\}\) denotes different classes for the observations.

According to [ 29 ], to generate an SVM model, input vectors are mapped into a new higher-dimensional feature space by \(\phi : R^d \rightarrow H^f\) , where \(d<f\) . We build a separation hyperplane in the new feature space by means of a kernel function \(K(x_i, x_j)\) .

Moula et al., Pławiak et al., Zhong et al. [ 22 , 25 , 35 ] mention that the kernel function can be associated to linear functions, radial basis functions (RBF), polynomial functions or sigmoid functions. In our study we use linear functions and RBF, since these models led to interesting levels of performance in previous studies [ 14 , 16 , 22 , 25 ] and capture linear patterns and, in the case of RBF, nonlinear patterns.
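The two kernels used in the study can be written down in a few lines. The Python sketch below uses the common textbook forms, the dot product for the linear kernel and \(\exp(-\gamma \Vert x_i - x_j\Vert ^2)\) for the RBF kernel; the value of `gamma` is an arbitrary assumption, not a parameter reported by the authors.

```python
from math import exp

def linear_kernel(x_i, x_j):
    """K(x_i, x_j) = <x_i, x_j>: the scalar product in the input space."""
    return sum(a * b for a, b in zip(x_i, x_j))

def rbf_kernel(x_i, x_j, gamma=0.5):
    """K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2); gamma is an assumed value."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return exp(-gamma * squared_distance)

print(linear_kernel([1, 2], [3, 4]))  # 11
print(rbf_kernel([1, 2], [1, 2]))     # 1.0 (identical points)
print(rbf_kernel([0, 0], [1, 1]))     # exp(-1), about 0.368
```

The RBF value decays with distance, so nearby observations have more influence on each other than distant ones, which is what lets the decision boundary bend.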

Bagging

Bagging is an ensemble method, where classifiers are trained independently by different training sets through sample bootstrapping [ 3 ]. By using a base classifier, k re-samples are studied and the final classification is based on an appropriate combination method, such as the majority of votes. This strategy is simple, but can reduce variance when combined with other base learners [ 32 ].

Bagging is particularly attractive when the available information is limited. According to Xiao et al. [ 33 ], to ensure that there are sufficient training samples in each subset, large proportions of the sample (75-100%) are placed in each subset. Thus, individual training subsets overlap significantly, with many cases being part of most subsets and possibly appearing several times in the same subset.

In order to ensure the diversity of situations, a relatively unstable base learner is used. Therefore, different classification decisions can be obtained by considering small perturbations in different training samples [ 32 ].
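A minimal Python sketch of the bagging procedure described above follows. The base learner here is a one-feature threshold rule (a decision stump), chosen only to keep the example short; the study itself uses decision trees in R. The toy data, the number of re-samples `k`, and the seed are assumptions.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) observations with replacement (repeats are expected)."""
    return [rng.choice(data) for _ in data]

def fit_stump(sample):
    """Base learner: threshold t minimising errors of the rule 'predict 1 if x > t'."""
    best_t, best_err = None, None
    for t in sorted({x for x, _ in sample}):
        err = sum((x > t) != bool(y) for x, y in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def bagging_fit(data, k=25, seed=0):
    """Train k stumps, each on its own bootstrap re-sample of the data."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(data, rng)) for _ in range(k)]

def bagging_predict(thresholds, x):
    """Combine the k base classifiers by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (feature value, class)
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
model = bagging_fit(data)
print(bagging_predict(model, 1.5), bagging_predict(model, 8.5))
```

Because each stump sees a slightly different re-sample, the thresholds differ from one stump to the next, and the vote averages out these perturbations.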

Boosting and AdaBoost

Similarly to Bagging, in Boosting, each classifier is trained using a different training set. The main difference in relation to Bagging, as commented by [ 10 ], is that the re-sampled datasets in Boosting are built specifically to generate complementary learning. In Boosting, the votes are weighted based on the performance of each model rather than on the attribution of the same weight for all votes. This procedure makes it possible to increase the performance of the classification technique by simply adding weak or base learning methods. Given the usefulness of this finding, Boosting is considered one of the most significant discoveries in machine learning [ 17 ].

According to Tsai et al. [ 29 ], AdaBoost is a combination of Bagging and Boosting ideas and does not require a large training set like the other two methods. Initially, in the first step, each observation of the training set has the same weight or probability to be chosen in the first sample. In this algorithm, a base classifier or weak learning model is used to classify observations of the sample. Then the training classifier is evaluated to identify the observations that were not correctly classified.

Then, the algorithm is applied to a modified training set that reinforces the importance of those observations that were incorrectly classified in the previous step. More specifically, observations that were incorrectly classified have more probability to be chosen in the next sample, which goes through the same procedure using the training classifier. This sampling procedure will be repeated until k training samples are built for the \(k\)-th step. The final decision, i.e., classifications, is based on the weighted vote of the individual classifiers [ 29 ]. Although there are several versions of Boosting algorithms, the most used is AdaBoost [ 10 , 32 ]. We use this algorithm in this study.
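The reweighting idea can be made concrete with a small Python sketch of discrete AdaBoost on a single feature. Note one substitution: instead of drawing a new sample with probabilities proportional to the weights, as the text describes, this sketch passes the weights directly to the base learner, which is the other common formulation. The stump learner, the toy data, and the number of rounds are assumptions.

```python
from math import exp, log

def best_stump(xs, ys, weights):
    """Stump with the lowest weighted error: predict `sign` if x > t, else -sign."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (sign if x > t else -sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost_fit(xs, ys, rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n                    # all observations start with equal weight
    ensemble = []
    for _ in range(rounds):
        err, t, sign = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * log((1 - err) / err)     # vote weight of this stump
        ensemble.append((alpha, t, sign))
        # Increase the weight of misclassified observations, decrease the others
        weights = [w * exp(-alpha * y * (sign if x > t else -sign))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    """Final decision: sign of the weighted vote of the individual stumps."""
    score = sum(alpha * (sign if x > t else -sign) for alpha, t, sign in ensemble)
    return 1 if score > 0 else -1

# Hypothetical toy data with classes in {-1, +1}; no single stump separates it
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
model = adaboost_fit(xs, ys)
print([adaboost_predict(model, x) for x in xs])
```

No single threshold classifies these six points correctly, but the weighted vote of three stumps does, which is the sense in which boosting combines weak learners into a stronger one.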

Performance metrics

We use standard metrics to analyze the performance of the credit classification models, following [ 12 , 19 , 20 , 21 , 22 , 28 ]. The metrics include overall accuracy (ACC), Type I error (T1E), and Type II error (T2E), and are depicted by a confusion matrix, as shown in Table  2 .

The metrics are defined as follows:

Sensitivity has values close to 1 when Type I Error is low, whereas specificity has values close to 1 when Type II Error is low. The Receiver Operating Characteristic (ROC) Curve was built for all models. We use the AUC (Area Under the Curve) ROC measurement, which provides a precision criterion for the validation set, to compare results from the models [ 19 ].
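As a hedged illustration of these metrics, the Python sketch below computes accuracy, sensitivity, and specificity from confusion-matrix counts, and the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. Which class counts as "positive", and how Type I and Type II errors map onto these counts in the paper's Table 2, is not recoverable from the text, so the sketch treats class 1 as positive and leaves the error types out. The toy labels and scores are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn) for a binary problem; `positive` is an assumed convention."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fn, fp, tn

def classification_metrics(y_true, y_pred, positive=1):
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),   # share of positives correctly identified
        "specificity": tn / (tn + fp),   # share of negatives correctly identified
    }

def auc(y_true, scores, positive=1):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy example
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
metrics = classification_metrics(y_true, y_pred)
print(metrics, auc(y_true, scores))
```

The pairwise definition of AUC is quadratic in the sample size, so it suits small illustrations; for samples as large as the one in this study, a rank-based computation would be the practical choice.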

In order to verify how important the sample size is, we apply a procedure equivalent to that of Crone and Finlay and Vieira et al. [ 8 , 31 ], and also explore our models with different quantities of instances, that is, by generating results for sets of 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10,000 instances, totaling 10 different sets.

Results and discussion

All models were implemented in the R software and applied on the same sets of samples. Before explaining the results, we describe specifications of the algorithms we used in this study. Taking into account Decision Trees, there are several algorithms, such as CART, C4.5, C5.0, ID3, among others. In this study, we use the algorithm C5.0, which is an enhancement of the C4.5 algorithm. According to Lantz [ 17 ], the C5.0 algorithm has become the industry standard for Decision Trees, generating good results for most types of problems when compared to other advanced machine learning models.

The C5.0 algorithm can produce more than two sub-groups in each division, allowing non-binary classifications. The evaluation of the possible nodes for separation of the sample is based on the information gain [ 17 ].

Considering the results in the training sample, the algorithm built a tree of size 1,974, indicating the number of decisions. The Decision Tree technique can therefore be applied in the validation dataset.

We also implemented a Random Forest model, which according to Luo [ 20 ], represents a set of decision trees, generalizing the method of classification and regression trees, and can be faster than bagging. We use the package randomForest, which is based on Breiman [ 4 ]. Because the dataset is large (124,624 instances with 21 explanatory variables in the full sample), we also apply parallel processing through the packages doParallel and h2o for developing this model, similar to Vieira et al. [ 31 ].

Figure 1 shows that the classification error decreases as the number of decision trees increases. However, as new trees are included in the model, the error rate tends to stabilize after the inclusion of approximately 60 trees. This plot suggests that the model is not overfitting, since both curves are decreasing and moving in the same direction.

Figure 1. Error rate, given by the root-mean-square error (RMSE), versus number of trees (output of the Random Forest model)

For the SVM algorithms, we build two models: one with a linear kernel function [ 23 , 31 ] and the other with a radial basis kernel function [ 14 , 20 , 22 , 35 ], implemented by the R package called e1071 and parallel SVM. A kernel K is a function that takes two points \(x_i\) and \(x_j\) from the input space and computes the scalar product of those points in the feature space. An adequate choice of kernel parameters is crucial to obtain good results. We use the tune.svm function to find the best parameters for the algorithm, as in [ 16 ].
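As a rough illustration of the two kernel functions, the sketch below evaluates a linear kernel and a radial basis kernel on a pair of hypothetical points. The function names and the value of gamma are assumptions made for the example; gamma stands in for the kind of kernel parameter that has to be tuned.

```python
import math

def linear_kernel(x_i, x_j):
    """Scalar product of the two points in the original input space."""
    return sum(a * b for a, b in zip(x_i, x_j))

def rbf_kernel(x_i, x_j, gamma=0.5):
    """Radial basis kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)

# Hypothetical two-dimensional points
x_i = [1.0, 2.0]
x_j = [3.0, 0.0]
k_lin = linear_kernel(x_i, x_j)
k_rbf = rbf_kernel(x_i, x_j)
```

The radial basis kernel equals 1 for identical points and decays toward 0 as the points move apart, with gamma controlling how quickly the similarity falls off.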

While the SVM with Linear Kernel function presents linear boundaries for the separation of data belonging to two classes, the radial basis kernel (RBF) allows deformations in the hyperplane, bringing better fit in cases of classes that are difficult to separate, which is very common in financial problems.

We also study results from the Bagging algorithm. This method generates bootstrap samples from the original data. Each sample is used to train a model with a simple learning algorithm, called the base classifier, and the resulting models are combined through a simple voting system for classification. The ipred package in R offers a classic Bagging implementation using Decision Trees as the base classifier. To train the model, we use the function bagging() [ 17 ].
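The bootstrap-and-vote idea can be sketched in a few lines. This is a minimal illustration, not the ipred implementation: it assumes a single numeric feature, uses a one-split decision stump as a stand-in base classifier, and all function names and data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Fit a one-feature threshold rule (decision stump) to (x, y) pairs."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        for below, above in ((0, 1), (1, 0)):
            errors = sum((below if x <= threshold else above) != y
                         for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, threshold, below, above)
    _, threshold, below, above = best
    return lambda x: below if x <= threshold else above

def bagging_fit(data, n_models=25, seed=0):
    """Train one base classifier per bootstrap sample of the data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        boot = [rng.choice(data) for _ in data]  # sampling with replacement
        models.append(fit_stump(boot))
    return models

def bagging_predict(models, x):
    """Combine the base classifiers by simple majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: x is a single feature, y = 1 marks a default
data = [(x, 1 if x < 5 else 0) for x in range(10)]
models = bagging_fit(data)
pred_low, pred_high = bagging_predict(models, 1), bagging_predict(models, 8)
```

Because each stump sees a slightly different resample, individual models may place the threshold differently; the vote smooths out those differences.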

Another ensemble algorithm explored in the study is AdaBoost, in which several Decision Trees are built and then the best class for each observation is chosen [ 9 ]. The best AdaBoost model was obtained with 20 trials. We use the R package C50 to fit a model that combines the AdaBoost and Decision Tree approaches.

Logistic Regression is the most traditional technique used for modeling classification in credit risk [ 16 , 21 ]. We therefore also study Logistic Regression results as a benchmark, so that the results of the machine learning techniques can be compared with a base technique commonly applied in credit risk classification. For Logistic Regression, we use the standard glm function in R.

Model performances: full sample

First results indicate that, when we examine all instances available with complete data, the SVM algorithms presented better Sensitivity, with lower Type I error than the other algorithms, reflecting that SVMs better predict cases of bad borrowers. However, the low Specificity shows that the algorithm did not perform well in identifying good borrowers. In general, the SVM with RBF kernel underperforms other techniques, as shown in other studies [ 14 , 18 , 20 , 22 , 25 , 28 ], which examined different datasets. The SVM-based model with linear kernel presented outputs similar to those of [ 31 ] (63.86% vs 63.72%). This comparison is more reliable because the characteristics of default present a close match, especially in geographical source and borrower profile, since both involve low-income clients. Table  2 shows the performance measures (Accuracy, Sensitivity, and Specificity) in the test set (almost 40,000 instances) for all the techniques studied (Fig.  2 ).

Figure 2. Performance measures (Accuracy in light blue, AUC in blue, Sensitivity in light green, and Specificity in green, respectively) for each technique, ordered by name

In the dataset comprising personal loans of a Brazilian financial institution, the AdaBoost algorithm had the best Specificity, followed by Bagging and SVM-Linear. A good Specificity indicates a low Type II error, and therefore AdaBoost is the best algorithm at identifying good borrowers. This finding also occurred in the study of Moula et al. [ 22 ] when using Chinese data, but other cases were inconclusive. Li et al. [ 19 ] found the same ambiguous results, but their best model presented sensitivity greater than specificity.
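For reference, the three measures discussed here can be derived from the confusion matrix as in the sketch below. It follows the convention used in the text, where the positive class is the bad borrower, so Sensitivity is the share of bad borrowers correctly identified and Specificity is the share of good borrowers correctly identified. The function name and the labels are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # bad borrowers correctly flagged
        "specificity": tn / (tn + fp),  # good borrowers correctly cleared
    }

# Hypothetical labels: 1 = bad borrower (default), 0 = good borrower
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
```

In this toy case the model catches three of four bad borrowers but wrongly flags two of six good ones, which shows how a model can have reasonable accuracy while trading one kind of error for the other.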

Contrary to various results found in the literature [ 21 , 22 , 31 , 34 ], our results support the effectiveness of the logit model. For instance, Vieira et al. [ 31 ] found a great disparity between sensitivity and specificity (close to 77%), and better performance for predicting non-defaulters when using logit. Our findings show a disparity between sensitivity and specificity of close to 2%. In addition, compared with the study of [ 31 ], our results lead to a much higher probability of correctly identifying bad borrowers (64.3% vs 20.6%), but a lower probability of correctly forecasting good clients (62% vs 97.3%). These results might reflect the peculiar characteristics of our sample, in particular (i) the high default rate of the portfolio, with nearly half of the borrowers being bad borrowers, and (ii) the modest loan amount, which could reduce the borrower’s concern, since the financial impact of a delay in payment would be small. In contrast, [ 31 ] study housing financing, which usually involves a much larger loan and an item more essential to the borrower.

Given the misclassification rates, our results suggest that credit data has an undefined structure, neither linear nor nonlinear, and may be subject to other non-observed data. Therefore, credit data is hard to interpret not only by traditional models such as logistic regression [ 19 ] but also by machine learning techniques.

Changes in the sample size

Taking advantage of the large number of observations available in our database, we can analyze the sensitivity of the models to sample size. Figure  3 depicts the ROC curve for all the techniques studied for different sample sizes. In general, performance improves as the sample size increases, in line with Crone and Finlay [ 8 ], who tested their models on balanced data, as we do.

Figure 3. ROC curves for seven techniques in nine sample sizes ( n )
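The area under the ROC curve can be computed without drawing the curve, using its interpretation as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below assumes that pairwise formulation; the function name and the scores are hypothetical.

```python
def auc(y_true, scores, positive=1):
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted default scores for three bad (1) and three good (0) borrowers
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc_value = auc(y_true, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values in the mid-60% range, as reported in this study, indicate only moderate discrimination.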

Looking at the AdaBoost models, the model based on the full sample outperforms the others, presenting higher AUC and higher average accuracy than the smaller-sample models; both metrics decline as the number of observations in the sample decreases. Random Forest, Bagging, and Decision Trees behave analogously, but the SVM-based models diverge for both kernels. Bagging showed higher AUC as the sample size increased. However, its mean accuracy was slightly higher in the smallest sample, with 100 observations.

The SVM Linear did not perform well. Comparing the performance metrics of the different samples, the smallest sample, with 100 observations, had the best AUC, with mean accuracy equivalent to that of the complete dataset.

The radial SVM also performed poorly. Comparing the performance measurements of the different samples, the smallest sample, with 100 observations, had the best AUC and average accuracy, but with null Sensitivity, meaning that the model classified all borrowers as good payers. Therefore, the radial SVM is uninformative in our dataset, which is particularly worrisome, since the model does not identify bad borrowers.

Figure  4 presents the Accuracy (ACC) and Area Under the ROC Curve (AUC) performance measures for all the techniques studied.

Figure 4. Results in terms of best Accuracy ( a ) and AUC ( b ) of each technique in different sample sizes

These outputs reinforce that AdaBoost presented the best AUC and better average accuracy. These values were better in the sample considering the complete dataset. Considering the sample of 1,000 observations, Random Forest has the best AUC, 67.4%, and the highest average accuracy, 63.3%. For the small sample of 100 observations, Random Forest has the best AUC, 65.3%, and has, together with Bagging and SVM, the best average accuracy, 63.3%. It is important to highlight that machine learning models in general outperform the logistic regression, which is a traditional technique used in credit classification in Brazilian financial institutions.

Variable importance analysis

Concerning variables, all techniques provide the importance of each variable as output, except SVM. If we compare the most important ones with logit model terms, some interesting findings can be observed. In particular, three types of variables present remarkable insights.

Age, the most important variable in three models (RF, DT, and Bagging) and the second for AdaBoost, has a negative coefficient in the logit model (p value < 0.001), which means that younger people are more prone to default. These outcomes confirm that age is a crucial variable for any credit scoring model (linear or not).

The loan amount (second variable in RF) has a positive coefficient, which shows that the more money the client borrows, the more likely he/she is to default. In contrast, DT and Bagging assign low importance to it.

Finally, income-based variables are highly relevant in RF (three of the top five, and in the top seven for AdaBoost) and surprisingly present negative coefficients in the logit, showing that people with higher income have difficulty managing their money. In the case of DT and Bagging, income has lower relevance.

Conclusions

Machine learning, as a sub-field of Artificial Intelligence, has been widely used in the evaluation of credit risk. Various studies show competitive results of machine learning techniques, when compared with logistic regression, which is traditionally used in credit scoring classification analysis.

The objective of the study was to conduct an empirical analysis of machine learning models on a real-world database from a Brazilian bank. We tested five machine learning-based models in the context of the assessment of credit applications. According to our study, machine learning techniques outperform the traditional model based on Logistic Regression. While ML algorithms have an average accuracy of 63%, Logistic Regression shows competitive outcomes.

The best method, considering the performance metric based on AUC, was AdaBoost, followed by Random Forest and SVM-RBF. It is interesting to note that the top two algorithms are based on ensemble classifiers. SVM algorithms presented intermediate Sensitivity and Specificity. The AdaBoost algorithm had the best Specificity, followed by Bagging and SVM-Linear. Considering overall results, AdaBoost presented the best performance among the models tested.

We also compared performance metrics across different sample sizes to verify the sensitivity of the proposed models to the number of observations. In the smaller samples the results vary, and as the sample size grows, AdaBoost outperforms the other methods in terms of AUC and average accuracy. In the analysis using different sample sizes, AdaBoost would be the second best classifier model.

The results of our paper have some implications. From a theoretical perspective, there is no definite model or algorithm that consistently leads to superior accuracy in different datasets. Our study seeks to contribute to the literature by exploring a variety of machine learning techniques applied to an unusual portfolio of high risk loans. In developed countries, which are the focus of the majority of studies, a 48% default rate would be unlikely, and machine learning techniques are not usually tested on a very high default portfolio. From a practical standpoint, the study can contribute to better credit decisions. The bank of our study is state-owned and may be under political pressure to grant loans to low-income and high risk borrowers to achieve social goals.

However, the results of the study show that the use of straightforward machine learning models, in relation to the traditional logistic regression analysis, can reduce default losses. In this context, the bank can at the same time comply with its social role and diminish its credit risk. A lower default rate from the use of machine learning techniques to grant loans could also benefit good borrowers by reducing credit spread for low-income individuals.

Brazilian regulators do not yet allow capital requirements for credit exposure to be calculated by machine learning models. For managerial purposes, however, results show that the use of artificial intelligence algorithms can detect complex relationships among variables in the analysis of default, especially in the highly volatile environment in which Brazilian financial institutions operate.

This study has some limitations. For instance, as in many empirical studies of credit analysis, we use a biased sample, since only data of the loans effectively granted are available. That is, there has already been an initial selection of potential borrowers conducted by the bank. The observations we analyzed contain only borrowers that the institution considered suitable for receiving loans.

For future studies, we suggest the analysis of different costs of misclassification. Since classifying a bad borrower as good is more costly than classifying a good borrower as bad, it is important to adjust accuracy by the costs of type I and type II errors. Another suggestion involves comparing the results of the machine learning techniques considering different definitions of default, such as 30, 90 and 120 days of delay.

A broader feature analysis could also be studied in future research, exploring the variety of available variables. In particular, trying to identify, through the various machine learning algorithms, the importance of variables in explaining credit risk could contribute to the theory by suggesting determinants of default.

Finally, another suggestion would be the investigation of the performance of high default portfolios of personal loans using more recent data. Whereas in 2007 the Brazilian treasury bond interest rate was 11.25% a year, in August 2020 the rate was at an all-time low of 2.0%. However, due to the COVID-19 pandemic, the default in personal loans is very high. Analyzing whether the performance of machine learning algorithms is not strongly influenced by different economic scenarios helps managers and regulators assess the adequacy of these new tools for credit risk assessment.
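One simple way to make this cost adjustment concrete is to weight each kind of error by an assumed cost and compare models on average cost rather than accuracy. In the sketch below the 5:1 cost ratio, the function name, and the predictions are all hypothetical; the study does not report such costs.

```python
def misclassification_cost(y_true, y_pred, cost_missed_bad=5.0,
                           cost_rejected_good=1.0, bad=1):
    """Average cost per loan, weighting the two error types differently."""
    pairs = list(zip(y_true, y_pred))
    missed_bad = sum(t == bad and p != bad for t, p in pairs)
    rejected_good = sum(t != bad and p == bad for t, p in pairs)
    total = cost_missed_bad * missed_bad + cost_rejected_good * rejected_good
    return total / len(pairs)

# Hypothetical labels: 1 = bad borrower, 0 = good borrower
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 0, 0, 0]  # misses two bad borrowers
model_b = [1, 1, 1, 1, 1, 1, 0, 0]  # rejects two good borrowers
cost_a = misclassification_cost(y_true, model_a)
cost_b = misclassification_cost(y_true, model_b)
```

Both hypothetical models make two errors out of eight, so their accuracy is identical, yet the one that misses bad borrowers is five times as costly under the assumed weights.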

Data availability statement

The datasets used during the current study are available from the corresponding author on reasonable request.

Abbreviations

  • ACC: overall accuracy
  • ANN: artificial neural network
  • AUC: area under ROC curve
  • LR: Logistic Regression or Logit
  • ML: machine learning
  • ROC: receiver operating characteristic
  • SVM-Linear: Support Vector Machines with linear kernel
  • SVM-RBF: Support Vector Machines with radial basis function kernel
  • UK: United Kingdom

Assef F, Steiner MT, Neto PJS, de Barros Franco DG (2019) Classification algorithms in financial application: credit risk analysis on legal entities. IEEE Lat Am Trans 17(10):1733–1740

Ben-David A (1995) Monotonicity maintenance in information-theoretic machine learning algorithms. Mach Learn 19(1):29–43

Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140

Breiman L (2001) Random forests. Mach Learn 45(1):5–32

Central Bank of Brazil (2007) Annual report. https://www.bcb.gov.br/pec/boletim/banual2007/rel2007p.pdf

Central Bank of Brazil (2020) Consumer personal loan. https://www.bcb.gov.br/estatisticas/reporttxjuros/

Cornée S (2019) The relevance of soft information for predicting small business credit default: Evidence from a social bank. J Small Bus Manag 57(3):699–719

Crone SF, Finlay S (2012) Instance sampling in credit scoring: an empirical study of sample size and balancing. Int J Forecast 28(1):224–238

Damrongsakmethee T, Neagoe V (2019) C4.5 decision tree enhanced with AdaBoost versus multilayer perceptron for credit scoring modeling. In: Silhavy R, Silhavy P, Prokopova Z (eds) Computational statistics and mathematical modeling methods in intelligent systems. CoMeSySo 2019. Advances in intelligent systems and computing, vol 1047. Springer, Cham, pp 216–226

Dastile X, Celik T, Potsane M (2020) Statistical and machine learning models in credit scoring: a systematic literature survey. Appl Soft Comput 106263

Davis R, Edelman D, Gammerman A (1992) Machine-learning algorithms for credit-card applications. IMA J Manag Math 4(1):43–51

Feng X, Xiao Z, Zhong B, Dong Y, Qiu J (2019) Dynamic weighted ensemble classification for credit scoring using Markov Chain. Appl Intell 49(2):555–568

Galindo J, Tamayo P (2000) Credit risk assessment using statistical and machine learning: basic methodology and risk modeling applications. Comput Econ 15(1/2):107–143

Kamalloo E, Saniee Abadeh M (2014) Credit risk prediction using fuzzy immune learning. Adv Fuzzy Syst 2014:1–11

Khandani AE, Kim AJ, Lo AW (2010) Consumer credit-risk models via machine-learning algorithms. J Bank Finance 34(11):2767–2787

Kozodoi N, Lessmann S, Papakonstantinou K, Gatsoulis Y, Baesens B (2019) A multi-objective approach for profit-driven feature selection in credit scoring. Decis Support Syst 120:106–117

Lantz B (2013) Machine learning with R. Packt Publishing Ltd, Birmingham

Lei K, Xie Y, Zhong S, Dai J, Yang M, Shen Y (2019) Generative adversarial fusion network for class imbalance credit scoring. Neural Comput Appl pp 1–12

Li W, Ding S, Chen Y, Wang H, Yang S (2019) Transfer learning-based default prediction model for consumer credit in China. J Supercomput 75(2):862–884

Luo C (2020) A comprehensive decision support approach for credit scoring. Ind Manag Data Syst 120(2):280–290

Morales EA, Ramos BM, Aguirre JA, Sanchez DM (2019) Credit risk analysis model in microfinance institutions in Peru through the use of Bayesian networks. In: 2019 Congreso Internacional de Innovación y Tendencias en Ingenieria (CONIITI), IEEE, pp 1–4

Moula FE, Guotai C, Abedin MZ (2017) Credit default prediction modeling: an application of support vector machine. Risk Manag 19(2):158–187

Niklis D, Doumpos M, Zopounidis C (2014) Combining market and accounting-based models for credit scoring using a classification scheme based on support vector machines. Appl Math Comput 234:69–81

Oreski S, Oreski G (2014) Genetic algorithm-based heuristic for feature selection in credit risk assessment. Expert Syst Appl 41(4):2052–2064

Pławiak P, Abdar M, Pławiak J, Makarenkov V, Acharya UR (2020) DGHNL: a new deep genetic hierarchical network of learners for prediction of credit scoring. Inf Sci 516:401–418

Shen KY, Sakai H, Tzeng GH (2019) Comparing two novel hybrid MRDM approaches to consumer credit scoring under uncertainty and fuzzy judgments. Int J Fuzzy Syst 21(1):194–212

Shi J, Zhang SY, Qiu LM (2013) Credit scoring by feature-weighted support vector machines. J Zhejiang Univ Sci C 14(3):197–204

Siami M, Gholamian MR, Basiri J (2013) An application of locally linear model tree algorithm with combination of feature selection in credit scoring. Int J Syst Sci 45(10):2213–2222

Tsai CF, Hsu YF, Yen DC (2014) A comparative study of classifier ensembles for bankruptcy prediction. Appl Soft Comput 24:977–984

Twala B (2010) Multiple classifier application to credit risk assessment. Expert Syst Appl 37(4):3326–3336

Vieira J, Barboza F, Sobreiro VA, Kimura H (2019) Machine learning models for credit analysis improvements: predicting low-income families’ default. Appl Soft Comput 83:105640

Wang G, Hao J, Ma J, Jiang H (2011) A comparative assessment of ensemble learning for credit scoring. Expert Syst Appl 38(1):223–230

Xiao H, Xiao Z, Wang Y (2016) Ensemble classification based on supervised clustering for credit scoring. Appl Soft Comput 43:73–86

Yeh IC, Lien Ch (2009) The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Syst Appl 36(2):2473–2480

Zhong H, Miao C, Shen Z, Feng Y (2014) Comparing the learning effectiveness of BP, ELM, I-ELM, and SVM for corporate credit ratings. Neurocomputing 128:285–295

Acknowledgements

The authors acknowledge CNPq for financial support.

Open Access

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

CNPq provided funds to this research, grants 435173/2018-9 (FB), 409725/2013-7 (HK), 310666/2016-3 (HK), 438314/2018-2 (HK), and 312866/2019-4 (HK).

Author information

Authors and affiliations.

Department of Management, University of Brasília, Campus Darcy Ribeiro – North Wing, Brasília, Federal District, 70910–900, Brazil

Maisa Cardoso Aniceto & Herbert Kimura

School of Business and Management, Federal University of Uberlandia, Av. Joao Naves de Avila, 2121, Uberlandia, Minas Gerais, 38400–902, Brazil

Flavio Barboza

Contributions

MCA holds a Master of Administration from University of Brasilia. The paper is extracted from his Master thesis. HK supervised the thesis of MCA and FB co-supervised MCA. All authors read and approved the final manuscript. All authors contributed equally to this work.

Corresponding author

Correspondence to Maisa Cardoso Aniceto .

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article.

Aniceto, M.C., Barboza, F. & Kimura, H. Machine learning predictivity applied to consumer creditworthiness. Futur Bus J 6 , 37 (2020). https://doi.org/10.1186/s43093-020-00041-w

Received : 17 June 2020

Accepted : 11 October 2020

Published : 15 November 2020

DOI : https://doi.org/10.1186/s43093-020-00041-w

  • Machine learning
  • Credit risk
  • Consumer lending
  • Default prediction
  • Performance analysis

Predicting of Credit Risk Using Machine Learning Algorithms

  • Conference paper
  • First Online: 28 February 2024

  • Tisa Maria Antony   ORCID: orcid.org/0000-0002-4848-2112 13 &
  • B. Sathish Kumar   ORCID: orcid.org/0000-0002-2292-842X 13  

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 843))

Included in the following conference series:

  • International Conference on Artificial Intelligence on Textile and Apparel

Credit risk management is one of the key processes for banks and is crucial to ensuring the bank’s stability and success. However, due to the need for more rigid forecasting models with strong mapping abilities, credit risk prediction has become challenging for the banking industry. Therefore, this paper attempts to predict commercial banks’ credit risk (CR) by using various machine learning algorithms. Machine learning algorithms, namely linear regression, KNN, SVR, DT, RF, XGB, and MLP, are compared with and without feature selection and feature extraction techniques to examine their prediction capabilities. Various determinants of credit risk (features) have been extracted to predict credit risk, and these features have been used to train machine learning models. Findings revealed that the decision tree algorithm had the highest performance, with the lowest mean absolute error (MAE) value of 0.1637 and the lowest root mean squared error (RMSE) value of 0.2158.
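The two error measures named in the abstract, mean absolute error and root mean squared error, can be computed as in the following sketch. The function names and the values are hypothetical and are not taken from the paper.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared error."""
    mean_sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mean_sq)

# Hypothetical observed and predicted credit risk values
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.0, 4.5]
mae_value = mae(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
```

Because squaring gives large errors more weight, the root mean squared error is never smaller than the mean absolute error on the same predictions, which is consistent with the two values reported above.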


Twum AK, ZhongMing T, Agyemang AO, Ayamba EC, Chibsah R (2021) The impact of internal and external factors of credit risk on businesses: an empirical study of Chinese commercial banks. J Corp Account Finan 1–14. https://doi.org/10.1002/jcaf.22482

Focarelli D, Panetta F, Salleo C (2002) Why do banks merge? J Money Credit Bank 34:1047–1066

Mpofu TR, Nikolaidou E (2018) Determinants of credit risk in the banking system in Sub-Saharan Africa. Rev Dev Finan 8:141–153. https://doi.org/10.1016/j.rdf.2018.08.001

Srairi S (2019) Transparency and bank risk-taking in GCC Islamic banking. Borsa Istanbul Rev 19:S64–S74. https://doi.org/10.1016/j.bir.2019.02.001

Kharabsheh B (2019) Determinants of bank credit risk: empirical evidence from Jordanian commercial banks

İncekara A, Çetinkaya H (2019) Credit risk management: a panel data analysis on the Islamic banks in Turkey. Procedia Comput Sci 158:947–954. https://doi.org/10.1016/j.procs.2019.09.135

Al-Qudah AA, Hamdan A, Al-Okaily M, Alhaddad L (2022) The impact of green lending on credit risk: evidence from UAE’s banks. Environ Sci Pollut Res. https://doi.org/10.1007/s11356-021-18224-5

Gupta N, Mahakud J (2020) Ownership, bank size, capitalization and bank performance: evidence from India. Cogent Econ Finan 8. https://doi.org/10.1080/23322039.2020.1808282

Misman FN, Bhatti MI (2020) The determinants of credit risk: an evidence from ASEAN and GCC Islamic banks. J Risk Finan Manag 13:89. https://doi.org/10.3390/jrfm13050089

Masood O, Ashraf M (2012) Bank-specific and macroeconomic profitability determinants of Islamic banks: the case of different countries. Qual Res Finan Mark 4:255–268. https://doi.org/10.1108/17554171211252565

Salike N, Ao B (2018) Determinants of bank’s profitability: role of poor asset quality in Asia. China Finan Rev Int 8:216–231. https://doi.org/10.1108/CFRI-10-2016-0118

Sivasankaran SN, Shukla A, Ayyalusamy K, Chakraborty S (2020) Do women directors impact the risk and return of Indian Banks? IIM Kozhikode Soc Manag Rev 10:44–65. https://doi.org/10.1177/2277975220938013

Battaglia F, Mazzuca M (2014) Securitization and Italian banks’ risk during the crisis. J Risk Finan 15:458–478. https://doi.org/10.1108/JRF-07-2014-0097

Almaqtari FA, Al-Homaidi EA, Tabash MI, Farhan NH (2018) The determinants of profitability of Indian commercial banks: a panel data approach. Int J Finan Econ 24:1–18. https://doi.org/10.1002/ijfe.1655

Author information

Authors and affiliations.

School of Commerce Finance and Accountancy, CHRIST (Deemed to be University), Bengaluru, India

Tisa Maria Antony & B. Sathish Kumar

Corresponding author

Correspondence to Tisa Maria Antony.

Editor information

Editors and affiliations.

Department of Computer Science and Engineering, Rajasthan Technical University, Kota, Rajasthan, India

Harish Sharma

Department of Computer Science and Electrical Engineering, University of Stavanger, Stavanger, Norway

Antorweep Chakravorty

Biomedical Robotics, Information, Technology (IT) and Systems, University of Canberra, Bruce, ACT, Australia

Shahid Hussain

IBS, Bangalore, Off-Campus Centre, ICFAI Foundation for Higher Education (IFHE) University, Bengaluru, Karnataka, India

Rajani Kumari

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper.

Antony, T.M., Kumar, B.S. (2024). Predicting of Credit Risk Using Machine Learning Algorithms. In: Sharma, H., Chakravorty, A., Hussain, S., Kumari, R. (eds) Artificial Intelligence: Theory and Applications. AITA 2023. Lecture Notes in Networks and Systems, vol 843. Springer, Singapore. https://doi.org/10.1007/978-981-99-8476-3_9

DOI: https://doi.org/10.1007/978-981-99-8476-3_9

Published: 28 February 2024

Publisher Name: Springer, Singapore

Print ISBN: 978-981-99-8475-6

Online ISBN: 978-981-99-8476-3

eBook Packages: Intelligent Technologies and Robotics (R0)

COMMENTS

  1. Supervised Machine Learning Homework

    In this assignment, I built a machine learning model that attempts to predict whether a loan from LendingClub is high risk or not. Background: LendingClub is a peer-to-peer lending services company that allows individual investors to partially fund personal loans as well as buy and sell notes backing the loans on a secondary market.

  2. Supervised Machine Learning Homework

    Supervised Machine Learning Homework - Predicting Credit Risk. Sections: Background; Instructions; Retrieve the data; Preprocessing: Convert categorical data to numeric; Consider the models; Fit a LogisticRegression model and RandomForestClassifier model; Revisit the Preprocessing: Scale the data; Rubric; References

  3. Supervised Machine Learning Homework

    CGrant109/Supervised-Machine-Learning-Homework---Predicting-Credit-Risk

  4. Credit_Risk_Analysis

    (1) credit_risk_ensemble_starter_code, (2) credit_risk_resampling_starter_code, (3) LoanStats_2019Q1; Software: Python 3.9.10, Jupyter Lab 4.6, Visual Studio Code 1.71.2; Methodology D1: Use Resampling Models to Predict Credit Risk. Using the imbalanced-learn and scikit-learn libraries, we evaluated three machine learning models by using ...

  5. Credit Risk Modeling with Machine Learning

    Credit risk modeling, the process of estimating the probability someone will pay back a loan, is one of the most important mathematical problems of the modern world. In this article, we'll explore from the ground up how machine learning is applied to credit risk modeling. You don't need to know anything about machine learning to understand this article!

  6. (PDF) Machine Learning for Credit Risk Prediction: A Systematic

    Abstract: In this systematic review of the literature on using Machine Learning (ML) for credit risk prediction, we raise the need for financial institutions to use AI and ML to assess credit ...

  7. Machine learning-driven credit risk: a systemic review

    Credit risk assessment is at the core of modern economies. Traditionally, it is measured by statistical methods and manual auditing. Recent advances in financial artificial intelligence stemmed from a new wave of machine learning (ML)-driven credit risk models that gained tremendous attention from both industry and academia. In this paper, we systematically review a series of major research ...

  8. Data

    In this systematic review of the literature on using Machine Learning (ML) for credit risk prediction, we raise the need for financial institutions to use Artificial Intelligence (AI) and ML to assess credit risk, analyzing large volumes of information. We posed research questions about algorithms, metrics, results, datasets, variables, and related limitations in predicting credit risk. In ...

  9. Credit Risk Evaluator Machine Learning: Supervised

    Build a machine learning model that attempts to predict whether a loan from LendingClub will become high risk or not. Background: LendingClub is a peer-to-peer lending services company that allows individual investors to partially fund personal loans as well as buy and sell notes backing the loans on a secondary market.

  10. Machine Learning Credit Risk Modelling : A Supervised Learning. Part 6

    The AUC is a scalar value that represents the area under the ROC curve. It provides a single numerical measure of a model's ability to discriminate between positive and negative instances across ...

  11. A Concern of Predicting Credit Recovery on Supervised Machine Learning

    For banking and financial sectors credit risk is a great threat and to predict the credit worthiness of a customer there are many techniques that exist. ... we have explored the dataset of bank which contains the information of credit defaulter client and applied some supervised Machine Learning algorithms to predict the credit ability of a ...

  12. Machine Learning and Credit Risk Modelling

    Lowercase nomenclature is used to differentiate S&P Global Market Intelligence PD scores from the credit ratings used by S&P Global Ratings. Machine Learning (ML) algorithms leverage large datasets to determine patterns and construct meaningful recommendations. Likewise, credit risk modelling is a field with access to a large amount of diverse ...

  13. A machine learning-based credit risk prediction engine system using a

    Credit risk prediction is a crucial task for financial institutions. The technological advancements in machine learning, coupled with the availability of data and computing power, has given rise to more credit risk prediction models in financial institutions. In this paper, we propose a stacked classifier approach coupled with a filter-based feature selection (FS) technique to achieve ...

  14. Machine learning predictivity applied to consumer creditworthiness

    Credit risk evaluation has a relevant role to financial institutions, since lending may result in real and immediate losses. In particular, default prediction is one of the most challenging activities for managing credit risk. This study analyzes the adequacy of borrower's classification models using a Brazilian bank's loan database, and exploring machine learning techniques. We develop ...

  15. Credit risk prediction based on causal machine learning: Bayesian

    The predictive and interpretable power of models is crucial for financial risk management. The purpose of this study was to perform credit risk prediction in a structured causal network with four stages—data processing, structural learning, parameter learning, and interpretation of inferences—and use six real credit datasets to conduct empirical research on the proposed model.

  16. PDF Machine Learning Applied to Banking Supervision a Literature Review

    In recent years, machine learning (ML) methods and, to some extent, deep learning (DL), have been used for the assessment of credit risk, and more broadly, predicting bank failures. Currently, traditional statistical methods are still commonly used for this purpose. Nevertheless, machine learning techniques are overcoming traditional approaches by

  17. Predicting of Credit Risk Using Machine Learning Algorithms

    Abstract. Credit risk management is one of the key processes for banks and is crucial to ensuring the bank's stability and success. However, due to the need for more rigid forecasting models with strong mapping abilities, credit risk prediction has become challenging for the banking industry. Therefore, this paper attempts to predict ...

  18. PDF Machine Learning for an Enhanced Credit Risk Analysis: A Comparative

    whether machine learning techniques can better predict the potential risks. To study the machine learning paradigm in this sector, the mental health dataset and loan approval dataset presenting survey results from 1991 individuals are used as inputs to experiment with the credit risk prediction ability of the chosen machine learning algorithms.

  19. Integration of Unsupervised and Supervised Machine Learning Algorithms

    Bao et al. (2019) proposed an integration strategy of unsupervised learning with supervised learning for credit risk assessment. Pan et al. (2020) designed a credit risk analysis framework for ...

  20. Machine Learning in Credit Risk: Measuring the Dilemma Between ...

    Abstract. New reports show that the financial sector is increasingly adopting machine learning (ML) tools to manage credit risk. In this environment, supervisors face the challenge of allowing credit institutions to benefit from technological progress and financial innovation, while at the same time ensuring compatibility with regulatory requirements and that technological neutrality is observed.
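
Item 10 in the list above describes the AUC as a single number summarizing how well a model separates positive from negative instances. As a rough illustration only, the sketch below computes it in plain Python using the pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive instance gets the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are assumptions made for this example; they do not come from any of the listed sources.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    Equals the probability that a randomly chosen positive instance is
    scored above a randomly chosen negative one; ties count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = high-risk loan, 0 = low-risk loan.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_score)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The quadratic loop is meant for readability; for real datasets a library routine such as scikit-learn's `roc_auc_score` would normally be the practical choice.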