Fair Isaac: Combining Machine Learning with Credit Risk Scorecards
With all the benefits of artificial intelligence, many of our customers want to leverage machine learning to improve other types of analytical models already in use, such as credit risk assessment. With 30 years of experience with AI and machine learning under our belt, we can definitely help.
My colleague Scott Zoldi blogged a few years ago about how we use AI to create credit risk models. In this article, I’d like to dive deeper into one of the examples he gave, and discuss an approach that offers a way to harness the full power of machine learning for credit risk to those who are still on the fence and still using traditional scorecards. It should also be noted that in the years since, FICO has made tremendous strides in making AI models transparent, constrained, and explainable, able to withstand regulatory scrutiny and thus well suited to credit risk modeling. These advances have been discussed in other blogs.
How do you build a model with limited data?
A traditional credit risk scorecard model generates a score reflecting the probability of default, using various customer characteristics as inputs. These characteristics could be any customer information deemed relevant to assessing the probability of default, provided the information is also permitted by regulation. Each input is grouped into ranges of values, and each range is assigned a score weight. When scoring an individual, the score weights corresponding to the individual’s information are added together to produce the score.
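The additive scoring described above can be sketched in a few lines. The characteristics, ranges, and weights below are hypothetical, invented purely for illustration; a real scorecard would have many more characteristics and carefully developed weights.

```python
# Hypothetical scorecard: each characteristic maps value ranges to score
# weights; an applicant's score is the sum of the matching weights.
SCORECARD = {
    "age": [((18, 25), 10), ((25, 40), 25), ((40, 120), 40)],
    "utilization_pct": [((0, 30), 35), ((30, 70), 20), ((70, 101), 5)],
}

def score(applicant):
    total = 0
    for feature, bins in SCORECARD.items():
        value = applicant[feature]
        for (lo, hi), weight in bins:
            if lo <= value < hi:  # ranges are half-open [lo, hi)
                total += weight
                break
    return total

print(score({"age": 35, "utilization_pct": 20}))  # 25 + 35 = 60
```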
When building a scorecard model, we need to “bin” features into ranges of values, and the bins are chosen to maximize the separation between known good cases and known bad cases. This separation is measured using the weight of evidence (WoE), a logarithmic ratio of the fraction of good cases to the fraction of bad cases present in the bin. A WoE of 0 means the bin has the same distribution of good and bad cases as the overall population. The further this value is from 0, the more the bin concentrates one type of case over the other, relative to the overall population. A scorecard characteristic will usually have a handful of bins, with a smooth distribution of WoE across them.
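The WoE definition above reduces to a one-line formula. Here is a minimal sketch with made-up counts, showing the WoE = 0 case (bin mix matches the population) and a bin concentrated in good cases:

```python
import math

def woe(goods_in_bin, bads_in_bin, total_goods, total_bads):
    # WoE = ln( (% of all goods in this bin) / (% of all bads in this bin) )
    return math.log((goods_in_bin / total_goods) / (bads_in_bin / total_bads))

# Bin with the same good/bad mix as the population -> WoE = 0
print(woe(500, 50, 1000, 100))  # 0.0
# Bin concentrated in good cases -> positive WoE
print(woe(400, 10, 1000, 100))  # ln(0.4 / 0.1) ≈ 1.386
```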
As Scott described in his article, our project was to create credit risk models for a real estate portfolio. Home equity lending slowed significantly after the recession, and for this reason we had few bad examples in the development sample, with a default rate of only 0.2%. It was difficult to build models using traditional scorecard techniques.
The main reason for this is the inability of a scorecard model to interpolate information. The information must be explicitly provided to the scorecard model, and the standard way to do this is to provide enough good and bad counts in each bin to calculate a reliable WoE. When the good or bad counts are insufficient, as in this case, this approach ends up producing a noisy, jumpy WoE distribution across bins, leading to poorly performing scorecard models.
Enter machine learning
We turned to a machine learning algorithm called Tree Ensemble Modeling, or TEM. TEM involves building multiple “tree” models, where each node in a tree splits on a variable into two further subtrees.
Each tree model in a TEM is built on a subset of the training dataset and uses only a handful of features as input. This limits the degrees of freedom of the tree model, produces a shallow tree, and ensures that variable splitting is limited. This allows us to more reliably honor the requirement for a minimum number of good and bad cases at each node.
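A toy sketch can illustrate the idea of shallow trees trained on row and feature subsets, with the ensemble score taken as the average of the per-tree scores. This is an assumed, illustrative construction (single-split "stump" trees, bootstrap row sampling, one random feature per tree), not FICO's actual TEM algorithm:

```python
import random

def train_stump(rows, feature):
    # A maximally shallow tree: one split at the feature's median;
    # each leaf predicts the observed bad rate among its rows.
    values = sorted(r[feature] for r in rows)
    threshold = values[len(values) // 2]
    left = [r for r in rows if r[feature] <= threshold]
    right = [r for r in rows if r[feature] > threshold]
    rate = lambda part: sum(r["bad"] for r in part) / len(part) if part else 0.0
    return {"feature": feature, "threshold": threshold,
            "left": rate(left), "right": rate(right)}

def train_ensemble(rows, features, n_trees=50, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap subset of rows
        feature = rng.choice(features)             # handful of features per tree
        trees.append(train_stump(sample, feature))
    return trees

def ensemble_score(trees, row):
    # The ensemble output is the average of the constituent tree scores.
    preds = [t["left"] if row[t["feature"]] <= t["threshold"] else t["right"]
             for t in trees]
    return sum(preds) / len(preds)

# Synthetic data: cases with high "x" are bad; the ensemble ranks them riskier.
rows = [{"x": i % 10, "bad": 1 if i % 10 >= 8 else 0} for i in range(100)]
trees = train_ensemble(rows, ["x"])
print(ensemble_score(trees, {"x": 9}) > ensemble_score(trees, {"x": 0}))  # True
```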
The following diagram shows an artistic interpretation of a TEM, representing several shallow trees in an ensemble. The final score output is usually an average of the scores of all the constituent tree models in the ensemble.
Such a model may have thousands of trees and tens of thousands of parameters with no easy interpretation. Unlike a scorecard, you can’t tell a borrower, a regulator, or even a risk analyst why someone received the score they did. This lack of explainability is a big limitation of an approach like TEM.
However, by building a machine learning model, we were able to confirm that our scorecard approach was losing a significant amount of predictive power. Although impractical to use, the machine learning score outperformed the scorecard. Our next challenge was to close the performance gap between the TEM and scorecard models.
Scorecards that mimic machine learning
FICO has faced this challenge many times before: a scorecard model is required, even though a machine learning model provides a much more powerful decision-making capability. How do we retain the deep insights that machine learning and AI can discover, but scorecard development approaches cannot, and pass this knowledge on to a scorecard model?
Over the years, we have developed practical ways to address previously identified limitations of machine learning models which, until now, prevented their use in regulated decision scenarios such as lending and credit risk decisions. For example, we have developed mechanisms to impart domain knowledge and interpretability to neural networks and other machine learning models. We have also developed methodologies to apply constraints on the relationships between the inputs and outputs of machine learning models, as well as to design these relationships, much as with scorecards.
However, there are situations where some companies are reluctant to use machine learning models directly. In such cases, we recommend a methodology called teacher-student learning. In this approach, we first train a machine learning model that can learn complex nonlinear relationships in the data. This is called the “Teacher” model. The TEM machine learning model we discussed earlier can serve as a teacher model.
We then train a “Student” model, a system of segmented scorecards, which encodes the patterns and insights discovered by the teacher machine learning model. Our goal is to match the distribution of scores generated by the teacher model, rather than relying on the WoE approach we discussed earlier. So instead of supplying good and bad data points and directly calculating the WoE, the distribution of teacher scores within each bin provides an estimate of the WoE.
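One plausible form of this step can be sketched as follows: for each scorecard bin, average the log-odds implied by the teacher's predicted default probabilities, and use that as the bin's estimated WoE-style weight. This is an assumed simplification for illustration (the toy `teacher` function, bin edges, and records are all invented), not FICO's exact methodology:

```python
import math

def student_bin_weights(records, teacher, bin_edges, feature):
    # For each bin, estimate the score weight from the teacher's predicted
    # bad probabilities instead of sparse observed good/bad counts.
    weights = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [r for r in records if lo <= r[feature] < hi]
        log_odds = [math.log(teacher(r) / (1 - teacher(r))) for r in in_bin]
        weights.append(sum(log_odds) / len(log_odds) if log_odds else 0.0)
    return weights

# Toy teacher: predicted default probability rises with utilization.
teacher = lambda r: min(max(0.01 + 0.004 * r["utilization"], 0.01), 0.99)
records = [{"utilization": u} for u in range(0, 100, 5)]

w = student_bin_weights(records, teacher, [0, 30, 70, 100], "utilization")
print(w)  # riskier bins get higher (less negative) log-odds weights
```

Because every record contributes a teacher score, each bin gets a stable weight estimate even when observed defaults in that bin are rare or absent.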
Significantly, the final model is almost as predictive as the teacher machine learning model. Out-of-time validation of the final model demonstrates that it continues to perform well over time, as shown in the following figure.
The end result of this approach is a robust and palatable segmented scorecard system. Our hybrid approach overcame the limitations imposed by the small number of bad cases. While it was previously considered impossible to create powerful scorecards for problem spaces with such rare cases, the teacher-student approach allows us to do so whenever a machine learning algorithm can be built to extract more signal from these datasets.
This is just one approach that FICO uses to leverage the power of AI in heavily regulated areas where reasons for score decisions must be provided. It represents our commitment to extending AI to new areas for our customers, as we have been doing for 30 years. To see some of the ways we do this today, check out our Artificial Intelligence and Machine Learning page.
How FICO Can Help You Build Better Risk Models
This is an update of an article from 2017.