Fair Isaac: Using Alternative Data in Credit Risk Modeling
The use of alternative data in credit risk assessment has opened up considerably in recent years. Alternative data is a hot topic, partly because of the data explosion of the past few years and partly because of the push for financial inclusion. Here is some guidance on how to evaluate alternative data and combine it with so-called traditional data to improve credit risk models.
Several alternative data types
What is alternative data? In credit granting, it generally refers to any data that is not directly related to a consumer’s credit behavior. Traditional data usually refers to data from a credit bureau, a credit application, or a lender’s own files on an existing customer. Alternative data is everything else.
There are approximately 3 billion adults in the world who have no credit and therefore no credit history. Opening up this market is a priority for lenders. And while many of these people are in developing markets with nascent credit infrastructure, there are so-called “credit invisibles” even in the most mature credit markets: people who don’t have credit and are unknown to the credit bureaus.
With that in mind, let’s look at some alternative data sources and how useful they are for credit decisions.
- Transaction data. This is usually data about how customers use their credit or debit cards. It may not sound “alternative”, since most lenders already hold this data, often aggregated into monthly summaries, but it is rarely mined for its full predictive value. It can generate a wide range of predictive features, such as the ratio of cash spending to total spending over the past X weeks, the ratio of spending over the past X weeks to spending over the past Y weeks, and features based on the number, frequency and value of transactions at different types of retailers. Processing the data can take time, but the data itself is usually clean.
- Telecom / utilities / rental data. This is essentially credit history data, but it counts as alternative because it does not appear in most credit reports. FICO has mined this data for the FICO® XD score in the United States.
- Social profile data. Mining Facebook, LinkedIn, Twitter, Instagram, Snapchat or other social media sites is possible, but few lenders would want to brave the regulatory hurdles of being the first to act. It is possible to derive value not from what people say on these channels but from metadata (for example, the number and frequency of posts, or the size of a person’s social graph), but even this would probably raise privacy concerns. Moreover, despite what some enterprising fintechs might say, this data is likely to be worth much less than data with a stronger link to credit. Consumers can also manipulate this data.
- Clickstream data. How a customer moves through your website, where they click, and how long they spend on a page can be predictive.
- Audio and text data. This data comes from information found on credit applications and from recorded customer service or collection calls. It can supplement “thin files” and is already proven in collections.
- Social network analysis. New technologies allow us to map a consumer’s network in two important ways. First, they can identify all files and accounts belonging to the same customer, even if the files carry slightly different names or addresses. This gives you a better understanding of the consumer and their risk. Second, they can identify the individual’s ties to other people, such as members of their household. When evaluating a new credit applicant with little or no credit history, the credit scores of people in the applicant’s network can provide useful information. However, this will not pass regulatory tests in all markets.
- Survey / questionnaire data. Psychometrics is an innovative way to assess the credit risk of someone with little or no credit history. The leader in this field, EFL, bases its scores on 10 years of research at Harvard. FICO has partnered with EFL to enable more people to be scored in markets around the world.
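The ratio features described in the transaction data item above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (date, amount, channel), the "cash" label, the sample transactions and the function names are all hypothetical and are not taken from any FICO system.

```python
from datetime import date, timedelta

# Hypothetical transaction records: (date, amount, channel).
# The layout and the "cash" channel label are assumptions for illustration.
TRANSACTIONS = [
    (date(2024, 1, 2), 40.0, "cash"),
    (date(2024, 1, 9), 60.0, "card"),
    (date(2024, 2, 20), 50.0, "cash"),
    (date(2024, 2, 27), 150.0, "card"),
]


def spend_in_window(transactions, as_of, weeks, channel=None):
    """Total spend in the `weeks` weeks up to and including `as_of`."""
    start = as_of - timedelta(weeks=weeks)
    return sum(
        amount
        for when, amount, chan in transactions
        if start < when <= as_of and (channel is None or chan == channel)
    )


def cash_to_total_ratio(transactions, as_of, weeks):
    """Share of spend made in cash over the past `weeks` weeks."""
    total = spend_in_window(transactions, as_of, weeks)
    if total == 0:
        return None  # no activity in the window, so the ratio is undefined
    return spend_in_window(transactions, as_of, weeks, channel="cash") / total


def recent_to_long_ratio(transactions, as_of, recent_weeks, long_weeks):
    """Spend over the past X weeks relative to spend over the past Y weeks."""
    long_total = spend_in_window(transactions, as_of, long_weeks)
    if long_total == 0:
        return None
    return spend_in_window(transactions, as_of, recent_weeks) / long_total


as_of = date(2024, 2, 29)
features = {
    "cash_ratio_4w": cash_to_total_ratio(TRANSACTIONS, as_of, 4),
    "spend_4w_to_12w": recent_to_long_ratio(TRANSACTIONS, as_of, 4, 12),
}
print(features)
```

Returning None for an empty window is a design choice made here so that "no activity" stays distinguishable from "no cash spending"; a production feature pipeline might encode that case differently.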
FICO research has shown that these data sources add marginal predictive value to credit risk models built on traditional data. The amounts of predictive value given in the table below should be read as relative indicators, not absolute values, because the additional value of a data source depends on many factors, such as the predictive power of the existing models and the strength of the customer’s relationship with the lender.
Please note that the traditional models used as a reference were application models, not credit bureau score models (such as the FICO® score).
The graph below shows the result of a project FICO carried out on a personal loan origination portfolio. Traditional credit characteristics captured more value than alternative data characteristics (with alternative data capturing around 60% of the predictive power), and there was a high degree of overlap between the two. However, by combining traditional and alternative data characteristics (and understanding the overlap so as not to overweight the contribution of certain variables), we were able to produce a more powerful model.
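One way to make this kind of comparison concrete is to measure how well each score ranks bad accounts above good ones. The sketch below uses AUC for that purpose, which is an assumption: the article does not say which measure of predictive power FICO used. The eight applicants and their scores are invented to show the mechanics, not to reproduce the project's result, and the simple average stands in for a real model that would fit both sets of characteristics jointly to handle their overlap.

```python
def auc(scores, is_bad):
    """Probability that a randomly chosen bad account scores as riskier
    than a randomly chosen good one (ties count half).
    A higher score means higher risk."""
    bads = [s for s, b in zip(scores, is_bad) if b]
    goods = [s for s, b in zip(scores, is_bad) if not b]
    if not bads or not goods:
        raise ValueError("need at least one bad and one good account")
    wins = 0.0
    for b in bads:
        for g in goods:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bads) * len(goods))


# Hypothetical risk scores for eight applicants (1 = went bad, 0 = stayed good).
is_bad = [1, 1, 1, 1, 0, 0, 0, 0]
traditional = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.5]
alternative = [0.6, 0.3, 0.8, 0.35, 0.5, 0.4, 0.2, 0.1]

# Naive combination: average the two scores for each applicant.
combined = [(t + a) / 2 for t, a in zip(traditional, alternative)]

for name, scores in [
    ("traditional", traditional),
    ("alternative", alternative),
    ("combined", combined),
]:
    print(name, auc(scores, is_bad))
```

In this toy data each score misranks different applicants, so the combination separates bads from goods better than either score alone. With heavily overlapping characteristics the gain would be smaller, which is why understanding the overlap matters.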
Machine learning and explainability
It is impossible to talk about alternative data without talking about different analytical and machine learning technologies, such as neural networks, random forests and stochastic gradient boosting. With large unstructured datasets, the intelligent use of these technologies can identify data patterns related to credit risk and make the model development process more manageable.
However, as is the case with AI in general, data scientists play an important role. They should verify the accuracy of the output, ensure that the model does not overfit the data, ensure that the model provides a stable output, and ensure that the patterns discovered are robust, relevant, and explainable.
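As an illustration of the stability check mentioned above, one widely used measure in credit scoring is the population stability index (PSI), which compares the distribution of scores at development time with a more recent distribution. The article does not name a specific technique, so PSI is swapped in here as an assumption, and the score-band counts are hypothetical.

```python
import math


def population_stability_index(expected_counts, actual_counts, floor=1e-6):
    """PSI between two score distributions, given counts per score band.

    `expected_counts` is the development sample and `actual_counts` the
    recent sample; both must use the same bands in the same order.
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both samples must use the same score bands")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor the shares so an empty band does not produce log(0).
        e_share = max(e / expected_total, floor)
        a_share = max(a / actual_total, floor)
        psi += (a_share - e_share) * math.log(a_share / e_share)
    return psi


# Hypothetical account counts in five score bands.
development = [100, 200, 400, 200, 100]
recent = [120, 210, 380, 190, 100]

print(population_stability_index(development, recent))
```

A value of zero means the two distributions are identical, and larger values indicate a bigger shift. What counts as an acceptable shift is a judgment for the modeling team and is not something the article specifies.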
Explainability is a challenge when it comes to AI and machine learning. Lenders need to explain how consumers are rated, certainly to regulators and often to consumers themselves. FICO uses technology that takes the patterns identified by AI, machine learning and other techniques and turns them into dashboards that are easy to understand and implement, and that produce the same increases in predictive power as the machine learning models. For more information on these techniques, see the blog post by FICO’s Director of Analytics, Scott Zoldi, on How to Build Credit Risk Models Using AI and Machine Learning.
This is an update of an article first published in August 2017.