What Big Data Thinks About P2P Loans

Executive Summary:

Peer-to-peer services have grown in popularity over the past decade. New lending marketplaces have seen equally substantial growth, even as they have strayed from the original P2P model. The global marketplace loan market has seen a nearly tenfold increase in loan origination since 2010. A key advantage driving the growth of peer-to-peer lending is its transparency: a lender can collect and analyze nearly every aspect of a borrower’s profile, from credit score and occupation to more dynamic data such as debt-to-income ratio. This has not only sped up underwriting and reduced its cost, but has also strengthened the investment case relative to a traditional fixed income portfolio.

However, processing and analyzing high-volume data is a daunting task for human intuition alone. Multiple platforms and alternative data sources have already amassed, or are close to amassing, a decade of borrower data. With well over 50 variables per platform, often filling gigabytes of data, the major challenge facing the lender is finding the right combination of features, at the right cutoffs, that will yield a consistent profit with minimal computational burden. This paper walks through a sample analysis identifying the mix of factors driving loan quality.


Analytical Overview of Data:

Exploring the data is by far the most important part of the process. After downloading the data set, fire up your favorite data exploration tool and start digging around. Options range from programming languages, such as Python and R, to graphical exploration tools, such as Tableau and Weka. For the visualizations in this paper, we’ll use Tableau to portray relationships in the Lending Club data.
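For readers who prefer Python, a first pass can be as simple as reading the CSV and tallying a few columns. The sketch below uses only the standard library and a small inline sample instead of a real download; the column names (`home_ownership`, `loan_status`, `inq_last_6mths`) are assumptions about the Lending Club export, so check them against the file you actually download.

```python
import csv
import io
from collections import Counter


def profile_column(csv_text, column):
    """Count how often each value appears in one column of a CSV file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


def missing_counts(csv_text):
    """Count empty cells per column, a quick sanity check before modeling."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        for name, value in row.items():
            if value is None or value.strip() == "":
                counts[name] += 1
    return counts


# Hypothetical sample rows; the real export has many more columns.
SAMPLE = """loan_amnt,home_ownership,loan_status,inq_last_6mths
10000,RENT,Fully Paid,1
5000,OWN,Charged Off,
12000,MORTGAGE,Fully Paid,0
8000,RENT,Charged Off,3
"""

print(profile_column(SAMPLE, "home_ownership"))
print(missing_counts(SAMPLE))
```

With a real file you would pass the text read via `open(path, newline="")`, or adapt the functions to take the file object directly so large files are streamed rather than loaded whole.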

After a little digging, one variable that piqued our interest was Home Ownership, and how likely a borrower is to default given their ownership status. It turns out that borrowers who mortgage and borrowers who own their home are within 0.1% of each other in default likelihood. Renting generally increases the likelihood of default by 1-2%. However, borrowers who fall under any other type of home ownership are more than twice as likely to default.
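Default rates by ownership status boil down to a grouped ratio: defaulted loans in a group divided by all loans in that group. A minimal sketch, assuming each loan is a dict with hypothetical `home_ownership` and `loan_status` keys and that "Charged Off" and "Default" are the statuses counted as defaults; the toy data is invented and does not reproduce the Lending Club figures.

```python
from collections import defaultdict

# Statuses treated as a default. This is an assumption: the raw data also
# contains statuses such as "Current" whose outcome is not yet known.
DEFAULT_STATUSES = {"Charged Off", "Default"}


def default_rate_by_group(loans, group_key="home_ownership",
                          status_key="loan_status"):
    """Return {group: share of loans in that group that defaulted}."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for loan in loans:
        group = loan[group_key]
        totals[group] += 1
        if loan[status_key] in DEFAULT_STATUSES:
            defaults[group] += 1
    return {g: defaults[g] / totals[g] for g in totals}


# Hypothetical toy data, not Lending Club figures.
loans = [
    {"home_ownership": "OWN", "loan_status": "Fully Paid"},
    {"home_ownership": "OWN", "loan_status": "Charged Off"},
    {"home_ownership": "RENT", "loan_status": "Charged Off"},
    {"home_ownership": "RENT", "loan_status": "Charged Off"},
    {"home_ownership": "RENT", "loan_status": "Fully Paid"},
    {"home_ownership": "OTHER", "loan_status": "Charged Off"},
]
for group, rate in sorted(default_rate_by_group(loans).items()):
    print(f"{group:>8}: {rate:.1%}")
```

In practice you would probably restrict the sample to loans that have finished (for example Fully Paid or Charged Off) before computing the ratio, since a loan that is still current has no outcome yet.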

Further exploration reveals a strong correlation between the number of inquiries within the last six months and the predicted loss. As a general rule, the more inquiries a borrower had in the last six months, the less money is predicted to be lost.
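A relationship like this can be checked numerically by averaging loss for each inquiry count and computing a correlation coefficient. The sketch below assumes you already have a per-loan loss figure paired with the inquiry count; the pairs are hypothetical, and the article’s chart uses predicted loss, which this sketch does not reproduce.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_loss_by_inquiries(pairs):
    """Average loss for each distinct inquiry count, sorted by count."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for inquiries, loss in pairs:
        sums[inquiries] += loss
        counts[inquiries] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


# Hypothetical (inquiries in last 6 months, dollars lost) pairs.
pairs = [(0, 900.0), (0, 700.0), (1, 600.0), (2, 400.0), (3, 300.0), (5, 100.0)]
print(mean_loss_by_inquiries(pairs))
print(round(pearson([p[0] for p in pairs], [p[1] for p in pairs]), 3))
```

Pearson’s coefficient only captures a linear trend; for a skewed count such as inquiries, a rank-based measure like Spearman’s correlation is a reasonable alternative.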

Interesting, right? Both Inquiries and Home Ownership show some correlation with loss. Since we have already started to focus on these variables, let’s see how much more information we can squeeze out of them.

All three Home Ownership statuses preserve the same general pattern: more inquiries within the last six months means less money lost. However, comparing the statuses tells a more striking story. Owning a home proves to be the least costly to investors, which should make sense. After all, the owner has probably spent over a decade keeping up with mortgage payments, ultimately building a good habit. The owner also has less overhead, since they no longer need to make principal payments on the house.

The surprise comes when comparing borrowers who live in mortgaged houses with borrowers who rent. Both show significantly more money lost than borrowers who own their homes, but it is rather a shock to see that borrowers who rent are less costly than borrowers who mortgage. Does this not contradict the first chart, in which renters defaulted more often than mortgagors? One possible explanation is that renters usually borrow less money, so each default costs less. Another possibility is that mortgagors default earlier in the loan term, leaving more principal outstanding, whereas investors recoup more of their money from renters before they default. The graphic below shows another way to look at borrowers who already own their home.
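Both explanations come down to how much principal is still outstanding when a loan defaults. For a fully amortizing loan with principal P, monthly rate r and n months, the level payment is A = P·r / (1 − (1 + r)^−n), and the balance after k payments is P(1 + r)^k − A((1 + r)^k − 1)/r. A minimal sketch, assuming 36-month loans, a positive interest rate and no recoveries after default; the loan amounts and default months are hypothetical.

```python
def monthly_payment(principal, annual_rate, n_months):
    """Level payment for a fully amortizing loan (standard annuity formula)."""
    if annual_rate <= 0:
        raise ValueError("this sketch assumes a positive interest rate")
    r = annual_rate / 12
    return principal * r / (1 - (1 + r) ** -n_months)


def balance_after(principal, annual_rate, n_months, payments_made):
    """Outstanding principal after a number of on-time monthly payments."""
    a = monthly_payment(principal, annual_rate, n_months)
    r = annual_rate / 12
    growth = (1 + r) ** payments_made
    return principal * growth - a * (growth - 1) / r


# Hypothetical loans: a larger loan that defaults early versus a smaller
# loan that defaults late. Recoveries after default are ignored.
early = balance_after(15000, 0.13, 36, payments_made=6)
late = balance_after(8000, 0.13, 36, payments_made=24)
print(f"early default on $15,000: ${early:,.0f} outstanding")
print(f"late default on $8,000:   ${late:,.0f} outstanding")
```

Under these assumptions, a borrower who takes a larger loan and stops paying early exposes the investor to far more principal than one who borrows less and defaults near the end of the term, which is consistent with either speculation above.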


Modeling and Tuning:

This topic deserves a white paper of its own, so I will refer readers to the one written by Ben McMillan. It explains how to build a traditional credit model to predict P2P loan defaults and introduces some basic “machine learning” concepts.

The tables above show, respectively, how much worse the model performs when each variable is left out, and each variable’s Gini importance (its decrease in “node impurity”). Some extra variables were added to see whether other relationships could be uncovered. This model was built with only 100 trees and has not been properly tuned. Nonetheless, it paints a telling picture. Our initial hypothesis that home ownership would be rather important holds true. We can also see that interest rate is a powerful attribute. When building your own model, interest rate could be included alongside the other variables to simplify the model-creation process.
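The two tables appear to correspond to the two standard random forest importance measures: a permutation measure (shuffle one variable’s values and record how much accuracy drops, which approximates leaving it out) and mean decrease in Gini impurity. The sketch below shows both on a single split and a stand-in model; the `predict` function, the column names and the data are hypothetical. In a real forest the Gini decrease is summed over every split on a variable and averaged across trees.

```python
import random


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of a node's class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(parent, left, right):
    """Weighted drop in Gini impurity from splitting parent into left/right."""
    n = len(parent)
    weighted = (len(left) * gini_impurity(left)
                + len(right) * gini_impurity(right)) / n
    return gini_impurity(parent) - weighted


def permutation_importance(predict, rows, labels, feature, seed=0):
    """Accuracy lost when one feature's values are shuffled across rows."""
    def accuracy(data):
        return sum(predict(r) == y for r, y in zip(data, labels)) / len(labels)

    baseline = accuracy(rows)
    shuffled = [r[feature] for r in rows]
    random.Random(seed).shuffle(shuffled)
    permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
    return baseline - accuracy(permuted)


# Hypothetical split on home ownership: 1 = default, 0 = repaid.
parent = [1, 1, 0, 0, 0, 0]
owners, others = [0, 0, 0], [1, 1, 0]
print(round(impurity_decrease(parent, owners, others), 4))


# Hypothetical stand-in model that only looks at interest rate.
def predict(row):
    return 1 if row["int_rate"] > 15 else 0


rows = [{"int_rate": 20, "home_ownership": "RENT"},
        {"int_rate": 8, "home_ownership": "OWN"},
        {"int_rate": 18, "home_ownership": "MORTGAGE"},
        {"int_rate": 10, "home_ownership": "RENT"}]
labels = [1, 0, 1, 0]
print(permutation_importance(predict, rows, labels, "home_ownership"))
```

Because the stand-in model ignores home ownership, shuffling that column costs no accuracy, so its permutation importance is zero; a variable the model relies on would show a drop.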

The tuning process is rather time-consuming, but well worth it. While random forests were used to create the graphic above, there are a variety of methods for building predictive models. These include xgboost, a gradient-boosting library that, like random forests, builds ensembles of decision trees but adds them sequentially and is harder to tune, as well as the many available neural network architectures.
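One common way to tune is an exhaustive grid search: train and score the model for every combination of candidate hyperparameters and keep the best. A minimal sketch, assuming you supply an `evaluate` function that returns a validation score (higher is better), such as cross-validated accuracy; the parameter names and the toy scoring function are hypothetical placeholders so the example runs on its own.

```python
import itertools


def grid_search(evaluate, grid):
    """Try every combination in `grid`; return (best_params, best_score).

    `evaluate` maps a dict of hyperparameters to a validation score where
    higher is better, e.g. the cross-validated accuracy of a trained model.
    """
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid for a random forest; the names are illustrative only.
grid = {"n_trees": [100, 300, 500], "features_per_split": [3, 5, 8]}


def toy_evaluate(params):
    # Stand-in scoring function so the sketch runs; replace it with real
    # cross-validation of your model.
    tree_penalty = abs(params["n_trees"] - 300) / 1000
    feature_penalty = abs(params["features_per_split"] - 5) / 10
    return -tree_penalty - feature_penalty


best, score = grid_search(toy_evaluate, grid)
print(best, score)
```

The number of combinations multiplies with every hyperparameter added, and each one means training a model, which is much of why tuning takes so long; random search over the same ranges is a common cheaper alternative.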



The historical data provided by lending institutions is a very useful (and free) asset. As more data becomes available across multiple investment regimes, the uncertainty of investing in lending marketplaces decreases. This exercise provided a glimpse of how to use data to augment our intuition, and showed how examining the same variable alongside additional features can change the narrative. While platforms such as LendingCalc have automated model selection and investment in these loans, a quick skim of the current literature shows new methods and packages being released constantly. Much attention has focused on deep learning, but a fair amount of innovation and refinement continues in traditional machine learning tools such as bagging and boosting. We will explore a few of these models in future posts, and we invite readers to do the same and engage with us.


Tomlinson, Neil. “A Temporary Phenomenon? Marketplace Lending.” Deloitte, Deloitte, 2016, www2.deloitte.com/content/dam/Deloitte/uk/Documents/financial-services/deloitte-uk-fs-marketplace-lending.pdf.

Wack, Kevin. “Marketplace Lending Grew by 700% in Four Years: Report.” American Banker, American Banker, 8 Apr. 2016, www.americanbanker.com/news/marketplace-lending-grew-by-700-in-four-years-report.