The United States has one of the world’s largest automobile markets, second only to China. With 270.4 million vehicles registered on American roads as of 2017, there are millions of crashes every year. According to the National Highway Traffic Safety Administration (NHTSA), there were an estimated 7 million police-reported motor vehicle crashes in the US in 2016. These crashes led to about 207 million dollars in collision losses that year.
Being able to predict the likelihood of a driver filing a claim in the coming months allows the insurer to adjust premiums and plan provisions ahead of time.
Applying predictive analytics to insurance claims is nothing new. However, we are witnessing a transition from classical models based on static, general data (driver age, age of the driver's license, car type, etc.) to models based on actual driving behavior (sudden braking and other indicators of unusual driving).
This transition is mainly driven by the emergence of big data frameworks and their ability to manipulate and analyze larger and less structured data sets. This has led some companies to start collecting data on driving patterns from devices that insurers install in policyholders’ cars.
While a number of devices exist to monitor and log data provided by the car, there are large disparities in the quality of the data elements available between older and newer models. Therefore, in a given insurer’s portfolio, a large portion of the collected data may carry far less information than the rest, and the data set as a whole is very inhomogeneous. Nevertheless, good data mining techniques and feature engineering make this data exploitable.
This is the challenge GDS Link faced recently when analyzing data from a few insurance companies where different logging devices had been used over a very diverse fleet. The only data fields consistent across all the cars were timestamp, coordinates, speed and distance traveled (odometer data). The difficulty of such an analysis lies in the limited number of features available and in the varying intervals at which they are collected (from every 30 seconds to every few minutes).
Despite this challenge, GDS Link was able to transform these fields into a set of attributes defining the profile of the driver. These include acceleration, routine trips, similar routes, the times at which the vehicle is used, number of trips, driver type (weekend, casual, commuter), route patterns, usual distance traveled and frequency, speeding patterns, etc. These attributes help the insurance company understand the risk level associated with a particular driver.
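To make the idea concrete, here is a minimal sketch of how a few such attributes could be derived from the raw fields. It is an illustration only, not GDS Link's actual pipeline: the function names, the 10-minute gap used to separate trips, the -0.5 m/s^2 harsh-braking threshold and the sample records are all assumptions made for this example. Acceleration is computed from the real time difference between two records, so the irregular sampling interval is accounted for.

```python
from datetime import datetime, timedelta

# Each record is (timestamp, speed_kmh, odometer_km); coordinates are left out
# of this sketch for brevity.

def split_trips(pings, max_gap=timedelta(minutes=10)):
    """Group time-ordered records into trips; a gap longer than max_gap
    (an assumed value) starts a new trip."""
    trips = []
    for ping in pings:
        if trips and ping[0] - trips[-1][-1][0] <= max_gap:
            trips[-1].append(ping)
        else:
            trips.append([ping])
    return trips

def accelerations(trip):
    """Average acceleration in m/s^2 between consecutive records, divided by
    the actual elapsed time so uneven sampling intervals are handled."""
    result = []
    for (t0, v0, _), (t1, v1, _) in zip(trip, trip[1:]):
        dt = (t1 - t0).total_seconds()
        if dt > 0:
            result.append((v1 - v0) / 3.6 / dt)  # km/h -> m/s, then per second
    return result

def driver_profile(pings, harsh_brake=-0.5):
    """Summarize a driver's records; harsh_brake is an assumed threshold."""
    trips = split_trips(sorted(pings))
    if not trips:
        return {}
    accs = [a for trip in trips for a in accelerations(trip)]
    return {
        "n_trips": len(trips),
        # weekday() >= 5 means the trip started on a Saturday or Sunday
        "weekend_trip_share": sum(t[0][0].weekday() >= 5 for t in trips) / len(trips),
        "mean_trip_km": sum(t[-1][2] - t[0][2] for t in trips) / len(trips),
        "harsh_brake_events": sum(a <= harsh_brake for a in accs),
    }

# Made-up sample records: one Saturday trip and one Monday trip.
sat = datetime(2018, 3, 3, 8, 0)
mon = datetime(2018, 3, 5, 7, 30)
example_pings = [
    (sat, 0.0, 1000.0),
    (sat + timedelta(seconds=30), 40.0, 1000.2),
    (sat + timedelta(seconds=60), 50.0, 1000.6),
    (sat + timedelta(seconds=180), 0.0, 1002.0),
    (mon, 60.0, 1002.0),
    (mon + timedelta(seconds=30), 0.0, 1002.3),
]
profile = driver_profile(example_pings)
```

With records spaced 30 seconds or more apart, the computed acceleration is an average over the interval, so short, sharp braking events are smoothed out. Any threshold would need to be tuned to the sampling rate of the device.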
By combining this data with publicly available data sources, we can dive deeper into the analysis of the driver’s behavior and the conditions under which they operated their vehicles. Including contextual data such as sunrise and sunset times, the type of road, and the amount of rain that fell on a specific route on a specific date brings another risk dimension to an already rich data set.
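As a rough illustration of this kind of enrichment, the sketch below tags a single record with a darkness flag and a rainfall figure. The lookup tables are hypothetical stand-ins for public data sources, keyed by a coarse coordinate grid cell and a date. Their values are placeholders invented for the example, and the names SUN_TIMES, RAIN_MM and enrich are assumptions, not part of any real API.

```python
from datetime import date, datetime, time

# Hypothetical lookup tables; the values are placeholders, not real observations.
# Key: ((lat rounded to 0.1 deg, lon rounded to 0.1 deg), calendar date)
SUN_TIMES = {((40.7, -74.0), date(2018, 3, 5)): (time(6, 25), time(17, 55))}
RAIN_MM = {((40.7, -74.0), date(2018, 3, 5)): 4.2}

def enrich(ping_time, lat, lon):
    """Attach context to one record: whether it was dark, and the day's rainfall.
    Returns None for a field when no reference data covers that cell and date."""
    key = ((round(lat, 1), round(lon, 1)), ping_time.date())
    sun = SUN_TIMES.get(key)
    if sun is None:
        is_dark = None
    else:
        sunrise, sunset = sun
        is_dark = not (sunrise <= ping_time.time() < sunset)
    return {"is_dark": is_dark, "rain_mm": RAIN_MM.get(key)}

morning = enrich(datetime(2018, 3, 5, 7, 30), 40.71, -74.01)
evening = enrich(datetime(2018, 3, 5, 19, 0), 40.71, -74.01)
unknown = enrich(datetime(2018, 3, 6, 7, 30), 40.71, -74.01)
```

Rounding coordinates to a grid cell is one simple way to match a record to area-level reference data. A real join would depend on the resolution of the source used.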
Starting from a large raw data set of 123 million records containing only four attributes each, the GDS Link Advanced Analytics team constructed rich driver profiles, with more than 1,200 attributes describing each driver’s behavior patterns and associated risk. The number and granularity of these attributes allow the probability of an insurance claim to be modelled with a high degree of confidence.
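One common way a handful of raw fields can grow into a large attribute set is by crossing trip-level metrics with several aggregation functions and several look-back windows. The sketch below illustrates that general pattern. It is not a description of how the 1,200 attributes above were actually built, and the metric names, windows and sample trips are invented for the example.

```python
from statistics import mean

def expand_attributes(trips, metrics, windows_days, aggregations, as_of_day):
    """Build one attribute per (metric, aggregation, window) combination.
    trips is a list of dicts with an integer 'day' index plus metric values."""
    attrs = {}
    for window in windows_days:
        # Keep trips that fall inside the look-back window ending at as_of_day.
        recent = [t for t in trips if 0 <= as_of_day - t["day"] < window]
        for metric in metrics:
            values = [t[metric] for t in recent]
            for name, fn in aggregations.items():
                attrs[f"{metric}_{name}_{window}d"] = fn(values) if values else None
    return attrs

# Made-up trip-level metrics for one driver.
example_trips = [
    {"day": 1, "km": 10.0, "max_speed_kmh": 90.0},
    {"day": 20, "km": 30.0, "max_speed_kmh": 120.0},
    {"day": 29, "km": 20.0, "max_speed_kmh": 100.0},
]
attributes = expand_attributes(
    example_trips,
    metrics=["km", "max_speed_kmh"],
    windows_days=[7, 30],
    aggregations={"mean": mean, "max": max, "min": min},
    as_of_day=30,
)
```

Here 2 metrics, 3 aggregations and 2 windows give 12 attributes. The count grows multiplicatively as more metrics, windows or aggregations are added.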
The key to successfully deriving predictive analytics from real-life scenarios is often more about the data than the sophistication of the algorithm. It is about being creative and being able to extract innovative attributes that reveal the complexity of the data. Only then will modelling algorithms be able to exploit the full potential of the data and deliver the best predictions.
About the author
Florian Lyonnet is Chief Data Scientist at GDS Link. Florian is responsible for heading the analytics activities of the group, which are centered around credit risk modelling and model monitoring solutions. The GDS Link Advanced Analytics team has strong expertise in applying standard scoring methods as well as machine learning algorithms to real business problems and delivering impactful models. Florian holds a Ph.D. in Theoretical Particle Physics from Grenoble Alpes University and has spent the last seven years bringing solutions to analytics problems in both academia and the credit industry.