Predictive Analytics on Synthetic Data Case Study

Over 80% usability achieved while maintaining anonymity

97% effectiveness achieved by the predictive analytics model

3 months saved in evaluating data privacy risks

Case Study Summary

The best use of your data can only be achieved with excellent data management and protection. Insurance customer data is sensitive and cannot be freely shared between departments or external partners, slowing down data analysis efforts.

Through testing Anonos Data Embassy software, the data science team at Provinzial, the second largest public insurance group in Germany, aimed to revamp the way they put their customer data to work.

Because processing sensitive insurance customer data is challenging, and because a competitive market demands faster work with data, Provinzial sought out advanced data anonymization solutions.

Provinzial used synthetic insurance data for 'next best offer', a form of predictive analytics in insurance, to identify the needs of over a million customers.

Through this project, the team:

Streamlined the data usage approval process;
Achieved over 80% usability of synthetic data while maintaining data anonymity;
Trained a machine learning model on synthetic insurance data that reached 97% of the performance of the same model trained on original data;
Reduced time-to-data by 4 weeks without having to adjust the internal data sharing workflow;
Saved up to 3 months in evaluating data privacy risks.

"Anonos data protection software helped us conduct predictive analytics and test our hypotheses while keeping customer data secure. We have found it to be a useful solution for our data science team to simplify data access and focus on our data projects, machine learning model optimizations, and testing new ideas."

Dr. Sören Erdweg, Artificial Intelligence & Data Development at Provinzial

Challenge

Data scientists are often unable to foresee all possible insurance data applications at the outset of a project, and both internal and external privacy restrictions strictly limit data use. A back-and-forth process of determining all possible uses of the data, and evaluating the potential risk of leakage, may require several weeks or even months.

Provinzial’s data scientists used synthetic data to gain quick access to a large insurance data pool to experiment with and develop a use case for insurance predictive analytics.

Customer satisfaction

Provinzial wanted to make the most of its data operations to boost growth and increase customer satisfaction.

ML model development & deployment

To train insurance machine learning models, gather insights, and develop predictive analytics use cases, Provinzial's data team needed quick access to a large data pool.

Data privacy & regulatory compliance

Provinzial's data team needed a solution that would adhere to internal data privacy policies as well as regulations such as the GDPR.

Solution

Data Utility

Simple masking solutions or k-anonymity can increase privacy but at the expense of utility. Because Provinzial's insurance customer data was highly detailed and extremely sensitive, they needed an anonymization solution that would not adversely impact the usefulness of this data.
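
To make the privacy versus utility trade-off concrete, here is a minimal sketch of how k-anonymity can be measured and raised by generalization. The records, the column names, and the `generalize_age` rule are invented for illustration; they are not Provinzial data and not part of any Anonos tool.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share
    the same values for all quasi-identifier columns."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

def generalize_age(record, bucket=10):
    """Coarsen an exact age into a range, e.g. 34 -> '30-39' (hypothetical rule)."""
    low = (record["age"] // bucket) * bucket
    return {**record, "age": f"{low}-{low + bucket - 1}"}

# Made-up example records
records = [
    {"age": 34, "postcode": "48143", "product": "household"},
    {"age": 36, "postcode": "48143", "product": "liability"},
    {"age": 52, "postcode": "48143", "product": "car"},
    {"age": 57, "postcode": "48143", "product": "household"},
]
quasi_identifiers = ["age", "postcode"]

k_raw = k_anonymity(records, quasi_identifiers)              # every record is unique
generalized = [generalize_age(r) for r in records]
k_generalized = k_anonymity(generalized, quasi_identifiers)  # records now fall into age ranges
```

In this toy example, generalization raises k from 1 to 2, but the exact ages are lost. A model that depends on fine-grained age would pay for that privacy gain with reduced utility.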

Synthetic data was a great fit because it maintained the statistical value of the original data and thus retained its utility. Anonos’ privacy assessment tool, Anonymeter, wraps multiple evaluators and provides a high-level view of the utility of the synthetic dataset without disclosing any of its statistical properties.

Data Privacy

Provinzial's data team was seeking a highly privacy-preserving solution that would meet GDPR requirements and the company's internal privacy regulations, so that it could obtain approval for the use of sensitive customer data.

Synthetic data ensured a high level of privacy. The process of synthetic data generation completely breaks one-to-one relationships between original and synthetic records, minimizing the chance of re-identification. The Anonos solution added further layers of privacy, such as differential privacy, to the synthesization mechanisms.
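
For readers unfamiliar with differential privacy, the sketch below shows the textbook Laplace mechanism applied to a simple counting query. It is a generic illustration only: the case study does not describe how differential privacy is applied inside the Anonos synthesization mechanisms, and the function names, the sample ages, and the choice of epsilon are assumptions made for this example.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two
    independent exponential variables with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(values, predicate, epsilon, rng):
    """Return a noisy count of the values matching `predicate`.

    A counting query has sensitivity 1 (adding or removing one record
    changes the count by at most 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Made-up customer ages; three of them are over 40
ages = [23, 35, 41, 52, 67, 29, 38]
rng = random.Random(42)
noisy_over_40 = private_count(ages, lambda a: a > 40, epsilon=1.0, rng=rng)
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon keeps the answer closer to the true count. The noise averages out over many queries, which is why practical systems also track a total privacy budget.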

Internal workflow

For the Provinzial data science team, it was essential to reduce time-to-data without having to change the internal system. The solution also had to fit into the existing data workflow without disruption.

The team established a data architecture using anonymized synthetic data and could perform specific tests without needing original data, which accelerated time-to-data by 4 weeks.

Results

Provinzial took their existing “next best offer” model, trained it on synthetic data, and compared the result with the same model trained on real data. Next best offer is a form of personalized marketing: the model predicts consumers' needs and shows them offers and products based on their habits. The Provinzial team performed a three-fold evaluation, focusing on data usability, model usage, and privacy regulations.

Automating privacy evaluations saves 3 months

The synthetic dataset showed a high level of privacy without any re-identification concerns. Although there were many variables in the dataset, the large volume acted as an additional shield, minimizing the risk of re-identification.

Data usability & granularity preserved

By comparing the original and synthetic datasets, Provinzial found that over 80% of the synthetic data was usable. Using Anonos utility evaluations, the team was able to assess the usefulness of the synthetic data and adjust it as needed, saving a month's worth of manual effort.
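
The case study does not say how usability was measured. As one common way to compare an original and a synthetic dataset, the sketch below scores each categorical column by how closely its value distribution matches, using total variation distance. The metric choice, the function names, and the sample data are all assumptions for illustration, not the Anonos utility evaluation.

```python
from collections import Counter

def total_variation_distance(original_values, synthetic_values):
    """Half the sum of absolute differences between the two
    empirical category distributions (0 = identical, 1 = disjoint)."""
    p = Counter(original_values)
    q = Counter(synthetic_values)
    n_p, n_q = len(original_values), len(synthetic_values)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p[c] / n_p - q[c] / n_q) for c in categories)

def column_similarity(original, synthetic):
    """Map each column name to 1 - TVD, so that 1.0 means the
    marginal distributions match exactly."""
    return {
        col: 1.0 - total_variation_distance(original[col], synthetic[col])
        for col in original
    }

# Made-up columns: dict of column name -> list of values
original = {
    "product": ["car", "car", "household", "liability"],
    "region": ["north", "north", "south", "south"],
}
synthetic = {
    "product": ["car", "household", "household", "liability"],
    "region": ["north", "north", "south", "south"],
}

similarity = column_similarity(original, synthetic)
```

Here the `region` column matches exactly, while `product` scores lower because the synthetic data over-represents one category. Per-column scores like these only cover marginal distributions; correlations between columns need a separate check.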

Optimal effectiveness of model performance

Provinzial's second evaluation phase focused on model usage: training on synthetic data versus training on real data. The model trained on synthetic data reached 97% of the performance of the model trained on original data.
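
A relative performance figure of this kind is typically obtained by training the same model twice, once on real and once on synthetic data, and scoring both on the same held-out real data. The sketch below shows that comparison with a deliberately tiny lookup "model". The model, the accuracy metric, and the data are stand-ins chosen for this example; the case study does not specify which model or metric Provinzial used.

```python
from collections import Counter, defaultdict

def train_lookup_model(rows):
    """Learn the most frequent label for each feature value.
    `rows` is a list of (feature, label) pairs."""
    counts = defaultdict(Counter)
    for feature, label in rows:
        counts[feature][label] += 1
    fallback = Counter(label for _, label in rows).most_common(1)[0][0]
    table = {f: c.most_common(1)[0][0] for f, c in counts.items()}
    return lambda feature: table.get(feature, fallback)

def accuracy(model, rows):
    """Share of rows whose label the model predicts correctly."""
    return sum(model(f) == y for f, y in rows) / len(rows)

def relative_performance(real_train, synthetic_train, real_test):
    """Score of the synthetic-trained model divided by the score of the
    real-trained model, both evaluated on the same real test set."""
    score_real = accuracy(train_lookup_model(real_train), real_test)
    score_synth = accuracy(train_lookup_model(synthetic_train), real_test)
    return score_synth / score_real

# Made-up (customer segment, next best offer) pairs
real_train = [
    ("young", "liability"), ("young", "liability"), ("young", "household"),
    ("family", "household"), ("family", "household"),
    ("senior", "legal"), ("senior", "legal"),
]
synthetic_train = [
    ("young", "liability"), ("young", "liability"),
    ("family", "household"), ("family", "household"),
    ("senior", "household"), ("senior", "household"), ("senior", "legal"),
]
real_test = [
    ("young", "liability"), ("family", "household"),
    ("senior", "legal"), ("senior", "legal"), ("young", "liability"),
]

ratio = relative_performance(real_train, synthetic_train, real_test)
```

In this toy data the synthetic set distorts the "senior" segment, so the synthetic-trained model gets 3 of 5 test rows right against 5 of 5 for the real-trained model, a ratio of 0.6. A ratio close to 1, such as the 97% reported here, indicates that the synthetic data preserved the patterns the model relies on.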

Usability beyond the initial use case

Synthetic data proved useful not just for the use case that Provinzial tested, but also for other applications, slightly different predictive analytics models, and further use cases with minimal adaptations.


Provinzial Successfully Conducts Predictive Analytics on Synthetic Insurance Data

Identifying the needs of over a million customers while preserving the highest standard of privacy