Challenge
Data scientists are often unable to foresee all possible insurance data applications at the outset of a project, and both internal and external privacy restrictions strictly limit data use. A back-and-forth process of determining all possible uses of the data, and evaluating the potential risk of leakage, may require several weeks or even months.
Provinzial’s data scientists used synthetic data to gain quick access to a large insurance data pool to experiment with and develop a use case for insurance predictive analytics.
Customer satisfaction
Provinzial wanted to make the most of its data operations to boost growth and increase customer satisfaction.
ML model development & deployment
To train insurance machine learning models, gather insights, and develop predictive analytics use cases, Provinzial's data team needed quick access to a large data pool.
Data privacy & regulatory compliance
Provinzial's data team needed a solution that would adhere to internal data privacy policies as well as regulations such as the GDPR.
Solution
Data Utility
Simple masking solutions or k-anonymity can increase privacy but at the expense of utility. Because Provinzial's insurance customer data was highly detailed and extremely sensitive, they needed an anonymization solution that would not adversely impact the usefulness of this data.
Synthetic data was a great fit because it maintained the statistical value of the original data and thus retained its utility. Anonos’ privacy assessment tool, Anonymeter, wraps multiple evaluators and provides a high-level view of the utility of the synthetic dataset without disclosing any of the statistical properties.
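To make the privacy versus utility trade-off of k-anonymity concrete, here is a minimal sketch in Python. The records, the `generalize` rule, and all names are hypothetical illustrations, not Provinzial data or any Anonos API. It shows that coarsening quasi-identifiers raises k but collapses the detail an analyst could otherwise use.

```python
from collections import Counter

# Hypothetical toy records of (age, postal_code); not real customer data.
records = [
    (34, "48143"), (36, "48145"), (35, "48147"),
    (52, "40210"), (58, "40212"), (55, "40215"),
]

def k_anonymity(rows):
    """Return k: the size of the smallest group of identical rows."""
    return min(Counter(rows).values())

def generalize(row):
    """Coarsen age to a decade band and keep only two postal digits."""
    age, postal = row
    decade = age // 10 * 10
    return (f"{decade}-{decade + 9}", postal[:2] + "xxx")

generalized = [generalize(r) for r in records]

k_before = k_anonymity(records)          # every record is unique
k_after = k_anonymity(generalized)       # records now hide in groups
distinct_before = len(set(records))      # granularity before generalization
distinct_after = len(set(generalized))   # granularity after generalization

print(f"k: {k_before} -> {k_after}")
print(f"distinct value combinations: {distinct_before} -> {distinct_after}")
```

In this toy case the generalized table is safer against singling out, yet exact ages and postal codes are no longer available for modeling, which is the loss of utility described above.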
Data Privacy
Provinzial's data team was seeking a highly privacy-preserving solution that would meet the GDPR requirements and the company's internal privacy regulations, so that it could obtain approval for the use of sensitive customer data.
Synthetic data ensured a high level of privacy. The process of synthetic data generation completely breaks 1-1 relationships between original and synthetic records, minimizing the chance of re-identification. The Anonos solution added further layers of privacy, such as differential privacy, to the synthesis mechanisms.
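For readers unfamiliar with differential privacy, the sketch below illustrates the general idea with the textbook Laplace mechanism applied to a simple counting query. It is not a description of how the Anonos solution applies differential privacy during synthesis, which the case study does not specify; the data, function names, and the choice of `epsilon` are assumptions made for illustration.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(values, predicate, epsilon, rng):
    """Return a noisy count satisfying epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)                    # seeded for repeatability
ages = [23, 35, 41, 52, 67, 29, 38]       # hypothetical values
noisy = private_count(ages, lambda a: a >= 40, epsilon=1.0, rng=rng)
print(f"noisy count of ages >= 40: {noisy:.2f}")
```

A smaller `epsilon` adds more noise and gives stronger privacy; a larger one gives more accurate answers. The noise ensures that the presence or absence of any single individual changes the output distribution only slightly.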
Internal workflow
For the Provinzial data science team, it was essential to reduce time-to-data without having to change the internal system. The solution also had to fit into the existing data workflow without disruption.
The team established a data architecture using anonymized synthetic data and could perform specific tests without needing original data, which accelerated time-to-data by 4 weeks.
Results
Provinzial trained its existing “next best offer” model on synthetic data and compared the result with the same model trained on real data. A next best offer model is a form of personalized marketing: it predicts consumers' needs and shows them offers and products based on their habits. The Provinzial team performed a three-fold evaluation, focusing on data usability, model usage, and privacy regulations.
Automating privacy evaluations saves 3 months
The synthetic dataset showed a high level of privacy without any re-identification concerns. Although there were many variables in the dataset, the large volume acted as an additional shield, minimizing the risk of re-identification.
Data usability & granularity preserved
By comparing the original and synthetic datasets, Provinzial found that over 80% of the synthetic data was usable. Using Anonos' utility evaluations, the team was able to assess the usefulness of the synthetic data and adjust it as needed, saving a month's worth of manual effort.
Optimal effectiveness of model performance
Provinzial's second evaluation phase focused on model usage, comparing training on synthetic data with training on real data. The model trained on synthetic data reached 97% of the performance of the model trained on original data.
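A relative-performance figure of this kind is typically obtained by scoring both models on the same held-out real data and dividing one score by the other. The sketch below shows that calculation under stated assumptions: the metric (accuracy), the labels, and the predictions are all hypothetical, since the case study does not say which metric or data Provinzial used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction lists must have equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical held-out labels from real data (1 = customer accepted offer).
real_holdout_labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

# Hypothetical predictions from two copies of the same model architecture.
preds_real_trained = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]    # trained on real data
preds_synth_trained = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]   # trained on synthetic data

acc_real = accuracy(real_holdout_labels, preds_real_trained)
acc_synth = accuracy(real_holdout_labels, preds_synth_trained)

# Relative performance: synthetic-trained score as a share of the baseline.
relative = acc_synth / acc_real
print(f"real-trained accuracy:      {acc_real:.2f}")
print(f"synthetic-trained accuracy: {acc_synth:.2f}")
print(f"relative performance:       {relative:.1%}")
```

Evaluating both models on real held-out data, rather than on synthetic data, is what makes the ratio meaningful: it measures how well a model built without access to original records would perform in production.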
Usability beyond the initial use case
Synthetic data has proven useful not just for the use case that Provinzial tested it for, but also for other applications, slightly different predictive analytics models, and use cases with minimal adaptations.