Warsaw Presentation
Digital Privacy Variant Twins
Unlock Maximum Data Utility
Lawful Repurposing and Sharing of Data

Presentation Transcript
Gary LaFever:
[00:11] Thank you very much. I appreciate this opportunity. My Polish is very bad. So, I will speak in English. Thank you for your indulgence. I'm very proud and happy to be here at the request of our partner, Hitachi.
Nasya Bennacer:
[00:27] Dzień dobry! Bonjour! [Good day!] My name is Nasya Bennacer. I lead the financial services business unit for Southern Europe and Poland at Hitachi Vantara and Hitachi Company. I'm here to present the Digital Privacy solution to you.
Gary LaFever:
[00:46] And my name is Gary LaFever. I'm the CEO of Anonos, working together with our close partner, Hitachi. If you would like a copy of this document, which I hope you will by the end of the presentation, this is the official report from the EU Agency for Network and Information Security (ENISA) on the benefits of Pseudonymisation, the technology that Hitachi and Anonos are working on together. By the end of this presentation, I hope you will want to connect with me on LinkedIn, and I will send you a copy of this official document. Thank you very much.

[01:18] What are we here to talk about? You've heard a lot about artificial intelligence (AI), analytics, and machine learning. But what if, despite all the time, energy, resources, and money that you put into that data, you cannot use it lawfully? We are here to talk about the GDPR (General Data Protection Regulation), but in a way that you haven't heard it discussed before. The GDPR forces you to do things differently when you want to repurpose data.

[02:00] The people in this room are not here to post credits and debits or to take checks. You're here to take the information that comes from those transactions and turn it into value through analytics, AI, and machine learning. And that requires that you do things differently than you did before. So, this is a story with a great ending, and a use case with Raiffeisen from Austria. What have you done to collect your data? What have you done to collate it? What have you done to combine it? Now comes the exciting part: you have to make use of the data.

[02:40] Before the GDPR, life was like the fish on the left-hand side. If you had data, you could process it. You could do almost anything you wanted with the data. Under the GDPR, the rules have changed. If you don't do things differently, with new technical and organizational controls, you're like the fish in the middle. What's wrong with the fish in the middle? They can't get out of their little bags, and there's only so much oxygen in those bags. After a while, they will die. But look at the third column. If you do things differently, the way that the GDPR requires and rewards, you can do everything you could do before the GDPR, a little differently, and even more. So, if you're looking to share data, combine data, analyze data, and run AI and machine learning on that data, this is where you want to be.

[03:42] This is what surprises people. These are recent actions by Data Protection Authorities (DPAs), and they're shocking. The Hellenic DPA said that PwC, a major company, cannot process data about its own employees on the basis of consent. Even more shocking, on the bottom left, the Dutch DPA has told banks that they can't use their own customer data to market to their customers.
Nasya Bennacer:
[04:17] So, Gary, hold on. That means that a bank cannot use the data they have collected about their own customers to do marketing?
Gary LaFever:
[04:27] Not if they're relying on consent. The rules for consent and contract that we all relied on in the past have changed. The new way to continue that processing is through technical controls, under what is called legitimate interest processing, and that's what Digital Privacy enables by leveraging GDPR Pseudonymisation.

[04:52] So, without GDPR Pseudonymisation, you are like a rat in a wheel. You can see the data. But no matter how fast you run, you can't get to the data. Again, consent, contract, and anonymisation do not work the way they used to work. And so, if you try to process data the way you did in the past, this is you in the wheel. You will never reach your data. That data value is seeping out. But you can solve this problem.

[05:32] This is what we hear from our customers. Banking is all about risk and risk management. If you rely on consent, contract, or anonymisation, there's an unacceptable risk that data you possess, and can technologically process, will be unlawful to process on those legal bases. And legitimate interest is not established merely by saying “I have a legitimate interest”; you need controls that protect the data while it is in use. This is not encryption. This is new technology that controls the data while in use. Pseudonymisation is the magic word. It is hard to say and hard to pronounce, but it's worth learning.

[06:25] What do you get when you can pseudonymise data? Digital Privacy from Hitachi, which uses Anonos technology, creates what we call variant twins. We'll describe what a variant twin is shortly, but a variant twin is a sustainable, state-of-the-art data asset that you as a bank or financial institution own, and it gives you the best of all worlds: the full utility of source data, the protection of anonymised data, and, what really matters, the controlled relinkability of pseudonymised data.

[06:57] If a data subject is an individual, a digital twin is simply a digital representation of that individual: ones and zeros, data about the customer. Variant twins are different: they are versions of the data that deliver only the minimum data and information necessary for an intended process. That ability to granularly control the data that is provided is what enables ongoing AI, machine learning, and analytics, and in fact it gives you greater opportunities to share and combine data.

[07:35] A simple example. A digital twin would give you all the information you have about this individual - where they live, age, how much they make, the different loans that they may have or credit facilities. But two different uses of that data may give you a subset of that data. So, a variant twin is a subset of the digital twin that is lawful to use because of detailed controls while the data is in process.
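A variant twin, as described here, can be pictured as a purpose-specific projection of the full customer record. The minimal Python sketch below assumes one dictionary per customer and a hypothetical `PURPOSE_FIELDS` policy table with invented data; it is not the Digital Privacy product's API, and it illustrates only the field selection, not the in-use controls discussed in the talk.

```python
# Hypothetical digital twin of one customer (invented, illustrative data).
DIGITAL_TWIN = {
    "name": "Jan Kowalski",
    "city": "Warsaw",
    "age": 42,
    "income": 85_000,
    "loans": ["mortgage", "car"],
}

# Hypothetical policy: each purpose is allowed only the minimum fields it needs.
PURPOSE_FIELDS = {
    "credit_risk_model": {"age", "income", "loans"},
    "regional_marketing": {"city", "age"},
}


def variant_twin(twin: dict, purpose: str) -> dict:
    """Return the subset of the digital twin permitted for one purpose.

    An unknown purpose raises KeyError, so the function fails closed
    instead of returning the full record.
    """
    allowed = PURPOSE_FIELDS[purpose]
    return {field: value for field, value in twin.items() if field in allowed}


print(variant_twin(DIGITAL_TWIN, "regional_marketing"))  # {'city': 'Warsaw', 'age': 42}
print(variant_twin(DIGITAL_TWIN, "credit_risk_model"))
```

Two purposes, two different variant twins of the same person; neither contains the name.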
Nasya Bennacer:
[08:06] So, Gary, just a question. If you use a variant twin, for example, is there any way to re-identify the person that is represented by this variant twin?
Gary LaFever:
[08:18] I learned a long time ago from my friends in security never to say never. But mathematically and statistically, it is very, very difficult. And, believe it or not, the GDPR does not deal in absolutes. It's a risk-based statute. If you can show that you have reasonable technical and organizational safeguards in place to reduce the risk to the data subject, your legitimate interest in the business processing prevails, because you have dramatically reduced any risks to the data subjects.

[08:49] This is a rainbow. We all like rainbows, right? What this rainbow shows is that I can take data, up on the top left, that is subject to all kinds of restrictions. The GDPR is just one of them. If you're trying to move data across countries, you have data sovereignty restrictions. You can have banking restrictions. You can have contractual restrictions. So, you have all those restrictions up top, and then you have all the different sources of data. By using variant twins and Pseudonymisation, you can create just the data that you need for each use. And by doing that, and by having those controls, you can actually make greater use of the data today.

[09:30] So, your IT department is going to tell you: “Well, we already have all that. We have everything we need already.” The problem is that the technologies banks typically have, shown in that egg, are things like encryption. Those were invented decades ago, and they do what they do well, but they're intended to protect data. They were never designed or architected to let you protect the data as you use it on a distributed basis. As you share and combine data and put it into the cloud, they break down. But Pseudonymisation, as newly defined under the GDPR, takes you up to where you're both protecting the data and maximizing data utility. And that's where you want to go on this journey.

[10:23] So, what is Pseudonymisation? Pseudonymisation existed before in different member state laws, but never at the EU level. The GDPR defines Pseudonymisation at the EU level for the first time, and the definition basically requires that you can show, using technical controls, that you have separated identity from information value. If I don't need to know somebody's identity, you provide only the minimum information value necessary to satisfy the process. It's okay to be able to go back and forth over that wall, but doing so requires access to the secret data. The GDPR calls it “additional information”; it's kept inside the courtyard, and you have to control who gets there.
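The separation of identity from information value can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: a record is a dictionary, only one direct identifier is replaced, and the “additional information” is an in-memory dictionary standing in for a separately secured, access-controlled store. The function names `pseudonymise` and `relink` are hypothetical.

```python
import secrets


def pseudonymise(records: list, id_field: str):
    """Replace a direct identifier with a random token.

    Returns (pseudonymised_records, additional_information). The second value
    is the token -> identity lookup, which must be stored separately and
    access-controlled; without it the tokens cannot be mapped back.
    """
    additional_information = {}
    out = []
    for rec in records:
        token = secrets.token_hex(8)  # random, so not derivable from the identity
        additional_information[token] = rec[id_field]
        pseudo = {k: v for k, v in rec.items() if k != id_field}
        pseudo["token"] = token
        out.append(pseudo)
    return out, additional_information


def relink(pseudo_record: dict, additional_information: dict) -> str:
    """Authorised re-identification: only possible with the separate lookup."""
    return additional_information[pseudo_record["token"]]


customers = [
    {"name": "Anna Nowak", "balance": 1200},
    {"name": "Piotr Zielinski", "balance": 310},
]
data, vault = pseudonymise(customers, "name")
print(data[0]["balance"], "name" in data[0])  # 1200 False
print(relink(data[0], vault))                 # Anna Nowak
```

The analytics side works only with `data`; whoever holds `vault` controls who can cross the wall back to identity.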

[11:06] If you can control that, and show that you have auditable procedures and technical controls, you can continue to make use of the data, and even make greater use of it. These benefits of Pseudonymisation are expressed in the statute itself. We're not talking about workarounds or loopholes or trying to get around something. The law identifies why Pseudonymisation is rewarded: you get these rewards because you took the time and energy to protect the rights of the data subjects.

[11:38] We've talked about privacy enhancing techniques. Your IT department already has some of those, but by themselves they are not enough. So, you add GDPR Pseudonymisation. Digital Privacy then adds two more elements. The first, which we'll touch on very quickly, is controlled relinkable dynamic de-identifiers. It's a lot of words, but it means controlling when linking occurs and when it doesn't. The second is re-identification risk management. When you deliver all of those, you get all the benefits of variant twins.

[12:08] This difficult phrase - controlled relinkable dynamic de-identifiers - what does it mean? Why do you need it? Typically, your identity is replaced with a token, but it's replaced with the same token in different datasets. So when you combine those datasets, you can re-identify an individual. In the US, there are hundreds of millions of people. Yet if I know a person's birthdate, gender, and zip code, and their name has simply been replaced with the same token everywhere, I can identify 87% of US citizens by name.
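To make this linkage risk concrete, here is a toy Python illustration with invented records. It assumes two internal datasets that share a static token and a hypothetical public list (such as a voter roll) carrying names plus the same indirect identifiers; the 87% figure quoted in the talk comes from real population data, which this example does not reproduce.

```python
# Invented records: the name was replaced by the SAME static token in both datasets.
marketing = [
    {"token": "T1", "birthdate": "1980-05-17", "gender": "F", "zip": "02139"},
    {"token": "T2", "birthdate": "1975-11-02", "gender": "M", "zip": "10001"},
]
loans = [
    {"token": "T1", "loan": "mortgage", "default": False},
    {"token": "T2", "loan": "car", "default": True},
]
# A hypothetical public list with names and the same indirect identifiers.
public = [
    {"name": "Alice Smith", "birthdate": "1980-05-17", "gender": "F", "zip": "02139"},
    {"name": "Bob Jones", "birthdate": "1975-11-02", "gender": "M", "zip": "10001"},
]


def indirect_ids(rec: dict) -> tuple:
    """Birthdate, gender and zip code: the indirect identifiers from the talk."""
    return (rec["birthdate"], rec["gender"], rec["zip"])


# Step 1: the shared static token lets anyone join the "de-identified" datasets.
by_token = {rec["token"]: rec for rec in marketing}
joined = [{**by_token[loan_rec["token"]], **loan_rec} for loan_rec in loans]

# Step 2: matching the indirect identifiers against the public list reveals names.
names = {indirect_ids(p): p["name"] for p in public}
reidentified = {
    names[indirect_ids(row)]: row["default"]
    for row in joined
    if indirect_ids(row) in names
}
print(reidentified)  # {'Alice Smith': False, 'Bob Jones': True}
```

Neither dataset contains a name, yet their combination exposes each person's loan default status.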

[12:52] So, the idea that you can safely combine datasets where people's identity has been replaced with the same token is a fiction. It looks safe, but it's not once you combine datasets. And what is AI about? It's feeding algorithms with multiple datasets. So, what do you do? You assign different tokens to the same person, so the relationship is revealed only if you have permission to know it. The data is still fully accurate. That's the first element: dynamic de-identifiers, which change both between and within datasets.
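A minimal sketch of the dynamic de-identifier idea, assuming a fresh random token is issued every time an identity appears and that an in-memory lookup stands in for the separately protected additional information. The class and method names are hypothetical, and the boolean `authorised` flag is a placeholder for real access control; this is an illustration, not Anonos's implementation.

```python
import secrets


class DynamicDeidentifier:
    """Hypothetical sketch: issue a new random token for every appearance of
    an identity, in any dataset, and keep token -> identity only here."""

    def __init__(self) -> None:
        self._lookup: dict = {}

    def tokenise(self, identity: str) -> str:
        token = secrets.token_hex(8)
        self._lookup[token] = identity
        return token

    def same_person(self, token_a: str, token_b: str, authorised: bool) -> bool:
        """Reveal whether two tokens refer to one person, only if authorised."""
        if not authorised:  # placeholder for a real access-control check
            raise PermissionError("relinking requires authorisation")
        return self._lookup[token_a] == self._lookup[token_b]


dd = DynamicDeidentifier()
marketing = [{"token": dd.tokenise("alice"), "segment": "young_professional"}]
loans = [{"token": dd.tokenise("alice"), "loan": "mortgage"}]

# A naive join on tokens finds nothing: the same person has different tokens.
print({r["token"] for r in marketing} & {r["token"] for r in loans})  # set()
# An authorised party can still establish the link.
print(dd.same_person(marketing[0]["token"], loans[0]["token"], authorised=True))  # True
```

The records themselves stay accurate; only the ability to connect them is held back and granted on permission.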

[13:27] The second part is indirect identifiers. You'll notice I didn't say the name. I said birthdate, gender, and zip code. If I can combine those three data points, I know who people are. So, you not only have to change the identifiers, you need to protect both direct and indirect identifiers. That's what we mean by controlled relinkable dynamic de-identifiers.
Nasya Bennacer:
[13:49] So, I have a question here, Gary. If someone is able to steal the data, the dynamic de-identifiers and the other solutions you are explaining mean that nobody can use the data. Nobody will understand anything about it. Correct?
Gary LaFever:
[14:07] This is correct. Actually, under the GDPR, if data has been pseudonymised and there's a data breach, you may not even have an obligation to make anyone aware of it because there's no risk or there’s little risk.

[14:18] Next, re-identification risk management. This uses a technique called k-anonymity. As you will see, the data is input into the system, but if it fails the k-anonymity tests (that's the red circle), the data is not released. Only data that satisfies these k-anonymity tests, which ensure that people can't be re-identified, is used to create variant twins that are available for processing. The system automates all of this in the background, so data users get data that's lawful, with the policies that the privacy and legal teams have approved embedded within it. So, it makes these operations highly scalable, not just lawful. This is about maximizing the innovative use, credibility, and utility of that data.
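The release gate described here can be sketched in Python. k-anonymity means every released record shares its combination of indirect identifiers (quasi-identifiers) with at least k-1 other released records. The generalisation rules, field names, records and value of k below are assumptions chosen for illustration; the talk does not specify how the product generalises data or which k it uses.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("birth_year", "gender", "zip3")


def generalise(rec: dict) -> dict:
    """Coarsen indirect identifiers (illustrative rules, not a recommended policy)."""
    return {
        "birth_year": rec["birthdate"][:4],  # drop month and day
        "gender": rec["gender"],
        "zip3": rec["zip"][:3],              # keep only the 3-digit prefix
        "income_band": rec["income_band"],
    }


def release_k_anonymous(records: list, k: int) -> list:
    """Return only generalised records whose quasi-identifier combination is
    shared by at least k records; the rest are withheld (suppressed)."""
    gen = [generalise(r) for r in records]
    key = lambda r: tuple(r[q] for q in QUASI_IDENTIFIERS)
    counts = Counter(key(r) for r in gen)
    return [r for r in gen if counts[key(r)] >= k]


records = [
    {"birthdate": "1980-05-17", "gender": "F", "zip": "02139", "income_band": "B"},
    {"birthdate": "1980-09-01", "gender": "F", "zip": "02141", "income_band": "C"},
    {"birthdate": "1980-01-23", "gender": "F", "zip": "02134", "income_band": "B"},
    {"birthdate": "1962-03-30", "gender": "M", "zip": "10001", "income_band": "A"},
]
released = release_k_anonymous(records, k=3)
print(len(released))  # 3 (the lone 1962 / M / 100 record is withheld)
```

The fourth record is the “red circle” case: it is unique on its indirect identifiers, so it never reaches a variant twin.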
Nasya Bennacer:
[15:03] So, the variant twins you've presented here are compliant with the GDPR?
Gary LaFever:
[15:07] Correct.
Nasya Bennacer:
[15:08] They do not allow anybody to recognize or identify someone, and we can use the data for secondary purposes?
Gary LaFever:
[15:16] Yes, and this is all about secondary processing, not primary processing. This is analytics, AI, repurposing, and sharing, and you can create different levels of variant twins. But you'll see that in the top right, where you have maximum utility and maximum protection, you get the best of all worlds. Again, this is not a silver bullet. It's not a golden shield. But it scales the expertise within your organization about the protections necessary to maximize the availability and value of the data. The two primary use cases our clients use us for are, first, repurposing data for cross-sell and upsell opportunities and for marketing to your customers; and second, sharing and combining data lawfully, both between subsidiaries within a bank and between unrelated legal entities.

[16:05] This quickly shows the trend in how we market to people. We used to do segment-based marketing: if you wanted to reach a certain type of person, you would put an ad in a magazine they would read or a TV show they would watch. In the middle, we've gotten really good at marketing to individuals. But marketing to individuals is what gets us in trouble under the GDPR. This new approach, called micro-segmentation, where you use variant twins to reach people who meet your requirements and needs, is what's permissible under the GDPR.

[16:37] This starts to get into the use case for Raiffeisen. Mr. Sandberger was unfortunately not able to join us today, but we have a picture of him. This is what they used to do that they can't do anymore. They want to check the credit of a potential customer. It used to be that you could exchange data with another party, find out whether you had common customers, and learn what the experience had been with those common customers. That now violates individuals' rights under the GDPR.

[17:06] The next slide shows how you actually create variant twins. Here's the difference: instead of asking whether you have the same customer that I do, I ask: “Do you have customers with these attributes and characteristics? And if so, do you have enough people who fit those characteristics that I won't know who they are, but I will learn from you whether you've had favorable or unfavorable experiences with those types of people?” So, we create and agree upon a variant twin schema, we exchange information between the two parties, and you can augment your data with this lawful data to have more information.
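The “ask about attributes, not about people” exchange can be illustrated with a small Python sketch. The schema fields, the invented partner data, the minimum group size of 4 and the function name `segment_experience` are all hypothetical; the talk does not describe the actual variant twin schema agreed with Raiffeisen. A fixed threshold alone does not prevent every inference (for example, differencing two overlapping queries); this sketch shows only the threshold idea.

```python
from typing import Optional

# Hypothetical partner-side data, described only by agreed schema attributes.
PARTNER_CUSTOMERS = [
    {"age_band": "30-39", "region": "Mazowieckie", "product": "mortgage", "repaid": True},
    {"age_band": "30-39", "region": "Mazowieckie", "product": "mortgage", "repaid": True},
    {"age_band": "30-39", "region": "Mazowieckie", "product": "mortgage", "repaid": False},
    {"age_band": "30-39", "region": "Mazowieckie", "product": "mortgage", "repaid": True},
    {"age_band": "50-59", "region": "Pomorskie", "product": "car", "repaid": True},
]


def segment_experience(query: dict, min_group: int = 4) -> Optional[dict]:
    """Answer 'how did customers with these attributes perform?' in aggregate.

    Returns None when fewer than min_group customers match, so the answer
    never describes a small, identifiable group.
    """
    matches = [c for c in PARTNER_CUSTOMERS
               if all(c.get(k) == v for k, v in query.items())]
    if len(matches) < min_group:
        return None
    repaid = sum(c["repaid"] for c in matches)  # True counts as 1
    return {"count": len(matches), "repayment_rate": repaid / len(matches)}


print(segment_experience({"age_band": "30-39", "region": "Mazowieckie", "product": "mortgage"}))
# {'count': 4, 'repayment_rate': 0.75}
print(segment_experience({"age_band": "50-59", "region": "Pomorskie"}))
# None
```

The requesting bank learns how a type of customer performed, never whether a specific person is also the partner's customer.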

[17:45] And most importantly, this slide attests to the fact that you do not lose utility or accuracy of the data.
Nasya Bennacer:
[17:54] So, in other words, you can exchange data relating to individuals with your ecosystem, for example sharing your data with other companies in that ecosystem, without losing compliance with the GDPR?
Gary LaFever:
[18:14] Correct. Again, that's because the controls are embedded in the data. You're technologically enforcing those controls, so you have maximum use. This is Mr. Sandberger, if you want to tell us where this use case comes from.
Nasya Bennacer:
[18:25] So, Mr. Sandberger is the lead person at Raiffeisen Bank in Austria. Together with Mr. LaFever and others at Anonos, we are working with them to provide a solution that helps the bank exchange data relating to individuals with its own ecosystem. So, that is the use case. Maybe we can say more.
Gary LaFever:
[18:52] Absolutely. Again, this is about being able to augment your data with affiliates' data and third-party data in a way that respects and honors the rights of the data subjects, while increasing the value and accuracy of your datasets and bringing more data to feed those algorithms. So, it's a true win-win-win situation.
Nasya Bennacer:
[19:15] Another question. Imagine I am an international bank. Can I use the solution to exchange data relating to my own customers with other subsidiaries overseas in other countries?
Gary LaFever:
[19:27] Yes. You can create variant twins that meet the requirements of the data localization and data sovereignty laws within each country, train algorithms globally using that data, bring the trained algorithms back into each individual country, and then add the identifying data back locally, so you never export the identifying data. So, it's always about maximizing the utility of data by embedding the controls in the data.
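The cross-border flow described in this answer can be sketched as three steps: pseudonymise locally, train centrally on pseudonymised rows, and relink locally. The sketch assumes each country keeps its own token lookup, and a per-segment repayment rate stands in for a real model; all names and data are hypothetical, and whether a given pseudonymised export satisfies a particular country's localization law is a legal question the code cannot answer.

```python
import secrets
from collections import defaultdict


def local_export(customers: list):
    """Runs inside one country: swap identity for a random token and keep the
    token -> identity lookup in-country. Only tokens and features leave."""
    lookup, exported = {}, []
    for c in customers:
        token = secrets.token_hex(8)
        lookup[token] = c["customer_id"]
        exported.append({"token": token, "segment": c["segment"], "repaid": c["repaid"]})
    return exported, lookup


def train_global(exports: list) -> dict:
    """Runs centrally on pseudonymised rows only: a toy 'model' that learns
    the repayment rate per segment across all countries."""
    totals, repaid = defaultdict(int), defaultdict(int)
    for row in exports:
        totals[row["segment"]] += 1
        repaid[row["segment"]] += row["repaid"]
    return {s: repaid[s] / totals[s] for s in totals}


def score_locally(exported: list, lookup: dict, model: dict) -> dict:
    """Back in the country: apply the model, then re-attach identities locally."""
    return {lookup[r["token"]]: model[r["segment"]] for r in exported}


pl_rows, pl_lookup = local_export([{"customer_id": "PL-1", "segment": "A", "repaid": True},
                                   {"customer_id": "PL-2", "segment": "B", "repaid": False}])
at_rows, at_lookup = local_export([{"customer_id": "AT-1", "segment": "A", "repaid": False},
                                   {"customer_id": "AT-2", "segment": "B", "repaid": False}])
model = train_global(pl_rows + at_rows)
print(model)                                     # {'A': 0.5, 'B': 0.0}
print(score_locally(pl_rows, pl_lookup, model))  # {'PL-1': 0.5, 'PL-2': 0.0}
```

The central step sees both countries' data but no customer IDs; each lookup table never leaves its country.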

[19:53] And here is the slide that was behind the prior one. The point is, we look forward very much to working with banks as we have with Raiffeisen. The combined capabilities of Hitachi, Anonos, and the bank itself allow the bank to meet its business objectives.

[20:09] This creates a new data asset that you have rights to, that is lawful, and that allows you to continue to innovate with the best of all worlds. You maintain 100% of the value of the source data, you get the privacy capabilities of anonymised data, and you get the relinkability of pseudonymised data.

[20:28] The day before yesterday, I was in Berlin at a meeting of experts on Pseudonymisation. And this report, believe it or not, talks as much about the business benefits of Pseudonymisation as it does about technical and legal requirements. So, anyone who's interested in getting a copy of this report, please connect with me on LinkedIn and I'll be happy to send it to you.