Anonos | IAPP London Presentation Video Replay

Don’t Lose Access to Analytics Under the GDPR

Presentation Transcript

Gary LaFever

CEO at Anonos

Dr. Alison Knight

Research Fellow, Senior Legal Advisor, Data Protection & Governance, University of Southampton

IAPP Europe Data Protection Intensive 2018

Gary LaFever (Anonos)

[00:00:08] Thank you for coming to “Just Because You’re GDPR Compliant Does Not Mean You Can Use Your Data.” And I’d first like to start off by saying we're very lucky to have with us Dr. Alison Knight. Not only did she do her doctorate in the subject of what we're talking about, which is the benefits of dynamism in data protection over static approaches, but she also writes and speaks often on this. And so, I think we can hopefully have a very lively Q&A after the presentation, so please feel free to ask questions.

[00:00:37] I’d like to do something for a moment and pretend that it's 8 weeks into the future. What's behind this? May 25th. You've successfully completed your Data Protection Impact Assessments. You've installed your consent management system and you've got it all down how you’re going to comply with the 72-hour breach notification. You're sitting at your desk, you're enjoying that first cup of coffee or tea, and your Chief Data Officer comes along and knocks on your door with a very simple question: “Can I still use my data? Not for transactions, but the future of this company is premised on analytics, artificial intelligence, and machine learning. Can I still do that?” And the focus of a lot of compliance efforts - I'm not saying those in the room - is on keeping the organization out of trouble, but the reality is the GDPR is much different from Y2K. It's not over on May 26th. Life is just starting on May 26th. And so, the topic of this panel and this interactive Q&A - you guys are the stars - is how do you keep doing business after the GDPR goes into effect, principally when it comes to analytics, artificial intelligence, and machine learning.

[00:02:11] So, with that as a background, I don't know if you noticed, but 2 days ago the Working Party 29 actually issued its final guidance on consent. The description of this panel talks about the draft guidance, not knowing obviously when we submitted for this that the final would be out 2 days earlier. But there are a couple of things if you haven't read it. And if you haven't read it closely, you need to pay attention today. The last page and the last paragraph, in my view, is the most important paragraph of that guidance. It says: “We acknowledge that we're changing the rules on you, and you better check to see that the legal basis that you used to collect data up through May 25th is still valid. Because if it was broad-based consent, you possibly do not have the legal right (and consult your counsel) to possess or process that data for analytics, AI, and machine learning.” And so, if historical data is important to your organization, read that last paragraph on page 31 because believe it or not, the Working Party actually provides you an out. They say you have a one-time opportunity to transform your data and support another legal basis. So, if nothing else, you can get it online. Again, it came out on the 16th. You can read the whole thing, but I suggest you focus on the last paragraph because it's very, very important. So, with that, I would like you to help me welcome Dr. Alison Knight as our first speaker. Thank you.

Data-Driven General Analysis & the GDPR: Friends or Foes?

Dr. Alison Knight (University of Southampton)

[00:03:48] Hi everybody. I am a researcher. So, I work at the University of Southampton, but more than that I'm also a pragmatist. I'm someone who works on the ground. I do a DPO role. I work in legal as well. And the thing is as a University, what we really want to do is we want to get use out of our data. We want to maximize data use. That's the real driver. But we also want to protect people's rights. And so, I just want to give you about a 10 to 15-minute presentation about a new paper that I've written, but it's really got a practical bend to it, which is how can we maximize value in a way that the GDPR says that's fine? How can we come in tune? How can we use its enablers to maximize our utility while protecting people's rights?

[00:04:33] Now, I also speak as a researcher because I've been involved in the Data Pitch Project. Please do look this up online. This is an example of when it's really key to get these balances right because this hinges on open innovation. So, what's happening here is we've got data providers around Europe providing their datasets and then giving them to SMEs and startups who want to do great things with data. You know, there's a real potential for people to do great things, but you've got to put in the right checks and balances, and that's been one of the studies that we've been doing, and we’re 1 year in now: how can we meet those challenges of the GDPR and actually see it as an opportunity? So, that gives a bit of background to where I'm coming from.

There's no doubt the huge potential that creative use of data could have.

[00:05:16] And of course, this has been mirrored by the very language of our commissioner herself. Recently, she said in a press release: “There's no doubt the huge potential that creative use of data could have.” So, she's saying this is a real possibility, but she's saying that the price of innovation - data innovation with analytics - does not need to be the erosion of fundamental privacy rights. So, she's effectively saying that there is a balance and it can be met, but caution to those who think that they can go ahead and not preserve these privacy rights. Now, this was in the context of the Google DeepMind Royal Free Trust case. Have people heard about that case, perhaps? So, that was a case where you've got an NHS Trust. It has got lots of data, it wants to do innovative things with this data to give it to Google DeepMind who are producing an app to help people with kidney disease to diagnose and to help them. So, it's really, really a beneficial purpose.

[00:06:16] But the ICO said in this case there was a need for a fine because they had relied upon the wrong lawful basis. They relied upon implied consent. They hadn't been transparent about what they were going to do. So, we're talking about not just the data collection but the secondary reuse of that data, and they hadn't carried out a privacy impact assessment at the right time. They hadn't been on top of their game, and this is particularly important because that was under the current DPA regime and now we're moving into the GDPR regime.

Two Different Approaches to Data Protection

[00:06:49] So, how can we get it right? How can we avoid those mistakes that were made? Well, I want to take you back to the two different approaches to data protection. This really sort of will help you understand where I'm coming from and where Gary’s coming from. When we think about the Data Protection Directive and the Data Protection Act, there's been a real temptation for people to see it as being very static. They talk about personal data as if it's out there in bits of information. They talk about the fact that you can anonymize and then you can forget about it because you can effectively get rid of re-identification risk. They talk about data protection as if it's something you can apply and then forget. You can have your data protection manual. You put it up on your shelf. After 25th of May, you can forget about it. You’re GDPR compliant and data protection compliant.

[00:07:37] Now, of course, the General Data Protection Regulation has been in place now. We've had 2 years to get ready, and we can see from those people who have read the texts it's fundamentally different. It's saying we all live with risk. We have to assess that risk. It's transitory. So, we have to get comfortable with that, but we need to mitigate that risk to a level which is acceptable and the GDPR gives us those tools that we need to do that. It's dynamic. It says it's fine. You need to re-assess. It’s giving us these tools to see that we have a data cycle, which is not static at one point, but it's fluid and transitory and the law enables us to do this.

[00:08:16] So, this is really my three-point test for how you illustrate this dynamic and adaptive approach, which we're going to talk about today. So, under this approach, actually dynamism is our friend. It's a great thing because it shows us that it can coexist. We can have data innovation and protection in the same field. What we need to do, of course, is whenever we approach any secondary data use and further data use, we have to think of course about who might we be sharing that data with? For what purpose? How are we going to share it? This is the who, what, why and how of data access.

[00:08:56] But really, the key here is that we're going to combine that with risk mitigation. So, risk is out there, but risk mitigation is what sits underneath this, which brings us to a GDPR compliant level of equilibrium and my focus is on three things. First of all, dynamic purpose preservation - and I’ll talk about that in a minute - ensuring that we've got a data purpose which is roughly consistent over time. Protection adaptation - so, this is what I'm talking about. Risk mitigation is the key and how we can change those mitigatory measures as we go along and we can use this as actually a plus. It's a good thing. We can sit comfortably with this. And also, data quality management. Whenever we talk about analytics, we need to make sure that we don't have bias in our data and this is really important. And then, the third step is how do we bring this together? And this is actually how Gary and I met because we were coming at it from completely different angles and we had a meeting of the minds realizing we were writing about the same thing. It’s called Dynamic Pseudonymisation.

[00:09:58] So, some of you may have heard about this new definition of Pseudonymisation in the GDPR. Anyone here? Okay. So, there's a technical definition: Pseudonymisation means you're taking out direct identifiers and you're replacing them with a pseudonym. Now, there's been a lot of uproar or concern because people aren’t clear what the GDPR says, and it’s that if you Pseudonymise your data and you can still single out people from that data, you can't assume that it's not personal data. You have to do more. There’s a legal presumption it's personal data until you do more. However, with Dynamic Pseudonymisation, you're fluctuating your pseudonyms as they go along, and Gary’s going to speak more about this. It’s a way where you add that super++ to Pseudonymisation. It's controller linkability. You’re setting the parameters. You're making sure that your data analytics happens in the way that you want it to in your controlled sandbox. And in that way, as I'm going to explain, you can ensure not only fair and lawful processing, but also that you are doing things in a transparent way and what's really important is you can show accountability. You can show a paper trail that you’re doing these things right.
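To make the contrast between a fixed pseudonym and "fluctuating" pseudonyms with controller linkability concrete, here is a minimal Python sketch. It is an illustration under stated assumptions only, not the method described in either speaker's paper and not Anonos's technology: the names `static_token`, `DynamicPseudonymiser`, `pseudonymise`, and `relink`, and the idea of a per-use "context" label, are all hypothetical.

```python
import hashlib
import hmac
import secrets


def static_token(key: bytes, identifier: str) -> str:
    """Static tokenisation: the same person always maps to the same token,
    so records can be linked across every dataset that uses this key."""
    return hmac.new(key, identifier.encode(), hashlib.sha256).hexdigest()[:16]


class DynamicPseudonymiser:
    """Hypothetical sketch: a different random pseudonym per (person, context).

    Only the controller holds the lookup table, so only the controller can
    link pseudonyms back to a person or across contexts.
    """

    def __init__(self) -> None:
        self._by_pair: dict[tuple[str, str], str] = {}
        self._lookup: dict[str, tuple[str, str]] = {}

    def pseudonymise(self, identifier: str, context: str) -> str:
        pair = (identifier, context)
        if pair not in self._by_pair:
            # Random, so nothing about the identifier can be derived from it.
            pseudonym = secrets.token_hex(8)
            self._by_pair[pair] = pseudonym
            self._lookup[pseudonym] = pair
        return self._by_pair[pair]

    def relink(self, pseudonym: str) -> tuple[str, str]:
        """Controller-only step: recover (identifier, context)."""
        return self._lookup[pseudonym]
```

In this sketch, the same person receives one stable pseudonym inside a given context (so analysis within that context still works) but an unrelated pseudonym in every other context, and only the holder of the lookup table can join them up.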

[00:11:15] So, I guess the key message is that: Yes, data controllers CAN engage in GDPR compliant analytics. They can be friends, but you have to make sure that you know the right way to go to the path ahead. We need to pilot. We need to steer.

[00:11:34] So, let me just take you back one step. For the people who were actually in the session just before this, there was a distinction about consent and legitimate interests, but let's put that into an analytics perspective to really focus in on it. So, consent we've all heard it's got to be freely given and freely withdrawable. It has got to be specific. It has got to be informed. Now, if you're carrying out generalized data analytics, you want to create real value from your data, but you're not quite sure what you're going to find. You can get yourself into real problems if you rely upon consent. It's impractical. It's what I call, when I talk at the University, creating a rod for your own back or creating a straitjacket. Not only do you have to be very specific about what you’re going to do, which is almost impossible as you carry on as you don't know what's going to happen, but it also means that you've got to keep a record of this. I mean, you're creating a nightmare for yourself.

[00:12:33] And of course, if we're talking about secondary use, if the data that you've gathered was originally gathered under consent, it'd be very difficult then to use that data because effectively you said that person has agreed to consent. How do I do more than that because I didn't think of that when I was actually drawing up the initial consent template for them to sign? So, you can get yourself into a real mix and effectively the only way you can get out of it is asking them to re-consent. You know, this is not the way that businesses want to run. We want to get away from consent and all the rigmarole that that entails, but we need to do it the right way. As I've said, it's the specificity restriction under Article 4(11) of the GDPR. This is where the Guidelines of the Working Party and commentators have come out and said: “No, this is really difficult.” You can’t. There's no longer that era that you can say, “We want to use your personal data for general analytics purposes” and that's sufficient to inform them and get their consent.

The Future of Analysis: Legitimate Interest!

[00:13:33] So, let's have a look at legitimate interest. By now, you've probably heard about the six lawful bases. I'm not going to go into huge amounts of details. But for those people in the last session, we’re going to take it to stage two. How do we use this actually in a proactive, competitive, and advantageous way? How do we use legitimate interests? So, to remind you of the three steps: First of all, you must have that legitimate interest. That interest could be something that's good for your company. It could actually be an interest of the third party in certain circumstances, but that's just the gateway. There's a second aspect here. There must be a necessity. So, actually, it must be very necessary for you to use that personal data to achieve that legitimate interest.

[00:14:16] And the key here is that you've got to have proportionality. How can you embed that proportionality in there relative to achieving that legitimate interest? And then, of course, we've got the balancing test and that it says that you've really got to weigh up those interests of the data controller and the third party and then you've got to think about the interests of the data subjects. How do you weigh them up? And of course, as I’ve talked about risk, I also talked about risk mitigation. It's more than that. How do we ensure, once you've looked at that balance, how can we mitigate the risk of the individual's rights being infringed? And this is where it's really important. We're going beyond just the theoretical framework. We're thinking about: How can I make it easier for my company? What proactive steps can I do? How can I use it as a competitive advantage by getting the right approach right from the start?

[00:15:08] So, the future analysis then. We need to do Data Protection by Design. You’ve probably heard a little about that word - Data Protection by Design and by Default. Embedding good data practices and complying with the GDPR principles actually into the technology. This is absolutely key. This is the lawyer’s agreement in some ways because I can sit there knowing that I'm comfortable that these principles are being respected because of the way that they're built and the people in the organization have thought about it possibly right at the very beginning. If you could have something which can in fact make the purpose and make the fact that you've got to have further processing, it’s got to be for a specified and explicit and legitimate purpose - that’s a purpose limitation rule.

[00:15:55] Now, if we break that down, it’s saying that you do have to have a purpose there, but the most important thing is your purpose can change, but you've got to ensure that there is no incompatibility between your primary and your secondary personal data processing. So, effectively, what it's saying is you need to have some data control. You need to control the links to make sure that things are done in the right way according to the reasonable expectations of people in a fair way. And by doing that, you're actually minimizing the amount of personal data that's being processed. Tick. This is in compliance with the data minimization principle. It's also compliant with security, which makes sure that you're not giving out more data than you need to. There are a great number of ticks if you can go down that route.

[00:16:42] And as I’ve said, you do need to have a specified purpose, but what it says under legitimate interest is that it doesn't need to be specific in the way that has to be with consent. You need to say: “Hey, you're being given a free choice to give your consent to this, but I need to tell you in a very, very granular way what it is that you are consenting to.” Legitimate interests - what it says is that you've got to have it specified. You must be able to tell people roughly what the consequences are, what's the scope of the personal data processing and analytics being carried out, but you don't need to go into the nitty-gritty of how it's being used. So, let me give you an example. So, in our project, we're thinking about how to promote smart cities and how to improve transportation around cities.

[00:17:25] So, let's say that we've got data which has been provided by Transport for London. Of course, that relates to people in certain ways. It could be personal data. So, we put together challenges and we put challenges around the amount of data that we're getting in. Perhaps we’re saying, we want to know how to make bike storage and the parking of these bikes more efficient across London. And then, we have SMEs - so people will come across that data with really great ideas. By that time, we've actually narrowed it down quite well. Now, the data subjects don't actually have to know how they’re going to carry out the algorithm, but what they do need to know is what we're trying to achieve and that is possible if you scope it down in the right way. That perfectly sits with the GDPR, as long as you can let people know when there are going to be consequences, which detrimentally affect them.

[00:18:15] So in fact, what we're seeing here is that we're actually breaking down the process into different parts. So, just to repeat my point there that it's fine to carry out general analytics without using that specificity that you have, like I said that straitjacket, as long as you can make sure that you have control over the purpose, that compatibility, that reasonable expectation where people give their personal data that it's going to be used in a certain way broadly consistent and fairly between the initial and the tertiary stages of which they are used. And you can make sure that you can stop once specific consequences could potentially impact on people. And the way that you need to do this - and we've broken this down into stages - is that we want to look at once we know the great things that we can do with analytics and with data when our SMEs internally have come up with these great things, we need to at that point carry out another assessment and say: What are the risks now? Because now I'm breaking down that purpose. We're thinking about a different purpose, and that could actually be something that's detrimental. At that point, it's fine for data protection impact assessment, which is like the legitimate interests not to be one static point in time. This is what I mean about transitory, but actually to re-engage with that process at the time that's appropriate.

[00:19:39] And then, of course, we do have that issue if you’re thinking about compatibility of purpose that when you're carrying out processing for research purposes, which I know as a University organization working for them that that's deemed compatible with the initial purpose. So, what I'm trying to say is that if we can meet the legitimate interest test, we've got a big tick and in fact that goes a lot of the way to meeting our compatibility tests as well, which goes a good way to doing all the right things and meeting our principles, but let me just break that down a bit.

Dynamic Data Governance for Analysis: Lawful Basis

[00:20:14] So, let's think about lawful basis then. This is what I mean about breaking down into times. We have the stage one - you want to gather the data. It has been gathered by an organization. They could have a lawful basis at that time. Now, you could go down the consent route and as Gary mentioned, in fact, you now have this very short window that the ICO have talked about where if consent is not GDPR consent, you can translate it or relocate it into legitimate interest, and this is good because we don't want to go down the consent route when you're doing data analytics. And our secondary stage to this is where we're about to pass that data or internally share that data and they're going to do these amazing analytics things with it. We need to think about the lawful basis there and this is when the legitimate interest comes in with the right security, with the right safeguards, with purpose preservation.

[00:21:08] And then of course, we might have a third stage. So, this is the third stage. So, let's say it's been successful. They found some great algorithms and want to use it. Of course, we're saying that's absolutely fine. But if the company then wants to go back and wants to do something to impact so that they’re using a profile within an individual decision making about these individuals, then you need to think of another lawful basis. We’ve moved another step along and sometimes consent will be appropriate. So, we're thinking of the Google Analytics of the world when data has been used for research and that was very illegal by the way, but actually then they're coming in and you should be getting transparent consent. There may be times where legitimate interest is relevant there, but it's about re-assessing that risk and the degree of risk and what's fair to people. Are you going to lose people's trust? Is it within their reasonable expectations?

Dynamic Data Governance for Analysis: Impact Assessments

[00:21:59] And so, it is really important. We can't get away from the fact that we are living in a risk based world. We do have to carry out risk assessments. Not all of them will actually have to be data protection impact assessments. Let’s not think that we have to redo that, but actually we should be within our organizations looking, breaking down, or getting used to data flows. I don’t know if you use data flows. We use it a lot at my organization. But also thinking about, at times, sequential flows as well. And you're thinking at that point an impact assessment is good. It can be an initial step before a formal DPIA. But really at that point when you’re pointing at analytics, you're thinking about things like: Is the quality of my data appropriate? Is it going to be biased? Is it going to affect the results? Am I minimizing that data? Am I going to have the right technology to minimize? And this is where the Pseudonymisation and Dynamic Pseudonymisation comes in. Have I got the right data security making sure that personal data is only accessible by the right people? And then, at this first stage when you may have some consequences, you need to carry out a different type of assessment. So, we shouldn't feel that we need to carry out one and look at it in a very static way. This is part of that good data management cycle.

[00:23:16] In summary, what I'm trying to say is that data controllers shouldn’t fear analytics in a way that makes them go running for the hills and say: “I can't do this anymore.” But we need to have the right tools. It's not enough just to have the paperwork. I say it's a Holy Trinity. We've got to have organizational. We're halfway there. We're coming to this session today and this conference. We're thinking in the right way. We've also got to have legal. We’ve got to have the right contracts especially when we're sharing data. We've got to make sure that we comply with the law, but we need that technology and that's a feature - Data Protection by Design and Default. And if we do that, we’re halfway there. Legitimate interest becomes much easier to satisfy. Compatibility of purpose becomes much easier to satisfy. Tick, tick, tick. All these different consequences and data minimization. But when we do get to the point where consequences might flow to these people, we shouldn't be afraid at that point to put our hand up and say we need to do another assessment because of the ramifications for those people. That is going to be built into the very way that we do data analytics in the future. But the tools are there to get us quite far along that road. What we need to have - I don’t talk about GDPR - I talk about robust data governance systems in place in the organization and between organizations as well, whether we're sharing data or doing analytics in-house. We have to leverage the tools at our disposal and we need to do it now and we need to ensure that we can keep that purpose preservation, that data security over time to get the results that we want, which is great data.

Data-Driven General Analysis & the GDPR: Friends or Foes?

Gary LaFever (Anonos)

[00:24:59] So, I want to let you know how I was introduced to Dr. Alison Knight. She and one of her colleagues actually published a paper on Dynamic Pseudonymisation. I forget the actual title but that was the subject. It was the same week that my company - I am the co-founder and CEO of Anonos - published one and we were vying for number two and number three on the Social Science Research Network (SSRN). I said: “I gotta figure out who these people are.” And then, I read their paper and I said: “I really have to figure out who these people are.” So, what I do love about Alison is she's very pragmatic. She's an academic, but she's an academic who has street smarts and that's a rare combination. So, thank you.

[00:25:38] If Alison touched upon the what and the why, I'm going to give you a little glimpse of that but more of a focus on the how. And so, at Anonos, we spend as much if not actually more time speaking with actual data users and the Chief Data Officers than we do with privacy people. And that's why I started with the story of what happens 8 weeks from now when the CDO walks into your office and says: “Can I still use my data?” So, it's something you have to pay attention to.

[00:26:06] Two things. If you haven't come by, Anonos has a booth. These are not sales documents. They’re educational. This document, we've printed over 5,000 copies. We sent it to all members of the Article 29 Working Party. It’s available without charge. We've presented it to the European Commission, DG Connect Group, and EDPS. We're oftentimes invited to meet with the DPAs. On Monday, we were in Paris meeting with the CNIL. We literally get thank you notes and emails. Why? Because it walks through with the help of outside legal experts and technical experts, what we call the six legal safe havens. And you want to know what those are? I'll touch upon them in my presentation. It's a mapping through the recitals and the articles of the GDPR to do six fundamental business processes that are imperative to analytics and AI. And then secondly, a number of the different analysts groups - this is from IDC. Gartner’s is coming out. Forrester’s is coming out and they're all saying: “You know what? Traditional approaches to data protection do not work for analytics and AI. New approaches are necessary.” And so, both of these are available if you want to come by our booth. Please do.

[00:27:17] All right. So, Dynamic Pseudonymisation. There's a reason I'm putting dynamic in front of Pseudonymisation. The new definition of Pseudonymisation under the GDPR is different than any other definition that has been seen before. It does not define it as Dynamic Pseudonymisation, but hopefully after this presentation you'll see that using the word “Pseudonymisation” by itself causes people to get lazy and they think of something that I refer to as static tokenization, which is what passed for Pseudonymisation in the past, but it no longer does.

[00:27:50] So, this is by far our most popular slide and you will see yourself in this slide and you can pick one side or another. When we talk to the Chief Data Officers, they’re in the green. When we talk to the Chief Privacy Officers and the compliance people, they're in the blue. Here's the problem with this picture. It is a zero-sum game. Any gain on behalf of the data users is at the cost and expense of the data protectors and any gain and increase of the size of the data protectors is at the cost of the data users. This didn't work well before analytics. It doesn't work at all with analytics. There's too much dynamic activity involved.

Pseudonymisation & Data Protection by Default Enable BigPrivacy

[00:28:32] But, as Alison said, if you embrace the concept of Pseudonymisation as defined under the GDPR, if you embrace Data Protection by Design and by Default - and I want to emphasize Data Protection by Design and by Default is more than just privacy by design. It is the ultimate example of it, where when you go to make use of data, you provide only the minimum identifiable data necessary for an authorized purpose and no more. The issue is, traditionally, people have provided all identifying data and they've been asked to sign a data use agreement or they've been told about a paper policy that they're not supposed to make improper use of it. Ask Facebook how that worked. It doesn't work anymore. So, technological enforcement of policies is what Data Protection by Design and by Default is all about. But if done correctly, you actually give the data users even greater use.
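As a rough illustration of what "technological enforcement" of a minimum-necessary rule could look like, here is a small Python sketch. Everything in it is an assumption made for the example: the `PURPOSE_FIELDS` table, the purpose names, the field names, and the `minimised_view` function are hypothetical and do not describe any particular product.

```python
# Hypothetical policy table: for each authorised purpose, the only fields
# that may be released. Purposes and field names are illustrative.
PURPOSE_FIELDS: dict[str, set[str]] = {
    "bike_dock_planning": {"start_zone", "end_zone", "hour_of_day"},
    "billing": {"customer_id", "fare_paid"},
}


def minimised_view(record: dict, purpose: str) -> dict:
    """Return only the fields authorised for this purpose.

    The rule is enforced in code rather than by a paper policy: a purpose
    that is not in the table gets nothing, and fields outside the allowed
    set are never handed to the data user.
    """
    if purpose not in PURPOSE_FIELDS:
        raise PermissionError(f"no authorised purpose named {purpose!r}")
    allowed = PURPOSE_FIELDS[purpose]
    return {name: value for name, value in record.items() if name in allowed}
```

The design choice worth noting is the default: the function releases nothing unless a purpose has been explicitly authorised, rather than releasing everything and relying on a data use agreement to restrain the recipient.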

[00:29:33] In fact, we often have customers tell us the GDPR is an innovation inhibitor and a disabler. And we say to the contrary, it's an enabler. You can do anything that you can do today under the GDPR. You have to do it differently, but if you do it differently and put the technical and organizational safeguards in place, you can do anything. Again, you have to think smart and think GDPR.

[00:29:59] This is an example. Late last month, we were actually invited to go out to Luxembourg and meet with DG Connect. DG Connect has an interesting role within the European Commission. On the one hand, they're actually the ones that are responsible for the digital single market. How do you maximize innovation? And on the other hand, they're responsible for the GDPR. They got a copy of this book and they said: “Can you come to Luxembourg and tell us how you do it?” It's possible, as Alison said, if you have the technologies that make it so.

[00:30:33] All right. Here are some excerpts from the IDC report as well, and it's important to note this is about data stewardship. We have been working on our technology, BigPrivacy, for 6 years. Do the math. The GDPR was not even being discussed in parliament at that point in time. Our focus, when we started 6 years ago, was data stewardship best practices. And look, the GDPR is not perfect, but we think it is the best and most dynamic and most progressive data protection law that there is, and there's going to be a lot more like it. And so, it is the way to exercise data stewardship with technical controls.

[00:31:12] So, I'm going to go through three industry trends, each of them pretty simple, but you'll realize there are some pretty strong conflicts and currents going on. The first, clearly, particularly in light of recent scandals and other activities: companies that can show and demonstrate that they're protecting their customers' data and their partners' data have a better market position, stock value, etc. Conversely, those who don't do a good job suffer.

TECHNOLOGY: Moving towards decentralized/distributed processing.

[00:31:41] But here's where things get interesting. Current technological trends are all about spreading out. It's about decentralized processing, distributed processing, to the maximum degree. That's what analytics is about. That's what blockchain is about. It’s even called the distributed ledger. The cloud is about that. IoT is about that. It's all about massive distributed/decentralized processing. What's wrong with that?

COMPLIANCE: Moving towards centralized/individualized concepts of privacy from government legal mandates and individual awareness.

[00:32:10] What's the opposite? Compliance. Compliance requires centralized control. Under the GDPR, it's the most centralized you can imagine, where ultimately the data subject is the one who's supposed to be exercising control, or at least the data controller on their behalf. So, if you're not relying on consent, which is clearly at the data subject level, and you're relying on legitimate interest, the data controller has to show that they satisfy those requirements and then technologically enforce those, and then the data processors have an obligation to the data controller.

Business + Technology + Compliance Trends

[00:32:46] And so, on the one hand, from a technical perspective, everything is going out to the edges. And from a control perspective, everything's coming in. Here's the problem. Traditional approaches to data protection are static. They literally require: “Bring me all the data you're going to use. Explain to me all the purposes you're going to put the data to, and I will apply my magic privacy-enhancing techniques to that data and I will give you an output file that is safe, so long as you don't add any data to it and you don't have a new purpose.” That helped me for a moment in time. What happens to the next moment and the moment after that? You have to rerun those privacy-enhancing techniques again and again and again. It doesn't scale, and it doesn't support multiple uses. And the reality is, for true interactive analytics, AI, and machine learning, you don't have the legal right anymore.

[00:33:45] You can't rely on consent. You can't rely on contract. So, you have to tell the data subject at the point of data collection that you are relying on legitimate interest and describe to them with specified purpose - not specific, and there’s a difference - what you're going to be doing with it. So, the reality is you can continue to do everything you've done in the past, but you have to do it differently and it requires technology, and it will support just about any desired legitimate processing. And by the way, it’s scalable. It supports the technology that's going out in a decentralized way.

[00:34:20] We had a blog recently. This one was really popular. It was the EU teaching Uncle Sam how to do privacy. There's no surprise here, but recent events have actually shown that the GDPR is in fact the state of the art and the state of the art, again, when it comes to analytics and artificial intelligence is all about Pseudonymisation and Data Protection by Design and by Default.

[00:34:20] And you can bring these two in concert, so you can exercise centralized controls over decentralized and distributed processing.

BigPrivacy vs. Traditional Data Protection

[00:34:52] Here's another thing. Traditional data protection starts with all the data being vulnerable and then you apply protective techniques to get it down to a reasonable level. Well, Data Protection by Design and by Default is literally reversing that paradigm. It is saying: “As a default, start with all your data protected so your vulnerability should be near zero. And then, to the extent you have an authorized use, expand the scope of use or visibility or transparency, but no further than you had to.” Had Cambridge Analytica only been provided cohort level data, which is all they needed for their purpose, none of this would have happened. And yet, so many companies and so many organizations provide fully identifiable data for purposes that don't require it. And again, that doesn't scale. It breaks down.
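The "protected by default, expand only for an authorized use" idea can be illustrated with a minimal Python sketch. The purposes, field names, and the `POLICY` table below are hypothetical and chosen only for illustration; this is not a description of how BigPrivacy is implemented.

```python
from typing import Dict, Set

# Hypothetical policy: each authorized purpose lists the only fields it may see.
POLICY: Dict[str, Set[str]] = {
    "cohort_analytics": {"age_band", "region"},
    "fraud_review": {"age_band", "region", "account_id"},
}

MASK = "***"


def release(record: Dict[str, str], purpose: str) -> Dict[str, str]:
    """Return a copy of the record with every field masked unless the
    purpose explicitly authorizes it. Unknown purposes reveal nothing."""
    allowed = POLICY.get(purpose, set())
    return {
        field: (value if field in allowed else MASK)
        for field, value in record.items()
    }


record = {
    "name": "Jane Doe",
    "account_id": "A-1001",
    "age_band": "30-39",
    "region": "UK-South",
}

print(release(record, "cohort_analytics"))  # only age_band and region are visible
print(release(record, "unknown_purpose"))   # everything stays masked
```

The design choice worth noting is the direction of the default: a field is hidden unless a purpose names it, rather than visible unless someone remembers to protect it.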

Benefits of GDPR Compliant Pseudonymisation

[00:35:46] All right. This is going to be a taxonomy lesson because I can't tell you how many times people say these words incorrectly. Anonymisation does not mean obscuring data. Anonymisation as defined and interpreted under EU Data Protection Law means there is no means to re-link and that's the problem. If I had a resulting dataset in which there's no re-linkage, what happens when I add data? What happens when I put the same data to a different purpose? It starts to get more and more data collected. And so, you have correlations and linkages and all of a sudden you can re-identify. So, these terms are actually very important. Anonymisation can be very powerful. It can also be very dangerous. If you're relying on anonymisation so that you're not under the jurisdiction of the GDPR congratulations, as long as you're right. The second you're wrong, you won't have the safeguards in place. It's a cliff's edge.

[00:36:45] Generalisation and differential privacy - it has very valuable purposes. But by definition, it does not support re-linking. It does not support combining different datasets. Actually, the closer you get to identifiability to re-linkability, it's going to insert more noise. It's literally built not to enable you to do that. So, when you really think of what analytics, AI, and machine learning are about, it's actually about controlled re-linking. So, they're not going to meet your purpose. I have greyed out one because this is what people often call Pseudonymisation, but I call it static tokenization. And it doesn't work to satisfy the requirements for GDPR, but Pseudonymisation as defined under the GDPR is very powerful and very viable.

Dynamic Pseudonymisation = State of the Art

[00:37:32] This is my favorite slide because it blinks and I'm easily amused, but this slide actually shows something that's highly relevant. The dark grey color is protected data that doesn't have high utility and the vibrant colors, whether it's the blue or the green, are data you can use. Static approaches to data protection make you make a choice. Do you want to protect it? Or do you want to use it? The problem is, look what happens when you use it. Every single one of those cells is either gray or it's blue. What if my purpose only needs one or two of those cells? Revealing them all actually exposes me to liability. As opposed to on the other side of the screen, you'll see the different cells are either dull gray or green at different times. Why? Because different purposes require different data elements. Again, with dynamism, you can support this. This is all doable. And one of the biggest lessons here is that traditional approaches to data protection - if this is identifiable data and this is information value, they treat them as one and the same. You get one, you get the other, and there's no separation. Whereas, if you can separate information value from re-identifiability, you actually get more data uses. That's why I'm saying that in our opinion the GDPR is an innovation enabler.

[00:38:58] So, the name of our product is BigPrivacy. It does something that Data Protection By Design has intended to do that has never been required before. You have all these applications in the blue. They've existed for decades. Don't be surprised they don't comply with the GDPR. They weren't meant to. They weren't designed to. They weren't architected to. We have all these data uses in the gold. No one knew when they designed those data uses that you would need a transformation layer and that's really what the GDPR is asking for.

[00:39:26] And we do that with something that we call Variant Twins, which I want to just very briefly explain. So, the digital twin - Gartner said that's one of the top five most powerful concepts out there - is a digital representation of a person, place, sensor, or device. A Variant Twin is that same digital twin less any unnecessary, unauthorized identifying data. It's the minimal data that you need for legitimate purposes.

[00:39:53] So, here are the benefits of Variant Twins. First, you separate an information value from identifiability. And here's the irony, as I said, we do a lot of work with Chief Data Officers, they don't really care about who you are. Nothing personal. They care about what you represent. In their mind, they're thinking of you in groups, classes, and demographics. And when you give them original data, they have to work through that original data to get to the point that they're even considered relevant. So, you're actually making the data user’s job easier because you're starting them at a point that it's at a cohort or group level. You're not making them work through the identifiability. So, not only are you helping to satisfy and respect and enforce the fundamental rights of any digital data subjects, you're making your colleagues' jobs easier. But it can be re-linked under controlled conditions and a lot of people don't realize this.

[00:40:47] The GDPR specifically says with pseudonymised data, it's okay if you can re-link, but you have to show it's under controlled conditions and only certain people can do that for authorized purposes. It gives you greater legal rights to use the data internally and externally. Actually, again, this is not a loophole in the GDPR, Article 12 (2) says explicitly if a data controller can show that they can't re-link data to an individual, it's not subject to individual rights 15 through 22. It encourages this approach. So, again, you can have different situations where you can re-link or not, and this works with both structured and unstructured data.

[00:41:27] Very simple example. I have three different variants up on the screen. One use needs a circle level of granularity as to what someone's income is and where they live and the value of their house and the other two require lesser amounts. So, you can deliver to each of these users a different view of the same data in real time.
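As a rough illustration of delivering different views of the same record, the Python sketch below generalises income, house value, and location to three levels of granularity. The `Household` type, the level names, and the band widths are assumptions made for this example only; the talk does not specify how the variants on the slide were produced.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Household:
    income: int
    house_value: int
    postcode: str
    city: str
    country: str


def band(value: int, width: int) -> str:
    """Replace an exact number with the fixed-width band that contains it."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical granularity levels: (income band width, house value band width, location field).
LEVELS = {
    "fine": (1_000, 10_000, "postcode"),
    "medium": (10_000, 100_000, "city"),
    "coarse": (50_000, 500_000, "country"),
}


def variant_view(h: Household, level: str) -> Dict[str, str]:
    """Build the view of one household that a given use is allowed to see."""
    income_width, value_width, location_field = LEVELS[level]
    return {
        "income": band(h.income, income_width),
        "house_value": band(h.house_value, value_width),
        "location": getattr(h, location_field),
    }


h = Household(income=47_250, house_value=312_000,
              postcode="SO17 1BJ", city="Southampton", country="UK")

for level in LEVELS:
    print(level, variant_view(h, level))
```

Each view is computed from the same source record on request, so no pre-built output file has to be regenerated when a new use with a different granularity comes along.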

[00:41:46] So, I'm gonna give you some use cases now. You’re probably sitting there saying: “I hear but I don't understand what he's saying and give me a use case.” So, I'm going to give you actually seven real-world use cases - two legal, one in the cloud, and four for financial services companies. Most of the companies that we work with are large enterprises and tend to be in highly regulated industries already - finance and health. They're already used to this concept of data being regulated. But everyone in this room knows starting May 25th, all data is highly regulated. But again, these use cases are more of the highly regulated industries.

1. Alternate (Non-Consent Based) Legal Basis

[00:42:18] The first one, we've talked about this. Pseudonymisation is an identified means of a technical and organizational safeguard to support legitimate interest. Why should you care? How many organizations in this room are counting for the future of their organization to develop new value through analytics, artificial intelligence, machine learning, and digital transformation? You can't support those, in almost every situation - there are always exceptions - by consent or contract. So, again, Dynamic Pseudonymisation helps to support the legal basis of legitimate interest.

[00:42:50] Secondly, and these are the six legal safe havens. And again, feel free to pick up a book or if you want to get an electronic copy, give us a card, come by the booth, and we'll send it to you. This is literally the result of 6 years of research. And so, these are six different business purposes that Dynamic Pseudonymisation and Data Protection by Design and by Default can support. The first one we've touched upon. What if you want to process data and you don’t want to rely on consent? The second one is compatible secondary uses of data. And it literally will walk you through the Article 29 Working Party Guidances, EDPS, the recitals, and it walks through all the legal parameters. Our clients literally take this to their Chief Privacy Officer and General Counsel and say: “Read this. I know I can't teach you anything because you know everything already, but perhaps it will enlighten you. And if you agree with what's in here, can we take this approach?”

[00:43:38] So, it's a holistic approach where you address the business, the technical, legal, and even statistical underpinnings of what's necessary to show good data governance. The third one is controlling the linkability. Only provide linkable data if it's necessary. The mere fact that you can control whether data is linkable goes a long way to risk assessments. Privacy-respectful non-identifying processing. If you don't need to identify somebody, don't. This next one is I probably want to emphasize this again. If you have legacy data, data that was collected before the effective date of the GDPR, you really need to ask yourself: “Upon what legal basis did you collect that?” Because the guidance from the Article 29 Working Party again that came out on the 16th on page 31 and the last paragraph states you have one opportunity. Once the GDPR goes into effect, if you don't notate the right legal basis when you collect data, you cannot fix it. Ever. You cannot go back in time.

[00:44:37] But because they're changing the rules on all of us, they say you have a one-time opportunity to fix the legal basis for the data that you collected prior to the GDPR. So, that one, the privacy-respectful processing of legacy data, is important. And the last one is one that for some reason people just aren't paying attention to. It's the joint and several liability between data processors and data controllers. And so, those parties who do things right are going to be much better business partners going forward. The example that's often brought to our attention is the cloud providers. That's going to be a data processor. They make the data controller represent and warrant in the contract that they're going to comply with all data protection laws. You know what? That has worked up through May 25th. The cloud providers have no liability. What happens on the 25th? They have direct liability under the statute to data subjects, etc. And so, this becomes absolutely critical to get this right whether you're a data controller or data processor.

[00:45:34] So, here's the example in the cloud. The cloud is all about distributed, decentralized processing. And the reality is you can allow that to happen at the two layers to the far right. So, that's the platform layer or the software or application layer. And you can do that by embedding the compliance at the data element level. And so, you actually have the centralized controls at the infrastructure layer or even on prem or in region, and the data that's used in the cloud is actually Variant Twin data that is not identified. And if that's what the data users need anyways, you've accomplished both goals.

[00:46:12] So, four financial services use cases, and then we'll get to the questions.

[00:46:18] The first one. One of the largest banks in Europe recognized they had a big problem and that big problem was they wanted to use data, but they had an issue in that they actually couldn't get approval from the Chief Information Security Officer to share data between different divisions in the same legal entity within the UK. What law is being violated? None. The CISO was concerned that the correlations and linkages that would be revealed through the combination of those datasets would make the data more vulnerable to breach by misappropriation by rogue employees.

[00:46:54] So, those very same correlations and connections that the people who use the data wanted were why the CISO prohibited it. Also, they're collecting data from over 50 countries. You've got data sovereignty issues and you've got all kinds of issues. But you can actually create Variant Twins of data because, again, what I really want to know if I'm in private banking is what are the attributes of someone from the credit card department who makes a good candidate for a private bank? The reality is, if I can do that analysis and give the resulting attributes and the resulting characteristics back to private banking, they're the ones who do the linking. The analysis and the re-linking can be separated and only the party who has the right to make the re-linking is the one that does it, and this can work between divisions and between separate legal entities, which I’ll hit in just a moment.

[00:47:19] All right. Here's another one. A German bank with hundreds of data puddles. They want to actually combine those to make a data lake, but the owners of the different data puddles don't want to give up control of the identity of the members in their puddle. What do you do? You create a data lake that is comprised of Variant Twins that's actually information rich but non-identifying that allows consolidated processing and comparison, analytics, and AI and then the results of that are given back to the owner of each data puddle to re-link to actual individuals. Again, making a distinction between identity and information value.

[00:48:16] The next one. Here you have an Italian bank who is in the instant credit business. And so, people come to them who've never been a customer there before and they ask for instant credit. How do I evaluate? But the reality is they work with other non-legally associated companies be it a grocery store, an insurance company, or a utility company and they say: “What's your experience with people with the following attributes?” And those following attributes are equivalence classes of five or more. It's not identifying and they hear back. They're actually able to probabilistically determine the likelihood that someone would be a good customer, not because they know it's a shared customer, but they know they have shared attributes. This is the future of analytics. This is how you feed AI and machine learning. This is how you enable people to continue to use data.
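A minimal Python sketch of the "equivalence classes of five or more" rule: a partner answers a question about shared attributes only with cohort-level statistics, and suppresses any cohort that is too small. The attribute names, the `defaulted` flag, and the sample data are hypothetical; the talk does not describe the bank's actual query interface.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

MIN_COHORT_SIZE = 5  # threshold mentioned in the talk: classes of five or more


def cohort_default_rates(
    customers: List[Dict[str, object]],
    attributes: Sequence[str],
) -> Dict[Tuple[object, ...], float]:
    """Group customers by shared attributes and return the default rate per
    group, dropping every group smaller than MIN_COHORT_SIZE."""
    groups: Dict[Tuple[object, ...], List[bool]] = defaultdict(list)
    for customer in customers:
        key = tuple(customer[a] for a in attributes)
        groups[key].append(bool(customer["defaulted"]))
    return {
        key: sum(flags) / len(flags)
        for key, flags in groups.items()
        if len(flags) >= MIN_COHORT_SIZE
    }


# Six customers share one attribute combination; two share another.
customers = (
    [{"age_band": "30-39", "area": "urban", "defaulted": i == 0} for i in range(6)]
    + [{"age_band": "60-69", "area": "rural", "defaulted": False} for _ in range(2)]
)

rates = cohort_default_rates(customers, ["age_band", "area"])
print(rates)  # the two-person cohort is suppressed
```

The requester learns how people with those attributes tend to behave, but never receives a row about an individual or a group small enough to single someone out.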

[00:49:08] On Monday, we were in Paris meeting with the CNIL. Do you know why? The French have announced a $1.5 Billion AI initiative. Let's think about that. The government is going to share data so you can help feed these AI algorithms so that people can come up with new discoveries. That's fine. How do you do that without revealing identities? And so, the reality is the future is balancing information value with the need to protect identity, and the way you do that is you separate them one from another. This is a visual depiction of this Italian bank that I talked to you about. If you look at it, the dark blue represents proprietary identifying data. The dark green represents identifying proprietary data that's owned by two separate legal entities, but they can create Variant Twins that share the same schema. The same attribute schema. That’s the gold box. They can combine those boxes. They can analyze the combination thereof. And each takes that back to their own firewall, their own protected environment, and they re-link back to the individuals. Again, separation of information value from re-identifiability.

4. Operational Efficiency (Data Transfer)

[00:50:52] And this is the last one. This is one that's rather interesting and that is a major bank has so many privacy issues both because of data sovereignty issues and security issues that when there's a problem with data, they send the people to the data. That's incredibly inefficient and expensive. Whereas, you could send Variant Twin versions of the data to people to remedy the problem. So, that's it for my presentation and we're looking forward to questions at this point. So, please raise your hand if you have a question.

Audience Question

[00:50:55] In my background, I've done a couple of things. So, I'll ask two questions. One, I used to work as a privacy officer for an entity that looked into health systems that collected cancer research data, and one of the big issues was the longitudinal data for population surveillance studies and ethnicity in this case for that site comes up for that, but I'm also engaged now with some other players in the field that are starting to do - I don't know if you've heard these phrases - self-sovereign identity and various things like that. So, if you get a situation where self-sovereign identity takes off and you’ve got an otherwise cryptographically unique relationship so each individual has actually unique presentations with the various backend data stores, does that give you any problems?

Gary LaFever (Anonos)

[00:51:45] So, a couple of things. First off, to go to your point indirectly and not go to it directly, I'm actually flying first thing tomorrow to Oslo meeting with the Norwegian Data Protection Authority at the request of a life insurance company. Here's their concern. They were told by their technology team and not their CPO that they have to delete all the data when somebody leaves the insurance company. So, their concern is what happens when somebody comes back later to either ask for coverage or there's a client that they have no history whatsoever. And so, we're going together to the Data Protection Authority to discuss that this has societal impact. And so, what you can do, if you think of what a Variant Twin represents - a Variant Twin represents a non-identifying version of data with the ability to re-link back to the original data. You can sever the linkages and give the key that includes the linkages and the original data to the data subject. So, if they ever do come back, you actually can realign all of that.

[00:52:43] And so, the first thing I want to go to is the fact that there are impacts of some of these laws that, if you don't think them through, can actually have negative impacts on the data subjects and society as a whole. Secondly, with regard to that specific question that you asked, the easiest way to think of the way that Anonos does what it does - and people may well be familiar with the example that's often cited - is that you can get from the US Department of Census three datasets of US residents from the last census: by zip code, by age, and by gender. Each of those is anonymous according to US definitions of the term because people's names aren’t in it. The problem is my name was replaced in each of those three datasets by the same token.

[00:53:25] And if you combine those three datasets, it's been proven that you can re-identify up to 87% of the US population by name simply because I was given the ABCD in the first one, ABCD in the second one, and ABCD in the third. The easiest way to think of the easy implementation of our technology is what if I’m ABCD on the first one, Q99 in the third, and DDID in the second? The reality is each of those datasets is still accurate. You can't figure out who anybody is in any of them by themselves. And unless you have access to and permission to see the mapping table and overall key, you don't know that ABCD equals Q99 equals DDID. So, yes, you can in fact cover that. Thanks. Allison, how about you?
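The three-dataset example can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not Anonos's implementation: the function names, the use of `secrets.token_hex` for tokens, and the sample values are all hypothetical. Each person gets a different random token in each dataset, and the mapping table (the "key") is kept separately.

```python
import secrets
from typing import Dict, List, Tuple

Rows = List[Tuple[str, str]]  # (identifier or token, value)


def pseudonymise(datasets: Dict[str, Rows]) -> Tuple[Dict[str, Rows], Dict[Tuple[str, str], str]]:
    """Replace each person's identifier with a fresh random token per dataset.
    Returns the protected datasets and the mapping table that must be held
    separately under access control."""
    protected: Dict[str, Rows] = {}
    key: Dict[Tuple[str, str], str] = {}  # (dataset name, token) -> person
    for name, rows in datasets.items():
        tokens: Dict[str, str] = {}
        out: Rows = []
        for person, value in rows:
            if person not in tokens:
                tokens[person] = secrets.token_hex(8)  # new token for this dataset only
                key[(name, tokens[person])] = person
            out.append((tokens[person], value))
        protected[name] = out
    return protected, key


def relink(protected: Dict[str, Rows], key: Dict[Tuple[str, str], str]) -> Dict[str, Dict[str, str]]:
    """Only a party holding the mapping table can recombine the datasets."""
    linked: Dict[str, Dict[str, str]] = {}
    for name, rows in protected.items():
        for token, value in rows:
            linked.setdefault(key[(name, token)], {})[name] = value
    return linked


datasets = {
    "zip": [("alice", "02139"), ("bob", "10001")],
    "age": [("alice", "34"), ("bob", "51")],
    "gender": [("alice", "F"), ("bob", "M")],
}

protected, key = pseudonymise(datasets)
print(protected)              # tokens differ from dataset to dataset
print(relink(protected, key)) # re-linking works only with the key
```

Without `key`, joining `protected["zip"]` to `protected["age"]` on the token column finds no matches, which is the property the static, same-token-everywhere approach lacks.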

Dr. Allison Knight (University of Southampton)

[00:54:08] So, in my language and from where I come from, I think what you're talking about is data stores, so it’s this concept that you can actually give people control of the linkages, in particular in identity assurance testing, and I think that's actually possible using this same type of technology. So, actually, the government is coming up with something called Verify, which is a UK identity assurance scheme, and it's really based on this model that you're using a trusted third-party intermediary. So, use that trust level. Why do we need to know exactly who we're transacting with? You know, it's like having a credit card. Why does it have our names on it anyway? Why isn't it Mickey Mouse? So, why is that important to know? So, I think, yes, you're absolutely right. There is potential. This is a field of research, but how are we going to do it? Those principles that Gary and I found at that meeting of minds, I think, that's the key for the future way in which we’re going to do this stuff.

Audience Question

[00:55:05] So, I haven't read your paper and I'm just going off of what you said and I'm going to ask you a question if I'm interpreting correctly because I didn't see the definition of Variant Twins. Maybe you had one up there, but I didn't see it. So, it sounds to me like you're having temporary tokens that represent k-anonymous and l-diverse subsets of your entire set. Would that be an accurate description?

Gary LaFever (Anonos)

[00:55:34] Correct and then you're maintaining the mapping between them. Absolutely. You clearly know what you're talking about. Any other questions?
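For readers unfamiliar with the terms in the question, the Python sketch below measures k-anonymity and the simplest ("distinct") form of l-diversity for a small table. The column names and rows are made up for illustration, and the check says nothing about how Variant Twins are actually generated.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def k_and_l(
    rows: List[Dict[str, str]],
    quasi_identifiers: Sequence[str],
    sensitive: str,
) -> Tuple[int, int]:
    """Return (k, l): k is the size of the smallest group of rows sharing the
    same quasi-identifier values; l is the smallest number of distinct
    sensitive values found in any such group."""
    if not rows:
        raise ValueError("cannot measure an empty table")
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_identifiers)].append(row[sensitive])
    k = min(len(values) for values in groups.values())
    l = min(len(set(values)) for values in groups.values())
    return k, l


rows = [
    {"age_band": "30-39", "region": "North", "diagnosis": "flu"},
    {"age_band": "30-39", "region": "North", "diagnosis": "flu"},
    {"age_band": "30-39", "region": "North", "diagnosis": "asthma"},
    {"age_band": "40-49", "region": "South", "diagnosis": "flu"},
    {"age_band": "40-49", "region": "South", "diagnosis": "diabetes"},
]

print(k_and_l(rows, ["age_band", "region"], "diagnosis"))
```

A table is k-anonymous if every combination of quasi-identifiers is shared by at least k rows, and l-diverse (in the distinct sense) if each of those groups also contains at least l different sensitive values.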

Audience Question

[00:55:40] So, the organization I work for supplies products and services direct into the NHS. Three separate types of business. So, there’s very little personal identifiable information in there. The second one is for staffing so putting the temp staff from agencies into trusts, and then the third portion is staff flow. So, you may have a temp worker logging in on one trust who has the skills and attributes to be able to fill the vacancies within other trusts for various different reasons. So, there are privacy impact assessments being conducted within the data processing of grievances created that’s going to be a set of anonymised data and we would only receive the anonymised data for the temp employee analytics. When we start doing staff flow, so one employee is moving to a different trust, we need the un-anonymised data. The trusts are actually the data controller. We’re the data processor. So, there’s been a complete change of what that use of the data is going to be for. We would’ve originally received the data either from the trust or from an agency that's anonymised but then during that person’s employment they have provided us with all of the things that then identify that person as ABCD123. I guess the question is: How does that sit and how do you get around that because there’s this massive data creep but the only way that we can do business is by there being the data that is accepted by all parties that this is going to happen, but there’s no documentation to actually say we should have that. There is no processing agreement as all of the agreements from the agencies and from the trusts directing the data subject. There is no agreement direct with us as a processor.

Gary LaFever (Anonos)

[00:58:34] So, Alison, do you want to take that one first?

Dr. Allison Knight (University of Southampton)

[00:58:36] Well, whenever anyone ever says that the agreement is that data is anonymised, my hair straightens up into the air. Part of the problem is, what does anonymisation even mean, in particular under the existing regime? You know, traditionally, it's been touted sometimes that anonymous data means there's no risk of re-identification. But of course we're not living in that world anymore. Re-identification is a risk. It's a long tail of risk. So, I guess without getting into the details here, I’d ask how was it anonymised? But let's put that to one side. I think function creep, or you said data creep, so once you have identifiability - and I mean the GDPR doesn’t say that you can’t have some residual re-identification risk. What it says is that it depends on the means reasonably likely to be used by the organization or perhaps the third party in certain circumstances to re-identify.

[00:59:35] So, I think you would have to sit down and try to work out what those consequences are. But really from a pragmatic point of view, I think you're right that at that point you're already there. You've already got the risk that it's personal data. So, you already have to be thinking about whether you'll comply. And this is a problem when it's all about terminology and it's about - once we get to a late stage, we're left with a bit of a mess. How much better it is if we could - how are we going to define the anonymisation process, or Pseudonymisation as the ideal when we want to keep those individual data points. That's why we use the word Pseudonymisation. We want the individual data points so we can do things with them. We're not talking about advocating and taking away. So, I guess there are a lot of questions that you need to go through. But the next time, yes, you're right, it needs to be in a data processing agreement and then it's actually thinking about these things from the start. You know, I'm in a similar boat. The University gets information from other sources. Of course, there’s this question about transparency. At what point does the individual need to know what you're doing with their data at the subsequent point? But I would say fundamentally, first of all, you need to have that secure flow under Data Protection by Design to get you onto that right footing going forward.

Gary LaFever (Anonos)

[01:00:50] So, I was on a video conference call yesterday. And without consciously doing it, the person on the other side used the term “anonymisation” and I went like this as if I was warding off some kind of Dracula or a demon. And the reason is, I think the word anonymisation gives us a false sense of security for all the reasons that Allison said, and Pseudonymisation is hard to say and it’s harder to spell. Is it an S? Is it a Z? And by the way, it’s not pseudo-anonymisation because that word doesn't even make sense. But the good thing about Pseudonymisation, and I have been mentioning this - the definition of Pseudonymisation under the GDPR has never existed before. It literally says: “I need to separate the information value of data from the means to re-identify. There must be additional information necessary to reunite those and that must be kept under controlled conditions and only provided to authorized parties.”

[01:01:49] That's why if I replace a value with the same token every time, I don't need additional information. I can draw those correlations without anything. That's why to me - to us - static tokenisation does not satisfy the definition of Pseudonymization, and Pseudonymization is not required anywhere under the GDPR. It's never required. It is encouraged 13 times and there are significant benefits of using it. So, what I would say to you is first, don't use the A word. Use the P word right and then think about it. If you can show that technologically the data itself has been protected so that when it comes back and you really do have to do the re-linking that it requires additional technology and the data is only available to select people. That and a good privacy impact assessment and you should probably be okay, but I would avoid the A word because I really think it gets people in trouble and I would actually look for the technical measures.

Audience Question

[01:02:47] So, this A word is a stipulation from the data controller within their processing agreements you will only receive anonymised information. It can’t be anonymised information.

Gary LaFever (Anonos)

[01:03:11] I think it's Article 4 (5) of the GDPR. I’ll send it to him and just say: “Read the definition of Pseudonymisation.” Any other questions? We have a little bit more time.

Audience Question

[01:03:17] I was thinking if a data processor hosts data that belongs to a controller and then wants to do product development through analytics, would they rather rely and say that this data is anonymous so GDPR doesn't apply after applying strong Pseudonymisation or that they actually become a controller and then the data subject right is documented?

Dr. Allison Knight (University of Southampton)

[01:03:44] I mean, this is a tension that's throughout. You know, as in are people going to put all the eggs in one basket and say it’s anonymised, which we don't like that word, but anonymous for the purpose of GDPR, which means that you've got Recital 26, which means that if you can't re-identify that person by the means reasonably likely to be used, then it's non-personal data. But the problem with the dynamic view is that “reasonably likely” depends on perspective: how, from whose perspective, and in what ways? I mean, even identifiability can be defined in different ways. It's very difficult to put your hand down. So, we love it. So, that's what you would do if you're trying to do analytics. You would handle the data in a pseudonymised form at the least. I mean, if you can take away all the individual data points and achieve your aim, great. Do that.

[01:04:33] But if you can't, at the very least pseudonymise it so you've got a good argument with the regulator, if any problem arises, that it's non-personal data. But we're in a regime and era now where you can’t assume and put your eggs in one basket, because if things change it could be treated as personal data. So, you need to contract to say that it is personal data. Whether they then become controllers is actually a slightly different issue. It really depends on what they're doing. That relationship could be. I mean, in the Data Pitch Project, we've talked a lot about that, but I would say that's slightly a separate issue from whether it's personal data or not. It's really who has the means and the purposes and who's driving this relationship, and always seek some legal advice around that.

Audience Question

[01:05:14] To your point as well, I think you touched on it a number of times in your presentation and I think it can't be overstated: the importance of controlled sharing. Because the minute you - and I always tell my clients this in presentations as well - never just release a dataset you think is anonymised or doesn't have any personal data into the wild because, as you mentioned, it's dynamic and we're regenerating a thousand points of light and a million points of light in petabytes. So, I think in response to that, you're always going to have a contract and technical and other organizational measures that control how it's going to be used and require them to play within that little sandbox. And if they want to make any other changes, then they have to come back and you have to do a new risk assessment to see if it can be re-identified in light of the changing environment.

Dr. Allison Knight (University of Southampton)

[01:06:01] Yeah. That's absolutely right. So, break it down into time periods as your purpose changes as the data or what we call the data environment changes. And you're actually right as well. You can't just say technology was there. It's a combination of legal. No one would share that data with that data legacy agreement. We're not going to get rid of lawyers and I can say that because I'm a lawyer. Fortunately, maybe in 20 years’ time. But you know you've got to have the right thing and the organizational - so this actually sits on top of what we've been learning about today about the GDPR organizational, the accountability, the keeping records, the getting lawyers involved without feeling like the legalese is overwhelming us and there are too many agreements, and then as well as that we understand the framework and we use technology to do the work for us and we don't have to do so much work. So, it's a culminated suit of armor that we're putting on there and understanding that is key.

Audience Question

[01:06:55] And I'm just wondering and back to you, with that in mind then, you provide the technology and the tool, but do you also sort of caution your clients that this is not the magic button?

Gary LaFever (Anonos)

[01:07:02] It is not a silver bullet. In fact, what it's about is really - we were at the IAPP Global Conference in DC a couple of weeks ago and someone came up to our booth and they said: “My company just spent over $50 billion to buy a healthcare company. I have 12 people in my compliance department. And before the acquisition, we had 700 data scientists. After the acquisition, we're going to have 1400 data scientists.” Guess how many people are going to be in the compliance department? 12. And so, their question to us was: “What do you actually do?” And the answer when you cut through the k-anonymity and the l-diversity, which is important, is we programmatically enforce your policies. It's not a silver bullet, but you can now create policies that are technologically enforced. They’re your policies. Not ours. It's your data. Not ours.
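
As a rough illustration of what "technologically enforced" policy can mean, the sketch below filters a record down to the fields a policy permits for a stated purpose. The `Policy` class, the `enforce` function, and the field names are all hypothetical, introduced only for this example; they do not describe the Anonos product or any specific library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """A hypothetical policy: which fields may be used for a given purpose."""
    purpose: str
    allowed_fields: frozenset


def enforce(policy, record):
    """Return a copy of the record containing only the fields the policy allows."""
    return {key: value for key, value in record.items() if key in policy.allowed_fields}


# The organization writes the policy; the code applies it to every record.
analytics_policy = Policy(purpose="analytics", allowed_fields=frozenset({"age_band", "region"}))

record = {"name": "Alice Example", "age_band": "30-39", "region": "UK"}
print(enforce(analytics_policy, record))
```

The point of the sketch is the division of labor: a small compliance team defines `Policy` objects once, and every data scientist's query passes through `enforce` rather than relying on manual review.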

[01:07:55] And so, to us, Data Protection by Design and by Default is all about just that. No, it doesn't replace lawyers. It makes them superheroes. You see the T-shirt that we’re giving out, right? Because it programmatically gives you the technical ability to enforce the rules that you determined make sense. And so, 8 weeks from now when the Chief Data Officer comes in and says: “Can I still use my data?” Hopefully, you have this and you say: “Yes, you can. But guess what? You can do anything you used to do before but you have to do it differently and I'm here to help you figure out how to do that.” So, data analytics, artificial intelligence, machine learning, digital transformation, it's actually even more enabled under the GDPR, if you embrace the technical and organizational measures and the safeguards it requires and the data governance. I mean, it’s really what it's about. Any last questions? I think we have just a moment or two. Yes, please?

Audience Question

[01:09:00] I’m Lisa Tang. Sometimes when you have the government agencies - so, if you want to claim, like, VAT for example and then they want proof, evidence, a customer list or attendee list or something, can you then challenge them to say you don't really need this information and you just need to know the number of people attending the event so that you can then pay back the VAT? Could you then use this technology to say: “I'm going to give you this,” but it doesn't show you the personal data? And if they really need to see it, they can have the key to it, and that will only be given to a specific caseworker, because if you give them the list, it could be shared amongst other people in the organization.

Gary LaFever (Anonos)

[01:09:45] So, what you just described is what I think we've all hoped surveillance becomes, where surveillance is applied at the pattern matching level and not the individual identity level, and only when abnormal or dangerous patterns are observed can you pierce through to get identity. So, whether the government would accept that is another question. But I like to think that in time as people become more familiar with this and I'll speak to you - the debate will be what's the appropriate level of k-anonymity. If we ever get to that point, k-anonymity basically means there are at least k people in an equivalence class. Typically, it’s held to be five, which means that there are at least five people that can be identically described in the same category. The risk of re-identifying any one of them is no greater than 20%.

[01:10:35] So, if you could say: “Look, I had 10 cohorts and there were five in each, which means I had 50 people.” I had 50 people and yes I have the data to show the five in each of those. If they would take that, actually that would be a great result because you do have the data that supports it in a fine-grained or identifying manner, but it would only be pierced when justified. So, as odd as that sounds, my hope is someday we're arguing as to what is the appropriate level of k-anonymity. That would mean we've really advanced the point to where objective criteria used by regulators and governmental agencies will determine if we’re managing risk. Well, thank you very much. We appreciate the time.
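
The k-anonymity arithmetic in this answer (equivalence classes of at least five, a re-identification risk of at most 20%, and 10 cohorts of five making 50 people) can be checked with a short Python sketch. The helper names `k_anonymity` and `max_reidentification_risk` and the record layout are assumptions made for illustration, and the 1/k bound assumes an attacker can do no better than guessing uniformly within an equivalence class.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest equivalence class.

    An equivalence class is the set of records sharing the same values
    for all quasi-identifier fields. Returns 0 for an empty dataset.
    """
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(classes.values()) if classes else 0


def max_reidentification_risk(k):
    """Upper bound on the chance of picking the right person within a class of size k."""
    return 1 / k


# Ten cohorts with five people each, as in the example above.
records = [
    {"cohort": cohort, "person": f"person-{cohort}-{i}"}
    for cohort in range(10)
    for i in range(5)
]

k = k_anonymity(records, ["cohort"])
print(len(records), k, max_reidentification_risk(k))
```

Here the agency would see only the cohort counts; the `person` field stands in for the identifying detail that is held back and released only when piercing is justified.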

IAPP Europe Data Protection Intensive 2018
Just Because You're GDPR Compliant Does Not Mean You Can Use Your Data