How to Comply with the GDPR While Unlocking the Value of Big Data

2017 IAPP Webinar
Presentation Transcript
Gary LaFever
CEO at Anonos
Former Partner at Hogan Lovells
Mike Hintze
Partner at Hintze Law
Former Chief Privacy Counsel and Assistant General Counsel, Microsoft
Gwendal Le Grand
Director of Technology and Innovation at the CNIL
Read the summary from this “Don’t Miss” webinar featuring the French CNIL.
Featured Topics Covered:
  • The GDPR Increases Options for Organizations to Process Data
  • Support for Global Data-Driven Business Beyond GDPR Compliance
  • Controlled Re-Linking of Data Increases the Value of Data Analytics
How to Comply with the GDPR While Unlocking the Value of Big Data
Dave Cohen (IAPP)
[00:00] Welcome to the IAPP Web Conference on “How to Comply with the GDPR While Unlocking the Value of Big Data” brought to you today by Anonos. My name is Dave Cohen. I'm the IAPP’s Knowledge Manager, and I'll be your host for today's program. We'll be getting started with the presentation in just a minute. But before we do, a few program details. Participating in today's program will automatically provide IAPP Certified Privacy Professionals or the named registrants with one (1) CPE credit. Others who are listening in can apply for those credits through an easy-to-use online form on our website.

[00:34] We'd also like to remind you that today's program is being recorded and will be provided free to registered attendees approximately 48 hours following the live event. A link will be provided on one of the last slides in the presentation to access the recording, and we encourage you to ask questions at any time during the program by typing them into the Q&A field to the right of your PowerPoint window. And your questions will be answered by the presenters either during or after the presentation at their discretion. And now without any further ado, let's get started and I would like to introduce today's panelists.
Welcome & Introductions
[01:06] Gary LaFever is the CEO at Anonos and a former partner at Hogan Lovells. Gary, welcome to the program and can you tell us a little bit about your professional background in privacy and security?
Gary LaFever (Anonos)
[01:15] Yes, Dave, thank you. As Dave said, my name is Gary LaFever. My background is actually in risk management, technology, and law. And I'd like to personally welcome the nearly 600 people we've had registered for this event, which is literally a who's who of the international privacy and security community with representation of the Fortune 500 from almost every vertical as well as legislative and regulatory leaders. So, welcome. This kind of strong turnout I think really reflects a common goal among our community of privacy leaders to comply with the GDPR but in a way that still enables us to do big data analytics and it is possible to achieve both.

[01:57] This webinar will enable us to share information about how data protection and enhanced data value need no longer be viewed as opponents. Companies can more than comply with the GDPR. They can improve and grow their business. Prior to co-founding Anonos 5 years ago, my prior company was the leading real-time risk management technology vendor for the financial securities markets worldwide. But there, we focused on satisfying regulations, not on improving the business lives of users. But over the last 5 years at Anonos, we've leveraged our global risk management expertise to help companies go beyond merely satisfying regulations to advance their business goals as well. It is a rare instance to be able to do both at the same time - satisfy regulations and advance business goals.
Dave Cohen (IAPP)
[02:51] Terrific. Thanks, Gary. Welcome to the program, and thank you for your help with sponsoring the program for our IAPP members and others as well. We really appreciate it. And joining Gary on the panel, Mike Hintze is a partner at Hintze Law and is former Chief Privacy Counsel and Assistant General Counsel at Microsoft. Mike, welcome. Can you tell us a little bit about your background?
Mike Hintze (Hintze Law)
[03:09] Sure. As you noted, I am a partner at Hintze Law, a small Seattle-based law firm. Five of us are focused on privacy and data security exclusively. And before joining that firm last year, I was at Microsoft for 18 years where I was the Chief Privacy Counsel leading our work on privacy compliance on policy and strategy. I also teach data privacy law at the University of Washington Law School, and I'm involved in a number of professional organizations focused on privacy.
Dave Cohen (IAPP)
[03:50] Terrific. Thanks so much, Mike. It's great having you on the panel with us today too. And joining us from his office in Paris, we're very, very pleased to have with us today Gwendal Le Grand who is the Director of Technology and Innovation at the CNIL. Gwendal, thanks for joining us. Can you tell us a little bit about your professional background please?
Gwendal Le Grand (CNIL)
[04:08] Yeah. Thank you. Hello everyone! I'm Gwendal Le Grand, the Director of Technology and Innovation at the CNIL, which is the French Data Protection Authority. There, I supervise all the people doing the technical policy at the IT Experts Department. I also supervise the IT Operations Department, the innovation and foresight unit, and also the CNIL labs. I worked a lot with Article 29 especially in the work of the technology subgroup of Article 29. I'm the liaison officer of the Article 29 to ISO/IEC JTC1/SC27/WG5, which is doing the standards in privacy, and I have a background in computer science and telecommunications.
Dave Cohen (IAPP)
[04:49] Wonderful. Terrific. Gwendal, thank you so much for joining us. And as you can see, we have a rich depth of experience in various perspectives on the panel today. So, without any further ado, let's go ahead and get started. And Gary, I'll turn it over to you for that to begin this program. Gary, it's all yours.
Program Outline
Gary LaFever (Anonos)
[05:05] Thank you, Dave. So, we're going to start off. I'm going to speak briefly about readily linkable and controlled linkable data. As you registered for the webinar, there was a link for a whitepaper that Mike Hintze and I co-authored. I would strongly recommend you take a look at that if you have not already. It can also be downloaded at We're not going to go into a lot of detail on matters that are covered in the whitepaper because we really want to get to the Q&A session, but I will start off with a high-level overview of the difference between readily linkable and controlled linkable data.

[05:42] Mike Hintze is then going to pick up on the concept of maximizing value while staying in compliance, and Gwendal is going to really get to the meat of this with the technical requirements for data protection by default under the GDPR and really get into both the intent and the practice of it. And then the real value of this, we believe, will be the Q&A afterwards. So, as Dave said, please enter questions in the box to the right. We're already getting some. I want to let everyone know that we will respond to all questions. If we don't get a chance to do that during the webinar, they will be done afterwards. So, please do not hesitate.

[06:17] I also want to encourage not only questions but also observations, recommendations, suggestions, etc. We're going to be sharing everything that's put through the input interface. And also in the next 24 hours, if you want to submit questions, observations, and recommendations to We will be sharing the questions, the answers, and the interaction between the community on this webinar. So, thank you.
[06:45] This slide that you see in front of you, we actually added to the presentation after we got some of the questions that were submitted in advance. And those questions we'll get to later, but they dealt with the magnitude of the undertaking, the need to get different stakeholder groups within an organization together, and the question as to who those stakeholders should be and what they should represent. And we thought it was important to add this slide because the reality is, while most of our discussion today is going to be about compliance and existing data, it's what you can do with new innovation, with both existing and new data, in a compliant fashion that really makes GDPR compliance very, very powerful.

[07:31] And in fact, in this regard, I would encourage anyone to check out the blog that Hilary Wandall is posting today at It’s great. It’s entitled “Maximizing Data Utility Under GDPR.” She mentions the whitepaper and this webinar. But more importantly, she carries over a blog that she started in December talking about privacy professionals being business enablers, and that's a lot of what this webinar is about. And so, I would encourage you to check out that webinar and how it's important to both partner with the appropriate stakeholders and constituencies within your organization and to drive that constant evaluation about both the value and cost of data.

[08:22] And really, the GDPR certainly inserts new penalties and downsides if you do things wrong. But I hope you will get from this webinar the message that if done right, it also empowers a lot of positive business things. And so, those things have to be taken to the whole organization. And so, on this slide, as I said, what we're focusing on is the fact that there's a lot of positive that comes from the GDPR as well.
Obfuscating Connections Between Data Elements
[08:50] I also want to mention that all of us on the webinar purposely decided that some of these slides may be a little more text-heavy than what you're used to seeing in a webinar. By the end of the day, you will all receive a copy of this deck. And so, you'll have it to refer to. And then also within the balance of the week, you'll get a copy of the replay, the Q&A, and the interaction between the community.

[09:14] So, what I'm trying to show on this slide is basically, on the left-hand side, readily linkable data where re-identification is possible. The primary thing that you will see is that the relationships between the data elements are clearly evident and linkable, but the quality and vibrancy of the data is muted. Whereas on the right-hand side, each of the data elements is much more vibrant. They also can be viewed independently, and the relationship between them is not evident unless and until an authorized party determines that will be the case.
The GDPR requires FUNDAMENTAL CHANGES in data processing
[09:51] So, we all know by now, the GDPR requires fundamental changes in data processing. And I think the reason there are so many people on this webinar is that people are realizing now that merely complying with the GDPR, while that will mitigate the downside and the penalties, does not ensure your ability to continue to do business from a data analytics perspective. And it's because the GDPR requires much more than just privacy by design. It requires the most stringent form of privacy by design and data protection by default, and that is a new requirement. But that new requirement can actually improve data privacy, security, and accuracy. So, the benefit of the GDPR, if done correctly, is you actually can be in a better place for your business.
Timeline of Changing Data Processing Priorities
[10:45] This slide, I realize, may be a little hard to read. But as I mentioned, you will get a copy of this. This slide is intended to show a historical progression - a timeline of what our focus and priorities have been in data processing. Initially, as evidenced by the green light, it was all about the data. And it says here that the focus was convenience of data processing. I do not mean by this that data processing was easy. And a lot of innovation and effort and money has been spent in coming up with new ways to capture, retain, manage, and process data. But it was focused primarily on the data.

[11:25] Then, you get to the caution light. And that was when we started to realize the power of linkages. Those linkages could enable you to make correlations, discoveries, all kinds of things between data, particularly data from different sources. But we also had to be cautious because those linkages which opened up all those value propositions also exposed identities of data subjects, and that's when data privacy by design started to come into vogue.

[11:54] We're now at the third stoplight. It's a red light. It's a red light because what's been realized is this convenience of data processing came at the cost of the fundamental rights of data subjects. But if viewed in the right light, the GDPR actually provides an answer to that. Data protection by default enables us to respect, honor, and protect the fundamental rights of data subjects while actually opening up new business opportunities.

[12:27] And that's why we end up at the green light because there are positive attributes to this. And so, the convenience of data processing now drops down: it's not going to be as easy to do things the way we've done them in the past, and organizations will have to make fundamental changes in how they process data. But the upside of those changes is new business opportunities and data utilization.
Data Protection by Default enables a new form of de-identification: Controlled Linkable Data
[12:54] So, data protection by default as required under the GDPR actually enables a whole new form of de-identification, and Mike Hintze and I go into detail in the whitepaper on the subject. We chose the term “controlled linkable data” purposely. It is not a term that's been defined in statute. It's not a word like anonymity that has many different interpretations around the globe. It literally means what it says. There's data that can be linked, but there are adequate controls over that linkability. Traditional technologies that have been around for decades were invented, refined, and perfected years before data protection by default was even conceived. As a result, they do not by themselves achieve controlled linkable data, and that's why new technologies and new approaches are necessary. But importantly, controlled linkable data supports GDPR requirements for data protection by default.
Explanation of Controlled Linkable Data
[13:53] This slide again goes into a lot of detail. I'm going to go over it very quickly, but details are in the whitepaper. And what this slide shows is the fundamentals of controlled linkable data and you'll see that it mirrors the requirements of data protection by default. The first thing that you do is you sever the data into discrete elements and you protect each of those elements so that they're each protected by default. Also, correlations between those data elements are obscured. So, it's much more difficult, if not impossible, to carry out linkage attacks or re-identification via the mosaic effect.

[14:35] So, the three steps are: You begin with data, you separate it into data elements, and you replace persistent identifiers with non-persistent pseudonymous tokens. What does that give you? It gives you the benefit of number two, which is granularized re-identification. And when you can show data in context with adequate controls, you actually can generate greater value.
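The three steps can be illustrated with a minimal Python sketch. It is not the implementation described in the whitepaper: the function names `sever_record` and `relink`, the field names, and the sample record are hypothetical, and the random tokens stand in for whatever non-persistent pseudonymous tokens a real system would issue. Each data element gets its own fresh token, and the table that links tokens back to a person is returned separately so it can be held under access control.

```python
import secrets


def sever_record(record, id_field):
    """Split one record into standalone elements, each under its own random token.

    Returns (elements, link_table). The elements carry no persistent identifier;
    link_table maps every token back to the original identifier and is meant to
    be stored separately under access control (an assumption of this sketch).
    """
    subject_id = record[id_field]
    elements = []
    link_table = {}
    for field, value in record.items():
        if field == id_field:
            continue
        token = secrets.token_hex(8)  # fresh token per element, never reused
        elements.append({"token": token, "field": field, "value": value})
        link_table[token] = subject_id
    return elements, link_table


def relink(elements, link_table, authorized):
    """Rebuild per-subject records; only an authorized caller may do so."""
    if not authorized:
        raise PermissionError("re-linking requires authorization")
    records = {}
    for element in elements:
        subject_id = link_table[element["token"]]
        records.setdefault(subject_id, {})[element["field"]] = element["value"]
    return records


# Hypothetical sample record
record = {"user_id": "u-1001", "city": "Paris", "age_band": "30-39"}
elements, link_table = sever_record(record, "user_id")
```

Anyone holding only `elements` sees values under unrelated tokens. Re-linking them requires both the separately stored `link_table` and authorization, which is the "controlled" part of controlled linkable data in this simplified picture.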
Readily Linkable vs. Controlled Linkable
[15:03] This next slide highlights this difference between readily linkable and controlled linkable data, and it will help to give a couple of examples. One is actually rather humorous. As we put word out regarding this webinar, we started to get very strong responses. As I said, we've had nearly 600 people register. But we still wanted to have a proactive outreach to leaders in the community that we thought should not only participate and see the webinar, but also contribute to the quality of the Q&A and the follow up.

[15:36] And in doing so, we asked the IAPP for a list of who had registered up until that time, not because we wanted to reach out to them, not because we wanted to link to them, but for the exact opposite reason. We didn't want to overly inundate companies that had already sent a representative to the webinar and signed up for it. Rather, we wanted to go out to different companies and it actually took a little bit of time for us to negotiate that with the IAPP. Why? Because the IAPP, no surprise, takes privacy very seriously.

[16:08] But what they realized is we weren't asking for the “who.” We were asking for the “what.” Not who people were but what companies they represented. And that highlights the default before the GDPR, which was that just about all data was tied to the “who.” And so in order to get to the where, why, what, or when of a data element, you went through the “who” when in fact those elements have freestanding value on their own.
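As a rough illustration of sharing the “what” without the “who,” the Python sketch below reduces a registration list to the set of companies represented. The function name, field names, and sample rows are invented for the example; they are not IAPP data.

```python
def companies_represented(registrations):
    """Return the distinct companies in a registration list, with no names or emails."""
    return sorted({row["company"] for row in registrations})


# Hypothetical registration rows
registrations = [
    {"name": "Registrant A", "email": "a@example.com", "company": "Example Bank"},
    {"name": "Registrant B", "email": "b@example.com", "company": "Example Retail"},
    {"name": "Registrant C", "email": "c@example.org", "company": "Example Bank"},
]
shared = companies_represented(registrations)
```

The recipient learns which companies are already covered and nothing about the individuals who registered.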

[16:38] A question for yourself: When you have an app on your phone that's a map app and you launch it, why does the provider of that map app have to know who you are? They don't. They have a legitimate reason to know if you're registered for the service, which doesn't require knowing who you are, and they have a very important need to know where you are and where you want to go. But you could just as easily service users of a map app without finding out who they are. A different identifier could be sent every time, which could be checked to make sure that the person was registered for the service and which could carry their GPS location so they could get from where they are to where they want to go. It's a simple example of how relying not on the “who” but on the value of the data element itself still provides high value, in a way that does not jeopardize the data rights of individuals.
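The map example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any real map provider: the `MapService` class and its methods are hypothetical, and the service is assumed not to record which account received which tokens (a production design would need something like blind signatures to guarantee that). Each request carries a different single-use token that proves registration without saying who is asking.

```python
import secrets


class MapService:
    """Hypothetical service that checks registration without learning identity."""

    def __init__(self):
        # Unused tokens only; deliberately not stored against any account.
        self._valid_tokens = set()

    def issue_tokens(self, count):
        """Hand a registered user a batch of single-use tokens."""
        tokens = [secrets.token_urlsafe(16) for _ in range(count)]
        self._valid_tokens.update(tokens)
        return tokens

    def route(self, token, origin, destination):
        """Serve a route request if the token is valid, then retire the token."""
        if token not in self._valid_tokens:
            raise PermissionError("unknown or already-used token")
        self._valid_tokens.remove(token)
        return f"route from {origin} to {destination}"


service = MapService()
wallet = service.issue_tokens(3)  # held on the user's device
first = service.route(wallet.pop(), "48.8566,2.3522", "48.8606,2.3376")
```

Because every request uses a fresh token, two requests from the same person cannot be tied together by the identifier alone, yet the service still gets the location data it needs to compute the route.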
Data Protection by Default
[17:34] So just very quickly, traditional approaches to privacy were, as I mentioned before, developed years before data protection by default, and therefore it's not surprising they fail to support it by themselves. Don't get me wrong. There are a lot of fantastic privacy enhancement techniques and security techniques out there and they still do a fantastic job of what they do. They simply were not designed to do what we need to do today, which is data protection by default. They, therefore, provide inadequate privacy protection in and of themselves.

[18:07] And unfortunately, many privacy enhancement techniques actually reduce the value of data. Whether it's k-anonymity, l-diversity, or differential privacy, the way they go about protecting privacy is by reducing the level of accuracy. As you will see in more detail in the whitepaper, there are other ways to protect privacy where you can still leverage privacy enhancing techniques, but in a way that maximizes that value. And so, you end up with data protection by default, which actually allows you to retain up to 100% of the value. At the same time, you're improving security and privacy and supporting greater use and sharing of data.
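To see why a technique such as k-anonymity trades accuracy for protection, consider a small Python sketch with made-up ages as the only quasi-identifier. Generalizing exact ages into ten-year bands raises k, the size of the smallest group of indistinguishable records, but every record loses precision. The helper names and the data are illustrative only.

```python
from collections import Counter


def generalize_age(age, width):
    """Replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def smallest_group(values):
    """Return k: the size of the smallest group of identical values."""
    return min(Counter(values).values())


ages = [31, 34, 36, 38, 42, 47]  # hypothetical data set
exact_k = smallest_group(ages)  # every exact age is unique

banded = [generalize_age(age, 10) for age in ages]
banded_k = smallest_group(banded)  # the smallest band holds two people
```

With exact ages each person stands alone; after banding, nobody is in a group smaller than two, but an analyst can no longer tell a 31-year-old from a 38-year-old.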
Benefits of Data Protection by Default
[18:50] So, this slide highlights the benefits to big data of traditional approaches (the readily linkable) and the new approaches (the controlled linkable). So, with the readily linkable, you're often left with binary alternatives. In the example with the IAPP, they felt like they either had to give us all the information, which revealed identities of registrants, or nothing, when in fact they could just give us (which they did) the companies that were represented. The other outcome is reduced value. Many privacy enhancing techniques are premised on reducing the value of the data, as compared to controlled linkable data where you actually have the ability to retain up to 100% of the value. And because you have greater protection of privacy rights, you actually now can increase the number of data sources used.
New GDPR Requirements May Soon Prevent You From Doing Big Data Analytics
[19:37] And the last slide before I hand it off to Mike Hintze. This comes from the whitepaper. The infographic on the left is meant as a visual to highlight the old approaches to privacy, where you're using linked or readily linkable data, which may well actually prohibit you from doing big data analytics, machine learning, and artificial intelligence. Whereas data protection by default with controlled linkable data actually enables you to continue to do that. And with that, I'd like to hand this over to Mike Hintze.
General Data Protection Regulation
Mike Hintze (Hintze Law)
[20:07] Great. Thanks, Gary. I'd like at this point to take a little step back and look at the GDPR specifically and what's required under the GDPR. And obviously, we don't have time to go through a full overview of everything that's new in the GDPR in this seminar, but I think it's worthwhile to look at a few things. And very clearly, the GDPR is a significant step beyond what was required under the 1995 Directive. There are more obligations on data controllers and processors both in terms of substantive obligations, meeting new data subject rights, and a number of procedural obligations as well around process: data protection by design and by default, data protection impact assessments, and the like.

[20:58] As I mentioned, there are new rights for data subjects. It's potentially more difficult to rely on consent. The definition of consent has changed slightly with a couple more adjectives added in there. There are more prescriptive notice requirements. There are new data breach obligations, restrictions on profiling, and, as Gary has talked about, the requirements for data protection by design and by default. Obviously, the new penalties have gotten a lot of attention. The 4% of a company's annual worldwide revenue is potentially a lot of money, much more fining authority than existed under existing law in most countries.

[21:43] And add to that the fact that a lot of it is still unclear. There's clearly a direction that has been given by the regulators. There are things that we know need to be done, but there is still a lot of uncertainty. And I think some people have looked at all this and sort of thrown up their hands and said: “I don't even know where to start. I don't know what to do. There's just too much here. There's too much uncertainty. This is just impossible.” Well, it's not impossible. It's clear that compliance will require adopting a number of new measures, some of which Gary talked about. But there are fairly clear paths forward for addressing much of what's in the GDPR.
Compliance Steps Necessary for GDPR
[22:26] So, in terms of what those compliance steps are, again, I can't, in the limited time we have, go through the full checklist of what companies should be doing at this point. But there are a number of things that have been called out in the GDPR that will be required. That includes appointing new personnel; ensuring that your internal policies and external privacy statements are updated to meet the new requirements; developing new employee training; and developing new or updating existing procedures, practices, tools, and technologies based on your own internal gap analysis of where you are now and where you need to get to, to ensure that you're in a position to comply in 2018 when the GDPR comes into effect. Those require the adoption of new processes and documentation around impact assessments for high-risk activities.

[23:25] If you're doing stuff with kids, there's going to be a need for new parental consent mechanisms based on the new provisions of the GDPR. You'll need ways to respond to the new data subject rights like the right to erasure or data portability. And then, the one thing that I really want to focus on today is the use of pseudonymisation, de-identification, and anonymisation to help comply with a number of the different provisions of the GDPR.
De-Identification Under the GDPR
[23:55] And I've spent a fair amount of time thinking about and writing about how the GDPR addresses de-identification. And the more I looked through the early drafts and then the final draft of the GDPR, I realized that the GDPR, much more so than the 1995 Directive, recognized a fuller spectrum, the different gradations that exist around de-identification. I think that's very important and that's a positive step forward. Under the 1995 Directive, it was almost sort of a binary all or nothing, either it's personal data or it's anonymous data. Under the GDPR, there are different variations and different gradations between those two that are recognized.

[24:48] First, and this actually isn't too different from the 1995 Directive, I think one thing that's important to look at is the definition of personal data, and that definition includes the concept of identified versus identifiable data. So, data that's identified is clearly data where, on its face or through some easy mechanism, you can identify who that person is. Identifiable is a much bigger bucket where it may not be apparent on its face, but there is at least a theoretical way to re-identify the person behind that data.

[25:29] But the GDPR actually recognizes, and this is new, two different variations in this concept of identifiable data. One is the explicit addition of a definition of pseudonymisation. Pseudonymisation is defined under Article 4(5) of the GDPR as the processing of personal data in a manner such that the personal data can no longer be attributed to the data subject without the use of additional information, provided that such additional information is kept separately and subject to technical and organizational measures to ensure that the personal data is not attributed to an identified or identifiable natural person. So, what that means is you've got some data. It's not apparent on its face, but there is some other data out there that would identify that data. But that other data is kept separate from the first chunk of data through technical and organizational safeguards.
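One common way to match the shape of this definition, offered here only as a sketch under stated assumptions and not as legal guidance, is a keyed hash: the secret key plays the role of the “additional information” and is stored apart from the data set. The Python below uses the standard-library `hmac`, `hashlib`, and `secrets` modules; the function name `pseudonymise`, the key handling, and the sample row are hypothetical.

```python
import hashlib
import hmac
import secrets


def pseudonymise(identifier, secret_key):
    """Derive a stable pseudonym from an identifier using HMAC-SHA256."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# The key is the "additional information": in this sketch it is assumed to live
# in a separate, access-controlled store, away from the pseudonymised data.
secret_key = secrets.token_bytes(32)

row = {"email": "alice@example.com", "purchases": 7}  # hypothetical record
pseudonymised_row = {
    "subject": pseudonymise(row["email"], secret_key),
    "purchases": row["purchases"],
}
```

Whoever holds the key can recompute the pseudonym for a known identifier and so re-attribute the row; without the key, the row cannot readily be tied back to the person. Because that route back exists, data treated this way still falls within the "identifiable" bucket described above rather than counting as anonymous.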

[26:27] There's a different provision under the GDPR, Article 11, that talks about a stronger level of de-identification, in which the controller is able to demonstrate that it's not in a position to identify the data subject. So, with certain types of pseudonymisation where the data is kept separate and there are technical and organizational safeguards, the controller still may be in a position to identify the data subject. The controller may, in effect, control those technical and organizational safeguards and be able to reverse them in the event of receiving a lawful order from a government or in response to receiving a data subject access request from the data subject him or herself.

[27:09] But Article 11 talks about a level of de-identification where the controller is not in a position to identify the data subject. And so, that’s a stronger level that's being recognized under the GDPR. And then, of course, the GDPR continues to recognize the highest level, the very high bar of anonymous data. And if you reach that bar, the GDPR requirements don't apply, just like the 95 Directive didn't apply if you met that very high bar for anonymous data. I know Gwendal is going to talk more about that and some of the things that the Working Party has said around anonymisation. So, I'll just leave that at that.

[27:47] But given these different gradations of de-identification or identifiability that have been recognized in the GDPR, you can see that it is recognizing a spectrum of de-identification much more so than existing law has. It goes from identified, through a couple of levels of identifiable, where one might be more readily identifiable and the Article 11 type of de-identified data is less readily identifiable or much more difficult to identify, and then to the highest level of anonymous or aggregate data.
Compliance benefits of De-Identification
[28:26] So, what does that mean? Adopting de-identification as a compliance mechanism can have a lot of benefits under the GDPR. A lot of the provisions of the GDPR are risk based. And it's quite clear that the stronger the level of de-identification you can apply to data, the lower the risk. And so, you can use de-identification to help demonstrate that you have adopted data protection by design and data protection by default as Gary suggested earlier. If you meet that higher level, that Article 11 type of de-identification, you can use it to obtain relief from certain specific obligations.

[29:05] Article 12 says that if you meet that level of de-identification, you're no longer subject to having to comply with data subject access requests, erasure requests, portability requests, and a list of other obligations. And that makes sense because if the controller is not able to connect that data back to an individual, there wouldn't be an ability to authenticate the person who's making the access request or the erasure request. And so, that's just recognizing the reality that once you reach a certain level of de-identification, you simply can't comply with those kinds of obligations. And therefore, you get that regulatory relief from an obligation to comply with those things. So, believe it or not, the GDPR does not require that you do the impossible.

[29:56] De-identification can also help meet security obligations because those are very risk based. The security obligations under the GDPR focus on adopting measures sufficient to ensure a level of security appropriate to the risk. And if you've applied de-identification, that level of risk goes down and the need for other security measures correspondingly can be reduced. It mitigates the risk of security breaches and notification obligations. Again, those obligations are based on the level of risk to the data subject. And if the data has been de-identified and there is a breach and de-identified data has been released, that can go into the calculus of what the risk is and therefore the need to provide notifications.

[30:46] This one's a little bit more controversial I think, but I do believe that de-identification can provide a stronger case for relying on legitimate interest as a basis for processing as opposed to data subject consent. Now, there was an earlier draft of the GDPR, in which the use of Pseudonymisation gave an automatic ability to rely on legitimate interests and that was taken out. So, it's not automatic that if you reach a certain level of de-identification, you therefore get to rely on legitimate interest. But if you look at the rules around legitimate interest, it talks about a balance between the interest of the controller and the fundamental rights and freedoms of the data subjects.

[31:30] And again, it's clear that when you have applied strong de-identification, there is a lower risk to data subjects' fundamental rights and freedoms. I think that plays into that calculus, and it gives the data controller or the data processor a much stronger case for relying on legitimate interest. Potentially there are several other benefits, which I've gone through in some of the things that I've written and talked about.
GDPR Obligations Through a De-ID Lens
[31:56] This is a chart that I put in a paper that I wrote last year that talks about some of the things that I just mentioned and some others where if you apply stronger levels of de-identification, it can help you with your compliance under the GDPR and can give you the ability to rely on measures that may be more pragmatic, may be easier, and may bring out the value of data to a greater degree. So, with that, I will turn it over to Gwendal at this point and let him complete, and then we’ll open up for questions afterwards.
WP29 on big data
Gwendal Le Grand Gwendal Le Grand (CNIL)
[32:39] Thank you. I just lost the presenter view. So, if you can please switch to the next slide. I will do a very quick presentation about the statement that Article 29 did about big data and also the opinion concerning anonymisation because I think this is really the heart of the topic that we're discussing today. So, there was a statement that was released back in 2014 and you will have the link directly in the slide that is available, which recognized that there are many benefits that are expected from the development of big data. Actually, an important part of the big data operations relied on the processing of personal data of individuals. It also raises important questions among which concerns with regard to the privacy and data protection rights of individuals.

[33:39] Benefits of big data analysis can be reached only under the condition that the corresponding privacy expectations of users are appropriately met and that the data protection rights are respected. In Europe, you know, we had the Directive, which was adopted in 1995. Now, we have the GDPR and there are also other relevant EU legal instruments, which ensure a very high level of protection of individuals by providing them with specific rights, which cannot be waived. These rights are applicable to the processing of personal data in big data operations. The principles are still valid in the era of big data, and they were updated when the GDPR was adopted to make the principles of the Directive more effective in practice.

[34:31] What DPAs (Data Protection Authorities) believe is that complying with these principles with the rules that are enshrined in our legal framework is the key element in creating and keeping the trust to develop a stable business model that is based on the processing of such data. So, this means that investment in privacy friendly solutions in anonymisation techniques is essential to ensure fair and effective competition between economic players.

[35:02] Now, when it comes to the use of the word “big data,” what we see from our DPA window is that big data covers actually a great number of data processing operations. And most of them, I would say are already very well identified. Indeed, there's a number of developments that are qualified today as big data that have long been implemented in many EU member states and that have been tackled with the existing legal framework and that will be tackled with the GDPR of course. They have already been addressed with the framework of the existing data protection rules whether at the EU or at the national level because you know that in Europe, the 95 Directive is for the moment transposed into national laws. This means, for instance, that in France we have a national law that is transposing the principles of the Directive. This will be harmonized with the GDPR because the GDPR is a regulation. You don't need to have transposition in national laws. So, it's a very strong harmonization tool at the European level.

[36:16] This legal framework has been addressing many big data applications because most of the time what we see is that the controllers know exactly for which purposes that data is going to be processed. There's one important point that is worth mentioning in the legal framework, which was already mentioned by the previous speakers, which is that anonymisation is a key trigger for big data because also in the context of the GDPR and also in the context of the Directive the rules do not apply. The rules on personal data protections do not apply to anonymous data, which means that anonymisation is an alternative to data erasure once the purposes of the processing have been fulfilled.

[37:05] So, now that we've said this, the question is: “How do we anonymise data? And how do we do this in the proper way?” So, the Article 29 Working Party has issued a number of policy documents which are relevant to the analysis of the privacy concerns that are raised with regard to big data and anonymisation. So, there's an opinion that was published in 2013 on purpose limitation. There's another opinion and I will speak more about this one. This was released in 2014 on anonymisation techniques. I can also mention probably the opinion on legitimate interest that was also published in 2014.
Anonymisation ... ?
[37:50] If I can focus a little bit on the Opinion 5/2014 on anonymisation techniques. This is a rather technical opinion in which Article 29 explains how to anonymise, technically speaking, a dataset and what mistakes should be avoided when you use a specific anonymisation technique. So, I would say it's a manual for anonymisation. It's like a toolbox and data controllers can pick a certain number of tools depending on their processing operations and the dataset they need to anonymise to design the appropriate anonymisation technique that is fitted to the dataset.

[38:40] I think one of the key messages in the opinion is the fact that we identified three criteria to qualify what is the quality of anonymisation techniques. And if the three criteria are met, then that means you're on the safe side. If the three criteria are not met, it means you have to be very cautious. You have to be very careful, and you have to do risk analysis concerning re-identification possibilities.

[39:14] So, now what are the three criteria? The first one is singling out, which basically corresponds to the possibility to isolate some or all of the records which identify an individual in the dataset. The second one is linkability. It's the ability to link at least two records concerning the same data subject or group of data subjects, either in the same database or in two different databases. The last criterion is inference, which is actually the possibility to deduce with a significant probability the value of an attribute from the values of a set of other attributes. So, deducing some information based on the dataset. So, as I said before, meeting these three criteria puts you on the safe side and means you will have an anonymous dataset and you can safely reuse the data. If you don't meet the three criteria, if you just meet two out of the three, then you need to be very careful and think twice before you actually reuse the data. Maybe you need additional safeguards. Maybe you need a combination of different techniques.
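
As a rough illustration of the first and third criteria, the Python sketch below checks a toy dataset for records that can be singled out and for attribute values that can be inferred with high probability. The dataset, the column names, the choice of quasi-identifiers, and the 0.9 threshold are all assumptions made for this example; they do not come from the Opinion. Linkability is not covered here because it needs a second dataset to compare against.

```python
from collections import Counter, defaultdict

# Hypothetical toy dataset; values and column names are illustrative only.
records = [
    {"age_band": "30-39", "postcode": "750", "diagnosis": "flu"},
    {"age_band": "30-39", "postcode": "750", "diagnosis": "flu"},
    {"age_band": "40-49", "postcode": "751", "diagnosis": "asthma"},
    {"age_band": "40-49", "postcode": "751", "diagnosis": "diabetes"},
    {"age_band": "50-59", "postcode": "752", "diagnosis": "asthma"},
]

# Assumption: these columns are the ones an attacker could know from elsewhere.
QUASI_IDENTIFIERS = ("age_band", "postcode")


def quasi_key(record):
    """Project a record onto its quasi-identifier values."""
    return tuple(record[column] for column in QUASI_IDENTIFIERS)


def singled_out(rows):
    """Return quasi-identifier combinations that match exactly one record."""
    counts = Counter(quasi_key(row) for row in rows)
    return [key for key, count in counts.items() if count == 1]


def inferable(rows, sensitive, threshold=0.9):
    """Return groups in which one sensitive value has a share >= threshold."""
    groups = defaultdict(list)
    for row in rows:
        groups[quasi_key(row)].append(row[sensitive])
    risky = {}
    for key, values in groups.items():
        value, count = Counter(values).most_common(1)[0]
        if count / len(values) >= threshold:
            risky[key] = value
    return risky


print(singled_out(records))
print(inferable(records, "diagnosis"))
```

In this toy data, one combination of age band and postcode matches a single record, and in two groups every record shares the same diagnosis, so neither criterion is met without further generalization or randomization.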

[40:27] The Opinion also presents a number of technical solutions to anonymise, and they're classified into two big families of anonymisation techniques, namely randomization and generalization. Now, I don't have time to go deeper into the different techniques that are described in the Opinion, but we explained what noise addition, permutation, differential privacy, k-anonymity, l-diversity, and t-closeness are and what type of assurance they can give you once you have applied them to your dataset. So, please refer to the Opinion. Please read the Opinion, and it will give you some good guidance about how to anonymise the dataset properly.
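
To make two of the generalization measures named above more concrete, here is a minimal Python sketch that computes k-anonymity and l-diversity for a toy table before and after generalizing exact ages into ten-year bands. The rows, the column names, and the helper names are hypothetical and chosen only for illustration; this is not the method prescribed by the Opinion.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_identifiers)].append(row)
    return classes


def k_anonymity(rows, quasi_identifiers):
    """k is the size of the smallest equivalence class."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return min(len(group) for group in classes.values())


def l_diversity(rows, quasi_identifiers, sensitive):
    """l is the smallest number of distinct sensitive values in any class."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return min(len({row[sensitive] for row in group}) for group in classes.values())


def generalize_age(row, width=10):
    """Replace an exact age with a band such as '30-39'."""
    low = (row["age"] // width) * width
    return {**row, "age": f"{low}-{low + width - 1}"}


# Hypothetical toy rows.
rows = [
    {"age": 31, "city": "Paris", "condition": "flu"},
    {"age": 38, "city": "Paris", "condition": "asthma"},
    {"age": 42, "city": "Lyon", "condition": "flu"},
    {"age": 47, "city": "Lyon", "condition": "flu"},
]
QUASI = ("age", "city")

generalized = [generalize_age(row) for row in rows]
print(k_anonymity(rows, QUASI))                       # before generalization
print(k_anonymity(generalized, QUASI))                # after generalization
print(l_diversity(generalized, QUASI, "condition"))   # diversity of the sensitive value
```

Generalizing the ages raises k from 1 to 2 in this example, but one class still contains a single condition, so l stays at 1. That gap is why a dataset can be k-anonymous and still allow inference.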

[41:19] The last message I want to focus on, which is discussed in the Opinion, is about Pseudonymisation. It's clear in the Opinion and it's also clear in the GDPR that Pseudonymisation is a leading practice. It's a security measure, but pseudonymous data is not anonymous data. When you have pseudonymous data, you can link to an individual and you can identify the individual. This means that the safeguards and the rights in the GDPR still apply.
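
The distinction between pseudonymous and anonymous data can be sketched in a few lines of Python. The example assumes one common approach, a keyed hash (HMAC-SHA256) over a direct identifier; the function name, the sample email address, and the key handling are hypothetical. The point is only that whoever holds the key can recompute the pseudonym and link it back to the individual, which is why the data remains personal data.

```python
import hashlib
import hmac
import secrets


def pseudonymise(identifier: str, key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using a secret key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumption: the key is generated once and stored separately from the dataset.
key = secrets.token_bytes(32)

token_first_visit = pseudonymise("alice@example.com", key)
token_second_visit = pseudonymise("alice@example.com", key)

# The same person always maps to the same pseudonym, so records stay linkable,
# and anyone holding the key can recompute the mapping to re-identify.
print(token_first_visit == token_second_visit)

# With a different key the pseudonym changes, so the link is broken
# for anyone who does not hold the original key.
other_key = secrets.token_bytes(32)
print(pseudonymise("alice@example.com", other_key) == token_first_visit)
```

Keeping the key separate from the pseudonymised dataset is what makes this a security measure; it does not take the data outside the scope of the GDPR.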

[41:53] Just to give you some quick references on Pseudonymisation in the GDPR: you have it in the definitions, if you want to see the definition of Pseudonymisation, and in the articles where it is listed as an appropriate safeguard. For instance, there is the article on security, where encryption and Pseudonymisation are listed as leading practices. This is Article 32, by the way, on security. Article 25, which is about privacy by design and by default, also presents Pseudonymisation as a good practice that has to be implemented when you want to protect your data adequately.
GDPR on Anonymisation and Big Data
[42:36] This is going to be my last slide, just to focus on the fact that the GDPR, in addition to this, I would say, technical focus related to anonymisation, provides some additional triggers to facilitate all the economic uses of big data. GDPR was clearly designed with big data applications in mind, and of course big data applications that respect the rights and liberties of data subjects. So, I've listed here a couple of references from the GDPR, which you will find in the deck. GDPR recognizes the possibility to have categories of purposes. This is if you refer to Articles 5 and 6. And more importantly, you have this principle of purpose limitation. If you read the definition of purpose limitation, it says that you need to define specific purposes for the processing of personal data and you should not process the data in a way that is not compatible with the initial purpose for which the data was collected.

[43:47] And once you have said this, there are also some recitals that relate this to the reasonable expectations of the data subject. So, it gives you some margin of maneuver to reuse data that has been collected, as long as the data is reused more or less in the same context. An additional trigger in the regulation is the fact that scientific, historical, or statistical purposes are considered to be compatible with the initial purpose for which the data was collected.

[44:19] And now, if we want to open this up a little bit. I think we have a lot of companies that are here on the line today. There are some very important principles that you find in the GDPR. One of the objectives of the principles is to empower the data subjects, to be very transparent about the way the data is being used, and to give data subjects the possibility to actually control the way the data is being processed and the way the data is being reused. So, this means that in the context of big data applications, I think it's very important to design solutions where you propose simple technical means for data subjects to oppose specific reuses of the data. And one additional thing, and this will be my last point probably before we answer the questions, is that it's very important when data is being reused that you're very transparent about the way the data is going to be reused. This means you need to find adequate ways to inform the people so they have this information and they have the tools to oppose specific uses of their data. And with this, I'm going to turn it back to Gary for the Q&As. Thank you.
Questions & Answers
Gary LaFever Gary LaFever (Anonos)
[45:47] Thank you, Gwendal. So, we've had quite a few questions submitted. We likely will not have a chance to answer them all during the live webinar, but I encourage you to continue to submit questions through the web interface and also to because everyone that's registered will get a copy of the questions and answers. Also, a couple of questions have come through and they have asked questions that I think deep answers are in the whitepaper that we referred to before. So, if you want to refer to that as well, again, that's, but we will answer all of them and everyone on the webinar will get copies.

[46:24] So, what I'm going to try to do to try to get as many of these questions as possible is group a couple of them. So, I'm going to read three questions that are a little different, but I think all deal with the same issue. First question: “Most businesses are focusing only on bare minimum tick-in-the-box exercises rather than using this as an opportunity to transform the way they manage and use personal data. What would your advice be to them?” Next question: “How do I make our technologists understand why we have to process data differently the day that GDPR goes into effect than the way our company has processed data for years prior to the GDPR?” And the last one of this set: “My company’s technologists use the lack of specific requirements and specifications under the GDPR as an excuse not to change what we do and how we do it. Any suggestions?”

[47:22] I'll start with this and then hand it over to Mike and then Gwendal. But I think this is actually a very important set of questions and I think this is a wake up call. The GDPR is not a rule that enables you to make minor changes. It requires a fundamental shift. Hopefully, that one slide with the stoplights helps to convey that. It does require data protection by default, which has never been required before. Or if required, the penalties have been so de minimis that people just basically engaged in regulatory arbitrage and paid the fine.

[48:00] The penalties can be as high as 4% of global gross revenues, and it doesn't get as much press, but there's also joint and several liability between data controllers and data processors. So, the magnitude of fines could be amazingly large. And I believe, and Gwendal can speak to this, that the reason they are like that is because the EU legislators and regulators want us to take the rights of data subjects seriously. And that hasn't always been the case.

[48:26] So, the bottom line is you need to start the interaction with the downside of not complying and that does require that the people on this call likely are the standard bearers who are saying: “This isn't something we can just do a check the box.” But that's why we added the new slide four to the deck. If you take to them what these changes could mean in a positive way, you'll get more engagement by management on a different approach. This will likely require changes to architecture. Technologists hate that. You need to show them it's no longer discretionary, it's no longer optional, and they need to be at the table together with you and together with the people who are responsible for generating revenue and value through data. And through that stakeholder group, you can have a productive discussion. And so, Mike, do you want to take that just to kind of give your perspective on those three questions?
Mike Hintze Mike Hintze (Hintze Law)
[49:22] Yeah. You know, I agree with what you said. I think that there's a temptation and a natural reaction in many cases to say: “Well, you know, we haven't been handed a clear roadmap by the regulators that we have to take steps one, two, and three, and it's all very amorphous. And so, we're just gonna throw up our hands and do nothing.” And that's exactly the wrong thing to do at this point. It still seems like May 2018 is a ways out, but it's really not given the types of things that need to be done to get ready for the GDPR. I know a lot of companies have already gone down that path, a lot of companies are just getting started, and others are just trying to wrap their heads around what this all means. But companies need to start doing something. They need to start putting steps in place so that when the regulator comes knocking after May 2018, they can say: “Look, these are the things we did.” And you know, at the end of the day, there's going to be some uncertainty and things are going to have to be tweaked as more guidance comes out, as the GDPR starts to be enforced, and as we all collectively start to understand better what it's going to look like in practice. But taking the steps to deal with the process and taking the steps to deal with the data and how data is managed, stored, and processed - those are going to be important steps that have to be started now because you can't just flip a switch and do that overnight. You can't just wake up in, you know, April of 2018 and say: “Oh, well, now it's time to get GDPR compliant.” It's gonna take some time.
Gary LaFever Gary LaFever (Anonos)
[51:11] Gwendal, do you have any perspective on this you'd like to share?
Gwendal Le Grand Gwendal Le Grand (CNIL)
[51:15] Yeah, maybe if I can add to it that GDPR is applicable in May 2018, as you say, so that means we have more or less 15 months left, which is not a lot of time because there are a couple of things that need to be changed in your organizations concerning the governance of privacy, I would say. So, it's clear that companies need to get prepared. There are also new rights for individuals. It's not the topic of our webinar today. But for instance, there’s a new right to portability. There is an obligation in certain cases to conduct DPIAs (Data Protection Impact Assessments) within the companies. There is an obligation to notify personal data breaches to the authority and to the data subject. This is feasible. Companies who process privacy and who take privacy into account properly will be ready for the GDPR, but they need to think of this a little bit in advance because new processes need to be put in place in the company.

[52:23] And one important thing I think is, of course, the fines can be a bit scary for the companies because they go up to 20 million Euros or 4% of the annual turnover, and it's the highest amount that counts. So, for big companies we are actually talking about 4% of the worldwide turnover of the company, but it also gives a lot of leverage to privacy professionals to get more engagement by the management because when you're trying to implement some privacy safeguards and some security safeguards in your systems, I mean the order of magnitude is changing completely with GDPR. So, I think it's a very interesting tool for privacy professionals. And this is really how they have to see it.
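
The arithmetic behind that ceiling is simple enough to sketch. The Python function below assumes the upper tier of fines described here (20 million Euros or 4% of worldwide annual turnover, whichever is higher) and returns the maximum possible amount, not a predicted fine; the function name and the sample turnover figures are made up for illustration.

```python
def max_administrative_fine_eur(worldwide_annual_turnover_eur: float) -> float:
    """Upper bound of the higher fine tier: EUR 20 million or 4% of turnover,
    whichever is higher. This is a ceiling, not an estimate of an actual fine."""
    four_percent = worldwide_annual_turnover_eur * 4 / 100
    return max(20_000_000.0, four_percent)


# Hypothetical turnover figures.
print(max_administrative_fine_eur(100_000_000))     # 4% is 4 million, so the 20 million floor applies
print(max_administrative_fine_eur(1_000_000_000))   # 4% is 40 million, which exceeds the floor
```

The crossover sits at a turnover of 500 million Euros: below it the fixed 20 million figure is the ceiling, above it the percentage takes over.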

[53:10] The last point I want to make with respect to the questions is the fact that Article 29 and the group of regulators are trying to help you also with respect to the implementation of GDPR. We've already issued some guidelines in December on a number of topics including the right to portability that I mentioned before and also Data Protection Officers. These guidelines have been open for comments and we're still receiving a lot of comments from various organizations. The deadline for comments is today, and we will produce some guidelines on other hot topics to help companies be ready for the GDPR in 2018. This will be announced shortly. Now that we've issued some guidelines on certain topics, we'll be working on a new set of guidelines and this will be done over and over again until May 2018 when the GDPR is applicable.
Gary LaFever Gary LaFever (Anonos)
[54:13] Thank you, Gwendal. Please continue to submit questions through the sidebar on the webinar interface and also to We will commit to answer two more questions that have been submitted. But as we said, the rest of them will be answered within the next week. So, please do not stop submitting. So, this question goes as follows: “US law focuses on whether identifiers are directly linked to data subjects. The EU laws focus on whether identifiers are linkable to data subjects. The GDPR requires appropriate technical and organizational measures to safeguard the rights of data subjects. Does this mean that persistent identifiers are not permissible under the GDPR?” Gwendal, do you want to start with that one?
Gwendal Le Grand Gwendal Le Grand (CNIL)
[55:06] The GDPR is a framework to explain in what conditions you can process personal data. It's not a ban on the processing of personal data. It just says: “You can process personal data, but you can process personal data under certain conditions.” And these conditions are the privacy principles which are described in the relevant article of the GDPR, one of which is security that you mentioned before. So, Pseudonymisation, anonymisation and also other measures can be implemented. There's this principle of the risk based approach, so you need to understand what security safeguards you need to implement in the system based on the risks that your processing is facing, but it’s not preventing the processing of personal data per se. The only thing that it’s saying is that the GDPR does not apply to data that is made anonymous.
Gary LaFever Gary LaFever (Anonos)
[56:18] To follow up very quickly and then I’d love to get Mike's perspective. The problem with persistent identifiers is: “Who has access to those persistent identifiers? And how likely are they to be subject to linkage attacks or the mosaic effect?” And so, like Gwendal says, the GDPR is not intended to stop the processing of personal data, but rather you're supposed to put in place protective measures, both organizational and technical, to make re-identification harder. So, Mike, do you have further clarification on that?
Mike Hintze Mike Hintze (Hintze Law)
[56:46] I guess the way I would respond to the question is that persistent identifiers are not barred under the GDPR. No type of data is barred. Any data can be processed. But if you're using data that meets the definition of personal data, and it is not de-identified in any significant way, all of the requirements of GDPR are going to apply to you. If you use an intermediate level of de-identification, if you meet that level that's described under Article 11 that I talked about earlier, you get some relief. If you use any method of de-identification, that's at least showing or at least partially showing that you have adopted the kinds of measures that are required under the GDPR. But it's not the only way that you can comply with the GDPR. And if you get to the very highest level of de-identification and it meets the anonymisation bar that Gwendal described, then you get sort of complete relief from the GDPR. So, it's a spectrum. But there's nothing that's absolutely barred. It's just a matter of which compliance obligations apply to that data based on the nature of that data.
Gary LaFever Gary LaFever (Anonos)
[58:05] Gotcha. So, one last question quickly. We appreciate everyone staying with the webinar, and we'll go long enough to get this question fully answered. I'm going to group two questions here. The first one is: “Can you suggest how to get a budget for GDPR compliance in 2017 when senior management views GDPR as a 2018 issue?” And then the second one: “As chief privacy officer, my title has a “C” in it but that does not mean I have a key to the C suite. The magnitude of liabilities and obligations under the GDPR are way out of sync with budget and authority that I have in my position. How do I navigate the corporate labyrinth to make senior executives fully appreciate the magnitude of these issues?” Mike, you want to take a first shot at that?
Mike Hintze Mike Hintze (Hintze Law)
[58:55] Yeah, sure. I think it comes back to some of the things we talked about earlier in response to the first group of questions. And that is, you know, you need to make the case that the time is now to be focusing on the GDPR. Like I said before, the types of things that need to be put in place to show compliance with the GDPR are not things that you can just flip a switch or turn on a dime. They require investment now and over the next year. So, sort of laying that out showing the types of things that need to be done, the type of architectural changes that may be required, the type of process changes, the type of organizational and personnel and training things - these all take time and they require investments of time and money currently. And if you're waiting until 2018 to do that, there's just not going to be enough time to get it done.
Gary LaFever Gary LaFever (Anonos)
[1:00:00] Gwendal, do you have any particular insight?
Gwendal Le Grand Gwendal Le Grand (CNIL)
[1:00:02] Yeah, I can add to this as well. Of course, as I said before also, there are new rights for data subjects. There are new processes that need to be implemented in companies. And it just takes time to be prepared adequately so that companies are ready. And two important triggers that you can find in the regulation are Article 3 and Article 83. Article 3 is about the territorial scope. So, it says that the regulation applies to controllers and to processors regardless of whether the processing is taking place in the Union or not. So, if you have an establishment, or if you're targeting basically users that are in the EU, then the GDPR will be applicable to you. So, that's the first thing. It means it applies to many people: if you are offering your services in Europe, you need to take the GDPR into account.

[1:01:02] The second thing is Article 83. Article 83 is about the fines. So, I mean, everyone has heard about this. But if you go to your management and you say: “Well, the risk that we run as a company if we're not prepared for the GDPR is this amount of money,” I think this gives you a lot of leverage when you discuss with your management. So, it's not a very nice way to discuss with your management, but I guess it's very efficient.
Gary LaFever Gary LaFever (Anonos)
[1:01:36] Great. So, again, I appreciate everyone's questions. Please continue to submit through the webinar interface and also I think it’s evident just from what we were able to do during the live session we're clearly at a tipping point, and companies can really no longer do what they used to do and expect to comply with the GDPR. They have to look at what steps they're taking to protect the rights of data subjects based on the uses of the data that they're making. Again, I would encourage you to take a look at Mike's earlier de-identification whitepaper, as well as the one that he and I recently wrote on big data and controlled linkable data.

[1:02:20] Clearly, readily linkable data and linked data and persistent identifiers, the way they've been used in the past can no longer be used quite the same way. You have to have protective mechanisms in place and show that you're giving controls to the data subjects and you're respecting their rights and this requires new technical measures. Data protection by default did not exist prior to the GDPR. And as Gwendal just said, if you think about what's changing on May 25, 2018, it's really as much if not more about the fines as it is the requirements but with the magnitude of those fines and the potential penalties and the opportunity to embrace new technologies to improve business practices. Hopefully, this is truly a tipping point, which is not a negative. It's a positive.

[1:03:06] So, while things like persistent identifiers can't be used as readily as they have in the past, there are ways to continue business processes so that everybody can be successful. And we would like to think and I know the IAPP has this mindset that 2017 is a year about talking about solutions and approaches and working together to make things happen. We invite people to continue to submit questions and be active through the follow on to this live webinar. And you will receive by the end of today, a copy of the deck that was presented today. And within the week, you will get a copy of all the questions and answers as well. So, thank you very much. I want to thank you, at least on our part, and I'll turn it over to Dave.
Dave Cohen Dave Cohen (IAPP)
[1:03:47] Well, thank you very much, Gary. And let me echo from the IAPP our thanks to Anonos for sponsoring today's program and making all this great information available for free to our attendees. And of course to Mike and to Gwendal who were with us today. It's really a pleasure to have both of you on the panel and to work with you in preparing for this, and we very much appreciate your time, effort, and energies to help strengthen the privacy and security community that we're all building together here as we lead up to the compliance deadline for GDPR.