Keynote speech on Pseudonymisation at PrivSec London 2020

Presentation Summary
Magali Feys, Chief Strategist - Ethical Data Use @ Anonos:
Thank you so much. Today, I want to talk to you about Pseudonymisation and why it is so unique under the GDPR, and I really want to stress this point. As a lawyer implementing the GDPR within companies, I hear many GDPR consultants and experts talking about the effort to be transparent, about what a privacy policy should look like, and these kinds of things, which of course are necessary under the GDPR. But I often feel they are forgetting a bigger part, and maybe, for lawyers, that is the technical part they are not always so familiar with. It is also very important to see how you can implement technical safeguards and make sure you have Data Protection by Design and by Default. That's why I will talk about Pseudonymisation today.
First of all, I would like to talk about encryption and Pseudonymisation, because most of the time, when people use these terms, they use encryption to mean Pseudonymisation or the other way around. And there really is a difference between the two! There is a difference from a legal perspective under the GDPR, and secondly, there is a difference in the way we use them and in what the concepts really mean when applied in practice.

If we look at the GDPR, data protection encompasses both data security and privacy. But very important, data protection, according to the GDPR, is not an absolute right. A lot of people sometimes forget about that.

The GDPR acknowledges that data protection is not only about data security and privacy, but also about maximizing the ethical and lawful value of data.
The GDPR states in one of its first recitals that the processing of personal data should be designed to serve mankind, and that the right to the protection of personal data is not an absolute right but must be considered in relation to its function in society.

Let's take an example: the right to be forgotten. If the right to be forgotten were an absolute right for the data subject, that would mean in practice that tomorrow, on a day off, you could go to the shops, max out your credit card, and then call the credit card company and say, "Can you forget about me? Can you forget what I did the day before?" That would be too easy. So you see, it is not an absolute right, and that is all written in the GDPR.
Encryption. I don't think I have to teach this audience what encryption is, but let's take the concept and walk through its history. Encryption has existed for more than 4,000 years. We saw it with the Egyptians, who used it to preserve the secrecy of religious rituals from outsiders.

The Greeks are credited with the first use of encryption in literature, in the Iliad by Homer.

It is Bellerophon, a messenger, who carries an encrypted message. He cannot read it, and the message itself contains, without his knowing it, his own death sentence: he delivers it to the King, the King reads it and sends him on impossible quests, sentencing him to death. Julius Caesar also used encryption for military purposes.
So encryption has been used for 4,000 years, but when and how is it used? It is only used to protect data at rest or in transit. The Egyptians protecting their religious rituals were really protecting that data at rest.

The Bellerophon myth was about protecting data in transit. But encryption was never used to protect data in use: while the message is in transit, it is protected from someone like the messenger himself, who is not authorized to read it. What the King could do with the message, however, was never protected by encryption. So encryption in itself is not designed for, and not capable of, protecting data when in use.
Now, let's compare encryption and Pseudonymisation. Are they the same? First of all, encryption has been around for some 4,000 years. What if I tell you that Pseudonymisation, as recently defined in the GDPR, has only existed for the last four years? You will probably all say that Pseudonymisation has been around much longer, right? Isn't it just the use of pseudonyms? We all know pseudonyms.

For example, Emily Bronte used the pseudonym Ellis Bell because she was afraid that the fact that a woman had written Wuthering Heights would create a risk of bias against the book, of it not being read because it was written by a woman.

So a pseudonym, and Pseudonymisation, can itself be used to reduce the risk of bias.

Now, Pseudonymisation has been newly defined under the GDPR, and we will see that it is not only there to protect data, data security and privacy, but also to maximize the utility and the value of data.

First, from a legal point of view: encryption versus Pseudonymisation. They are not the same. Why? Because in Articles 6(4)(e) and 32, for example, we see the two terms encryption and Pseudonymisation used next to each other.

Legislators and legal people, when drafting legal texts, are not poetic about it. If they use two terms, that means they are not synonyms and that each has its own meaning. So the fact that both articles mention encryption as well as Pseudonymisation means that, from a legal point of view, the two terms must have different meanings.

Encryption is mentioned only four times in the GDPR, and always to secure data at rest or in transit, not in use. Pseudonymisation is mentioned 15 times throughout the GDPR, and not only to secure data in transit or at rest but also to secure data when in use.
When you really go through the text of the GDPR, you will see that Pseudonymisation is not merely treated as a technique, like using a token or two-factor authentication. As you well know, such current techniques for protecting data are not mentioned in the GDPR, because the legislator did not want the GDPR to be outdated after two years by naming specific privacy-enhancing techniques.

That is why the techniques are not in there. So Pseudonymisation, mentioned 15 times, must be more than just a technique like encryption, two-factor authentication, tokens and other privacy-enhancing techniques.

Very importantly, the GDPR also envisages a code of conduct on Pseudonymisation, and Pseudonymisation is mentioned as an ideal way to implement and demonstrate Data Protection by Design and by Default.

It is also said that Pseudonymisation can reduce the risks for data subjects and, at the same time, help data controllers and data processors comply with their data protection obligations.
If we look at the Article 29 Working Party opinions, they said that Pseudonymisation can also be used when relying on Legitimate Interests, because it tips the balance in favor of the controller when evaluating the steps taken to minimize the impact on data subjects, balancing that impact against the other requirements and the value of the data for the controller.

So it is the Article 29 Working Party that said Pseudonymisation, as a technical safeguard, can be used to make sure that Legitimate Interests can be applied in a GDPR-compliant way.
What, then, is so different about this four-year-old definition of Pseudonymisation in the GDPR versus what we have traditionally known? And can we use it to enable greater data value and utility?
Pseudonymisation before the GDPR, as it is mostly known, is based on the use of static de-identifiers. With the technology that exists today, static de-identifiers make it very easy to re-identify the data.

For example, take three situations. In AdTech, a lot of players will not even use Pseudonymisation or encryption; they will just use your entire profile. For example: you are John McKee, living in an apartment in London, you have a cat, you bike to work, and you stop at a coffee shop for a double latte every morning before work.

In AdTech, that entire digital profile made of us is sold and used to enable real-time bidding. That is really not applying data minimisation, because why does a coffee shop owner or a coffee brand need to know that you own a cat in order to advertise their coffee?

But that is the way AdTech works most of the time. Now, if you use a static de-identifier, you could say: let's replace John McKee with, for example, ABCD. Then you would have: ABCD lives in London in an apartment with a cat. ABCD bikes to work and stops for a coffee. ABCD likes this type of coffee.

If you use static de-identifiers, then indeed you can no longer be easily identified through direct identifiers like your name. But we can all agree that, using all that information together, including the indirect identifiers, if everywhere you have "ABCD lives in an apartment with a cat, ABCD this, ABCD that" and you combine the different data sets, it is very easy, certainly with today's techniques, to re-identify John McKee using direct and even indirect identifiers.

That is also what we saw in the Mosaic Effect study from Harvard. I often hear people say that Pseudonymisation does not work because of that study: there, static tokens were used to replace the personal data of the data subjects, and based on three indirect identifiers, your zip code, your birth date and your gender, they were able to re-link 87% of the data subjects.

So you see, if you use static de-identifiers, static tokens, then by using indirect identifiers you can re-link and attribute the personal data back to the data subject.
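The linkage attack described above can be sketched in a few lines of Python. The records and the auxiliary "public register" below are invented for illustration; the point is that a static token does nothing to stop a join on the quasi-identifiers (zip code, birth date, gender) that remain in the data.

```python
# Hypothetical data: why static tokens fail against linkage attacks.
# The "pseudonymised" rows keep their quasi-identifiers intact, so joining
# them with any auxiliary record sharing those fields re-identifies the
# person behind the static token.

pseudonymised = [  # direct identifier replaced by a static token
    {"token": "ABCD", "zip": "SW1A", "birth": "1985-03-02", "gender": "M", "pet": "cat"},
    {"token": "EFGH", "zip": "E1 6", "birth": "1990-11-20", "gender": "F", "pet": "dog"},
]

public_register = [  # auxiliary data an attacker can obtain
    {"name": "John McKee", "zip": "SW1A", "birth": "1985-03-02", "gender": "M"},
]

def relink(pseudo_rows, aux_rows):
    """Join on quasi-identifiers to attribute tokens back to real names."""
    hits = {}
    for p in pseudo_rows:
        for a in aux_rows:
            if (p["zip"], p["birth"], p["gender"]) == (a["zip"], a["birth"], a["gender"]):
                hits[p["token"]] = a["name"]
    return hits

print(relink(pseudonymised, public_register))  # {'ABCD': 'John McKee'}
```

With only three shared quasi-identifiers, the join succeeds, which is exactly the mechanism behind the 87% re-identification figure cited above.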

For example, the Belgian social security number is made up of your birth date followed by what looks like a completely random number. But apparently it is not that random: it reflects the order in which your parents registered your birth at the office, and whether it is an even or an odd number depends on your gender. Even numbers are girls; odd numbers are boys.
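A small sketch makes the point that such a number is itself an indirect identifier. This follows only the simplified description given here (birth date plus a sequence number whose parity encodes gender); real Belgian numbers also carry a check digit and century handling, both omitted, and the example number is made up.

```python
# Sketch: how much personal data a "random-looking" national number leaks,
# based on the simplified format described in the talk: YYMMDD-SSS, where
# YYMMDD is the birth date and the parity of the sequence number SSS encodes
# gender (even = female, odd = male). Check digits are omitted.

def decode(number: str) -> dict:
    birth, seq = number.split("-")
    return {
        # Century is assumed to be 1900s for this illustration.
        "birth_date": f"19{birth[:2]}-{birth[2:4]}-{birth[4:6]}",
        "gender": "F" if int(seq) % 2 == 0 else "M",
    }

print(decode("850302-123"))  # {'birth_date': '1985-03-02', 'gender': 'M'}
```

So a single field quietly carries two of the three quasi-identifiers (birth date and gender) used in the Harvard study.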

At a certain point it was said that if you hash the data, or hash it using an encryption key, it would become anonymous data and you would no longer be able to re-link it, to attribute that data to the data subject.

Studies found out that this is not the case: you also really need to apply a salt, so that when you hash and then encrypt the data, the result is random enough that people without authorized access cannot re-link it to the data subject.

Just saying that Pseudonymisation in itself does not work is not a statement you can make without really knowing the technology behind it: what did they do? Did they use static tokens? What type of encryption did they use?

So simply stating that Pseudonymisation does not work can no longer be said under the GDPR.
Now, what does the GDPR say? What is the definition of Pseudonymisation? Because we now know, it’s different from encryption.

Encryption can be one of the techniques applied to achieve Pseudonymisation, which is a technical safeguard under the GDPR. The GDPR says that you must make sure that the personal data can no longer be attributed to a specific data subject without the use of additional information, and that this additional information must be kept very secure, using proportionate and appropriate technical and organizational measures, so that the data cannot be attributed to a specific data subject.

Do you remember what I said about direct and indirect identifiers? It is not enough to replace a name or a social security number and say, "Okay, I have protected and pseudonymised this data against direct identifiers." That would not meet the new, high standard under the GDPR, because the GDPR requires that the data is no longer attributable even through indirect identifiers.

Remember the Mosaic Effect example from Harvard: even if you only have a gender, a zip code and a birth date, it must not be possible to attribute the data without the use of that additional information. And that additional information is available only to the persons within a company, for example, who have access and can re-link it.
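The Article 4(5) pattern described here, tokens that carry no meaning, plus separately-held additional information under access control, can be sketched as below. The class and method names are illustrative only, not any real product API; in practice the vault would live in a separate, secured system rather than in the same object.

```python
# Sketch of GDPR-style Pseudonymisation as described in the talk:
# replace identifiers with fresh random tokens, and keep the token-to-identity
# mapping (the "additional information") separately, behind access control.
import secrets

class Pseudonymiser:
    def __init__(self):
        # Token -> real identity. In a real deployment this mapping is the
        # "additional information" and is stored separately and secured.
        self._vault = {}

    def tokenise(self, identity: str) -> str:
        token = secrets.token_hex(8)   # a fresh random token per use,
        self._vault[token] = identity  # so tokens don't link across datasets
        return token

    def relink(self, token: str, authorised: bool) -> str:
        if not authorised:
            raise PermissionError("additional information is access-controlled")
        return self._vault[token]

p = Pseudonymiser()
t1 = p.tokenise("John McKee")  # token used in dataset A
t2 = p.tokenise("John McKee")  # a different token in dataset B
assert t1 != t2                # static linkage across datasets is broken
assert p.relink(t1, authorised=True) == "John McKee"
```

Because each use gets its own token, combining data sets no longer re-identifies anyone, yet an authorised person can still re-link on request, which is exactly what enables the demonstrable accountability discussed next.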

Now, the fact that under Pseudonymisation you can still re-link the data, provided you hold the authorization within the company, also helps you comply with, for example, the principle of accountability. You can show demonstrable accountability when a data subject comes and asks, "What have you done with my data?"

Say you had anonymised the data instead. Once anonymised, you can no longer do anything with it, so you cannot respond to the data subject's request because you have no trace anymore. With Pseudonymisation, the people holding the necessary rights can re-link it, and then of course you have to do access management.

But if you have that in place, those people can say, "Look, we used that data. We pseudonymised it in accordance with the GDPR." And you can actually prove demonstrable accountability towards the data subject.

In addition, the data subjects' rights are much better protected, because you complied with data minimisation by pseudonymising in a way that does not rely only on static de-identifiers.
It ensures data minimisation. Let's go back to our example of John McKee. The first AdTech example is not data minimisation: you just hand over all the data, whether it is needed or not.

The second example, with the static token, was already a little bit of data minimisation, but because you could re-link it by combining it with other data sets, you did not demonstrate data minimisation as a whole.

Using Pseudonymisation under the GDPR, you can make sure that the data is no longer attributable to the data subject behind it, you can make sure that you comply with data minimisation, and you have a technical safeguard that fulfills Data Protection by Design and by Default.

How do we have to do Pseudonymisation after the GDPR? Luckily, we are not left completely in the dark. We have ENISA's report on Pseudonymisation techniques and best practices, issued in November 2019, and we have the draft code of conduct on the use of GDPR-compliant Pseudonymisation by the working group in Germany.
ENISA highlights the following benefits of GDPR-compliant Pseudonymisation; whenever I talk about Pseudonymisation from here on, I mean GDPR-compliant Pseudonymisation.

It says that Pseudonymisation supports a more favorable, broader interpretation of data minimisation. It goes beyond protecting real-world personal identities by protecting them not only from direct identifiers but also from indirect identifiers.

Also, using GDPR-compliant Pseudonymisation ensures that, while you protect the privacy of the data subject with a measure that is also secure, you still maximise the utility and the value of your data, because you can really decide what to reveal from the pseudonymised data.

For example, if a company wants to use the data for secondary purposes and says, "We only need to know that whatever his name was, John McKee, lives in an apartment and has a cat," you can provide just that data, and it is really protected under GDPR-compliant Pseudonymisation.
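Deciding what to reveal from a pseudonymised record can be as simple as a field whitelist. The record and field names below are illustrative, continuing the John McKee example; the idea is that the secondary purpose sees only the fields it needs, keyed by the pseudonymous token.

```python
# Sketch: releasing only the fields a secondary purpose needs from a
# pseudonymised record (illustrative field names, not a real schema).

record = {
    "token": "7f3a9c",        # pseudonymous token standing in for the name
    "city": "London",
    "home": "apartment",
    "pet": "cat",
    "coffee": "double latte",
    "commute": "bicycle",
}

def release(rec: dict, fields: set) -> dict:
    """Disclose only the whitelisted fields, plus the pseudonymous token."""
    return {k: v for k, v in rec.items() if k == "token" or k in fields}

# The secondary purpose only needs housing type and pet ownership:
print(release(record, {"home", "pet"}))
# {'token': '7f3a9c', 'home': 'apartment', 'pet': 'cat'}
```

The coffee brand from the earlier example would simply never receive the cat, the commute, or the address, which is data minimisation applied per use.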
In summary, it is very important to note that Pseudonymisation is not just a technique used under the GDPR, but a technical safeguard.
ENISA notes that GDPR-compliant Pseudonymisation has many benefits, as shown here, but that it requires significant expertise and caution, because it is a very complex process.

Indeed, you must uphold the threshold that the personal data is no longer attributable to the data subject, and not only through direct identifiers but also through indirect identifiers.

It is a complex matter to make sure that you have the technical controls in place to comply with this new technical safeguard under the GDPR.

You can use different privacy-enhancing techniques, but I believe you will need more than just one in order to comply with GDPR Pseudonymisation.
Let's summarize. What are the benefits?

Newly-defined Pseudonymisation is a technical safeguard that improves privacy on the one hand and security on the other, balancing those two principles while letting you maximise the value of data in an ethical and lawful way. You still have to comply with the other principles under the GDPR, because I don't believe in a single solution or a single thing you can do to become GDPR-compliant.

I always say the GDPR is a sort of 10-step plan. When we implement it with clients, there are different bits and pieces that all have to connect together. But if you use GDPR-compliant Pseudonymisation, knowing that it already encompasses data minimisation, that you can prove demonstrable accountability, and that you can lawfully process data for secondary purposes, it really enables you to check a number of the boxes on the path to becoming GDPR-compliant.

Newly-defined Pseudonymisation under GDPR is the new state-of-the-art process that improves privacy, security and also the value of data.
 



Are you facing any of these 4 problems with data?

You need a solution that removes the impediments to achieving speed to insight, lawfully & ethically

Roadblocks to Insight
Are you unable to get desired business outcomes from your data within critical time frames? 53% of CDOs cannot achieve their desired uses of data. Are you one of them?
Lack of Access
Do you have trouble getting access to the third-party data that you need to maximise the value of your data assets? Are the third parties and partners you work with worried about liability or disruption of their operations?
Inability to Process
Are you unable to process data due to limitations imposed by internal or external parties? Do they have concerns about your ability to control data use, sharing or combining?
Unlawful Activity
Are you unable to defend the lawfulness of your current data processing activities, or data processing you have done in the past?
THE PROBLEM
Traditional privacy technologies focus on protecting data by putting it in "cages" or "containers," or by limiting use to centralised processing only. These limits are imposed without considering the context of the desired data use, including decentralised data sharing and combining. Such approaches are based on decades-old, limited-use perspectives on data protection that severely restrict the kinds of data uses that remain available after controls have been applied. On the other hand, many new data-use technologies focus on delivering desired business outcomes without considering that roadblocks may exist, such as the four problems noted above.
THE SOLUTION
Anonos technology allows data to be accessed and processed in line with desired business outcomes (including sharing and combining data) with full awareness of, and the ability to remove, potential roadblocks.