In the digital world, information flows continuously increase, including high volumes of personal data, often presented as the ‘oil’ of the new digital economy. The protection of these data is a key requirement for building trust in the online ecosystem and supporting fundamental values, such as privacy, freedom of expression, equity and non-discrimination. At the same time, technological advances and innovative analytics accelerate the online processing of personal data in several unprecedented and unexpected ways, for example by enabling the correlation of different types of data that may link to the same individual. It is therefore essential for the entities processing personal data (data controllers), on the one hand, to collect and further process only those data that are necessary for their purpose and, on the other, to employ proper organisational and technical measures for the protection of these data. Pseudonymisation is one well-known practice that can contribute to this end.
Broadly speaking, pseudonymisation aims at protecting personal data by hiding the identity of individuals in a dataset, e.g. by replacing one or more personal data identifiers with so-called pseudonyms (and appropriately protecting the link between the pseudonyms and the initial identifiers). An identifier is a specific piece of information, holding a privileged and close relationship with an individual, which allows for the direct or indirect identification of this individual 2. This process is not at all new in information systems design, but it gained special attention after the adoption of the General Data Protection Regulation (GDPR) 3, where pseudonymisation is explicitly referenced as a technique that can promote both data protection by design (article 25 GDPR) and security of personal data processing (article 32 GDPR).
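As a minimal illustration of the idea (not a technique prescribed by this report), the sketch below replaces an identifier with a keyed-hash pseudonym; the secret key, held separately by the data controller, is the "additional information" that protects the link between pseudonyms and original identifiers. The key value and field names are purely hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key, stored and protected separately by the data
# controller; whoever holds it can re-derive the identifier-to-pseudonym link.
SECRET_KEY = b"controller-held-secret"

def pseudonymise(identifier: str) -> str:
    """Replace an identifier with a keyed-hash pseudonym (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# A record keeps its non-identifying attributes, but the identifier is replaced.
record = {"email": "alice@example.com", "visits": 12}
pseudonymised = {"email": pseudonymise(record["email"]), "visits": record["visits"]}
```

The same identifier always maps to the same pseudonym, so records belonging to one individual can still be grouped within the dataset, while a party without the key cannot trivially recover the identity.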
Pseudonymisation can indeed greatly support data protection in different ways. It can hide the identity of the individuals in the context of a specific dataset, so that it is not trivially possible to connect the data with specific persons. It may also reduce the risk of linkage of personal data for a specific individual across different data processing domains. In this way, for example, in the case of a personal data breach, pseudonymisation increases the level of difficulty for a third party (i.e. other than the data controller) to correlate the breached data with specific individuals without the use of additional information. This can be of utmost importance both for data controllers and for the individuals whose data are being processed (data subjects). Recognising the aforementioned properties of pseudonymisation, the GDPR provides a certain relaxation of the data protection rules where data controllers have provably applied pseudonymisation techniques to the personal data.
Not all pseudonymisation techniques, however, are equally effective, and possible practices vary from simple scrambling of identifiers to sophisticated techniques based on advanced cryptographic mechanisms. Although many of these techniques would fall under the broad definition of pseudonymisation, they do not offer the same level of protection for the personal data. In fact, in certain cases, poor pseudonymisation techniques might even increase the risks to the rights and freedoms of data subjects by giving a false sense of protection. It is essential, therefore, to explore the existing solutions, together with their strengths as well as their limitations.
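A simple sketch can make this difference concrete. Hashing a low-entropy identifier (here, an invented phone-number format) without a secret key is a well-known weak practice: anyone can enumerate the small identifier space and reverse the "pseudonym" by a dictionary attack, whereas a keyed hash ties reversal to a secret held by the controller. All values below are illustrative assumptions.

```python
import hashlib
import hmac

# Weak practice: an unkeyed hash of a low-entropy identifier.
def weak_pseudonym(phone: str) -> str:
    return hashlib.sha256(phone.encode("utf-8")).hexdigest()

target = weak_pseudonym("555-0142")

# Dictionary attack: hash every candidate in the (small) identifier space
# until the observed pseudonym is matched.
candidates = (f"555-{n:04d}" for n in range(10_000))
recovered = next(p for p in candidates if weak_pseudonym(p) == target)
# recovered == "555-0142": the pseudonym offered no real protection.

# Stronger practice: a keyed hash (HMAC), where the same attack fails
# without knowledge of the controller's secret key.
def strong_pseudonym(phone: str, key: bytes) -> str:
    return hmac.new(key, phone.encode("utf-8"), hashlib.sha256).hexdigest()
```

The contrast illustrates why techniques that formally qualify as pseudonymisation can still differ sharply in the protection they actually provide.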
The above discussion becomes even more demanding in the area of mobile applications (apps), where multiple identifiers of mobile users are processed by several distinct parties (e.g. app developers, app libraries, operating system (OS) providers, etc.), often without the users (data subjects) being aware of it. A recent ENISA study [ENISA, 2017] in the field highlighted the need for scalable methodologies and best practices on how to implement specific data protection measures by design in the mobile ecosystem.
Against this background, and following previous ENISA work in the field [ENISA, 2014a], [ENISA, 2015], the Agency elaborated, under its 2018 work-programme 4, on the concept and possible techniques of data pseudonymisation.
1.2 Scope and Objectives
The scope of this report is to explore the concept of pseudonymisation alongside different pseudonymisation techniques and their possible implementation. In particular, the report has the following objectives:
- Examine the notion of pseudonymisation and its data protection goals.
- Describe different techniques that could be employed for data pseudonymisation.
- Discuss possible pseudonymisation best practices particularly for the mobile app ecosystem.
The target audience consists of data controllers, producers of products, services and applications, Data Protection Authorities (DPAs), as well as any other party interested in the notion of data pseudonymisation.
It should be noted that this report does not aim to serve as a handbook on when and how to use specific pseudonymisation techniques, but rather to provide an overview of the concept and possible practices of data pseudonymisation. The discussion and examples presented in the report focus only on technical solutions that could promote privacy and data protection; they should by no means be interpreted as a legal opinion on the relevant cases.