Gary LaFever | September 3, 2020

Must Data Minimisation Mean Data Deletion?

Data minimisation is a fundamental principle set out in the General Data Protection Regulation (GDPR) as well as in many other privacy laws around the world. However, the actual implementation of data minimisation by deleting data raises a question: is this the best way of accomplishing the goal? Data deletion may seem like a good solution at first, but it can have major impacts on data sets and the use of data that extend far beyond the individual records that are deleted.

A different approach should be considered: one that focuses on the critical distinction between data collection minimisation and data use minimisation. What if data doesn’t need to be deleted as quickly or at all, because its use is limited or restricted instead? This can remove many of the negative impacts of data deletion, while allowing the principle of data minimisation to be preserved and respected.

What Are the Impacts of Data Deletion?

There are numerous impacts upon society that come from data deletion, including slowdowns in health research, an increase in business compliance costs, and dampening of economic growth and innovation in general.

When data must be deleted, it can cause major organisational disruption, because removing even one data subject’s records can leave an entire data set less accurate or, in some cases, useless for processing. This disruption can stand in the way of medical research, product development, and the efficient and accurate delivery of other services. Larger, more representative data sets are needed to produce accurate results, so data deletion should be minimised as far as possible, provided the fundamental rights of individuals can be assured.

Data deletion results in lost opportunities, missed innovation, and ultimately lower benefits for society. This is the case for health research, medical and pharmaceutical product development, AI and machine learning innovation, and so on. Even when an organisation wants to delete data, in some cases it is extremely difficult to track down all the people or organisations that have access to the data. Our Big Data world means that personal data is shared and used widely, and in many cases the cost of deleting the data can be high. This can create additional compliance burdens and potential liability issues that discourage organisations from developing products or services, simply because the risk is too great. When innovation is held back by constantly changing data sets, or by the high cost of tracking down the information needed to delete an individual’s data, the pace of development can slow. In a post-COVID-19 world, restrictions on economic growth and innovation can be harmful.

Nonetheless, no one argues that this growth or innovation should come at the cost of harming an individual’s fundamental privacy rights. So what’s the solution?

What Are the Alternatives?

There are two ways of limiting the potential impacts on data subjects: limiting data collection and possession (by deleting data immediately or not collecting data in the first place), and limiting data use (by allowing data to remain with the data controller, but restricting its use to authorised processing only).

When data collection is not the focus, but data use is, the huge benefits of privacy-respectful data processing can be retained without negatively impacting the data subject. If data is used in ways that are highly protected, particularly with the application of new technologies such as Pseudonymisation (newly defined in the GDPR as whole-dataset de-identification), and CCPA advanced de-identification, data can be kept and used without harm to the individual’s right to privacy.

This shift towards the idea of data use restriction is a critical one. We are already living in a world in which data collection is ubiquitous. By focusing solely on data collection restrictions we may be fighting a battle that is already lost, and missing out on opportunities to provide strong and timely protections for data subjects in terms of how their data is protected and used post-collection. We need to limit data in use, with more than just policy applications, by enforcing technical controls like GDPR-compliant Pseudonymisation, an approach to privacy that understands the data-rich and Big Data world in which we live.[1]
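
To make the idea of use restriction more concrete, the following is a minimal Python sketch. It is not a statement of what the GDPR requires and not a compliance-grade implementation. It assumes a hypothetical record with `email`, `birth_year` and `diagnosis` fields, a hypothetical set of authorised purposes, and a secret key standing in for the separately held "additional information". All function and field names are illustrative.

```python
import hashlib
import hmac
import secrets

# Hypothetical purposes the controller has authorised (an assumption for this sketch).
AUTHORISED_PURPOSES = {"health_research"}


def pseudonymise(record, key):
    """Return a copy with the direct identifier replaced and an indirect one coarsened."""
    token = hmac.new(key, record["email"].encode("utf-8"), hashlib.sha256).hexdigest()
    return {
        "subject_token": token[:16],                       # keyed token replaces the email
        "birth_decade": record["birth_year"] // 10 * 10,   # generalise an indirect identifier
        "diagnosis": record["diagnosis"],                  # attribute kept for analysis
    }


def release_for(purpose, records, key):
    """Use restriction: refuse unauthorised purposes, release only pseudonymised copies."""
    if purpose not in AUTHORISED_PURPOSES:
        raise PermissionError(f"purpose not authorised: {purpose}")
    return [pseudonymise(r, key) for r in records]


# The key plays the role of the "additional information"; in practice it would be
# stored separately from the released data, under its own access controls.
key = secrets.token_bytes(32)
records = [{"email": "alice@example.com", "birth_year": 1984, "diagnosis": "asthma"}]

released = release_for("health_research", records, key)
```

Because the token is keyed, the same data subject maps to the same token across releases, so records stay linkable for analysis, while linking a token back to a person requires the key held by the controller. A real deployment would also need to assess every remaining indirect identifier and attribute across the data set as a whole, not just the fields shown here.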

[1] The GDPR is expressly technology neutral and does not recommend specific techniques (GDPR Recital 15 states that “In order to prevent creating a serious risk of circumvention, the protection of natural persons should be technologically neutral and should not depend on the techniques used”). However, a review of resources published after the effective date of the GDPR reveals that the term Pseudonymisation no longer refers to the privacy enhancing technique (PET) of replacing direct identifiers with pseudonymous tokens, as it did prior to the GDPR. To satisfy new GDPR definitional requirements under Article 4(5), Pseudonymised data - inclusive of both direct and indirect identifiers, and potentially attributes as well - must now make it impossible to attribute data “to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.” It is critical to realise that this means Pseudonymisation is now an outcome describing a data set as a whole; it is no longer a technique applied to individual fields in a data set.

Examples of resources on Pseudonymisation published after the effective date of the GDPR include: (a) Recommendations on Shaping Technology According to GDPR Provisions: An Overview on Data Pseudonymisation, published by the European Union Agency for Cybersecurity (ENISA) in November 2018; (b) Pseudonymisation Techniques and Best Practices: Recommendations on Shaping Technology According to Data Protection and Privacy Provisions, published by ENISA in November 2019; and (c) Draft for a Code of Conduct on the use of GDPR Compliant Pseudonymisation by the German Data Protection Focus Group of the Platform Security, Protection and Trust for Society and Business at the Digital Summit 2019.

This article originally appeared on LinkedIn. All trademarks are the property of their respective owners. All rights reserved by the respective owners.