The General Data Protection Regulation (GDPR) is a transformative shift in privacy. In many respects, it signals a move away from a policy-based data governance approach to a technology-based approach that can enforce data protection policies for personal data. How can we achieve this and what’s the solution for managing compliance?
Traditional privacy programs rely on written rules that cannot prevent unauthorized data use before it occurs. Because the GDPR significantly expands the rights of data subjects, it requires organizations to implement technologies capable of enforcing policies and, for certain data use cases, preventing misuse before it happens. In some circumstances, the regulation may require pseudonymisation[1] to defeat unauthorized data linkages and data protection by default[2] to protect data on a per-use basis by limiting access to authorized data.
How Will the GDPR Affect You?
- Broad Application: The GDPR is the biggest regulatory change in data protection in several decades, and it applies to many organizations operating internationally, regardless of where they are located. No physical presence in the EU or EU-sourced revenue is required: under Article 3(2), an organization outside the Union is covered when it processes personal data of data subjects in the Union in connection with offering them goods or services or monitoring their behaviour.
- Substantial Risks for Non-Compliance: Failure to comply with the GDPR exposes organizations to significant liability, including fines of up to 20 million euros or 4% of total worldwide annual turnover (whichever is higher), class action lawsuits, joint and several liability among data controllers and processors, and adverse public perception.
- Cannot Use Existing Legal Bases: In many instances, the GDPR prohibits organizations from performing data processing activities they have relied on for years, including personalization, analytics, machine learning, and sharing data with third parties, under the legal bases they previously used. To continue such processing lawfully, organizations may need alternate legal bases, which in turn require new technical capabilities that security and privacy technologies developed before the regulation do not support.
- Cannot Use Existing Consent Frameworks: Data uses made possible by the advanced state of technology (e.g., personalization, customization, analytics, artificial intelligence, and machine learning) often make consent impractical as a legal basis, because new uses and opportunities do not emerge until more in-depth analysis is completed.[3] In many instances, consent cannot encompass the iterative nature of these digital advances.
- Lost Insight and Intelligence: Many organizations will miss out on insights made possible by advanced technology if they rely on complying with GDPR requirements using consent alone.
- The state of the art in data protection[4] – Controlled Linkable Data[5] – has advanced to the point where it enables organizations to accomplish desired data processing objectives in compliance with the GDPR and unlock the value of their data.
- This new state of the art – Controlled Linkable Data – enables the “dialing-up” or “dialing-down” of the linkability (identifiability) of data to support legal data uses in compliance with the GDPR.
- The Controlled Linkable Data solution extends beyond GDPR compliance to enable controls necessary for secondary uses of data underlying the new global digital economy.
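The fine ceiling mentioned under "Substantial Risks for Non-Compliance" comes from GDPR Article 83(5): up to 20 million euros or, for an undertaking, up to 4% of total worldwide annual turnover of the preceding financial year, whichever is higher. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and it models only the upper cap, not the lower Article 83(4) tier or the case-by-case factors supervisory authorities weigh when setting an actual fine.

```python
from decimal import Decimal

# Article 83(5) ceiling: the higher of EUR 20 million or 4% of total worldwide
# annual turnover of the preceding financial year. Actual fines are set case by
# case below this cap; this hypothetical helper computes only the cap itself.
FIXED_CAP_EUR = Decimal("20000000")
TURNOVER_RATE = Decimal("0.04")


def max_fine_eur(worldwide_annual_turnover_eur: Decimal) -> Decimal:
    """Return the maximum possible Article 83(5) fine for an undertaking."""
    if worldwide_annual_turnover_eur < 0:
        raise ValueError("turnover cannot be negative")
    return max(FIXED_CAP_EUR, TURNOVER_RATE * worldwide_annual_turnover_eur)


if __name__ == "__main__":
    # EUR 100 million turnover: 4% is EUR 4 million, so the EUR 20 million figure applies.
    print(max_fine_eur(Decimal("100000000")))
    # EUR 2 billion turnover: 4% is EUR 80 million, which exceeds EUR 20 million.
    print(max_fine_eur(Decimal("2000000000")))
```

Using `Decimal` rather than floats keeps currency arithmetic exact.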
Organizations can be described as processing data within four categories of use. The GDPR imposes new requirements that force organizations to reconsider their tools, solutions and approach to data processing. Organizations that want to process data in Categories 2 or 3, which are at the core of the new digital economy, must control the linkability of data to comply with GDPR requirements. Importantly, the implications of controlling linkable data cross sectoral boundaries: companies capitalizing on the competitive edge of analytics, including organizations in advertising, research, finance and retail, will process data within these categories of use.
The new “state of the art” GDPR solution – Controlled Linkable Data – helps defeat unauthorized re-linking of data and protects data on a per-use basis so that only the minimum authorized data necessary is used. This is critical for lawful personalization, customization, analytics, artificial intelligence, machine learning, and sharing of data with third parties, uses that are no longer lawfully supported by previously established consent and contractual frameworks under the GDPR.
As Wiewiorowski underscored, “The road code is created in order to facilitate the way that we transport things and transport people. But, of course, it somehow limits the ways that we try to invent solutions. This is the kind of price that we pay for a civilized way for the flow of personal data in the world. So that goal is as important as the protection of data itself.”
As a result, there are plenty of traffic regulations to govern roads and highways, and plenty of auto regulations to govern the production of vehicles that use them. While some risk is certainly involved (e.g., a collision between two vehicles on a highway), nobody is calling for the abandonment of the highway system or an end to the production of autos.
Category 1 – Consent/Contract Use Based (Linked/Readily Linkable Data)
This category involves using personal data (i) within the scope of consent from data subjects, expressly limited to what is specifically and unambiguously described at the time of consent, or (ii) as necessary for the performance of a contract. This includes personal data that is directly attributed to a data subject (“Linked Data”) and data that is easily linked to a data subject (“Readily Linkable Data”).
Category 2 – Internal Use – Not Authorized by Consent or Not Necessary for Contract (Controlled Linkable Data)
Companies have traditionally applied broad interpretations of consent and contractual duties, but this is no longer lawful under the GDPR’s new restrictions (and trying to satisfy the GDPR by obtaining additional consent is repetitive and impractical). This category of data use involves continued processing of data for secondary purposes by the original data controller. When a data controller uses data collected for a primary purpose for any reason outside the original scope of consent or contractual purpose, that use is a secondary purpose and requires a separate legal basis to be lawful.
Controlled Linkable Data enables secondary data uses by satisfying alternate legal basis requirements under the GDPR by enforcing dynamic pseudonymisation and data protection by default to control the linkability of data.
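One way to picture "controlling linkability" for an internal secondary use is to replace direct identifiers with random tokens minted for that specific use, and to keep the token-to-identity table apart from the working data, in the spirit of the Article 4(5) condition that the "additional information" be kept separately. The Python sketch below assumes a single in-memory vault and hypothetical names (`PseudonymVault`, `pseudonymise`, `relink`); it illustrates the general idea only and is not Anonos's Controlled Linkable Data implementation.

```python
import secrets


class PseudonymVault:
    """Hypothetical vault holding the token-to-identity table apart from the working data."""

    def __init__(self) -> None:
        # token -> (identity, purpose the token was issued for)
        self._table: dict[str, tuple[str, str]] = {}

    def pseudonymise(self, records: list[dict], purpose: str) -> list[dict]:
        """Copy records, replacing the direct identifier 'name' with a random token.

        A fresh token is minted for every record on every call, so tokens issued
        for one purpose share nothing with tokens issued for another.
        """
        released = []
        for record in records:
            token = secrets.token_hex(8)
            while token in self._table:  # guard against the (very unlikely) reuse
                token = secrets.token_hex(8)
            self._table[token] = (record["name"], purpose)
            row = {key: value for key, value in record.items() if key != "name"}
            row["token"] = token
            released.append(row)
        return released

    def relink(self, token: str, purpose: str) -> str:
        """'Dial up' linkability: reveal the identity only for the issuing purpose."""
        identity, issued_for = self._table[token]
        if purpose != issued_for:
            raise PermissionError(f"re-linking not authorized for purpose {purpose!r}")
        return identity


if __name__ == "__main__":
    vault = PseudonymVault()
    rows = [{"name": "Jane Roe", "zip": "20500", "spend": 120}]
    churn_view = vault.pseudonymise(rows, purpose="churn-analysis")
    print(churn_view)  # zip and spend plus a random token; no name
    print(vault.relink(churn_view[0]["token"], "churn-analysis"))
```

Because a fresh token is minted for every record, even two rows about the same person cannot be grouped in the released view. A variant that reuses one token per person within a single use would allow that grouping while still preventing joins across uses; which trade-off fits depends on the use case.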
Category 3 – External Use – Sharing, Analytics, AI (Controlled Linkable Data)
This category involves sharing data (i) for primary purposes with co-data controllers or data processors that are not in a position to seamlessly enforce security, processing and contractual requirements, including expanded data subject rights; and (ii) for secondary purposes such as analytics and AI. For instance, any company outsourcing marketing and customer analytics to a third party is affected and can be held liable and subject to penalties for GDPR non-compliance. Earlier privacy and security technologies used for data sharing no longer satisfy GDPR requirements for lawful use.
Controlled Linkable Data supports GDPR requirements for lawful and safe data sharing by controlling the linkability of data through technologically enforced dynamic pseudonymisation and data protection by default.
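For sharing with third parties, one common building block (a swapped-in technique, not the author's method) is a keyed pseudonym: an HMAC of the identifier under a secret key that the controller holds and that differs for each recipient. Each recipient can still join its own records about the same person, but two recipients cannot join their data sets on the token, and neither can reverse it without the key. The recipient names, the `email` field and the in-memory key table below are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-recipient secrets, held only by the data controller.
RECIPIENT_KEYS: dict[str, bytes] = {
    "analytics-vendor": secrets.token_bytes(32),
    "marketing-agency": secrets.token_bytes(32),
}


def pseudonym_for(recipient: str, identifier: str) -> str:
    """Keyed pseudonym (HMAC-SHA256): stable for one recipient, different across recipients."""
    key = RECIPIENT_KEYS[recipient]
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def share(recipient: str, records: list[dict]) -> list[dict]:
    """Replace the direct identifier 'email' with a recipient-specific pseudonym."""
    shared = []
    for record in records:
        row = {k: v for k, v in record.items() if k != "email"}
        row["pid"] = pseudonym_for(recipient, record["email"])
        shared.append(row)
    return shared


if __name__ == "__main__":
    customers = [{"email": "jane@example.com", "basket": 42.0}]
    to_vendor = share("analytics-vendor", customers)
    to_agency = share("marketing-agency", customers)
    # The two recipients receive different pids for the same customer.
    print(to_vendor[0]["pid"] != to_agency[0]["pid"])
```

Within a single recipient the pid is still persistent, so on its own this does not address the linkage risk described in footnote 1; it limits linkage across recipients.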
Category 4 – Generalized Statistics (Unlinkable Data)
Privacy technologies developed prior to the GDPR were designed to protect predetermined isolated data sets and support generalized statistics, but in this changing regulatory landscape, they fail to comply with new GDPR standards for modern digital processing.
Combining and analyzing multiple data sets, inserting unstructured data and adding Linkable Data into data sets are at the core of data processing in the new digital economy, allowing companies to personalize and customize offerings, perform analytics, artificial intelligence and machine learning, and share data with third parties. Legacy privacy and generalized statistics technologies that claim to support combining or re-linking protected data sets and data sources for secondary purposes were not designed to support these uses in a lawful, GDPR-compliant manner. Doing so requires GDPR-compliant pseudonymisation and fine-grained control over data on a per-use basis, which technologies developed before the GDPR were not architected to provide.
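For comparison, the generalized statistics of Category 4 release only aggregates, never record-level data. A minimal Python sketch of one common way to produce them counts records per group and suppresses small groups that could single out individuals; the threshold of 5 and the function name are illustrative assumptions, not values taken from the GDPR.

```python
from collections import Counter

MIN_CELL_SIZE = 5  # illustrative suppression threshold, not a GDPR figure


def generalized_counts(records: list[dict], group_by: str) -> dict[str, int]:
    """Count records per group, dropping groups too small to publish safely."""
    counts = Counter(record[group_by] for record in records)
    return {group: n for group, n in counts.items() if n >= MIN_CELL_SIZE}


if __name__ == "__main__":
    rows = [{"region": "north"}] * 7 + [{"region": "south"}] * 2
    print(generalized_counts(rows, "region"))  # {'north': 7}; 'south' is suppressed
```

Output like this is unlinkable by design, which is why it cannot support record-level uses such as personalization or combining data sets.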
Organizations across the world are looking for a way to continue using their data when the GDPR comes into full force. The Controlled Linkable Data solution extends beyond GDPR compliance to enable controls necessary for secondary uses of data underlying the new global digital economy.
[1] New pseudonymisation requirements are set forth in GDPR Recitals 26, 28, 29, 75, 78, 85 and 156 and Articles 4, 25, 32 and 89. If a vendor claims to “pseudonymise” data to comply with the GDPR, it is important to verify whether it uses static pseudonymous tokens or dynamically changing ones. In the author’s view, dynamically changing pseudonymous tokens can satisfy the state of the art GDPR requirement that the information value of data be separated from the ability to attribute data back to individuals, including via the “Mosaic Effect.” GDPR Article 4(5) defines pseudonymisation as “the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.” Traditional approaches to pseudonymisation use a persistent, or static, pseudonymous token to replace each data element. As a simplified example, the zip code value of 20500 in a database would be replaced with a static pseudonym (or token value) of 6%3a8, and this same pseudonym would replace every occurrence of zip code 20500. Due to advances in technology and threat-actor sophistication, persistent (static) pseudonyms can be readily linked back to individuals via the “Mosaic Effect,” without any access to the keys that reveal their underlying values, contrary to the restrictions stated in Article 4(5). Thus, in the author’s opinion, persistent (static) pseudonyms fail to comply with new GDPR requirements to separate data from the means of attributing it back to individuals. In contrast, dynamically changing pseudonymous tokens separate the information value of personal data from the means of attributing the data back to individual data subjects.
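The zip code example in this note can be made concrete. In the Python sketch below, static tokenization maps each distinct value to one fixed random token, so every occurrence of 20500 receives the same token and rows sharing it stay linkable; the dynamic variant mints a new token for each occurrence and keeps the token-to-value table separately. The function names are hypothetical, and this illustrates the concept only, not any vendor's product.

```python
import secrets


def static_tokenize(values: list[str]) -> tuple[list[str], dict[str, str]]:
    """One persistent token per distinct value (lookup maps value -> token)."""
    lookup: dict[str, str] = {}
    tokens = []
    for value in values:
        if value not in lookup:
            lookup[value] = secrets.token_hex(4)
        tokens.append(lookup[value])
    return tokens, lookup


def dynamic_tokenize(values: list[str]) -> tuple[list[str], dict[str, str]]:
    """A fresh token for every occurrence (lookup maps token -> value, kept separately)."""
    lookup: dict[str, str] = {}
    tokens = []
    for value in values:
        token = secrets.token_hex(4)
        while token in lookup:  # never reuse a token, even by chance
            token = secrets.token_hex(4)
        lookup[token] = value
        tokens.append(token)
    return tokens, lookup


if __name__ == "__main__":
    zips = ["20500", "10001", "20500"]
    static_tokens, _ = static_tokenize(zips)
    dynamic_tokens, _ = dynamic_tokenize(zips)
    print(static_tokens[0] == static_tokens[2])    # True: both 20500 rows share a token
    print(dynamic_tokens[0] == dynamic_tokens[2])  # False: no shared token to link on
```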
An example of the “Mosaic Effect” is available at http://dataprivacylab.org/projects/identifiability/paper1.pdf, where it is explained that 5-digit zip code, gender and date of birth together are likely to uniquely identify about 87% of the U.S. population. Accordingly, if three seemingly “anonymous” data sets that share persistent (static) pseudonyms, each containing one of these attributes for U.S. residents, are combined, up to 87% of the U.S. population could be identified by name.
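The linkage described in this note can be reproduced on toy data. In the Python sketch below (all records and names invented), three data sets that are each harmless on their own reuse the same persistent pseudonym; joining them on it rebuilds a zip code, gender and date-of-birth profile, which is then matched against a named public list. Only unique matches count as re-identifications. If each data set used its own dynamically changing tokens, the first join step would have no shared key to work with.

```python
from collections import Counter

# Three invented data sets, each "anonymous" on its own, that reuse the same
# persistent (static) pseudonym for each person.
zip_by_pid = {"p-001": "20500", "p-002": "20500"}
gender_by_pid = {"p-001": "F", "p-002": "M"}
dob_by_pid = {"p-001": "1961-07-04", "p-002": "1980-01-15"}

# Invented public list that includes names (for example, a voter roll).
voters = [
    ("Jane Roe", "20500", "F", "1961-07-04"),
    ("John Doe", "20500", "M", "1980-01-15"),
    ("Jim Poe", "20500", "M", "1980-01-15"),
]


def mosaic_reidentify() -> dict[str, str]:
    """Join the three sets on the shared pseudonym, then match the combined
    (zip, gender, dob) profile against the named list; keep unique matches only."""
    profiles = {
        pid: (zip_by_pid[pid], gender_by_pid[pid], dob_by_pid[pid])
        for pid in zip_by_pid
    }
    matches = Counter((z, g, d) for _, z, g, d in voters)
    names = {(z, g, d): name for name, z, g, d in voters}
    return {pid: names[p] for pid, p in profiles.items() if matches[p] == 1}


if __name__ == "__main__":
    print(mosaic_reidentify())  # {'p-001': 'Jane Roe'}; p-002 matches two people
```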
[2] Data Protection by Default is required under GDPR Recitals 78 and 108 and Articles 25 and 47. Article 25 obligates data controllers to “implement appropriate technical and organisational measures for ensuring that, by default, only personal data which are necessary for each specific purpose of the processing are processed. That obligation applies to the amount of personal data collected, the extent of their processing, the period of their storage and their accessibility. In particular, such measures shall ensure that by default personal data are not made accessible without the individual’s intervention to an indefinite number of natural persons.” Thus, in the author’s opinion, Data Protection by Default requires real-time, use-case-specific, fine-grained control over the use of personal data. Be wary of vendors who highlight adherence to “Privacy by Design” principles but do not also state that they comply with “Data Protection by Default” requirements. The two are not one and the same: the GDPR mandates the strictest implementation of Privacy by Design, which is Data Protection by Default.
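In engineering terms, the "by default" obligation quoted above is often approached as deny-by-default, purpose-scoped access. A minimal Python sketch, assuming a hypothetical static policy table that maps each processing purpose to the fields it needs; it covers only the amount of data exposed per use, whereas Article 25 also names the extent of processing, storage period and accessibility.

```python
# Hypothetical policy: the only fields each processing purpose may see.
# Any field or purpose not listed here is withheld by default.
ALLOWED_FIELDS: dict[str, frozenset[str]] = {
    "shipping": frozenset({"name", "street", "zip"}),
    "sales-analytics": frozenset({"zip", "basket_total"}),
}


def view_for_purpose(record: dict, purpose: str) -> dict:
    """Return only the fields necessary for the stated purpose (deny by default)."""
    allowed = ALLOWED_FIELDS.get(purpose, frozenset())
    return {field: value for field, value in record.items() if field in allowed}


if __name__ == "__main__":
    customer = {
        "name": "Jane Roe",
        "street": "1 Example Street",
        "zip": "20500",
        "basket_total": 42.0,
        "dob": "1961-07-04",
    }
    print(view_for_purpose(customer, "sales-analytics"))  # {'zip': '20500', 'basket_total': 42.0}
    print(view_for_purpose(customer, "unknown-purpose"))  # {}
```

An unknown purpose yields an empty view rather than the full record, which is the behaviour "by default" points toward.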
[3] While “consent” under GDPR Article 6(1)(a) remains a lawful basis for processing personal data, the definition of consent has been significantly restricted. GDPR Recital 32 and Article 4(11) mandate that consent must be a “freely given, specific, informed and unambiguous indication of the data subject’s agreement to the processing of personal data relating to him or her.” These heightened requirements for consent under the GDPR shift the risk from individual data subjects to data controllers and processors. Prior to the GDPR, risks associated with not fully comprehending broad grants of consent were borne by individual data subjects. Under the GDPR, broad consent no longer provides a sufficient legal basis for processing personal data.
[4] GDPR Recitals 78 and 83 and Articles 25 and 32 require data controllers (and, under Article 32, processors) to take the state of the art into account when implementing data protection processing controls and security technologies.
[5] Controlled Linkable Data was presented at an International Association of Privacy Professionals (IAPP) program entitled General Data Protection Regulation (GDPR) Big Data Analytics, featuring Gwendal Le Grand, Director of Technology and Innovation at the French Data Protection Authority (the CNIL); Mike Hintze, Partner at Hintze Law and former Chief Privacy Counsel and Assistant General Counsel at Microsoft; and Gary LaFever, CEO at Anonos and former Partner at Hogan Lovells (see https://www.anonos.com/gdpr-big-data-iapp-industry-faqs). It is also explained in a White Paper co-authored by Messrs. Hintze and LaFever entitled Meeting Upcoming GDPR Requirements While Maximizing the Full Value of Data Analytics.
This article originally appeared in CPO Magazine.
Pre-GDPR Pseudonymization versus GDPR Compliant Pseudonymization
How GDPR compliant pseudonymization requirements have evolved from prior standards: