Refining Data Uses by Technologically Enforcing Data Protection Policies
A recent article in CPO Magazine entitled Challenges of Big Data Privacy for Today’s Innovative Enterprises notes that business analysts and venture capitalists refer to data as “the new oil” of the digital economy. The article concludes with the following paragraph:
Going forward, the challenge will be to protect user data and privacy, while at the same time, using all that data as part of innovative new business models. With the right data, innovative enterprises can create hyper-targeted advertising campaigns, improve the efficiency of just about any business process, and ensure that decision-makers have as much data at their fingertips as possible to make the best possible decision. The trick, though, will be to protect customer data in a way that won’t stop “the new oil” from coursing the enterprise on a real-time basis.
Further underscoring the importance of data is the recent Economist article entitled The World’s Most Valuable Resource Is No Longer Oil, But Data, which highlights that the five most valuable listed firms in the world – Alphabet (Google’s parent company), Amazon, Apple, Facebook and Microsoft – “deal in data, the oil of the digital era.” Data used by these and other companies is generally collected for primary purposes (its original uses) rather than for secondary uses. However, it is secondary uses, such as business (non-scientific) re-processing of personal data for analytics, artificial intelligence and machine learning (what we’ll refer to as “Data Analytics”), that often generate the highest returns.
The value of Data Analytics is created by “refineries” that enable collection, transformation, combination, integration, sharing, and use of data to create new products and services. These “refineries” power the digital revolution. Just as oil refineries must process crude oil differently for different applications, these data “refineries” must process raw data differently for different use cases. However, all companies risk losing access to, and facing severely restricted use of, the “new oil” that data represents, due to upcoming dramatic changes in global data protection obligations under the new EU General Data Protection Regulation (GDPR), which goes into effect next year.
The article entitled GDPR Requires Controlled Linkable Data to Comply With State of the Art and Proportionality Requirements (http://www.lexology.com/library/detail.aspx?g=f1177675-4fc0-478b-9ae0-5c1f540e91e7) highlights how the new state of the art in data protection – Controlled Linkable Data – can support functioning business models and technological practices that operate in legally compliant ways. To continue doing Data Analytics under the GDPR, companies must evaluate whether Controlled Linkable Data can enable them to fully realise the immense potential of Data Analytics by technologically enforcing data protection policies that reflect context-based data stewardship obligations for internal data use and external data sharing.
The good news is that two unrelated groups, each working independently on opposite sides of the Atlantic Ocean, have come up with complementary approaches to Controlled Linkable Data that resolve the privacy-utility dilemma of Data Analytics while embracing the changes brought by the GDPR. The two groups became aware of one another when papers they had each published on the Social Science Research Network (SSRN) were ranked the #2 and #3 most downloaded SSRN papers in the same week, and they discovered that they shared common objectives and approaches. Both groups recognise that the immense potential of Data Analytics cannot be realised unless smart measures are put in place for data handling and repurposing that allow maximum value extraction. Yet these measures must also encompass legally adequate security and “Data Protection by Design and Default”, a principle newly introduced and made mandatory by the GDPR. Both groups conclude that the best “answer” for this new era of analytics is a dynamic one, which encompasses data selectiveness and controlled linkability – Controlled Linkable Data – depending on the context (in other words, ensuring restricted, secure access to only the information that is actually needed for particular Data Analytics).
The need for this new approach is not merely a “research” pipedream of both groups. With less than a year to go until the GDPR comes into effect, it is a call for companies to take action to find smart solutions, or otherwise risk severe consequences. Many companies mistakenly believe that “complying” with the GDPR means “business as usual” when it comes to Data Analytics and other data processing operations. However, Helen Dixon (the Irish Data Protection Commissioner) recently issued a stark warning that fines, liabilities, and negative PR await companies that ignore the call to action ushered in by the GDPR (http://www.independent.ie/datasec/new-data-rules-mean-it-cant-be-business-as-usual-helen-dixon-35585883.html). In other words, companies doing “business as usual” when it comes to Data Analytics after 25 May 2018 expose themselves, their corporate customers, and other technology partners to fines as high as 4% of total worldwide annual turnover, group legal actions, negative publicity, and possible additional liabilities under the GDPR. Moreover, the GDPR can still apply even if Data Analytics occurs outside of the EU or does not generate revenues. This is a truly global challenge.
The incoming changes require new technologies and organisational practices, combined with a different type of corporate ethos, from those who want to perform Data Analytics under the GDPR. Companies must consider whether any secondary use of data complies with the legal principle of ‘purpose limitation’, which prevents arbitrary reuse; that is, whether data initially used in one context may be considered adequate, relevant, and proportionate to be reused in another context. Moreover, while consent remains a lawful basis under the GDPR to justify personal data processing, the definition of consent is significantly restricted. Under the GDPR, consent means a “freely given, specific, informed and unambiguous indication of the data subject’s wishes” signifying agreement to the processing of personal data relating to him or her. These requirements are unlikely to be satisfied if there is ambiguity and uncertainty around secondary processing purposes, as is often the case with Data Analytics, where previously collected datasets are repurposed to unlock as-yet-unrevealed value.
In other words, if they want to carry on their existing business practices next year, organisations engaged in Data Analytics must evaluate whether Controlled Linkable Data can enable them to rely on alternative legal bases to justify personal data processing where obtaining consent for secondary processing would not be possible.
Of course, a company can choose to comply with the GDPR by abstaining from Data Analytics using EU personal data. While this may be the easiest way to “comply” with the GDPR when it comes to Data Analytics, smart solutions like Controlled Linkable Data offer an alternative for a company that wants to keep the competitive benefits of Data Analytics. In particular, Controlled Linkable Data involves consideration of alternative (non-consent) legal bases for Data Analytics under the GDPR, notably the “legitimate interest” legal basis, as well as personal data “anonymisation” to satisfy new data protection compliance requirements. However, both routes oblige companies to take steps that may be new to them to protect the rights of data subjects, based on the uses of Data Analytics they are making. In particular, the application of anonymisation techniques is rarely sufficient by itself to guarantee that data protection law no longer applies. A comparison can be drawn with satisfying the conditions of the “legitimate interests” legal basis, under which the processing of personal data may be justified by the interests of the data controller. In both cases, businesses must take other appropriate safeguards to ensure that the fundamental rights of data subjects are upheld in those situations where the law regards those rights as taking primacy over the data controller’s interests.
To these ends, both groups have reached similar conclusions on the types of steps that must be taken to achieve the new state of the art in data protection – Controlled Linkable Data. Mses. Stalla-Bourdillon and Knight conclude in their paper as follows:
- Anonymous Data – A dynamic and contextualised approach to anonymising Data Analytics is compatible with the new data protection regime embodied in the GDPR. Excluding anonymised Data Analytics from the scope of GDPR jurisdiction is less problematic than anticipated because the line between “anonymous” data (not subject to GDPR restrictions) and “personal data” (subject to GDPR restrictions) remains fluid: data that has been subject to anonymisation techniques can become personal data depending on the facts and circumstances of its use. There are three criteria for assessing the efficacy of anonymisation techniques: whether purportedly “anonymous” Data Analytics can be used to (1) single out, (2) link to, or (3) infer new information about the data subject, where these methods are reasonably likely to be used to identify them. Where full mitigation of these three types of re-identification risk is impossible, in particular through anonymisation techniques alone, this does not mean that legally effective anonymisation cannot be achieved; instead, a data controller must conduct a risk analysis to verify that the risk of re-identification is sufficiently low, and additional safeguards and techniques may be required. Importantly, the interplay between the different components of data environments (the data, the infrastructure, and the agents) must be assessed to control risks associated with the linkability of individualised data records across multiple datasets when applying anonymisation techniques to Data Analytics.
- Pseudonymous Data – The interplay between the different components of data environments must also be assessed to control risks associated with the linkability of individualised data records within datasets when pseudonymising Data Analytics. 
- De-Identified Data – The GDPR accommodates a dual characterisation of Data Analytics depending upon the perspective of the organisation that holds a dataset, i.e. that of the initial data controller, which holds the dataset in its raw form, or that of a subsequent recipient, which holds it in a transformed state. Which characterisation applies depends upon whether the initial data controller has put in place technical and organisational measures to seclude the initial raw dataset once it has been transformed into a protected dataset, and whether the subsequent recipient has access to other datasets.
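The first of the three re-identification criteria listed above, singling out, can be made concrete with a simple count. The sketch below is illustrative only, assuming a toy table and a hypothetical choice of quasi-identifiers (`zip`, `birth_year`, `sex`); the function names are likewise hypothetical. It uses a k-anonymity-style equivalence-class count, a well-known technique swapped in here for illustration rather than a method the authors describe, and it measures only singling-out risk, not linkability or inference.

```python
from collections import Counter

# Hypothetical quasi-identifiers: attributes that are not names but could be
# combined with outside knowledge to pick out one person.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

# Toy records for illustration only.
RECORDS = [
    {"zip": "SO17", "birth_year": 1981, "sex": "F", "diagnosis": "asthma"},
    {"zip": "SO17", "birth_year": 1984, "sex": "F", "diagnosis": "flu"},
    {"zip": "SO17", "birth_year": 1972, "sex": "M", "diagnosis": "asthma"},
    {"zip": "SO17", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]


def class_key(record, quasi_identifiers):
    """The combination of quasi-identifier values that describes a record."""
    return tuple(record[q] for q in quasi_identifiers)


def singled_out_records(records, quasi_identifiers, k=2):
    """Records whose quasi-identifier combination is shared by fewer than k records.

    Anyone who knows those attribute values about a person can single out
    such a record, even though no name or direct identifier is present.
    """
    sizes = Counter(class_key(r, quasi_identifiers) for r in records)
    return [r for r in records if sizes[class_key(r, quasi_identifiers)] < k]


def generalise_birth_year(records, width=10):
    """Return copies of the records with exact birth years replaced by bands (1981 -> '1980-1989')."""
    generalised = []
    for r in records:
        start = (r["birth_year"] // width) * width
        generalised.append({**r, "birth_year": f"{start}-{start + width - 1}"})
    return generalised


# 4: every raw record is unique on (zip, birth_year, sex)
print(len(singled_out_records(RECORDS, QUASI_IDENTIFIERS)))
# 0: after banding, each combination is shared by two records
print(len(singled_out_records(generalise_birth_year(RECORDS), QUASI_IDENTIFIERS)))
```

Passing this check addresses only the first criterion. If both men in the 1970-1979 band had shared the same diagnosis, that diagnosis could be inferred without singling either of them out, and the `zip` column could still support linking to other datasets. This is why the authors treat such measures as one input to a contextual risk analysis of the whole data environment rather than as proof of anonymity.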
BigPrivacy dynamic de-identification and anonymisation technology, developed by Anonos Inc., Messrs. LaFever and Myerson’s company, supports Controlled Linkable Data and, with it, reliance on legitimate interest as a legal basis for Data Analytics. Anonos BigPrivacy leverages patented dynamically-changing pseudonyms to decouple data from identifying elements that could otherwise be used to attribute or link data back to individuals. BigPrivacy enables technical enforcement of granular, context-sensitive control over data, so that only the data necessary at any given time (and only as required) to support each authorised use is made available, using keys. In technical and organisational terms, this means that measures can be automatically enforced that give data controllers and/or processors control over access to, and use of, the keys that govern the linking of data for Data Analytics. In legal terms, the BigPrivacy solution can satisfy GDPR requirements under Article 4(5) (meeting the new definition of data “pseudonymisation”), Article 11 (meeting the exemption conditions for “processing that does not require identification”), and Article 25 (requiring “Data Protection by Design and Default”). In particular, the following GDPR-aligned data states are possible:
- Anonymous Data – When identifying keys are held by data subjects, deleted, or otherwise controlled, Data Analytics using valuable, non-identifying, non-personally identifying data can be processed outside the scope of GDPR jurisdiction.
- Pseudonymous Data – When identifying keys are held by data controllers/processors, the information value of data can be separated via technical and organisational measures from the means of attributing data to individuals (to help satisfy GDPR requirements for Data Protection by Default and the legitimate interest legal basis for Data Analytics), while still allowing permitted data linkability within and across databases to support Data Analytics.
- De-Identified Data – When identifying keys are not held by a data controller, so that the data controller is “not in a position to identify the data subject” (Article 11), the data controller may be relieved of the obligation to ensure that data subject rights under GDPR Articles 15-22 are fulfilled, enabling greater use of Data Analytics in a privacy-respectful manner.
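The key-custody states listed above can be illustrated with keyed pseudonyms. The sketch below is a minimal illustration, assuming HMAC-SHA256 with one secret key per processing context; the class `PseudonymService`, its methods, and the context names are hypothetical. It is a generic keyed-hashing technique, not Anonos’s patented BigPrivacy method, which the article does not describe in code. Within one context the same identifier always yields the same pseudonym, which permits linking. Across contexts the pseudonyms differ, so only a holder of both keys can connect them. The keys play the role of the separately kept “additional information” in Article 4(5). Rotating the context name (for example, per quarter) makes pseudonyms change over time, a rough analogue of dynamically changing pseudonyms.

```python
import hashlib
import hmac
import secrets


class PseudonymService:
    """Hypothetical keyed-pseudonym service holding one secret key per processing context."""

    def __init__(self):
        # context name -> secret key; this is the "additional information"
        # that must be stored separately from the pseudonymised data
        self._keys = {}

    def _key(self, context):
        if context not in self._keys:
            self._keys[context] = secrets.token_bytes(32)
        return self._keys[context]

    def holds_key(self, context):
        return context in self._keys

    def pseudonym(self, identifier, context):
        """Same identifier + same context -> same pseudonym; different contexts -> unrelated pseudonyms."""
        digest = hmac.new(self._key(context), identifier.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:16]

    def relink(self, pseudonym, context, candidate_ids):
        """Only the key holder can re-link, by recomputing pseudonyms for known identifiers."""
        if not self.holds_key(context):
            raise KeyError(f"no key held for context {context!r}")
        for cid in candidate_ids:
            if self.pseudonym(cid, context) == pseudonym:
                return cid
        return None

    def destroy_key(self, context):
        """Deleting the key removes this holder's ability to re-link that context."""
        self._keys.pop(context, None)


svc = PseudonymService()
p = svc.pseudonym("alice@example.com", "fraud-analytics-2017Q3")
print(p == svc.pseudonym("alice@example.com", "fraud-analytics-2017Q3"))  # True: linkable within one context
print(svc.relink(p, "fraud-analytics-2017Q3", ["bob@example.com", "alice@example.com"]))  # alice@example.com
svc.destroy_key("fraud-analytics-2017Q3")
print(svc.holds_key("fraud-analytics-2017Q3"))  # False: this holder can no longer re-link
```

Re-linking recomputes pseudonyms over candidate identifiers because HMAC is one-way; a real system would more likely keep a separately secured lookup table. An unkeyed hash would not be enough, since anyone could recompute it from a guessed identifier. Whether data processed this way is anonymous, pseudonymous, or de-identified in the GDPR sense remains a contextual legal assessment that code alone cannot make.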
In conclusion, data protection law insists on the protection of rights and values in personal data even in Data Analytics. Yet the GDPR goes beyond current law in demanding higher standards from companies doing Data Analytics. It also places the responsibility squarely on data controllers and processors to demonstrate that their processing operations are carried out fairly and lawfully, that they incorporate Data Protection by Design and Default, and that they meet elevated consent requirements. Those engaged in Data Analytics must treat these obligations as a critical pre-condition for legal compliance as part of their overall corporate risk management strategy, or face the prospect of significant liabilities starting next year.
Research by Mses. Stalla-Bourdillon and Knight, and technology developed by Messrs. LaFever and Myerson and colleagues, illustrate that those carrying out Data Analytics after 25 May 2018 can continue to innovate and reap the rewards, while at the same time safeguarding individuals’ rights under the new rules by leveraging Controlled Linkable Data. Traditional approaches to obtaining consent and imposing access control (like role-based access control or access control lists) are unmanageable when applied to Data Analytics, yet this need not be an insuperable barrier to extracting maximum data value. Accountable data processing based on Controlled Linkable Data, implemented and enforced through granular access control, is a key mechanism for upholding core data protection principles and implementing required privacy and security safeguards, as well as for keeping such processes under review.
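To make the contrast with role-based access control concrete, here is a minimal sketch of purpose-based, field-level release, in which what is disclosed depends on the stated purpose of the analysis rather than on who is asking. The policy table, purpose names, field names, and the function `release_view` are all hypothetical and chosen for illustration; this is not the authors’ or Anonos’s implementation, and a real deployment would also need authentication, logging, and review of the policy itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurposePolicy:
    allowed_fields: frozenset  # attributes this purpose may see
    linkable: bool             # may the record-level pseudonym be released?


# Hypothetical policy table: each authorised purpose sees only what it needs.
POLICY = {
    "aggregate-trends": PurposePolicy(frozenset({"region", "monthly_spend"}), linkable=False),
    "churn-model": PurposePolicy(frozenset({"tenure_months", "monthly_spend"}), linkable=True),
}

# Toy record: the pseudonym replaces direct identifiers for analytics; "name"
# is kept here only to show that no purpose ever receives it.
SAMPLE = [
    {"pseudonym": "p-7f3a", "name": "Alice", "region": "South",
     "tenure_months": 14, "monthly_spend": 42.0},
]


def release_view(records, purpose):
    """Return only the fields that the stated purpose is authorised to see."""
    try:
        policy = POLICY[purpose]
    except KeyError:
        raise PermissionError(f"no policy authorises purpose {purpose!r}") from None
    fields = set(policy.allowed_fields)
    if policy.linkable:
        fields.add("pseudonym")  # permits linking records over time for this purpose only
    return [{k: v for k, v in r.items() if k in fields} for r in records]


print(release_view(SAMPLE, "aggregate-trends"))  # [{'region': 'South', 'monthly_spend': 42.0}]
print(release_view(SAMPLE, "churn-model"))  # [{'pseudonym': 'p-7f3a', 'tenure_months': 14, 'monthly_spend': 42.0}]
```

An unknown purpose is refused outright rather than falling back to a default role, which mirrors the idea of releasing only the information actually needed for a particular analysis.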
The two groups are based, respectively, at the University of Southampton, an institution with a strong reputation in the applied interdisciplinary research areas of Data Analytics and Web Science (the latter co-founded by Sir Tim Berners-Lee, a professor of computer science at the university), and at US-based Anonos Inc. Both groups promote the importance of cross-fertilisation of ideas between academia and industry. At Southampton, research work is being carried out by Dr Sophie Stalla-Bourdillon, Associate Professor of Information Technology at the University of Southampton (UK) and Director of the Institute for Law and the Web, and Alison Knight, Research Fellow in Law and member of the Web Science Institute at the University of Southampton. They are currently exploring data situation models that rely in part on anonymisation and pseudonymisation practices, and their implications for data protection obligations, under the UK-GCHQ Effective Data Anonymisation Techniques Research Project. They are also developing data sharing best practices for incubating services under the Data Pitch research project, which helps start-ups and SMEs innovate with data (https://datapitch.eu/), as part of the EU’s Horizon 2020 research and innovation programme. Other notable research projects they are or have been involved in include the Horizon 2020 FutureTrust research project on interoperable eID and trust services, and the UK-EPSRC / US-Department of Homeland Security Super-Identity project creating new measures of identity that combine physical and cyber inputs. Mses. Stalla-Bourdillon and Knight recently published a paper entitled Anonymous Data V. Personal Data—A False Debate: An EU Perspective On Anonymization, Pseudonymization And Personal Data on the Social Science Research Network (SSRN), which is available at https://papers.ssrn.com/sol3/Papers.cfm?abstract_id=2927945.
The second group consists of 16-year business partners Gary LaFever and Ted Myerson, co-founders of Anonos Inc., the company they founded to leverage their experience in global data risk management. Their prior data risk management company, FTEN, was acquired by NASDAQ in 2010; the technology and intellectual property acquired by NASDAQ helps manage data risk at more than 80 financial markets powered by NASDAQ around the globe. Newly developed Anonos BigPrivacy technology uniquely de-risks data to enable controlled use of data while adhering to stringent privacy and security requirements. Mr. LaFever recently co-authored a paper on Controlled Linkable Data with Mike Hintze, former Chief Privacy Counsel at Microsoft and now partner at Hintze Law, entitled Meeting Upcoming GDPR Requirements While Maximizing the Full Value of Data Analytics, available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2927540. Messrs. LaFever and Myerson, together with advisors Jonas Almeida, Ph.D., Sean Clouston, Ph.D., and Sandeep Pulim, MD, also recently published a paper entitled Big Data in Healthcare and Life Sciences Anonos BigPrivacy Technology Briefing, available on SSRN at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2941953.
 Article 6(1) GDPR states that processing of personal data is lawful only if and to the extent that at least one of six legal bases applies, the first of which, and the most popular option, is the consent of the data subject.
 Under Article 6(1)(f) GDPR, an alternate legal basis may be satisfied where the “processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party, except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject which require protection of personal data, in particular where the data subject is a child”.
 Article 4(5) GDPR formally defines “pseudonymisation” as “the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person”.
 Respectively, the right of access to personal data by the data subject; the right to rectification; the right to erasure (the so-called new ‘right to be forgotten’); the right to restriction of processing; the notification obligation regarding rectification or erasure of personal data or restriction of processing; the right to data portability; and, the right to object to automated decision-making.
This article originally appeared in Lexology. All trademarks are the property of their respective owners. All rights reserved by the respective owners.
Pre-GDPR Pseudonymization versus GDPR Compliant Pseudonymization
How GDPR compliant pseudonymization requirements have evolved from prior standards: