Enabling Cloud Use Cases: Modernizing Your De-identification Toolbox

"Integrating the DevOps 'Shift Left' method to expedite product delivery with a privacy focus enables businesses to implement advanced privacy controls from the start. Anonos has an innovative and patented Variant Twin technology solution that brings a 'Shift Left' approach to privacy in DataOps, proactively securing data wherever it travels, which is essential for controlling data privacy risk in cloud migration and data sharing projects."
Stewart Bond, Research VP at IDC, specializing in data intelligence and integration software research, comments on Anonos joining Infromatica’s ISV Partner Program.

Summary

As the digital landscape evolves, many organizations grapple with outdated de-identification techniques while pursuing advanced cloud-based strategies.

Traditional single-step anonymization methods, developed when processing occurred primarily within data centers,1 have become inadequate for protecting sensitive data due to the challenges of multi-party data sharing and the risks of internal breaches and external threats.

To effectively safeguard data throughout its lifecycle, organizations must embrace innovative, comprehensive data protection strategies, such as Anonos' award-winning Data Embassy enterprise de-identification toolbox, which embeds controls directly into the data to protect it wherever it travels.

This cutting-edge approach offers superior data protection, surpassing conventional de-identification techniques while mitigating internal and external risks and addressing the limitations of traditional methods.
It’s Time to Modernize Your De-Identification Toolbox to Enable Desired Use Cases in the Cloud

Introduction

Data has become an increasingly vital asset. Organizations must adjust their security and privacy measures to protect sensitive information while supporting digital transformation and innovative growth.

They can do this by moving beyond yesterday’s traditional de-identification techniques, which fail to support modern data use involving (cross-border) data sharing, leveraging the cloud, training machine learning models, implementing data mesh, etc.

Traditional de-identification technologies, including tokenization and masking, while once the primary line of defense against unauthorized disclosure of identity, are now insufficient due to the limitations of traditional anonymization, the complexities of multi-party data sharing, and the growing occurrence of internal breaches and ever-more sophisticated external threats.

Shortcomings of Single-Step Anonymization

Historically, organizations have used single-step anonymization to safeguard sensitive data. By removing primary identifiers (which directly identify individuals) from datasets, the hope was that it would be impossible to link data back to specific individuals.

However, even advanced governance systems incorporating artificial intelligence (such as Okera's data governance platform, recently acquired by Databricks) cannot prevent the re-identification of seemingly anonymized data that only protects primary identifiers, which is necessary for compliance with data protection regulations like the GDPR.

Even under US law, the effectiveness of anonymization has been challenged, as researchers have demonstrated on numerous occasions that purportedly anonymized data can be easily re-identified.

To address the limitations of single-step anonymization, organizations should consider adopting advanced data protection techniques such as data synthesis and GDPR-compliant Pseudonymisation and implementing "Shift Left Privacy" principles.

Shift Left Privacy is the process of engineering privacy and data protection into projects from the outset instead of applying traditional methods that treat privacy as something to be dealt with later.

Data synthesis and Pseudonymization, unlike single-step approaches to anonymization, enable organizations to preserve the utility of data for analytics and research purposes while safeguarding individual privacy during active processing by embedding controls into the data wherever it travels.

This ensures enhanced data protection that surpasses traditional single-step anonymization techniques. By embracing “Shift Left Privacy” principles, organizations can enjoy increased flexibility in data processing, reduced risks associated with potential data breaches or non-compliance, and improved adherence to data protection regulations like the GDPR.

Challenges of Multi-Party Data Sharing, Combining, and Computation

Multi-party data sharing, combining, and computation have become increasingly prevalent as organizations collaborate to derive insights and drive innovation.

Third-party "clean rooms" (secure environments where companies can share data without revealing sensitive information to each other) can improve the security of data processing but may not necessarily make it lawful under non-US laws like the GDPR. This complex data-sharing environment exposes organizations to new risks and challenges that exceed the scope and effectiveness of single-party controls.

One such challenge arises from the long-established practices of tokenization and data masking to protect data.

The degradation of utility of these techniques limit their practicality, even within a single organization.

Moreover, they do not provide adequate protection against unauthorized re-identification via the “Mosaic Effect” in the event of an external breach or when data is shared among parties.

The Mosaic Effect occurs when different datasets are combined, allowing for the re-identification of individuals whose data was originally tokenized or masked.

This risk increases when data is shared across multiple parties and combined with auxiliary datasets.
Challenges of Multi-Party Data Sharing, Combining, and Computation
Graphic is from "Worried about security? Beware the Mosaic Effect" at https://gcn.com/cybersecurity/2014/05/worried-about-security-beware-the-mosaic-effect/297335/
As data flows between parties and across public clouds, organizations must implement robust data protection measures that ensure compliance with regional data protection regulations, like the GDPR.

This again requires adopting state-of-the-art data protection approaches, such as data synthesis and GDPR-compliant Pseudonymization, to embed controls into the data and "Shift Left Privacy" principles to incorporate protections into all processing aspects from the beginning.

Growing Threat of Internal Breaches

Data breaches are no longer limited to external threats, as internal breaches involving employees and other individuals with authorized access to data have become increasingly prevalent. These breaches can occur for various reasons, including negligence, malicious intent, or accidental disclosure. Employees may misuse sensitive data for personal gain or inadvertently expose it through careless handling or sharing practices.

Organizations should consider implementing "Shift Left Privacy" principles, which involve engineering trust and privacy into data from the outset so that even authorized parties are limited to using data for permitted purposes, and are technologically prevented from doing anything else. This prevents malicious misuse, as well as careless handling.

Evolving External Attacks and Perimeter Bypass

Cybercriminals have become increasingly sophisticated in their methods of circumventing perimeter controls.

Techniques such as social engineering exploit human vulnerabilities to gain unauthorized access to sensitive data. Perimeter defenses are ill-equipped to prevent such attacks since they primarily target technology rather than human behavior.

To effectively combat external threats, organizations should yet again consider implementing "Shift Left Privacy" principles which protect data during use in ways not possible using perimeter controls alone.

If a cybercriminal were to gain access to data that has been protected data synthesis or with GDPR Pseudonymization, they would still not have access to information that could identify the individuals that it relates to.

A Modern Enterprise De-Identification Toolbox

No single technique can adequately address all aspects of data protection in use. Anonos Data Embassy enterprise platform stands out by offering a combination of numerous data protection techniques to digitize and enforce policies that protect sensitive data while it is being used, decrypted, and potentially exposed to misuse, particularly in untrusted environments.

This robust suite of data privacy, security, and enablement tools empowers legal and privacy professionals to set technical guardrails that ensure compliance with relevant regulations.

This, in turn, enables data users to access protected datasets more efficiently, accelerating projects from conception to fruition. Data Embassy software bridges business and technical users, allowing enterprises to operationalize their sensitive data assets with reduced risk and expand processing opportunities to maximize value.

The Data Embassy platform delivers a programming framework that balances data protection and utility by providing results equivalent to processing cleartext by converting sensitive data assets into new, protected outputs known as Variant Twins®.

A Variant Twin retains only the minimum required identifying information for a specific purpose while maintaining the same analytical value as the original source data. In most cases, no identifying information is needed to achieve this result.

Data Embassy's programming framework for generating Variant Twins is built on two core capabilities:

  • Policies: Legal and privacy experts establish data protection policies based on the organization's objectives for data usage and the applicable regulations. These policies are then digitized, creating technical guardrails to ensure the appropriate protections are applied to a given dataset based on data types, use cases, regulations, and users.

  • Transformers: Transformers automate the enforcement of digitized data privacy and security policies. They process data, apply the appropriate protections according to the established policies to ensure compliance, and permit the transformed source data to flow only to authorized destinations, with privacy and utility reporting throughout the process.
The software's user-friendly interface and streamlined workflows simplify setting up policies and transformers that generate lawful variations of sensitive data without borders.

The technical controls within Variant Twins can be adjusted based on use case, data type, jurisdiction, requirements, and restrictions.

Enable Cloud Use Cases: Conclusion

Through the implementation of resilient data protection solutions, guided by Shift Left Privacy principles, and the adoption of technologies like Anonos Data Embassy software, organizations can successfully safeguard sensitive data.

This approach instills trust and confidence in customers, partners, and other stakeholders and also enables more efficient use of data to propel digital transformation projects and accelerate innovation.
1. Masking and tokenization were conceived when data processing primarily occurred within data centers, secure enclaves, or controlled perimeters. These techniques aim to protect data by eliminating information that could enable reidentification. However, they struggle to balance privacy and utility, relying on de-identification "by subtraction." The increasing availability of auxiliary data sources for potential attackers exposes the limitations of these traditional de-identification techniques because they fail to deliver the privacy assurances offered by anonymization due to the considerable risk of reidentification via the Mosaic Effect. And, as organizations attempt to enhance the effectiveness of data protection, they inadvertently diminish the utility of the protected output. Synthetic Data represents a fundamentally different approach than techniques that try to anonymize data by removing information. Instead, in a multi-step process, artificial intelligence (AI) and machine learning (ML) algorithms are first trained to capture the statistical relationships in a data set as a mathematical model. In a separate step, the new model is then used to generate entirely new records that preserve the analytic utility of the data. Done correctly, no newly created synthetic record is mappable to an original record. The multi-step techniques used to generate Synthetic Data create born anonymous data instead of transforming cleartext data to become anonymous. This multi-step approach preserves accuracy and prevents the relinking of data to identity (see https://anonos.com/solutions/synthetic-data).