HITACHI BIGPRIVACY GDPR WEBINAR
JULY 2018

Hitachi BigPrivacy Webinar
Presentation Transcript
00:00 [Slide 1]
(Instrumental music playing)
00:07 [Slide 2]
Does your firm face one or all of these top Mission Critical Priorities?

  • Is data minimization killing off your Big Data projects?
  • How does your firm handle cross-border data consolidation?
  • Are you interested in outsourcing data processing to FinTechs?

Francois Zimmerman:
Hello, today we are going to take a look at the top data privacy and sovereignty challenges that we keep seeing in the banks and insurance companies that we do business with. Does your firm face any of the following challenges? Are your big data projects being canceled because you no longer have access to all the required data sets due to data minimization initiatives? Are you struggling to consolidate data across borders or between subdivisions due to privacy or data sovereignty? And do you want to outsource some data processing to FinTechs to bring rapid innovation to your business, and if so, how can you do that safely?
00:44 [Slide 3]
My name is Francois Zimmerman. I am the Global CTO for the Financial Services vertical at Hitachi Vantara. I am joined by Gary LaFever, CEO and Co-Founder of Anonos. Today we're going to look at how the GDPR concepts of Pseudonymisation and Data Protection by Design and by Default can be applied to address each of these three big challenges.
01:06 [Slide 4]
But first, let's take a look at the bigger picture. We are becoming increasingly reliant on decentralized and distributed activities like cloud processing, blockchain, advanced analytics, and open banking to generate value, to differentiate from competitors, and to create economies of scale. The banks and insurance companies that we work with are all pretty serious about running data-driven businesses. They believe that a core part of their mission is to provide new data-driven insights that will unlock new market opportunities and drive innovation. A lot of innovation functions depend on secondary uses of data. AI (Artificial Intelligence) is hungry for historical training data, analytics demands multi-dimensional access to big data, and IoT is all about correlating data across a variety of machine and human sources.
01:55 [Slide 5]
But these secondary data uses and decentralized distributed activities collide head-on with new centralized privacy and security requirements under the GDPR. The new reality is this: the capability of an organization to adequately protect the privacy and security of customer and partner data will have a direct impact on brand value and reputation. And it could be positive or negative depending on success or failure. So, managing conflicts between data value maximization and data privacy protection is both a challenge and an opportunity to establish trust.

We need data to innovate, we need to feed our AI analytics, and we need to keep those IoT projects running, but the GDPR creates a new balance between data utility and risk. The GDPR significantly increases customer consent requirements and mandates new obligations in order for secondary data processing to be lawful. We can no longer, for example, rely on a single tick box at the end of a 50-page agreement to unlock customer data for secondary processing. And if we need to rely on explicit consent for each new use case, if we need to go back to the data compliance team at every single step, then this will slow down development cycles and stifle innovation. So, there has to be a different way.

This challenge of balancing data protection and data value also extends beyond privacy into data sovereignty and cross-border data governance. The global banks and insurers that we do business with often have subsidiaries in places like China, Saudi Arabia, or Switzerland, or other countries with strong controls on what personal data can leave the country. These countries are either selling - effectively - Privacy as a Service, or they are trying to retain control over subpoena access to their citizens’ data. But there can still be a lawful basis for secondary data processing. Even in these countries, data sovereignty legislation is typically only concerned with personally identifiable data, so as long as you can keep identifiable data in the country and roll up only information from which identity has been separated, you are generally okay.
04:25 [Slide 6]
Now, we believe that if we're going to continue with secondary data processing in a way that meets our obligations for privacy and data sovereignty, then we need to change the end-to-end data pipeline. Statistical data processing doesn't have any use for personally identifiable data, and data compliance is all about finding a balance between risk and utility. If your business function has no need for identifying data, then you shouldn't be able to access it. In order to implement Data Protection by Design and by Default, we need to change the way we process big data in two pretty fundamental ways. The first is what we call data de-risking. At the beginning of the data pipeline, we need to run through a process where we strip out personally identifiable data and create cohort data sets that we can use for statistical processing. So here we separate identity from data value, and this transformation of the data supports GDPR-compliant legitimate interest processing and frees the data from data sovereignty restrictions. This approach also enables the privacy-respectful combination of cohort data from different divisions, companies, and even different countries.

You can run reporting, analytics, and machine learning against it. All of these business functions can run against cohort data sets instead of identifiable data. So, you can reduce privacy risks and continue to innovate without having to resort to rigid consent requirements for every single new use case. That is data de-risking and it happens right at the beginning of the pipeline.

Now, at the end of the pipeline we introduce a step that we call controlled re-linking. Once you've processed the data, you may need to re-identify it for “next best action” analysis. For example, you may need to pass a recommendation on to a customer that signed up for an advisory service, or you may discover a fraud risk indicator that needs individualized investigation. So, here we implement a step that we call controlled re-linking, which puts the necessary controls in place to enable authorized re-identification only. These two transformations are the fundamentals of Data Protection by Design and by Default: at the beginning of the process, data de-risking, and at the end of the process, controlled re-linking. And now, what I'd like to do is hand this over to Gary and take a look at how Anonos’ BigPrivacy controls fit into the legal context and how they implement these transformations to solve the big three problems that we identified at the beginning of this talk.
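[Editor's illustration] The two pipeline steps described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Anonos' BigPrivacy implementation: the function names `de_risk` and `relink`, the field names, the ten-year age band, and the list of authorized purposes are all hypothetical and chosen only for illustration.

```python
import secrets


def de_risk(records, id_fields):
    """Data de-risking: separate identity from data value.

    Returns (cohort_rows, vault). cohort_rows carry no identifying fields;
    vault maps each random token back to the identifying fields and is
    assumed to stay with the data controller.
    """
    cohort_rows, vault = [], {}
    for record in records:
        token = secrets.token_hex(8)  # random, so it reveals nothing about the person
        vault[token] = {field: record[field] for field in id_fields}
        row = {k: v for k, v in record.items() if k not in id_fields}
        # Generalise exact age into a ten-year band (a hypothetical cohort rule).
        decade = (row.pop("age") // 10) * 10
        row["age_band"] = f"{decade}-{decade + 9}"
        row["token"] = token
        cohort_rows.append(row)
    return cohort_rows, vault


def relink(row, vault, purpose, authorized_purposes):
    """Controlled re-linking: re-identify only for an authorized purpose."""
    if purpose not in authorized_purposes:
        raise PermissionError(f"re-linking not authorized for purpose: {purpose}")
    return {**vault[row["token"]], **row}


# Hypothetical sample data.
customers = [
    {"name": "Alice Example", "account": "ACC-001", "age": 34, "balance": 1200.0},
    {"name": "Bob Example", "account": "ACC-002", "age": 58, "balance": 560.0},
]
ID_FIELDS = ("name", "account")
AUTHORIZED = {"fraud_investigation", "advisory_recommendation"}

cohort_rows, vault = de_risk(customers, ID_FIELDS)
print(cohort_rows[0]["age_band"])  # 30-39
```

Reporting and analytics would run against `cohort_rows` only. A call such as `relink(cohort_rows[1], vault, "fraud_investigation", AUTHORIZED)` restores the identifying fields, while an unlisted purpose such as `"marketing"` raises `PermissionError`.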
06:53 [Slide 7]
Gary LaFever:

My name is Gary LaFever and I'm Co-Founder and CEO at Anonos, a data risk management company and key partner of Hitachi Vantara on GDPR projects because of Anonos' expertise and patented technology.
07:10 [Slide 8]
The GDPR is actually a well-written piece of legislation, so the challenge of balancing data protection and data value is not insurmountable. In fact, the GDPR provides a total of six legal bases for lawful processing of personal data. However, three are most relevant for secondary processing like data analytics and AI.

Francois has discussed the shortcomings of consent, such as having to go back and get consent again and again, or the fact that having to go back to the compliance team will limit, stifle, if not eliminate, innovation. So, then we go to the legal basis of contract. The legal basis of contract is severely limited under the GDPR: it covers only the processing that is specifically required to consummate a contract. This leads us to legitimate interest as a legal basis. Legitimate interest is the sixth of the legal bases authorized under the GDPR, and a data controller can use it as a legal basis for processing if they satisfy a threefold test. One, the data controller or a third party must have a legitimate interest in using the specific data in question. Two, the particular source of the data must be necessary. This means that if the same data can be acquired from another source, then it is not necessary for this purpose. If you can satisfy one and two, that is, a legitimate interest of the data controller or a third party and the necessity of the data, then you are left with the third requirement. Here the data controller must satisfy a balancing of interests test, which compares the interests of the data subject - the fundamental right to privacy - against the interests of the data controller and/or a third party. This balancing of interests looks to see to what extent new technical controls have been invoked and leveraged to reduce the likelihood and impact of any improper use of personal data. So again, legitimate interest processing is available and represents a powerful means and a legal basis for secondary data processing such as data analytics and AI.
09:20 [Slide 9]
Organizations can collect personal data and use protected versions of that data, such as Francois outlined previously, for research, business, and analysis, if they leverage legitimate interest processing as their lawful basis. This requires that they embrace two state-of-the-art technology safeguards under the GDPR. The first is Pseudonymisation and the second is Data Protection by Design and by Default.
09:54 [Slide 10]
Let's talk further about these two technology safeguards under the GDPR. Pseudonymisation as newly defined under the GDPR is a procedure that requires the separation of information value from the means of identifying individuals. You do that by replacing the identifiable data elements, both direct and indirect identifiers, with dynamically changing tokens called pseudonyms. This GDPR requirement for dynamism is absolutely critical in order to overcome advances in both technology and threat actor sophistication that otherwise enable data tokens to be readily linked back to individuals by correlations and linkage attacks, something typically referred to as the Mosaic Effect. In contrast, GDPR-compliant pseudonymisation requires the defeat of the Mosaic Effect. This is why GDPR-compliant pseudonymisation helps lawful secondary processing like data analytics and AI. There is a big difference between pseudonymised data that complies with GDPR requirements and anonymised data. The difference is that truly anonymised data is never re-linkable back to the original data, whereas, as Francois outlined previously, there are significant benefits to using GDPR-compliant pseudonymised data because you can do controlled re-linking.
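[Editor's illustration] One way to picture "dynamically changing tokens" is a pseudonym that differs for every data use, so that two data sets released for different purposes cannot be joined on the token. The sketch below is an assumption made for illustration, not the method described by the speakers or Anonos' patented technique: it derives tokens with a keyed hash (HMAC-SHA256 from the Python standard library) and a separate secret key per context. The class name `DynamicPseudonymiser` and the context labels are hypothetical.

```python
import hashlib
import hmac
import secrets


class DynamicPseudonymiser:
    """Issues a different pseudonym for the same identifier in each context."""

    def __init__(self):
        # context -> secret key; assumed to be held only by the data controller
        self._keys = {}

    def _key(self, context):
        if context not in self._keys:
            self._keys[context] = secrets.token_bytes(32)
        return self._keys[context]

    def pseudonym(self, value, context):
        """Return a token that is stable within a context and differs across contexts."""
        digest = hmac.new(self._key(context), value.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:16]


pseudonymiser = DynamicPseudonymiser()
fraud_token = pseudonymiser.pseudonym("ACC-001", context="fraud_analytics")
churn_token = pseudonymiser.pseudonym("ACC-001", context="churn_model")

# Within one context the token is repeatable, so analytics can still group rows.
print(fraud_token == pseudonymiser.pseudonym("ACC-001", context="fraud_analytics"))
# Across contexts the tokens differ, so the two releases cannot be joined on them.
print(fraud_token != churn_token)
```

Because only the controller holds the per-context keys, only the controller can recompute a token from an identifier, which is what keeps re-linking under its control in this sketch. Indirect identifiers such as postcode or date of birth would need the same treatment or generalisation; this sketch covers a single direct identifier only.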
11:23 [Slide 11]
Next, we move on to Data Protection by Design and by Default. Under the GDPR, this means businesses must integrate - or bake into their data processing practices - data protection from the design stage right through the full life cycle. Some people refer to this as Privacy by Design, as it was known previously. However, this concept has a whole new meaning under the GDPR. First, Data Protection by Design and by Default is no longer a nice to have. It is now legally mandated under the GDPR. But even more important is that Data Protection by Design and by Default has new heightened requirements. It is the ultimate, the highest form of, Privacy by Design. Don't be fooled. Data Protection by Design and by Default has its own specific requirements. The GDPR requires fine-grained control over who has access to data, as well as the level of identifying or non-identifying data that is disclosed to each user, for each authorized purpose. As an example, although role-based access controls were good enough before the GDPR, used alone they often fail to satisfy the new requirements for this fine-grained control over both who has access to data and what level of identifying or non-identifying data is provided. Data Protection by Design and by Default now requires more than merely Privacy by Design and simple role-based controls.
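[Editor's illustration] The difference between simple role-based access and control over both who sees data and what level of identifying detail they see, per purpose, can be sketched as a policy keyed on the pair (role, purpose). This is a hypothetical sketch: the `POLICY` table, the role and purpose names, and the `disclose` function are assumptions for illustration and are not taken from the webinar or from any product.

```python
# (role, purpose) -> fields that may be disclosed. Anything not listed is denied,
# which is the "by default" part of the idea.
POLICY = {
    ("analyst", "churn_model"): {"age_band", "region", "balance"},
    ("fraud_investigator", "fraud_case"): {"name", "account", "age_band", "region", "balance"},
}


def disclose(record, role, purpose):
    """Return only the fields this role may see for this specific purpose."""
    allowed = POLICY.get((role, purpose), set())
    return {field: value for field, value in record.items() if field in allowed}


# Hypothetical sample record.
record = {
    "name": "Alice Example",
    "account": "ACC-001",
    "age_band": "30-39",
    "region": "EMEA",
    "balance": 1200.0,
}

print(disclose(record, "analyst", "churn_model"))
# {'age_band': '30-39', 'region': 'EMEA', 'balance': 1200.0}
print(disclose(record, "analyst", "fraud_case"))
# {}
```

A role-only check would give the analyst the same view for every purpose. Keying the policy on the purpose as well means the same person receives non-identifying data for modelling and nothing at all for a purpose that has not been authorized for that role.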
13:01 [Slide 12]
Big data projects don't necessarily have to be put on hold as a result of the GDPR. Historical data can be transformed by leveraging Pseudonymisation and Data Protection by Design and by Default to support analytics and AI, based on the alternate legal basis of legitimate interest. Data collected after the 25th of May can be used for analytics and AI, so long as customers are notified at the time of data collection that their data will be used for such purposes based on the legitimate interest legal basis, and new technical safeguards are put in place to satisfy the balancing of interests required for legitimate interest as an alternate (non-consent) legal basis for processing.

As for using data across borders, state-of-the-art dynamic de-identification techniques, including Pseudonymisation and Data Protection by Design and by Default, can be used to create derivative versions of identifying data that impart all of the necessary information value, but without being identifying. These non-identifying but information-rich data are much easier to get approved by in-country regulators for use outside of the country. The resulting non-identifying data can be brought back into the country before any re-linking to identifying data - which, recall, never left the country - is done. This re-linking only occurs under controlled conditions for authorized purposes.

Many see outsourcing data processing to FinTechs as a way to leverage the FinTech ecosystem for rapid innovation, but they are concerned about compliance when working with third-party processors. However, so long as dynamic de-identification techniques like Pseudonymisation and Data Protection by Design and by Default are used in this processing, derivative versions of identifying data can be created that impart the necessary information value without being identifying. These non-identifying but information-rich data can then be provided to an outsourced processor or exchanged with a partner to enable them to do privacy-respectful analytics and AI on the data.
15:10 [Slide 2]
Francois Zimmerman:
Before we close, what I'd like to do is go back to the big three questions at the beginning of this talk and get Gary to apply the legal and technical concepts to these specific business challenges. The first one was (1) I am running a bunch of big data projects and they've been canceled because I no longer have access to the data. What would you do?

Gary LaFever:
Great question, Francois. The good news is, big data projects don't have to be put on hold as a result of the GDPR. First off, historical data can actually be transformed by leveraging Pseudonymisation and Data Protection by Design and by Default to support analytics and AI on the alternate legal basis of legitimate interest. What this means is that data collected prior to May 25th, using now-illegal broad-based consent, can still be transformed to be compliant and used on a going-forward basis. Data that's collected after May 25th can also be used for analytics and AI. All that is required is that customers are put on notice at the time of initial data collection that the data will be processed for these purposes based on the legal basis of legitimate interest. Again, that does require that new technical safeguards are put in place that satisfy the balancing of interests needed for legitimate interest to serve as a legal basis. Pseudonymisation and Data Protection by Design and by Default allow this to occur.

Francois Zimmerman:
The second question is (2) I'm running a global bank. I have a lot of subsidiaries in countries with very challenging and hostile data sovereignty and privacy legislation. How do I overcome this? How do I aggregate data out of those countries up into my global data lake so that I can get a global business view?

Gary LaFever:
Another excellent question, Francois, and again very similar to the first one, related to big data projects. You can use the same techniques of Pseudonymisation and Data Protection by Design and by Default to facilitate cross-border data transformation, use, and consolidation, and you do that by creating derivative versions of the identifying data that impart the necessary information value, but not the means of re-identification. These non-identifying, information-rich data are much easier to get approved by in-country regulators for use outside of the country. It's important to note that the resulting analysis that has occurred using the non-identifying data can be brought back into the country before you re-link to the identifying data - which again never left the country. By doing this under controlled conditions for authorized purposes only, the regulators get greater comfort. The company, the firm, the organization gets the benefit of sharing the data, consolidating the data, and processing the data out of the country, and then bringing it back to re-identify or re-link it only within the country. So again, Pseudonymisation and Data Protection by Design and by Default can be used to address cross-border data processing and sovereignty issues.

Francois Zimmerman:
So, the final question was, (3) I'm running a bank and I'd like to use FinTechs to enable innovation. How do I send the data out to them without compromising customer privacy and frankly without enabling them to steal my customers? Then how do I plug that data back into my pipeline so that I can continue with my business processing?

Gary LaFever:
Very timely question. This is all possible so long as dynamic de-identification techniques like Pseudonymisation and Data Protection by Design and by Default are used in this processing. As a result, derivative versions of the identifying data are created which impart the information value necessary for the FinTech company to do the processing, but it is processing non-identifying data. That non-identifying - but information-rich - data can then be provided for outsourced processing by the FinTech or exchanged with a partner, and only within the confines, construct, and security of the original data controller will it be re-linked to identifying data. So again, you can leverage the capabilities of FinTechs and other partners with privacy-respectful analytics and AI using Pseudonymisation and Data Protection by Design and by Default.

Francois Zimmerman:
Before we close, where can people go to find more information about this?
19:39 [Slide 1]
Gary LaFever:
For more information on state-of-the-art dynamic de-identification techniques like Pseudonymisation and Data Protection by Design and by Default, which have all the benefits that we've outlined during this webinar.


