CHAPTER 4

PSEUDONYMISATION IN THE MOBILE ECOSYSTEM

In this Chapter we discuss some pseudonymisation best practices and examples, focusing especially on mobile apps, where a large number of identifiers that can be linked to specific individuals (e.g. device or app identifiers) may be processed by several different entities (e.g. app developers, OS providers, library providers, etc.), often without the individuals being aware of it.

For instance, in a mobile device, the following device identifiers are present [Son, 2016] and can be linked to its user:

  • The International Mobile Subscriber Identity (IMSI), which is a decimal identifier of up to 15 digits representing the mobile subscriber identity.
  • The International Mobile Equipment Identity (IMEI), which is a 15-digit decimal identifier associated with the mobile phone.
  • The Media Access Control (MAC) address, which is a 48-bit number assigned to the device’s network interface, e.g. Wi-Fi or Bluetooth.

Moreover, there are several other identifiers that depend on the operating system used in the mobile device. For example, in Android systems there is the Android ID, which is a 64-bit randomly generated number, as well as the Google Advertising ID (GAID), which is a 32-digit alphanumeric identifier that is available on devices that have the Google Play service installed. Similarly, in iOS devices, there is the Unique Device IDentifier (UDID), which is a 40-character string composed from various hardware identifiers; more precisely, as stated in [Agarwal, 2013], it is based on the serial number of the device, the IMEI, the Wi-Fi MAC and the Bluetooth MAC.

Note that most of these identifiers are generally considered permanent – an exception being the GAID, which can be reset by the user at any time.

As already mentioned, there are several actors in the mobile ecosystem that may qualify as data controllers [ENISA, 2017], as they process the personal data of individuals (i.e. mobile app users). However, even where these actors are not data controllers or processors, they are encouraged, according to Recital 78 of the GDPR, to make sure that controllers and processors are able to fulfil their data protection obligations. In all cases, pseudonymisation is an approach that can support the protection of personal data, especially taking into account the special characteristics of the mobile environment.

In the next Sections, we explore some use cases where pseudonymisation could be employed to enhance data protection in the field of mobile apps, especially by app developers/providers, library providers, as well as OS providers.

It should be stressed that our aim is not to provide a detailed implementation guide, but rather to revisit the pseudonymisation techniques presented earlier with some simple examples; by no means should these solutions be interpreted as a legal opinion on the corresponding use cases.

4.1 App developers/providers

In the mobile ecosystem, the app developers are the actors that are responsible for the development of the app itself, i.e. for coding the app functionalities and requirements. They provide the app to the app providers or the end users, depending on the business model [ENISA, 2017].

In the next paragraphs, we discuss four pseudonymisation use cases and the corresponding best practices, which could be utilised by app providers in order to enhance data protection by design. Although these examples are clearly not exhaustive, we have tried to cover some typical cases that controllers could meet in practice. Note that, for simplicity, we consider in all examples that the app developer is also the app provider, i.e. the data controller, which processes individuals’ (users’) personal data in the context of the app.

Use case 1: Tracking without storing the initial identifiers

In a social network app, the users may simply observe posts from other users and comment on them or create new posts, without going through a login procedure (a typical case of a so-called “anonymous” social network). However, as also described in Section 2.2, even though there is no login procedure, the app provider still needs to keep track of a user’s device (e.g. on the basis of a device identifier), so as to send him or her notifications whenever somebody likes and/or comments on his/her posts. Although such tracking is needed, the app provider does not actually need to know the specific device identifier (as long as the device can be singled out from all other devices). Note also that there is no need for this app to share the same user/device identifier with other apps.

Clearly, in this case, simply using a permanent device identifier to track the user may lead to the identification of the user through the identification of his/her device. The situation is slightly improved if a non-permanent identifier is used (e.g. the GAID in Android devices), but identification is still possible within certain time limits. Simple hashing of such an identifier would not offer significant protection, as anyone with knowledge of the device identifier would trivially be able to re-identify the device (and, hence, possibly the user). Moreover, in all cases, the app provider would need to store the aforementioned device identifiers, although this is not needed for the purpose of the specific processing operation.
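The weakness of simple hashing can be illustrated with a minimal Python sketch. The function names `simple_hash` and `reidentify` and the identifier values are hypothetical, introduced only for this illustration; the sketch assumes that an observer already holds a list of candidate device identifiers.

```python
import hashlib
from typing import Iterable, Optional


def simple_hash(device_id: str) -> str:
    """Unkeyed SHA-256 of a device identifier (not a sound pseudonym)."""
    return hashlib.sha256(device_id.encode("utf-8")).hexdigest()


def reidentify(stored_hash: str, known_ids: Iterable[str]) -> Optional[str]:
    """Return the known identifier whose hash matches stored_hash, if any."""
    for candidate in known_ids:
        if simple_hash(candidate) == stored_hash:
            return candidate
    return None


# Hypothetical identifiers, made up for this example.
stored = simple_hash("38400000-8cf0-11bd-b23e-10b96e40000d")
known = [
    "00000000-0000-0000-0000-000000000000",
    "38400000-8cf0-11bd-b23e-10b96e40000d",
]
match = reidentify(stored, known)  # the second identifier is recovered
```

Since no secret is involved, recomputing the hash over the known identifiers is all that is needed to link a stored value back to a device.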

To this end, pseudonymisation can greatly support data protection in this scenario if properly implemented by design. Indeed, a possible approach would be to use a keyed-hash function on a non-permanent identifier for creating pseudonyms that can be used in the place of the initial identifiers. In this way, the app provider would not need to store the initial identifiers, whilst the corresponding secret key for the hashing should be securely kept in a different database from the one in which the pseudonyms are stored. Moreover, the transmission of the identifier to the app server should be done over a secure channel – e.g. via the Transport Layer Security (TLS) protocol – so as to ensure that network eavesdroppers cannot capture the identifiers in transit and, hence, cannot associate them with the corresponding pseudonyms. The TLS protocol also ensures that the device is actually connected to the legitimate app server, which is necessary for both privacy and security purposes.
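A minimal sketch of such a keyed-hash pseudonym follows, assuming HMAC-SHA256 as the keyed-hash function. The function name `make_pseudonym`, the identifier value and the in-memory dictionary standing in for the pseudonym database are hypothetical; key storage and the TLS transport are outside the scope of the sketch.

```python
import hashlib
import hmac
import secrets


def make_pseudonym(secret_key: bytes, device_id: str) -> str:
    """Keyed hash (HMAC-SHA256) of a non-permanent device identifier."""
    return hmac.new(secret_key, device_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Generated once and kept apart from the pseudonym database
# (assumption: a separate key store; here it simply lives in memory).
secret_key = secrets.token_bytes(32)

# Hypothetical resettable identifier received from the device.
device_id = "38400000-8cf0-11bd-b23e-10b96e40000d"

pseudonym = make_pseudonym(secret_key, device_id)

# Only the pseudonym is stored on the server side, never device_id itself.
notification_targets = {pseudonym: ["someone commented on your post"]}
```

The same device always maps to the same pseudonym under a given key, so notifications can still be routed, while someone who knows the device identifier but not the key cannot recompute the pseudonym.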

Use case 2: Protecting credentials in a database

Let us consider a mobile app that monitors the user’s footsteps and stores this information (measurement data) in the app’s server, so that the user is able to access it through the Internet from any device. For simplicity, we assume that the app simply counts the number of the user’s steps, without combining these data with any other data about the user (e.g. from other apps) or sending the data to any other recipient. Still, even in this simple case, the app provider builds a profile of the user with regard to his or her daily walking habits. The user is authenticated to the app server, for accessing his/her data, with a combination of an e-mail address and a password. Thus, the app provider can clearly identify the user, since each registered user should be able to access his/her specific user profile.

In this use case, we explore the possibility of using pseudonymisation in order to protect the users’ credentials in the app’s server (database). As in the previous use case, a simple hash function on the user’s name or e-mail address is clearly not a proper pseudonymisation approach. Instead, a keyed or salted hash function could be used. The corresponding key/salt, as well as the original identifiers, should be securely stored and separated from the database with the pseudonymised data, e.g. in a trusted authentication server. Alternatively, the pseudonyms may be produced by applying a deterministic symmetric cipher, such as the AES; again, the encryption key – which coincides with the decryption key in this case – should be securely kept separately.

Note that, after the application of such a pseudonymisation process and depending on the scale and specific characteristics of the database of pseudonymised data (lifestyle data), such a database could be used for statistical purposes, even by a third party. Indeed, as long as this party does not have access to the secret key/salt, it is not trivially possible to identify the users. Along the same lines, if a breach occurs in this pseudonymised database, re-identification will be computationally hard.

In any case, the pseudonymised database should not be correlated with any other device identifier that the app developer possibly processes – e.g. for providing personalised app configurations that the user chooses; actually, another pseudonymisation process may occur for pseudonymising any such identifier (see Use case 1).
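The separation described in this use case could be sketched as follows, assuming the keyed-hash option (HMAC-SHA256) rather than a deterministic cipher. The names `auth_store`, `steps_db`, `pseudonym_for` and `record_steps`, as well as the e-mail addresses and step counts, are hypothetical; two in-memory dictionaries stand in for the trusted authentication server and the pseudonymised database.

```python
import hashlib
import hmac
import secrets

# Held only by the (hypothetical) trusted authentication server:
# the secret key and the mapping from pseudonyms to original identifiers.
auth_store = {"key": secrets.token_bytes(32), "emails": {}}

# Pseudonymised measurement database: pseudonym -> list of daily step counts.
steps_db = {}


def pseudonym_for(email: str) -> str:
    """Keyed hash of the normalised e-mail address."""
    normalised = email.strip().lower()
    return hmac.new(auth_store["key"], normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def record_steps(email: str, steps: int) -> None:
    """Store a measurement under the user's pseudonym."""
    pseudonym = pseudonym_for(email)
    auth_store["emails"][pseudonym] = email.strip().lower()  # stays on the auth side
    steps_db.setdefault(pseudonym, []).append(steps)


record_steps("alice@example.org", 8200)
record_steps("Alice@example.org ", 10450)  # same user after normalisation
record_steps("bob@example.org", 3100)

# A statistics consumer sees only steps_db: no e-mail addresses and no key.
average_per_user = {p: sum(v) / len(v) for p, v in steps_db.items()}
```

In this sketch the statistics are computed from `steps_db` alone, which is the only part that would be shared or exposed in a breach of the measurement database.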

Use case 3: Multiple pseudonyms for the same data

A smart meter is an electrical meter that records consumption traces of a household and sends them to the corresponding electricity supplier. Such traces are used for billing purposes by the supplier (data controller). The users (electricity consumers) are able, via a relevant mobile app, to check information on their energy usage in real time.

To alleviate privacy risks with regard to the profiling of a household’s habits (which can be derived through the smart meter’s operation), one possible option could be that the supplier stores consumption traces in pseudonymised form, in such a way that a different pseudonym is assigned to each measurement stemming from the same household (consumer). Hence, for a given consumer, his or her traces are stored under the pseudonym A in one time interval, under a different pseudonym B in the next time interval and so on. To satisfy such a property, a probabilistic encryption scheme (e.g. as described in Section 3.4) could be a possible pseudonymisation approach.
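The property of obtaining a fresh pseudonym for every measurement can be illustrated with a small sketch. It does not implement any specific scheme from Section 3.4: as a stand-in, it uses a textbook randomised construction in which a random nonce is drawn for each pseudonym and the identifier is masked with HMAC-SHA256(key, nonce). It offers no integrity protection and is meant only to show the behaviour; a real deployment would rely on a vetted encryption library. All names and values (`new_pseudonym`, `recover_household`, `household-042`, the readings) are hypothetical.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16
ID_LEN = 16  # assumption: identifiers are padded to a fixed length


def _mask(key: bytes, nonce: bytes) -> bytes:
    """Pseudo-random mask derived from the key and a fresh nonce."""
    return hmac.new(key, nonce, hashlib.sha256).digest()[:ID_LEN]


def new_pseudonym(key: bytes, household_id: str) -> str:
    """Return a fresh pseudonym; each call gives a different value."""
    raw = household_id.encode("utf-8")
    if len(raw) > ID_LEN:
        raise ValueError("identifier too long for this sketch")
    padded = raw.ljust(ID_LEN, b"\x00")
    nonce = secrets.token_bytes(NONCE_LEN)
    masked = bytes(a ^ b for a, b in zip(padded, _mask(key, nonce)))
    return (nonce + masked).hex()


def recover_household(key: bytes, pseudonym: str) -> str:
    """Key holder only: map a pseudonym back to the household identifier."""
    blob = bytes.fromhex(pseudonym)
    nonce, masked = blob[:NONCE_LEN], blob[NONCE_LEN:]
    padded = bytes(a ^ b for a, b in zip(masked, _mask(key, nonce)))
    return padded.rstrip(b"\x00").decode("utf-8")


key = secrets.token_bytes(32)
# One stored record per time interval, each under a different pseudonym.
trace = [(new_pseudonym(key, "household-042"), kwh) for kwh in (0.41, 0.38, 1.92)]

# For billing, the supplier (holding the key) regroups the records.
total = sum(kwh for p, kwh in trace if recover_household(key, p) == "household-042")
```

Without the key, the three stored records cannot be linked to one another through their pseudonyms, whereas the supplier can still aggregate them for billing.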

Use case 4: Local generation of pseudonyms

A smart app provides monitoring of a driver’s behavior. In particular, whenever the driver (user of the app) keeps the application active, a profile of his/her driving habits is built (and stored by the app provider). By default, the app provider does not associate the data with any other data of the user’s device and does not automatically send the data to any recipient. An optional function of the app allows the user to authorize the transfer of data by the app provider to affiliated insurance companies (e.g. in order for the user to get a discount rate). In the general case, although the app provider needs to be able to track the user (driver), so as to deliver the data that are relevant to him or her on his/her specific device, there is no need for the provider to know the real identity of the user. Such identification will only be needed whenever the user explicitly authorizes the provider to send his or her data to an insurance company.

Pseudonymisation can clearly support this scenario too. A possible data protection by design solution is to allow the user to generate a pseudonym on his/her device, in such a way that nobody else can re-identify him/her unless the user allows it – e.g. by appropriately encrypting user identifiers so that only the user has access to the decryption key (i.e. a passphrase). Of course, appropriate security mechanisms should be put in place in this approach; for instance, the secret key/passphrase should not be shipped in the app. Moreover, the pseudonym generated by the app on the user’s device should be transmitted encrypted to the app server – e.g. via the TLS protocol – and should not be correlated with any other device identifier. Note also that there exist specialized cryptographic techniques (see Section 3.5) that allow a user to generate a pseudonym locally in his/her environment, without necessitating exchange of information with issuing parties, such that he/she can prove at any time that he/she is the owner of the pseudonym (see, e.g., [Schartner, 2005]).
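A simplified sketch of local pseudonym generation is given below. It is not the encryption-based approach mentioned above, nor one of the specialized techniques of Section 3.5: as an assumption, it derives a key from the user’s passphrase with PBKDF2 and applies a keyed hash to the user identifier. The function names, the iteration count, the passphrase and the identifier are all hypothetical.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor for this sketch


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a secret key from a passphrase known only to the user."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, ITERATIONS)


def local_pseudonym(passphrase: str, salt: bytes, user_id: str) -> str:
    """Pseudonym computed entirely on the user's device."""
    key = derive_key(passphrase, salt)
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


# On the device: the passphrase is entered by the user, never shipped in the app.
salt = secrets.token_bytes(16)  # random value kept on the device
pseudonym = local_pseudonym("correct horse battery staple", salt, "driver@example.org")

# Later, the user can recompute the pseudonym from the same inputs,
# e.g. when authorising a transfer of data to an insurance company.
recomputed = local_pseudonym("correct horse battery staple", salt, "driver@example.org")
owns_it = hmac.compare_digest(pseudonym, recomputed)
```

Only the pseudonym would be sent to the app server (over TLS); the passphrase, the salt and the user identifier remain on the device in this sketch.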

4.2 Library providers

The usage of third-party libraries by mobile apps raises several privacy concerns [Grace, 2012], owing to the fact that library providers (e.g. ad providers) are able to execute code on users’ devices with the same permissions as the host applications; this in turn results in the collection of personal data [ENISA, 2017]. Hence, the owners of the libraries may build detailed user profiles by combining the data they collect from different mobile apps that use the same library. Such a threat is also known as intra-library collusion and rests on the processing of globally unique identifiers through different apps with (possibly) different permissions installed on the same device [ENISA, 2017] [Taylor, 2017].

Pseudonymisation, in combination with other privacy enhancing mechanisms, could possibly be used to limit the above-mentioned issue. In this direction, it is essential that each library provider associates a different identifier per application, even for the same device. To this end, such a unique identifier may be obtained through, e.g., the following calculation [Stevens, 2012]:

hash(library provider || app identifier || device ID)

Such a computation allows for deriving a different identifier, for the same library provider and the same device, across different applications (and, of course, a different identifier, for the same app and the same device, across different library providers). Moreover, a non-permanent device identifier (i.e. a user-resettable identifier) can also be used in cases where the OS supports such an option, which further enhances the privacy of the user (see Section 4.3).
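The calculation above could be sketched as follows, assuming SHA-256 as the hash function. The length prefix added before each component is an assumption of this sketch (it avoids ambiguous concatenations) and is not part of the formula in [Stevens, 2012]; the provider, app and device names are hypothetical.

```python
import hashlib


def per_app_identifier(library_provider: str, app_id: str, device_id: str) -> str:
    """hash(library provider || app identifier || device ID), with SHA-256."""
    h = hashlib.sha256()
    for part in (library_provider, app_id, device_id):
        data = part.encode("utf-8")
        # Length prefix keeps ("ab", "c", ...) distinct from ("a", "bc", ...).
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.hexdigest()


device = "38400000-8cf0-11bd-b23e-10b96e40000d"  # hypothetical resettable ID

id_game = per_app_identifier("ads.example", "com.example.game", device)
id_news = per_app_identifier("ads.example", "com.example.news", device)
id_other_lib = per_app_identifier("stats.example", "com.example.game", device)
```

The three identifiers differ from one another, so the same library sees unrelated values in the two apps, and two libraries embedded in the same app do not share a value.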

Another issue that is associated with the processing of unique identifiers by library providers is that any unauthorised party (e.g. an adversary) who simply monitors the network may be able to build user profiles by associating such unique identifiers. This is especially relevant when the APIs of libraries (e.g. of ad providers) embedded in mobile apps send user information over the Internet in clear text [Chen, 2014]. An approach to alleviate such a concern is to secure – i.e. to encrypt – all communications between the user’s device and the library provider; by these means, the adversary will not be able to correlate network traffic corresponding to the same device [Stevens, 2012] [Chen, 2014]. Note that simply hashing a device identifier, without encryption, does not solve this issue; for example, even if an ad provider hashes the Android ID, other ad providers may still transmit it in plaintext and, thus, a correlation between the Android ID and its hash value is trivial.
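As a small illustration of securing the channel, the following sketch prepares a TLS client configuration with Python’s standard `ssl` module. The host name `library-provider.example` and the choice of TLS 1.2 as the minimum version are assumptions made for the example; no connection is opened by this code.

```python
import http.client
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Client-side TLS settings with certificate and host name verification."""
    context = ssl.create_default_context()  # verification is enabled by default
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # assumed minimum version
    return context


context = make_tls_context()

# Hypothetical endpoint; the connection is only established on the first request.
connection = http.client.HTTPSConnection("library-provider.example", context=context)
```

With such a configuration, identifiers sent by the library would travel encrypted, and the client would refuse servers that cannot present a valid certificate for the expected host name.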

4.3 Operating system providers

Operating system (OS) providers play a central role in mobile app users’ privacy, as several aspects of the processing of personal data (e.g. the permissions model) are platform dependent. To this end, an OS provider, in support of the data protection by design principle, could adopt specific approaches to facilitate pseudonymisation techniques, e.g. whenever this can promote data minimisation and in combination with other privacy-enhancing measures. As stated in previous Sections, a major source of privacy risks is the usage of permanent device identifiers by mobile app developers/providers and/or library providers. Therefore, the OS providers should make an effort to impede such processing.

In this direction, the OS providers can restrict applications and third parties from accessing the permanent unique device identifiers by providing non-permanent software-based identifiers (it should be pointed out that the most recent versions of the popular operating systems follow this approach with respect to tracking by third parties). Such identifiers are suitable for user tracking only to a limited extent. Ideally, the identifier should be different per app and per user. By these means, pseudonymisation of such identifiers (e.g. by app providers) can lead to stronger protection of personal data, provided that knowledge of the non-permanent identifiers does not allow the computation of a permanent device ID. This in turn means that special emphasis should be given to how these non-permanent IDs are generated. For instance, it has recently been shown that commonly used MAC randomisation techniques may be ineffective unless specific, well-defined best practices are adopted [Vanhoef, 2016], [Martin, 2017]. Along the same lines, the OS providers should make reasonable efforts to ensure that apps are rejected during the review process (i.e. if the OS provider runs an app store [ENISA, 2017]) if they misuse the device identifiers.
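One conceivable way of issuing identifiers that differ per app and per user, and that reveal nothing about permanent hardware identifiers, is sketched below. This is a toy model and not the mechanism of any actual operating system: the class `ResettableIdProvider`, its methods and the app and user names are hypothetical. The identifiers are derived with HMAC-SHA256 from a random seed that the user can reset.

```python
import hashlib
import hmac
import secrets


class ResettableIdProvider:
    """Toy model of an OS service issuing per-app, per-user identifiers."""

    def __init__(self) -> None:
        # Random seed, unrelated to the IMEI, MAC address or other hardware values.
        self._seed = secrets.token_bytes(32)

    def identifier_for(self, app_id: str, user: str) -> str:
        """Identifier that is stable for one (app, user) pair until reset."""
        message = f"{len(app_id)}:{app_id}|{len(user)}:{user}".encode("utf-8")
        return hmac.new(self._seed, message, hashlib.sha256).hexdigest()[:32]

    def reset(self) -> None:
        """User-triggered reset: all previously issued identifiers change."""
        self._seed = secrets.token_bytes(32)


provider = ResettableIdProvider()
game_id = provider.identifier_for("com.example.game", "user-0")
news_id = provider.identifier_for("com.example.news", "user-0")

provider.reset()
game_id_after_reset = provider.identifier_for("com.example.game", "user-0")
```

Because the seed is random and never exposed, knowing one or several of these identifiers does not help in computing a permanent device ID, and two apps cannot link their users through them.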

In addition, there are further options for the OS providers to facilitate the development of efficient pseudonymisation techniques. Recalling the issues discussed in Section 4.2, decoupling application and advertising permissions seems to be a proper design principle, which is unfortunately not followed by all popular operating systems; for instance, certain versions of the Android security model do not support the separation of privileges between apps and their embedded libraries [Spensky, 2016]. As also stated in [Stevens, 2012], third-party code should not be allowed to access application-specific data unless the user provides his/her explicit informed consent (in the app containing the third-party code).