Anonymisation is great, but don't undervalue pseudonymisation

26/04/2014

Earlier this week, the Article 29 Working Party published its Opinion 05/2014 on Anonymisation Techniques. The opinion describes in considerable technical detail the different anonymisation techniques available to data controllers and their relative merits. It also makes some good practice suggestions, noting that "Once a dataset is truly anonymised and individuals are no longer identifiable, European data protection law no longer applies".

This is a very significant point - data, once truly anonymised, is no longer subject to European data protection law. This means that EU rules governing how long that data can be kept for, whether it can be exported internationally and so on, do not apply. The net effect of this should be to incentivise controllers to anonymise their datasets, shouldn't it?

Well, not quite. The truth is that many controllers don't anonymise their data; they use pseudonymisation techniques instead.

The difference between anonymisation and pseudonymisation

Anonymisation means transforming personal information into data that "can no longer be used to identify a natural person ... [taking into account] 'all the means likely reasonably to be used' by either the controller or a third party. An important factor is that the processing must be irreversible." Properly anonymised data should not allow any specific individual to be singled out, should not be linkable to other data about an individual, and should not allow an individual's identity to be deduced.

Conversely, pseudonymisation means "replacing one attribute (typically a unique attribute) in a record by another. The natural person is therefore still likely to be identified indirectly." In simple terms, pseudonymisation means replacing 'obviously' personal details with another unique identifier, typically generated through some kind of hashing, encryption or tokenisation function. For example, "Phil Lee bought item x" could be pseudonymised to "Visitor 15364 bought item x".
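To make the "Phil Lee" example concrete, here is a minimal Python sketch, assuming a keyed hash (HMAC-SHA256) is used as the pseudonymisation function. The key, the `pseudonymise` helper and the "Visitor" prefix are all hypothetical, chosen for illustration only. A keyed hash is used rather than a plain hash because a plain, unkeyed hash of a name can be reversed simply by hashing a list of candidate names and comparing the results.

```python
import hashlib
import hmac

# Hypothetical secret key: in practice it would be randomly generated and
# stored separately from the pseudonymised dataset.
SECRET_KEY = b"example-key-keep-separate"


def pseudonymise(name: str, key: bytes = SECRET_KEY) -> str:
    """Replace a directly identifying attribute with a keyed-hash pseudonym."""
    digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability; shorter pseudonyms increase collision risk.
    return "Visitor " + digest[:12]


records = [
    ("Phil Lee", "item x"),
    ("Phil Lee", "item y"),
    ("Jane Doe", "item x"),
]

# Only the name is replaced; every other attribute is kept as-is.
pseudonymised = [(pseudonymise(name), item) for name, item in records]

for visitor, item in pseudonymised:
    print(f"{visitor} bought {item}")
```

Note that the same name always yields the same pseudonym, so all of Phil Lee's purchases remain linked to one another. That linkability is precisely why, as the Working Party stresses, the person may still be identified indirectly.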

The Working Party is at pains to explain that pseudonymisation is not the same thing as anonymisation: "Data controllers often assume that removing or replacing one or more attributes is enough to make the dataset anonymous. Many examples have shown that this is not the case..." and "pseudonymisation when used alone will not result in an anonymous dataset."

The value of pseudonymisation

The Working Party lists various "common mistakes" and "shortcomings" of pseudonymisation but, curiously given its prevalence, fails to acknowledge the very important benefits it can deliver, including in terms of:

Individuals' expectations: The average individual draws a very clear distinction between data that is directly linked to them (i.e. associated with their name and contact details) and data that is pseudonymised, even if not fully anonymised. In the context of online targeted advertising, for example, website visitors are very concerned about their web browsing profiles being collected and associated directly with their name and address, but far less concerned about those profiles being associated with a randomised cookie token that allows them to be recognised but not directly identified.

Data value extraction: For many businesses, anonymisation is simply not an option. The value of the data they collect typically lies in its commercialisation at an individual record level, which is fundamental to their business model. What they need instead is a solution that lets them extract value at record level while still respecting individuals' privacy by not storing directly identifying details, and pseudonymisation enables this.

Reversibility: In some contexts, the reversibility of pseudonymised data can be very important. In clinical drug trials, for example, patients' pseudonymised trial data must be capable of being re-identified if it becomes necessary, say, to contact those patients to alert them to an adverse drug event. Fully anonymised data in this context would be dangerous and irresponsible.
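The kind of controlled reversibility described above can be sketched as tokenisation with a separately held lookup table (a "token vault"). The following is a minimal Python illustration under stated assumptions: the `TokenVault` class, the `P-` token format and the patient contact details are hypothetical, and a real trial would add access controls and secure storage around the vault.

```python
import secrets


class TokenVault:
    """Hypothetical token vault mapping random tokens back to identities.

    The vault is assumed to be held separately (e.g. by the trial sponsor or a
    trusted third party) from the pseudonymised dataset researchers work on.
    """

    def __init__(self) -> None:
        self._token_to_identity: dict[str, str] = {}
        self._identity_to_token: dict[str, str] = {}

    def tokenise(self, identity: str) -> str:
        # Reuse an existing token so each patient keeps one stable pseudonym.
        if identity in self._identity_to_token:
            return self._identity_to_token[identity]
        # The token is random, not derived from the identity, so it cannot be
        # reversed without the vault.
        token = "P-" + secrets.token_hex(8)
        while token in self._token_to_identity:  # guard against collisions
            token = "P-" + secrets.token_hex(8)
        self._token_to_identity[token] = identity
        self._identity_to_token[identity] = token
        return token

    def reidentify(self, token: str) -> str:
        # Only whoever holds the vault can map a token back to a patient.
        return self._token_to_identity[token]


vault = TokenVault()

# The trial dataset contains tokens only, never contact details.
trial_data = {
    vault.tokenise("patient-0001@example.org"): {"dose_mg": 20, "adverse_event": False},
    vault.tokenise("patient-0002@example.org"): {"dose_mg": 40, "adverse_event": True},
}

# If an adverse event is flagged, the vault holder recovers whom to contact.
for token, outcome in trial_data.items():
    if outcome["adverse_event"]:
        print("Contact:", vault.reidentify(token))
```

Random tokens are used here rather than a keyed hash so that reversal depends entirely on access to the vault; a keyed hash with a stored key would be another reversible option.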

Security: Finally, pseudonymisation improves the security of data held by controllers. Should that data be compromised in a data breach, the likelihood that the underlying individuals' identities will be exposed, and that they will suffer privacy harm as a result, is considerably lower.

It would be easy to read the Working Party's Opinion and conclude that pseudonymisation ultimately serves little purpose, but this would be a foolhardy conclusion to draw. Controllers for whom anonymisation is not possible should never be disincentivised from implementing pseudonymisation as an alternative; failing to pseudonymise would be to the detriment of both their data security and their data subjects' privacy.

Instead, pseudonymisation should always be encouraged as a minimum measure intended to facilitate data use in a privacy-respectful way. As such, it should be an essential part of every controller's privacy toolkit!