The anonymisation challenge
This article was first published in Data Protection Law & Policy in November 2012
For a while now, it has been suggested that one of the ways of tackling the risks to personal information, beyond protecting it, is to anonymise it, that is, to stop such information being personal data altogether. The effect of anonymisation is quite radical – take personal data, perform some magic on it and that information is no longer personal data. As a result, it becomes free from any protective constraints. Simple. People's privacy is no longer threatened and users of that data can run wild with it. Everybody wins. However, as we happen to be living in the 'big data society', the problem is that with the amount of information we generate as individuals, what used to be purely statistical data is becoming so granular that the real value of that information is typically linked to each of the individuals from whom it originates. Is true anonymisation actually possible then?
The UK Information Commissioner believes that given the potential benefits of anonymisation, it is at least worthwhile having a go at it. With that in mind, the ICO has produced a chunky code of practice aimed at showing how to manage privacy risks through anonymisation. According to the code itself, this is the first attempt ever made by a data protection regulator to explain how to rely on anonymisation techniques to protect people's privacy, which is quite telling about the regulators' faith in anonymisation given that the concept is already mentioned in the 1995 European data protection directive. Nevertheless, the ICO is relentless in its defence of anonymisation as a tool that can help society meet its information needs in a privacy-friendly way.
The ICO believes that the legal test of whether information qualifies as personal data allows anonymisation to be a realistic proposition. This is because EU data protection law only kicks in when someone is identifiable, taking into account all the means 'likely reasonably' to be used to identify the individual. In other words, and as the code puts it, the law is not framed in terms of the mere possibility of an individual being identified. The definition of personal data is based on the likely identification of an individual. Therefore, the ICO argues that although it may not be possible to determine with absolute certainty that no individual will ever be identified as a result of the disclosure of anonymous data, that does not mean that personal data has been disclosed.
One of the advantages of anonymisation is that technology itself can help make it even more effective. As with other privacy-friendly manifestations of technology – such as encryption and anti-malware software – the practice of anonymising data is likely to evolve at the same speed as the means of identification. This is so because technological evolution is in itself neutral, and anonymisation techniques can and should evolve as the uses of data become more sophisticated. What is clear is that whilst some anonymisation techniques are weak because reintroducing personal identifiers is as easy as stripping them out, technology can also help bulletproof anonymised data.
What makes anonymisation less viable, though, is the fact that in reality there will always be a risk of identification of the individuals to whom the data relates. So the question is how remote that risk must be for anonymisation to work. The answer is that it depends on the level of identification that turns non-personal data into personal data. If personal data and personally identifiable information were the same thing, it would be much easier to establish whether a given anonymisation process has been effective. But they are not, because personal data goes beyond being able to 'name' an individual. Personal data is about being able to single out an individual, so the concept of identification can cover many situations, which makes anonymisation genuinely challenging.
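To make the idea of 'singling out' more concrete, the following is a minimal sketch in Python rather than anything taken from the ICO code. It assumes a small, invented data set from which names have already been removed, and a hypothetical helper, singled_out, that reports the records whose combination of remaining attributes is unique. The field names (postcode, birth_year, sex) and the values are assumptions chosen purely for illustration.

```python
from collections import Counter

# Hypothetical records with names already stripped out. The field names and
# values are invented for illustration only.
records = [
    {"postcode": "AB1", "birth_year": 1970, "sex": "F"},
    {"postcode": "AB1", "birth_year": 1970, "sex": "F"},
    {"postcode": "AB1", "birth_year": 1982, "sex": "M"},
    {"postcode": "CD2", "birth_year": 1982, "sex": "M"},
    {"postcode": "CD2", "birth_year": 1982, "sex": "M"},
]


def singled_out(rows, fields):
    """Return the rows whose combination of values in `fields` is unique."""

    def key(row):
        return tuple(row[f] for f in fields)

    # Count how many rows share each combination of attribute values.
    counts = Counter(key(row) for row in rows)
    # A row is "singled out" when no other row shares its combination.
    return [row for row in rows if counts[key(row)] == 1]


# With all three attributes, one record stands alone even though it has no name.
unique_rows = singled_out(records, ["postcode", "birth_year", "sex"])

# With a single coarse attribute, nobody can be singled out.
coarse_rows = singled_out(records, ["sex"])
```

In this invented example, one nameless record is unique once the three attributes are combined, whereas no record is unique when only the coarsest attribute is kept. That is the tension in miniature: the more granular the data, the more likely it is that someone can be singled out without ever being named.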
The ICO is optimistic about the benefits and the prospects of anonymisation. In certain cases – mostly in the context of public sector data uses – it will clearly be possible to derive value from truly anonymised data. In many other cases, however, it is difficult to see how anonymisation in isolation will achieve its end, as data granularity will prevail in order to maximise the value of the information. In those situations, the gap left by imperfect anonymisation will need to be filled by a good and fair level of data protection and, in some other cases, by the principle of 'privacy by default'. But that's a different kind of challenge.