At a time when technological progress accelerates by the day, uncertainty about essential concepts such as anonymization and pseudonymization can be unsettling. For technologies based on large amounts of data, such as big data analysis in the med tech sector or AI systems, these core concepts can be decisive for whether data processing is permitted. There is often no clear distinction between anonymity and identification, and where one exists, it is becoming increasingly blurred. Clarification is clearly needed.
Privacy vs. Modern Technologies
Even though social media often paint a different picture, people still have a strong interest in privacy. The fundamental right to data protection, also reflected in the European General Data Protection Regulation 2016/679 (GDPR), legally protects this desire. Conflicting with this desire is an ever-increasing interest in data from all areas of human life, such as online browsing and purchasing behaviour, financial situation and health, physical activity or communication patterns. People freely disclose this type of data but are often reluctant for companies to analyse and possibly monetize it.
Fundamental principles of data protection law, such as data minimization and purpose limitation, restrict the insights that can be gained from data, insights that can drive not only profit and customer-friendly products, but also advances in science and medicine. Other prime examples of data analysis for the benefit of the consumer are the further development of AI-based tools (e.g. speech recognition software) or analyses of trends in the business sector. Anonymization is therefore an essential tool and is often seen as the “silver bullet” of data protection by design where legal and technical challenges arise.
Anonymity and Pseudonymity according to the GDPR
Whenever information identifies, or can identify, an individual it is considered personal data and thus falls within the scope of GDPR. However, if the individual is not, or is no longer identifiable – regardless of whether the data was collected anonymously from the outset or anonymized later – it does not constitute personal data and is therefore excluded from the scope of the GDPR (recital 26 of the GDPR).
This seemingly simple concept is riddled with pitfalls, though. The GDPR defines personal data as “any information relating to an identified or identifiable natural person (‘data subject’)” (Art. 4 No. 1 GDPR). While “identification of a person” is further defined, it remains unclear how concrete the reference to a natural person must be for the data to be personal and thereby in scope of the GDPR.
Absolute vs. Relative Anonymization
The question of whether data is anonymous or not has long been discussed among privacy experts, since the legal situation pre-GDPR also lacked clarity. Initially, the key question revolved (and partly still does) around the concept of “absolute vs relative anonymity”. Absolute anonymity means that re-identification is impossible for everyone, whereas for relative anonymity, such re-identification only fails because the effort (in terms of time and cost) would be disproportionately high or is legally prohibited. This situation of relative anonymity often arises in cases of transfer of limited or pseudonymized sets of data: the data can, in theory, still be re-identified (e.g. by the sender of the data) but the recipient is unable, factually and legally, to re-identify the data.
This differentiation, which initially appears to be purely conceptual, is of great importance in practice. Here is an example: A controller transmits raw data to a cloud provider. The data is encrypted in transit and at rest at the recipient cloud provider, using state-of-the-art technology. It is impossible for the cloud provider to access the key or decrypt the data at any time. If absolute anonymity were required, the cloud provider would qualify as a processor, since it is in theory still possible to relate the data to a person – with the knowledge of the controller (who has the key to the encrypted data). As a legal consequence, the relevant legal and contractual processor obligations from Art. 28 GDPR would apply to it.
If, by contrast, the concept of relative anonymity is assumed, the GDPR would not apply: it is not possible for the cloud provider to overcome the encryption and therefore, from its point of view, no individual can be identified at any time.
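The sender/recipient asymmetry behind relative anonymity can be illustrated with a minimal sketch. It does not reproduce the encryption scenario above; instead it assumes a keyed-hash (HMAC) pseudonymization in which only the controller holds the key. The function name, the sample records and the field names are hypothetical and chosen purely for illustration.

```python
import hashlib
import hmac
import secrets


def pseudonymize(identifier: str, key: bytes) -> str:
    """Return a keyed pseudonym (HMAC-SHA256) for an identifier."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumption: the key stays with the sender (controller) and is never shared.
controller_key = secrets.token_bytes(32)

# Hypothetical sample records held by the controller.
records = [
    {"name": "Alice Example", "diagnosis": "asthma"},
    {"name": "Bob Example", "diagnosis": "diabetes"},
]

# Data set handed to the recipient: names are replaced by pseudonyms.
shared = [
    {"pseudonym": pseudonymize(r["name"], controller_key), "diagnosis": r["diagnosis"]}
    for r in records
]

# The controller can re-link the data by recomputing pseudonyms with its key.
lookup = {pseudonymize(r["name"], controller_key): r["name"] for r in records}
relinked = [lookup[s["pseudonym"]] for s in shared]
```

From the recipient's point of view, the shared records carry no name; from the controller's point of view, they remain linkable. Whether this is enough to take the data out of scope for the recipient is exactly the legal question discussed here, and the sketch does not answer it.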
The opportunity to clarify this controversy came with the inception of the GDPR – and went. The GDPR chooses a compromise: the definition of pseudonymization in Art. 4 No. 5 GDPR as well as recital 26 both refer to additional knowledge or technical measures in the context of a possible identification, which, however, are only taken into account if using them does not require a disproportionately large financial or other economic effort and is legally permissible. A reliable guarantee of anonymity cannot be derived from this from a legal point of view.
Legal uncertainty is compounded by a factual uncertainty which is often “homemade”: due to the size of data lakes, even if data was originally anonymized, the sheer mass of accumulated information can lead to a greater risk of re-identification, especially if health and/or genetic data is involved.
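How accumulated attributes raise the re-identification risk can be sketched with a small uniqueness count. The records, attribute names and the helper `unique_fraction` are hypothetical; the sketch only shows that the share of records singled out by a combination of attributes grows as more attributes are combined.

```python
from collections import Counter


def unique_fraction(records, attributes):
    """Fraction of records whose combination of the given attributes is unique."""
    combos = Counter(tuple(r[a] for a in attributes) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[a] for a in attributes)] == 1)
    return unique / len(records)


# Hypothetical records without any direct identifier such as a name.
records = [
    {"zip": "10115", "birth_year": 1980, "sex": "f", "condition": "asthma"},
    {"zip": "10115", "birth_year": 1980, "sex": "m", "condition": "diabetes"},
    {"zip": "10115", "birth_year": 1975, "sex": "f", "condition": "asthma"},
    {"zip": "20095", "birth_year": 1980, "sex": "f", "condition": "migraine"},
]

few = unique_fraction(records, ["zip"])                        # one attribute
many = unique_fraction(records, ["zip", "birth_year", "sex"])  # three attributes
```

In this toy data set, a single attribute singles out one record in four, while the combination of three attributes singles out every record.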
Anonymization Requirements - The Legal Context
There are two use cases in the context of anonymous data: it was either collected anonymously from the outset or was later anonymized. This distinction is relevant: If data is “collected” anonymously, GDPR does not apply. Art. 4 No. 2 GDPR necessarily links the term “processing” (including collection) to personal data.
Subsequent anonymization of personal data is itself a processing step, with the consequence of requiring a legal basis. Since the GDPR itself does not offer a specific legal basis for anonymization, all legal bases mentioned in Art. 6 GDPR and Art. 9 (2) GDPR are eligible.
Non-Special Categories of Personal Data
In addition to consent (Art. 6 (1) lit. a GDPR), the legitimate interest under Art. 6 (1) lit. f GDPR is the legal basis most relevant to controllers in practice. This involves balancing whether the interests of the controller or a third party in anonymization override the interests of the data subject worthy of protection. As a matter of common sense, this balancing must always come out in favour of the controller where anonymization is concerned, since it does not result in any further impairment of the rights of the data subject and no special need for further protection arises. This presupposes, though, that the controller has technical control over the anonymization process (see the requirements of Art. 32 GDPR).
Special Categories of Personal Data
As far as the anonymization of special categories of personal data is concerned, only the limited scope of Art. 9 (2) GDPR is applicable. In practice, anonymization of such data is often only possible with consent (Art. 9 (2) lit. a GDPR), since the other grounds only cover a very narrow range of areas. Recourse to a balancing of interests as described above is not envisaged for special categories of personal data, although the positive effects mentioned above would also benefit the data subject with regard to this category of data.
Anonymization as Change of Purpose
There may also be situations in which it is possible to justify anonymization on the basis of Art. 6 (4) GDPR. Often, the personal data to be anonymized is collected for a specific other purpose. Subsequent anonymization in these cases therefore constitutes further processing, the purpose of which must be compatible with the original purpose of collection. If this compatibility is found, the legal basis for anonymization as further processing continues to be the same legal basis that legitimized the original processing (see recital 50 GDPR). However, for anonymization of special categories of personal data, it is still being debated whether Art. 6 (4) GDPR is applicable.
Anonymization - The Technical Aspects
In addition to the legal uncertainty of anonymization, it is also technically unclear when a personal reference no longer exists. There are different methods to anonymize personal data; for simplification, it is assumed here that the data to be anonymized is organized in a structured way. Some of the most common methods are the following:
- Deletion: Parts of a data record are deleted to generalize the information.
- Falsification: Part (or all) of the data is changed, e.g. by swapping parameters within the data record, or an artificial data record is created using the original data record as a model.
- Clustering: Individual data are combined, for example, by forming the median value from the individual data.
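The three methods listed above can be sketched on a small structured data set. This is a minimal illustration under stated assumptions, not a complete anonymization procedure: the records, field names and helper functions are hypothetical, falsification is shown only in its swapping variant, and clustering is shown as replacing values by the median of small groups.

```python
import random
import statistics

# Hypothetical structured records.
records = [
    {"name": "A", "zip": "10115", "age": 34, "salary": 52000},
    {"name": "B", "zip": "10117", "age": 36, "salary": 61000},
    {"name": "C", "zip": "20095", "age": 51, "salary": 48000},
    {"name": "D", "zip": "20097", "age": 55, "salary": 75000},
]


def delete_fields(rows, fields):
    """Deletion: drop the given fields from every record."""
    return [{k: v for k, v in r.items() if k not in fields} for r in rows]


def swap_values(rows, field, seed=0):
    """Falsification by swapping: shuffle one field's values across records."""
    rng = random.Random(seed)
    values = [r[field] for r in rows]
    rng.shuffle(values)
    return [{**r, field: v} for r, v in zip(rows, values)]


def cluster_median(rows, field, group_size):
    """Clustering: replace a field by the median of groups of similar values."""
    ordered = sorted(rows, key=lambda r: r[field])
    result = []
    for i in range(0, len(ordered), group_size):
        group = ordered[i:i + group_size]
        median = statistics.median(r[field] for r in group)
        result.extend({**r, field: median} for r in group)
    return result


deleted = delete_fields(records, {"name"})
swapped = swap_values(deleted, "salary")
clustered = cluster_median(swapped, "age", group_size=2)
```

Each step reduces how precisely a record describes one person, at the cost of analytical accuracy. Whether the result counts as anonymous still depends on the legal and factual questions discussed in this article.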
Furthermore, formal anonymity models have been under scientific development for some time.
But anonymity of data is difficult to achieve. While many companies have a sound grasp of these technical solutions, thousands of others do not and are thus vulnerable in their approach to anonymization.
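The text does not name a specific formal model. One widely cited example is k-anonymity, which requires every combination of quasi-identifying attributes to be shared by at least k records. The following sketch measures k for a hypothetical, already generalized data set; the function name, field names and choice of quasi-identifiers are assumptions made for illustration.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifiers."""
    if not records:
        return 0
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


# Hypothetical records with generalized postcode and age band.
generalized = [
    {"zip": "101**", "age_band": "30-39", "condition": "asthma"},
    {"zip": "101**", "age_band": "30-39", "condition": "diabetes"},
    {"zip": "200**", "age_band": "50-59", "condition": "migraine"},
    {"zip": "200**", "age_band": "50-59", "condition": "asthma"},
]

k = k_anonymity(generalized, ["zip", "age_band"])
```

Here every record shares its quasi-identifiers with one other record, so the data set is 2-anonymous with respect to the chosen attributes. A formal measure of this kind says nothing about whether the legal threshold for anonymity is met.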
Example: Voice Recognition
A prime example of the difficulties regarding anonymity of data is speech recognition software. Such software is often employed for training employees in customer service centres. The systems record the calls and usually make them available to supervisors, who review them and train the employees accordingly. However, the recordings are also often used to improve the software and to provide statistical analysis back to the customer service centres. In order to do so, the recordings are cut into fragments ranging from fractions of a second to several seconds. Clearly the software provider is unable to identify the speakers in those fragments. However, individuals at the call centre are likely able to identify the employees by their voices. If the recordings are considered personal data, the use of the fragments is fully subject to the GDPR, requiring a legitimate justification for the use of the employee data. If they were considered non-personal information, the service provider would be free to use the data. Thus it is essential for the business of the service provider to be able to use the information.
In this scenario, the differentiation between relative and absolute anonymity becomes very clear. If only absolute anonymity were sufficient, the fragments would clearly be considered personal data and thus be subject to much stricter limitations and obligations. The customer service centres might not even be able to ensure lawful processing for the purposes of the service provider, even though the service provider itself has no way of identifying the individual.
Is it reasonable to assume a relative anonymity in this case and maintain this as sufficient? From the perspective of the service provider, for sure. But customers are often of a different view and are reluctant to agree to this concept, thereby inhibiting the further development and improvement of the product they are using.
The use of anonymous data is not only beneficial for controllers, who are provided with the opportunity to process large amounts of data for a wide variety of economic, scientific and other purposes, but also for data subjects whose fundamental rights remain protected. However, there are still uncertainties at many points, for example with regard to terminology, legal bases or a clear definition of when anonymity is ensured. Particularly with regard to special categories of personal data, there are unnecessary hurdles to anonymization that do not add value, because data subjects can hardly be better protected than by using anonymous data.
The article was published in Privacy Laws & Business International Report, December 2020, www.privacylaws.com.