
Big Data analytics in Europe - the reports of its death are greatly exaggerated

Mark Webber
21/04/2017
The GDPR strengthens the rights of individuals in the EU and becomes more prescriptive about the collection and use of data, but it doesn't forbid big data. Mark Webber explores the hurdles for businesses and how to handle big data under the GDPR.

It was reported this week that, at the IAPP Global Privacy Summit in Washington DC, Irish Data Protection Minister Dara Murphy admitted that EU data protection rules do play a "small part" in inhibiting EU companies as they grow digital business. This is fair recognition that some of the EU's data rules can be a challenge. They need to be considered, weighed up and navigated, but they rarely present a complete barrier to innovation. Where there is limited understanding of their purpose and scope, the compliance process can seem daunting. Businesses that develop first and only then consider the rules can feel stymied.

I was in a meeting in San Francisco with a prospective client this week because they had heard on the grapevine that the GDPR meant that, from 2018, big data analysis in Europe was becoming "illegal". "Please say it isn't so", they pleaded. "Our business model depends on analytics", they continued. "Well, you're not alone", I said. I was able to share the news: when it involves personal data, big data can have privacy implications and can come into conflict with the rights of individuals in the EU. While the GDPR strengthens these rights and becomes more prescriptive about the collection and use of data, it doesn't forbid big data. The GDPR presents hurdles and asks businesses to think about what they're doing with data, but these hurdles are typically not insurmountable. I was able to reassure them. To borrow a phrase attributed to Mark Twain: the reports of the death of big data are greatly exaggerated.

The GDPR will not kill big data

The complexity of big data analytics should not preclude businesses from complying with the GDPR. What businesses can't do is simply suck up data, combine it with other data collected for other purposes and use it however they see fit. The business value can't be put ahead of the rules and the rights of data subjects. This blog draws on the recent publication by the UK Information Commissioner's Office (the ICO) on the implications of big data, artificial intelligence (AI) and machine learning for data protection to help explain what businesses should be considering. This updated guidance examines big data, the growth in AI and machine learning algorithms, all of which are being deployed to discover patterns in data that traditional data analysis couldn't hope to find. Three innovative technologies, delivering tangible benefits and scientific advances across the globe on a daily basis.

In launching the publication the ICO commented: "The complexity and opacity of these types of processing operations mean that it’s often hard to know what’s going on behind the scenes. This can be problematic when personal data is involved, especially when decisions are made that have significant effects on people’s lives. The combination of these factors has led some to call for new regulation of big data, AI and machine learning, to increase transparency and ensure accountability".

Additionally, working out of Silicon Valley, I often use the phrase: "just because you can, it doesn't mean you should". There should be a line in the sand. Sometimes we don't just draw on the law; it's also important to check in with your moral compass. As the ICO notes: "whilst the means by which the processing of personal data are changing, the underlying issues remain the same. Are people being treated fairly? Are decisions accurate and free from bias? Is there a legal basis for the processing?" The ICO reiterates that these are all issues it has been addressing for many years through oversight of existing European data protection legislation.

Big data, AI and algorithms – the language of so much of today's tech

Big data is often described in terms of three characteristics – volume, velocity and variety of data, which are difficult to analyse using traditional data analysis methods. AI analyses data to create a model, inferences from this model are then used to predict and anticipate possible future events. Machine learning is the technique or tool that enables computers to think and create an output based on algorithms, underpinning and facilitating AI. Collectively, they are known as 'big data analytics'.

The implications of big data analytics on data protection

  • Use of algorithms

The use of algorithms provides a way for companies to target consumers in a certain way based on factors identified by analytics. Whilst this enables a company to provide a tailored approach, it can interfere with data protection rights.

One of the clarifications provided for in the GDPR is the concept of profiling, something we've explored in the context of the GDPR in previous blogs. Whilst the GDPR does not prevent profiling, it recognises that individuals have a qualified right not to be subject to it. This is therefore a key consideration for big data analytics which uses algorithms to make decisions. On a practical level, the obligation to inform the individual has become the right to be provided with fair processing information. This means transparency and, for an individual, the right to be provided with this information in an intelligible form, using clear and plain language. The complexity of algorithmic processing may therefore prove a practical challenge in terms of compliance with this requirement. The new accountability principle should also be considered. With machine learning growing in popularity, a concept of 'algorithmic accountability' has emerged, which relies on the ability to check that the algorithms used and developed are doing what the company thinks they're doing and aren't producing erroneous, discriminatory or unjustified results. It has been suggested that big data analysts will need to find ways to build detection mechanisms into the machine learning to prevent such decisions being made.
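To make 'algorithmic accountability' concrete, a detection mechanism can be as simple as routinely comparing a model's outcomes across groups and flagging large disparities for review. The sketch below is purely illustrative: the function names, the example data and the 0.8 threshold (borrowed from the well-known "four-fifths rule" used in US employment testing, not from the GDPR) are all assumptions, not a prescribed compliance test.

```python
def disparity_ratio(decisions, groups):
    """Ratio of the lowest to the highest positive-decision rate per group.

    A value near 1.0 means outcomes are similar across groups; a low
    value signals a disparity worth human investigation.
    """
    rates = {}
    for g in set(groups):
        outcomes = [d for d, grp in zip(decisions, groups) if grp == g]
        rates[g] = sum(outcomes) / len(outcomes)
    return min(rates.values()) / max(rates.values())

# Illustrative data: 1 = approved, 0 = declined.
decisions = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

ratio = disparity_ratio(decisions, groups)
if ratio < 0.8:  # assumed review threshold
    print(f"review needed: disparity ratio {ratio:.2f}")
```

A check like this doesn't prove an algorithm is fair, but running it regularly gives a business evidence that it is monitoring its own outputs, which is the essence of the accountability principle.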

  • Opacity of processing

One of the key features of EU data protection law and the GDPR is the concept of transparency. Big data analytics, whilst it has the benefit of analysing vast amounts of data from a variety of sources, often makes it hard to understand how a decision was actually made and what sources were used to reach that conclusion. Whilst this may prove a challenge for businesses engaged in big data when determining the accuracy of the decision, and indeed raises concerns over its representative nature or bias, it may also bring into question compliance with the GDPR requirements for automated processing. The GDPR states that an individual has the right to an explanation of a decision based on automated processing. Whilst this is restricted only to decisions made 'solely' by automated processing and those that 'significantly' affect individuals, businesses engaged in big data analytics may need to be cautious in relying on decisions which lack human rationalisation.
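In a processing pipeline, the 'solely automated' and 'significant effect' tests above can translate into a simple gate that routes qualifying decisions to human review. This is a minimal sketch, assuming hypothetical flags a business might record per decision; how those flags are determined is a legal judgment, not something code can decide.

```python
def requires_human_review(automated_only, significant_effect):
    """Flag decisions made solely by automation with significant effects.

    Both inputs are assumed booleans determined upstream by the
    business's own legal assessment of each decision type.
    """
    return automated_only and significant_effect

# Illustrative case records.
cases = [
    {"id": 1, "automated_only": True, "significant_effect": True},
    {"id": 2, "automated_only": True, "significant_effect": False},
    {"id": 3, "automated_only": False, "significant_effect": True},
]

review_queue = [
    c["id"] for c in cases
    if requires_human_review(c["automated_only"], c["significant_effect"])
]
# Only case 1 is both solely automated and significant.
```

Building the gate into the pipeline, rather than bolting it on afterwards, also creates a record of which decisions received human rationalisation.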

  • Using all the data

A benefit of using big data analytics is the collection of a huge volume of data, but this directly conflicts with the GDPR concept of data minimisation. Data minimisation is a principle under the GDPR's "privacy by default" and again something we've explored with you before. The GDPR states that collection of data must be for a specific purpose and "adequate, relevant and limited to what is necessary". Big data analytics may collect a huge amount of data, meaning that it may be considered 'excessive' for the processing purpose. Often big data analytics may find further correlations than originally thought, but finding a correlation later on does not retrospectively justify obtaining the data in the first place. A further concern about the volume of data collected is that data should not be retained for longer than is necessary for the purpose for which it was processed. With such large volumes, businesses will need to be clear what data is being used for each purpose and have appropriate measures and policies in place to ensure personal data is kept secure and not kept for longer than necessary. In addition, businesses engaged in big data may face challenges when met with requests from data subjects regarding their right to erasure due to the volume and variety of data collected; although the industry has suggested that a move towards big data may mean that data stores are brought together, so in fact all the data on an individual may well be easier to find.
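Data minimisation and retention limits can be enforced mechanically at the point of ingestion. The sketch below is an assumption-laden illustration, not a compliance recipe: the field names, the 90-day retention period and the record layout are invented for the example, and the right values depend on the stated purpose of the processing.

```python
from datetime import datetime, timedelta

# Assumed policy: only these fields are necessary for the analytics
# purpose, and records older than 90 days are discarded.
NEEDED_FIELDS = {"user_id", "event", "timestamp"}
RETENTION = timedelta(days=90)


def minimise(record):
    """Strip fields not required for the stated analytics purpose."""
    return {k: v for k, v in record.items() if k in NEEDED_FIELDS}


def within_retention(record, now):
    """Keep only records still inside the retention window."""
    return now - record["timestamp"] <= RETENTION


now = datetime(2017, 4, 21)
raw = [
    {"user_id": 1, "event": "click", "email": "a@example.com",
     "timestamp": datetime(2017, 4, 1)},
    {"user_id": 2, "event": "view", "email": "b@example.com",
     "timestamp": datetime(2016, 12, 1)},
]

dataset = [minimise(r) for r in raw if within_retention(r, now)]
# Only the recent record survives, and without the email field.
```

Filtering like this at collection time, rather than cleaning up later, is what turns "privacy by default" from a policy statement into observable behaviour.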

  • Repurposing data

As briefly mentioned above, big data collection can mean correlations are found which are unrelated to the original purpose for the collection. This can prove to be a challenge when considering the grounds for lawful processing. In big data the initial recorded purpose may change and the ultimate reason may be unclear. Businesses will therefore need to consider whether the 'new' use would be reasonable to those concerned, whether it is naturally connected with the reason people use the service and whether the purpose is unrelated to the delivery of the service. It may not be considered unfair to process it in this 'new' way, but a data subject's reasonable expectations should be contemplated, along with whether the new purpose is incompatible with the original purpose. It will challenge big data businesses to be creative in their fair processing notices and the way in which they ensure they have obtained the correct consents or relied on other lawful processing grounds.

  • Data Controller or Processor?

Big data analytics can also make the distinction between data controllers and data processors unclear. Organisations outsourcing analytics to companies specialising in AI and machine learning need to consider carefully who has control over the processing of data: a processor with the freedom to decide what data to collect and how to apply it analytically blurs the distinction and may inadvertently become a controller as well as a processor.

So how can big data analytics and data protection co-exist?

Anonymisation – If data is fully anonymised, it is no longer considered personal data, so it is taken out of the data protection sphere. Anonymisation is also considered to mitigate the risk of inadvertent disclosure or loss of data, and still means businesses can use analytics which don't require the data to be identifiable. Therefore, if the personal data isn't needed for analysis, remove or anonymise it.
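In practice there is an important difference between dropping identifiers outright and merely replacing them with tokens. The sketch below illustrates both, under assumed field names and an assumed salt: note that salted hashing is pseudonymisation, not anonymisation, so the output is still personal data under the GDPR; only removing the identifying fields (and any re-identification route) moves data towards the anonymised category.

```python
import hashlib

# Illustrative secret salt; in a real system this would be managed
# and rotated securely, never hard-coded.
SALT = b"rotate-me-regularly"


def drop_identifiers(record, identifiers=("name", "email")):
    """Remove direct identifiers entirely (closest to anonymisation)."""
    return {k: v for k, v in record.items() if k not in identifiers}


def pseudonymise(value):
    """Replace an identifier with a stable token (still personal data)."""
    return hashlib.sha256(SALT + value.encode()).hexdigest()


record = {"name": "Alice", "email": "alice@example.com", "score": 42}

anonymous = drop_identifiers(record)   # {'score': 42}
token = pseudonymise(record["email"])  # same input always yields same token
```

The stable token lets analytics link events about the same person without holding the raw identifier, which reduces, but does not eliminate, the data protection obligations.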

Privacy impact assessments (PIAs) – These can be a requirement under the GDPR, and big data analytics involving personal data is very likely to fall within that requirement, so a PIA should be prepared as any big data analytics is planned. A well-prepared PIA will help any business understand the issues and remain accountable for its actions by identifying privacy risks and addressing them.

Proportionality and processing grounds – Some types of big data analytics, such as profiling, can have intrusive effects on individuals.  The GDPR requires that personal data must be “processed fairly, lawfully and in a transparent manner in relation to the data subject”.  Big data analytics can seem creepy and have unintended impact on an individual. A key question for businesses using personal data for big data analytics is whether the processing is "fair" or unduly "intrusive".  This is in part about transparency but also about assessing the effects on individuals.  Appropriate grounds for processing need to be established and businesses need to assess the various conditions applicable to reliance on consent or legitimate interests.  Ultimately, a data subject must also be able to exercise their rights.

Privacy notices – 'Traditional' privacy notices are less feasible for big data analytics because the purposes of processing can change, but this will challenge businesses using big data to be innovative in finding new ways to convey privacy notices and information concisely (perhaps at different stages throughout a project). They will also need to place an emphasis on the purpose of the processing and keep on top of updates as these purposes change.

A business hasn't been transparent if the data subject isn’t aware. Business needs to consider how data was sourced and what individuals were told at the time. What's more, businesses need to consider whether the use of personal data in big data applications is within people’s reasonable expectations.

Profiling considerations – The ICO recognises that big data analytics may involve potentially intrusive types of automated profiling. The GDPR does not prevent automated decision making or profiling, but it does give individuals a qualified right not to be subject to purely automated decision making. It also says that a data controller should use "appropriate mathematical or statistical procedures for the profiling" and take measures to prevent discrimination on the basis of race or ethnic origin, political opinions, religion or beliefs, trade union membership, genetic or health status or sexual orientation. What's more, businesses should consider greater algorithmic transparency because, the more open they are about the use of algorithms, the easier it will be for businesses engaged in big data to demonstrate compliance with the GDPR automated processing requirements.

Privacy by design – Embedding privacy by design solutions into big data analytics can help to protect privacy whilst still maintaining the purpose of big data analytics. This said, businesses need to strike the balance here and may need to moderate the volume of data being gathered, retained and analysed.

Privacy seals and certification – There has been a suggestion that privacy seals and certifications should be introduced potentially with a focus on particular industries, in order to help demonstrate data protection compliance. The GDPR will encourage such schemes and businesses engaged in big data could look to have their own. Today however, these remain a pipe dream as no such schemes have been advanced.

Deeper engagement – While transparency and control should be front and centre, compliance is not just about lawyers applying the right "terms" moments before launch. Compliance is a design challenge, and good legal minds alone won't solve all big data issues. The business needs privacy advocates: the way data should be handled needs to be communicated, and the legal teams need to ensure this is known by the techies operating in development and backend services. Good data practices need to be understood, respected and reflected in the way the data is treated, ideally distilled into a set of internal rules or principles. That way everyone knows how to act and, when the inevitable scope creep ensues, those promises are remembered and reflected. Remember ethics in the art of the possible!

Finding new innovative and compliant ways to process data

Whilst big data analytics may face more challenges than traditional data analysis, this provides an opportunity rather than a barrier. It is possible for data protection and big data analytics to co-exist; it will just involve careful consideration from those embarking on big data analytics to ensure compliance with the GDPR. This should lead to innovative new ways to inform data subjects of privacy notices and changes in processing needs, and will also encourage the development of machine learning to provide better safeguards and mechanisms during algorithm calculations. Big data analytics has the potential to provide business with an invaluable resource, provided that in doing so it can deliver results which benefit and protect data subjects at the same time.

Mark Webber and Hannah Blake – Fieldfisher (Silicon Valley) LLP
