Big data means all data
This article was first published in Data Protection Law & Policy in April 2013.
There is an awesomeness factor in the way data about our digital comings and goings is being captured nowadays. That awesomeness is such that it cannot even be described in numbers. In other words, the concept of big data is not about size but about reach. In the same way that the 'wow' of today's computer memory will turn into a 'so what' tomorrow, references to terabytes of data are meaningless to define the power and significance of big data. The best way to understand big data is to see it as a collection of all possible digital data. Absolutely all of it. Some of it will be trivial and most of it will be insignificant in isolation, but when put together its significance becomes clearer – at least to those who have the vision and astuteness to make the most of it.
Take transactional data as a starting point. One purchase by one person is meaningful up to a point – so if I buy a cookery book, the retailer may be able to infer that I either know someone who is interested in cooking or am interested in cooking myself. If many more people buy the same book, apart from suggesting that it may be a good idea to increase the stock of that book, the retailer as well as other interested parties – publishers, food producers, nutritionists – could derive some useful knowledge from those transactions. If I then buy cooking ingredients, the price of those items alone will give a picture of my spending bracket. As the number of transactions increases, the picture gets clearer and clearer. Now multiply the process by every shopper, every retailer and every transaction. You automatically have an overwhelming amount of data about what people do with their money – how much they spend, on what, how often and so on. Is that useful information? It does not matter. It is simply massive and someone will certainly derive value from it.
That's just the purely transactional stuff. Add information about what time people turn on their mobile phones, switch on the hot water or check their e-mail, which means of transportation they use to go where, and when they enter their workplaces – all easily recordable. Include data about browsing habits, app usage and means of communication employed. Then apply a bit of imagination and think about this kind of data gathering in an Internet of Things scenario, where offline everyday activities are electronically connected and digitally managed. Now add social networking interactions, blogs, tweets, Internet searches and music downloads. And for good measure, include some data from your GPS, hairdresser and medical appointments, online banking activities and energy company. When does this stop? It doesn't. It will just keep growing. It's big data and it is happening now in every household, workplace, school, hospital, car, mobile device and website.
What has happened in an uncoordinated but consistent manner is that all those daily activities have become a massive source of information which someone, somewhere is starting to make use of. Is this bad? Not necessarily. So far, we have seen pretty benign and very positive applications of big data – from correctly spelt Internet searches and useful shopping recommendations to helpful traffic-free driving directions and even predictions of the geographical spread of contagious diseases. What is even better is that, data misuses aside, the potential of this humongous amount of information is as big as the imagination of those who can get their hands on it, which probably means that we have barely started to scratch the surface of it all.
Our understanding of the potential of big data will improve as we become more comfortable and familiar with its dimensions but even now, it is easy to see its economic and social value. But with value comes responsibility. Just as those who extract and transport oil must apply utmost care to the handling of such precious but hazardous material, those who amass and manipulate humanity's valuable data must be responsible and accountable for their part. It is not only fair but entirely right that the greater the potential, the greater the responsibility, and that anyone entrusted with our information should be accountable to us all. It should not be up to us to figure out and manage what others are doing with our data. Frankly, that is simply unachievable in a big data world. But even if we cannot measure the size of big data, we must still find a way to apportion specific and realistic responsibilities for its exploitation.