On 14 June we were treated to a Lunchtime Talk by Studio resident Nello Cristianini. Nello is a professor of Artificial Intelligence within the Intelligent Systems Laboratory of the University of Bristol. Nello spoke to us about Big Data and the ethics surrounding it.

What is big data?

Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few. This data is big data. (Quote from IBM)

Data is big news: businesses, politicians and journalists are talking about big data more and more. Nello showed us a graph of the sharp increase in news articles and online searches relating to big data over the last couple of years. But what has caused this increase, given that we’ve been analysing data for a long time? Nello explained that funding has driven a big increase in big data research, and that businesses and corporations are realising the potential uses of big data. We now also have the technology available to exploit immense masses of data.

Nello explained that the research field has changed. In the 1980s researchers wanted to understand more about the data they collected, and they wanted to discover why algorithms worked. Now researchers are less concerned about why an algorithm works (as they know it does) and more concerned with spotting trends in the collected data that allow increasingly accurate predictions to be made. Nello considers this change from comprehension to prediction as the indicator of success to be a fundamental paradigm shift in the field of big data.

Uses of Big Data

Amazon and Google can now make very accurate predictions about your information needs and purchasing behaviour just from data collected from previous searches and purchases. This means they can increase revenue for their businesses and make ads less intrusive and more appealing for users. This is an example of masses of data at a macro-level being used to deliver personalised predictions at a micro-level.

Nello explained that his group is developing a web agent that can read the content of news and social media. It can automatically extract information from that content, which can then be used in many different ways. One is to support social science research. One of the reasons big data is so useful to social scientists is that it allows them to study a huge amount of data showing actual behaviours (blogging, buying and clicking) without having to rely on responses from surveys, which are notoriously inaccurate.

Nello’s research group has also been analysing which words make people click on a news article. Newspapers can use this analysis to find out which stories appeal to their audience. They can create most read/shared lists and find out which headlines attract the most interest from readers. The group can also analyse collective mood on Twitter by identifying emotive language, which they can then correlate with events that have happened or news stories that have been released.
 
Ethics of Big Data

There are many compelling and really positive cases made for the use of big data. We have more cohesive and reliable data gathering systems than ever before, but their use is not always beneficial. Nello explained that he believes we should be studying what could go wrong when collecting this big data as well as what the benefits are. Nello showed us ‘Riding the Wave: How Europe can gain from the rising tide of scientific data’, a 2010 report to the European Commission highlighting what big data can do for Europe. It is full of positive user scenarios in which ‘Marie from a genomics lab’ or ‘Carlos who likes bugs’ stand to gain from deploying big data to their advantage. It is entirely lacking in any counter-scenarios where the collection, analysis or exploitation of data might have negative consequences.

Nello was also interested in the natural phenomena that often appear in metaphors for big data: riding the wave, the rising tide, a coming storm. All of these imply that its prevalence is somehow an ‘act of God’ rather than the product of choices made by people, fuelled by the conscious development of new tools.

All this data being collected gives governments and organisations the ability to monitor everyone’s behaviour, but how many people are really aware of how much data is being collected about them? This raises major questions about the ethics of data collection. Recent leaks in the media about the NSA’s PRISM programme have prompted debate on this very subject, but few seem genuinely surprised by the methods revealed.

What will be the results of our current race to a data structure society?

Nello asked who in society is expected to consider the ethical and social effects of our data. Governments and organisations will have their own agendas. It’s up to us to ask these questions and to understand the implications for society when we make choices about what we share and who owns that information.