We have always lived in a society dependent on data – the change over the last decade or so is the sheer scale of the data being generated. We have probably reached the point where some data will never be looked at by a human being – it all just sits there. Most of this data has value; we just need to think about how to extract that value effectively. Both this government and the last realised that they hold a lot of data themselves and that it might be valuable. There have been several initiatives aimed at releasing that value, and they are beginning to deliver, but everyone is increasingly aware of the challenges in releasing this valuable national asset.
Many years ago, I worked for ICI Acrylics. They made the material that went into spas in the USA. Spas are a luxury market, and we spent a lot of time making the material in colours that reflected that luxury price point. One year, I had to go to the big baths and spas show in Las Vegas to see how we might be able to introduce innovative new products into the market. I met a large number of spa makers and saw some amazing sights in the show, but my real insight came in our hospitality room. As the dealers came to see us, they all made a point of thanking our sales manager for his help. It wasn’t just one of them – it was all of them. At the end of the evening I asked him what his secret was. He told me that he had bought the US Government data on the distribution of income against zip codes. Before he visited a dealer, he looked up the most affluent neighbourhood near the dealer and suggested that they focus their sales there. Needless to say, people with higher incomes were more likely to splash out on a spa (sorry, couldn’t help that one!) and the dealers got more sales. This made me realise that data only gets translated into useful information if you have a use for the output.
It is all very well realising that data has value, but a major lesson from the last few years is that most data is stored without a thought to its future use. It is often in the wrong format and contains errors and the sorts of simple confusions that make it difficult to use, and these can, in turn, spawn errors of interpretation that undermine the value. Many of these errors are not easily dealt with using machines. They are often caused by the people inputting the data – wrong addresses, swapped letters or numbers, misspellings and so on – and are most effectively corrected by people too. Our work with LinkedGov has shown us that identifying people who can see the potential value of the data, or who are looking to develop new data applications, can provide the resource and insight necessary to make sure the data they understand is “clean”.
Once the individual data sets are available, it is down to identifying the insight that linking different data sets can provide, carrying out that linking and extracting the value. That can only come from an understanding of the market – like the need to find people with enough disposable income to be able to buy a spa! This means that you have to start with a question and seek out the data needed to answer it. Owning the data and thinking it has intrinsic value is coming at the situation the wrong way around!
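To make the idea of starting with a question concrete, here is a minimal sketch in Python of the kind of linking the sales manager in the spa story was doing by hand. The question is "which neighbourhood near this dealer has the highest income?", and it is answered by joining two data sets on zip code. Everything in it is an assumption for illustration: the dealer names, the zip codes, the income figures and the function name `most_affluent_zip` are all invented, and real income data would need cleaning before it could be used this way.

```python
# Hypothetical income data keyed by zip code (figures invented for illustration).
median_income_by_zip = {
    "89101": 38000,
    "89117": 72000,
    "89135": 95000,
    "89144": 88000,
}

# Hypothetical dealers, each with the zip codes within reach of their showroom.
dealer_catchments = {
    "Desert Spas": ["89101", "89117", "89135"],
    "Oasis Pools": ["89101", "89144"],
    "Nowhere Baths": ["00000"],  # a zip code missing from the income data
}


def most_affluent_zip(dealer, catchments, income_by_zip):
    """Return the (zip, income) pair with the highest income near a dealer.

    Returns None when none of the dealer's zip codes appear in the income data.
    """
    candidates = [
        (zip_code, income_by_zip[zip_code])
        for zip_code in catchments[dealer]
        if zip_code in income_by_zip  # skip zips we have no income figure for
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    for dealer in dealer_catchments:
        result = most_affluent_zip(dealer, dealer_catchments, median_income_by_zip)
        print(dealer, result)
```

Neither data set is worth much on its own here; the value only appears once the question tells you which two to link and what to look for in the result.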
One of the challenges people often quote against this opening up of data is that it often ends up being about people – and people are not sure that they want others to know about their habits or assets. This certainly seems to be true when it’s a theoretical question. However, it is actually more often the case that the opening up of data makes people’s lives easier. How many of those suggestions Tesco or Amazon make about what you might be interested in actually lead to you discovering a new food, a new author or a new band you might not otherwise have discovered? Anonymisation of data will be necessary to protect some aspects of our lives, but there might well be knowledge about other aspects we would gladly trade for convenience.
The Open Data Institute, announced as part of the Autumn Statement the other week, is an important step. The academic understanding of linking data and the development of languages and methodologies to carry out that process is strong in the United Kingdom and the Institute will draw heavily on this expertise. But it is to be located in Shoreditch, where possibly the highest concentration of young, ambitious and market-focused companies operating in this space is to be found. It will make that expertise available to companies with the insight into what has the most value and help them develop the science and technology into products and services.
Data is a resource like any other – it has value when it answers a need. We are rich in it, so we ought to focus on making sure it is able to answer the questions we ask of it.