On January 20th the UK Government launched the beta version of its new website, www.data.gov.uk, where members of the public can access official data (2,500 datasets) such as traffic statistics, crime rates, exam results, house prices and much more. Surely making this data readily available to the voting public will avoid those pointless political arguments about what the statistics really mean?
This release follows other initiatives around the world, including the American repository, www.data.gov, which provides information from bodies ranging from the US defence department to NASA. One of the biggest caveats with any data is that it has to be interpreted correctly. It is easy to draw the wrong conclusions from it, to ignore its limitations, or to be misled by samples, averages or outliers. Being statistically savvy is the most important criterion for making good use of all this data.
A recent example of the confusion that data interpretation can cause was highlighted by ‘More or Less’ on Radio 4. After the latest release of unemployment figures, two members of different political parties quoted different unemployment figures from the same report. One of them quoted the official headline number of unemployed people, whilst the other quoted the number of ‘inactive’ people ‘wanting a job’ (those who have not been looking for work in the last four weeks but say they would like a regular paid job, plus those who are looking for work but can’t start within two weeks). We located the source data using our new favourite site, http://www.data.gov.uk/dataset/labour_market_statistics, and represented the key tables in the charts below:
Chart 1: Unemployment figures with economic inactivity and reasons
Needless to say, the reports released last week show one of these values is going down whilst the other is rising. Our chart nicely illustrates the reason why the (rising) figure of ‘inactive’ people ‘wanting a job’ (bright blue) is not included in the headline unemployment level (bright red) - namely that many of these people are economically inactive (e.g. students, carers, retired, long-term sick) and aren’t able to work even if they say they want a job.
Once the correct data has been identified, the issue turns to how to represent it in a clear and meaningful way. One of the top applications making use of the new repository is www.wheredoesmymoneygo.org, which uses Public Expenditure Statistical Analysis (PESA) data to show total public expenditure broken down by geography, services and year. The Capgemini Operational Research team specialises in data visualisation and is delivering similar visual and dynamic outputs in conjunction with a number of local government Total Place projects. One part of these projects is to ‘count’ public expenditure across all services (local government, justice, health, emergency services etc). By identifying and avoiding overlap, the Total Place initiative looks at how a ‘whole area’ approach to public services can lead to better services at less cost.
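For readers who want to see the arithmetic behind a waterfall diagram, here is a minimal sketch in Python. It only computes where each bar would start and end as the running total builds up; it does not draw anything. The function name `waterfall_steps`, the service labels and every amount are invented purely for illustration and are not Total Place or PESA figures.

```python
from typing import List, Tuple


def waterfall_steps(items: List[Tuple[str, float]]) -> List[Tuple[str, float, float]]:
    """Return (label, start, end) for each bar, followed by a closing total bar."""
    steps = []
    running = 0.0
    for label, amount in items:
        start = running
        running += amount  # a negative amount steps the running total down
        steps.append((label, start, running))
    # The final bar runs from zero up to the overall total.
    steps.append(("Total", 0.0, running))
    return steps


# Hypothetical figures in £m, invented for illustration only.
spend = [
    ("Local government", 120.0),
    ("Health", 200.0),
    ("Justice", 45.0),
    ("Overlap removed", -15.0),
]

for label, start, end in waterfall_steps(spend):
    print(f"{label:<18} {start:>7.1f} -> {end:>7.1f}")
```

Each bar starts where the previous one ended, so a negative entry such as the hypothetical overlap adjustment shows up as a step down before the final total.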
Chart 2: Example of Data Visualisation: Waterfall diagram
It will be interesting to see, especially in an election year, how data.gov.uk and its associated applications evolve. As Sir Tim Berners-Lee pointed out, large organisations in general and governments in particular have historically been reluctant to free data because they are worried it will be interpreted differently from the ‘official’ line.
But remember the famous warning that you can prove anything with statistics. As Winston Churchill once said: "When I call for statistics about the rate of infant mortality, what I want is proof that fewer babies died when I was Prime Minister than when anyone else was Prime Minister!"