We Should Adopt Open Data, With Caution

A protean army of computer scientists, hackers and citizen researchers thinks we are living in an era of data access prohibition. February 22 was heralded as International Open Data Day. The unofficial mission objective was to 'liberate data sets' -- a phrase popular with those whom we call 'data absolutists'.

Their argument is compelling. Unshackled data -- such as government-owned datasets on rates of illness in discrete populations -- are rich repositories of hidden insights, which citizen researchers, provided they have access to the data, can investigate in order to relieve human suffering. Philosophically, openness fulfills the grand vision of the Web, which has always been to break down information hierarchies: to make all information elegantly structured, equal, free and useful.

'Data absolutists' believe that citizen researchers -- not just university academics with special access to taxpayer-funded data sets -- can infuse their wisdom into the data, and thereby 'mash up' information on, say, hospital patient safety records with location-based data or 'patient experience stories' on open-access blogs. We side with the data absolutists, not the 'restrictionists' -- to a point.

Open data has as its mission a twin goal: to not only enable disenfranchised citizen researchers to legally reuse and redistribute the data, but also to enable researchers to access that data in easily manipulated file formats for analysis. Combining ease of access with free access drives greater participation.

England is leading the way. On October 31, 2013, David Cameron, the UK Prime Minister, hosted the Open Government Partnership London Summit with 61 representatives of member states. Open data, he said, is not a "nice to have" but is "absolutely fundamental to a nation's potential success in the 21st century" and "a vital part of any country's plan for prosperity."

Data absolutists champion a kind of virtuous feedback loop. First, the analysis and 'mash ups' of these data sets offer social and commercial value to all citizens. Second, increased awareness of this value leads to improved participation and engagement by citizens who then demand more openness.

In 2009, Sir Tim Berners-Lee, inventor of the World Wide Web, put out a plea for raw data. The armies of open data enthusiasts are stepping into the breach and advocating for sweeping change across the globe. The Open Government Partnership now has 63 countries signed on and is "committed to making their governments more open, accountable, and responsive to citizens."

But pay attention to the data restrictionists, who want to limit access to data. Why so? They are well-intentioned. Consider the dangers of a rogue citizen researcher de-anonymizing data sets, or manipulating data so that it becomes possible to publish online information about who suffers from chronic illness in tiny communities. More than 40 per cent of Americans and more than 40 per cent of British citizens are very concerned about how their personal data is used, according to new insights from the Global Business Research Network. Yet there is large variation in what people consider sensitive personal data: some think the websites they have visited are sensitive data; others do not.

As the Global Business Research Network data discussed recently at the IIeX Amsterdam data conference suggest, there is a need for permission-based explicit consent, anonymization ("the right to be forgotten"), and transparency in how any public data will be used (e.g., data linkage). Yet the risks of linked data sets, we believe, can be solved through rigorous encrypted de-identification.
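What de-identification of linked data sets might look like in practice can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any method used by the organizations named here: it swaps in keyed hashing (HMAC-SHA256) as one common pseudonymization technique, and the secret key, field names and records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data custodian (an assumption for this sketch).
SECRET_KEY = b"replace-with-a-randomly-generated-secret"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed hash (HMAC-SHA256) of a direct identifier."""
    # Normalize so trivially different spellings of the same ID still match.
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def deidentify(record: dict, id_field: str, drop_fields: tuple = ()) -> dict:
    """Replace the identifier with a pseudonym and drop other direct identifiers."""
    cleaned = {
        k: v for k, v in record.items()
        if k != id_field and k not in drop_fields
    }
    cleaned["pseudonym"] = pseudonymize(str(record[id_field]))
    return cleaned


# Two made-up records about the same person, from two hypothetical data sets.
hospital = {"health_id": "AB-1234", "name": "Jane Doe", "diagnosis": "asthma"}
pharmacy = {"health_id": " ab-1234 ", "name": "J. Doe", "drug": "salbutamol"}

a = deidentify(hospital, "health_id", drop_fields=("name",))
b = deidentify(pharmacy, "health_id", drop_fields=("name",))

# The records can still be linked by pseudonym, without exposing the original ID.
linked = a["pseudonym"] == b["pseudonym"]
```

Because the hash is keyed, someone without the secret cannot simply hash a list of known identifiers and look for matches. A sketch like this does not, on its own, prevent re-identification through combinations of other fields (age, postcode, rare diagnoses), which is precisely the restrictionists' worry about tiny communities.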

The existence of grey areas of dispute does not melt away the strong arguments of those who are lobbying for more open data. Using data from the UK government's open data website, researchers were able to analyze family physician prescription patterns, which may help guide decision makers in identifying cost-saving measures. When the earthquake hit Haiti in 2010, people collaborated on geospatial data sets for risk assessment. At, locations of health facilities and Cholera Treatment Centres surfaced in files that could be easily mapped. That helped aid organizations collaborate and use their resources and donations to ensure maximum impact.

We believe that making data free and open needs to be guided to ensure high impact and meaningful engagement. Guided engagement can play a part in defining a critical set of questions that need to be answered. For example, global pandemic surveillance data, perhaps the most closed database in the world whilst an epidemic is emergent, needs interpretation guides from expert public health authorities who can point out to citizen researchers the potential use and abuse of such data. To save lives and relieve human suffering, we need people not only to use the data, but also to suggest improvements to the data sets, to collaborate, and, through collaboration and refinement, to get that data quickly into the hands of decision-makers, such as the WHO or the CDC.

For the International Open Data Day Hackathon there were 194 'hackathons'. Many of the host cities have municipal or provincial engagement. There have also been national hackathons such as the Canadian Open Data Experience (CODE). Hackathons are bringing together governments, designers, hackers, and citizens to generate ideas of how the data can be used. But focus and collaboration are critical.

The Sunlight Foundation has been using open government data to create "technology to enable more complete, equitable and effective democratic participation." Careful planning of the hackathons has amplified the potential impact of the data by focusing people's energy on key issues.

Another example of guided engagement was the development of DSM-V, the manual for diagnosing mental illness. The American Psychiatric Association sought out views of the public, including patients, researchers, and physicians, to update the definitions of certain illness categories and incorporate into new diagnostic criteria the range of patient and caregiver insights that had been historically ignored.

Successful open data initiatives show that artfully "guided advice" by researchers on how to use the data is important. We cannot let "open data hype" get in the way of the real goal: engagement and mashing up data to deliver high ROI. Serendipitous discovery does occasionally happen when the data "hang open," yet serendipity can be accelerated; it can be gently shaped to ensure that the right confluence of players communicates better to solve the wickedest problems of our time.

This article was co-authored with Sabrina Tang, a Junior Fellow at Massey College in the University of Toronto and graduate student in biomedical engineering at the University of Toronto.