A guest post from Johanna Walker and Mark Frank
Many existing approaches to open data measurement start from assessing ideal properties of datasets. But there is little evidence these properties capture the key things that data users need. Through an Open Data for Development grant from the Open Government Partnership Open Data Working Group, we’ve been exploring user-centred methods for measuring open data.
We started from the assumption that the purpose of open data metrics should be to determine the value of a group of open datasets to specific users. For this project, we have focussed on one kind of user: small third sector organisations working with the homeless in the UK (Winchester) and in India (Ahmedabad).
In order to ensure we were measuring attributes of importance to users, we created a ‘reverse-hackathon’ process, where we asked users to outline problems that mattered to them and then to identify the information that could be used to solve those problems. We then located data that could be used to create this information, and through a process of workshop discussion and role-play, established the key attributes for usability of the data. From this, we identified metrics for these attributes.
We used an approach that reflected the key challenges of making effective use of the data, but also aimed to provide metrics which could be easily operationalised and applied. We used both subjective and objective metrics, though with a preference for objective metrics where practical. Where a suitable metric already existed, we used it.
Attributes and Metrics
The five relevant attributes derived from this process were discoverability, granularity, immediate intelligibility, authority and how linked the data is. Each attribute is scored on a points system, to build up an assessment of whether datasets on a given topic are fit for user needs.
Discoverability – Data Portal Points Measure
For any given dataset, points are awarded if the owner/publisher has an open data portal, with further points for desirable characteristics of the portal, such as providing a list of unpublished datasets.
Granularity – Aggregation Scale
We evaluated two approaches. The first was context-specific and relied on a subject matter expert specifying appropriate levels of aggregation for that context, e.g. country, state, city, street. The second looked for the presence of generic class data which can be used for aggregation, such as post codes, dates, and gender.
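As a rough illustration only, the second approach could be sketched in Python as a check on a dataset's column headings. The keyword lists and the function name `generic_classes_present` are assumptions made for this example and are not taken from the study.

```python
import re

# Hypothetical keyword lists for each generic class; a real assessment
# would need lists suited to the country and sector being examined.
GENERIC_CLASSES = {
    "postcode": ("postcode", "zip", "pincode"),
    "date": ("date", "year", "month"),
    "gender": ("gender", "sex"),
}


def generic_classes_present(column_names):
    """Return the set of generic classes suggested by the column headings."""
    found = set()
    for name in column_names:
        # Split the heading into lowercase word tokens, e.g. "Post Code"
        # becomes ["post", "code"].
        tokens = [t for t in re.split(r"[^a-z0-9]+", name.lower()) if t]
        # Also try the tokens joined together, so "Post Code" matches "postcode".
        candidates = set(tokens) | {"".join(tokens)}
        for generic_class, keywords in GENERIC_CLASSES.items():
            if candidates & set(keywords):
                found.add(generic_class)
    return found


columns = ["Post Code", "Date of count", "Rough sleepers"]
print(sorted(generic_classes_present(columns)))
```

Matching on whole words rather than substrings avoids false positives such as a heading containing "Essex" being read as a gender column, though any heading-based check will still miss columns with unhelpful names.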
Immediate Intelligibility – Supporting Documentation Scale
We evaluated a scale based on how quickly a user could access supporting documentation, ranging from ‘no documentation exists’ through to ‘context-specific information available instantly’.
Trustworthiness – 7 Ws Score
Ram and Liu (2005) identified 7 questions which should be answered to make data trustworthy. We scored datasets on how many of those questions were answered in the metadata.
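A minimal sketch of this kind of scoring in Python is shown below. It assumes the seven questions are the what, when, where, who, how, which and why of Ram and Liu's model, and that the metadata has already been reduced to a dictionary keyed by those words; that dictionary layout and the function name `seven_ws_score` are hypothetical, chosen only for illustration.

```python
# Assumed labels for the seven questions; the metadata keys are hypothetical.
W7_QUESTIONS = ("what", "when", "where", "who", "how", "which", "why")


def seven_ws_score(metadata):
    """Count how many of the seven questions have a non-empty answer."""
    score = 0
    for question in W7_QUESTIONS:
        answer = metadata.get(question)
        if isinstance(answer, str):
            # Treat whitespace-only answers as unanswered.
            answer = answer.strip()
        if answer:
            score += 1
    return score


example_metadata = {
    "what": "Rough sleeper counts",
    "when": "2014",
    "who": "  ",    # blank, so it does not count
    "where": None,  # missing, so it does not count
}
print(seven_ws_score(example_metadata), "out of", len(W7_QUESTIONS))
```

In practice, deciding whether a piece of metadata genuinely answers a question is a human judgement; the sketch only covers the counting step once that judgement has been recorded.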
Linkable to other data
The Five Stars of Linked Open Data is an accepted and easily applicable measure of open data format standards which reflects the user need to be able to discover unanticipated relationships among data.
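For readers unfamiliar with the scheme, the sketch below shows one way its cumulative levels might be applied in Python. It assumes each dataset has already been assessed against the five criteria by hand; the `DatasetFormat` class and its field names are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DatasetFormat:
    """Hand-assessed answers to the five criteria (hypothetical field names)."""
    open_licence: bool          # 1 star: on the web under an open licence
    machine_readable: bool      # 2 stars: structured, machine-readable data
    non_proprietary: bool       # 3 stars: non-proprietary format such as CSV
    uses_uris: bool             # 4 stars: URIs used to identify things
    linked_to_other_data: bool  # 5 stars: links out to other data


def five_star_rating(dataset):
    """Return the number of consecutive criteria met, starting from the first."""
    criteria = [
        dataset.open_licence,
        dataset.machine_readable,
        dataset.non_proprietary,
        dataset.uses_uris,
        dataset.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break  # each star depends on all of the earlier ones
        stars += 1
    return stars


open_csv = DatasetFormat(True, True, True, False, False)
print(five_star_rating(open_csv))
```

Because the levels are cumulative, a dataset without an open licence scores zero however well structured it is.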
All these metrics have been piloted against data sets and assessed for validity, reliability, discrimination, transferability/comparability and efficiency. In India these were the data sets used in the workshops; in the UK they were the data sets used in the NESTA Open Data Housing Challenge.
What we’ve learnt
Although none of the user-derived attributes were completely novel, they demonstrate a different emphasis from ‘top-down’ or publisher-focused metrics widely used in the literature, and provide a way of assessing data in a given sector, rather than looking at isolated datasets.
The attributes we found to be important were similar for both the UK and Indian participants in our study. However, the features of data that matter may be very different for larger organisations, which may, for example, have more resources to invest in data discovery and in understanding poorly documented data. This raises potential discrimination issues if publishers are influenced by the concerns of larger, rather than smaller, third sector organisations.
The time- and resource-constrained nature of our groups meant that we had to focus on metrics that are quick and simple to apply. We were only able to find one pre-existing appropriately efficient and ‘low-tech’ metric for these attributes, the ‘Five Stars of Linked Open Data’, and one model on which we could base a metric. Otherwise, the metrics had to be developed afresh.
Our users were digitally literate, relatively skilled in administrative IT, and domain experts. Despite this, there were non-trivial blocks to using open data. In particular, discoverability, and the amount of contextual knowledge required to facilitate it, is a substantial challenge.
Debating measurement methods
The full findings of this research will be presented as part of the Open Data Research Symposium ahead of IODC on 27th May, and then on Friday 29th will feed into discussions in the Measurement Action Area.
What do you think about the potential for rethinking open data measurement from the bottom up, rather than the top down? Do these metrics make sense to you? How do we balance easy-to-operationalise measurement with measurement that matters to users? Share your thoughts in the thread below.