June 8, 2015 IODC

A guest post from Lynne McAvoy.

At the 3rd International Open Data Conference 2015 (#iodc15), moderator José Alonso of the World Wide Web Foundation initiated the discussion on global data standards with a panel of open data supporters: Caroline Burle of the World Wide Web Consortium’s Brazil office, Hudson Hollister of the Data Transparency Coalition, Chris Taggart of OpenCorporates, Sarah Telford of the United Nations Office for the Coordination of Humanitarian Affairs (UN OCHA), and guest Michael Cañares of Step Up Consulting in the Philippines.

When asked what the phrase “global data standard” meant to the panelists, we heard that there are different areas in which standards are required. A global data standard would ensure a common semantic understanding across governments and countries, allowing us to effectively extract and use shared data while improving data quality. Without such a standard, there is no basis for comparison of various social indicators, from government financial transactions through to fair contracting practices or environmental assessments. Chris Taggart felt that the Open Contracting Data Standard is a first step in the right direction; by increasing clarity in the procurement process, the operational burden is reduced. Sarah Telford highlighted the importance of finding ways to communicate across countries, from lightweight approaches such as the Humanitarian Exchange Language (HXL) through to ISO standards, which provide stable, internationally agreed-upon guidance. Caroline Burle felt that a global data standard should allow the use and sharing of data and that, most critically, it should enable people to use the Web from anywhere.

Drivers for data standard creation include cultural change and top-down interest. In Jakarta, there are two kinds of response: for agencies lacking infrastructure, data standards provide a framework from which they can begin development; agencies with existing infrastructure can be resistant to onboarding new standards unless it can be shown how the new standard will increase efficiency or provide a quick solution to an existing pain point. Chris Taggart mentioned that tools such as the Open Contracting Data Standard and the Legal Entity Identifier are critical as we move forward, and are a benefit to different levels of government in defining their data. An example of this is the location code PA: does this refer to Panama or Philadelphia? Sarah Telford stressed that expectations have changed. Previous data collection was done in Excel, using a cut-and-paste approach. Data collection methods must be nimble, and the time required to understand and use standards must be short. Caroline Burle has been promoting open data standards in Brazil since 2008, and has witnessed the rapid release of both open data and its metadata.

According to the panel, the best data standard models that have emerged over the past few years are those which exploit both top-down and bottom-up engagement, increasing the chances of adoption. A lack of data awareness and knowledge is seen as a challenge in Jakarta, where the first draft of a metadata standard, based on the World Wide Web Consortium’s Data Catalog Vocabulary (DCAT), was edited beyond recognition because of language issues. By moving away from the standardized terminology, the agency involved undermined the potential for interoperability and data sharing. The risk is that one agency can say “yea” or “nay” to the proposed model. We must engage users and encourage them to modify their behaviour when collecting, describing, and using data. Data standard models that work are led by knowledgeable people who have the time to dedicate to doing it right. Chris Taggart mentioned that there are several barriers to using data standard models: projects close, we have to pay for ISO standards, and examples provided in documentation often contain errors. This leads to the idea that standards are there, but only available to some. What is needed is inclusiveness and iteration. He feels that “Open access is a fundamental requirement to what we are trying to accomplish, and the legacy approach will not work”. Sarah Telford stated that rapidly-needed information requires a more nimble approach. The Humanitarian Exchange Language was born of necessity, but the reason it has been taken up so quickly is that a large organization, UN OCHA, is at the helm to promote it. “What makes a standard is that people adopt it and use it”.

When asked about a business model for maintaining a high-quality data standard, the panel agreed that it is not just about the money. Organizations need to change their culture and behaviours around data. Legacy systems and proprietary software are being put aside as we move to open data standards and open access tools, more nimble and agile models for handling data. The role of the Chief Information Officer is changing to include the responsibility of Chief Data Officer, hopefully because organizations are recognizing the importance of having leadership in changing the information and data management culture.

May 28, 2015 IODC

Yesterday, attendees of the International Open Data Conference met for a pre-workshop exploring how we put users at the centre of the standards process.

The topic of standards is very broad: there are standards all around us, from the size of the nuts and bolts that hold up the buildings we’re in, to the shape of the plugs that connect our computers to electricity, through to the identifiers and field structures used in our spreadsheets and databases. Layers upon layers of standards build up upon each other – often invisibly fading into the background. Yet, standards shape our world. Data standards not only represent phenomena – they structure them. What makes it into shared datasets influences what makes it into plans, which affects where resources flow and how decisions are made. Our discussions yesterday were about making sure the development of standards happens in ways that are connected to their use on the ground.

After an opinion finder activity, exploring whether participants thought global data standards are a net-positive or negative, we moved to discuss:

  • The potential for lightweight standards – building on learning from the index-card sized HXL standard;
  • The importance and challenges of creating common identifiers;
  • The need for joined up data around the world – particularly to support global development efforts;
  • Processes for involving users in shaping standards – including the approach taken by Open Contracting of balancing a focus on data supply and demand through creating user stories and feeding in clear user requirements to the standard development;

Attention then turned to creating a series of user stories, starting from identified needs on the ground for information, and then working through to explore what implications these information needs had for data standards.

These use cases looked at needs ranging from supporting the Ebola response in West Africa, to fact-checking in Argentina, building engineering standards in the Caribbean, and managing information on extractives for the Canadian government. Across the use cases a number of themes emerged, from the importance of thinking about how location is represented (not all locations are latitude and longitude), to the need to think about the levels of data quality, granularity and timeliness that different groups need. Full notes from these different use cases will be shared soon.

Some of the take away suggestions shared at the end of the session included:

  • The need to develop proof-of-concepts that connect up different existing standards;
  • Identifying existing standards or specifications that should be more commonly adopted;
  • Documenting and sharing more user stories for standards – and making sure we have use cases at the national level;
  • The idea of dedicated institutional support for open data standards development;
  • Making sure we invest in public good infrastructures to make these standards successful;
  • Standardize user outreach as part of creating data standards;
  • Find lightweight versions of more complicated standards for onboarding users with limited resources;
  • Get out of the office and see how data is really being used;
  • Review use case scenarios for a standard every 6–12 months and evaluate progress, changes, challenges, etc.;
  • Incorporate all levels of users… even those without computers.

Tomorrow’s action sessions on Connecting Data Standards will hopefully dig into a number of these issues in more depth.

May 27, 2015 IODC

A guest post from UNOCHA on the Humanitarian Exchange Language data standard.

UNOCHA is excited to announce the 1.0 beta release of the Humanitarian Exchange Language (HXL) standard, developed with the help of representatives from British Red Cross, the UK Department for International Development, the Humanitarian Innovation Fund, the International Organization for Migration, Save the Children, the Standby Task Force, the UN Refugee Agency, UNICEF, USAID, Ushahidi, the World Food Programme, and the World Bank.

How it works

Inspired by social-media hashtags, HXL helps humanitarian organisations add value to the data they’re already creating by improving automation and interoperability.  The principle of the standard is simple: we ask organisations to add a row of HXL hashtags to their spreadsheets, between the headers and the data, like this:
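A minimal sketch of such a sheet (the organisations and figures below are invented; the hashtags are drawn from the HXL core set):

```csv
Organisation,Cluster,Province,Affected
#org,#sector,#adm1,#affected
Red Cross,Health,Coast Province,8200
UNICEF,Education,Eastern Province,11500
```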


To support use in the field, the core standard is small enough to fit on a 4×6 inch (10×15 cm) postcard, copies of which we will be handing out at the Third International Open Data Conference in Ottawa, Canada.


Cooperation, not competition

Unlike most data standards, HXL is cooperative rather than competitive. A competitive standard starts by dictating to data providers how they should work:

  • Switch to a different data format (and acquire and learn new software tools).
  • Change the information you share (and the way the organisation collects and uses that information).
  • Abandon what is valuable and unique about your organisation’s data (and conform to the common denominator).

For HXL, we reversed the process and started by asking humanitarian organisations how they currently work, then instead of trying to clean up the (so-called) mess, we worked out how we could build a cooperative standard to embrace and enhance that work:

  • Users told us that they use spreadsheets for data sharing, so HXL works with tabular data (e.g., Excel files, Google Sheets, CSV API output).
  • Users told us that every crisis and activity has different data requirements, so HXL offers a selection of hashtags to mix and match to suit different reporting needs (e.g., activity reporting, population data, needs assessments, facility locations).
  • Users told us that organisations often collect types of information that no one else has, so HXL allows providers to leave some columns untagged, or to invent new hashtags for additional types of information.
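As an illustration of how a processor might consume such a tagged sheet, here is a sketch using only the standard library; this is not the API of the official HXL tools, and it simply treats the first row whose cells all begin with “#” as the hashtag row:

```python
import csv
import io

# A small HXL-tagged sheet: human-readable headers, then a row of
# hashtags, then the data itself (contents are invented).
SHEET = """\
Organisation,Cluster,Province,Affected
#org,#sector,#adm1,#affected
Red Cross,Health,Coast,8200
UNICEF,Education,Eastern,11500
"""

def parse_hxl(text):
    """Return the data rows as dicts keyed by HXL hashtag."""
    rows = list(csv.reader(io.StringIO(text)))
    for i, row in enumerate(rows):
        # The hashtag row is the first row whose non-empty cells all start with '#'.
        if row and all(cell.startswith("#") for cell in row if cell):
            return [dict(zip(row, r)) for r in rows[i + 1:]]
    raise ValueError("no HXL hashtag row found")

records = parse_hxl(SHEET)
health = [r for r in records if r["#sector"] == "Health"]
print(health[0]["#org"])  # -> Red Cross
```

Because the human-readable header row is left untouched, the same file keeps working for people in Excel while becoming machine-readable for tools.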

Accompanying the standard’s beta release are public-domain Python and JavaScript libraries for HXL processing (Java and PHP coming soon), as well as an early prototype online service for validating, filtering, transforming, and visualising HXL-tagged data on the fly.

Next steps

HXL has already gone through early trials during the West Africa Ebola and Nepal Earthquake crises, and we will be expanding to much more comprehensive implementations over the next few months. We will also be releasing video tutorials and cookbooks of best practices, as those emerge from field trials.

For debate: Can we make all standards as simple, and cooperative, as HXL aims to be? Share your thoughts in the comments below.

May 23, 2015 IODC

A guest post from Tim Davies, exploring how interoperability offers a richer conceptual framework for thinking about the challenges of joining up data. 


  • We want to be able to join up data from different sources, and to assess and increase the quality of information being shared;
  • An interoperability lens offers a better way to think about achieving this than a focus on standards alone;
  • The solutions will involve tools, identifiers, platforms, policies and collaborations as well as technical specs and standards;
  • Our challenge is to work out the right levels of interoperability to aim for, and the right strategies to achieve it

In full

There is a lot of focus on standards in the run-up to the International Open Data Conference in Ottawa next week. Two of the Action Area workshops on Friday are framed in terms of standards – at the level of data publication best practices, and collaboration between the standards projects working on thematic content standards at the global level.

It’s also a conversation of great relevance to local initiatives, with CTIC writing on the increasing tendency of national open data regulations to focus on specific datasets that should be published, and to prescribe data standards to be used. This trend is mirrored in the UK Local Government Transparency Code, accompanied by schema guidance from the Local Government Association, and even where governments are not mandating standards, community efforts have emerged in the US and Australia to develop common schemas for publication of local data – covering topics from budgets to public toilet locations.

But – is all this work on standards heading in the right direction? In his inimitable style, Friedrich Lindenberg has offered a powerful provocation, challenging those working on standards to consider whether the lofty goal of creating common ways of describing the world so that all our tools just seamlessly work together is really a coherent or sensible one to be aiming for.

As Friedrich notes, there are many different meanings of the word ‘standard’, and often multiple versions of the word are in play in our discussions and our actions. Data standards like the General Transit Feed Specification, International Aid Transparency Initiative Schema, or Open Contracting Data Standard are not just technical descriptions of how to publish data: they are also rhetorical and disciplinary interventions, setting out priorities about what should be published, and how it should be represented. The long history of (failed) attempts to find general logical languages to describe the world across different contexts should tell us that data standards are always going to encode all sorts of social and cultural assumptions – and that the complexity of our real-world relationships, and all that we want to know about the different overlapping institutional domains that affect our lives, will never be easily rendered into a single set of schemas.

This is not to say we should not pursue standardisation: standards are an important tool. But I want to suggest that we should embed our talk of standards within a wider discussion about interoperability, and information quality.

An interop approach

I had the chance to take a few minutes out of IODC conference preparations last week to catch up with Urs Gasser, co-author of Interop: The Promise and Perils of Highly Interconnected Systems, and one of the leaders of the ongoing interop research effort. As Urs explained, an interoperability lens provides another way of thinking about the problem standards are working to address.

Where a focus on standards leads us to concentrate on getting all data represented in a common format, and on using technical specifications to pursue policy goals, an interoperability focus allows us to incorporate a wider range of strategies: from allowing translation and brokering layers between different datasets, to tackling policy problems directly in order to secure the collection and disclosure of important information.

And even more importantly, an interop approach allows us to discuss what the right level of interoperability to aim for is in any situation: recognising, for example, that as standards become embedded, and sunk into our information infrastructures, they can shift from being a platform for innovation to a source of inertia and constraints on progress. Getting the interoperability level right in global standards is also important from a power perspective: too much interoperability can constrain the ability of countries and localities to adapt how they express data to meet their own needs.

For example, looked at through a standards lens, the existence of different data schemas for describing the location of public toilets in Sydney, Chennai and London is a problem. From the standards perspective we want everyone to converge on the same schema and to use the same file formats. For that we’re going to need a committee to manage a global standard, and an in-depth process of enrolling people in the standard. And the result will almost undoubtedly be just one more standard out there, rather than one standard to rule them all, as the obligatory XKCD cartoon contends.

But through an interoperability lens, the first question is what level of interoperability we really need, and what the consequences of the level we are striving for would be. It invites us to think about the different users of data, and how interoperability affects them. For example, a common data schema used by all cities might allow a firm providing a loo-location app in Ottawa to use the same technical framework in Chennai – but is this really the ideal outcome? The consequences could be to crowd out local developers who could build something much more culturally contextualised. And there is generally nothing to stop the Ottawa firm from building a translation layer between the schemas used in their app and the data disclosed in other cities – as long as the disclosure of data in each context includes certain key elements, and is internally consistent.
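To make the translation-layer idea concrete, here is a minimal sketch; the two city schemas and all the field names below are entirely invented for illustration:

```python
# Hypothetical toilet-location records as two cities might publish them;
# each city uses its own (invented) field names and conventions.
ottawa_row = {"lat": 45.42, "lng": -75.70, "name": "City Hall WC", "fee": "no"}
chennai_row = {"latitude": 13.08, "longitude": 80.27,
               "title": "Marina Beach toilet", "paid": True}

# One small mapping per source schema translates rows into a common shape,
# instead of forcing every city onto a single global schema.
MAPPINGS = {
    "ottawa": lambda r: {"lat": r["lat"], "lon": r["lng"],
                         "name": r["name"], "free": r["fee"] == "no"},
    "chennai": lambda r: {"lat": r["latitude"], "lon": r["longitude"],
                          "name": r["title"], "free": not r["paid"]},
}

def to_common(source, row):
    """Translate a source-specific row into the app's common schema."""
    return MAPPINGS[source](row)

print(to_common("chennai", chennai_row)["name"])  # -> Marina Beach toilet
```

The app only ever sees the common shape; each city remains free to publish whatever schema suits its own context, so long as the key elements can be recovered.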

Secondly, an interoperability lens encourages us to consider a whole range of strategies: from regulations that call for consistent disclosure of certain information without going as far as prescribing schemas, to programmes to develop common identification infrastructures, to the development and co-funding of tools that bridge between data captured in different countries and contexts, and the fostering of collaborations between organisations to work together on aggregating heterogeneous data.

As conversations develop around how to enable collaboration between groups working on open aid data, public contracts, budgets, extractives and so-on, it is important to keep the full range of tools on the table for how we might enable users to find connections between data, and how the interoperability of different data sources might be secured: from building tools and platforms, working together on identifiers and small building-blocks of common infrastructure, to advocating for specific disclosure policies and, of course, discussing standards.

Information quality

When it comes down to it – for many initiatives, standards and interoperability are only a means to another end. The International Aid Transparency Initiative cares about giving aid-receiving governments a clear picture of the resources available to them. The Open Contracting Partnership wants citizens to have the data they need to be more engaged in contracting, and for corruption in procurement to be identified and stopped. And the architects of public loo data standards don’t want you to get caught short.

Yet often our information quality goals can get lost as we focus on assessing and measuring the compliance of data with schema specs. Interoperability and quality are distinct concepts, although they are closely linked. Having standardised, or at least interoperable, data makes it easier, for example, to build tools that go some of the way towards assessing information quality.


But assessing information quality goes beyond this. Assessments need to take place from the perspective of real use-cases. Whilst often standardisation aims at abstraction, our work on promoting the quality, relevance and utility of data sharing – at both the local and global levels – has to be rooted in very grounded problems and projects. Some of the work Johanna Walker and Mark Frank have started on user-centered methods for open data assessment, and Global Integrity’s bottom-up Follow The Money work starts us down this path, but we’ve much more work to do to make sure our discussions of data quality are substantive as well as technical.

Thinking about assessing information quality as distinct from interoperability can also help us to critically analyse the interoperability ecosystems that are being developed. We can look at whether an interoperability approach is delivering information quality for a suitably diverse range of stakeholders, or whether the costs of getting information to the required quality for use are falling disproportionately on one group rather than another, or are leading to certain use-cases for data being left unrealised.

Re-framing the debate

I’m not calling for us to abandon a focus on standards. Indeed, much of the work I’m committed to in the coming year is very much involved in rolling out data standards. But I do want to invite us to think about framing our work on standards within a broader debate on interoperability and information quality (and ideally to embed this conversation within the even broader context of thinking on Information Justice, and an awareness of critical information infrastructure studies, and work on humanistic approaches to data).

Exactly what shape that debate takes: I don’t know yet… but I’m keen to see where it could take us…


May 21, 2015 IODC

A guest post from Friedrich Lindenberg proposing some common fallacies of data standardization projects.

Developing open data standards is all the rage. IATI, EITI, OCDS, GTFS, XBRL, SDMX, BDP, HDX – if your sector doesn’t have a cryptic-sounding data initiative yet, it probably will soon.

In fact, chances are that you’re drawing one up right now (I am). In that case, here’s a list of things you may believe about your data standard. They are probably not true:

  • Policy and tech people on your team mean the same thing when they say standard. There is a magical and unbreakable bond between conventions for government policy and column naming schemes.
  • Different systems of government will produce data that should be expressed in the same format. Everybody’s mechanism for debating and making laws, or for handing out public contracts, or for managing public funds is basically the same, right?
  • Standards are tools for publishing data. Use cases can be derived from the data structure available in your in-house database. For end users, put ‘researchers, journalists, NGOs’. Never put your own name there, your job is to empower others.
  • Many people will develop tools and platforms to handle your data. You will not be stuck having to pay for your own ecosystem for the next fifteen years. The tools will actually work with data published by different sources.
  • The economics of standardization always work out. This is true even when, to date, nobody has been using the data. They will be able to do that more effectively now. When evaluating gains, think about Nairobi startups, not your own organisation.
  • Your committee is the centre of the known universe. It is your duty to specify what countries exist in the world, what currencies they use and what constitutes a company. DublinCore is the maximum level of possible standards re-use.
  • A standard is the best way of publishing data. Having a centralized API that actually works and has data quality assurance built in would be some sort of tech imperialism. The coordination cost of spreading data all over the web and collecting it upon use is lower.
  • All data analysis problems are global. You need a standard before you can derive knowledge from data. As your data scales, the analytical questions people are working on will begin to apply to vastly different contexts.

Have I missed any?

May 13, 2015 IODC

A guest post from Bill Anderson of Development Initiatives on the challenge ahead to make sure we have the data that matters. 

Decision-makers and those that hold them to account, whether at local, national or global level, require access to usable, meaningful information that throws light on the problems they are seeking to solve. Putting information into context invariably requires joining up data from a variety of sources.

In the past five years open data champions have had many successes persuading both governments and others to open up, and there are now hundreds of public portals containing a huge variety of datasets. Yet most of these repositories are silos: containing similar data, yet in different formats, structures and standards.

It remains incredibly difficult, for instance, to assess government spending in the context of the demographics and impacts of its intended purpose. It remains a huge challenge, often with life-threatening consequences, for humanitarian aid workers to match financial and material resources against needs and conditions in the midst of an emergency. We clutch at jigsaw pieces and very rarely see the finished picture.

One of the biggest blocks to joining up data is a lack of compatible standards. We are faced with different rules for access, different data formats, different data definitions and different quality guidelines. This poses a number of challenges.

  • Technical standards need to streamline and simplify the way in which machines are able to share data – both with humans and other machines;
  • Publishing standards need to ensure that similar concepts are both comparable and interoperable across different schema;
  • Global data standards defining such things as geospatial and functional entities need to describe the world in a common, or at least translatable, language;
  • Data quality standards need to ensure that data is not only logically valid, but that it is both accurate and meaningful.

There is, too, a bigger standards picture: often the data presented to us is not what we think it is. Here’s an example.

The number of women who are still dying during childbirth is an obscenity. Reduction of maternal mortality was a key target of the Millennium Development Goals and remains one in the post-2015 SDGs. How do we know how many women are dying? In 65 countries they are counted through the national registry of deaths. In the remaining 116 countries maternal mortality is calculated through an algorithm using GDP, a (calculated) fertility rate and an (estimated) statistic called “Skilled attendant at birth”. In other words we have absolutely no idea how many women die in childbirth. This is why the call for a data revolution for sustainable development is gathering steam, with African countries at the forefront of the struggle.

We have serious work to do in Ottawa. Data users, infomediaries, decision-makers, watchdogs and others frustrated by a lack of meaningful joined up data need to articulate their demands and put pressure on all standards bodies to get their acts together.

What are your priorities for action on open data standards? Share your thoughts in the comment thread below…

May 13, 2015 IODC

A guest post from CTIC exploring work towards developing common open data standards. CTIC has been involved in many of the most important open data projects in Spain over recent years, from the implementation of the national catalogue, and multiple initiatives at local levels, to working as a partner in the Share-PSI 2.0 project, which brings together 40 partners from 25 countries with the aim of harmonising open data in Europe and identifying good practices.

In the early days of the open data movement, the call was for “raw data now”, but more recently we’ve been learning to be more strategic. To paraphrase the campaign led by Tim Berners-Lee on creating a ‘Web we Want’, it’s time to be talking about the “Open Data we Want” and how it should be published using common standards.

The growth of open data

Since the “Big Bang” that occurred with the launch of Data.gov in 2009, we have seen an expanding Open Data universe, with an almost exponential growth in terms of Open Data initiatives and portals throughout the world. In Europe, we have seen a strong push from the European Commission through the development of the legal frameworks for open data and the launch of projects for the development and dissemination of Open Data, such as:

At the Spanish level, this process began in 2010 with the publication of catalogues in the Basque Country and in the city of Zaragoza, followed in 2011 by the publication of the National Catalogue, datos.gob.es. Since then the process has grown to include more than 110 open data initiatives at all levels of government: national, regional, and local (although some initiatives have fallen by the wayside).

Besides the momentum that the government of Spain has provided for open data, a key driver of the success of the national catalogue has been the process of federating catalogues. Through this, data from more than 75 catalogues in Spain are brought together, representing about 60% of the entire national catalogue. This is enabled by the catalogue description standard DCAT, developed by the W3C.
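As a rough sketch of what such federation builds on, a DCAT catalogue description in Turtle might look like the following; the URLs and titles here are invented for illustration:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# A catalogue that lists its datasets, so a national portal can harvest it.
<https://example.org/catalogo> a dcat:Catalog ;
    dct:title "Catálogo de ejemplo"@es ;
    dcat:dataset <https://example.org/catalogo/presupuestos-2015> .

<https://example.org/catalogo/presupuestos-2015> a dcat:Dataset ;
    dct:title "Presupuestos municipales 2015"@es ;
    dct:publisher <https://example.org/org/ayuntamiento-ejemplo> .
```

Because every federated catalogue describes its datasets with the same vocabulary, the national portal can harvest and merge them mechanically.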

This progression has been measured in the latest Estudio de caracterización del Sector infomediario en España (2014), a report taking into account, among other factors, the economic and employment impact of open data.

Considering all of this, it would seem easy to say that this is a resounding success of the Open Data initiative. However, if we remember the expectations raised initially, it is clear that we have made progress but we have not reached the full promised value generation of open data.

Standardizing key datasets

One of the main hindrances making the reuse of data more difficult is the lack of standardization. The same sets of data provided by the city of Madrid should be easily discoverable in Barcelona, Paris, Amsterdam or Ottawa. The standardization of this data would enable an application created by an entrepreneur in Spain to potentially be used anywhere in the world.

The process of standardization requires a continuous, ongoing dialogue between those who produce data and those who use it. Practical standardization should focus on the most popular datasets, in order to deliver the maximum benefit.

In Spain, this process began in early 2015 with the publication of the Spanish standard UNE 178 301:2015, created by the standardization group Smartcities. It defines a set of indicators divided into 5 axes (political, organizational, technical, legal and economic), as well as a measurement metric that can assess the level of open data initiatives in cities. But, even more interestingly, it defines 10 datasets, along with their corresponding vocabularies, that governments should publish, and in the process this recommendation paves the way towards standardization, at least for cities.

The ten datasets and their schemas are listed here:

Over time, this process must move towards a greater number of datasets and recommended vocabularies and must provide an extension to other levels of the government in order to improve coordination and standardization of public catalogues, which would mean a real improvement in the ratios of reuse and the economic value produced.

How can we encourage more global collaboration around this standard setting? Or is standardisation always a national task? 

May 5, 2015 IODC

In this guest post, Tom Lee of Mapbox argues that simplicity is vital for successful data standards, even at the cost of expressiveness, and that standards can help overcome barriers to government coordination:


Finding your way through the world is a basic need, so it makes sense that satellite navigation systems like GPS and Galileo are among open data’s most-cited success stories. But as wonderful as those systems are, they’re often more useful to robots than people. Humans usually navigate by addresses, not coordinates. That means that address data is an essential part of any complete mapping system.

Unfortunately, address data has historically been difficult to obtain. At best, it was sold for large amounts of money by a small set of ever-more consolidated vendors. These were often the product of public-private partnerships set up decades ago, under which governments granted exclusive franchises before the digital era unveiled the data’s full importance. In some cases, these agreements mean that the data simply isn’t available at any price.

The situation is finally improving, in large part thanks to the open data movement. Scores of governments are beginning to recognize that address data is an important component of open data policy. But the global scale of the internet means that developers need a consolidated, standardized dataset, and governments don't have the incentives or resources necessary for that kind of coordination.

This is why we’re helping build OpenAddresses.io. With over 115 million points, it’s the world’s largest set of openly-licensed address data. This collaboration is made possible by the data offerings of forward-thinking governments, classic open-source community norms, and the time and talents of its volunteer researchers and developers.

But there is another secret to the project’s success: its approach to standards.

OpenAddresses is not, after all, the first project to attempt to organize address data. But it is the most successful. In part, this is because it embraces simple, widely-understood standards, both technically and legally. The importance of this approach cannot be overstated.

In the US, the Federal Geographic Data Committee has developed what is, by all accounts, a comprehensive and well-designed XML specification for address data. It was five years in the making, and reflects the wisdom of a dozen agencies and countless expert contributors. You can review it for yourself here. It’s 555 pages long, and, to my knowledge, not fully supported by any open source software at all.

By contrast, OpenAddresses assembles data into CSV files containing Longitude, Latitude, Street Number, Street Name and Postal Code. We don't support multiple projections, or even address unit numbers! Our chosen standard is woefully underpowered relative to the capabilities of the FGDC design. But it can be read by every programming language, derived from a huge variety of source data formats, used with any modern web mapping library without worry over coordinate systems, and understood by developers at a glance rather than after a week of research.
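To make that "at a glance" claim concrete, here is a minimal sketch of consuming such a file with nothing but standard-library Python. The column names and sample row are illustrative assumptions based on the schema described above, not copied from the project's actual specification:

```python
import csv
import io

# A tiny in-memory sample in the flat schema described above:
# longitude, latitude, street number, street name, postal code.
sample = io.StringIO(
    "LON,LAT,NUMBER,STREET,POSTCODE\n"
    "-77.0365,38.8977,1600,Pennsylvania Ave NW,20500\n"
)

# csv.DictReader maps each row to the header names -- no schema
# documentation, XML parser, or projection library required.
rows = list(csv.DictReader(sample))
for row in rows:
    # Coordinates parse directly as plain floats.
    lon, lat = float(row["LON"]), float(row["LAT"])
    print(lon, lat, row["NUMBER"], row["STREET"], row["POSTCODE"])
```

That the whole consumer fits in a dozen lines is exactly the trade-off the author is describing: the format gives up expressiveness so that every toolchain can read it immediately.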

The situation is similar when it comes to how the project's data is licensed. OpenAddresses.io only collects and republishes sources that are offered under terms similar to or less restrictive than Creative Commons By-Attribution (CC-BY). This is in contrast to the restrictive Open Database License (ODbL) used by OpenStreetMap (OSM).

OpenStreetMap is, frankly, a better place to put address data. Its enormous community of volunteers can not only organize address data but collect and maintain it, and the project's technical toolchain and collective expertise are vastly larger than those of the relatively small OpenAddresses team. But OpenAddresses now contains more than twice the address points of OpenStreetMap, and its data is being used in a much wider variety of geocoders. OpenStreetMap's viral ODbL license poses gnarly legal questions, particularly when it comes to geocoding, whereas OpenAddresses' attribution-only terms are simple and straightforward. The ODbL is also much less widely adopted than CC-BY, and has consequently been subject to much less jurisprudence and expert review.

Simplicity has made it much easier for OpenAddresses to become a vibrant and useful resource. Some will contend that a focus on early simplicity amounts to mortgaging a project's future. And it is true that bolting more advanced capabilities onto a simple format can be a challenge (I'm sure that, for example, more than a few GTFS developers familiar with CSVs were thrown for a loop when real-time capabilities arrived in the form of protocol buffers). But one does not need to look very far on the internet to find powerful, nuanced and barely-used standards, from XBRL to Akoma Ntoso to SOAP to the vast mausoleum of Semantic Web technology proposals.

Standards are about shared understanding. In this realm, at least, simplicity will always trump power. That principle is helping us solve the world’s open data address problem. It can help solve other problems, too.

Do you agree? Should simple CSV standards be the focus of the open data movement? Or should more comprehensive standards be the goal? 
