
December 27, 2016 Marc Garriga

Marc Garriga (@mgarrigap) graduated in Computer Engineering from UPC (Polytechnic University of Catalonia) and in Market Research and Techniques from UOC (Open University of Catalonia). An expert on open data and, more broadly, on open government, he has taken part in numerous initiatives in both fields. In mid-2012 he set up desideDatum Data Company, a firm offering services related to data management and data opening, such as advising the Generalitat de Catalunya on the creation of the Catalonia Transparency Portal (which will host more than a thousand public entities), among many other projects related to open data, open government and Smart Cities.

desideDatum Data Company is also one of the main partners of Socrata, the world's leading open data service provider.

He is a pro-public sector, pro-transparency, pro-open government and pro-Network Society activist; among other roles, he is a founding member of the Spanish chapter of the Open Knowledge Foundation Network (OKFN-Spain), of Xarxa d’Innovació Pública (XiP) and of the group Catalunya Dades, and he is a node at TheProject (THP).

For Europe's open data and open government experts, it has been an honor for the Old Continent to host, within just two months, the two main international events in the field: the IODC and the OGP summit.

At the beginning of October, Spain hosted for the first time (and for the first time in Europe) the foremost international open data event, the IODC: two very intense days, plus numerous associated pre-events throughout the rest of the week. Then, at the beginning of December, the Open Government Partnership international summit took place in Paris, France.
That means that within two months Europe hosted the main events on opening up the public sector; once again, Europe -without prejudice to the rest of the world- is where openness and innovation in public administration lead the way. Both congresses convened the main international experts in these fields (open data at the IODC and open government at the OGP) and, given the tight relationship between the two concepts, many of us naturally attended both events.

As usual, both conferences ended with their corresponding conclusions. One stands out above the rest and, even though the IODC has a more technical approach and the OGP a more political one, it was common to both events (as it should be): the need to demonstrate the IMPACT (in capital letters) that open data and open government generate in our society.

In other words, experts have long explained that the new way of governing must involve opening our public administrations to society and its actors (citizens, the private sector, organizations in general, etc.). These actors must abandon their passive role and become fully active in public activity, from its design to its delivery.

The opening of our public administrations is what was debated in Paris, and the foundation on which that opening rests -open data- is what was discussed in Madrid.

However, although the number of administrations opening up to the public keeps growing -in Spain, in Europe and elsewhere on the planet- the truth is that the impact presumably derived from open government data has not yet reached citizens.

No one disputes that having open governments is beneficial for society, but to what extent? How can we measure that impact? What good examples -I mean groundbreaking examples- do we have that clearly show that openness is positive for our society?

Every day more people, more governments and more countries favor the opening of their public institutions (remember that three years ago France was not even part of the OGP and now, three years later, it has hosted the partnership's most important summit). There are therefore fewer and fewer doubts that opening up is the way forward, and the reasons are many, starting with what I would call the ethical one: everything public should be within any citizen's reach (with limited exceptions).

However, if we do not show a REAL impact of openness, we may soon find ourselves inside an Openness Bubble. In other words, we are advocating for opening up without being able to clearly show why it really is the key to the necessary paradigm shift in the public sector. We urgently need to show that policies favoring the opening of public institutions are beneficial and, moreover, to quantify by how much, according to the indicators we believe most appropriate (this was another conclusion common to both congresses: the need for metrics that help us see quantitatively how our open data initiatives develop).

Only then can we keep organizing congresses (in Europe or anywhere else) in favor of open data and, most importantly, only then can we convince the most sceptical that openness is the path to follow to achieve the best management of our public institutions and the best governance for us all as citizens.



December 21, 2016 Pablo Rodríguez Bustamante

Pablo Rodríguez Bustamante, geographer, is owner-consultant at GEOCyL. He holds an Advanced Studies Diploma in Geography, Urbanism and Land Management (recognized researcher status) and is a PhD candidate and Associate Professor at the University of Valladolid (Spain). He is the promoter of the “Mi Ciudad Inteligente” (My Smart City) project, collaborates with several websites and blogs writing articles and posts about smart cities, smart mobility and geomarketing, and has received several awards for entrepreneurship.

Data have to be accessible, understandable and usable, and they must create dialogue and impact to make this planet a safer place for us and the next generations. With these tools, we can help people live more comfortable lives while helping the planet, moving towards a state of general welfare.

“For example, regarding climate data, it is important to use information to make the best possible decision and analyze the available resources, with the objective of maximizing or replicating in order to verify. The result of these activities is saving lives,” stated Sergio Estela (Spain, founder and chief design officer at Vizzuality).

This information helps people make better decisions. Data are used so that people contribute, create, compare, transmit and respond to their own stories and projects.

“Open data are actually at the core of each dataset; they are available to download for free, and it is an open data portal because we believe information and data to be truly powerful in the hands of people.” Carolyn Tirelly (World Resources Institute, the Global Forest Watch Team).

To organize all these data and this kind of information, a GIS becomes necessary to retrieve information and other datasets; and, as mentioned above, this information is very powerful when people have it. For example, Global Forest Watch Climate is an application that goes beyond monitoring deforestation, as it also monitors carbon emissions from deforestation and allows visualizing all this information on a GIS.

There are two essential aspects in open data and the environment: first, trying to get the government to open as many geospatial data as possible, and second, analyzing data and developing digital tools so that people can use this information.

We need official data, scientific data and local knowledge to implement a comprehensive open data portal and satisfy people's needs. In order to produce a comprehensive cartographic and statistical report about the configuration of the land, one must start from open data from the legal instruments that govern land and natural resource use. The keyword is implementing: implementing official information about land use through analysis and mapping tools and, of course, open data. It is important to build historical databases about the land because, if we look back even just 20 years, we may not find this information on all kinds of matters: minerals, gas, fishing resources, environmental impact assessments, toxic and polluting emissions, GMOs, etc. It is not enough for a government agency to publish some official open dataset; governments and businesses also need to share their data. “Each methodology has its own limitations and related problems, but it must try to get to a better place,” as stated by Manuel Manu (Mexico, NGO director, GIS specialist).

To sum up, the environment is a sector that is also undergoing, almost unwittingly, a technological transformation based on digitalization and the world of data. For example, according to the European Commission, the collection and publication of environmental data has been one of the priority areas for open data since 2014.

So, exactly which data can we find published, and what can we do with them? Some examples:

– Information about the amount of polluting emissions, making it possible to keep them below the maximum allowed levels, above which they would be harmful to our health.

– Information on air quality, which allows measures and restrictions on pollutants to be put in place (for example, in Madrid the maximum speed for vehicles is limited in order to reduce emissions).

– Information on pests, so as to control and prevent them.

– Information on pollen levels, very important for allergic people, as this is how they know about the impact level of these agents and when to take medication.

– Current meteorological information, provided in advance so that statistical analyses can be carried out and event probabilities obtained.

– Other data and public services such as noise pollution, sea conditions, water quality and management, waste management, protected areas and species, etc.

– Also, knowledge of these data contributes to environmental education, as it reveals the relationship between man and nature and makes this knowledge available to everyone. All of this is geared towards improving sustainability.

Finally, we currently have a great amount of data: numbers that, in many cases, remain just that, since we still have to advance in their analysis, not individually but by interrelating and analyzing them together. For that purpose, the standardization of information is paramount.

 

 

Featured cover photo by Kalen Emsley



December 15, 2016 Greg von Nessi

Mathematician, physicist, and data analyst with a love of mathematics, algorithms, machine learning and problem solving.

Holding a Ph.D. in Mathematics, Greg has over 15 years' experience in using mathematical modelling, scientific programming, empirical methods and statistical data analysis to solve interesting problems in both the academic research and private sectors.

For the last 2 years, Greg has been working in the private sector as a Data Scientist, using his experience to tackle data problems in finance, inventory management and business decision making.

As the Lead Data Scientist for Link Digital, Greg combines his experience with multiple programming languages, big scientific data and private sector data, adding immense value to the team's CKAN, Open Data and data management services.

WHAT DO PEOPLE WANT FROM DATA?

It may be obvious but people are not businesses. With all the buzz and talk surrounding ‘Big Data’ making just about everything better, it’s easy to forget that ‘Big Data’ is really a collection of concepts that benefit businesses much more than the individual; and that’s not necessarily a bad thing. Yes, the person in front of a screen may get some more relevant search results or advertising pop-ups, but ultimately, it’s the market edge a company gets when using ‘Big Data’ that has led to its boom. Again, not a bad thing.

“…for most people, data translates into getting something done or learning something.”

But what about the people? While everyone uses data all the time to make decisions, learn, formulate views, etc., no single individual has the ability to statistically process the amount of data that a company like Google or Amazon does on a continuous basis. If you’re looking to buy a home and want to figure out what the fire risk to the property is, chances are you are not going to be too happy if your real estate agent just plops a bunch of actuarial tables in front of you. However, if you are an insurance company, those tables are exactly what you want for determining fire risk. People need something to pre-process the raw data before they can make sense of it.

So, if people need an intermediary to make sense of raw data, what are they? We categorise intermediaries as either being an application or piece of data journalism. Here are some examples:

Applications:

• Search Engine Interfaces

• Trip Planners

• Online Maps

Data Journalism:

• Sustainable Energy Without the Hot Air (excellent book by D. J. C. MacKay)

• XKCD (funny webcomic with some fantastic data visualisations)

• Political polls/trends

Obviously, the line between data journalism and applications can be pretty blurry; but we generally classify an application as something that empowers an individual to accomplish some task, whereas a piece of data journalism facilitates some form of learning or development of a personal view or opinion. So, for most people, data translates into getting something done or learning something. Of course, ‘getting something done’ or ‘learning something’ may be just a part of a larger, individual goal; but we think those two broad motivations are particularly effective in distilling what people overwhelmingly want to get out of data.

DECISIONS, DECISIONS

Most people have no problem articulating what their preferred news sources, writers, phone/computer apps, etc. are. While humans are not so crash hot on processing raw data, we seem quite comfortable making judgment calls on entities that process the data for us. Do you prefer to watch CNN, Fox News, BBC or Al Jazeera? Each one of those news outlets, by and large, has access to the same news sources; but they all present the data from those sources in very different ways to their viewership. Despite this, humans have no problem picking one or more as their preferred source. In some very real sense, this sort of decision making is something we have evolved to be good at. However, with the advent of the internet, the target of this data filtration is no longer restricted to faceless markets/demographics; it can be tailored right down to the individual. For instance, modern internet search engines will normally try to factor in a user’s previous search history when processing a list of search results. So, while one person searching for the term ‘fusion’ may get a list of links about nuclear physics, another may get a list of cooking sites. As social creatures, reliant upon communities to survive, we have always had to make decisions on who can do what the best (i.e. no one person is the best at everything). This ability to make judgments of authority and delegation is something our brains excel at by design.

NOT JUST A SOFTWARE ENGINEERING PROBLEM

OPEN DATA PORTALS (ODPS) AS A FOUNDATION OF TRUST

Coming back to applications and data journalism, a picture starts to emerge.

[Diagram 1]

So, to bring data to the people there needs to be a facilitator between data and the applications/journalists that need to use it. With that facilitator in place, apps will be created, data journalism pieces written, and people will then be able to start using those pieces of work. However, for this data chain to be effective and efficient, certain key requirements need to be fulfilled: data needs to be accurate and easily discoverable, while the data processing entities need to be known and trusted. Trust is the only one of these requirements that depends on the others. To put it another way, a trusted processor of information has to be known to individuals, while being able to access as much accurate data, relevant to its context, as possible. Obviously, these are not the only requirements that build trust, as it is still possible to misuse good data. In this context, one can look at application and journalism entities as transforming the individual's task from processing data into making a judgment call based on how much they trust the processor.

[Diagram 2]

“With a strong open-source community, the pathways for extending awareness, generating effective advertising, etc. for data processing applications become a lot clearer and are more easily distributed than in the case of a piece of proprietary software whose developers are very isolated from users.”

WHY OPEN SOURCE?

The image above defines two engineering domains, which overlap where they meet at the app/journalism layer of the graph. As one moves from the datastore to the user, the blend of engineering tasks gradually shifts from software to social. At the data layer, there are obviously many technical tasks that need to be addressed with hardware and software implementations. Similarly, at the user layer, there are obvious social tasks to engage with, surrounding advertising, awareness, community building, etc. In this view, open software solutions start making a lot of sense, as a successful open source data portal will be one that not only produces a strong software solution but also fosters a strong developer and user community around that software. Moreover, if we are considering a data chain that is sensitive to the individual and not just a market context, an open source solution helps protect the interests of minority users within a processing application's target audience. In particular, an open source project can develop a diverse set of features which may not otherwise fall in line with a proprietary solution's need to maximise the profits of its software.

With a strong open-source community, the pathways for extending awareness, generating effective advertising, etc. for data processing applications become a lot clearer and are more easily distributed than in the case of a piece of proprietary software whose developers are very isolated from users. Thus, the gradual shift from software to social engineering can be deeply merged into an open-source development workflow. An excellent reflection of this point is how open source projects provide a cornerstone for many civic hacktivism efforts, including public hackathons. The explosion in open source project awareness and proliferation has led to a boom in these groups and efforts. Indeed, by building on existing open source code, developers involved in civic hacktivism do not have to re-invent the wheel every time they engage in a project and are thus able to create genuinely useful pieces of software in a very short amount of time. While civic hacktivism is a growing social trend, it requires heavy lifting in both the social and software engineering domains, and its continued growth will be greatly aided by the expansion of open data publishers and greater engagement and appreciation from end users.

Beyond the above arguments, effective integration with a distributed, open-source community helps manage the risk of getting locked into proprietary solutions or a single vendor. It also becomes much clearer which features are well supported and established and which are not; again, this facilitates better risk management on the part of the ODP when deciding which features/extensions get deployed. In general, the open source model is naturally resistant to the construction of information/business silos within a project, which are generators of risk.

One potential issue with open source is that it can take a long time for features to reach a mature, well-supported state. There may thus be situations where an ODP needs to implement a feature on a faster timescale than the community is moving at. This can lead to frustration and some poorly thought out implementations. However, this risk can and should be mitigated by feeding back strongly into the core open source implementation via pull requests, posted issues and core developer involvement. This feedback ensures that features get implemented correctly in the codebase or, at the very least, alerts the core developers to potential needs for architectural restructuring. Again, the development of an ODP is both a software and a social challenge; thus, a good ODP will always feed back strongly into its core open source community.

MACHINES ARE USERS TOO

Stepping back into a higher level view of ODPs, it is clear that a good ODP is needed to empower the apps and journalistic works that ultimately bring data to the people. However, this also implies that the primary users of ODPs are machines, not people. This perhaps unintuitive conclusion is of key importance when it comes to the technical design of an ODP, as high-value features for machines are generally quite different from those that are of high value to individual users. Of course, ODPs need to be usable by people as well as machines, as humans are still required to make many of the decisions surrounding data curatorship; and if nothing else, individuals should always have the ability to investigate data directly for themselves. However, the vast majority of ODP usage will be via machines and not via a direct human interface.

Ultimately, this combined with the above leads us to the following set of desired characteristics for an ODP:

  • The ODP should be open source.
  • Built primarily for app/journalistic intermediaries that, in turn, help people with tasks or learning.
  • Empower intermediaries to deliver data to individuals that best fit their personal context.
  • Empower intermediaries to build trust with their users by ensuring provided data is easily discoverable and reliably accessible.

NITTY-GRITTY TECHNICALS OF AN OPEN DATA PORTAL

WHAT EVERYONE EXPECTS OF AN OPEN DATA PORTAL

We have so far painted a picture of how the ODP fits into the wider chain that supplies data to people and that an effective ODP has to be one that empowers intermediaries in their efforts to garner trust amongst potential users. Now, we zoom in on some of the technical requirements an ODP needs to satisfy to be an effective part of this data delivery chain.

Before moving into specifics, we make some assumptions about features an ODP is to provide; a standard base level of expected functionality:

  • maintains a catalog of metadata
  • both custodian users and machines are able to manage the metadata catalog
  • options for custodian users and machines to upload datasets to at least one persistent, highly available datastore
  • reliable data downloading from the ODP’s datastore(s)
  • options for custodians to link in external data
  • ability to extend the ODP’s core functionality without breaking the core software upgrade path
  • open sourced (see the previous section)

While most people will agree that at least most of the above points should be core features of any software powering an ODP, there are still some technical points that are not so clear cut.

DATA AND HISTORICAL PATH ENTROPY

“The one big difference between code and data is that code is almost always manipulated by a programmer typing into a console, i.e. transforming the uncompiled code. Data, on the other hand, can be transformed by any of a countless number of codified mathematical operations.”

Much of what we think an ODP should and should not provide is motivated by a concept we call ‘Historical Path Entropy’. The quotes are probably unnecessary, as we believe the term to be fairly accurate from an information theory point of view.

One can view any dataset as the endpoint of a timeline encompassing the sequence of create, read, update and delete (CRUD) events that the dataset has gone through to arrive at its current state. The historical path entropy of a dataset corresponds to the number of possible timelines that could have led to its current state. Historical path entropy is minimized if a historical log of the timeline is available (i.e. there is only one path) and increases whenever data is duplicated (i.e. a snapshot branches off another timeline) or transformed without being noted in the historical log.

A concrete example of just how important this is can be found in software engineering, which has been hugely influenced by tools like CVS, SVN and Git, which essentially manage a historical log of code development. These tools enable developers to know exactly how a piece of code got to its current state and have facilitated a revolution in how code is developed and documented. We are basically saying that data should be created and maintained in an almost identical fashion.

The one big difference between code and data is that code is almost always manipulated by a programmer typing into a console, i.e. transforming the uncompiled code. Data, on the other hand, can be transformed by any of a countless number of codified mathematical operations. So, instead of associating a user with a particular code change, an entity and a unique identifier of the transformation operation itself should be associated with the data manipulation record.
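
As a rough sketch of what such a manipulation record might look like (the field names below are our own illustration, not a schema from any particular ODP), a dataset's historical log can be kept as an append-only list of entries, each tying a state change to the entity that performed it and to a unique identifier of the operation applied:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class LogEntry:
    """One CRUD event in a dataset's timeline."""
    timestamp: str        # when the change happened (ISO 8601, UTC)
    entity: str           # who or what performed it (user, service, harvester)
    operation: str        # "create" | "update" | "delete" | "transform"
    operation_id: str     # unique identifier of the transformation code/recipe
    parent_checksum: str  # checksum of the dataset state the operation acted on
    result_checksum: str  # checksum of the dataset state it produced

@dataclass
class DatasetMetadata:
    dataset_id: str
    history: List[LogEntry] = field(default_factory=list)

    def record(self, **event) -> None:
        """Append a new event; the log is only ever extended, never rewritten."""
        self.history.append(
            LogEntry(timestamp=datetime.now(timezone.utc).isoformat(), **event)
        )

# With a complete log there is exactly one timeline that explains the current
# state; every copy or transformation made *without* such an entry multiplies
# the number of possible timelines, i.e. raises the historical path entropy.
meta = DatasetMetadata(dataset_id="air-quality-2016")   # hypothetical dataset
meta.record(entity="city-etl-service", operation="create",
            operation_id="ingest-v1.3", parent_checksum="-",
            result_checksum="sha256:0e67c...")
```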

Any imperfection (or the outright absence) of such an authoritative dataset log effectively leads to a higher historical path entropy, which means greater uncertainty about how the data arrived at its current state. At the risk of stating the obvious, this is extremely important for both businesses and journalists using the data: such historical uncertainty can conceal biases and data processing errors that could easily have a business or journalistic impact.

APIS AND POST-PROCESSING

In general, an ODP will constantly grow in terms of the number of datasets it provides. Moreover, these datasets will often vary widely in size and complexity. This leads to an intrinsic problem with ODPs providing API endpoints or any post-processing facility for their datasets, as it becomes impractical to give any concrete service level agreements (SLAs) for the functionality of those endpoints.

Most data-oriented business models will be built around SLAs of the services they provide. Obviously, if a business decides to leverage an ODP’s API endpoint for their product, the associated SLAs cannot be better than the SLAs of the ODP API endpoint. So, we come to a problem with ODP APIs being unable to provide assurances for API functionality.

Of course, an ODP with sufficient funding and staff could provide SLAs for its datasets; but this would still imply either 1) the cost of running the site scales with its size, or 2) the data provided by the ODP is constrained to a size manageable with the current resources. The first option is almost always unacceptable for obvious reasons: unbounded budgets don't exist. The second option forces many ODPs to go against some core open data philosophies: it may require deleting datasets or holding data back from public view. Neither option is a good one.

Compounding this is the fact that un-throttled APIs can lead to larger businesses placing a high, persistent load on the system, or even unintentionally DDoSing the site.

In short, ODPs shouldn’t be advertising API endpoints under their control for business-critical applications. There just is no way to minimize business risk over a long term interval when providing this sort of endpoint.

Instead, businesses should periodically download from the ODP in a batch process and use their local copy to power their application. This protects the business from being directly reliant on real-time data availability from the ODP. It also provides a buffer in case the dataset changes in a way that breaks functionality in their application.
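
A minimal sketch of that batch pattern, assuming a CKAN-style portal API (the portal URL and dataset identifier are placeholders, and a real job would add error handling, scheduling and change detection):

```python
import json
import pathlib
import urllib.request

ODP_URL = "https://data.example.org"      # hypothetical portal
DATASET_ID = "air-quality-2016"           # hypothetical dataset id
CACHE_DIR = pathlib.Path("./odp_cache")

def sync_dataset() -> None:
    """Fetch the dataset's resources once and keep a local copy.

    Run this from a scheduler (cron, Airflow, etc.); the application then
    reads only from CACHE_DIR and never depends on the portal being up.
    """
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    # CKAN-style metadata endpoint (an assumption about the portal's API).
    meta_url = f"{ODP_URL}/api/3/action/package_show?id={DATASET_ID}"
    with urllib.request.urlopen(meta_url) as resp:
        metadata = json.load(resp)
    for resource in metadata["result"]["resources"]:
        target = CACHE_DIR / f"{resource['id']}.{resource.get('format', 'bin').lower()}"
        tmp = target.with_name(target.name + ".part")
        with urllib.request.urlopen(resource["url"]) as resp, open(tmp, "wb") as out:
            out.write(resp.read())
        tmp.replace(target)  # atomic swap: readers never see a half-written file

if __name__ == "__main__":
    sync_dataset()
```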

Of course, it is perfectly fine for an ODP to provide references for external API endpoints, which will presumably have their own SLAs, user agreements, etc., as the risk associated with managing that endpoint will not fall directly on the shoulders of the ODP.

Data post-processing is closely linked to APIs, as one common function of data APIs is to furnish some sort of SQL-like querying interface which can easily communicate data processing instructions. Real-time post-processing of the data faces all the issues listed above for APIs, and then some. In particular, post-processing data leads to higher historical path entropy. Even in the process of converting one data format to another, errors can creep in and numerical precision can be lost or falsely extended. Again, we believe all such conversions need to be included in the historical log of the dataset.

If an ODP finds it necessary to post-process data, this should be run as an asynchronous batch process, with appropriate additions made to the dataset's historical log in its metadata. A good way to handle this is through a micro-service: a running piece of software sandboxed away from the primary ODP software (e.g. running on its own virtualised instance). Micro-services would be managed separately from the core ODP, each with its own defined deployment domain over datasets, SLA, hardware requirements and associated business contracts. For any micro-service, there should be a general mechanism in place to ensure dataset logs are appropriately updated for any datasets it manipulates.
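
Sketched very roughly below, with file-based storage for brevity and all names purely illustrative; a real micro-service would pull jobs from a queue and write to the ODP's metadata store, but the shape is the same: perform one identifiable transformation, then append a matching entry to the dataset's historical log.

```python
import csv
import hashlib
import json
import pathlib

def sha256_of(path: pathlib.Path) -> str:
    return "sha256:" + hashlib.sha256(path.read_bytes()).hexdigest()

def convert_csv_to_json(src: pathlib.Path, dst: pathlib.Path) -> None:
    """The post-processing step itself: one codified, identifiable operation."""
    with open(src, newline="") as f:
        rows = list(csv.DictReader(f))
    dst.write_text(json.dumps(rows, indent=2))

def run_job(src: pathlib.Path, dst: pathlib.Path, log_path: pathlib.Path) -> None:
    """One batch job of the micro-service: transform, then update the dataset's log."""
    parent_checksum = sha256_of(src)
    convert_csv_to_json(src, dst)
    entry = {
        "entity": "csv2json-microservice",  # who performed the change
        "operation": "transform",
        "operation_id": "csv-to-json-v1",   # which codified operation was applied
        "parent_checksum": parent_checksum,
        "result_checksum": sha256_of(dst),
    }
    log = json.loads(log_path.read_text()) if log_path.exists() else []
    log.append(entry)
    log_path.write_text(json.dumps(log, indent=2))
```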

HARVESTING AND SEARCH FEDERATION

A pattern that has emerged in ODPs is harvesting data from other portals (by push or pull). Harvesting from other portals causes several technical issues and is, in general, too difficult to make robust enough for enterprise-level applications. Problems with harvesters typically involve the following: the endpoint having a different metadata schema, the endpoint changing unexpectedly, the endpoint becoming unresponsive, batch-pull harvesting taking excessively long for large harvests, data duplication, data deletions being handled incorrectly, and the harvester not running to completion. Compounding these technical issues is the fact that most harvester implementations increase the dataset's historical path entropy, if not completely obfuscating what the real source of truth for the dataset is.

Another issue we've seen is when most of the data held by an ODP is harvested. The situation is obviously not a good one, as most of the ODP's datastore is then dedicated to duplicated data for which the ODP is not the source of truth. This can be taken to an extreme where the clear majority of an ODP's system resources are dedicated to handling data for which it is not the source of truth.

At the end of the day, we have never seen a compelling use case for harvesting data in an ODP context. Unfortunately, funding for many ODPs is based on the number of datasets they host, so harvesting is used to inflate those numbers.

A much more sensible alternative to harvesting is building search federations: augmenting local dataset search results with those of partnered datasources/portals. This presents search results as comprising both local datasets and non-local datasets whose entries link directly to an external data portal. In practice, we have found this to be much easier to support and maintain. Moreover, the non-local search results can be generated in real time, which ensures they are always up to date; and since data isn't being duplicated, the process adds nothing to the historical path entropy of the dataset.
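
A rough sketch of how such a federated query might be answered, assuming the partner portals expose a CKAN-style search endpoint (the partner URLs and the local search stub are illustrative only):

```python
import concurrent.futures
import json
import urllib.parse
import urllib.request

# Hypothetical partner portals exposing a CKAN-style package_search endpoint.
PARTNERS = {
    "portal-a": "https://data.partner-a.example/api/3/action/package_search",
    "portal-b": "https://data.partner-b.example/api/3/action/package_search",
}

def search_local(query: str) -> list[dict]:
    """Stand-in for the ODP's own metadata index (e.g. a Solr or database query)."""
    return [{"title": f"Local dataset about {query}", "source": "local"}]

def search_partner(name: str, endpoint: str, query: str) -> list[dict]:
    url = endpoint + "?" + urllib.parse.urlencode({"q": query, "rows": 5})
    with urllib.request.urlopen(url, timeout=5) as resp:
        hits = json.load(resp)["result"]["results"]
    # Non-local hits carry metadata plus a link back to the partner portal; the
    # data itself is never copied, so no historical path entropy is added.
    return [{"title": h.get("title", ""), "source": name, "link": h.get("url", endpoint)}
            for h in hits]

def federated_search(query: str) -> list[dict]:
    results = search_local(query)
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_partner, name, ep, query)
                   for name, ep in PARTNERS.items()]
        for fut in concurrent.futures.as_completed(futures):
            try:
                results.extend(fut.result())  # generated in real time, always current
            except Exception:
                pass                          # a slow or down partner degrades gracefully
    return results
```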

WHAT SHOULD AN ODP PROVIDE?

So, after all this, we have a slightly extended wishlist of features for an ODP:

Data Journalism:

• Spec for site meta-data schema

• Data quality criteria and ratings

• Meta-data search capabilities

• Federated with partnered ODPs

• Historical log with links to previous transformations

• Reliable dataset downloads

• Throttled API endpoint

Applications:

• Spec for site meta-data schema

• Data quality criteria and ratings

• Meta-data search capabilities

• Federated with partnered ODPs

• Historical log with links to previous transformations

• Reliable dataset download

Overall, we think an ODP should facilitate the finding of data, creation and maintenance of a historical data log and provide a robust facility for acquiring data for which it is the source of truth.

With available cloud services, it is easy to provide an enterprise-level download endpoint for datasets, even though the number of datasets an ODP holds is expected to grow monotonically.

Data quality measurements are a good metric to present to the user, as they provide context for how well a dataset is curated relative to others in an ODP with possibly many data publishers, which is a unique and useful facility for an aggregating ODP to provide to its users. Even though data quality criteria can be somewhat subjective, statistically such an assessment becomes more useful and refined as the ODP grows. This is one case where the ODP's tendency to constantly grow works in its favor. Indeed, in an ODP with many datasets, such quality ratings become invaluable as a judgment of a dataset's quality relative to the set of other datasets under the same rating scheme.
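
As a toy illustration (the criteria and weights here are our own assumptions, not any standard scheme), one can score each dataset on a few simple checks and then express that score as a percentile of the whole catalogue, a rating that becomes more meaningful the larger the portal grows:

```python
def quality_score(meta: dict) -> float:
    """Very simple quality criteria; a real scheme would be richer."""
    score = 0.0
    if meta.get("description"):
        score += 1                      # documented
    if meta.get("license"):
        score += 1                      # clearly licensed
    if meta.get("history"):
        score += 1                      # carries a historical log
    if meta.get("format", "").lower() in {"csv", "json", "xml"}:
        score += 1                      # open, machine-readable format
    return score

def relative_ratings(catalog: list[dict]) -> dict[str, float]:
    """Rate each dataset as the fraction of catalogue entries it scores at or above."""
    scores = {d["id"]: quality_score(d) for d in catalog}
    all_scores = list(scores.values())
    n = len(all_scores)
    return {ds: sum(s <= my_score for s in all_scores) / n
            for ds, my_score in scores.items()}

# Example: in a two-dataset catalogue, the better-documented one rates higher.
catalog = [
    {"id": "air-quality-2016", "description": "Hourly NO2 readings",
     "license": "CC-BY", "format": "CSV", "history": ["create"]},
    {"id": "legacy-dump", "format": "XLS"},
]
print(relative_ratings(catalog))   # {'air-quality-2016': 1.0, 'legacy-dump': 0.5}
```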

Probably the most important facility furnished by the ODP is the ability to efficiently search through metadata. Indeed, one can argue the whole point of ODPs is to facilitate data discovery. To us, metadata searching should enable one to search via contextual and geospatial criteria. It should also enable one to explore the historical log of datasets, providing external links to parent datasets and/or processing code whenever possible, and provide the ability to find child datasets (in its search federation) of a given dataset.

TL;DR

While an ODP can be easily understood as being key in bringing data to the public, it has to be much more than a website with a bunch of spreadsheet download links. People are generally bad at processing large quantities of numerical data, but they are really good at figuring out which applications and news sources they trust to do this processing for them. Hence an ODP needs to be designed to empower applications and journalists to produce trustworthy distillations of that data.

 

Featured cover photo by Ming-E Yip



December 12, 2016 Natalia Mazotte

Natália Mazotte leads the School of Data in Brazil and is co-director of Genero e Número, a data-driven magazine focused on gender issues.

At the IODC in 2015, capacity building was already seen as a key aspect of moving the open data agenda forward. We became better at understanding that, to be useful, data need not only to be available, but also accessible and comprehensible to all. In this year's capacity building Action session, we identified some points on which we have advanced, but we also raised some questions to help guide the proposals for the next edition of the conference.

There is a need to create learning opportunities with a community-centred approach, which goes beyond creating capacity in specific spaces in governments, the private sector, and organized civil society. How can we address open data literacy as a means of engaging citizens more autonomously in tackling their communities' challenges? How can organizations move beyond building general data literacy skills and start using problem-specific approaches to come up with solutions that tackle real issues?

The work that needs to be done includes capacity building for data collection, which is relevant both for solving hyper-local problems and for working in resource-constrained environments. Communities need to take ownership of their own data to better understand – and advocate for – the solutions to their problems. What resources and examples are available in this field? How do we achieve greater representation and reach people who are difficult to reach, such as those from low-tech communities?

There is increasing collaboration across different regions and organizations, such as School of Data, the ODI, the Open Data Alliance, and more, who have started formalizing methodologies and mapping best practices. But more partnerships and learning exchanges must be encouraged, and not only in the IODC context. Why not create a coalition to collaboratively map the technical capabilities and relevant skills among these actors, as well as the available methodologies for working with open data initiatives? The findings could be shared on an online platform where each organization also updates the rest of the network about its latest achievements in the field of data literacy.

IODC is an event where some of the most active and inspiring people from the open data movement get together to build collective action, so it is an extremely opportune moment for mapping their open data-related problems and their data skills gap.

Why not prepare a survey to be run early across key open data conferences in order to inform capacity building discussions? This year, in the closing session of the IODC, the Data Literacy anchors released a survey to understand the challenges and data skills gaps among the open data community. Unfortunately we haven't yet received enough responses to reach solid conclusions, but a permanent working group could take advantage of existing regional conferences to run surveys in direct contact with participants, resulting in a more robust data source for understanding the open data movement's skills gaps and needs.

For the next edition of the IODC, in Argentina in 2018, we have to revisit these takeaways beforehand and plan actions to pursue effective answers to all the current challenges we have identified in the capacity building area.

 

Featured cover photo by James Pond




For more than two decades the North American countries, Canada, Mexico, and the United States, have acknowledged that only together will we be able to meet the challenges of our future.

With a combined population of close to half a billion people, a combined GDP of 20 trillion dollars, and one of the most far-reaching regional integration processes in history, NAFTA, North America is today one of the most dynamic and competitive regions of the world, focusing on classic issues such as trade, investment, migration, security, and energy.

In addition to these traditional sectors, there are increasingly more opportunities for our countries to partner on innovative projects and programs including the advancement of open data.

In the past few years open data has continued to appear more and more in our international, regional, and bilateral meetings as a new and promising enabler to boost competitiveness and ensure the wellbeing of our region.

At the international level our three countries are promoting open data through the following multilateral platforms:

At the regional level:

  • In the last North American Leaders’ Summit the three countries pledged to support an increased openness agenda to strengthen development.
  • In October 2016, on the margins of the United Nations General Assembly, we recognized the need for “Open, accessible, and timely data [which] are vital to development and humanitarian efforts across the globe”, through the Joint Declaration on Harnessing the Data Revolution for Climate Resilience.

At the bilateral level:

  • The US and Canada continue to work on the development of shared technical platforms and the development of data standards.
  • Canada and Mexico, as Lead Stewards for the International Open Data Charter, are working together to develop a stable set of international open data goals and principles.
  • The US and Mexico are cooperating on the development of national Sustainable Development Goals platforms fueled by open data.

These are just some examples of the presence that open data has in the North American agenda, but there is much more to do.

Open Data must be acknowledged, at the highest political level, as an enabler of our integration agenda for the benefit of our people.

As we move forward, all countries have emphasized the need to work more with all levels of government, academia, and civil society to establish standards and best practices for data sharing and interoperability, as well as to ensure that open data becomes less fragile and instead a stable and sustainable resource.

In times when division and closedness loom over our discourse, only openness can shed light on the benefits of our trilateral partnership, by promoting a data-driven region and doing our best to make smart, evidence-based decisions.

 

Featured cover photo by Delfina de la Rua.



November 15, 2016 Irena Cerović

Irena Cerović is the Portfolio Manager in UNDP Serbia’s governance team.

This year’s IODC featured a big leap in the number of people coming from Eastern Europe and Central Asia, signifying a new part of the world taking up the open data agenda. While this region comprises countries of vastly different size, economic power, and political culture, containing at least three more coherent subregions, what binds them in this context is the sense of being new in the game.

The policy-makers, activists, researchers, and techies from these countries were thus greeted by a global community more mature, pensive, and self-critical than would have been the case several years ago.

Indeed, the gradual pace of progress with open data in the region may suggest that a more thoughtful approach is being taken by these countries internally as well. In good part, of course, this comes of necessity – at least in the Balkans, challenges may concern the mere existence of data, significant quality issues, muddled lines of authority, and a nascent administrative and political culture of evidence-based policy-making. In addition, the overwhelming aspiration of joining the EU coupled with a deep fiscal crisis leaves governments with little maneuvering room or mental space for further innovation. Yet there we all were, reporting more than mere exploratory steps. Ukraine has just joined the International Open Data Charter, Serbia has gone from an ODRA to first hackathons and portal development in little over a year, and other parts of the region are articulating innovative ways of using data to address citizen needs.

Among the reasons for this shift are the relative maturity of civil societies and the readiness of policymakers to benefit from others’ hindsight and consider questions of defining and measuring impact, ensuring feedback loops, prioritizing, and involving users early. The success of these initiatives will depend on how strongly these factors can be sustained over time, and how creative practitioners will be in identifying demand where it exists.

One of the important themes in the East Europe regional talk at IODC16 concerned the ostensible lack of demand for open data in our societies. A recent study in Serbia found “a very small number of civil society stakeholders who are actively engaged with the topic of open data”, but also outlined significant potential, particularly through partnerships with academia, the tech community, and the media. Activists in Kosovo* deliberately shift attention away from the abstract label of “open data”, focusing entirely on tools for addressing problems of jobs or skills. In Kazakhstan, where the agenda is driven from the top and as part of an ambitious digital drive, demand is more likely defined through the use of services. But across the board, results are defined through use, rather than production.

The case for openness is equally strong in countries where open data is developing alongside nascent FOI regulation and in those where such regulation has been solidly in place for a decade or longer. Although the latter may have the advantage of working institutional arrangements and riper general awareness, activists in most countries still report the threat of open washing as a crucial concern.

The IODC concluded with a firm shared understanding of community commitments for next year's conference. Questions of capacity building beyond literacy, and even more so the call for more, and more collaborative, research will surely be followed carefully by East Europeans and Central Asians alike.

* This designation is without prejudice to positions on status, and is in line with UNSC 1244 and the ICJ Opinion on the Kosovo Declaration of Independence



November 10, 2016 Wei-Chung Hwang

 


The Asia regional talk held at #IODC16 was a remarkable success (watch the recording of the session here). Featuring six representatives from different countries (India, Indonesia, Japan, Korea, Taiwan, and Thailand), the session yielded discussions on a variety of topics, namely:

  • national open data agendas;
  • regional partnerships; and
  • goals for global collaboration.

Dr. Chi-Ming Peng from Open Data Alliance (ODA) Taiwan provided information on the initiation of the Asia Open Data Partnership. He also mentioned some of the events important to this region, such as the yearly open data summit and cross-country hackathons.

Dr. Peng provided updates on the development of open data initiatives and the data economy in Taiwan. It was noted that Taiwan came out on top among 122 nations and areas in 2015 according to Open Knowledge International's Global Open Data Index. This achievement was built on the top-down open data policy announced in 2011 and on a healthy ecosystem of public-private partnerships.

Mr. Sumandro Chattapadhyay from The Centre for Internet and Society (CIS) of India shared updates on the policy and progress of open data in India, both for the central government and state governments. He mentioned that while the driving force from government is important, data from outside the government and the joint efforts of civil society are also critical.

Mr. Arthur Glenn Maail from the World Wide Web Foundation Open Data Lab Jakarta explained the state of open data in Indonesia based on insights from the Open Data Barometer (a collaborative project between the World Wide Web Foundation and the Open Data for Development Network). While at the national level the Government of Indonesia (GoI) is implementing a “one data” policy, open data is gaining traction at the sub-national level, and civil society organizations continue to lead the way in Indonesia's open data movement.

Mr. Teruo Tomita from the National Strategy Office of ICT, Cabinet Secretariat, Government of Japan, described the deployment of the open data initiative in Japan. The initiative was formed after the Great East Japan Earthquake in 2011, and a roadmap for promoting open government data and actions for encouraging new open data were announced in the following years. In the future, Japan will take a further step to implement “solution-oriented open data” and to promote the dynamic engagement of all citizens.

Mr. YS Lee from National Information Society Agency (NIA) of Korea provided information on the open data agenda in Korea. He noted that Korea enacted the world’s first open data law in 2013. After that, the government has established a clear policy and organizational framework to manage and promote the release and usage of open data, and to incubate new startups. He emphasized that the ultimate goal of open data is to realize the data-based creative economy.

Dr. Panachit Kittipanya-ngam from the Electronic Government Agency (EGA) of Thailand noted that open data is one of the key sources in Thailand's big data value chain, and that the government has been working on government open data, data community engagement, and the prototyping of data platforms and applications.

Reflecting on the discussions held at this session, we learned that each country uses a different approach to promote open data. Generally, some adopt open data to address social issues, driven by civil society and the open data community, while others pay more attention to the economic value of open data through top-down policy and a clear roadmap.

According to the opinions and suggestions provided by the representatives, the most fundamental principle for the regional open data partnership is to recognize the needs and efforts in each country and to continue to work on the common interests bilaterally and multilaterally. Such common interests include the extension of international hackathons to include more countries, indices to evaluate the progress of open data in each country, regional open data portals, application programming interfaces, and applications that are specific to regional needs.

 

Featured image: Ben Rosett



November 8, 2016 Silvana Fumega

Silvana Fumega is originally from Buenos Aires, Argentina. She holds a PhD from the University of Tasmania, Australia; her thesis focused on international NGOs working with open government data and freedom of information policies. She also holds a Master's degree in Public Policy from Victoria University of Wellington (New Zealand) and a degree in Political Science from the University of Buenos Aires (Argentina), and she took part in the Chevening Hansard Research Programme (United Kingdom). She has served as a consultant for several international organizations, governments and civil society groups.

Two years ago we –quite impressed– highlighted the fast progress of the open data agenda in Latin America [1]. Today, a bit less surprised, we keep reflecting on the role of many actors from the region in the global open data agenda.

A recap of some regional and global discussions

Within the framework of the 4th International Open Data Conference, Latin American actors shared some of the initiatives that have been implemented in the region, from research to the development of applications in the civic technology sphere, including data journalism, during the conference and its many pre-events. In this context, on the afternoon of the first day of the conference we held a round table with actors from the region, along with some from the Caribbean, to discuss what is going on in Latin America. A brief list of some of the topics highlighted in this session and in the conference is included below:

    • Latin America has a lot to offer the open data agenda. These advances are no longer presented as those of developing countries trying to follow the agenda of first-world countries, but as the work of actors with a weight of their own who contribute to the dialogue as equals. Perhaps the fact that Argentina will organize the next IODC exemplifies this quite well.


  • At the Latin America and Caribbean regional talk at IODC16, the work of civil society actors was addressed. These actors are very active in data release processes and, more broadly, in promoting the agenda. Even so, the infrastructure for managing and releasing public data is still quite precarious in most countries of the region. There is still a long way to go in this respect.
  • Despite the advances and the dialogue, it is also necessary to identify obstacles and pending tasks, such as data infrastructure. Even though the agenda has developed strongly, the cultural change around opening up (not only data but government in general) is still a challenge in the region; it is under way, but far from being a reality in most countries.
  • In order to overcome these obstacles it is a sine qua non condition to start thinking in terms of long-term policies (State policies) and not short-term projects. The logic of quick wins conspires against the development of these policies and the possibility of scaling them in the region.
  • Similarly, one of the points repeated most often during the conference -and which transcended the regional level- was the need to focus on the problems of different sectors. It is necessary to start thinking about opening policies at the sectoral level, responding to the specific problems of public policy implementation in each area, and to help build a community of intermediaries who collaborate and add value to these data. We must invest in building a community of users and intermediaries.
  • Related to the previous point, we need the actors working on the open data agenda to understand, just as other communities have, that this agenda is not an end in itself but a means to solving other problems.
  • It is also worth highlighting that a shared language has, in many contexts, contributed -along with the leadership of some actors- to the fluid dialogue between the different actors in the region. With professional bonds that have in many cases become personal, the dialogue and exchange of experiences and reflections is very fluid and never ceases to surprise actors from other latitudes. This exchange should be extended to other actors, such as those in the Caribbean: the dialogue between Latin American and Caribbean actors is not yet as fluid as some would think. Hence we need an additional effort to connect with these actors and strengthen the agenda in both regions, which many perceive as a single region.
  • This parity is possible thanks to the capacities that have been developed in the region and that allow the agenda to advance. This generation of capacities needs full support -through the articulation of actors and resources- to keep producing actors and initiatives within the region that can carry these advances forward.

To close this blog post I would like to point out something that generated a lot of discussion (very enriching discussion, though) and numerous tweets: the idea of “open washing”.

Even though this point deserves its own post (we'll see if time and those pending articles permit), it is worth mentioning that the regional and global open data communities have started to lose their innocence -which at times led us to disproportionate enthusiasm and optimism- and to question some policies and initiatives that sometimes seemed more focused on improving the image of certain actors than on achieving an actual opening of a sector or a government. This looks like a sign of maturity on the part of many of the actors involved in promoting the agenda, and it should be celebrated. From here on we still need to see how the agenda will develop in Latin America and the rest of the world and how, together, we can minimize the negative consequences of this “open washing” wherever it is identified. Personally, I applaud this advance.

[1] 2014: http://silvanafumega.blogspot.com.ar/2014/10/esta-vez-las-noticias-han-llegado.html 2015: http://silvanafumega.blogspot.com.ar/2015/09/los-desafios-de-la-maduracion.html

[2] More info: EN: http://opendatacon.org/increasing-demand-measugin/ ES: http://opendatacon.org/2431/?lang=es



November 3, 2016 Martín Álvarez-Espinar

Martín Álvarez-Espinar, an engineer in Computer Science, is the Manager of the W3C Spanish Office. He has broad experience in Web standards development and has worked as an eGovernment consultant at CTIC since 2007, specializing in the Open Government field. He has participated in the development of over a dozen Open Government Data initiatives in Europe, and he is part of various working groups at the W3C and in the European Commission's Joinup for the standardization of technologies in the Open Data and PSI re-use field.

It's time to reflect on IODC 2016: numerous interesting ideas and experts keen on challenges and solutions. We clearly have a long way to go on the basic aspects of data opening, but every day we find more success stories, and we have seen some evolution in the commitments established at last year's edition of the conference. There is some news on the technical side, and this is where I'd like to highlight one of the big promising technologies of the moment: blockchain.

Blockchain, also known as distributed ledger technology, is the paradigm on which bitcoin -the famous cryptocurrency that has put the global financial sector on alert- is based. Blockchain is a P2P-based system that ensures a robust and safe distribution of information. All users connected to a blockchain network can access all the data; they take part in verifying transactions and can check the full chain of modifications the data may have undergone. Blockchain does not need central servers that concentrate the publication of information: every user scattered over the network holds a copy of that information -which is where the term distributed ledger comes from.

It’s still early to guarantee that blockchain will have direct applications in open data, but its attributes could solve some of the big recurring problems we so often discuss in forums: data persistence, integrity and origin. A blockchain network is based on preserving the coherence of data and transactions, and on keeping a complete record both of the users who have modified the information and of the modifications themselves.
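To make those properties more tangible, here is a minimal sketch in Python of a hash-chained ledger. It is only an illustration, not a real blockchain: there is no peer-to-peer network and no consensus protocol, and the block fields and helper names (append_block, verify) are assumptions made for this example. It simply shows how linking each record to the hash of the previous one makes later tampering detectable and keeps a record of who modified the data and when.

# A minimal, illustrative hash-chained ledger. It is NOT a real blockchain:
# there is no network and no consensus; the block fields and helper names
# are assumptions made only for this sketch.
import hashlib
import json
import time


def block_hash(block):
    # Hash the block contents deterministically (sorted keys).
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, author, data):
    # Record who changed what, linking each entry to the previous block's hash.
    block = {
        "index": len(chain),
        "timestamp": time.time(),
        "author": author,  # provenance: who generated or touched the datum
        "data": data,      # the (open) data or the modification itself
        "prev_hash": block_hash(chain[-1]) if chain else "0" * 64,
    }
    chain.append(block)
    return block


def verify(chain):
    # Integrity check: tampering with any past block breaks the hash links.
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
append_block(chain, "city-council", {"dataset": "air-quality", "pm10": 31})
append_block(chain, "agency-x", {"dataset": "air-quality", "pm10": 29})
print(verify(chain))            # True: the record is coherent
chain[0]["data"]["pm10"] = 99   # someone silently rewrites history...
print(verify(chain))            # False: the modification is detectable

In a real blockchain network, many independent nodes would hold a copy of this chain and would have to agree on each new block, which is what makes the record so hard to alter unilaterally.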

This robustness comes at a high computational cost, which limits the agility of the solution for certain types of data (for now, it seems unviable for managing large volumes of real-time information). Still, when what matters most is exhaustive control over the quality of each datum and knowing at any moment who generates it, where it passes through and who touches it, this technology can be the solution.

Half an hour devoted to blockchain at this year’s conference sparked great expectation and long discussions in the hallways. It will definitely be one of the topics to study at the next conferences. We’ll be watching.


Featured image: Fré Sonneveld


photo-1471864190281-a93a3070b6de-1200x800.jpeg

November 1, 2016 Paul Zeitz

Paul Zeitz is Director of the Data Revolution for Sustainable Development Team.

Over the last century, we’ve seen new technologies revolutionize health and save lives. The first antibiotic was discovered in 1928, giving doctors a tool to completely cure patients of deadly infections. In 2003, the sequencing of the human genome was completed, paving the way for a boom in medical research and life-saving treatments that are still multiplying. And today we are experiencing the start of the next revolution in health: open data.

There is more data available today than ever before. Some estimate that 90% of all the data in the world has been generated over the last two years. Eric Schmidt, then CEO of Google, famously said that every two days we create as much information as we did from the dawn of civilization through 2003. Let me repeat that: we generate more information in two days than we did over the course of thousands of years.

This unprecedented deluge of data is starting to be harnessed to improve health in countries and communities around the world. Data are transforming how people manage their own health (have you checked your FitBit today?), how people choose and rate their health care providers, how research is conducted and advanced, and how public health services are managed.  Here are a few areas where the open data revolution for health is taking off:

  1. Citizen Engagement and Feedback: Open data allow for real-time feedback on health and health services, empowering people with information to improve their health and closing the feedback loop between practitioners and patients. For example, Yelp, probably the most ubiquitous example of a crowd-sourced user-review platform, recently added data on average emergency room wait times, fines a facility may have paid, and serious deficiencies reported to all listings for hospitals and other health care services. Data Uruguay, in partnership with the Ministry of Public Health, created a tool called “At Your Service” that harnesses open health data from the government to empower patients to choose their health care provider by comparing key performance metrics, updated annually. And Web 3.0 has created a pilot for an app that its developers hope will answer health questions in a manner similar to “Siri”, helping to address the growing shortage of health care workers in some parts of the world.
  2. Research and Development: The open data movement is beginning to create a new culture of sharing among health researchers and product developers –a notoriously closed space– that has the potential to catalyze innovation and breakthroughs in medicine. For example, Figshare –a repository where users can make all of their research outputs available in a citable, shareable and discoverable way– includes a trove of data on everything from community and child health to cancer cell biology (check out its category on health sciences). Also, the Medicines for Malaria Venture (MMV) made 400 diverse compounds with antimalarial activity available free of charge, asking only that researchers publish any data generated in the public domain. This helped create an open and collaborative forum for research into drugs for neglected diseases, and resulted in new drugs being developed. The organization has now released a Pathogen Box containing 400 compounds active against 13 pathogens, in the hope of boosting drug discovery for neglected diseases.
  3. Public Health Management: Public health donors, international organizations, and civil society organizations are using open data to better inform decision-making and increase the impact of investments in health. The U.S. President’s Emergency Plan for AIDS Relief (PEPFAR) is using data in transformational ways to prevent more HIV infections and save more lives, opening as much of its data as possible while protecting individual and community confidentiality. Earlier this year, PEPFAR published its 2015 annual program results at the subnational level, enabling stakeholders to view, download, and use PEPFAR data in more accessible ways.

These examples are just the tip of the iceberg; countless more exist, and each is paving the way for the open data revolution in health. But challenges remain before the potential of open data is fully realized. First, open data for open data’s sake is not enough. As we work to get real-time, dynamic and disaggregated data into the hands of citizens, health service providers, and other stakeholders, we must simultaneously work to build their capacity to use that data to inform decision-making and drive innovative solutions at the individual, local, and national levels. Second, we need to harness open data to ensure that all resources for health are having the greatest possible impact.

We’ve seen new technologies revolutionize health over the last century. Looking forward, we can envision a world where every person has access to the best available, most relevant and timely information to improve their health, optimize access to services, and accelerate the next generation of health innovations. The open data revolution for health is here; we must harness it to drive action and deliver on this vision.

Featured image: freestocks.org

