API Evangelist Human Services API Stories
These are stories I've published on API Evangelist that are relevant to the Open Referral Human Services Data API work that I do. Most of these were published to API Evangelist, but I've aggregated here for easier browsing.
I am spending more time thinking about the unknown unknowns when it comes to API security. This means thinking beyond the usual suspects like encryption, API keys, and OAuth. As I monitor the API space I’m keeping an eye out for examples of security concerns that not every API provider is thinking about. I found one recently in Ars Technica, about the Federal Communications Commission (FCC) leaking the email addresses, through the FCC API, of anyone who submitted feedback as part of any issues like the recent Net Neutrality discussion.
It sounds like the breach with the FCC API was unintentional, but it provides a pretty interesting example of a security risk that could probably be mitigated with some basic API testing and monitoring, using common services like Runscope, or Restlet Client. Adding a testing and monitoring layer to your API operations helps you look beyond just an API being up or down. You should be validating that each endpoint is returning the intended/expected schema. Just this little step of setting up a more detailed monitor can give you that brief moment to think a little more deeply about your schema–the little things like whether or not you should be sharing the email addresses of thousands, or even millions of users.
I’m working on a JSON Schema for my Open Referral Human Services API right now. I want to be able to easily validate any API as human services compliant, but I also want to be able to set up testing and monitoring, as well as security checkups, by validating the schema. When it comes to human services data I want to be able to validate every field present, ensuring only what is required gets out via the API. I am validating primarily to ensure an API and the resulting schema are compliant with HSDS/A standards, but seeing this breach at the FCC has reminded me that taking the time to validate the schema for our APIs can also contribute to API security–for those attacks that don’t come from outside, but from within.
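A lightweight version of this kind of schema validation can be sketched without any special tooling. This is a minimal illustration, not part of the HSDS/A specification; the field names and response data below are hypothetical:

```python
# Minimal sketch: catch unexpected fields (like leaked email addresses) in an
# API response by validating each record against an allow-list of fields.
# The field names here are illustrative, not the official HSDS schema.

ALLOWED_ORGANIZATION_FIELDS = {"id", "name", "description", "url"}

def find_unexpected_fields(records, allowed=ALLOWED_ORGANIZATION_FIELDS):
    """Return the set of fields present in the response but not in the schema."""
    unexpected = set()
    for record in records:
        unexpected |= set(record) - allowed
    return unexpected

# A monitoring check might fail loudly when extra fields slip through:
response = [
    {"id": "1", "name": "Food Bank", "url": "https://example.org"},
    {"id": "2", "name": "Shelter", "email": "someone@example.org"},  # a leak!
]
leaks = find_unexpected_fields(response)
```

A monitor running a check like this on every endpoint would have surfaced the FCC-style leak the first time an email address appeared in a response where the schema said it shouldn't be.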
I’m wrestling with the different levels of conversations I’m having around my human services API work. Some of my audience are more technical and are pushing for discussion at the granular level, while other parts of my audience are more about the business of things at the 100K foot level. I appreciate these types of projects, but when there are many different conversations going on at many different levels, it is a lot of work to wrestle things into something coherent that everyone involved will appreciate.
One day I’m thinking about which individual fields are required, and the next I will be considering how multiple human services API integrators will be syndicating and sharing information between clusters of human service API implementations. While I’m relying on Github and Slack to facilitate the conversations that are going on, I am ultimately relying on OpenAPI and APIs.json to help me hammer out the contract that will speak to the developers at the granular level but can also communicate the business and political terms of the API contract. It will describe which fields are required as well as describe the webhooks I need to define how to syndicate and share between implementations.
OpenAPI is pretty focused on helping me with things happening at API sea level, but I’m exploring using APIs.json to help me organize conversations all the way up to the 100K foot level. Things like, where do I signup for my API keys, access partnership levels of access, find the terms of service, or possibly someone to contact and answer a question. Then using the OpenAPI I can publish documentation for developers to understand the surface area of the API (sea level), and while the APIs.json includes a pointer to this discussion, it also provides pointers to other discussions going on around support, communications, changes, privacy, security, so that I can generate documentation for business and partner stakeholders as well.
I’m working on an example of doing this for my Open Referral Human Services API. An APIs.json + OpenAPI that helps articulate what is happening with any single human services API implementation from sea level to 100K. The trick is I also need to articulate how this will work at scale across clusters of human services API implementations, allowing vendors and partners to syndicate and federate. With everything defined as a machine readable index (using APIs.json and OpenAPI), which can be used to generate very technical API documentation, as well as more business-friendly aspects of operations.
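To make this concrete, here is a hedged sketch of what such an APIs.json index might look like for a single implementation. Every name and URL below is an invented placeholder, and the property types are just illustrative of how APIs.json lets you point at documentation, signup, support, and other business-level resources alongside the OpenAPI:

```json
{
  "name": "Example City Human Services API",
  "description": "HSDA implementation for a single municipality",
  "url": "http://example-city.gov/apis.json",
  "apis": [
    {
      "name": "Human Services Data API",
      "baseURL": "http://api.example-city.gov",
      "properties": [
        { "type": "x-openapi", "url": "http://api.example-city.gov/openapi.json" },
        { "type": "x-documentation", "url": "http://developer.example-city.gov/docs" },
        { "type": "x-signup", "url": "http://developer.example-city.gov/signup" },
        { "type": "x-terms-of-service", "url": "http://example-city.gov/terms" },
        { "type": "x-support", "url": "http://developer.example-city.gov/support" }
      ]
    }
  ]
}
```

The OpenAPI pointer covers the sea level conversation, while the rest of the properties give business and partner stakeholders their own entry points, all from one machine readable index.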
One of the projects I’m working on as part of my Human Services API work is trying to define the layer that allows developers to add, update, and delete data via the API. We ultimately want to empower 3rd party developers, and external stakeholders, to help curate and maintain critical human services data within a community, through trusted partners.
The Human Services API allows for the reading and writing of organizations, locations, and services for any given area. I am looking to provide guidance on how API implementors can allow for POST, PUT, PATCH, and DELETE on their API, but require approval before any changing transaction is actually executed. Requiring the approval of an internal system administrator to ultimately give the thumbs up or thumbs down regarding whether or not the change will actually occur.
A process which immediately begs for the ability to have multiple administrators or even possibly involving external actors. How can we allow organizations to have a vote in approving changes to their data? How can multiple data stewards be notified of a change, and given the ability to approve or disprove, logging every step along the way? Allowing any change to be approved, reviewed, audited, and even rolled back. Making public data management a community affair, with observability and transparency built in by default.
I am doing research into different approaches to tackling this, ranging from community approaches like Wikipedia, to publish and subscribe, and other events or webhook models. I am looking for technological solutions to opening up approval to the API request and response structure, with accompanying API and webhook surface area for managing all aspects of the approval of any API changes. If you know of any interesting solutions to this problem I’d love to hear more, so that I can include in my research, future storytelling, and ultimately the specification for the Open Referral Human Services Data Specification and API.
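As a starting point for that research, here is one possible shape for an approval layer, sketched in Python. This is my own illustration, not anything from the HSDA specification; the class, method, and field names are hypothetical:

```python
# A sketch of a change-approval queue for write operations (POST/PUT/DELETE).
# Instead of applying a change immediately, it is stored as "pending" until
# enough administrators approve it, leaving an audit trail along the way.

import uuid

class ChangeApprovalQueue:
    def __init__(self, approvals_required=1):
        self.approvals_required = approvals_required
        self.changes = {}  # change_id -> change record

    def submit(self, resource, operation, payload, submitted_by):
        """Record a proposed change and return its id for tracking."""
        change_id = str(uuid.uuid4())
        self.changes[change_id] = {
            "resource": resource, "operation": operation,
            "payload": payload, "submitted_by": submitted_by,
            "approvals": [], "status": "pending",
        }
        return change_id

    def approve(self, change_id, admin):
        """Log an approval; execute only once enough stewards sign off."""
        change = self.changes[change_id]
        if admin not in change["approvals"]:
            change["approvals"].append(admin)
        if len(change["approvals"]) >= self.approvals_required:
            change["status"] = "approved"  # now safe to apply the change
        return change["status"]

    def reject(self, change_id, admin):
        self.changes[change_id]["status"] = "rejected"
        return "rejected"
```

Setting `approvals_required` higher than one is how multiple data stewards, or even external organizations, get a vote before any change to their data actually occurs, and the stored records provide the log needed for review, audit, and rollback.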
I’m refining my approach to moving forward the discussion around the Human Services Data Specification and API in an attempt to include more vendors and implementors in the conversation. Part of this work is to streamline how we move forward an increasing number of conversations regarding the schema and API definition.
I am looking to help solidify our communication strategy around the human services API, and help make clear which channels participants can tune into:
- Github - Github Issues is where specific conversations around a variety of topics occur.
- Slack - A variety of Slack channels for discussing the evolution of the API.
- Blog - Storytelling via API Evangelist, and specific project level blogs.
- GHangouts - Virtual gatherings to discuss the API via video conferencing.
These are the channels where the HSDS/A conversations are occurring. It is spread unevenly across these synchronous and asynchronous digital channels. We are using a variety of signals including Github issues, Slack messaging as well as video conference calls, blog posts, and semi-regular virtual gatherings.
I am heavily using the blog post to organize my ideas, distilling down the explosion of information, ideas, and technical details into smaller, coherent, bite-size chunks. This helps me organize and better communicate what’s going on, which includes having a single URL to share with new players. In fact, this blog post is part of me pulling together the communication strategy for the human services API project, and will be the most current URL I share with people.
I’m working on my API definition and design strategy for my human services API work, and as I was doing this Box went all in on OpenAPI, adding to the number of API providers I track on who not only have an OpenAPI, but also use Github as the core management for their API definition.
Part of my API definition and design advice for human service API providers, and the vendors who sell software to them is that they have an OpenAPI and JSON schema defined for their API, and share this either publicly or privately using a Github repository. When I evaluate a new vendor or service provider as part of the Human Services Data API (HSDA) specification I’m beginning to require that they share their API definition and schema using Github–if you don’t have one, I’ll create it for you. Having a machine-readable definition of the surface area of an API, and the underlying schema in a Github repo I can checkout, commit to, and access via an API is essential.
Every API should begin with a Github repository in my opinion, where you can share the API definition, documentation, schema, and have a conversation around these machine readable docs using Github issues. Approaching your API in this way doesn’t just make it easier to find when it comes to API discovery, but it also makes your API contract available at all steps of the API design lifecycle, from design to deprecation.
I am working my way through a variety of API design considerations for the Human Services Data API (HSDA) that I’m working on with Open Referral. I was working through my thoughts on how I wanted to approach the filtering of the underlying data schema of the API, and Shelby Switzer (@switzerly) suggested I follow Irakli Nadareishvili’s advice and consider using RFC 7240 - the Prefer Header for HTTP, instead of some of the commonly seen approaches to filtering which fields are returned in an API response.
I find this approach to be of interest for this Human Services Data API implementation because I want to lean on API design, over providing parameters for consumers to dial in the query they are looking for. While I’m not opposed to going down the route of providing a more parameter based approach to defining API responses, in the beginning I want to carefully craft endpoints for specific use cases, and I think the usage of the HTTP Prefer Header helps extend this to the schema, allowing me to craft simple, full, or specialized representations of the schema for a variety of potential use cases. (ie. mobile, voice, bot)
It adds a new dimension to API design for me. Since I’ve been using OpenAPI I’ve gotten better at considering the schema alongside the surface area of the APIs I design, showing how it is used in the request and response structure of my APIs. I like the idea of providing tailored schema in responses over allowing consumers to dynamically filter the schema that is returned using request parameters. At some point, I can see embracing a GraphQL approach to this, but I don’t think that human service data stewards will always know what they want, and we need to invest in a thoughtful set of design patterns that reflect exactly the schema they will need.
Early on in this effort, I like allowing API consumers to request minimal, standard or full schema for human service organizations, locations, and services, using the Prefer header, over adding another set of parameters that filter the fields–it reduces the cognitive load for them in the beginning. Before I introduce any more parameters to the surface area, I want to better understand some of the other aspects of taxonomy and metadata proposed as part of HSDS. At this point, I’m just learning about the Prefer header, and suggesting it as a possible solution for allowing human services API consumers to have more control over the schema that is returned to them, without too much overhead.
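To illustrate the idea, here is a rough sketch of how a server might honor the Prefer header when shaping a response. The minimal/standard/full representation names follow the approach described above, but the fields and the simplified header parsing are placeholders, not the HSDA specification:

```python
# Sketch: serving minimal, standard, or full representations of an
# organization record based on the HTTP Prefer header (RFC 7240).
# The representation names and fields here are illustrative only.

REPRESENTATIONS = {
    "minimal": ["id", "name"],
    "standard": ["id", "name", "description", "url"],
    "full": None,  # None means return every field in the schema
}

def apply_prefer_header(record, prefer_header):
    """Filter a record based on a header value like 'return=minimal'."""
    preference = "standard"  # default when no recognized preference is sent
    for token in (prefer_header or "").split(";"):
        name, _, value = token.strip().partition("=")
        if name == "return" and value in REPRESENTATIONS:
            preference = value
    fields = REPRESENTATIONS[preference]
    if fields is None:
        return dict(record)
    return {k: v for k, v in record.items() if k in fields}
```

The endpoint design stays exactly the same; the consumer simply sends `Prefer: return=minimal` to get a lighter payload for a mobile, voice, or bot use case.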
I am going through my API design checklist for the Human Services Data API work I am doing. I’m trying to make sure I’m not forgetting anything before I propose a v1.1 OpenAPI draft, so I pulled together a simple checklist I wanted to share with other stakeholders, and hopefully also help keep me focused.
First, to support my API design work I got to work on these areas for defining the HSDS schema and the HSDA definition:
- JSON Schema - I generated a JSON Schema from the HSDS documentation.
- OpenAPI - I crafted an OpenAPI for the API, generating GET, POST, PUT, and DELETE methods for 100% of the schema, and reflecting its use in the API request and response.
- Github Repo - I published it all in a Github repository for sharing with stakeholders, and programmatic usage across any tooling and applications being developed.
Then I reviewed the core elements of my API design to make sure I had everything I wanted to cover in this cycle, with the resources we have:
- Domain(s) - Right now I’m going with api.example.com, and developer.example.com for the portal.
- Versioning - I know many of my friends are gonna give me grief, but I’m putting versioning in the URL, keeping things front and center, and in alignment with the versioning of the schema.
- Paths - Really not much to consider here as the paths are derived from the schema definitions, providing a pretty simple, and intuitive design for paths–will continue adding guidance for future APIs.
- Verbs - A major part of this release was making sure 100% of the surface area of the HSDS schema has the ability to POST, PUT, and DELETE, as well as just GET a response. I’m not addressing PATCH in this cycle, but it is on the roadmap.
- Parameters - There are only a handful of query parameters present in the primary paths (organizations, locations, services), and a robust set for use in /search. Other than that, everything is mostly defined through path parameters, keeping things cleanly separated between path and query.
- Headers - I’m only using headers for authentication. I’m also considering using the HTTP Prefer Header for schema filtering, but nothing else currently.
- Actions - Nothing to do here either, as the API is pretty CRUD at this point, and I’m awaiting more community feedback before I add any more detailed actions beyond what is possible with the default verbs–when relevant I will add guidance to this area of the design.
- Body - All POST and PUT methods use the body for request transport. There are no other uses of the body across the current design.
- Pagination - I am just going with what is currently in place as part of v1.0 for the API, which uses page and per_page for handling this.
- Data Filtering - The core resources (organizations, locations, and services) all have a query parameter for filtering data, and the search path has a set of parameters for filtering data returned in the response. Not adding anything new for this version.
- Schema Filtering - I am taking Irakli Nadareishvili’s advice and going with RFC 7240 - the Prefer Header for HTTP, crafting some different representations when it comes to filtering the schema that is returned.
- Sorting - There is no sorting currently. I did some research in this area, but not going to make any recommendations until I hear more requests from consumers, and the community.
- Operation ID - I went with camelCase for all API operation IDs, providing a unique reference to be included in the OpenAPI.
- Requirements - Going through and making sure all the required fields are reflected in the definitions for the OpenAPI.
- Status Codes - Currently I’m going to just reflect the 200 HTTP status code. I don’t want to overwhelm folks with this release and I would like to accumulate more resources so I can invest in a proper HTTP status code strategy.
- Error Responses - Along with the status code work I will define a core set of definitions to be used across a variety of responses and HTTP statuses.
- Media Types - While not a requirement, I would like to encourage implementors to offer four default media types: application/json, application/xml, text/csv, and text/html.
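Pulling a few of these decisions together, a hedged OpenAPI fragment for a single path might look something like this. It follows the OpenAPI 2.0 style that was current for this work; the path, parameters, and defaults are illustrative, not the official HSDA definition:

```yaml
# Illustrative OpenAPI 2.0 fragment for one HSDA path, showing the query,
# pagination, and Prefer header decisions from the checklist above.
paths:
  /organizations/:
    get:
      summary: Get a list of organizations
      operationId: listOrganizations
      parameters:
        - name: query
          in: query
          type: string
          description: Filter organizations by keyword.
        - name: page
          in: query
          type: integer
          default: 1
        - name: per_page
          in: query
          type: integer
          default: 25
        - name: Prefer
          in: header
          type: string
          description: "Schema preference: return=minimal, standard, or full."
      responses:
        200:
          description: Successful response.
```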
After being down in the weeds I wanted to step back and just think about some of the common sense aspects of API design:
- Granularity - I think the API provides a very granular approach to getting at the HSDS schema. If I just want a phone number for a location, and I know its location id I can get it. It’s CRUD, but it’s a very modular CRUD that reflects the schema.
- Simplicity - I worked hard to keep things as simple as possible, and not running away with adding dimensions to the data, or adding on the complexity of the taxonomy that will come with future iterations and some of the more operational level APIs that are present in the current metadata portion of the schema.
- Readability - While lengthy, the design of the API is readable and scannable. Maybe I’m biased, but I think the documentation flows, and anyone can read and get an idea of the possibilities with the human services API.
- Relationships - There really isn’t much sophistication in the relationships present in the HSDA. Organizations, locations, and services are related, but you need to construct your own paths to navigate these relationships. I intentionally kept the schema flat, as this is a minor release. Hypermedia and other design patterns are being considered for future releases–this is meant to be a basic API to get at the entire HSDS schema.
I have a functioning demo of this v1.1 release, reflecting most of the design decisions I listed above. While not a complete API design list, it provides me with a simple checklist to apply to this iteration of the Human Services Data API (HSDA). Since this design is more specification than actual implementation, this API design checklist can also act as guidance for vendors and practitioners when designing their own APIs beyond the HSDS schema.
Next, I’m going to tackle some of the API management, portal, and other aspects of operating a human services API. I’m looking to push my prototype to be a living blueprint for providers to go from schema and OpenAPI to a fully functioning API with monitoring, documentation, and everything else you will need. The schema and OpenAPI are just the seeds to be used at every step of a human services API life cycle.
When I suggest modern approaches to API management be applied to public data I always get a few open data folks who push back saying that public data shouldn’t be locked up, and needs to always be publicly available–as the open data gods intended. I get it, and I agree that public data should be easily accessible, but there are increasingly a number of unintended consequences that data stewards need to consider before they publish public data to the web in 2017.
I’m going through this exercise with my recommendations and guidance for municipal 211 operators when it comes to implementing Open Referral’s Human Services Data API (HSDA). The schema and API definition centers around the storage and access to organizations, locations, services, contacts, and other key data for human services offered in any city–things like mental health resources, suicide assistance, food banks, and other things we humans need on a day to day basis.
This data should be publicly available, and easy to access. We want people to find the resources they need at the local level–this is the mission. However, once you get to know the data, you start understanding the importance of not everything being 100% public by default. When you come across listings for Muslim faith and LGBTQ services, or possibly domestic violence shelters and needle exchanges, you realize there are numerous types of listings where we need to be having sincere discussions around security and privacy concerns, and possibly think twice about publishing all or part of a dataset.
This is where modern approaches to API management can lend a hand. We can design specific endpoints that pull specialized information for specific groups of people, and define who has access through API rate limiting. Right now my HSDA implementation has two access groups, public and private. Every GET path is publicly available, and if you want to POST, PUT, or DELETE data you will need an API key. As I consider my API management guidance for implementors, I’m adding a healthy dose of the how and why of privacy and security using existing API management solutions and practices.
I am not interested in playing decider when it comes to what data is public, what is private, and what requires approval before getting access. I’m merely thinking about how API management can be applied in the name of privacy and security when it comes to public data, and how I can put tools in the hands of data stewards and API providers that help them make the decision about what is public, private, and more tightly controlled. The trick with all of this is how transparent providers should be with the limits and restrictions imposed, and communicating the official stance with all stakeholders appropriately when it comes to public data privacy and security.
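A minimal sketch of the public/private access model described above might look like this. The keys, plans, and method split are invented placeholders, just to show how an API management layer can separate open reads from keyed writes:

```python
# Sketch: API-management-style access control for public data, where read
# access is open and write access requires an API key tied to a plan.
# Keys, plans, and limits here are hypothetical placeholders.

API_KEYS = {
    "partner-key-123": {"plan": "private", "rate_limit_per_hour": 1000},
}

PUBLIC_METHODS = {"GET"}

def authorize(method, api_key=None):
    """Return an access decision for a single API request."""
    if method in PUBLIC_METHODS:
        return {"allowed": True, "plan": "public"}
    key_record = API_KEYS.get(api_key)
    if key_record is None:
        return {"allowed": False, "plan": None}
    return {"allowed": True, "plan": key_record["plan"]}
```

Adding a third tier for sensitive listings would just mean moving those endpoints out of the public set and requiring a key with the right plan, which is a data steward's decision, not the technology's.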
I’m comparing five separate vendor API implementations with the Human Services API standard I’m working on at the moment. I’m looking to push version 1.0 of the API towards a 1.1 with some incremental, forward-thinking changes.
During this phase of the project, I’m looking to get as much feedback as possible on the API interface from commercial vendors. The Human Services schema is being moved forward by a separate, but overlapping group, and has already gone through a feedback phase, officially releasing version 1.1 of the schema–I’m looking to do the same for the API.
Even though the Human Services schema is present, the purpose of the API definition is to open up discussion about what access to that data looks like, with the OpenAPI for the Human Services API acting as a distributed and repeatable contract governing how we access publicly available human services data.
The contract provided by the Human Services API defines how stakeholders can access organizations, locations, and services. The Human Services schema defines how human services data is stored, and with the assistance of the API it will be defined in transit for every request made, as well as the response that is given.
If we are going to get thousands of municipalities exchanging data with each other, as well as with the growing number of applications and systems they are using to serve the public, we will need a shared definition for how data is stored, as well as accessed by everyone involved. As I prepare responses to vendors involved in the feedback loop, I just wanted to gather my thoughts regarding the separation between the schema efforts and the API efforts.
When thinking about generating revenue from APIs it is easy to focus on directly charging for any digital resource being made available via the API. If it’s an image, we charge per API call, and maybe the amount of MB transferred. If it’s messaging, we charge per message. There are plenty of existing examples out there regarding how you directly charge for data, content, or algorithms using APIs, and an API way of doing business–look to Amazon, Twilio, and other pioneers.
Where there are fewer examples, and less open discussion, is around the value of the operational level of APIs, and making this data available via APIs--yes, APIs for APIs. Modern approaches to doing APIs are all about requiring each application to use an API key with each call they make, and logging each request and response along with the identifying key for each application. This is how API providers develop an awareness of who is accessing resources, how they are being put to use, and specific details about each application, and maybe even the users involved.
Sometimes the value generated at this layer doesn't exist. Due to restrictive access models, and direct revenue models, there isn't much going on operationally, so there isn't much value generated. However, when there is heavy usage around APIs, the exhaust of the API management layer can become increasingly valuable. What are people searching for? Which applications are most popular? Which geographic regions are the most active? There is a pretty lengthy laundry list of valuable data points being generated across modern API operations that are helping API providers better understand what is going on, but that aren't often being included as part of the API roadmap, and future revenue models.
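To make the idea of capturing this exhaust concrete, here is a small sketch that distills a request log into a few of the signals mentioned above. The log fields are hypothetical stand-ins for what an API management layer typically records per request:

```python
# Sketch: distilling API management "exhaust" (per-request logs) into
# signals like the most popular searches and most active applications.
# The log fields below are illustrative, not from any specific gateway.

from collections import Counter

def summarize_exhaust(request_log):
    searches = Counter(
        entry["search_term"] for entry in request_log if entry.get("search_term")
    )
    applications = Counter(entry["api_key"] for entry in request_log)
    return {
        "top_searches": searches.most_common(3),
        "top_applications": applications.most_common(3),
    }

log = [
    {"api_key": "app-1", "path": "/search", "search_term": "food bank"},
    {"api_key": "app-1", "path": "/search", "search_term": "shelter"},
    {"api_key": "app-2", "path": "/search", "search_term": "food bank"},
    {"api_key": "app-2", "path": "/organizations/"},
]
summary = summarize_exhaust(log)
```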
Ok, let me pause here for a moment. I identify the value being generated at this layer because I see existing providers reaching this realization in their operations, and I want to help other providers see the potential being achieved by successful API providers. I also acknowledge there is a lot of exploitation, abuse, and surveillance going on at this level, which is one of the biggest reasons I'm encouraging more transparency, observability, and discussion about this layer. I want API providers to understand the potential, but I also want API consumers and the end users of their applications to understand what is going on at the API operational layer as well.
The current model I'm looking at through this lens is my Open Referral Human Services Data Specification (HSDS) work, where I'm trying to help define the operational layer of human services APIs, as well as the direct API access to this critical public data. I am asking how stewards of this very important data at the municipal level can leverage APIs to make their valuable resources more accessible, and put to work where they are most needed, while also being able to generate and harness the valuable particles generated as part of an API exhaust system. What are people searching for? How are demographics evolving in a city, and how can city services shift and evolve too? Making the operational layer available via API means it is available to key decision makers, even if those are private sector decision makers who are required to pay for access to this intelligence--bringing critical new revenue streams to data stewards.
Let's pause again and be honest about the privacy concerns here. Access at this layer needs an additional level of scrutiny and care, over the direct access layers. Examples I'm concerned with can be seen in searches for Muslim religious services, or possibly LGBTQ services, and other information that could be used to violate the privacy and rights of application developers and end users. There are numerous privacy and security concerns at this level, but the inverse of these concerns also highlight the value of a data access exhaust system at this level. This is important information, that can provide some real time signals for both the public and private sector to consider more deeply.
I am purposely using the word exhaust here, specifically as a noun, as I don't think people are considering this layer, and may often see log files, and other data being generated in this way, as an unusable byproduct and output of web and mobile operations. I want to help people see the potential dangers of exhaust from API-centric data operations, but also understand that when it is captured, it can become valuable, similar to natural gas capture from recycling or landfill operations. There are some toxic aspects of API-driven data operations, but when measured, controlled, and made more observable, the dangerous aspects can be mitigated, and you might also find other forms of reuse and extraction that can occur along the way.
I’m involved in some very interesting conversations with public data folks who are trying to push forward the conversation around sensible revenue generation by cities, counties, states, and the federal government using public data. I’m learning a lot from these conversations, resulting in the expansion and evolution of my perceptions of how the API layer can help the government develop new revenue streams through making public data more accessible.
I have long been a proponent of using modern API management infrastructure to help government agencies generate revenue using public data. I would also add that I'm supportive of crafting sensible approaches to developing applications on top of public data and APIs in ways that generate a fair profit for private sector actors. I am also in favor of free and unfettered access to data, and observability into the platform operations, as well as ALL commercial interests developing applications on top of public data and APIs. I'm only in favor of this when the right amount of observability is present--otherwise digital good ol' boy networks form, and the public will lose.
API management is the oldest area of my API research, expanding into my other work to eventually define documentation, SDKs, communication, support, monetization, and API plans. This is where you define the business of API operations, organizing APIs into coherent catalogs, where you can then work to begin establishing a wider monetization strategy, as well as tiers and plans that govern access to data, content, and algorithms being made available via APIs. This is the layer of API operations I'm focusing on when helping government agencies better understand how they can get more in tune with their data resources, and identify potential partnerships and other applications that might establish new revenue streams.
A portion of the conversation I am having involved the story from Anthony Williams about how maybe government data shouldn't always be free, where the topic of taxation came up. One possible analogy for public data access and monetization was the Vehicle-Miles Traveled (VMT) tax, injecting the concept of taxation into my existing thoughts on revenue generation using API management. I've considered affiliate and reseller aspects of the API management layer before, applying percentage-based revenue and payments on top of API access, but never thought about a government taxation layer existing here.
I thought my stance on revenue generation on public data using API management was controversial before; adding in concepts of taxation to the discussion is really going to invigorate folks who are in opposition to my argument. I'm sure there is a libertarian free web, open data advocate, smaller government Venn diagram in there somewhere. I'm not too concerned, as the monetization is already going on; I'm simply talking about making it more observable, and in favor of revenue generation for data stewards and government agencies. I'm confident that most folks in opposition won't even read this paragraph, as it's buried in the middle of this post. ;-)
I take no stance on which data, content, or algorithms should be taxed, or what that tax rate should be. I leave this to data stewards and policy makers. My objective is just to introduce folks to the concept, and marry it with the existing approaches to using APIs to develop digital products and services in the private sector. However, if I was wearing my policy maker hat I would suggest thinking about this as a digital VAT tax, "that is collected incrementally, based on the surplus value, added to the price on the work at each stage of production, which is usually implemented as a destination-based tax, where the tax rate is based on the location of the customer."
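Purely as a thought experiment, a destination-based digital VAT at the API management layer might be metered something like this. Every price, rate, and location code below is invented for illustration:

```python
# Sketch: a VAT-style, destination-based tax applied at the API management
# layer. Commercial usage of public data is metered per call, and a tax
# rate keyed to the customer's location is added on top. Rates are invented.

PRICE_PER_CALL = 0.001  # dollars, a hypothetical commercial access price
TAX_RATES = {"US-CA": 0.0725, "US-WA": 0.065}  # by customer location

def invoice(api_calls, location, price_per_call=PRICE_PER_CALL):
    """Compute a monthly bill for metered commercial API usage plus tax."""
    subtotal = api_calls * price_per_call
    tax = subtotal * TAX_RATES.get(location, 0.0)
    return {"subtotal": round(subtotal, 2),
            "tax": round(tax, 2),
            "total": round(subtotal + tax, 2)}
```

Because the API management layer already knows who each consumer is and how much they use, collecting such a tax incrementally would require no new infrastructure, just a line item on the existing plan.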
My thoughts on a government tax at the API management layer are at an early stage. I am just exploring the concept on my blog--this is what I do as the API Evangelist. I'd love to hear your thoughts, on your blog. I am merely suggesting a digital VAT tax at the API contract layer around public data and APIs when commercial activities are involved. Eventually, I could see the concept spread to other sectors as the API economy becomes a reality, but I feel that public data provides us with a rich test bed for a concept like this. I'm considering reframing my argument about charging for commercial access to public data using APIs as taxing commercial usage of public data using APIs, allowing for tax revenue to fund future investment in public data and API efforts.
As I remove my API Evangelist hat and think about this concept, I'm not 100% sure if I'm in agreement with my own argument. It will take a lot more polishing before I'm convinced that taxation should be included in the API management layer. I'll keep exploring, playing with a variety of potential use cases, and see if I can build a case for API taxation when public data is involved, and applications are generating surplus value in the API economy.
This is a multipart story on monetizing public data using APIs. I have spent the last seven years studying 75+ aspects of the API delivery lifecycle across companies, organizations, institutions, and government agencies. This project is designed to be a distillation of my work to help drive a conversation around sensible and pragmatic revenue generation using public data--allowing city, county, state, and federal government agencies to think critically about how open data efforts can exist and grow. It lives as a standalone repository, as well as individual stories that are meant to stand on their own, while also contributing to an overall narrative about public data monetization.
While my primary income is not derived from developing software for sale, I have developed commercial software throughout my career, and actively maintain my own API driven technology platform for tracking on the API industry. This is my best attempt to put my technology vendor hat on for a bit to better understand the concerns and perspective of the software vendors involved with the public data sector. There is a wide spectrum of technology vendors servicing the space, making this exercise difficult to generalize, but I wanted to take a shot at defending and jumpstarting the conversation at the commercial vendor level.
Commercial tech vendors are always at the frontline of discussion around monetization of public data, for better or worse. When open data activists push back on my work to understand how public data can be monetized, the most common response I have is that public data is already being monetized by commercial vendors, and my work is about shining a light on this, and not being in denial that it is already occurring everywhere. Here are some of my thoughts from the public data commercial vendor landscape:
- Investment In Data - As a technology company I am investing a significant amount of resources into our data, and the data of our customers. While views may greatly vary on how much ownership platform and technology operators have around the public data they touch, it can't be argued that commercial vendors play a significant role--the discussion should be more about how great of a role, and how much ownership is there.
- Investment in Software - Beyond the data, we are investing a significant amount of resources into the software that our customers use, and that we use to manage our business internally. This is where we will keep most of the proprietary value generated around public data, although the door around the availability of open source tooling needs to remain open. Similar to data, the question is about how much ownership over software I need as a commercial vendor, and how much I can give back to the community.
- Lead Generation - I am interested in generating leads for new customers, and joining in new conversations that demonstrate the value of the products and services that my company brings to the table.
- Sell My Services - I am interested in selling my products and services, and my motivation is going to reflect this. No matter what our mission or marketing may say, I'm interested in generating a profit for my company, and its interests.
- Premium Services - Our domain expertise, and investment in data and software opens up the opportunity for us to offer premium services on top of public data operations. While our customers may not always pay directly for data storage and management, or even software licenses, the ability to sell premium services is valuable to everyone involved.
- Protect Intellectual Property - It is important to us that our intellectual property is protected in all conversations, and that the licensing of data and software is respected, and reflected in our partnerships. While perspectives on what is appropriate regarding intellectual property will vary, it is important that IP Is always an up-front part of the conversation.
- Investment in R&D - Commercial vendors are more likely to invest in research and development, helping push forward innovation around public data, something that isn't always possible unless there are clear revenue opportunities for commercial operators and clear communication and understanding with non-commercial stakeholders about what is possible, and being done.
- Consistent Support - One important thing commercial vendors bring to the table for government agencies, and non-commercial stakeholders is the opportunity for consistent support. As seasons change in government, commercial vendors can be there to fill gaps in business and technical support services, keeping important conversations moving forward.
I have to be 100% transparent here and stop to say that while I am advocating for revenue generation around public data, I'm not always a proponent of that revenue benefitting commercial interests. First, and foremost, I want revenue to benefit the public, secondarily the non-commercial stakeholders, and thirdly for the benefit of commercial vendors. Making this argument from the commercial vendor perspective is possible for me, just not something I'm always going to be in full support of, and I will always insist on pushing back on aggressive behavior from commercial vendors to dominate the conversation, in favor of data stewards, and the public.
With that said, I'm a believer that commercial activity can benefit public data efforts. Sensible revenue can be generated from delivering services, products, and tooling developed around public data, while also investing back into data operators, owners, and stewards, and most importantly benefiting those being served. Depending on where you operate in the public data space you will see this argument, or hopefully conversation, differently. This is just an attempt to look at things from the side of commercial vendors, and being honest and transparent about what the commercial interests are when it comes to public data.
You can keep an eye on my public data monetization research via a separate site--I will be adding each segment here on the blog, as well as the individual project website. You can participate via the Github repository I am using to manage all my work in this area.
Data is power. If you have valuable data, people want it. While this is the current way of doing things on the Internet, it really isn't a new concept. The data in databases has always been wielded alongside business and political objectives. I have worked professionally as a database engineer for 30 years this year, with my first job building COBOL databases for use in schools across the State of Oregon in 1987, and have seen many different ways that data is the fuel for the engines of power.
Data is valuable. We put a lot of work into acquiring, creating, normalizing, updating, and maintaining our data. However, this value only goes so far if we keep it siloed, and isolated. We have to be able to open up our data to other stakeholders, partners, or possibly even the public. This is where modern approaches to APIs can help us in some meaningful ways, allowing data stewards to sensibly, and securely open up access to valuable API resources using low-cost web technology. The most common obstacles I see impeding companies, organizations, institutions, and agencies from achieving API success center around restrictive views on data licensing, not being able to separate the data layers by using APIs, and being overly concerned about a loss of power when publishing APIs.
You worked hard to develop the data you have, but you also want to make it accessible. To protect their interests, I see many folks impose pretty proprietary restrictions around their data, which end up hurting its usage and viability in partner systems, and introducing friction when it comes to accessing and putting data to work--when reducing this friction is the thing you really want as a data steward. Let me take a stab at helping you reduce this friction by better understanding how APIs can help you peel back the licensing onion layers when it comes to your valuable data.
Your Valuable Data
While openly licensed data is one important piece of the puzzle, and data should be openly licensed when it makes sense, this is the layer of the discussion where you may want to be a little more controlling about who has access, and how partners and the public are able to put your data to use in their operations.
In an online, always on, digital environment, you want data accessible, but to be able to do this you need to think critically about how you peel back the licensing onion when it comes to this data.
The Schema For Your Data
Ideally, your data already employs predefined schemas like we find at Schema.org, or Open Referral. Following common definitions will significantly widen the audience for any dataset, allowing data to seamlessly be used across a variety of systems. These common schemas are openly licensed, usually put into the public domain.
The schema layer of open data can often resemble the data itself, using machine readable formats like XML, JSON, and YAML. This is most likely the biggest contributing factor for data stewards failing to see this as a separate layer of access from the data itself, and sometimes applying a restrictive license, or forgetting to license it at all.
Data is often more ephemeral than the schema. Ideally, schemas do not change often, are shared and reused, as well as free from restrictive licensing. For system integrations to work, and for partnerships to be sustainable, we need to speak a common language, and schema is how we describe our data so it can be put to use outside our firewall.
Make sure the schema for your data is well-defined, machine readable, and openly licensed for the widest possible use.
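To make this layer a little more concrete, here is a minimal sketch of a machine-readable schema check for a simplified, HSDS-style organization record. The field names and rules here are my own illustrative assumptions, not the official Open Referral specification, and a real implementation would use a full JSON Schema validator:

```python
# A minimal, hand-rolled sketch of schema validation for a simplified,
# HSDS-style "organization" record. Field names are illustrative only.
ORGANIZATION_SCHEMA = {
    "required": ["id", "name"],
    "properties": {
        "id": str,
        "name": str,
        "email": str,
        "url": str,
    },
}

def validate(record, schema):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field in schema["required"]:
        if field not in record:
            problems.append(f"missing required field: {field}")
    for field, expected in schema["properties"].items():
        if field in record and not isinstance(record[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    return problems

record = {"id": "org-1", "name": "Community Food Bank", "email": "info@example.org"}
print(validate(record, ORGANIZATION_SCHEMA))  # → []
```

Because the schema is data rather than code, it can be published, versioned, and openly licensed independently of any one system that uses it.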
Defining Access To Data Using OpenAPI
As with the schema layer of data operations, you are hoping that other companies, organizations, institutions, and government agencies will integrate this layer into their operations. This layer is much more permanent than the ephemeral data layer, and should be well defined, ideally sharing common patterns, and free from restrictive licensing.
Per the Oracle v Google Java API copyright case in the United States, the API layer is subject to copyright enforcement, meaning the naming and ordering of the surface area of your API can be copyrighted. If you are looking for others to comfortably integrate this definition into their operations, it should be openly licensed.
The API layer is not your data. It defines how data will be accessed, and put to use. It is important to separate this layer of your data operations, allowing it to be shared, reused, and implemented in many different ways, supporting web, mobile, voice, bot, and a growing number of API driven applications.
Make sure the API layer to data operations is well-defined, machine readable, and free from restrictive licensing when possible.
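As a sketch of what this separate, openly licensed access layer might look like, here is a small, hypothetical OpenAPI (Swagger 2.0 era) fragment. The paths and parameters are illustrative only, not the official HSDS API definition:

```yaml
# A hypothetical fragment of an OpenAPI definition for the access layer.
# Paths and parameter names are illustrative, not the official HSDS API.
swagger: "2.0"
info:
  title: Human Services API (illustrative)
  version: "1.1"
paths:
  /organizations:
    get:
      summary: List organizations
      parameters:
        - name: query
          in: query
          type: string
          required: false
      responses:
        "200":
          description: A list of organizations
```

Notice that nothing in this definition contains actual data--it only describes how data can be accessed, which is exactly why it can be licensed and shared separately from the data itself.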
Currently, many data providers I talk to see this all as a single entity. It's our data. It's valuable. We need to protect it. Yet at the same time, they really want it put to work in other systems, by partners, or even the public. Because they cannot separate the layers, and understand the need for separate licensing considerations, they end up being very closed with the data, schema, and API layers of their operations--introducing friction at all steps of the application and data life cycle.
Modern approaches to API management and logging at the API layer are how savvy data stewards are simultaneously opening up access, and maintaining control over data, while also increasing awareness around how data is being put to use, or not used. Key-based access, rate limits, and access plans are all approaches to opening up access to data, while maximizing control, and maintaining a desired balance of power between steward, partners, and consumers. In this model, your schema and API definition need to be open, accessible, and shareable, where the data itself can be much more tightly controlled, depending on the goals of the data steward, and the needs of consumers.
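The mechanics behind key-based access and rate limits can be sketched in a few lines. The key, plan name, and limit below are hypothetical; a real deployment would rely on an API gateway or management service rather than hand-rolled code:

```python
# A minimal sketch of the API management layer: key-based access with a
# simple per-key, per-minute rate limit. Keys and plans are hypothetical.
import time

API_KEYS = {"partner-key-123": {"plan": "partner", "limit_per_minute": 2}}
_usage = {}  # api_key -> list of recent request timestamps

def check_access(api_key, now=None):
    """Return (allowed, reason_or_plan) for a request made with the given key."""
    now = time.time() if now is None else now
    plan = API_KEYS.get(api_key)
    if plan is None:
        return False, "unknown API key"
    # Keep only requests inside the last 60-second window.
    window = [t for t in _usage.get(api_key, []) if now - t < 60]
    if len(window) >= plan["limit_per_minute"]:
        return False, "rate limit exceeded"
    window.append(now)
    _usage[api_key] = window
    return True, plan["plan"]

print(check_access("partner-key-123", now=0))  # → (True, 'partner')
print(check_access("nobody", now=0))           # → (False, 'unknown API key')
```

The same logging that powers the rate limit is what gives the data steward awareness of how (and how much) each consumer is putting the data to work.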
Let me know if you want to talk through the separation of these layers of licensing and access around your data resources. I'm all for helping you protect your valuable data, but in a pragmatic way. If you want to be successful in partnering with other stakeholders, you need to be thinking more critically about separating the layers of your data operations, and getting better at peeling back the onion of your data operations--something that seems to leave many folks with tears in their eyes.
I have several volunteers available to do work on Open Referral's Human Services Data Specification (API). I have three developers who are ready to work on some projects, as well as an ongoing stream of potential developers I would like to keep busy working on a variety of implementations. I am focusing attention on the top four cloud platforms that companies are using today: AWS, Azure, Google, and Heroku.
I am looking to develop a rolling wave of projects that will run on any cloud platform, as well as take advantage of the unique features that each provider offers. I've set up Github projects for managing the brainstorming and development of solutions for each of the four cloud platforms:
- AWS - A project site outlining the services, tooling, projects, and communication around HSDS AWS development.
- Azure - A project site outlining the services, tooling, projects, and communication around HSDS Azure development.
- Google - A project site outlining the services, tooling, projects, and communication around HSDS Google development.
- Heroku - A project site outlining the services, tooling, projects, and communication around HSDS Heroku development.
I want to incentivize the development of APIs that follow v1.1 of the HSDS OpenAPI. I'm encouraging PHP, Python, Ruby, and Node.js implementations, but am open to other suggestions. I would like to have very simple API implementations in each language, running on all four of the cloud platforms, with push button (or at least easy) installation from Github for each implementation.
Ideally, we focus on single API implementations, until there is a complete toolbox that helps providers of all shapes and sizes. Then I'd love to see administrative, web search, and other applications that can be implemented on top of any HSDS API. I can imagine the following elements:
- API - Server-side implementations, or API implementations using specialized services available via any of the providers, like Lambda or Google Endpoints.
- Validator - A JSON Schema, and any other suggested validator for the API definition, helping implementations validate their APIs.
- Admin - Develop an administrative system for managing all of the data, content, and media that is stored as part of an HSDS API implementation.
- Website - Develop a website or application that allows data, content, and media within an HSDS API implementation to be searched, browsed and engaged with by end-users.
- Mobile App - Develop a mobile application that allows data, content, and media within an HSDS API implementation to be searched, browsed and engaged with by end-users via common mobile devices.
- Developer Portal - Develop an API portal for managing and providing access to an HSDS API Implementation, allowing developers to sign up, and integrate with an API in their web, mobile, or another type of application.
- Push Button Deployment - The ability to deploy any of the server side API implementations to the desired cloud platform of your choice with minimum configuration.
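To ground the API item in the list above, here is a minimal sketch of what one of these simple server-side implementations could look like in Python, using only the standard library. The path and record fields are illustrative, and a real implementation would follow the full HSDS OpenAPI definition and back the data with a database:

```python
# A minimal, illustrative sketch of a server-side HSDS-style API using only
# the Python standard library. Records and paths are not the official spec.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

ORGANIZATIONS = [
    {"id": "org-1", "name": "Community Food Bank"},
    {"id": "org-2", "name": "Housing Assistance Center"},
]

class HSDSHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/organizations":
            body = json.dumps(ORGANIZATIONS).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404, "Not Found")

# To serve locally on port 8000:
#   HTTPServer(("127.0.0.1", 8000), HSDSHandler).serve_forever()
```

Something this small is obviously not production infrastructure, but it shows how little is actually required to stand up a seed implementation that the other elements (admin, website, portal) can then be built on top of.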
I'm looking to incentivize a buffet of simple API-driven implementations that can be easily deployed by cities, states, and other organizations that help deliver human services. They shouldn't be too complicated, or try to do everything for everyone. Ideally, they are simple, easily deployed infrastructure that can provide a seed for organizations looking to get started with their API efforts.
Additionally, I am looking to understand the realities of running a single API design across multiple cloud platforms. It seems like a realistic vision, but I know it is one that will be much more difficult than my geek brain thinks it will be. Along the way, I'm hoping to learn a lot more about each cloud platform, as well as the nuance of keeping my API design simple, even if the underlying platform varies from provider to provider.
I am working to take an existing API, built on top of an evolving data schema, and move forward a common API definition that 211 providers in cities across the country can put to use in their operations. The goal with the Human Services Data Specification (HSDS) API specification is to encourage interoperability between 211 providers, allowing organizations to better deliver healthcare and other human services at the local and regional level.
So far, I have crafted a v1.0 OpenAPI derived from an existing Code for America project called Ohana, as well as a very CRUD (Create, Read, Update, and Delete) version 1.1 OpenAPI, with a working API prototype for use as a reference. I'm at a very important point in the design process with the HSDS API, and the design choices I make will stay with the project for a long, long time. I wanted to take pause and open up a conversation with the community about what is next with the APIs design.
I am opening up the conversation around some of the usual API design suspects, like how we name paths, use headers, and status codes, but I feel like I should also be asking the big questions around the use of hypermedia API design patterns, or possibly even GraphQL--a significant portion of the HSDS APIs driving city human services will be data intensive, and maybe GraphQL is one possible path forward. I'm not looking to do hypermedia or GraphQL because they are cool; I want them to serve specific business and organizational objectives.
To stimulate this conversation I've created some Github issues to talk about the usual suspects like versioning, filtering, pagination, sorting, status & error codes, but also opening up threads specifically for hypermedia, and GraphQL, and a thread as a catch-all for other API design considerations. I'm looking to stimulate a conversation around the design of the HSDS API, but also develop some API design content that can help bring some folks up to speed on the decision-making process behind the crafting of an API at this scale.
HSDS isn't just the design for a single API, it is the design for potentially thousands of APIs, so I want to get this as right as I possibly can. Or at least make sure there has been sufficient discussion for this iteration of the API definition. I'll keep blogging about the process as we evolve, and settle in on decisions around each of these API design considerations. I'm hoping to make this a learning experience for myself, as well as all the stakeholders in the HSDS project, but also provide a blueprint for other API providers to consider as they are embarking on their API journey, or maybe just the next major version of its release.
The Runscope team recently published a post on a pretty cool approach to using Google Sheets for running API tests with multiple variable sets, which I think is valuable at a couple of levels. They provide a template Google Sheet for anyone to follow, where you can plug in your variables, as well as your Runscope API key, which allows you to define the dimensions of the tests you wish to push to Runscope via their own API.
The first thing that grabs me about this approach is how Runscope is allowing their customers to define and expand the dimensions of how they test their API using Runscope in a way that will speak to a wider audience, beyond just the usual API developer audience. Doing this in a spreadsheet allows Runscope customers to customize their API tests for exactly the scenarios they need, without Runscope having to customize and respond to each individual customer's needs--providing a nice balance.
The second thing that interests me about their approach is the usage of a Google Sheet as a template for making API calls, whether you are testing your APIs, or any other scenario an API enables. This type of templating of API calls opens up the API client to a much wider audience, making integration copy and pastable, shareable, collaborative, and something anyone can reverse engineer and learn about the surface area of an API--in this scenario, it just happens to be the surface area of Runscope's API testing API.
Runscope's approach is in alignment with my previous post about sharing data validation examples. A set of assertions could be defined within a spreadsheet, and any stakeholder could use the spreadsheet to execute them and make sure the assertions are met. This would have huge implications for the average business user, helping make sure API contracts are meeting business objectives. I'm considering using this approach to empower cities, counties, and states to test and validate human services API implementations as part of my Open Referral work.
I told John Sheehan, the CEO of Runscope, that their approach was pretty creative, and he said that "Google sheets scripts are underrated" and that Google Sheets is the "API client for the everyperson". I agree. I'd like to see more spreadsheet templates like this used across the API life cycle when it comes to design, deployment, management, testing, monitoring, and every other area of API operations. I'd also like to see more spreadsheet templates available for making calls to other common APIs, making APIs accessible to a much wider audience, who are familiar with spreadsheets, and more likely to be closer to the actual problems which API solutions are designed to solve.
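The spreadsheet-as-test-client idea can be sketched in miniature: each row of a sheet defines one variable set, and the same assertion runs once per row. Everything below (the endpoints, the column names, the stand-in HTTP call) is hypothetical, not Runscope's actual template or API:

```python
# A sketch of spreadsheet-driven API testing: each row of a sheet (exported
# here as CSV text) is one variable set, and one check runs per row.
import csv
import io

SHEET = """endpoint,expected_status
/organizations,200
/locations,200
/missing,404
"""

def fake_call(endpoint):
    """Stand-in for a real HTTP call; returns a canned status code."""
    known = {"/organizations": 200, "/locations": 200}
    return known.get(endpoint, 404)

def run_tests(sheet_csv):
    """Run one status-code check per spreadsheet row."""
    results = []
    for row in csv.DictReader(io.StringIO(sheet_csv)):
        status = fake_call(row["endpoint"])
        results.append((row["endpoint"], status == int(row["expected_status"])))
    return results

print(run_tests(SHEET))
# → [('/organizations', True), ('/locations', True), ('/missing', True)]
```

The point is that the test dimensions live in a format any business user can edit, while the execution logic stays fixed--the same balance Runscope strikes with their Google Sheet template.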
This is an article from the current edition of the API Evangelist industry guide to API definitions. The guide is designed to be a summary of the world of API definitions, providing the reader with a recent summary of the variety of specifications that are defining the technology behind almost every part of our digital world.
The U.S. Data Federation is a federal government effort to facilitate data interoperability and harmonization across federal, state, and local government agencies by highlighting common data formats, API specifications, and metadata vocabularies. The project focuses on coordinating interoperability across government agencies by showcasing and supporting use cases that demonstrate unified and coherent data architectures across disparate agencies, institutions, and organizations.
The project is designed to shine a light on “emerging data standards and API initiatives across all levels of government, convey the level of maturity for each effort, and facilitate greater participation by government agencies”--definitely in alignment with the goal of this guide. There are currently seven projects profiled as part of the U.S. Data Federation, including Building & Land Development Specification, National Information Exchange Model, Open Referral, Open311, Project Open Data, Schema.org, and the Voting Information Project.
By providing a single location for agencies to find common schema documentation tools, schema validation tools, and automated data aggregation and normalization capabilities, the project is hoping to incentivize and stimulate reusability and interoperability across public data and API implementations. Government agencies of all shapes and sizes can use the common blueprints available in the U.S. Data Federation to reduce costs, speed up the implementation of projects, while also opening them up for augmenting and extending using their APIs, and common schema.
It is unclear what resources the U.S. Data Federation will have available in the current administration, but it looks like the project is just getting going, and intends to add more specifications as they are identified. The model reflects an approach that should be federated and evangelized at all levels of government, but also provides a blueprint that could be applied in other sectors like healthcare, education, and beyond. Aggregating common data formats, API specifications, metadata vocabularies, and authentication scopes will prove to be critical to the success of the overall climate of almost any industry doing business on the web in 2017.
If you have a product, service, or story you think should be in the API Evangelist industry guide to API design you can email me, or you can submit a Github issue for my API definition research, and I will consider adding your suggestion to the road map.
Building on earlier stories about how my API partners are making API deployment more modular and composable, and pushing forward my understanding of what is possible with API deployment, I'm looking into the details of what DreamFactory enables when it comes to API deployment. "DreamFactory is a free, Apache 2 open source project that runs on Linux, Windows, and Mac OS X. DreamFactory is scalable, stateless, and portable" -- making it a pretty good candidate for running it wherever you need.
After spending time at Google and hearing about how they want to enable multi-cloud infrastructure deployment, I wanted to see how my API service provider partners are able to actually power these visions of running your APIs anywhere, in any infrastructure. Using DreamFactory you can deploy your APIs using Docker, Kubernetes, or directly from a Github repository, something I'm exploring as standard operating procedure for government agencies, like we see with 18F's US Forest Service ePermit Middlelayer API--in my opinion, all federal, state, and local government should be able to deploy API infrastructure like this.
One of the projects I am working on this week is creating a base blueprint of what it will take to deploy a human services API for any city in Google or Azure. I have a demo working on AWS already, but I need a basic understanding of what it will take to do the same in any cloud environment. I'm not in the business of hosting and operating APIs for anyone, let alone for government agencies--this is why I have partners like DreamFactory, who I can route specific projects as they come in. Obviously, I am looking to support my partners, as they support me, but I'm also looking to help other companies, organizations, institutions, and government agencies better leverage the cloud providers they are already using.
I'll share more stories about how I'm deploying APIs to AWS, as well as Google and Azure, as I do the work over the next couple of weeks. I'm looking to develop a healthy toolbox of solutions for government agencies to use. This week's project is focused on the human services data specification, but next week I'm going to look at replicating the model to allow for other Schema.org vocabulary, providing simple blueprints for deploying other common APIs like products, directories, and link listings. My goal is to provide a robust toolbox of APIs that anyone can launch in AWS, Google, and Azure, with a push of a button--eventually.
This is an article from the current edition of the API Evangelist industry guide to API definitions. The guide is designed to be a summary of the world of API definitions, providing the reader with a recent summary of the variety of specifications that are defining the technology behind almost every part of our digital world.
A lot of attention is given to APIs and the world of startups, but in 2017 this landscape is quickly shifting beyond just the heart of the tech space, with companies, organizations, institutions, and government agencies of all shapes and sizes putting APIs to work. API definitions are being applied to the fundamental building blocks of the tech sector, quantifying the computational, storage, images, videos, and other essential resources powering web, mobile, and device based applications. This success is now spreading to other sectors, defining other vital resources that are making a real impact in our communities.
One API making an impact in communities is the Human Services Data Specification (HSDS), also known as the Open Referral Ohana API. The project began as a Code for America project, providing an API, website, and administrative system for managing the organizations, locations, and the human services that communities depend on. Open Referral, the governing organization for HSDS and the Ohana API, is working with API Evangelist and other partners to define the next generation of the human services data specification and API definition, as well as the next generation of API, website, admin, and developer portal implementations.
The HSDS API isn't about any single API; it is a suite of API-first definitions, schema, and open tooling that cities, counties, states, and federal government agencies can download or fork, and employ to help manage vital human services for their communities. It provides not just a website for finding vital services, but a complete API ecosystem that can be deployed, incentivizing developers to build important web, mobile, and other applications on top of a central human services system--better delivering on the mission of human services organizations, and meeting the demands of their constituents.
This approach to delivering APIs centers around a common data schema, extending it as an OpenAPI Spec definition, describing how that data is accessed and put to use across a variety of applications, including a central website and administrative system. While server-side HSDS API implementations, website, mobile, administrative, developer portal, and other implementations are important, the key to the success of this model is a central OpenAPI definition of the HSDS API. This definition connects all the implementations within an API’s ecosystem, but it also provides the groundwork for a future where all human services implementations are open and interoperable with other implementations--establishing a federated network of services meeting the needs of the communities they serve.
Right now each city is managing one or multiple human service implementations. Even though some of these implementations operate in overlapping communities, few of them are providing 3rd party access, let alone providing integration between overlapping geographic regions. The HSDS API approach employs an API-first approach, focusing on the availability and access of the HSDS schema, then adding on a website, administrative and API developer portals to support. This model opens up human services to humans via the website, which is integrated with the central API, but then also opens up the human services for inclusion into other websites, mobile and device applications, as well as integration with other systems.
The HSDS OpenAPI spec and schema provide a reusable blueprint that can be used to standardize how we provide human services. The open source approach to delivering definitions, schema, and code reduces the cost of deployment and operation for cash-strapped public organizations and agencies. The API-first approach to delivering human services also opens up resources for inclusion in our applications and system, potentially outsourcing the heavy lifting to trusted partners, and 3rd party developers interested in helping augment and extend the mission of human service organizations and groups.
If you’d like to learn more about the HSDS API you can visit Open Referral. From there you can get involved in the discussion, and find existing open source definitions, schema, and code for putting HSDS to work. If you’d like to contribute to the project, there are numerous opportunities to join the discussion about next generation of the schema and OpenAPI Spec, as well as develop server-side and client-side implementations.
If you have a product, service, or story you think should be in the API Evangelist industry guide to API design you can email me, or you can submit a Github issue for my API definition research, and I will consider adding your suggestion to the road map.
I spent a day last week at the Google Community Summit, learning more about the Google Cloud road map, and one thing I kept hearing them focus on was the notion of being able to operate on any cloud platform--not just Google. It's a nice notion, but how realistic is it to think we could run seamlessly on any of the top cloud platforms--Google, AWS, and Microsoft?
The concept is something I'll be exploring more with my Open Referral, Human Services Data Specification (HSDS) work. It's an attractive concept, to think I could run the same API infrastructure in any of the leading cloud platforms. I see two significant hurdles in accomplishing this: 1) Getting the developer and IT staff (me) up to speed, and 2) Ensuring your databases and code all run and scale seamlessly on whichever platform you operate. I guess I'd have to add 3) Ensuring your orchestration and continuous integration work seamlessly across all platforms you operate on.
I am going to get to work deploying an HSDS compliant API on each of the platforms. My goal is to have just a simple yet complete API infrastructure running on Amazon, Google, and Microsoft. It is important to me that these solutions provide a complete stack helping me manage DNS, monitoring, and other important aspects. I'm also looking for there to be APIs for managing all aspects of my API operations--this is how I orchestrate and continuously integrate the APIs which I roll out.
Along with each API that I publish, I will do a write up on what it took to stand up each one, including the cloud services I used, and their API definitions. I am pretty content (for now) on the AWS platform, leveraging Github Pages as the public facade for my projects, and each repository acting as the platform gears of API code, and definitions. Even though I'm content where I am, I want to ensure the widest possible options are available to cities, and other organizations who are looking to deploy and manage their human service APIs.
I was handed the URL for a human services API implementation for Miami. It was now my job to deploy a portal, documentation, and other supporting resources for the API implementation. This project is part of the work I'm doing with Open Referral to help push forward the API conversation around the human services data specification (HSDS).
I got to work forking my minimum viable API portal definition, to provide a doorway for the Miami Open211 API. Next, I got to work on setting up a basic presence for the human services API. I started with giving the portal a title, and a basic description of what the service does, then I got to work on each of the portal elements that will help people put the data to work.
It can be hard to figure out what you need to get going with an API and cut through all the information available. The portal has a getting started page providing a basic introduction, a handful of links to the documentation, code, and where to get help--the page is driven from a YAML data store available in the _data folder for the repository.
I included an authentication page to make it clear that the API is publicly available, but also provide a placeholder to explain that we will be opening up write access to the organizations, locations, and services that are being made available--the page is driven from a YAML data store available in the _data folder for the repository.
Frequently Asked Questions
Next, I wanted to always have the most frequently asked questions front and center where anyone can find them. I am using this page as a default place to publish any questions asked via Github, Twitter, or email. The page is driven from a YAML data store available in the _data folder for the repository.
To help jumpstart integration, I also generated and published a Postman Collection, so that anyone can quickly import it into their client, and get to work playing with the API in their environment. You can do this with OpenAPI also, but Postman helps extend the possibilities--the Postman Collection is editable via its Github page.
Next, I published a road map so we could share what is next for the project, providing a list where developers can stay in tune with what is going to happen. The road map entries are pulled from the Github issues for any entry with the road map label. There is a script that can be run regularly to keep the issues in sync with the roadmap, and the page is driven from a YAML data store available in the _data folder for the repository.
Similar to the road map, I created a page for sharing any open Github issues labeled 'issues', to help communicate outstanding and known issues with the platform. It stays in sync using the same script, and the page is driven from a YAML data store available in the _data folder for the repository.
In addition to a road map and known issues, when these items get accomplished or fixed they get moved to the change log, keeping a complete history of everything that has changed with the platform. It stays in sync using the same script, and the page is driven from a YAML data store available in the _data folder for the repository.
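The sync script that routes Github issues into the road map, known issues, and change log data stores can be sketched roughly like this. The issue structure mirrors what the Github API returns; the actual fetching and YAML writing are left out, and the sample titles and labels are hypothetical:

```python
# A sketch of the sync script described above: take Github issues and
# route them into the portal's YAML data stores by label and state.
# Fetching from the Github API and writing YAML are omitted here.
issues = [
    {"title": "Add POST /services/", "state": "open", "labels": ["roadmap"]},
    {"title": "Fix paging on /locations/", "state": "open", "labels": ["issues"]},
    {"title": "Publish getting started page", "state": "closed", "labels": ["roadmap"]},
]

def entries_for(label, state):
    """Select issue titles for a given data store (road map, issues, change log)."""
    return [i["title"] for i in issues if label in i["labels"] and i["state"] == state]

roadmap = entries_for("roadmap", "open")       # drives _data/roadmap.yaml
known_issues = entries_for("issues", "open")   # drives _data/issues.yaml
changelog = entries_for("roadmap", "closed")   # completed items move to the change log
print(roadmap, known_issues, changelog)
```

Running this on a schedule keeps the three pages in sync with the Github issues, so project communication happens in one place.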
Terms of Service
For the terms of service, I just grabbed an open source copy from Wikidot, providing a baseline place to start when it comes to the TOS for the API--the terms of service is editable via its Github page.
The blog for the project is driven by the Jekyll framework for the developer portal hosted on Github Pages. To manage the blog entries, you just add or update pages in the _posts folder for the website. All entries in the _posts folder are listed in chronological order on the blog page for the developer portal.
This developer portal runs 100% on Github and leverages the potential of Jekyll when running on Github. The API is hosted on Heroku and run by someone else, but the developer portal is a static website, completely editable via Github through the web interface, API, or locally with the desktop client. Github also provides much of the support framework for the project, driving the roadmap, issues, change log, and 1/3 of the support options for developers--the entire site is driven from the _data store, with the website just being a Liquid-driven Jekyll template.
This developer portal is defined by its OpenAPI definition. It drives the documentation, generates the code samples, fires up the API Science monitors, and is the central contract defining the API's operation. I will be keeping the OpenAPI up to date, and using it as the central truth for the API and its operations.
The portal is entirely indexed using APIs.json, providing a single machine-readable definition of the API and its operations. All the supporting pages of the API are linked to in the index, and their contents and data are all machine-readable YAML and JSON. The APIs.json provides access to the OpenAPI which describes the surface area of the API, as well as providing links to all its supporting operations.
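A stripped-down sketch of what that APIs.json index might contain is below, represented as a Python structure. The URLs are placeholders, not the actual Miami Open211 locations:

```python
# A minimal, hypothetical APIs.json index for the portal. URLs are
# placeholders; the real index lives at the root of the developer portal.
apis_json = {
    "name": "Miami Open211",
    "description": "Human services API developer portal",
    "apis": [
        {
            "name": "Miami Open211 API",
            "baseURL": "https://api.example.com/",
            "properties": [
                {"type": "Swagger", "url": "https://example.com/openapi.json"},
                {"type": "x-documentation", "url": "https://example.com/documentation/"},
            ],
        }
    ],
}

# Machine-readable: a client can discover the OpenAPI definition from the index.
swagger = [p["url"] for p in apis_json["apis"][0]["properties"] if p["type"] == "Swagger"]
print(swagger[0])
```

Because the index is machine readable, tooling can discover not just the API itself, but documentation and other supporting operations without any human intervention.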
What Is Next?
I'm going to put things down for a couple days. We still need some FAQs entered, and the content needs fluffing up and editing. Then we'll invite other folks to take a look at it. Then I will get to work on the POST, PUT, PATCH, and DELETE paths, and once those are certified I will publish them as part of the OpenAPI, regenerate the code samples, and turn on the ability for people to get involved not just in reading data, but also potentially adding and updating data in the database--making things a group effort.
I'm going to take what I have here, and fork it into a new project, making it a baseline demo portal for Open Referral. My goal is to have a downloadable, forkable API portal that is well documented, and anyone providing an HSDS compliant API can put to use for their project. I just wanted to take a moment and gather my thoughts on what I did and share the approach I took with you.
I am keeping a version of an OpenAPI Spec in sync with a Data Package. It's not a perfect sync because the Data Package doesn't describe the surface area of the API, just the underlying data schema used on the backend. During project discussions, one of the concerns that was brought up focused on the loss of primary and foreign key references. During our next discussion, I wanted to have a more coherent explanation of why this was ok, and this post will help me do that.
The OpenAPI Spec I've created has each resource in the Data Package represented but leaves out the database relationships represented by those keys in the backend. The API defines the basic CRUD (Create, Read, Update and Delete) for each resource represented, but allows the relationships to be expressed using the URI. Each item in the Data Package has a corresponding path, and each relationship is available as its own path as well--in this case an example might be /locations, and /locations/services/.
All the relationships are defined and enforced in the URI given for each API request, and HTTP takes care of the indexing, performance, and other considerations using caching, and other basic building blocks of the web. My friend James Higginbotham (@launchany) compared this to the concept of views in database backend speak, with OpenAPI Spec describing the HTTP version of OCI (Oracle), or TSQL (MS SQL)--depending on what you speak. As an old database guy I like that, "web views", but relying on the requests and responses employed as part of the API design.
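A small sketch can show how a foreign key survives as a URI path rather than as a database reference. The sample data and routing logic here are hypothetical, just enough to illustrate the /locations and /locations/{id}/services/ pattern described above:

```python
# A sketch of database relationships expressed as URI paths rather than
# foreign keys. Sample data and paths are illustrative assumptions.
locations = {"1": {"name": "Main Office"}}
services = {"10": {"name": "Food Pantry", "location_id": "1"}}

def route(path):
    """Resolve a request path the way the HSDS API design expresses relationships."""
    parts = [p for p in path.split("/") if p]
    if parts == ["locations"]:
        return list(locations.values())
    if len(parts) == 3 and parts[0] == "locations" and parts[2] == "services":
        # The foreign key becomes a nested path: /locations/{id}/services/
        return [s for s in services.values() if s["location_id"] == parts[1]]
    return None

print(route("/locations/1/services/")[0]["name"])  # the relationship, via the URI
```

The consumer never sees a primary or foreign key as such; the relationship is navigable in the URL itself, which is the "web views" idea in practice.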
My explanation isn't as coherent as I'd like, but this gives me a start. I'm trying to help database folks who are the keepers of the backend and the schema better understand that what I'm doing with OpenAPI Spec augments and evolves their work. I do not want them to think I am looking to replace or compete with what they are bringing to the table. I'll keep working on this argument because I want to be able to better articulate why API design is an important part of the process, and that defining the API surface area using OpenAPI Spec, as well as the data model in play using specifications like JSON Schema and Data Package, is an important thing.
I am working with Open Referral to evolve the schema for the delivery of human services, as well as helping craft a first draft of the OpenAPI Spec for the API definition. The governing organization is looking to take this to the next level, but there are also a handful of the leading commercial providers at the table, as well as other groups closer to the municipalities who are implementing and managing Open211 human service implementations.
I was working with Open Referral on this before checking out this last summer, and would like to help steward the process, and definition(s) forward further in 2017. This means that we need to speak using a common language when hammering out this specification and be using a common platform where we can record changes, and produce a resulting living document. I will be borrowing from existing work I've done on API definitions, schema, and scope across the API space, and putting together a formal process designed specifically for the very public process of defining and delivering human services at the municipal level.
I use OpenAPI Spec (openapis.org) as an open, machine readable format to drive this process. It is the leading standard for defining APIs in 2017, and now is officially part of the Linux Foundation. OpenAPI Spec provides all stakeholders in the process with a common language when describing the Open Referral in JSON Schema, as well as the surface area of the API that handles responses & requests made of the underlying schema.
I have an OpenAPI Spec from earlier work on this project, with the JSON version of the machine-readable definition, as well as a YAML edition--OpenAPI Spec allows for JSON or YAML editions, which helps the format speak to a wider, even less technical audience. These current definitions are not complete agreed upon definitions for the human services specification, and are just meant to jumpstart the conversation at this point.
OpenAPI Spec provides us with a common language to use when communicating around the API definition and design process with all stakeholders, in a precise, and machine readable way. OpenAPI Spec can be used to define the master Open Referral definition, as well as the definition of each individual deployment or integration. This opens up the opportunity to conduct a "diff" between the definitions, showing what is compatible, and what is not, at any point in time.
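The "diff" between definitions can be as simple as comparing the paths in the master Open Referral definition against those in an individual deployment. The definitions below are minimal, hypothetical fragments used only to show the mechanics:

```python
# A sketch of the "diff" described above: compare the paths in a master
# Open Referral definition against an individual implementation.
# Both definitions here are minimal, hypothetical fragments.
master = {"paths": {"/organizations/": {}, "/locations/": {}, "/services/": {}}}
implementation = {"paths": {"/organizations/": {}, "/locations/": {}}}

missing = sorted(set(master["paths"]) - set(implementation["paths"]))
extra = sorted(set(implementation["paths"]) - set(master["paths"]))
print("missing:", missing)  # what the implementation still needs for compliance
print("extra:", extra)      # what it adds beyond the master definition
```

A real diff would also walk parameters, responses, and schema, but even this path-level comparison shows at a glance what is compatible, and what is not, at any point in time.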
The platform I will be using to facilitate this discussion is Github, which provides the version control, "diff", communication, and user interactions that will be required throughout the lifecycle of the Open Referral specification, allowing each path, parameter, response, request, and other elements to be discussed independently, with all history present. Github also provides an interesting opportunity for developing other tools, like I have for annotating the API definition as part of the process.
This approach to defining a common data schema and API definition requires that all stakeholders involved become fluent in OpenAPI Spec, and JSON Schema, but is something that I've done successfully with other technical, as well as non-technical teams. This process allows us all to be on the same page with all discussion around the Open Referral API definition and schema, with the ability to invite and include new participants in the conversation at any time using Github's existing services.
Once a formal draft API specification + underlying JSON schema for Open Referral is established, it will become the machine readable contract and act as a central source of truth regarding the API definition as well as the data model schema. It is a contract that humans can follow, as well as be used to drive almost every other stop along the API life cycle like deployment, mocking, management, testing, monitoring, SDKs, documentation, and more.
This process is not unique to Open Referral. I want to be as public with the process to help other people, who are working to define data schema, understand what is possible when you use APIs, OpenAPI Spec, JSON Schema, and Github. I am also looking to reach the people who do the hard work of delivering human services on the ground in cities and help them understand what is possible with Open Referral. Someday I hope to have a whole suite of server-side, and client-side tooling developed around the standard, empowering cities, organizations, and even commercial groups to deliver human services more effectively.
This is my walk-through of the concepts involved with the monetization of public data using APIs. In this work I am not advocating that companies should be mindlessly profiting from publicly available data, my intent is to provide a framework for organizations to think through the process of generating revenue from commercial access to public data, acknowledging that it costs money to aggregate, serve up, and keep data up to date and usable for the greater public good--if public data is not accessible, accurate, and up to date it is of no use to anyone.
I have long argued that companies and even government agencies should be able to charge for commercial access to public data and be able to generate revenue to cover operational costs, and even produce much-needed funds that can be applied to the road map. My work in this has been referenced in existing projects, such as the Department of Interior and Forest Service looking to generate revenue from commercial access and usage of public data generated by the national parks systems. In my opinion, this conversation around generating revenue from publicly available digital assets should be occurring right alongside the existing conversations that already are going on around publicly available physical assets.
Building Upon The Monetization Strategies Of Leading Cloud Providers
My thoughts around generating revenue from public open data are built upon monitoring the strategies of leading online platforms like Amazon Web Services, Google, and others. In 2001 a new approach to providing access to digital resources began to emerge from Internet companies like Amazon and Salesforce, and by 2016 it had become a common way for companies to do business online, providing metered, secure access to valuable corporate and end-user data, content, and other algorithmic resources. This research looks to combine these practices into a single guide that public data stewards can consider as they look to fund their important work.
Do not get me wrong, there are many practices of leading tech providers that I am not looking to replicate when it comes to providing access to public data, let alone generating revenue. Much of the illness in the tech space right now is not due to the usage of APIs; it is due to a lack of creative approaches to monetizing digital assets like data and content, and terms of service that do not protect the interests of users. My vantage point is the result of six years studying the technology, business, and politics of the API sector, while also working actively on open data projects within city, state, and federal government--I'm looking to strike a balance between these two worlds.
Using Common Government Services As A Target For Generating Much-Needed Revenue
For this research, I am going to use a common example of public data: public services. I am focusing on this area specifically to help develop a strategy for Open Referral, but it is also a potential model that I can see working beyond just public services. I am looking to leverage my existing Open Referral work to help push this concept forward, but at the same time, I am hoping it will also provide me with some public data examples that are familiar to all of my readers, giving me some relevant ways to explain some potentially abstract concepts like APIs to the average folk we need to convince.
For the sake of this discussion, let's narrow things down and focus on three areas of public data, which could be put to work in any city across the country:
- Organizations - The business listings and details for public and private sector agencies, businesses, organizations, and institutions.
- Locations - The details of the specific locations at which organizations are providing access to public services.
- Services - The listings and details of public services offered at the municipal, county, state, or federal levels of government.
Open Referral is a common definition for describing public services organizations, locations, and services, allowing the government, organizations, institutions, and companies to share data in a common way, which focuses on helping them better serve their constituents--this is what public data is all about, right? The trick is getting all players at the table to speak a common language, one that serves their constituents, and allows them to also make money.
While some open data people may snicker at me suggesting that revenue should be generated on top of open data describing public services, the reality is that this is already occurring--there are numerous companies in this space. The big difference is it is currently being done within silos, locked up in databases, and only accessible to those with the privileges and the technical expertise required. I am looking to bring the data, and the monetization out of the shadows, and expand on it in a transparent way that benefits everybody involved.
Using APIs To Make Public Data More Accessible and Usable In A Collaborative Way
Publicly available data plays a central role in driving websites, mobile applications, and system to system integrations, but simply making this data available for download only serves a small portion of these needs, and often does so in a very disconnected way, establishing data silos where data is duplicated, and the accuracy of data is often in question. Web APIs are increasingly being used to make data not just available for downloading, but also allow it to be updated, and deleted in a secure way, by trusted parties.
For this example I am looking to provide three separate API paths, which will give access to our public services data:
- http://example.com/organizations/ - Returns JSON or XML listing and details of organizations for use in other applications.
- http://example.com/locations/ - Returns JSON or XML listing and details of organizational locations for use in other applications.
- http://example.com/services/ - Returns JSON or XML listing and details of public services for use in other applications.
A website provides HTML information for humans, and web APIs provide machine readable representations of the same data, making it open for use in a single website, but also potentially multiple websites, mobile applications, visualizations, and other important use cases. The mandate for public data should ensure it isn't just available on a single website, but in as many scenarios that empower end-users as possible. This is what APIs excel at, but it is also something that takes resources to do properly, making the case for generating revenue to properly fund the operation of APIs in the service of the public good.
The Business of Public Data Using Modern Approaches to API Management
One of the common misconceptions of public web APIs is that they are open to anyone with access to the Internet, with no restrictions. This might be the case for some APIs, but increasingly government agencies, organizations, and institutions are making public data available securely using common API management practices defined by Internet pioneers like Salesforce, Amazon, and Google over the last decade.
API management practices provide some important layers on top of public data resources, allowing for a greater understanding and control over how data is accessed and put to use. I want to provide an overview of how this works before I dive into the details of this approach by outlining some of the tenets of an API management layer:
- Users - Requiring users to register, establishing a unique account for associating all API and public data activity.
- Applications - Requiring users to define the application (the web, mobile, visualization, etc.) and other relevant information regarding their access to the public data.
- Keys - Issuing of unique API keys for each application, requiring their inclusion in all consumption of public data via the API.
- Service Composition - Placement of public data resource (organizations, locations, services) into tiers, defining which APIs different users have access to and the scope of that access.
- Resource Specific - Allowing access to specific data resources to a select audience.
- Read / Write - Restricting write access to select users and applications.
- Data Specific - Limiting which data is returned, filtering based on who is requesting it.
- Rate Limits - All APIs are rate limited, allowing for different levels of access to public data resources, which can be defined in alignment with the costs associated with operations.
- Logging - Each API call is logged, with the required user and application keys, as well as details of the request and response associated with each API call.
- Analytics - The presence of a variety of visual layers that establish an awareness of who is accessing public data APIs, what they are accessing, and details on how and where it is being applied.
These seven areas provide some very flexible variables which can be applied to the technical, business, and politics of providing access to public data using the Internet. Before you can access the organizations, locations, and service information via this example public services API you will need to be a registered user, with an approved application, possessing valid API keys. Each call to the API will contain these keys, identifying which tier of access an application is operating within, which API paths are available, the rate limits in existence, and logging of everything you consume and add so it can be included as part of any operational analytics.
This layer enables more control over public data assets, while also ensuring data is available and accessible. When done thoughtfully, this can open up entirely new approaches to monetization of commercial usage by allowing for increased rate limits, performance, and service level agreements, which can be used to help fund the public data's mandate to be accessible by the public, researchers, and auditors.
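The management layer described above can be sketched as a simple gate that sits in front of every request: identify the application by its key, enforce the rate limit for its tier, and log the call. The tier names, limits, and key values here are assumptions for illustration only:

```python
import time

# A minimal sketch of an API management layer: key checks, tiered rate
# limits, and logging in front of a public data API. Tier names, limits,
# and keys are illustrative assumptions.
keys = {"abc123": {"app": "city-dashboard", "tier": "public"}}
tiers = {"public": {"rate_limit": 2}, "partner": {"rate_limit": 1000}}
log, counts = [], {}

def handle(api_key, path):
    """Gate every request: identify the app, enforce its tier, log the call."""
    if api_key not in keys:
        return 401, "invalid key"
    tier = keys[api_key]["tier"]
    counts[api_key] = counts.get(api_key, 0) + 1
    if counts[api_key] > tiers[tier]["rate_limit"]:
        return 429, "rate limit exceeded"
    log.append({"key": api_key, "path": path, "time": time.time()})
    return 200, "ok"

print(handle("abc123", "/services/"))
print(handle("abc123", "/locations/"))
print(handle("abc123", "/organizations/"))  # third call exceeds the public tier
```

The log that accumulates behind this gate is what feeds the analytics layer, and moving an application from the public tier to a partner tier is just a change of one value, which is how commercial access can be metered and monetized without restricting the baseline public access.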
Providing The Required Level Of Control Over Public Data Access
Understandably, there are concerns when it comes to publishing data on the Internet. Unless you have experience working with modern approaches to delivering APIs it can be easy to focus on losing control over your data when publishing on the web--when in reality data stewards of public data can gain more control over their data by using APIs than by just publishing it for complete download. There are some distinct ways that API providers are leveraging modern API management practices to gain greater levels of control over who accesses data, and how it is put to use.
I wanted to highlight what can be brought to the table by employing APIs in service of public data, to help anyone make the argument for why providing machine readable data via APIs is just as important as having a website in 2016:
- Awareness - Requiring all data to be accessed via APIs which require keys to be used for ALL applications, combined with a comprehensive logging strategy, brings a new level of awareness regarding which data is accessed, and how it is being used, or not used.
- Security - While API keys are primarily used for logging and analytics, they also ensure that all public data resources are secured, providing tiered levels of access to 3rd parties based upon trust, contributing value to the data, and overall platform involvement--allowing data to be securely made available on the open web.
- Quality Control - APIs act as a central gatekeeper regarding how data is updated, evolved, and deleted, allowing for a centralized, yet also potentially distributed, and collaborative approach to ensuring public data is accurate, possessing a high level of quality control.
- Terms of Service - All API usage is governed by the legal terms of service laid out as part of platform operations, requiring all users to respect and follow terms of service if they expect to maintain their public data API keys.
- Governance - Opening up the overall management of the availability, usability, integrity, and security of the public data, which may include oversight from a governing body or council, a defined set of procedures, and a plan to execute those procedures.
- Provenance - Enabling visibility into the origins, history, and oversight of public data, helping establish the chain of custody regarding shared use of valuable data across platform operations.
- Observability - Allowing for the observability of data resources, and their contributors and consumers, using existing platform outputs and mechanisms, enabling high levels of awareness through the API management framework employed as part of platform operations, meeting service level agreements, and expected availability.
It is important to discuss, and quantify this control layer of any public data being made available via APIs if we are going to talk about monetization. Having APIs is not enough to ensure platform success, and sometimes too strict of control can suffocate consumption and contribution, but a lack of some control elements can also have a similar effect, encouraging the type of consumption and contribution that might not benefit a platform's success. A balanced approach to control, with a sensible approach to management and monetization, has helped API pioneers like Amazon achieve new levels of innovation, and domination using APIs--some of this way of thinking can be applied to public data by other organizations.
Enabling and Ensuring Access To Public Data For Everyone It Touches
Providing access to data through a variety of channels for commercial and non-commercial purposes is what modern API management infrastructure is all about. Shortly after possessing a website became normal operating procedure for companies, organizations, institutions, and government agencies, web APIs began to emerge to power networks of distributed websites, embeddable widgets, and then mobile applications for many different service providers. APIs can provide access to public data, while modern API management practices ensure that access is balanced and in alignment with platform objectives--resulting in the desired level of control discussed above.
There are a number of areas of access that can be opened up by employing APIs in the service of public data:
- Internal - APIs can be used by all internal efforts, powering websites, mobile applications, and other systems. The awareness, logging, and other benefits can just as easily be applied to internal groups, helping understand how resources are used (or not used) across internal groups.
- Partner - After internal access to data resources, access levels can be crafted to support partner tiers of access, which might include access to special APIs, read and write capabilities, and relaxing of rate limits. These levels often include service level agreements, additional support options, as well as other benefits.
- Public - Data can be made publicly available using the web, while also maintaining the quality and security of the data, keeping access as frictionless as possible, while ensuring things stay up and running, and of expected quality and availability.
- Privacy - Even with publicly available data there is a lot to consider when it comes to the privacy of organizations, locations, and services involved, but also the logging, and tracking associated with platform operations.
- Transparency - One important characteristic of an API platform is transparency in the API management layer, being public with the access tiers, resources available, and how a platform operates--without the necessary transparency, consumers can become distrustful of the data.
- Licensing - Ideally all data and all schema in this scenario would be licensed as CC0, putting them into the public domain, but if there are license considerations, these requirements can be included along with each API response, as well as in platform documentation.
- Platform Meta API - APIs do not just provide access to the public data, they also provide access to the API management layer for the public data. Modern API management allows for API access to the platform in several important ways:
- Users - Providing API access to user's data and usage.
- Apps - Providing API access to application level data and usage.
- Tiers - Providing API access to platform tiers and details.
- Logs - Providing API access to the platform log files.
- Billing - Providing API access to the platform billing for access.
- Analytics - Providing API access to the analytics derived from logs, billing, and usage.
- Observability - An API management layer on top of public data makes data access observable, allowing platform operators, government agencies, and potentially 3rd party and independent auditors to see how data is accessed and put to use--observability will define both the control as well as access to vital public data resources.
In a world that is increasingly defined by data, access to quality data is important, and easy, secure access via the Internet is part of the DNA of public data in this world. API management provides a coherent way to define access to public data, adhering to the mandate that the data is accessible, while also striking a balance to ensure the quality, reliability, and completeness of the public data.
There have been a lot of promises made in the past about what open or public data can do by default, when in reality opening up data is not a silver bullet for public services, and there is a lot more involved in successfully operating a sustained public data operation. APIs help ensure data resources are made available publicly, while also opening up some new revenue generation opportunities, helping ensure access is sustainable and continues to provide value--hopefully finding a balance between the public good and any sensible commercial aspirations that may exist.
APIs Open Up Many Potential Applications That Support the Mission
As doing business on the web became commonplace in the early 21st century, Amazon realized that it could enable the sale of its books and other products on the websites of its affiliate partners by using APIs. In 2016 there are many additional applications being developed on top of APIs, with delivering public data to multiple websites being just the beginning.
- Web - It is common for websites to pull from a database. Increasingly APIs are being used to drive not just a single website, but networks, and distributed websites that draw data and content from a variety of sources.
- Mobile - APIs are used to make data and content available across a variety of mobile applications, on different platforms.
- Embeddable - Delivering data to buttons, badges, bookmarklets, and widgets that can be embedded across a variety of websites, and applications.
- Conversational - Using data in conversational interfaces like for bots, messaging, and voice-enabled applications.
- Visualizations - Including data in visualizations, showing API consumption, and platform usage around public data.
- iPaaS / ETL - Enabling the migration of public data to and from other external 3rd party platforms using traditional ETL, or more modern iPaaS solutions powered via the API.
- Webhooks - Notifying external systems of relevant events (location or service update) by pushing to URLs via what is called a webhook.
- Spreadsheets - Publishing of data to Microsoft Excel or Google Spreadsheet using the public data APIs, as well as spreadsheet APIs.
This is just an overview of the number of ways in which a single API, or multiple APIs, can be used to deliver public data to many different endpoints, all in service of a single mission. When you consider this in support of public services, a bigger picture emerges of how APIs and public data can be used to better serve the population--and the first step always involves a standardized, well-planned set of APIs being made available.
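A webhook push like the one described in the list above can be sketched in a few lines. This is a minimal illustration, not a production implementation; the subscriber URL, event names, and payload fields are all hypothetical:

```python
import json
import urllib.request

# Hypothetical list of subscriber callback URLs registered with the platform.
SUBSCRIBERS = ["https://example.org/hooks/service-updated"]

def build_event(event_type, resource_id, changes):
    """Assemble the JSON payload pushed to each subscriber URL."""
    return {
        "event": event_type,          # e.g. "service.updated"
        "resource_id": resource_id,   # identifier of the changed record
        "changes": changes,           # fields that changed
    }

def notify(payload):
    """POST the event payload to every registered webhook URL."""
    body = json.dumps(payload).encode("utf-8")
    for url in SUBSCRIBERS:
        req = urllib.request.Request(
            url, data=body,
            headers={"Content-Type": "application/json"},
        )
        # In production you would actually send this, with retries and
        # failure handling: urllib.request.urlopen(req)
        print(f"would POST {payload['event']} to {url}")

notify(build_event("service.updated", "svc-42", {"hours": "9-5"}))
```

A real platform would also sign the payload and let consumers register and manage their own subscriber URLs via the API.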
The Monetization Requirements Around Public Data API Operations
This is where we get to the portion of this discussion that is specifically about monetization of the operations around publishing and maintaining high-quality sources of public data. Before a sensible monetization strategy can be laid out, we need to be able to quantify what it costs to operate the platform and generate the value expected by everyone at the table.
What are the hard costs that should be considered when operating a public data platform and looking to establish some reasonable monetization objectives?
- Data Acquisition - What one-time, and recurring costs are associated with acquiring data. This might include ongoing payouts to API contributors who are adding, updating, and validating data via the API.
- Discover - What was spent to discover the data, and identify its use on the platform.
- Negotiate - How much time goes into actually negotiating access to the data.
- Licensing - Are there licensing costs or fees involved in the acquisition of data.
- Development - What one-time, and recurring costs are associated with platform development.
- Normalization - What does it take to clean up and normalize a data set, or across content? This is usually the janitorial busywork that is necessary.
- Validation - What is involved in validating that data is accurate and correct, providing sources, and following up on references.
- Database - How much work is being put into setting up the database, maintaining it, backing it up, and delivering optimal levels of performance.
- Server - Defining the amount of work put into setting up, and configuring the server(s) to operate an API, including where it goes in the overall operations plan.
- Coding - How much work goes into actually coding an API. Ideally, open source frameworks are employed to reduce overhead, maintenance, and the resources needed to launch new endpoints.
- Operational - What one-time, and recurring costs are associated with platform operations.
- Compute - What costs are associated with providing server compute capacity to process and deliver public data via APIs.
- Storage - What costs are associated with on-disk storage, for both the database and other images, video, and related objects.
- Network - How much bandwidth in / out is an API using to get the job done, as well as any other network overhead.
- Management - What percentage of API management resources is dedicated to the API. A flat percentage of API management overhead until usage history exists.
- Monitoring - What percentage of the API monitoring, testing, and performance service budget is dedicated to this API. How large is the surface area for monitoring?
- Security - What does it cost to secure a single API as part of the larger overall operations? Do internal resources spend time on this, or is it handled by a 3rd party service?
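To make the checklist actionable, the hard costs above can be rolled into a simple model that spreads one-time costs over an amortization window and adds them to the recurring monthly spend. All of the figures below are placeholder assumptions, purely for illustration:

```python
# Hypothetical cost model for a public data API platform.
# All dollar amounts are illustrative placeholders, not real costs.
ONE_TIME = {
    "data_acquisition": 5000,   # discovery, negotiation, licensing
    "development": 12000,       # normalization, validation, database, coding
}
MONTHLY = {
    "compute": 300,
    "storage": 80,
    "network": 50,
    "management": 200,
    "monitoring": 100,
    "security": 150,
}

def monthly_operating_cost(months_to_amortize=24):
    """Recurring costs plus one-time costs spread over an amortization window."""
    amortized = sum(ONE_TIME.values()) / months_to_amortize
    return sum(MONTHLY.values()) + amortized

print(round(monthly_operating_cost(), 2))
```

Even a crude model like this gives you a number to put on the table when discussing what access tiers and fees need to recover.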
Understand The Value Being Generated By Public Data
Now that we understand some of our hard costs, let's have an honest conversation about what value is being generated. First, public data has to offer value, or why are we doing all this hard work? Second, nobody is going to pay for anything that doesn't offer any value. Let's stop for a moment and think about why we are doing all of this in the first place, and what value is worthy of carving off to drive monetization efforts.
- Direct Value Generation - What direct value is being generated by the public data platform operations.
- Usage - How is usage wielded as part of value generation? Is value about the increased usage of a resource, or possible value generated by a minimum usage of a resource? Usage is an important dimension of determining how value is generated as part of API operations.
- Users - How is value generated on a per-user level? Are more users valuable, or possibly more targeted users? There are teams, groups, and many other ways to determine how users positively or negatively impact the value generated from platform usage.
- Relationships - How can relationships between users, or companies be applied to value generated? Will access to more relationships positively or negatively impact how value is generated for the platform and consumers?
- Data Acquisition - Is the acquisition of data part of the generation of value via the public data platform, encouraging the growth of data.
- Applications - Is value generated looked at on a per-application basis? Does having multiple applications impact the value generated? Coming up with interesting ways to understand how applications impact platform value for everyone.
- Integrations - What external integrations are available? How can these be leveraged to enhance the value for consumers? Are some integrations part of base operations, where others are accessible at higher levels, or on a one-off basis.
- Support - Is access to support something that impacts the value being generated? Does access to support resources introduce the optimal levels of value consumers are looking for? How is support wielded within overall platform monetization?
- Service Level Agreements - Are we measuring the number of contracts signed, and partner agreements in place? And how we are delivering against those agreements?
- Revenue - What revenue opportunities exist for the ecosystem around an API and its operation, sharing in the money made from platform operations? Obviously, anything beyond operating costs should be applied to the expansion of efforts.
- Indirect Value - What is some of the indirect value being generated by the public data platform operations.
- Marketing Vehicle - Having an API is cool these days, and some APIs are worth just having because of the PR value, and discussion.
- Traffic Generation - The API exists solely for distributing links to the web and mobile applications, driving traffic to specific properties - is this tracked as part of other analytics?
- Brand Awareness - Applying a brand strategy, and using the API to incentivize publishers to extend the reach of the brand and ultimately the platform - can we quantify this?
- Analysis - How can analytics be leveraged as part of API value generation? Are analytics part of the base of operations, or are they an added value incentive for consumers, and platform operators.
- Competitiveness - Is the public data effort more agile, flexible, and competitive because it has an API and can deliver on new integrations, partnerships, and to new platforms easier, and more cost effectively?
- Public Service - Making data available for use on many different web, mobile, and other applications demonstrates a commitment to public service, and the good public data can do.
While there may be other hard costs associated, as well as other areas of value being generated, this should provide a simple checklist that any open data provider can use as a starting blueprint. Additional costs can be included in these existing areas, or added as new areas as deemed relevant--this is just about getting the monetization conversation going.
There are two main objectives in this exercise: 1) understanding the hard costs and value associated with operations, and 2) assembling them into a coherent list that we can use to explain the effort to others as part of transparency efforts. When it comes to the business of public data, it is about more than just generating revenue; it is about being upfront and honest about why we are doing this, and how it is done--mitigating the political risk involved with doing business with public resources out in the open.
Putting Together A Working Plan Involving Public Data
With an understanding of the hard costs of providing a public data platform and an awareness of the intended value to be generated via operations, we can now look at what details would be involved in a plan for executing this monetization strategy. API management practices are architected for metering, measuring, and limiting access to data, content, and algorithmic resources in service of a coherent, transparent public data monetization strategy.
Here is a core framework of API management that can be applied to public data that can be used to drive monetization efforts:
- Access - What are the default access levels for public data access via the API.
- Self-Service - Public access to the platform via user registration, or 3rd party authentication like Twitter, Facebook, or Github.
- Approval - Access level(s) that require the approval of user or application before they are able to access platform resources.
- Tiers - What are the multiple tiers of access to all data resources available via API.
- Public - Define the default public access for the platform, with a free, limited access tier that is obtainable via a self-service registration without approval.
- Contributor - Providing a tier of access to contribute content, and to validate and manage data on the platform.
- Service Provider - Providing a tier of access for service providers involved with public data operations.
- Internal - Access tier for internal groups, used by all websites, mobile applications, and system integrations.
- Partner - Tier(s) of access designed for data, and other partners involved in the management of public data resources.
- Commercial - Access tier(s) for commercial usage of public data resources with higher levels of access for set fees.
- Non-Commercial - Access tier(s) for non-commercial usage of public data resources, with fees waived for specific types of access.
- Government - A set of API resources is available specifically for government access.
- Auditor - Access across APIs specifically designed for external 3rd party auditors.
- Elements - What are the core elements that make up the service composition for the monetization plan(s).
- Paths - Establishing plans based upon the availability and access to different API endpoints, including platform meta API.
- Read / Write - Restricting read and write access to public data to specific tiers, limiting who writes data to only trusted applications.
- Time Frames - What are the timeframes that impact the public data / API monetization plan(s) and consumption.
- Daily - What are the details for managing, guiding, and restricting plan entries each day.
- Weekly - What are the details for managing, guiding, and restricting plan entries in weekly timeframes.
- Monthly - What are the details for managing, guiding, and restricting plan entries on a monthly basis.
- Metrics - What is being measured to quantify value generated, providing a framework to understand monetization possibilities.
- API Calls - Each call to the API is measured, providing the cornerstone of monetizing access and contribution to public data--remember not all calls will cost, some will add value with contributions.
- URL Clicks - Each click on a URL served up via the API, driving data and content, is measured, providing details on value delivered to internal and external websites--a URL shortener is required for this.
- Searches - All searches conducted via the API are measured, providing details on what users are looking for.
- Users - Measure user acquisitions and history to keep track of the value of each platform user.
- Applications - Measure the number of applications added, with details of activity to understand value generated.
- Limits - What are the limitations imposed across all tiers of access as part of the API monetization plan.
- API Calls - How many API calls any single application can make during a specific time frame.
- IP Address - Which IP addresses you can request data from, limiting the scope of serving data.
- Spend - How much any user can spend during a given time period, for each user or application.
- Pricing - What prices are set for different aspects of monetizing the platform.
- Tiers - What are the base prices established for each tier of API access.
- Unit - What are the default unit prices of per API call access for each tier.
- Support - What charges are in place for receiving support for platform applications.
- SLA - What costs are associated with delivering specific quality or levels of service and availability?
These are the moving parts of a public data monetization strategy. It allows any public data resource to be made available on the web, enabling self-service access to data 24/7. However, it does so in a way that requires accountability from ALL consumers, whether they are internal, partner, or the public at large. This API management scaffolding allows for frictionless access to public data resources by the users and applications identified as worthwhile, while imposing limits and fees for higher volume and commercial levels of access.
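As a rough sketch of how this service composition might be enforced, the snippet below checks an incoming call against a consumer's tier and meters a unit price. The tier names, limits, and prices are all hypothetical placeholders, not a recommended pricing plan:

```python
# Hypothetical service composition: each tier defines a daily call limit
# and a per-call unit price. None means the tier is unmetered.
TIERS = {
    "public":     {"daily_limit": 1000,   "unit_price": 0.00},
    "commercial": {"daily_limit": 100000, "unit_price": 0.001},
    "internal":   {"daily_limit": None,   "unit_price": 0.00},
}

def check_call(tier, calls_today):
    """Return (allowed, charge) for the next call under this tier's plan."""
    plan = TIERS[tier]
    if plan["daily_limit"] is not None and calls_today >= plan["daily_limit"]:
        return (False, 0.0)            # over the daily limit: reject the call
    return (True, plan["unit_price"])  # allowed; meter the unit price

assert check_call("public", 999) == (True, 0.0)
assert check_call("public", 1000) == (False, 0.0)
assert check_call("commercial", 5000) == (True, 0.001)
```

A real API management layer would key this by application and API key, track weekly and monthly windows as well, and feed the metered charges into billing and analytics.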
Speaking To A Wide Audience With This Public Data Monetization Research
I purposely wrote this document to speak to as wide an audience as possible. In my experience working with public data across numerous industries, there can be a wide variety of actors involved in the public data stewardship pipeline. My objective is to get more public data accessible via web APIs, and generating revenue to help fund this is one of my biggest concerns. I am not looking to incentivize people to make unwarranted profits on top of public data--this is already going on. My goal is to open up the silos of public data out there right now and make them more accessible, opening up the opportunity for delivery to a variety of applications, while also funding this important process.
I wanted to help anyone reading this to craft a coherent argument for generating much-needed revenue from public data, whether they are trying to convince a government agency, non-profit organization, institution, or a commercial company. Public data needs to be available in a machine-readable way for use in a variety of applications in 2016--something that takes resources and collaboration. APIs are not another vendor solution, they are the next step in the evolution of the web, where we don't just make data available for humans by publishing as HTML--we need the raw data available for use in many different applications.
Sharing of API definitions is critical to any industry or public sector where APIs are being put to work. If the API sector is going to scale effectively, it needs to be reusing common patterns, something that many API and open data providers have not been that great at historically. While this is critical in any business sector, there is no single area where this needs to happen more urgently than within the public sector.
I have spent years trying to wade through the volumes of open data that come out of government, and even spent a period of time doing this in DC for the White House. The adoption of open API definition formats like OpenAPI Spec, API Blueprint, APIs.json, and JSON Schema across government is a passion of mine, so I'm very pleased to see the new US Data Federation project coming out of the General Services Administration (GSA).
"The U.S. Data Federation supports data interoperability and harmonization across Federal, state, and local government agencies by highlighting common data formats, API specifications, and metadata vocabularies."
The U.S. Data Federation has focused on some of the existing patterns that exist in service of the public sector, including seven existing initiatives:
- Building & Land Development Specification
- National Information Exchange Model
- Open Referral
- Open311
- Project Open Data
- Schema.org
- The Voting Information Project
I am a big supporter of Open Referral, Open311, Project Open Data, and Schema.org. I will step up and get more familiar with the Building & Land Development Specification, the National Information Exchange Model, and the Voting Information Project. The US Data Federation project echoes the work I've been doing with the Environmental Protection Agency (EPA) Envirofacts Data Service API, Department of Labor APIs, the FAFSA API, and my general Adopta.Agency efforts.
Defining the current inventory of government APIs and open data using OpenAPI Spec, and indexing them with APIs.json, is how we do the hard work of identifying the common patterns that are already in place and being used by agencies on the ground. Once this is mapped out, we can begin the long road towards defining the common patterns that could be proposed as future initiatives for the US Data Federation. I think the project highlights this well on its about page:
"These examples will highlight emerging data standards and API initiatives across all levels of government, convey the level of maturity for each effort, and facilitate greater participation by government agencies."
The world of API definitions is a messy one. It may seem straightforward if you are a standards oriented person. It may also seem straightforward if you are a scrappy startup person. In reality, the current landscape is a tug of war between these two worlds. There are a wealth of existing web API concepts, specifications, and data standards available to us, but there are also a lot of leading definitions being defined by tech giants like Amazon, Google, Twitter, and others. With the tone set by VC investment, and distorted views on what intellectual property is, the sharing of open API definitions and schemas has been deficient across many sectors, for many years.
What the GSA is doing with the US Data Federation project is important. They are mapping out the common patterns that already exist, and providing a forum for helping identify others, as well as helping evolve the less mature, or disparate, API and schema patterns out in the wild. A positive sign that they are heading in the right direction is that the US Data Federation project is operating on Github. It is important that these common patterns exist on the social coding platform, as it's increasingly being used as an engine for the API economy--touching all stops along the API life cycle.
I will carve out the time to go through some of my existing government open data work, which includes rebooting my Open Referral leadership role. I'm finding that just doing the hard work of crafting OpenAPI Specs for government APIs is a very important piece of the puzzle. We need a machine-readable map of what already exists, otherwise it is very difficult to find a way forward in the massive amounts of government open data available to us. However, I believe that when you take these machine-readable API definitions and put them on Github, it becomes much easier to find the common patterns that the GSA is looking to define with the US Data Federation.
As I was preparing for my talk with Dan from Open Referral, I published some of my thoughts on the organization, and the Human Services Data Specification (HSDS). One of the things I did as part of that work was generating a first draft of an OpenAPI Spec for the Open Referral API. To create that draft, I used the existing Ohana API as the base, exposing the same endpoints as the Code For America project did.
Over the last couple of days, I spent some more time getting to know the data model set forth by HSDS, and got to work evolving my draft OpenAPI Spec to be in closer alignment with the data schema. To do this I took the JSON Schema for HSDS that was available on Github, and used it as a framework to add any missing elements to the API definition, resulting in almost 70 API paths, in support of almost 20 separate entities dictated in HSDS.
| Open Referral API | OpenAPI Spec |
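The mechanical part of that exercise, emitting a conventional set of CRUD paths for each HSDS entity, can be sketched as follows. The entity list here is a small illustrative subset of the roughly 20 HSDS entities, and the path layout is my own assumption rather than the official Open Referral design:

```python
# A few HSDS entities (illustrative subset of the ~20 in the specification).
ENTITIES = ["organizations", "locations", "services", "contacts"]

def crud_paths(entity):
    """Emit a conventional pair of REST paths for one entity."""
    return {
        f"/{entity}": ["get", "post"],           # list, create
        f"/{entity}/{{id}}": ["get", "put", "delete"],  # read, update, delete
    }

def build_spec(entities):
    """Assemble a minimal OpenAPI-style paths object for all entities."""
    paths = {}
    for entity in entities:
        for path, methods in crud_paths(entity).items():
            paths[path] = {m: {"summary": f"{m} {path}"} for m in methods}
    return {"openapi": "3.0.0", "paths": paths}

spec = build_spec(ENTITIES)
print(len(spec["paths"]))  # two paths per entity
```

Running this kind of generator over the full entity list, plus relationship sub-paths, is how a formulaic draft like the one described above quickly grows to dozens of paths.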
This is a very formulaic, generated, representation of what the Open Referral API could look like. While I have lots of ideas on how to improve on the design, I want to be cautious to not project too much of my own views on the API design--something the community should do together. I can tell a lot of work went into the current specification, and the same amount of energy should go into evolving the API design.
I accomplished what I wanted: learning more about HSDS and getting more familiar with the entities at play, while also producing a fairly robust representation of what an API could look like for Open Referral. It has way more detail than the average implementation will need, but I wanted to cover all the bases, providing full control over every entity and relationship represented in HSDS. Most importantly, I was able to get more intimate with the specification, while also producing an OpenAPI Spec that will play a central role throughout this project.
Next I'm going to play with some minimum viable representations, and other ways to tell stories and talk about HSDS. I'd like to eventually have a whole toolbox of YAML / JSON driven UI elements, like the one I pasted in this post, allowing me to describe all the moving parts of the Open Referral work. More posts to come, as I work through my thoughts and play with possible designs for the Human Services Data Specification (HSDS).
I am working on several very rewarding API efforts lately, but one I'm particularly psyched about is Open Referral. I'm working with them to help apply the open API format in a handful of implementations, but also to share some insight on what the platform could be in the future. I have been working to carve out the time to think about it, and finally managed to do so this week, resulting in what I am hoping will be some rewarding API work.
As I do, I wanted to explore the project, working to understand all the moving parts, as well as what is needed for the future, using my blog. I am not recommending that Open Referral tackle all of this work right now; I am just trying to pull together a framework for thinking about some of the short- and long-term areas we can invest in together. I intend to continue working with Greg and the Open Referral team to help spread awareness of the open API specification, and help build the community.
Open Referral is all about being an open specification, dedicated to helping humans find services, and to helping even more humans help other humans find the services they need--I can't think of a more worthy implementation of an API. In my opinion, this is what APIs are all about--providing open access to information, while also allowing for commercial activity. To help prime the pump, let's take a look at the specification, and think more about where I can help when it comes to the Open Referral organization and, eventually, the Open Referral platform.
Human Services Data Specification (HSDS)
"The Human Services Data Specification (HSDS) defines content that provides the minimum set of data for Information and Referral (I&R) applications as well as specialized service directory applications." This represents a pretty huge opportunity to help deliver vital information about public services to those who need them, where they need them, using an open API approach.
Currently there is an existing definition for HSDS available on Github, but I'd like to see the presence of HSDS elevated, showcasing it independently of any single implementation of the API, or of the web and mobile applications that are built on top of it. It is important that new people who are just learning about HSDS understand that it is a format, independent of any single instance. Here is a breakdown of the HSDS presence I'd like to see:
- Website - Establish a simple, dedicated website for just the specification.
- Twitter - Establish a dedicated Twitter account for the specification.
- Github Repo - Can the repo be moved under the Open Referral Github organization?
- Partners - Link to the Open Referral partner network.
- Road Map - What is the road map for the specification?
- Change Log - What is the change log for the specification?
- Licensed - CC0 License
I want to help make sure HSDS is highly available as an OpenAPI Spec, as well as in the API Blueprint format. Both of these formats will enable anyone looking to put HSDS to work to use the definition as a central reference for their API implementation, one that can drive API documentation, code samples, testing, and much more.
I do not know about you, but having an open standard for finding and managing open data about human services, one that can be used across cities, regions, and countries, seems like a pretty vital API design pattern--one that could make a significant impact in people's lives. When you are talking about helping folks find food and health services, making sure the disparate systems all speak the same language matters, and could be the difference between life and death--or at least just make your life suck a little less.
While Open Referral and HSDS were born out of Code For America, there is an organization in place to use as a base for evolving the format and building a community of implementations around this important specification. I wanted to take some time to organize some of the existing moving parts of the Open Referral organization, while also exploring what elements I feel will be needed to help evolve it into a platform.
The Open Referral Organization
As I mentioned, there is an organization already setup to guide the effort, "the Open Referral initiative is developing common formats and open platforms for the sharing of community resource directory data — i.e., information about the health, human and social services that are available to people in need." -- You can count me in, helping with that. Right up my alley.
Right now Open Referral is a nice website, with some valuable information about where things are today. The "common formats" portion of that vision is in place, but how do we help scale Open Referral toward being an open platform, while also enabling others to deploy their own open platforms in support of their own human services project(s)? Some of these projects will be open civic projects by government and non-governmental agencies, while some will be commercial efforts--both approaches are acceptable when it comes to Open Referral and HSDS.
Let's explore what is currently available for the Open Referral organization, and what is needed to help evolve it towards being a platform enabler. Here is what I have outlined so far:
There is already a basic web presence for the organization; it just needs a little help to look as modern as it possibly can, and to assume the lead role in getting as many folks as possible aware of, and involved with, Open Referral and HSDS.
- Website - Having a simple, modern web presence for the Open Referral organization.
OpenReferral.org is the tip of the platform, but if we want to increase the reach of the organization, and take the conversation to where people already exist, we'll need to think more multi-channel when it comes to the organizational presence.
There is already a great presence in place, an active blog, Twitter, and Google Group. Based upon the approach of other open formats, and software efforts, there are a number of other platforms we should be looking to spread the Open Referral presence to.
- Twitter - Managing an active, human presence on Twitter.
- LinkedIn - Managing an active, human presence on LinkedIn.
- Facebook - Managing an active, human presence on Facebook.
- Blog - Having an active, informative blog available.
- Blog RSS - Providing a machine readable feed from blog.
- Medium - Publishing regularly to Medium as well as blog.
- Google Group - Maintaining community and discussion on Google Groups.
- Newsletter - Provide a dedicated partner newsletter.
So far we are just talking about marketing and social media basics for any organization. We will need to make sure the overall organizational presence for Open Referral dovetails seamlessly with the more technical side of things, staying friendly to non-developers while also serving a more technical developer and IT focused audience.
Open Referral Developer Portal
I suggest following the lead of other successful open standard and software efforts, and establishing a dedicated portal for the platform at http://developer.openreferral.org. This central portal will not provide access to a working implementation of the API, but will focus instead on the community resources it will take to help ensure the widespread adoption of HSDS.
Right now, there is only the Ohana API and the supporting client tools that have been developed by Code for America. This is a great start, but Open Referral needs to evolve, making sure there is a wealth of language and platform options available for supporting any implementation. I went to town thinking through what is possible with the Open Referral developer portal, based upon other open API, specification, and software platforms I have studied. Not everything here is required to get started with a minimum viable developer portal, but it provides some food for thought around what could be.
- Landing Page - A simple, distilled representation of everything available.
- HSDS Specification - Link to separate site dedicated to the specification.
- Github - The Github organization as an umbrella for the platform presence.
- Server Implementations (PHP, Python, Ruby, Node, C#, Java)
- Server Images (Amazon, Docker, Heroku Deploy)
- Database Implementations (MySQL, PostgreSQL, MongoDB)
- Client Samples (PHP, Python, Ruby, Node, C#, Java)
- Client SDKs (PHP, Python, Ruby, Node, C#, Java)
- White Label Apps
- Platform Development Kits
- WordPress (PHP)
- Spreadsheet Connector(s) (Google, Excel)
- Database Connector(s) (MySQL, SQL Server, PostgreSQL)
- Widgets (ie. Search, Featured)
- Buttons (ie. Bookmarklet, Share)
- Visualizations (ie. Graphs, Charts)
- Email - The email channels that the organization provides.
- Github Issues - Set up for the platform, and aggregated across code projects.
- Google Group - Setup specific threads dedicated to the developers.
- Legal - The legal department for the Open Referral organization and platform.
- Terms of Service - What are the terms of service set by the Open Referral organization.
- Licensing (Data, Code, Content) - What licensing is applied to content, data, and code resources.
- Branding - What are the branding guidelines and assets available for the Open Referral platform.
The Open Referral developer portal really is just a project website that organizes links and meta information about any valuable code that is developed using HSDS at its core. The ultimate goal is to provide a rich marketplace of server, client-side, platform, and language resources that can be applied anywhere. Some of it will be officially platform supported, while the rest will be partner and Open Referral community supported. The central portal is purely to help organize all the valuable resources generated from the community, making them easy for the community to find.
Open Referral Demo Portal
I have assembled this outline, based upon the portal presence of leading API platforms like Twitter, Twilio, and Stripe. As with every other area, not all these elements will be in the first iteration of the Open Referral demo portal, but we should consider what the essentials should be in a minimum viable definition for an Open Referral demo portal.
- Landing Page - A simple, distilled version of the portal on a single page.
- Getting Started
- Overview - What is the least possible information we need to get going.
- Registration / Login - Where do we sign up or log in for access.
- Signup Email - Providing a simple email when signing up for access.
- FAQ - What are the most commonly asked questions, easily available.
- Overview - Provide an overview of how to authenticate.
- Keys - What is involved in adding an app, and getting keys.
- OAuth Overview - Provide an overview of OAuth implementation.
- OAuth Tools - Tools for testing, and generating OAuth tokens.
- Interactive (Swagger UI) - Providing interactive documentation using Swagger UI.
- Static (Slate) - Providing a more static, attractive version of documentation in Slate.
- Schemas (JSON) - Defining all underlying data models, and providing as JSON Schema.
- Pagination - Overview of how pagination is handled across API calls.
- Error Codes - A short, concise list of available error codes for API responses.
- Samples (PHP, Python, Ruby, Node, C#, Java) - Simple code samples in a variety of languages.
- SDKs (PHP, Python, Ruby, Node, C#, Java) - More complete SDKs, with authentication, in a variety of languages.
- Widgets (ie. Search, Featured) - Simple, embeddable widgets that make public or authenticated API calls.
- Buttons (ie. Bookmarklet, Share) - Simple browser, web, or mobile buttons for interacting with APIs.
- Visualizations (ie. Graphs, Charts) - Provide a base set of D3.js or other visualizations for engaging with platform.
- Outbound - Allow for outbound webhook destinations and payloads to be defined.
- Inbound - Allow for inbound webhook receipts and payloads to be defined.
- Analytics - Offer analytics for outbound, and inbound webhook activity.
- Alerts - Provide alerts for when webhooks are triggered.
- Logging - Offer access to log files generated as part of webhook activity.
- Limits - What are the limits involved with accessing the APIs.
- Pricing - At what point does API access become commercial.
- Road Map - Providing a simple road map of future changes coming for the platform.
- Issues - A list of current issues that are known, and being addressed as part of operations.
- Change Log - Providing a simple accounting of the changes that have occurred via the platform.
- Status - A real time status dashboard, with RSS feed, as well as historical data when possible.
- Github Issues - Provide platform support using Github issues, allowing for public support.
- Email - Provide an email account dedicated to supporting the platform.
- Phone - Provide a phone number (if available) for support purposes.
- Ticket System - Providing a more formal ticketing system like ZenDesk for handling support.
- Blog w/ RSS - Providing a basic blog for sharing stories around the platform operations.
- Slack - Offering a slack channel dedicated to the platform operations.
- Developer Account
- Dashboard - An overview dashboard providing a snapshot of platform usage for consumers.
- Account Settings - The ability to manage settings and configuration for platform.
- Application / Keys - A system for adding, updating, and removing applications and keys for the API.
- Usage / Analytics - Simple visualizations that help consumers understand their platform usage.
- Messaging - A basic, private messaging system for use between API provider and consumer(s).
- Forgot Password - Offering the ability to recover and reset account password.
- Delete Account - Allow API consumers to delete their API accounts.
- Terms of Service - A general, open source terms of service that can be applied.
- Licensing (Data, Code, Content) - Licensing for the data, code, and content available via the platform.
- APIs.json - Providing a machine readable APIs.json index for the API implementation.
- APIs.io - Registering the API with the APIs.io search engine via their API.
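To make the APIs.json item above concrete, here is a minimal sketch of what an index file for a hypothetical HSDS implementation might look like. The URLs, names, and property types shown are illustrative assumptions following the apisjson.org format, not an actual Open Referral artifact.

```json
{
  "name": "Example Human Services API",
  "description": "A hypothetical HSDS-compliant API implementation.",
  "url": "https://example.org/apis.json",
  "apis": [
    {
      "name": "Human Services API",
      "description": "Read and search access to organizations, locations, and services.",
      "baseURL": "https://api.example.org/",
      "properties": [
        {"type": "Swagger", "url": "https://example.org/openapi.json"},
        {"type": "x-documentation", "url": "https://example.org/docs/"}
      ]
    }
  ],
  "specificationVersion": "0.14"
}
```

Publishing a file like this at the root of each implementation is what makes registration with APIs.io, and discovery by other tooling, possible.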
This base portal design will act as a demo implementation, with an actual functional API operating behind it. It could also be forked and used as a base for other Open Referral API implementations, customized and built upon for each individual deployment. Github Pages, along with Jekyll, allows for the easy design, development, and forking of an open portal blueprint. I'd like to see all the project sites that support the Open Referral effort operate in this fashion, which isn't unique to Github, and can run on Amazon S3, Dropbox, and almost any other hosting environment.
One thing that is essential for the Open Referral organization to evolve into a platform is a formal partner program to help manage the variety of partners who will be contributing in different ways. I suggest operating a site dedicated to the Open Referral partner program, located at the subdomain http://partner.openreferral.org. This provides a clear location to visit to see who is helping build out the Open Referral platform, and to get involved when it makes sense.
- Overview - An overview of the Open Referral partner program.
- Gallery of Partners - Who are the Open Referral Partners.
- Gallery of Applications - What are the Open Referral implementations.
- Partner Stories - What are the stories behind the partner implementations.
- Types - The types of partners involved with the platform.
- Application - The partners who are deploying a single web or mobile application.
- Integration - The partners who are deploying a single API and portal.
- Platform - The partners who are implementing many server, and app integrations.
- Investor - Someone who is investing in Open Referral and / or specific implementations.
- Registration / Form - A registration form for partners to submit and join the program.
- Marketing Activities
- Blog Posts - Provide blog posts for partners to take advantage of, one time or recurring.
- Press Release - Provide press releases for new partners, and possibly recurring for other milestones.
- Discounts - Provide discounts on direct support for partners.
- Office Hours - Provide virtual open office hours just for partners.
- Training - Offer direct training opportunities that are designed just for partners.
- Advisors - Provide special advisors that are there to support partners.
- Quotes - Allow partners to provide quotes that can be published to relevant properties.
- Testimonials - Have partners provide testimonials that get published to relevant sites.
- Use of Logo - Allow partners to use the platform logo, or special partner platform logo.
- Blog - Have a blog that is dedicated to providing information for the partner program.
- Spotlight - Have a special section to spotlight partners.
- Newsletter - Provide a dedicated partner newsletter.
Formalizing the partner program for Open Referral will help in organizing operations, but also provide a public face to the program, lending credibility to the platform, as well as to its trusted partners. Not all partnerships need to be publicized, but it will lend some potential exposure to those that want it. Not every detail of Open Referral partnerships needs to be present, but operating in the open, being as transparent as possible, will help build trust in a potentially competitive environment.
There will be some HSDS API implementations, as well as potentially web or mobile applications, that are developed by Open Referral, with others developed and operated by partners. Whenever possible, being transparent about this will help build trust, and reduce speculation around the organizational mission. Formalizing the approach to platform partnerships will help set a positive tone for the community, and take Open Referral from just a site, to a community, to a true platform.
I wanted to explore some of the services that will be needed in support of the Open Referral format specification, open source software development, as well as specific implementations. Not all of these services will be executed by Open Referral, with partners being leveraged at every turn, but it will also be important for Open Referral to develop internal capacity to support all areas, and as many types of implementations as possible. This internal capacity will be necessary to help move the specification forward in a meaningful way.
Here are some of the main areas I identified that would be needed to help support core API implementations, as well as some of the web and mobile application implementations that will use HSDS.
- Server Side
- Deployment - The deployment of existing or custom server implementations.
- Hosting - Hosting and maintenance of server implementations for customers.
- Operation - The overseeing of day to day operations for any single implementation.
- Data Services
- Acquisition - The coordination, access, and overall acquisition of data from existing systems.
- Normalization - The process of normalization of data as part of other data service.
- Deployment - The deployment of a database in support of implementation.
- Hosting - The hosting of database, APIs, and applications in the support of implementations.
- Backup - The backing up of data, and API, or application as part of operations.
- Migration - The migration of an existing implementation to another location.
- Development - The development of an application that uses an Open Referral API implementation.
- Hosting - The hosting of a web or mobile application that uses an Open Referral API implementation.
- Management - The management of an existing web or mobile application that uses an Open Referral API implementation.
- UI / UX - There will be the need to create graphics, user interface, and drive usability of end-user applications.
- Developer Portal
- Deployment - The demo portal can be used as a base, and template, for portal deployment services.
- Management - Handling the day to day operations of a developer portal.
- Registration - Registering for the domains used as part of implementations.
- Management - Running the day to day management of DNS for implementations.
- App Monitoring - The monitoring of apps that are deployed.
- API Monitoring - The monitoring of APIs that are deployed.
- API - Initial, and regular evaluation of the security of the API.
- Application - Initial, and regular evaluation of the security of applications.
In some of these areas I want to offer API Evangelist assistance as a partner, while in others I will be looking for partners to step up. I will also be looking at what cloud services, or open source software can assist in augmenting needs in these service areas. These are all areas that Open Referral will not be able to ignore, with many projects needing a variety of assistance in any number of these areas. Ideally Open Referral develops enough internal capacity to play a role in as many implementations as possible, even if it is just part of the platform storytelling, or support process.
What service providers will be used as part of operations? Throughout this project exploration I've mentioned the usage of Github, a potentially free, and paid solution to multiple service areas. I've listed some of the other common service providers I recommend as part of my API research, and would be using to help deliver some of my contributions to the platform, and specific projects.
- Github - Github is used for managing code, content, and project sites.
- Amazon - AWS is used as part of database, hosting, and storage.
- CloudFlare - Used for DNS services, and DNS level security.
- Postman - Applied as part of on boarding, testing, and integrating with APIs.
- 3Scale - A service that can be used as part of API management.
- API Science - A service that can be used as part of API monitoring.
- APIMATIC - A service that can be used to generate SDKs.
I recommend that Open Referral strike a balance in the number of services it uses to operate the platform, and what it suggests for partners, and specific implementations. If possible, it would be nice to have one or more cloud services identified, as well as some potentially open source tooling that might be able to help deliver in each specific area.
Open Source Tooling
What tools will be used as part of operations? Complementing the services showcased above, let's explore some of the open source tooling that will be used as part of Open Referral platform operations. This should be a growing list, hopefully outweighing the number of cloud services listed above, providing low cost options to tackle much of what is needed to stand up, and operate an Open Referral, HSDS driven solution.
- Slate - A static, presentation friendly version of API documentation.
- Jekyll - An open source content management systems used for project sites.
I have only gotten started here. There are no doubt other open tools already in use, as well as some we should be targeting. What are these, what will they be used for, and do their licensing and support reflect the Open Referral mission? Each of these solutions should be forked, and maintained alongside other organizationally developed or managed software.
HSDS is an open definition, built on the back of, and supporting, other existing open definition formats. Let's showcase this heart of what Open Referral, and HSDS, is, by providing an up to date list of all open definition formats, and standards in use.
- OpenAPI Spec - An open source, JSON API definition format for describing web APIs.
- APIBlueprint - An open source, Markdown API definition format for describing web APIs.
- MSON - An open source, markdown data schema format.
- JSON Schema - An open source, JSON data schema format.
- The Alliance of Information and Referral Systems XSD and 211 Taxonomy
- Schema.org - Civic Services Schema (at the W3C)
- The National Information Exchange Model - via the National Human Services Information Architecture - logic model here.
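JSON Schema is the format I'm using to validate that only the intended fields get out via an API, as described above. As a rough sketch of that idea, here is a minimal validator using only the Python standard library; the field names used are illustrative assumptions, not the official HSDS schema.

```python
# A minimal sketch of field-level validation for an HSDS-style record.
# REQUIRED_FIELDS and ALLOWED_FIELDS are illustrative assumptions --
# a real implementation would drive these from the HSDS JSON Schema.

REQUIRED_FIELDS = {"id", "name", "description"}
ALLOWED_FIELDS = REQUIRED_FIELDS | {"url", "tax_status"}

def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in sorted(REQUIRED_FIELDS - record.keys()):
        problems.append(f"missing required field: {field}")
    # Flag anything beyond the allowed set -- the kind of check that
    # catches unintended leaks, like the FCC email address example.
    for field in sorted(record.keys() - ALLOWED_FIELDS):
        problems.append(f"unexpected field: {field}")
    return problems
```

Running a check like this as part of monitoring is the small step that surfaces fields, like user email addresses, that should never have been in the response.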
Open source software, and open definitions, are the core of Open Referral. The goal is to provide open formats, APIs, data, and tools that can be easily replicated by cash strapped municipalities, government agencies, and other organizations. However, software development and operation take money and resources, so there will be a monetization aspect to Open Referral, which will need to be explored, and planned for.
I wanted to take what I've learned in the API sector, and put it towards the evolution of a monetization framework that can be applied across the Open Referral platform, down to the individual project level. Most monetization planning will be at the project level, with some of these considerations when it comes to thinking about generating revenue.
- Acquisition - What does it cost to get everything together for a project from first email, right before development starts.
- Development - What person hours, and other costs associated with development of a project.
- Operations - What goes into the operation of APIs, portals, and other applications developed as part of integration.
- Direct Value
- Services - What revenue is generated as part of services.
- Grants - What grants have been received, and being applied to projects.
- Investment - What investments have been made for platform projects.
- Indirect Value
- Branding - What branding opportunities are opened up as part of operations.
- Partners - What partnerships have been established as part of operations.
- Traffic - What traffic is driven to the website, project sites, and other properties.
- Internal - What internal reporting is needed as part of platform monetization?
- Public - What reporting is needed to fulfill public needs?
- Partners - What partner reporting is needed as part of the program.
- Investment - What reporting is needed for investors?
- Grants - What reporting is required for each grant.
Most of these areas will be applied to each project, but will no doubt need to be rolled up, reported, and understood across projects, as well as by the other areas listed above. Open Referral will not be a profit driven platform, but will be looking to revenue generation to not just develop the open specification further, but also push for the development of open tooling, and other resources.
Monetization strategies applied to Open Referral will heavily drive the plans for API access that are applied to each individual implementation. While not everything will be standard across HSDS supporting implementations, there should be a base set of plans for how partners can operate, and generate their own revenue to support operations.
Platform API Plans
What are the details of API engagement plans offered as part of operations? I wanted to explore the many ways that leading API platforms open up access to their resources, and hand pick the ones that made sense for a minimum set of plans that could be inherited by default, within each implementation. Of course, each potential implementation might be different, but these are some of the essential platform plan considerations.
- Public - What are the details of public access.
- Commercial - At what point does access become commercial.
- Sponsor - How much access is sponsored by partners?
- Partner - Which plans are only available to partners?
- Education - Is there educational and research access?
- Time Frames
- Seconds - Resources are restricted by the second.
- Daily - Resources are restricted by the 24 hour period.
- Monthly - Resource access is reported on a monthly timeframe.
- Calls - Individual API calls are measured.
- Support - Support time is measured.
- Writes - The ability to write data to platform is measured.
- Country - In country deployment opportunities are available.
- On-Premise - On-premise options are available for deployment.
- Regions - Deployment in predefined regions is available.
- Range - API access limitations are available in multiple ranges.
- Minutes - Support access is limited in minutes.
- Hours - Support access is limited in hours.
- Endpoints - There are access limitations applied to specific API paths.
- Verbs - There are access limitations applied to the method / verb level.
While it is ideal that HSDS implementations provide public access to the vital resources being made available, it is not a requirement, and some implementations might severely lock down the public access elements of the platform. Regardless, all of the items listed should be considered when crafting one to five separate API access plans. The plans should cover hard infrastructure costs like compute, storage, and bandwidth, while also providing other commercialization opportunities that support revenue generation as well.
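To make the idea of a small set of access plans concrete, here is a minimal Python sketch of how plans and their limits could be represented and checked. The plan names, limits, and prices are purely illustrative assumptions, not an Open Referral standard.

```python
# Illustrative access plans for a hypothetical HSDS implementation.
# Names, limits, and prices are example values, not a specification.
PLANS = {
    "public":     {"calls_per_day": 1_000,   "writes": False, "price_per_month": 0},
    "partner":    {"calls_per_day": 100_000, "writes": True,  "price_per_month": 0},
    "commercial": {"calls_per_day": 500_000, "writes": True,  "price_per_month": 99},
}

def can_call(plan_name: str, calls_today: int, is_write: bool = False) -> bool:
    """Check whether another API call is allowed under the named plan."""
    plan = PLANS[plan_name]
    if is_write and not plan["writes"]:
        return False
    return calls_today < plan["calls_per_day"]
```

Even a simple structure like this makes the service composition explicit, so consumers can see exactly where free access ends and commercial access begins.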
These are mostly the resources that currently exist on the public website, but I wanted to also make sure to provide other details about the organization, and the team behind the efforts. These are a few other resources that shouldn't be forgotten.
- FAQ - Providing an organized list of the frequently asked questions for the platform.
- History - Provide the history necessary to understand the background of the project.
- Strategic - What are the strategic objectives of the organization and specification.
- Technical - What are the technical details of the organization and specification.
- Organization - Description of the organization.
- Team - Description of the team involved.
- Specification - Description of the HSDS.
I can keep adding to this list, but I think this represents a pretty significant v2 presence for Open Referral, as well as the Human Services Data Specification (HSDS) format. This isn't just a suggested proposal. I needed to think about what was needed, and what is next, to help support projects on the table, and proposals that are in the works for specific implementations. I couldn't think about any single project without exploring the big picture.
Now I'm going to share this with Greg Bloom, the passionate champion behind Open Referral, and HSDS. I just needed to make sure everything was in my head, in support of our discussion in person tomorrow. We'll be looking to move the needle forward on this vision, in conjunction with the implementations on the table. Exploring the big picture on my blog is how I put my experience on the table, working through all of its moving parts, and making sure I've covered all the ground I need to discuss.
What Does The Road Map Look Like?
Greg and crew are in charge of the road map. I just need to get more intimate with the specification. I have already created a v1 draft, scraped from the Slate documentation for the existing Ohana API implementation, using the OpenAPI Spec. I also have the PDF documentation for an Open Referral partner to convert to a machine readable OpenAPI Spec. The process will help me further build awareness around the specification itself. This post has helped me see the 100K view, and crafting the OpenAPI Spec will help me dive deep down into the weeds of how to deliver a human services API using the HSDS standard.
A Model For Human Services API And Hopefully Other Public API Services
I'm pretty stoked with the potential for working on Open Referral, and honored Greg has invited me to participate. This is just a first draft, tailored for what I would like to see considered for Open Referral / HSDS API, and for a couple of immediate implementations. However the model is something I will keep evolving alongside this project, as well as a more generic blueprint for how public service APIs could possibly be implemented.
There are several other API implementations that have come across my desk, to which I've felt a model like this should be applied. I was thinking about applying this to the FAFSA API, to help develop a student aid API community. I also thought it could be applied around the deployment of the RIDB API, in support of our national park system. In both of these environments a centralized, common, open API definition, with supporting schema and dictionaries, and a healthy selection of open source server, and client side web or mobile app implementations, would have gone a long way.
Anyways, I have what I need in my head so that I can talk with Greg, and coherently discuss what could be next.
I'm neck deep in discussions around API monetization lately, from building a business model in the fast growing podcast space with AudioSear.ch, funding scientific research through API driven revenue, and the latest being a continuing conversation around how to monetize high volume usage around the Recreational Information Database (RIDB).
I have been pulled into the conversation around the API for our National Park system information several times now. In October of 2014 I asked for Help To Make Sure The Dept. of Agriculture Leads With APIs In Their Parks and Recreation RFP, and I shared some Next Steps For The Recreation Information Database (RIDB) API this January. This time I was pulled in to comment on a change in language, which allows the vendor who is operating the API to charge for some levels of API access.
I received this National Forest Service Briefing, regarding the pricing change last week:
U.S. Forest Service
National Forest System Briefing Paper
Date: August 17, 2015
Topic: Addendum to Recreation One Stop Support Services Contract RFP for a Recreation Information Database API download cost recovery mechanism for high frequency, high-volume requests
Issue: Questions and comments from prospective contractors for the R1S support services contract included significant concern about the costs associated with supporting a completely open API. There is an incremental cost for each instance that a third party ‘calls’ the API. In private industry, the volume of calls is often managed by provisioning access to the API by requiring registration and agreeing to the volume of calls in advance. For third parties wishing to create an interface that will call the API frequently, private industry typically implements a tiered pricing approach where costs rise as volume increases.
In response to these concerns and to provide a mechanism for cost recovery for high frequency, high-volume requests, the R1S Program Management Team offered this solution by posting this statement to questions on FedBizOpps (FBO.gov).
Additionally, automated access to recreation data shall be free of charge for users making nominal data requests. The contractor may propose a fee structure applicable only to high volume data consumers. Such a fee structure shall be enforced through an agreement directly between the Contractor and the data consumer and shall be consistent with industry best practices and established market pricing. Should the contractor opt to propose such a fee structure, their proposal shall clearly state the applicable rates and details of the proposed fee structure.
A member of the open-data community quickly reacted to this provision indicating that it no longer meets the intent of the President’s Open Government executive order. It is possible that media coverage will daylight dissatisfaction over this provision.
It is important to note that it shall be the R1S contractor’s responsibility to manage and control access to the API so that excessive calls from outside entities do not put unreasonable stress on the system that may cause performance issues or be malicious in nature. To accomplish this, the R1S contractor will need to provide sufficient server capability and staff to manage and support the API and the consumers using it. The costs for the basic service are contained in the fee-per-transaction model, which will support free access to the API for all users, with a cost-recovery mechanism in place for high-use consumers.
To clarify the intent of the government, the RFP will be amended to state:
The Government recognizes that high frequency, high-volume data requests may have a detrimental effect on the performance and security of R1S Reservation Services system and that the management and mitigation of such negative consequences drives costs to the contractor. Accordingly, automated access to recreation data shall be free of charge for users making nominal data requests, however, the contractor may propose a fee structure, or establish access limitations, applicable only to higher volume data consumers. Any proposed fee structure shall comply with OMB Circular A-130; Section 8 – Policy, which states, “Agencies will … Set user charges for information dissemination products at a level sufficient to recover the cost of dissemination but no higher.”
- The RIDB API is now open and available to anyone to download free of charge.
- Federal recreation data is and shall continue to be available in machine-readable formats and shall safeguard privacy, confidentiality, and security in compliance with the Open Data Executive Order.
- The follow-on contract for R1S requires that in addition to more static recreation and inventory data, real-time availability data shall also be made available through an API.
- The audiences we anticipate using the API is widely varied and includes those who may want to incorporate federal recreation data into tourism portals and travel planning applications. Others however include those who wish to produce new interfaces to the real-time availability data that could generate a very high volume of calls to the API.
- We will continue to offer completely free access to the RIDB API for routine and reasonable requests in support of the President’s Open Government Executive Order.
- R1S is allowing offerors for the follow-on contract to propose a cost-recovery fee structure for high-volume data customers that exceed reasonable access in accordance with OMB Circular A-130; Section 8. These proposals will be considered as a provision within the new contract expected to be awarded in 2017.
- The Recreation.gov API(s) will be funded entirely by recreation fee revenue generated through reservation transactions made by the general public. By following private industry standards, R1S will be able to continue to provide free and open access to nominal users of the API without passing on higher costs associated with high volume use to the general reservation making public.
Background: Charging fees for access to government APIs is a relatively new concept, however open-data evangelists and private industry all agree that there is a time and place for creating a reasonable tiered pricing structure which supports free open data and provides a framework for managing increased costs associated with higher end use.
Here are a few articles weighing both sides of this debate:
- TechPresident - Free the Data: The Debate Over APIs and Open Government, 17 Mar 2014
- API Evangelist - A Better Understanding of Government APIs and their Consumers Before Considering Charging For Use, 07 Mar 2014
- Civic Innovations - http://civic.io/2014/03/07/apis-to-charge-or-not-to-charge/
- Mashable - Should Your API be Free or Pay-to-Play? 21 Apr 2011
That concludes the briefing paper, but after I shared my thoughts with them, I received an update of what the language has evolved to, resulting in the following:
The Government seeks to encourage usage of the Recreation.gov API, especially for third parties that could use the API to initiate additional reservations. At the same time, the Government recognizes that it is difficult to predict the likely query volume on Recreation.gov’s APIs, and that very high-frequency API requests from third parties that do not result in reservations on the system could have a detrimental effect on the performance or cost of the system, without providing associated benefits to the contractor or the Government.
Accordingly, the contractor may propose an API management plan that protects against extremely high-frequency usage of the API from third-parties that are not driving reservations to the system, while also encouraging widespread usage from third parties that are making a reasonable number and frequency of requests, and provides a mechanism for supporting and encouraging heavy API usage from third parties who demonstrate value and success in driving reservations on the R1S reservation system. Such plans may include establishing guidelines for third party interaction with the API (i.e., recommended best practices for caching API responses, implementing conditional requests, and defining “abusive” API usage that may be restricted), requiring users to register to receive a token or key to access the API and using techniques such as rate-limiting the number of API requests allowed from a given third party over a given period of time (i.e., XXXX requests per hour), or introducing “tiers” of access that limit high-frequency, high-volume API usage to those third parties who are successfully driving reservations on the system or are willing to pay a nominal fee that covers the incremental costs of serving non-reservation-generating high-frequency requests.
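The rate-limiting technique the amended language points to (capping requests per hour from a given third party) is commonly implemented as a token bucket. Here is a small, self-contained Python sketch; the capacity and refill rate are arbitrary example values, not anything proposed in the RFP.

```python
import time

# A minimal token-bucket rate limiter, sketching the kind of
# "N requests per hour" cap described in the amended RFP language.
class TokenBucket:
    def __init__(self, capacity: int, refill_per_second: float):
        self.capacity = capacity
        self.tokens = float(capacity)          # start full, allowing a burst
        self.refill_per_second = refill_per_second
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available, refilling based on elapsed time."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_second)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Example: 3600 requests per hour refills one token per second,
# with a burst capacity of 10 requests.
bucket = TokenBucket(capacity=10, refill_per_second=3600 / 3600.0)
```

In practice the contractor would keep one bucket per API key, which is exactly why the language also requires registration and tokens before introducing tiers.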
This is the first precedent I have seen, of a modern API driven monetization strategy in the federal government. There are many examples of private companies charging for access to federal government data, but this is the first example of applying modern API business models on top of government APIs and open data.
To me, this conversation also goes well beyond just charging for high volume access to government APIs, to cover the cost of delivering API driven resources reliably. It also introduces the concept of service composition into government APIs. We've had government APIs keyed up with API Umbrella for some time now, an open source approach that is modeled after modern, commercial API management offerings. What the RIDB API approach does, is open up the ability to introduce different levels of access tiers, rate limit, and charge for commercial levels of usage around vital government resources.
When government follows the business model applied across the API sector, it can allow free, lower levels of access while charging for higher levels of access that keep critical APIs operating at scale, in a dependable way. I'm also hoping it opens up other approaches to service composition, like allowing developers to write to and contribute to the evolution of government data. I'm just hoping the possibility of covering the cost of API operations is enough of an incentive for government agencies, and the vendors that serve them, to explore other approaches to API service composition.
The trick in all of this will be teaching agencies and vendors about the transparency required to make it work. Agencies, and their vendors, will have to share the algorithm they use to establish service tiers, rate limits, and pricing levels. They will also have to be transparent about which API consumers and partners exist in which tiers, to eliminate speculation around favoritism. This transparency will be critical to the whole approach working smoothly; otherwise it will suffer from illnesses similar to those that existing government procurement practices suffer from.

APIs != Good by Default
The RIDB API approach, which allows vendors to add API service levels, rate limits, and a pricing layer, sets a precedent for generating much-needed revenue that can cover the costs of API operations. While this may seem like a footnote on a single government RFP, as I mentioned in earlier posts on this subject, it represents how we will manage commercial usage of our virtual government resources in the future, just as we have managed our physical government resources for many years now.
The question of whether government should charge for APIs and other digital services came up again this week during a Google Hangout I did with Luke Fretwell (@lukefretwell) of @GovFresh. I began exploring this concept last year in my post, Should the Government Subsidize and Profit from Data Market, after talking with several city government open data folks.
Luke had pointed me to a page on the Hawaii.gov website that described their subscriber services:
A subscriber account offers the benefit of monthly invoicing and payment and provides convenience to users who conduct large volumes of online transactions through our website. In some cases, a subscriber account is also required to access a specific online service.
Many people I talk to in the open data space have reacted negatively when I propose the idea of government charging for access to services, but as reflected in Hawaii's approach, this is just for higher-volume and heavy commercial use cases.
It costs money to gather and organize government data, and to design, deploy, and manage APIs. I feel strongly that many of the resources coming out of government should be free and open for access, but in cases where providing them incurs huge costs for government, it might make sense to pass those costs on to consumers.
Think of physical government assets like city, county, state, or national parks. These are open for anyone to access, and many are even free, but if you want to use them for commercial purposes, you have to pay. Virtual resources should be seen in a similar light.
However, this could easily become a negative analogy, because park access fees can sometimes seem ridiculous, and I want to incentivize consumption of government open data and APIs through as wide access as possible, not scare people away.
My goal here is not to take a for-or-against stance, but to stimulate conversation around the topic and see where it makes sense to apply. I don't think government charging for access is applicable in all areas of open data and APIs, but if it will ensure better quality of service and potentially fund new data and APIs, I think it can be a good idea. What do you think?