These are the news items I've curated in my monitoring of the API space that have some relevance to the API definition conversation and I wanted to include in my research. I'm using all of these links to better understand how the space is testing their APIs, going beyond just monitoring and understand the details of each request and response.18 Sep 2018
I am at the Kong Summit in San Francisco all day tomorrow. I’m going to be speaking about research into the event-driven architectural layers I’ve been mapping out across the API space. Looking for the opportunity to augment existing APIs with push technology like webhooks, and streaming technology like SSE, as well as pipe data in an out of Kafka, fill data lakes, and train machine learning models. I’ll be sharing what I’m finding from some of the more mature API providers when it comes to their investment in event-driven infrastructure, focusing in on Twilio, SendGrid, Stripe, Slack, and GitHub.
As I am profiling APIs for inclusion in my API Stack research, and in the API Gallery, I create an APIs.json, OpenAPI, Postman Collection(s), and sometimes an AsyncAPI definition for each API. All of my API catalogs, and API discovery collections use APIs.json + OpenAPI by default. One of the things I profile in each of my APIs.json, is the usage of webhooks as part of API operations. You can see collections of them that I’ve published to the API Gallery, aggregating many different approaches in what I consider to be the 101 of event-driven architecture, built on top of existing request and response HTTP API infrastructure. Allowing me to better understand how people are doing webhooks, and beginning to sketch out plans for a more event-driven approach to delivering resources, and managing activity on any platform that is scaling.
While studying APIs at this level you begin to see patterns across how providers are doing what they are doing, even amidst a lack of standards for things like webhooks. API providers emulate each other, it is how much of the API space has evolved in the last decade. You see patterns like how leading API providers are defining their event types. Naming, describing, and allowing API consumers to subscribe to a variety of events, and receive webhook pings or pushes of data, as well as other types of notifications. Helping establish a vocabulary for defining the most meaningful events that are occurring across an API platform, and then providing an even-driven framework for subscribing to push data out when something occurs, as well as sustained API connections in the form of server-sent event (SSE), HTTP long polling, and other long running HTTP connections.
As I said, webhooks is the 101 of event-driven technology, and once API providers evolve in their journey you begin to see investment in the 201 level solutions like SSE, WebSub, and more formal approaches to delivering resources as real time streams and publish / subscribe solutions. Then you see platforms begin to mature and evolve into other 301 and beyond courses, with AMQP, Kafka, and often times other Apache Projects. Sure, some API providers begin their journey here, but for many API providers, they are having to ease into the world of event-driven architecture, getting their feet wet with managing their request and response API infrastructure, and slowly evolving with webhooks. Then as API operations harden, mature, and become more easily managed, API providers can confidently begin evolving into using more sophisticated approaches to delivering data where it needs to be, when it is needed.
From what I’ve gathered, the more mature API providers, who are further along in their API journey have invested in some key areas, which has allowed them to continue investing in some other key ways:
- Defined Resources - These API providers have their APIs well defined, with master planned designs for their suite of services, possessing machine readable definitions like OpenAPI, Postman Collections, and AsyncAPI.
- Request / Response - Who have fined tuned their approach to delivering their HTTP based request and response structure, along with their infrastructure being so well defined.
- Known Event Types - Which has resulted in having a handle on what is changing, and what the most important events are for API providers, as well as API consumers.
- Push Technology - Having begun investing in webhooks, and other push technology to make sure their API infrastructure is a two-way street, and they can easily push data out based upon any event.
- Query Language - Understanding the value of investment in a coherent querying strategy for their infrastructure that can work seamlessly with the defining, triggering, and overall management of event driven infrastructure.
- Stream Technology - Having a solid understanding of what data changes most frequently, as well as the topics people are most interested, and augmenting push technology with streaming subscriptions that consumers can tap into.
At this point in most API providers journey, they are successfully operating a full suite of event-driven solutions that can be tapped internally, and externally with partners, and other 3rd party developers. They probably are already investing in Kafka, and other Apache projects, an getting more sophisticated with their event-driven API orchestration. Request and response API infrastructure is well documented with OpenAPI, and groups are looking at event-driven specifications like AsyncAPI to continue to ensure all resources, messages, events, topics, and other moving parts are well defined.
I’ll be showcasing the event-driven approaches of Twilio, SendGrid, Stripe, Slack, and GitHub at the Kong Summit tomorrow. I’ll also be looking at streaming approaches by Twitter, Slack, SalesForce, and Xignite. Which is just the tip of the event-driven API architecture opportunity I’m seeing across the existing API landscape. After mapping out several hundred API providers, and over 30K API paths using OpenAPI, and then augmenting and extending what is possible using AsyncAPI, you begin to see the event-driven opportunity that already exists out there. When you look at how API pioneers are investing in their event-driven approaches, it is easy to get a glimpse at what all API providers will be doing in 3-5 years, once they are further along in their API journey, and have continued to mature their approach to moving their valuable bits an bytes around using the web.
We need your help moving the AsyncAPI specification forward. Ok, first, what is the AsyncAPI specification? “The AsyncAPI Specification is a project used to describe and document Asynchronous APIs. The AsyncAPI Specification defines a set of files required to describe such an API. These files can then be used to create utilities, such as documentation, integration and/or testing tools.” AsyncAPI is a sister specification to OpenAPI, but instead of describing the request and response HTTP API landscape, AsyncAPI is describing the message, topic, event, and streaming API landscape across the HTTP and TCP landscape. It is how we are going to continue to ensure there is machine readable descriptions of this portion of the API landscape, for use in tooling and services.
My friend Fran Mendez (@fmvilas) is the creator and maintainer of the specification, and he is doing way too much of the work on this important specification and he needs our help. Here is Fran’s request for our help to contribute:
AsyncAPI is an open source project that’s currently maintained by me, with no company or funds behind. More and more companies are using AsyncAPI and the work needed is becoming too much work for a single person working in his spare time. E.g., for each release of the specification, tooling and documentation should be updated. One could argue that I should be dedicating full time to the project, but it’s in this point where it’s too much for spare time and very little to get enough money to live. I want to keep everything for free, because I firmly believe that engineering must be democratized. Also, don’t get me wrong, this is not a complaint. I’m going to continue running the project either with or without contributors, because I love it. This is just a call-out to you, the AsyncAPI lover. I’d be very grateful if you could lend a hand, or even raise your hand and become a co-maintainer. Up to you 😊
On the other hand, I only have good words for all of you who use and/or contribute to the project. Without you, it would be just another crazy idea from another crazy developer 😄
Thank you very much! 🙌
– Fran Mendez
When it comes to contributing to the AsyncAPI, Fran has laid out some pretty clear ways in which he needs our help, providing a range of options for you to pitch in and help, depending on what your skills are, and the bandwidth you have in your day.
1. The specification There is always work to do in the spec. It goes from fixing typos to writing and reviewing new proposals. I try to keep releases small, to give time to tooling authors to update their software. If you want to start contributing, take a look at https://github.com/asyncapi/asyncapi/issues, pick one, and start working on it. It’s always a good idea to leave a comment in the issue saying that you’re going to work on it, just so other people know about it.
2. Tooling As developers, this is sometimes the most straightforward way to contribute. Adding features to the existing tools or creating new ones if needed. Examples of tools are:
- Code generators (multiple languages):
- https://github.com/asyncapi/node-codegen (going to be deprecated soon in favor of https://github.com/asyncapi/generator)
- Documentation generators (multiple formats):
- https://github.com/asyncapi/docgen (going to be deprecated soon in favor of https://github.com/asyncapi/generator)
- Validation CLI tool (nobody implemented it yet)
- API mocking (nobody implemented it yet)
- API gateways (nobody implemented it yet)
As always, usually the best way to contribute is to pick an issue and chat about it before you create a pull request.
3. Evangelizing Sometimes the best way to help a project like AsyncAPI is to simply talk about it. It can be inside your company, in a technology meetup or speaking at a conference. I’ll be happy to help with whatever material you need to create or with arguments to convince your colleagues that using AsyncAPI is a good idea 😊
4. Documentation Oh! documentation! We’re trying to convince people that documenting your message-driven APIs is a good idea, but we lack documentation, especially in tooling. This is often a task nobody wants to do, but the best way to get great knowledge about a technology is to write documentation about it. It doesn’t need to be rewriting the whole documentation from scratch, but just identifying the questions you had when started using it and document them.
5. Tutorials We learn by examples. It’s a fact. Write tutorials on how to use AsyncAPI in your blog, Medium, etc. As always, count on me if you need ideas or help while writing or reviewing.
6. Stories You have a blog and write about the technology you use? Writing about success stories, how-to’s, etc., really helps people to find the project and decide whether they should bet on AsyncAPI or not.
7. Podcasts/Videos You have a Youtube channel or your own podcast? Talk about AsyncAPI. Tutorials, interviews, informal chats, discussions, panels, etc. I’ll be happy to help with any material you need or finding the right person for your interview.
I’m going to take the liberty and add an 8th option, because I’m so straightforward when it comes to this game, and I know where Fran needs help.
8. Money AsyncAPI needs investment to help push forward, allowing Fran to carve out time, work on tooling, and pay for travel expenses when it comes to attending events and getting the word out about what it does. There is no legal entity setup for AsyncAPI, but I’m sure with the right partner(s) behind it, we can make something happen. Step up.
AsyncAPI is important. We all need to jump in and help. I’ve been investing as many cycles as I can in helping learn about the specification, and tell stories about why it is important. I’ve been working hard to learn more about it so I can contribute to the roadmap. I’m using it as one of the key definition formats driving my Streamdata.io API Gallery work, which is all driven using APIs.json, OpenAPI, and provides Postman Collections as well as AsyncAPI definitions when a message, topic, event, or streaming API is present. AsyncAPI is where OpenAPI (Swagger) was in 2011/2012, and with more investment, and a couple more years of adoption and maturing, it will be just as important for working with the evolving API landscape as OpenAPI and Postman Collections are.
If you want to get involved with AsyncAPI, feel free to reach out to me. I’m happy to help you get up to speed on why it is so important. I’m happy to help you understand how it can be applied, and where it fits in with your API infrastructure. You are also welcome to just dive in, as Fran has done an amazing job of making sure everything is available in the Github organization for the project, where you can submit pull requests, and issues regarding whatever you are working on and contributing. Thanks for your help in making AsyncAPI evolve, and something that will continue to help us understand, quantify, and communicate about the diverse API landscape.
I am shifting my long running API operations from a PHP / EC2 based implementation to a more efficient Node.js / Lambda based solution, and I promised James Higginbotham (@launchany) that I’d share a breakdown of my process with him a month or so back. I’m running 100+, to bursts of 1000+ long running API requests for a variety of purposes, and it helps me to tell the narrative behind my code, introducing some coherence into the why and how of what I’m doing, while also sharing with others along the way. I had covered my earlier process a little bit in a story a few months ago, but as I was migrating the process, I wanted to further flesh out, and make sure I wasn’t mad.
The base building block of each long running API request I am making is HTTP. The only difference between these API requests, and any others I am making on a daily basis, is that they are long running–I am keeping them alive for seconds, minutes, and historically hours. My previous version of this work ran as long running server side jobs using PHP, which I monitored and kept alive as long as I possibly could. My next generation scripts will have a limit of 5 minutes per API request, because of constraints imposed by Lambda, but I am actually find this to be a positive constraint, and something that will be helping me orchestrate my long running API requests more efficiently–making them work on a schedule, and respond to events.
Ok, so why am I running these API calls? A variety of reasons. I’m monitoring a Github repository, waiting for changes. I’m monitoring someone’s Twitter account, or a specific Tweet, looking for a change, like a follow, favorite, or retweet. Maybe I’m wanting to know when someone asks a new question about Kafka on Stack Overflow, or Reddit. Maybe I’m wanting to understand the change schedule for a financial markets API over the course of a week. No matter the reason, they are all granular level events that are occurring across publicly available APIs that I am using to keep an eye on what is happening across the API sector. Ideally all of these API platforms would have webhook solutions that would allow for me to define and subscribe to specific events that occur via their platform, but they don’t–so I am doing it from the outside-in, augmenting their platform with some externally event-driven architecture.
An essential ingredient in what I am doing is Streamdata.io. Which provides me a way to proxy any existing JSON API, and turn into a long running / streaming API connection using Server-Sent Events (SSE). Another essential ingredient of this is that I can choose to get my responses as JSON PATCH, which only sends me what has changed after the initial API response comes over the pipes. I don’t receive any data, unless something has changed, so I can proxy Github, Twitter, Stack Overflow, Reddit, and other APIs, and tailor my code to just respond to the differential updates I receive with each incremental update. I can PATCH the update to my initial response, but more importantly I can take some action based upon the incremental change, triggering an event, sending a webhook, or any other action I need based upon the change in the API space time continuum I am looking for.
My previous scripts would get deployed individually, and kept alive for as long as I directed the jobs manager. It was kind of a one size fits all approach, however now that I’m using Lambda, each script will run for 5 minutes when triggered, and then I can schedule to run again every 5 minutes–repeating the cycle for as long as I need, based upon what I’m trying to accomplish. However, now I can trigger each long running API request based upon a schedule, or based upon other events I’m defining, leveraging AWS Cloudwatch as the logging mechanism, and AWS Cloudwatch Events as the event-driven layer. I am auto-generating each Node.js Lambda script using OpenAPI definitions for each API, with a separate environment layer driving authentication, and then triggering, running, and scaling the API streams as I need, updating my AWS S3 Lake(s) and AWS RDS databases, and pushing other webhook or notifications as I need.
I am relying heavily on Streamdata.io for the long running / streaming layer on top of any existing JSON API, as well as doing the differential heavy lifting. Every time I trigger a long running API request, I’ll have to do a diff between it’s initial response, and the previous one, but every incremental update for the next 4:59 is handled by Streamdata.io. Then AWS Lambda is doing the rest of the triggering, scaling, logging, scheduling, and event management in a way more efficient way than I was previously with my long running PHP scripts running as background jobs on a Linux EC2 server. It is a significant step up in efficiency and scalability for me, allowing me to layer on an event-driven layer on top of existing 3rd party API infrastructure I am depending on to keep me informed of what is going on, and keep my network of API Evangelist research moving forward.
My friend James Higginbotham (@launchany) was sharing his frustration with being able to stay in tune with changes to a variety of APIs. Like me, James works to stay in tune with a variety of signals available via platforms like Twitter, Github, and other commonly used services. These platforms don’t always properly signal when things are updated, changed, or advanced, making it difficult to understand the granular changes that occur like likes, votes, edits, and other common events that occur via highly active platforms.
This challenge is why the evolution towards a more event-driven approach to operating an API platform is not just more efficient, it gives users what they need. Using event-driven architectural approaches like Webhooks, and real times streams. This is one of the reasons I’m interested in what Streamdata.io does, beyond them helping support me financially, is that they allow me to focus on the event-driven shift that is occurring with many leading API providers, and needs to be brought to the attention of other platforms. Helping API providers be more efficient in what they are doing, while also meeting the needs of the most demanding customers like James and myself.
It is easy to think Streamdata.io is just about streaming real time data. This is definitely a large aspect of what the SaaS solution does, but the approach to using Server-Sent Events (SSE), with incremental updates using JSON Patch adds another useful dimension when it comes to understanding what has changed. You can proxy an existing HTTP API that returns a JSON response using Streamdata.io, and the first response will look just like any other, but every pushed response after that will be a JSON Patch of just what has changed. Doing the heavy lifting of figuring out what has changed in each API response and only sending you the difference, and allowing you to focus only on what has changed, and not having to rely on timestamps, and other signals within the JSON response to understand what the difference is from the previous API response.
Using Streamdata.io you don’t have to keep polling an API asking if things have changed, you just proxy the API and you get pushed changes via an HTTP stream. You also don’t have to sort through each response and try to understand what changed, you just take the JSON Patch response, and it tells you what has changed. I’m going to create a draft blueprint for James of how to do this, that he can use across a variety of APIs to establish multiple API connections using long running, server-side API streams for a variety of topics. Allowing him to monitor many different APIs, and stay in tune with what changes as efficiently as possible. Once I craft a generic blueprint, I’m going to apply to Twitter and see if I can increase the efficiency of my Twitter monitoring, by turning their REST APIs into real time feeds using Streamdata.io.
I’m turning different APIs into topical streams using Streamdata.io. I have been profiling hundreds of different APIs as part of my work to build out the Streamdata.io API Gallery, and as I’m creating OpenAPI definitions for each API, I’m realizing the potential for event and topical driven streams across many existing web APIs. One thing I am doing after profiling each API is that I benchmark them to see how often the data changes, applying what we are calling StreamRank to each API path. Then I try to make sure all the parameters, and even enum values for each parameter are represented for each API definition, helping me see the ENTIRE surface area of an API. Which is something that really illuminates the possibilities surrounding each API.
After profiling the Stack Exchange Questions API, I began to see how much functionality and data is buried within a single API endpoint, and was something I wanted to expose and make much easier to access. Taking a single OpenAPI definition for the Stack Exchange Questions API:
Then exploding it into 25 separate tech topic streaming APIs. Taking the top 25 enum value for the tags parameter for the Stack Overflow site, and exploding into 25 separate streaming API resources. To do this, I’m taking each OpenAPI definition, and generating an AsyncAPI definition to represent each possible stream:
I’m not 100% sure I’m properly generating the AsyncAPI currently, as I’m still learning about how to use the topics and streams collections properly. However, the OpenAPI definition above is meant to represent the web API resource, and the AsyncAPI definition is meant to represent the streaming edition of the same resource. Something that can be replicated for any tag, or any site that is available via the Stack Exchange API. Turning the existing Stack Exchange API into streaming topic APIs, that people can subscribe to only the topics they are interested in receiving updates.
At this point I’m just experimenting with what is possible with OpenAPI and AsyncAPI specifications, and understanding what I can do with some of the existing APIs I am already using each day. I’m going to try and turn this into a prototype, delivering streaming APIs for all 25 of the top Stack Overflow tags. To demonstrate what is possible on Stack Exchange, but also to establish a proof of concept that I can apply to other APIs like Reddit, Github, and others. Then eventually automating the creation of streaming topic APIs using the OpenAPI definitions for common APIs, and the Streamdata.io service.
404: Not Found
I had breakfast with Mike Amundsen (@mamund) and Matt McLarty (@MattMcLartyBC) of the CA API Academy team this morning in midtown this morning. As we were sharing stories of what each other was working on, the topic of what is needed to execute an API call came up. Not the time consuming find an API, sign up for an account, figure out the terms of service and pricing version, but all of this condensed into something that can happen in a split second within applications and systems.
How do we distill down the essential ingredients of API consumption into a single, machine readable unit that can be automated into what Mike Amundsen calls, “find and bind”. This is something I’ve been thinking a lot about lately as I work on my API discovery research, and there are a handful of elements that need to be present:
- Authentication - Having keys to be able to authentication.
- Surface Area - What is the host, base url, path, headers, and parameters for a request.
- Terms of Service - What are the legal terms of service for consumption.
- Pricing - How much does each API request cost me?
We need these elements to be machine readable and easily accessible at discover and runtime. Currently the surface area of the API can be described using OpenAPI, that isn’t a problem. The authentication details can be included in this, but it means you already have to have an application setup, with keys. It doesn’t include new users into the equation, meaning, discovering, registering, and obtaining keys. I have a draft specification I call “API plans” for the pricing portion of it, but it is something that still needs a lot of work. So, in short, we are nowhere near having this layer ready for automation–which we will need to scale all of this API stuff.
This is all stuff I’ve been beating a drum about for years, and I anticipate it is a drum I’ll be beating for a number of more years before we see come into focus. I’m eager to see Mike’s prototype on “find and bind”, because it is the only automated, runtime, discovery, registration, and execute research I’ve come across that isn’t some proprietary magic. I’m going to be investing more cycles into my API plans research, as well as the terms of service stuff I started way back when alongside my API Commons project. Hopefully, moving all this forward another inch or two.
As I prepare to launch the Streamdata.io API Gallery, I am doing a handful of presentations to partners. As part of this process I am looking to distill down the objectives behind the gallery, and the opportunity it delivers to just a handful of talking points I can include in a single slide deck. Of course, as the API Evangelist, the way I do this is by crafting a story here on the blog. To help me frame the conversation, and get to the core of what I needed to present, I wanted to just ask a couple questions, so that I can answer them in my presentation.
What is the Streamdata.io API Gallery? It is a machine readable, continuously deployed collection of OpenAPI definitions, indexed used APIs.json, with a user friendly user interface which allows for the browsing, searching, and filtering of individual APIs that deliver value within specific industries and topical areas.
What are we looking to accomplish with the Streamdata.io API Gallery? Discover and map out interesting and valuable API resources, then quantify what value they bring to the table while also ranking, categorizing, and making them available in a search engine friendly way that allows potential Streamdata.io customers to discover and understand what is possible.
What is the opportunity around the Streamdata.io API Gallery? Identify the best of breed APIs out there, and break down the topics that they deliver within, while also quantifying the publish and subscribe opportunities available–mapping out the event-driven opportunity that has already begun to emerge, while demonstrating Streamdata.io’s role in helping get existing API providers from where they are today, to where they need to be tomorrow.
Why is this relevant to Streamdata.io, and their road map? It provides a wealth of research that Streamdata.io can use to understand the API landscape, and feed it’s own sales and marketing strategy, but doing it in a way that generates valuable search engine and social media exhaust which potential customers might possibly find interesting, bringing them new API consumers, while also opening their eyes up to the event-driven opportunity that exists out there.
Distilling Things Down A Bit More Ok, that answers the general questions about what the Streamdata.io API Gallery is, and why we are building it. Now I want to distill down a little bit more to help me articulate the gallery as part of a series of presentations, existing as just a handful of bullet points. Helping get the point across in hopefully 60 seconds or less.
- What is the Streamdata.io API Gallery?
- API directory, for finding individual units of compute within specific topics.
- OpenAPI (fka Swagger) driven, making each unit of value usable at run-time.
- APIs.json indexed, making the collections of resources easy to search and use.
- Github hosted, making it forkable and continuously deployable and integrate(able).
- Why is the Streamdata.io Gallery relevant?
- It maps out the API universe with an emphasis on the value each individual API path possesses.
- Categories, tags, and indexes APIs into collections which are published to Github.
- Provides a human and machine friendly view of the existing publish and subscribe landscape.
- Begins to organize the API universe in context of a real time event-driven messaging world.
- What is the opportunity around the Streamdata.io API Gallery?
- Redefining the API landscape from an event-driven perspective.
- Quantify, qualify, and rank APIs to understand what is the most interesting and highest quality.
- Help API providers realize events occurring via their existing platforms.
- Begin moving beyond a request and response model to an event-driven reality.
There is definitely a lot more going on within the Streamdata.io API Gallery, but I think this captures the essence of what we are trying to achieve. A lot of what we’ve done is building upon my existing API Stack work, where I have worked to profile and index public APIs using OpenAPI and APIs.json, but this round of work is taking things to a new level. With API Stack I ended up with lists of companies and organizations, each possessing a list of APIs. The Streamdata.io API Gallery is a list of API resources, broken down by the unit of value they bring to the table, which is further defined by whether it is a GET, POST, or PUT–essentially a publish or subscribe opportunity.
Additionally, I am finally finding traction with the API rating system(s) I have been developing for the last five years. Profiling and measuring the companies behind the APIs I’m profiling, and making this knowledge available not just at discover time, but potentially at event and run time. Basically being able to understand the value of an event when it happens in real time, and be able to make programmatic decisions regarding whether we care about the particular event or not. Eventually, allowing us to subscribe only to the events that truly matter to us, and are of the highest value–then tuning out the rest. Delivering API ratings in an increasingly crowded and noisy event-driven API landscape.
We have the prototype for the Streamdata.io API Gallery ready to go. We are still adding APIs, and refining how they are tagged and organized. The rating system is very basic right now, but we will be lighting up different dimensions of the rating(s) algorithm, and hopefully delivering on different angles of how we quantify the value of events that occurring. I’m guessing we will be doing a soft launch in the next couple of weeks to little fanfare, and it will be something that builds, and evolves over time as the API index gets refined and used more heavily.
I am creating a lot of OpenAPI definitions right now. Streamdata.io is investing in me pushing forward my API Stack work, where I profile API using OpenAPI, and index their operations using APIs.json. From the resulting indexes, we are building out the Streamdata.io API Gallery, which shows the possibilities of providing streaming APIs on top of existing web APIs available across the landscape. The OpenAPI definitions I’m creating aren’t 100% complete, but they are “good enough” for what we are needing to do with them, and are allowing me to catalog a variety of interesting APIs, and automate the proxying of them using Streamdata.io.
I’m finding the most important part of doing this work is making sure there is a rich summary, description, and set of tags for each API. While the actual path, parameters, and security definitions are crucial to programmatically executing the API, the summary, description, and tags are essential so that I can understand what the API does, and make it discoverable. As I list out different areas of my API Stack research, like the financial market data APIs, it is critical that I have a title, and description for each provider, but the summary, description, and tags are what provides the heart of the index for what is possible with each API.
When designing an API, as a developer, I tend to just fly through writing summary, descriptions, and tags for my APIs. I’m focused on the technical details, not this “fluff”. However, this represents one of the biggest disconnects in the API lifecycle, where the developer is so absorbed with the technical details, we forget, neglect, or just don’t are to articulate what we are doing to other humans. The summary, description, and tags are the outlines in the API contract we are providing. These details are much more than just the fluff for the API documentation. They actually describe the value being delivered, and allow this value to be communicated, and discovered throughout the life of an API–they are extremely important.
As I’m doing this work, I realize just how important these descriptions and tags are to the future of these APIs. Whenever it makes sense I’m translating these APIs into streaming APIs, and I’m taking the tags I’ve created and using them to define the events, topics, and messages that are being transacted via the API I’m profiling. I’m quantifying how real time these APIs are, and mapping out the meaningful events that are occurring. This represents the event-driven shift we are seeing emerge across the API landscape in 2018. However, I’m doing this on top of API providers who may not be aware of this shift in how the business of APIs is getting done, and are just working hard on their current request / response API strategy. These summaries, descriptions, and tags, represent how we are going to begin mapping out the future that is happening around them, and begin to craft a road map that they can use to understand how they can keep evolving, and remain competitive.
Github is the number one signal in my API world. The activity that occurs via Github is more important than anything I find across Twitter, Facebook, LinkedIn, and other social channels. Commits to repositories and the other social activity that occurs around coding projects is infinitely more valuable, and telling regarding what a company is up to, than the deliberate social media signals blasted out via other channels is. I’m always working to dial in my monitoring of Github using the Github API, but also via the RSS feeds that are present on the public side of the platform.
The simple PHP script just takes an array of Github users, loops through them, pulls their RSS feeds, and then aggregates them into a single array, sorts by date, and then outputs as JSON. It is a pretty crude JSON API, but it provides me with what I need to be able to use these RSS feeds in a variety of other applications. I’m going to be mining the feeds for a variety of signals, including repo and user information, which I can then use within other applications. The best part is this type of data mining doesn’t require a Github API key, and is publicly available, allowing me to scale up much further than I could with the Github API alone.
Next, I have a couple of implementations in mind. I’m going to be creating a Github user leaderboard, where I stream the updates using Streamdata.io to a dashboard. Before I do that, I will have to aggregate users and repos, incrementing each commit made, and publishing as a separate JSON feed. I want to be able to see the raw updates, but also just the most changed repositories, and most active users across different segments of the API space. Streamdata.io allows me to take these JSON feeds and stream them to the dashboard using Server-Sent Events(SSE), and then applying each update using JSON Patch. Making for a pretty efficient way to put Github to work as part of my monitoring of activity across the API space.
Working with Streamdata.io has forced a shift in how I see the API landscape. When I started working with their proxy I simply saw it about doing API in real time. I was hesitant because not every API had real time needs, so I viewed what they do as just a single tool in my API toolbox. While Server-Sent Events, and proxying JSON APIs is just one tool in my toolbox, like the rest of the tools in my toolbox it forces me to think through what an API does, and understand where it exists in the landscape, and where the API provider exists in their API journey. Something I’m hoping the API providers are also doing, but I enjoy doing from the outside-in as well.
Taking any data, content, media, or algorithm and exposing as an API, is a journey. It is about understanding what that resource is, what it does, and what it means to the provider and the consumer. What this looks like day one, will be different from what it looks like day 365 (hopefully). If done right, you are engaging with consumers, and evolving your definition of the resource, and what is possible when you apply it programmatically through the interfaces you provide. API providers who do this right, are leveraging feedback loops in place with consumers, iterating on their APIs, as well as the resources they provide access to, and improving upon them.
Just doing simple web APIs puts you on this journey. As you evolve along this road you will begin to also apply other tools. You might have the need for webhooks to start responding to meaningful events that are beginning to emerge across the API landscape, and start doing the work of defining your event-driven architecture, developing lists of most meaningful topics, and events that are occurring across your evolving API platform. Webhooks provide direct value by pushing data and content to your API consumers, but they have indirect value in helping you define the event structure across your very request and response driven resource landscape. Look at Github webhook events, or Slack webhook events to understand what I mean.
API platforms that have had webhooks in operation for some time have matured significantly towards and event-driven architecture. Streaming APIs isn’t simply a boolean thing. That you have data that needs to be streamed, or you don’t. That is the easy, lazy way of thinking about things. Server-Sent Events (SSE) isn’t just something you need, or you don’t. It is something that you are ready for, or you aren’t. Like webhooks, I’m seeing Server-Sent Events (SSE) as having the direct benefits of delivering data and content as it is updated, to the browser or for other server uses. However, I’m beginning to see the other indirect benefits of SSE, and how it helps define the real time nature of a platform–what is real time? It also helps you think through the size, scope, and efficiency surrounding the use of APIs for making data, content, and algorithms available via the web. Helping us think through how and when we are delivering the bits and bytes we need to get business done.
I’m learning a lot by applying Streamdata.io to simple JSON APIs. It is adding another dimension to the API design, deployment, and management process for me. There has always been an evolutionary aspect of doing APIs for me. This is why you hear me call it the API journey on a regular basis. However, now that I’m studying event-driven architecture, and thinking about how tools like webhooks and SSE assist us in this journey, I’m seeing an entirely new maturity layer for this API journey emerge. It goes beyond just getting to know our resources as part of the API design, and deployment process. It builds upon API management and monitoring and helps us think through how our APIs are being consumed, and what the most meaningful and valuable events are. Helping us think through how we deliver data and content over the web in a more precise manner. It is something that not every API provider will understand right away, and only those a little further along in their journey will be able to take advantage of. The question is, how do we help others see the benefits, and want to do the hard work to get further along in their own API journey.
One aspect of my partnership with Streamdata.io is about helping define what it is that Streamdata.io does–internally, and externally. When I use any API technology I always immerse myself in what it does, and understand every detail regarding the value it delivers, and I work to tell stories about this. This process helps me refine not just how I talk about the products and services, but also helps influence the road map for what the products and services deliver. As I get intimate with what Streamdata.io delivers, I’m beginning to push forward how I talk about the company.
The first thoughts you have when you hear the name Streamdata.io, and learn about how you can proxy any existing JSON API, and begin delivering responses via Server-Sent Events (SSE) and JSON Patch, are all about streaming and real time. While streaming of data from existing APIs is the dominant feature of the service, I’m increasingly finding that the conversations I’m having with clients, and would be clients are more about efficiencies, caching, and streamlining how companies are delivering data. Many API providers I talk to tell me they don’t need real time streaming, but at the same time they have rate limits in place to keep their consumers from polling their APIs too much, increasing friction in API consumption, and not at all about streamlining it.
These experiences are forcing me to shift how I open up conversations with API providers. Making real time and streaming secondary to streamlining how API providers are delivering data to their consumers. Real time streaming using Server-Sent Events (SSE) isn’t always about delivering financial and other data in real time. It is about delivering data using APIs efficiently, making sure only what what has been updated and needed is delivered when it is needed. The right time. This is why you’ll see me increasingly adding (line) to the Stream(line)data.io name, helping focus on the fact that we are helping streamline how companies, organizations, institutions, and government agencies are putting data to work–not just streaming data in real time.
I really enjoy this aspect of getting to know what a specific type of API technology delivers, combined with the storytelling that I engage in. I was feeling intimidated about talking about streaming APIs with providers who clearly didn’t need it. I’m not that kind of technologist. I just can’t do that. I have to be genuine in what I do, or I just can’t do it. So I was pleasantly surprised to find that conversations were quickly becoming about making things more efficient, over actually ever getting to the streaming real time portion of things. It makes what I do much easier, and something I can continue on a day to day basis, across many different industries.
I am working to understand the value that Streamdata.io brings to the table, and one of the tools I am developing is a set of APIs to help me measure the difference in data received for normal API calls versus when they are proxied with Streamdata.io using Server-Sent Events (SSE) and JSON Patch. Creating an API to poll any 3rd party API I plug in is pretty easy and straightforward, but setting up a server setup to operate long running Server-Sent Events (SSE), managing for failure and keeping an eye on the results takes a little more consideration. Doing it browser side is easy, but server side removes the human aspect of the equation, which starts and stops the process.
This post is just meant to just outline what I’m looking to build, and act as a set of project requirements for what I’m going to develop–it isn’t a guide to building it. This is just my way of working through my projects, while also getting content published on the blog ;-). I just need to work out the details of what I will need to run many different Server-Sent Events (SSE) jobs for long periods of time, or even continuously, and make sure nothing breaks, or at least minimize the breakages. Half of my project will be polling hundreds of APIs, while the other half of it will be proxy those same APIs, and making sure I’m receiving those updates continuously.
I will need some basic APIs to operate each event stream I want to operate:
- Register - Register a new API URL I wish to run ongoing stream on.
- Start - Kick off a new stream for any single API I’m tracking on.
- Stop - Stop a stream from running for any single API I have streaming.
Any API I deem worthy, and have successfully proxied with Streamdata.io will be registered, and operating as a long running background scripts via AWS EC2 instances I have deployed. This is the straightforward part of things. Next, I will need some APIs to monitor these long running scripts, to make sure they are doing what they should be doing.
- Status - Check the status of a long running script to make sure it is still running and doing what it is supposed to do.
- Logs - View the logs of an event that has been running to see each time it has executed, and what the request and response were.
- Notify - Adding a notification API to send a ping to either myself, or someone else response for a long running script to investigate further.
I’m think that set of APIs should give me what I need to run these long running jobs. Each API will be executing command scripts that run in the background on Linux instances. Then I’m going to need a similar set of services to asses the payload, cache, and real time status of each API, keeping in line with my efforts to break down the value of real time APIs.
- Size - A service that processes each partial API response in the bucket and calculates the size of the response. If nothing changed, there was no JSON Patch response.
- Change - A service that determines if a partial API response has changed from the previous response from 60 seconds before, identifying the frequency of change. If nothing changed, there was no JSON Patch response.
I have three goals with long running script microservice. 1) Monitor the real time dimensions of a variety of APIs over time. 2) Understand the efficiencies gained with caching and streaming over polling APIs, and 3) Potentially store the results on Amazon S3, which I will write about in a separate post. I will build an application for each of these purposes on top of these APIs, keeping the microservice doing one thing–processing long run scripts that receive Server-Sent Events (SSE) deliver via Streamdata.io proxies I’ve sent for APIs I’ve targeted.
Next, I am going to get to work programming this service. I have a proof of concept in place that will run the long running scripts. I just need to shape it into a set of APIs that allow me to program against the scripts, and deliver these different use case applications I’m envisioning. Once I have done, I will run for a few months in beta, but then probably open it up as a Server-Sent (SSE) events as a service, that allows anyone to execute long running scripts on the server side. Others may not be interested in measuring the performance gains, but I am guessing they will be interested in storing the streams of response.
Everyone wants their API to be used. We all suffer from “if we build it, they will come” syndrome in the world of APIs. If we craft a simple, useful API, developers will flock to it and integrate it into their applications. However, if you operate an API, you know that getting the attention of developers, and standing out amongst the growing number of APIs is easier said than done. Even if your API truly does bring the value you envision to the table, getting people to discover this value, and invest the time into integrating it into the platforms, products, and services takes a significant amount of work–requiring that you remove all possible obstacles and friction from any possible integration opportunity.
One way we can remove obstacles for possible integrations is by allowing for ALL types of applications–even other APIs. If you think about it, APIs are just another type of application, and one that many API providers I’ve talked with either haven’t thought about at all, or haven’t thought about very deeply and restrict this use case, as they see it as directly competing with their interests. Why would you want to prevent someone from reselling your API, if it brings you traffic, sales, and the other value your API brings to your company, organization, institution, or government agency? If a potential API consumer has an audience, and wants to private label your API, how does that hurt your business? If you have proper API management in place, and have a partner agreement in place with them, how is it different than any other application?
I’ve been profiling companies as part of my partnership with Streamdata.io, looking for opportunities to deliver real time streaming APIs on top of existing web APIs. Ideally, API providers become a Streamdata.io customer, but we are also looking to enable other businesses to step up and resell existing APIs as a streaming version. However, in some of the conversations I’m having, people are concerned about whether or not API provider’s terms of service will allow this. These developers are worried that revenue generation through the reselling of an existing API as something that would ruffle the feathers of their API provide, and result in getting their API keys turned off. Which is a completely valid concern, and something that is spelled out in some terms of service, but I’d say is often left more as an unknown, resulting in this type of apprehension from developers.
Reselling APIs is something I’m exploring more as part of my API partner research. Which APIs encourage reselling, white and private labeling, and OEM partnerships? Which APIs forbid the reselling of their API? As well as which APIs have not discussed it all. I’d love to hear your thoughts as an API provider, or someone who is selling their services to the API space. What are your thoughts on reselling your API, and have you had conversations with potential providers on this subject. I am going to explore this with Streamdat.io, APIMATIC, and other companies I already work with, as well as reach out to some API providers I’d like to resell as a streaming API using Streadmata.io and see what they say. It’s an interesting conversation, which I think we’ll see more discussion around in 2018.
The Metropolitan Transportation Authority (MTA) Bus Time API Supports Service Interface for Real Time Information (SIRI)11 Jan 2018
The General Transit Feed Specification (GTFS) format for providing access to transit data has dominated the landscape for most of the time I have been researching transit data and APIs over the last couple of weeks. A dominance led by Google and their Google Maps, who is the primary entity behind GTFS. However the tech team at Streamdata.io brought it to my attention the other day that the Metropolitan Transportation Authority (MTA) Bus Time API Supports Service Interface for Real Time Information (SIRI), another standard out of Europe. I think MTA’s introduction to Siri, and the story behind their decision tells a significant tale about how standards are viewed.
According to the MTA, “SIRI (Service Interface for Real Time Information) is a standard covering a wide range of types of real-time information for public transportation. This standard has been adopted by the European standards-setting body CEN, and is not owned by any one vendor, public transportation agency or operator. It has been implemented in a growing number of different projects by vendors and agencies around Europe.” I feel like their thoughts about SIRI not being owned by any one vendor is an important thing to take note of. While GTFS is an open standard, it is clearly a Google-led effort, and I’d say their decision to use Protocol Buffers reflects the technology, business, and politics of Google’s vision for the transit sector.
The MTA has evolved SIRI as part of their adoption, opting to deliver APIs as more of a RESTful interface as opposed to SOAP, and providing responses in JSON, which makes things much more accessible to a wider audience. While technologically sound decisions, I think using Protocol Buffers or even SOAP have political implications when you do not deeply consider your API consumers during the planning phases of your API. I feel like MTA has done this, and understands the need to lower the bar when it comes to the access of public transit data, ensuring that as of an audience as possible can put the real time transit data to use–web APIs, plus JSON, just equals an easier interface to work with for many developers.
I’m getting up to speed with GTFS and GTFS Realtime, and I am also getting intimate with SIRI, and learning how to translate from GTFS into SIRI. I’m looking to lower the bar when it comes to accessing real time transit data. Something simple web APIs excel at. I’ve been able to pull GTFS and GTFS Realtime data and convert into simpler JSON. Now that MTA has introduced me to SIRI, I’m going to get acquainted with this specification, and understand how I can translate GTFS into SIRI, and then stream using Server-Sent Events (SSE) and JSON Patch using Streamdata.io. Truly making these feeds available in real time, using common web standards.
I’ve been diving into the world of transit data, and learning more about GTFS and GTFS Realtime, two of the leading specifications for providing access to static and real time transit data. I’ve been able to take the static GTFS data and quickly render as APIs, using the zipped up CSV files provided. Next on my list I wanted to be able to work with GTFS Realtime data, as this is where the data is that changes much more often, and ultimately is more valuable in applications and to consumers.
The GTFS-realtime data is encoded and decoded using Protocol Buffers, which provides a compact binary representation designed for fast and efficient processing of the data. Even with the usage of Protocol Buffers, which is also used by gRPC via HTTP/2, all of the GFTS Realtime data feeds I am consuming are being delivered via regular HTTP/1.1. I’m doing all this work to be able to make GTFS Realtime feeds more accessible for use by Streamdata.io, as the Protocol Buffers isn’t something the service currently supports. To make the data accessible for delivery via Server-Sent Events (SSE), and for partial updates to be delivered via JSON Patch, I need the Protocol Buffer format to be reduced to a simpler JSON format–which will be my next weeks worth of work on this project.
I was able to pretty quickly bind to the MTA subway GTFS Realtime feed here in NYC using the PHP bindings, and get at up to date “vehicle” and “alerts” via the transit authorities feeds. I’ve just dumped the data to the screen in no particular format, but was able to prove that I am able to connect to any GTFS feed, and easily convert to something I can translate into any format I desire. I’m opting to go with the Service Interface for Real Time Information (SIRI), which is more verbose than GTFS, but allows for availability in a JSON format. Now I just need to get more acquainted with the SIRI standard, and understands how it maps to the GTFS format.
I’m looking to have a solid approach to proxying an GTFS, and GTFS Realtime feed, and deploying as a SIRI compliant API that returns to JSON in coming weeks, so that I can quickly proxy using Streamata.io and deliver updates in true real time. Where transit vehicles are located at any particular moment, and details about alerts coming out of each transit authority are the most relevant, and real time aspect of transit operations. While the GTFS Realtime format is real time in name, it really isn’t in how its delivered. You still have to poll the feeds for changes, which is a burden on both the client and server, making Server-Sent Events, and JSON Patch a much more desirable, and cost effective way to get the job done.
I’ve been studying the overall Apache Stack a lot lately, with an emphasis on Kafka. I’m trying to understand what the future of APIs will hold, and where the leading edge of real time, event-driven architecture is at these days. I’m going through the protocol page for Kafka, learning about exactly how they move data around, and found their answers behind the decisions they’ve made along the way in deciding what protocols they chose to use were very interesting.
All the way at the bottom of the Kafka protocol page you can find the following “Some Common Philosophical Questions”, providing some interesting backstory on the decisions behind the very popular platform.
Some people have asked why we don’t use HTTP. There are a number of reasons, the best is that client implementors can make use of some of the more advanced TCP features–the ability to multiplex requests, the ability to simultaneously poll many connections, etc. We have also found HTTP libraries in many languages to be surprisingly shabby.
Others have asked if maybe we shouldn’t support many different protocols. Prior experience with this was that it makes it very hard to add and test new features if they have to be ported across many protocol implementations. Our feeling is that most users don’t really see multiple protocols as a feature, they just want a good reliable client in the language of their choice.
Another question is why we don’t adopt XMPP, STOMP, AMQP or an existing protocol. The answer to this varies by protocol, but in general the problem is that the protocol does determine large parts of the implementation and we couldn’t do what we are doing if we didn’t have control over the protocol. Our belief is that it is possible to do better than existing messaging systems have in providing a truly distributed messaging system, and to do this we need to build something that works differently.
A final question is why we don’t use a system like Protocol Buffers or Thrift to define our request messages. These packages excel at helping you to managing lots and lots of serialized messages. However we have only a few messages. Support across languages is somewhat spotty (depending on the package). Finally the mapping between binary log format and wire protocol is something we manage somewhat carefully and this would not be possible with these systems. Finally we prefer the style of versioning APIs explicitly and checking this to inferring new values as nulls as it allows more nuanced control of compatibility.
It paints an interesting story about the team, technology, and I think the other directions the API sector is taking, when it comes to which protocols they are using. I don’t know enough about how Kafka works to take any stance on their decisions. I’m just looking to just take a snapshot of their stance, so that I can come back to it at some point in the future, when I do.
I published my diverse toolbox diagram a couple weeks back, which includes Kafka. As I continue to develop my understanding of the Apache Stack, and Kafka, I will further dial-in the story that my API toolbox visual tells. The answers above further muddy the water for me about where it fits into the bigger picture, but I’m hoping it is something that will clear up with more awarness of what Kafka delivers.
I spend a lot of time trying to understand and define what is API. With my new partnership with Streamdata.io I’m pushing that work into understanding APIs in real time, and as part of event-driven architecture. As the Streamdata.io team and I work to identify interesting APIs out there that would benefit from streaming using the service, a picture of the real time nature of API platforms begins to emerge. I’m beginning to see all of this as a maturity aspect of API platforms, and those who are further along in their journey, have a better understanding the meaningful events that are occurring via their operations.
As part of this research I’ve been studying the Stripe API, looking for aspects of the platform that you could make more real time, and streaming. Immediately I come across the Stripe Events API, which is a “way of letting you know when something interesting happens in your account”. Using the Stripe Events API, “you can retrieve an individual event or a list of events from the API. We also have a separate system for sending the event objects directly to an endpoint on your server, called webhooks.” This is the heartbeat of the Stripe platform, and represents the “events” that API providers and consumers want to know about, and understand across platform usage.
I think about the awareness API management has brought to the table in the form of metrics and analytics. Then I consider the blueprint the more mature platforms like Stripe have established when it comes to codifying this awareness in a way that can be accessed via API, and begin to make more real time, or at least asynchronous using webhooks. Then I think about what Streamdata.io provides with Server-Sent Events, and JSON Patch, providing a stream of these meaningful events in real time, as soon as these events happen–no polling necessary. This is what I find interesting about what they do, and why I’ve signed up to partner with them. Well, that combined with them supporting me financially. ;-)
Even before working with Streamdata.io, I have been studying the value of API driven events, and working to identify the API platforms who are mature enough to be mapping this value exchange out. This has led to me to desire a better understanding of event-driven architecture, not necessarily because it is the next thing with APIs, but because there is some substance in there. There is a reason why events matter. They represent the meaningful exchanges that are occurring via platforms. The valuable ones. The percentage of transactions we should be tuning into. I want to better understand this realm, and continue collecting a wealth of blueprints regarding how companies like Stripe are maximizing these exchanges.
Disclosure: In case it isn’t clear, Streamdata.io is my primary partner, paying me to do research in this area, and helping support me as the API Evangelist.
404: Not Found
When I first started diving into what Streamdata.io does, and thinking of their role in the wider API landscape, I was pretty exclusively focused API providers. Meaning, if you are an API provider, depending on the resources you are serving up, you should consider augmenting it with a real time stream using Streamdata.io. This still holds true, but after using Streamdata.io more as a developer, it is becoming clear of Streamdata.io’s value in my toolbox as an API consumer, and thinking about how I can make my applications more efficient, real time, and event-driven.
Right now, I’m just taking a wide variety of existing web APIs and running through the Streamdata.io proxy, and seeing what comes out the other end. I’m in the phase where I’m just understanding what Server-Sent Events (SSE) combined with JSON Patch does to existing web APIs, and their resources. This process is helping me understand the possibilities with streaming existing web APIs, but as I fire up each API I’m seeing it also reveal a new layer of events that exist in between providing APIs, and consuming APIs. I feel like this layer isn’t always evident to API providers, who haven’t made it very far in their API journey.
While I study how the bleeding, and leading edge developers are deploying event-driven architecture, mining for the event value that exists within big data, I’m thinking there is also a pretty interesting opportunity in mining the event layer for existing web APIs. Once I turn on streaming for a web API, the immediate value you see is when a new resource is added. However, this really isn’t that amazing beyond just subscribing to a webhook, or polling an API. I feel like the valuable events we don’t fully see without Server-Sent Events (SSE) is the changes. When a price changes. When a link is modified. When content is refreshed. The subtle events that occur that might not be noticed in regular operations.
I’ve had this conversation with Nicolas Rigaud, the VP Marketing & Partners for Streamdata.io several times recently. That there is unrealized value in these changes to any system. The more they are known, recognized, and responded to, the more value they will possess. I feel like this is potentially the value that is driving the wider event-driven architecture movement at the moment. Understanding the subtle, but important changes that exist across systems and the data that is generated. Not just individual events, but also aggregate events at scale, which equal something much, much bigger. While I feel like “hoovering” up all the data you can find, and dialing in Kafka, or some other event-driven, streaming solution, is how you mine this value at scale, I think there is an equally great opportunity to tune into web APIs, and the unrealized events that happen via everyday platforms.
I’m working on a target list of around 100 APIs to proxy with Streamdata.io so that I can get a handle on the types of events that are occurring within some of the most used APIs out there. I’m guessing that the API providers who have the resources and skills on staff are already jumping at this opportunity, but I’m guessing there are many other APIs that have a significant amount of untapped potential for defining the event layer. This is where I see the potential for Streamdata.io as a tool in the hands of API consumers, and the average developer. To step in from an external vantage point and identify the most meaningful events that are occurring, and make them accessible to other systems, and within applications. Depending on the industry, I’m guessing there will become a growing number of monetization opportunities to emerge from these newfound events as we discover them in the real time streams.
I’m continuing to break down the technology stack as I get to know my new partner Streamdata.io. Yesterday I wrote about their use of JSON Patch for returning partial responses of changes made to an API that has been proxied through the service, and today I want to focus on understanding Server-Sent Events (SSE), which Streamdata.io uses to stream those events in real time to any consumer. In my experience, SSE is a lesser known of the real time technologies out there, but is one that holds a lot of potential, so I wanted to spend some time covering it here on the blog.
As opposed to technology that delivers a two-way stream, Server-sent events (SSE) is all about a client receiving automatic updates from a server via HTTP connection. The technology is a standard, with the Server-sent events (SSE) EventSource API being standardized as part of the HTML5 specification out of the W3C. Similar to Streamdata.io’s usage of JSON Patch, SSE is all about efficiency. Making web APIs real time isn’t always about having a two-way connection, and SEE is a great way to make things streaming in a one-way direction, only sending you what has changed in real-time using JSON Patch. Efficiency in direction, delivery, and in message.
To help me validate this theory I will keep playing with Streamdata.io and proxying any API I can get my hand on to see what is possible when you replace basic web API requests and responses with Server-sent events (SSE), and begin streaming only what changes after that initial request. I’m guessing that a whole new world of events will begin to emerge, allowing us to think about look at common web API resources differently. I feel like there is a lot of opportunity in deploying real-time, event-driven solutions like Kafka, and other Apache solutions, but I feel like there will be even more opportunity when it comes to getting intimate with the events that are already occurring across existing web APIs, even if the providers are fully tuned into what is going on, or have the resources to tackle event-driven architecture yet.
Disclosure: Streamdata.io is an API Evangelist partner.
I am playing with Streamdata.io as I learn how to use my new partner’s service. Streamdata.io proxies any API, and uses Server-Sent Event (SSE) to push updates using JSON Patch. I am playing with making a variety of APIs real time using their service, and in my style, I wanted to share the story of what I’m working on, here on the blog. I was making updates to some data in a Google Sheet that I use to drive some data across a couple of my websites, and thought…can I make this spreadsheet streaming using Streamdata.io? Yes. Yes, I can.
To test out my theory I went and created a basic Google Sheet with two columns, one for product name, and one for price. Simulating a potential product pricing list that maybe I’d want to stream across multiple website, or possibly within client and partner portals. Then I published the Google Sheet to the web, making the data publicly available, so I didn’t have to deal with any sort of authentication–something you will only want to do with publicly available data. I’ll play around with an authenticated edition at some point in the future, showing more secure examples.
Once I made the sheet public I grabbed the unique key for the sheet, which you can find in the URL, and placed into this URL: https://spreadsheets.google.com/feeds/list/[sheet key]/od6/public/basic?alt=json. The Google Sheet key takes a little bit to identify in the URL, but it is the long GUID in the URL, which is the longest part of the URL when editing the sheet. Once you put the key in the URL, you can take the URL and paste in the browser–giving you a JSON representation of your sheet, instead of HTML, basically giving you a public API for your Google Sheet. The JSON for Google Sheets is a little verbose and complicated, but once you study a bit it doesn’t take long for it to come into focus, showing eaching of the columns and rows.
Next, I created a Streamdata.io account, verified my email, logged in and created a new app. Something that took me about 2 minutes. I take the new URL for my Google Sheet and publish as the target URL in my Streamdata.io account. The UI then generates a curl statement for calling the API through the Streamdata.io proxy. Before it will work, you will have to replace the second question mark with an ampersand (&), as Streamdata.io assumes you do not have any parameters in the URL. Once replaced, you can open up your command line, paste in the command and run. Using Server-Sent Event (SSE) you’ll see the script running, checking for changes. When you make any changes to your Google Sheet, you will see a JSON Patch response returned with any changes in real time. Providing a real-time stream of your Google Sheet which can be displayed in any application.
Disclosure: Streamdata.io is an API Evangelist partner, and sponsors this site.
Even before I engaged with Streamdata.io on our current partnership, I was working with them to quantity the value they bring to the table with their service. As I was working on my story regarding the roubling terms of service changes From Washington Metropolitan Area Transit Authority (WMATA) data APIs, the Streamdata.io team was running a cost savings analysis on the WMATA APIs. This is where they take their web API, and see what they could save if they used Streamdata.io, and turned it into a streaming API.
The Streamdata.io team took the WMATA Real-Time Bus PredictionsAPI, and assessed the efficiency gains for WMATA when it comes to their most demanding API consumers. Here are the bandwidth and CPU savings:
- Client Bandwidth (BW) Savings - 88%
- Server Bandwidth (BW) Savings - 99%
- Server CPU Savings - 87%
Streamdata.io does this by being stood up in front of the WMATA web API and caching the results, then only showing changes to clients–in real-time. This isn’t just about making something real-time, it is about reducing the number of times API consumers need to be polling an API. When it comes to transit data you can imagine that the client is probably polling every second to see what has changed.
I’m learning about Streamdata.io’s process for calculating these savings, which is why I’m writing this story. I’m going to work to help apply this to many other APIs, as well as look at productizing the tool so that maybe it can become a self-service tool for other API providers to evaluate their own cost savings, if they went to a real-time way of doing things. To help me understand the savings beyond WMATA, I’m going to be doing benchmarks across all the other US transit provides, and see what kind of numbers I can generate.
Disclosure: Streamdata.io is API Evangelists sole partner.
I’ve been learning about a new API definition format called AsyncAPI that allows you to define message-driven APIs in a machine-readable format. It is protocol-agnostic, which means you can use it for APIs that work over MQTT, AMQP, WebSockets, STOMP, and other real-time, and Internet of Things focused APIs. The specification format mirrors OpenAPI, making it pretty easy to get up to speed understanding what is going on.
There are two primary concepts at play with the AsyncAPI:
- Messages - Consumer(s) communicate with your API via messages. A message is a piece of information two or more programs exchange. Most of the times to notify the other end(s) that, either an event has occurred or you want to trigger a command. Technically speaking the events and actions will always be sent in the same way. These are just messages, and their content can be anything. So when we talk about the difference between events and actions, this is only a semantic differentiation of message’s content. We do not enforce you to make any difference between them, although we encourage you to do it. A message can contain headers and a payload. However, both are optional. The specification allows you to define any header, to remain as much protocol-agnostic as possible.
- Topics - Message-driven protocols usually contain something called topic (MQTT), routing key (AMQP), destination (STOMP), etc. To some extent, they can compare to URLs in HTTP APIs. So, when you send a message to your API, it will be routed depending on the topic you published on. This feature allows you to create APIs that subscribe to specific topics and publish to other ones. There’s no standard way of naming topics, so we recommend you to have a look at our proposal here.
I don’t have any APIs I can apply AsyncAPI to, so I have to just learn from the examples and any other work I come across. It makes me happy to see folks developing API specifications like this, going beyond what OpenAPI is doing, but also keeping so closely in alignment with the existing work out of the OAI. I’m always hearing folks say that the OpenAPI specification doesn’t do what they want it to do, yet they don’t invest in vendor extensions, or even augment the work that is going on with a complimentary set of specifications. Good to see people just make it happen!
I’m adding AsyncAPI to my API definition research so I can keep in tune with where it goes. I’m talking with some folks regarding how it should viewed by the OAI. In my opinion, the OAI is going to have to begin considering how it will embrace specs that go beyond what it can do, as well as begin to adopt industry specific OAI implementations that may require acknwoledging some vendor extensions that may never get brought into the core specification. Anyways, it’s good to see movement in this area. Nice work Fran, Bruno, and Mike–you guys are rocking it.
I’m regularly spending time test driving new technologies, and staying in tune with where the API space is headed. This involves understanding the latest tools in the toolbox like GraphQL, Kafka, gRPC, and others. While reviewing each fork in the road, I’m trying to understand where things might be headed, the viability of taking this path, and whether or not I should be helping convince folks they might want to consider walking down any particular path. I can’t predict the future, but after seven years of studying APIs, and a thirty year career in technology, I have some experience in trying to reading the tea leaves. My goal isn’t to convince people of the latest trend, it is to try and help them make sensible decision about how technology might work for them, or just become the next layer of technical debt.
As the API Evangelist I am a storyteller in the tech sector, so names, brands, and the words we use are important to me. Not just for helping people understand what is going on, but actually setting the tone for the conversation. While studying a product, service, or tool I try and do some research into its origins, and understand how it came to be. I was doing this research with Kafka, and find myself confounded regarding why the platform creator(s) chose such a name, especially for a piece of messaging and queuing technology. I’m not up to speed on all of Franz Kafka’s work, but I am aware of the way his name is used, which according to the Merriam Webster dictionary (which I’m not going to link to cause their page keeps crashing in my browser):
of, relating to, or suggestive of Franz Kafka or his writings; especially : having a nightmarishly complex, bizarre, or illogical quality ~ Kafkaesque bureaucratic delays
Yeah, plug that shit into my company right now! I know that engineers aren’t good at thinking about marketing, and the branding around their tools, but c’mon? People are saying Kafka is the next evolution in the world of APIs, as we enter into a more real-time, evented realm of moving our bitz and bytes around the web. In all fairness, according to the Wikipedia entry, the history of Kafka says:
Apache Kafka was originally developed by LinkedIn, and was subsequently open sourced in early 2011. Graduation from the Apache Incubator occurred on 23 October 2012. In November 2014, several engineers who worked on Kafka at LinkedIn created a new company named Confluent with a focus on Kafka. According to a Quora post from 2014, Jay Kreps seems to have named it after the author Franz Kafka. Kreps chose to name the system after an author because it is “a system optimized for writing”, and he liked Kafka’s work.
I’m guessing Jay Kreps didn’t really think about the branding and marketing implications of his tool being Kafkaesque, and how people who are new to the product would interpret it. I’m guessing in engineering circles this won’t be a problem, but that really gets at my concern here. I’m looking at Kafka as one possible path I might want to help my readers consider for evolving their technology platform. Aside from the name implying “having a nightmarishly complex, bizarre, or illogical quality” or “bureaucratic delays”, I’m concerned about the technical complexity of Kafka over the simplicity of web APIs, which have contributed significantly to their success–despite the opinions of many engineers.
I regularly battle with tech folks about why web APIs have worked (or have not worked). Simplicity is in the top five reasons why it is working, and has received more traction than it’s predecessors like SOAP ever did. I have to consider these aspects of technological adoption as we look at whats next, and frankly I get why GraphQL, Kafka, and gRPC works for engineers, and technically minded folks. I’m not convinced that I can sell these solutions to the masses. Ok, I know that many of you think this shouldn’t be how things get done, and that these aspects of doing things should remain in the hands of the technically elite, but that is your hand-crafted world, and belief system. It doesn’t reflect how business gets done across all of the companies, organizations, institutions, and government agencies I’m talking to on a regular basis.
As we make decisions around what is next for the tech space we have to ask ourselves, do we want things to be “nightmarishly complex, bizarre, or illogical quality”? I’m not saying Kafka is this. I am just saying it doesn’t seem as simple as web APIs, and it is something that could contribute signficiantly to technical debt, and technology induced “bureaucratic delays”. I know this stuff might make sense in your head, but I spend my days studying technology and how people adopt or do not adopt it. I’m just saying we aren’t putting as much thought into how this stuff will pencil out at a larger scale…or, maybe we have, and the objective isn’t simplicity, and everyone being able to understanding what is going on, and keeping things within the control of an elite few, who understand how it all works.
My guess, is that we aren’t doing much in the way of deep thinking around why we name things the way we do, why we build technology the way we do, and why we adopt each wave of technology. I’m increasingly convinced that simple web APIs were purely an accident, and something that will be forgotten pretty soon.
If you think there is a link I should have listed here feel free to tweet it at me, or submit as a Github issue. Even though I do this full time, I'm still a one person show, and I miss quite a bit, and depend on my network to help me know what is going on.