These are the news items I've curated in my monitoring of the API space that have some relevance to the API definition conversation and I wanted to include in my research. I'm using all of these links to better understand how the space is testing their APIs, going beyond just monitoring and understand the details of each request and response.03 Oct 2017
I’m spending time investing in my data, as well as my database API research. I’ll have guides, with accompanying stories coming out over the next couple weeks, but I want to take a moment to publish some of the raw research that I think paints an interesting picture about where things are headed.
When studying what is going on with data and APIs you can’t do any search without stumbling across an Apache project doing something or other with data. I found 37 separate projects at Apache that were data related, and wanted to publish as a single list I could learn from.
- Airvata** - Apache Airavata is a micro-service architecture based software framework for executing and managing computational jobs and workflows on distributed computing resources including local clusters, supercomputers, national grids, academic and commercial clouds. Airavata is dominantly used to build Web-based science gateways and assist to compose, manage, execute, and monitor large scale applications (wrapped as Web services) and workflows composed of these services.
- Ambari - Apache Ambari makes Hadoop cluster provisioning, managing, and monitoring dead simple.
- Apex - Apache Apex is a unified platform for big data stream and batch processing. Use cases include ingestion, ETL, real-time analytics, alerts and real-time actions. Apex is a Hadoop-native YARN implementation and uses HDFS by default. It simplifies development and productization of Hadoop applications by reducing time to market. Key features include Enterprise Grade Operability with Fault Tolerance, State Management, Event Processing Guarantees, No Data Loss, In-memory Performance & Scalability and Native Window Support.
- Avro - Apache Avro is a data serialization system.
- Beam - Apache Beam is a unified programming model for both batch and streaming data processing, enabling efficient execution across diverse distributed execution engines and providing extensibility points for connecting to different technologies and user communities.
- Bigtop - Bigtop is a project for the development of packaging and tests of the Apache Hadoop ecosystem. The primary goal of Bigtop is to build a community around the packaging and interoperability testing of Hadoop-related projects. This includes testing at various levels (packaging, platform, runtime, upgrade, etc…) developed by a community with a focus on the system as a whole, rather than individual projects. In short we strive to be for Hadoop what Debian is to Linux.
- BookKeeper - BookKeeper is a reliable replicated log service. It can be used to turn any standalone service into a highly available replicated service. BookKeeper is highly available (no single point of failure), and scales horizontally as more storage nodes are added.
- Calcite - Calcite is a framework for writing data management systems. It converts queries, represented in relational algebra, into an efficient executable form using pluggable query transformation rules. There is an optional SQL parser and JDBC driver. Calcite does not store data or have a preferred execution engine. Data formats, execution algorithms, planning rules, operator types, metadata, and cost model are added at runtime as plugins.
- Crunch - The Apache Crunch Java library provides a framework for writing, testing, and running MapReduce pipelines. Its goal is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run.
- DataFu - Apache DataFu consists of two libraries: Apache DataFu Pig is a collection of useful user-defined functions for data analysis in Apache Pig. Apache DataFu Hourglass is a library for incrementally processing data using Apache Hadoop MapReduce. This library was inspired by the prevalence of sliding window computations over daily tracking data. Computations such as these typically happen at regular intervals (e.g. daily, weekly), and therefore the sliding nature of the computations means that much of the work is unnecessarily repeated. DataFu’s Hourglass was created to make these computations more efficient, yielding sometimes 50-95% reductions in computational resources.
- Drill - Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google’s Dremel.
- Edgent - Apache Edgent is a programming model and micro-kernel style runtime that can be embedded in gateways and small footprint edge devices enabling local, real-time, analytics on the continuous streams of data coming from equipment, vehicles, systems, appliances, devices and sensors of all kinds (for example, Raspberry Pis or smart phones). Working in conjunction with centralized analytic systems, Apache Edgent provides efficient and timely analytics across the whole IoT ecosystem: from the center to the edge.
- Falcon - Apache Falcon is a data processing and management solution for Hadoop designed for data motion, coordination of data pipelines, lifecycle management, and data discovery. Falcon enables end consumers to quickly onboard their data and its associated processing and management tasks on Hadoop clusters.
- Flink - Flink is an open source system for expressive, declarative, fast, and efficient data analysis. It combines the scalability and programming flexibility of distributed MapReduce-like platforms with the efficiency, out-of-core execution, and query optimization capabilities found in parallel databases.
- Flume - Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store
- Giraph - Apache Giraph is an iterative graph processing system built for high scalability. For example, it is currently used at Facebook to analyze the social graph formed by users and their connections.
- Hama - The Apache Hama is an efficient and scalable general-purpose BSP computing engine which can be used to speed up a large variety of compute-intensive analytics applications.
- Helix - Apache Helix is a generic cluster management framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. Helix automates reassignment of resources in the face of node failure and recovery, cluster expansion, and reconfiguration.
- Ignite - Apache Ignite In-Memory Data Fabric is designed to deliver uncompromised performance for a wide set of in-memory computing use cases from high performance computing, to the industry most advanced data grid, in-memory SQL, in-memory file system, streaming, and more.
- Kafka - A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients. Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of co-ordinated consumers. Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees. Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact.
- Knox - The Apache Knox Gateway is a REST API Gateway for interacting with Hadoop clusters. The Knox Gateway provides a single access point for all REST interactions with Hadoop clusters. In this capacity, the Knox Gateway is able to provide valuable functionality to aid in the control, integration, monitoring and automation of critical administrative and analytical needs of the enterprise.
- Lens - Lens provides an Unified Analytics interface. Lens aims to cut the Data Analytics silos by providing a single view of data across multiple tiered data stores and optimal execution environment for the analytical query. It seamlessly integrates Hadoop with traditional data warehouses to appear like one.
- MetaModel - With MetaModel you get a uniform connector and query API to many very different datastore types, including: Relational (JDBC) databases, CSV files, Excel spreadsheets, XML files, JSON files, Fixed width files, MongoDB, Apache CouchDB, Apache HBase, Apache Cassandra, ElasticSearch, OpenOffice.org databases, Salesforce.com, SugarCRM and even collections of plain old Java objects (POJOs). MetaModel isn’t a data mapping framework. Instead we emphasize abstraction of metadata and ability to add data sources at runtime, making MetaModel great for generic data processing applications, less so for applications modeled around a particular domain.
- Oozie - Oozie is a workflow scheduler system to manage Apache Hadoop jobs. Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).
- ORC - ORC is a self-describing type-aware columnar file format designed for Hadoop workloads. It is optimized for large streaming reads, but with integrated support for finding required rows quickly. Storing data in a columnar format lets the reader read, decompress, and process only the values that are required for the current query.
- Parquet - Apache Parquet is a general-purpose columnar storage format, built for Hadoop, usable with any choice of data processing framework, data model, or programming language.
- Phoenix - Apache Phoenix enables OLTP and operational analytics for Apache Hadoop by providing a relational database layer leveraging Apache HBase as its backing store. It includes integration with Apache Spark, Pig, Flume, Map Reduce, and other products in the Hadoop ecosystem. It is accessed as a JDBC driver and enables querying, updating, and managing HBase tables through standard SQL.
- REEF - Apache REEF (Retainable Evaluator Execution Framework) is a development framework that provides a control-plane for scheduling and coordinating task-level (data-plane) work on cluster resources obtained from a Resource Manager. REEF provides mechanisms that facilitate resource reuse for data caching, and state management abstractions that greatly ease the development of elastic data processing workflows on cloud platforms that support a Resource Manager service.
- Samza - Apache Samza provides a system for processing stream data from publish-subscribe systems such as Apache Kafka. The developer writes a stream processing task, and executes it as a Samza job. Samza then routes messages between stream processing tasks and the publish-subscribe systems that the messages are addressed to.
- Spark - Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala and Python as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
- Sqoop - Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
- Storm - Apache Storm is a distributed real-time computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing real-time computation.
- Tajo - The main goal of Apache Tajo project is to build an advanced open source data warehouse system in Hadoop for processing web-scale data sets. Basically, Tajo provides SQL standard as a query language. Tajo is designed for both interactive and batch queries on data sets stored on HDFS and other data sources. Without hurting query response times, Tajo provides fault-tolerance and dynamic load balancing which are necessary for long-running queries. Tajo employs a cost-based and progressive query optimization techniques for optimizing running queries in order to avoid the worst query plans.
- Tez - Apache Tez is an effort to develop a generic application framework which can be used to process arbitrarily complex directed-acyclic graphs (DAGs) of data-processing tasks and also a reusable set of data-processing primitives which can be used by other projects.
- VXQuery - Apache VXQuery will be a standards compliant XML Query processor implemented in Java. The focus is on the evaluation of queries on large amounts of XML data. Specifically the goal is to evaluate queries on large collections of relatively small XML documents. To achieve this queries will be evaluated on a cluster of shared nothing machines.
- Zeppelin - Zeppelin is a modern web-based tool for the data scientists to collaborate over large-scale data exploration and visualization projects.
There is a serious amount of overlap between these projects. Not all of these projects have web APIs, while some of them are all about delivering a gateway or aggregate API across projects. There is a lot to process here, but I think listing them out provides an easier way to understand the big data explosion of projects over at Apache.
It is tough to understand what each of these do without actually playing with them, but that is something I just don’t have the time to do, so next up I’ll be doing independent searches for these project names, and finding stories from across the space regarding what folks are doing with these data solutions. That should give me enough to go on when putting them into specific buckets, and finding their place in my data, and database API research.
I am spending a lot of time lately thinking about event sourcing, evented architecture, real time, and webhooks. I’m revisiting some of the existing aspects of how we move our bits around the Internet in real time and at scale as part of existing conversation I am having, as well as some projects I’m working on. I recently wrote about making sense of API activity with webhook events, and as I’m crafting a list of meaningful events for my Human Services Data API (HSDA) work, I’m thinking about how these events reflect the value that occurs via API platforms.
As I’m going through the different APIs I’m exposing via a platform, I am working to identify and catalog events in which folks can subscribe to using webhooks. These are the events that occur, like adding a new organization, updating a service, or completing a batch import–all the things people will care about the most. These are the events and activities that occur because their is an API, which have the most value to API consumers, and platform operators. This is what actually matters, and why we are doing an API in the first place, to enable these events to occur. The more these events are triggered, and the more people we have subscribing and engaging with these events, the more value that is generated using an API.
In aggregate, using modern approaches to API management, we might provide analytics and reports that demonstrate all this value being created, to justify the existence of our API. In some implementations, this value created is how we might be charging our API consumers, partners, and other stakeholders. However, in some cases we might even considering paying API consumers when these events occur, incentivizing a certain event-driven behavior that benefits the platform. It is easy to think of API value generation simply as the number of API calls, but I think webhooks has helped establish a new way to look at how value is generated, based upon the number of subscribers to any particular event, or possible a type of of event.
I feel like this is one of the reason we are finally seeing more investment in event sourcing, and evented architecture, and the real time streaming of data and content. The events that matter are getting prioritized, and the technology is advancing to support these events that matter. IDK. As I push forward with my webhook research, and revisit my real time API research, and expand into new realms of messaging and focusing on events, I’m rethinking how we measure and quantify value generation via API platforms. For a long time the measure has been number of API consumers, and the number of API calls, but I feel like things are shifting to the types of events that occur, and how meaningful these events are to API providers, consumers, and their application end-users.
I was taking a fresh look at my real time API research as part of some data streaming, and event sourcing conversations I was having last week. My research areas are never perfect, but I’d say that real time is still the best umbrella to think about some of the shifts we are seeing on the landscape recently. They are nothing new, but there has been renewed energy, new and interesting conversation going on, as well as some growing trends that I cannot ignore. To support my research, I took a day this week to dive in, have a conversation with my buddy Alex over at the TheNewStack.io, and the new CEO of WSO2 Tyler Jewell around what is happening.
The way I approach my research is to always step back and look at what is happening already in the space, and I wanted to take another look at some of the real time API service providers I was already keeping eye on in the space:
- Pubnub - APIs for developers building secure realtime Mobile, Web, and IoT Apps.
- StreamData - Transform any API into a real-time data stream without a single line of server code.
- Fanout.io - Fanout’s reverse proxy helps you push data to connected devices instantly.
- Firebase - Store and sync data with our NoSQL cloud database. Data is synced across all clients in real time, and remains available when your app goes offline.
- Pusher - Leaders in real time technologies. We empower all developers to create live features for web and mobile apps with our simple hosted API.
I’ve been tracking on what these providers have been doing for a while. They’ve all been pushing to boundaries of what is streaming, and real time APIs for some time. Another open source solution that I think is worth noting, which I believe some of the above services have leverages is Netty.io.
- Netty - Netty is an asynchronous event-driven network application framework for rapid development of maintainable high performance protocol servers & clients.
I also wanted to make sure and include Google’s approach to a technology that has been around a while:
- Google Cloud Pub/Sub - Google Cloud Pub/Sub is a fully-managed real-time messaging service that allows you to send and receive messages between independent applications.
Next, I wanted to refresh my understanding of all the Apache projects that speak to this realm. I’m always trying to keep a handle on what they each actually offer, and how they overlap. So, seeing them side by side like this helps me think about how they fit into the big picture.
- Apache Kafka - Kafka™ is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
- Apache Flink - Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications.
- Apache Spark - Spark Streaming makes it easy to build scalable fault-tolerant streaming applications. Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams.
- Apache Storm Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.
- Apache Apollo - ActiveMQ Apollo is a faster, more reliable, easier to maintain messaging broker built from the foundations of the original ActiveMQ.
One thing I think is worth noting with all of these is the absence of the web when you read through their APIs. Apollo had some significant RESTful approaches, and you find gateways and plugins for some of the others, but when you consider how these technologies fit into the wider API picture, I’d say they aren’t about embracing the web.
On that note, I think it is worth mentioning what is going on over at Google, with their gRPC effort, which provides “bi-directional streaming and fully integrated pluggable authentication with http/2 based transport”:
- gRPC - A high performance, open-source universal RPC framework
Also, I think most notably, they are continuing the tradition of APIs embracing the web, and built on top of HTTP/2. For me, this is always important, and trumps just being open source in my book. The more web an open source technology, and a company’s service utilize, the more comfortable I’m going to feel telling my readers they should be baking this into their operations.
After these services and tooling, I don’t want to forget about the good ol fashioned protocols available out there, that help use doing things in real time. I’m tracking on 12 real time protocols that I see in use across the companies, organizations, institutions, and government agencies I’m tracking on:
- Simple (or Streaming) Text Orientated Messaging Protocol (STOMP) - STOMP is the Simple (or Streaming) Text Orientated Messaging Protocol. STOMP provides an interoperable wire format so that STOMP clients can communicate with any STOMP message broker to provide easy and widespread messaging interoperability among many languages, platforms and brokers.
- Advanced Message Queuing Protocol (AMQP) - The Advanced Message Queuing Protocol (AMQP) is an open standard for passing business messages between applications or organizations. It connects systems, feeds business processes with the information they need and reliably transmits onward the instructions that achieve their goals.
- MQTT - MQTT is a machine-to-machine (M2M)/Internet of Things connectivity protocol. It was designed as an extremely lightweight publish/subscribe messaging transport. It is useful for connections with remote locations where a small code footprint is required and/or network bandwidth is at a premium.
- OpenWire - OpenWire is our cross language Wire Protocol to allow native access to ActiveMQ from a number of different languages and platforms. The Java OpenWire transport is the default transport in ActiveMQ 4.x or later.
- Websockets - WebSocket is a protocol providing full-duplex communication channels over a single TCP connection. The WebSocket protocol was standardized by the IETF as RFC 6455 in 2011, and the WebSocket API in Web IDL is being standardized by the W3C.
- Extensible Messaging and Presence Protocol (XMPP) - XMPP is the Extensible Messaging and Presence Protocol, a set of open technologies for instant messaging, presence, multi-party chat, voice and video calls, collaboration, lightweight middleware, content syndication, and generalized routing of XML data.
- PubSubHubbub - PubSubHubbub is an open protocol for distributed publish/subscribe communication on the Internet. Initially designed to extend the Atom (and RSS) protocols for data feeds, the protocol can be applied to any data type (e.g. HTML, text, pictures, audio, video) as long as it is accessible via HTTP. Its main purpose is to provide real-time notifications of changes, which improves upon the typical situation where a client periodically polls the feed server at some arbitrary interval. In this way, PubSubHubbub provides pushed HTTP notifications without requiring clients to spend resources on polling for changes.
- Real Time Streaming Protocol (RTSP) - The Real Time Streaming Protocol (RTSP) is a network control protocol designed for use in entertainment and communications systems to control streaming media servers. The protocol is used for establishing and controlling media sessions between end points. Clients of media servers issue VCR-style commands, such as play and pause, to facilitate real-time control of playback of media files from the server.
- Server-Sent Events - Server-sent events (SSE) is a technology where a browser receives automatic updates from a server via HTTP connection. The Server-Sent Events EventSource API is standardized as part of HTML5 by the W3C.
- HTTP Live Streaming (HLS) - HTTP Live Streaming (also known as HLS) is an HTTP-based media streaming communications protocol implemented by Apple Inc. as part of its QuickTime, Safari, OS X, and iOS software.
- HTTP Long Polling - HTTP long polling, where the client polls the server requesting new information. The server holds the request open until new data is available. Once available, the server responds and sends the new information. When the client receives the new information, it immediately sends another request, and the operation is repeated. This effectively emulates a server push feature.
These protocols are used by the majority of the service providers and tooling I list above, but in my research I’m always trying to focus on not just the services and tooling, but the actual open standards that they support.
I have to also mention the entry level aspect of real time in my opinion. Something, that many API providers support, but also is the 101 level approach that some companies, organizations, institutions, and agencies need to be exposed to before they get overwhelmed with other approaches.
- Webhooks - A webhook in web development is a method of augmenting or altering the behavior of a web page, or web application, with custom callbacks. These callbacks may be maintained, modified, and managed by third-party users and developers who may not necessarily be affiliated with the originating website or application.
That is the real time API landscape. Sure, there are other services, and tooling, but this is the cream on top. I’m also struggling with the overlap with event sourcing, evented architecture, messaging, and other layers of the API space that are being used to move bits and bytes around today. Technologists aren’t always the best at using precise words, or keeping things simple, and easy to understand, let alone articulate. This is one of the concerns I have with streaming API approaches, is that they are often over the heads, and beyond the needs of some API providers, and may API consumers. They have their place within certain use cases, and large organizations that have the resources, but I spend a lot of time worrying about the little guy.
I think a good example of web API vs streaming API can be found in the Twitter API community. Many folks just need simple, intuitive, RESTful endpoints to get access to data, and content. While a much smaller slice of the pie will have the technology, skills, and compute capacity to do things at scale. Regardless, I see technologies like Apache Kafka being turned into plug and play, infrastructure as a service approaches, allowing anyone to quickly deploy to Heroku, and just put to work via a SaaS model. So, of course, I will still be paying attention, and trying to make sense out of all of this. I don’t know where any one it will be going, but I will keep tuning in, and telling stories about how real time, and streaming API technology is being used, or not being used.
I am finally getting the time to invest more into the rest of my API industry guides, which involves deep dives into core areas of my research like API definitions, design, and now deployment. The outline for my API deployment research has begun to come into focus and looks like it will rival my API management research in size.
With this release, I am looking to help onboard some of my less technical readers with API deployment. Not the technical details, but the big picture, so I wanted to start with some simple questions, to help prime the discussion around API development.
- Where? - Where are APIs being deployed. On-premise, and in the clouds. Traditional website hosting, and even containerized and serverless API deployment.
- How? - What technologies are being used to deploy APIs? From using spreadsheets, document and file stores, or the central database. Also thinking smaller with microservices, containes, and serverless.
- Who? - Who will be doing the deployment? Of course, IT and developers groups will be leading the charge, but increasingly business users are leveraging new solutions to play a significant role in how APIs are deployed.
The Role Of API Definitions While not every deployment will be auto-generated using an API definition like OpenAPI, API definitions are increasingly playing a lead role as the contract that doesn’t just deploy an API, but sets the stage for API documentation, testing, monitoring, and a number of other stops along the API lifecycle. I want to make sure to point out in my API deployment research that API definitions aren’t just overlapping with deploying APIs, they are essential to connect API deployments with the rest of the API lifecycle.
Using Open Source Frameworks Early on in this research guide I am focusing on the most common way for developers to deploy an API, using an open source API framework. This is how I deploy my APIs, and there are an increasing number of open source API frameworks available out there, in a variety of programming languages. In this round I am taking the time to highlight at least six separate frameworks in the top programming languages where I am seeing sustained deployment of APIs using a framework. I don’t take a stance on any single API framework, but I do keep an eye on which ones are still active, and enjoying usag bey developers.
Deployment In The Cloud After frameworks, I am making sure to highlight some of the leading approaches to deploying APIs in the cloud, going beyond just a server and framework, and leveraging the next generation of API deployment service providers. I want to make sure that both developers and business users know that there are a growing number of service providers who are willing to assist with deployment, and with some of them, no coding is even necessary. While I still like hand-rolling my APIs using my peferred framework, when it comes to some simpler, more utility APIs, I prefer offloading the heavy lifting to a cloud service, and save me the time getting my hands dirty.
Essential Ingredients for Deployment Whether in the cloud, on-premise, or even on device and even the network, there are some essential ingredients to deploying APIs. In my API deployment guide I wanted to make sure and spend some time focusing on the essential ingredients every API provider will have to think about.
-Compute - The base ingredient for any API, providing the compute under the hood. Whether its baremetal, cloud instances, or serverless, you will need a consistent compute strategy to deploy APIs at any scale. -Storage - Next, I want to make sure my readers are thinking about a comprehensive storage strategy that spans all API operations, and hopefully multiple locations and providers. -DNS - Then I spend some time focusing on the frontline of API deployment–DNS. In todays online environment DNS is more than just addressing for APIs, it is also security. -Encryption - I also make sure encryption is baked in to all API deployment by default in both transit, and storage.
Some Of The Motivations Behind Deploying APIs In previous API deployment guides I usually just listed the services, tools, and other resources I had been aggregating as part of my monitoring of the API space. Slowly I have begun to organize these into a variety of buckets that help speak to many of the motivations I encounter when it comes to deploying APIs. While not a perfect way to look at API deployment, it helps me thinking about the many reasons people are deploying APIs, and craft a narrative, and provide a guide for others to follow, that is potentially aligned with their own motivations.
- Geographic - Thinking about the increasing pressure to deploy APIs in specific geographic regions, leveraging the expansion of the leading cloud providers.
- Virtualization - Considering the fact that not all APIs are meant for production and there is a lot to be learned when it comes to mocking and virtualizing APIs.
- Data - Looking at the simplest of Create, Read, Update, and Delete (CRUD) APIs, and how data is being made more accessible by deploying APIs.
- Database - Also looking at how APIs are beign deployed from relational, noSQL, and other data sources–providing the most common way for APIs to be deployed.
- Spreadsheet - I wanted to make sure and not overlook the ability to deploy APIs directly from a spreadsheet making APIs are within reach of business users.
- Search - Looking at how document and content stores are being indexed and made searchable, browsable, and accessible using APIs.
- Scraping - Another often overlooked way of deploying an API, from the scraped content of other sites–an approach that is alive and well.
- Proxy - Evolving beyond early gateways, using a proxy is still a valid way to deploy an API from existing services.
- Rogue - I also wanted to think more about some of the rogue API deployments I’ve seen out there, where passionate developers reverse engineer mobile apps to deploy a rogue API.
- Microservices - Microservices has provided an interesting motivation for deploying APIs–one that potentially can provide small, very useful and focused API deployments.
- Containers - One of the evolutions in compute that has helped drive the microservices conversation is the containerization of everything, something that compliments the world of APis very well.
- Serverless - Augmenting the microservices and container conversation, serverless is motivating many to think differently about how APIs are being deployed.
- Real Time - Thinking briefly about real time approaches to APIs, something I will be expanding on in future releases, and thinking more about HTTP/2 and evented approaches to API deployment.
- Devices - Considering how APis are beign deployed on device, when it comes to Internet of Things, industrial deployments, as well as even at the network level.
- Marketplaces - Thinking about the role API marketplaces like Mashape (now RapidAPI) play in the decision to deploy APIs, and how other cloud providers like AWS, Google, and Azure will play in this discussion.
- Webhooks - Thinking of API deployment as a two way street. Adding webhooks into the discussion and making sure we are thinking about how webhooks can alleviate the load on APIs, and push data and content to external locations.
- Orchestration - Considering the impact of continous integration and deployment on API deploy specifically, and looking at it through the lens of the API lifecycle.
I feel like API deployment is still all over the place. The mandate for API management was much better articulated by API service providers like Mashery, 3Scale, and Apigee. Nobody has taken the lead when it came to API deployment. Service providers like DreamFactory and Restlet have kicked ass when it comes to not just API management, but making sure API deployment was also part of the puzzle. Newer API service providers like Tyk are also pusing the envelope, but I still don’t have the number of API deployment providers I’d like, when it comes to referring my readers. It isn’t a coincidence that DreamFactory, Restlet, and Tyk are API Evangelist partners, it is because they have the services I want to be able to recommend to my readers.
This is the first time I have felt like my API deployment research has been in any sort of focus. I carved this layer of my research of my API management research some years ago, but I really couldn’t articulate it very well beyond just open source frameworks, and the emerging cloud service providers. After I publish this edition of my API deployment guide I’m going to spend some time in the 17 areas of my research listed above. All these areas are heavily focused on API deployment, but I also think they are all worth looking at individually, so that I can better understand where they also intersect with other areas like management, testing, monitoring, security, and other stops along the API lifecycle.
There are many definitions of what exactly constitutes "real time". I find it is a very relative thing, depending on who you talk to. When asked, many will respond with push notifications as an example. Others immediately think chat and messaging. If you are talking to developers they will reference specific technology like XMPP, Jabber, and WebSockets.
Real time is relative. It is relative to the situation, and to those involved. I'd say real time also isn't good by default, in all situations. The need for real time might change or evolve, and mean different things in different industries. All of this variance really opens up the concept for a lot of manipulation and abuse.
I feel like those who are wielding real time often speak of the benefits to us when in reality it is about real time in service of what they desire. They want a real time channel to you so they can push to you anytime, and get the desired action they are looking for (ie. click, view, purchase). In this environment, the concept of real time quickly becomes just noise, distraction, and many other negative things--rendering real time to just often being a pretty bad idea.
I took the Github repository for Erik Wilde's (@dret) Web Concepts work and forked it, then generated some JSON which I could use to import into my API monitoring system. I've been manually adding specs to my Tweet and LinkedIn scheduling system, but I keep forgetting to go back to the site and add more entries. So I wanted to go ahead and import all the concepts and specs, and schedule out the tweets and LinkedIn posts for everything, over the next couple months.
First I generated the JSON for the concepts:
Then I generated the JSON for the specs:
I left out the relationships between the concepts and specs, as I will just be linking to Web Concepts, and let people explore for themselves. As I was looking through the JSON I got me thinking about why these concepts and specs aren't available in API design tooling, as helpers and tooltips, so that API designers and architects can learn from them and be reminded in real time--as they are crafting their APIs.
It seems like there should be autocomplete for HTTP header fields, and for HTTP status codes, and other relevant items as they are needed. There is a wealth of web literacy available in Erik's work, and across the web concepts and specs he has organized, it seems like these should be available by default within API design services and tooling, and start being baked into IDEs like Atom, Eclipse, and Visual Studio--maybe they already are, and I'm just unaware.
I am working hard to fine tune my world after coming back from the wilderness this summer. Now that I'm back I am putting a lot of thought into how I can optimize for efficiency, as well as for my own happiness. As I fire back up the old API Evangelist machine, I'm evaluating every concept in play, a process being used, and tool in production, and evaluate how it benefits me or creates friction in my world.
During the next evolution of API Evangelist, I am looking to maximize operations, while also helping to ensure that I do not burn out again (5 years was a long time). While hiking on the trail I thought A LOT about what is real time, and upon my return, I've been applying this to reverse engineering what is real time in my world, and fine tuning it for maximum efficiency and helping me achieve my objectives.
As I had all the moving parts of real time spread out across my workbench, one thing I noticed was the emotional hooks it likes to employ. When I read a Tweet that I didn't agree with, or read a blog post that needed a rebuttal, or a slack conversation that @mentioned me--I felt like I needed to reply. When in reality, there is no reason to reply to real time events, in real time. This is what it wants, not always something you want.
I wanted to better understand this element of my real time world, so I reassembled everything and set back into motion--this time I put a delay switch on ALL responses to real time events across all my channels. No matter how badly I wanted, I was forbidden to response within 48 hours to anything. It was hard at first, but I quickly began to see some interesting efficiency gains and a better overall psychological well-being.
Facebook, Twitter, Github, and Slack all were turned off and only allowed to be turned on a couple times a day. I could write a response to a blog post, but I wouldn't be allowed to post it for at least two days. I actually built this delay switch into my world, as a sort of scheduling system for my platform, which allows me to publish blog posts, Tweets, Github commits, and other pushes that were often real time, using a master schedule.
After a couple of weeks my world feels more like I have several puppets on strings, and performing from a semi-scripted play. Where before it felt the other way around, that I was a puppet on other people's strings, performing in a play I've never seen a script for.
Each device will push to cloud, or to intermediary some sort of credentials that latest vulnerability update was pushed.
One thing I'm experiencing as I come out of my Drone Recovery project is the drowning effects of our real-time worlds. I am talking about the desire to stay connected in this Internet age, and subscribe to as many possible available channels (ie. Facebook, Twitter, LinkedIn, RSS, etc.), and more importantly the tuning in, and responding to these channels in real time.
You hear a lot of talk about information overload, but I don't feel the amount of information is the problem. For me, the problem comes in with the emotional investment demanded by real-time, and the ultimate toll it can take on your productivity, or just general happiness and well-being. You can see this play out in everything from expectations that you should respond to emails, all the way to social network memes getting your attention when it comes to the election, or for me personally, the concerns around security and privacy using technology.
The problem isn't the amount of information, it is the emotional toll of real-time. I can keep up with the volume of information, it's once I start paying the toll fee associated with each item, that it begins to add up. I feel the toll fee is higher in the real-time lane than when you do on your own schedule. The people who demand I respond to emails, and be first to the story have skin in the game, and will be collecting a portion of the toll fee, so it is in their best interest to push you to be real time.
Sure, there are some items that will be perishable in all of this. I am not applying this line of thinking across the board, but I am prioritizing things with this in mind. In an increasingly digital world, the demands on our time are only going to increase. To help me to keep from drowning, I'm going to get more critical about what I accept into my world in a real time way. My goal is to limit the emotional toll I pay, and maximize my ability to focus on the big picture when it comes to how technology, and specifically APIs are impacting our world.
I had heard about the Zika virus research that was going on at the University of Wisconsin listening to an NPR episode this last spring. I finally had the time to dig into the topic a little more, and learn more about where the research is at, and some of the software behind the sharing and collaboration around the research.
The urgency in getting the raw data and results of the research out to the wider scientific community caught my attention and the potential for applying API related approaches seems pretty huge. When it comes to mission-critical research that could impact thousands or millions of people, it seems like there should be a whole suite of open tooling that people can employ to ensure sharing and collaboration are real time and frictionless.
As I dug into the Zika virus research, I was happy to find the LabKey technology employed to publish the research. I do not know much about them yet, but I was happy to see the open source community solution, developer resources including a web API for integrating with research that is published using the platform. There are plenty of improvements I'd like to see added to the API and developer efforts, but it is a damn good start when it comes to making important scientific research much more shareable and collaborative.
I'll spend more time learning more about what LabKey currently offers, and then I'll work to establish some sort of next steps blueprint that would employ other modern API approaches to help ensure important research can be made more real-time, aggregated, interoperable, and shareable using technology like definitions, Webhooks, iPaaS, and other common areas of a modern API effort.
When it comes to research, scientists should have a wealth of open tooling and resources that make their work a collaborative and shareable process by default, but with as much control as they desire--something modern web API solutions excel at. I added LabKey to a new research area dedicated to science. I will spend more time going through space, and see what guides, blueprints, and other resources I can pull together to assist researchers in their API journey.
The age of the mountain top and building cellular tower is coming to a close. Our real-time cellar drone algorithm is now in active use in over 3300 telco fleets around the globe. After studying the daily activity of over 100 million cellular customers were able to assemble a reliable algorithm that would guide fleets of cellular network drones to where they are most needed in real time.
It just didn't make sense to have cellular network towers be stationary anymore when our mobile users are well...mobile. We needed our cellular network to move, expand, and grow along with our user base. Thanks to advances in drone technology our network of thousands of drones deploy, hover, migrate, and return home in real time response to demands.
Our initial implementations were all successful due to the hard work, and quality of our algorithm, but with the data we are now receiving from the 3000 drone fleets around the globe, our new Swarm Tower (TM) technology is insanely precise. Rarely do we ever see a network overload or a mobile customer who cannot get the bandwidth (they are willing to pay for) they need at any given moment.
What has really surprised us is the use of the network beyond mobile. Early on 90% of the network connections were consumer and business smartphone devices, where now over 60% of the network capacity being other consumer, commercial and industrial devices. Expanding our mobile networks to meet this demand would never have been possible using traditional cell tower technology.
If you are as excited about Swarm Tower technology as we are, you will be interested in our Q2 announcements around secondary drone activities while they are supporting cellular network activity -- things like surveillance, weather, and other common activities.
If you think there is a link I should have listed here feel free to tweet it at me, or submit as a Github issue. Even though I do this full time, I'm still a one person show, and I miss quite a bit, and depend on my network to help me know what is going on.