Translate

Thursday, February 14, 2019

Developing REST APIs

This article introduces a set of tools essential to building REST APIs.

This article introduces a set of tools essential to building REST APIs. The tools are platform independent, which means they are applicable to REST APIs built with any technology stack. The goal of this article is to familiarise novice API developers with different stages of API development and introduce tools that help with those stages. Detailed coverage of these tools can be found on the web. The different phases of API development are enumerated below.
  1. Design — The main goal here is to define the shape of APIs, document interfaces, and provide stub endpoints.
  2. Testing — Here, we do functional testing of APIs by sending a request and analyzing the response at different levels of visibility, namely, application, HTTP, and network.
  3. Web Hosting — When deployed on the web, there are HTTP tools that help with the hosting of APIs for performance, security, and reliability.
  4. Performance — Before moving on to production, we use tools for performance testing of APIs that tell us how much load APIs may support.
  5. Observability — Once the API is deployed in production, testing in production provides the overall health of live APIs and alert us if any problem occurs.
  6. Management — Lastly, we will take a look at some of the tools for API management activities like traffic shaping, blue-green deployment, canary, etc.
The following figure shows different stages highlighting the tools.
Image title
We will illustrate the usage of tools on APIs exposed by a web application as we elaborate on each phase of API development. Product Catalog is a Spring Boot web application that manages a catalog of products. It exposes REST APIs to perform CRUD operations on a product catalog. The code is available on my GitHub.

Design

In the design phase, the API developer collaborates with clients of the API and the data provider to arrive at the shape of the API. REST API essentially consists of exchanging JSON messages over HTTP. JSON is a dominant format in REST API since it is a compact, easy to understand, and has a flexible format that does not require declaring schema up front. Different clients can use the same API and read the data that they need.
We will illustrate API design using Swagger. It is a tool that uses open format to describe the APIs coupled with Web UI for visualizing and sharing. There is no separation between design and implementation. It is an API documentation tool where the documentation is hosted alongside the API. The benefit of this is that the API and the documentation will also remain in sync. The drawback is that only API developers can change the structure of the API. The documentation is generated from the API. This means we need to build the skeleton of our API first. We have used Spring Boot to develop the API and Springfox package to generate the swagger documentation. Bring in swagger 2 and swagger-ui maven dependencies into your pom.xml.
<dependency>
    <groupId>io.springfox</groupId>
    <artifactId>springfox-swagger2</artifactId>
    <version>2.6.1</version>
</dependency>
<dependency>
    <groupId>io.springfox</groupId>
    <artifactId>springfox-swagger-ui</artifactId>
    <version>2.5.0</version>
</dependency>
Add SwaggerConfig.java to the project with following content.
package com.rks.catalog.configuration;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import springfox.documentation.builders.PathSelectors;
import springfox.documentation.builders.RequestHandlerSelectors;
import springfox.documentation.spi.DocumentationType;
import springfox.documentation.spring.web.plugins.Docket;
import springfox.documentation.swagger2.annotations.EnableSwagger2;
@Configuration
@EnableSwagger2
public class SwaggerConfig {
    @Bean
    public Docket api() {
        return new Docket(DocumentationType.SWAGGER_2)
        .select()
        .apis(RequestHandlerSelectors.any())
        .paths(PathSelectors.any()).build();
    }
}
This configuration tells Swagger to scan all the controllers and include all the URLs defined in those controllers for API documentation.
Once the application is started, Swagger documentation of the APIs can be accessed at the URL
http://localhost:8080/swagger-ui.html
Image title
Click on each API to examine the details — the URL, HTTP headers, and the HTTP body where applicable. A useful feature is the "Try it out!" button, which provides a sandbox environment that lets people play with the API to get a feel for it before they start plugging them in their apps.

Testing

Functional testing of REST APIs entails sending HTTP requests and checking responses so that we can verify that APIs behave as we expect. REST uses HTTP for transport that specifies the request and response formats of API. TCP/IP, in turn, takes the HTTP messages and decides how to transport them over the wire. We introduce three sets of tools to test APIs at these three layers of protocol stack, namely, REST Clients for REST layer, Web Debuggers for HTTP layer, and Packet Sniffers for TCP/IP layer.
  • Postman — Postman is a REST client that allows us to test REST APIs. It allows us to:
    • Create HTTP requests and generate equivalent cURL commands that can be used in scripts.
    • Create multiple environments for Dev, Test, Pre-Prod as each environment has different configurations.
    • Create a test collection having multiple tests for each product area. The different parts of a test can be parameterized that allows us to switch between environments.
    • Create code snippets in JavaScript to augment our tests, e.g., assert return codes or set an environment variables.
    • Automate running of tests with a command-line tool called Newman.
    • Import/export test collections and environments.
Image title
  • cURL — It is a command-line tool that uses it's own HTTP stack and is available cross platform.
curl -X POST \
  http://localhost:8080/books \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '{
"id":"1",
"author":"shakespeare",
"title":"hamlet"
}'
  • Burp — Burp is a HTTP debugger that let us see the web traffic that goes between the client and the API. It runs as a proxy between the client and the server. This allows us to intercept the request and the reponse and modify them to create scenarios that are otherwise difficult to test without changing the client. It is a suite of tools that is mainly used for security testing but it can be very useful for API testing as well. Set up your postman to send request to Burp proxy and configure Burp to intercept client request and server response. Intercept request and response as shown below.
Image title
Image title
  • Wireshark — Verification of some features of API, e.g., encryption, compression, etc., will require us to look a level deeper to see what is being sent and received on the network. Wireshark is a tool that monitors network interface and keeps a copy of all TCP packets that pass through it. Traffic is split by layers — HTTP, TCP, IP, etc. It also helps us to troubleshoot issues that require us to go deeper, e.g., TLS handshake.
Image title

Web Hosting

In this section, we will look at some of the features of the HTTP protocol that, if properly used, help us deliver performant, highly available, robust, and secure APIs. In particular, we will cover three parts of HTTP protocol — Caching for performance, DNS for high availability and scalability, and TLS for transport security.
  • Caching — Caching is one of the best ways to improve client performance and reduce load on API. HTTP allows clients to save a copy of resource locally by sending a caching header in the response. Next time, the client sends HTTP request for the same resource, it will be served from the local cache. This saves both network traffic and compute load on the API.
    • HTTP 1.0 Expiration Caching. HTTP 1.0 provides Expires header in the HTTP response indicating the time when the resource will expire. This can be useful for shared resource with a fixed expiration time.
    • HTTP 1.1 Expiration Caching. HTTP 1.1 provides a more flexible expiration header cache-control that instructs a client to cache the resource for a period that is set in max-age value. There is another value s-maxage that can be set for the intermediaries, e.g., a caching proxy.
    • HTTP Validation Caching. With caching, there is a problem of a client having an out-dated resource or two clients to have different versions of the same resource. If this is not acceptable or if there are personalized resources that cannot be cached, e.g., auth tokens, HTTP provides validation caching. With validation caching, HTTP provides headers in the response Etag or last-modified timestamp. If API returns either of the two headers, clients cache it and include in subsequent GET calls to the API.
GET http://api.endpoint.com/books
If-none-match: "4v44ffgg1e"
If the resource is not changed, the API will return 304 Not Modified response with no body, and the client can safely use its cached copy.
  • DNS — Domain Name System finds IP addresses for a domain name so that clients can route their request to the correct server. When HTTP request is made, clients first query a DNS server to find the address for the host and then send the request directly to the IP address. DNS is a multi-tiered system that is heavily cached to ensure requests are not slowed down. Clients maintain a DNS cache, then there are intermediate DNS servers leading all the way to a nameserver. DNS provides CNAME (Canonical Names) to access different parts of the server, e.g., both API and the webserver may be hosted on the same server with two different CNAMEs — api.endpoint.com and www.endpoint.com or CNAMEs may point to different servers. CNAMEs also let us segregate parts of our API. For HTTP GET requests, we can have separate CNAME for static and transactional resources that let us set up a fronting proxy for resources that we know are likely to be cache hits. We can also have a CNAME for HTTP POST requests to separate reads and writes so that we can scale them independently. Or we can provide a fast lane for priority customers.
With advanced DNS like Route53, a single CNAME instead of just pointing to a single server may point to multiple servers. A routing policy may then be configured for weighted routing, latency routing or for fault tolerance.
  • TLS — We can secure our APIs with TLS which lets us serve our request over HTTPS. HTTPS works on the basic security principle of key-pair. To enable HTTPS on our API, we need a certificate on our server that contains public and private key-pair. The server sends a public key to the client, which uses it to encrypt data and the server uses its private key to decrypt it. When the client first connects to an HTTPS endpoint, there is a handshake where client and server agree upon how to encrypt the traffic. They exchange another key unique to the session which is used to encrypt and decrypt data for the life of that session. There is a performance hit during the initial handshake due to the asymmetric encryption, but once the connection is established, symmetric encryption is used which is quite fast.
For proxies to cache the TLS traffic, we have to upload the same certificate that is used to encrypt the traffic. Proxy should be able to decrypt the traffic, save it in its cache and encrypt it with the same certificate and send it to the client. Some proxy servers do not allow this. In such situations, one solution is to have two CNAMEs — one for static cacheable resources over HTTP and for non-cacheable personalized resources, requests over secured TLS channel will be served by the API directly.

Performance

In this section, we will look at tools to load test our API so that we can quantify how much traffic our infrastructure can cope with. The basic idea behind performance testing is to send lots of requests to the API at the same time and see at what point performance degrades and ultimately fails. The answers we look for are:
  • What response times can the API give under different load conditions?
  • How many concurrent requests can the API handle without errors?
  • What infrastructure is required to deliver the desired performance?
loader.io is a cloud-based free load testing service that allows us to stress test our APIs. To get a baseline performance of API, different kinds of load tests can be run with increasing loads, measured by the number of requests per second, to find out performance figures quantified by errors and response times, for
  • Soak test — average load for long periods, e.g., run for 48 hours @1 request per second. This will uncover any memory leaks or other similar latent bugs.
  • Load test — peak load, e.g., run 2K requests per second with 6 instances of API.
  • Stress test — way-over peak load, e.g., run10K requests per second for 10 minutes.
This also lets us decide the infrastructure that will let us deliver API with desired performance numbers and whether our solution scales linearly.

Observability

Once API is deployed in production, it does not mean we can forget about the API. Production deployment kicks off another phase of testing — testing in production that may uncover issues that remained uncaught in earlier phases. Testing in production includes a set of activities clubbed together as observability that includes logging, monitoring, and tracing. The tools for these activities will help us to diagnose and resolve issues found in production.
  • Logging — Logging needs to be done explicitly by the developers using their preferred logging framework and a logging standard. For example, one log statement for every 10 lines of code or more if the code is complex with log levels split as - 60 percent DEBUG, 25 percent INFO, 10 percent WARN and 5 percent ERROR.
  • Monitoring — Monitoring runs at a higher level than logging. While logging explicitly tells us what is going on with the API, monitoring provides the overall health of API using generic metrics exposed by the platform and the API itself. Metrics are typically exposed by an agent deployed on the server or it may be part of the solution and are collected periodically by the monitoring solution deployed remotely.
Diagnostic endpoints may be included in the solution that tells us the overall health of the API.
  • Tracing — Zipkin is a distributed tracing system. It helps gather timing data needed to troubleshoot latency problems in microservice architectures.
Enabling Centralized Logging covers logging and tracing. For monitoring, interesting metrics may be stored in a time-series store like Prometheus and visualized using Grafana.

Management

API Management tools serve as a gateway that provides services that let:
  • API Clients provision themselves by getting API key
  • API Providers configure DNS, caching, throttling policies, API versioning, canarying.
These features and more are available on AWS API Gateway.
Image title