Web API Design Anti-patterns (or how to give consumers a headache)
When building a web-based API there are many design aspects to consider, including the endpoint format, the request and response structure, and how to handle errors. This article highlights some of the poor design decisions web API designers sometimes make and how these decisions have painful repercussions for developers who want to consume the API. For each problem a discussion of possible improvements to the API is provided.
Note that this list is far from exhaustive and will most likely be added to over time.
Unhelpful status codes
HTTP status codes can be used to convey information about the status of the response without the client having to necessarily examine the response body content. HTTP defines five broad categories of statuses:
- 1xx : Informational
- 2xx : Successful
- 3xx : Redirection
- 4xx : Client errors
- 5xx : Server errors
Inappropriate use of status codes, such as returning 200 OK on errors, can cause confusion. An API may even return the same status code regardless of the type of response, effectively rendering the status code useless.
Improvements: use the status code to convey the type of response based on the five categories. Define more specific status codes for certain scenarios, for example 422 Unprocessable Entity for request validation errors. Once you have defined the set of status codes your API uses, document them for the consumer and be consistent in their use.
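As a rough sketch of the idea, the snippet below derives the category from a status code and keeps one documented mapping from outcome to status code. The outcome names and the `respond` helper are hypothetical and only illustrate the approach.

```python
# The five broad categories, keyed by the first digit of the status code.
CATEGORIES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}

# Hypothetical outcome names mapped to the status code the API documents for them.
STATUS_FOR_OUTCOME = {
    "ok": 200,
    "validation_failed": 422,  # Unprocessable Entity
    "not_found": 404,
    "unexpected_failure": 500,
}


def status_category(code: int) -> str:
    """Return the broad HTTP category for a status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return CATEGORIES[code // 100]


def respond(outcome: str) -> int:
    """Pick the documented status code; unknown outcomes never fall back to 200."""
    return STATUS_FOR_OUTCOME.get(outcome, 500)
```

Keeping the mapping in one place makes it easier to document the codes and to stay consistent across endpoints.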
Multiple data formats
An API may use multiple different data formats in different situations for the request/response. Every time another data format is used an API client has to write code to handle the extra data format. This wastes time when developing the client as well as adding needless code bloat.
An API can take a step further in the wrong direction by specifying a response Content-Type header that names a different data format from the one actually in the response body. For example the response Content-Type might be specified as application/json but the body actually contains XML.
Improvements: pick a default data format, for example JSON, for your entire API (all endpoints). If there is a need for your API to support multiple data formats, such as JSON and XML, then provide a way for the client to specify which format they want to use. HTTP already has a defined way to do this through use of the Accept request header. Make sure your response Content-Type header always reflects the format of the data in the response body.
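A minimal sketch of this, assuming an API that supports JSON (the default) and XML. The `choose_format` and `render` helpers are hypothetical, and the Accept parsing is deliberately simplified: q-values and wildcards are ignored.

```python
import json
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

SUPPORTED = ("application/json", "application/xml")
DEFAULT = "application/json"


def choose_format(accept_header: Optional[str]) -> str:
    """Return the first supported media type named in Accept, else the default."""
    if not accept_header:
        return DEFAULT
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in SUPPORTED:
            return media_type
    # Design choice for this sketch: fall back to the default format.
    return DEFAULT


def render(customer: dict, accept_header: Optional[str]) -> Tuple[str, str]:
    """Serialize the data and return (Content-Type, body) so the two always agree."""
    content_type = choose_format(accept_header)
    if content_type == "application/xml":
        root = ET.Element("customer")
        for key, value in customer.items():
            ET.SubElement(root, key).text = str(value)
        return content_type, ET.tostring(root, encoding="unicode")
    return content_type, json.dumps(customer)
```

Because the Content-Type value and the body come from the same function, they cannot drift apart.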
Multiple data structures
An API may return multiple different response content data structures for the same endpoint depending on the scenario. For example upon success in a specific scenario the response content is returned in structure A and in a different scenario the response content is returned in structure B. Using different data structures for the same endpoint means the consumer may have to write more complex code to handle the two different scenarios and may even mean they can’t use deserialization.
Reporting errors in different ways dependent on the types (or even number) of errors can also be a problem. Consider the examples below which demonstrate two different ways a single endpoint is reporting errors back to the client.
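As a hypothetical illustration of this kind of inconsistency, the sketch below shows two error bodies that the same endpoint might return, along with the branching code a client needs to cope with them. The field names `message`, `messages` and `description` follow the discussion here; the values and the exact shapes are assumptions.

```python
# Hypothetical error body when there is exactly one error.
single_error = {"message": "Customer not found"}

# Hypothetical error body from the same endpoint when there are several errors.
multiple_errors = {
    "messages": [
        {"description": "Customer name is required"},
        {"description": "Order total must be positive"},
    ]
}


def error_descriptions(body: dict) -> list:
    """Client code has to branch on whichever shape happened to come back."""
    if "messages" in body:
        return [item["description"] for item in body["messages"]]
    if "message" in body:
        return [body["message"]]
    return []
```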
In the example above there was no need to have both the error message and messages fields. An error message object could simply have been put in the messages field (a single element array) and had its description set.
Improvements: make sure all scenarios for a single endpoint are handled by a single data structure. This will make serialization/deserialization easy for the client. Determine a single unified way to handle errors and be consistent across all API endpoints. Doing so will allow consumers to write generic code to handle API errors which can be easily reused.
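One possible shape for a unified error structure is sketched below. The `ErrorResponse` and `ApiError` class names are hypothetical. The point is only that a single error and many errors deserialize through the same code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ApiError:
    description: str


@dataclass
class ErrorResponse:
    # Always a list, even when there is exactly one error.
    messages: List[ApiError] = field(default_factory=list)


def parse_error_response(body: dict) -> ErrorResponse:
    """Deserialize an error body; the same code works for every endpoint."""
    return ErrorResponse(
        messages=[ApiError(description=m["description"]) for m in body["messages"]]
    )
```

A client can reuse `parse_error_response` everywhere instead of writing per-endpoint error handling.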
Not well formed response content
An API may have decided upon a particular data format for its response content, but the implementation it uses to create the data may be flawed. For example JSON may be the documented data format for response content but in certain scenarios the data sent is not actually JSON (i.e. it’s not well formed).
Improvements: an endpoint’s response content should be in a single data format. This data should be well formed and conform to the data format’s specification. APIs sometimes fall into the trap of deciding upon a data format but then not always returning well formed content. This can happen when the API does not use a well-built library to create the particular format of data. For example, historically it was common to simply use a string object to create XML data. This meant that even though the data looked like XML, it was often not well formed. This in turn meant the client could no longer rely on deserialization and instead had to parse the data as though it were a simple string.
An API should make sure the format of the data it sends in the response is well formed before it actually sends it to the client.
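The difference between hand-built and library-built content can be sketched as follows, using JSON rather than XML. The customer name is invented, and `is_well_formed_json` is a hypothetical helper.

```python
import json

customer_name = 'Ann "Annie" Smith'

# Hand-built string: looks like JSON, but the embedded quotes are not escaped.
hand_built = '{"name": "' + customer_name + '"}'

# Library-built: the serializer escapes the value correctly.
library_built = json.dumps({"name": customer_name})


def is_well_formed_json(text: str) -> bool:
    """Return True if the text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

A check like `is_well_formed_json` could also be run in the API's own tests against every documented response.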
Inconsistent field names
An API may send data in a single, well formed data format but use inconsistent field names. Field names can be inconsistent by name or by casing:
- If a field name is inconsistent by name it will often use different words to describe the same piece of data. For example a customer identifier might be called a customerId in one place and a customerRef in another place. This difference in language can lead to confusion for the consumer of the API.
- If a field name is inconsistent by case it will mix different casing styles within the same request/response. For example CustomerId in one place and customer-id in another.
It should be noted that many data formats, such as JSON, are case sensitive so AddressLine1 and addressLine1 are not the same.
Improvements: decide upon a naming standard for the data you will return in the response and stick to it across all endpoints in your API. Standardize a language to describe the various pieces of data in the API’s request/response contract. Furthermore, decide upon a casing standard and stick to it across all requests and responses.
For reference some of the more popular case naming standards are:
- Camel case: myField
- Pascal case: MyField
- Snake case: my_field
- Kebab case: my-field
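A casing standard is easier to keep if it is checked automatically. The sketch below is one way to flag field names that do not follow the chosen style. The regular expressions are an assumption about how each style is defined (for example, a single lowercase word such as id is accepted as camel, snake and kebab case).

```python
import re

# One pattern per casing style listed above (hypothetical definitions).
CASING_PATTERNS = {
    "camel": re.compile(r"^[a-z][a-zA-Z0-9]*$"),
    "pascal": re.compile(r"^[A-Z][a-zA-Z0-9]*$"),
    "snake": re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$"),
    "kebab": re.compile(r"^[a-z][a-z0-9]*(?:-[a-z0-9]+)*$"),
}


def violations(field_names: list, style: str) -> list:
    """Return the field names that do not follow the chosen casing style."""
    pattern = CASING_PATTERNS[style]
    return [name for name in field_names if not pattern.match(name)]
```

Running such a check over example requests and responses in a test suite would catch a stray CustomerId or customer-id before consumers do.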
Dynamic field names
Following on from the “Inconsistent field names” section above a response’s content may even use dynamic field names. A dynamic field name is one that can be different depending on the scenario of the request.
The example below uses dynamic field names customer and orders as part of an object called errors:
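As a hypothetical illustration, a response of this kind and the bespoke client code it forces might look like the sketch below. Only the names `errors`, `customer` and `orders` come from the text; the values and their types are invented.

```python
# Hypothetical response body using dynamic field names inside "errors".
response = {
    "errors": {
        "customer": "Customer name is required",
        "orders": ["Order 1 has no items", "Order 2 has no items"],
    }
}


def collect_errors(body: dict) -> list:
    """Walk field names that are not known in advance and inspect each value's type."""
    collected = []
    for field_name, value in body["errors"].items():
        if isinstance(value, list):
            collected.extend((field_name, item) for item in value)
        else:
            collected.append((field_name, value))
    return collected
```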
Dynamic field names can create problems for the consumer. If the consumer cannot rely on the names of the fields in the response content then it can no longer use simple deserialization. In the case above the consumer would probably instead have bespoke code to look through the errors object and then iterate over each of its fields, examining the data type of the field and then acting accordingly.
Improvements: avoid dynamic field names unless there is a very good reason for them. If the dynamic field name itself is important then use a static field name and set its value to what the dynamic field name would have been. For example in the case above a field called fieldName could be added and set to “customer” or “orders”.
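With a static `fieldName`, every entry in the errors list has the same shape. The sketch below assumes a hypothetical `description` field for the error text and invented values.

```python
# Hypothetical response body where each error entry has the same static fields.
static_response = {
    "errors": [
        {"fieldName": "customer", "description": "Customer name is required"},
        {"fieldName": "orders", "description": "Order 1 has no items"},
        {"fieldName": "orders", "description": "Order 2 has no items"},
    ]
}


def errors_for(body: dict, field_name: str) -> list:
    """Filter by the value of fieldName; no type inspection is needed."""
    return [
        entry["description"]
        for entry in body["errors"]
        if entry["fieldName"] == field_name
    ]
```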
Inconsistent field data type
Some data formats, such as JSON, have their own set of valid data types for values. For example JSON has numbers, strings, booleans, objects, arrays and nulls. An API might return the same field as a different data type dependent on the scenario.
Consider the following simple JSON based example. In scenario A the status field is a number, in scenario B status is a string, and in scenario C it is a boolean. If the client that consumes the API is written in a strongly typed language this can cause inconvenience. For example if the client were written in C# and wanted to deserialize the response content, it would have to define the status field as a more generic type, such as the .NET object type.
Improvements: all fields should have one data type and stick to it. Do not reuse fields for different purposes (often why the data type changes between responses). As the designer of an API if you are using JSON and are not sure what data type to use then it’s probably safest to go with a string.
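The three scenarios can be sketched as below. The JSON values are invented for illustration, and `check_status_is_string` is a hypothetical client-side check that only becomes possible once the API commits to one type.

```python
import json

# Hypothetical responses for scenarios A, B and C.
scenario_a = json.loads('{"status": 1}')
scenario_b = json.loads('{"status": "active"}')
scenario_c = json.loads('{"status": true}')


def status_type_names(responses: list) -> set:
    """Collect the type names seen for the status field across responses."""
    return {type(r["status"]).__name__ for r in responses}


def check_status_is_string(response: dict) -> str:
    """Return status if it is a string, otherwise raise TypeError."""
    status = response["status"]
    if not isinstance(status, str):
        raise TypeError(f"status must be a string, got {type(status).__name__}")
    return status
```

Three different types for one field means a typed client cannot declare the field with anything more specific than a catch-all type.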
No clear versioning strategy
An API may have no clear versioning strategy. Instead when breaking changes are deployed they break functionality for the consumers of the API.
Improvements: there are a number of different versioning strategies we can use when building a web based API:
- Path: endpoints have the version number as part of the URI’s path. For example the /v1 in: https://myapi.com/api/v1/customers.
- Query string: endpoints have the version number as part of the URI in a query string name/value pair. For example: https://myapi.com/api/customers?version=2.
- Custom request header: use a custom HTTP request header with the version number as an attribute. For example: Api-version: 2.
- Accept header: use the HTTP Accept request header to define the version number. The Accept header can be used as part of content negotiation to define the content data type and version the client wants the API to use. For example: Accept: application/myaccount.v2+json.
Each method has its own set of advantages and disadvantages, but decide on one at the beginning of your API design and document it for your consumers. If you cannot decide then using the path method is often the simplest.
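To make the four strategies concrete, the sketch below shows how a server might read the version number in each case, using the example values above. The function names are hypothetical, the headers are modelled as a plain dictionary (so the lookup is case sensitive, unlike real HTTP headers), and non-numeric versions are not handled.

```python
import re
from urllib.parse import parse_qs, urlparse


def version_from_path(url: str):
    """Path strategy: look for a /v<number> segment in the URI path."""
    match = re.search(r"/v(\d+)(?:/|$)", urlparse(url).path)
    return int(match.group(1)) if match else None


def version_from_query(url: str):
    """Query string strategy: read the version name/value pair."""
    values = parse_qs(urlparse(url).query).get("version")
    return int(values[0]) if values else None


def version_from_header(headers: dict):
    """Custom request header strategy: read the Api-version header."""
    value = headers.get("Api-version")
    return int(value) if value else None


def version_from_accept(headers: dict):
    """Accept header strategy: read the .v<number>+ part of the media type."""
    match = re.search(r"\.v(\d+)\+", headers.get("Accept", ""))
    return int(match.group(1)) if match else None
```

Each function returns None when no version is present, which leaves the API free to decide on a default version.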
Inconsistent API domain language
As mentioned in the section “Inconsistent field names”, an API may use an inconsistent domain language. This can go beyond simple field names to the rest of the API contract. For example API error messages may use inconsistent language. An error message may even use the domain language of a backend system behind the API, which may differ from the domain language of the API itself.
Improvements: define a language for all parts of the API contract (not just the field names). If a customer is called a “customer” in one part of the API contract don’t call them a “client” in another. Be wary of the domain language of systems that the API uses leaking into the domain language of the API itself.
Leaky error messages
An API may on occasion return internally specific error details in the response. As well as being a pain for the consumer, this may be a security risk. For example in .NET a missing request field may cause a NullReferenceException to be thrown and the full exception (with stack trace) to be returned in the response.
Improvements: never return internal server error details to the consumer of your API in production. Instead return a response code of 500 Internal Server Error. Note that it might be a good idea to return internal error/exception information in the response in a development/testing environment for debugging purposes, but make sure this feature does not go to production.
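A minimal sketch of this, not tied to any particular web framework: the hypothetical `handle` wrapper logs the full exception on the server and returns only a generic body, unless a `debug` flag (meant for development/testing only) is set.

```python
import json
import logging
import traceback

logger = logging.getLogger("api")


def handle(request: dict, handler, debug: bool = False):
    """Run a handler and turn any unexpected exception into a generic 500."""
    try:
        return 200, json.dumps(handler(request))
    except Exception:
        # Full details go to the server log, not to the consumer.
        logger.exception("Unhandled error while processing request")
        body = {"error": "Internal Server Error"}
        if debug:
            # Development/testing only: never enable this in production.
            body["detail"] = traceback.format_exc()
        return 500, json.dumps(body)


def get_customer(request: dict) -> dict:
    # Hypothetical handler that fails when a request field is missing.
    return {"customerId": request["customerId"]}
```

Calling `handle({}, get_customer)` gives the consumer a 500 with a generic message, while the stack trace stays in the log.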
Poor documentation
A lot of APIs are published with poor documentation. This causes nothing but problems for consumers that want to use the API. How does a client know what request data to send (or where) if it is not defined? Similarly, how does a client know what data to expect from the API if it is also not defined?
Improvements:
- Think about the type of useful documentation that could be provided. A PDF describing the API? Postman collections? An online API portal? …
- Document the fields of the request and response and their data types (if the data format you use has data types). Also document any further request field constraints. For example a field customerId is a string, is mandatory and needs to be 10 characters in length.
- Document examples of requests and responses in different scenarios. Postman collections with examples of requests and responses can also help with this.
- Define how errors should be handled and the possible errors your API can return.
- If your API uses a form of authentication remember to include details on how the consumer is supposed to authenticate and what your API expects.
- Lastly, simply try to think from the API consumer’s point of view. What would be helpful to them?
Redundant request/response information
Asking the client to provide information in the request that the API isn’t using or could have worked out/derived itself is redundant information. Returning information in the response that is not relevant/useful to any client is redundant information. Redundant information in the request/response is rarely a good thing:
- It needlessly increases the complexity of the contract between the API and its clients. Greater contract complexity leads to more work for client developers wishing to use the API.
- It needlessly increases the size of the request/response over the wire so potentially reducing performance.
- The greater the number of fields in the request/response between an API and its clients, the greater the potential for tighter coupling. Tighter coupling means the API can be harder to change later.
A lack of analysis in ascertaining the required request/response fields can lead to redundant request/response information. This approach can be thought of as “throw everything at the wall and hope something sticks”.
Redundancy in the request/response can also often arise because the API is being lazy and is pushing (leaking) work/logic out to its clients. Take this API example for a request ID field, the format being:
<vehicle_number>_<operation_called>_<datetime_now>
For example: ABC1234_GETDETAILS_20220113120000
This ID is a composite of a vehicle number that is already being supplied elsewhere in the request, the name of the operation being called on the API and the current date and time. All three pieces of information are known to the API when the request is made, so the API can derive this ID itself if it needs it. So why is the client being asked to create and supply it?
Improvements: generally speaking, the smaller the contract between API and client the better. Think before putting a field on the request/response: is the field actually needed? If the field is on the request, does the API need it or could it be derived/worked out by the API itself? If the field is on the response, do any potential clients really need it? Beware that once a field is on the request/response you have created coupling between the API and client that can be difficult to undo later.
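For the request ID example, deriving the value on the server side could look like the sketch below. The function name is hypothetical, and upper-casing the operation name and using the server's own clock are assumptions based on the single example ID given above.

```python
from datetime import datetime


def derive_request_id(vehicle_number: str, operation: str, now: datetime) -> str:
    """Build <vehicle_number>_<operation_called>_<datetime_now> on the server."""
    return f"{vehicle_number}_{operation.upper()}_{now:%Y%m%d%H%M%S}"


# The API already knows all three inputs when the request arrives.
example_id = derive_request_id("ABC1234", "getDetails", datetime(2022, 1, 13, 12, 0, 0))
```

With this in place the request ID field can be dropped from the request contract entirely.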
Final thoughts
A common theme with most of the sections above is consistency. At every point in the design of an API be consistent. This will help the consumer get used to how your API works and go on to create client applications/SDKs quickly with fewer issues.
Aside from obviously testing your API, often the best way to find problems with it is simply to use it. Write a client application that tries to use your API and make improvements to the API accordingly. In fact the best API designs are ones built entirely from the consumer’s point of view: what the consumer wants and how they expect to use the API. After all, what’s the point of an API with no consumers?