Data Management in Web Services: Less is More

By Alok Mehta and Richard D’Anna

In this article, we explore how much data a web service should return and which factors should drive that decision. As a general best practice, a web service should return only what is needed. However, the optimal amount of data depends on the following factors:

  • Business Requirements: When gathering requirements for a new web service client, capture the specific data needs for the client’s use cases. If the client needs five data attributes, the ideal solution is to provide access to only those five attributes. If an existing web service endpoint can be reused or extended to meet the client’s requirements, consider other factors such as context, user experience, and response times before making that decision. If no existing endpoint can be reused, establish a new one that returns only the needed data.
  • Data Context: Sending more data than needed can lead to unwanted problems, so it is always good to ask these questions when designing a web service:
    • Can any data in the response be misinterpreted? If the data being returned by the web service is free of anomalies and clearly defined, then either the source data has been immaculately curated over time or the data is being gathered from newly introduced data sources. Often, due to system bugs or unfortunate decisions to change a data attribute’s meaning mid-life, source data can be inconsistent in both its value format and definition. Should this situation arise, consider going through a data normalization exercise to correct the data at the source. If normalization at the source is not possible, then the web service should normalize the data in its response through code to minimize the risk of defects and data misinterpretation. Conduct mapping exercises and data review sessions with the consumer to further minimize the misinterpretation risk.
    • Should the web service let the client specify how much data to return? Technology is now at a point where the requesting party can specify the amount of data it needs. This is an advanced topic that we will cover in a future article.
  • Data Cost: A key factor in deciding how much data the web service should return is the cost of infrastructure. Answering the following questions can help limit the amount of data being returned:
    • Is there a data transfer cost? Cloud providers might charge for data transfer between availability zones and/or regions. Be mindful of this.
    • Does more data require more powerful servers? Simply put, more powerful servers cost more money. Servers also have a ceiling on the number of concurrent requests they can handle. Returning less data can potentially lower the memory, CPU, and number of servers needed.
    • What is the impact on the network? Less data equals less impact on the network. Often, network bandwidth is at a premium in organizations and the best practice is to minimize the data footprint.
  • User Experience: There are services that must return bulk data and many times such services are “headless.” In other words, there is no user experience via an interface like an app or a GUI (Graphical User Interface). However, there are cases in which returning more than the app or GUI can handle will result in a negative user experience. So, it is important to consider user experience when returning the data.
  • Architecture Extensibility: Software should be designed in a way that supports architectural best practices like high cohesion, low coupling, and abstraction, as they promote agility when requirements change. The same principles apply when considering data management in a web service. Following these best practices will optimize data management in web services by simplifying the addition and removal of data.
  • Response Time: This is the most obvious and simple consideration. More data typically increases the response time, so consider the response time needs of the client. Define the average and maximum expected response times up front and treat them as top-tier requirements. These requirements should be primary drivers in both data-sourcing and data-volume design decisions.
  • Data Source and Format: How the data is sourced can play a key role in designing a web service. Sometimes, sourcing the data is a simple lookup or a join between tables, but sometimes it depends on long-running processes. If no status is provided to the client while the data is being fetched, the result is a negative user experience. Client-side technologies such as AJAX and iframes can be used to communicate updates to the client while the web service is fetching the payload, so the user experience is not compromised. Another key factor is the format of the data as it moves from the source to the destination. For example, the data might be stored as a number in the database, but the client expects a string to be returned. Consider this transformation step when designing the web service. If extra data is returned that requires transformation, then response times will increase.
  • Connectivity: In today’s connected world, the expectation is that data is always available. However, internet connectivity can be a challenge at times, forcing us to think about offline capabilities. In such cases, we must consider whether the client’s internet connection is a key factor. If the client communicates with the web service primarily over a cellular connection, then slowness and connection failures will be commonplace. Faster response times will reduce the impact of internet connection failures on the user experience.
  • Privacy: Privacy is one of the most important considerations when managing data via a web service. The question to ask is how Personally Identifiable Information (PII) and Personal Health Information (PHI) are being managed. PII and PHI may need to be masked or filtered based on the client’s needs. Whether the masking or filtering is performed at the data source or in the web service code, response times will increase. The best approach is to return only as much PII and PHI as the use case requires.
  • Server Constraints: Many server-side technologies have max buffer or memory usage constraints. How does this constraint impact the web service data management decision? What should be done when there is a constraint? You must consider this when designing your web service.
  • Number of Clients: This factor applies to scalability and is part of non-functional requirements. How returned data is managed depends on how many clients are expected to use the web service. If multiple clients are using the web service, it is important to further reinforce the less-is-more principle to ensure acceptable response time and user experience.
  • Reusability: Service-oriented architecture has given us the promise of reusability, and to a large degree, we can reuse web services if they are designed appropriately. However, data management plays a significant role in this concept. If a web service is used by multiple consumers, it is important that the context of the data is similar across those consumers. For example, if a web service returns the address of a person and is consumed by a CRM app and by a billing app, then we must ensure that the definition and context of the address do not change between consumers.
  • Backward Compatibility: Web services should be backward compatible. In other words, consumers should not have to change their code each time an updated version of the web service is deployed. When releasing an update to a web service, the previously established data definitions and contract should be honored.
  • Security: It is extremely important to examine data security needs when designing the web service. Encryption and decryption are often time-consuming but necessary, which makes less-is-more even more important in such cases. Otherwise, the result is a sub-optimal response time.
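
To make the Business Requirements and Privacy factors above more concrete, here is a minimal Python sketch of a service that returns only the attributes each client asked for and masks PII on the way out. The record, the client names, the field lists, and the helper names (mask, build_response) are all hypothetical and chosen only for illustration; a real service would take them from its own requirements and data source.

```python
from typing import Any

# Hypothetical record as it might exist at the data source.
CUSTOMER_RECORD: dict[str, Any] = {
    "id": 1001,
    "name": "Pat Example",
    "email": "pat@example.com",
    "ssn": "123-45-6789",
    "address": "1 Main St",
    "internal_notes": "not for clients",
}

# Hypothetical per-client field lists captured during requirements gathering.
ALLOWED_FIELDS: dict[str, tuple[str, ...]] = {
    "billing_app": ("id", "name", "address"),
    "support_app": ("id", "name", "email", "ssn"),
}

# Hypothetical set of fields treated as PII that must be masked.
MASKED_FIELDS = {"ssn"}


def mask(value: str, visible: int = 4) -> str:
    """Replace letters and digits with '*', except in the last `visible` characters."""
    keep_from = len(value) - visible
    return "".join(
        ch if i >= keep_from or not ch.isalnum() else "*"
        for i, ch in enumerate(value)
    )


def build_response(record: dict[str, Any], client: str) -> dict[str, Any]:
    """Return only the fields this client needs, masking any PII among them."""
    return {
        name: mask(str(record[name])) if name in MASKED_FIELDS else record[name]
        for name in ALLOWED_FIELDS[client]
    }


if __name__ == "__main__":
    print(build_response(CUSTOMER_RECORD, "billing_app"))
    print(build_response(CUSTOMER_RECORD, "support_app"))
```

In this sketch the billing client never receives the email, SSN, or internal notes, and the support client receives the SSN only in masked form. Keeping the field lists in one place is one way to make adding or removing data a small change, in the spirit of the Architecture Extensibility factor.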
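
The Data Context and Data Source and Format factors above describe normalizing inconsistent source values in code and converting stored numbers to the strings a client expects. The Python sketch below shows one possible shape for that step. The status aliases, the policy_number field, and the eight-digit padding are assumptions made up for this example, not details from any particular system.

```python
from typing import Any

# Hypothetical mapping from inconsistent legacy source values to one canonical value.
STATUS_ALIASES: dict[str, str] = {
    "a": "active",
    "active": "active",
    "1": "active",
    "i": "inactive",
    "inactive": "inactive",
    "0": "inactive",
}


def normalize_status(raw: Any) -> str:
    """Map a raw source value to its canonical status, or fail loudly."""
    key = str(raw).strip().lower()
    if key not in STATUS_ALIASES:
        # Surfacing unknown values is safer than passing them to the consumer.
        raise ValueError(f"unrecognized status value: {raw!r}")
    return STATUS_ALIASES[key]


def to_response(row: dict[str, Any]) -> dict[str, str]:
    """Build the client-facing payload from a raw source row."""
    return {
        # Stored as an integer at the source; the client expects a padded string.
        "policy_number": f"{row['policy_number']:08d}",
        "status": normalize_status(row["status"]),
    }


if __name__ == "__main__":
    print(to_response({"policy_number": 4521, "status": " A "}))
```

Every attribute added to the payload adds another transformation like these, which is one reason extra data tends to lengthen response times.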

Conclusion

There’s no universal right or wrong as to how much data to return in your web service response, but often, “less is more.” That said, the architecture should be flexible enough to add or remove data as needed. We hope that this short article provides insight into the key factors to consider when designing a web service.

About the Authors: Alok Mehta is Chief Information Officer of Business Systems at Kemper, and Richard D’Anna is Director of IT at Kemper.