Data Virtualization and its Effect on Enterprise Architecture


The concept of virtualization in the IT world has been around for a long time. Looking at the three main pillars of infrastructure (compute, network, and storage), compute virtualization dates back to 1964 with the development of the IBM CP-40. Bellcore first began work on the concept of VLANs, which could be considered the first network virtualization even though it is really network segmentation. Storage virtualization was introduced in the early 2000s.

Compute virtualization was the first to be widely adopted because of the cost savings companies were realizing from it. However, businesses were missing the much larger picture of what virtualization could mean for enterprise architecture. Then the concept of cloud computing emerged, with the abstraction of the three main infrastructure pillars as its foundation. Finally, the IT world began to understand the power of abstracting services.

In my previous article, I described data as the most valuable asset a business owns. Data has great value, and its potential is limited only by the questions we ask of it. Data within businesses also poses a great financial burden as well as new security risks. With each new application introduced into the enterprise, a new database is usually created. Big Data has compounded the problem by drawing even more IT and business organizations into collecting and storing data. The increased use of data is great and has limitless potential; however, enterprise architecture methods and frameworks were not prepared for it. What occurs today is data duplication: the same data is collected multiple times because different organizational entities do not know the data is already being collected elsewhere. This increases storage and development costs and adds to the already difficult task of governance.

The solution is to add Data as a Service (DaaS) to the enterprise architecture model. DaaS acts as an IT service catalog, but for data sources (see figure 1). This solves the problem of data duplication, since the catalog of data sets can be examined and an existing data set reused if the data is already being collected. It also ensures that there is a single, common, accepted set of data that is considered the authoritative source of truth.

[Figure 1: data virtualization chart]
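To make the idea concrete, here is a minimal sketch of such a catalog, assuming a simple in-memory registry; the names DataCatalog and DatasetEntry are illustrative, not taken from any particular product.

```python
from dataclasses import dataclass

@dataclass
class DatasetEntry:
    name: str            # catalog-wide unique name, e.g. "customer_master"
    owner: str           # organizational entity responsible for the data
    source: str          # where the authoritative copy lives
    classification: str  # e.g. "public", "internal", "restricted"

class DataCatalog:
    """Registry of authoritative data sets, consulted before any new
    collection effort to avoid duplicating data that already exists."""
    def __init__(self):
        self._entries = {}

    def register(self, entry: DatasetEntry) -> None:
        if entry.name in self._entries:
            raise ValueError(f"{entry.name} already has an authoritative source")
        self._entries[entry.name] = entry

    def find(self, name: str):
        # A project checks here first; a hit means reuse, not re-collect.
        return self._entries.get(name)

catalog = DataCatalog()
catalog.register(DatasetEntry("customer_master", "Sales Ops",
                              "crm.example.com/customers", "internal"))
print(catalog.find("customer_master"))  # reuse the existing data set
```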

This solves most of the issues surrounding data in businesses today, but not all. There are still issues with the use of multiple database vendors, network access to the data sources, data classification, and data security.

Going back to the opening thread of this article, the solution is abstraction, or what is now known as data virtualization. This concept takes all the different data sources and abstracts them so that they can be accessed in a common, standard way. The most common design uses data gateways that understand the locations of all the data and the security rules surrounding it, and that present the data sets through a standard API (see figure 2).

[Figure 2: data virtualization chart]
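As a sketch of how such a gateway might look, the following assumes two illustrative backends published behind one query method, with a simple role check standing in for the security rules; none of these names come from an actual product.

```python
class DataGateway:
    def __init__(self):
        self._locations = {}  # data set name -> callable that fetches rows
        self._acl = {}        # data set name -> roles allowed to read it

    def publish(self, name, fetcher, allowed_roles):
        self._locations[name] = fetcher
        self._acl[name] = set(allowed_roles)

    def query(self, name, role):
        """The single, standard access method all consumers use."""
        if role not in self._acl.get(name, set()):
            raise PermissionError(f"role {role!r} may not read {name!r}")
        return self._locations[name]()

# Two very different sources hidden behind the same API:
gateway = DataGateway()
gateway.publish("orders", lambda: [{"id": 1, "total": 42.0}], {"analyst"})
gateway.publish("tickets", lambda: [{"id": 9, "status": "open"}],
                {"analyst", "support"})

print(gateway.query("orders", role="analyst"))   # allowed
# gateway.query("orders", role="support")        # raises PermissionError
```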

There are also advanced features such as data federation, where results from multiple data sources can be combined into one result set. Some data virtualization solutions also offer data management functions such as the enforcement of data retention rules. Feature sets vary between data virtualization products; however, abstraction, a common access method, and security will always be primary functions.
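A minimal illustration of federation, assuming two hypothetical sources keyed by customer_id, might look like this:

```python
crm_rows = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
billing_rows = [
    {"customer_id": 1, "balance": 120.0},
    {"customer_id": 2, "balance": 0.0},
]

def federate(left, right, key):
    """Combine rows from two sources into one result set on a shared key."""
    index = {row[key]: row for row in right}
    return [{**row, **index.get(row[key], {})} for row in left]

for row in federate(crm_rows, billing_rows, "customer_id"):
    print(row)  # one merged row per customer, sourced from two systems
```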

Enterprise architecture methodologies and frameworks must change to account for the successful use of virtualization and abstraction technologies. There is also a challenge in thinking of foundational services that roll up to middle-layer services, which in turn enable high-level, user-facing services. Especially for development, the higher the service point that applications consume, the better. This allows for greater availability as well as service mobility, the ability to offer the service at multiple locations. (See figure 3.)

[Figure 3: data virtualization chart]

Just as there are three major infrastructure service groups (compute, network, and storage), there will eventually be basic services above that layer that become foundational. This is where data virtualization fits in that model: data virtualization is a middle-layer service that would become part of the platform offering, on top of which end-user services would reside.

This model also becomes very agile, even though at first it seems too standardized and rigid to support agile methods. By strictly standardizing the lower infrastructure layers and offering the middle layers as services with common access methods, developers save a huge amount of time not worrying about where data resides or how to access it. No matter which vendor or type of database, the access model and the security model are the same. Developers can focus on end-user functionality, and this model would enable them to rapidly produce microservices for consumers.
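As a small illustration of that vendor neutrality, the sketch below uses only Python's standard library, with an in-memory SQLite table and a CSV file as stand-in backends; the consuming code stays identical regardless of the store.

```python
import csv, io, sqlite3

def read_sqlite():
    # Backend 1: a relational database (in-memory SQLite for the sketch).
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE products (sku TEXT, qty INTEGER)")
    con.execute("INSERT INTO products VALUES ('A-1', 3)")
    return [dict(zip(("sku", "qty"), row))
            for row in con.execute("SELECT sku, qty FROM products")]

def read_csv():
    # Backend 2: a flat file, exposed through the same row-of-dicts shape.
    return list(csv.DictReader(io.StringIO("sku,qty\nB-2,7\n")))

# Developer code: the same loop no matter which backend produced the rows.
for fetch in (read_sqlite, read_csv):
    for row in fetch():
        print(row["sku"], row["qty"])
```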

In short, IT organizations and professionals have to change the way they think about how IT delivers capabilities. First, we must think of everything that IT offers as a service, or XaaS. Higher-level services are built on top of lower-level services, with the higher-level services inheriting attributes of the services beneath them. You can read more about these concepts in my article on quantum service theory. The key theme is that each service is treated as an object containing attributes that describe what the service provides, its expected availability, its expected performance, and so on. Using this model is how IT can easily align with customer requirements and level-set customer expectations. Another key concept along the same line of thought is IT as a Service, which essentially commoditizes IT offerings. All the high-level services are presented in an IT service catalog, but more importantly, each offering is tied to a business outcome.
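One way to picture the service-as-object idea is the sketch below; the attribute names and the availability calculation are assumptions for illustration, not the formal model from the quantum service theory article.

```python
from dataclasses import dataclass, field

@dataclass
class Service:
    name: str
    provides: str
    availability: float     # expected fraction of uptime, e.g. 0.999
    latency_ms: float       # expected response-time contribution
    built_on: list = field(default_factory=list)

    def effective_availability(self) -> float:
        # A service is only up when everything beneath it is also up,
        # so it inherits the availability of the services it is built on.
        avail = self.availability
        for dep in self.built_on:
            avail *= dep.effective_availability()
        return avail

storage = Service("block-storage", "persistent volumes", 0.9995, 2.0)
data_virt = Service("data-virtualization", "standard data access", 0.999, 5.0,
                    built_on=[storage])
app = Service("customer-portal", "self-service UI", 0.999, 20.0,
              built_on=[data_virt])

print(f"{app.name}: {app.effective_availability():.4f}")  # stacked availability
```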

In an IT as a Service model, the internal IT organization behaves like a vendor and, furthermore, strives to be the preferred vendor. Data is an untapped resource that IT organizations must take advantage of to allow companies to stay competitive in their market segments. The easy access, manipulation, and presentation of company data is a critical service. Data virtualization can be a key component in enabling efficient data use in a business.

About Monte Rummer
Monte Rummer has been in the IT industry for more than 25 years. Currently, he is a senior engineering advisor for CSRA. Rummer is also working on his PhD in IT, concentrating in global IT, EA, and service management theory. He holds a master's degree in enterprise architecture from Penn State University and a bachelor of science in network management from Strayer University.