By Krasimir Baylov, Managing Partner Intway and A&G Contributing Editor
The rapid development of information technology, combined with the high demands of business, has brought a completely new set of challenges over the last decade. The advancement of cloud technologies allows us to solve problems that were unthinkable only a few years ago. Today we build extremely large software systems, developed by tens, hundreds, or even thousands of engineers. These systems do not tolerate downtime, and they must adapt quickly to surges in load. If we had to describe contemporary systems in a single word, it would be complexity.
Just imagine a distributed system with thousands of nodes. How is this system monitored? How do we handle node failures? What happens when external factors change? Clearly, having a few hundred or a few thousand people deal with such cases by hand is not a feasible solution.
Luckily, there is a type of system that comes in handy in such situations: autonomous, or self-adaptive, systems. They can manage themselves without direct human intervention. While the concept is not completely new (it traces its origins back to 2001, when Paul Horn, IBM’s Senior VP of Research, introduced the concept of autonomic computing), it has become extremely popular in the last few years. Think of autonomous cars, unmanned aerial vehicles (UAVs), intrusion detection systems (IDS), and so on.
Self-adaptive mechanisms are widely used across modern applications. They can detect anomalies and failures and apply corrective actions: restarting nodes, spinning up new instances, or blocking suspicious IPs, to name a few. However, their full potential is yet to be revealed. Today manufacturers invest millions (even billions) in building self-driving vehicles that could change our future. The aerospace industry relies heavily on self-adaptation to reduce human error. All these complex mechanisms are based on a simple cyclic flow – the autonomic control loop.
Autonomic Control Loop
So, autonomous systems can manage themselves given some high-level goal. This goal is defined up front, and it can later be updated in response to external factors. How does that happen? At the heart of self-adaptation is the autonomic control loop. It provides a feedback system that supports the decision-making process.
Each autonomic control loop has four main phases:
- Collect – collect all needed information. This is usually achieved by obtaining information from external sensors or monitoring systems.
- Analyze – analyze the collected data. Apply specific models or compare the data against established business rules.
- Decide – make a decision. Process the analyzed data and decide if and what needs to change in order to get closer to the established system goals.
- Act – turn the decision into action. Apply the necessary changes.
Once a full cycle completes, it starts over. The loop continues for as long as it needs to – usually until the goal is achieved, or indefinitely (if we pursue continuous self-adaptation).
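To make the four phases concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Cluster` class, the CPU-based goal, and the threshold values are hypothetical and stand in for real sensors and effectors.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical managed resource: a pool of instances sharing a load."""
    instances: int
    total_load: float  # total work, measured in "fully busy instances"

    def cpu_per_instance(self) -> float:
        return self.total_load / self.instances


def collect(cluster: Cluster) -> float:
    # Collect: read a sensor (here, the average CPU use per instance).
    return cluster.cpu_per_instance()


def analyze(cpu: float, target: float, tolerance: float) -> str:
    # Analyze: compare the reading against the goal.
    if cpu > target + tolerance:
        return "overloaded"
    if cpu < target - tolerance:
        return "underloaded"
    return "ok"


def decide(state: str, cluster: Cluster) -> int:
    # Decide: choose a change in instance count (0 means no change).
    if state == "overloaded":
        return 1
    if state == "underloaded" and cluster.instances > 1:
        return -1
    return 0


def act(cluster: Cluster, delta: int) -> None:
    # Act: apply the decision to the managed resource.
    cluster.instances += delta


def control_loop(cluster: Cluster, target: float = 0.6,
                 tolerance: float = 0.2, max_cycles: int = 20) -> int:
    """Repeat Collect -> Analyze -> Decide -> Act until the goal is met.

    Returns the number of cycles in which a change was attempted.
    """
    for cycle in range(max_cycles):
        state = analyze(collect(cluster), target, tolerance)
        if state == "ok":
            return cycle  # goal achieved, stop looping
        act(cluster, decide(state, cluster))
    return max_cycles


cluster = Cluster(instances=2, total_load=3.0)
cycles = control_loop(cluster)
print(cluster.instances, cycles)
```

The `max_cycles` bound is only there so the sketch terminates; a system that pursues continuous self-adaptation would keep looping and re-reading its sensors instead.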
Another form of the autonomic control loop is the so-called MAPE-K model. It maps the four main phases to Monitor → Analyze → Plan → Execute. There is also a new letter here, K, which stands for Knowledge. All phases are organized around a common knowledge base that supports the analysis of the collected data and helps ensure that the most suitable adaptation decision is taken.
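The role of the shared knowledge base can be sketched as follows. This is an illustrative Python example, not a reference implementation of MAPE-K: the `Knowledge` fields, the latency policy, and the action names are all hypothetical. The point it tries to show is that every phase reads from or writes to the same object.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Knowledge:
    """Shared knowledge base (hypothetical fields) used by all phases."""
    max_latency_ms: float = 200.0  # policy: the goal the system pursues
    window: int = 3                # how many recent samples to analyze
    samples: list = field(default_factory=list)
    actions: list = field(default_factory=list)


def monitor(k: Knowledge, latency_ms: float) -> None:
    # Monitor: record the new measurement in the knowledge base.
    k.samples.append(latency_ms)


def analyze(k: Knowledge) -> bool:
    # Analyze: does the recent average violate the policy?
    recent = k.samples[-k.window:]
    return mean(recent) > k.max_latency_ms


def plan(k: Knowledge, violation: bool) -> str:
    # Plan: pick an action; past actions in the knowledge base steer the
    # choice, so a restart is tried once before adding capacity.
    if not violation:
        return "none"
    return "add_instance" if "restart" in k.actions else "restart"


def execute(k: Knowledge, action: str) -> None:
    # Execute: a real system would call an effector here; this only logs.
    if action != "none":
        k.actions.append(action)


def mape_k_cycle(k: Knowledge, latency_ms: float) -> str:
    monitor(k, latency_ms)
    action = plan(k, analyze(k))
    execute(k, action)
    return action


knowledge = Knowledge()
results = [mape_k_cycle(knowledge, ms) for ms in (120, 180, 400, 450, 150)]
print(results)
```

Because the plan phase consults the history of earlier actions, the same violation leads to a different decision the second time, which is something the plain four-phase loop cannot do without shared state.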
Now that we know what autonomous systems are, let’s take a look at their self-management aspects. Such systems could adapt their behavior to achieve certain goals in four main directions or aspects.
- Self-configuration – allows systems and their components to configure themselves following high-level policies. Configuration parameters could be changed at startup or at runtime.
- Self-optimization – allows systems and their components to continuously monitor themselves and search for possible improvements. One such improvement could be applying updates as soon as they are released.
- Self-healing – allows systems and their components to automatically detect and resolve problems. Once a problem is detected, the system would try to resolve it and retest it.
- Self-protection – allows systems and their components to protect themselves against attacks and cascading failures. This aspect comes in two flavors: (1) the system could take actions to prevent external attacks; (2) the system could respond in a way that reduces the overall effect of an attack.
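Of the four aspects, self-healing is perhaps the easiest to sketch in code, because it follows the detect, resolve, retest pattern described above. The Python example below is a minimal illustration under stated assumptions: `self_heal`, `FlakyService`, and the attempt limit are hypothetical names and values, and a restart stands in for whatever repair a real system would apply.

```python
from typing import Callable


def self_heal(check: Callable[[], bool], repair: Callable[[], None],
              max_attempts: int = 3) -> bool:
    """Detect a problem, try to resolve it, then retest.

    Returns True if the component is healthy at the end.
    """
    for _ in range(max_attempts):
        if check():      # detect (and retest after each repair)
            return True
        repair()         # resolve
    return check()       # final retest after the last repair


class FlakyService:
    """Hypothetical component that recovers after enough restarts."""

    def __init__(self, restarts_needed: int) -> None:
        self.restarts_needed = restarts_needed
        self.restarts = 0

    def is_healthy(self) -> bool:
        return self.restarts >= self.restarts_needed

    def restart(self) -> None:
        self.restarts += 1


service = FlakyService(restarts_needed=2)
healed = self_heal(service.is_healthy, service.restart)
print(healed, service.restarts)
```

Capping the number of attempts matters: a component that cannot be repaired should be reported or escalated rather than restarted forever.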
In the real world, we rarely see systems that implement only a single aspect of self-management. To have a full-fledged autonomous system, you need to consider all four aspects together. After all, can you imagine a rover on Mars that could self-configure but not self-heal?
Well, it’s not that easy to get a fully autonomous system with a big bang approach. You are better off with an evolutionary approach: start with something small and improve and extend it over time. That’s what happens with vehicles. You start with Level 0 (manual driving), go through Level 1 (driver assistance) and Level 2 (partial automation), until you reach Level 5 (full automation). To put all this into a general framework that applies to self-adaptive systems, we could use the autonomic computing maturity scale initially developed by IBM, which runs from Level 1 (basic) through Level 2 (managed), Level 3 (predictive), and Level 4 (adaptive) to Level 5 (autonomic).
This maturity framework comes in handy when you design self-adaptive systems from scratch. However, keep in mind that some platforms already provide a solid foundation for implementing Level 2, or even Level 3 and Level 4, applications. Cloud providers are a simple example. They give you mechanisms to set up hosting infrastructure that handles variable load by starting or stopping individual nodes.
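As a rough illustration of how such a mechanism might decide how many nodes to run, the Python sketch below applies a proportional rule: scale the node count by the ratio of observed to target utilization, then clamp it to configured bounds. The function name, parameters, and default values are assumptions for this example and do not correspond to any particular cloud provider's API.

```python
import math


def desired_nodes(current_nodes: int, avg_utilization_pct: int,
                  target_utilization_pct: int = 60,
                  min_nodes: int = 1, max_nodes: int = 10) -> int:
    """Return how many nodes should run to bring utilization near the target.

    Utilization is given as a whole-number percentage (0-100) averaged
    over the nodes that are currently running.
    """
    if current_nodes < 1 or target_utilization_pct <= 0:
        raise ValueError("need at least one node and a positive target")
    # Proportional rule: if nodes run at 90% and the target is 60%,
    # the pool should grow by a factor of 90 / 60.
    raw = math.ceil(current_nodes * avg_utilization_pct / target_utilization_pct)
    # Clamp so the pool never drops below min_nodes or exceeds max_nodes.
    return max(min_nodes, min(max_nodes, raw))


for nodes, load in [(4, 90), (4, 30), (8, 95), (2, 5)]:
    print(nodes, load, "->", desired_nodes(nodes, load))
```

The upper bound acts as a cost guard and the lower bound keeps the service available when the load drops to almost nothing; both are policy choices that a person still has to make.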
A question we all might be curious about is “What is the future of autonomous systems?” Could we have a world of self-driving cars on public roads? Would autonomous systems replace people? There is no definitive answer. However, one thing is sure – autonomous systems can help us deal with the ever-increasing complexity of contemporary software systems. We should use them to our advantage to reduce incidents, improve our lives, and discover new ways to extend our own limits as humankind.