Cloud-Integrated Cyber–Physical Systems: Reliability, Performance and Power Consumption with Shared-Servers and Parallelized Services

With the rapid development of the digital economy, cloud computing services have been widely adopted, and Infrastructure-as-a-Service (IaaS) has become a key offering used by individuals, governments, and enterprises. However, virtualization technology has introduced resource sharing and service parallelism into cloud services, posing new challenges to system modeling. Meanwhile, cloud service providers (CSPs) face the dilemma of ensuring service quality as specified in Service Level Agreements (SLAs) while controlling operational costs such as electricity expenses, which account for up to 70% of the daily operational costs of cloud computing data centers. Moreover, the global average resource utilization rate of data centers is only about 10%–20%, and the intricate interdependencies among system reliability, performance, and power consumption (PC) have not been fully considered in existing studies.

Therefore, Shuyi MA from Xi’an Jiaotong University and the City University of Hong Kong, Jin LI from Xi’an Jiaotong University, Jianping LI from the University of Chinese Academy of Sciences, and Min XIE from the City University of Hong Kong jointly conducted a study entitled “Cloud-integrated cyber–physical systems: Reliability, performance and power consumption with shared-servers and parallelized services”. This research was supported by the National Natural Science Foundation of China, the Shaanxi Province Innovative Talents Promotion Plan – Youth Science and Technology Nova Project, and the Youth Talent Promotion Project of the China Association for Science and Technology.

This study constructs a systematic model that simultaneously evaluates system reliability, performance, and PC, while capturing cloud service disruptions caused by random hardware and software failures. First, the system states are described by a birth–death process that accommodates resource sharing and service parallelism. Because service durations are relatively short and failure distributions are regular, transient-state transition probabilities are used in place of steady-state analysis. This birth–death process links system reliability, performance, and PC through the service durations, which are determined by service assignment decisions and by the failure/repair distributions. A multistage sample path randomization method is then developed to estimate the system metrics and other factors related to service availability.
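The transient sample-path idea can be illustrated with a toy Monte Carlo sketch. Here the birth–death state is simply the number of operational servers, with illustrative failure/repair rates and a crude availability metric; the state space, rates, and metric are assumptions for illustration, not the paper's actual model.

```python
import random

def simulate_sample_path(horizon, n_servers, lam, mu, seed=None):
    """One transient sample path of a toy birth-death process.

    State = number of operational servers. Each up server fails at rate
    lam (death); each down server is repaired at rate mu (birth).
    Returns the fraction of [0, horizon] with at least one server up,
    a crude availability proxy.
    """
    rng = random.Random(seed)
    t, state, up_time = 0.0, n_servers, 0.0
    while t < horizon:
        fail_rate = state * lam
        repair_rate = (n_servers - state) * mu
        total = fail_rate + repair_rate
        # Exponential dwell time in the current state, truncated at horizon.
        dwell = rng.expovariate(total) if total > 0 else horizon - t
        dwell = min(dwell, horizon - t)
        if state > 0:
            up_time += dwell
        t += dwell
        if t >= horizon or total == 0:
            break
        # Next transition: failure (state - 1) or repair (state + 1).
        state += -1 if rng.random() < fail_rate / total else 1
    return up_time / horizon

def estimate_availability(n_paths=2000, **kw):
    # Average over independent transient sample paths (Monte Carlo).
    return sum(simulate_sample_path(seed=i, **kw) for i in range(n_paths)) / n_paths

a = estimate_availability(horizon=100.0, n_servers=3, lam=0.1, mu=1.0)
```

With repairs ten times faster than failures and three redundant servers, the estimated transient availability is close to 1; the paper's method refines this kind of estimate with multistage randomization rather than naive averaging.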

The findings indicate that, with reliability guaranteed, the trade-off between performance and PC hinges on the balance between service duration and unit power. Under moderate workloads, service parallelism both improves performance and saves energy; as the workload grows, however, resource capacity limits make the performance loss caused by resource sharing increasingly pronounced. When system availability is constrained, resource sharing should be applied cautiously so that deadline requirements can still be met. The study also formulates optimization models for service assignment and compares optimal decisions across availability scenarios, workload levels, and service attributes. Several conclusions emerge. Improved system performance relies mainly on service parallelism and on avoiding resource sharing. Reducing PC requires well-designed service parallelism and resource sharing, with the latter playing the more prominent role. Reduced availability adversely affects both performance and power optimization. Workload intensity strongly shapes the attainable benefits: high workloads restrict performance optimization but enlarge the energy-saving potential. Finally, a given service characteristic may affect performance and power optimization in opposite directions, so changes in service characteristics can shift the trade-off between performance and PC.
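The duration-versus-unit-power balance behind the first finding can be seen in a back-of-the-envelope energy calculation (energy = power × duration). All numbers below, including the parallelization overhead and the slowdown under sharing, are hypothetical values chosen for illustration and do not come from the paper.

```python
def energy(duration_s, power_w):
    """Energy in joules for a service running duration_s seconds at power_w watts."""
    return duration_s * power_w

# Hypothetical baseline: one service, 10 s on a single 100 W server.
serial = energy(10.0, 100.0)                  # 1000.0 J

# Parallelized across 4 servers: duration drops to 2.5 s, but four
# servers draw power, with an assumed 20% parallelization overhead.
parallel = energy(10.0 / 4, 4 * 100.0 * 1.2)  # 1200.0 J

# Shared server: two services share one machine, so each runs an
# assumed 1.5x longer while splitting the 100 W between them.
shared = energy(10.0 * 1.5, 100.0 / 2)        # 750.0 J
```

In this toy setting, parallelism buys performance (2.5 s instead of 10 s) at a higher energy cost, while sharing saves energy at the cost of a longer service duration, mirroring the qualitative trade-off the study reports.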