Risk and the Technology Architect

What was it that first drew you into a career in technology? For me, and perhaps for you, it was excitement about the technology itself and the possibilities it created. The prospect of working with leading-edge software and hardware, and of applying them, has always been a thrill.

What I did not sign up for was managing the risks that arise from the impact of technology. It probably never occurred to me that technology in a typical corporate setting would carry risks beyond basic quality-of-service issues or system downtime. And yet, as technology becomes ever more deeply integrated into daily life, real technology risk becomes unavoidable.

In the UK, the postal service commissioned a new accounting system that was implemented in 1999. The system was used to track the accounts of local, franchised postmasters who offered postal services in small towns and villages. Discrepancies soon began to emerge in these local accounts, and this had real-world consequences. Some of the postmasters were dismissed, some were bankrupted, some faced criminal charges and some even served time in jail. At least one suicide was attributed to this failure.

Hundreds of people lost their jobs, their livelihoods and their homes. Their lives would never be the same again.

The real source of the issues was not the postmasters; it was errors in the system itself.

This is one very specific example of a very real-world impact. In reality, risks range from the obvious data vulnerabilities and leaks, to the nuanced biases in cognitive systems, to the ethical minefield of robotics in machinery that is responsible for human life, from medical devices to planes, trains and automobiles.

What is the role of the technology architect in predicting and managing risks like this?

As an architect, you are expected to have a greater overview of the elements and interactions of a system or sub-system than perhaps anyone else in the technology domain. Is it possible to have this higher-level view and have enough knowledge of the detail to understand the potential vulnerabilities or potential consequences of any given decision?

Over recent years, the profound, literally life-and-death impact of technology has been prominent in the headlines, from self-driving cars to the issues with the Boeing 737 Max.

Perhaps less life-and-death, and yet still very impactful, have been episodes ranging from bank outages in the UK to wide-scale theft of personal data. The usual response is to refer to “computer issues”, as if the responsibility lay with an unknown and unexpected gremlin that has somehow found its way into the company.

Whatever the impact, technology is at the heart of our lives and this will only increase. This in turn will increase the consequential risks, which in turn will increase the responsibility of the technology architect and the need to fully understand and manage the risks.

Within the world of technology, the risks at the forefront of our minds are typically related to budgets, timelines, downtime, service and capability. In a technology-enabled world it may be useful to revisit the risk categories associated with the software and systems that you have oversight of:

  • Quality of Service
  • Disruption of Daily Life
  • Threat to Life

The risks associated with Quality of Service are perhaps the most familiar. The system does not operate as expected in obvious ways. The consequences of the failures have a low level of impact, perhaps late or incorrect delivery of information or products, slow response times, inconsistent behaviours, perhaps increased cost or poorer returns.

The risks associated with Disruption of Daily Life are almost certainly a fast-growing and complex area. Examples range from the unforeseen consequences of operating systems, to personal data leaks and trading in personal data, to systemic biases in AI systems and higher-level decision-making. There have been a number of high-profile stories recently in which cognitive systems were shown to be biased along dimensions of race and gender, with a real-world impact on the financial health of individuals.

One, perhaps infamous, incident was the theft of personal data from the Ashley Madison website. The incident took advantage of a security vulnerability, and it also highlighted the lack of security and protection around sensitive personal data.

At the highest level of consequential risks there is Threat to Life. It is more than likely you have read about the consequences a failure can have. Fly-by-wire systems have little or no physical connection between the control device and the outcome. There are cars with fly-by-wire throttle pedals. The throttle pedal drives a sensor, and it is the interpretation of the sensor input that creates the engine response. A failure of the electronic control unit could create a scenario where the car fails to respond as the driver intends, something that is less likely in a car with a mechanical connection.
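To make "interpretation of the sensor input" concrete, here is a minimal sketch of a plausibility check. It assumes a hypothetical pedal with two redundant sensor channels, each reporting a position between 0.0 and 1.0, and a hypothetical disagreement tolerance; the names and thresholds are illustrative only and are not taken from any real control unit.

```python
# Hypothetical limits and tolerance, chosen only for illustration.
PEDAL_MIN = 0.0
PEDAL_MAX = 1.0
MAX_DISAGREEMENT = 0.05


def throttle_demand(sensor_a: float, sensor_b: float) -> float:
    """Return a throttle demand between 0.0 and 1.0.

    Falls back to 0.0 (idle) when either reading is outside the
    expected range or the two channels disagree by too much.
    """
    for reading in (sensor_a, sensor_b):
        # A NaN reading fails this comparison too, so it is rejected.
        if not (PEDAL_MIN <= reading <= PEDAL_MAX):
            return 0.0
    if abs(sensor_a - sensor_b) > MAX_DISAGREEMENT:
        return 0.0
    # Both channels are plausible and agree: use their average.
    return (sensor_a + sensor_b) / 2


print(throttle_demand(0.5, 0.5))   # plausible, agreeing readings
print(throttle_demand(0.5, 0.9))   # channels disagree: idle
```

The design choice worth noting is that the function decides what to do when the input cannot be trusted, rather than passing a bad reading on to the engine. Whether idle is the right safe state is itself an architectural decision.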

As more and more vehicles are tested with driver assists such as self-parking or lane-change modes, or with full self-drive modes, there is both increased risk due to failure and a set of complex ethical questions raised by the design of the self-driving algorithms.

Ultimately, all the questions associated with risk raise deep and growing ethical challenges for the architects of the overall system.

Another dimension in considering risk is that ‘systems’ as a whole are now significantly more complex and interlinked than they have been in the past.
Mainstream systems have generally been architected to respond to predictable risks based on probability of threat, a suitable approach for an orderly problem that has predictable outcomes.

As a response to operating in more complex environments, systems need to be architected as complex adaptive systems (CAS). A CAS has properties that are designed to respond to unpredictable as well as predictable events.

Whilst this is an old problem, new approaches to systems design are evolving to address these factors in response to the rise of socio-technical systems (systems that have a direct impact on people and the environment).

A variety of techniques that have been used successfully in heavy manufacturing, battlefield command and aerospace are now being considered as a basis for better systems architecture.

So, what is the role and responsibility of the technology architect in this world of real-life impacts on the one hand and increased complexity on the other?

As a minimum, an architect should ensure that the following dimensions are covered as part of their role:

  • It is imperative that the ‘system’ is designed holistically, and all aspects of the use-case are fully defined.
  • What are the stressors on the potential system and when will they come into play?
  • Architect solutions rather than rely on testing – for example use a Zero Trust approach to architecting for security.
  • The architect has a key role to play in working with systems assurance and testing to sign-off the overall testing plans.
  • Ensuring appropriateness of test data or AI training data to ensure real world outcomes are understood.
  • Validation of input parameters to ensure they are within expected limits.
  • Robust and graceful recovery from failures. Employ chaos engineering tools such as Chaos Monkey to create unexpected failures.
  • Monitoring and logging principles and standards to identify and track issues/errors across systems/components.
  • Consider how the system will behave rather than function under different conditions, especially external stressors.
  • Is the system decoupled to a degree which will limit impacts of component failure?
  • Does the system have diverse paths that let it continue to operate around a component failure? Can the system perform exaptation (repurposing existing capabilities for new uses) in the face of the unexpected?
  • Seeking independent review and deep critique at the design stage.
  • An understanding of the factors external to your systems that can indirectly impact the system, such as human, geographical, legal and policy factors.
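
A few of the items above (input validation, graceful recovery, logging and limiting the impact of a component failure) can be sketched in code. The following is a minimal illustration only, not a production pattern: the class, function names, limits and the failing "rate service" are all hypothetical, and the breaker is deliberately simplified (it never resets itself once open).

```python
import logging

logger = logging.getLogger("architecture_sketch")


def validate_amount(amount: float, low: float = 0.0, high: float = 10_000.0) -> float:
    """Reject inputs outside expected limits (limits are hypothetical)."""
    if not (low <= amount <= high):
        raise ValueError(f"amount {amount!r} outside expected limits [{low}, {high}]")
    return amount


class CircuitBreaker:
    """Stops calling a dependency after max_failures consecutive errors."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.max_failures

    def call(self, func, fallback, *args):
        if self.is_open:
            # Dependency is considered down: degrade gracefully.
            logger.warning("circuit open, using fallback")
            return fallback(*args)
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            logger.exception("dependency failed (%d/%d)", self.failures, self.max_failures)
            return fallback(*args)
        self.failures = 0  # a success clears the failure count
        return result


# Hypothetical dependency that is currently failing, and a local fallback.
def flaky_rate_service(amount: float) -> float:
    raise ConnectionError("rate service unavailable")


def cached_rate(amount: float) -> float:
    return amount * 1.0  # stale but safe default


breaker = CircuitBreaker(max_failures=2)
for _ in range(3):
    value = breaker.call(flaky_rate_service, cached_rate, validate_amount(100.0))
```

The first two calls try the failing service, log the error and fall back; the third skips the service entirely because the circuit is open. The point is the behaviour under failure: the caller always gets an answer, the failure is recorded, and a broken component is not hammered with further requests.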

Risk is a very real part of being an architect, and the unintended consequences of not managing it effectively can have far-reaching and life-changing impacts.