Dynamic Integration for AI Agents – Part 1

By Michael Poulin

“This text is written by a human” — AI Humanaiser.

The most transformative eras, and the most progressive stretches of human evolution in general, are driven by “What to do?” as the compass. The “How to do?” — the methods — determines the velocity and, partially, the quality of that progress, or can block it altogether.

The Problem

An integration of components within AI differs from an integration between AI agents.

The former relates to integration with known entities that form a deterministic model of information flow. The same relates to inter-application, inter-system and inter-service transactions required by a business process at large. It is based on mapping of business functionality and information (an architecture of the business in organisations) onto available IT systems, applications, and services.

The latter shifts the integration paradigm: the AI Agents themselves decide at runtime that they need to integrate with something, based on the overlap between the statistical LLM and the available information, which may contain linguistic ties unknown even during LLM training. That is, an AI Agent does not know which counterpart — an application, another AI Agent or a data source — it would need to cooperate with to solve the overall task given to it by its consumer/user. The AI Agent does not even know whether the needed counterpart exists.

When integration is considered for general-purpose AIs, it mainly appears in the so-called humanitarian spheres like culture, politics, social media and platforms. These spheres have at least one thing in common: the information in them cannot be reliably verified, because it is fundamentally based on the opinions of research groups competing with each other in different socio-cultural and political contexts.

In so-called natural science spheres, where the information returned by AI Agents can be reliably verified against the mathematical, physical, chemical and similar knowledge already collected by society, the uncertainty of the runtime needs of an AI Agent is much narrower. In other words, if an AI Agent is trained on information depicting a variety of mechanical theories and real-world mechanical facts, it is more likely than not that the AI Agent would need additional, namely mechanical, information at runtime, and such information can be prepared up front with preknown means of access for integration.

This article focuses on the humanitarian spheres for AI Agent integration associated with unknown integration targets.

Approach to Solution

Every AI Agent operates in an environment, not in a vacuum, and may capitalise on it.

Any AI Agent may have its individual owner and provider. These owners and providers may be unaware of each other and act independently when creating their AI Agents.

No AI Agent can be self-sufficient due to its fundamental design — it depends on the prompts and real-world data at runtime.

It seems that the approaches to integration and the integration solutions differ for the humanitarian and natural science spheres.

One of the most effective methods of solving complex problems like AI Agent integration is an “approach from the opposite”: if we are not sure what the thing is, we can start with defining what it is not.

In the article “A New Concept for Authentication and Authorisation for AI Agents” and the two articles that followed it, I set a basis for the integration solution around a concept of a trusted realm in the form of a Resource Registry for an AI Agent at runtime.

In this article I’ll dig deeper into this model of integration. However, let us start with what other people use today and explain why I’ve decided not to follow them.

A Language-Based Discovery

The Language-Based Discovery method is also known as Natural Language Coordination, in the meaning of coordinates rather than “teamwork”. In this method, an AI Agent asks another AI Agent in natural language if they can handle a certain task.

A traditional realisation of this method requires either precursory knowledge about another AI Agent, or it may be imagined as a “semantic broadcasting” to those who are listening.

If AI Agent A is aware of AI Agent B, these agents are coupled. If AI Agent B changes, this can easily corrupt AI Agent A. If you construct a chain of AI Agent invocations rather than an orchestration, a change in or failure of any one of them results in the failure of the entire chain. This effect is known as a “choreography breach”.

Broadcasting in a certain environment assumes the presence of listeners. Since the latter are independent, some of them may be offline at the moment of broadcasting and miss the request. The active AI Agents catch the request, assess whether they a) can satisfy it and b) want to satisfy it, and then respond. The request assessment may be implemented by several different methods or algorithms. Overall, this powerful pattern has its problems; one of them is that you can design for assured delivery to the active AI Agents, but there are no guarantees of when the request will be processed and responded to, if at all.
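As an illustration of this pattern, here is a minimal sketch of semantic broadcasting. Everything in it — the agent names, their skills, and the accept/decline rule — is invented for the example and does not represent any particular framework:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    skills: set
    online: bool = True   # offline agents miss the broadcast entirely
    busy: bool = False    # a capable agent may still decline

    def consider(self, task: str):
        # a) can it satisfy the request?  b) does it want to?
        if task in self.skills and not self.busy:
            return f"{self.name}: I can handle '{task}'"
        return None

def broadcast(agents, task):
    """Send the request to whoever is listening; collect voluntary responses."""
    responses = []
    for agent in agents:
        if not agent.online:
            continue  # missed the request; nobody retries on its behalf
        answer = agent.consider(task)
        if answer is not None:
            responses.append(answer)
    return responses  # may well be empty -- no response is guaranteed

agents = [
    Agent("translator", {"translate"}),
    Agent("pricer", {"estimate-cost"}, online=False),  # offline right now
    Agent("scheduler", {"schedule"}, busy=True),       # capable but declines
]
print(broadcast(agents, "estimate-cost"))  # the only capable agent is offline
```

Note that the requester cannot distinguish “no capable agent exists” from “the capable agent was offline at that moment” — exactly the guarantee gap described above.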

The basic constraint of this pattern is sharing the language used for the request, its assessment and the responses. A trivial solution for understanding the language is translation. However, the major problem with such an approach is old and known from the time of semantic querying of CORBA services (1996): there is no guarantee that the wording of the query would be understood by any other AI Agent at all, or understood properly. For example, do the words “price” and “cost” mean the same? This concern is especially pivotal because each AI Agent may be constructed and maintained by different owners and providers with different vocabularies. In addition, unstructured mutual influences between human languages modify word meanings, especially in multilingual environments.

Self-Semantic Routing / Embedding-Based Discovery

Here, an AI Agent “embeds” its intent/goal, i.e., decomposes the goal into a vector of invocations, and matches it against a vector index of other known AI Agents. In other words, instead of matching a needed capability, a match of indexes is used.
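A minimal sketch of such index matching, with made-up three-dimensional “embeddings” and a cosine-similarity threshold (real systems use high-dimensional vectors and a vector database, but the logic is the same):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pre-built index of *known* agents: name -> capability embedding.
# The index must exist before runtime -- this is the deterministic part.
agent_index = {
    "weather-agent": [0.9, 0.1, 0.0],
    "booking-agent": [0.1, 0.8, 0.3],
}

def discover(goal_embedding, threshold=0.75):
    """Match a goal embedding against the static index; None if nothing fits."""
    best_name, best_score = None, 0.0
    for name, emb in agent_index.items():
        score = cosine(goal_embedding, emb)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

A genuinely new need maps to no index and `discover` returns `None`; and even a successful match returns only a name, not a way to communicate with its owner.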

This is a deterministic model: an AI Agent must know before runtime what its available indexes may be. If the AI Agent identifies a new runtime need, there must be a mechanism that links the need with one of the preknown indexes. That mechanism is non-trivial — it has to reconcile, most probably, numeric indexes with semantic needs.

This is even more difficult because the indexes may change, and the related AI Agent capabilities may change over time as well. Besides requiring a linking mechanism, it can easily happen that no indexes are found for concrete needs. Moreover, even if the indexes match, the AI Agent in need has to know how to communicate with the found AI Agent, which constitutes coupling.

This solution is good for static testing, not for the dynamically changing real world.

Autonomous Self-Healing via Trial-and-Error

This is a regular learning or research method that becomes extremely costly at runtime, with no guarantees of success. All the difficulties and problems related to coupling (trying) of AI Agents are in place here as well.

Embodied Multi-Agent Planners

Formally, an AI Agent generates entire sub-agent plans and spins up other AI Agents as needed (realised in ReWOO, OpenAgents). However, in order to include other AI Agents in such a plan, the AI Agent must know them, i.e., must be coupled with them.
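The coupling is easy to see in a sketch. The sub-agent registry and plan format below are invented for illustration; the point is that every step of a generated plan must name an agent the planner already knows:

```python
# Sub-agents the planner is hard-wired to -- i.e., coupled with.
KNOWN_SUBAGENTS = {
    "search": lambda query: f"results for {query}",
    "summarise": lambda text: text[:20] + "...",
}

def plan_and_run(steps):
    """Execute a generated plan; every step must name a *preknown* sub-agent.

    `steps` is a list of (agent_name, argument) pairs; the output of each
    step is piped into the next one.
    """
    output = None
    for agent_name, arg in steps:
        worker = KNOWN_SUBAGENTS.get(agent_name)
        if worker is None:
            # The generated plan referenced an agent the planner
            # has never been coupled with -- the plan cannot run.
            raise LookupError(f"no such sub-agent: {agent_name}")
        output = worker(arg if output is None else output)
    return output
```

Any plan that mentions a capability outside `KNOWN_SUBAGENTS` fails outright, which is the configuration-a-priori problem discussed next.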

Why should this planner be embodied instead of leased from the environment? Again, this is a deterministic model, while the AI Agent generates the plan from a prompt and real-world data that are unknown up front. That is, there is nothing that can be configured a priori to spin up.

Self-updating Agent Meshes

This popular method offers an excessive model where AI Agents, instead of focusing on the tasks they were created for, constantly publish and subscribe to changes in the AI Agent graph (like gossip protocols). This is also known as Capability Broadcasting and subscription.

Somehow, the authors of this method failed to warn that there should be an environment capable of supporting massive independent publishing, and that it should support subscribing to “add” changes and unsubscribing from “remove” changes, which is quite a task on its own.

Smart people do not always do everything they can do… just because they can. Integration between AI Agents is already a mess caused by their uncertainty, and a mesh only increases the chaos.

Overall, I have the impression that developers working on AI Agent integration caught a virus of “inertia thinking” and applied everything they knew before. Architects know this as bold, contextless design, which is a part of bad practice.

Solution

Helpful Context

To be fair to my colleagues, I have enumerated a few methods that, IMHO, have potential for multi-agent systems (MAS) and AI orchestration:

  1. Dynamic, runtime discovery: it requires a special infrastructure where the discovery may take place, as well as a mechanism for recognising that the discovered entity is the one suitable for the needs of the discovering actor. This model is well known from the time of SOA but was quickly abandoned because the majority of developers preferred simple, precise interfaces like those in REST. The concept of “give me what I want” was partially restored only recently in the GraphQL technology. However, it appears to be the major one for integrating AI Agents with unknown providers.
  2. Knowledge Graphs and Embedding-Based Matching: an AI Agent can query a knowledge graph or embedding space to find relevant capabilities semantically. Indeed, but on the downside, this method does not articulate what knowledge, and how much of it, an AI Agent should have about the knowledge graph or embedding search-space in order to query it and to understand the response. Moreover, the found capabilities, if any, are far from enough: the querying AI Agent still has to discover how to communicate with the capability provider (another AI Agent or data source) and has to address both Authentication and Authorisation control for the counterpart, while the AI Agent and the resources may belong to different infrastructural realms with different security control rules and providers.
  3. Peer-to-Peer Discovery: while it is common in edge AI, IoT, or blockchain-based agent networks, there is a certain ambiguity in this “method”. Peer-to-Peer Discovery has its spectrum of use cases, but it is narrowed and constrained by the inter-coupling of the peers: they have to know about each other. There were intensive debates in the industry about the architectural quality of such inter-coupling in the 2010s, known at that time as a “choreography” or a “ring”/“mesh” integration. As Novell Inc. proved on many occasions, this integration model did not scale well and was fragile due to the problems with managing failures of the network nodes and the continuous extra resources needed to update P2P routing schedules.
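Point 2 can be made concrete with a tiny in-memory triple store standing in for a knowledge graph. The agent names, capability labels and endpoint URL are invented; the sketch shows that a semantic match alone does not tell the caller how to reach the provider, let alone how to authenticate with it:

```python
# A toy "knowledge graph" as (subject, predicate, object) triples.
triples = [
    ("agent:pricer", "provides", "capability:cost-estimation"),
    ("agent:pricer", "endpoint", "https://example.invalid/pricer"),
    ("agent:translator", "provides", "capability:translation"),
    # note: no endpoint triple for agent:translator
]

def find_providers(capability):
    """Semantic lookup: who claims the capability? Says nothing about access."""
    return [s for s, p, o in triples if p == "provides" and o == capability]

def endpoint_of(agent):
    """Even after a match, communication details must be discovered separately."""
    for s, p, o in triples:
        if s == agent and p == "endpoint":
            return o
    return None  # a matched provider we still cannot reach
```

And even when an endpoint is found, nothing in the graph answers the Authentication/Authorisation questions raised above.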

Testing the Water: Is This My Cup of Tea?

In routine life, people try to be reasonable to the best of their mental abilities and consider the correspondence between the task and the available capabilities. The exceptions are usually attributed to brainwashing. The notorious example is how the Soviet Bolsheviks (a radical faction of the Marxist Russian Social Democratic Labour Party, RSDLP, later renamed the Communist Party) planned to turn the great Asian rivers backward because they faced a water shortage and believed that they could do anything.

So, it makes sense to apply the same rationalism to AI and AI Agents before starting any integration planning and design. This means that each AI or AI Agent should, IMO, check whether the request, task, or prompt given to it is adequate to the capabilities of its LLM. In other words, it is reasonable to verify whether the domain of the prompt corresponds to the domain in which the AI’s LLM had been trained and tuned, if at all. If there is a mismatch, the AI Agent should not even try to execute the task.
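Such a pre-flight check can be approximated very crudely, for instance by vocabulary overlap between the prompt and the model’s trained domain. The term list, threshold and messages below are all invented; a real check would use classifiers or embeddings, but the control flow (refuse honestly on a mismatch) is the point:

```python
# Illustrative vocabulary of the domain the model was trained/tuned on.
TRAINED_DOMAIN_TERMS = {"gear", "torque", "bearing", "friction", "load"}

def domain_match(prompt: str, threshold: float = 0.2) -> bool:
    """Crude proxy: share of domain terms that appear in the prompt."""
    words = set(prompt.lower().split())
    overlap = words & TRAINED_DOMAIN_TERMS
    return len(overlap) / len(TRAINED_DOMAIN_TERMS) >= threshold

def handle(prompt: str) -> str:
    if not domain_match(prompt):
        # An honest refusal, not "I can't talk about that topic".
        return "Out of my trained domain; I did not attempt the task."
    return "proceeding with the task"
```

The refusal message matters as much as the check itself: it tells the user why the task was declined instead of hiding behind a vague non-answer.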

The outstanding problem exists only for general-purpose AIs or AI Agents working in the humanitarian domains. In too many cases it is very difficult to reliably classify prompts (written in natural human languages by people of differing intelligence) into domain types such as cultural, social, political, ethical, legal, biomedical, healthcare, education, history, conflict resolution and the like. Because of this, the requirements for general-purpose AIs or AI Agents are tough and should cover all those domains at once.

Here is an example: I challenged MS Copilot with a question: “Do we know any AI or AI Agents that can ‘pause and resume’ themselves?” This AI is advertised as an “AI Assistant” of the general-purpose type. Obviously, I asked not about kitchen stuff, or gardening, or the best hotel for my holidays. However, the “AI Assistant” must say something about the requested topic…

The response was: “I’m afraid I can’t talk about that topic; sorry about that.” In plain English, “can’t talk” is not the same as “I do not know” or “My search for related information failed to find anything.” Correct? The given answer carries a connotation of secrecy, which raises concerns and suspicion about the dishonesty of Copilot. I think this is an example of unreasonable logic embedded in Copilot that disqualifies its assistance capabilities.

The Wishful Thinking

The trickiest part is teaching the LLM to self-identify a need, pause itself, and exchange the need information with the internal modules of the AI or AI Agent. Ideally, the work of the AI ought to be resumed with the added information. Conceptually, this is no different from a synchronous invocation of an SOA Service or Microservice, but there the pause-resume mechanics are provided by the programming language for you, while here you have to figure out how to implement them.

Regrettably, we have an “oops!” facet here — none of the currently known LLM models (like GPT-4, Claude, etc.) has the native, built-in ability to intentionally pause and resume itself mid-execution; they are stateless, “one-go” procedures from the developers’ viewpoint.
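Consequently, any “pause and resume” has to be emulated in the application layer around the stateless model call: checkpoint the conversation state, gather the missing information, and re-invoke the model with the augmented context. A sketch under that assumption, with `call_llm` as a hypothetical stand-in rather than a real API:

```python
import json

def call_llm(messages):
    """Stand-in for a real, stateless model invocation."""
    raise NotImplementedError("replace with an actual model call")

def checkpoint(messages, path="run.ckpt.json"):
    """'Pause': persist the conversation so far outside the model."""
    with open(path, "w") as f:
        json.dump(messages, f)

def resume(path, new_information):
    """'Resume': a fresh stateless call, just with richer context."""
    with open(path) as f:
        messages = json.load(f)
    messages.append({"role": "user", "content": new_information})
    return messages  # ready to pass to call_llm(messages)
```

The model never pauses; the application checkpoints and replays, which is why the state lives in a file, not in the LLM.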

As a result, we have a few consequences that many may not be aware of:

(Part 2 will appear Saturday)