Autonomous agents: An introduction to self-prompting LLMs

Less than six months after ChatGPT’s release, the developer community has already figured out how to make Large Language Models (LLMs) interact with themselves. This has led to the creation of autonomous agents, which can work on tasks almost independently. Here’s a quick overview of what you can expect from this new way of interacting with AI.


How do autonomous agents work?

The way an autonomous agent works is deceptively simple: First, users define an end goal (e.g., “create a curriculum to learn acrylic painting”). The program then breaks this goal down into the tasks needed to reach it. During the process, it continuously creates new tasks, saves them in a database, and rearranges them based on their priority.

Rather than having a human engineer prompts for every individual task, autonomous agents are effectively self-prompting.
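
To make this loop concrete, here is a minimal Python sketch of the execute–create–reprioritize cycle. It is an illustration, not any project’s actual code: the `call_llm` helper is a hypothetical placeholder for whichever LLM API you use, and the in-memory queue stands in for the database a real agent would use.

```python
from collections import deque

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: in a real agent, this would send the
    prompt to GPT-4, LLaMA, or another model and return the text reply."""
    raise NotImplementedError("Connect this to your LLM provider.")

def run_agent(goal: str, max_iterations: int = 5) -> list[tuple[str, str]]:
    # In-memory task queue; BabyAGI-style agents persist this in a database.
    tasks = deque(["Break the goal down into concrete first steps."])
    completed: list[tuple[str, str]] = []

    for _ in range(max_iterations):
        if not tasks:
            break
        task = tasks.popleft()

        # 1. Execute the current task in the context of the overall goal.
        result = call_llm(
            f"Overall goal: {goal}\n"
            f"Results so far: {completed}\n"
            f"Perform this task and report the result: {task}"
        )
        completed.append((task, result))

        # 2. Self-prompt: derive new tasks from the latest result.
        new_tasks = call_llm(
            f"Goal: {goal}\nLatest result: {result}\n"
            "List any new tasks still required, one per line."
        ).splitlines()
        tasks.extend(t.strip() for t in new_tasks if t.strip())

        # 3. Reprioritize: let the model reorder the open tasks.
        reordered = call_llm(
            f"Goal: {goal}\nReorder these tasks by priority, one per line:\n"
            + "\n".join(tasks)
        ).splitlines()
        tasks = deque(t.strip() for t in reordered if t.strip())

    return completed
```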

One of the first projects to use LLMs for building autonomous agents was BabyAGI. The tongue-in-cheek name refers to the concept of Artificial General Intelligence: a machine’s ability to complete any intellectual task as well as or better than a human.

While BabyAGI is nowhere close to this hypothetical future stage of machine learning, it might just be a first tiny step in that direction.

BabyAGI itself is nothing but a short Python script – all the heavy lifting is done by a user-specified LLM such as GPT-4 or LLaMA, using the corresponding API. The scope of what’s possible therefore depends on the model used under the hood, which is where other projects like AgentGPT and Godmode come in.
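
As an illustration, a model-agnostic wrapper for that heavy lifting might look like the sketch below. It uses the OpenAI Python client; since some local runtimes expose an OpenAI-compatible endpoint, the same code could in principle talk to a locally hosted model by pointing `base_url` elsewhere. Treat the details as assumptions about your setup rather than a recipe.

```python
from openai import OpenAI

# Reads OPENAI_API_KEY from the environment; pass base_url=... to target
# an OpenAI-compatible local server instead (an assumption about your setup).
client = OpenAI()

def call_llm(prompt: str, model: str = "gpt-4") -> str:
    """Send a single prompt to the chosen model and return its text reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content or ""
```

This also shows why swapping the model under the hood is straightforward: the agent loop above only ever needs one such function.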

Beyond BabyAGI

AgentGPT is a beginner-friendly way to test-drive an autonomous agent, since it can be used from the browser and has a graphical user interface. It is based on AutoGPT and uses the GPT-4 language model to complete tasks. After receiving the user’s prompt, it creates, completes, and rearranges tasks by itself until it reaches the specified goal – or a point where it cannot progress any further.

Godmode takes a slightly different approach and regularly asks for feedback from the user. On the one hand, this allows for some course correction should the autonomous agent head in an undesirable direction. On the other, it requires repeated interaction with the LLM – somewhat at odds with the application’s supposed autonomy.
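
In essence, this feedback mechanism amounts to a human-in-the-loop checkpoint inserted before each step of the task loop. A hypothetical sketch, reusing the `call_llm` placeholder from above:

```python
def run_supervised_step(goal: str, task: str) -> str | None:
    """Ask a human to approve, edit, or skip a task before execution.

    Hypothetical sketch of a Godmode-style checkpoint, not its actual code.
    """
    answer = input(f"Next task: {task!r}. Run it? [y/n/edit] ").strip().lower()
    if answer == "n":
        return None  # skip this task and move on
    if answer == "edit":
        task = input("Enter the corrected task: ")  # course correction
    return call_llm(f"Goal: {goal}\nPerform this task: {task}")
```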

AgentGPT and Godmode are part of a growing list of BabyAGI-inspired projects collected on GitHub. Other examples include a version that can be used from within Slack (with each conversation thread representing one goal), implementations written in other programming languages, and a plug-in for using BabyAGI directly in ChatGPT’s interface.

Simulating multiple AI agents

As with most projects in the LLM space, these applications are highly experimental. Still, they pose the question of where autonomous agents will go from here. A paper titled “Generative Agents: Interactive Simulacra of Human Behavior” provides some possible answers.

The researchers created a virtual environment for several autonomous agents (referred to as “generative agents”) to coexist in. These agents, powered by GPT-3.5, were given the ability to simulate human behavior, engage in conversation with each other, form memories, and even reflect on those memories. The researchers made the resulting 48 virtual hours of interactions between the 25 “inhabitants” accessible via an interactive demo.
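
At its core, the paper’s architecture pairs a growing memory stream with periodic reflection, in which an agent summarizes its own experiences into higher-level insights. A loose, hypothetical sketch of that idea (again reusing the `call_llm` placeholder, and not the paper’s actual code):

```python
import time
from dataclasses import dataclass, field

@dataclass
class Memory:
    """A single observation in an agent's memory stream."""
    text: str
    timestamp: float = field(default_factory=time.time)

class GenerativeAgent:
    """Loose sketch of a memory-and-reflection loop."""

    def __init__(self, name: str):
        self.name = name
        self.memories: list[Memory] = []

    def observe(self, text: str) -> None:
        """Record an event, e.g., a line of dialogue from another agent."""
        self.memories.append(Memory(text))

    def reflect(self) -> None:
        """Condense recent observations into a higher-level insight."""
        recent = "\n".join(m.text for m in self.memories[-20:])
        insight = call_llm(
            f"You are {self.name}. Given these recent observations:\n"
            f"{recent}\n"
            "State one high-level conclusion you can draw from them."
        )
        # Reflections are stored as memories themselves, so later
        # reflections can build on earlier ones.
        self.observe(f"Reflection: {insight}")
```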

Seeing many autonomous agents interact with each other raises the question of where this technology could be applied in the future. The experiment’s visual implementation, reminiscent of a video game from the 16-bit era, immediately calls to mind NPCs in role-playing adventures. Could they one day become almost human-like?

The paper’s authors themselves also see potential in using autonomous agents for research on social systems and theories. Experiments could, for example, have an online forum full of simulated users reacting to external inputs as well as each other.

However, we also shouldn’t forget that we’re still figuring out how to handle the shortcomings of LLMs. Hallucination and inherited bias pose as much of a threat now as they did a few months ago. So far, only a handful of projects directly address these problems.

Fortunately, the AI space is no longer just occupied by a few tech giants, as the open-source approach is slowly establishing itself as the norm. With the technology now accessible to more people than ever, we won’t have to wait long for new, creative solutions.

Want to stay updated on the latest developments in machine learning and computer vision? Then keep an eye on our blog for more on these topics!

Developers, ready to get started?

Adding our free trial to your app is easy. Download the Scanbot SDK now and discover the power of mobile data capture.