Reflexion: Reinforcing Language Agents with Verbal Reinforcement
Experience is the best teacher. But how can an AI agent, which has no memory of its own, learn from experience? The Reflexion framework provides an elegant answer by enabling agents to reflect on their past actions and generate their own feedback for future improvement.
Introduction
In the last article, we explored the ReAct framework, which allows an agent to reason and act in a loop. This is powerful, but it's missing a key component of intelligence: learning. A ReAct agent can repeat the same mistake on every attempt because it has no mechanism for reflecting on its past failures.
Reflexion is a framework that addresses this gap. It enhances a ReAct-style agent with the ability to perform "self-reflection." After a task is completed (or fails), the Reflexion agent looks back at its trajectory (the sequence of thoughts, actions, and observations) and critiques its own performance. It then translates this critique into a piece of "verbal reinforcement" (a specific, instructive piece of text) that it adds to its memory. This memory is included in the prompt on subsequent attempts, allowing the agent to learn from its mistakes and improve over time without any change to the underlying model's weights.
The Core Idea: Learn from a "Memory" of Past Mistakes
The Reflexion framework builds upon the ReAct loop by adding a "post-mortem" phase.
- Execution (The "Actor"): An agent, typically using the ReAct framework, attempts to complete a task. It produces a trajectory of (Thought, Action, Observation) steps.
- Evaluation (The "Evaluator"): A separate process evaluates the outcome. Was the task successful? Was it efficient? The evaluator provides a simple score (e.g., success or failure).
- Self-Reflection (The "Reflector"): If the outcome was a failure (or suboptimal), the Reflexion process kicks in. The agent is prompted with its own trajectory and asked to reflect on why it failed. It needs to pinpoint the specific mistake in its reasoning or action plan.
- Verbal Reinforcement: The output of the self-reflection is a concise, instructive piece of text. For example: "I failed because I tried to search for two things at once. In the future, I should search for one piece of information at a time."
- Memory and Iteration: This piece of verbal reinforcement is stored in the agent's "memory" (typically a simple text buffer or a more sophisticated vector database). In the next episode, when the agent attempts the same or a similar task, this memory is included in its initial prompt, effectively "reminding" it of the lesson it learned.
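The five steps above can be read as a loop. The following is a minimal sketch under stated assumptions, not a reference implementation: `ReflexionAgent`, `stub_actor`, `stub_evaluator`, and `stub_reflector` are hypothetical names, and the stubs stand in for the LLM and tool calls a real agent would make.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# One step of a trajectory: (thought, action, observation).
Step = Tuple[str, str, str]


@dataclass
class ReflexionAgent:
    """Hypothetical wrapper that wires actor, evaluator, and reflector together."""
    actor: Callable[[str, List[str]], List[Step]]   # (task, memory) -> trajectory
    evaluator: Callable[[str, List[Step]], bool]    # (task, trajectory) -> success?
    reflector: Callable[[str, List[Step]], str]     # (task, trajectory) -> lesson
    memory: List[str] = field(default_factory=list)

    def run(self, task: str, max_episodes: int = 3) -> Tuple[bool, int]:
        """Retry the task, storing one verbal lesson after each failed episode."""
        for episode in range(1, max_episodes + 1):
            trajectory = self.actor(task, list(self.memory))
            if self.evaluator(task, trajectory):
                return True, episode
            # Failure: turn the trajectory into a lesson and remember it.
            self.memory.append(self.reflector(task, trajectory))
        return False, max_episodes


# Stubs (assumptions for illustration only; a real agent would call an LLM).
def stub_actor(task: str, memory: List[str]) -> List[Step]:
    if memory:  # a stored lesson changes the behavior
        return [("Search one thing at a time.", "search('A')", "found A")]
    return [("Try everything at once.", "search('A and B')", "mixed results")]


def stub_evaluator(task: str, trajectory: List[Step]) -> bool:
    return trajectory[-1][2].startswith("found")


def stub_reflector(task: str, trajectory: List[Step]) -> str:
    return "Search for one piece of information at a time."


agent = ReflexionAgent(stub_actor, stub_evaluator, stub_reflector)
success, episodes_used = agent.run("demo task")
print(success, episodes_used, agent.memory)
```

Passing the three roles in as callables keeps the loop independent of any particular model or tool, which is the main design point of the sketch.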
A Practical Example: A Failed Web Search
Task: Find the birth dates of both Elon Musk and the current CEO of Microsoft.
Episode 1: The Failure
- Agent (ReAct):
- Thought: I need to find two birth dates. I can probably do this in one search.
- Action: search("birth dates of Elon Musk and Microsoft CEO")
- Observation: [Search results are confusing and mixed, as the current Microsoft CEO is Satya Nadella, and the query is ambiguous.]
- Thought: I am confused by the results. I cannot extract the information. I have failed.
- Evaluator: Returns "Failure."
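The article does not specify how the evaluator reaches its verdict. One possible approach, assuming the expected facts are known in advance, is a plain substring check on the final answer. `evaluate_answer` and `required_facts` are hypothetical names introduced for this sketch.

```python
from typing import Optional, Sequence


def evaluate_answer(final_answer: Optional[str], required_facts: Sequence[str]) -> str:
    """Return "Success" only if every required fact appears in the answer."""
    if not final_answer:
        # The agent gave up without producing an answer.
        return "Failure"
    normalized = final_answer.lower()
    if all(fact.lower() in normalized for fact in required_facts):
        return "Success"
    return "Failure"


# Assumed ground truth for the example task.
required_facts = ["June 28, 1971", "August 19, 1967"]

# Episode 1 ended without a final answer, so the verdict is a failure.
episode_1_result = evaluate_answer(None, required_facts)
print(episode_1_result)
```

A check like this only works when a ground-truth answer exists. For open-ended tasks the evaluator could instead be a heuristic or another model call.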
The Self-Reflection Phase:
Prompt to the "Reflector":
You are a self-reflecting agent. You just failed at a task. Here is your trajectory:
[Insert the Thought, Action, Observation log from above]
Analyze your trajectory and explain why you failed. Your explanation should be a concise, instructive piece of advice for your future self.
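The reflector prompt above can be assembled from a recorded trajectory. This is a sketch only: `format_trajectory` and `build_reflection_prompt` are hypothetical helpers, and the trajectory is stored as simple tuples whose action and observation may be missing, as in the final step of Episode 1.

```python
from typing import List, Optional, Tuple

# (thought, action, observation); the last two may be None for a closing thought.
Step = Tuple[str, Optional[str], Optional[str]]


def format_trajectory(trajectory: List[Step]) -> str:
    """Render the trajectory as a Thought / Action / Observation log."""
    lines = []
    for thought, action, observation in trajectory:
        lines.append(f"Thought: {thought}")
        if action is not None:
            lines.append(f"Action: {action}")
        if observation is not None:
            lines.append(f"Observation: {observation}")
    return "\n".join(lines)


def build_reflection_prompt(trajectory: List[Step]) -> str:
    """Insert the trajectory log into the reflector prompt template."""
    return "\n".join([
        "You are a self-reflecting agent. You just failed at a task. "
        "Here is your trajectory:",
        format_trajectory(trajectory),
        "Analyze your trajectory and explain why you failed. Your explanation "
        "should be a concise, instructive piece of advice for your future self.",
    ])


episode_1: List[Step] = [
    (
        "I need to find two birth dates. I can probably do this in one search.",
        'search("birth dates of Elon Musk and Microsoft CEO")',
        "[Search results are confusing and mixed.]",
    ),
    ("I am confused by the results. I cannot extract the information. I have failed.",
     None, None),
]

reflection_prompt = build_reflection_prompt(episode_1)
print(reflection_prompt)
```

The resulting string would be sent to whichever model plays the reflector role. That call is left out here because it depends on the LLM client in use.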
Reflector LLM Output (The Verbal Reinforcement):
"I failed because my search query was too complex and ambiguous. I tried to find two distinct pieces of information at once. In the future, when I need to find multiple pieces of information, I should break the problem down and perform a separate search for each piece of information."
Episode 2: The Successful Retry
- The agent is given the same task again, but this time, its initial prompt includes its new memory.
Initial Prompt for Episode 2:
Your memory contains the following lesson:
"When I need to find multiple pieces of information, I should break the problem down and perform a separate search for each piece of information."
Question: Find the birth dates of both Elon Musk and the current CEO of Microsoft.
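The memory-augmented prompt above can be produced by a small text buffer. This sketch assumes a bounded buffer that keeps only the most recent lessons so the prompt cannot grow without limit. `ReflectionMemory` and `max_lessons` are hypothetical names, and the cap of three is an arbitrary choice for illustration.

```python
from collections import deque
from typing import Deque


class ReflectionMemory:
    """Bounded text buffer of lessons, prepended to the next episode's prompt."""

    def __init__(self, max_lessons: int = 3) -> None:
        # deque(maxlen=...) silently drops the oldest lesson when full.
        self._lessons: Deque[str] = deque(maxlen=max_lessons)

    def add(self, lesson: str) -> None:
        self._lessons.append(lesson.strip())

    def build_prompt(self, question: str) -> str:
        """Return the initial prompt: stored lessons first, then the question."""
        if not self._lessons:
            return f"Question: {question}"
        noun = "lesson" if len(self._lessons) == 1 else "lessons"
        quoted = "\n".join(f'"{lesson}"' for lesson in self._lessons)
        return (
            f"Your memory contains the following {noun}:\n"
            f"{quoted}\n"
            f"Question: {question}"
        )


memory = ReflectionMemory(max_lessons=3)
memory.add(
    "When I need to find multiple pieces of information, I should break the "
    "problem down and perform a separate search for each piece of information."
)
episode_2_prompt = memory.build_prompt(
    "Find the birth dates of both Elon Musk and the current CEO of Microsoft."
)
print(episode_2_prompt)
```

A vector database, mentioned earlier as the more sophisticated option, would replace the deque with a similarity search over stored lessons. The prompt-building step would stay the same.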
- Agent (ReAct), now informed by its memory:
- Thought: My memory tells me to search for one thing at a time. First, I will find Elon Musk's birth date.
- Action: search("Elon Musk birth date")
- Observation: Elon Musk was born on June 28, 1971.
- Thought: I have Elon Musk's birth date. Now I need the birth date of the current Microsoft CEO. First I need to know who that is.
- Action: search("current CEO of Microsoft")
- Observation: The current CEO of Microsoft is Satya Nadella.
- Thought: Now I know the CEO is Satya Nadella. I will search for his birth date.
- Action: search("Satya Nadella birth date")
- Observation: Satya Nadella was born on August 19, 1967.
- Thought: I have both pieces of information.
- Final Answer: Elon Musk was born on June 28, 1971, and Satya Nadella, the current CEO of Microsoft, was born on August 19, 1967.
The agent succeeded because the lesson it stored after Episode 1 led it to split the task into separate, unambiguous searches.
Why Reflexion is So Powerful
- Enables Autonomous Improvement: It provides a mechanism for agents to get better over time without direct human intervention.
- More Human-Like Learning: The process of trial, error, and reflection is closer to how humans learn than gradient-based training is. The lesson is stored as plain text that a person can read, and no model weights are updated.
- Improves Robustness: An agent that can learn from its failures is better placed to recover in novel or unexpected situations, because each failed attempt becomes guidance for the next one.
Key Takeaways
- The Reflexion framework adds a "self-reflection" step to the ReAct loop, enabling agents to learn from their mistakes.
- After a failed task, the agent is prompted to analyze its own trajectory and generate a piece of "verbal reinforcement."
- This verbal reinforcement is stored in a memory and used to inform the agent's behavior in future tasks.
- Reflexion allows for autonomous agent improvement and leads to more robust and reliable systems.
What's Next?
Both ReAct and Reflexion rely on the LLM's general reasoning abilities. But what if the problem is best solved not with words, but with code? In the next article, we will explore Program-Aided Language Models (PAL), a technique that leverages an LLM's incredible coding abilities to solve problems that are difficult to tackle with natural language alone.
With Reflexion, we are no longer just building agents that can reason and act; we are building agents that can learn, adapt, and grow.