ReAct Reasoning Loop: Think Before Acting

ReAct (Reasoning + Acting) is a loop where an agent alternates between thinking about the next step and executing it, observing the result each time. Instead of planning the entire workflow upfront (plan-and-execute), ReAct lets the agent think-act-observe at each step, adapting to surprises and refining its strategy on the fly. This flexibility is powerful for exploratory tasks, uncertain environments, and problems where the exact path isn't known in advance.

ReAct vs. Plan-and-Execute

Plan-and-execute is efficient for well-structured goals: "Deploy version 3.2 to production" has a known sequence of steps. ReAct is better for open-ended goals: "Find the root cause of this performance regression" doesn't have a predetermined path—you might discover intermediate findings that redirect your investigation.

In ReAct, the loop repeats: (1) Agent thinks about what to do next, (2) Agent executes an action (call a tool, run code, fetch data), (3) Agent observes the outcome, (4) Loop back to step 1 if the goal isn't met. This continues until the goal is satisfied or the agent concludes it's impossible.

┌──────────────────────────────────────────┐
│ Thought: "I need to understand the       │
│ recent error spike in the logs"          │
└─────────────────────┬────────────────────┘
                      │
                      v
┌──────────────────────────────────────────┐
│ Action: Query logs for errors in past    │
│ 6 hours                                  │
└─────────────────────┬────────────────────┘
                      │
                      v
┌──────────────────────────────────────────┐
│ Observation: Found 4k "Connection        │
│ timeout" errors between 14:00-16:00 UTC  │
└─────────────────────┬────────────────────┘
                      │
                      v
┌──────────────────────────────────────────┐
│ Thought: "Timeouts cluster at 14:00.     │
│ This was when we deployed service X.     │
│ Let me check service X's metrics."       │
└─────────────────────┬────────────────────┘
                      │
                      v
┌──────────────────────────────────────────┐
│ Action: Fetch CPU and memory metrics     │
│ for service X during that window         │
└──────────────────────────────────────────┘

The ReAct Prompt Structure

To implement ReAct, you prompt the LLM with a special format that includes space for thoughts, actions, and observations:

react_prompt = """
You are a reasoning agent solving a problem through iterative thought and action.

PROBLEM: {user_goal}

INSTRUCTIONS:
- Thought: Reason about what you need to do next.
- Action: Call exactly one tool or API to advance toward the goal.
- Observation: Process the result and extract insights.
- Repeat until you've fully solved the problem.

FORMAT YOUR RESPONSE AS:
Thought: [Your reasoning about what to do next]
Action: [One tool call, formatted as: tool_name(arg1=value1, arg2=value2)]
Observation: [You will receive the result here; wait for me to provide it]

AVAILABLE TOOLS:
- web_search(query: str) -> List[SearchResult]
- code_interpreter(code: str) -> str
- database_query(table: str, filters: dict) -> List[Row]
- document_analyzer(doc_id: str) -> str

BEGIN:
Thought: """

The key difference from a straight planning prompt: you're asking the LLM to think-then-act-then-wait, rather than thinking about everything at once. The LLM generates Thought and Action, you execute the action, then you feed the Observation back to the LLM in the next prompt. This loop repeats until the agent decides it's done.

Here's a Python controller that drives this loop:

import ast


class ActionError(Exception):
    """Raised when an action string cannot be parsed or executed."""


class ReActAgent:
    # Phrases the agent uses to signal that it is finished
    DONE_MARKERS = ("I have completed the task", "The goal is achieved")

    def __init__(self, llm_client, tools_registry, max_steps: int = 15):
        self.llm = llm_client        # Assumed: completion(messages, temperature) returns the reply text
        self.tools = tools_registry  # Maps tool_name -> callable
        self.max_steps = max_steps

    def run(self, user_goal: str) -> dict:
        """Run the ReAct loop until goal is met or max_steps exceeded."""
        context = []  # Completed thought/action/observation steps
        thought_action_history = []

        for _ in range(self.max_steps):
            # Call LLM to generate Thought and Action
            messages = self.build_prompt(user_goal, context)
            response = self.llm.completion(messages, temperature=0.7)

            thought, action = self.parse_response(response)
            thought_action_history.append({"thought": thought, "action": action})

            # Check if agent decided it's done (the marker may be in either field)
            if any(m in thought or m in action for m in self.DONE_MARKERS):
                return {
                    "status": "success",
                    "final_thought": thought,
                    "history": thought_action_history,
                }

            # Execute the action; failures go back to the LLM as an observation
            try:
                observation = self.execute_action(action)
            except ActionError as e:
                observation = f"Error: {e}. Adjust your approach."

            context.append({
                "thought": thought,
                "action": action,
                "observation": observation,
            })

        return {
            "status": "max_steps_exceeded",
            "last_thought": thought_action_history[-1]["thought"] if thought_action_history else None,
            "history": thought_action_history,
        }

    def parse_response(self, llm_response: str) -> tuple[str, str]:
        """Extract Thought and Action from LLM response."""
        thought_lines = []
        action = None

        for line in llm_response.splitlines():
            line = line.strip()
            if line.startswith("Action:"):
                action = line[len("Action:"):].strip()
                break  # One action per step; ignore anything the LLM wrote after it
            if line.startswith("Thought:"):
                line = line[len("Thought:"):].strip()
            if line:
                # Also covers replies that omit the "Thought:" prefix or span several lines
                thought_lines.append(line)

        thought = " ".join(thought_lines)
        return thought or "No thought provided", action or "No action provided"

    def execute_action(self, action_str: str) -> str:
        """Parse and execute an action like 'web_search(query="...")'."""
        # Parse with ast so quoted values may safely contain commas or "="
        try:
            call = ast.parse(action_str, mode="eval").body
        except (SyntaxError, ValueError):
            raise ActionError(f"Invalid action format: {action_str}")
        if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
            raise ActionError(f"Invalid action format: {action_str}")

        tool_name = call.func.id
        if tool_name not in self.tools:
            raise ActionError(f"Unknown tool: {tool_name}")
        if call.args:
            raise ActionError("Use keyword arguments only, e.g. tool_name(arg='value')")

        try:
            # literal_eval accepts only literals (str, int, dict, ...) and never runs code
            args = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
        except (ValueError, TypeError, SyntaxError):
            raise ActionError(f"Arguments must be literals, e.g. query='text': {action_str}")

        try:
            result = self.tools[tool_name](**args)
        except Exception as e:
            raise ActionError(f"{tool_name} failed: {e}")
        return str(result)

    def build_prompt(self, user_goal: str, context: list) -> list:
        """Build multi-turn conversation for LLM."""
        system_message = f"""You are a reasoning agent. For each step:
1. Thought: reason about what to do next
2. Action: call exactly one tool (or say "I have completed the task")
3. Wait for Observation from the environment

Format each action as tool_name(arg="value"), using keyword arguments with quoted strings.

GOAL: {user_goal}

AVAILABLE TOOLS:
{self._format_tools()}"""

        messages = [{"role": "system", "content": system_message}]

        # Replay history: the agent's own output as "assistant", tool results as "user"
        for ctx in context:
            messages.append({
                "role": "assistant",
                "content": f"Thought: {ctx['thought']}\nAction: {ctx['action']}",
            })
            messages.append({
                "role": "user",
                "content": f"Observation: {ctx['observation']}\n\n"
                           "Now, what's your next thought and action?",
            })

        if not context:
            # Start the first thought-action cycle. Later turns already end with a
            # user message, and some chat APIs expect alternating roles.
            messages.append({"role": "user", "content": "Begin. Thought:"})

        return messages

    def _format_tools(self) -> str:
        return "\n".join(f"- {name}" for name in self.tools)
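One way to watch the think-act-observe cycle end to end without a live model is a dry run against a scripted stand-in LLM. The sketch below is a deliberately simplified, standalone loop rather than the controller itself. The names `scripted_llm`, `react_loop`, and the single-argument `count_errors` tool are hypothetical and exist only for illustration.

```python
import re


def scripted_llm(replies):
    """Fake LLM for dry runs: returns canned replies in order, ignoring the prompt."""
    remaining = iter(replies)
    return lambda transcript: next(remaining)


def react_loop(llm, tools, goal, max_steps=5):
    """Simplified think-act-observe loop over a plain-text transcript."""
    transcript = f"GOAL: {goal}\n"
    for _ in range(max_steps):
        reply = llm(transcript)                        # think (and propose an action)
        transcript += reply + "\n"
        match = re.search(r"Action:\s*(\w+)\((.*)\)", reply)
        if match is None:                              # no tool call: agent is done
            break
        name, arg = match.group(1), match.group(2).strip("'\"")
        if name in tools:
            observation = tools[name](arg)             # act
        else:
            observation = f"Error: unknown tool {name}"
        transcript += f"Observation: {observation}\n"  # observe
    return transcript


# Hypothetical tool standing in for a real log query
tools = {"count_errors": lambda window: f"4000 timeouts in the past {window}"}
llm = scripted_llm([
    "Thought: Check the logs first.\nAction: count_errors('6h')",
    "Thought: The timeout spike explains the alert. I have completed the task.",
])
trace = react_loop(llm, tools, "Explain the error spike")
print(trace)
```

Because the replies are canned, the run is deterministic, which makes this kind of stub handy for unit-testing parsing and tool dispatch before paying for real LLM calls.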

Controlling LLM Reasoning Depth

One challenge with ReAct is keeping each step short and productive: the LLM may reason at length on every step, or get stuck repeating the same thought or action. Three techniques help:

1. Thought budgets: Ask the LLM to reason in 1–2 sentences, not paragraphs.

react_prompt += """
Each Thought should be 1-2 sentences only.
"""

2. Action hints: If the agent is stuck, provide a hint in the Observation:

# repeated_actions: list of action strings from earlier steps
if repeated_actions.count(action) >= 2:
    observation += "\nHint: You've tried this action twice. Consider a different approach."

3. Max steps enforcement: Limit the loop to 10–15 steps; beyond that, escalate to a human.

When ReAct Shines

ReAct is ideal for:

  • Exploratory research: Investigating an unknown problem.
  • Adaptive tasks: Goals that change based on intermediate findings.
  • Tool use: Scenarios where the next action depends on the last result.

It's less ideal for:

  • Linear workflows: "Deploy, test, notify" has a clear path; planning is simpler.
  • High-latency scenarios: Each step requires an LLM call, so many round-trips make the task slow.
  • Cost-sensitive work: Multiple LLM calls per task add up.
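To make the cost point concrete: if the controller resends the full history on every call, each prompt is longer than the last, so total prompt tokens grow roughly quadratically with the number of steps. The sketch below is back-of-the-envelope arithmetic under that assumption. The function name and the token sizes (500-token base prompt, 300 tokens per completed step) are made-up illustrative values, not measurements.

```python
def total_prompt_tokens(steps: int, base_tokens: int, tokens_per_step: int) -> int:
    """Prompt tokens summed over all calls when each call resends the full history.

    Call i (0-based) sends the base prompt plus the i steps completed so far.
    """
    return sum(base_tokens + i * tokens_per_step for i in range(steps))


# Hypothetical sizes: 500-token base prompt, 300 tokens added per completed step
for steps in (5, 10, 15):
    print(steps, total_prompt_tokens(steps, base_tokens=500, tokens_per_step=300))
```

With these illustrative numbers, going from 5 to 10 steps raises prompt tokens from 5,500 to 18,500, more than triple, which is why step limits matter for cost as well as latency.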

Key Takeaways

  • ReAct interleaves thinking and acting: agent reasons, executes one action, observes, repeats.
  • More flexible than plan-and-execute; better for exploratory and adaptive goals.
  • Each action is a single tool call; the LLM observes the result before reasoning next steps.
  • Control reasoning depth with thought budgets and action hints.
  • Best for interactive, investigative tasks; less efficient for straightforward workflows.

Frequently Asked Questions

How many ReAct steps should I allow?

Start with 10–15. Each step requires an LLM call; beyond 15, consider stopping and either escalating to a human or caching intermediate results. For very complex goals, use hierarchical ReAct: a high-level agent decomposes the goal, then sub-agents run a ReAct loop for each subtask.

Can I mix ReAct with plan-and-execute?

Yes. Generate a rough plan upfront, then use ReAct to execute each step adaptively. E.g., plan says "investigate root cause," then ReAct runs for that step, observes findings, and reports back to the plan-executor.
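A minimal sketch of that hybrid, assuming the plan is a plain list of step descriptions and each step is handed to a ReAct sub-agent along with the findings so far. The names `run_plan_with_react`, `SubAgent`, and `fake_subagent` are hypothetical; in practice the sub-agent would wrap a real ReAct loop.

```python
from typing import Callable

# A sub-agent takes (plan step, findings so far) and returns a short result string.
SubAgent = Callable[[str, list[str]], str]


def run_plan_with_react(plan: list[str], react_subagent: SubAgent) -> list[dict]:
    """Execute a fixed plan in order, delegating each step to a ReAct sub-agent."""
    findings: list[str] = []
    report = []
    for step in plan:
        # The sub-agent adapts within the step; the plan order itself stays fixed
        result = react_subagent(step, findings)
        findings.append(result)
        report.append({"step": step, "result": result})
    return report


def fake_subagent(step: str, findings: list[str]) -> str:
    """Stand-in for a real ReAct run; it only echoes what it was given."""
    return f"done: {step} (saw {len(findings)} earlier findings)"


report = run_plan_with_react(["investigate root cause", "propose a fix"], fake_subagent)
for entry in report:
    print(entry)
```

Passing earlier findings into each step is the design choice that lets later steps build on what the investigation uncovered, while the outer plan keeps the overall workflow predictable.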

What if the agent gets stuck in a loop?

Detect repeated actions (the same tool with the same inputs within the last 3 steps). When a repeat is detected, provide a hint or force a different action. E.g., "You've run the same query twice; try a different approach."
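The detection step can be sketched as a small sliding-window check. This is one possible approach, assuming actions have already been parsed into a tool name and a dict of arguments. The `LoopDetector` class and its `window` and `threshold` parameters are hypothetical names chosen for this example.

```python
from collections import deque
from typing import Optional


class LoopDetector:
    """Flags an action that repeats within the last `window` steps."""

    def __init__(self, window: int = 3, threshold: int = 2):
        self.recent = deque(maxlen=window)  # Oldest entries fall off automatically
        self.threshold = threshold

    def check(self, tool: str, args: dict) -> Optional[str]:
        """Record the action; return a hint if it has repeated, else None."""
        # Sort arguments so that argument order does not hide a repeat
        key = (tool, tuple(sorted(args.items())))
        self.recent.append(key)
        if self.recent.count(key) >= self.threshold:
            return "Hint: You've run the same action twice; try a different approach."
        return None


detector = LoopDetector()
first = detector.check("database_query", {"table": "errors"})
second = detector.check("database_query", {"table": "errors"})
print(first)   # no repeat yet
print(second)  # hint to append to the next Observation
```

The returned hint can be appended to the next Observation so the model sees it on its following turn.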

Should I show intermediate Observations to the user?

For long-running agents, yes—send progress updates every 2–3 steps. Users want to see thinking, not just the final answer.
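One way to batch progress updates is a small reporter that buffers step summaries and flushes them every few steps. The sketch below assumes the agent loop can call a hook after each step. The names `make_progress_reporter`, `on_step`, and `flush`, and the `send` callback, are hypothetical names for this example.

```python
def make_progress_reporter(send, every: int = 3):
    """Return (on_step, flush) callbacks that batch step summaries for the user.

    `send` is any callable that delivers one message (print, a websocket push, ...).
    """
    buffer = []

    def flush():
        # Deliver whatever is buffered; called at the end of a run as well
        if buffer:
            send("\n".join(buffer))
            buffer.clear()

    def on_step(step_no: int, thought: str, observation: str):
        buffer.append(f"Step {step_no}: {thought} -> {observation}")
        if len(buffer) >= every:
            flush()

    return on_step, flush


sent = []
on_step, flush = make_progress_reporter(sent.append, every=2)
on_step(1, "Check logs", "4k timeouts")
on_step(2, "Check service X metrics", "CPU at 95%")
on_step(3, "Check deploy diff", "pool size halved")
flush()  # deliver the final partial batch
print(sent)
```

Batching keeps the user informed without sending a message for every single tool call.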
