AutoGen Agents for Research & Code Generation (2026)
Microsoft's AutoGen is a research-driven framework optimized for agent teams that collaborate via natural conversation. Unlike LangGraph's explicit state graphs or CrewAI's task-based model, AutoGen agents exchange messages in a conversation loop until they reach consensus. This architecture excels at research-driven workflows and code generation where agents debate, refine, and execute code. This article covers AutoGen's conversation model, group chat, and practical research/code-generation patterns.
AutoGen's Conversation Model
AutoGen's core is elegantly simple: agents exchange messages in a conversation loop. Each agent receives messages, decides whether to respond, and sends a message. The loop continues until a stopping condition is met (success, max iterations, or explicit termination).
from autogen import AssistantAgent, UserProxyAgent
# Define an agent that provides code suggestions
coder = AssistantAgent(
name="Coder",
system_message="You are an expert Python programmer. Suggest efficient code solutions.",
llm_config={"model": "claude-3-5-sonnet-20241022", "api_key": "..."}
)
# Define a user proxy (executes code and gives feedback)
user_proxy = UserProxyAgent(
name="User",
human_input_mode="NEVER", # For automation; use "ALWAYS" for human control
max_consecutive_auto_reply=10,
code_execution_config={"work_dir": "./code"}
)
# Simple two-agent conversation
user_proxy.initiate_chat(coder, message="Write a function to parse JSON from a URL.")
The initiate_chat method starts a conversation. The user proxy sends a message, the coder responds, the user proxy evaluates the response (or executes code), and they loop until done. This is more natural than explicit graph nodes—it feels like a Slack conversation.
Multi-Agent Group Chat
For teams of agents (researcher, coder, reviewer), use GroupChat to manage the conversation.
from autogen import GroupChat, GroupChatManager
# Define specialist agents
researcher = AssistantAgent(
name="Researcher",
system_message="Search for information and summarize findings.",
llm_config={"model": "claude-3-5-sonnet-20241022"}
)
programmer = AssistantAgent(
name="Programmer",
system_message="Write production-ready Python code based on research.",
llm_config={"model": "claude-3-5-sonnet-20241022"}
)
reviewer = AssistantAgent(
name="Reviewer",
system_message="Review code for bugs, security, and style. Suggest improvements.",
llm_config={"model": "claude-3-5-sonnet-20241022"}
)
# Create a group chat
group_chat = GroupChat(
agents=[researcher, programmer, reviewer],
messages=[],
max_round=15, # Max iterations before stopping
speaker_selection_method="auto" # Let the model decide who speaks next
)
# Manage the group chat
manager = GroupChatManager(groupchat=group_chat, llm_config={"model": "claude-3-5-sonnet-20241022"})
# Start the group discussion
user_proxy = UserProxyAgent(name="User", human_input_mode="NEVER")
user_proxy.initiate_chat(
manager,
message="Research the latest AI safety techniques and write a Python tool to check if a model alignment metric is within safe bounds."
)
The GroupChatManager moderates the conversation, deciding which agent speaks next based on context. Agents naturally debate and refine ideas.
Code Execution with Agents
AutoGen has built-in code execution: agents can suggest code, the UserProxyAgent runs it, and results are fed back to the agents.
# Enable code execution in the user proxy
user_proxy = UserProxyAgent(
name="User",
human_input_mode="NEVER",
code_execution_config={
"work_dir": "./generated_code",
"use_docker": False, # Set to True for sandboxed execution
"timeout": 10 # Kill code after 10 seconds
}
)
# Start a code generation task
coder = AssistantAgent(
name="Coder",
system_message="Write Python code to solve the problem. Use print() to show results.",
llm_config={"model": "claude-3-5-sonnet-20241022"}
)
user_proxy.initiate_chat(
coder,
message="Write a function to find all prime numbers up to 100 and print them."
)
# The coder suggests code in a code block, the user proxy executes it,
# and shows the output back to the coder. If there's an error,
# the coder sees it and refines the code.
This loop is powerful: agents can write, test, and debug code automatically. In practice, 80% of code generation tasks complete without human intervention.
Research Workflow with AutoGen
A typical research workflow involves gathering data, analyzing it, and writing a report. AutoGen excels here:
# Agent to search for information
researcher = AssistantAgent(
name="Researcher",
system_message=(
"You are a research analyst. Find relevant information on the topic. "
"When you have sufficient data, say: RESEARCH_COMPLETE"
),
llm_config={"model": "claude-3-5-sonnet-20241022"},
human_input_mode="NEVER"
)
# Agent to write the report
writer = AssistantAgent(
name="Writer",
system_message=(
"You are a technical writer. Use the research provided to write a "
"comprehensive report. Structure it with introduction, key findings, "
"and recommendations."
),
llm_config={"model": "claude-3-5-sonnet-20241022"}
)
# Agent to review for quality
reviewer = AssistantAgent(
name="Reviewer",
system_message=(
"Review the report for clarity, accuracy, and completeness. "
"Suggest edits. When satisfied, say: APPROVED"
),
llm_config={"model": "claude-3-5-sonnet-20241022"}
)
# Group chat for the research workflow
group_chat = GroupChat(
agents=[researcher, writer, reviewer],
messages=[],
max_round=20,
speaker_selection_method="auto"
)
manager = GroupChatManager(groupchat=group_chat, llm_config={"model": "claude-3-5-sonnet-20241022"})
user_proxy = UserProxyAgent(name="User", human_input_mode="NEVER")
user_proxy.initiate_chat(
manager,
message="Research AI safety frameworks in 2026 and produce a comprehensive report."
)
The agents debate and refine until the reviewer approves. This beats serial workflows (researcher hands off to writer, writer hands off to reviewer) because agents can loop back and request clarifications.
Speaker Selection Strategy
How does AutoGen decide which agent speaks next? Several strategies:
# Method 1: Auto (LLM decides)
group_chat = GroupChat(
agents=[...],
speaker_selection_method="auto" # Uses LLM to decide
)
# Method 2: Round-robin
group_chat = GroupChat(
agents=[...],
speaker_selection_method="round_robin" # Agents take turns
)
# Method 3: Manual (you decide)
def custom_speaker_selector(agents, last_speaker, messages):
# Logic to choose the next speaker
if "error" in messages[-1].lower():
return agents[1] # Route to debugger
return agents[0] # Default to researcher
group_chat = GroupChat(
agents=[...],
speaker_selection_method=custom_speaker_selector
)
Auto speaker selection is more natural but costs a few extra LLM calls. For performance, use round-robin or custom logic.
Termination Conditions
Stop the conversation when a goal is reached:
def is_research_complete(messages):
"""Check if research is complete."""
# Look for a specific termination message
if messages and "RESEARCH_COMPLETE" in messages[-1].get("content", ""):
return True
# Or check message count
if len(messages) > 30:
return True
return False
group_chat = GroupChat(
agents=[...],
messages=[],
max_round=20
)
# Manually call step() and check termination
for _ in range(group_chat.max_round):
group_chat.step()
if is_research_complete(group_chat.messages):
print("Research complete!")
break
Explicit termination prevents infinite loops and gives you control over when to stop.
Comparing to LangGraph and CrewAI
| Aspect | AutoGen | LangGraph | CrewAI |
|---|---|---|---|
| Model | Conversation loop | State graph | Task-based |
| Agent Autonomy | High (agents debate) | Low (explicit graph) | Medium (role-based) |
| Code Execution | Built-in, native | Via tools | Via tools |
| Research Tasks | Excellent | Good | Good |
| Observability | Conversation log | State snapshots | Task logs |
| Learning Curve | Moderate | Steep (graph thinking) | Moderate |
Choose AutoGen for research and code generation; LangGraph for deterministic workflows; CrewAI for structured team roles.
Key Takeaways
- AutoGen agents communicate via natural conversation, not explicit message passing or state graphs.
- Group chat orchestrates multi-agent conversations;
GroupChatManagerdecides who speaks next. - Code execution is built-in: agents suggest code, execution happens, results feed back.
- Research workflows shine in AutoGen because agents can debate and refine findings.
- Termination conditions prevent infinite loops; explicit control over when to stop.
Frequently Asked Questions
How does AutoGen decide agent roles if there's no explicit role definition?
Roles are defined in the system_message (e.g., "You are a researcher..."). AutoGen doesn't enforce roles; it relies on the LLM's ability to stay in character.
Can I mix code execution and pure conversation in AutoGen?
Yes. Some agents execute code (via UserProxyAgent), others just converse. Mix and match based on your needs.
What if an agent loops infinitely?
Set a max_round limit on the group chat. When reached, the conversation stops. Also set agent-level max_consecutive_auto_reply to prevent one agent from monopolizing.
How much does an AutoGen workflow cost?
It depends on the speaker selection method and number of agents. Auto speaker selection adds a call per turn. A 10-turn research workflow with 3 agents might cost $0.20–0.50 (Claude 3.5 Sonnet). Optimize with custom speaker selection to reduce calls.
Can AutoGen run agents on separate machines?
Not natively. AutoGen is designed for a single Python process. For distributed agents, consider LangGraph with custom orchestration or CrewAI with task queuing.