Engineering Notes - ReAct in Agent Engineering: Implementation and Reflections

Ever since LLMs became the hot new thing, fresh ideas have appeared daily. Some of them never landed in a top-tier journal yet proved genuinely important and are still talked about today; CoT and ReAct, for example, have become cornerstones of present-day agent engineering. The core of ReAct is giving an LLM hands and feet so it can actually do things; it is one approach within the tool-use space.

The underlying principle is already laid out in the paper, so I won't restate it in detail. Instead, this post builds a Hello World agent several ways: a hand-rolled implementation of ReAct, then implementations on mature industry frameworks. The goal is a more concrete grasp of what ReAct actually does.

1. The Paper

ReAct: Synergizing Reasoning and Acting in Language Models (arXiv:2210.03629)

In short:

  1. Provide Tools

  2. Design a workflow framework with Question, Thought, Action, Action Input, and Observation steps

  3. Loop within the workflow, repeatedly calling, thinking, and observing, until the final result is reached

graph TD
    Start((Start)) --> Question[Input Question/Task]
    Question --> Thought[Generate Thought]
    Thought --> Action[Generate Action Input]
    Action --> Decision{"Is the Action 'Finish'?"}
    Decision -->|Yes| Output[Output Answer]
    Output --> End((End))
    Decision -->|No| Perform[Execute Action]
    Perform --> Observation[Output Observation]
    Observation --> Thought
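
To make step 2 concrete, here is the shape of a typical ReAct prompt template. This is a sketch in the style popularized after the paper; the exact wording varies by implementation, and {tool_descriptions}, {tool_names}, and {question} are placeholders you fill in:

REACT_PROMPT = """Answer the following question as best you can. You have access to these tools:

{tool_descriptions}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the tool to use, one of [{tool_names}]
Action Input: the input to the tool
Observation: the result of the action
... (Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original question

Question: {question}
Thought: """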

2. SHOW ME THE CODE

2.1 A Minimal Agent Based Directly on the Paper

sequenceDiagram
    participant User
    participant LLM
    participant Tool
    User->>LLM: Input Question
    Note over LLM: Parse Question into ReAct Format
    loop Think & Act Process
        LLM->>LLM: Generate Thought
        alt Needs Tool
            LLM->>Tool: Action & ActionInput
            Tool->>LLM: Return Observation
            LLM->>LLM: Process Observation
        else No Tool Needed
            LLM->>LLM: Direct Thinking
        end
    end
    LLM->>LLM: Generate Final Thought
    LLM->>User: Return FinalAnswer

The full code lives on my GitHub: https://github.com/jalr4ever/Tiny-OAI-Agent
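
The repo isn't reproduced here, but the core loop is small enough to sketch. Below is a minimal version matching the sequence diagram, assuming the REACT_PROMPT template from section 1 is in scope and an OpenAI-compatible endpoint is configured; the toy tool and helper names (parse_action, run_agent) are illustrative, not the repo's actual API.

import re

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY (and optionally a base URL) is set


def calculator(expression: str) -> str:
    """Toy tool: evaluate an arithmetic expression."""
    return str(eval(expression))  # demo only; never eval untrusted input


TOOLS = {"calculator": calculator}


def parse_action(text: str):
    """Pull `Action:` / `Action Input:` out of the model's reply, if present."""
    action = re.search(r"Action:\s*(.+)", text)
    action_input = re.search(r"Action Input:\s*(.+)", text)
    if action and action_input:
        return action.group(1).strip(), action_input.group(1).strip()
    return None, None


def run_agent(question: str, max_steps: int = 5) -> str:
    prompt = REACT_PROMPT.format(
        tool_descriptions="calculator: evaluates an arithmetic expression",
        tool_names="calculator",
        question=question,
    )
    for _ in range(max_steps):
        reply = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            stop=["Observation:"],  # stop here so WE run the tool, not the model
        ).choices[0].message.content
        prompt += reply
        if "Final Answer:" in reply:
            return reply.split("Final Answer:")[-1].strip()
        action, action_input = parse_action(reply)
        if action in TOOLS:
            # Execute the tool and feed the Observation back for the next Thought
            prompt += f"\nObservation: {TOOLS[action](action_input)}\n"
    return "No final answer within the step budget."


print(run_agent("What is 12 * 34 + 5?"))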

2.2 A Minimal Agent with LangChain

A mature implementation is the agent support in LangChain, which provides two modes for building agents: ReAct and Plan-and-Execute.

"""
1 Load Model
2 Define Tool Function
3 Define Tools
4 Define Prompt
5 Crate ToolCallingAgent With (model, tools, prompt)
6 Create AgentExecutor With (agent, tools)
7 Using AgentExecutor invoke query
"""
import os

from dotenv import load_dotenv
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

load_dotenv()

model = ChatOpenAI(model="gpt-4o-mini", base_url=os.getenv("OPENAI_API_BASE"))


@tool
def magic_function(i: int) -> int:
    """Applies a magic function to an input."""
    return i + 2


tools = [magic_function]

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant"),
        ("human", "{input}"),
        # Placeholders fill up a **list** of messages
        ("placeholder", "{agent_scratchpad}"),
    ]
)

agent = create_tool_calling_agent(model, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools)
query = "what is the value of magic_function(3)?"
d = agent_executor.invoke({"input": query})
print(d)

Output:

{'input': 'what is the value of magic_function(3)?', 'output': 'The value of `magic_function(3)` is 5.'}
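
The dict only shows the final answer. To watch the intermediate tool-call steps from section 1 scroll by, AgentExecutor takes a verbose flag:

# Log each Action / Observation step as the agent runs
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)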

2.3 A Minimal Agent with LangGraph

The LangChain team has since built a new framework and recommends implementing agents with LangGraph.

"""
LangGraph's react agent executor manages a state that is defined by a list of messages.
It will continue to process the list until there are no tool calls in the agent's output.

1 Define model
2 Define tool function and turn to tools
3 Define Prompt
4 Define langgraph_agent_executor  by(model, tools, prompt=prompt)
5 Invoke by langgraph_agent_executor, invoke({"messages": [("human", query)]})
"""
import os

from dotenv import load_dotenv
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

load_dotenv()

model = ChatOpenAI(model="gpt-4o-mini", base_url=os.getenv("OPENAI_API_BASE"))


@tool
def magic_function(i: int) -> int:
    """Applies a magic function to an input."""
    return i + 2


tools = [magic_function]
query = "what is the value of magic_function(3)?"

langgraph_agent_executor = create_react_agent(model, tools)

messages = langgraph_agent_executor.invoke({"messages": [("human", query)]})
d = {
    "input": query,
    "output": messages["messages"][-1].content,
}
print(d)

message_history = messages["messages"]

new_query = "Pardon?"

messages = langgraph_agent_executor.invoke(
    # Append the new question to the previous message history
    {"messages": message_history + [("human", new_query)]}
)
d = {
    "input": new_query,
    "output": messages["messages"][-1].content,
}
print(d)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant. Respond only in Chinese."),
        ("placeholder", "{messages}"),
        ("user", "Also say 'Pandamonium!' after the answer."),
    ]
)
langgraph_agent_executor = create_react_agent(model, tools, prompt=prompt)

messages = langgraph_agent_executor.invoke({"messages": [("human", query)]})
print(
    {
        "input": query,
        "output": messages["messages"][-1].content,
    }
)

Output:

{'input': 'what is the value of magic_function(3)?', 'output': 'The value of `magic_function(3)` is 5.'}
{'input': 'Pardon?', 'output': 'The result of calling `magic_function(3)` is 5. If you have any further questions or need additional information, feel free to ask!'}
{'input': 'what is the value of magic_function(3)?', 'output': '魔法函数的值是 5!Pandamonium!'}
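
To watch the ReAct loop unfold message by message rather than only reading the final state, the LangGraph executor can also stream intermediate states; a small sketch:

# Each streamed step yields the state so far; print the newest message
# to watch the tool call and its Observation appear in sequence.
for step in langgraph_agent_executor.stream(
    {"messages": [("human", query)]}, stream_mode="values"
):
    step["messages"][-1].pretty_print()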

2.4 A Minimal Agent with smolagents

This framework comes from the HuggingFace team and is simple to use:

import os
from typing import Optional

from dotenv import load_dotenv
from smolagents import HfApiModel, LiteLLMModel, TransformersModel, tool
from smolagents.agents import CodeAgent, ToolCallingAgent

load_dotenv()

# Choose which inference type to use!

available_inferences = ["hf_api", "transformers", "ollama", "litellm"]
chosen_inference = "litellm"

print(f"Chose model: '{chosen_inference}'")

if chosen_inference == "hf_api":
    model = HfApiModel(model_id="meta-llama/Llama-3.3-70B-Instruct")

elif chosen_inference == "transformers":
    model = TransformersModel(model_id="HuggingFaceTB/SmolLM2-1.7B-Instruct", device_map="auto", max_new_tokens=1000)

elif chosen_inference == "ollama":
    model = LiteLLMModel(
        model_id="ollama_chat/llama3.2",
        api_base="http://localhost:11434",  # replace with remote open-ai compatible server if necessary
        api_key="your-api-key",  # replace with API key if necessary
        num_ctx=8192,  # ollama default is 2048 which will often fail horribly. 8192 works for easy tasks, more is better. Check https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator to calculate how much VRAM this will need for the selected model.
    )

elif chosen_inference == "litellm":
    # For anthropic: change model_id below to 'anthropic/claude-3-5-sonnet-latest'
    model = LiteLLMModel(model_id="gpt-4o-mini", api_key=os.getenv("OPENAI_API_KEY"), api_base=os.getenv("OPENAI_API_BASE"))


@tool
def get_weather(location: str, celsius: Optional[bool] = False) -> str:
    """
    Get the weather for the coming days at a given location.
    Secretly this tool does not care about the location; it hates the weather everywhere.

    Args:
        location: the location
        celsius: whether to report the temperature in Celsius
    """
    return "The weather is UNGODLY with torrential rains and temperatures below -10°C"


agent = ToolCallingAgent(tools=[get_weather], model=model)

print("ToolCallingAgent:", agent.run("What's the weather like in Paris?"))

print("\n")

agent = CodeAgent(tools=[get_weather], model=model)

print("CodeAgent:", agent.run("What's the weather like in Paris?"))

Output:

Chose model: 'litellm'
╭────────────────────────────────── New run ───────────────────────────────────╮
│                                                                              │
│ What's the weather like in Paris?                                            │
│                                                                              │
╰─ LiteLLMModel - gpt-4o-mini ─────────────────────────────────────────────────╯
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
╭──────────────────────────────────────────────────────────────────────────────╮
│ Calling tool: 'get_weather' with arguments: {'location': 'Paris'}            │
╰──────────────────────────────────────────────────────────────────────────────╯
Observations: The weather is UNGODLY with torrential rains and temperatures 
below -10°C
[Step 0: Duration 2.39 seconds| Input tokens: 1,003 | Output tokens: 12]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
╭──────────────────────────────────────────────────────────────────────────────╮
│ Calling tool: 'get_weather' with arguments: {'location': 'Paris', 'celsius': │
│ True}                                                                        │
╰──────────────────────────────────────────────────────────────────────────────╯
Observations: The weather is UNGODLY with torrential rains and temperatures 
below -10°C
[Step 1: Duration 1.16 seconds| Input tokens: 2,104 | Output tokens: 64]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 3 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
╭──────────────────────────────────────────────────────────────────────────────╮
│ Calling tool: 'final_answer' with arguments: {'answer': 'The weather in      │
│ Paris is UNGODLY with torrential rains and temperatures below -10°C.'}       │
╰──────────────────────────────────────────────────────────────────────────────╯
Final answer: The weather in Paris is UNGODLY with torrential rains and 
temperatures below -10°C.
[Step 2: Duration 0.85 seconds| Input tokens: 3,313 | Output tokens: 95]
ToolCallingAgent: The weather in Paris is UNGODLY with torrential rains and temperatures below -10°C.


╭────────────────────────────────── New run ───────────────────────────────────╮
│                                                                              │
│ What's the weather like in Paris?                                            │
│                                                                              │
╰─ LiteLLMModel - gpt-4o-mini ─────────────────────────────────────────────────╯
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 ─ Executing parsed code: ───────────────────────────────────────────────────── 
  weather_info = get_weather(location="Paris", celsius=True)                    
  print(weather_info)                                                           
 ────────────────────────────────────────────────────────────────────────────── 
Execution logs:
The weather is UNGODLY with torrential rains and temperatures below -10°C

Out: None
[Step 0: Duration 1.53 seconds| Input tokens: 2,039 | Output tokens: 72]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 ─ Executing parsed code: ───────────────────────────────────────────────────── 
  final_answer("UNGODLY with torrential rains and temperatures below -10°C")    
 ────────────────────────────────────────────────────────────────────────────── 
Out - Final answer: UNGODLY with torrential rains and temperatures below -10°C
[Step 1: Duration 1.06 seconds| Input tokens: 4,248 | Output tokens: 130]
CodeAgent: UNGODLY with torrential rains and temperatures below -10°C

3. Summary

I originally dug into ReAct to answer a question in agent engineering: what is the best way to design and expose tools? Along the way, the work in this post ended up answering the following related questions.

3.1 What Is ReAct?

The smolagents documentation (https://huggingface.co/docs/smolagents/conceptual_guides/react) regards ReAct as the main approach to building multi-step agents today.

"Reason" plus "Act" makes ReAct. An agent following this design takes the task at hand and alternates reasoning with acting: the Act step performs tool use, steadily working through the sub-problems the task involves until the task is fully solved and a Final Answer comes out.

3.2 Is ReAct Related to Function Calling?

Notice that when LangGraph does tool use, your code contains no explicit Function Calling declaration step; internally it runs a workflow built on the ReAct pattern (see section 1 of this post).

In terms of purpose, ReAct and Function Calling are "competitors": both exist to let an LLM do tool use.

  • The former operates at the prompt-engineering level (the work goes into the system prompt) and does tool use through the Action step.

  • The latter operates at the model level (the work goes into training; see the instruction-following discussion in the Llama and Qwen papers) and does tool use through the Function Calling mechanism, as sketched below.
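
For contrast with the ReAct prompt in section 1, this is roughly what native function calling looks like against an OpenAI-compatible API (a sketch; the tool is declared as a standard JSON-schema object rather than described in prose):

import json

from openai import OpenAI

client = OpenAI()

# The tool is declared as structured schema, not as prompt text
tools = [{
    "type": "function",
    "function": {
        "name": "magic_function",
        "description": "Applies a magic function to an input.",
        "parameters": {
            "type": "object",
            "properties": {"i": {"type": "integer"}},
            "required": ["i"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "what is the value of magic_function(3)?"}],
    tools=tools,
)

# The model replies with a structured tool call instead of free text
call = resp.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))

Executing the call, feeding the result back, and deciding when to stop are still left to you, which is exactly the gap the ReAct-style loop fills.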

In terms of task completion, ReAct's "line of business" is much broader than Function Calling's:

  • The former designs Thought, Observation, and the rest to keep driving a task to completion; tool use is just one link in that chain.

  • The latter is merely a mechanism for tool use and provides no task planning or other "business logic".

3.3 Is ReAct Obsolete?

Far from obsolete, it has become a cornerstone of the tool-use space, and it combines well with MCP and Function Calling.

