Engineering Notes - Agent Engineering: The Function Calling Mechanism

Warren Zhan

2025-01-24

2025-02-14

Snippets

Agent Engineering, AI, Technology, Large Models, Engineering Notes


My understanding in one sentence: "Function Calling" gives an LLM hands and feet!

1 Minimal example

Overview

This example demonstrates the following flow:

sequenceDiagram
    participant User
    participant Client
    participant OpenAI
    participant MockResponse
    User->>Client: "How's the weather in Hangzhou?"
    Note over Client: Initialize OpenAI client with<br/>API key and base URL
    Client->>OpenAI: send_messages(messages, tools)
    Note over OpenAI: Processes request and<br/>decides to use get_weather tool
    OpenAI-->>Client: Returns tool_call response
    Note over Client: Extracts tool call info<br/>Adds to message history
    Client->>MockResponse: Simulate response
    Note over MockResponse: Hard-coded mock data<br/>"24°C"
    MockResponse-->>Client: Returns mock data
    Note over Client: Adds mock response<br/>to message history
    Client->>OpenAI: send_messages(messages)
    Note over OpenAI: Generates final response<br/>based on mock data
    OpenAI-->>Client: Returns formatted response
    Client->>User: Displays weather information

SHOW ME THE CODE

Install dependencies

pip install openai==1.60.0

Minimal code example

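This example uses the DeepSeek API. SiliconFlow (硅基流动) or any other provider or model that supports Function Calling works as well. Replace the three parameters below (API_KEY, BASE_URL, MODEL) with your own. The values shown are for DeepSeek, and you can apply for an API key on its official site.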

from openai import OpenAI

API_KEY = 'sk-xxxxxxxxxxxxxxxx'
BASE_URL = 'https://api.deepseek.com'
MODEL = 'deepseek-chat'


def send_messages(messages, tools=None):
    response = client.chat.completions.create(
        model=MODEL,
        messages=messages,
        tools=tools
    )
    return response.choices[0].message

client = OpenAI(
    api_key=API_KEY,
    base_url=BASE_URL,
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather of a location; the user should supply a location first",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g., San Francisco, CA",
                    }
                },
                "required": ["location"]
            },
        }
    },
]

# Initial user message
messages = [{"role": "user", "content": "How's the weather in Hangzhou?"}]
print(f"User>\t {messages[0]['content']}")

# First call: Model decides to use the tool
message = send_messages(messages, tools)
tool = message.tool_calls[0]  # Extract the tool call
messages.append(message)  # Add the tool call to the conversation history

# Simulate the tool response (e.g., from an external API)
messages.append({"role": "tool", "tool_call_id": tool.id, "content": "24℃"})

# Second call: Model generates a response based on the tool's output
# No tools are passed here, as the model doesn't need to call another tool
message = send_messages(messages)
print(f"Model>\t {message.content}")

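Output:

The wording of the output varies between runs, but it should contain the temperature 24℃, because that is the value hard-coded in our code.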

2 Function Calling Mechanism

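Concept

Having run the code, you should now have a rough idea of what Function Calling is. It is a technique that lets LLMs interact with external systems, APIs, and tools. Given a set of functions or tools, together with their descriptions and usage instructions, the model can choose and call the appropriate function to complete a task. Note that the model itself makes this choice, so whether it picks the right function is an open question.

This capability is a disruptive change, because it frees LLMs from text-only interaction and lets them act on the real world. Before it, an LLM was just a chat box: you talk to it and it returns data (text, images, audio, or video). Now LLMs can not only generate text but also use external tools and services to perform actions, control devices, retrieve information from databases, and carry out a wide range of tasks.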

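Call flow

Based on the code above, the Function Calling flow can be understood like this: each function call involves two requests to the LLM, as shown in the diagram below. Every LLM that supports Function Calling follows this flow:

The first request registers the functions and obtains the function arguments. The second request sends back the function result and obtains the final text shown to the user (the LLM weaves the function result into its reply).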

sequenceDiagram
    participant User
    participant Application
    participant FunctionCall
    participant LLM
    User->>Application: 1. User Input
    Note right of Application: First Request
    activate Application
    Application->>LLM: 2. Send Request
    LLM-->>Application: 3. Return Function Parameters
    Application->>FunctionCall: 4. Call Function with Parameters
    FunctionCall-->>Application: 5. Return Function Result
    deactivate Application
    Note right of Application: Second Request
    activate Application
    Application->>LLM: 6. Send Function Result
    LLM-->>Application: 7. Return Final Response
    deactivate Application
    Application->>User: 8. Display Response
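The two-request flow can be sketched as one reusable function. This is a minimal sketch, not a definitive implementation: run_tool_round, fake_send, and the registry dict are hypothetical names introduced here, and fake_send stands in for the real send_messages so the example runs without a network call.

```python
import json
from types import SimpleNamespace


def run_tool_round(send, messages, tools, registry):
    """Run one Function Calling round: request 1, local call, request 2.

    `send(messages, tools)` must return an assistant message object with
    `.content` and `.tool_calls`, like `send_messages` in the minimal example.
    """
    # Request 1: the model sees the tool definitions and may return tool calls.
    message = send(messages, tools)
    if not message.tool_calls:
        # The model answered directly, so no second request is needed.
        return message.content
    messages.append(message)

    # Local step: run every requested function with the model's arguments.
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)
        result = registry[call.function.name](**args)
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": str(result)}
        )

    # Request 2: the model turns the function results into the final text.
    return send(messages, None).content


# --- Hypothetical stand-ins so the sketch runs offline ---
def fake_send(messages, tools):
    last = messages[-1]
    if isinstance(last, dict) and last.get("role") == "tool":
        return SimpleNamespace(
            content=f"It is {last['content']} in Hangzhou.", tool_calls=None
        )
    call = SimpleNamespace(
        id="call_0",
        function=SimpleNamespace(
            name="get_weather", arguments='{"location": "Hangzhou"}'
        ),
    )
    return SimpleNamespace(content=None, tool_calls=[call])


def get_weather(location):
    return "24℃"  # hard-coded, as in the minimal example


history = [{"role": "user", "content": "How's the weather in Hangzhou?"}]
answer = run_tool_round(
    fake_send, history, tools=[], registry={"get_weather": get_weather}
)
print(answer)
```

Passing send as a parameter keeps the flow independent of any one provider. In real use you would pass the send_messages function and the tools list from the minimal example instead of the stand-ins.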

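Usage notes

Before using Function Calling, check the following:

1. Does your provider support Function Calling?
2. Does the model offered by your provider support Function Calling?
3. What request and response parameters does your provider define for Function Calling?

Point 3 is usually not a problem: Function Calling was first introduced by OpenAI, and the request and response formats that providers define are generally compatible with OpenAI's.

Points 1 and 2 need close attention. For example, at the time of writing, OpenRouter does not support Function Calling for deepseek but does support it for qwen-max.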

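Finally, I recommend sticking to the OpenAI format (https://platform.openai.com/docs/guides/function-calling). Some providers, such as OpenRouter, return extra parameters. In a dynamically typed language like Python, code that reads those OpenRouter-specific parameters raises no error until you switch to the OpenAI service, where the fields are missing and the code fails at runtime.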
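One defensive pattern, sketched here under assumptions, is to copy only the OpenAI-format fields of a tool call into your own structure, so provider-specific extras never reach the rest of the code. ToolCall, to_tool_call, and the provider_extra key are hypothetical names used only for illustration, and the tool call is assumed to be available as a plain dict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """Only the fields defined by the OpenAI tool-call format."""
    id: str
    name: str
    arguments: str  # JSON-encoded string, as returned by the API


def to_tool_call(raw: dict) -> ToolCall:
    """Keep the OpenAI-compatible fields and ignore anything else."""
    function = raw["function"]
    return ToolCall(
        id=raw["id"], name=function["name"], arguments=function["arguments"]
    )


# A tool call dict with one extra, provider-specific key (hypothetical).
raw_call = {
    "id": "tool_0_get_weather",
    "type": "function",
    "function": {
        "name": "get_weather",
        "arguments": '{"location":"Hangzhou, China"}',
    },
    "provider_extra": {"latency_ms": 12},
}
call = to_tool_call(raw_call)
print(call)
```

Code downstream of to_tool_call can only touch id, name, and arguments, so switching providers cannot break it through a missing extra field.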

Call details

Let's go back to the code in the section above and look at the details of Function Calling.

What is inside tool = message.tool_calls[0]?

"ChatCompletionMessageToolCall(id='tool_0_get_weather', function=Function(arguments='{"location":"Hangzhou, China"}', name='get_weather'), type='function', index=0)"

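This is the response to the first request. Its tool_calls field holds the tool the LLM wants to call and the arguments to pass. The function name is the one we defined in the JSON at the start, and the argument matches that definition too: location is a string.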
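Note that arguments arrives as a JSON-encoded string rather than a dict, so it has to be parsed before the function can be called. A minimal sketch using the values from the response shown above:

```python
import json

# Values copied from the ChatCompletionMessageToolCall shown above.
function_name = "get_weather"
raw_arguments = '{"location":"Hangzhou, China"}'  # a JSON string, not a dict

arguments = json.loads(raw_arguments)
print(function_name, arguments["location"])
```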

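In my view this tool definition is still not ideal. Latitude and longitude would probably work better, since coordinates identify a place unambiguously.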
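A coordinate-based definition might look like the sketch below. The tool name get_weather_by_coords and its parameter names are hypothetical. They are not part of the original example or of any provider's API.

```python
# Hypothetical alternative definition: coordinates instead of a free-form name.
get_weather_by_coords = {
    "type": "function",
    "function": {
        "name": "get_weather_by_coords",
        "description": "Get the current weather at a latitude/longitude pair",
        "parameters": {
            "type": "object",
            "properties": {
                "latitude": {
                    "type": "number",
                    "description": "Latitude in decimal degrees, -90 to 90",
                },
                "longitude": {
                    "type": "number",
                    "description": "Longitude in decimal degrees, -180 to 180",
                },
            },
            "required": ["latitude", "longitude"],
        },
    },
}

tools = [get_weather_by_coords]
```

The trade-off is that the model now has to supply the coordinates itself, so the ambiguity of place names is exchanged for a dependence on the model recalling numbers correctly.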

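What does messages.append({"role": "tool", "tool_call_id": tool.id, "content": "24℃"}) do?

OpenAI's Function Calling convention requires that, once we have obtained the function result ourselves, we append it to messages in the defined format and send the conversation to the LLM again to get the final response.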
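Real functions often return structured data rather than a ready-made string. A small helper can build the tool message and serialize the result. This is a sketch: tool_result_message is a hypothetical name, and it assumes the result is either a string or JSON-serializable.

```python
import json


def tool_result_message(tool_call_id, result):
    """Build the role="tool" message that answers one tool call."""
    # Strings pass through unchanged; other values are serialized to JSON.
    if isinstance(result, str):
        content = result
    else:
        content = json.dumps(result, ensure_ascii=False)
    return {"role": "tool", "tool_call_id": tool_call_id, "content": content}


plain = tool_result_message("tool_0_get_weather", "24℃")
structured = tool_result_message(
    "tool_0_get_weather", {"temperature": 24, "unit": "℃"}
)
print(plain)
print(structured)
```

The tool_call_id must be the id from the tool call being answered, which is how the model matches each result to its request.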

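Finally, as the first step of the code shows, you can register more than one tool by adding them to the tools array. Tool descriptions matter: the more detailed the description, the more likely the LLM is to call your tool correctly.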
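When several tools are registered, it helps to keep each definition next to the Python function that implements it. The sketch below shows one possible way to do that. The register decorator, TOOLS, HANDLERS, and the get_time tool are all hypothetical names introduced for illustration.

```python
TOOLS = []     # definitions to pass as `tools=` in the request
HANDLERS = {}  # tool name -> Python function that implements it


def register(description, parameters):
    """Decorator that records a tool definition and its handler together."""
    def decorator(func):
        TOOLS.append({
            "type": "function",
            "function": {
                "name": func.__name__,
                "description": description,
                "parameters": parameters,
            },
        })
        HANDLERS[func.__name__] = func
        return func
    return decorator


LOCATION_PARAMS = {
    "type": "object",
    "properties": {
        "location": {
            "type": "string",
            "description": "The city and state, e.g., San Francisco, CA",
        }
    },
    "required": ["location"],
}


@register("Get weather of a location; the user should supply a location first",
          LOCATION_PARAMS)
def get_weather(location):
    return "24℃"  # mock data, as in the minimal example


@register("Get the current local time of a location", LOCATION_PARAMS)
def get_time(location):
    return "12:00"  # mock data (hypothetical second tool)


print([tool["function"]["name"] for tool in TOOLS])
```

After the model returns a tool call, HANDLERS[name] gives the function to run, so adding a tool means adding one decorated function rather than editing two separate places.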

LLM Mechanism

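Some models support Function Calling and some do not. So how does an LLM come to support it?

Answer: the capability comes from training, namely fine-tuning for structured output plus intent-recognition training. Alternatively, it can be approximated with prompt engineering behind the scenes. For details, see these two articles: "开源模型 Function Call 方案梳理" and "让模型有工具调用（Function Calling）的能力".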
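To make the prompt-based route concrete, here is a rough sketch of the idea: describe the tools in the prompt, ask the model to reply with JSON, and parse that reply. The prompt wording and the {"name": ..., "arguments": ...} reply format are assumptions made for illustration. They are not the template of any particular model or provider.

```python
import json
import re


def build_tool_prompt(tools):
    """Render tool definitions into a system prompt (illustrative format)."""
    lines = [
        "You can call these tools. To call one, reply with only a JSON object",
        'of the form {"name": <tool name>, "arguments": {...}}.',
        "",
    ]
    for tool in tools:
        fn = tool["function"]
        lines.append(f"- {fn['name']}: {fn['description']}")
        lines.append(f"  parameters: {json.dumps(fn['parameters'])}")
    return "\n".join(lines)


def parse_tool_reply(text):
    """Return (name, arguments) if the reply is a tool call, else None."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or "name" not in data:
        return None
    return data["name"], data.get("arguments", {})


tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather of a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

prompt = build_tool_prompt(tools)
parsed = parse_tool_reply(
    '{"name": "get_weather", "arguments": {"location": "Hangzhou"}}'
)
print(parsed)
```

A model that was never trained for this may ignore the instructions or emit malformed JSON, which is why parse_tool_reply falls back to None and why trained-in support tends to be more reliable.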

3 Limitations and Challenges

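Predefined API integration required: the LLM can only call functions that have been defined and integrated into the system in advance. Every time a tool is added or updated, the code has to change.
Intent misinterpretation: the LLM may wrongly infer that a function needs to be called. As noted above, there is no complete fix for this. Engineering can add robustness safeguards, but the underlying accuracy depends on researchers making LLMs better.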
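One such robustness safeguard is to check a tool call against the registered definitions before executing anything. The sketch below is one possible check, with validate_tool_call as a hypothetical helper name. It catches unknown tools, malformed JSON, and missing or unexpected parameters, but it cannot tell whether calling the tool was the right decision in the first place.

```python
import json


def validate_tool_call(name, raw_arguments, tools):
    """Return (True, parsed_args) or (False, error_message)."""
    schemas = {t["function"]["name"]: t["function"]["parameters"] for t in tools}
    if name not in schemas:
        return False, f"unknown tool: {name}"
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return False, "arguments are not valid JSON"
    if not isinstance(args, dict):
        return False, "arguments must be a JSON object"
    schema = schemas[name]
    missing = [k for k in schema.get("required", []) if k not in args]
    if missing:
        return False, f"missing required parameters: {missing}"
    unknown = [k for k in args if k not in schema.get("properties", {})]
    if unknown:
        return False, f"unexpected parameters: {unknown}"
    return True, args


tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather of a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

print(validate_tool_call("get_weather", '{"location":"Hangzhou, China"}', tools))
print(validate_tool_call("get_stock", "{}", tools))
print(validate_tool_call("get_weather", "{}", tools))
```

When validation fails, the error message can be sent back to the model as the tool result so it has a chance to correct the call, or the application can fall back to a plain text answer.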