The Evolution of Prompt Engineering

eowyn0406

eowyn0406 · Published 2026-07-02 16:56:38

1️⃣ The primitive era: just ask (2020-2021, the GPT-3 era)

The most basic form of prompt engineering is simply asking the question:

Q: "What is the capital of France?"

A: "Paris"

This is zero-shot prompting. For better results, add a few examples:

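Q: "What is the capital of the UK?" A: "London"

Q: "What is the capital of Japan?" A: "Tokyo"

Q: "What is the capital of France?" A:

This is few-shot prompting. The central finding of the GPT-3 paper (2020) was that a large model can pick up a task purely from examples in its context, with no fine-tuning. This is where prompt engineering began.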
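The few-shot format above is plain string concatenation, which a short sketch can make concrete. The function name `build_few_shot_prompt` and the `Q:`/`A:` labels are illustrative choices, not something prescribed by the GPT-3 paper.

```python
def build_few_shot_prompt(examples, question):
    """Join solved (question, answer) pairs, then leave the last answer open."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    # The final block ends at "A:" so the model continues from there.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


examples = [
    ("What is the capital of the UK?", "London"),
    ("What is the capital of Japan?", "Tokyo"),
]
prompt = build_few_shot_prompt(examples, "What is the capital of France?")
print(prompt)
```

Passing an empty `examples` list gives the zero-shot case: the prompt is just the question followed by an open `A:`.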

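2️⃣ The chain-of-thought era: write out the reasoning (January 2022, Wei et al.)

The CoT (Chain-of-Thought) paper found that if the few-shot examples include not just the answer but also the reasoning steps, the model imitates them and becomes markedly more accurate on complex reasoning tasks: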

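Q: "Roger has 5 tennis balls. He buys 2 more boxes of 3 tennis balls each. How many tennis balls does he have now?"

A: "Roger started with 5 balls. 2 boxes of 3 balls each is 6 balls. 5 + 6 = 11. The answer is 11."

Q: "The cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. How many apples do they have now?"

A: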

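The model then follows along and outputs a reasoning chain:

"There were 23 apples originally. 20 were used, leaving 3. 6 more were bought, and 3 + 6 = 9. The answer is 9."

The core trick: you "act out" the reasoning in the prompt, and the model learns to "think first, then answer".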
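As a rough sketch of how a CoT prompt might be used in practice: the demonstration carries the reasoning, and the final answer is pulled out of the completion afterwards. The helper names and the reliance on a closing "The answer is N." sentence are assumptions made for this illustration; the completion string is hard-coded rather than produced by a real model.

```python
import re

# One worked demonstration, including the reasoning steps.
COT_EXAMPLE = (
    "Q: Roger has 5 tennis balls. He buys 2 more boxes of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 boxes of 3 balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11."
)


def build_cot_prompt(question):
    """Prepend the worked demonstration and leave the new answer open."""
    return f"{COT_EXAMPLE}\n\nQ: {question}\nA:"


def extract_answer(completion):
    """Return the number after 'The answer is', or None if it is absent."""
    match = re.search(r"The answer is\s+(-?\d+(?:\.\d+)?)", completion)
    return match.group(1) if match else None


prompt = build_cot_prompt(
    "The cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. "
    "How many apples do they have now?"
)
# Hard-coded stand-in for what a model might return.
completion = (
    "There were 23 apples originally. 20 were used, leaving 3. "
    "6 more were bought, and 3 + 6 = 9. The answer is 9."
)
print(extract_answer(completion))
```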

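3️⃣ The ReAct era: interleaving "reasoning" and "acting" (October 2022, the paper discussed here)

ReAct takes CoT's "reasoning" one step further: the model not only thinks in its head but also interacts with the outside world. The shape of the prompt changes from the CoT pattern to the ReAct pattern: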

CoT: question → thought → answer

ReAct: question → thought → action (search) → observation (result) → thought → action → observation → answer

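How the human side worked: annotators actually searched and browsed Wikipedia, used Ctrl+F, and then wrote down every step they really took (what they searched for, what they saw, what they thought) exactly as it happened, producing 3 to 6 demonstration trajectories. These trajectories then went into the prompt as few-shot examples.

This is prompt engineering by hand-annotating reasoning-plus-action trajectories.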

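4️⃣ The automation era: let the model produce its own prompts (2022 to the present)

Developments around and after ReAct (Self-Consistency predates it by a few months):

A. Self-Consistency (SC): sample answers to the same question 21 times and take the majority vote (the CoT-SC you just learned about)

B. Auto-CoT: no hand-written reasoning chains; the model generates the demonstration examples automatically

C. Frameworks such as DSPy: treat prompt engineering itself as an optimization problem, automatically searching for the best few-shot examples and instruction wording

D. Function Calling / Tool Use: OpenAI and others turned ReAct-style "calling external tools" into a standard API feature, so hand-written prompt templates are no longer needed for it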
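The majority vote in item A is simple enough to sketch. This is a minimal illustration, assuming the final answers have already been extracted from independently sampled completions; the sampling itself (normally done with a non-zero temperature) is not shown, and the list below is made-up data.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common non-empty answer, or None if there is none."""
    counted = Counter(a for a in answers if a is not None)
    if not counted:
        return None
    return counted.most_common(1)[0][0]


# Made-up final answers from 7 sampled reasoning chains; None marks a
# completion from which no answer could be extracted.
sampled = ["9", "9", "8", "9", None, "9", "27"]
print(majority_vote(sampled))
```

Reasoning chains that go wrong tend to go wrong in different ways, so their answers scatter while the correct one accumulates votes.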

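📌 Summary

The original prompt engineering approach in the ReAct paper was:

A human performs the task once → records "what I thought, what I did, what I saw" at each step → uses these trajectories as few-shot examples → concatenates them into a prompt → feeds it to PaLM-540B → the model follows the pattern

It may look "primitive" today, but the ReAct paradigm strongly influenced many later agent frameworks and tool-use features (LangChain, AutoGPT, OpenAI Function Calling, and others). The "web search" feature in the ChatGPT you use today follows essentially the same think-act-observe pattern 📚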

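Q: How does few-shot prompting get the trajectories to the LLM? Are they just fed in directly?

A: Yes, they are concatenated straight into the prompt. An LLM fundamentally consumes only text, so you insert the trajectories as text strings.

Here is the actual prompt from the ReAct paper (from Appendix C):

🔍 What it actually looks like

Structure of the ReAct prompt for HotpotQA (Appendix C of the paper)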

Solve a question answering task with interleaving Thought, Action, Observation steps.

Thought can reason about the current situation, and Action can be three types:

(1) Search[entity], which searches the exact entity on Wikipedia and returns the first paragraph if it exists.

(2) Lookup[keyword], which returns the next sentence containing keyword in the current passage.

(3) Finish[answer], which returns the answer and finishes the task.

Here are some examples.

Then 6 hand-annotated trajectories are pasted in directly as examples, each looking like this:

Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?

Thought 1: I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of that area.

Action 1: Search[Colorado orogeny]

Observation 1: The Colorado orogeny was an episode of mountain building in Colorado and surrounding areas.

Thought 2: It does not mention the eastern sector. So I need to look up the eastern sector.

Action 2: Lookup[eastern sector]

Observation 2: (Result 1 / 1) The eastern sector extends into the High Plains and is called the Central Plains orogeny.

Thought 3: The eastern sector of Colorado orogeny extends into the High Plains. So I need to search High Plains and find its elevation range.

Action 3: Search[High Plains]

Observation 3: High Plains refers to one of two distinct land regions.

Thought 4: I need to instead search High Plains (United States).

Action 4: Search[High Plains (United States)]

Observation 4: The High Plains are a subregion of the Great Plains. From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130 m).

Thought 5: High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.

Action 5: Finish[1,800 to 7,000 ft]

... (5 more example trajectories of the same kind are pasted here) ...

Finally, the real question is appended at the end:

Question: Aside from the Apple Remote, what other device can control the program Apple Remote was originally designed to interact with?

Having seen all this, the model outputs in the same pattern:

Thought 1: I need to search Apple Remote...

Action 1: Search[Apple Remote]

Observation 1: ... (returned by the external Wikipedia API)

Thought 2: ...

...

🧩 The essence: a prompt is just "template + fill"

┌──────────────────────────────────────────────┐
│ Instruction (system instruction)             │
│ "Solve a QA task with Thought/Action/Obs"    │
├──────────────────────────────────────────────┤
│ Example 1 (hand-annotated trajectory)        │
│ Thought 1... Action 1... Obs 1... ...        │
├──────────────────────────────────────────────┤
│ Example 2                                    │
│ Thought 1... Action 1... Obs 1... ...        │
├──────────────────────────────────────────────┤
│ ... (3-6 examples in total)                  │
├──────────────────────────────────────────────┤
│ The actual question                          │
│ Question: xxx                                │
│                                              │
│ ← the model starts generating Thought 1...   │
└──────────────────────────────────────────────┘
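The template above can be assembled with a few lines of string handling. This is only a sketch: `build_react_prompt` is a hypothetical helper, the instruction is abbreviated from the one quoted earlier, and the toy trajectory is written for this illustration rather than taken from the paper.

```python
# Abbreviated instruction; the full wording is quoted earlier in this post.
INSTRUCTION = (
    "Solve a question answering task with interleaving Thought, Action, "
    "Observation steps.\nHere are some examples."
)

# Toy trajectory for illustration only; it is not one of the paper's examples.
TOY_TRAJECTORY = """
Question: What is the capital of the country where the Eiffel Tower stands?
Thought 1: I need to find where the Eiffel Tower is.
Action 1: Search[Eiffel Tower]
Observation 1: The Eiffel Tower is a tower in Paris, France.
Thought 2: It stands in France, whose capital is Paris.
Action 2: Finish[Paris]
"""


def build_react_prompt(instruction, trajectories, question):
    """Concatenate instruction, example trajectories, and the open question."""
    parts = [instruction]
    parts.extend(t.strip() for t in trajectories)
    # The prompt ends right after the new question, where generation begins.
    parts.append(f"Question: {question}")
    return "\n\n".join(parts) + "\n"


prompt = build_react_prompt(
    INSTRUCTION,
    [TOY_TRAJECTORY],
    "Aside from the Apple Remote, what other device can control the program "
    "Apple Remote was originally designed to interact with?",
)
print(prompt)
```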

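The Observation at each step is not generated by the LLM. Instead, the Action the model outputs (such as Search[Apple Remote]) is sent to the real Wikipedia API, and the result is appended to the prompt for the next round:

Round 1: prompt + "Thought 1: ... Action 1: Search[Apple Remote]"

→ the LLM stops generating here

→ Search is executed externally → result obtained

Round 2: previous prompt + "Observation 1: Apple Remote is..."

→ the LLM generates "Thought 2: ... Action 2: ..."

→ executed externally → result obtained

... repeat until the LLM outputs Finish[answer]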
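The round-by-round loop can be sketched as follows. Everything here is a stand-in: `scripted_llm` replays canned text instead of calling a model, `fake_search` replaces the Wikipedia API, and the function names and the `max_steps` limit are assumptions for this illustration. With a real model, generation would typically be cut off before the Observation line, for example with a stop sequence.

```python
import re

# Matches lines such as "Action 1: Search[Apple Remote]".
ACTION_RE = re.compile(r"Action \d+: (\w+)\[(.*?)\]")


def parse_action(text):
    """Return (name, argument) for the last action in text, or None."""
    matches = ACTION_RE.findall(text)
    return matches[-1] if matches else None


def react_loop(prompt, llm, tools, max_steps=8):
    """Alternate model generation and tool execution until Finish[...]."""
    for step in range(1, max_steps + 1):
        generated = llm(prompt, step)  # model writes Thought + Action, then stops
        prompt += generated
        action = parse_action(generated)
        if action is None:
            return None, prompt
        name, argument = action
        if name == "Finish":
            return argument, prompt
        tool = tools.get(name)
        observation = tool(argument) if tool else f"Unknown action {name}."
        # The observation comes from the environment, not from the model.
        prompt += f"\nObservation {step}: {observation}\n"
    return None, prompt


# Canned model output, standing in for a real LLM call.
SCRIPT = [
    "Thought 1: I need to search Apple Remote.\nAction 1: Search[Apple Remote]",
    "Thought 2: I have enough information.\nAction 2: Finish[example answer]",
]


def scripted_llm(prompt, step):
    return SCRIPT[step - 1]


def fake_search(entity):
    return f"(stub result for {entity})"


answer, transcript = react_loop(
    "Question: ...\n", scripted_llm, {"Search": fake_search}
)
print(transcript)
print("Final answer:", answer)
```

The growing `prompt` string is the only state: each round sends the whole transcript so far back to the model.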

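📌 One-sentence summary

It is just text concatenation. There is no magic in the ReAct process: write 3 to 6 demonstration trajectories of "thought → action → observation" by hand, paste them verbatim into the prompt text, and the LLM follows the pattern in its output. The observation at each step comes from actually executing the action in the external environment and is appended back to the prompt, closing the loop. That was the "agent framework" of 2022: an alternating loop of plain-text prompts and external tool calls 📚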
