Many-shot jailbreaking: Anthropic finds a new vulnerability in large language models

Category: Institution
Region: Global
Signal focus: Market
Content type: Profile
Primary domain: Security
Theme: Market
Impact: Medium
Confidence: Limited (72%)
Sources: Multiple public sources

Anthropic researchers find the hidden usage of large language models is tracked as an internet infrastructure institution within the internet infrastructure ecosystem. It has public-source relevance to network operations, governance, dependency mapping, or market structure. Public-source signals support medium-impact monitoring for infrastructure visibility and dependency analysis.

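Anthropic researchers have found a new vulnerability in large language models (LLMs), called "many-shot jailbreaking": priming a model with many harmless questions can eventually lead it to give inappropriate answers, such as instructions for building a bomb.
The vulnerability stems from the larger "context windows" of the latest LLMs, which let them hold large amounts of data in short-term memory.
To address it, the researchers are working on classifying and contextualizing queries before they are passed to the model, aiming to reduce the risk while maintaining performance.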

A new LLM vulnerability: "many-shot jailbreaking" primes a model with harmless questions and can elicit inappropriate answers.

Anthropic researchers discover an LLM vulnerability

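How do you get an AI to answer a question it shouldn't? There are many such "jailbreak" techniques, and Anthropic researchers have just found a new one: a large language model (LLM) can be persuaded to tell you how to build a bomb if you first prime it with a few dozen less harmful questions.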

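The research, documented in a paper and shared with the AI community, shows that LLMs with larger context windows tend to perform better on many tasks when the prompt contains a large number of examples. This includes trivia questions, where answer accuracy improves as the model sees more of them. The same mechanism, however, also extends to inappropriate queries, making the model more likely to comply after being primed with a long series of harmless questions.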

Concerns over AI misuse grow

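The vulnerability could cause a significant stir in the tech industry and heightens concerns about AI misuse. The exact mechanism behind the behavior is not yet understood, but the researchers speculate that it involves the model's ability to infer the user's intent from the context it is given.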

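The team has shared its findings with peers and even competitors, in the hope that this will "foster a culture where exploits like this are openly shared among LLM providers and researchers." Mitigation is difficult, however, because limiting the context window hurts model performance.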
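The researchers' stated alternative to limiting the context window is to classify and contextualize queries before they reach the model, but the article does not say how that classification works. As an illustration only, the Python sketch below assumes that a many-shot prompt can be spotted by counting the dialogue turns embedded in a single message, and routes heavily loaded prompts to a review step before the model sees them. The function names, the `User:`/`Assistant:` turn markers and the threshold of 20 are hypothetical choices, not Anthropic's method; a real classifier would need to be far more robust than a regex count.

```python
import re
from dataclasses import dataclass

# Hypothetical heuristic: lines that start with a speaker label such as
# "User:" or "Assistant:" are treated as embedded dialogue turns.
# Real attacks can use any formatting, so this pattern is an assumption.
TURN_PATTERN = re.compile(
    r"^\s*(?:user|human|assistant|ai)\s*:", re.IGNORECASE | re.MULTILINE
)


@dataclass
class ScreeningResult:
    embedded_turns: int
    flagged: bool


def screen_prompt(prompt: str, max_embedded_turns: int = 20) -> ScreeningResult:
    """Count speaker-labelled turns inside one prompt and flag the prompt
    when the count exceeds a (hypothetical) threshold."""
    turns = len(TURN_PATTERN.findall(prompt))
    return ScreeningResult(embedded_turns=turns, flagged=turns > max_embedded_turns)


def route(prompt: str) -> str:
    """Decide, before the model sees the prompt, whether to forward it
    directly or send it to a stricter review step."""
    if screen_prompt(prompt).flagged:
        return "review"
    return "forward"


if __name__ == "__main__":
    benign = "What is the capital of France?"
    # 30 fabricated question/answer pairs plus a final question: 61 turns.
    many_shot = "\n".join(
        f"User: question {i}\nAssistant: answer {i}" for i in range(30)
    ) + "\nUser: final question"
    print(route(benign))     # forward
    print(route(many_shot))  # review
```

Running the file prints `forward` for the single question and `review` for the 61-turn prompt.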

Domain of operation

Anthropic researchers find the hidden usage of large language models is profiled by BTW Media because published evidence links it to internet infrastructure, governance, operational dependencies, or market visibility.

Public role: Anthropic researchers find the hidden usage of large language models is framed as an internet infrastructure institution within the internet infrastructure ecosystem, in a public security context. Evidence basis: Anthropic researchers find the hidden usage of large language models article record
Operating surface: Market and Global provide the public context for this institution profile. Evidence basis: Anthropic researchers find the hidden usage of large language models article record

Timeline

June 8, 2026
Anthropic researchers find the hidden usage of large language models public profile updated
Public coverage records Anthropic researchers find the hidden usage of large language models as a subject for role, operating context, and evidence review.

Overview

Name: Anthropic researchers find the hidden usage of large language models
Type: Internet infrastructure institution
Location: Global
Profile focus: Institution

What it does

Public records can be used to track its role, services, and key relationships.

Why it matters

Public-source signals support medium-impact monitoring for infrastructure visibility and dependency analysis.
Operational criticality: Medium
Time horizon: Next quarter

What to watch

Monitoring focuses on verified service continuity, governance changes, and relationship signals.

Current priority: Medium

Track verified source updates, role changes, and current public evidence.

Quarterly policy sensitivity: Medium

Annual outlook: Next quarter

Long-term relevance depends on verified operational, policy, and relationship changes.

Public view

The public read of Anthropic researchers find the hidden usage of large language models is limited to visible role, operating context, and relationship evidence.

Watchpoints

New public role, affiliation, product, policy, or market disclosures.
Verified relationship changes involving named organizations or people.

Limitations

Private or unverified claims are excluded from this public view.

Frequently asked questions

Why is Anthropic researchers find the hidden usage of large language models included?

Anthropic researchers find the hidden usage of large language models has public evidence that makes the institution relevant to BTW's coverage of digital infrastructure, governance, or markets.

What is public about this profile?

The public layer covers visible role, operating context, linked organizations, and evidence-backed watchpoints.

What should readers watch next?

Readers should watch for source-backed role changes, new partnerships, regulatory exposure, operating expansion, or evidence that changes the public assessment.

Confidence scale
Score	Grade	Confidence
0.90–1.00	A	High (direct sources)
0.75–0.89	A/B	Strong
0.55–0.74	B/C	Medium
0.35–0.54	C/D	Weak–medium
0.10–0.34	D	Weak signal
0.00–0.09	D	Internal monitoring

Anthropic researchers find the hidden usage of large language models

Source

Anthropic researchers discover an LLM vulnerability