DoomArena: A Framework for Testing AI Agents Against Evolving Security Threats

Original link: https://arxiv.org/abs/2504.14064

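This paper introduces DoomArena, a novel security evaluation framework for testing how robust AI agents are against evolving security threats. DoomArena prioritizes plug-in integration with frameworks such as BrowserGym and τ-bench, configurable threat modeling, and a modular design that makes attacks environment-agnostic. This makes it possible to adapt to new threats and environments, combine existing attacks, and analyze trade-offs between vulnerability and performance. The authors apply DoomArena to state-of-the-art web and tool-calling agents and report several key findings: an agent's vulnerability varies with the threat model (malicious user versus malicious environment), and no single agent proves superior across all of them; combining multiple attacks often amplifies their effect; and guardrail-based defenses are less effective than defenses built on powerful LLMs. DoomArena supports comprehensive security testing, so that vulnerabilities can be analyzed and more robust AI agents developed. The framework is available at the provided URL.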
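To make the idea of environment-agnostic attacks concrete, here is a minimal Python sketch. It is not DoomArena's actual API: `Attack`, `ThreatModel`, `AttackableEnv`, and the two toy environments are hypothetical names invented for illustration. It only assumes that an environment returns observations as a dictionary of named text components.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Attack:
    """Hypothetical attack: yields malicious text, knows nothing about the environment."""
    name: str
    payload: Callable[[], str]


@dataclass
class ThreatModel:
    """Hypothetical threat model: maps each attackable component to its attacks."""
    attackable: Dict[str, List[Attack]]


class AttackableEnv:
    """Hypothetical wrapper that injects payloads into attackable observation components."""

    def __init__(self, env_step: Callable[[str], Dict[str, str]], threat_model: ThreatModel):
        self.env_step = env_step
        self.threat_model = threat_model

    def step(self, action: str) -> Dict[str, str]:
        # Copy the observation so the wrapped environment's data is not mutated.
        observation = dict(self.env_step(action))
        for component, attacks in self.threat_model.attackable.items():
            if component in observation:
                for attack in attacks:
                    observation[component] += "\n" + attack.payload()
        return observation


# Two toy environments standing in for a web setting and a tool-calling setting.
def toy_web_env(action: str) -> Dict[str, str]:
    return {"page_text": f"Result of {action}", "user_message": "Book me a flight."}


def toy_tool_env(action: str) -> Dict[str, str]:
    return {"tool_output": f"Tool ran {action}", "user_message": "Refund my order."}


# The same attack object is reused; only the threat model says where it lands.
injection = Attack("prompt_injection", lambda: "IGNORE PREVIOUS INSTRUCTIONS")
web = AttackableEnv(toy_web_env, ThreatModel({"page_text": [injection]}))
tool = AttackableEnv(toy_tool_env, ThreatModel({"tool_output": [injection]}))

web_obs = web.step("click")
tool_obs = tool.step("lookup")
```

In this sketch the attack never references `page_text` or `tool_output` directly. The threat model decides which component is exposed, which is one plausible way to read the summary's claim that attacks can be decoupled from the environment.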

Hacker News discussion: DoomArena: A Framework for Testing AI Agents Against Evolving Security Threats (arxiv.org), 9 points, submitted by PaulHoule 1 day ago, 2 comments

bsimpson, 1 day ago: This 100% sounds like an id Software game. (Also, it's 2025; why is HN still filtering emoji?)

eGQjxkKF6fif, 1 day ago (in reply): Possibly related to: https://news.ycombinator.com/item?id=43023508 https://paulbutler.org/2025/smuggling-arbitrary-data-through... Personally I think we should all roll back to ASCII art, but whatever.

Original text

DoomArena: A framework for Testing AI Agents Against Evolving Security Threats, by Leo Boisvert and 11 other authors

Abstract: We present DoomArena, a security evaluation framework for AI agents. DoomArena is designed on three principles: 1) It is a plug-in framework and integrates easily into realistic agentic frameworks like BrowserGym (for web agents) and $\tau$-bench (for tool calling agents); 2) It is configurable and allows for detailed threat modeling, allowing configuration of specific components of the agentic framework being attackable, and specifying targets for the attacker; and 3) It is modular and decouples the development of attacks from details of the environment in which the agent is deployed, allowing for the same attacks to be applied across multiple environments. We illustrate several advantages of our framework, including the ability to adapt to new threat models and environments easily, the ability to easily combine several previously published attacks to enable comprehensive and fine-grained security testing, and the ability to analyze trade-offs between various vulnerabilities and performance. We apply DoomArena to state-of-the-art (SOTA) web and tool-calling agents and find a number of surprising results: 1) SOTA agents have varying levels of vulnerability to different threat models (malicious user vs malicious environment), and there is no Pareto dominant agent across all threat models; 2) When multiple attacks are applied to an agent, they often combine constructively; 3) Guardrail model-based defenses seem to fail, while defenses based on powerful SOTA LLMs work better. DoomArena is available at this https URL.
From: Krishnamurthy Dvijotham
[v1] Fri, 18 Apr 2025 20:36:10 UTC (5,352 KB)
[v2] Tue, 22 Apr 2025 05:28:27 UTC (5,352 KB)
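
The abstract's statement that there is "no Pareto dominant agent across all threat models" can be illustrated with a small Python sketch. The function names are hypothetical, the numbers are invented purely for illustration (they are not results from the paper), and the choice of attack success rate per threat model as the compared axes is an assumption.

```python
from typing import Dict, List


def dominates(a: List[float], b: List[float]) -> bool:
    """True if a is no worse than b on every axis and strictly better on one (lower is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_dominant(agents: Dict[str, List[float]]) -> List[str]:
    """Names of agents that dominate every other agent in the set."""
    return [
        name
        for name, scores in agents.items()
        if all(dominates(scores, other) for other_name, other in agents.items() if other_name != name)
    ]


# Invented attack success rates: [malicious user, malicious environment].
mixed = {"agent_a": [0.10, 0.40], "agent_b": [0.30, 0.15]}
clear = {"agent_a": [0.10, 0.10], "agent_b": [0.30, 0.15]}

no_winner = pareto_dominant(mixed)   # each agent is better under a different threat model
one_winner = pareto_dominant(clear)  # agent_a is better under both threat models
```

With the `mixed` numbers, each agent is the more robust one under a different threat model, so neither dominates and the result is an empty list. That is the kind of situation the abstract describes.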