Composer: Building a fast frontier model with RL

Original link: https://cursor.com/blog/composer

## Composer: a fast agent for accelerating software engineering

Composer is a new large language model designed to speed up software engineering workflows. Built by Cursor, it achieves **coding performance comparable to similar models while generating about four times faster**. That speed matters for an interactive coding experience, helping developers stay in flow. Composer is trained with reinforcement learning on real coding challenges inside large codebases, learning to use tools such as code editing, search, and even terminal commands effectively. Cursor's internal benchmark, **Cursor Bench**, evaluates a model's correctness *as well as* its adherence to good software engineering practices. Key to Composer's results are its **mixture-of-experts (MoE) architecture** and optimized training infrastructure built on PyTorch, Ray, and MXFP8 precision, enabling scalable training across thousands of GPUs and faster inference. Beyond speed, Composer is trained to be a helpful assistant: it minimizes unnecessary output, favors evidence-backed responses, and even learns useful behaviors on its own, such as writing unit tests. In early adoption at Cursor, Composer has already proven valuable for day-to-day software development.

## Cursor Composer: a fast frontier model - discussion summary

Cursor has released Composer, a new AI model for code generation focused on speed and interactivity. While "frontier" models such as GPT-5 and Sonnet 4.5 still score higher on overall quality, Cursor emphasizes that Composer responds noticeably faster, aiming for a smoother coding flow. The model is post-trained with reinforcement learning (RL) on coding tasks, but details of the base model have not been disclosed. This lack of transparency, together with the use of an unpublished private benchmark (Cursor Bench), drew criticism from the Hacker News community, with concerns centered on potential benchmark contamination and the inability to verify results independently. Despite these concerns, early users report clear improvements in speed and usability over previous models, especially for tasks that require rapid iteration. The Cursor team engaged actively in the discussion, explaining their reasoning and promising more details, and emphasized a "smart + fast" approach, arguing that speed is essential to an effective agentic coding experience.

Original article

Composer is our new agent model designed for software engineering intelligence and speed. On our benchmarks, the model achieves frontier coding results with generation speed four times faster than similar models.

We achieve these results by training the model to complete real-world software engineering challenges in large codebases. During training, Composer is given access to a set of production search and editing tools and tasked with efficiently solving a diverse range of difficult problems. The final result is a large-scale model optimized for high-speed use as an agent in Cursor.

Our motivation comes from our experience developing Cursor Tab, our custom completion model. We found that often developers want the smartest model that can support interactive use, keeping them in the flow of coding. In our development process, we experimented with a prototype agent model, codenamed Cheetah, to better understand the impact of faster agent models. Composer is a smarter version of this model that keeps coding delightful by being fast enough for an interactive experience.

Composer is a mixture-of-experts (MoE) language model supporting long-context generation and understanding. It is specialized for software engineering through reinforcement learning (RL) in a diverse range of development environments. At each iteration of training, the model is given a problem description and instructed to produce the best response, be it a code edit, a plan, or an informative answer. The model has access to simple tools, like reading and editing files, and also more powerful ones like terminal commands and codebase-wide semantic search.
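The exact tool interface exposed during training is not published. As a rough, hypothetical sketch of how such tools might be registered for an agent, with all names, signatures, and behaviors invented for illustration:

```python
# Hypothetical agent tool registry (illustrative only; Cursor's actual tool schema is not public).
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str
    run: Callable[..., str]  # returns an observation string fed back to the model

def read_file(path: str) -> str:
    """Simple tool: return the contents of a workspace file."""
    with open(path) as f:
        return f.read()

def semantic_search(query: str) -> str:
    """Placeholder: a real implementation would query a codebase embedding index."""
    return f"results for: {query}"

TOOLS = [
    Tool("read_file", "Read a file from the workspace", read_file),
    Tool("semantic_search", "Search the codebase by meaning", semantic_search),
]
```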

To measure progress, we constructed an evaluation that measures a model's usefulness to a software developer as faithfully as possible. Our benchmark, Cursor Bench, consists of real agent requests from engineers and researchers at Cursor, along with hand-curated optimal solutions to these requests. The resulting evaluation measures not just the agent’s correctness, but also its adherence to a codebase's existing abstractions and software engineering practices.
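Cursor Bench itself is internal and its format is not described beyond this paragraph. The following is a hypothetical sketch, assuming each case pairs a real request with a curated reference solution and a set of adherence checks; the scoring weights are illustrative only:

```python
# Hypothetical shape of a benchmark record and scoring (Cursor Bench is internal; this layout is assumed).
from dataclasses import dataclass, field

@dataclass
class BenchCase:
    request: str                  # a real agent request from an engineer
    repo_snapshot: str            # path or ref identifying the codebase state
    reference_solution: str       # hand-curated optimal diff
    adherence_checks: list[str] = field(default_factory=list)  # e.g. "reuses the existing HTTP client"

def score(case: BenchCase, agent_diff: str, passed_checks: int) -> float:
    # Toy scoring: blend functional correctness with adherence to codebase conventions.
    correctness = float(agent_diff.strip() == case.reference_solution.strip())  # stand-in for a real judge
    adherence = passed_checks / max(len(case.adherence_checks), 1)
    return 0.7 * correctness + 0.3 * adherence  # weights invented for illustration
```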

Reinforcement learning allows us to actively specialize the model for effective software engineering. Since response speed is a critical component for interactive development, we incentivize the model to make efficient choices in tool use and to maximize parallelism whenever possible. In addition, we train the model to be a helpful assistant by minimizing unnecessary responses and claims made without evidence. We also find that during RL, the model learns useful behaviors on its own like performing complex searches, fixing linter errors, and writing and executing unit tests.
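The blog does not disclose the reward function. A minimal sketch of what such shaping could look like, assuming a scalar reward that values task success, penalizes wasteful serial tool calls and unsupported claims, and credits parallel tool use (all coefficients invented):

```python
# Minimal sketch of a shaped RL reward (assumed form; the actual training objective is not published).
def shaped_reward(task_solved: bool,
                  tool_calls: int,
                  parallel_tool_calls: int,
                  unsupported_claims: int) -> float:
    reward = 1.0 if task_solved else 0.0
    reward -= 0.01 * tool_calls            # discourage wasteful serial tool use
    reward += 0.005 * parallel_tool_calls  # encourage batching independent calls
    reward -= 0.05 * unsupported_claims    # penalize claims made without evidence
    return reward
```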

Efficient training of large MoE models requires significant investment into building infrastructure and systems research. We built custom training infrastructure leveraging PyTorch and Ray to power asynchronous reinforcement learning at scale. We natively train our models at low precision by combining our MXFP8 MoE kernels with expert parallelism and hybrid sharded data parallelism, allowing us to scale training to thousands of NVIDIA GPUs with minimal communication cost. Additionally, training with MXFP8 allows us to deliver faster inference speeds without requiring post-training quantization.
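Cursor's MXFP8 MoE kernels are custom and unpublished. As a simplified illustration of the underlying idea, the sketch below shows MX-style block quantization in plain PyTorch, where blocks of 32 values share one power-of-two scale and elements are stored in FP8 (E4M3):

```python
# Simplified MX-style FP8 block quantization (illustrative only; not Cursor's kernels).
import torch

BLOCK = 32        # MX block size: 32 values share one scale
E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def mx_quantize(x: torch.Tensor):
    blocks = x.reshape(-1, BLOCK)  # requires numel divisible by BLOCK
    amax = blocks.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12)
    # Shared scale is a power of two chosen so the block fits the E4M3 range.
    scale = torch.exp2(torch.ceil(torch.log2(amax / E4M3_MAX)))
    q = (blocks / scale).clamp(-E4M3_MAX, E4M3_MAX).to(torch.float8_e4m3fn)
    return q, scale

def mx_dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

x = torch.randn(4, 128)
q, s = mx_quantize(x)
err = (mx_dequantize(q, s).reshape(x.shape) - x).abs().max()
print(f"max abs quantization error: {err.item():.4f}")
```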

During RL, we want our model to be able to call any tool in the Cursor Agent harness. These tools allow editing code, using semantic search, grepping strings, and running terminal commands. At our scale, teaching the model to effectively call these tools requires running hundreds of thousands of concurrent sandboxed coding environments in the cloud. To support this workload, we adapted existing infrastructure we built for Background Agents, rewriting our virtual machine scheduler to support the bursty nature and scale of training runs. This enabled seamless unification of RL environments with production environments.
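The virtual machine scheduler itself is not described in detail. The toy asyncio sketch below only illustrates the general pattern of absorbing a bursty wave of rollouts under a fixed sandbox capacity; all names and numbers are invented:

```python
# Toy sketch of bursty rollout scheduling (the real VM scheduler is far more involved).
import asyncio
import random

MAX_SANDBOXES = 1000  # concurrency cap standing in for available sandbox capacity

async def run_rollout(task_id: int, sem: asyncio.Semaphore) -> str:
    async with sem:  # acquire a sandbox slot
        await asyncio.sleep(random.uniform(0.01, 0.05))  # stand-in for an agent rollout
        return f"task {task_id}: done"

async def run_burst(task_ids: list[int]) -> list[str]:
    sem = asyncio.Semaphore(MAX_SANDBOXES)
    return await asyncio.gather(*(run_rollout(t, sem) for t in task_ids))

if __name__ == "__main__":
    results = asyncio.run(run_burst(list(range(5000))))  # one bursty wave of rollouts
    print(len(results), "rollouts completed")
```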

Cursor builds tools for software engineering, and we make heavy use of the tools we develop. A motivation for Composer has been building an agent we would reach for in our own work. In recent weeks, we have found that many of our colleagues were using Composer for their day-to-day software development. With this release, we hope that you also find it to be a valuable tool.

¹ Benchmarked on an internal benchmark in the Cursor tool harness. We group models into classes based on score and report the best model in each class. "Fast Frontier" includes models designed for efficient inference such as Haiku 4.5 and Gemini Flash 2.5. "Best Open" includes recent open weight model releases such as Qwen Coder and GLM 4.6. "Frontier 7/2025" is the best model available in July of this year. "Best Frontier" includes GPT-5 and Sonnet 4.5, which both outperform Composer. For the Tokens per Second calculation, tokens are standardized across models to the latest Anthropic tokenizer.
