Show HN: Remoroo, trying to fix memory in long-running coding agents

Original link: https://www.remoroo.com

## Remoroo: Autonomous Research for Deep Tech

Remoroo is an autonomous research engine designed to accelerate machine-learning development. Unlike coding agents that offer a single suggestion, Remoroo runs large batches of experiments on your code (up to 30), automatically editing, testing, and evaluating changes *overnight* against a defined metric — in the demo, `val_bpb`, which it improved by 31%.

You define experiments through a simple spec file (`program.md`) that points at your training and evaluation code. Remoroo runs within a time budget, strictly verifies and reproduces results, and provides a clear "VERIFIED" status with a traceable history via git.

Remoroo is not about *guessing* at code; it is about *proving* improvements through structured experimentation. Billing is based on run time ("credits"), and a free tier is available. It is built for deep tech teams looking for a scalable, reliable way to automate and accelerate their ML research workflow.

## Remoroo: Long-Term Memory for Coding Agents

Remoroo (remoroo.com) is a new tool aimed at a key challenge for coding agents: maintaining context and memory across long-running tasks. Unlike agents that struggle with anything beyond simple edits, Remoroo is designed for complex, hours-long "engineering experiments" involving file access, command execution, and iterative testing.

Its core innovation is a demand-paged memory system, inspired by operating-system virtual memory, that selectively stores and retrieves the context it needs. This lets Remoroo manage the large volume of data produced during long runs and keeps the agent from "forgetting" its goal or repeating failed attempts.

Users give Remoroo a repository and a measurable goal, and it iterates autonomously, testing changes and learning from the results. The creator is seeking feedback from people working on similar long-running agent, training, or evaluation systems. Detailed technical documentation is available on the site.
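The post describes the demand-paged memory system only by analogy to OS virtual memory. As an illustration of the idea — not Remoroo's actual implementation — a demand-paged context store with a bounded in-memory working set might look like this (`ContextPager` and its field names are hypothetical):

```python
from collections import OrderedDict

class ContextPager:
    """Illustrative demand-paged context store (a sketch, not Remoroo's code).

    Pages of agent context live in cheap backing storage (a dict stands in
    for disk here) and are paged into a bounded in-memory working set on
    access, evicting the least-recently-used page when the set is full.
    """

    def __init__(self, max_resident=4):
        self.backing = {}              # "disk": page_id -> content
        self.resident = OrderedDict()  # in-memory working set, LRU order
        self.max_resident = max_resident
        self.faults = 0                # page faults (reads from backing store)

    def write(self, page_id, content):
        self.backing[page_id] = content

    def read(self, page_id):
        if page_id in self.resident:
            self.resident.move_to_end(page_id)     # mark most recently used
        else:
            self.faults += 1                        # page fault: load from disk
            self.resident[page_id] = self.backing[page_id]
            if len(self.resident) > self.max_resident:
                self.resident.popitem(last=False)   # evict the LRU page
        return self.resident[page_id]
```

The point of the design is that an hours-long run can accumulate far more context than fits in a model's window, while the agent only ever touches a small resident set at a time.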

Original post

Autonomous engineering for deep tech teams

Remoroo runs autonomous research on your code locally, overnight. It edits, tests, evaluates, keeps or reverts. You wake up to better results.

Read the docs →
▸ remoroo-session · autoresearch/mar31
Reading program.md…
Baseline: val_bpb = 2.2396 (commit 9138841)
Time budget: 20 min per experiment
30 experiments · 8 kept · 22 discarded
val_bpb: 2.2396 → 1.5484 (31% lower)
Verdict: VERIFIED · REPRODUCIBLE

The reality of manual ML research

Without Remoroo
$ vim train.py
> tweak learning_rate=3e-4
$ uv run train.py
> wait 60 minutes…
> val_bpb: 2.24 (no change)
> try batch_size 2^15…
> wait 60 more minutes…
> NaN loss.
$ git checkout .
2 hours. 0 progress.
no verdict. no structure.
no proof.
With Remoroo
$ remoroo run --local program.md
▸ 30 experiments completed
▸ 8 kept · 22 discarded
▸ val_bpb: 2.24 → 1.55
▸ VERIFIED · REPRODUCIBLE
You slept through it.

How it works

Write a spec (e.g. program.md). Point Remoroo at it, and it runs experiments overnight.

▸ remoroo-session · autoresearch
Spec program.md (TIME_BUDGET=1200, metric: val_bpb)
File train.py (model, optimizer, training loop)
Eval prepare.py → evaluate_bpb (fixed, untouchable)
Plan → Edit → Train → Evaluate (val_bpb vs baseline)
train.py
- ATTN_PATTERN = "L" * DEPTH
+ ATTN_PATTERN = "SSSL"
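The Plan → Edit → Train → Evaluate cycle with metric-based keep/discard can be sketched as a simple loop. Everything below is illustrative: `autoresearch`, the git commit-message format, and the injected `evaluate` callback are hypothetical stand-ins, not Remoroo's real interface.

```python
import subprocess
import time

def git_keep(metric):
    """Commit a surviving edit (hypothetical commit-message format)."""
    subprocess.run(["git", "commit", "-am", f"val_bpb={metric:.4f}"], check=False)

def git_revert():
    """Throw away a failed edit."""
    subprocess.run(["git", "checkout", "."], check=False)

def autoresearch(experiments, evaluate, baseline,
                 time_budget_s=1200, keep=git_keep, revert=git_revert):
    """Illustrative keep/revert loop (a sketch, not Remoroo's code).

    Each experiment patches the repo, trains and evaluates under a
    per-experiment time budget, and is kept only if the metric (lower
    is better, like val_bpb) beats the best result so far.
    """
    best, kept, discarded = baseline, 0, 0
    for apply_edit in experiments:
        apply_edit()                      # e.g. patch train.py
        deadline = time.time() + time_budget_s
        metric = evaluate(deadline)       # train + eval, stopping by deadline
        if metric < best:
            best = metric
            keep(metric)                  # checkpoint the win via git
            kept += 1
        else:
            revert()                      # discard the failed attempt
            discarded += 1
    return best, kept, discarded
```

Because every kept experiment lands as a git commit, the "8 kept · 22 discarded" summary falls out of the loop counters, and the commit history is the traceable record the site advertises.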
Illustrative billing · credits = Haiku-hour units (× model tier — see Pricing)
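The credit formula can be made concrete with a tiny sketch. The tier multiplier values below are made up for illustration and are not Remoroo's published pricing:

```python
def run_credits(wall_hours, tier_multiplier):
    """Illustrative billing math: credits are Haiku-hour units, i.e.
    wall-clock run time scaled by a model-tier multiplier.
    Multiplier values are hypothetical, not published pricing."""
    return wall_hours * tier_multiplier

# A hypothetical 8-hour overnight run on a tier billed at 3x Haiku:
# run_credits(8, 3) -> 24 credits
```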

Verified results

LR SCHEDULE SEARCH
val_bpb
2.24 → 1.99
11% lower
train.py
14 experiments · 6 kept
VERIFIED
ARCHITECTURE SEARCH
val_bpb
2.24 → 1.55
banded attn (SSSL)
train.py
30 experiments · 8 kept
VERIFIED
MULTI-OBJECTIVE
val_bpb + memory
3 constraints → all passed
train.py
22 experiments · 5 kept
VERIFIED

Not a coding agent.

An autonomous research engine.

| | Coding Agents | Remoroo |
|---|---|---|
| Time scale | Seconds | Hours to overnight |
| Task scope | Fix one bug | 30-experiment search |
| Execution | None / one-shot | Sandboxed, time-budgeted |
| Metric evaluation | None | Fixed eval harness |
| Keep / discard | Human decides | Autonomous, metric-based |
| Failure handling | Retry prompt | Case-based recovery |
| Output | Suggested code | Verified patch + proof |
| Reproducibility | None | Artifact replay + git |
| Billing | Per token/seat | Run wall time in credits (Haiku-hour units) |

It didn't guess. It proved.

Install in 30 seconds.

Free tier includes monthly run credits — see Pricing.
