Show HN: Libretto – Making AI browser automations deterministic

Original link: https://github.com/saffron-health/libretto

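## Libretto: A Powerful Toolkit for Web Integrations

Libretto is an open-source toolkit designed to simplify building and maintaining web integrations for coding agents. It provides a live browser environment and a CLI for inspecting pages, capturing network traffic, recording user actions, and debugging workflows, all of which are essential for automating web tasks.

Originally built by Saffron Health for healthcare software integrations, Libretto lets agents generate scripts from examples, convert browser automation into direct API calls (improving speed and reliability), and autonomously repair broken scripts by replaying and inspecting live sessions. It uses LLMs for tasks such as selector extraction and failure diagnosis, minimizing the context sent to the agent.

Key features include session save/restore, support for multiple AI providers (OpenAI, Anthropic, Gemini, Vertex), and a local configuration system stored in `.libretto/`. Libretto is used as a "skill" in coding agents via prompts, and it also provides a command-line interface for direct control.

For documentation, support, and community discussion, visit [Discord](link) and [GitHub Discussions](link).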

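## Libretto: Deterministic AI Browser Automation

Libretto (libretto.sh) is a new tool that aims to improve browser automation, especially in complex and sensitive environments such as healthcare. Unlike existing solutions such as Browseruse and Stagehand, which rely on runtime AI, Libretto uses "development-time AI" to **generate real, inspectable code** for automation.

Frustrated by the unreliability and cost of runtime AI-driven tools, its creators built Libretto to address brittle DOM parsing and unpredictable agent behavior. It combines Playwright UI automation with direct network requests to improve reliability and avoid bot detection.

Key features include script generation from recorded user actions, step-by-step debugging, a read-only mode for safety, and code generation that follows existing project conventions. In essence, Libretto shifts the focus from hoping an agent behaves correctly at runtime to owning and controlling the automation script itself. The developers are seeking feedback on the approach and discussing how it differs from tools such as Playwright-CLI.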
Original text

License: MIT

Libretto is a toolkit for building robust web integrations. It gives your coding agent a live browser and a token-efficient CLI to:

  • Inspect live pages with minimal context overhead
  • Capture network traffic to reverse-engineer site APIs
  • Record user actions and replay them as automation scripts
  • Debug broken workflows interactively against the real site

We at Saffron Health built Libretto to help us maintain our browser integrations with common healthcare software. We're open-sourcing it so other teams can do the same more easily.

npm install libretto

# First-time onboarding: install skill, download Chromium, and pin the default snapshot model
npx libretto setup

# Check workspace readiness at any time
npx libretto status

# Manually change the snapshot analysis model (advanced override)
npx libretto ai configure <openai | anthropic | gemini | vertex>

setup detects available provider credentials (e.g. OPENAI_API_KEY) and automatically pins the default model to .libretto/config.json. Re-running setup on a healthy workspace shows the current configuration instead of re-prompting. If credentials are missing for a previously configured provider, setup offers an interactive repair flow.

Use ai configure when you want to explicitly switch providers or set a custom model string.

Libretto is designed to be used as a skill through your coding agent. Here are some example prompts:

One-shot script generation

Use the Libretto skill. Go on LinkedIn and scrape the first 10 posts for content, who posted it, the number of reactions, the first 25 comments, and the first 25 reposts.

Your coding agent will open a window for you to log into LinkedIn, and then automatically start exploring.

Interactive script building

I'm gonna show you a workflow in the eclinicalworks EHR to get a patient's primary insurance ID. Use libretto skill to turn it into a playwright script that takes patient name and dob as input to get back the insurance ID. URL is ...

Libretto can read the actions you perform in the browser, so you can walk through a workflow yourself and then ask it to rebuild that workflow from your recorded actions.

Convert browser automation to network requests

We have a browser script at ./integration.ts that automates going to Hacker News and getting the first 10 posts. Convert it to direct network scripts instead. Use the Libretto skill.

Libretto can read network requests from the browser and use them to reverse-engineer the site's API, producing a script that calls those endpoints directly. Direct API calls are faster and more reliable than UI automation. You can also ask Libretto to run a security analysis that checks the requests for common security cookies, so you can judge whether a network-request approach will be safe.
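To make the idea of checking requests for security cookies concrete, here is a minimal sketch of what such a check could look like. It is not Libretto's actual analysis: the function names are hypothetical, and the cookie list is a small illustrative sample of names commonly set by bot-protection vendors, not an exhaustive or authoritative one.

```typescript
// ASSUMPTION: an illustrative sample of cookie-name prefixes that commonly
// indicate bot protection. This is not Libretto's own list.
const BOT_PROTECTION_COOKIE_PREFIXES: ReadonlyArray<[string, string]> = [
  ["__cf_bm", "Cloudflare"],
  ["cf_clearance", "Cloudflare"],
  ["_abck", "Akamai"],
  ["bm_sz", "Akamai"],
  ["datadome", "DataDome"],
  ["_px3", "PerimeterX"],
  ["_pxvid", "PerimeterX"],
  ["incap_ses_", "Imperva"],
  ["visid_incap_", "Imperva"],
];

// Extract cookie names from a raw Cookie request header ("a=1; b=2").
function cookieNamesFromHeader(cookieHeader: string): string[] {
  return cookieHeader
    .split(";")
    .map((pair) => {
      const eq = pair.indexOf("=");
      return (eq === -1 ? pair : pair.slice(0, eq)).trim();
    })
    .filter((cookieName) => cookieName.length > 0);
}

// Hypothetical helper: list the vendors whose cookies appear in the header,
// in order of first appearance and without duplicates.
function detectBotProtection(cookieHeader: string): string[] {
  const vendors: string[] = [];
  for (const cookieName of cookieNamesFromHeader(cookieHeader)) {
    for (const [prefix, vendor] of BOT_PROTECTION_COOKIE_PREFIXES) {
      if (cookieName.indexOf(prefix) === 0 && vendors.indexOf(vendor) === -1) {
        vendors.push(vendor);
      }
    }
  }
  return vendors;
}
```

An empty result does not prove a site is safe to call directly; it only means none of the sampled cookie names showed up in that request.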

We have a browser script at ./integration.ts that is supposed to go to Availity and perform an eligibility check for a patient. But I'm getting a broken selector error when I run it. Fix it. Use the Libretto skill.

Agents can use Libretto to reproduce the failure, pause the workflow at any point, inspect the live page, and fix issues, all autonomously.

You can also use Libretto directly from the command line. All commands accept --session <name> to target a specific session.

npx libretto setup                         # interactive first-run onboarding; run yourself, not through an agent
npx libretto status                        # check AI config health and open sessions
npx libretto open <url>                    # launch browser and open a URL (headed by default)
npx libretto snapshot --objective "..." --context "..."  # capture PNG + HTML and analyze with an LLM
npx libretto exec "<code>"                 # execute Playwright TypeScript against the open page (single quoted argument)
echo "<code>" | npx libretto exec -        # intentionally read Playwright TypeScript from stdin
npx libretto run <file>                    # run the file's default-exported workflow
npx libretto resume                        # resume a paused workflow
npx libretto pages                         # list open pages in the session
npx libretto save <domain>                 # save browser session (cookies, localStorage) for reuse
npx libretto close                         # close the browser
npx libretto ai configure <provider>       # manually change snapshot analysis model

All Libretto state lives in a .libretto/ directory at your project root. Configuration is stored in .libretto/config.json.

.libretto/config.json controls snapshot analysis and viewport settings:

{
  "version": 1,
  "ai": {
    "model": "openai/gpt-5.4",
    "updatedAt": "2026-01-01T00:00:00.000Z"
  },
  "viewport": { "width": 1280, "height": 800 }
}

The ai field configures which model Libretto uses for snapshot analysis — extracting selectors, identifying interactive elements, or diagnosing why a step failed. This keeps heavy visual context out of your coding agent's context window. Snapshot analysis is required.
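If you want to read this file from your own tooling, the following is a rough sketch of a typed parser. The field names come from the sample config above; the helper names are hypothetical, and the assumption that the model string always has the form "<provider>/<model name>" is inferred from the single example "openai/gpt-5.4", not from a documented format.

```typescript
// Shape of .libretto/config.json, taken from the sample above.
// Both "ai" and "viewport" are treated as optional.
interface LibrettoConfig {
  version: number;
  ai?: { model: string; updatedAt?: string };
  viewport?: { width: number; height: number };
}

// ASSUMPTION: model strings look like "<provider>/<model name>".
function splitModelString(model: string): { provider: string; modelName: string } {
  const slash = model.indexOf("/");
  if (slash <= 0 || slash === model.length - 1) {
    throw new Error(`Expected "<provider>/<model>", got "${model}"`);
  }
  return { provider: model.slice(0, slash), modelName: model.slice(slash + 1) };
}

// Hypothetical helper: parse the JSON text and validate the fields we rely on.
function parseLibrettoConfig(jsonText: string): LibrettoConfig {
  const raw = JSON.parse(jsonText) as Partial<LibrettoConfig>;
  if (typeof raw.version !== "number") {
    throw new Error("config.version must be a number");
  }
  if (raw.ai !== undefined && typeof raw.ai.model !== "string") {
    throw new Error("config.ai.model must be a string");
  }
  return raw as LibrettoConfig;
}

// Usage with an inline copy of the sample config.
const sampleConfig = parseLibrettoConfig('{"version":1,"ai":{"model":"openai/gpt-5.4"}}');
const sampleModel = sampleConfig.ai ? splitModelString(sampleConfig.ai.model) : null;
```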

npx libretto setup automatically pins the default model for the first provider whose credentials it finds. To explicitly change the provider or model afterward:

npx libretto ai configure <openai | anthropic | gemini | vertex>

To inspect the current configuration without changing anything:

npx libretto status

Provider credentials are read from environment variables or a .env file at your repository root (next to your .git directory): OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY / GOOGLE_GENERATIVE_AI_API_KEY, or GOOGLE_CLOUD_PROJECT for Vertex. Set LIBRETTO_DISABLE_DOTENV=1 to skip .env loading.
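As an illustration of how detection by environment variable can work, here is a small sketch. The variable names are the ones listed above; the function name and the order in which providers are checked are assumptions, since the text only says that setup picks the first provider whose credentials it finds.

```typescript
type Provider = "openai" | "anthropic" | "gemini" | "vertex";

// Variable names are from the text above.
// ASSUMPTION: the check order below is illustrative, not Libretto's actual order.
const PROVIDER_ENV_VARS: ReadonlyArray<[Provider, string[]]> = [
  ["openai", ["OPENAI_API_KEY"]],
  ["anthropic", ["ANTHROPIC_API_KEY"]],
  ["gemini", ["GEMINI_API_KEY", "GOOGLE_GENERATIVE_AI_API_KEY"]],
  ["vertex", ["GOOGLE_CLOUD_PROJECT"]],
];

// Hypothetical helper: return the first provider with a non-blank credential,
// or null when none is configured. Pass in process.env or a parsed .env map.
function detectProvider(env: Record<string, string | undefined>): Provider | null {
  for (const [provider, variableNames] of PROVIDER_ENV_VARS) {
    const hasCredential = variableNames.some(
      (variableName) => (env[variableName] ?? "").trim() !== "",
    );
    if (hasCredential) {
      return provider;
    }
  }
  return null;
}
```

Taking the environment as a parameter instead of reading it globally keeps the function easy to test with a plain object.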

The viewport field sets the default browser viewport size. Both fields are optional.

Each Libretto session gets its own directory under .libretto/sessions/<name>/ containing runtime state. Sessions are git-ignored.

  • state.json — session metadata (debug port, PID, status)
  • logs.jsonl — structured session logs
  • network.jsonl — captured network requests
  • actions.jsonl — recorded user actions
  • snapshots/ — screenshot PNGs and HTML snapshots
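Because the log files use the JSON Lines format (one JSON value per line), they are straightforward to post-process. The sketch below parses JSONL text and filters failed requests. The record fields (method, url, status) are hypothetical placeholders; the actual schema of network.jsonl is not described here, so check a real file before relying on any field name.

```typescript
// Parse JSON Lines text: one JSON value per non-empty line.
function parseJsonl<T>(text: string): T[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as T);
}

// HYPOTHETICAL record shape, for illustration only.
interface CapturedRequest {
  method: string;
  url: string;
  status: number;
}

// Keep only requests whose HTTP status indicates a client or server error.
function failedRequests(records: CapturedRequest[]): CapturedRequest[] {
  return records.filter((record) => record.status >= 400);
}

// Usage with inline sample data standing in for the contents of network.jsonl.
const sampleJsonl = [
  '{"method":"GET","url":"https://example.com/api/items","status":200}',
  '{"method":"POST","url":"https://example.com/api/login","status":401}',
  "",
].join("\n");
const sampleFailures = failedRequests(parseJsonl<CapturedRequest>(sampleJsonl));
```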

Profiles save browser sessions (cookies, localStorage) so you can reuse authenticated state across runs. They are stored in .libretto/profiles/<domain>.json, created via npx libretto save <domain>. Profiles are machine-local and git-ignored.
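Saved authentication eventually goes stale, so it can be useful to check a profile before reusing it. The sketch below assumes, without confirmation from the text above, that a profile file resembles Playwright's storage-state JSON, in which each cookie carries an expires field in Unix seconds and -1 marks a session cookie. The type and function names are hypothetical.

```typescript
// ASSUMPTION: profile files resemble Playwright's storage-state shape.
interface SavedCookie {
  name: string;
  domain: string;
  expires: number; // Unix time in seconds; -1 means a session cookie
}

interface SavedProfile {
  cookies: SavedCookie[];
}

// Hypothetical helper: names of cookies that have already expired at nowMs
// (milliseconds since the epoch, e.g. Date.now()).
function expiredCookieNames(profile: SavedProfile, nowMs: number): string[] {
  const nowSeconds = nowMs / 1000;
  return profile.cookies
    .filter((cookie) => cookie.expires !== -1 && cookie.expires <= nowSeconds)
    .map((cookie) => cookie.name);
}
```

If the list is non-empty for the cookies your login depends on, re-running npx libretto save <domain> after logging in again is the likely fix.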

Have a question, idea, or want to share what you've built? Join the conversation on Discord for quick help or GitHub Discussions for longer-form threads.

  • Q&A — Ask questions and get help
  • Ideas — Suggest new features or improvements
  • Show and tell — Share your workflows and automations
  • General — Chat about anything Libretto-related

Found a bug? Please open an issue.

Maintained by the team at Saffron Health.

For local development in this repository:

pnpm i
pnpm build
pnpm type-check
pnpm test

Source layout:

  • packages/libretto/src/cli/ — CLI commands
  • packages/libretto/src/runtime/ — browser runtime (network, recovery, downloads, extraction)
  • packages/libretto/src/shared/ — shared utilities (config, LLM client, logging, state)
  • packages/libretto/test/ — test files (*.spec.ts)
  • packages/libretto/README.template.md — source of truth for the repo and package READMEs
  • packages/libretto/skills/libretto/ — source of truth for the Libretto skill

Run pnpm sync:mirrors after editing packages/libretto/README.template.md or anything under packages/libretto/skills/libretto/.

To check that generated READMEs, skill mirrors, and skill version metadata are in sync without fixing them, run pnpm check:mirrors. To release, run pnpm prepare-release.
