Show HN: TUI-use: Let AI agents control interactive terminal programs

Original link: https://github.com/onesuper/tui-use

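## tui-use: Terminal access for AI agents

tui-use bridges the gap between AI agents and interactive terminal programs. Agents can run shell commands, but they usually struggle with tools that require input: REPLs, installers, and TUIs such as `vim` or `htop`. tui-use lets agents interact with these programs the way a human would. It works by launching a program in a pseudo-terminal (PTY), capturing the screen as plain text, and accepting keystroke input. This "snapshot" model gives the agent a clean, readable view of the terminal, including highlighted active elements, without having to parse a complex byte stream. Key features include support for a wide range of keys, scrolling, search, and waiting for screen changes. It is designed for AI coding assistants such as Claude Code, Cursor, and Gemini CLI, enabling interactive sessions and automating tasks in CLI tools traditionally operated by humans. tui-use currently supports macOS and Linux, with Windows support planned. It runs through a daemon and provides a command-line interface for managing sessions and interacting with programs.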

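## TUI-use: AI agents and terminal programs

A new project called "TUI-use" (github.com/onesuper) lets AI agents control interactive terminal programs (TUIs). The project sparked discussion on Hacker News, where commenters questioned whether it is necessary given existing terminal tools such as `tmux` and standard Unix pipes. Some users pointed out that agents ultimately need CLI functionality anyway, and that going through a TUI may be less efficient because it increases token usage. One commenter suggested it might be a workaround for using subscription-based LLM access (such as Claude) instead of API credits. Despite skepticism about reinventing existing solutions, some found the project exciting, seeing it as a step toward a more general approach in which AI integrates seamlessly with the tools developers actually use, including CLIs, TUIs, and GUIs. Others already use tools like `tmux` for similar agent-based tasks.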

Original text

Like BrowserUse, but for the terminal.

tui-use gives agents access to the parts of the terminal that bash can't reach — every REPL, installer, and TUI app built for humans.

AI agents can run shell commands, read files, and call APIs. But they stall the moment a program asks for input — because most CLI tools were built for humans, not agents.

tui-use fills that gap. Spawn any program in a PTY, observe its screen as plain text, send keystrokes — all from the command line. If a human can operate it in a terminal, an agent can too.
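
The spawn, observe, send-keystrokes idea above can be sketched with Python's standard `pty` module. This is only an illustration of the PTY mechanism, not tui-use's implementation (which, per this page, is a daemon built around a headless xterm emulator). The helper name `run_in_pty` and its parameters are made up for the example, and it returns the raw output stream rather than a rendered screen.

```python
import os
import pty
import select
import sys


def run_in_pty(argv, keystrokes, timeout=5.0):
    """Spawn argv in a pseudo-terminal, send keystrokes, return all output as text."""
    pid, fd = pty.fork()
    if pid == 0:
        # Child process: stdin/stdout/stderr are now the PTY slave.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    # Parent process: fd is the PTY master. Writes look like typing to the child.
    os.write(fd, keystrokes)
    chunks = []
    while True:
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            break  # no output within the timeout
        try:
            data = os.read(fd, 4096)
        except OSError:
            break  # Linux reports EIO once the child has closed the PTY
        if not data:
            break  # macOS reports EOF instead
        chunks.append(data)
    os.close(fd)
    os.waitpid(pid, 0)
    return b"".join(chunks).decode(errors="replace")


if __name__ == "__main__":
    # A stand-in "interactive program": it blocks until someone types a name.
    child = [sys.executable, "-c", "name = input('Name? '); print('Hello, ' + name)"]
    print(run_in_pty(child, b"Alice\n"))
```

The output of a full-screen program read this way is a mix of text and escape sequences, which is the problem the rendering step described later on this page addresses.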

Use cases:

REPL sessions — Run code in Python, Node, psql, or redis-cli, inspect the output, and keep going. No more one-shot scripts when you need an interactive session.
Interactive scaffolding tools — Step through npm create, cargo new, create-react-app, and any other CLI wizard that asks questions before doing anything.
Database CLIs — Connect to psql or mysql, run queries, check schemas, without needing a separate API or ORM layer.
SSH + remote interactive programs — SSH into a server and keep operating interactive programs on the other end, not just run one-off commands.
TUI applications — Navigate vim, lazygit, htop, fzf, and other full-screen programs that were never designed to be scripted.

Perfect for Claude Code, Cursor, Codex, Gemini CLI, OpenCode and other AI coding agents.

🖥️ Full VT Rendering — PTY output is processed by a headless xterm emulator. ANSI escape sequences, cursor movement, and screen clearing all work correctly. The screen field is always clean plain text.
📸 Snapshot Model — Interacting with a terminal program is just a loop: read what's on screen, decide what to type, repeat. tui-use makes that loop explicit — no async streams, no timing guesswork, no partial output to reassemble.
🔍 Highlights — Every snapshot includes a highlights field listing the inverse-video spans on screen — the standard way TUI programs indicate selected items. Agents can read which menu option, tab, or button is currently active without parsing text or guessing from cursor position.
⌨️ Rich Key Support — Send text, Enter, Ctrl+C, arrow keys, F-keys, and more. Run tui-use keys to see the full list.
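
To make the Highlights feature concrete, here is a small sketch of how inverse-video spans could be located in a single line of raw terminal output. It relies on the standard SGR codes (7 turns inverse video on, 27 or 0 turns it off). The function name `inverse_spans` and the `(start, end, text)` tuple shape are assumptions for illustration; the page does not document the actual layout of the highlights field, and tui-use presumably reads attributes from the emulator's rendered screen rather than scanning raw lines like this.

```python
import re

# Matches SGR ("select graphic rendition") sequences such as ESC[7m or ESC[0;1m.
SGR = re.compile(r"\x1b\[([0-9;]*)m")


def inverse_spans(line):
    """Return (plain_text, spans) where spans are (start, end, text) inverse-video runs."""
    plain, spans = [], []
    col, pos = 0, 0
    start = None  # column where the current inverse run began, or None
    for m in SGR.finditer(line):
        chunk = line[pos:m.start()]
        plain.append(chunk)
        col += len(chunk)
        pos = m.end()
        params = [int(p) if p else 0 for p in m.group(1).split(";")]
        i = 0
        while i < len(params):
            p = params[i]
            if p in (38, 48):
                # Extended colour: skip its arguments so 38;5;7 is not read as "inverse".
                mode = params[i + 1] if i + 1 < len(params) else None
                i += 3 if mode == 5 else 5 if mode == 2 else 1
                continue
            if p == 7 and start is None:
                start = col
            elif p in (0, 27) and start is not None:
                if col > start:
                    spans.append((start, col))
                start = None
            i += 1
    tail = line[pos:]
    plain.append(tail)
    col += len(tail)
    if start is not None and col > start:
        spans.append((start, col))  # run still open at end of line
    text = "".join(plain)
    return text, [(s, e, text[s:e]) for s, e in spans]


if __name__ == "__main__":
    # A menu line where "Save" is the selected item.
    print(inverse_spans("  Open  \x1b[7m Save \x1b[0m  Quit"))
```

An agent given the span for ` Save ` knows which menu entry is selected without having to infer it from the cursor position.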

From npm (recommended):

From source:

git clone https://github.com/onesuper/tui-use.git
cd tui-use
npm install
npm run build
npm link

Note: You must install the CLI (see Installation section above) before using the plugin — the plugin only provides skill definitions, the CLI provides the actual PTY functionality.

Install from self-hosted marketplace

Step 1: Add the marketplace

/plugin marketplace add onesuper/tui-use

Step 2: Install the plugin

/plugin install tui-use@tui-use

More agents coming soon...

Behind the scenes, tui-use runs a daemon that manages PTY sessions:

┌─────────────┐     HTTP      ┌─────────────┐     PTY      ┌─────────────┐
│  tui-use    │ ◄───────────► │   Daemon    │ ◄─────────►  │   Program   │
│   (CLI)     │               │ (background)│              │  (vim/htop) │
└─────────────┘               └─────────────┘              └─────────────┘
                                     │
                                     ▼
                              ┌─────────────┐
                              │  @xterm/    │
                              │  headless   │
                              │ (xterm emu) │
                              └─────────────┘

The rendering pipeline:

1. The target program outputs ANSI escape sequences (colors, cursor moves, screen clears).
2. @xterm/headless renders them into a complete terminal screen state.
3. snapshot returns clean plain-text screen content, plus metadata such as highlights (inverse-video regions), title (window title), and is_fullscreen (alternate buffer detection).

Agents get a "polaroid" snapshot of the terminal, not a raw byte stream they need to reassemble.
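
The difference between a byte stream and a rendered screen can be shown with a deliberately tiny renderer. The sketch below handles only cursor positioning (`CSI row;col H`), full-screen clear (`CSI 2 J`), carriage return, and line feed, and silently ignores every other CSI sequence. It does not scroll or wrap. It is a toy stand-in for what a real emulator such as @xterm/headless does, and the function name `render` and the grid size are assumptions made for the example.

```python
import re

# CSI sequences: ESC [ parameters final-letter
CSI = re.compile(r"\x1b\[([0-9;?]*)([A-Za-z])")


def render(data, cols=20, rows=4):
    """Apply a stream of text and escape sequences to a character grid."""
    grid = [[" "] * cols for _ in range(rows)]
    row = col = 0
    i = 0
    while i < len(data):
        m = CSI.match(data, i)
        if m:
            params, final = m.group(1), m.group(2)
            nums = [int(p) if p.isdigit() else 0 for p in params.split(";")]
            if final in "Hf":
                # Cursor position: 1-based, missing or zero values default to 1.
                r = nums[0] or 1
                c = (nums[1] if len(nums) > 1 else 0) or 1
                row = min(r, rows) - 1
                col = min(c, cols) - 1
            elif final == "J" and nums[0] == 2:
                grid = [[" "] * cols for _ in range(rows)]  # clear the whole screen
            # Everything else (colors, cursor visibility, ...) is ignored here.
            i = m.end()
            continue
        ch = data[i]
        i += 1
        if ch == "\r":
            col = 0
        elif ch == "\n":
            row = min(row + 1, rows - 1)  # no scrolling in this sketch
        elif ch >= " " and col < cols:
            grid[row][col] = ch
            col += 1
    return "\n".join("".join(line).rstrip() for line in grid)


if __name__ == "__main__":
    stream = "loading...\x1b[2J\x1b[H\x1b[1mMenu\x1b[0m\r\n> item\x1b[1;6H!"
    print(render(stream))
```

In the usage example the word `loading...` never appears in the result because the program cleared the screen after printing it. A reader of the raw stream would have to work that out; a reader of the snapshot does not.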

tui-use start <cmd>                            # Start a program
tui-use start --cwd <dir> <cmd>                # Start in specific directory
tui-use start --cwd <dir> "<cmd> -flags"       # Quote the full command to pass flags (e.g. git rebase -i)
tui-use start --label <name> <cmd>             # Start with label
tui-use start --cols <n> --rows <n> <cmd>      # Custom terminal size (default: 120x30)
tui-use use <session_id>                       # Switch to a session
tui-use type <text>                            # Type text
tui-use type "<text>\n"                        # Type with Enter
tui-use type "<text>\t"                        # Type with Tab
tui-use paste "<text>\n<text>\n"               # Multi-line paste (each line + Enter)
tui-use press <key>                            # Press a key
tui-use snapshot                               # Get current screen
tui-use snapshot --format json                 # JSON output
tui-use scrollup <n>                           # Scroll up to older content
tui-use scrolldown <n>                         # Scroll down to newer content
tui-use find <pattern>                         # Search in screen (regex)
tui-use wait                                   # Wait for screen change
tui-use wait <ms>                              # Custom timeout (default: 3000ms)
tui-use wait --text <pattern>                  # Wait until screen contains pattern
tui-use wait --format json                     # JSON output
tui-use list                                   # List all sessions
tui-use use <session_id>                       # Switch to a session
tui-use info                                   # Show session details
tui-use rename <label>                         # Rename session
tui-use kill                                   # Kill current session

tui-use daemon status                          # Check if daemon is running
tui-use daemon stop                            # Stop the daemon
tui-use daemon restart                         # Restart the daemon
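
The `wait` commands listed above block until the screen changes or a pattern appears, with a default timeout of 3000 ms. One way to reason about that behavior is as a polling loop with a deadline, sketched below. How tui-use actually implements waiting is not described on this page (it may well be event-driven), so `wait_for_text`, `wait_for_change`, the `get_screen` callback, and the `poll_ms` parameter are all hypothetical.

```python
import re
import time


def wait_for_text(get_screen, pattern, timeout_ms=3000, poll_ms=50):
    """Poll get_screen() until the regex matches or the timeout expires.

    Returns (matched, last_screen).
    """
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        screen = get_screen()
        if regex.search(screen):
            return True, screen
        if time.monotonic() >= deadline:
            return False, screen
        time.sleep(poll_ms / 1000)


def wait_for_change(get_screen, timeout_ms=3000, poll_ms=50):
    """Poll until the screen differs from its initial content or the timeout expires."""
    before = get_screen()
    deadline = time.monotonic() + timeout_ms / 1000
    while time.monotonic() < deadline:
        time.sleep(poll_ms / 1000)
        now = get_screen()
        if now != before:
            return True, now
    return False, before


if __name__ == "__main__":
    # Simulated program that takes a few polls before showing its prompt.
    frames = iter(["Loading", "Loading", "Enter your name: "])
    print(wait_for_text(lambda: next(frames), r"name:", poll_ms=1))
```

Returning the last screen even on timeout lets the caller decide whether to retry, press a key, or give up.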

TUI color/style info is mostly lost — screen contains plain text only; colors and most formatting are stripped. Selected items and active elements are captured in highlights via inverse-video detection. Window title and fullscreen mode are captured in title and is_fullscreen.
Windows not supported — requires Unix PTY (macOS/Linux). Windows support via ConPTY is planned.

The installer automatically detects your platform and uses a prebuilt binary when available. If no compatible prebuild exists, it will automatically rebuild from source (requires build tools).

Build tools (only needed if automatic rebuild fails):

macOS: xcode-select --install
Linux: sudo apt-get install build-essential python3 g++
Windows: Not yet supported

git clone <repo_url>
cd tui-use
npm install
npm run build
npm link

# Try it
tui-use start python3 examples/ask.py
tui-use wait
tui-use type "Alice"
tui-use press enter
tui-use wait
tui-use kill
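
The same sequence can be driven from a script by shelling out to the CLI. The sketch below repeats the commands shown above and adds a `snapshot --format json` call to read the final screen. It assumes the JSON output is an object with a top-level `screen` key; the page names a screen field but does not show the JSON layout, so treat that key as an assumption. The helper names `tui` and `greet` and the injectable `run` parameter are invented for the example.

```python
import json
import subprocess


def tui(*args, run=subprocess.run):
    """Run `tui-use <args...>` and return its standard output."""
    result = run(["tui-use", *args], capture_output=True, text=True, check=True)
    return result.stdout


def greet(name, run=subprocess.run):
    """Start the example program, answer its prompt, and return the final screen text."""
    tui("start", "python3", "examples/ask.py", run=run)
    tui("wait", run=run)
    tui("type", name, run=run)
    tui("press", "enter", run=run)
    tui("wait", run=run)
    # Assumed shape: a JSON object containing a "screen" key with the plain-text screen.
    snapshot = json.loads(tui("snapshot", "--format", "json", run=run))
    tui("kill", run=run)
    return snapshot.get("screen", "")


if __name__ == "__main__":
    print(greet("Alice"))
```

Passing `run` as a parameter keeps the helpers testable without a running daemon: a test can substitute a function that records the argument lists and returns canned output.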

A Claude Code skill is included for running the full integration test suite.

Run the following command in Claude Code:

/tui-use-integration-test

Claude will execute the test suite in order and then report PASS / FAIL for each, with actual screen output on any failure.

MIT License