Show HN: Gemma Gem – AI model embedded in a browser – no API keys, no cloud

Original link: https://github.com/kessler/gemma-gem

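## Gemma Gem: Your local AI assistant

Gemma Gem is a Chrome extension that brings a capable AI assistant directly into your browser, powered by Google's Gemma 4 model. It runs entirely on your device: once the model is downloaded, it needs no internet connection or API key, and it uses WebGPU for fast, private processing.

After installation (which requires a download of about 500MB), Gemma Gem can read web pages, interact with elements (click buttons, fill in forms), and even execute JavaScript. You open it from an icon in the browser and interact with it through a chat interface.

The extension is built from a content script, a service worker, and an offscreen document that hosts the model. It provides tools such as screenshot capture, text extraction, and page scrolling, all controllable through natural language.

Users can toggle the AI's "thinking" process and manage settings such as clearing the context and disabling the extension on specific sites. Development and production builds are available, and detailed logs can be reached through Chrome's extension inspection tools.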

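## Gemma Gem: AI in the browser

A Chrome extension called **Gemma Gem** brings Google's Gemma 4 (2B) model directly into your browser, with no API keys or cloud connection. Developed by ikessler, it lets the model interact with web pages by reading content, taking screenshots, clicking elements, and even running JavaScript, all through a chat overlay.

It works for simple tasks, but complex operations can be unreliable. The project's code is open source, which allows experimentation and potential standalone use. The discussion highlights Chrome's **Prompt API** as a similar approach, along with the possibility of future native web capabilities that use OS-level LLMs.

Some commenters raised concerns about security (granting JS execution permissions) and state persistence (browser crashes), while others pointed to existing web security measures and browser storage options. The project is seen as valuable for privacy, for offline use, and as a building block for local LLM applications, potentially simplifying integration for developers who handle sensitive data.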

Original text

Your personal AI assistant living right inside the browser. Gemma Gem runs Google's Gemma 4 model entirely on-device via WebGPU — no API keys, no cloud, no data leaving your machine. It can read pages, click buttons, fill forms, run JavaScript, and answer questions about any site you visit.

  • Chrome with WebGPU support
  • ~500MB disk for model download (cached after first run)
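The WebGPU requirement can be checked before any model download starts. The sketch below is not taken from the extension; it assumes only the standard `navigator.gpu.requestAdapter()` entry point and uses small structural types (`GpuLike`, `NavigatorLike`, both made up here) so it runs without DOM or WebGPU typings.

```typescript
// Minimal structural types (hypothetical names) standing in for the
// browser's Navigator and GPU objects.
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// True when the WebGPU entry point exists at all.
function hasWebGpuApi(nav: NavigatorLike): boolean {
  return nav.gpu !== undefined;
}

// True when the browser can also provide an adapter. requestAdapter()
// resolves to null when no suitable GPU is available.
async function hasUsableAdapter(nav: NavigatorLike): Promise<boolean> {
  if (nav.gpu === undefined) return false;
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    return false;
  }
}
```

In a page or extension context the argument would be the global `navigator` object; in tests a plain object with a fake `gpu` field is enough.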

Load the extension in chrome://extensions (developer mode) from .output/chrome-mv3-dev/.

  1. Navigate to any page
  2. Click the gem icon (bottom-right corner) to open the chat
  3. Wait for model to load (progress shown on icon + chat)
  4. Ask questions about the page or request actions

Offscreen Document          Service Worker           Content Script
(Gemma 4 + Agent Loop)  <-> (Message Router)    <-> (Chat UI + DOM Tools)
       |                         |
  WebGPU inference          Screenshot capture
  Token streaming           JS execution

  • Offscreen document: Hosts the model via @huggingface/transformers + WebGPU. Runs the agent loop.
  • Service worker: Routes messages between content scripts and offscreen document. Handles take_screenshot and run_javascript.
  • Content script: Injects gem icon + shadow DOM chat overlay. Executes DOM tools (read_page_content, click_element, type_text, scroll_page).
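The routing described above can be sketched as a pure function. The tool names come from the README, but the message shapes (`RouterMessage`, its `kind` values, `tabId`) are assumptions made for illustration and are not the extension's actual protocol.

```typescript
type ToolName =
  | "read_page_content" | "take_screenshot" | "click_element"
  | "type_text" | "scroll_page" | "run_javascript";

type ExecContext = "content-script" | "service-worker";
type RouteTarget = "offscreen" | ExecContext;

// Per the README, these two tools are handled by the service worker;
// the remaining DOM tools run in the content script.
const SERVICE_WORKER_TOOLS: ReadonlySet<ToolName> = new Set<ToolName>([
  "take_screenshot",
  "run_javascript",
]);

function contextFor(tool: ToolName): ExecContext {
  return SERVICE_WORKER_TOOLS.has(tool) ? "service-worker" : "content-script";
}

// Hypothetical message envelope for this sketch.
type RouterMessage =
  | { kind: "user_prompt"; tabId: number; text: string } // chat UI to model
  | { kind: "token"; tabId: number; text: string }       // model to chat UI
  | { kind: "tool_call"; tabId: number; tool: ToolName }; // model to a tool

// Decide where the service worker should forward (or handle) a message.
function routeMessage(msg: RouterMessage): RouteTarget {
  switch (msg.kind) {
    case "user_prompt":
      return "offscreen";
    case "token":
      return "content-script";
    case "tool_call":
      return contextFor(msg.tool);
  }
}
```

Keeping the decision in a pure function like this makes it testable without any `chrome.*` APIs; the actual forwarding would sit in a thin layer around it.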

| Tool | Description | Runs in |
| --- | --- | --- |
| read_page_content | Read text/HTML of the page or a CSS selector | Content script |
| take_screenshot | Capture visible page as PNG | Service worker |
| click_element | Click an element by CSS selector | Content script |
| type_text | Type into an input by CSS selector | Content script |
| scroll_page | Scroll up/down by pixel amount | Content script |
| run_javascript | Execute JS in the page context with full DOM access | Service worker |
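Because a small model can emit malformed tool calls, it may help to validate each call before executing it. The following is a sketch under stated assumptions: the tool names match the list above, but the JSON shape and the argument names (`selector`, `text`, `direction`, `pixels`, `code`) are invented for this example and may differ from what the extension uses.

```typescript
// Argument names are assumptions; only the tool names come from the README.
type ToolCall =
  | { tool: "read_page_content"; selector?: string }
  | { tool: "take_screenshot" }
  | { tool: "click_element"; selector: string }
  | { tool: "type_text"; selector: string; text: string }
  | { tool: "scroll_page"; direction: "up" | "down"; pixels: number }
  | { tool: "run_javascript"; code: string };

type ParseResult = { ok: true; call: ToolCall } | { ok: false; error: string };

const fail = (error: string): ParseResult => ({ ok: false, error });
const done = (call: ToolCall): ParseResult => ({ ok: true, call });
const isText = (v: unknown): v is string => typeof v === "string" && v.length > 0;

// Parse and validate one JSON-encoded tool call.
function parseToolCall(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return fail("not valid JSON");
  }
  if (typeof data !== "object" || data === null) return fail("expected an object");
  const { tool, selector, text, direction, pixels, code } = data as Record<string, unknown>;

  switch (tool) {
    case "take_screenshot":
      return done({ tool: "take_screenshot" });
    case "read_page_content":
      if (selector === undefined) return done({ tool: "read_page_content" });
      return isText(selector)
        ? done({ tool: "read_page_content", selector })
        : fail("selector must be a non-empty string");
    case "click_element":
      return isText(selector) ? done({ tool: "click_element", selector }) : fail("selector is required");
    case "type_text":
      if (!isText(selector)) return fail("selector is required");
      return typeof text === "string"
        ? done({ tool: "type_text", selector, text })
        : fail("text is required");
    case "scroll_page": {
      const dir = direction === "up" ? "up" : direction === "down" ? "down" : null;
      if (dir === null) return fail("direction must be up or down");
      if (typeof pixels !== "number" || !Number.isFinite(pixels) || pixels <= 0) {
        return fail("pixels must be a positive number");
      }
      return done({ tool: "scroll_page", direction: dir, pixels });
    }
    case "run_javascript":
      return isText(code) ? done({ tool: "run_javascript", code }) : fail("code is required");
    default:
      return fail("unknown tool");
  }
}
```

Returning a result object instead of throwing lets the caller feed the error text back to the model as a tool result, so the model can retry with corrected arguments.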

Click the gear icon in the chat header:

  • Thinking: Toggle native Gemma 4 chain-of-thought reasoning
  • Max iterations: Cap on tool call loops per request
  • Clear context: Reset conversation history for the current page
  • Disable on this site: Disable the extension per-hostname (persisted)
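The per-hostname disable setting can be sketched as a small class keyed on `URL.hostname`. The README does not say how the setting is stored, so the persistence layer is left out: `SiteSettings` and `hostnameOf` are hypothetical names, and `serialize()` simply returns an array that a caller could write to whatever storage it uses (for example `chrome.storage.local`).

```typescript
// Extract the hostname a setting applies to; null for strings that are not URLs.
function hostnameOf(url: string): string | null {
  try {
    return new URL(url).hostname;
  } catch {
    return null;
  }
}

// Hypothetical in-memory model of the "Disable on this site" setting.
class SiteSettings {
  private readonly disabledHosts: Set<string>;

  constructor(saved: readonly string[] = []) {
    this.disabledHosts = new Set(saved);
  }

  // Enable or disable the assistant for the hostname of the given URL.
  setDisabled(url: string, disabled: boolean): void {
    const host = hostnameOf(url);
    if (host === null) return;
    if (disabled) this.disabledHosts.add(host);
    else this.disabledHosts.delete(host);
  }

  // Any URL on a disabled hostname counts as disabled, regardless of path.
  isDisabled(url: string): boolean {
    const host = hostnameOf(url);
    return host !== null && this.disabledHosts.has(host);
  }

  // Sorted plain array, suitable for saving and passing back to the constructor.
  serialize(): string[] {
    return Array.from(this.disabledHosts).sort();
  }
}
```

Keying on the hostname rather than the full URL matches the "per-hostname" wording: disabling the extension on one page of a site disables it on every page of that site, while subdomains are treated as separate sites.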

pnpm build              # Development build (with logging, source maps)
pnpm build:prod         # Production build (logging silenced, minified)

  • WXT — Chrome extension framework (Vite-based)
  • @huggingface/transformers — Browser ML inference
  • marked — Markdown rendering in chat
  • Gemma 4 E2B (onnx-community/gemma-4-E2B-it-ONNX) — q4f16 quantization, 128K context

All logs are prefixed with [Gemma Gem]. In development builds, info/debug/warn logs are active. Production builds only log errors.
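The logging behavior described above (a fixed prefix, with only errors surviving in production) could be implemented roughly as follows. This is an illustrative sketch, not the extension's logger: `createLogger`, `LogLevel`, `LogSink`, and the `isDev` flag are names chosen for this example.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";
type LogSink = (level: LogLevel, line: string) => void;

const LOG_PREFIX = "[Gemma Gem]";

// Default sink: forward to the matching console method.
const consoleSink: LogSink = (level, line) => console[level](line);

// Build a logger. In production (isDev === false) only errors are emitted.
function createLogger(isDev: boolean, sink: LogSink = consoleSink) {
  const emit = (level: LogLevel, message: string): void => {
    if (!isDev && level !== "error") return;
    sink(level, `${LOG_PREFIX} ${message}`);
  };
  return {
    debug: (message: string) => emit("debug", message),
    info: (message: string) => emit("info", message),
    warn: (message: string) => emit("warn", message),
    error: (message: string) => emit("error", message),
  };
}
```

A build tool would typically supply the `isDev` flag at compile time; the injectable sink exists only so the filtering can be tested without touching the real console.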

  • Service worker logs: chrome://extensions → Gemma Gem → "Inspect views: service worker"
  • Offscreen document logs: chrome://extensions → Gemma Gem → "Inspect views: offscreen.html"
  • Content script logs: Open DevTools on any page → Console
  • All extension pages: chrome://inspect#other lists all inspectable extension contexts (service worker, offscreen document, etc.)

The offscreen document logs are the most useful — they show model loading, prompt construction, token counts, raw model output, and tool execution.

The agent/ directory has zero dependencies. It defines interfaces (ModelBackend, ToolExecutor) and can be extracted to a standalone library.
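To show how two such interfaces can keep an agent loop free of dependencies, here is a minimal sketch. Only the names `ModelBackend` and `ToolExecutor` come from the README; their method signatures, the `ChatMessage` and `ModelReply` types, and `runAgent` are assumptions made for illustration and will differ from the real agent/ code.

```typescript
// Hypothetical shapes; the real interfaces in agent/ may look different.
interface ChatMessage {
  role: "user" | "assistant" | "tool";
  content: string;
}

type ModelReply =
  | { type: "answer"; text: string }
  | { type: "tool_call"; name: string; args: Record<string, unknown> };

interface ModelBackend {
  generate(history: ChatMessage[]): Promise<ModelReply>;
}

interface ToolExecutor {
  execute(name: string, args: Record<string, unknown>): Promise<string>;
}

// Ask the model, run any tool it requests, feed the result back, and stop
// after maxIterations tool calls (the "Max iterations" setting).
async function runAgent(
  model: ModelBackend,
  tools: ToolExecutor,
  prompt: string,
  maxIterations: number,
): Promise<string> {
  const history: ChatMessage[] = [{ role: "user", content: prompt }];
  for (let i = 0; i < maxIterations; i++) {
    const reply = await model.generate(history);
    if (reply.type === "answer") return reply.text;
    const result = await tools.execute(reply.name, reply.args);
    history.push({
      role: "assistant",
      content: JSON.stringify({ tool: reply.name, args: reply.args }),
    });
    history.push({ role: "tool", content: result });
  }
  return "Stopped: reached the tool-call limit before producing an answer.";
}
```

Because the loop touches only the two interfaces, the same code could run against the WebGPU model in the offscreen document or against a scripted fake in tests.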

Gemma Gem in action
