Show HN: Gemini can now natively embed video, so I built sub-second video search

Original link: https://github.com/ssrajadh/sentrysearch

## SentrySearch: Semantic Search over Dashcam Footage

SentrySearch lets you search dashcam video quickly using natural language. It works by splitting videos into chunks, embedding each chunk as video data with Google's Gemini Embedding model, and storing the embeddings in a local ChromaDB database. You type a query (e.g. "red truck running a red light"), the query is embedded as well, and matched against the stored video embeddings. The most relevant segment is automatically trimmed and saved as a clip.

**Key features:**

* **Direct video embedding:** No transcription or captioning needed – Gemini processes video pixels directly.
* **Cost optimization:** Preprocessing (downscaling to 480p/5fps) and still-frame skipping reduce API costs (~$2.50 to index 1 hour).
* **Easy setup:** Clone the GitHub repo ([https://github.com/ssrajadh/sentrysearch](https://github.com/ssrajadh/sentrysearch)), install the dependencies, and provide a Gemini API key.
* **Customizable:** Chunk duration, overlap, and preprocessing can all be adjusted.

Currently in preview, SentrySearch supports MP4 video and relies on a heuristic for still-frame detection. Future improvements aim at smarter chunking and at accommodating potential API changes.

A developer built a sub-second video search tool on Google's Gemini Embedding 2, which processes video directly into vector embeddings with *no* transcription or frame captioning. This lets natural-language queries – for example, "a green car overtook and cut in front of me" – be compared directly against video content. The tool indexes footage into ChromaDB for fast search and automatic clip trimming. Indexing costs roughly $2.50 per hour of footage, and less for footage with static periods (such as surveillance recordings). Early discussion on Hacker News highlighted use cases beyond dashcams, including home monitoring and surveillance, with commenters calling the possibilities for security applications "alarming". The developer plans to add a confidence threshold to improve search accuracy; currently the closest match is returned even at low confidence. The project's code is available on GitHub.

Original article

Semantic search over dashcam footage. Type what you're looking for, get a trimmed clip back.


(Demo video: demo.mp4)

SentrySearch splits your dashcam videos into overlapping chunks, embeds each chunk directly as video using Google's Gemini Embedding model, and stores the vectors in a local ChromaDB database. When you search, your text query is embedded into the same vector space and matched against the stored video embeddings. The top match is automatically trimmed from the original file and saved as a clip.
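
The chunking step above can be sketched as a simple sliding window. This is an illustrative reimplementation, not the repo's actual code, using the default 30-second chunks with 5-second overlap described later in this README:

```python
def chunk_windows(duration_s, chunk_s=30, overlap_s=5):
    """Compute (start, end) offsets in seconds for overlapping chunks.

    Each window is chunk_s long (the last may be shorter), and each new
    window starts overlap_s before the previous one ends.
    """
    step = chunk_s - overlap_s
    windows = []
    start = 0
    while start < duration_s:
        windows.append((start, min(start + chunk_s, duration_s)))
        start += step
    return windows

# A 60-second file yields three overlapping windows:
print(chunk_windows(60))  # [(0, 30), (25, 55), (50, 60)]
```

The overlap exists so that an event straddling a chunk boundary still appears whole in at least one window.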

  1. Clone and install:

```shell
git clone https://github.com/ssrajadh/sentrysearch.git
cd sentrysearch
python -m venv venv && source venv/bin/activate
pip install -e .
```

  2. Set up your API key:

```shell
sentrysearch init
```

This prompts for your Gemini API key, writes it to `.env`, and validates it with a test embedding.

  3. Index your footage:

```shell
sentrysearch index /path/to/dashcam/footage
```

  4. Search:

```shell
sentrysearch search "red truck running a stop sign"
```

ffmpeg is required for video chunking and trimming. If you don't have it system-wide, the bundled imageio-ffmpeg is used automatically.
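
For reference, trimming a matched window with ffmpeg can be done as a stream copy, which avoids re-encoding. This is a minimal sketch of such a command builder, not the repo's exact invocation:

```python
def build_trim_cmd(src, start_s, end_s, out_path):
    """Build an ffmpeg argument list that cuts [start_s, end_s) out of src.

    Placing -ss before -i makes ffmpeg seek quickly to the start offset;
    -c copy copies the streams without re-encoding.
    """
    return [
        "ffmpeg", "-y",
        "-ss", str(start_s),
        "-i", src,
        "-t", str(end_s - start_s),
        "-c", "copy",
        out_path,
    ]

# Usage (hypothetical filenames):
#   subprocess.run(build_trim_cmd("front.mp4", 135, 165, "match.mp4"), check=True)
```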

Manual setup: If you prefer not to use sentrysearch init, you can copy .env.example to .env and add your key from aistudio.google.com/apikey manually.

```
$ sentrysearch init
Enter your Gemini API key (get one at https://aistudio.google.com/apikey): ****
Validating API key...
Setup complete. You're ready to go — run `sentrysearch index <directory>` to get started.
```

If a key is already configured, you'll be asked whether to overwrite it.

```
$ sentrysearch index /path/to/dashcam/footage
Indexing file 1/3: front_2024-01-15_14-30.mp4 [chunk 1/4]
Indexing file 1/3: front_2024-01-15_14-30.mp4 [chunk 2/4]
...
Indexed 12 new chunks from 3 files. Total: 12 chunks from 3 files.
```

Options:

  • --chunk-duration 30 — seconds per chunk
  • --overlap 5 — overlap between chunks
  • --no-preprocess — skip downscaling/frame rate reduction (send raw chunks)
  • --target-resolution 480 — target height in pixels for preprocessing
  • --target-fps 5 — target frame rate for preprocessing
  • --no-skip-still — embed all chunks, even ones with no visual change
```
$ sentrysearch search "red truck running a stop sign"
  #1 [0.87] front_2024-01-15_14-30.mp4 @ 02:15-02:45
  #2 [0.74] left_2024-01-15_14-30.mp4 @ 02:10-02:40
  #3 [0.61] front_2024-01-20_09-15.mp4 @ 00:30-01:00

Saved clip: ./match_front_2024-01-15_14-30_02m15s-02m45s.mp4
```

Options: --results N, --output-dir DIR, --no-trim to skip auto-trimming.

```
$ sentrysearch stats
Total chunks:  47
Source files:  12
```

Add --verbose to either command for debug info (embedding dimensions, API response times, similarity scores).

Gemini Embedding 2 can natively embed video — raw video pixels are projected into the same 768-dimensional vector space as text queries. There's no transcription, no frame captioning, no text middleman. A text query like "red truck at a stop sign" is directly comparable to a 30-second video clip at the vector level. This is what makes sub-second semantic search over hours of footage practical.
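
Matching in that shared vector space boils down to cosine similarity between the query embedding and each stored chunk embedding (ChromaDB handles this internally). A stdlib-only sketch of the underlying measure:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Vectors pointing the same way score 1.0; orthogonal ones score 0.0:
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

The scores in the search output above (e.g. `[0.87]`) are similarity values of this kind.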

Indexing 1 hour of footage costs ~$2.50 with Gemini's embedding API (default settings: 30s chunks, 5s overlap). The API bills by video duration, so this cost is driven by the number of chunks, not file size.
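
As a back-of-envelope check on that figure: with 30 s chunks and 5 s overlap, each new window starts 25 s after the previous one, so an hour of footage produces about 144 chunks. Assuming billing is uniform per chunk (an assumption, not a documented rate):

```python
import math

CHUNK_S, OVERLAP_S = 30, 5           # defaults quoted in this README
step = CHUNK_S - OVERLAP_S           # window start advances 25 s per chunk

chunks_per_hour = math.ceil(3600 / step)
print(chunks_per_hour)                     # 144
print(round(2.50 / chunks_per_hour, 4))    # ~0.0174 dollars per chunk
```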

Two built-in optimizations help reduce costs in different ways:

  • Preprocessing (on by default) — chunks are downscaled to 480p at 5fps before embedding. This reduces upload size and token count but does not reduce the number of API calls, so it primarily improves speed rather than cost.
  • Still-frame skipping (on by default) — chunks with no meaningful visual change (e.g. a parked car) are skipped entirely. This saves real API calls and directly reduces cost. The savings depend on your footage — Sentry Mode recordings with hours of idle time benefit the most, while action-packed driving footage may have nothing to skip.
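
The still-frame check is described later in this README as a JPEG file-size comparison across sampled frames: when nothing in the scene moves, the compressed size of each frame barely changes. A minimal sketch of that idea, with a hypothetical `tolerance` threshold rather than the repo's actual constant:

```python
def looks_static(jpeg_sizes_bytes, tolerance=0.05):
    """Heuristic still-frame check.

    If the JPEG-encoded sizes of sampled frames vary by less than
    `tolerance` (relative to the largest), assume there is no motion
    and the chunk can be skipped. `tolerance` is illustrative only.
    """
    lo, hi = min(jpeg_sizes_bytes), max(jpeg_sizes_bytes)
    return (hi - lo) / hi < tolerance

print(looks_static([101_200, 100_900, 101_050]))  # True  (e.g. a parked car)
print(looks_static([101_200, 143_700]))           # False (scene changed)
```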

Search queries are negligible (text embedding only).

Tuning options:

  • --chunk-duration / --overlap — longer chunks with less overlap = fewer API calls = lower cost
  • --no-skip-still — embed every chunk even if nothing is happening
  • --target-resolution / --target-fps — adjust preprocessing quality
  • --no-preprocess — send raw chunks to the API

## Limitations & Future Work

  • Still-frame detection is heuristic — it uses JPEG file size comparison across sampled frames. It may occasionally skip chunks with subtle motion or embed chunks that are truly static. Disable with --no-skip-still if you need every chunk indexed.
  • Search quality depends on chunk boundaries — if an event spans two chunks, the overlapping window helps but isn't perfect. Smarter chunking (e.g. scene detection) could improve this.
  • Gemini Embedding 2 is in preview — API behavior and pricing may change.

This works with any footage in mp4 format, not just Tesla Sentry Mode. The directory scanner recursively finds all .mp4 files regardless of folder structure.

Requirements:

  • Python 3.10+
  • ffmpeg on PATH, or the bundled ffmpeg via imageio-ffmpeg (installed by default)
  • Gemini API key (get one free)