A lightweight toolkit for quantitatively scoring LeRobot episodes.
A toolkit for evaluating and filtering LeRobot episode datasets. It combines classic computer-vision heuristics (blur/exposure tests, kinematic smoothness, collision spikes) with optional Gemini-powered vision-language checks to give each episode a 0–1 score across multiple quality dimensions.
Use this toolkit to:
- Automatically score robot demonstration episodes on visual clarity, motion smoothness, collision detection, and more
- Filter low-quality episodes to improve downstream training performance
- Train and compare baseline vs. filtered dataset models
- Visualize score distributions and identify problematic episodes
| Dimension | Function | What it measures |
|---|---|---|
| Visual clarity | `score_visual_clarity` | Blur, over-/under-exposure, low-light frames |
| Smoothness | `score_smoothness` | 2nd derivative of joint angles |
| Path efficiency | `score_path_efficiency` | Ratio of straight-line vs. actual joint-space path |
| Collision / spikes | `score_collision` | Sudden acceleration outliers (proxy for contacts) |
| Joint stability (final 2 s) | `score_joint_stability` | Stillness at the goal pose |
| Gripper consistency | `score_gripper_consistency` | Binary "closed vs. holding" agreement |
| Actuator saturation | `score_actuator_saturation` | Difference between commanded actions and achieved states |
| Task success (VLM) | `score_task_success` (via `VLMInterface`) | Gemini grades whether the desired behavior happened |
| Runtime penalty / outliers | `score_runtime` + `build_time_stats`, `is_time_outlier` | Episode length vs. nominal / Tukey-IQR / Z-score fences |
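The kinematic scores share a common pattern: compute a per-frame statistic from the joint trajectory, then squash it into a 0–1 score. A minimal sketch of that idea for smoothness (2nd derivative of joint angles) and for the Tukey-IQR runtime fence is below; the function names echo the table, but the actual normalization in `score_dataset.py` may differ.

```python
import numpy as np

def score_smoothness_sketch(joints: np.ndarray, fps: float = 30.0) -> float:
    """Toy smoothness score: large joint-angle accelerations lower the score.

    joints: (T, D) array of joint angles over T frames.
    Illustrative only; the real score_smoothness may normalize differently.
    """
    dt = 1.0 / fps
    accel = np.diff(joints, n=2, axis=0) / dt**2   # 2nd derivative per joint
    rms = np.sqrt(np.mean(accel**2))               # one number per episode
    return float(1.0 / (1.0 + rms))                # squash into (0, 1]

def is_time_outlier_sketch(durations: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Tukey-IQR fence: flag episodes whose length falls outside
    [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(durations, [25, 75])
    iqr = q3 - q1
    return (durations < q1 - k * iqr) | (durations > q3 + k * iqr)
```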
- Python 3.8 or higher
- pip package manager
1. Clone the repository

   ```bash
   git clone https://github.com/RoboticsData/score_lerobot_episodes.git
   cd score_lerobot_episodes
   ```

2. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

3. Set up API keys (optional)

   Only required if using VLM-based scoring with Gemini:

   ```bash
   export GOOGLE_API_KEY="your-api-key-here"
   ```

   Note: The Gemini API's free-tier rate limits are fairly restrictive; depending on episode length, you may need a paid tier. Check the Gemini API rate limits for more info.
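If you want to confirm the key works before a full scoring run, a quick sanity check with the `google-generativeai` client looks like the sketch below; the model name is an assumption, so substitute whichever Gemini model your quota covers.

```python
import os
import google.generativeai as genai

# Configure the client from the same environment variable set above.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Smoke test: any cheap text-only call confirms the key is valid.
model = genai.GenerativeModel("gemini-1.5-flash")  # model choice is an assumption
response = model.generate_content("Reply with OK if you can read this.")
print(response.text)
```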
Score a dataset and save results:

```bash
python score_dataset.py \
  --repo_id lerobot/aloha_static_pro_pencil \
  --output ./output/lerobot/aloha_static_pro_pencil \
  --threshold 0.5
```

This will:

- Download and load the dataset from HuggingFace
- Score each episode across multiple quality dimensions
- Save scores to the output path
- Keep only episodes with an aggregate score >= 0.5
- Save the filtered dataset to the output directory
Command-line options:

- `--repo_id`: HuggingFace repository ID for the dataset (e.g., `username/dataset-name`)
- `--root`: Local path to dataset root (default: downloads from HuggingFace Hub)
- `--output`: Output directory for the filtered dataset (default: None, no filtering)
- `--threshold`: Minimum aggregate score to keep episodes (default: 0.5, range: 0.0-1.0)
- `--nominal`: Expected episode duration in seconds (used for runtime scoring)
- `--vision_type`: Vision scoring method, choices: `opencv` (default), `vlm_gemini`
- `--policy_name`: Policy type for training (default: `act`)
- `--overwrite`: Overwrite existing filtered dataset (default: True)
- `--overwrite_checkpoint`: Overwrite existing training checkpoints (default: False)
- `--train-baseline`: Train model on the unfiltered dataset (default: False)
- `--train-filtered`: Train model on the filtered dataset (default: False)
- `--plot`: Display score distribution plots in the terminal (default: False)
Score a dataset without filtering:

```bash
python score_dataset.py --repo_id username/my-robot-dataset
```

Score and filter with a custom threshold:

```bash
python score_dataset.py \
  --repo_id username/my-robot-dataset \
  --output ./output/username/my-robot-dataset \
  --threshold 0.6
```

Use Gemini-based VLM scoring:

```bash
export GOOGLE_API_KEY="your-key"
python score_dataset.py \
  --repo_id username/my-robot-dataset \
  --vision_type vlm_gemini \
  --output ./filtered_data
```

Score, filter, and train both baseline and filtered models:

```bash
python score_dataset.py \
  --repo_id username/my-robot-dataset \
  --output ./output/username/my-robot-dataset \
  --threshold 0.5 \
  --train-baseline True \
  --train-filtered True \
  --policy_name act
```

Score with a stricter threshold and plot score distributions:

```bash
python score_dataset.py \
  --repo_id username/my-robot-dataset \
  --threshold 0.7 \
  --plot True
```

Score a local dataset:

```bash
python score_dataset.py \
  --repo_id username/my-robot-dataset \
  --root /path/to/local/dataset \
  --output ./filtered_output
```

Scores are saved to `results/{repo_id}_scores.json`:

```json
[
{
"episode_id": 0,
"camera_type": "camera_0",
"video_path": "/path/to/video.mp4",
"aggregate_score": 0.752,
"per_attribute_scores": {
"visual_clarity": 0.85,
"smoothness": 0.78,
"collision": 0.92,
"runtime": 0.65
}
},
...
]
```
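Since the schema above is stable, downstream tooling can consume it directly. A minimal sketch that loads a scores file and lists episodes below a chosen threshold; exactly how the repo_id is encoded in the filename is an assumption here, so check your `results/` directory for the actual name.

```python
import json

# Path follows the results/{repo_id}_scores.json convention; the exact
# filename encoding of the repo_id is an assumption.
with open("results/username_my-robot-dataset_scores.json") as f:
    scores = json.load(f)

threshold = 0.5
for ep in (e for e in scores if e["aggregate_score"] < threshold):
    # Identify the weakest quality dimension for each failing episode.
    worst = min(ep["per_attribute_scores"], key=ep["per_attribute_scores"].get)
    print(f"episode {ep['episode_id']}: {ep['aggregate_score']:.3f} "
          f"(weakest dimension: {worst})")
```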
The console also displays a formatted table showing scores for each episode:

```
Episode scores (0–1 scale)
─────────────────────────────────────────────────────────────────
Episode Camera visual_clarity smoothness collision runtime Aggregate Status
0 camera_0 0.850 0.780 0.920 0.650 0.752 GOOD
1 camera_1 0.420 0.650 0.710 0.580 0.590 BAD
...
─────────────────────────────────────────────────────────────────
Average aggregate over 20 videos: 0.671
Percentage of episodes removed: 0.25, total: 5
```
When `--output` is set, a new filtered dataset is created containing only episodes scoring at or above the threshold, preserving the original LeRobot dataset structure.
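Because the LeRobot layout is preserved, the filtered copy should load like any other dataset. A sketch, assuming the standard `LeRobotDataset` loader from the `lerobot` package (the import path matches recent lerobot releases; yours may differ) and the output path from the examples above:

```python
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# root points at the filtered copy written by --output;
# repo_id is the original dataset ID.
dataset = LeRobotDataset(
    repo_id="username/my-robot-dataset",
    root="./output/username/my-robot-dataset",
)
print(f"{dataset.num_episodes} episodes survived filtering")
```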
```
score_lerobot_episodes/
├── score_dataset.py      # Main scoring script
├── data.py               # Dataset loading and filtering utilities
├── vlm.py                # Vision-Language Model interface (Gemini)
├── train.py              # Training pipeline integration
├── evaluation.py         # Evaluation utilities
├── corrupt.py            # Data corruption tools for robustness testing
├── ui.py                 # Streamlit web interface (if available)
├── requirements.txt      # Python dependencies
├── README.md             # This file
├── CONTRIBUTING.md       # Contribution guidelines
├── LICENSE               # Apache 2.0 license
├── results/              # Generated score JSON files
├── output/               # Filtered datasets
└── checkpoints/          # Training checkpoints
```
The toolkit integrates with LeRobot's training pipeline to compare baseline vs. filtered dataset performance.
1. Baseline Training: Train on the original unfiltered dataset

   ```bash
   python score_dataset.py \
     --repo_id username/dataset \
     --train-baseline True
   ```

2. Filtered Training: Train on the quality-filtered dataset

   ```bash
   python score_dataset.py \
     --repo_id username/dataset \
     --output ./filtered_data \
     --threshold 0.6 \
     --train-filtered True
   ```

3. Compare Both: Run both training pipelines in one command

   ```bash
   python score_dataset.py \
     --repo_id username/dataset \
     --output ./filtered_data \
     --train-baseline True \
     --train-filtered True
   ```
- Default policy: ACT (Action Chunking Transformer)
- Default steps: 10,000
- Batch size: 4
- Checkpoints saved to `./checkpoints/{job_name}/`
- WandB logging enabled by default

You can customize training parameters by modifying `train.py`.
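The internals of `train.py` aren't shown here, but the knobs correspond to the defaults listed above. Purely as an illustration of the values to look for (the names below are hypothetical, not the file's actual variables):

```python
# Hypothetical names; open train.py to find the real counterparts.
TRAINING_DEFAULTS = {
    "policy_name": "act",    # ACT: Action Chunking Transformer
    "steps": 10_000,         # total training steps
    "batch_size": 4,         # reduce if you hit out-of-memory errors
    "output_dir": "./checkpoints/{job_name}/",
    "wandb_enable": True,    # WandB logging on by default
}
```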
1. ModuleNotFoundError: No module named 'google.generativeai'
   - Solution: Install dependencies with `pip install -r requirements.txt`
   - If using VLM scoring, ensure `google-generativeai` is installed

2. API rate limit errors with Gemini
   - Solution: The free tier has restrictive limits. Consider:
     - Using `--vision_type opencv` instead
     - Upgrading to a paid Gemini API tier
     - Processing smaller batches

3. All episodes filtered out
   - Error: `ValueError: All episodes filtered out, decrease threshold to fix this`
   - Solution: Lower the `--threshold` value (e.g., from 0.5 to 0.3)

4. Dataset not found
   - Solution:
     - Verify the `--repo_id` is correct
     - Check internet connection for HuggingFace Hub access
     - Use `--root` to specify a local dataset path

5. Out of memory during training
   - Solution: Reduce `batch_size` in `train.py:44` or use a smaller model

6. Permission errors when overwriting
   - Solution: Use `--overwrite True` or manually delete the output directory
We welcome contributions! Please see CONTRIBUTING.md for guidelines on:
- Setting up a development environment
- Code style and conventions
- Submitting pull requests
- Reporting issues
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
LeRobot Episode Scoring Toolkit is distributed under the Apache 2.0 License. See LICENSE for more information.