Show HN: Automate Robot Data Quality Improvement

Original link: https://github.com/RoboticsData/score_lerobot_episodes

## LeRobot Episode Scoring Toolkit: Summary

This toolkit provides a way to automatically evaluate the quality of LeRobot demonstration episodes using classic computer-vision metrics and optional Gemini-powered vision-language model (VLM) checks. It scores episodes along dimensions such as visual clarity, smoothness, collision detection, runtime, and task success, assigning each dimension a 0–1 score.

It lets users filter out low-quality episodes to improve downstream training and compare the performance of models trained on filtered versus unfiltered datasets. Key features include scoring a dataset, filtering episodes by a user-defined threshold, and integration with LeRobot's training pipeline.

Users can choose OpenCV-based vision scoring or VLM-based analysis via Gemini (requires a Google API key). The toolkit installs easily via pip and exposes command-line arguments for customization, including dataset location, output path, and training options. It generates detailed score reports and visualizations to help identify problematic episodes and improve dataset quality.

Hacker News: Show HN: Automate Robot Data Quality Improvement (github.com/roboticsdata) — 9 points by machinelearning 1 day ago | 1 comment

marshavoidance 1 day ago: This tool "scores" robot demonstration episodes by analyzing blur, collisions, and motion smoothness, then filters the bad episodes out of the dataset. It seems like a pragmatic approach to the data-quality problem in robotics; I'm looking forward to seeing how it performs in real training.

Original Article

LeRobot Episode Scoring Toolkit

A lightweight toolkit for quantitatively scoring LeRobot episodes.

License: Apache 2.0 Python 3.8+ GitHub stars


A comprehensive toolkit for evaluating and filtering LeRobot episode datasets. It combines classic computer-vision heuristics (blur/exposure tests, kinematic smoothness, collision spikes) with optional Gemini-powered vision-language checks to give each episode a 0–1 score across multiple quality dimensions.
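To make the "blur test" idea concrete, here is a minimal sketch of a variance-of-Laplacian blur heuristic, a common computer-vision technique for this purpose. The function names, the 4-neighbor Laplacian kernel, and the `sharp_threshold` value are illustrative assumptions, not the toolkit's actual implementation (which uses OpenCV).

```python
# Hypothetical sketch of a blur heuristic in the spirit of score_visual_clarity:
# variance of the Laplacian response; sharp frames yield high variance.
def laplacian_variance(gray):
    """gray: 2D list of pixel intensities (0-255)."""
    responses = []
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            # 4-neighbor discrete Laplacian
            lap = (gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1]
                   + gray[y][x + 1] - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def score_blur(gray, sharp_threshold=100.0):
    """Map Laplacian variance to a 0-1 clarity score (capped at 1)."""
    return min(laplacian_variance(gray) / sharp_threshold, 1.0)
```

A uniformly flat frame scores 0 (no high-frequency detail), while a frame with sharp edges saturates toward 1.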

Use this toolkit to:

  • Automatically score robot demonstration episodes on visual clarity, motion smoothness, collision detection, and more
  • Filter low-quality episodes to improve downstream training performance
  • Train and compare baseline vs. filtered dataset models
  • Visualize score distributions and identify problematic episodes


| Dimension | Function | What it measures |
| --- | --- | --- |
| Visual clarity | `score_visual_clarity` | Blur, over/under-exposure, low-light frames |
| Smoothness | `score_smoothness` | 2nd derivative of joint angles |
| Path efficiency | `score_path_efficiency` | Ratio of straight-line vs. actual joint-space path |
| Collision / spikes | `score_collision` | Sudden acceleration outliers (proxy for contacts) |
| Joint stability (final 2 s) | `score_joint_stability` | Stillness at the goal pose |
| Gripper consistency | `score_gripper_consistency` | Binary "closed vs. holding" agreement |
| Actuator saturation | `score_actuator_saturation` | Difference between commanded actions and achieved states |
| Task success (VLM) | `score_task_success` (via `VLMInterface`) | Gemini grades whether the desired behavior happened |
| Runtime penalty / outliers | `score_runtime` + `build_time_stats`, `is_time_outlier` | Episode length vs. nominal / Tukey-IQR / Z-score fences |
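Two of these dimensions can be sketched in a few lines each. The function signatures, the `1/(1+x)` squashing, and the quartile approximation below are assumptions for illustration, not the toolkit's exact formulas.

```python
# Hypothetical sketches of the smoothness and runtime-outlier heuristics.
def score_smoothness(joint_angles, fps=30.0):
    """Score a joint trajectory by its 2nd derivative (acceleration).

    Smooth motion -> small accelerations -> score near 1.
    """
    dt = 1.0 / fps
    accel = [(joint_angles[i + 1] - 2 * joint_angles[i] + joint_angles[i - 1])
             / dt ** 2
             for i in range(1, len(joint_angles) - 1)]
    mean_abs = sum(abs(a) for a in accel) / len(accel)
    return 1.0 / (1.0 + mean_abs)

def is_time_outlier(duration, durations, k=1.5):
    """Tukey-IQR fence: flag episodes outside [Q1 - k*IQR, Q3 + k*IQR]."""
    s = sorted(durations)
    q1 = s[len(s) // 4]          # crude quartile approximation
    q3 = s[(3 * len(s)) // 4]
    iqr = q3 - q1
    return duration < q1 - k * iqr or duration > q3 + k * iqr
```

A constant-velocity trajectory has zero acceleration and scores 1.0; a jerky, oscillating one scores near 0.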

  • Python 3.8 or higher
  • pip package manager
  1. Clone the repository

    git clone https://github.com/RoboticsData/score_lerobot_episodes.git
    cd score_lerobot_episodes
  2. Install dependencies

    pip install -r requirements.txt
  3. Set up API keys (optional)

    Only required if using VLM-based scoring with Gemini:

    export GOOGLE_API_KEY="your-api-key-here"

    Note: The Gemini API's free-tier rate limits are fairly restrictive, so you may need a paid tier depending on episode length. Check the Gemini API rate limits for more info.


Score a dataset and save results:

python score_dataset.py \
  --repo_id lerobot/aloha_static_pro_pencil \
  --output ./output/lerobot/aloha_static_pro_pencil \
  --threshold 0.5

This will:

  1. Download and load the dataset from HuggingFace
  2. Score each episode across multiple quality dimensions
  3. Save scores to output path
  4. Filter episodes with aggregate score >= 0.5
  5. Save the filtered dataset to the output directory
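The filter step (step 4 above) amounts to a simple threshold on the aggregate score. Here is a minimal sketch assuming score records shaped like the JSON report shown later in this README; the function name is illustrative, and the error message mirrors the one listed in Troubleshooting.

```python
# Minimal sketch of threshold-based episode filtering (not the toolkit's code).
def filter_episodes(scores, threshold=0.5):
    """Return episode ids whose aggregate score meets the threshold."""
    keep = [s["episode_id"] for s in scores
            if s["aggregate_score"] >= threshold]
    if not keep:
        raise ValueError(
            "All episodes filtered out, decrease threshold to fix this")
    return keep
```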

  • --repo_id: HuggingFace repository ID for the dataset (e.g., username/dataset-name)
  • --root: Local path to dataset root (default: downloads from HuggingFace Hub)
  • --output: Output directory for filtered dataset (default: None, no filtering)
  • --threshold: Minimum aggregate score to keep episodes (default: 0.5, range: 0.0-1.0)
  • --nominal: Expected episode duration in seconds (used for runtime scoring)
  • --vision_type: Vision scoring method, choices: opencv (default), vlm_gemini
  • --policy_name: Policy type for training (default: act)
  • --overwrite: Overwrite existing filtered dataset (default: True)
  • --overwrite_checkpoint: Overwrite existing training checkpoints (default: False)
  • --train-baseline: Train model on unfiltered dataset (default: False)
  • --train-filtered: Train model on filtered dataset (default: False)
  • --plot: Display score distribution plots in terminal (default: False)

1. Basic scoring (no filtering)

python score_dataset.py --repo_id username/my-robot-dataset

2. Score and filter dataset

python score_dataset.py \
  --repo_id username/my-robot-dataset \
  --output ./output/username/my-robot-dataset \
  --threshold 0.6

3. Score with VLM-based vision analysis

export GOOGLE_API_KEY="your-key"
python score_dataset.py \
  --repo_id username/my-robot-dataset \
  --vision_type vlm_gemini \
  --output ./filtered_data

4. Score, filter, and train both baseline and filtered models

python score_dataset.py \
  --repo_id username/my-robot-dataset \
  --output ./output/username/my-robot-dataset \
  --threshold 0.5 \
  --train-baseline True \
  --train-filtered True \
  --policy_name act

5. Visualize distributions

python score_dataset.py \
  --repo_id username/my-robot-dataset \
  --threshold 0.7 \
  --plot True

6. Use local dataset instead of downloading

python score_dataset.py \
  --repo_id username/my-robot-dataset \
  --root /path/to/local/dataset \
  --output ./filtered_output

Saved to results/{repo_id}_scores.json:

[
  {
    "episode_id": 0,
    "camera_type": "camera_0",
    "video_path": "/path/to/video.mp4",
    "aggregate_score": 0.752,
    "per_attribute_scores": {
      "visual_clarity": 0.85,
      "smoothness": 0.78,
      "collision": 0.92,
      "runtime": 0.65
    }
  },
  ...
]
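Since the report is plain JSON, it is easy to post-process outside the toolkit. For example, a small helper (hypothetical, not part of the repo) that lists episodes falling below a threshold:

```python
# Sketch: load a saved score report and list low-scoring episode ids.
# The file layout matches the sample report above; the helper is illustrative.
import json

def low_scoring_episodes(path, threshold=0.5):
    with open(path) as f:
        scores = json.load(f)
    # A set dedupes episodes scored under multiple cameras.
    return sorted({s["episode_id"] for s in scores
                   if s["aggregate_score"] < threshold})
```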

Displays a formatted table showing scores for each episode:

Episode scores (0–1 scale)
─────────────────────────────────────────────────────────────────
Episode Camera                       visual_clarity  smoothness  collision  runtime  Aggregate  Status
0       camera_0                              0.850       0.780      0.920    0.650      0.752  GOOD
1       camera_1                              0.420       0.650      0.710    0.580      0.590  BAD
...
─────────────────────────────────────────────────────────────────
Average aggregate over 20 videos: 0.671
Percentage of episodes removed: 0.25, total: 5

When using --output, a new filtered dataset is created with only episodes scoring above the threshold, maintaining the original LeRobot dataset structure.


📂 Repository Structure

score_lerobot_episodes/
├── score_dataset.py      # Main scoring script
├── data.py               # Dataset loading and filtering utilities
├── vlm.py                # Vision-Language Model interface (Gemini)
├── train.py              # Training pipeline integration
├── evaluation.py         # Evaluation utilities
├── corrupt.py            # Data corruption tools for robustness testing
├── ui.py                 # Streamlit web interface (if available)
├── requirements.txt      # Python dependencies
├── README.md             # This file
├── CONTRIBUTING.md       # Contribution guidelines
├── LICENSE               # Apache 2.0 license
├── results/             # Generated score JSON files
├── output/              # Filtered datasets
└── checkpoints/         # Training checkpoints

🤖 Training and Evaluation

The toolkit integrates with LeRobot's training pipeline to compare baseline vs. filtered dataset performance.

  1. Baseline Training: Train on the original unfiltered dataset

    python score_dataset.py \
      --repo_id username/dataset \
      --train-baseline True
  2. Filtered Training: Train on the quality-filtered dataset

    python score_dataset.py \
      --repo_id username/dataset \
      --output ./filtered_data \
      --threshold 0.6 \
      --train-filtered True
  3. Compare Both: Run both training pipelines in one command

    python score_dataset.py \
      --repo_id username/dataset \
      --output ./filtered_data \
      --train-baseline True \
      --train-filtered True
  • Default policy: ACT (Action Chunking Transformer)
  • Default steps: 10,000
  • Batch size: 4
  • Checkpoints saved to ./checkpoints/{job_name}/
  • WandB logging enabled by default

You can customize training parameters by modifying train.py.


1. ModuleNotFoundError: No module named 'google.generativeai'

  • Solution: Install dependencies with pip install -r requirements.txt
  • If using VLM scoring, ensure google-generativeai is installed

2. API rate limit errors with Gemini

  • Solution: The free tier has restrictive limits. Consider:
    • Using --vision_type opencv instead
    • Upgrading to a paid Gemini API tier
    • Processing smaller batches

3. All episodes filtered out

  • Error: ValueError: All episodes filtered out, decrease threshold to fix this
  • Solution: Lower the --threshold value (e.g., from 0.5 to 0.3)

4. Dataset not found

  • Solution:
    • Verify the --repo_id is correct
    • Check internet connection for HuggingFace Hub access
    • Use --root to specify a local dataset path

5. Out of memory during training

  • Solution: Reduce batch_size in train.py:44 or use a smaller model

6. Permission errors when overwriting

  • Solution: Use --overwrite True or manually delete the output directory

We welcome contributions! Please see CONTRIBUTING.md for guidelines on:

  • Setting up a development environment
  • Code style and conventions
  • Submitting pull requests
  • Reporting issues
  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request



LeRobot Episode Scoring Toolkit is distributed under the Apache 2.0 License. See LICENSE for more information.


Contact us: contact @ memedata.com