管理你的SQL数据库模式和迁移的CLI工具
CLI to manage your SQL database schemas and migrations

原始链接: https://github.com/gh-PonyM/shed

## Shed:简化的数据库模式管理 Shed 是一款 CLI 工具,旨在简化 ETL 项目及其他项目的数据库模式管理。它利用 SQLModel ORM 和 Alembic 迁移,无需原始 SQL 模式定义。 主要功能包括: * **模式管理:** 使用 Python 模型轻松管理数据库模式。 * **自动 JSON Schema 导出:** 从 Pydantic 模型 (v2) 生成 JSON schema 用于数据验证 – 适用于 LLM 输出验证。 * **项目结构:** 创建标准化的项目布局,包含用于迁移和模型的专用文件夹。 * **环境配置:** 支持开发 (SQLite) 和生产 (PostgreSQL) 数据库的独立配置。 * **Alembic 封装器:** 简化 Alembic 命令,如 `revision` 和 `migrate`。 Shed 通过自动生成必要配置文件来简化 Alembic 设置,使开发人员和数据工程师能够专注于模型定义和数据管道。安装使用 `uv` 或 `pipx` 即可。

## SQL Schema & 迁移工具讨论 一则 Hacker News 讨论围绕着 SQL 数据库 schema 管理和迁移工具展开,起因是一个新的 CLI 工具 ([https://github.com/gh-ponym](https://github.com/gh-ponym))。 用户分享了他们使用各种方案的经验。**SQL Server Database Projects** 和 **Redgate 工具** 受到 SQL Server 用户的赞扬,而许多人更喜欢为 PostgreSQL 使用 **手写脚本**。**Goose** ([https://github.com/pressly/goose](https://github.com/pressly/goose)) 是基于 Go 的迁移的热门选择,它同时支持 Go 和原始 SQL。**Atlas** ([https://atlasgo.io](https://atlasgo.io)) 被作为 Alembic 的替代方案提出,支持 ORM 定义和 schema 扩展,但有些人对其非开源特性表示担忧。 对话强调了 ORM 在处理复杂数据库对象(如视图和存储过程)时的局限性。 许多评论者抱怨过去在多个客户端之间管理不善、不一致的数据库的经历,强调了对强大的迁移工具的需求。 讨论还涉及 SQL 客户端,**mycli** ([https://github.com/dbcli/mycli](https://github.com/dbcli/mycli)) 被提及作为标准 CLI 的替代品。 最后,强调了在开发和生产环境中使用相同的数据库(Postgres vs. SQLite)的重要性。
相关文章

原文

Are you doing some ETL project, you don't want to manage your database schema with raw SQL, and maybe validate data from external tools using those schemas e.g. when using LLM output? Then this tool might help you.

shed is a cli tool that acts like an application for your database schemas management using SQLModel orm and manages them for you using alembic. You get free json-schema export for all your pydantic.BaselModel (v2).

  • You can create your database models git repo that only manages db models, using shed to manage db and schemas
  • You can add shed as tool into an existing python project and add migration files to it, following the folder structure proposed below.

Using uv:

uv tool install https://github.com/gh-PonyM/shed.git

Using pipx:

pipx install git+https://github.com/gh-PonyM/shed.git#main

Create a new project inside the projects folder:

shed init news_agg -o projects -c postgres://user:pw@localhost:5432/db_name --env lab

This will create the following folder structure:

projects
└── news_agg
    ├── migrations
    │   └── versions
    └── models.py

This will create a config file for local and prod databases using sqlite for local and postgres for prod.

development:
  db:
    news_agg:
      connection:
        db_path: news_agg.sqlite
      type: sqlite
projects:
  news_agg:
    db:
      lab:
        connection:
          database: db_name
          host: localhost
          password: pw
          port: 5432
          username: user
        type: postgres
    module: ./projects/news_agg/models.py

Then just define your SQLModel files as in models.py.

  • Clone databases from prod to dev
  • Running short-commands for alembic like revision or migrate
  • Alembic command wrapper that takes your project and passes the db information to alembic
  • Exporting jsonschemas for your models (see schemas)

How it works under the hood

Alembic uses different config files and folders. The first entrypoint is the alembic.ini file and the top of the config template we use is this:

[alembic]
script_location = {script_dir}
prepend_sys_path = .
version_path_separator = os
sqlalchemy.url = sqlite:///:memory:
version_locations = {versions_dir}

As you can see, the folder containing the version python files can be specified as well as the script location where env.py and script.py.mako is expected to be found. Using a generated alembic.ini, env.py and script.py.mako, we have hassle creating those files for every project we need to manage data as developer / data engineer.

联系我们 contact @ memedata.com