The future of large files in Git is Git

Original link: https://tylercipriani.com/blog/2025/08/15/git-lfs/

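## Git vs. large files: beyond LFS

Large files have long plagued Git repositories, bloating storage and slowing down operations. GitHub's Git LFS (Large File Storage) arrived in 2015 as a workaround that stores large files separately. But LFS brought vendor lock-in (tied to GitHub), storage costs, and setup hurdles for collaborators.

The Git project itself has been working on a native solution: **partial clone**. With the `--filter` flag (for example, `git clone --filter=blob:limit=100k`), a partial clone downloads only the necessary parts of a repository and skips large files until they are needed. This sharply cuts clone time and checkout size: in one example, the clone was 97% faster and the checkout 96% smaller. Partial clones do need to fetch missing data for some commands (such as `git diff`), but that matches how LFS behaves.

Looking ahead, **large object promisors** aim to simplify this further by letting Git hosts offload large files to dedicated storage, giving LFS-like benefits *without* the user-side drawbacks. Although still under development, these advances point to a future where Git handles large files natively, possibly making Git LFS obsolete.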

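## Git and large files: summary

The Hacker News discussion centered on the challenges of managing large files in Git and on potential solutions beyond Git LFS. Many users were unhappy with LFS because of vendor lock-in, cost (especially bandwidth for CI/CD), and limits on offline workflows. One proposed alternative, Large Object Promisors, aims to move the complexity of large-file handling from the client to the server, using object storage such as S3. This approach allows storage to be tiered intelligently by file age. However, commenters raised concerns about the potential complexity and the dependence on a reachable promisor remote.

Some commenters advocated alternatives such as `git-annex` or Data Version Control (DVC), and stressed that Git itself needs better shallow and partial clone support. A recurring sentiment was that Git was not originally designed for large binary files, and that versioning such assets may need a more fundamental rethink. Overall, the discussion showed a desire for a more seamless, efficient, and economical way to handle large files in version control, with many questioning whether Git in its current form is the best solution.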

Original article

If Git had a nemesis, it’d be large files.

Large files bloat Git’s storage, slow down git clone, and wreak havoc on Git forges.

In 2015, GitHub released Git LFS—a Git extension that hacked around problems with large files. But Git LFS added new complications and storage costs.

Meanwhile, the Git project has been quietly working on large files. And while LFS ain’t dead yet, the latest Git release shows the path towards a future where LFS is, finally, obsolete.

What you can do today: replace Git LFS with Git partial clone

Git LFS works by storing large files outside your repo.

When you clone a project via LFS, you get the repo’s history and small files, but skip large files. Instead, Git LFS downloads only the large files you need for your working copy.

In 2017, the Git project introduced partial clones that provide the same benefits as Git LFS:

Partial clone allows us to avoid downloading [large binary assets] in advance during clone and fetch operations and thereby reduce download times and disk usage.

– Partial Clone Design Notes, git-scm.com

Git’s partial clone and LFS both make for:

Small checkouts – On clone, you get the latest copy of big files instead of every copy.
Fast clones – Because you avoid downloading large files, each clone is fast.
Quick setup – Unlike shallow clones, you get the entire history of the project—you can get to work right away.

What is a partial clone?

A Git partial clone is a clone with a --filter.

For example, to avoid downloading files bigger than 100KB, you’d use:

git clone --filter=blob:limit=100k <repo>

Later, Git will lazily download any files over 100KB you need for your checkout.
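To see what a filter actually leaves behind, here is a minimal sketch that builds a throwaway repo, partial-clones it over file://, and lists what the clone is still missing. The repo layout, file names, and sizes are made up for illustration; the two uploadpack settings are there because a local "server" repo may refuse filtered or by-ID fetches without them.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Throwaway "server" repo: one small file, two revisions of a large one.
git init --quiet server
cd server
git config user.email "demo@example.com"
git config user.name "Demo"
git config uploadpack.allowFilter true          # allow filtered clones
git config uploadpack.allowAnySHA1InWant true   # allow on-demand blob fetches
echo "hello" > small.txt
head -c 300000 /dev/urandom > big.bin
git add .
git commit --quiet -m "first"
head -c 300000 /dev/urandom > big.bin
git commit --quiet -am "second"
cd ..

# Partial clone: blobs of 100 KiB or more are skipped unless the checkout needs them.
git clone --quiet --filter=blob:limit=100k "file://$work/server" client
cd client

# With --missing=print, objects that were filtered out are printed with a "?" prefix.
missing=$(git rev-list --objects --all --missing=print | grep -c '^?' || true)
echo "missing objects: $missing"

# Git records the remote as a promisor and remembers the filter for later fetches.
echo "promisor: $(git config remote.origin.promisor)"
echo "filter:   $(git config remote.origin.partialclonefilter)"
```

In this setup, the only object one would expect to be missing is the older revision of big.bin, since the newer one is fetched for the checkout.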

By default, if I git clone a repo with many revisions of a noisome 25 MB PNG file, then cloning is slow and the checkout is obnoxiously large:

$ time git clone https://github.com/thcipriani/noise-over-git
Cloning into '/tmp/noise-over-git'...
...
Receiving objects: 100% (153/153), 1.19 GiB

real    3m49.052s

$ du --max-depth=0 --human-readable noise-over-git/.
1.3G    noise-over-git/.
$ ^ 🤬

But with a blob:limit filter, the same clone is fast and small:

$ git config --global alias.pclone 'clone --filter=blob:limit=100k'
$ time git pclone https://github.com/thcipriani/noise-over-git
Cloning into '/tmp/noise-over-git'...
...
Receiving objects: 100% (1/1), 24.03 MiB

real    0m6.132s
$ du --max-depth=0 --human-readable noise-over-git/.
49M     noise-over-git/
$ ^ 😻 (the same size as a git lfs checkout)

My filter made cloning 97% faster (3m 49s → 6s), and it reduced my checkout size by 96% (1.3GB → 49M)!

But there are still some caveats here.

If you run a command that needs data you filtered out, Git will need to make a trip to the server to get it. So, commands like git diff, git blame, and git checkout will require a trip to your Git host to run.

But, for large files, this is the same behavior as Git LFS.

Plus, I can’t remember the last time I ran git blame on a PNG 🙃.
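The on-demand fetch can be watched in a small experiment. This is a sketch under assumptions: the repo is a local throwaway with made-up file names, and the "server" is just another directory reached over file://. Checking out an older commit needs the filtered-out blob, so Git goes back to the remote for it.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Throwaway "server" repo with two revisions of one large file.
git init --quiet server
cd server
git config user.email "demo@example.com"
git config user.name "Demo"
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true
head -c 300000 /dev/urandom > big.bin
git add big.bin
git commit --quiet -m "first"
head -c 300000 /dev/urandom > big.bin
git commit --quiet -am "second"
cd ..

git clone --quiet --filter=blob:limit=100k "file://$work/server" client
cd client

# Count objects the clone knows about but has not downloaded.
count_missing() {
    git rev-list --objects --all --missing=print | grep -c '^?' || true
}

before=$(count_missing)
# The old revision of big.bin is not local yet, so this checkout fetches it.
git checkout --quiet HEAD~1
after=$(count_missing)

echo "missing before checkout: $before"
echo "missing after checkout:  $after"
```

The same thing happens for git diff or git blame when they touch a filtered blob, which is why those commands need the Git host to be reachable.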

Why go to the trouble? What’s wrong with Git LFS?

Git LFS foists Git’s problems with large files onto users.

And the problems are significant:

🖕 High vendor lock-in – When GitHub wrote Git LFS, the other large file systems—Git Fat, Git Annex, and Git Media—were agnostic about the server-side. But GitHub locked users to their proprietary server implementation and charged folks to use it.
💸 Costly – GitHub won because it let users host repositories for free. But Git LFS started as a paid product. Nowadays, there’s a free tier, but you’re dependent on the whims of GitHub to set pricing. Today, a 50GB repo on GitHub will cost $40/year for storage. In contrast, storing 50GB on Amazon’s S3 standard storage is $13/year.
😰 Hard to undo – Once you’ve moved to Git LFS, it’s impossible to undo the move without rewriting history.
🌀 Ongoing set-up costs – All your collaborators need to install Git LFS. Without Git LFS installed, your collaborators will get confusing, metadata-filled text files instead of the large files they expect.
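Those "metadata-filled text files" are LFS pointer files: short text files whose first line names the LFS spec, followed by an oid and a size. A rough sketch for spotting them in a checkout follows; the file names, the all-zero oid, and the size are placeholders, and is_lfs_pointer is a helper invented here, not part of Git or Git LFS.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: true if the file's first line is the LFS pointer header.
is_lfs_pointer() {
    [ "$(head -n 1 "$1" 2>/dev/null)" = "version https://git-lfs.github.com/spec/v1" ]
}

demo=$(mktemp -d)

# A fake pointer file, like what a collaborator without Git LFS would see
# in place of the real image. The oid below is a placeholder, not a real hash.
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize 26214400\n' \
    "0000000000000000000000000000000000000000000000000000000000000000" \
    > "$demo/noise.png"

# An ordinary file for comparison.
printf 'just a readme\n' > "$demo/readme.txt"

for f in "$demo"/*; do
    if is_lfs_pointer "$f"; then
        echo "$f: LFS pointer (real content not downloaded)"
    else
        echo "$f: regular file"
    fi
done
```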

The future: Git large object promisors

Large files create problems for Git forges, too.

GitHub and GitLab put limits on file size because big files cost more money to host. Git LFS keeps server-side costs low by offloading large files to CDNs.

But the Git project has a new solution.

Earlier this year, Git merged a new feature: large object promisors. Large object promisors aim to provide the same server-side benefits as LFS, minus the hassle to users.


What is a large object promisor?

Large object promisors are special Git remotes that only house large files.

In the bright, shiny future, large object promisors will work like this:

You push a large file to your Git host.
In the background, your Git host offloads that large file to a large object promisor.
When you clone, the Git host tells your Git client about the promisor.
Your client will clone from the Git host, and automagically nab large files from the promisor remote.

But we’re still a ways off from that bright, shiny future.

Git large object promisors are still a work in progress. Pieces of large object promisors merged to Git in March of 2025. But there’s more to do and open questions yet to answer.
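For a feel of the pieces that exist so far, here is a hedged sketch of the configuration involved. It assumes a Git recent enough to know the promisor.advertise and promisor.acceptFromServer settings (around 2.49, as far as I can tell); the remote name "lop" and its URL are placeholders. In a real setup the server and client settings live in different repos on different machines; they share one scratch repo here only to keep the example short.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo"

# A hypothetical promisor remote that only houses large files.
git remote add lop https://lop.example.com/project.git
git config remote.lop.promisor true
git config remote.lop.partialCloneFilter blob:limit=100k

# Server side: advertise the promisor remotes this repo uses to cloning clients.
git config promisor.advertise true

# Client side: accept an advertised promisor only if its name and URL
# already match a promisor remote configured locally.
git config promisor.acceptFromServer knownUrl

# Show everything that was just configured.
git config --get-regexp '^(remote\.lop|promisor)\.'
```

Setting these values only writes config; whether a given Git host offloads anything to a promisor is up to the host.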

And so, for today, you’re stuck with Git LFS for giant files. But once large object promisors see broad adoption, maybe GitHub will let you push files bigger than 100MB.

The future of large files in Git is Git.

The Git project is thinking hard about large files, so you don’t have to.

Today, we’re stuck with Git LFS.

But soon, the only obstacle for large files in Git will be your half-remembered, ominous hunch that it’s a bad idea to stow your MP3 library in Git.