Local First, Forever


Original link: https://tonsky.me/blog/crdt-filesync/

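Local-first software is an application that keeps user data on the local device first while still syncing over the internet for updates and backups. Unlike purely local software or cloud software, this approach aims to balance offline functionality with occasional internet connectivity. The challenge arises when a local-first software company shuts down and users can no longer sync their data. To address this, the article suggests relying on popular cloud-based file-syncing solutions such as Dropbox, which are likely to persist even if the original company disappears. Although relatively simple compared with custom options, these file syncers offer a workable solution for most users. By using conflict-free replicated data types (CRDTs) on top of such systems, users can resolve the data inconsistencies that may arise during sync. CRDTs merge conflicting changes without manual intervention, which keeps collaboration smooth and data intact. This lets local-first developers focus on building the application itself while their customers access and manage their data across platforms and devices, ultimately improving the overall user experience.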

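In this article, the author discusses the difficulties they have had monetizing local-first applications, particularly those where 80% of the functionality is local and 20% requires a server. They mention Obsidian as an inspiration: it has a free base application but charges for network features. However, the author questions whether this model is sustainable and notes that data on its effectiveness is limited. The author also praises Yjs for multi-master collaboration and mentions that user experience and feel matter beyond mere functionality. They emphasize the challenge of covering development costs and paying employees, especially in a competitive market where well-designed and well-marketed applications attract more users. They also touch on switching costs, acknowledging that despite initial efforts, completely eliminating dependence on the internet and cloud services remains difficult. Finally, they reflect on various historical business models and question whether permanently local, open-data-format, self-hosted applications can be profitable.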

So I was at the Local-First Conf the other day, listening to Martin Kleppmann, and this slide caught my attention:

Specifically, this part:

But first, some context.

For the long version, go to Ink & Switch, who coined the term. Or listen to Peter van Hardenberg explain it on LocalFirst.fm.

Here’s my short version:

It’s software.
That prefers keeping your data local.
But it still goes to the internet occasionally to sync with other users, fetch data, back up, etc.

If it doesn’t go to the internet at all, it’s just local software.

If it doesn’t work offline with data it already has, then it’s just normal cloud software. You all know the type — sorry, Dave, I can’t play the song I just downloaded because your internet disappeared for one second...

But somewhere in the middle — local-first. We love it because it’s good for the end user, you and me, not for the corporations that produce it.

The goal of local-first software is to get control back into the hands of the user, right? You own the data (literally, it’s on your device), yada-yada-yada. That part works great.

However, local-first software still has this online component. For example, personal local-first software still needs to sync between your own devices. And syncing doesn’t work without a server...

So here we have a problem: somebody writes local-first software. Everybody who bought it can use it until the heat death of the universe. They own it.

But if the company goes out of business, syncing will stop working. And companies go out of business all the time.

What do we do?

The solution is to use something widely available that will probably outlive our company. We need something that is popular, is accessible to everyone, has multiple implementations, and can serve as a sync server.

And what’s the most common end-user application of cloud sync?

Dropbox! Well, not necessarily Dropbox, but any cloud-based file-syncing solution. iCloud Drive, OneDrive, Google Drive, Syncthing, etc.

It’s perfect: many people already have it. There are multiple implementations, so if Microsoft or Apple goes out of business, people can always switch to alternatives. File syncing is a commodity.

But file syncing is a “dumb” protocol. You can’t “hook” into sync events, update notifications, or conflict resolution. There isn’t much of an API; you just save files and they get synced. In case of conflict, best case, you get two files. Worst case, you get only one :)

This simplicity has an upside and a downside. The upside is: if you can work with that, it would work everywhere. That’s the interoperability part from Martin’s talk.

The downside is: you can’t do much with it, and it probably won’t be optimal. But will it be enough?

Let’s just save our state in a file and let Dropbox sync it. (In my case, I’m using Syncthing, but it’s the same idea. From now on, I’ll use “Dropbox” as a common noun.)

Simple:

But what happens if you change the state on two machines? Well, you get a conflict file:

Normally, that would be a problem. But it’s not if you are using a CRDT!

CRDTs (conflict-free replicated data types) are a family of data types that all share a very nice property: they can always be merged. It’s not always the perfect merge, and not everything can be made into a CRDT, but IF you can put your data into a CRDT, you can be sure: all merges will go without conflicts.
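To make the "can always be merged" property concrete, here is a minimal sketch of one of the simplest CRDTs, a grow-only counter. It is not taken from the demo; the function names and the replica ids `laptop` and `phone` are made up for illustration.

```python
# A grow-only counter (G-Counter): one slot per replica, merged by per-slot max.
# Replica ids ("laptop", "phone") are hypothetical names for this example.

def increment(counter: dict, replica: str, by: int = 1) -> dict:
    """Return a copy of the counter with this replica's own slot increased."""
    updated = dict(counter)
    updated[replica] = updated.get(replica, 0) + by
    return updated

def merge(a: dict, b: dict) -> dict:
    """Merge two counter states by keeping the larger value of every slot."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}

def value(counter: dict) -> int:
    """The counter's value is the sum of all replica slots."""
    return sum(counter.values())

laptop = increment({}, "laptop", 2)   # edited offline on one machine
phone = increment({}, "phone", 3)     # edited offline on another

merged = merge(laptop, phone)
assert merge(phone, laptop) == merged  # merge order does not matter
assert merge(merged, phone) == merged  # merging the same state again changes nothing
print(value(merged))  # 5
```

Because the merge is commutative, associative and idempotent, it does not matter which machine sees which file first or how many times the same state is merged.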

With a CRDT, we can resolve conflicts by opening both files, merging their states, and saving the result back to state.xml. Simple!
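The open-merge-save step could look roughly like the sketch below. It makes several assumptions that are not from the article: the state is stored as JSON (`state.json`) rather than `state.xml` to keep the example short, the CRDT is a simple last-writer-wins map, and conflict copies match a Syncthing-style `state.sync-conflict-*` pattern. Other providers name conflict files differently, so the pattern is a parameter.

```python
import json
from pathlib import Path

def stamp(entry: dict) -> tuple:
    """Ordering key for an entry: later timestamp wins, replica id breaks ties."""
    return (entry["ts"], entry["replica"])

def merge_states(a: dict, b: dict) -> dict:
    """Last-writer-wins map: for every key keep the entry with the larger stamp."""
    merged = dict(a)
    for key, entry in b.items():
        if key not in merged or stamp(entry) > stamp(merged[key]):
            merged[key] = entry
    return merged

def resolve_conflicts(folder: Path,
                      main_name: str = "state.json",
                      conflict_glob: str = "state.sync-conflict-*") -> dict:
    """Merge every conflict copy into the main state file, then delete the copies."""
    main_path = folder / main_name
    state = json.loads(main_path.read_text()) if main_path.exists() else {}
    conflict_paths = sorted(folder.glob(conflict_glob))
    for path in conflict_paths:
        state = merge_states(state, json.loads(path.read_text()))
    main_path.write_text(json.dumps(state, sort_keys=True))
    for path in conflict_paths:
        path.unlink()  # the conflict copy is now fully contained in the main file
    return state
```

Calling `resolve_conflicts(Path("/path/to/synced/folder"))` on any machine should give the same result, since the merge does not depend on which copy the sync service happened to pick as the "main" one.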

Even in this form, Dropbox as a common sync layer works! There are some downsides, though:

conflict file names differ between providers,
some providers might not handle conflicts at all,
it needs a state-based CRDT.

The only way to avoid conflicts is to always edit locally. So let’s give each client its own file!

Now we just watch when files from other clients get changed and merge them with our own.

And because each file is only edited on one machine, Dropbox will not report any conflicts. Any conflicts inside the data will be resolved by us via CRDT magic.
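A minimal sketch of the file-per-client layout, under assumptions of my own: each client writes `<client_id>.json`, the state is a grow-only set merged by union, and "watching" is done by polling modification times. None of these names come from the demo.

```python
import json
from pathlib import Path

def save_own_state(folder: Path, client_id: str, items: set) -> None:
    """Each client writes only its own file, so the sync service never sees a conflict."""
    (folder / f"{client_id}.json").write_text(json.dumps(sorted(items)))

def changed_peer_files(folder: Path, client_id: str, seen: dict) -> list:
    """Return peer files whose modification time changed since the last call.

    `seen` maps file name -> last observed mtime and is updated in place.
    """
    changed = []
    for path in sorted(folder.glob("*.json")):
        if path.stem == client_id:
            continue  # skip our own file
        mtime = path.stat().st_mtime_ns
        if seen.get(path.name) != mtime:
            seen[path.name] = mtime
            changed.append(path)
    return changed

def load_merged_state(folder: Path) -> set:
    """Merge every client's file; for a grow-only set the merge is a plain union."""
    merged = set()
    for path in folder.glob("*.json"):
        merged |= set(json.loads(path.read_text()))
    return merged
```

A client would call `changed_peer_files` on a timer (or from a file-system watcher) and re-run `load_merged_state` whenever it returns anything. A real application would swap the set for whatever state-based CRDT it actually uses.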

What if your CRDT is operation-based? Meaning, it’s easier to send operations around, not the whole state?

You can always write operations into a separate append-only file. Again, each client only writes to its own, so no conflicts on the Dropbox level:
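As a rough illustration, an append-only log per client might be handled like this. The `.ops` extension, the JSON-lines format and the `{"add": n}` operation shape are assumptions for the example; the operations are counter increments, which commute, so they can be replayed in any order across clients.

```python
import json
from pathlib import Path

def append_op(folder: Path, client_id: str, op: dict) -> None:
    """Append one operation as a JSON line to this client's own log file."""
    with open(folder / f"{client_id}.ops", "a", encoding="utf-8") as f:
        f.write(json.dumps(op) + "\n")

def read_all_ops(folder: Path) -> list:
    """Collect the operations from every client's log."""
    ops = []
    for path in sorted(folder.glob("*.ops")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.strip():
                ops.append(json.loads(line))
    return ops

def apply_ops(ops: list) -> int:
    """Replay increment operations; addition commutes, so cross-client order is irrelevant."""
    total = 0
    for op in ops:
        total += op["add"]
    return total
```

Operation types that do not commute would need extra metadata (for example, causal ordering information) in each log entry; that is outside the scope of this sketch.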

Now, the operations log can grow quite long, and we can’t count on Dropbox to reliably and efficiently sync only parts of the file that were updated.

In that case, we split operations into chunks. Less work for Dropbox to sync and less for us to catch up:

You can, of course, save the position in the file to only apply operations you haven’t seen. Basic stuff.
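Chunking and the saved position can be sketched together. Everything here is an assumption for illustration: the `<client_id>-<index>.ops` naming, a fixed number of operations per chunk, client ids without hyphens, and a cursor that counts lines already applied per chunk file.

```python
import json
from pathlib import Path

CHUNK_SIZE = 3  # operations per chunk; kept tiny for the example

def append_op(folder: Path, client_id: str, op: dict) -> None:
    """Append to this client's newest chunk, starting a new chunk once it is full."""
    chunks = sorted(folder.glob(f"{client_id}-*.ops"))
    index = len(chunks) - 1 if chunks else 0
    if chunks and len(chunks[-1].read_text(encoding="utf-8").splitlines()) >= CHUNK_SIZE:
        index += 1
    path = folder / f"{client_id}-{index:06d}.ops"
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(op) + "\n")

def read_new_ops(folder: Path, cursor: dict) -> list:
    """Return operations not seen before.

    `cursor` maps chunk file name -> number of lines already applied
    and is updated in place.
    """
    new_ops = []
    for path in sorted(folder.glob("*.ops")):
        done = cursor.get(path.name, 0)
        if done >= CHUNK_SIZE:
            continue  # full chunk already applied; no need to reread it
        # Drop the last piece: it is either empty or a partially written line.
        lines = path.read_text(encoding="utf-8").split("\n")[:-1]
        new_ops.extend(json.loads(line) for line in lines[done:])
        cursor[path.name] = len(lines)
    return new_ops
```

Full chunks never change again, so the sync service only has to transfer the small, still-growing last chunk, and a reader that persists `cursor` can skip everything it has already applied.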

Theoretically, you should be able to do operational transformations this way, too.

A very simple proof-of-concept demo is at github.com/tonsky/crdt-filesync.

Here’s a video of it in action:

Under the hood, it uses Automerge for merging text edits. So it’s a proper CRDT, not just two files merging text diffs.

If you set out to build a local-first application that users have complete control and ownership over, you need something to solve data sync.

Dropbox and other file-sync services, while very basic, offer enough to implement it in a simple but working way.

Sure, it won’t be as real-time as a custom solution, but it’s still better for casual syncs. Think Apple Photos: only your own photos, not real-time, but you know they will be everywhere by the end of the day. And that’s good enough!

Imagine if Obsidian Sync was just “put your files in the folder” and it would give you conflict-free sync? For free? Forever? Just bring your own cloud?

I’d say it sounds pretty good.
