(comments)

Original link: https://news.ycombinator.com/item?id=38489307

However, pushing changes from the source to the remote database introduces the same potential inconsistency problems as classic two-phase commit. Moreover, the design becomes considerably harder when multiple concurrent editors are modifying the on-disk database in real time. Although SQLSync supports a "local-only" mode, in which it maintains no link back to the remote server, this can lead to conflicting changes across multiple disconnected locations. To mitigate conflicts and ensure eventual consistency when the same content is edited simultaneously on multiple disconnected devices (e.g., phones), the author extends CRDTs with multi-version concurrency control [8]. Finally, to minimize the cost of data transfer between client devices and the primary server, an intermediate tier can be used to stage local edits before they are pushed to the source server. This, however, introduces additional overhead and requires developing a distributed version-control algorithm; it is beyond the scope of this article, so we will not discuss it in detail.

References:

[6] M. Stone et al., "SQL Transactions: Performance Analysis and Design Implications," Proc. Int. Conf. on Data Management (SIGMOD) (ICSD), pp. 1-14, Apr. 1976.

[7] T. Gray et al., "Interacting with Relational Database Systems," Commun. ACM SIGCAT Special Issue on Experimental Computing Systems (SIGCIS), vol. 13, no. 3, Jul. 1975.

[8] Y. Z. Wang et al., "Distributed Databases Without Global Locking: CRDT Concepts and Theorems," arXiv:cs/000619v1, Feb. 2000.


Original
Accidental database programming (sqlsync.dev)
541 points by thunderbong 1 day ago | 279 comments

I’m familiar with this project - the creator is a friend. I’ll try to get him on here to answer questions.

He’s a seasoned database architect. With SQLsync he’s made a way for frontend developers to query and update a remote database as if it were located right in the browser. Because it basically is. The power of WASM makes it possible to ship a whole SQLite database to the browser. The magic is in how it syncs from multiple clients with a clever but simple reactive algorithm.

It’s a radical attack on the whole problem. Much of our work as developers is about syncing data. When you start looking at React and REST APIs as a kind of sync procedure, this approach can open a lot of new possibilities. You don’t have to write a weird bespoke database of trees of objects fetched and cached from the API any more. You can just update and query it locally, with all the power of a relational database.
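To make that concrete, here's a minimal sketch of what the pattern looks like from application code (the interface is illustrative, not SQLSync's actual API):

  // The app reads and writes a local SQLite database; a sync layer
  // reconciles with the server in the background.
  interface LocalDatabase {
    exec(sql: string, params?: unknown[]): Promise<void>;
    query<T>(sql: string, params?: unknown[]): Promise<T[]>;
  }

  type Task = { id: string; description: string; completed: boolean };

  async function toggleTask(db: LocalDatabase, id: string): Promise<Task[]> {
    // Optimistic local write; the sync layer ships it upstream later.
    await db.exec("UPDATE tasks SET completed = NOT completed WHERE id = ?", [id]);
    // A plain relational query instead of a bespoke object-tree cache.
    return db.query<Task>("SELECT * FROM tasks ORDER BY description");
  }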



I find that moving the full query system into the front end is where most front end devs really want to be. They want a full power query system for the data instead of continuous rounds of re-inventing the transport layer, REST, GraphQL, *RPC, etc.

It's hard to adopt such a system in most traditional web shops with their specialized backend and frontend teams. You're pulling out the database, backend, transport, and auth layers and replacing them with this single block system. Most system architects grew up in the backend so they are generally pretty ignorant of this issue. As it touches both sides extensively you're probably not fitting this into an existing system, which leaves only green field new development. Finally, your backend is not an AWS or Azure service, nor is it lambda friendly. All of this means that most architect types I talk to will never touch it.

This style of system mostly already exists with old tech, CouchDB+PouchDB. Which works pretty well for some things. The downsides are that the query system isn't really ideal and the auth and data scoping system is pretty foreign to most people. The easiest model to work with is when the data is totally owned by a single user, and then you use the out-of-the-box database-per-user model. High data segmentation with CRDTs removes a lot of conflict issues.
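For concreteness, the classic PouchDB setup for that database-per-user model is only a few lines (the server URL and credentials here are placeholders):

  import PouchDB from "pouchdb";

  // Each user replicates their own CouchDB database.
  const local = new PouchDB("todos");
  const remote = new PouchDB("https://couch.example.com/userdb-alice", {
    auth: { username: "alice", password: "secret" },
  });

  // Live bidirectional replication that retries on network failure.
  local.sync(remote, { live: true, retry: true })
    .on("change", (info) => console.log("replicated", info.direction))
    .on("error", (err) => console.error("sync error", err));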

It has scaling issues though: CouchDB has really high CPU requirements when you're connecting 10k to 100k users. The tech is long in the tooth, though it is maintained. On the system design side it gets really complicated when you start sharing data between users, which makes it rather unsuitable as you're just moving the complexity rather than solving it.

This approach seems to hit the same target, though it will likely have similar scaling issues.

Looking forward to seeing the evolution of the system. It looks like a first step into the world.



It's not quite shipping the DB to the client, but I like the Supabase/PostgREST approach for replacing everything between the client and SQL server with a single very thin proxy that maps tables and functions to REST calls. Even the auth mechanism is fundamentally just Postgres RLS with signed tokens that you can query against in your RLS policies.
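Roughly what that looks like from the client (the URL and columns are placeholders): the table is an endpoint, filters go in the query string, and the signed JWT is what the RLS policies evaluate server-side.

  async function fetchMyTodos(jwt: string) {
    const res = await fetch(
      "https://api.example.com/todos?completed=eq.false&order=created_at",
      { headers: { Authorization: `Bearer ${jwt}` } },
    );
    if (!res.ok) throw new Error(`PostgREST error: ${res.status}`);
    return res.json(); // only the rows the user's RLS policy allows
  }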


Reminds me of Meteorjs. It would let you sync a subset of your data to the client and then the client could query it any which way it wanted. They called this “Minimongo”.


I've used Meteor. I thought it was a good system. It didn't have offline capability, at least not back when I used it. It really needed to be connected to work. But conceptually, yes, it had a very similar system.


If Meteor could scale, we'd probably hear about it way more these days.

I remember having problems with 200 users when serving internal tools for testing from my above-average dev PC.

It's a DX dream though.



Yay, we're moving back to fat clients! What has been is what will be, and what was done is what will be done, there is nothing new under the sun.


Those pesky backends are so annoying, so why don't we just put a backend on every client?


Schema and data migrations are too tricky, so why not have every client do it.


connecting to remote DB and impersonating a user account for bug hunting is too easy right now, let’s create the need for a way to proxy to local client computers with greater access to their private information.


I'm on the fat client train with my company and I nudge my clients that way if they're open. It's just a great way to build a system.


Ohh can't wait for the inevitable next step of dropping the "web" part of web assembly and doing, ya know, native fat clients again.


I mostly work with lean, speculative business software. Which means non-cross-platform native development is simply not economical to do. I generally need to be able to hit Windows, iOS, Android, and MacOS square on with one code base.

A "native" electron or capacitor distribution system is a fine extension of a local-first web client. And an advantage of building fat clients generally is they lend themselves to such distribution models much easier than say, htmx or hotwire.

Native fat clients have had their benefits and lots of people still prefer them, but they always had the drawback of manual data management and installs. Being able to leverage any device you own with a cloud-synced local-first client really gives you the best of both worlds.

But not all software fits readily within this model.



Why not Java?


Java fails on multiple points.

First, my list failed to include web because of the context. Web is, by far, the largest and most important platform. Even if I'm building only native installers for a project, I need to be able to build web projects with the same tools to make this work.

Java also fails the "one code base" requirement as desktop and mobile are very different. The poor iOS support is going to cause it to fail the "square on" requirement as well.

No on Java.



Excel is a great fat client. Writing a sync in VBA is not, but some of the pieces are already there.


Great until you have to support n versions on m platforms and half your customers are enterprisey and stay on a 6-year-old version from before your last LTS version on a now-unsupported platform because they built a core part of their business processes on a misfeature.


I've worked on various forms of "legacy code" for most of my career. As long as the economics line up and the customer is willing to pay for the required support, then it's a fine business avenue.

If economics don't line up then you have to pull the plug and set them adrift, which is much easier and more secure with a fat client that runs without a server than say, a complex cloud system.



Yes but targeting WASM and SQLite minimizes that pain quite a bit.


Remember when targeting Macromedia Flash was going to solve the web compatibility and interactivity conundrum?


> Remember when targeting Macromedia Flash was going to solve the web compatibility and interactivity conundrum?

This sounds like the set up for a "No? Me neither." punchline. Certainly one of the features of Flash is that it gave you fairly good consistency across computers, but honestly my perception of Flash wasn't that it was going to solve some grand web problem, but more "oooh look, shiny rollover buttons!" and "ooh look, super obnoxious advertisement that forces people to pay attention to it!"



> "oooh look, shiny rollover buttons!"

That's the interactivity aspect. At the time, it was the bees knees of UI capabilities.



Yeah? It was targeted for destruction by Apple because it was buggy and insecure, not because it wasn't delivering.


Don't forget horrendous mobile performance (battery drain)!


Don't forget proprietary.


And the lack of accessibility. You generally couldn't even copy and paste text out of it.


How is this approach meant to handle data visibility and access control? Often a large part of a backend is materializing raw data into a form that the active user is allowed to view.


So if the user owns all their own data, their "data view" is their data set. A To-Do system, a personal finance app, any kind of note-taking or personal record keeping fits this model.

You create a database per user and the auth and sync are all self contained within that database. This system is multi-master, which means that any change on a client or on the server will be replicated to every other. There is no "authority" which trumps the others. The server is simply a central hub that requires the right authentication to allow the sync process to happen.

When you want to create a set of data that crosses user boundaries, it gets complicated. It's possible to do, but you're not on the easy train anymore.

Creating a system that's both easy to use, and scopes the right data view out of the system-wide tables and rows we usually think of as a database, is not the CouchDB nor SQLSync model.



> When you want to create a set of data that crosses user boundaries, it gets complicated.

So it sounds like this excludes most enterprise use cases?



If I'm generalizing: B2C products frequently fit, but not always. B2B products generally don't, but can in some circumstances.


Correct me if I'm wrong: we can avoid the idea of a master for this use case because we suppose that only a single client (also server, I guess) will write at a time?


One user can have multiple clients. This is frequently the case, many to most users have both a PC and a phone. Also when one allows reasonable sharing of the account with family, 5+ connected clients is common.


You’re wrong if clients can be used offline and sync when they come back online.


My problem is that to-do and note taking apps are more or less toy problems. A radical solution to the challenge of building these kinds of apps just doesn't seem that interesting.


That sounds awfully like Couchbase, which allows you to query/update databases that will sync to the remote and then back to peers. And you can control the process (auth/business logic) with a server-side JavaScript plugin with ease.


Creator of Couchbase Mobile here — I’m doing a new web-based thing[1] with a similar reactive API. I’m hoping that my encrypted block replication makes it more of a “data anywhere” solution than a “local first” database. But the paradigm of powerful databases in the browser is definitely one I’m glad to see becoming popular.

[1] https://fireproof.storage/



Very exciting, I shall take a look, as I was a fan of your prior project!


Genuinely curious why not just cache the relevant bits in LocalStorage / SessionStorage? I seem to remember Chrome trying to add a literal SQL database to the browser, but it never panned out, localStorage became king. I don't mean to downplay the usefulness, just I usually opt for what the browser gives me. I'm huge on WASM and what it will do for the browser as it matures more (or grows in features).


Good question.

First to address the main point: why not cache the relevant bits in some kind of local storage. SQLSync plans on doing this, specifically using OPFS for performance (but will have fallbacks to localstorage if needed).

Second to address the question of why not use built in kv stores or browser side databases. One answer is another question: how do you solve sync?

One approach is using a data model that encodes conflict handling directly, like CRDTs. This approach is easier to put into general kv stores, as syncing requires simply exchanging messages in any order. I find this solution is well suited to unstructured collaboration like text editing, but makes it harder to coordinate centralised changes to the data. Centralised changes are nice when you start introducing authentication, compaction, and upgrades.
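For illustration, the simplest CRDT of all is a last-writer-wins register; the merge is commutative and idempotent, so replicas can apply updates in any order and still converge:

  type LWW<T> = { value: T; timestamp: number; replica: string };

  function merge<T>(a: LWW<T>, b: LWW<T>): LWW<T> {
    if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp ? a : b;
    return a.replica > b.replica ? a : b; // deterministic tie-break
  }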

Another approach is doing something similar to how Git Rebase works. The idea is to let the application state and server state diverge, and then provide an efficient means for the app to periodically reset to the latest server state and replay any unacked mutations. This approach requires the ability to re-run mutations efficiently as well as efficiently track multiple diverging versions of the database state. It's certainly possible to build this model on top of local storage.
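Here's a sketch of that rebase model (the names are illustrative, not SQLSync's internals): keep a log of unacknowledged local mutations, and when fresh server state arrives, reset to it and replay whatever is still pending.

  interface Mutation { name: string; args: unknown[] }

  class RebaseClient<State> {
    private pending: Mutation[] = [];

    constructor(
      private state: State,
      private apply: (s: State, m: Mutation) => State,
    ) {}

    // Optimistic local mutation: apply immediately, remember for replay.
    mutate(m: Mutation) {
      this.pending.push(m);
      this.state = this.apply(this.state, m);
    }

    // The server acked a prefix of our log and sent its latest state.
    onServerUpdate(serverState: State, ackedCount: number) {
      this.pending = this.pending.slice(ackedCount);
      // "Rebase": start from server truth, replay what's still unacked.
      this.state = this.pending.reduce(this.apply, serverState);
    }
  }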

For SQLSync, I found that by controlling the entirety of SQLite and the underlying storage layer I was able to create a solution that works across platforms and offers a fairly consistent performance profile. The same solution runs in native apps, browser sessions (main thread or workers), and on serverless platforms. One of my goals is to follow the lead of SQLite and keep my solution fairly agnostic to the platform (while providing the requisite hooks for things like durable storage).



FWIW, Web SQL was always fine, but could never be standardized, because no one was ever going to redo all the work sqlite has done (when every browser already uses sqlite).

https://en.wikipedia.org/wiki/Web_SQL_Database



Firefox fought against WebSQL. Firefox then re-implemented indexedDB with SQLite on their own browser. Firefox has now largely faded into obscurity.


Tbf, the WebSQL standard was not well-written from how I've heard that story told. It was bug-for-bug exactly standardized to a particular version of SQLite, which is not a good way to write a standard.


The important thing is - Firefox has been slowly dying for a decade and SQLite has taken over the world.


Firefox is nowhere near dead on my devices!


The issue was that a specific library would be pinned at a specific version for the rest of the history of the web. As good as SQLite is, I hope to hell we're not still stuck using it to handle mining operations in the oort cloud in 200 years.


This is why wasm is great. Webpages can just ship whatever version of SQLite they want. And/or eventually migrate to something better.


There is a literal SQL store in the browser: it's the SQLite Wasm port. It's just panning out a little differently.


Which works only on Chrome, IIRC.


IndexedDB is even better: it supports a wider variety of data serialization, and can be queried and versioned.


Because if this works it's amazing. Realtime sync with offline support out of the box, while not having to develop state management on client and api, but in one place. Those are very hard problems, done with less development. Will definitely give it a shot.


An old company I worked for used project management software with a check-in/out mechanism for making changes. When you "check out" a project it downloads a copy that you change locally, then "check in" uploads it back to the server. A project is "locked" while in the "checked out" state. We all felt it was an archaic mechanism in a world of live updating apps.

After 10 years of building SPA "web apps", that data synchronization mechanism feels ahead of its time.



What many people either can't or don't want to acknowledge is that ultimately whether or not you support live updates in parallel by multiple users, instead of locking so only one update can proceed at a time, is not a technical decision, it's a business decision: do the business rules that are appropriate for your application enable you to deal with concurrent live updates or not?

Ultimately that comes down to whether you can implement a process to ensure consistent resolution of any incompatibilities between multiple concurrent updates. Sometimes that can be done, and sometimes it can't, and which is the case for your application depends on your business rules, not on any technical capability.

If your business rules don't allow you to implement a resolution mechanism, you need locking so that only one update can happen at a time, whether you have the technical capability to support concurrent updates or not.



Indeed, many of the most painful technical problems are actually three business problems in a trenchcoat.


Sometimes known as, "two people with firing authority fighting a proxy war through the dev team".


So true ... and often those business decisions are not yours to make.


This is one of those phrases that should turn into a saying, and be passed around for hundreds of years.

Every hard problem I have today in my career involves getting business people to define their business problem properly in order to solve it with technology. Even the hardest code I've ever written was easy compared to some projects, simply due to the business issues lurking around the project. Last week I finished a script to download a CSV and save it to a SQL table (literally) that took 3 weeks because business folks couldn't get their act together on what they wanted. I finished another project in a few days which is currently the core of a previous employer's energy efficiency controls product; it was easy because the person defining it did it very well, and I had no questions, just work to perform.



This literally made me lol. :-)


Nah. I've seen plenty of systems where the business rules would handle concurrent updates fine, but since they're using a traditional Web/ORM/RDBMS setup they build a last-write-wins system without thinking about it. It's one of those rare problems where the technical part is actually harder than the business part.


Database systems have been able to deal with concurrent updates for quite some time now, so I don't think doing this is technically difficult with the current state of the art. Individual dev teams might not be well versed in the current state of the art, but the correct business response to that is not to restrict your business rules but to get developers who are well versed in the current state of the art.


> Database systems have been able to deal with concurrent updates for quite some time now, so I don't think doing this is technically difficult with the current state of the art.

Traditional ACID systems can't really handle them nicely - your only choice with an update is to commit it or discard it - so you have to do a lot of handwritten logic on top, and even if the database itself handles that well, the layers above it generally don't. Event-sourcing style systems work well but they're still not really mainstream yet.



It solves so many problems and makes it so easy to implement if you go this way.

But just like mentioned it is hard to convince people that it is what they actually want.

People fall into some grand illusion that everything should be always available, but in reality one person is making changes at a time, and if somehow 2 or more people have to work on something - more often than not they should be talking or communicating with each other anyway to synchronize.

Even with Git and fully distributed development you cannot solve conflicts automagically. You still have to communicate with others and understand context to pick the correct changes.



you can only have one person work on the code at a time? that seems very, very obviously dumb


I can change A to B on my own, you can change A to C on your own.

At some point we have to communicate which change is correct.

It does not have to be synchronous and it might be via commit message - but still change alone is not enough for conflict resolution.

If you edit a Word document and someone then changes something, there is no commit message, but there might be a comment on the document, an email, or an IM.



Unison has a neat approach to this problem: References are hashes of the abstract syntax tree, the only way to write a "collision" is to write an identical function--which isn't actually a collision at all.
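A toy version of that idea (Unison hashes the normalized AST rather than raw text, but the principle is the same):

  import { createHash } from "node:crypto";

  // A definition's reference is the hash of its (normalized) source,
  // so two writers can only "collide" by writing identical code.
  function reference(normalizedSource: string): string {
    return createHash("sha256").update(normalizedSource).digest("hex");
  }

  console.log(reference("f x = x + 1") === reference("f x = x + 1")); // true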


Good point. I do the same in my own system and use Hashes of the source code, so there are no collisions. Slowly this technique will become mainstream I predict.


Multiple people can work on the code simultaneously and asynchronously, but conflict resolution must be done synchronously.


This totally. This is one of the reasons that classical RDBMS paradigms and software like MySQL still survive despite however people want to talk it down in favor of "Nosql" or non-relational databases like mongodb citing how fast it is or how cool it is in comparison.

For some things, you need the time tested solutions.



Sounds like RCS [1]. I remember, back when a company I worked for switched from RCS to CVS, one of my coworkers was annoyed that CVS didn't support locking checkouts.

[1] https://en.wikipedia.org/wiki/Revision_Control_System [2] https://en.wikipedia.org/wiki/Concurrent_Versions_System



And, of course, the default mode of Microsoft Team Foundation Server [0], decades after there were better patterns.

So many forgotten locks from lazy devs...

[0] https://en.m.wikipedia.org/wiki/Azure_DevOps_Server#TFVC



Now I feel old, I remember "Anything but sourcesafe" [0], which was a followup to "Visual Sourcesafe Version Control: unsafe at any speed" [1], and having my trust evaporate when I found out Microsoft didn't dogfood their own version control system.

So long ago I can't remember exactly which, but I was running a local cvs and/or subversion repository for my own work just to avoid issues like the above. [0] https://blog.codinghorror.com/source-control-anything-but-so...

[1] https://developsense.com/visual-sourcesafe-version-control-u...

To get back on topic, the key thing an explicit database gives you is a purpose-built language (and data-integrity enforcement etc. if you do it properly) that everyone knows. (Or used to? SQL is getting more hidden by abstraction layers/eco-systems these days). I'm old, so I reach for my older, well understood tools over new and exciting. Get off my lawn. It may be over-architecting, but I'm also not working in maximising 'performance in milli/micro-seconds is vital' high load environments, or releasing updated software every other day.

The other issue is tool/eco-system fragmentation.

But when you're young and have the energy and mental capacity to abstract out the wahoo for efficiency/performance, you do, because you can, because it's better at the time. In our day everyone was writing code to write code, which was effectively the precursor to ORMs. It's just part of being young and committed to your craft, and wanting to get better at it - this is a good thing!

It's only as you get older that you start to appreciate the "Less is More", around the same time that job ads appear with "Must have 3 years of SQL-Sync experience" (no offence intended here). There are both costs and benefits, but which and how much of each you only find out years later.



Are you sure? My experience of using TFVC was that it would warn you if someone else had opened the file for editing but would not actually lock it. Multiple people could edit the same file concurrently with standard automerging/conflict resolution afterwards.


Server workspaces vs local workspaces, maybe? With server, your local copy was marked read-only. Don’t recall if you could change that flag to edit anyway. We moved to local workspaces as quickly as we could - that was a more typical offline edit, resolve conflicts at commit model. Don’t remember all the details, been 5+ years since I did anything with TFS.


Yes, “tf edit” would mark on the server that you were opening the file for editing, and cleared the read-only bit, but it didn’t lock the file for others or prevent concurrent edits in any way.


I'm definitely not sure. Could very well be the transition from CVS to Subversion that I'm remembering. It's been a long time :)


Back in the early days of TFS I was briefly at a company that went all in on MS tools. TFS was used, and to avoid the lock each developer had a clone made; after they checked their clone in, the “TFS Guy” in the office would merge it. He also had to merge things when later check-ins had conflicting changes.

Now, the best part of this shit show was they had ~30 different customers and each of these customers had a clone of the main thing that would be customized. So the “TFS Guy” had to determine if to keep in the customer clone only or to propagate to the main and then to all the other clones!

Needless to say the “TFS Guy” made a lot of money.



I have to use TFS for a couple of projects where I work. I really wish we had a "TFS Guy"!


That sounds like torture, he deserved that money.


I'm a fan of this approach. SQLSync effectively is doing this continuously - however it would be possible to coordinate it explicitly, thus enabling that kind of check in/out approach. As for single-owner lock strategies, I think you could also simulate that with SQLSync - although you may not need to depending on the app. If the goal is to "work offline" and then merge when you're ready, SQLSync provides this pattern out of the box. If the goal is only one client can make any changes, then some kind of central lock pattern will need to be used (which you could potentially coordinate via SQLSync).
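One way such a central lock pattern could look, as a hedged sketch (the schema and interface are hypothetical; in a rebase-style system the server's replay order would arbitrate races for the lock row):

  interface Db {
    exec(sql: string, params?: unknown[]): Promise<{ rowsAffected: number }>;
  }

  // Claim an advisory lock row; succeeds only if nobody currently holds it.
  async function tryAcquireLock(db: Db, docId: string, clientId: string) {
    const res = await db.exec(
      "UPDATE locks SET owner = ? WHERE doc_id = ? AND owner IS NULL",
      [clientId, docId],
    );
    return res.rowsAffected === 1;
  }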


flashbacks to working on a team where we needed to shout across the room for people to unlock their source files in MS SourceSafe :-p




Looks very similar to JEDI [0], an early Delphi VCS that worked that way. It gave us the tranquility of knowing that no conflict would appear, as only one developer could work with a locked/checked-out file at a time. There was no merge in those days. In contrast, files that were frequently changed in every task would always cause blocking between developers.

[0] https://jedivcs.sourceforge.net/



There were loads of VCSs that operated this way. And I don’t miss them one bit.


Sounds like Lotus Notes.


CouchDB is a lineal descendant I guess.


There is an interaction here between the "what gets measured gets managed" principle and the sunk cost fallacy.

The problem with databases is actually complexity. Any individual feature is more or less safe, but around the time reliability, caching and indexes get matched together there is a complexity explosion, and it doesn't (normally, anyhow) make sense to implement a domain-specific DB (call it a DSD?).

But by the time a company has invested in implementing those 3 features, it has sunk a lot of resources into the DSD, is politically averse to recommending it be stripped out, and faces a high real cost to taking out the tech debt in one go.

Really the problem here is SQL's syntax. If using a basic relational database were a pleasant experience that involved some familiar C-like syntax instead of broken English, people would be more tempted to go with a DB instead of rolling their own. The NoSQL databases were a good step in that direction, but then they by and large overfocused on big data instead of everyday usefulness. Things like Redis took hold, which is nice.

Making it easy to run SQL is a reasonable approach, but the problem is that the good databases - I like postgres - are SQL native and it is hard to get efficiency without speaking the DB's language. We really need a PostgresPostSQL database that is a perfect postgres clone but whose primary parser supports a language with good syntax.



What exactly is hard about SQL? Every dev imho should know it. And SQL syntax is good and has proven to be long lasting. Maybe investing some time in actually learning it instead of bashing it will help you further.


> What is exactly hard about sql?

- No non-nullable types (at the expression level). No way to express e.g. normal boolean logic

- No real data structures (at the expression level), unless you count rows, which are not first-class values. Even collections aren't first-class

- Very awkward control flow constructs. E.g. look at how you write recursive queries. Even if/else is weird. It's a classical Turing Tarpit: everything is possible but nothing is easy

- Absurdly complex grammar. Something like 200+ keywords. Bizarre and inconsistent quoting rules

- Weird rules about where expressions can be used, and where you have to put variables. Often if you want to extract out a common subexpression and reuse it you have to completely restructure your statement.

- Missing basic functionality, e.g. no string concatenation operator (except in some nonstandard dialects)

- No libraries. No standard way of reusing code. No unit testing framework, which means in practice most projects have no testing.

It's a terrible language by any reasonable standard. Its closest cousin is COBOL, and that's honestly less awful to write.



If you run into issues with non-nullable types and missing real data structures, then I guess you don't get the purpose of SQL and its tech. It's a query language to retrieve data; that's the only purpose. Under the hood is a crafty machine that enables atomic principles and, depending on the SQL tech, other functionality like scalability and more. No libraries? There are a zillion battle tested libs out there. Some are used by companies like Paypal, SpaceX, Tiktok, youtube, Gmail and many more. And why would you unit test datasets in a database? You normally use mocks to test against your data layer. Most if not all software stacks have mock libs available.


> If you run into issues with non nullable types and missing of real data structures then i guess you dont get the purpose of SQL and its tech. Its a query language to retreive data. thats the only purpose.

That's no excuse for making it bad.

> No Libraries? There are a zillion battle tested libs out there.

Not in SQL.

> And why you would unit test datasets in a database?

For the same reason as anywhere else.

> Most if not all software stacks have mock libs available.

Exactly. They're a basic piece of functionality you'd expect most languages to provide, but not SQL, because it's awful.



> no string concatenation operator

`||` is the SQL standard. IMO its biggest deficiency is a single NULL nullifies the whole thing. But at least that's consistent with how NULL behaves in many other cases.
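A quick demonstration of both behaviors (here via SQLite through better-sqlite3, which follows the standard operator):

  import Database from "better-sqlite3";

  const db = new Database(":memory:");
  console.log(db.prepare("SELECT 'foo' || 'bar'").pluck().get()); // "foobar"
  console.log(db.prepare("SELECT 'foo' || NULL").pluck().get());  // null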



EdgeDB authors explain problems with SQL quite nicely: https://www.edgedb.com/blog/we-can-do-better-than-sql


SQL often gets criticized, and I think for good reasons, but why didn't we come up with something better?

We have dozens of programming languages in use for general programming, it is a field that is constantly evolving. Even JS, which is hard to change because that's what browsers run and you don't control your client's browser is seeing some evolution, using transpilers, and now WebAssembly.

But for databases, there is only one, and that SQL. There are some alternatives, but no one comes close to SQL in terms of usage. So maybe SQL is not that bad after all.

Maybe the reason is that the relational model is really good, and attempts to deviate from this will only work in a niche. That a declarative style is also really good, and again, you won't have much success if you deviate from this. And if you end up doing SQL with a different syntax, for most people, it will be too small of an improvement to justify changing their way.



Have you ever looked at adding a semantic layer on top of a db for those that prefer to avoid direct SQL?


You'd need databases that were open to splitting out and layering like that, and I don't think any major SQL databases are. (I mean, most SQL databases are still written in C-family languages; splitting out the PostgreSQL parser into its own library took 20 years and still hasn't really been done properly, they do some weird sync thing from the mainline postgresql codebase rather than being the actual trunk for parser development and having postgresql proper use it as a dependency). Maybe you could do something with H2?


Author here. Finally got through most of the questions, will keep checking periodically for ones I've missed (has someone built a better way to track HN discussions?).

I'm very happy to see the discussion thus far. This first post focused on the parts of frontend engineering that motivated me to create SQLSync rather than specifically how SQLSync works. I'll be addressing that in my next post. Thanks for all the great questions! Keep them coming :)



Thank you for the very informative article. I appreciate the irony of using a post titled "Stop building databases" to announce a new database :-)


You're welcome! And I'm glad you enjoyed it. Once I thought of that title I had to use it.

My only saving grace is that technically I didn't make a new DB - just using good ol SQLite. Mostly. :)



The title certainly piqued my interest!


Don't give the user a mental model that reality can break ... badly, or invisibly

I fear sync'ing databases instead of client server models is one of those - either your sync mechanism will just melt, or there are deep assumptions not met

I would feel safer building a set of CRDT primitives to work with if I feel the need for fast UI, and stick with form submits for everything else -



I agree! One of my goals is to make the mental model of SQLSync easy to grok for the developers using it. I'm biased, but I find the rebase model much easier to understand than CRDTs.


I feel like you might be missing the point of the parent comment. Synchronizing between databases is a notoriously difficult problem. It's really hard to do while avoiding race conditions. To be fair, I don't understand the "git rebase" technique your project uses, but I'm doubtful it solves these problems.

The underlying issue is that users of SQLsync are going to assume that consistency works, but in subtle and unsuspected ways it won't.

As far as I can tell, the only solutions that handle distributed consistency are those that use CRDTs.



I've thought a lot about this, but sure there are still some rough edges to figure out. See this comment for a bit more of a deep dive into how SQLSync works and why it might be possible to make it easier for devs to write correct code that converges. https://news.ycombinator.com/item?id=38502987

Also, I'm considering adding a CRDT column type to SQLite to enable embedded collaborative use cases while using the relational model for the rest of your data's structure.



A CRDT column type in a relational DB would be amazing for some things I’ve built. I’m really happy to see what you’re doing here!


Trying to synchronize state between client & server is a cursed problem.

You can sidestep it altogether if you make mild UX sacrifices and revert to something more akin to the PHP/SSR model. SPA is nice, but multipart form posts still work. Just the tiniest amount of javascript can smooth out most of the remaining rough edges.

Our latest web products utilize the following client-side state:

  3rd party IdP claims for auth
  1st party session id in query args
  The current document
For the first item, I genuinely don't even know where this is stored. It's Microsoft's problem, not ours. All other state lives on the server. We treat the client more or less like a dumb terminal that punches forms all day. We don't even use first party cookies or local storage to get the job done. This dramatically improved our development experience for iOS/Safari targets.

So, I would ask this: What is the actual experience you are trying to offer, and why does it justify decoupling of client & server state?



It's a very common trend for consumer-facing GUI's to have optimistic rendering, and if you're doing that then you're juggling client/server state. I still see spinning loaders here and there but they're generally for initial content load; e.g., does Gmail make you wait when you archive an email?
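The bookkeeping optimistic rendering implies, in sketch form (types and API are illustrative): mutate the local state immediately, then reconcile or roll back when the server answers.

  type Email = { id: string; archived: boolean };

  async function archiveEmail(
    emails: Map<string, Email>,
    id: string,
    api: { archive(id: string): Promise<void> },
  ) {
    const prev = emails.get(id);
    if (!prev) return;
    emails.set(id, { ...prev, archived: true }); // optimistic: no spinner
    try {
      await api.archive(id);
    } catch {
      emails.set(id, prev); // roll back, and surface the failure somehow
    }
  }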


Not the GP, but I would include optimistic rendering on the list of common patterns that really are a bad idea.

Optimistic rendering means your frontend and backend are tightly coupled, error recovery and synchronization is much more complex, and you are locked into (likely heavy) frontend rendering patterns that add even more complexity and coupling.

We've spent well over a decade trying to avoid the fact that frontend actions require backend logic to complete. It's a losing battle that's just made worse by trying to paper over it.

Edit to clarify: this isn't a direct comment on the OP tool. I haven't used this tool directly but having a hunch it does solve some important use cases for common patterns.



the real UX goal is to minimize data loss (ie. provide the ability to resume whatever the user was doing, from a point that's as recent as possible), and for this it becomes necessary to provide "auto-save"

at that point there's already a diverging state problem, even if it's saved to localStorage (but obviously it's much better to save it on the server)

it's absolutely ridiculous that we spent the last 10+ years simply reinventing the same wheel (all low-level tools, Angular, React, Vue, rxjs, hooks, signals, whatever, etc), without providing the useful composable primitives for the hard problems.

and now there's yet one more thing on the frontend, SQLite. okay, there's a sync thing too. it's not like we didn't try it with that adorable DB (RethinkDB).



> the real UX goal is to minimize data loss

Can't say I've been asked to build optimistic updates for that reason, but the irony there is pretty rich if that's the UX goal you've seen it used for.

In my experience optimistic rendering actually creates more data loss risk than it solves. Caching layers are involved, state is duplicated, and routing comes into question as users could leave the page before a failure state is returned.



No, I mean in general, that's the hard problem. The main goal is that the user wants to do something, but there's an instrumental goal that the system has to be resilient to typical real world problems, otherwise the users will have a bad time.

For example the user wants to interact with the service, order something, check for new messages or send one.

On the happy path things are trivial, easy, synchronous, even quite resilient. There's not much complexity to manage on "no new message" or "add to cart, yes, buy with saved card, ship to saved address, k thx bye", if the browser or server crashes, or there's a network error, the interaction is short, easy to retry, problems are not hidden (quasi instant feedback)

But as the total interaction time increases, as the state the user would need to reproduce grows (adding a new card, shipping/billing address, adding a lot of stuff into the cart, writing a long message), the need for partial state sync arises.



gmail makes you wait when you are deleting spam (ie. selecting ~100 messages and clicking "delete permanently", or whatever it's called, and it's surprisingly slow)


ElectricSQL has made massive strides towards solving this fwiw. Write to SQLite in the client, with guaranteed sync to Postgres.

See: https://news.ycombinator.com/item?id=37584049



I've found that TanStack Query [0] solved all of my problems around data fetching for the front end. It supports many different frameworks and supports plenty of advanced features.

SQLSync also looks very interesting. I'll try it out in my next project!

[0]: https://tanstack.com/query/latest/



So frontend and backend don't really mean a whole lot. You build apps to do some work with a specific deployment in mind and use the tools available to make it performant?


Many times the thought “what if we just shipped the database to the client” has crossed my mind in large multi tenant apps where individual datasets were relatively small. I’ve never gone far with it as it seems sufficiently outside the norm to be a cursed architectural pattern. Would be nice to find out I was wrong.


I once did this with a calorie counting app. Even with hundreds of thousands of foods in the database, the app took much less space than most media apps or games.


I'm also interested to find out if it's cursed :) So far it's been a lot better than I expected. Apps like https://sqlsync-todo.pages.dev are trivialised with this pattern.

Tons of work to do in order to really prove it out though. But I'm pretty excited to keep pushing and see where this goes.



The correct answer is to move the UI back to the server, where the database already is, and just send html to the client side html renderer (or "web browser"). This whole post is just "front end has gone off the rails, over the cliff, and into the ocean."


It depends.


This seems to be one of those problems that entirely disappears by ditching SPAs.

Using solutions from the Hotwire or htmx family would mean that a query is just a server query - making those fast is a better-understood problem.



This isn't a problem of only websites. Should mobile and desktop ecosystems start making a big move for thin-client like the browser? Should a simple app like Apple Reminders or Google Tasks have the GUI pause if there are delays or connection issues?


> Should a simple app like Apple Reminders or Google Tasks have the GUI pause if there are delays or connection issues?

Yes, they should, because I _need_ good feedback for connection issues.

I have not used those two specific apps, but for other "online first" apps it's such a common problem. Open an app, type a note, switch back to a different app (or turn off your phone, or close laptop lid).

Later on, you want to access the note and it's not there. Why? The authors decided to be "smart" and "offline first" and made sync happen in the background, with no feedback, or with heavily delayed feedback... and I was walking out of wifi range when I was typing the note, so the connection was spotty.

The OP's demo TODO app has exactly the same problem - no indication when the data has been uploaded. So please, if your real goal is collaboration, let the user know when you cannot perform it; don't sweep it under the rug hoping things will get uploaded eventually.



This is such solid feedback - and a frustration I have with offline first apps. In particular Obsidian sync has this problem which is made worse because it doesn't seem to want to sync in the background. (other than that I love Obsidian - just would be nice to get some more feedback)

In regard to the TODO demo - I should totally add that. Thanks :) https://github.com/orbitinghail/sqlsync/issues/31



I think my preference is that it works instantly, but with some sort of indicator that syncing is in progress. Users should be able to see when it is safe to assume state is consistent with what is on the server.


And thus we get back to the need for Exchange Active sync icon from the Pocket PC era! I have finally lived long enough to see a full tech cycle go from new hotness to duh everyone does it this way to a good if boring option to that's so lame and worse, legacy, to what's that?


"design-by-PM"


Agree, it's dealing with distributed systems all over and handling a very limited sandbox on the client side.

I think my sibling comments are completely underestimating the complexity and the market value here. Apple Reminders or Google Tasks are customer facing applications for single individuals, nobody but a single client is working at a time on their own data. Businesses on the other hand want their data at a central location and personnel close to the tasks using client applications. They don't want any reliability issues in the client-server-communication blocking the work being done. Heck, businesses want to be able to throw more people at the same tasks and we call it collaborative editing.



Read-only access for coarse-grained pages (as opposed to building a fine-grained ad-hoc DB) seems something reasonable (and easy) to cache for any kind of frontend.

That would allow offline viewing to a variety of apps, regardless of approach.

Last time I checked, Google Docs doesn't allow editing files offline by default, which hints at how hard it is to support substantial features beyond mere reading.



If I'm not mistaken, Google Docs has allowed offline editing for a very long time? You might have to enable an extension that comes default in Chrome. For offline-capable rich text collab there's also Notion or Apple Notes.


That could just hint at complex legacy code that painted Docs into a corner and prevented them from easily supporting that feature without a full rewrite. No doubt the problem is challenging but just because Google didn't do it for docs does not mean it's necessarily some Herculean feat.


The authorization story in letting the client upload a new version of the application database (after a drop in connectivity) sounds like a total nightmare.

I just don't think there are "embarrassingly client-side, but also needs a server for some reason" web apps that would benefit from this in the real world. Even Google's or Apple's version of the simple Todo app has a lot of (useful) integrations, which means having some level of trust with the client.



Recently gave htmx a spin. It is absolutely bananas how much complexity it removes and how much more productive you become as a result. The fact that you can use whatever stack you want is also such a blessing. I tried it with ocaml + web components and it’s a 10/10 super productive experience. Only need one build tool that compiles faster than I can blink, no wiring needed between frontend and backend to map json, it is just insanely productive.


I took a close look at htmx, and my impression is that the problems it addresses can be resolved with even fewer than its 4000 lines of JS [1], and without having to familiarize myself with its many opinionated approaches. The crux of the matter is to generate HTML on the server, as much as possible. I know how to achieve that without htmx.

The marketing materials for htmx are also a bit off-putting to me, and the authors seem to be enjoying fueling the flames of old and useless holy wars about what REST and hypermedia "actually" are, and how we all need to get back to basics, and remember to thank Ted Nelson before every meal, etc. Their online book spends something like 30 pages explaining what hypermedia is... [2]. I prefer not to sign up for a religion when I choose a JS framework (honestly, that's a bit hard these days :-/).

--

1: https://github.com/bigskysoftware/htmx/blob/master/dist/htmx...

2: https://hypermedia.systems/hypermedia-reintroduction/



i don't think htmx is very opinionated: it generalizes hypermedia controls in HTML and that's about it. No strong opinions on how you use it beyond that, can work reasonably well w/ any back-end that produces HTML. The goal is to expand the set of UI use cases that can be cleanly expressed in HTML w/o resorting to (user) scripting.

Yeah, htmx is 3800 LoC, and you could do an 80/20 version of it for a lot less, but there are a lot of details to get right: history, collecting inputs, etc. plus, since it's a general purpose library, it has an extensive event system, extension mechanism, etc. There isn't a ton of dead weight to drop, although 2.0 should be smaller as we drop some bad ideas in 1.x.

I don't care too much about the holy wars around REST, I found them off-putting back in the day before I really understood the concept, but I am passionate about the uniform interface and communicating to developers why that was interesting and different. I do go a bit hot at times, especially at twitter, but, on the other hand, if I didn't, would you have ever heard of htmx? I try to balance that all out w/ reasonable essays and the book, which, I think, is worthwhile for most web developers to read (it's free online.)



My tone was a bit strong—my bad! I appreciate your thoughtful response. I must add I realize running an OSS project and writing a book is a massive effort. Big respect for not just sharing ideas but actually implementing them.


no problem at all, i get that htmx isn't for everyone and certainly that the shitposting can be a bit much at times. i appreciate your comment!


From where I sit, I welcome the opinionated approach. I don't need my presentation layer to be industry defining. Htmx allows me to quickly make pages in a way that has never been simple for me using other frameworks. I am primarily a backend engineer though. Maybe there is some glaring weakness that I am missing, but I have yet to find something simpler for my needs.


But, if I can be honest, solutions such as Hotwire or Livewire are not as snappy as a SPA.

I personally prefer InertiaJs [1], which is some kind of front-end router system with its state synced with the server in an "old style" fashion.

[1] https://inertiajs.com



I don’t know what SPA’s you have the pleasure of using, but most SPA’s I’m subjected to are an exercise in molasses like interactions and loading spinners.


Yeah, they are building SPAs incorrectly by relying on something like `react-query` which is waterfall rendering land. People don't truly understand the sacrifice they are making by using `react-query`. It isn't designed for large scale SPAs, it's designed for small sites that just need to fetch a little data.


I can’t help but feel, if SPA’s keep getting “built wrong”, then at some point, we ought to look at why people keep building them wrong.


I agree, most SPA's are very poorly designed. But others work so well that you don't even notice it. Random examples: Gmail, Notion.


> Well designed > gmail

I too, enjoy satire.



I think it disappears not when you move away from SPAs, but when you move away from the high-interactivity features that products which are amazing to use often have.

This especially becomes true for products that are expected to work in low internet zones.



“Just don’t have a highly interactive web app.”

“Just don’t run services that require more than one VM.”

It’s a ridiculous take.



20 upvotes disagree.

SPAs are just a subset of all the JS we can possibly write - interactivity is not compromised.



Only for some class of websites with limited interactivity.

On iOS/Android, before all the WebKit apps took over, many apps would use a local SQLite database to support offline edits and implement a syncing protocol to the server. It's a lot of work, but the end product can be quite handy. (This is how you'd expect your email client to work, for example.)



One could also argue, this class of websites is what the majority of the website/webapp landscape should be.

An average SPA is, in the end, a website for most users. Take a payment form for instance - we had those 10, 20, 30 years ago, with 0.1x the effort.

Of course there's place for fat client tech, but there are 1000 websites for each 'email client' - exceptional cases often dominate the discussion.



"Limited interactivity" is misleading. It allows much broader interactivity than you think, and arguably most sites and apps would work just fine in this context.


It disappears if your customers have reliable networks, and they are either close enough to the datacenter that the database is in, or you have sufficiently smart database replication. So, often, the problem comes back, but you're synchronizing between datacenters.

Running server-side does seem to be one of the problems SQLSync wants to handle? I wonder how well it does at that compared to other ways of doing it?



Precisely an implied part of my point was, server-side caching, DB replication, CAP, etc are all relatively well-understood problems.

One can solve those without reinventing the database, as the article denounces.



These people will be blown away by server side rendering. Caches in front of API's and light weight front ends.


i don't get it, how does that solve the same problem for an interactive website?


If you want new data, you just fetch it again from the server, and the server returns inherently-fresh data, reasonably fast, along with the HTML fragments necessary for a re-render (over ajax or websockets)


i thought the whole premise of the article was that you don't want to do that, you want to cache some stuff, and instead of writing the cache stuff (a db) yourself, use a real db in your frontend.

if you wanted to just fetch data from your server, it's not a problem anyway, right? a spa can also just fetch fresh data from a server. the whole point of the frontend cache was optimising ux/latency, e.g. for apps with global users but not globally deployed servers



I'm not a HTMX believer, so excuse my potential ignorance, but as far as I know the whole principle of HTMX is to keep things so simple that things cannot go out of sync. (Mostly because HTMX is basically a set of locally cute JS snippets helping with managing the very local interaction/state of a "user clicks button -> JS disables button, sends request to backend, waits for response, puts response somewhere in the DOM, and re-enables the button (or removes it, etc)".)

everything else is simply HTML,

and yes, the article is targeting the SPA crowd



so for a typical spa, you can:

1. always refetch data. always in sync, but needs a server request, so it's slow.

2. cache some data on the client. faster, but you're "building your own database". can get out of sync (redux)

3. NEW: use SQLSync. fast, client & server stay in sync, don't have to "build your own database"

what you're describing just seems like number 1, right?



> 1. always refetch data. always in sync, but needs a server request, so it's slow

Except you're not fetching data in htmx, you're fetching hypertext. This is an important distinction. With a JS app you fetch data, integrate it into some local data structure that is then rendered into some form that generates HTML that the browser then renders. With htmx, all of those steps happen in the (faster) server, and the client only receives and renders the HTML. The client-side steps are faster.

Furthermore, apps end up structured differently because you're not querying fine-grained data.



i get that, and it's a good distinction, but it's not the same as removing a server/network request


You’re not explaining anything that an SPA can’t do and you’re missing everything that an SPA can do that htmx can’t.


How does this compare to directly using an ORM lib that supports the browser, like TypeORM [0] via SQL.js [1]?

[0] https://typeorm.io/ [1] https://typeorm.io/supported-platforms#browser



Good question! You can use an ORM with SQLSync. Currently SQLSync doesn't provide an ORM layer for two reasons: 1. there are many that exist, so it's better to integrate; 2. it's a prototype, so I started with the lowest common denominator, which is raw SQL.

SQL.js is an inspiring project in the space and led to official upstream support for compiling SQLite to Wasm. Without these projects SQLSync would have been much more difficult (akin to the original difficulty of SQL.js in the first place). That said, SQLSync also differs from SQL.js in that it includes a synchronization layer that coordinates with the SQLSync server to provide real time collaboration between users.



I'm currently writing a very similar article about "full-stack databases" which highlights the same pattern where many apps end up recreating the logic of our backend and database in the frontend client code. The solution we're promoting is to choose a database that can run on both the server and in the client and then sync between them.

The reason we aren't using SQLite for our product is because SQL is frankly not the right tool for querying data for an application. It doesn't easily map to the data-structures you want in your client code, and nearly all SQL databases have no way to subscribe to changes to a query without polling the query repeatedly.

So if you like the idea of having a complete database on your client but also want deep integration with Typescript/Javascript check out what we're building at https://github.com/aspen-cloud/triplit



Actually, Postgres provides a great way to subscribe to real time changes via the WAL. I even maintain an open source library for this:

https://github.com/cpursley/walex



SQLite does actually provide the mechanisms required to listen for changes via update hooks. It's unfortunate that many SQLite bindings don't expose that. I'm using it in a very simple way - automatically rerun the query if data in the underlying table changes. It's perhaps not as efficient as incrementally updating the results, but with how fast queries in SQLite usually are, I find that doesn't really matter.
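In sketch form, assuming a binding that exposes sqlite3_update_hook (the onUpdate method below is hypothetical; as noted, many bindings don't expose it):

  interface HookedDb {
    query<T>(sql: string): T[];
    onUpdate(cb: (table: string) => void): void;
  }

  // Rerun the whole query whenever a table it reads from changes.
  function liveQuery<T>(
    db: HookedDb,
    sql: string,
    tables: Set<string>,
    render: (rows: T[]) => void,
  ) {
    render(db.query<T>(sql)); // initial run
    db.onUpdate((table) => {
      if (tables.has(table)) render(db.query<T>(sql));
    });
  }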


I like the implications of this to a "local first" architecture


> It doesn't easily map to the data-structures you want in your client code

I disagree. Normalizing data is critical for FE reactive applications, keeping data up-to-date basically requires it; all CRUD operations are much easier to handle.



Realm Sync, Mongo + Kotlin MP will cover basically all platforms (server, web, mobile, desktop)... at a cost. Actually interested by alternatives. Will this be part of your article?


postgres has some capability to do that, but does need a server.


Yeah, you can subscribe to overall changes to the data on a row-by-row basis, but you can't subscribe to an actual query. Many apps and libraries imitate reactive queries by refetching all queries from Postgres when any data changes, or by repeatedly polling the query every 10 seconds or so, but this puts a lot of strain on the database. You can subscribe to the replication stream, but then you're left trying to reconstruct your queries in your application code, which is extremely error-prone and painful.


Could you explain in more detail the use case for subscribing to the actual queries (instead of the change events)?


Offline/local-first based on SQLite seems hot right now. Third one I’m reading about this week. And it sounds good to me!

But how does it compare to ElectricSQL[1] and PowerSync[2]?

[1] https://electric-sql.com/ [2] https://powersync.com/



Indeed it's a very hot space! So exciting to see all the different approaches.

ElectricSQL and PowerSync are both tackling the very hard problem of partial replication. The idea is to build a general solution which allows a traditional centralized db to bidirectionally sync only what's needed on the client side - while still supporting optimistic mutations (and all the consistency/conflict stuff that goes along with that).

The downside is implementation complexity. Both require the ability to keep track of precisely the set of data on each client in order to push out changes to only that subset of the overall database. In addition, specifying which subsets of the database state to pull down requires a new DSL and is a new thing to learn (and optimize). That said, I'm stoked they are taking on this extremely hard problem so when SQLSync is ready for partial replication someone will have already figured out the best practices.

SQLSync, on the other hand, only supports full db sync. So every client will see a consistent view of the entire database. You might immediately wonder if this is a good idea - and for some apps, it's not. But consider a personal finance app. The main goal is cross device sync, cloud backup, offline capable, etc. In this case having the entire db stored on every device is probably what you want. Another example is a document oriented data model, such as Airtable. Each Airtable could be a distinct database, thus leaving it up to the client to manage which tables they care about.

(added in edit:) By focusing on full db sync, the sync engine is much simpler than solutions that support partial replication. One benefit of this is that the backend is very lightweight. Currently the demo (https://sqlsync-todo.pages.dev) runs entirely within Cloudflare Durable Objects using very little storage and CPU time.

SQLSync has a ton of work to do to make these use cases possible (still very much a prototype), but my initial tests have been extremely promising. Hope this helps!

(edit: clarified language regarding centralized dbs and full db sync. Also added paragraph regarding full db sync)



Phillip from PowerSync here, always good to see more people working on problems in this space.

A few things to clarify:

>one multi-tenant centralized db to bidirectionally sync

PowerSync supports syncing from multiple databases.

>The downside is complexity.

I'd say this is true if you're building a partial replication system yourself. PowerSync gives you a ready-built system that's been proven at scale and therefore lets you avoid most of that complexity.

>SQLSync, on the other hand, is full db sync.

It's just as easy to sync the full db with PowerSync as do partial sync.

Edit: formatting



Thanks for the clarifying points Phillip. I'm a big fan of PowerSync! Exciting to see you guys go after the partial replication problem.

I've adjusted my comment to be more clear and hopefully more fair. I didn't mean to mis-imply anything about your service.



No worries Carl, cheers!


>But consider a personal finance app. The main goal is cross device sync, cloud backup, offline capable, etc. In this case having the entire db stored on every device is probably what you want.

A bit confused by this. If I'm a developer of a PFM, I don't want anything but a single user's financial data synced to their device. This sounds like partial replication to me.



Precisely. In the SQLSync model - every user would have a private database just for their data. For example, this is how the todo list demo works: https://sqlsync-todo.pages.dev

(Note: currently SQLSync's server tier doesn't support auth, just random 128bit ids. Auth will come as it matures - but don't use this for anything super secure at the moment).



Is this supposed to run on the server? If so, how does it solve the frontend side of things? I'm just trying to understand.


I sometimes see interesting links on HN but when I click on them and skim through, I still have no idea what exactly it does. This is one of them.


It’s not exactly clear in the article, but there is a client part: https://github.com/orbitinghail/sqlsync/blob/main/GUIDE.md

> Step 2: Install and configure the React library



Not involved with the project, but: this is a database which runs client-side and syncs with a database on the server


Precisely! The same database (SQLite) is running on both the client and the server. SQLSync provides a custom storage layer to SQLite that keeps everything in sync. As changes are made locally (optimistically) they are synced into the cloud where they are eventually applied to the primary db replica. The clients subscribe to changes from the primary db and then rebase themselves to stay in sync.

I really need to write up a detailed overview of how this works! Thanks for the feedback!



My frontend db would look a lot different from the backend's. A lot of mutations involve submitting work and waiting for distributed jobs to roll up into some kind of partial answer: this worked, that part didn't, etc. Long-running transactions, workflows that span months until the final sign-off.

I do need better in-the-moment state in the client, though. I was looking at React Query, perhaps with websockets for cache invalidation. It's nice to see this SQLSync idea as another option to consider.



This is solid feedback. One integration I'm stoked to build/find is some kind of durable job queue that can sync directly with SQLSync. Would be so nice to just commit a job request into the db and know it will eventually run and update the state. If anyone wants to experiment with this let me know! :)


Relational databases are mathematically elegant and useful in certain ways for arbitrary data retrieval and manipulation, but I don't see why they would be the best fit browser-side for any but the absolute largest and most complicated front-end systems.

Is the typical in-browser code really complex enough that one would want to introduce the object-relational mismatch and all of its associated issues?

Most in-browser code wants to operate on a limited set of very well-defined tree data structures, which is a bad fit for relational databases because relational databases are designed to represent a generic graph that can be virtualized into a tree arbitrarily and as necessary.



I'm trying to achieve something similar with SignalDB: https://signaldb.js.org/ It uses signals for reactivity and is framework-agnostic, with a MongoDB-like query syntax.


Funny how the first two submissions didn't catch on here. I guess the titles weren't catchy enough.

https://news.ycombinator.com/from?site=sqlsync.dev



It can also just be pure chance. Sometimes there is no pattern.


I am getting the impression that this might work for small data sets. Does it work for large data sets? My webapp has to work with 100s of GBs of data in MS SQL Server.

Admittedly, I haven't yet read the linked article. But, I plan to.



Currently, SQLSync copies the entire SQLite database to every client. So in order to support large datasets you would need to split up the data over many databases. Some applications can do this, some can't. It depends on the specific business and usage requirements.

Perhaps in the future, SQLSync will support partial replication which would enable this use case. As always there are trade-offs to consider with both approaches.



How much data does a single user see at a time?


as a frontend dev that went fullstack, i had a similar path too: https://dx.tips/oops-database


swyx! I can't believe I missed this post. This is gold. Thanks for sharing!


haha yours is better. nice work and keep up the good fight. am in sf if you ever want to get coffee


So this person is building meteor.js minimongo, but for sqlite. Awesome!


I misinterpreted the title: I clicked because I thought it was literally about how we should stop building and designing more actual databases, which is something I agree with and is perhaps a hotter take than the one in the article.

There are hundreds of subtly different databases out there, but only a dozen or two that really ought to exist. So stop making new databases for every little thing, folks. They are mostly subtle variations on the ones we already have.



Can someone explain to me how it's syncing state between two different devices without any activity in the Network tab in DevTools, not even WS traffic?

I get that you can sync state between browser tabs, but I'm trying on two different devices (iPhone and Desktop).

And as far as I can tell, the Wasm layer can't perform network requests directly.

UPDATE: In the console tab I can see coordinatorUrl: 'wss://sqlsync.orbitinghail.workers.dev', but I was expecting to see this WebSocket connection in the Network tab, and it isn't there.



Good catch! SQLSync runs in a shared worker to enable cross-tab reactivity and centralise both local storage and the replication system. You can inspect the worker at the magic url: chrome://inspect/#workers
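
If you haven't used shared workers before, the general shape is below. This is a generic pattern sketch, not SQLSync's actual code; `render` and the message shapes are invented for illustration:

  // tab.ts -- every tab connects to the same worker instance, so local
  // storage and the replication socket live in exactly one place.
  declare function render(rows: unknown): void; // hypothetical UI update

  const worker = new SharedWorker("sync-worker.js");
  worker.port.onmessage = (e: MessageEvent) => render(e.data);
  worker.port.postMessage({ tag: "subscribe", sql: "SELECT * FROM tasks" });

  // sync-worker.ts -- a single instance shared by every tab of the origin.
  const ports: MessagePort[] = [];
  (self as unknown as SharedWorkerGlobalScope).onconnect = (e) => {
    const port = e.ports[0];
    ports.push(port);
    port.onmessage = () => {
      // Run the query against the one local db, reply on `port`, and
      // broadcast invalidations to every other tab via `ports`.
    };
  };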


Websockets tracking in the browser can be weird. Try refreshing the page while you have WS selected in the Network tab


I did that!


I also built a similar serverless project but with a focus on no-code.

I built a support chat app with it (which we use ourselves) using only declarative HTML components. Supports both public chat and private chat with authentication.

It's fully working now, but I'm currently focusing on walkthroughs/guides.

https://saasufy.com/



Reminds me of an observation that any sufficiently large C / C++ program ends up writing its own garbage collector


This looks interesting and I might give it a try, but after watching the talk I'm still a bit unclear on why you chose wasm-in-wasm for the reducer.

I suspect you would be better off by creating a rust reducer trait, and lifting that wasm-in-wasm complexity into a new crate implementing the reducer trait through wasm-in-wasm for the people who want that.

But maybe I'm missing something.



Totally fair question. Nothing is set in stone - but this approach made it very easy for me to run precisely the same code on both a generic backend (CloudFlare Durable Objects) and in the frontend stack.

As for wasm-in-wasm specifically. It would be nice to experiment with the component model to load reducers alongside SQLSync - but the UX isn't quite there yet.



Sorry for perhaps reducing what SQLSync does to an overly simplistic description, but is it effectively applying the CRDT pattern on top of the client's (SQLSync/SQLite) and server's (some OLTP) individual operation logs?

edit: I mean rebase, not CRDT



Good question. I see you edited your post, but I'll be clear: SQLSync doesn't use CRDTs in its current sync architecture.

Basically, each client maintains its own log of mutations. These mutations are applied optimistically to the local instance of SQLite, and in parallel we replicate the log to the server.

On the server side, it reads from all of the individual client logs in a deterministic order and applies the mutations to its own SQLite database. Under the hood SQLSync hijacks all of the writes to storage and organises them into a format that's easy to replicate.

Finally, the storage log from the server is replicated back down to each of the clients. This leaves the clients in a weird position, as they have two versions of the database which may have diverged (probably). So to finish this up, the clients throw away their local changes and reset to the server state (i.e. git reset --hard). Then, since there may be some mutations that were executed on the client but not yet run on the server, the client simply re-runs those mutations (i.e. git rebase).

In this way the system continuously keeps itself in sync with the server and other clients.
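
In code, the client-side loop is roughly the following. Every name here is invented for illustration; this is not SQLSync's actual API:

  type Db = unknown;
  interface Mutation { tag: string; args?: unknown }

  declare function reducer(db: Db, m: Mutation): void;  // applies one mutation
  declare function sendToServer(m: Mutation): void;     // replicate the log up
  declare function setLocalDb(db: Db): void;            // swap in new state

  let pendingLog: Mutation[] = [];  // mutations not yet acked by the server

  // Optimistic local write: apply immediately, remember it, replicate it.
  function mutate(db: Db, m: Mutation) {
    reducer(db, m);
    pendingLog.push(m);
    sendToServer(m);
  }

  // Server pushed down a new authoritative snapshot plus how many of our
  // mutations it has already applied.
  function onServerState(serverSnapshot: Db, ackedCount: number) {
    pendingLog = pendingLog.slice(ackedCount);  // drop what the server ran
    for (const m of pendingLog) {
      reducer(serverSnapshot, m);               // "git rebase" the remainder
    }
    setLocalDb(serverSnapshot);                 // "git reset --hard" complete
  }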

Conflicts are handled by logic in the reducer which is able to inspect the state of the database to figure out what to do. Yes, this does require writing the reducer code carefully - however in testing I've found this not to be too bad because:

1. A lot of SQL operations automatically converge pretty nicely (and you probably already have to think about them converging in your existing REST APIs or whatever backend API you are writing). Think about patterns like `insert...on conflict do...`, for example (see the sketch after this list).

2. Since the reducer is isolated and comes with a type description of all possible mutations, it's very easy to unit test the reducer with different orderings. Basically think of it as deterministic simulation testing for your API. Something that's pretty hard to do with normal backend architectures without a lot of mocking and architecture.
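
Here's the sketch referenced in point 1: a hypothetical reducer snippet (`db.execute` stands in for whatever SQL binding the reducer uses) showing the kind of mutation that converges however the server interleaves it with other clients' mutations:

  interface SqlDb { execute(sql: string, params: unknown[]): void }

  function applySetTitle(db: SqlDb, m: { id: string; title: string }) {
    db.execute(
      `INSERT INTO tasks (id, title) VALUES (?, ?)
         ON CONFLICT (id) DO UPDATE SET title = excluded.title`,
      [m.id, m.title]
    );
  }
  // Replaying this after a reset-and-rebase always leaves the row with the
  // last writer's title, so clients converge on the same state.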

Hopefully that helps!



One other piece of prior art is Lotus Notes, in which the database was stored locally and synced in the background. https://en.wikipedia.org/wiki/HCL_Notes


Unless I misunderstood, feels like I’ve been doing this with Ember Data since ~2013.

https://guides.emberjs.com/release/models/

There’s also https://orbitjs.com/



I think you have misunderstood?

The article is responding to the pattern of yet another custom data model and custom data API (à la Ember).

Instead, provide an SQL database (the well-proven SQLite) within the front end and use SQL to interact with it, syncing data to and from the backend DB.

Which one could then slap a model or ORM layer on top of, should that be one's bent.

It isn't clear how they manage the subscription to data updates/inserts/deletions - it mentions supporting triggers, but that feels icky to me.



First, thanks!

> It isn't clear how they manage the subscription to data updates/inserts/deletions - it mentions supporting triggers, but that feels icky to me.

Architecture post coming soon. In the meantime, I want to clarify that SQLSync does not use triggers for sync. Instead, I hijack SQLite's page storage and add page replication to it. Writes are consolidated through an API I call the "reducer", which allows SQLSync to keep track of which logical writes correspond to which sets of page changes. The actual sync is pretty dumb: we run the reducer on both the client and the server. The client replicates down server pages, and then periodically throws out local changes, resets to the server state, and then replays any mutations that haven't yet been acked on the server.



You say things like "X is pretty dumb" and then go on saying stuff I don't understand. Pretty annoying if you ask me.

And that's despite me having worked with Cassandra, Kafka, Postgres and a variety of programming languages, DevOps tools, having worked with Vuejs and React.



Could you clarify which of:

- page storage

- reducers

- replaying mutations

- acks

you're unclear on?



In that context, I don't understand the word reducer.


The reducer takes objects representing mutations and applies the appropriate changes to the state.

It's called a reducer because it operates like a reduce/fold operation, in that if you take a given state and an ordered list of mutations, you'll end up with a new state out the other side, much like 'let newState = list.reduce(reducer, initialState)' would in JS.

The reducer model is how Elm's state management works, and Redux's, and is what allows replaying of subsequent changes against a previous state to get to a current state (which enables, amongst other things, some of the features of those systems' respective dev tools).

The article links to the (nearly trivial) reducer for the todo example, which is on github here: https://github.com/orbitinghail/sqlsync/blob/ba762ce2a10afbb...
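
A toy version in TypeScript, for readers who want it spelled out (the real reducer linked above is written in Rust and mutates SQLite rather than a plain array):

  interface Task { id: string; description: string; done: boolean }

  type Mutation =
    | { tag: "AddTask"; id: string; description: string }
    | { tag: "ToggleCompleted"; id: string };

  function reducer(state: Task[], m: Mutation): Task[] {
    switch (m.tag) {
      case "AddTask":
        return [...state, { id: m.id, description: m.description, done: false }];
      case "ToggleCompleted":
        return state.map((t) => (t.id === m.id ? { ...t, done: !t.done } : t));
    }
  }

  // Folding an ordered mutation log over an initial state gives the new state:
  // const newState = mutations.reduce(reducer, initialState);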



Ember.js has contributed more innovations to modern framework ideas than any other framework, really.

EmberData, Ember Routing, Ember Multi Applications (can't remember what it's called, but it's a precursor to microfrontends), all in one CLI tooling, etc.

I could never understand what holds Ember back from being more used. I think it used to be performance but I think they addressed that with Glimmer many years ago.



Ember Engines :)

It's not being used more for the same reason Ruby on Rails is not used more.

Both are batteries-included frameworks, with a much steeper learning curve than, e.g., a pile of small libraries wired together.

Plus people like to tinker and wire up things together themselves, like router, rendering, state management, testing, etc. That's more exciting than `ember new project`.

I was like that 10 years ago too, but now when I learned the ropes I just want to focus on shipping and providing value, rather than wasting my life wiring up 17 javascript packages of the month together.



Having embedded SQLite in a programming tool in the browser myself, I found this very interesting. I am always happy when these ideas become more mainstream!


In other words, "let me show you how to turn your super cool project into a CRUD app".


I used CouchDB (on the server, with TouchDB on Android and iOS, PouchDB on the web, ...) for this kind of thing. Clients were directly connected to cursors over the local db. How and when that local db exchanged data with the server or others was no longer the client's problem :)


Anyone remember when "frontend applications" were actual applications and not web pages? I'm willing to bet we have reached that point where new devs literally do not remember that time.

  There comes a time [..] where we [..] need to cache data from an API. It might start off benign – storing a previous page of data for that instant back button experience, implementing a bit of undo logic, or merging some state from different API requests.
The browser is annoying to control, so you're trying to make javascript jump through hoops to do what you want...

  SQLSync is [..] designed to synchronize web application state between users, devices, and the edge.
So you want your stateless application to be stateful, to sync that state with some other state, so you can cache data easier, so you can make the browser pretend it's not a browser.

Can we not just admit the browser is a shitty client app? I get that like 50% of the software developers in the world have made the browser their literal livelihood and career, but jesus christ, we need to end this madness. Either make a full blown VM and just write normal client applications that run in it, or stop trying to make client applications out of javascript.

It's insane how much of the world's engineering effort and technology investment money has gone into putting lipstick on a pig. If we took all the money invested in browser ecosystems and put it towards spaceflight, we'd be living on Mars right now.



History doesn't repeat itself, but it does rhyme. Once upon a dark age, we had mainframes and dumb terminals. Then came the first age of the PC - let's pull everything to the client. Then came servers and "thin clients". With faster processors and cheaper storage came the second age of the PC, with only the permanent data storage left on the server. As the Internet grew, centralization came back: web services and the cloud, with clients just serving dumb web pages.

And now we see the beginning of a migration back to client-side computation and storage.

Somehow, though, this latest iteration doesn't make a lot of sense. It's hard enough maintaining data consistency on a web service that may be used by hundreds or thousands of people. Imagine when this data is cached in microdatabases in unreliable browsers.

On top of that, the browser makes an absolutely horrible programming environment for client-side apps. For the programming part, Javascript is a poor language, so you wind up using heavy-duty frameworks like React to make it tolerable. For the UI representation, that's just not what HTML/CSS were ever meant for. So you get frameworks there as well. Young developers think this is just the way it is. No actually, it's more like the worst of all possible worlds. Using something like JavaFX or (I know, I know) even Visual Basic, you can produce a functional, robust UI with a tiny fraction of the effort.



Years and years ago on a C++ forum, someone made an observation eerily similar to yours. I still remember it to this day; it stuck in my head.

They made an observation that our industry goes in cyclical centralize/de-centralize cycles and that we were (at the time) entering into a centralization cycle.

Now here I am reading a comment that we're going back into a de-centralization cycle and I wouldn't be surprised if you're the same poster.

Probably 15-20 years ago (maybe more?) I made a prediction that I still think will come true.

The OS will become the "browser" and applications will run directly on the OS and will access local resources through standardized interfaces. WebAssembly and things like WebGL are already moving us in that direction. Honestly HTML5 was the first time I recognized standard updates as moving us towards that reality with things like localStorage, etc.

I honestly think if someone more imaginative had been at the helm at MS when the cloud started getting big, they would have eaten Google's lunch by leveraging their desktop dominance into the web. Instead they did inane things like displaying websites on the desktop (Win98 IIRC).



Could be - I've been making this observation for a long time. The cycles keep going. On the other hand, probably lots of other people have commented on it as well...

You may be right about the browser becoming the OS. Chromebooks were already a step in that direction. But JS/HTML/CSS really is a horrible combination for application programming. If the browser does become the OS, can we please get decent technology to work with?



HTML and CSS aren't too horrible as foundational tech. CSS needs a little work, but both allow for a lot of good abstractions. They may be some of the best platforms we've ever had.

They are bad at just being directly programmable targets.



I expect we'll get back to native applications and move away from js/html/css.


HTML and CSS are probably the best, easiest, and most effective UI toolkits ever. Name one that's better.


> make the browser a full blown VM and just write normal programs that run in it

This is actually happening, albeit slowly, with recent efforts around WASM etc. If you want a fun hypothetical of where this all goes, check out the talk "The Birth & Death of JavaScript". Link here: https://www.destroyallsoftware.com/talks/the-birth-and-death...



You’re not wrong, but the web won because it had a superior delivery system: URLs. Everything weird about the web era of development has been about contorting everything to be URL-oriented.

But consider how WASM is now turning the browser into an app delivery client. Not a “html and json bodged into an app”, but a real honest to god app.

This project happens to be browser based because that’s convenient place to put a WASM app, and it has a decent presentation layer. But it doesn’t have to be!



Not just delivery, but also security. Browsers offer a level of isolation and safety that you generally don't get with native desktop apps. Things like iOS do bridge the gap a bit more though


> Browsers offer a level of isolation and safety that you generally don't get with native desktop apps.

They didn't originally: Java and ActiveX originally weren't sandboxed and had free run of the visitor's computer.

All major OSes now have built-in support for process/app sandboxing. I suppose if the "rich client" frontend model (WPF, etc.) had been more popular, desktop OS application isolation would have been introduced much sooner.

Security development happens where the market demands it, and rarely does it happen where it's actually needed.



I can't speak for ActiveX since I avoided IE like the plague, but Java applets were sandboxed. Just that the sandbox had a lot of holes.




They don't, though. Browsers are almost trivial to exploit.

There have been seven (7) 0day exploits in Chrome this year (that we know of). Know how many CVEs there were for Chrome in total in 2023? Two-hundred and forty (240). That's impressive. And this is the browser people brag about as being secure.



> Anyone remember when "frontend applications" were actual applications and not web pages?

I do. And also I remember building those apps. It was not as simple as building webapps today.

Yes, there are downsides to this model (a lot of them) BUT you can whip up a simple app with a beautiful UI in a couple of hours today. It was not like that 25 years ago.



> It's insane how much of the world's engineering effort and technology investment money has gone into putting lipstick on a pig.

I'm a JavaScript developer and yes, this is deeply disturbing. Even more so after I built a framework [1] that just copycats what PHP/Rails does (treat the server like a server, and the client/browser like a client—no SPA spaghetti). It works, it's fast, and makes development 100x easier.

I applied the same heuristic to deployments, eschewing all of the containers dogma and lo and behold, you can achieve great performance (and iterative speed) with old school, minimal tech (bare metal/VPS and just install deps and code directly on the machine).

[1] https://github.com/cheatcode/joystick



I completely agree with you, I'm going to copy part of another comment I made

-----

Probably 15-20 years ago (maybe more?) I made a prediction that I still think will come true.

The OS will become the "browser" and applications will run directly on the OS and will access local resources through standardized interfaces. WebAssembly and things like WebGL are already moving us in that direction. Honestly HTML5 was the first time I recognized standard updates as moving us towards that reality with things like localStorage, etc.

I honestly think if someone more imaginative had been at the helm at MS when the cloud started getting big, they would have eaten Google's lunch by leveraging their desktop dominance into the web. Instead they did inane things like displaying websites on the desktop (Win98 IIRC).

-----



What I find annoying is the still-existing problem that sometimes apps don't load properly and you have to refresh the browser. You don't get this with desktop apps. There are some caching capabilities in browsers but they are not being used by anyone to cache app code and resources. If I'm using an app for the first time it should properly load all code and resources or else report an error.


I totally agree! I would love a VM designed for development simplicity & performance that is built with a standard protocol & interface for accessing arbitrary VM applications over the net.


I won't forget that someone at Google didn't have the courage to enable Dart in Chrome as a successor to Javascript. And someone killed SQLite as an in-browser db.


Does it need to download the whole database on startup, or can it sync only what the client queried?


Currently it's full db sync. Partial replication is in research.


This is very exciting; I really love the LoFi (Local first) wave.

The need to write the reducer in Rust for now is a big bummer. Rust is cool, but JS is easier to get started with quickly.

ElectricSQL is an alternative which is cool too.



Yea, 100% agree. I'm starting to figure out what that looks like in this issue: https://github.com/orbitinghail/sqlsync/issues/19


> LoFi (Local first)

Are we set on this abbreviation? Cause it's super confusing.



I agree it's super confusing actually, lol. I've seen people using it and it's cool, but a non-overloaded one would be better!


Off-topic, but were these diagrams created by hand, or is there an app that will make hand-drawn-looking diagrams like this?


I don't know what program the author used, but I'd be surprised if he did them by hand. Excalidraw (https://excalidraw.com/) will give a very similar feel.


Yup, I used excalidraw!


Reminds me of the observation that for a sufficiently complex C program, one starts to build their own garbage collector.


Coming from the backend world, I've only done small frontends that don't get this complicated. Assuming there are legit reasons for a FE to have complex state, a relational database is probably one of the first things you need. It's almost an automatic decision on a backend to have one, same should apply here. Using SQLite instead of Redux sounds reasonable.


> Using SQLite instead of Redux sounds reasonable.

The trick is reactivity. `redux` handles surgically updating the view based on state changes. It's not enough to have SQLite; you also need reactivity. Further, applying strict types on top of SQLite/SQL is another big challenge.



Oh right, you'd need pubsub to make this efficient.

