Reading Zanzibar

Original link: https://macwright.com/2025/05/02/reading-zanzibar

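Zanzibar is Google’s authorization system, offering a flexible and scalable approach to access control. At its core, it uses “tuples” to define relationships between objects, users, and roles, letting developers build fine-grained authorization rules. The access control model is well defined, but the implementation is specific to Google: it relies on Spanner and TrueTime for global distribution and strong consistency, which are hard to replicate outside Google’s infrastructure. This matters especially for “zookies”, tokens used to avoid stale data in API calls. Several companies have built Zanzibar-inspired authorization services, such as Permify, AuthZed/SpiceDB, and Warrant, all of which have zookie-like implementations. Others, such as OpenFGA and Ory Keto, focus only on the API surface, and some pay less attention to scalability and consistency. For small applications, the full complexity of Zanzibar may be overkill, and a simpler library-based solution or a hosted authorization service may be more practical. The key point is that Zanzibar offers a powerful authorization model, but its implementation details are tailored to Google’s unique scale and infrastructure.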

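This Hacker News thread discusses the challenges and trade-offs of implementing Zanzibar (a permission management system) in a distributed microservice architecture. One commenter, whs, describes an earlier attempt to use Zanzibar as an API gateway that queried permission data from individual services rather than storing it centrally. The approach aimed at point-in-time permission verification, but it introduced complications such as requiring services to answer accurate historical permission queries and managing cache invalidation. In the end, whs considered it the wrong approach because of these difficulties. Another commenter, jauntywundrkind, highlights advances in timekeeping technology, arguing that relying on expensive atomic clocks and GPS for precisely ordered timestamps is becoming less of a barrier. They point to more affordable GPS-based timing boards and chip-scale atomic clocks, and question whether most applications really need extreme precision. They also express interest in Zanzibar’s use of skip lists.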

Original text

Google published Zanzibar: Google’s Consistent, Global Authorization System in 2019. It describes a system for authorization – enforcing who can do what – which maxes out both flexibility and scalability. Google has lots of different apps that rely on Zanzibar, and bigger scale than practically any other company, so it needed Zanzibar.

The Zanzibar paper made quite a stir. There are at least four companies that advertise products as being inspired by or based on Zanzibar. It says a lot for everyone to loudly reference this paper on homepages and marketing materials: companies aren’t advertising their own innovation as much as simply saying they’re following the gospel.

A short list of companies & OSS products I found:

I read the paper and have a few notes, but the Google Zanzibar Paper, annotated by AuthZed, is the same thing from a real domain expert (albeit one who works for one of these companies), so read that too, or instead.

Features

My brief summary is that the Zanzibar paper describes the features of the system succinctly, and those features are really appealing. They’ve figured out a few primitives from which developers can build really flexible authorization rules for almost any kind of application. They avoid making assumptions about ID formats, or any particular relations, or how groups are set up. It’s abstract and beautiful.

The gist of the system is:

Objects: things in your data model, like documents
Users: needs no explanation
Namespaces: for isolating applications
Usersets: groups of users
Userset rewrite rules: allow usersets to inherit from each other or have other kinds of set relationships
Tuples, which are like (object)#(relation)@(user), and are sort of the core ‘rule’ construct for saying who can access what
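
To make the tuple shape concrete, here is a minimal Python sketch that parses the (object)#(relation)@(user) notation. It assumes object names take the namespace:object_id form used in the paper's examples, and that the user part may itself be a userset such as group:eng#member. RelationTuple and parse_tuple are hypothetical names for illustration, not part of any Zanzibar API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationTuple:
    namespace: str   # e.g. "doc"
    object_id: str   # e.g. "readme"
    relation: str    # e.g. "viewer"
    user: str        # a user id, or a userset like "group:eng#member"


def parse_tuple(text: str) -> RelationTuple:
    """Parse 'namespace:object_id#relation@user' into its parts."""
    # Split on the FIRST '#' and '@' so a userset on the right-hand side
    # (which contains its own '#') stays intact in the user field.
    obj, _, rest = text.partition("#")
    relation, _, user = rest.partition("@")
    namespace, _, object_id = obj.partition(":")
    if not (namespace and object_id and relation and user):
        raise ValueError(f"not a relation tuple: {text!r}")
    return RelationTuple(namespace, object_id, relation, user)


direct = parse_tuple("doc:readme#owner@10")
via_group = parse_tuple("doc:readme#viewer@group:eng#member")
print(direct)
print(via_group.user)
```

The second tuple is the interesting one: it grants viewer access to whoever is a member of group:eng, which is how groups fall out of the same primitive instead of being a separate concept.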

There’s then a neat configuration language which looks like this in an example:

name: "doc"

relation { name: "owner" }

relation {
  name: "editor"
  userset_rewrite {
    union {
      child { _this {} }
      child { computed_userset { relation: "owner" } }
} } }

relation {
  name: "viewer"
  userset_rewrite {
    union {
      child { _this {} }
      child { computed_userset { relation: "editor" } }
      child { tuple_to_userset {
        tupleset { relation: "parent" }
        computed_userset {
          object: $TUPLE_USERSET_OBJECT  # parent folder
          relation: "viewer"
      } } }
} } }
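
One way to read that configuration is as a recursive check: a user is a viewer if they are listed directly, or are an editor, or are a viewer of the parent folder. The Python sketch below hand-translates the rules into a small table and evaluates them against an in-memory set of tuples. Everything in it (the TUPLES data, the REWRITES table, the check function) is a hypothetical illustration, not Zanzibar's implementation. It applies one rule table to every namespace, ignores group usersets, and does not guard against cycles.

```python
# (object, relation, user) tuples; the data is made up for this example.
TUPLES = {
    ("doc:readme", "owner", "alice"),
    ("doc:readme", "editor", "carol"),
    ("doc:readme", "parent", "folder:docs"),
    ("folder:docs", "viewer", "bob"),
}

# Each relation maps to the children of its union, mirroring the config above.
REWRITES = {
    "owner": [("this",)],
    "editor": [("this",), ("computed_userset", "owner")],
    "viewer": [
        ("this",),
        ("computed_userset", "editor"),
        ("tuple_to_userset", "parent", "viewer"),
    ],
}


def check(obj: str, relation: str, user: str) -> bool:
    """Return True if any child of the relation's union grants access."""
    for child in REWRITES.get(relation, [("this",)]):
        kind = child[0]
        if kind == "this":
            # Directly stored tuple for this object and relation.
            if (obj, relation, user) in TUPLES:
                return True
        elif kind == "computed_userset":
            # Inherit from another relation on the same object.
            if check(obj, child[1], user):
                return True
        elif kind == "tuple_to_userset":
            # Follow e.g. the "parent" tuples, then check a relation there.
            tupleset_relation, target_relation = child[1], child[2]
            for o, r, target_obj in TUPLES:
                if o == obj and r == tupleset_relation:
                    if check(target_obj, target_relation, user):
                        return True
    return False


print(check("doc:readme", "viewer", "alice"))  # True: owner -> editor -> viewer
print(check("doc:readme", "viewer", "bob"))    # True: viewer of the parent folder
print(check("doc:readme", "editor", "bob"))    # False: folder access is view-only
```

The point of the sketch is that the rules live in data, not in queries scattered through the application, so changing who inherits from whom means editing one table.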

It’s pretty neat. At this point in the paper I was sold on Zanzibar: I could see this as being a much nicer way to represent authorization than burying it in a bunch of queries.

Specifications & Implementation details

And then the paper discusses specifications: how much scale it can handle, and how it manages consistency. This is where it becomes much more noticeably Googley.

With Google’s scale and international footprint, all of its services need to be globally distributed. So Zanzibar is a distributed system, and one that needs good consistency guarantees: it has to avoid the “new enemy” problem, ensure that nobody can access resources they shouldn’t, and give applications that rely on it a consistent view of its data.

Pages 5-11 are about this challenge, and it is a big one with a complex, high-end solution and a lot of details that are very specific to Google. Most noticeably, Zanzibar is built on Spanner, Google’s distributed database, and Spanner can order timestamps using TrueTime, which relies on atomic clocks and GPS antennae: this is not standard equipment for a server. Even CockroachDB, which is explicitly modeled on Spanner, can’t rely on having GPS and atomic clocks around, so it has to take a very different approach. But this time-accuracy idea is pretty central to Zanzibar’s zookies, which are sort of like tokens that get sent around in its API and indicate what time reference the client expects, so that a follow-up response doesn’t accidentally include stale data.
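
The zookie idea can be sketched without any special hardware if a single logical counter stands in for TrueTime timestamps, which is a large simplification and exactly the part that is hard at Google's scale. In the Python sketch below, a write returns a token, and a later check that carries the token refuses to answer from a snapshot older than it. TupleStore, Replica, and the zookie parameter are hypothetical names for illustration; real zookies are opaque tokens and this is not Zanzibar's API.

```python
import itertools


class TupleStore:
    """Append-only log of tuple writes, each stamped with a logical time."""

    def __init__(self):
        self._clock = itertools.count(1)  # stand-in for TrueTime timestamps
        self._log = []                    # (timestamp, is_delete, tuple)

    def write(self, tup, delete=False):
        ts = next(self._clock)
        self._log.append((ts, delete, tup))
        return ts  # the "zookie": a token naming this point in time

    def latest(self):
        return self._log[-1][0] if self._log else 0

    def exists_at(self, tup, snapshot):
        """Replay the log up to `snapshot` and report whether tup exists."""
        present = False
        for ts, is_delete, logged in self._log:
            if ts > snapshot:
                break
            if logged == tup:
                present = not is_delete
        return present


class Replica:
    """A lagging reader that has only applied the log up to `applied`."""

    def __init__(self, store):
        self.store = store
        self.applied = 0

    def catch_up(self):
        self.applied = self.store.latest()

    def check(self, tup, zookie=0):
        # Never answer from a snapshot older than the caller's token.
        if self.applied < zookie:
            self.catch_up()
        return self.store.exists_at(tup, self.applied)


store = TupleStore()
replica = Replica(store)
grant = ("doc:readme", "viewer", "bob")

store.write(grant)
replica.catch_up()
zookie = store.write(grant, delete=True)    # bob is removed from the doc

print(replica.check(grant))                 # True: stale answer, bob still gets in
print(replica.check(grant, zookie=zookie))  # False: the token forces a fresh view
```

The first check is the "new enemy" problem in miniature: the replica has not seen the removal yet, so bob still passes. Carrying the token with the follow-up request closes that gap.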

To achieve scalability, Zanzibar also has a multi-server architecture: there are aclservers, watchservers, and a Leopard indexing system that creates compressed, skip-list-based representations of usersets. There’s also a clever solution to the caching and hot-spot problem, in which certain objects or tuples get lots of requests all at once and overwhelm their database shard.
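
One ingredient of the paper's hot-spot handling, as far as it can be summarized here, is deduplicating concurrent requests that share a cache key, so that only one of them does the expensive work and the rest wait for its result. The Python sketch below shows that general technique (often called request coalescing) with threads. CoalescingCache is a hypothetical name, entries never expire, and none of this reflects how Zanzibar's servers are actually built.

```python
import threading
from concurrent.futures import Future


class CoalescingCache:
    """Cache where concurrent misses on one key trigger a single computation."""

    def __init__(self, compute):
        self._compute = compute   # the expensive lookup, e.g. a database read
        self._lock = threading.Lock()
        self._entries = {}        # key -> Future holding the (pending) result

    def get(self, key):
        with self._lock:
            future = self._entries.get(key)
            leader = future is None
            if leader:
                # First caller for this key: register a pending entry.
                future = self._entries[key] = Future()
        if leader:
            try:
                future.set_result(self._compute(key))
            except Exception as exc:
                # Drop the failed entry so a later call can retry.
                with self._lock:
                    self._entries.pop(key, None)
                future.set_exception(exc)
        # Followers block here until the leader has filled in the result.
        return future.result()
```

With eight threads asking for the same key at once, the wrapped function runs once and all eight callers get the same answer, which is the behavior that keeps a single hot object from hammering its database shard.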

Conclusions

Zanzibar is two things:

A flexible, relationship-based access control model
A system to provide that model to applications at enormous scale and with consistency guarantees

My impressions of these things match with AuthZed’s writeup so I’ll just quote & link them:

There seems to be a lot of confusion about Zanzibar. Some people think all relationship-based access control is “Zanzibar”. This section really brings to light that the ReBAC concepts have already been explored in depth, and that Zanzibar is really the scaling achievement of bringing those concepts to Google’s scale needs. link

And

Zookies are very clearly important to Google. They get a significant amount of attention in the paper and are called out as a critical component in the conclusion. Why then do so many of the Zanzibar-like solutions that are cropping up give them essentially no thought? link

I finished the paper having absorbed a lot of tricky ideas about how to solve the distributed-consistency problems, and if I were to describe Zanzibar, those would be a big part of the story. But maybe that’s not what people mean when they say Zanzibar, and it’s more a description of features?

I did find that Permify has a zookie-like Snap Token, AuthZed/SpiceDB has ZedTokens, and Warrant has Warrant-Tokens. Whereas OpenFGA doesn’t have anything like zookies and neither does Ory Keto. So it’s kind of mixed on whether these Zanzibar-inspired products have Zanzibar-inspired implementations, or focus more on exposing the same API surface.

For my own needs, zookies and distributed consistency to the degree described in the Zanzibar paper are overkill. There’s no way that we’d deploy a sharded five-server system for authorization when the main application is doing just fine with single-instance Postgres. I want the API surface that Zanzibar describes, but would trade some scalability for simplicity. Or use a third-party service for authorization. Ideally, I wish there was something like these products but smaller, or delivered as a library rather than a server.