达达如何实现内部引用

达达如何实现内部引用
How Dada enables internal references

原始链接: https://smallcultfollowing.com/babysteps/blog/2026/02/27/dada-internal-references/

## Dada 的基于位置的权限：摘要 Dada 是一种新的编程语言概念，旨在通过“基于位置的权限”系统改进 Rust 的借用检查器。Dada 不跟踪生命周期，而是跟踪数据从 *哪里* 借用——变量的“位置”。这简化了理解并实现了更灵活的借用。 Dada 与 Rust 的一个关键区别在于它处理移动和借用的方式。虽然 Rust 阻止在数据被借用时移动数据，但 Dada 允许这样做。这是可能的，因为 Dada 的引用不是指针，而是浅拷贝，并且类型系统通过更新变量类型来跟踪移动。例如，从变量借用然后移动该变量会更新借用以指向新位置。 Dada 默认使用引用，并使用类似于 `ref[variable] Type` 的语法来表示借用的数据。这与 Rust 的以指针为中心的方法形成对比。该系统在 `dada-model` 仓库中建模，展示了移动期间类型更新以及防止从错误位置借用的功能。作者认为类似的改进可以应用于 Rust，可能利用正在进行的工作，例如字段投影，以允许更灵活的借用而不会牺牲安全性。最终，Dada 的目标是建立一个系统，程序员可以将变量视为对象，简化内存管理，而无需深入了解底层内存表示。

Hacker News 新闻 | 过去 | 评论 | 提问 | 展示 | 招聘 | 提交登录 Dada 如何实现内部引用 (smallcultfollowing.com) 9 分，vrnvu 发表于 2 小时前 | 隐藏 | 过去 | 收藏 | 2 条评论帮助 nine_k 发表于 13 分钟前 | 下一个 [–] 这很有趣，而且某种程度上很简洁，但它的可靠性如何？生命周期基于仿射类型，仿射类型有一些经过验证的数学基础来支持其属性。这保证了无效引用的不存在，无畏并发等。基于位置的系统有什么支持，有什么形式上证明它总是有效？回复 cc-d 发表于 5 分钟前 | 上一个 [–] 同意指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请 YC | 联系搜索：

In my previous Dada blog post, I talked about how Dada enables composable sharing. Today I’m going to start diving into Dada’s permission system; permissions are Dada’s equivalent to Rust’s borrow checker.

Goal: richer, place-based permissions

Dada aims to exceed Rust’s capabilities by using place-based permissions. Dada lets you write functions and types that capture both a value and things borrowed from that value.

As a fun example, imagine you are writing some Rust code to process a comma-separated list, just looking for entries of length 5 or more:

let list: String = format!("...something big, with commas...");
let items: Vec<&str> = list
    .split(",")
    .map(|s| s.trim()) // strip whitespace
    .filter(|s| s.len() > 5)
    .collect();

One of the cool things about Rust is how this code looks a lot like some high-level language like Python or JavaScript, but in those languages the split call is going to be doing a lot of work, since it will have to allocate tons of small strings, copying out the data. But in Rust the &str values are just pointers into the original string and so split is very cheap. I love this.

On the other hand, suppose you want to package up some of those values, along with the backing string, and send them to another thread to be processed. You might think you can just make a struct like so&mldr;

struct Message {
    list: String,
    items: Vec<&str>,
    //         ----
    // goal is to hold a reference
    // to strings from list
}

&mldr;and then create the list and items and store them into it:

let list: String = format!("...something big, with commas...");
let items: Vec<&str> = /* as before */;
let message = Message { list, items };
//                      ----
//                        |
// This *moves* `list` into the struct.
// That in turn invalidates `items`, which 
// is borrowed from `list`, so there is no
// way to construct `Message`.

But as experienced Rustaceans know, this will not work. When you have borrowed data like an &str, that data cannot be moved. If you want to handle a case like this, you need to convert from &str into sending indices, owned strings, or some other solution. Argh!

Dada’s permissions use places, not lifetimes

Dada does things a bit differently. The first thing is that, when you create a reference, the resulting type names the place that the data was borrowed from, not the lifetime of the reference. So the type annotation for items would say ref[list] String^{(at least, if you wanted to write out the full details rather than leaving it to the type inferencer):}

let list: given String = "...something big, with commas..."
let items: given Vec[ref[list] String] = list
    .split(",")
    .map(_.trim()) // strip whitespace
    .filter(_.len() > 5)
    //      ------- I *think* this is the syntax I want for closures?
    //              I forget what I had in mind, it's not implemented.
    .collect()

I’ve blogged before about how I would like to redefine lifetimes in Rust to be places as I feel that a type like ref[list] String is much easier to teach and explain: instead of having to explain that a lifetime references some part of the code, or what have you, you can say that “this is a String that references the variable list”.

But what’s also cool is that named places open the door to more flexible borrows. In Dada, if you wanted to package up the list and the items, you could build a Message type like so:

class Message(
    list: String
    items: Vec[ref[self.list] String]
    //             ---------
    //   Borrowed from another field!
)

// As before:
let list: String = "...something big, with commas..."
let items: Vec[ref[list] String] = list
    .split(",")
    .map(_.strip()) // strip whitespace
    .filter(_.len() > 5)
    .collect()

// Create the message, this is the fun part!
let message = Message(list.give, items.give)

Note that last line – Message(list.give, items.give). We can create a new class and move list into it along with items, which borrows from list. Neat, right?

OK, so let’s back up and talk about how this all works.

References in Dada are the default

Let’s start with syntax. Before we tackle the Message example, I want to go back to the Character example from previous posts, because it’s a bit easier for explanatory purposes. Here is some Rust code that declares a struct Character, creates an owned copy of it, and then gets a few references into it.

struct Character {
    name: String,
    class: String,
    hp: u32,
}

let ch: Character = Character {
    name: format!("Ferris"),
    class: format!("Rustacean"),
    hp: 22
};

let p: &Character = &ch;
let q: &String = &p.name;

The Dada equivalent to this code is as follows:

class Character(
    name: String,
    klass: String,
    hp: u32,
)

let ch: Character = Character("Tzara", "Dadaist", 22)
let p: ref[ch] Character = ch
let q: ref[p] String = p.name

The first thing to note is that, in Dada, the default when you name a variable or a place is to create a reference. So let p = ch doesn’t move ch, as it would in Rust, it creates a reference to the Character stored in ch. You could also explicitly write let p = ch.ref, but that is not preferred. Similarly, let q = p.name creates a reference to the value in the field name. (If you wanted to move the character, you would write let ch2 = ch.give, not let ch2 = ch as in Rust.)

Notice that I said let p = ch “creates a reference to the Character stored in ch”. In particular, I did not say “creates a reference to ch”. That’s a subtle choice of wording, but it has big implications.

References in Dada are not pointers

The reason I wrote that let p = ch “creates a reference to the Character stored in ch” and not “creates a reference to ch” is because, in Dada, references are not pointers. Rather, they are shallow copies of the value, very much like how we saw in the previous post that a shared Character acts like an Arc<Character> but is represented as a shallow copy.

So where in Rust the following code&mldr;

let ch = Character { ... };
let p = &ch;
let q = &ch.name;

&mldr;looks like this in memory&mldr;

        # Rust memory representation

            Stack                       Heap
            ─────                       ────

┌───► ch: Character {
│ ┌───► name: String {
│ │         buffer: ───────────► "Ferris"
│ │         length: 6
│ │         capacity: 12
│ │     },
│ │     ...
│ │   }
│ │   
└──── p
  │
  └── q

in Dada, code like this

let ch = Character(...)
let p = ch
let q = ch.name

would look like so

# Dada memory representation

Stack                       Heap
─────                       ────

ch: Character {
    name: String {
            buffer: ───────┬───► "Ferris"
            length: 6      │
            capacity: 12   │
    },                     │
    ..                     │
}                          │
                           │
p: Character {             │
    name: String {         │
            buffer: ───────┤
            length: 6      │
            capacity: 12   │
    ...                    │
}                          │
    }                      │
                           │
q: String {                │
    buffer: ───────────────┘
    length: 6
    capacity: 12
}

Clearly, the Dada representation takes up more memory on the stack. But note that it doesn’t duplicate the memory in the heap, which tends to be where the vast majority of the data is found.

Dada talks about values not references

This gets at something important. Rust, like C, makes pointers first-class. So given x: &String, x refers to the pointer and *x refers to its referent, the String.

Dada, like Java, goes another way. x: ref String is a String value – including in memory representation! The difference between a given String, shared String, and ref String is not in their memory layout, all of them are the same, but they differ in whether they own their contents.

So in Dada, there is no *x operation to go from “pointer” to “referent”. That doesn’t make sense. Your variable always contains a string, but the permissions you have to use that string will change.

In fact, the goal is that people don’t have to learn the memory representation as they learn Dada, you are supposed to be able to think of Dada variables as if they were all objects on the heap, just like in Java or Python, even though in fact they are stored on the stack.

Rust does not permit moves of borrowed data

In Rust, you cannot move values while they are borrowed. So if you have code like this that moves ch into ch1&mldr;

let ch = Character { ... };
let name = &ch.name; // create reference
let ch1 = ch;        // moves `ch`

&mldr;then this code only compiles if name is not used again:

let ch = Character { ... };
let name = &ch.name; // create reference
let ch1 = ch;        // ERROR: cannot move while borrowed
let name1 = name;    // use reference again

&mldr;but Dada can

There are two reasons that Rust forbids moves of borrowed data:

References are pointers, so those pointers may become invalidated. In the example above, name points to the stack slot for ch, so if ch were to be moved into ch1, that makes the reference invalid.
The type system would lose track of things. Internally, the Rust borrow checker has a kind of “indirection”. It knows that ch is borrowed for some span of the code (a “lifetime”), and it knows that the lifetime in the type of name is related to that lifetime, but it doesn’t really know that name is borrowed from ch in particular.

Neither of these apply to Dada:

Because references are not pointers into the stack, but rather shallow copies, moving the borrowed value doesn’t invalidate their contents. They remain valid.
Because Dada’s types reference actual variable names, we can modify them to reflect moves.

Dada tracks moves in its types

OK, let’s revisit that Rust example that was giving us an error. When we convert it to Dada, we find that it type checks just fine:

class Character(...) // as before
let ch: given Character = Character(...)
let name: ref[ch.name] String = ch.name
//            -- originally it was borrowed from `ch`
let ch1 = ch.give
//        ------- but `ch` was moved to `ch1`
let name1: ref[ch1.name] = name
//             --- now it is borrowed from `ch1`

Woah, neat! We can see that when we move from ch into ch1, the compiler updates the types of the variables around it. So actually the type of name changes to ref[ch1.name] String. And then when we move from name to name1, that’s totally valid.

In PL land, updating the type of a variable from one thing to another is called a “strong update”. Obviously things can get a bit complicated when control-flow is involved, e.g., in a situation like this:

let ch = Character(...)
let ch1 = Character(...)
let name = ch.name
if some_condition_is_true() {
    // On this path, the type of `name` changes
    // to `ref[ch1.name] String`, and so `ch`
    // is no longer considered borrowed.
    ch1 = ch.give
    ch = Character(...) // not borrowed, we can mutate
} else {
    // On this path, the type of `name`
    // remains unchanged, and `ch` is borrowed.
}
// Here, the types are merged, so the
// type of `name` is `ref[ch.name, ch1.name] String`.
// Therefore, `ch` is considered borrowed here.

Renaming lets us call functions with borrowed values

OK, let’s take the next step. Let’s define a Dada function that takes an owned value and another value borrowed from it, like the name, and then call it:

fn character_and_name(
    ch1: given Character,
    name1: ref[ch1] String,
) {
    // ... does something ...
}

We could call this function like so, as you might expect:

let ch = Character(...)
let name = ch.name
character_and_name(ch.give, name)

So&mldr;how does this work? Internally, the type checker type-checks a function call by creating a simpler snippet of code, essentially, and then type-checking that. It’s like desugaring but only at type-check time. In this simpler snippet, there are a series of let statements to create temporary variables for each argument. These temporaries always have an explicit type taken from the method signature, and they are initialized with the values of each argument:

// type checker "desugars" `character_and_name(ch.give, name)`
// into more primitive operations:
let tmp1: given Character = ch.give
    //    ---------------   -------
    //            |         taken from the call
    //    taken from fn sig
let tmp2: ref[tmp1.name] String = name
    //    ---------------------   ----
    //            |         taken from the call
    //    taken from fn sig,
    //    but rewritten to use the new
    //    temporaries

If this type checks, then the type checker knows you have supplied values of the required types, and so this is a valid call. Of course there are a few more steps, but that’s the basic idea.

Notice what happens if you supply data borrowed from the wrong place:

let ch = Character(...)
let ch1 = Character(...)
character_and_name(ch, ch1.name)
//                     --- wrong place!

This will fail to type check because you get:

let tmp1: given Character = ch.give
let tmp2: ref[tmp1.name] String = ch1.name
    //                            --------
    //       has type `ref[ch1.name] String`,
    //       not `ref[tmp1.name] String`

Class constructors are “just” special functions

So now, if we go all the way back to our original example, we can see how the Message example worked:

class Message(
    list: String
    items: Vec[ref[self.list] String]
)

Basically, when you construct a Message(list, items), that’s “just another function call” from the type system’s perspective, except that self in the signature is handled carefully.

This is modeled, not implemented

I should be clear, this system is modeled in the dada-model repository, which implements a kind of “mini Dada” that captures what I believe to be the most interesting bits. I’m working on fleshing out that model a bit more, but it’s got most of what I showed you here.^{For example, here is a test that you get an error when you give a reference to the wrong value.}

The “real implementation” is lagging quite a bit, and doesn’t really handle the interesting bits yet. Scaling it up from model to real implementation involves solving type inference and some other thorny challenges, and I haven’t gotten there yet – though I have some pretty interesting experiments going on there too, in terms of the compiler architecture.

This could apply to Rust

I believe we could apply most of this system to Rust. Obviously we’d have to rework the borrow checker to be based on places, but that’s the straight-forward part. The harder bit is the fact that &T is a pointer in Rust, and that we cannot readily change. However, for many use cases of self-references, this isn’t as important as it sounds. Often, the data you wish to reference is living in the heap, and so the pointer isn’t actually invalidated when the original value is moved.

Consider our opening example. You might imagine Rust allowing something like this in Rust:

struct Message {
    list: String,
    items: Vec<&{self.list} str>,
}

In this case, the str data is heap-allocated, so moving the string doesn’t actually invalidate the &str value (it would invalidate an &String value, interestingly).

In Rust today, the compiler doesn’t know all the details of what’s going on. String has a Deref impl and so it’s quite opaque whether str is heap-allocated or not. But we are working on various changes to this system in the Beyond the & goal, most notably the Field Projections work. There is likely some opportunity to address this in that context, though to be honest I’m behind in catching up on the details.