Sam Sartor
Git(Hub|Lab)
# 2022-3-4
Hey! This page is a draft. Don't expect much from it.

Self referential data structures have been a long time wart of the Rust programming language. The Pin type works well enough to support async/await, but doesn’t make it any easier to manually create such structures safely.

Self-References Today

Despite what I said above, Rust today does actually support safe self-referential structs in a limited capacity. Go to play.rust-lang.org right now and try running:

#[derive(Debug)]
struct OwnAndRef<'this> {
    owned: i32,
    refed: &'this i32,
}

// allocate some space on the stack
let mut owner = OwnAndRef {
    owned: 42,
    refed: &-1,
};

// and now borrow your own field!
owner.refed = &owner.owned;

println!("{:?}", owner);

This code is totally absent of Pin<&mut T> schenenagins but the Rust compiler is aware that owner cannot be moved elsewhere on the stack.

owner.refed = &owner.owned;
              ------------ borrow of `owner.owned` occurs here

let new_owner = owner;
                ^^^^^
                |
                move out of `owner` occurs here
                borrow later used here

How about we take this idea for a spin? Below is an example of a simple* arena-allocated tree structure. I admit it a bit artificial, since we can do the same thing with indices into a Vec (or even better, with generational_arena). But we have to start somewhere.

#[derive(Copy, Clone)]
enum Node<'this> {
    Branch(NodeRef<'this>, NodeRef<'this>),
    Leaf(i32),
}
use Node::*;

type NodeRef<'this> = &'this Cell<Node<'this>>;

struct Tree<'this> {
    arena: Arena<Cell<Node<'this>>>,
    root: NodeRef<'this>,
}

impl<'this> Tree<'this> {
    pub fn append(&self, num: i32) {
        let mut cell = self.root;
        while let Branch(_, right) = cell.get() {
            cell = right;
        }

        let old = self.arena.alloc(Cell::new(cell.get()));
        let new = self.arena.alloc(Cell::new(Leaf(num)));
        cell.set(Branch(old, new));
    }

    pub fn rotate(&self) {
        let top = self.root;
        let Branch(a, mid) = top.get() else { return };
        let Branch(b, c) = mid.get() else { return };
        mid.set(Branch(a, b));
        top.set(Branch(mid, c));
    }
}

Except there’s a problem. How do we construct an empty tree object? The tree root has to be initialized with a reference to the arena, thus making the arena immovable. But the arena also has to be moved into the new struct:

let arena = Arena::new();
let root = arena.alloc(Cell::new(Leaf));
           ---------------------------- borrow of `arena` occurs here
let mut tree = Tree { arena, root };
                      ^^^^^  ---- borrow later used here
                      |
                      move out of `arena` occurs here

Reference Transplanting

Forget about self-references for a minute and consider the following code:

let a = Box::new((42, 69));
let r = &a.0;
let b = a;
dbg!(r);

In Rust today, this is totally illegal. We obviously deserve to be chastised with “cannot move out of a because it is borrowed” for attempting such blasphemy. But does this code really need to throw an error? I’d argue that it doesn’t, since the pointer underlying r is at no risk of invalidation. The memory r points to is still intact, just bound to a differently named variable.

If our goal is a simpler feeling Rust where “more of the programs that should work actually do work”, why not ask for this to work as well?

If you haven’t read “An alias-based formulation of the borrow checker”, I would recommend skimming it now, because the terminology will be helpful. In the example above, the the origin of reference r contains a shared loan of the box a.deref().0. “Origin” is the name used by polonius instead of “lifetime”, since it refers to some set of loaned variables and not a region of the program. Still, I’ll notate the origin of r as a lifetime parameter '1.

The move out of binding a and into a new binding b on line 3 then invalidates the loan of a inside origin '1. This results in an error because the origin is still alive: it is used by dbg!(r) on line 4.

To accept this program, we’d need polonius to “transplant” the loan from the path a.deref().0 to the path b.deref().0 at line 3. The exact details need to be worked out in far greater detail, but I believe any path <prefix_a>.deref().<postfix> can be transplanted to <prefix_b>.deref().<postfix> so long as the point where the transplant occurs dominates all points with live origins containing the path, and the deref implementation in question is known to return a stable memory addresses. Box::deref is already treated magically as part of loan paths so it is easy to support, but some kind of unsafe trait StableDeref {} would be needed to generalize to other smart pointer types.

With reference transplanting in place, the compiler will also allow us to move self-referential boxes:

let mut a = Box::new((42, &0));
a.1 = &a.0;
let b = a;
dbg!(&b);

In this case, the variable a initially contains a reference with empty origin, as the borrow &0 has static lifetime. In a sense, the static lifetime can be defined as the empty set of loans. That origin is replaced with once containing a shared loan of a.deref().0 on line 2, and the origin is kept alive until after the move on line 3.

Now for the important bit: that move is legal because Box implements StableDeref and the loan can be legally transplanted from path a.deref().0 to path b.deref().0. In all other respects, the loan is totally unchanged. Prior to the move, the loan is seen by the compiler as having the first path. After the move, it is simply seen as having the later path.

Variable b is then given a new origin containing the same (now-transplanted) loan. The old origin is dead, since it is never again used, while the new origin is kept alive by the dbg! macro on line 4. This logic is totally unaware of the transplant, and follows the same rules as any assignment operation.

With that feature out of the way, we are now able to construct an instance of our Tree struct:

let arena = Arena::new();
let root = arena.alloc(Cell::new(Leaf(42)));
// Transplant the loan inside root from &arena to &tree.arena
let mut tree = Tree { arena, root };

This requires one key modification to the arena type, so that it can be moved freely after allocating the root. Calling arena.alloc(...) method should force a dereference onto some kind of ArenaInterior which has a stable memory address:

pub struct Arena<T> {
    interior: Box<ArenaInterior>,
}

impl Deref for Arena<T> {
    type Target = ArenaInterior<T>;

    fn deref(&self) -> &ArenaInterior<T> { &*self.interior }
}

unsafe impl StableDeref for Arena {}

pub struct ArenaInterior<T> { .. }

impl ArenaInterior<T> {
    pub fn alloc(&self, value: T) -> &mut T { .. }
}

Transplanting and Functions

It would be nice to have a Tree::new() function. But for that to work, we need to be capable of transplanting references into and out of functions.

This should be possible because function callees and callers can share the variables that argument values are bound to, as well a kind of pretend variable which binds the return value. A caller can transplant any passed references onto new arguments before jumping into the callee. And a callee can transplant any returned references onto the pretend return variable before popping the stack.

This all works until the callee starts mucking with loaned fields, resulting in nasty unsoundness:

let mut a = Box::new((42, &0));
a.1 = &b.0;
take(a); // transplanted the reference from `a.deref().0` to `b.deref().0`

fn take(b: Box<(i32, &i32)>) {
    b.0 += 1; // illegal!
    dbg!(b.1);
}

It is tempting to think that the view types proposal could be our savior in this case. The function can simply agree not to touch b.deref().0 and everything will be fine, right? Not quite. We can still touch b.deref().0 by dropping b. Whatever fields we agree not to touch individually, dropping is always an option.

fn take(b: Box<{1} (i32, &i32)>) {
    let c = b.1;
    drop(b); // illegal!
    dbg!(c);
}

So a transplant into a function can only be legal if the function is also aware of the terms of transplanted loan: that it is dependent on some other argument not being messed with. Our take function must actually declare the source of passed reference. For now we’ll do this with a where clause explicitly documenting that that the lifetime origin 'r contains a loan from path b.0.

fn take<'r>(b: Box<(i32, &'r i32)>)
    where &b.0 in 'r
{
    let c = b.1;
    drop(b); // ERROR: c is known to borrow from b
    dbg!(c);
}

With these “explicit origins” in place, we can get our Tree data structure will working perfectly! In fact, they allow us to ditch all the interior mutability and complicated Cell juggling. Now nodes can store non-copy data. Even better, we get type-level verification that each node has a a most a single parent (the holder of the exclusive reference).

#[derive(Clone)]
enum Node<'mem> {
    Branch(NodeMut<'mem>, NodeMut<'mem>),
    Leaf(String),
}
use Node::*;

type NodeMut<'mem> = &'mem mut Node<'mem>;

struct Tree<'mem> {
    arena: Arena<Node<'mem>>,
    root: NodeMut<'mem>,
}

impl<'mem> Tree<'mem> {
    pub fn new(root: String) -> Self
        where &*return.arena in 'mem
    {
        let arena = Arena::new();
        let root = arena.alloc(Leaf(root));
        Tree { arena, root }
    }

    pub fn append(&mut self, text: String)
        where &*self.arena in 'mem
    {
        let mut edge = self.root;
        while let Branch(_, right) = edge {
            edge = right;
        }

        let old = self.arena.alloc(edge.clone());
        let new = self.arena.alloc(Leaf(num));
        *edge = Branch(old, new);
    }

    pub fn rotate(&mut self)
        where &*self.arena in 'mem
    {
        let Branch(ref mut a, ref mut mid) = self else { return };
        let Branch(ref mut b, ref mut c) = mid else { return };
        swap(a, c);
        swap(a, mid);
        swap(b, c);
    }
}

I’m using return as an identifier for “whatever variable the returned object is moved to”. We can work on making the syntax less confusing later.

Notice that the elided lifetime on &mut self (call it 'a) is only related to 'mem by an implied bound 'mem: 'a. This “outlives” relationship should be fairly obvious, since we could later invalidate 'a by moving the tree object, yet the references with origin 'mem would be transplanted and so continue to live on. In the words of the aliasing formulation, we would say that the origin 'a includes the &self.arena loan also in the origin 'mem, but that it may depend on many other loans besides.

To get a more idea of how self-references would work with explicit loans, let’s work on another example. The owning_ref crate is good inspiration here because it is a primative sort of self-reference that we should obviously be able to handle, but is also hard to write soundly.

pub struct OwningMut<'r, O, T> {
    owned: Box<O>,
    reference: &'r mut T,
}

impl<'a, T> OwningMut<'a, T, T> {
    pub fn new(value: T) -> Self
    where
        &mut *return.owned in 'a
    {
        let owned = Box::new(value);

        // creates origin '0 requiring exclusive loan L0 with path `owned.deref_mut()`
        // inputs fact '0: '1
        let reference: &mut '1 T = &mut '0 *owned;

        let new = OwningRef::<'2, _, _> {
            // transplant L0 from `owned.deref_mut()` to `new.deref_mut()`
            owned: owned,
            // inputs fact '1: '2
            reference: reference,
        };

        // transplant L0 from `new.deref_mut()` to the caller (a.k.a. `return.deref_mut()`)
        // inputs fact 'a: '2
        // -> L0 escapes into placeholder origin 'a
        // -> OK because return.deref_mut() was declared as required by 'a
        new
    }
}

impl<'a, O, T> OwningRef<O, T> {
    pub fn map<'b, V>(self, f: impl FnOnce(&'a, T) -> &'b V) -> OwningRef<'b, O, V>
    where
        'b: 'a,
        &mut *self.owned in 'a,
        &mut *return.owned in 'b,
    {
        // create placeholder loan L0 with path `self.owned.deref_mut()`
        // inputs fact that L0 is required by 'a

        let new = OwningRef<'0, _, _> {
            // transplant L0 from `self.owned.deref_mut()` to `new.owned.deref_mut()`
            owned: self.owned,
            // inputs fact '0: 'a
            reference: f(self.reference),
        };

        // transplant L0 from `new.owned.deref_mut()` to `return.owned.deref_mut()`
        // inputs fact 'b: '0
        // -> 'b: 'a is OK because it was declared
        // -> L0 in 'b is OK because it was declared
        new
    }

    pub fn as_owner_mut(&mut self) -> &mut O {
        // Ok because we have exclusive access to self
        &mut *self.owned
    }

    fn into_owner(self) -> Box<T>
        &mut *self.owned in 'a,
    {
        // create placeholder loan L0 with path `self.owned.deref_mut()`
        // inputs fact that L0 is required by 'a

        // invalidates L0
        // -> is OK because 'a is unused
        self.owner
    }
}

Notice that the as_owner_mut function above would be unsound except for the protections afforded by explicit origins. In fact, it is completely impossible to call. All instances of OwningMut are assumed to have a loan of self.owned live in self.reference, and this method does not declare that fact. If we try to add the declaration, we get an error:

ERROR: cannot borrow `*a` as mutable more than once at a time

pub fn as_owner_mut(&mut self) -> &mut O
where
    &mut *self.owned in 'a
    ______________________
         |
         | first mutable borrow declared here
{
    &mut *self.owned
    ^^^^^^^^^^^^^^^^ second mutable borrow occurs here
}

Explicit origins also work for self-references on the stack, although the API for a sort of stack-allocated owning ref is not very pretty. We need to make sure it is movable until it reaches its final location on the stack, requiring some generator-esque statefulness.

struct StackOwningRef<'a, T> {
    owned: T,
    reference: Option<&'a T>,
}

impl<T> StackOwningRef<'static, T> {
    fn new(value: T) -> Self {
        Self {
            owned: value,
            reference: None,
        }
    }
}

impl<'a, T> StackOwningRef<'a, T> {
    fn init(&mut self) where &self.owned in 'a {
        self.reference = &self.owned;
    }

    fn unwrap(self) -> T {
        self.owner
    }
}

// inputs fact '0: 'static
let a: StackOwningRef<'0, i32> = StackOwningRef::new(42);

// inputs fact '0: '1
// OK because there are no loans in '0 to invalidate
let mut b: StackOwningRef<'1, i32> = a;

// create shared loan L0 with path `a.owned`
// input fact L0 in '1
b.init();

// Use the reference!
dbg!(b.as_ref().unwrap());

// Kill '1
b.reference = None;

// OK because loan L0 is unused by any live origins
let mut c: StackOwningRef<'2, i32> = b;

Generics

So we have a working system for self-references completely free of Pin, right? Alas, not quite. Imagine what happens once a generic function gets involved.

TODO: show stuff blowing up

What we need is some sort of wrapper type which remembers the where clause for us while our type is off in generic land.

pub struct Tree { ... }

impl TreeWrapper {
    pub fn new<'mem>(inner: Tree<'mem>) where
        &mut inner.arena in 'mem
    { ... }

    pub fn unwrap<'mem>(self) -> Tree<'mem>
        &mut return.arena in 'r
    { ... }
}

This is pretty nearly how Pin is intended to work

impl<'unsafe> Tree<'unsafe> {
    pub fn unpin<'mem>(self: Pin<&'this mut Self>) -> &'this mut Tree<'mem>
    where
        &*return.arena in 'mem
    { ... }
}

To view or not to view?

Ergonomics

Conclusion

TODO