Sam Sartor
Git( Hub| Lab)
# Rusty Self-Referential Types 2023-5-31
Hey! This page is a rough draft.

This idea has been living rent-free in my head since the original Polonius blog post and I’m desperate to evict it. Getting straight to the point, Rust users should be able to write something like

pub struct Syntax<'a>
where
    &*self.content in 'a,
{
    pub content: String,
    pub tokens: Vec<&'a str>,
}

impl<'a> Syntax<'a> {
    pub fn new(content: String) -> Self {
        Syntax {
            tokens: content.split_whitespace().collect(),
            content,
        }
    }
}

Of course the devil is in the details. Thankfully there are only two interesting things going on here:

Explicit Origins

In the example above, we provide an explicit origin for a borrow in a struct. However, the hypothetical syntax suggests that explicit origins should also be available on free functions. This is not particularly useful (for reasons I’ll get to later) but it does help to illuminate the semantics.

fn foo<'a>(container: &mut Box<i32>, element: &'a i32)
where
    &*container in 'a,
{
    // Error: cannot borrow `container` as mutable because it is also borrowed as immutable
    *container += 42;
    // Ok! Can borrow `container` immutably more than once.
    dbg!(container);
}

fn bar<'a>(container: &mut Box<i32>, element: &'a mut i32)
where
    &*container in 'a,
{
    // Error: can not borrow `container` as immutable when it is borrowed as mutable
    dbg!(container);
    // Ok! We have exclusive access _although there is technically an alias_.
    *element += 1;
}

These explicit origins are sorta similar to view types in the way they can restrict access to the contents of a particular object. But unlike view types (1) they help the compiler emit correct noalias information and (2) they potentially allow the user to regain access to the object after all relevant references have been killed. Additionally, while view types would be an entirely new sort of thing within the language, the borrow checker already has the machinery to juggle references and their origins even when the reference is from one field of a struct to another (see the discussion at the end). Ideally speaking, explicit origins would simply add the ability to maintain this existing information across function boundaries. Probably it is a little more complicated in practice.

For example, we would probably require explicit origins to only appear on types, because otherwise self-references do not work well with traits:

// Cool type! No need to declare anything.
pub struct Syntax {
    pub content: String,
    pub tokens: Vec<&'a str>,
}

// Let's provide an iterator. Still don't declare anything.
impl<'a> IntoIterator for Syntax<'a> {
    ...
}

impl<'a> Syntax<'a> {
    // Basic tokenization with a self-reference to the content.
    // Make sure to declare the self-reference.
    pub fn new(content: String) -> Self
    where
        &*return.content in 'a,
    {
        Syntax {
            tokens: content.split_whitespace().collect(),
            content,
        }
    }
}

// Error: `impl Iterator for Syntax` is not aware of the self-reference
Syntax::new("the lazy dog").into_iter();

In this case we need to be absolutely certain the IntoIterator impl can not do something illegal like drop self.content while stepping through self.tokens. We could have the user write the same explicit origin on every impl and then try to establish some kind of equivalence when querying trait implementations during the type check. But that sounds like a nightmare. Far easier for everyone if explicit origins are declared at the type level rather than at the function level. Then we can be sure that all implementations are aware of the same self-reference, and will always be safe to call:

// ERROR: cannot return value referencing local variable `self`
// -->

pub struct Syntax<'a>
where
    &*self.content in 'a,
//  -------------------- `self.content` is declared as borrowed here
{
    pub content: String,
    pub tokens: Vec<&'a str>,
}

impl<'a> IntoIterator for Syntax<'a> {
    type Item = &'a str;
    type Iter = IntoIter<&'a str>;

    pub fn into_iter(self) -> Self::Iter {
        self.tokens.into_iter()
//      ^^^^^^^^^^^^^^^^^^^^^^^ returns a value referencing data owned by current function
    }
}


// OK: all impls must  agree on the clauses.
Syntax::new("the lazy dog").into_iter();

Traditionally we would try to solve this sort of case by explicitly marking FutureWithBuffer: !Unpin and then fighting with some kind of pin-projection macro since Rust doesn’t offer a first-class solution for pin-projection. Wait a minute…

Oh, type-level explicit origins are exactly what you need to allow first-class pin-projection. So long as !Unpin (or !Move) is inferred from a type-level explicit origin and not declared by the user, all fields on the type can be safely projected: the borrow checker knows what fields borrow what other fields and can prevent illegal usage directly. Simple!

(actually not at all simple but I am saving that for a later section)

Reference Transplanting

As mentioned in the beginning, this idea has another half. In my opinion it is the simpler half; in the sense that Rust feels simpler when more of the programs that should work actually do work. Here is such a program:

let box_a = Box::new(42);
let pointer = &*box_a;
let box_b = box_a;
*pointer;

Currently the compiler treats pointer as invalidated when the box is moved. But there is no actual reason for that besides its being an easy heuristic to implement. Instead we could teach the compiler to “transplant” any reference origin like <T as Deref + StableDeref>::deref(a.b.c).d.e.f to a different origin like <T as Deref + StableDeref>::deref(g.h.i).d.e.f in any expressions appearing after a.b.c is unconditionally moved to g.h.i. To be honest, this sounds like another one of those things that would be easier to implement after NLL is replaced by Polonius, but I’ll let actual experts weigh in rather than make vague assumptions.

If the origin of a live reference does not pass through a StableDeref, moves would cause invalidation as usual.

let mut self_ref_tuple: (&i32, i32) = (&0, 42);
self_ref_tuple.0 = &self_ref_tuple.1;
return self_ref_tuple; // ERROR!

Critically, the reference transplanting feature is completely orthogonal to the explicit origins feature. Its just that the two features combine respectively to create create self-referential boxes and then to make them movable.

Pin Improvements

The above two sections are each 50% of the idea, and this section is the remaining 50%. In an ideal universe this writeup would be all over and we could get on with bikeshedding syntax, but sadly there are remaining problems with the pin API which must be sorted out. To be clear, this section does not provide definitive solutions. It is just a sketch of my favorite idea, to get the discussion going.

The yet unstated assumption underlying type-level explicit origins is that no interaction with a value can occur without either fully resolving its type or requiring it to have some implementation. For example, we know that the below function is sound even in the presence of self-referential types because any received type must first implement MyMove:

pub fn move_a_bunch_of_times<T: MyMove>(x: T) -> T {
    let y = x.my_move();
    let z = y.my_move();
    z.my_move()
}

Unless the declared explicit origins in a self-referential type pass through some StableDeref, the type can not implement MyMove::my_move without triggering a borrow checker error, and so can not be passed to move_a_bunch_of_times. But of course, a function does not actually need to declare T: MyMove in order to move instances of T. As things stand, a generic function can always move and drop completely opaque values without making any assumptions about their trait implementations.

The intended solution to this problem is the Pin wrapper type. Through Pin, a generic function can agree to further restrictions on allowed accesses. Alas, pinning also thoroughly trashes the ergonomics of any first-class system for self-referential structs. While a user interacting monomorphic code could completely ignore the pinning API and rely entirely on type-level explicit origins, any user interacting with polymorphic code must constantly pin, unpin, and pin-project their types. This seemingly creates a need for ergonomic pin-projection, to accompany type-level explicit origins and reference transplanting. Although no such proposal is included here, it is tempting to try and automatically derive required projections based on the type-level explicit origins. Maybe someone else can make this work but I think it would cause major backwards compatibility hazards.

Instead I think it is worth reconsidering Rust’s need for a Move auto-trait. This may seem redundant given pinning exists, but there are some compelling reasons to include both in the Rust language, listed below.

Critically, the Move trait wouldn’t mark types which can be moved. All types can be moved at least once: on construction. It would actually mark types that can always be moved. A value which does not implement Move can still be moved if the language has a good reason to think it is in a movable state (e.g. it was just returned from a function, all self-references were just killed, its lifetime is 'static, etc). In general a value will become immovable only because the borrow checker can not eliminate the possibility that its moving will invalidate a live reference, and a T: Move clause (implicit or explicit) would simply inform the borrow checker that this is initially the case.

Reason #1: Initialization

In retrospect, the greatest oversight of the pin API was its lack of an initialization guarentee to mirror the drop guarentee. The Linux kernel people would especially appreciate access to a language item like:

pub trait Init {
    pub fn init(self: Pin<&mut Self>);
}

pub unsafe fn init_in_place<T: ?Sized>(to_init: *mut T) {
    // Calls Init::init recursivly, like drop_in_place does for Drop::drop.
}

Provided it was paired with a requirement that

Data in memory can not be considered pinned until Init::init is called exactly once, and that memory must remain pinned without being invalidated or repurposed until Drop::drop is called

Maybe we could have even added a Init::deinit function to act exactly like Drop::drop but receiving Pin<&mut self>?

But it is far too late to introduce such a guarantee, unless the language introduces a new auto-trait. If Move is added to the language, and all existing generic functions automatically require it (silently in older editions and sometimes explicitly via cargo-fix in newer editions), then we can also introduce the below initialization requirement without causing unsoundness in any existing uses of pin.

Data in memory can not be considered pinned until Init::init is called exactly once, unless the data implements Move. Similarly, pinned data in memory must remain pinned without being invalidated or repurposed until Drop::drop is called, unless the data in memory implements Unpin + Move.

Then the ecosystem can gradually add T: ?Move clauses to relevant generic functions along with support for the initilization guarantee, eventually allowing pinned datastructures such as Linux’s mutex type to be used freely and simply alongside any library function or language feature.

Even with ?Move managing most interactions between generic code and self-referential types, pin will remain useful as a way of accessing the initialization and drop guarantees.

Reason #2: Projection

With newer types relying on impl !Move instead of impl !Unpin to restrict dangerous accesses, we can special-case Pin to automatically project structurally on any structs or enums which automatically implement Unpin. This would not break any existing code, since any code which relies on manual pin projection must somehow specify !Unpin. Making all automatic projection structural sounds dangerous, but it would be acceptable for all new structs/enums which either implement Unpin + Move or implement Unpin + !Move by explicitly declaring their reasons for being immovable. Obviously some custom datastructures like VecDeque can never provide pin-projection methods (or support for ?Move elements), but they would never be subject to automatic projection in the first place.

It sounds easy to me, but someone smarter should definitely check my homework.

Reason #3: Compatibility

In this case, the intention would be for all code to eventually migrate away from relying on the Unpin trait, even though Pin itself would remain useful for init/drop guarantees. But in the meantime it would be critical to maintain trivial compatibility between post-move and pre-move code. Thankfully, the pin API can also fulfil this role. For any code compiled under a previous edition, the compiler could automatically infer T: ?Move where T is not required to be Unpin and where T is always behind the Pin wrapper in the function signature. This is sound as any such function could not soundly move T in Rust today, and would allow !Move values to be passed from newer code to older code so long as they are first pinned.

Obviously when a crate migrates to a new edition, cargo-fix should toss in ?Move or Move clauses where needed to maintain the previous behavior. Then to achieve even better ergonomics crate authors could manually remove all usages of Unpin and some usages of Pin across a breaking version bump, after making sure their code meets the new safety requirements.

Self-referential async blocks would still be a little tricky, since migrating them from !Unpin to !Move would break cross-edition compatibility quite badly. But there are lots ways around that from simple new wrapper types to nasty special cases in the compiler.

Even More Explicit Origin Discussion

My biggest syntax annoyance with this proposal is the need to declare lifetime parameters on structs. Lifetimes on types are a huge foot-gun for beginning Rustaceans and anyway it doesn’t seem like self-referential types even need lifetime parameters. Most current proposals imagine something like a 'self lifetime taking the place of a generic parameter. But 'self does not inherently refer to a real binding or region of the control-flow-graph that a reference can borrow from. Even a self-reference borrows some part of a real variable somewhere in a function body, and the struct containing that self-reference is generic over the location of that variable. It could be that 'self is a kind of implicit generic parameter, but that is just syntax. Under the hood, it should still create a sort of type-level explicit origin.

Explicit origins as presented above exist mostly to inform the compiler of aliases within a struct. Stuff like “the user can’t modify this field without invalidating the pointer in this other field”. And framed through that lens it is easy to imagine (slightly contrived) cases where the lifetime parameter is very useful:

struct MaybeSelfRef<'a, 'b>
where
    &*self.value in 'a,
    &*self.value in 'b,
{
    pub value: Box<i32>,
    pub first_ref: &'a i32,
    pub second_ref: &'b i32,
}

impl MaybeSelfRef<'a, 'b> {
    pub fn new(
        value: Box<i32>,
        first_ref: Option<&'a i32>,
        second_ref: Option<&'b i32>,
    ) -> Self {
        MaybeSelfRef {
            first_ref: first_ref.unwrap_or(&*value),
            second_ref: second_ref.unwrap_or(&*value),
            value,
        }
    }
}

But if you don’t care too much about this case, it is easy to abbreviate the syntax. Especially in such a way that user code does not become infected with lifetime parameters in punishment for including a single self-referential type somewhere deep down in a struct. Easiest would be a new lifetime elision rule applied to lifetime parameters which participate in an explicit origin on a type used in argument, return, or field position. But you could also ditch the explicit origin syntax entirely and come up with a &{value} i32 syntax or something instead.

Even without explicit origins, Rust actually has perfectly good support for self-referential types today. So long as you never need to mutably borrow or move such a type…

pub struct Syntax<'a> {
    pub content: String,
    pub tokens: Vec<&'a str>,
}

impl<'a> Syntax<'a> {
    pub fn something_immutable(&'a self) { ... }
    pub fn something_mutable(&'a mut self) { ... }
    pub fn something_move(self) { ... }
}

let mut syntax = Syntax {
    tokens: Vec::new(),
    content,
};

// Make the `syntax` binding self-referential!
syntax.tokens.extend(syntax.content.split_whitespace()); /*
                     --------------------------------- immutable borrow occurs here */

// We are even allowed to call methods.
syntax.something_immutable();

// As long as they don't require exclusive access to self.
syntax.something_mutable(); /*
^^^^^^^^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here */

// Or try to move self.
syntax.something_move(); /*
^^^^^^
   | |
   | move out of `syntax` occurs here
   | borrow later used here */

The only purpose of the explicit origin syntax is to declare these sorts of loans in function signatures, so that callee and caller agree on the unusual terms.

At first glance it looks like an explicit origin annotation &self.a in 'a creates a new assumption within the compiler. That is, that any struct with an explicit origin is a subtype of the equivalent struct without the annotation. But actually the opposite is the case. In fact, we can see above that the type has exactly the same set of possible inhabitants. Explicit origins on a function argument weaken the callee’s assumptions such that some of the inhabitants are allowed to cross function boundaries where that would previously create an error.

To be specific, Rust normally assumes that the origin of a lifetime parameter 'a is some set of bindings outside of the declaring function. Thus, it is impossible for the callee to invalidate any reference originating from 'a. The presence of an explicit origin should strictly relax this assumption. In turn, each caller is allowed to forgo the assumption that the passed reference is invalidated before being killed. Instead the callee has additional responsibility for not invalidating the reference while it is live and the caller can assume that if the reference is invalidated by the callee, it is only after being killed.