Learning Lifetimes In Rust Sucks

I don’t like most lifetime tutorials, so this is a tutorial that actually teaches them (at least in my own weird way). But before talking about lifetimes, we have to talk about what it means to live in Rust.

 

We start with a definition you may be familiar with, or may be hiding from:

$$\text{Life} = \text{Birth} + \text{Death}$$

where $\text{birth}$ = variable declaration/binding.

and $\text{death}$ = memory is automatically dropped/deallocated when the variable that owns it reaches the end of its scope. The following death scenarios outline all the variations of a variable approaching the end of its scope, i.e. dying memory.

 

Death Scenarios

Death Situation #1: Natural end of scope

{ // start block
    let x = String::from("hello"); // scope of x starts here
} // scope of x ends here; x dropped here

Death Situation #2: The value gets moved into a function’s scope, and dies at the end of that function’s scope

fn take_ownership(s: String) {
    // s is alive in this scope
} // s dies here

let x = String::from("hello");
take_ownership(x); // x is moved into the function (only one variable can own a value, per Rust's ownership rules); x is unusable from here on, and the String itself dies at the end of take_ownership

Death Situation #3: reassignment (move of one variable to another; ownership transfer)

let mut x = String::from("jozef");
x = String::from("lumaj"); // old String value of x (i.e. String:"jozef") dropped here
Tangent: Move Types vs Copy Types

Marker Traits:

Clone Types:

Copy Types:

pub trait Copy: Clone { //needs to implement the Clone trait
	// Empty, that's it, because it's a marker trait
}

Move Types

let y: i32 = 10; // integers are Copy by default
let x = y; // the value is copied here, so y is still alive in this scope

let z = String::from("z");
let x = z; // the value is moved here; z no longer owns it, so using z after this line is a compile error (nothing is dropped yet, x owns the String now)
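
To make the copy/move distinction checkable, here is a minimal sketch. The helper names `consume` and `copy_and_move_demo` are made up for illustration; the point is only that a Copy value stays usable after assignment, while a String has to be cloned if you want to keep using it after a move.

```rust
// Hypothetical helper: takes ownership of a String and returns its length.
fn consume(s: String) -> usize {
    s.len()
} // `s` is dropped here

// Hypothetical helper showing Copy, clone, and move side by side.
fn copy_and_move_demo() -> (i32, i32, usize, String) {
    let y: i32 = 10;
    let x = y; // i32 is Copy: the bits are duplicated, `y` stays usable

    let z = String::from("z");
    let kept = z.clone(); // explicit deep copy; `z` is still valid
    let moved_len = consume(z); // `z` is moved; using `z` after this line would not compile

    (x, y, moved_len, kept)
}

fn main() {
    let (x, y, moved_len, kept) = copy_and_move_demo();
    println!("x = {x}, y = {y}, moved_len = {moved_len}, kept = {kept}");
}
```

Uncommenting a use of `z` after the call to `consume` should produce a "borrow of moved value" error, which is the compiler enforcing single ownership.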

 

Tangent$^2$: Does Stack vs Heap Matter?

Associating Stack data with Copy types and Heap data with Move types is a good mental model, though it doesn’t give exactly the true reason why this correlation exists:

Here are some typical values, showing the correlation (but not causation) between Stack/Copy and Heap/Move types:

| | Copy | Move |
|---|---|---|
| Stack-Allocated | i32, bool, (i32, f64), &T, pointers to heap-allocated data | [String; 2], closure capturing String |
| Heap-Allocated | N/A since needs Drop-implemented logic | String, Vec<T>, Box<T> |

 

Death Situation #4: call drop() manually

let x = String::from("hello");
drop(x); // x dropped immediately

core::mem::drop

Tangent: The Drop trait (core::ops::Drop)

The Drop trait is used to customize what happens when a value is dropped out of memory; it is meant for more complex use cases and nested types.

pub trait Drop {
   // The only method; here for you to customize the behaviour of drop
   fn drop(&mut self);
}

An example of implementing the Drop trait can be found in the compiler, where threads (created with the rayon crate) need termination flags sent to them so that they end and have any associated memory destroyed. This is not just a basic scope-ending scenario; it’s more complicated and therefore needs its own custom logic:

impl Drop for ThreadPool {
    fn drop(&mut self) {
        self.registry.terminate(); 
        // this essentially just sends termination flags to the threads in the pool
    }
}
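
For something smaller you can run yourself, here is a sketch of a custom Drop implementation that records the order in which values die. The `Noisy` type and the `drop_order` helper are hypothetical names invented for this example; it combines a manual `drop()` call with the natural end-of-scope case.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical type: records its name in a shared log when it is dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Hypothetical helper: returns the order in which the values were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _a = Noisy { name: "a", log: Rc::clone(&log) };
        let b = Noisy { name: "b", log: Rc::clone(&log) };
        let _c = Noisy { name: "c", log: Rc::clone(&log) };
        drop(b); // manual drop: `b` dies before the scope ends
    } // remaining locals die in reverse declaration order: `_c`, then `_a`
    let order = log.borrow().clone();
    order
}

fn main() {
    println!("{:?}", drop_order()); // prints ["b", "c", "a"]
}
```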
Tangent$^2$: Destructors & Drop Glue

The previous core::mem::drop() and core::ops::Drop::drop() are both still high-level abstractions. How does the compiler see a scope ending and decide to run a destructor? What really even is a destructor??

(1) Destructors for Copy Types: A destructor for a Copy type is not really a destructor:

(2) Destructors for types with Drop Implemented:

The documentation for Drop states:

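This destructor consists of two components:

  1. A call to Drop::drop for that value, if this special Drop trait is implemented for its type.
  2. The automatically generated “drop glue” which recursively calls the destructors of all the fields of this value.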

Such “Drop Glue” is extra code that the compiler injects into your compiled program in order to recursively destroy every value going out of scope.

// For example, you can see the logic here.
// This function, called needs_drop_components_with_async, goes through all of the potential types it can be called on and recursively collects, in a vector, the components which need to be dropped.

// The following code is in the middle of the needs_drop_components_with_async function
// ... and we are in the midst of a match statement
// If any field needs drop, then the whole tuple does.
ty::Tuple(fields) => fields.iter().try_fold(SmallVec::new(), move |mut acc, elem| {
	acc.extend(needs_drop_components_with_async(tcx, elem, asyncness)?); // call itself since this tuple is still too high level, we need to get to the base case of dropping
	Ok(acc)
}),
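
You can observe the same "if any field needs drop, the whole tuple does" rule from user code with `std::mem::needs_drop`. This is only a sketch: the `Guard` type and `drop_glue_report` helper are hypothetical, and the standard library documents `needs_drop` as an optimization hint that may conservatively return `true`, so the `false` results below reflect what current compilers report in practice rather than a hard guarantee.

```rust
use std::mem::needs_drop;

// Hypothetical type with a hand-written destructor.
struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {}
}

// Hypothetical helper: reports which of a few types need drop glue.
fn drop_glue_report() -> [bool; 5] {
    [
        needs_drop::<i32>(),           // Copy type: nothing to run
        needs_drop::<String>(),        // owns a heap buffer
        needs_drop::<(i32, f64)>(),    // no field needs drop
        needs_drop::<(i32, String)>(), // one field needs drop, so the tuple does
        needs_drop::<Guard>(),         // implements Drop directly
    ]
}

fn main() {
    println!("{:?}", drop_glue_report());
}
```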

 

Death Situation #5: Temporary expressions dropped immediately

String::from("temp").len(); 
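
A small sketch to make "dropped immediately" visible: the temporary is destroyed at the end of the statement that created it, before the next statement runs. The `Temp` type and `temporary_demo` helper are hypothetical names used only for this illustration.

```rust
use std::cell::RefCell;

// Hypothetical type: writes to a borrowed log when used and when dropped.
struct Temp<'a> {
    log: &'a RefCell<Vec<&'static str>>,
}

impl<'a> Temp<'a> {
    fn new(log: &'a RefCell<Vec<&'static str>>) -> Self {
        Temp { log }
    }

    fn touch(&self) {
        self.log.borrow_mut().push("used");
    }
}

impl Drop for Temp<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push("dropped");
    }
}

// Hypothetical helper: returns the sequence of events that were logged.
fn temporary_demo() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    Temp::new(&log).touch(); // the temporary dies at the end of this statement
    log.borrow_mut().push("next statement");
    log.into_inner()
}

fn main() {
    println!("{:?}", temporary_demo()); // prints ["used", "dropped", "next statement"]
}
```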

 

What Lifetimes really are:

Lifetime Annotations - what we are really here for!

(1) ◆ Explicit Lifetime Annotations ◆

Named-Annotations Syntax: 'place_your_name_here

Lifetime Annotations are used in the following data structures:

The reason they’re needed for these objects specifically is that they:

(1) Introduce some form of abstraction/generalization over data, as they all use some sort of parameters in their definitions

This abstraction/generalization naturally introduces ambiguity to the compiler, as we will see soon.

(2) Introduce their own scopes, where there is a barrier between their internal and external access to values. (Most) Inner scopes only know their own world and cannot guarantee that the outside world is not deleting data that our above data structures are referencing.

  1. &T: A shared reference/borrow of any given type T, which is essentially a pointer to some data, where you cannot change the value being pointed to
  2. &mut T: A mutable reference/borrow of any given type T, which is a pointer as well but now we are able to edit the data being referenced to.

and so, you will always see explicit lifetimes associated with some reference, such as &'a T

The two above points are interrelated because parameters are the only values which can (usually) traverse such scopes, making them a potential target of ambiguity of their life-status in the eyes of a given scope.

 

The lifetime system exists because, without it, references could outlive the values they point to (i.e. the actual value dies before the reference does, which is common in nested scopes), creating dangling pointers (i.e. references to something that no longer exists).
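
Here is a minimal sketch of that scenario. The accepted case lives in a hypothetical helper called `valid_borrow`; the rejected case is kept in comments so the file still compiles.

```rust
// Hypothetical helper: the reference never outlives the value it points to.
fn valid_borrow() -> i32 {
    let outer = 5;
    let r;
    {
        r = &outer; // `outer` outlives `r`, so this borrow is accepted
    }
    *r
}

fn main() {
    // The rejected pattern, kept as comments so this file compiles:
    //
    // let r;
    // {
    //     let inner = 5;
    //     r = &inner; // error[E0597]: `inner` does not live long enough
    // } // `inner` dies here while `r` still points at it
    // println!("{}", r);

    println!("{}", valid_borrow());
}
```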

 

Let’s look at how the syntax actually plays out:

Function Signatures + Lifetime Annotations:

Here is a simple example of how we would specify lifetime annotations for functions:

fn print_refs<'a, 'b>(x: &'a i32, y: &'b i32) {
    println!("x is {} and y is {}", x, y);
}

 

Important Tangent: Generics in Rust

A function in second order lambda calculus would be something like: $\lambda \alpha: *. \lambda x: \alpha. x$, where if I had to do a crude 1-to-1 translation into some Rust pseudo-code, then it would look something like:

fn function_outside(param_alpha: any_type_you_want){ 
	fn function_inside(param_x: param_alpha){
		return param_x
	} 
}

This is ugly, therefore my assumption is that the rust developers said we need something like fn function_inside(params_for_types)(params_for_values), but then this is weird to read and makes parsing a harder job for the compiler too. So why not use angle brackets? Our rust pseudo code will look something like this now:

fn function_inside<param_alpha: any_type_you_want>(param_x: param_alpha) {...}

Actually why even have a generic name for a type like param_alpha: any_type_you_want, types are already intrinsically generic names for things!

aside; tiny tangent: … maybe we shouldn’t just throw out this colon-type-specification thing (i.e. param_alpha: any_type_you_want), maybe it’s useful for type bounds 😳. It’s just another, higher-order, abstraction on types, like how the lambda calculus defines it, but no need to get deep into it here.

Let’s simplify it again with this knowledge:

fn function_inside<param_alpha>(param_x: param_alpha)

// or even simpler:
fn func<T>(x: T)

Therefore we can see that a lifetime parameter/annotation is just another generic parameter, which syntactically differs from regular type generics by its ' apostrophe, and semantically differs in that it specifies how long a reference is valid rather than what a sequence of bits is supposed to represent to us, as a class of objects generally.

We use generics for lifetimes for simple equality checks between lifetimes (i.e. they are relative). The most basic showcase of equality checks is 'a = 'a and 'a != 'b (a and b can coincidentally be the same lifetime, but we are not constraining them to be equal in our lifetime annotation). This lets us easily tell the compiler relative differences between lifetimes, which is all we need to ensure that a reference does not outlive the value it points to. To specify that lifetime 'b outlives lifetime 'a, we use a colon syntax when declaring 'b as a generic: 'b: 'a.

this colon syntax is a lifetime bound (an “outlives” bound): you’re requiring one lifetime to be at least as long as another, which is what allows a reference with the longer lifetime to be coerced to the shorter one

The concrete values of these generic lifetimes come from the actual variables: when a reference is passed in at a call site, the compiler associates the generic with the region of code where that variable is alive. I will show this off below in the greater context around where lifetime annotations are specified.

 

What’s going on in our print_refs function?

aside: And in general, when dealing with Lifetimes we have to be good Hegelians and understand that the inside of a function and the outside are in tension, interdependent through the variables we pass through it. My gripe with many Lifetime tutorials stems from the isolation in which they explain Lifetimes, like just looking at a function’s signature. It’s simplification for simplification’s sake, and leads to confusion since we don’t have all the context.

//define the same function
fn print_refs<'a, 'b>(x: &'a i32, y: &'b i32) {
    println!("x is {} and y is {}", x, y);
}

// We need to actually use the function in action to understand the lifetime annotations: 
{ //scope 1
	let w: i32 = 1;
	{ //scope 2
		let u: i32 = 2;
		print_refs(&w, &u);
	} // u dies here, this is the end of its lifetime
	
} // w dies here, this is the end of its lifetime
{
	let w: i32 = 1;
	let u: i32 = 2;
	print_refs(&w, &u);
} // w and u and the function have the same scope
{
	let w: i32 = 1;
	let u: i32 = 2;
	{
		print_refs(&w, &u);
	} // the function is in a nested scope so w and u definitely still live after this nested scope
} // w and u have the same scope

 

Now let’s spice up this example by adding in a return value which is a reference:

//define the same function but with return value
fn print_refs<'a, 'b>(x: &'a i32, y: &'b i32) -> &'a i32 {
    println!("x is {} and y is {}", x, y);
    x 
}

{ 
	let w: i32 = 1;
	let output; // defining it here so we can reference it in this scope later
	{
		let u: i32 = 2;
		output = print_refs(&w, &u);
		println!("{}", output);
	} 
	println!("{}", output); // also valid
} // since 'a was assigned to x, and we passed &w in x's place, output is valid until here, the same lifetime as w!!!!

// if our return type was instead -> &'b i32 (while still returning x), then we would get a compiler error:
// error: lifetime may not live long enough
// and if you ran this with the signature fn print_refs<'a, 'b>(x: &i32, y: &i32) -> &i32, you would get the error:
// error[E0106]: missing lifetime specifier

While the previous example was ambiguous for the compiler, the following example is also ambiguous for the developer (without understanding the broader scope of course)

fn print_refs<'a, 'b>(x: &'a i32, y: &'b i32) -> &??? {
    if *x > *y {
        x  // Lifetime 'a
    } else {
        y  // Lifetime 'b
    }
}

// the compiler needs to know: after the function returns, how long can I safely use this reference?
// therefore in this case we would need a definitive answer, and we have two choices here:

// (1) we can make both lifetimes the same.
fn print_refs<'a>(x: &'a i32, y: &'a i32) -> &'a i32 {...}

// (2) we can use the shorter of the two lifetimes:
// 'b: 'a means 'b outlives 'a, i.e. we require 'b to live at least as long as 'a. Since 'a is for sure the shorter of the two, a reference based on 'b is known to still be alive when it is returned as &'a
fn print_refs<'a, 'b: 'a>(x: &'a i32, y: &'b i32) -> &'a i32 {...}
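
Both options can be sketched as compiling code. The function names `max_same`, `max_bounded`, and `demo` are placeholders made up for this example, not anything from a library.

```rust
// Option (1): one lifetime shared by both inputs and the output.
fn max_same<'a>(x: &'a i32, y: &'a i32) -> &'a i32 {
    if *x > *y { x } else { y }
}

// Option (2): `'b: 'a` lets a `&'b i32` be returned where `&'a i32` is expected.
fn max_bounded<'a, 'b: 'a>(x: &'a i32, y: &'b i32) -> &'a i32 {
    if *x > *y { x } else { y }
}

// Hypothetical driver: calls both versions across nested scopes.
fn demo() -> (i32, i32) {
    let w = 1;
    let result;
    {
        let u = 2;
        // The returned references are dereferenced (copied out) while `u` is
        // still alive, so nothing borrowed from `u` escapes this scope.
        result = (*max_same(&w, &u), *max_bounded(&u, &w));
    }
    result
}

fn main() {
    println!("{:?}", demo()); // prints (2, 2)
}
```

Removing the `'b: 'a` bound from `max_bounded` should bring back the "lifetime may not live long enough" error on the `y` branch.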

 

Structs + Lifetime Annotations:

// (1) single reference, i.e. one lifetime parameter
struct Article<'a> {
    title: &'a str,
    content: &'a str,
}

// (2) multiple references with the same lifetime
struct BlogPost<'a> {
    author: &'a str,
    article: &'a Article<'a>, //we had to specify the nested lifetime too
}

// (3) multiple references with different lifetimes
struct Comment<'a, 'b> {
    text: &'a str,
    post: &'b BlogPost<'b>,
}

// (4) mix of owned data and references
struct User<'a, T> {
    id: T,              // owned - no lifetime needed
    name: String,         // owned - no lifetime needed
    bio: &'a str,         // borrowed - needs lifetime
}

// (5) using a struct that has explicit lifetime annotations as a parameter to a function, where we take a reference to this struct; i.e. a reference to a struct that has references for its interior data. This ensures that during the function call, whatever is referenced in the struct is still valid
// (SomeStruct is assumed to be defined elsewhere with a printable data_field)
fn dummy_func<'b>(person: &SomeStruct<'b>) {
    println!("This is a struct with an interior reference: {}", person.data_field);
}

The above patterns highlight how you’ll use it syntactically, but the semantics become lost when we are just looking at these blocks.

Let’s look at the below example:

// assumes a User with two borrowed fields, e.g.:
// #[derive(Debug)]
// struct User<'a, 'b> { name: String, thirty_two: &'a i32, sixty_four: &'b i64 }
{
	let v: i32 = 999;
	let user_concrete: User;
	{
		let w: i64 = 888888;
		user_concrete = User { // assign to the outer binding (a second `let` here would only shadow it)
			name: String::from("Lumaj"), 
			thirty_two: &v, 
			sixty_four: &w
		};
		println!("{:#?}", user_concrete); // works
	} // w dies here
	println!("{:#?}", user_concrete); // WONT WORK, 'b = lifetime of w, that's dead!
	// error[E0597]: `w` does not live long enough
	// i.e. the borrow checker refuses to let user_concrete be used past the inner scope, because its sixty_four field would be pointing at the dead w
}
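
For contrast, here is a compiling sketch of the same idea. The struct definition is an assumption (a `User` with exactly these three fields is never defined in this tutorial), and `demo` is a hypothetical helper. Because the two references have separate lifetimes, the part of the struct tied to `v` can escape the inner scope even though the part tied to `w` cannot.

```rust
// Assumed definition: a User holding two references with independent lifetimes.
#[derive(Debug)]
struct User<'a, 'b> {
    name: String,
    thirty_two: &'a i32,
    sixty_four: &'b i64,
}

// Hypothetical helper: builds a User in an inner scope and keeps only the
// pieces that are allowed to outlive that scope.
fn demo() -> (i32, String) {
    let v: i32 = 999;
    let kept: &i32;
    let name: String;
    {
        let w: i64 = 888_888;
        let user = User {
            name: String::from("Lumaj"),
            thirty_two: &v,
            sixty_four: &w,
        };
        println!("{:#?}", user);
        kept = user.thirty_two; // copies out the `&'a i32`, which is tied only to `v`
        name = user.name; // moves the owned String out of the struct
    } // `w` and what is left of `user` die here
    (*kept, name)
}

fn main() {
    println!("{:?}", demo());
}
```

If both fields shared a single lifetime parameter, `kept` would be limited by `w` as well, and using it after the inner scope should be rejected.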

 

Enums + Lifetime Annotations:

enum Either<'a> {
    Num(i32),
    Ref(&'a i32),
}

enum Content<'a> {
    Article { title: &'a str, body: &'a str },
    Image { url: &'a str, alt: &'a str },
    Empty,
}

enum Reference<'a, 'b> {
    First(&'a str),
    Second(&'b str),
    Both(&'a str, &'b str),
}

 

Impls + Lifetime Annotations:

In the most simple case you would want to just repeat the lifetimes that are added to the struct or enum that you are implementing functionality on:

struct Article<'a> {
    title: &'a str,
}

impl<'a> Article<'a> {
    fn new(title: &'a str) -> Self {
        Article { title }
    }
    
    fn title(&self) -> &'a str {
        self.title
    }
}

Though any given method within an impl can introduce its own lifetimes, just like any other function, and these do not have to be specified in the impl signature (therefore, in the impl block’s signature, all you have to do is specify the given type’s own lifetimes):

impl<'a> Article<'a> {
   fn compare<'b>(&self, other: &'b str) -> bool {
       self.title == other
   }
}

 

Traits + Lifetime Annotations:

// this example is taken directly from rust's documentation
struct Borrowed<'a> {
    x: &'a i32,
}

// Annotate lifetimes to impl.
// The Default trait is in the standard library; it's for easily providing a fallback default value
// as you can see, the Default trait itself didn't need to be defined with a lifetime; only when implementing its behaviour did we have to specify one
impl<'a> Default for Borrowed<'a> {
    fn default() -> Self {
        Self {
            x: &10,
        }
    }
}

But we can also have traits where in the definition itself it requires that we are using a reference with an explicit lifetime:

trait Summarizable<'a> {
    fn summary(&self) -> &'a str;
}

So if we double-up on both the struct and the trait needing an explicit lifetime annotation it would look something like:

struct Article<'a> {
    title: &'a str,
}

// the impl + trait + struct all defining explicit lifetime annotation
impl<'a> Summarizable<'a> for Article<'a> { // 3 levels of abstraction, 3 lifetime annotations
    fn summary(&self) -> &'a str {
        self.title
    }
}

Generics can have trait bounds (e.g. T: Copy + Clone, i.e. T must implement Copy and Clone), but also lifetimes can be part of those trait bounds too:

fn print_summary<'a, T: Summarizable<'a>>(item: &T) {
    // ...
}

use std::fmt::Display;

fn process<'a, 'b, T>(item: &'a T, context: &'b str) -> &'a str
where
    T: Display + 'a, // T must implement Display and outlive 'a
    'b: 'a,  // 'b must outlive 'a
{
    // ...
}
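
To see what those bounds buy you, here is a sketch with a filled-in body. The name `pick_label` and its behaviour are invented for illustration; the signature mirrors `process` above. Returning `context` as `&'a str` only type-checks because of the `'b: 'a` bound.

```rust
use std::fmt::Display;

// Hypothetical function with the same bounds as `process`.
fn pick_label<'a, 'b, T>(item: &'a T, context: &'b str) -> &'a str
where
    T: Display + 'a, // T must implement Display and outlive 'a
    'b: 'a,          // 'b must outlive 'a
{
    println!("processing {item}");
    context // a `&'b str` shortened to `&'a str`, allowed by `'b: 'a`
}

fn main() {
    let label = pick_label(&42, "ctx");
    println!("{label}");
}
```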

 

Reserved names for lifetimes:

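Anonymous Lifetimes:
// '_ cannot be used in the struct definition itself; the lifetime has to be named there:
struct Article<'a> {
    title: &'a str,
}

// but in an impl header, these are functionally identical:
// impl<'a> Article<'a> { ... }
impl Article<'_> {  // '_ stands in for the struct's lifetime parameter
    fn new(title: &str) -> Article<'_> { ... }
}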
'static Lifetimes:
// an example of a value that has a `'static` lifetime is a string literal, whose type is `&'static str`. These values get stored in the program's binary and are therefore always available. 
fn get_greeting() -> &'static str {
    "Hello" // a string literal is 'static by default; the return type tells the borrow checker that this function returns a reference with that lifetime
}
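
A short sketch of why `'static` is convenient: the reference is valid no matter which scope produced it. The `demo` helper is a hypothetical name for this example.

```rust
fn get_greeting() -> &'static str {
    "Hello" // lives in the program's binary for the whole run
}

// Hypothetical helper: the reference is obtained in an inner scope
// and is still usable after that scope has ended.
fn demo() -> &'static str {
    let greeting;
    {
        greeting = get_greeting();
    } // nothing the reference points to dies here
    greeting
}

fn main() {
    println!("{}", demo());
}
```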

Tangent: const vs let vs static in rust

const

const fn:

const fn square(x: i32) -> i32 { x * x }
const VALUE: i32 = square(5);

let

static

// Globals are declared outside all other scopes. let bindings cannot appear outside of a function body such as main()
static DUMMY: &str = "Dummy";
const NUM: i32 = 10;

fn main() { ...

 

(2) ◇ Implicit Lifetime Annotations ◇

Lifetime Elision:

Elision rules for functions and methods:

(1) No output references:

fn print(s: &str)
// the compiler automatically expands this to:
fn print<'a>(s: &'a str)

(2) One input reference, one output reference:

fn first_word(s: &str) -> &str
// the compiler automatically expands this to:
fn first_word<'a>(s: &'a str) -> &'a str

(3) &self or &mut self for methods in impl blocks

impl Article {
    fn title(&self) -> &str  // self's lifetime used for output
    // the compiler automatically expands this to:
    fn title<'a>(&'a self) -> &'a str
}

The actual rules of elision are for the compiler, the consequences of these were showcased above and are more practical to know, but it’s good to detail the below regardless to understand why the above applies. You can think of these rules as fancy autocomplete, and you only have to step in once things get too ambiguous:

  1. The compiler automatically assigns a lifetime to each reference that is a parameter. 'a will get assigned to the first parameter, 'b to the second, and so on.
  2. If there is exactly one input lifetime parameter, that lifetime is assigned to the output lifetime parameters. fn func_name<'a, T>(param_1: &'a T) -> &'a T , fn func_name<'a, T>(param_1: &'a T) -> (&'a T, &'a T), etc.
  3. If there’s more than one input lifetime in a method, but one of the input parameters is &self or &mut self, then all output lifetimes are assigned the lifetime of the &self or &mut self value. This rule actually makes lifetime annotations in methods very uncommon.
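
The three rules can be sketched as compiling code. All names here (`first_word`, `pick_first`, `title_or`, and this owned-String version of `Article`) are illustrative assumptions, not anything from a library.

```rust
// Rule 2: exactly one input reference, so the output borrows from it.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// Two input references and no `self`: elision cannot pick one for the
// output, so the relationship has to be written out by hand.
fn pick_first<'a>(a: &'a str, _b: &str) -> &'a str {
    a
}

// Hypothetical struct that owns its title.
struct Article {
    title: String,
}

impl Article {
    // Rule 3: the output borrows from `&self`, even though there is a
    // second reference parameter.
    fn title_or(&self, _fallback: &str) -> &str {
        &self.title
    }
}

fn main() {
    let article = Article { title: String::from("Lifetimes") };
    println!("{}", first_word("hello world"));
    println!("{}", pick_first("a", "b"));
    println!("{}", article.title_or("untitled"));
}
```

Dropping the `'a` annotations from `pick_first` should produce the "missing lifetime specifier" error, which is the point where the autocomplete gives up and you have to step in.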

Some people use anonymous lifetimes to make clear that elision is happening. You may find this in some documentation.

// Without '_' - completely implicit
fn process(s: &str) -> &str { s }

// With '_' - says "yes, I know there are lifetimes here"
fn process(s: &'_ str) -> &'_ str { s }

And that’s all you need to know about Lifetime annotations! Hope that clears everything up and gives you a good guide to reference.

The compiler’s error messages will be your best friend, as in a lot of cases it will point you in the right direction for what explicit lifetimes to use.

Thanks for reading my 7,500-word essay on Lifetimes!!

 
