Merged
113 changes: 113 additions & 0 deletions active/0000-undefined-struct-layout.md
@@ -0,0 +1,113 @@
- Start Date: 2014-05-17
- RFC PR #:
- Rust Issue #:

# Summary

Leave struct layout unspecified by default, as is already the case for
enums, for optimisation & security purposes. Use something like
`#[repr(C)]` to expose C-compatible layout.

# Motivation

The members of a struct are currently always laid out in memory in the
order in which they were specified, e.g.

```rust
struct A {
    x: u8,
    y: u64,
    z: i8,
    w: i64,
}
```

will put the `u8` first in memory, then the `u64`, the `i8` and lastly
the `i64`. Due to the alignment requirements of various types, padding
is often required to ensure each member starts at an appropriately
aligned byte. Hence the above struct is not `1 + 8 + 1 + 8 == 18`
bytes, but rather `1 + 7 + 8 + 1 + 7 + 8 == 32` bytes, since it is
laid out like

```rust
#[packed] // no automatically inserted padding
struct AFull {
    x: u8,
    _padding1: [u8, .. 7],
    y: u64,
    z: i8,
    _padding2: [u8, .. 7],
    w: i64
}
```

If the fields were reordered to

```rust
struct B {
    y: u64,
    w: i64,

    x: u8,
    z: i8
}
```

then the struct is (strictly) only 18 bytes (but the alignment
requirements of `u64` force it to take up 24).
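The arithmetic above can be checked directly. The following is a minimal sketch in present-day Rust syntax (not the 2014 syntax used elsewhere in this RFC), assuming a target where `u64` and `i64` are 8-byte aligned, such as x86-64. Both structs are marked `#[repr(C)]` so that declaration order is used regardless of the compiler's default.

```rust
use std::mem::{align_of, size_of};

// Declaration-order layout, mirroring the RFC's struct `A`.
#[repr(C)]
#[allow(dead_code)]
struct A {
    x: u8,
    y: u64,
    z: i8,
    w: i64,
}

// Same fields, reordered so the 8-byte fields come first.
#[repr(C)]
#[allow(dead_code)]
struct B {
    y: u64,
    w: i64,
    x: u8,
    z: i8,
}

fn main() {
    // Assumption: 8-byte alignment for u64 (true on x86-64 and AArch64).
    assert_eq!(align_of::<u64>(), 8);
    // 1 + 7 (padding) + 8 + 1 + 7 (padding) + 8
    assert_eq!(size_of::<A>(), 32);
    // 8 + 8 + 1 + 1 = 18, rounded up to a multiple of 8
    assert_eq!(size_of::<B>(), 24);
    println!("A: {} bytes, B: {} bytes", size_of::<A>(), size_of::<B>());
}
```

On targets where 64-bit integers are only 4-byte aligned, the sizes would differ, so the numbers here should be read as specific to the stated assumption.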

There is also some security advantage to being able to randomise
struct layouts, for example,
[the Grsecurity suite](http://grsecurity.net/) of security
enhancements to the Linux kernel provides
[`GRKERNSEC_RANDSTRUCT`](http://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity_and_PaX_Configuration_Options#Randomize_layout_of_sensitive_kernel_structures)
which randomises "sensitive kernel datastructures" at compile time.

Contributor

That's completely unnecessary if you're confident that you are memory safe, which (modulo compiler bugs) Rust can give you (except unsafe blocks).

IMO C-struct-compatibility is a major selling point of Rust, it should be the default.

Contributor

If you're writing a kernel in Rust, you presumably aren't guaranteeing that all the programs your kernel runs are also written in Rust. To that end, being able to randomize fields sounds plausibly useful.

However, I would imagine it's probably better done by writing a custom item decorator that randomizes the field order (and places whatever #[repr()] attribute is necessary to tell the compiler to use the declaration order). Which is to say, the kernel author can write the necessary extension, Rust doesn't need to provide it.

Contributor

@o11c, it is also a major selling point of Rust to be highly efficient. IMO, this selling point is more important than C interop.

Of course, easy C interop will still be a selling point, but I suspect it is the wrong default. If we stick to our current default, every struct that is not used for C interop will pay the price of unoptimized representation, unless we add an annotation to the struct definition. This violates the notion of "pay for what you use". And given that the number of structs intended for C interop is relatively few, requiring this annotation for the majority of structs would be comparable to the burden of const-correctness in C++.

Contributor Author

@kballard that is a good point. It's a trivial syntax extension: https://gist.github.com/huonw/be05427dc80e44f1a594

I'll remove randomisation as a reason for this RFC.

Notably, Rust's `enum`s already have undefined layout, and provide the
`#[repr]` attribute to control layout more precisely (specifically,
selecting the size of the discriminant).
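As a small illustration of `#[repr]` selecting the discriminant size on an enum, here is a sketch in present-day Rust. The enum names `Small` and `Wide` are hypothetical and chosen only for this example.

```rust
use std::mem::size_of;

// `#[repr(u8)]` fixes the discriminant type to a single byte.
#[repr(u8)]
#[allow(dead_code)]
enum Small {
    A,
    B,
    C,
}

// `#[repr(u32)]` asks for a four-byte discriminant instead.
#[repr(u32)]
#[allow(dead_code)]
enum Wide {
    A,
    B,
    C,
}

fn main() {
    assert_eq!(size_of::<Small>(), 1);
    assert_eq!(size_of::<Wide>(), 4);
    // Discriminants are assigned in declaration order, starting from 0.
    assert_eq!(Small::B as u8, 1);
}
```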

# Drawbacks

Forgetting to add `#[repr(C)]` for a struct intended for FFI use can
cause surprising bugs and crashes. There is already a lint for FFI use
of `enum`s without a `#[repr(...)]` attribute, so this can be extended
to include structs.
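A sketch of what an FFI-facing struct looks like under this proposal, in present-day Rust. The type `Point` and the function `point_sum` are hypothetical. Current compilers have lints in this family (`improper_ctypes` and `improper_ctypes_definitions`), which are expected to warn when a struct without `#[repr(C)]` appears in a C-ABI signature; the exact diagnostics are not shown here.

```rust
// Hypothetical FFI-facing type: `#[repr(C)]` pins the field order and
// padding to what a C compiler would produce for the equivalent struct.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

// Hypothetical function using the C calling convention. Dropping the
// `#[repr(C)]` above would make this signature rely on unspecified layout.
pub extern "C" fn point_sum(p: Point) -> i32 {
    p.x + p.y
}

fn main() {
    let p = Point { x: 2, y: 3 };
    assert_eq!(point_sum(p), 5);
}
```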

# Detailed design

A struct declaration like

```rust
struct Foo {
    ...
}
```

has no fixed layout, that is, a compiler can choose whichever order of
fields it prefers.

A fixed layout can be selected with the `#[repr]` attribute:

```rust
#[repr(C)]
struct Foo {
    ...
}
```

This will force a struct to be laid out like the equivalent definition
in C.

Member

I think the whole idea is a good one; +1. Bikeshed: I'm not sure if repr(C) is the right notation - you might want a defined layout for other reasons - maybe repr(fixed)?

Contributor Author

If we're going to bikeshed, maybe repr(declaration) or repr(as_written) would be more descriptive?

I think repr(C) is actually OK, because it's specifying that C layout rules should be used (i.e. declaration order), although it could easily be interpreted as "struct for C FFI". There are other possible "fixed" layouts (e.g. sorting by field size, or even alphabetically).

In any case, I don't particularly care about the name.

Contributor

I had the same thought, but then it occurred to me that the only people who will insist on such control over representation would be coming from a C background anyway. I do like the analogy to repr(C) on enums, and it would be a shame to have repr(C) and repr(fixed) be aliases of each other.

Contributor

I think #[repr(C)] is the only thing that makes sense for producing a C-compatible struct. If #[repr(fixed)] would produce something other than #[repr(C)] then it might be worth having, but if it produces the exact same layout as #[repr(C)] then it seems unnecessarily redundant.

Member

Good points. repr(C) sounds like the best bet.
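To make the `#[repr(C)]` guarantee concrete, the sketch below (present-day Rust) measures field offsets at run time and checks that they follow declaration order. The fields of `Foo` are hypothetical, since the RFC elides them, and the helper `offsets` exists only for this example.

```rust
// Hypothetical fields; the RFC's `Foo` leaves them unspecified.
#[repr(C)]
#[derive(Default)]
struct Foo {
    x: u8,
    y: u64,
    z: i8,
}

// Byte offset of each field, computed from addresses relative to the
// start of the struct.
fn offsets(foo: &Foo) -> (usize, usize, usize) {
    let base = foo as *const Foo as usize;
    (
        &foo.x as *const u8 as usize - base,
        &foo.y as *const u64 as usize - base,
        &foo.z as *const i8 as usize - base,
    )
}

fn main() {
    let foo = Foo::default();
    let (x, y, z) = offsets(&foo);
    // With `#[repr(C)]`, the first field sits at offset 0 and later
    // fields sit at strictly increasing offsets.
    assert_eq!(x, 0);
    assert!(x < y && y < z);
    println!("offsets: x={}, y={}, z={}", x, y, z);
}
```

Without the attribute, the same assertions would not be guaranteed to hold under this proposal, because the compiler would be free to reorder the fields.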

# Alternatives

- Have non-C layouts opt-in, via `#[repr(smallest)]` and
  `#[repr(random)]` (or similar).
- Have layout defined, but not declaration order (like Java(?)), for
  example, from largest field to smallest, so `u8` fields get placed
  last, and `[u8, .. 1000000]` fields get placed first. The `#[repr]`
  attributes would still allow for selecting declaration-order layout.
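The "largest field to smallest" alternative can be sketched as a small layout calculator. This is an illustrative model in present-day Rust, not how any compiler implements layout: `layout_size` and `largest_first` are hypothetical helpers, and each field is described only by an assumed `(size, align)` pair in bytes.

```rust
/// Size of a struct whose fields are placed in the given order.
/// Each field is `(size, align)` in bytes.
fn layout_size(fields: &[(usize, usize)]) -> usize {
    let mut offset = 0;
    let mut max_align = 1;
    for &(size, align) in fields {
        // Round the offset up to the field's alignment, then place the field.
        offset = (offset + align - 1) / align * align;
        offset += size;
        max_align = max_align.max(align);
    }
    // The total size is padded to a multiple of the largest alignment.
    (offset + max_align - 1) / max_align * max_align
}

/// Hypothetical "largest field first" policy: sort by size, descending.
fn largest_first(fields: &[(usize, usize)]) -> Vec<(usize, usize)> {
    let mut sorted = fields.to_vec();
    sorted.sort_by(|a, b| b.0.cmp(&a.0));
    sorted
}

fn main() {
    // u8, u64, i8, i64, assuming 8-byte alignment for the 64-bit integers.
    let a = [(1, 1), (8, 8), (1, 1), (8, 8)];
    assert_eq!(layout_size(&a), 32); // declaration order
    assert_eq!(layout_size(&largest_first(&a)), 24); // largest first
}
```

Sorting by size is only one possible fixed policy; it does not always minimise padding (a large byte array has size greater than a `u64` but alignment 1), which is one reason to leave the choice to the compiler.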

# Unresolved questions

- How does this interact with binary compatibility of dynamic libraries?