-
-
Notifications
You must be signed in to change notification settings - Fork 166
Compact AST Representation
andychu edited this page May 10, 2020
·
54 revisions
- Reddit Thread: Representing ASTs as byte strings with with small integers rather than pointers
- sajson: Last week, I described a JSON parse tree data structure that, worst case, requires N words for N characters of JSON text. Single allocation JSON?
-
Compiling tree transforms to operate on packed representations at ECOOPLDI 2017
- Conference Presentation on YouTube
- This is much more ambitious, but also more limited. Not only are there no pointers, but the tree traversal is equivalent to a linear scan through memory. It compiles a custom programming language and rearranges the operations so this is true.
- See limitations at the end of the paper.
- Good references in that papers:
- The Lattner/Adve paper below
- Cache-Conscious Data Structure Layout
- Compact Normal Form in Glasgow Haskell Compiler -- reduces both space and garbage collection pressure.
- Ken Thompson's postfix encoding for regexes (no pointers) is mentioned here: https://swtch.com/~rsc/regexp/regexp1.html
This is related: more general, but more complex.
- Reddit: Transparent Pointer Compression for Linked Data Structures by Lattner, Adve -- divide address in to 4 GiB pools so that internal pointers are 32-bit.
- Jai Demo: Relative Pointers
- Object-Relative Addressing: Compressed Pointers in 64-bit Java Virtual Machines
- Reddit: Idea for Tiny Relative Pointers. Using the LSB for external vs. internal might be a good idea. I also wanted a bit for const vs. non-const.
-
Vertical object layout and compression for fixed heaps
- Implemented in the Virgil Language by Ben Titzer
- Vertical Object Layout basically means transposing objects and fields. Analogous to arrays of structs -> struct of arrays. Except the "array" is every instance of a given type on the heap.
-
CppCon 2019: Steven Pigeon “Small is beautiful: Techniques to minimise memory footprint” -- a lot of C++ metaprogramming tricks with
constexpr
, etc. Goes to the extreme on size but says little about speed. No benchmarks and no concrete applications, just small pieces of C++. -
Why Sorbet Is Fast -- Fast static type checker for Ruby written in nice C++. 32-bit IDs for interned strings for fast comparison.
GlobalState
andRef
. - Flat Buffers don't need to be deserialized before using them. Also there is possible precursor here with flat_file_memory_management (no code)
-
Pointer Compression in V8 (March 2020). Chrome switched to being a 64-bit process in 2014, so they got this problem. Each v8 instance is limited to around 4 GB, although they still want to have multiple instances in a single address space (I guess for web workers, where all the workers have the same privileges).
- In order to simplify integration of Pointer Compression into existing code, we decided to decompress values on every load and compress them on every store. Thus changing only the storage format of tagged values while keeping the execution format unchanged.
- we observed that Pointer Compression reduces V8 heap size up to 43%! In turn, it reduces Chrome’s renderer process memory up to 20% on Desktop.
- Oil Blog Post: What is oheap?
- Oheap V1 was a read-only format for transporting an ASDL tree across the Python/C++ boundary. Though it turned out there wasn't a single clean boundary in the shell's architecture.
-
OHeap2 was a read-write format, for OVM2.
- At first I used it to represent .pyc files (Python's
CodeObject
) - Meant to replace marshal, pickle, and zipimport.
- 4-16-N model: 4 byte refs, 16 byte cells, and N byte slabs.
- See
ovm2/oheap2.py
- OHeap2 might still be a good idea, but I judged OVM2 not to be a good idea
- At first I used it to represent .pyc files (Python's