-
Notifications
You must be signed in to change notification settings - Fork 13.9k
Description
We currently have three closely-related symbol types.
Symbol is the fundamental type. A Symbol is an index. All operations work on that index. StableHash is not implemented for it, but there's no reason why it couldn't be. A Symbol can be a gensym, which gets special treatment -- it's a guaranteed unique index, even if its chars have been seen before.
InternedString is a thin wrapper around Symbol. You can convert a Symbol to an InternedString. It has two differences with Symbol.
- Its
PartialOrd/Ord/Hashimpls use the chars, rather than the index. - Gensym-ness is ignored/irrelevant.
LocalInternedString is an alternative that contains a &str. You can convert both Symbol and InternedString to LocalInternedString. Its PartialOrd/Ord/Hash impls (plus PartialEq/Eq) naturally work on chars. Its main use is to provide a way to look some or all of the individual chars within a Symbol or InternedString, which is sometimes necessary.
I have always found the differences between these types confusing and hard to remember. Furthermore, the distinction between Symbol and InternedString is subtle and has caused
bugs.
Also, gensyms in general make things a lot more complicated, and it would be great to eliminate them.
Here's what I would like as a final state.
Symbolexists.InternedStringdoes not exist.LocalInternedStringperhaps exists, but is only used temporarily when code needs access to the chars within aSymbol. Alternatively,Symbolcould provide awith()method (likeInternedStringcurrently has) that provides access to the chars, and thenLocalInternedStringwouldn't be needed.Symbol's impl ofHashuses the index, and its impl ofStableHashuses the chars.- Not sure about
Symbol's impl ofPartialOrd/Ord. If a stable ordering is really needed (perhaps for error messages?) we could introduce aStableOrdtrait and use that in the relevant places, or do a custom sort, or something. - Gensyms don't really exist. They are simulated: when you call
gensym(), it just appends a unique suffix. It's worth noting that gensyms are always identifiers, and so the unique suffix can use a non-identifier char. AndInternercould keep a counter. So "foo" would gensym to something lke "foo$1", "foo$2", etc. Once the suffix is added, they would just be treated as normal symbols (in terms of hashing, etc.) I would hope that identifier gensyms would never be compared with non-identifier symbols, so a false positive equality match should be impossible. (Different types for identifier symbols and non-identifier symbols would protect against that, but might cause other difficulties.) Alternatively, syntax_pos::symbol::Symbol::gensym() is incompatible with stable hashing. #49300 talks about other ways of dealing with gensyms. - All this should also help performance, because we'd end up with more operations on indexes, and only the necessary ones on chars (which require TLS lookups).
I haven't even touched on the way lifetimes work in the interner, which are subtle and error-prone. But having fewer types would only make improvements on that front simpler.
Thoughts?
CC @petrochenkov @Zoxc @eddyb @Mark-Simulacrum @michaelwoerister