-
Notifications
You must be signed in to change notification settings - Fork 14k
std: Stabilize the std::hash module #20654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
(rust_highfive has picked a reviewer for you, use r? to override) |
|
HashMap changes LGTM |
c75ed93 to
97d6695
Compare
|
LGTM. |
6a74f96 to
71e522e
Compare
|
The commit message just gets longer! I've rebased with as many goodies as possible (those that have been implemented). This removes all the I've purged "sip" as a name from everything other than r? @aturon |
|
continuing from #19673
@alexcrichton That's right, such an algorithm would have to be aware of generics. There is another approach to this problem: attempting to optimize these calls to .hash(state) while still doing them for all members.
I think the impl of Hash for &str has two calls to .write().
Perhaps using .finish() imposes inevitable overhead, but I wouldn't worry much about it. In my comment, I tried to make an implementation that would do much less work incrementally, yes, because @thestinger didn't support his ideas with a concrete design. Nothing came out of that. I have overestimated LLVM's understanding of pointer arithmetic optimization. |
Ah yes good point, I forgot about the byte we include at the end!
Thanks for clarifying! |
src/liballoc/rc.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need the hash::Writer constraint here any longer?
src/libstd/collections/hash/set.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same suggestion re: factoring out an H with S: HashState<Hasher=H>
07ee0f4 to
aae9a28
Compare
|
Ok, I believe I took care of all unnecessary |
aae9a28 to
596fac0
Compare
|
Awesome work, @alexcrichton |
4ce42a1 to
11fa2ec
Compare
This commit aims to prepare the `std::hash` module for alpha by formalizing its
current interface whileholding off on adding `#[stable]` to the new APIs. The
current usage with the `HashMap` and `HashSet` types is also reconciled by
separating out composable parts of the design. The primary goal of this slight
redesign is to separate the concepts of a hasher's state from a hashing
algorithm itself.
The primary change of this commit is to separate the `Hasher` trait into a
`Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was
actually just a factory for various states, but hashing had very little control
over how these states were used. Additionally the old `Hasher` trait was
actually fairly unrelated to hashing.
This commit redesigns the existing `Hasher` trait to match what the notion of a
`Hasher` normally implies with the following definition:
trait Hasher {
type Output;
fn reset(&mut self);
fn finish(&self) -> Output;
}
This `Hasher` trait emphasizes that hashing algorithms may produce outputs other
than a `u64`, so the output type is made generic. Other than that, however, very
little is assumed about a particular hasher. It is left up to implementors to
provide specific methods or trait implementations to feed data into a hasher.
The corresponding `Hash` trait becomes:
trait Hash<H: Hasher> {
fn hash(&self, &mut H);
}
The old default of `SipState` was removed from this trait as it's not something
that we're willing to stabilize until the end of time, but the type parameter is
always required to implement `Hasher`. Note that the type parameter `H` remains
on the trait to enable multidispatch for specialization of hashing for
particular hashers.
Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is
simply used as part `derive` and the implementations for all primitive types.
With these definitions, the old `Hasher` trait is realized as a new `HashState`
trait in the `collections::hash_state` module as an unstable addition for
now. The current definition looks like:
trait HashState {
type Hasher: Hasher;
fn hasher(&self) -> Hasher;
}
The purpose of this trait is to emphasize that the one piece of functionality
for implementors is that new instances of `Hasher` can be created. This
conceptually represents the two keys from which more instances of a
`SipHasher` can be created, and a `HashState` is what's stored in a
`HashMap`, not a `Hasher`.
Implementors of custom hash algorithms should implement the `Hasher` trait, and
only hash algorithms intended for use in hash maps need to implement or worry
about the `HashState` trait.
The entire module and `HashState` infrastructure remains `#[unstable]` due to it
being recently redesigned, but some other stability decision made for the
`std::hash` module are:
* The `Writer` trait remains `#[experimental]` as it's intended to be replaced
with an `io::Writer` (more details soon).
* The top-level `hash` function is `#[unstable]` as it is intended to be generic
over the hashing algorithm instead of hardwired to `SipHasher`
* The inner `sip` module is now private as its one export, `SipHasher` is
reexported in the `hash` module.
And finally, a few changes were made to the default parameters on `HashMap`.
* The `RandomSipHasher` default type parameter was renamed to `RandomState`.
This renaming emphasizes that it is not a hasher, but rather just state to
generate hashers. It also moves away from the name "sip" as it may not always
be implemented as `SipHasher`. This type lives in the
`std::collections::hash_map` module as `#[unstable]`
* The associated `Hasher` type of `RandomState` is creatively called...
`Hasher`! This concrete structure lives next to `RandomState` as an
implemenation of the "default hashing algorithm" used for a `HashMap`. Under
the hood this is currently implemented as `SipHasher`, but it draws an
explicit interface for now and allows us to modify the implementation over
time if necessary.
There are many breaking changes outlined above, and as a result this commit is
a:
[breaking-change]
11fa2ec to
511f0b8
Compare
This commit aims to prepare the `std::hash` module for alpha by formalizing its
current interface whileholding off on adding `#[stable]` to the new APIs. The
current usage with the `HashMap` and `HashSet` types is also reconciled by
separating out composable parts of the design. The primary goal of this slight
redesign is to separate the concepts of a hasher's state from a hashing
algorithm itself.
The primary change of this commit is to separate the `Hasher` trait into a
`Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was
actually just a factory for various states, but hashing had very little control
over how these states were used. Additionally the old `Hasher` trait was
actually fairly unrelated to hashing.
This commit redesigns the existing `Hasher` trait to match what the notion of a
`Hasher` normally implies with the following definition:
trait Hasher {
type Output;
fn reset(&mut self);
fn finish(&self) -> Output;
}
This `Hasher` trait emphasizes that hashing algorithms may produce outputs other
than a `u64`, so the output type is made generic. Other than that, however, very
little is assumed about a particular hasher. It is left up to implementors to
provide specific methods or trait implementations to feed data into a hasher.
The corresponding `Hash` trait becomes:
trait Hash<H: Hasher> {
fn hash(&self, &mut H);
}
The old default of `SipState` was removed from this trait as it's not something
that we're willing to stabilize until the end of time, but the type parameter is
always required to implement `Hasher`. Note that the type parameter `H` remains
on the trait to enable multidispatch for specialization of hashing for
particular hashers.
Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is
simply used as part `derive` and the implementations for all primitive types.
With these definitions, the old `Hasher` trait is realized as a new `HashState`
trait in the `collections::hash_state` module as an unstable addition for
now. The current definition looks like:
trait HashState {
type Hasher: Hasher;
fn hasher(&self) -> Hasher;
}
The purpose of this trait is to emphasize that the one piece of functionality
for implementors is that new instances of `Hasher` can be created. This
conceptually represents the two keys from which more instances of a
`SipHasher` can be created, and a `HashState` is what's stored in a
`HashMap`, not a `Hasher`.
Implementors of custom hash algorithms should implement the `Hasher` trait, and
only hash algorithms intended for use in hash maps need to implement or worry
about the `HashState` trait.
The entire module and `HashState` infrastructure remains `#[unstable]` due to it
being recently redesigned, but some other stability decision made for the
`std::hash` module are:
* The `Writer` trait remains `#[experimental]` as it's intended to be replaced
with an `io::Writer` (more details soon).
* The top-level `hash` function is `#[unstable]` as it is intended to be generic
over the hashing algorithm instead of hardwired to `SipHasher`
* The inner `sip` module is now private as its one export, `SipHasher` is
reexported in the `hash` module.
And finally, a few changes were made to the default parameters on `HashMap`.
* The `RandomSipHasher` default type parameter was renamed to `RandomState`.
This renaming emphasizes that it is not a hasher, but rather just state to
generate hashers. It also moves away from the name "sip" as it may not always
be implemented as `SipHasher`. This type lives in the
`std::collections::hash_map` module as `#[unstable]`
* The associated `Hasher` type of `RandomState` is creatively called...
`Hasher`! This concrete structure lives next to `RandomState` as an
implemenation of the "default hashing algorithm" used for a `HashMap`. Under
the hood this is currently implemented as `SipHasher`, but it draws an
explicit interface for now and allows us to modify the implementation over
time if necessary.
There are many breaking changes outlined above, and as a result this commit is
a:
[breaking-change]
This commit aims to prepare the `std::hash` module for alpha by formalizing its
current interface whileholding off on adding `#[stable]` to the new APIs. The
current usage with the `HashMap` and `HashSet` types is also reconciled by
separating out composable parts of the design. The primary goal of this slight
redesign is to separate the concepts of a hasher's state from a hashing
algorithm itself.
The primary change of this commit is to separate the `Hasher` trait into a
`Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was
actually just a factory for various states, but hashing had very little control
over how these states were used. Additionally the old `Hasher` trait was
actually fairly unrelated to hashing.
This commit redesigns the existing `Hasher` trait to match what the notion of a
`Hasher` normally implies with the following definition:
trait Hasher {
type Output;
fn reset(&mut self);
fn finish(&self) -> Output;
}
This `Hasher` trait emphasizes that hashing algorithms may produce outputs other
than a `u64`, so the output type is made generic. Other than that, however, very
little is assumed about a particular hasher. It is left up to implementors to
provide specific methods or trait implementations to feed data into a hasher.
The corresponding `Hash` trait becomes:
trait Hash<H: Hasher> {
fn hash(&self, &mut H);
}
The old default of `SipState` was removed from this trait as it's not something
that we're willing to stabilize until the end of time, but the type parameter is
always required to implement `Hasher`. Note that the type parameter `H` remains
on the trait to enable multidispatch for specialization of hashing for
particular hashers.
Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is
simply used as part `derive` and the implementations for all primitive types.
With these definitions, the old `Hasher` trait is realized as a new `HashState`
trait in the `collections::hash_state` module as an unstable addition for
now. The current definition looks like:
trait HashState {
type Hasher: Hasher;
fn hasher(&self) -> Hasher;
}
The purpose of this trait is to emphasize that the one piece of functionality
for implementors is that new instances of `Hasher` can be created. This
conceptually represents the two keys from which more instances of a
`SipHasher` can be created, and a `HashState` is what's stored in a
`HashMap`, not a `Hasher`.
Implementors of custom hash algorithms should implement the `Hasher` trait, and
only hash algorithms intended for use in hash maps need to implement or worry
about the `HashState` trait.
The entire module and `HashState` infrastructure remains `#[unstable]` due to it
being recently redesigned, but some other stability decision made for the
`std::hash` module are:
* The `Writer` trait remains `#[experimental]` as it's intended to be replaced
with an `io::Writer` (more details soon).
* The top-level `hash` function is `#[unstable]` as it is intended to be generic
over the hashing algorithm instead of hardwired to `SipHasher`
* The inner `sip` module is now private as its one export, `SipHasher` is
reexported in the `hash` module.
And finally, a few changes were made to the default parameters on `HashMap`.
* The `RandomSipHasher` default type parameter was renamed to `RandomState`.
This renaming emphasizes that it is not a hasher, but rather just state to
generate hashers. It also moves away from the name "sip" as it may not always
be implemented as `SipHasher`. This type lives in the
`std::collections::hash_map` module as `#[unstable]`
* The associated `Hasher` type of `RandomState` is creatively called...
`Hasher`! This concrete structure lives next to `RandomState` as an
implemenation of the "default hashing algorithm" used for a `HashMap`. Under
the hood this is currently implemented as `SipHasher`, but it draws an
explicit interface for now and allows us to modify the implementation over
time if necessary.
There are many breaking changes outlined above, and as a result this commit is
a:
[breaking-change]
fix: Infinite loop while elaborting predicates
This commit aims to prepare the
std::hashmodule for alpha by formalizing itscurrent interface whileholding off on adding
#[stable]to the new APIs. Thecurrent usage with the
HashMapandHashSettypes is also reconciled byseparating out composable parts of the design. The primary goal of this slight
redesign is to separate the concepts of a hasher's state from a hashing
algorithm itself.
The primary change of this commit is to separate the
Hashertrait into aHasherand aHashStatetrait. Conceptually the oldHashertrait wasactually just a factory for various states, but hashing had very little control
over how these states were used. Additionally the old
Hashertrait wasactually fairly unrelated to hashing.
This commit redesigns the existing
Hashertrait to match what the notion of aHashernormally implies with the following definition:This
Hashertrait emphasizes that hashing algorithms may produce outputs otherthan a
u64, so the output type is made generic. Other than that, however, verylittle is assumed about a particular hasher. It is left up to implementors to
provide specific methods or trait implementations to feed data into a hasher.
The corresponding
Hashtrait becomes:The old default of
SipStatewas removed from this trait as it's not somethingthat we're willing to stabilize until the end of time, but the type parameter is
always required to implement
Hasher. Note that the type parameterHremainson the trait to enable multidispatch for specialization of hashing for
particular hashers.
Note that
Writeris not mentioned in either ofHashorHasher, it issimply used as part
deriveand the implementations for all primitive types.With these definitions, the old
Hashertrait is realized as a newHashStatetrait in the
collections::hash_statemodule as an unstable addition fornow. The current definition looks like:
The purpose of this trait is to emphasize that the one piece of functionality
for implementors is that new instances of
Hashercan be created. Thisconceptually represents the two keys from which more instances of a
SipHashercan be created, and aHashStateis what's stored in aHashMap, not aHasher.Implementors of custom hash algorithms should implement the
Hashertrait, andonly hash algorithms intended for use in hash maps need to implement or worry
about the
HashStatetrait.The entire module and
HashStateinfrastructure remains#[unstable]due to itbeing recently redesigned, but some other stability decision made for the
std::hashmodule are:Writertrait remains#[experimental]as it's intended to be replacedwith an
io::Writer(more details soon).hashfunction is#[unstable]as it is intended to be genericover the hashing algorithm instead of hardwired to
SipHashersipmodule is now private as its one export,SipHasherisreexported in the
hashmodule.And finally, a few changes were made to the default parameters on
HashMap.RandomSipHasherdefault type parameter was renamed toRandomState.This renaming emphasizes that it is not a hasher, but rather just state to
generate hashers. It also moves away from the name "sip" as it may not always
be implemented as
SipHasher. This type lives in thestd::collections::hash_mapmodule as#[unstable]Hashertype ofRandomStateis creatively called...Hasher! This concrete structure lives next toRandomStateas animplemenation of the "default hashing algorithm" used for a
HashMap. Underthe hood this is currently implemented as
SipHasher, but it draws anexplicit interface for now and allows us to modify the implementation over
time if necessary.
There are many breaking changes outlined above, and as a result this commit is
a:
[breaking-change]