# [performance] Improve speed of our CRC-64 hasher #624
Conversation
(Resolved review comment on `src/Java.Interop.Tools.JavaCallableWrappers/Java.Interop.Tools.JavaCallableWrappers/Crc64.cs`)
**jonathanpeppers** left a comment:
Is there a way to make this generate the same hashes as before?
We have test failures like:
```
String lengths are both 16. Strings differ at index 0.
Expected: "348bbd9fecf1b865"
But was: "b9c1bdfc7cd47543"
```
Question is, do we need to? This is kinda/sorta like the MD5 -> CRC64 migration previously: most scenarios won't care. Some scenarios will, e.g. if you have a unit test which depends upon the generated name, à la …
Weighing the performance gain vs compatibility loss, I'd say it's well worth it...
How much does this improve the build by? We do have people that hardcode these hashes in their apps... 👀
For a small XF app (with a single page and a single control), Debug configuration: … And for …
**Draft release notes**

While on the topic of making a breaking change to get a performance gain: the draft release notes for this PR can be placed in a comment on this PR, so that whoever bumps the Java.Interop version in Xamarin.Android can easily collect the notes from this PR to include in that bump. Preparing a release note for this PR that is as complete as possible will help document the team's understanding of the user experience impact of this change. To include in the release note:
- Implementation considerations: …
- Xamarin.Android 10.1 notes about the MD5 to CRC-64 transition: …
**jonathanpeppers** left a comment:
👍 looks good to me
@grendello, @brendanzagaeski: Would it make sense here to "change" the prefix so that changes are "more obvious," e.g. instead of …
Yeah, I like the sound of changing the prefix to make the change more obvious. Good thought.
Force-pushed `7e2dc40` to `8d160f0`
Modify the implementation to use slicing in order to process data in a
smaller number of iterations, at the expense of a bit more memory for
the extra tables. A quick test of calculating the checksum of a 1GB
file shows a ~5.5x improvement:

```
Allocating 1043332542 bytes to hold file data
Reading file 'android-ndk-r21-linux-x86_64.zip' into memory
Calculating checksum using the old method... Hash: 197913BA13B9A1CC; Time: 00:00:04.0649818
Calculating checksum using the new method... Hash: 8203D033719B6B96; Time: 00:00:00.7251845
```

The calculated hash is different (although still stable and correct)
because the new code uses tables generated using a slightly different
set of parameters. Instead of using a single lookup table, the new
code takes the slice-by-8 approach (explained e.g. at
https://create.stephan-brumme.com/crc32/#slicing-by-8-overview),
which processes 8 bytes at a time, using 8 lookup tables instead of one.
**Draft release notes update**

Thanks to the follow-up PRs that restored the original checksum results while also keeping this speedup, the release note I'll plan to include for this is now much simpler: … To suggest any edits to this, feel free to reply in this PR.

**Draft commit message:**
Modify the `Crc64` implementation to use slicing in order to process
data in a smaller number of iterations, at the expense of a bit more
memory for the extra tables. A quick test of calculating the checksum
of a 1GB file shows a ~5.5x improvement:

```
Allocating 1043332542 bytes to hold file data
Reading file 'android-ndk-r21-linux-x86_64.zip' into memory
Calculating checksum using the old method... Hash: 197913BA13B9A1CC; Time: 00:00:04.0649818
Calculating checksum using the new method... Hash: 8203D033719B6B96; Time: 00:00:00.7251845
```
The calculated hash is different (although still stable and correct)
because the new code uses tables generated using a slightly different
set of parameters. Instead of using a single lookup table, the new
code takes the [slice-by-8 approach][0] which processes 8 bytes at a
time, using 8 lookup tables instead of one.
[Pycrc][1] was used to generate the `crc_update()` and `crc_reflect()`
methods, which were then ported to C# in [`gen-crc-tables.cs`][2], via:
```
pycrc.py --model crc-64-jones --std=C99 --algorithm bbf --generate {h,c} --reflect-out=true
```
Because the calculated hash now differs, we've changed the output
prefix from `crc64` to `c64r2`, to make the change slightly more
prominent and obvious.
Preliminary integration into xamarin-android shows build improvements
for a small Xamarin.Forms app, Debug configuration:
* First build:

      Time Elapsed 00:00:36.74 (master)
      Time Elapsed 00:00:34.43 (crc64)

* Incremental build (`.cs` file touched):

      Time Elapsed 00:00:08.68 (master)
      Time Elapsed 00:00:08.51 (crc64)

And for the `<GenerateJavaStubs/>` task, which outputs 9.5MB of data:

* First build:

      606 ms GenerateJavaStubs 1 calls (master)
      570 ms GenerateJavaStubs 1 calls (crc64)

* Incremental build:

      702 ms GenerateJavaStubs 1 calls (master)
      617 ms GenerateJavaStubs 1 calls (crc64)
[0]: https://create.stephan-brumme.com/crc32/#slicing-by-8-overview
[1]: https://pycrc.org/index.html
[2]: https://gist.github.com/grendello/fdb968c795c310b389c611cb8c2fd7d5#file-gen-crc-tables-cs
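The slice-by-8 main loop described in the commit message can be sketched as follows. This is an illustrative Python translation of the technique using the crc-64-jones polynomial quoted above, not the PR's C# code; for brevity it uses an all-zero initial value and omits the model's init/xor-out handling, so its raw values won't match the project's output, but the bytewise and sliced paths agree with each other:

```python
POLY = 0xAD93D23594C935A9                 # crc-64-jones polynomial (from the commit)
POLY_REFL = int(f"{POLY:064b}"[::-1], 2)  # bit-reversed form for a shift-right register

def make_tables():
    t = [[0] * 256 for _ in range(8)]
    # t[0] is the classic byte-at-a-time lookup table.
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY_REFL if crc & 1 else 0)
        t[0][n] = crc
    # t[k][n] is the CRC of byte n followed by k zero bytes.
    for n in range(256):
        crc = t[0][n]
        for k in range(1, 8):
            crc = t[0][crc & 0xFF] ^ (crc >> 8)
            t[k][n] = crc
    return t

TABLES = make_tables()

def crc64_bytewise(data, crc=0):
    # Reference implementation: one table lookup per input byte.
    for b in data:
        crc = TABLES[0][(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc

def crc64_slice8(data, crc=0):
    # Fold eight bytes at a time; since the register is exactly 64 bits
    # wide, one 8-byte block consumes the whole state, so no shifted
    # remainder of the old crc survives the block.
    t = TABLES
    i, n = 0, len(data)
    while n - i >= 8:
        crc ^= int.from_bytes(data[i:i + 8], "little")
        crc = (t[7][crc & 0xFF]         ^ t[6][(crc >> 8) & 0xFF]  ^
               t[5][(crc >> 16) & 0xFF] ^ t[4][(crc >> 24) & 0xFF] ^
               t[3][(crc >> 32) & 0xFF] ^ t[2][(crc >> 40) & 0xFF] ^
               t[1][(crc >> 48) & 0xFF] ^ t[0][(crc >> 56) & 0xFF])
        i += 8
    while i < n:  # leftover tail bytes, one at a time
        crc = t[0][(crc ^ data[i]) & 0xFF] ^ (crc >> 8)
        i += 1
    return crc
```

The speedup comes from replacing eight dependent table lookups (one per byte) with one XOR of a 64-bit word plus eight independent lookups that the CPU can overlap.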
```csharp
using System;

// Table generator for the slice-by-8 CRC-64 implementation.  The
// crc_update() and crc_reflect() routines were generated with pycrc
// (crc-64-jones model, bit-by-bit-fast algorithm) and ported to C#.
class GenCrcTables
{
	const ulong POLY = 0xad93d23594c935a9;

	static ulong[,] table = new ulong[8, 256];

	// Reverse the lowest `data_len` bits of `data`.
	static ulong crc_reflect (ulong data, ulong data_len)
	{
		ulong ret = data & 0x01;
		for (uint i = 1; i < data_len; i++) {
			data >>= 1;
			ret = (ret << 1) | (data & 0x01);
		}
		return ret;
	}

	// Bit-by-bit CRC update: MSB-first register, input bits consumed
	// LSB-first, output reflected (refin=true, refout=true, init=0).
	static ulong crc_update (ulong crc, byte[] data)
	{
		int idx = 0;
		int data_len = data.Length;
		while (data_len-- > 0) {
			byte c = data [idx++];
			for (uint i = 0x01; (i & 0xff) != 0; i <<= 1) {
				bool bit = (crc & 0x8000000000000000) != 0;
				if ((c & i) != 0) {
					bit = !bit;
				}
				crc <<= 1;
				if (bit) {
					crc ^= POLY;
				}
			}
		}
		return crc_reflect (crc, 64);  // xor-out for crc-64-jones is 0
	}

	static void GenerateTable ()
	{
		// table[0,*] is the classic byte-at-a-time lookup table.
		byte[] b = { 0 };
		for (int n = 0; n < 256; n++) {
			b [0] = (byte) n;
			table [0, n] = crc_update (0, b);
		}

		// table[k,n] is the CRC of byte n followed by k zero bytes,
		// which lets the main loop consume 8 bytes per iteration.
		for (ulong n = 0; n < 256; n++) {
			ulong crc = table [0, n];
			for (ulong k = 1; k < 8; k++) {
				crc = table [0, crc & 0xff] ^ (crc >> 8);
				table [k, n] = crc;
			}
		}

		// Emit the tables as C# source.
		Console.WriteLine ("static readonly ulong[,] Table = {");
		for (int i = 0; i < 8; i++) {
			Console.WriteLine ("\t{");
			for (int j = 0; j < 256; j++) {
				if (j > 0) {
					Console.Write (", ");
				} else {
					Console.Write ("\t\t");
				}
				if (j % 4 == 0) {
					Console.WriteLine ();
					Console.Write ("\t\t");
				}
				Console.Write ("0x");
				Console.Write (table [i, j].ToString ("x16"));
			}
			Console.WriteLine ();
			Console.WriteLine ("\t},");
		}
		Console.WriteLine ("};");
	}

	static void Main ()
	{
		GenerateTable ();
	}
}
```
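A side note on the generator above: its bit-by-bit routine (MSB-first register, input bits taken LSB-first, output reflected at the end) produces the same `table[0]` as the more common "reflected" construction, a shift-right register using the bit-reversed polynomial. A small Python sketch of that equivalence, not part of the PR:

```python
POLY = 0xAD93D23594C935A9  # crc-64-jones polynomial, as in the generator
MASK = (1 << 64) - 1

def crc_reflect(data, n):
    # Reverse the lowest n bits of data (mirrors the C# crc_reflect).
    ret = data & 1
    for _ in range(1, n):
        data >>= 1
        ret = (ret << 1) | (data & 1)
    return ret

def crc_update(crc, data):
    # Port of the C# crc_update(): MSB-first register, input bits
    # consumed LSB-first, 64-bit reflection applied at the end.
    for c in data:
        i = 1
        while i & 0xFF:
            bit = (crc >> 63) & 1
            if c & i:
                bit ^= 1
            crc = (crc << 1) & MASK
            if bit:
                crc ^= POLY
            i <<= 1
    return crc_reflect(crc, 64)

POLY_REFL = crc_reflect(POLY, 64)

def table0_reflected(n):
    # Conventional reflected table entry: shift right, XOR in the
    # bit-reversed polynomial whenever the low bit falls off.
    crc = n
    for _ in range(8):
        crc = (crc >> 1) ^ (POLY_REFL if crc & 1 else 0)
    return crc
```

Both constructions give identical entries for all 256 byte values, which is why the reflected shortcut is the form usually seen in production CRC code.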