Skip to content

Conversation

@grendello
Copy link
Contributor

@grendello grendello commented Apr 10, 2020

Modify the implementation to use splicing in order to process data in a
smaller number of iterations, at the expense of a bit more memory for
the extra tables. A quick test of calculating checksum for a 1GB file
shows a ~5.6x improvement:

Allocating 1043332542 bytes to hold file data
Reading file 'android-ndk-r21-linux-x86_64.zip' into memory
Calculating checksum using the old method... Hash: 197913BA13B9A1CC; Time: 00:00:04.0649818
Calculating checksum using the new method... Hash: 8203D033719B6B96; Time: 00:00:00.7251845

The calculated hash is different (although still stable and correct)
because the new code uses tables generated using a slightly different
set of parameters. Instead of using a single lookup table, the new
code takes the slice-by-8 approach (explained e.g. here https://create.stephan-brumme.com/crc32/#slicing-by-8-overview)
which processes 8 bytes at a time, using 8 lookup tables instead of one.

Copy link
Member

@jonathanpeppers jonathanpeppers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to make this generate the same hashes as before?

We have test failures like:

String lengths are both 16. Strings differ at index 0.

Expected: "348bbd9fecf1b865"

But was: "b9c1bdfc7cd47543"

@jonpryor
Copy link
Contributor

Is there a way to make this generate the same hashes as before?

Question is, do we need to?

This is kinda/sorta like the MD5 -> CRC64 migration previously: most scenarios won't care. Some scenarios will, e.g. if you have a unit test which depends upon the generated name, a'la adb shell am start …/crc64....ActivityName. As such, hash changes can be problematic, even if rare.

@grendello
Copy link
Contributor Author

Is there a way to make this generate the same hashes as before?

Question is, do we need to?

This is kinda/sorta like the MD5 -> CRC64 migration previously: most scenarios won't care. Some scenarios will, e.g. if you have a unit test which depends upon the generated name, a'la adb shell am start …/crc64....ActivityName. As such, hash changes can be problematic, even if rare.

Weighing the performance gain vs compatibility loss, I'd say it's well worth it...

@jonathanpeppers
Copy link
Member

Weighing the performance gain vs compatibility loss, I'd say it's well worth it...

How much does this improve the build by? We do have people that hardcode these hashes in their apps... 👀

@grendello
Copy link
Contributor Author

grendello commented Apr 10, 2020

Weighing the performance gain vs compatibility loss, I'd say it's well worth it...

How much does this improve the build by? We do have people that hardcode these hashes in their apps... eyes

For a small XF app (with a single page and a single control), Debug configuration:

First build:
Time Elapsed 00:00:36.74 (master)
Time Elapsed 00:00:34.43 (crc64)

Incremental build (.cs file touched, MainActivity):
Time Elapsed 00:00:08.68 (master)
Time Elapsed 00:00:08.51 (crc64)

And for GenerateJavaStubs (which outputs 9.5MB of data for this app):

First build:
      606 ms  GenerateJavaStubs                          1 calls (master)
      570 ms  GenerateJavaStubs                          1 calls (crc64)

Incremental build:
      702 ms  GenerateJavaStubs                          1 calls (master)
      617 ms  GenerateJavaStubs                          1 calls

@brendanzagaeski
Copy link
Contributor

brendanzagaeski commented Apr 10, 2020

Draft release notes

While on the topic of making a breaking change to get a performance gain, the draft release notes for this PR can be placed in a comment on this PR so that whoever bumps the Java.Interop version in Xamarin.Android can easily collect the notes from this PR to include in that bump. Preparing a release note for this PR that is as complete as possible will help document the team's understanding of the user experience impact of this change.

To include in the release note:

  • It would probably be good to include most of the text from the similar change that was published in Xamarin.Android 10.1 (d16-4) (pasted below for reference), adjusting the mentions of "md5" as needed.
  • Include why this breakage is being made, providing at least one example of the build performance gain and perhaps also mentioning that few projects reported issues for the previous MD5 to CRC-64 transition.

Implementation considerations:

  • As with the MD5 to CRC-64 change, an MSBuild option that allows reverting to the previous implementation should perhaps be provided for at least one release.
  • That option should likely produce a build warning indicating that it will be removed in a future release.
  • The warning should be localizable and have a corresponding entry in Documentation/guides/messages in xamarin-android.
  • Because this change is not as tightly constrained on release timing as the MD5 change was, the user experience could potentially be made even better by keeping the old implementation as the default for one release so that users are forced to explicitly change to the new implementation to solve the warning. This would provide an additional mechanism to alert that a breaking change is being made, which would in theory help prevent surprises where apps build successfully but crash on device.

Xamarin.Android 10.1 notes about the MD5 to CRC-64 transition:

### Breaking change for generated Java type names

> [!IMPORTANT]
> Any project that explicitly uses one of the old default Java type names that
> starts with *md5* will break in Xamarin.Android 10.1.
>
> To restore the previous behavior temporarily, you can set the
> `$(AndroidPackageNamingPolicy)` MSBuild property to `LowercaseMD5` in your
> *.csproj* file:
>
> ```xml
> <PropertyGroup>
>     <AndroidPackageNamingPolicy>LowercaseMD5</AndroidPackageNamingPolicy>
> </PropertyGroup>
> ```
>
> This backward compatibility option ***will be removed*** in the upcoming
> Xamarin.Android 10.2 version, so authors of projects that currently include
> literal uses of these default names are encouraged to transition to
> alternatives like the `[Register]` attribute and the `[Activity]` attribute as
> soon as possible.

#### Example breakage for a custom view

For a custom view like:

```csharp
namespace AndroidApp1
{
    public class View1 : View
    {
        /* ... */
    }
}
```

One scenario that will break is if the literal Java type name
`md559d7fc296fc238aa7bec92ba27f2cb33.View1` is used in a layout file:

```xml
<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:orientation="vertical"
    android:layout_width="match_parent"
    android:layout_height="match_parent">
    <md559d7fc296fc238aa7bec92ba27f2cb33.View1
        android:layout_width="match_parent"
        android:layout_height="match_parent" />
</LinearLayout>
```

This will result in an error when the app tries to inflate the layout:

```
Android.Views.InflateException: Binary XML file line #1 in com.contoso.androidapp1:layout/layout1: Binary XML file line #1 in com.contoso.androidapp1:layout/layout1: Error inflating class md559d7fc296fc238aa7bec92ba27f2cb33.View1 ---> Android.Views.InflateException: Binary XML file line #1 in com.contoso.androidapp1:layout/layout1: Error inflating class md559d7fc296fc238aa7bec92ba27f2cb33.View1 ---> Java.Lang.ClassNotFoundException: md559d7fc296fc238aa7bec92ba27f2cb33.View1 ---> Java.Lang.ClassNotFoundException: Didn't find class "md559d7fc296fc238aa7bec92ba27f2cb33.View1" on path: DexPathList
```

The typical fix is to use the type name of C# class in the layout file instead:

```diff
 <?xml version="1.0" encoding="utf-8"?>
 <LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
     android:orientation="vertical"
     android:layout_width="match_parent"
     android:layout_height="match_parent">
-    <md559d7fc296fc238aa7bec92ba27f2cb33.View1
+    <AndroidApp1.View1
         android:layout_width="match_parent"
         android:layout_height="match_parent" />
 </LinearLayout>
```

Another option is to use a `[Register]` attribute to customize the generated
Java type name:

```diff
 namespace AndroidApp1
 {
+    [Register("com.contoso.androidapp1.view1")]
     public class View1 : View
     {
```

And then use that customized name in the layout file:

```diff
 <?xml version="1.0" encoding="utf-8"?>
 <LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
     android:orientation="vertical"
     android:layout_width="match_parent"
     android:layout_height="match_parent">
-    <md559d7fc296fc238aa7bec92ba27f2cb33.View1
+    <com.contoso.androidapp1.view1
         android:layout_width="match_parent"
         android:layout_height="match_parent" />
 </LinearLayout>

#### Example breakage for an activity

For an activity like:

```csharp
namespace AndroidApp1
{
    public class MainActivity : Activity
    {
        /* ... */
    }
}
```

One scenario that will break is if the literal Java type name
`md559d7fc296fc238aa7bec92ba27f2cb33.MainActivity` is used in the
*AndroidManifest.xml* file:

```xml
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          android:versionCode="1"
          android:versionName="1.0"
          package="com.contoso.androidapp1">
  <uses-sdk android:minSdkVersion="21" android:targetSdkVersion="28" />
  <application android:allowBackup="true" android:icon="@mipmap/ic_launcher" android:label="@string/app_name" android:roundIcon="@mipmap/ic_launcher_round" android:supportsRtl="true" android:theme="@style/AppTheme">
    <activity android:label="@string/app_name" android:name="md559d7fc296fc238aa7bec92ba27f2cb33.MainActivity">
      <intent-filter>
        <action android:name="android.intent.action.MAIN" />
        <category android:name="android.intent.category.LAUNCHER" />
      </intent-filter>
    </activity>
  </application>
</manifest>
```

This will result in an error when the app tries to start the activity:

```
java.lang.RuntimeException: Unable to instantiate activity ComponentInfo{com.contoso.androidapp1/md559d7fc296fc238aa7bec92ba27f2cb33.MainActivity}: java.lang.ClassNotFoundException: Didn't find class "md559d7fc296fc238aa7bec92ba27f2cb33.MainActivity" on path: DexPathList
```

The typical fix is to add an `[Activity]` attribute on the C# class to generate
the `<activity>` manifest element automatically:

```diff
 namespace AndroidApp1
 {
+    [Activity(Label = "@string/app_name", MainLauncher = true)]
     public class MainActivity : Activity
     {
```

And then remove the handwritten `<activity>` element from *AndroidManifest.xml*:

```diff
  <application android:allowBackup="true" android:icon="@mipmap/ic_launcher" android:label="@string/app_name" android:roundIcon="@mipmap/ic_launcher_round" android:supportsRtl="true" android:theme="@style/AppTheme">
-   <activity android:label="@string/app_name" android:name="md559d7fc296fc238aa7bec92ba27f2cb33.MainActivity">
-     <intent-filter>
-       <action android:name="android.intent.action.MAIN" />
-       <category android:name="android.intent.category.LAUNCHER" />
-     </intent-filter>
-   </activity>
 </application>
```

#### Background information

This change comes from [GitHub PR 3649][github-pr-3649], which switched the
checksum algorithm used to create names for generated Java types from MD5 to
CRC-64 to improve compatibility with build environments that have FIPS
compliance enforced.

The use of MD5 checksums for these names was originally introduced in
Xamarin.Android 5 to reduce the chance of conflicts among the names.  See the
[release notes from that version][xa-51-package-names] and the [related
documentation][acw-doc] for additional examples of how to resolve issues related
to this change.

[github-pr-3649]: https://github.com/xamarin/xamarin-android/pull/3649
[xa-51-package-names]: https://github.com/xamarin/release-notes-archive/blob/master/release-notes/android/xamarin.android_5/xamarin.android_5.1/index.md#Android_Callable_Wrapper_Naming
[acw-doc]: https://docs.microsoft.com/xamarin/android/platform/java-integration/android-callable-wrappers

Copy link
Member

@jonathanpeppers jonathanpeppers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 looks good to me

@jonpryor
Copy link
Contributor

@grendello, @brendanzagaeski: Would it make sense here to "change" the prefix so that changes are "more obvious," e.g. instead of crc64… we do c64r2…?

@brendanzagaeski
Copy link
Contributor

Yeah, I like the sound of changing the prefix to make the change more obvious. Good thought.

@grendello grendello force-pushed the faster-crc64 branch 3 times, most recently from 7e2dc40 to 8d160f0 Compare April 14, 2020 16:44
Modify the implementation to use splicing in order to process data in a
smaller number of iterations, at the expense of a bit more memory for
the extra tables.  A quick test of calculating checksum for a 1GB file
shows a ~5.5x improvement:

    Allocating 1043332542 bytes to hold file data
    Reading file 'android-ndk-r21-linux-x86_64.zip' into memory
    Calculating checksum using the old method... Hash: 197913BA13B9A1CC; Time: 00:00:04.0649818
    Calculating checksum using the new method... Hash: 8203D033719B6B96; Time: 00:00:00.7251845

The calculated hash is different (although still stable and correct)
because the new code uses tables generated using a slightly different
set of parameters.  Instead of using a single lookup table, the new
code takes the slice-by-8 approach (explained e.g. here https://create.stephan-brumme.com/crc32/#slicing-by-8-overview)
which processes 8 bytes at a time, using 8 lookup tables instead of one.
@jonpryor
Copy link
Contributor

Draft commit message:

Modify the `Crc64` implementation to use splicing in order to process
data in a smaller number of iterations, at the expense of a bit more
memory for the extra tables.  A quick test of calculating checksum for
a 1GB file shows a ~5.5x improvement:

	Allocating 1043332542 bytes to hold file data
	Reading file 'android-ndk-r21-linux-x86_64.zip' into memory
	Calculating checksum using the old method... Hash: 197913BA13B9A1CC; Time: 00:00:04.0649818
	Calculating checksum using the new method... Hash: 8203D033719B6B96; Time: 00:00:00.7251845

The calculated hash is different (although still stable and correct)
because the new code uses tables generated using a slightly different
set of parameters.  Instead of using a single lookup table, the new
code takes the [slice-by-8 approach][0] which processes 8 bytes at a
time, using 8 lookup tables instead of one.

[Pycrc][1] was used to generate the `crc_update()` and `crc_reflect()`
methods, which were then ported to C# in [`gen-crc-tables.cs`][2], via:

	pycrc.py --model crc-64-jones --std=C99 --algorithm bbf --generate {h,c} --reflect-out=true

Because the calculated hash now differs, we've changed the output
prefix from `crc64` to `c64r2`, to make the change slightly more
prominent and obvious.

Preliminary integration into xamarin-android shows build improvements
for a small Xamarin.Forms app, Debug configuration:

  * First build:

        Time Elapsed 00:00:36.74 (master)
        Time Elapsed 00:00:34.43 (crc64)

  * Incremental build (`.cs` file touched):

        Time Elapsed 00:00:08.68 (master)
        Time Elapsed 00:00:08.51 (crc64)

And for `<GenerateJavaStubs/>` task, which outputs 9.5MB of data:

  * First build:

        606 ms  GenerateJavaStubs                          1 calls (master)
        570 ms  GenerateJavaStubs                          1 calls (crc64)

  * Incremental build:

        702 ms  GenerateJavaStubs                          1 calls (master)
        617 ms  GenerateJavaStubs                          1 calls (crc64)

[0]: https://create.stephan-brumme.com/crc32/#slicing-by-8-overview
[1]: https://pycrc.org/index.html
[2]: https://gist.github.com/grendello/fdb968c795c310b389c611cb8c2fd7d5#file-gen-crc-tables-cs

	const ulong POLY = 0xad93d23594c935a9;
	static ulong[,] table = new ulong[8,256];

	static ulong crc_reflect (ulong data, ulong data_len)
	{
		uint i;
		ulong ret;

		ret = data & 0x01;
		for (i = 1; i < data_len; i++) {
			data >>= 1;
			ret = (ret << 1) | (data & 0x01);
		}
		return ret;
	}

	static ulong crc_update (ulong crc, byte[] data)
	{
		int idx = 0;
		uint i;
		bool bit;
		byte c;
		int data_len = data.Length;

		while (data_len-- > 0) {
			c = data[idx];
			for (i = 0x01; (i & 0xff) != 0; i <<= 1) {
				bit = (crc & 0x8000000000000000) != 0;
				if ((c & i) != 0) {
					bit = !bit;
				}
				crc <<= 1;
				if (bit) {
					crc ^= POLY;
				}
			}
			crc &= 0xffffffffffffffff;
		}
		crc &= 0xffffffffffffffff;

		return crc_reflect (crc, 64) ^ 0x0000000000000000;
	}

	static void GenerateTable ()
	{
		ulong crc;
		byte[] b = { 0 };

		for (int n = 0; n < 256; n++) {
			b[0] = (byte)n;
			table[0,n] = crc_update (0, b);
		}

		for (ulong n = 0; n < 256; n++) {
			crc = table[0,n];

			for (ulong k = 1; k < 8; k++) {
				crc = table[0,crc & 0xff] ^ (crc >> 8);
				table[k,n] = crc;
			}
		}

		Console.WriteLine ("static readonly ulong[,] Table = {");
		for (int i = 0; i < 8; i++) {
			Console.WriteLine ("\t{");
			for (int j = 0; j < 256; j++) {
				if (j > 0) {
					Console.Write (", ");
				} else {
					Console.Write ("\t\t");
				}

				if (j % 4 == 0) {
					Console.WriteLine ();
					Console.Write ("\t\t");
				}
				Console.Write ("0x");
				Console.Write (table[i,j].ToString ("x16"));
			}
			Console.WriteLine ();
			Console.WriteLine ("\t},");
		}
		Console.WriteLine ("};");
	}

@jonpryor jonpryor merged commit 9b88ce7 into dotnet:master Apr 14, 2020
grendello added a commit to grendello/xamarin-android that referenced this pull request Apr 14, 2020
@grendello grendello deleted the faster-crc64 branch April 14, 2020 20:14
@brendanzagaeski
Copy link
Contributor

Draft release notes update

Thanks to the follow-up PRs that restored the original checksum results while also keeping this speedup, the release note I'll plan to include for this is now much simpler:

### Build and deployment performance

- [Java.Interop GitHub PR 624](https://github.com/xamarin/java.interop/pull/624),
  [Java.Interop GitHub PR 627](https://github.com/xamarin/java.interop/pull/627),
  [Java.Interop GitHub PR 628](https://github.com/xamarin/java.interop/pull/628):
  Update the CRC-64 algorithm used during builds to take advantage of a more
  efficient calculation technique.  This reduced the total incremental build
  time from about 8.7 second to about 8.5 seconds for a small Xamarin.Forms app
  when one line of a C# file was changed between builds.

To suggest any edits to this, feel free to reply in this PR.

jonpryor pushed a commit that referenced this pull request Apr 22, 2020
Modify the `Crc64` implementation to use splicing in order to process
data in a smaller number of iterations, at the expense of a bit more
memory for the extra tables.  A quick test of calculating checksum for
a 1GB file shows a ~5.5x improvement:

	Allocating 1043332542 bytes to hold file data
	Reading file 'android-ndk-r21-linux-x86_64.zip' into memory
	Calculating checksum using the old method... Hash: 197913BA13B9A1CC; Time: 00:00:04.0649818
	Calculating checksum using the new method... Hash: 8203D033719B6B96; Time: 00:00:00.7251845

The calculated hash is different (although still stable and correct)
because the new code uses tables generated using a slightly different
set of parameters.  Instead of using a single lookup table, the new
code takes the [slice-by-8 approach][0] which processes 8 bytes at a
time, using 8 lookup tables instead of one.

[Pycrc][1] was used to generate the `crc_update()` and `crc_reflect()`
methods, which were then ported to C# in [`gen-crc-tables.cs`][2], via:

	pycrc.py --model crc-64-jones --std=C99 --algorithm bbf --generate {h,c} --reflect-out=true

Because the calculated hash now differs, we've changed the output
prefix from `crc64` to `c64r2`, to make the change slightly more
prominent and obvious.

Preliminary integration into xamarin-android shows build improvements
for a small Xamarin.Forms app, Debug configuration:

  * First build:

        Time Elapsed 00:00:36.74 (master)
        Time Elapsed 00:00:34.43 (crc64)

  * Incremental build (`.cs` file touched):

        Time Elapsed 00:00:08.68 (master)
        Time Elapsed 00:00:08.51 (crc64)

And for `<GenerateJavaStubs/>` task, which outputs 9.5MB of data:

  * First build:

        606 ms  GenerateJavaStubs                          1 calls (master)
        570 ms  GenerateJavaStubs                          1 calls (crc64)

  * Incremental build:

        702 ms  GenerateJavaStubs                          1 calls (master)
        617 ms  GenerateJavaStubs                          1 calls (crc64)

[0]: https://create.stephan-brumme.com/crc32/#slicing-by-8-overview
[1]: https://pycrc.org/index.html
[2]: https://gist.github.com/grendello/fdb968c795c310b389c611cb8c2fd7d5#file-gen-crc-tables-cs

	const ulong POLY = 0xad93d23594c935a9;
	static ulong[,] table = new ulong[8,256];

	static ulong crc_reflect (ulong data, ulong data_len)
	{
		uint i;
		ulong ret;

		ret = data & 0x01;
		for (i = 1; i < data_len; i++) {
			data >>= 1;
			ret = (ret << 1) | (data & 0x01);
		}
		return ret;
	}

	static ulong crc_update (ulong crc, byte[] data)
	{
		int idx = 0;
		uint i;
		bool bit;
		byte c;
		int data_len = data.Length;

		while (data_len-- > 0) {
			c = data[idx];
			for (i = 0x01; (i & 0xff) != 0; i <<= 1) {
				bit = (crc & 0x8000000000000000) != 0;
				if ((c & i) != 0) {
					bit = !bit;
				}
				crc <<= 1;
				if (bit) {
					crc ^= POLY;
				}
			}
			crc &= 0xffffffffffffffff;
		}
		crc &= 0xffffffffffffffff;

		return crc_reflect (crc, 64) ^ 0x0000000000000000;
	}

	static void GenerateTable ()
	{
		ulong crc;
		byte[] b = { 0 };

		for (int n = 0; n < 256; n++) {
			b[0] = (byte)n;
			table[0,n] = crc_update (0, b);
		}

		for (ulong n = 0; n < 256; n++) {
			crc = table[0,n];

			for (ulong k = 1; k < 8; k++) {
				crc = table[0,crc & 0xff] ^ (crc >> 8);
				table[k,n] = crc;
			}
		}

		Console.WriteLine ("static readonly ulong[,] Table = {");
		for (int i = 0; i < 8; i++) {
			Console.WriteLine ("\t{");
			for (int j = 0; j < 256; j++) {
				if (j > 0) {
					Console.Write (", ");
				} else {
					Console.Write ("\t\t");
				}

				if (j % 4 == 0) {
					Console.WriteLine ();
					Console.Write ("\t\t");
				}
				Console.Write ("0x");
				Console.Write (table[i,j].ToString ("x16"));
			}
			Console.WriteLine ();
			Console.WriteLine ("\t},");
		}
		Console.WriteLine ("};");
	}
@jpobst jpobst added this to the 10.4 (16.7 / 8.7) milestone Apr 23, 2020
@github-actions github-actions bot locked and limited conversation to collaborators Apr 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants