You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In the library there currently exists an implementation that uses the Java "long" primitive type, the LongBuilderFactory.
It uses signed longs and stores 17 decimal digits per long. (Note: this is less efficient than the 32-bit version that uses the "int" primitive type and stores 9 decimal digits per int i.e. 18 digits per long.) The modular multiplication is based on basic multiplication by the inverse of the modulus, without any particular tricks.
Since Java doesn't have unsigned primitives (unlike e.g. C++), using signed longs this way makes a lot of things easier. However they make some thing less efficient as well.
Since Java 8 there has been some kind of support for unsigned long arithmetic. This would make it possible to implement another long builder factory, based on the "unsigned long". (Historically this is how the apfloat C++ version was implemented.) Data would be still stored as longs of course, but the arithmetic would be unsigned arithmetic.
This would have the following advantages:
An unsigned long can store 19 decimal digits
Modular multiplication could be done more efficiently by choosing the moduli as $2^{64}-2^{32}+1$, $2^{64}-2^{34}+1$ and $2^{64}-2^{40}+1$ as the modulus reduction just requires a few shifts, subtracts and additions and no multiplications
The latter could be an important point since the modular multiplication consumes as much as 30% of the execution time in very long transforms.
There would be some disadvantages though:
Maximum power of two in the transform length is only 232 (in the current signed long implementation it's 248)
To fully benefit from it, you need at least Java 18, which has Math.unsignedMultiplyHigh(long, long)
The new numbers wouldn't be serialization compatible with the old long numbers
On modern CPUs a bunch of shifts, adds and subtracts might not actually be any faster than two multiplications
As an alternative approach, the moduli could be chosen so that the transform length is maximized (and ditch the shifting-only based modular multiplication algorithm), this would still have a performance benefit from storing more digits per long, and a longer maximum transform length than with the current LongBuilderFactory. Suitable moduli would be { 41, 36, 25 } * 3 * 257+1
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
-
In the library there currently exists an implementation that uses the Java "long" primitive type, the LongBuilderFactory.
It uses signed longs and stores 17 decimal digits per long. (Note: this is less efficient than the 32-bit version that uses the "int" primitive type and stores 9 decimal digits per int i.e. 18 digits per long.) The modular multiplication is based on basic multiplication by the inverse of the modulus, without any particular tricks.
Since Java doesn't have unsigned primitives (unlike e.g. C++), using signed longs this way makes a lot of things easier. However they make some thing less efficient as well.
Since Java 8 there has been some kind of support for unsigned long arithmetic. This would make it possible to implement another long builder factory, based on the "unsigned long". (Historically this is how the apfloat C++ version was implemented.) Data would be still stored as longs of course, but the arithmetic would be unsigned arithmetic.
This would have the following advantages:
The latter could be an important point since the modular multiplication consumes as much as 30% of the execution time in very long transforms.
There would be some disadvantages though:
Math.unsignedMultiplyHigh(long, long)
As an alternative approach, the moduli could be chosen so that the transform length is maximized (and ditch the shifting-only based modular multiplication algorithm), this would still have a performance benefit from storing more digits per long, and a longer maximum transform length than with the current LongBuilderFactory. Suitable moduli would be { 41, 36, 25 } * 3 * 257+1
Beta Was this translation helpful? Give feedback.
All reactions