

@PeterPtroc

Description of PR

Add a RISC-V-specific compilation unit: org/apache/hadoop/util/bulk_crc32_riscv.c.

  • Contains a no-op constructor reserved for future HW capability detection and dispatch.
  • Keeps runtime behavior unchanged (falls back to the generic software path in bulk_crc32.c).
  • CMake is wired to select bulk_crc32_riscv.c on riscv32/riscv64, mirroring other platforms.

This PR establishes the foundational build infrastructure for future RISC-V Zbc (CLMUL) CRC32/CRC32C acceleration without changing current behavior. Follow-ups (HADOOP-19655) will introduce HW-accelerated implementations and runtime dispatch.

How was this patch tested?

  • Ensured the native build for hadoop-common compiles cleanly with the RISC-V source selection (see the sketch after this list).
  • Verified correctness with the existing test_bulk_crc32 native test.
  • No new tests were added, as this patch is scaffolding-only and introduces no behavior change.
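
For reference, a quick way to confirm on a riscv64 host that CMake actually compiled and linked the new unit is to inspect the native build output. The paths below assume the default Maven native build layout and are illustrative, not part of this patch:

# Build hadoop-common with the native profile on the RISC-V machine
mvn -pl hadoop-common-project/hadoop-common -am -Pnative -DskipTests clean package

# Look for the compiled RISC-V object and the resulting libhadoop
# (object and library paths may differ depending on the build setup)
find hadoop-common-project/hadoop-common/target/native -name 'bulk_crc32_riscv*'
ls hadoop-common-project/hadoop-common/target/native/target/usr/local/lib/libhadoop.so*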

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 22m 29s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 41m 17s trunk passed
+1 💚 compile 14m 21s trunk passed
-1 ❌ mvnsite 1m 56s /branch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in trunk failed.
+1 💚 shadedclient 94m 44s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 11s the patch passed
+1 💚 compile 13m 13s the patch passed
+1 💚 cc 13m 13s the patch passed
+1 💚 golang 13m 13s the patch passed
+1 💚 javac 13m 13s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-1 ❌ mvnsite 1m 54s /patch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
+1 💚 shadedclient 38m 33s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 23m 54s hadoop-common in the patch passed.
+1 💚 asflicense 1m 57s The patch does not generate ASF License warnings.
Total 198m 14s
Subsystem Report/Notes
Docker ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7903/1/artifact/out/Dockerfile
GITHUB PR #7903
Optional Tests dupname asflicense compile cc mvnsite javac unit codespell detsecrets golang
uname Linux af19edaaec5b 5.15.0-143-generic #153-Ubuntu SMP Fri Jun 13 19:10:45 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 1b159b6
Default Java Red Hat, Inc.-1.8.0_312-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7903/1/testReport/
Max. process+thread count 1279 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7903/1/console
versions git=2.27.0 maven=3.6.3
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 35s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 41m 2s trunk passed
+1 💚 compile 14m 4s trunk passed
-1 ❌ mvnsite 1m 55s /branch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in trunk failed.
+1 💚 shadedclient 94m 56s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 12s the patch passed
+1 💚 compile 13m 25s the patch passed
+1 💚 cc 13m 25s the patch passed
+1 💚 golang 13m 25s the patch passed
+1 💚 javac 13m 25s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-1 ❌ mvnsite 1m 57s /patch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
+1 💚 shadedclient 38m 42s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 23m 39s hadoop-common in the patch passed.
+1 💚 asflicense 1m 53s The patch does not generate ASF License warnings.
Total 176m 33s
Subsystem Report/Notes
Docker ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7903/2/artifact/out/Dockerfile
GITHUB PR #7903
Optional Tests dupname asflicense compile cc mvnsite javac unit codespell detsecrets golang
uname Linux c0759ac9ab3e 5.15.0-143-generic #153-Ubuntu SMP Fri Jun 13 19:10:45 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 3d607d4
Default Java Red Hat, Inc.-1.8.0_312-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7903/2/testReport/
Max. process+thread count 2145 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7903/2/console
versions git=2.27.0 maven=3.6.3
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@PeterPtroc
Author

Due to some CI infrastructure issues, I am pasting the results of validating this patch on a RISC-V machine. The command and results are below.

Command:

mvn -Pnative \
  -Dtest=org.apache.hadoop.util.TestNativeCrc32 \
  -Djava.library.path="$HADOOP_COMMON_LIB_NATIVE_DIR" \
  test

Results:

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.util.TestNativeCrc32
[INFO] Tests run: 22, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.72 s -- in org.apache.hadoop.util.TestNativeCrc32
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 22, Failures: 0, Errors: 0, Skipped: 0

PeterPtroc marked this pull request as ready for review on August 27, 2025, at 16:33.
@PeterPtroc
Author

Hi @pan3793 @slfan1989, could you please take a look when you have a moment? This PR adds RISC-V CRC32 scaffolding and keeps behavior unchanged. Happy to address any feedback. Thanks!

@pan3793
Member

pan3793 commented Aug 28, 2025

@PeterPtroc I suppose most developers here do not have a RISC-V environment; is it possible to add docs on how to verify it by leveraging QEMU or some other common tools?

@PeterPtroc
Author

PeterPtroc commented Aug 29, 2025

@pan3793 Thanks for the suggestion! Below is a concise guide to verifying the correctness of the RISC-V CRC32 implementation:

I mainly verify on RISC-V using QEMU together with the openEuler RISC-V image.

Download the image

From the openEuler RISC-V image download page, fetch these four files: RISCV_VIRT_CODE.fd, RISCV_VIRT_VARS.fd, openEuler-25.03-riscv64.qcow2.xz, and start_vm.sh; after booting the VM (see the sketch below), log in as root with the password openEuler12#$.
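
To make the VM setup explicit, here is a minimal sketch of getting from the downloaded files to a root shell. It assumes start_vm.sh expects the firmware files and the decompressed qcow2 in the current directory; check the script before running:

# Decompress the disk image (-k keeps the original .xz)
xz -dk openEuler-25.03-riscv64.qcow2.xz

# Launch the VM with the start script shipped alongside the image
chmod +x start_vm.sh
./start_vm.sh

# Log in on the console as root with the password openEuler12#$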

Install required packages

# Build toolchain
yum install -y gcc gcc-c++ gcc-gfortran libgcc cmake

# Native build dependencies for hadoop-common
yum install -y wget openssl openssl-devel zlib zlib-devel automake libtool make libstdc++-static glibc-static git snappy snappy-devel fuse fuse-devel doxygen clang cyrus-sasl cyrus-sasl-devel libtirpc libtirpc-devel

# JDK 17 for riscv64
yum install -y java-17-openjdk.riscv64 java-17-openjdk-devel.riscv64 java-17-openjdk-headless.riscv64

Install Protobuf 2.5.0 (with RISC‑V patches)

mkdir protobuf && cd protobuf

# Fetch sources
git clone https://gitee.com/src-openeuler/protobuf2.git
cd protobuf2
tar -xjf protobuf-2.5.0.tar.bz2
cp *.patch protobuf-2.5.0 && cd protobuf-2.5.0

# Apply patches (adds riscv64 support and build fixes)
patch -p1 < 0001-Add-generic-GCC-support-for-atomic-operations.patch
patch -p1 < protobuf-2.5.0-gtest.patch
patch -p1 < protobuf-2.5.0-java-fixes.patch
patch -p1 < protobuf-2.5.0-makefile.patch
patch -p1 < add-riscv64-support.patch

# Autotools setup
libtoolize
yum install -y automake
automake-1.17 -a
chmod +x configure

# Configure, build, install
./configure --build=riscv64-unknown-linux --prefix=/usr/local/protobuf-2.5.0
make
make check
make install
ldconfig

# Publish protoc 2.5.0 into local Maven repo (riscv64 classifier)
mvn install:install-file \
  -DgroupId=com.google.protobuf \
  -DartifactId=protoc \
  -Dversion=2.5.0 \
  -Dclassifier=linux-riscv64 \
  -Dpackaging=exe \
  -Dfile=/usr/local/protobuf-2.5.0/bin/protoc

cd ..

Install Protobuf 3.25.5

# Download and unpack
wget -c https://github.com/protocolbuffers/protobuf/releases/download/v25.5/protobuf-25.5.tar.gz
tar -xzf protobuf-25.5.tar.gz
cd protobuf-25.5

# Abseil dependency
git clone https://github.com/abseil/abseil-cpp third_party/abseil-cpp

# Configure and build
cmake ./ \
  -DCMAKE_BUILD_TYPE=RELEASE \
  -Dprotobuf_BUILD_TESTS=off \
  -DCMAKE_CXX_STANDARD=20 \
  -DCMAKE_INSTALL_PREFIX=/usr/local/protobuf-3.25.5

make install -j "$(nproc)"

# Publish protoc 3.25.5 into local Maven repo (riscv64 classifier)
mvn install:install-file \
  -DgroupId=com.google.protobuf \
  -DartifactId=protoc \
  -Dversion=3.25.5 \
  -Dclassifier=linux-riscv64 \
  -Dpackaging=exe \
  -Dfile=/usr/local/protobuf-3.25.5/bin/protoc

# Make protoc available on PATH and verify
sudo ln -sfn /usr/local/protobuf-3.25.5/bin/protoc /usr/local/bin/protoc
protoc --version

Verify CRC32 using Hadoop native

# Clone Hadoop
git clone https://github.com/apache/hadoop.git
cd hadoop

# Increase Maven memory
export MAVEN_OPTS="-Xmx8g -Xms6g"

# Build Hadoop Common (native enabled)
nohup mvn -pl hadoop-common-project/hadoop-common -am -Pnative -DskipTests clean install > build.log 2>&1 &

# Point to built native library directory
cd hadoop-common-project/hadoop-common
export HADOOP_COMMON_LIB_NATIVE_DIR="$PWD/target/native/target/usr/local/lib"
export LD_LIBRARY_PATH="$HADOOP_COMMON_LIB_NATIVE_DIR:$LD_LIBRARY_PATH"

# Run the CRC32 native test
nohup mvn -Pnative -Dtest=org.apache.hadoop.util.TestNativeCrc32 \
  -Djava.library.path="$HADOOP_COMMON_LIB_NATIVE_DIR" test > test.log 2>&1 &
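
Since both the build and the test run in the background via nohup, the following commands (illustrative, based on the log names used above) help follow progress and confirm the outcome:

# Follow the native build until Maven reports BUILD SUCCESS
tail -f build.log

# After the test run finishes, check the surefire summary for TestNativeCrc32
grep "Running org.apache.hadoop.util.TestNativeCrc32" test.log
grep "Tests run:" test.log

# Sanity-check that the native library was built where expected
ls "$HADOOP_COMMON_LIB_NATIVE_DIR"/libhadoop.so*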

@PeterPtroc
Author

Hi @cnauroth , could you please have a look? This PR adds RISC-V CRC32 scaffolding and keeps behavior unchanged. Thanks!
