
Conversation


@normanrz normanrz commented Aug 29, 2025

Switch out jblosc (which was last updated 7 years ago!) for blosc-java. I had to do this because the JNA version that jblosc references does not work on Apple Silicon. Also, blosc-java bundles the blosc library, so no system library is required.
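
For reference, a minimal sketch of the swap in project/Dependencies.scala (the object and val names here are illustrative; the coordinates are the ones used in this PR):

```scala
import sbt._

object Dependencies {
  // Sketch of the swap: blosc-java bundles the native blosc library,
  // so no system libblosc package is required.
  val bloscJava: ModuleID = "com.scalableminds" % "blosc-java" % "0.1-1.21.4"
  // previously: "org.lasersonlab" % "jblosc" % "1.0.1"
}
```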


(Please delete unneeded items, merge only when none are left open)

  • Added changelog entry (create a $PR_NUMBER.md file in unreleased_changes or use ./tools/create-changelog-entry.py)

@normanrz normanrz self-assigned this Aug 29, 2025

coderabbitai bot commented Aug 29, 2025

📝 Walkthrough

Removed system libblosc from Dockerfiles, CI, and docs; swapped JVM dependency from jblosc to blosc-java; refactored BloscCompressor to use blosc-java enums and wrapper API; updated tracingstore code to unwrap new enum defaults; added changelog entry.

Changes

  • Docker images (.github/Dockerfile.datastore, .github/Dockerfile.tracingstore, .github/Dockerfile.webknossos): Removed libblosc1 from the apt-get install lines; other packages unchanged.
  • CI workflow (.github/workflows/build_test_deploy.yml): Stopped installing libblosc1 in the "Install System Dependencies" steps; now installs only libdraco-dev where applicable.
  • Developer docs (DEV_INSTALL.md): Removed Blosc (c-blosc, libblosc1) from the dependency list and install commands.
  • Build dependencies (project/Dependencies.scala): Replaced org.lasersonlab:jblosc:1.0.1 with com.scalableminds:blosc-java:0.1-1.21.4; updated the accompanying comment.
  • Datastore compressor refactor (webknossos-datastore/.../Compressor.scala): Rewrote BloscCompressor to use the blosc-java wrapper and enums (Blosc.Compressor, Blosc.Shuffle), removed direct JBlosc/JNA usage, changed public fields to enum types, delegated compress/decompress to the Blosc API, and updated validation and defaults.
  • Tracingstore usage updates (webknossos-tracingstore/.../EditableMappingIOService.scala, .../volume/VolumeDataZipHelper.scala, .../volume/Zarr3BucketStreamSink.scala): Unwrapped the new enum defaults via .getValue and consolidated/added imports; no public signature changes.
  • Changelog (unreleased_changes/8882.md): Added entry: “Use blosc-java instead of jblosc.”

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

backend, dependencies, refactoring

Suggested reviewers

  • fm3
  • frcroth

Poem

I hop through code with twitching nose,
Swapped old crumbs for enum-led prose.
Blosc-java sings, no native fuss,
Clean builds, warm carrots—deliver with us. 🥕🐇



📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 0858f84 and 22fd760.

📒 Files selected for processing (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/Compressor.scala (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/Compressor.scala
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests


@normanrz normanrz requested a review from fm3 August 29, 2025 15:28

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeDataZipHelper.scala (3)

36-43: Close ZipFile in detectVolumeDataZipFormat to prevent FD leak

tryo(new ZipFile(...)).map(...) never closes the ZipFile. Wrap it with Using.resource (and keep tryo to preserve Box semantics).

Apply:

-  private def detectVolumeDataZipFormat(zipFile: File): Box[VolumeDataZipFormat] =
-    tryo(new java.util.zip.ZipFile(zipFile)).map { zip =>
-      val relevantFile: Option[ZipEntry] =
-        ZipIO.entries(zip).find(entry => entry.getName.endsWith("zarr.json") || entry.getName.endsWith(".wkw"))
-      if (relevantFile.exists(_.getName.endsWith("zarr.json"))) {
-        VolumeDataZipFormat.zarr3
-      } else VolumeDataZipFormat.wkw
-    }
+  private def detectVolumeDataZipFormat(zipFile: File): Box[VolumeDataZipFormat] =
+    tryo {
+      Using.resource(new java.util.zip.ZipFile(zipFile)) { zip =>
+        val relevantFile =
+          ZipIO.entries(zip).find(e => e.getName.endsWith("zarr.json") || e.getName.endsWith(".wkw"))
+        if (relevantFile.exists(_.getName.endsWith("zarr.json"))) VolumeDataZipFormat.zarr3
+        else VolumeDataZipFormat.wkw
+      }
+    }

71-76: Avoid opening ZipFile twice and close properly

Two separate new ZipFile(zipFile) calls (Lines 72 and 75) both leak. Open once with Using.resource for the header discovery and read.

Apply:

-      firstHeaderFilePath <- ZipIO
-        .entries(new ZipFile(zipFile))
-        .find(entry => entry.getName.endsWith(Zarr3ArrayHeader.FILENAME_ZARR_JSON))
-        .toFox
-      firstHeaderString <- ZipIO.readAt(new ZipFile(zipFile), firstHeaderFilePath).toFox
+      firstHeaderFilePath <- tryo {
+        Using.resource(new ZipFile(zipFile)) { zip =>
+          ZipIO.entries(zip).find(_.getName.endsWith(Zarr3ArrayHeader.FILENAME_ZARR_JSON))
+        }
+      }.toFox
+      firstHeaderString <- tryo {
+        Using.resource(new ZipFile(zipFile)) { zip =>
+          ZipIO.readAt(zip, firstHeaderFilePath)
+        }
+      }.toFox

165-171: Close FileOutputStream in inputStreamToTempfile

out is never closed, causing descriptor leaks. Use Using.resource. Also consider copying to Files.createTempFile via NIO for robustness, but Using is sufficient here.

Apply:

   private def inputStreamToTempfile(inputStream: InputStream): File = {
     val tempFile = File.createTempFile("data", "zip")
     tempFile.deleteOnExit()
-    val out = new FileOutputStream(tempFile)
-    IOUtils.copy(inputStream, out)
+    Using.resource(new FileOutputStream(tempFile)) { out =>
+      IOUtils.copy(inputStream, out)
+    }
     tempFile
   }
🧹 Nitpick comments (8)
.github/Dockerfile.tracingstore (1)

4-5: Slimmer, more reliable installs.

Use --no-install-recommends and install curl explicitly for the HEALTHCHECK.

-RUN apt-get update \
-  && apt-get -y install libbrotli1 libdraco4 \
+RUN apt-get update \
+  && apt-get -y install --no-install-recommends libbrotli1 libdraco4 curl \
   && rm -rf /var/lib/apt/lists/*

Confirm curl is present at runtime (healthcheck uses it). If base already provides curl, this is a no-op.

.github/Dockerfile.datastore (1)

4-5: Use --no-install-recommends and ensure curl for HEALTHCHECK.

Healthcheck uses curl but it’s not installed here; add it and avoid extra recommends.

-RUN apt-get update \
-  && apt-get -y install libbrotli1 libdraco4 \
+RUN apt-get update \
+  && apt-get -y install --no-install-recommends libbrotli1 libdraco4 curl \
   && rm -rf /var/lib/apt/lists/*
.github/Dockerfile.webknossos (1)

4-6: Harden package install; add curl explicitly.

Nodesource script runs apt update; still, install only what’s needed and include curl for HEALTHCHECK.

-RUN curl -sL "https://deb.nodesource.com/setup_${VERSION_NODE}" | bash - \
-  && apt-get -y install libbrotli1 postgresql-client libdraco4 git nodejs \
+RUN curl -sL "https://deb.nodesource.com/setup_${VERSION_NODE}" | bash - \
+  && apt-get -y install --no-install-recommends libbrotli1 postgresql-client libdraco4 git nodejs curl \
   && rm -rf /var/lib/apt/lists/*
project/Dependencies.scala (1)

61-62: Fix comment and verify artifact availability

  • The comment says “import dev.zarr.bloscjava” but the code uses com.scalableminds.bloscjava.Blosc. Update the comment to avoid confusion.
  • Ensure com.scalableminds:blosc-java:0.1-1.21.4 is published to your resolvers for CI/dev (all required native variants included).
-    // Blosc compression. import dev.zarr.bloscjava
+    // Blosc compression. import com.scalableminds.bloscjava
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/Compressor.scala (2)

235-244: Validate accepted shuffle names

Error message lists “noshuffle, shuffle, bitshuffle”. Confirm Blosc.Shuffle.fromString accepts these exact tokens (and not “byteshuffle”). If it expects other spellings, add aliases or adjust message.
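
A sketch of the check being asked for here; whether Blosc.Shuffle.fromString throws or returns null on unknown input is an assumption, so the wrapper below covers both:

```scala
import com.scalableminds.bloscjava.Blosc
import scala.util.Try

object ShuffleTokenCheck {
  // The tokens advertised in the error message, per this comment.
  val advertisedShuffleTokens: Seq[String] = Seq("noshuffle", "shuffle", "bitshuffle")

  // Wrap fromString so that both a thrown exception and a null result count as "not accepted".
  def parseShuffle(token: String): Option[Blosc.Shuffle] =
    Try(Option(Blosc.Shuffle.fromString(token))).toOption.flatten

  // e.g. in a quick test: advertisedShuffleTokens.foreach(t => assert(parseShuffle(t).isDefined))
}
```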


284-289: Make Blosc thread count configurable

compress/decompress(..., 1) forces single-threaded operation. Consider wiring this from config or defaulting to an environment/system setting to preserve performance.
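
One way this could be wired, as a sketch; the property name is hypothetical, and reading from the application config would work just as well:

```scala
object BloscThreads {
  // Resolve the Blosc thread count once, falling back to the current single-threaded behaviour.
  // The system property name "blosc.numThreads" is hypothetical.
  val numThreads: Int =
    sys.props.get("blosc.numThreads").flatMap(s => scala.util.Try(s.toInt).toOption).getOrElse(1)
}

// Then pass BloscThreads.numThreads wherever the literal 1 is currently used,
// e.g. Blosc.compress(..., BloscThreads.numThreads) / Blosc.decompress(..., BloscThreads.numThreads).
```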

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/Zarr3BucketStreamSink.scala (1)

120-137: Optional: factor out common Blosc config builder

compressorConfiguration and compressor are duplicated across files. Consider a small factory/helper to avoid drift.
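
A minimal sketch of such a helper, under the assumptions noted in the comments:

```scala
import com.scalableminds.webknossos.datastore.datareaders.{
  BloscCompressor,
  CompressionSetting,
  IntCompressionSetting,
  StringCompressionSetting
}

// Sketch of a shared helper to avoid the duplicated Blosc configuration. Assumptions: the
// defaults live on the BloscCompressor companion, BloscCompressor accepts a settings map,
// and the key names below are hypothetical stand-ins for the constants in Compressor.scala.
object DefaultBloscSettings {
  def settings: Map[String, CompressionSetting] = Map(
    "cname"   -> StringCompressionSetting(BloscCompressor.defaultCname.getValue),
    "shuffle" -> IntCompressionSetting(BloscCompressor.defaultShuffle.getValue)
  )

  def compressor: BloscCompressor = new BloscCompressor(settings)
}
```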

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeDataZipHelper.scala (1)

5-8: LGTM: import updates align with blosc-java + zarr3 usage

The added imports are coherent with the new calls (tryo/JsonHelper/WKWFile/Zarr3ArrayHeader/AdditionalCoordinate/BucketPosition/FileOutputStream). Consider adding scala.util.Using as well for safe resource handling in this file (see comments below).

Apply:

+import scala.util.Using

Also applies to: 15-15, 20-20

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between e768761 and 0858f84.

📒 Files selected for processing (11)
  • .github/Dockerfile.datastore (1 hunks)
  • .github/Dockerfile.tracingstore (1 hunks)
  • .github/Dockerfile.webknossos (1 hunks)
  • .github/workflows/build_test_deploy.yml (2 hunks)
  • DEV_INSTALL.md (2 hunks)
  • project/Dependencies.scala (1 hunks)
  • unreleased_changes/8882.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/Compressor.scala (4 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingIOService.scala (2 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeDataZipHelper.scala (2 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/Zarr3BucketStreamSink.scala (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeDataZipHelper.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/Compressor.scala (4)
  • BloscCompressor (185-197)
  • BloscCompressor (199-291)
  • IntCompressionSetting (24-24)
  • StringCompressionSetting (23-23)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/Zarr3BucketStreamSink.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/Compressor.scala (4)
  • BloscCompressor (185-197)
  • BloscCompressor (199-291)
  • StringCompressionSetting (23-23)
  • IntCompressionSetting (24-24)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala (3)
  • BloscCodecConfiguration (226-233)
  • BloscCodecConfiguration (235-245)
  • shuffleSettingFromInt (239-244)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingIOService.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/Compressor.scala (4)
  • BloscCompressor (185-197)
  • BloscCompressor (199-291)
  • StringCompressionSetting (23-23)
  • IntCompressionSetting (24-24)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala (1)
  • shuffleSettingFromInt (239-244)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/Compressor.scala (1)
util/src/main/scala/com/scalableminds/util/io/ZipIO.scala (2)
  • ZipIO (15-309)
  • GZIPOutputStream (66-68)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
  • GitHub Check: backend-tests
🔇 Additional comments (8)
.github/workflows/build_test_deploy.yml (2)

153-155: No remaining blosc/jblosc references—cleanup complete.


93-95: CI deps cleanup verified: no lingering libblosc/jblosc references
Ripgrep found only the unreleased_changes/8882.md note about switching to blosc-java—no code or CI configs reference libblosc or jblosc.

DEV_INSTALL.md (1)

61-61: Confirm dev Ubuntu packages are sufficient without libblosc1

We removed libblosc1; please verify local runs don’t rely on it and that libdraco-dev suffices for runtime on dev machines (Dockerfiles use libdraco4).

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/Compressor.scala (1)

187-195: Good switch to enums and supportedCnames derivation

Moving to Blosc.Compressor/Blosc.Shuffle and deriving supported names from the enum reduces drift and stringly-typed errors.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingIOService.scala (2)

6-6: Import simplification looks good

Wildcard zarr3 import matches usage and reduces churn.


117-124: Correct unwrapping of enum defaults for codec and compressor maps

Using getValue for cname/shuffle aligns JSON codec config (string/int) with the new enum-based defaults.

Also applies to: 126-134

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/Zarr3BucketStreamSink.scala (1)

120-127: Consistent usage of blosc enum defaults

Same pattern as EditableMappingIOService; good consistency.

Also applies to: 129-137

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeDataZipHelper.scala (1)

173-181: Defaults unwrap is correct for blosc-java enums

Passing defaultCname.getValue as StringCompressionSetting and defaultShuffle.getValue as IntCompressionSetting matches the Compressor contract and blosc-java’s enum API. No action needed.

Please confirm that Blosc.Compressor.values().map(_.getValue) includes your chosen default (e.g., "lz4") at runtime to avoid validation failures when the compressor is instantiated.
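
A small sketch of that confirmation, assuming the default is exposed as BloscCompressor.defaultCname:

```scala
import com.scalableminds.bloscjava.Blosc
import com.scalableminds.webknossos.datastore.datareaders.BloscCompressor

object BloscDefaultCheck {
  // The enum-derived set of supported compressor names should contain the configured default
  // (this comment expects Compressor.getValue to yield names such as "lz4").
  val supportedCnames: Set[String] = Blosc.Compressor.values().map(_.getValue).toSet

  def checkDefault(): Unit =
    require(
      supportedCnames.contains(BloscCompressor.defaultCname.getValue),
      s"Default cname is not among supported compressors: ${supportedCnames.mkString(", ")}"
    )
}
```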

@normanrz normanrz enabled auto-merge (squash) September 1, 2025 08:22
@normanrz normanrz merged commit a579d76 into master Sep 1, 2025
5 checks passed
@normanrz normanrz deleted the blosc-java branch September 1, 2025 08:29