use blosc-java instead of jblosc #8882
Conversation
📝 Walkthrough

Removed system libblosc from Dockerfiles, CI, and docs; swapped the JVM dependency from jblosc to blosc-java; refactored BloscCompressor to use blosc-java enums and the wrapper API; updated tracingstore code to unwrap the new enum defaults; added a changelog entry.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeDataZipHelper.scala (3)
36-43: Close ZipFile in detectVolumeDataZipFormat to prevent FD leak

`tryo(new ZipFile(...)).map(...)` never closes the ZipFile. Wrap it with `Using.resource` (and keep `tryo` to preserve Box semantics). Apply:

```diff
- private def detectVolumeDataZipFormat(zipFile: File): Box[VolumeDataZipFormat] =
-   tryo(new java.util.zip.ZipFile(zipFile)).map { zip =>
-     val relevantFile: Option[ZipEntry] =
-       ZipIO.entries(zip).find(entry => entry.getName.endsWith("zarr.json") || entry.getName.endsWith(".wkw"))
-     if (relevantFile.exists(_.getName.endsWith("zarr.json"))) {
-       VolumeDataZipFormat.zarr3
-     } else VolumeDataZipFormat.wkw
-   }
+ private def detectVolumeDataZipFormat(zipFile: File): Box[VolumeDataZipFormat] =
+   tryo {
+     Using.resource(new java.util.zip.ZipFile(zipFile)) { zip =>
+       val relevantFile =
+         ZipIO.entries(zip).find(e => e.getName.endsWith("zarr.json") || e.getName.endsWith(".wkw"))
+       if (relevantFile.exists(_.getName.endsWith("zarr.json"))) VolumeDataZipFormat.zarr3
+       else VolumeDataZipFormat.wkw
+     }
+   }
```
71-76: Avoid opening ZipFile twice and close properly

Two separate `new ZipFile(zipFile)` calls (lines 72 and 75) both leak. Wrap each open in `Using.resource` for the header discovery and the read. Apply:

```diff
- firstHeaderFilePath <- ZipIO
-   .entries(new ZipFile(zipFile))
-   .find(entry => entry.getName.endsWith(Zarr3ArrayHeader.FILENAME_ZARR_JSON))
-   .toFox
- firstHeaderString <- ZipIO.readAt(new ZipFile(zipFile), firstHeaderFilePath).toFox
+ firstHeaderFilePath <- tryo {
+   Using.resource(new ZipFile(zipFile)) { zip =>
+     ZipIO.entries(zip).find(_.getName.endsWith(Zarr3ArrayHeader.FILENAME_ZARR_JSON))
+   }
+ }.toFox
+ firstHeaderString <- tryo {
+   Using.resource(new ZipFile(zipFile)) { zip =>
+     ZipIO.readAt(zip, firstHeaderFilePath)
+   }
+ }.toFox
```
165-171: Close FileOutputStream in inputStreamToTempfile

`out` is never closed, causing descriptor leaks. Use `Using.resource`. Also consider copying via NIO (`Files.createTempFile`) for robustness, but `Using` is sufficient here. Apply:

```diff
  private def inputStreamToTempfile(inputStream: InputStream): File = {
    val tempFile = File.createTempFile("data", "zip")
    tempFile.deleteOnExit()
-   val out = new FileOutputStream(tempFile)
-   IOUtils.copy(inputStream, out)
+   Using.resource(new FileOutputStream(tempFile)) { out =>
+     IOUtils.copy(inputStream, out)
+   }
    tempFile
  }
```
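All three comments above recommend the same pattern, which can be shown in isolation. A minimal, self-contained sketch (the names `copyToBytes` and the stream types are illustrative, not the actual helper code):

```scala
import scala.util.Using
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream}

// Using.resource closes the resource when the body finishes, even on exception,
// which is exactly what the leaking ZipFile/FileOutputStream call sites need.
def copyToBytes(in: InputStream): Array[Byte] =
  Using.resource(new ByteArrayOutputStream()) { out =>
    in.transferTo(out) // requires Java 9+
    out.toByteArray
  }
```

`Using.resource` rethrows any exception from the body after closing, so `tryo { Using.resource(...) { ... } }` keeps the Box semantics while guaranteeing the close.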
🧹 Nitpick comments (8)
.github/Dockerfile.tracingstore (1)
4-5: Slimmer, more reliable installs

Use `--no-install-recommends` and install curl explicitly for the HEALTHCHECK.

```diff
-RUN apt-get update \
-  && apt-get -y install libbrotli1 libdraco4 \
+RUN apt-get update \
+  && apt-get -y install --no-install-recommends libbrotli1 libdraco4 curl \
   && rm -rf /var/lib/apt/lists/*
```

Confirm curl is present at runtime (the healthcheck uses it). If the base image already provides curl, this is a no-op.
.github/Dockerfile.datastore (1)
4-5: Use --no-install-recommends and ensure curl for HEALTHCHECK

The healthcheck uses curl but it is not installed here; add it and avoid extra recommends.

```diff
-RUN apt-get update \
-  && apt-get -y install libbrotli1 libdraco4 \
+RUN apt-get update \
+  && apt-get -y install --no-install-recommends libbrotli1 libdraco4 curl \
   && rm -rf /var/lib/apt/lists/*
```

.github/Dockerfile.webknossos (1)
4-6: Harden package install; add curl explicitly

The Nodesource script runs apt update; still, install only what's needed and include curl for the HEALTHCHECK.

```diff
-RUN curl -sL "https://deb.nodesource.com/setup_${VERSION_NODE}" | bash - \
-  && apt-get -y install libbrotli1 postgresql-client libdraco4 git nodejs \
+RUN curl -sL "https://deb.nodesource.com/setup_${VERSION_NODE}" | bash - \
+  && apt-get -y install --no-install-recommends libbrotli1 postgresql-client libdraco4 git nodejs curl \
   && rm -rf /var/lib/apt/lists/*
```

project/Dependencies.scala (1)
61-62: Fix comment and verify artifact availability

- The comment says "import dev.zarr.bloscjava" but the code uses `com.scalableminds.bloscjava.Blosc`. Update the comment to avoid confusion.
- Ensure `com.scalableminds:blosc-java:0.1-1.21.4` is published to your resolvers for CI/dev (all required native variants included).

```diff
- // Blosc compression. import dev.zarr.bloscjava
+ // Blosc compression. import com.scalableminds.bloscjava
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/Compressor.scala (2)
235-244: Validate accepted shuffle names

The error message lists "noshuffle, shuffle, bitshuffle". Confirm `Blosc.Shuffle.fromString` accepts these exact tokens (and not "byteshuffle"). If it expects other spellings, add aliases or adjust the message.
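To make the validation concern concrete, a hypothetical sketch of checking a shuffle token against an accepted set; the three names are taken from the quoted error message, not from the blosc-java source, and must still be verified against the real `Blosc.Shuffle.fromString`:

```scala
// Accepted names are an assumption copied from the error message quoted above.
val acceptedShuffles: Set[String] = Set("noshuffle", "shuffle", "bitshuffle")

// Returns the validated token, or an error listing the accepted spellings.
def parseShuffle(name: String): Either[String, String] =
  if (acceptedShuffles.contains(name)) Right(name)
  else Left(s"Unsupported shuffle '$name'; expected one of: ${acceptedShuffles.mkString(", ")}")
```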
284-289: Make Blosc thread count configurable

`compress/decompress(..., 1)` forces single-threaded operation. Consider wiring this from config or defaulting to an environment/system setting to preserve performance.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/Zarr3BucketStreamSink.scala (1)
120-137: Optional: factor out a common Blosc config builder

`compressorConfiguration` and `compressor` are duplicated across files. Consider a small factory/helper to avoid drift.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeDataZipHelper.scala (1)
5-8: LGTM: import updates align with blosc-java + zarr3 usage

The added imports are coherent with the new calls (tryo/JsonHelper/WKWFile/Zarr3ArrayHeader/AdditionalCoordinate/BucketPosition/FileOutputStream). Consider adding `scala.util.Using` as well for safe resource handling in this file (see comments below). Apply:

```diff
+import scala.util.Using
```

Also applies to: 15-15, 20-20
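On the thread-count nitpick above, one lightweight way to wire it without a new config dependency is a system property with a safe fallback. A sketch; the property name `blosc.numThreads` is an assumption for illustration:

```scala
// Reads a positive thread count from a system property, falling back to the
// current hard-coded default of 1. The property name is illustrative.
def bloscNumThreads(default: Int = 1): Int =
  sys.props.get("blosc.numThreads")
    .flatMap(_.toIntOption)
    .filter(_ > 0)
    .getOrElse(default)
```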
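For the duplicated compressor configuration flagged in Zarr3BucketStreamSink, a shared defaults object could remove the drift risk. A sketch whose setting keys mirror the ones BloscCompressor uses (`cname`, `clevel`, `shuffle`, `blocksize`); the concrete values are illustrative assumptions, not the project's actual defaults:

```scala
// A single source of truth for Blosc settings; call sites would build their
// compressor maps from here instead of repeating literals in each file.
object BloscDefaults {
  val cname: String = "lz4"   // assumed default compressor name
  val clevel: Int = 5         // assumed compression level
  val shuffle: Int = 1        // assumed shuffle setting (byte shuffle)
  val blocksize: Int = 0      // 0 = automatic block size

  def settings: Map[String, Any] =
    Map("cname" -> cname, "clevel" -> clevel, "shuffle" -> shuffle, "blocksize" -> blocksize)
}
```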
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (11)

- .github/Dockerfile.datastore (1 hunks)
- .github/Dockerfile.tracingstore (1 hunks)
- .github/Dockerfile.webknossos (1 hunks)
- .github/workflows/build_test_deploy.yml (2 hunks)
- DEV_INSTALL.md (2 hunks)
- project/Dependencies.scala (1 hunks)
- unreleased_changes/8882.md (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/Compressor.scala (4 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingIOService.scala (2 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeDataZipHelper.scala (2 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/Zarr3BucketStreamSink.scala (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeDataZipHelper.scala (1)

- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/Compressor.scala (4): BloscCompressor (185-197), BloscCompressor (199-291), IntCompressionSetting (24-24), StringCompressionSetting (23-23)

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/Zarr3BucketStreamSink.scala (2)

- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/Compressor.scala (4): BloscCompressor (185-197), BloscCompressor (199-291), StringCompressionSetting (23-23), IntCompressionSetting (24-24)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala (3): BloscCodecConfiguration (226-233), BloscCodecConfiguration (235-245), shuffleSettingFromInt (239-244)

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingIOService.scala (2)

- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/Compressor.scala (4): BloscCompressor (185-197), BloscCompressor (199-291), StringCompressionSetting (23-23), IntCompressionSetting (24-24)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala (1): shuffleSettingFromInt (239-244)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/Compressor.scala (1)

- util/src/main/scala/com/scalableminds/util/io/ZipIO.scala (2): ZipIO (15-309), GZIPOutputStream (66-68)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-smoketest-push
- GitHub Check: frontend-tests
- GitHub Check: backend-tests
🔇 Additional comments (8)
.github/workflows/build_test_deploy.yml (2)
153-155: No remaining blosc/jblosc references; cleanup complete.

93-95: CI deps cleanup verified: no lingering libblosc/jblosc references

Ripgrep found only the unreleased_changes/8882.md note about switching to blosc-java; no code or CI configs reference libblosc or jblosc.

DEV_INSTALL.md (1)
61-61: Confirm dev Ubuntu packages are sufficient without libblosc1

We removed libblosc1; please verify local runs don't rely on it and that libdraco-dev suffices for runtime on dev machines (the Dockerfiles use libdraco4).
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/Compressor.scala (1)
187-195: Good switch to enums and supportedCnames derivation

Moving to `Blosc.Compressor`/`Blosc.Shuffle` and deriving supported names from the enum reduces drift and stringly-typed errors.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingIOService.scala (2)
6-6: Import simplification looks good

The wildcard zarr3 import matches usage and reduces churn.

117-124: Correct unwrapping of enum defaults for codec and compressor maps

Using `getValue` for cname/shuffle aligns the JSON codec config (string/int) with the new enum-based defaults.

Also applies to: 126-134
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/Zarr3BucketStreamSink.scala (1)
120-127: Consistent usage of blosc enum defaults

Same pattern as EditableMappingIOService; good consistency.
Also applies to: 129-137
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeDataZipHelper.scala (1)
173-181: Defaults unwrap is correct for blosc-java enums

Passing `defaultCname.getValue` as `StringCompressionSetting` and `defaultShuffle.getValue` as `IntCompressionSetting` matches the `Compressor` contract and blosc-java's enum API. No action needed.

Please confirm that `Blosc.Compressor.values().map(_.getValue)` includes your chosen default (e.g., "lz4") at runtime to avoid validation failures when the compressor is instantiated.
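The suggested runtime confirmation can be sketched self-contained; a stand-in list replaces `Blosc.Compressor.values().map(_.getValue)` here, so the compressor names are assumptions, not the library's actual enum values:

```scala
// Stand-in for the enum-derived name list; the real check would iterate
// Blosc.Compressor.values() from blosc-java.
val supportedCnames: Seq[String] = Seq("blosclz", "lz4", "lz4hc", "zlib", "zstd")
val defaultCname: String = "lz4"

// Fail fast at startup if the configured default is not in the supported set.
require(supportedCnames.contains(defaultCname),
        s"Default cname '$defaultCname' not among: ${supportedCnames.mkString(", ")}")
```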
Switch out jblosc (last updated seven years ago!) for blosc-java. I had to do this because the JNA version that jblosc references does not work on Apple silicon. Also, blosc-java bundles the blosc library, so no system library is required.
(Please delete unneeded items, merge only when none are left open)

- (create a `$PR_NUMBER.md` file in `unreleased_changes` or use `./tools/create-changelog-entry.py`)