IPIP 0499: CID Profiles #499
Let's make the fanout match the max links from files and rename the profile to `-wide`; this will make it easier to discuss in ipfs/specs#499
Co-authored-by: Bumblefudge <[email protected]>
Import.* config params for controlling DAG width were added in: ipfs/kubo#10774
Thank you for kicking this off, and filling initial state. I've incorporated specific "dag width" settings for Next:
I pushed a bunch of edits to move the conversation forward. This is sorely needed in the ecosystem, and the hope is that by building consensus we can improve developer experience when working with UnixFS and the overall health of the UnixFS ecosystem. Feedback is always appreciated.
1. UnixFS DAG layout (e.g. balanced, trickle)
1. UnixFS DAG width (max number of links per `File` node)
1. `HAMTDirectory` fanout (must be a power of 2)
1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of `PBNode.Links`
If this number is dynamic based on the lengths of the actual link entries in the dag, we will need to specify what algorithm that estimation follows. I would put such things in a special "ipfs legacy" profile to be honest, along with cidv0, non-raw leaves etc. We probably should heavily discourage coming up with profiles that do weird things, like dynamically setting params or not using raw-leaves for things.
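One way to pin the estimation down is to write it out. The sketch below is a minimal illustration, assuming the estimate is the sum of each link's name length and binary CID length, and that the switch happens once the estimate exceeds the threshold. The function names and the `Link` representation are hypothetical, and the exact rule would still need to be fixed by the spec.

```python
from typing import Iterable, Tuple

# Hypothetical representation of a directory link: (entry name, binary CID).
Link = Tuple[str, bytes]


def estimated_directory_size(links: Iterable[Link]) -> int:
    """Estimate the directory block size from its links only.

    Assumption: each link contributes len(name in UTF-8) + len(binary CID);
    protobuf framing and Tsize are ignored.
    """
    return sum(len(name.encode("utf-8")) + len(cid) for name, cid in links)


def should_use_hamt(links: Iterable[Link], threshold_bytes: int) -> bool:
    """Assumption: switch to HAMTDirectory when the estimate exceeds the threshold."""
    return estimated_directory_size(links) > threshold_bytes


# 36-byte placeholder standing in for a binary CIDv1 with a sha2-256 digest.
fake_cid = bytes(36)
entries = [("a", fake_cid), ("bb", fake_cid), ("ccc", fake_cid)]

print(estimated_directory_size(entries))        # 114
print(should_use_hamt(entries, 256 * 1024))     # False
```

Because the result depends on the actual entry names, two directories with the same number of entries can land on different sides of the threshold, which is why the algorithm has to be part of the profile.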
So, each layout would have its own set of layout-params:
- balanced:
  - max-links: N
- trickle:
  - max-leaves-per-level: N
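The per-layout parameters above could be modelled as separate types, so that a parameter from one layout cannot be set on the other. This is a rough sketch only: the class and function names are hypothetical, and `balanced_depth` assumes a simple balanced tree in which every intermediate node holds up to `max_links` children.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Balanced:
    max_links: int  # maximum children per intermediate File node


@dataclass(frozen=True)
class Trickle:
    max_leaves_per_level: int


Layout = Union[Balanced, Trickle]


def balanced_depth(leaf_count: int, layout: Balanced) -> int:
    """Levels of intermediate nodes needed above the leaves of a balanced DAG."""
    if layout.max_links < 2:
        raise ValueError("max_links must be at least 2")
    depth, capacity = 0, 1
    while capacity < leaf_count:
        capacity *= layout.max_links
        depth += 1
    return depth


# Illustrative values: 4096 leaves (e.g. 1 GiB in 256 KiB chunks), 174 links per node.
print(balanced_depth(4096, Balanced(max_links=174)))    # 2
# Illustrative values: 1024 leaves (e.g. 1 GiB in 1 MiB chunks), 1024 links per node.
print(balanced_depth(1024, Balanced(max_links=1024)))   # 1
```

The same input bytes end up in differently shaped DAGs depending on these values, so each layout's parameters need to be fixed by a profile for the root CID to be reproducible.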
> We probably should heavily discourage coming up with profiles that do weird things, like dynamically setting params or not using raw-leaves for things.
Yeah, that's exactly what we're doing by defining this profile.
src/ipips/ipip-0499.md
1. Whether empty directories are included in the DAG
   - Some implementations apply filtering before merkleizing filesystem entries in the DAG.
This is weird, because then we need to consider empty files, hidden files, unreadable files, symlinks and symlink follows, so probably need to mention all those as part of the profile too?
This is motivated by Git's default behaviour which ignores empty directories.
But we can mention here the rest.
### Compatibility

UnixFS data encoded with the profiles defined in this IPIP is fully compatible with existing implementations, as it is fully compliant with the UnixFS specification.
It cannot be compliant with details that are not specified as of today.
Contingent on #331
src/ipips/ipip-0499.md
1. UnixFS chunk size
1. UnixFS DAG layout (e.g. balanced, trickle)
1. UnixFS DAG width (max number of links per `File` node)
1. `HAMTDirectory` fanout (must be a power of 2)
this can alternatively be called "bitwidth" and you just use the number of bits for this, it's what we're doing in all the other hamts we have. So the default bitwidth is 8 = 256 leaves, bitwidth of 5 would be 32, etc.
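The relationship between the two ways of expressing this parameter is fanout = 2^bitwidth. A minimal sketch follows; the helper names are hypothetical and not taken from any existing HAMT library.

```python
def fanout_from_bitwidth(bitwidth: int) -> int:
    """Number of child slots in a HAMT node that consumes `bitwidth` hash bits per level."""
    if bitwidth < 1:
        raise ValueError("bitwidth must be a positive integer")
    return 1 << bitwidth


def bitwidth_from_fanout(fanout: int) -> int:
    """Inverse mapping; rejects fanouts that are not a power of 2 (or are below 2)."""
    if fanout < 2 or fanout & (fanout - 1) != 0:
        raise ValueError(f"fanout {fanout} is not a power of 2")
    return fanout.bit_length() - 1


print(fanout_from_bitwidth(8))     # 256
print(fanout_from_bitwidth(5))     # 32
print(bitwidth_from_fanout(1024))  # 10
```

Expressing the parameter as a bitwidth makes the "must be a power of 2" constraint impossible to violate, since every positive bitwidth maps to a valid fanout.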
1. Leaf Envelope: either `dag-pb` or `raw`
1. Whether empty directories are included in the DAG
   - Some implementations apply filtering before merkleizing filesystem entries in the DAG.
couple of other things to consider?
- Directory wrapping at the top level (for just files, kubo has an option to wrap in a directory so you get file metadata)
- Presence and accurate setting of `Tsize` - at one point we were going to deprecate this field for some cases, although I think all our encoders now do it properly, you could just mandate this in the spec though -- all valid profiles must properly encode `Tsize`.
I've added this as a parameter.
According to the latest version of https://github.com/ipfs/specs/pull/331/files, the calculation is done as follows:
> To compute the Tsize of a child DAG, sum the length of the dag-pb outside message binary length and the blocksizes of all nodes in the child DAG.

If it is calculated this way, would that make it accurate?
sounds about right, I remember there being some nuance in exactly what's included in the size calculation, making it not super stable if you get it slightly wrong (as we did for some variants in go-unixfsnode for a while)
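For reference, the rule quoted above can be read as a simple recursion: the `Tsize` of a link is the serialized size of the child block plus the `Tsize` of everything the child links to. The sketch below assumes that reading. The `Node` type is hypothetical, block sizes are supplied by hand rather than measured from real dag-pb encoding, and a subtree that is linked more than once is counted each time it appears.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    block_size: int                       # length of this node's serialized block, in bytes
    children: List["Node"] = field(default_factory=list)


def tsize(node: Node) -> int:
    """Cumulative size of the DAG rooted at `node`: own block plus all descendants."""
    return node.block_size + sum(tsize(child) for child in node.children)


leaf_a = Node(block_size=100)
leaf_b = Node(block_size=200)
middle = Node(block_size=50, children=[leaf_a, leaf_b])
root = Node(block_size=60, children=[middle])

# Tsize recorded on the root's link to `middle`:
print(tsize(middle))  # 350
# Tsize that a parent of `root` would record for it:
print(tsize(root))    # 410
```

Any disagreement about what counts toward `block_size` (for example, whether framing bytes are included) propagates up through every ancestor link, which is one way a slightly wrong calculation produces a different root CID.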
Co-authored-by: Hector Sanjuan <[email protected]>
Co-authored-by: Rod Vagg <[email protected]>
Currently, CIDs can be generated with a variety of settings and optimizations for chunking, DAG width, and more. This means the same file can yield multiple, different CIDs depending on which tools and settings are used, and it is not possible to reliably reproduce or verify the CID.
This proposal introduces profiles for IPFS CIDs. Profiles explicitly define CID version, hash algorithm, chunk size, DAG width, layout, and other parameters. They can be used to verify data across implementations, provide recommended settings depending on retrieval performance goals, and more.
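To make the idea concrete, a profile can be thought of as a named, fixed set of these parameters. The sketch below is illustrative only: the `Profile` fields, the profile name, and its values are hypothetical, and the digests are plain SHA-256 hashes of chunks rather than real CIDs. It shows that changing a single parameter (chunk size) already changes the leaf hashes for identical input.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Profile:
    name: str
    cid_version: int
    hash_function: str
    chunk_size: int       # bytes per leaf
    max_links: int        # DAG width
    layout: str           # e.g. "balanced" or "trickle"
    raw_leaves: bool


def leaf_digests(data: bytes, profile: Profile) -> List[str]:
    """Split `data` into fixed-size chunks and hash each one (stand-in for leaf CIDs)."""
    size = profile.chunk_size
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [hashlib.new(profile.hash_function, chunk).hexdigest() for chunk in chunks]


# Hypothetical profiles that differ only in chunk size.
small = Profile("example-small", 1, "sha256", 256, 174, "balanced", True)
large = Profile("example-large", 1, "sha256", 512, 174, "balanced", True)

data = bytes(range(256)) * 4  # 1024 bytes of sample input

print(len(leaf_digests(data, small)))                         # 4
print(len(leaf_digests(data, large)))                         # 2
print(leaf_digests(data, small) == leaf_digests(data, large)) # False
```

Two tools that agree on a profile agree on every value in the record, so they produce the same leaves, the same DAG shape, and therefore the same root CID.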