@basnijholt
Contributor

@basnijholt basnijholt commented Jun 21, 2023

Currently adlfs is telling me that my folder is a file.

When printing props, I see Hdi_isfolder is capitalized whereas in the code it is not.

For example on my folder (folder/.dev/Air) I have the following props which are passed to _details (which sets whether it is a file or directory):

{'name': 'folder/.dev/Air', 'container': 'mycontainer', 'snapshot': None, 'version_id': None, 'is_current_version': None, 'blob_type': <BlobType.BLOCKBLOB: 'BlockBlob'>, 'metadata': {'Hdi_isfolder': 'true'}, 'encrypted_metadata': None, 'last_modified': datetime.datetime(2023, 6, 14, 22, 42, 5, tzinfo=datetime.timezone.utc), 'etag': '"0x8DB6D28944E30E9"', 'size': 0, 'content_range': None, 'append_blob_committed_block_count': None, 'is_append_blob_sealed': None, 'page_blob_sequence_number': None, 'server_encrypted': True, 'copy': {'id': None, 'source': None, 'status': None, 'progress': None, 'completion_time': None, 'status_description': None, 'incremental_copy': None, 'destination_snapshot': None}, 'content_settings': {'content_type': 'application/octet-stream', 'content_encoding': None, 'content_language': None, 'content_md5': None, 'content_disposition': None, 'cache_control': None}, 'lease': {'status': 'unlocked', 'state': 'available', 'duration': None}, 'blob_tier': None, 'rehydrate_priority': None, 'blob_tier_change_time': None, 'blob_tier_inferred': None, 'deleted': False, 'deleted_time': None, 'remaining_retention_days': None, 'creation_time': datetime.datetime(2023, 6, 14, 22, 42, 5, tzinfo=datetime.timezone.utc), 'archive_status': None, 'encryption_key_sha256': None, 'encryption_scope': None, 'request_server_encrypted': True, 'object_replication_source_properties': [], 'object_replication_destination_policy': None, 'last_accessed_on': None, 'tag_count': None, 'tags': None, 'immutability_policy': {'expiry_time': None, 'policy_mode': None}, 'has_legal_hold': None, 'has_versions_only': None}

Then _details sets this as a file.
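The mismatch can be illustrated with a minimal sketch of a case-insensitive check that treats both spellings as a directory marker. This is not the adlfs implementation: `is_folder_stub` and `classify` are hypothetical names used only for illustration, and only the `metadata` entry of `props` is modeled.

```python
from typing import Mapping, Optional


def is_folder_stub(metadata: Optional[Mapping[str, str]]) -> bool:
    """Return True if a key equal to 'hdi_isfolder' (ignoring case) is set to 'true'."""
    if not metadata:
        return False
    for key, value in metadata.items():
        if key.lower() == "hdi_isfolder" and str(value).lower() == "true":
            return True
    return False


def classify(props: Mapping[str, object]) -> str:
    """Hypothetical stand-in for the file/directory decision made from blob props."""
    return "directory" if is_folder_stub(props.get("metadata")) else "file"


# Both spellings of the marker are classified as a directory.
print(classify({"name": "folder/.dev/Air", "metadata": {"Hdi_isfolder": "true"}}))
print(classify({"name": "folder/Air", "metadata": {"hdi_isfolder": "true"}}))
# A blob without the marker stays a file.
print(classify({"name": "folder/data.csv", "metadata": None}))
```

A comparison against the exact lowercase key only would classify the first example as a file, which matches the behavior reported here.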

Closes #440

@basnijholt
Contributor Author

Interestingly, in the same container, code, and environment, on another folder I get these props with hdi_isfolder uncapitalized:

props {'name': 'folder/Air', 'container': 'mycontainer', 'snapshot': None, 'version_id': None, 'is_current_version': None, 'blob_type': <BlobType.BLOCKBLOB: 'BlockBlob'>, 'metadata': {'hdi_isfolder': 'true'}, 'encrypted_metadata': None, 'last_modified': datetime.datetime(2023, 2, 17, 0, 3, 8, tzinfo=datetime.timezone.utc), 'etag': '"0x8DB107A59F70DB3"', 'size': 0, 'content_range': None, 'append_blob_committed_block_count': None, 'is_append_blob_sealed': None, 'page_blob_sequence_number': None, 'server_encrypted': True, 'copy': {'id': None, 'source': None, 'status': None, 'progress': None, 'completion_time': None, 'status_description': None, 'incremental_copy': None, 'destination_snapshot': None}, 'content_settings': {'content_type': None, 'content_encoding': None, 'content_language': None, 'content_md5': None, 'content_disposition': None, 'cache_control': None}, 'lease': {'status': 'unlocked', 'state': 'available', 'duration': None}, 'blob_tier': None, 'rehydrate_priority': None, 'blob_tier_change_time': None, 'blob_tier_inferred': None, 'deleted': False, 'deleted_time': None, 'remaining_retention_days': None, 'creation_time': datetime.datetime(2023, 2, 17, 0, 3, 8, tzinfo=datetime.timezone.utc), 'archive_status': None, 'encryption_key_sha256': None, 'encryption_scope': None, 'request_server_encrypted': True, 'object_replication_source_properties': [], 'object_replication_destination_policy': None, 'last_accessed_on': None, 'tag_count': None, 'tags': None, 'immutability_policy': {'expiry_time': None, 'policy_mode': None}, 'has_legal_hold': None, 'has_versions_only': None}
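Since both spellings show up in the same container, one option is to lowercase the metadata keys once before any lookup. The following is a sketch under that assumption; `normalize_metadata` is a hypothetical helper, not part of adlfs or the Azure SDK.

```python
from typing import Dict, Mapping, Optional


def normalize_metadata(metadata: Optional[Mapping[str, str]]) -> Dict[str, str]:
    """Return a copy of the metadata with all keys lowercased (None becomes {})."""
    return {key.lower(): value for key, value in (metadata or {}).items()}


capitalized = {"Hdi_isfolder": "true"}  # as seen on folder/.dev/Air
lowercase = {"hdi_isfolder": "true"}    # as seen on folder/Air

# After normalization the two variants are indistinguishable.
assert normalize_metadata(capitalized) == normalize_metadata(lowercase)
assert normalize_metadata(None) == {}
```

Normalizing up front keeps every later lookup a plain dictionary access, at the cost of discarding the original key casing.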

@basnijholt
Contributor Author

@TomAugspurger or @hayesgb, any feedback? Is there a chance of getting this merged?

@TomAugspurger
Contributor

Thanks @basnijholt. I didn't really understand the changes until reading #418.

Does Azurite support these metadata fields? If so, could you add a test that exercises this?

@TomAugspurger
Contributor

Did you have a chance to look at tests with Azurite?

@TomAugspurger
Contributor

Can you add a test?

@basnijholt
Contributor Author

Unfortunately, I no longer work with Azure Data Lake and cannot find the bandwidth to sit down and write a proper test for this.

The code change is really trivial, though, and this code ran in production for many months in an internal project.

@TomAugspurger
Contributor

TomAugspurger commented Dec 10, 2023 via email

@TomAugspurger
Contributor

Azurite does support metadata. Added a test and changelog entry.

@TomAugspurger TomAugspurger merged commit 32132c4 into fsspec:main Dec 31, 2023
@TomAugspurger
Contributor

Thanks @basnijholt!

Development

Successfully merging this pull request may close these issues.

Support virtual directory stubs with uppercase "Hdi_isfolder" metadata

3 participants