I believe the dataset module is the preferred way to do this:

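import pyarrow.dataset as ds

# Open the source lazily; nothing is read into memory yet
dataset = ds.dataset(source_path)

# The filter must be a pyarrow.dataset Expression; it is applied while scanning,
# so only matching rows are returned, one record batch at a time
for batch in dataset.to_batches(filter=cat_dict[cat]["cat_filter"]):
    ...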

See https://arrow.apache.org/docs/python/dataset.html for more documentation.

Answer selected by ikrommyd