-
-
Couldn't load subscription status.
- Fork 743
Open
Description
I find myself often wanting to run code on a worker, rather than on my local client. This happens in a few settings:
- My workers have access to a data store that I don't, so I need to call something like
dd.read_parquetremotely (cc @martindurant @jcrist ) - My workers are far away from my client, so client-heavy operations like joblib or Hyperband incur a serious bottleneck from client-scheduler communication (cc @stsievert )
- My workers have hardware or libraries like GPUs/RAPIDS that I don't have locally (cc @quasiben @kkraus14)
Today I can do this by writing a function and submitting that function as a task
def f():
import dask_cudf
df = dask_cudf.read_parquet("s3://...")
return df.sum().compute()
result = client.submit(f).result()It might make sense to provide syntax around this to make it more magical (or it might not). We might do something like the following:
with dask.distributed.remote as result:
import dask_cudf
df = dask_cudf.read_parquet("s3://...")
result = df.sum().compute()I know that @eriknw has done magic like this in the past. We could enlist this help. However, we may not want to do this due to the magical and novel behavior.
Metadata
Metadata
Assignees
Labels
No labels