Run local code remotely on a worker #4003

@mrocklin

I find myself often wanting to run code on a worker, rather than on my local client. This happens in a few settings:

  1. My workers have access to a data store that I don't, so I need to call something like `dd.read_parquet` remotely (cc @martindurant @jcrist)
  2. My workers are far away from my client, so client-heavy operations like joblib or Hyperband incur a serious bottleneck from client-scheduler communication (cc @stsievert)
  3. My workers have hardware or libraries, like GPUs/RAPIDS, that I don't have locally (cc @quasiben @kkraus14)

Today I can do this by writing a function and submitting that function as a task:

```python
def f():
    import dask_cudf
    df = dask_cudf.read_parquet("s3://...")
    return df.sum().compute()

result = client.submit(f).result()
```
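Without any new syntax, this pattern can already be packaged as a small decorator so that a function body always executes on a worker. A minimal sketch (the `remote` name and decorator-factory shape are my own illustration, not an existing dask API):

```python
from functools import wraps


def remote(client):
    """Return a decorator that runs the wrapped function on a worker.

    ``client`` is anything with a ``submit(fn, *args, **kwargs)`` method
    returning a future with ``.result()``, e.g. ``distributed.Client``.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Ship the function to a worker and block for the answer.
            return client.submit(func, *args, **kwargs).result()
        return wrapper
    return decorator
```

With a real `distributed.Client` in hand, the example above would become `@remote(client)` on `f`, and calling `f()` would run it remotely. This is less magical than a context manager but stays within ordinary Python semantics.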

It might make sense to provide syntax around this to make it more magical (or it might not). We might do something like the following:

```python
with dask.distributed.remote as result:
    import dask_cudf
    df = dask_cudf.read_parquet("s3://...")
    result = df.sum().compute()
```

I know that @eriknw has done magic like this in the past. We could enlist his help. However, we may not want to do this, given the magical and novel behavior.
