spaceml-org/PSP-KDM

Data

Store the data in a folder called data.

E.g.,

mkdir data
ln -s /mnt/disks/solar-wind-joseph/psp_reference_1min_labeled.zarr/ data
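
This store can then be opened from Python with xarray; a minimal sketch, assuming the zarr store is readable by xarray's zarr backend:

import xarray as xr

# Open the labeled PSP reference dataset via the symlink created above
ds = xr.open_zarr("data/psp_reference_1min_labeled.zarr")
print(ds)  # inspect variables and coordinates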

Visual Studio Code

Install the Command Variable extension in VS Code.

Run Dask as a distributed cluster.

You can improve the performance of your code by running it in a distributed cluster. To do this, you need to start a Dask cluster and connect your Python scripts to it. The following steps will help you get started:

[Remote server] The Dask workers need this project's library installed as a dependency. To do that, install it in editable mode:

pip install -e .

To run a local Dask cluster as a daemon and connect Python scripts to it, follow these steps:

✅ Step 1: Install Dask and Distributed

pip install "dask[distributed]"

✅ Step 2: Start a Dask Scheduler as a Daemon

Use dask-scheduler to start the scheduler process:

nohup dask-scheduler > scheduler.log 2>&1 &

This runs the scheduler in the background and logs output to scheduler.log.

By default, the scheduler listens on TCP port 8786, with a dashboard on port 8787.

✅ Step 3: Start Dask Workers

Launch one or more workers to connect to the scheduler:

nohup dask-worker tcp://127.0.0.1:8786 > worker1.log 2>&1 &
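
To confirm that the workers registered with the scheduler, you can query it from Python; a quick sketch, run on the remote server:

from dask.distributed import Client

client = Client("tcp://127.0.0.1:8786")
# scheduler_info() reports the workers currently connected to the scheduler
print(len(client.scheduler_info()["workers"]), "worker(s) connected")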

[Local Computer] Dask has a built-in dashboard that is useful for debugging and checking the progress of Dask jobs. With the commands above, the dashboard is running on port 8787 on the remote machine. You can use SSH port forwarding to expose the remote port as a local port. From your local machine, you can connect to the Dask dashboard using:

ssh -L 8787:localhost:8787 [server_name] (e.g., ssh -L 8787:localhost:8787 fdl-daniela)

[Remote Server] You can set the number of worker processes and threads using the following commands:

nohup dask-worker tcp://127.0.0.1:8786 --nworkers 8 --nthreads 1 > worker1.log 2>&1 &
nohup dask-worker tcp://127.0.0.1:8786 --memory-limit 12GB --nworkers 1 --nthreads 1 > worker1.log 2>&1 &
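
If you prefer to do this programmatically rather than via the CLI, LocalCluster accepts equivalent options; a sketch (not part of the original setup) with, for example, 8 single-threaded workers and a 12 GB memory limit each:

from dask.distributed import Client, LocalCluster

# Programmatic equivalent of the CLI flags above
cluster = LocalCluster(n_workers=8, threads_per_worker=1, memory_limit="12GB")
client = Client(cluster)
print(client)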

We can launch Dask with CUDA support for multiple GPUs:

# GPU 0
CUDA_VISIBLE_DEVICES=0 dask-cuda-worker tcp://127.0.0.1:8786 --nthreads 8 > worker1.log 2>&1 &

# GPU 1
CUDA_VISIBLE_DEVICES=1 dask-cuda-worker tcp://127.0.0.1:8786 --nthreads 8 > worker2.log 2>&1 &

# Or run all workers in one command:
dask-cuda-worker tcp://127.0.0.1:8786 --device-memory-limit=16GB
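
The same multi-GPU setup can also be created from Python with dask_cuda's LocalCUDACluster, which starts one worker per listed GPU; a sketch, assuming dask_cuda is installed:

from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# One worker per listed GPU, each with a 16 GB device memory limit
cluster = LocalCUDACluster(CUDA_VISIBLE_DEVICES=[0, 1], device_memory_limit="16GB")
client = Client(cluster)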

You can launch multiple workers similarly (on the same or other machines).

✅ Step 4: Connect Your Python Script to the Cluster

Use dask.distributed.Client to connect to the running scheduler:

from dask.distributed import Client

client = Client("tcp://127.0.0.1:8786")  # Use the IP/hostname and port of the scheduler

print(client)
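
With the client above connected, you can verify that the cluster is actually doing the work by running a small computation and watching the tasks appear on the dashboard (port 8787); a minimal sketch:

import dask.array as da

# A toy computation; its tasks should show up on the dashboard
x = da.random.random((10000, 10000), chunks=(1000, 1000))
print(x.mean().compute())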

Now all your Dask operations will use the external daemonized cluster.

✅ Optional: Clean Up All Background Processes

To stop everything:

pkill -f dask-scheduler
pkill -f dask-worker
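
Alternatively, a connected client can tear the cluster down from Python; note that this stops the scheduler and all workers:

from dask.distributed import Client

client = Client("tcp://127.0.0.1:8786")
client.shutdown()  # stops the scheduler and all connected workers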

PS: Whenever you change branches or parameter names, you need to restart the Dask scheduler and workers.
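
If only the workers need refreshing, restarting them from the client may be enough, since worker processes re-import your library when they come back up; a sketch, assuming the workers run under the default nanny:

from dask.distributed import Client

client = Client("tcp://127.0.0.1:8786")
client.restart()  # kills and relaunches all worker processes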
