Store the data in a folder called data.
E.g.,
mkdir data
ln -s /mnt/disks/solar-wind-joseph/psp_reference_1min_labeled.zarr/ data
Install the extension Command Variable in VSCode.
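Once the data is linked into the data folder, you can open the zarr store from Python. This is a minimal sketch, assuming the store is xarray-compatible and that xarray and zarr are installed:
import xarray as xr

# Open the labeled PSP reference dataset through the symlink created above
# (assumes xarray and zarr are installed in the environment)
ds = xr.open_zarr("data/psp_reference_1min_labeled.zarr")
print(ds)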
You can improve the performance of your code by running it in a distributed cluster. To do this, you need to start a Dask cluster and connect your Python scripts to it. The following steps will help you get started:
[Remote server] The Dask workers need the project library installed as a dependency. To do that, install it in editable mode:
pip install -e .
To run a local Dask cluster as a daemon and connect Python scripts to it, follow these steps:
✅ Step 1: Install Dask and Distributed
pip install "dask[distributed]"
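You can quickly confirm the installation from Python (a minimal check that simply prints the installed versions):
import dask
import distributed

# Both packages should import without errors after the install above
print(dask.__version__, distributed.__version__)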
✅ Step 2: Start a Dask Scheduler as a Daemon
Use dask-scheduler to start the scheduler process:
nohup dask-scheduler > scheduler.log 2>&1 &
This runs the scheduler in the background and logs output to scheduler.log.
By default, it runs on TCP port 8786, with a dashboard on port 8787.
✅ Step 3: Start Dask Workers
Launch one or more workers to connect to the scheduler:
nohup dask-worker tcp://127.0.0.1:8786 > worker1.log 2>&1 &
[Local Computer] Dask has a built-in dashboard that is useful for debugging and checking the progress of Dask jobs. With the commands above, the dashboard is running on port 8787 on the remote machine. You can use SSH port forwarding to expose the remote port as a local port. From your local machine, connect to the Dask dashboard using:
ssh -L 8787:localhost:8787 [server_name] (e.g., ssh -L 8787:localhost:8787 fdl-daniela)
Then open http://localhost:8787 in your browser to view the dashboard.
[Remote Server] You can set the number of processes and threads per worker using the following:
nohup dask-worker tcp://127.0.0.1:8786 --nworkers 8 --nthreads 1 > worker1.log 2>&1 &
nohup dask-worker tcp://127.0.0.1:8786 --memory-limit 12GB --nworkers 1 --nthreads 1 > worker1.log 2>&1 &
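If you just want to test the same worker configuration on a single machine without the daemonized scheduler, an equivalent cluster can also be created directly from Python. This is a minimal sketch; the parameter values mirror the flags above:
from dask.distributed import Client, LocalCluster

# Local-only alternative to dask-scheduler/dask-worker: 8 single-threaded
# workers, each capped at 12 GB of memory (mirrors the flags above)
cluster = LocalCluster(n_workers=8, threads_per_worker=1, memory_limit="12GB")
client = Client(cluster)
print(client)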
We can launch Dask with CUDA support for several GPUs:
# GPU 0
CUDA_VISIBLE_DEVICES=0 dask-cuda-worker tcp://127.0.0.1:8786 --nthreads 8 > worker1.log 2>&1 &
# GPU 1
CUDA_VISIBLE_DEVICES=1 dask-cuda-worker tcp://127.0.0.1:8786 --nthreads 8 > worker2.log 2>&1 &
# Or run all workers in one command:
dask-cuda-worker tcp://127.0.0.1:8786 --device-memory-limit=16GB
You can launch multiple workers similarly (on the same or other machines).
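If you prefer to manage the GPU workers from Python instead of the shell, the dask_cuda package also provides LocalCUDACluster. A minimal sketch, assuming dask-cuda is installed; the values mirror the commands above:
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# One worker per listed GPU, each capped at 16 GB of device memory
cluster = LocalCUDACluster(CUDA_VISIBLE_DEVICES="0,1",
                           device_memory_limit="16GB")
client = Client(cluster)
print(client)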
✅ Step 4: Connect Your Python Script to the Cluster
Use dask.distributed.Client to connect to the running scheduler:
from dask.distributed import Client
client = Client("tcp://127.0.0.1:8786") # Use the IP/hostname and port of the scheduler
print(client)
Now all your Dask operations will use the external daemonized cluster.
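For example, a quick sanity check that work is actually being scheduled on the cluster (a minimal sketch; watch the dashboard while it runs):
import dask.array as da

# A throwaway computation; its tasks should show up on the dashboard
x = da.random.random((10000, 10000), chunks=(1000, 1000))
print(x.mean().compute())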
✅ Optional: Clean Up All Background Processes
To stop everything:
pkill -f dask-scheduler
pkill -f dask-worker
PS: Whenever you change branches or parameter names, you need to restart the Dask procedure (stop and relaunch the scheduler and workers) so the workers pick up the new code.
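Depending on the change, Client.restart() may be enough: it restarts the worker processes, which then re-import the project code. A minimal sketch; if the change affects the scheduler itself, do the full stop/relaunch above:
from dask.distributed import Client

client = Client("tcp://127.0.0.1:8786")
# Restart all worker processes so they re-import the updated code
client.restart()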