Ray Worker Setup (Remote Execution)

This guide explains how to connect a remote machine as a Ray worker to run heavy tasks for Dig-A-Plan.

The typical setup uses Tailscale to create a private network between machines.


1) Concepts

  • Head machine: runs the main stack (Python/FastAPI, Julia SDDP service, Ray head, dashboards).

  • Worker machine: joins the Ray cluster and executes Ray tasks.

  • Tailscale IP: the “private address” that Tailscale gives to each machine (it often looks like 100.xx.yy.zz).
    Use this address to connect machines together (SSH to the worker, and connect the Ray worker to the Ray head), even if they are on different networks.


2) Install and setup Tailscale

Tailscale creates a private network between the head and worker machines, so they can communicate even if they are on different networks.

Linux / Ubuntu / WSL

  1. Install Tailscale

   curl -fsSL https://tailscale.com/install.sh | sh
  1. Start Tailscale and log in

   sudo tailscale up

Follow the login link shown in the terminal. Make sure both machines are connected to the same Tailscale account/team.

  1. Get the Tailscale IP

   tailscale ip

You will get an IP like 100.xx.yy.zz. When starting the Ray worker, use the head machine’s Tailscale IP as HEAD_HOST


3) Prerequisites

On both machines

  • The project is set up on both machines (e.g. run make install-all and then you can run make venv-activate and the required tools are installed).

  • Both machines can reach each other over a private network, it means that both are connected to the same Tailscale network (same account/team). You can verify connectivity with:

    tailscale status

On the worker machine

  • You can SSH into the worker from the head machine. For this, you need an address like user@worker-host or user@<worker-ip>.

Required environment variable

  • SERVER_RAY_PORT must be set (usually via .envrc). If you use direnv, run:

    direnv allow

4) Start the stack on the head machine

On the head machine:

make start

This typically:

  • starts Docker services (Python/FastAPI, Julia SDDP service, Grafana/Prometheus/Mongo),

  • opens a tmux session with panes for logs and an interactive shell.


5) Start the Ray worker (on the worker machine)

Once you are logged in to the worker machine (SSH session), start the worker:

make run-ray-worker

This command runs the script ./scripts/start-ray-worker.sh.

That script will ask you:

    “Enter head host IP (e.g., 100.66.5.3):”

You need to type the IP address of the head machine that the worker can reach.

The script will runs:

    ray start --address="HEAD_IP:SERVER_RAY_PORT"

which means that start Ray on this worker and connect it to the Ray head running at HEAD_IP on port SERVER_RAY_PORT.

Then it continuously shows cluster status every 5 seconds:

    watch -n 5 ray status

6) Verify the worker is connected

On the worker machine

After you run make run-ray-worker, the script shows ray status every few seconds.
You should see the worker listed as an active node in the cluster.

If you don’t see it, the worker is not connected to the head yet (check the Troubleshooting section).

On the head machine

You can confirm the cluster from the Ray dashboard:

  • http://localhost:8265

If your Ray dashboard uses a different port in your setup, use that configured port.


7) Troubleshooting

“Need to set SERVER_RAY_PORT”

Your environment is missing SERVER_RAY_PORT.

  • Ensure .envrc contains it

  • Run:

  direnv allow
  • Or export it manually in your shell.

    export SERVER_RAY_PORT

Worker cannot connect to head

  • Confirm both machines are on the same Tailscale network:

  tailscale status
  • Ensure you used the correct head IP .

  • Verify the Ray port is reachable (firewall/ufw). If needed on the head:

  make permit-remote-ray-port

Ray is stuck / old sessions exist

On the machine with the issues:

    make clean-ray

8) Notes for maintainers

  • make connect-ray-worker runs ./scripts/connect-ray-worker.sh (prompts for user@worker then runs ssh).

  • make run-ray-worker runs ./scripts/start-ray-worker.sh (interactive head IP prompt and ray start --address=...).

  • make start runs ./scripts/start-servers.sh and opens a tmux session (logs + worker + interactive panes).