Install a cluster
Enterprise only
On-prem compute is available only to organizations on an Enterprise plan. Without that plan, the On-prem training cluster menu item is hidden. See Plan permissions.
Connect your hardware to Arena after a Manager has enabled on-prem and defined at least one resource class. The exception is the CLI path, which can create the class for you.
Prerequisites
- Enterprise organization
- If you use the CLI: Arena CLI installed (`pip install agilerl`) and authenticated with `arena login` or `ARENA_API_KEY` (see Authentication)
- Worker sizing in the class matches the machines you will train on
Install with either Docker Swarm or Helm on Kubernetes. Use one path per cluster, not both:
| Path | You need |
|---|---|
| Docker Swarm | SSH access to the manager and worker hosts from the machine running install |
| Helm | Access to the target Kubernetes cluster through your current kube context (no SSH) |
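Before running install on the Docker Swarm path, a quick preflight from the machine you will install from can catch a missing CLI or SSH problems early. The sketch below is not part of the Arena CLI: `preflight_cli` and `preflight_swarm` are hypothetical helper names, and it assumes a POSIX shell with OpenSSH. It only checks that `arena` is on `PATH` and that each host accepts non-interactive SSH. It cannot tell whether `arena login` has succeeded.

```shell
#!/bin/sh
# Hypothetical preflight helpers for the Docker Swarm path; not part of the Arena CLI.

# Succeeds if the arena command is available. It cannot verify that
# `arena login` has been run, so it only warns when ARENA_API_KEY is unset.
preflight_cli() {
  if ! command -v arena >/dev/null 2>&1; then
    echo "arena not found on PATH; install it with: pip install agilerl" >&2
    return 1
  fi
  if [ -z "${ARENA_API_KEY:-}" ]; then
    echo "ARENA_API_KEY is unset; fine only if you already ran 'arena login'" >&2
  fi
}

# Succeeds only if every host given as an argument accepts non-interactive SSH.
preflight_swarm() {
  local host failed=0
  for host in "$@"; do
    # BatchMode fails instead of prompting for a password; </dev/null keeps
    # ssh from consuming the caller's stdin inside the loop.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
      echo "ok: $host"
    else
      echo "SSH failed: $host" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `preflight_cli && preflight_swarm MANAGER_HOST gpu1.example.com gpu2.example.com` checks the same hosts used in the Docker Swarm install command.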
Step 1: Enable on-prem and add a class
UI path (most teams):
1. Open Profile menu → On-prem training cluster.
2. Enable on-prem.
3. Add a resource class and fill in the Name, Number of nodes, and Compute resource (per worker node) fields.
CLI path: with the Arena CLI installed, `arena on-prem install` can enable the provider and create the class named in the command if they do not exist yet. You still need an Enterprise org and a logged-in CLI session (`arena login` or `ARENA_API_KEY`).
For a second hardware pool (a different GPU type or site), add another class with a distinct Name, set its CPUs, GPUs, and Memory to match those workers, and run install again on the matching hosts.
Step 2: Choose Docker Swarm or Helm
Pick one path per cluster: Swarm or Kubernetes (Helm), not both. On the On-prem training cluster page, expand Config on the class row and use the Docker Swarm / Helm toggle. The install command and the downloadable bundle both follow the selected type.
| Type | Where install runs | Host access |
|---|---|---|
| Docker Swarm | SSH to manager and workers | CLI installs Docker on fresh nodes when needed |
| Helm | Your laptop, against the target cluster | No SSH; uses the current kube context |
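Because the Helm path targets whatever cluster your current kube context points at, a guard like the following can prevent installing into the wrong cluster. This is a sketch under assumptions: `helm_install_guarded` is a hypothetical name, it assumes `kubectl` is installed, and it calls only the `arena on-prem install MY-CLASS --setup-type helm` form documented in Step 3.

```shell
#!/bin/sh
# Hypothetical guard for the Helm path; not part of the Arena CLI.
# Runs the Helm install only if kubectl's current context is the one you expect.
helm_install_guarded() {
  local class="$1" expected_ctx="$2" ctx
  if ! ctx=$(kubectl config current-context 2>/dev/null) || [ -z "$ctx" ]; then
    echo "no current kube context; select one with: kubectl config use-context NAME" >&2
    return 1
  fi
  if [ "$ctx" != "$expected_ctx" ]; then
    echo "refusing to install: current context is '$ctx', expected '$expected_ctx'" >&2
    return 1
  fi
  arena on-prem install "$class" --setup-type helm
}
```

For example, `helm_install_guarded MY-CLASS my-gpu-cluster`, where `my-gpu-cluster` is a placeholder for your own kube context name.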
Step 3: Run install
Arena CLI (recommended)
Install the CLI first (`pip install agilerl`; see Arena CLI). Then copy the command from the Arena CLI (recommended) block at the top of the Config panel, which already uses the class Name from that row, or run:
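```shell
arena login

# Docker Swarm: run from a machine with SSH access to the manager and workers
arena on-prem install MY-CLASS --setup-type dockerSwarm \
  --manager MANAGER_HOST --workers gpu1.example.com,gpu2.example.com
```

For Helm on your Kubernetes cluster (the CLI uses your current kube context):

```shell
arena on-prem install MY-CLASS --setup-type helm
```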
See On-prem CLI for teardown and flags.
When you copy the command from Config, the class already exists, so the command installs workers for that saved class. Running `arena on-prem install NEW-NAME` directly from your terminal can also create the provider and the class first, then install.
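`--workers` takes a comma-separated host list. If you keep worker hostnames in a file, a small helper can build that value. This is a hypothetical helper, not a CLI feature; it assumes one hostname per line and skips blank lines and `#` comments.

```shell
#!/bin/sh
# Hypothetical helper, not an Arena CLI feature: read one hostname per line
# from a file and print them joined with commas for --workers.
# Blank lines and lines starting with '#' are skipped.
workers_from_file() {
  local csv="" line
  # `|| [ -n "$line" ]` also handles a last line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line#"${line%%[![:space:]]*}"}   # strip leading whitespace
    line=${line%"${line##*[![:space:]]}"}   # strip trailing whitespace, incl. \r
    case $line in
      '' | '#'*) continue ;;
    esac
    csv=${csv:+$csv,}$line
  done < "$1"
  printf '%s\n' "$csv"
}
```

For example, pass `--workers "$(workers_from_file workers.txt)"` to the Docker Swarm install command, where `workers.txt` is a hypothetical file name.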
Download setup (.tar)
In Config, choose Swarm or Helm, then Download setup (.tar).
Copy the archive to a bastion host, extract it, and `cd arena-train`. Run `./setup.sh`, or follow the README tab in the same panel.
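The bastion steps can also be driven from your laptop. A minimal sketch, assuming OpenSSH `scp`/`ssh` access to the bastion and that the archive extracts to `arena-train/` as described above. `run_bundle_on_bastion` is a hypothetical name, and the archive file name is whatever Download setup (.tar) saved.

```shell
#!/bin/sh
# Hypothetical helper: copy the downloaded bundle to a bastion, extract it,
# and run setup.sh from the extracted arena-train directory.
# Assumes the file name contains no single quotes.
run_bundle_on_bastion() {
  local bundle="$1" bastion="$2" name
  name=$(basename "$bundle")
  # Copy into the bastion user's home directory.
  scp "$bundle" "$bastion:" || return 1
  # -t allocates a terminal so any prompts from setup.sh stay interactive.
  ssh -t "$bastion" "tar -xf '$name' && cd arena-train && ./setup.sh"
}
```

For example, `run_bundle_on_bastion ~/Downloads/BUNDLE.tar bastion.example.com`, with `BUNDLE.tar` standing in for the downloaded file.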
Step 4: Verify
1. On the On-prem training cluster page, the class's Setup bundle column should show Current.
2. Create a test experiment, open Resources, and confirm your on-prem class appears, sorted above the cloud classes and at zero credits.
3. Start training and confirm the run leaves Pending within the usual window (see troubleshooting).
Upgrade when Update recommended appears
Update recommended means Arena has shipped a new training image. Re-run `arena on-prem install` for that class, or download a fresh .tar and roll it out on your cluster, so that Setup bundle returns to Current.
Teardown
Remove the stack on your side before deleting the class in Arena:
```shell
# Docker Swarm
arena on-prem teardown MY-CLASS --setup-type dockerSwarm --manager MANAGER_HOST

# Helm
arena on-prem teardown MY-CLASS --setup-type helm
```
Then delete the class on the On-prem training cluster page if you no longer need it.
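To keep that order (stack first, class second), you can wrap teardown so the reminder to delete the class only appears after the CLI reports success. A sketch, assuming `arena` exits non-zero when teardown fails; `teardown_then_remind` is a hypothetical name, and it forwards extra flags such as `--manager MANAGER_HOST` unchanged to `arena on-prem teardown`.

```shell
#!/bin/sh
# Hypothetical wrapper: tear down the on-prem stack first, and only then
# remind you to delete the class in Arena. Extra arguments (for example
# --manager HOST on Docker Swarm) are passed through to the CLI unchanged.
teardown_then_remind() {
  local class="$1" setup_type="$2"
  shift 2
  if ! arena on-prem teardown "$class" --setup-type "$setup_type" "$@"; then
    echo "teardown of $class failed; keep the class in Arena until the stack is removed" >&2
    return 1
  fi
  echo "Stack for $class removed. You can now delete the class on the On-prem training cluster page."
}
```

For example, `teardown_then_remind MY-CLASS dockerSwarm --manager MANAGER_HOST` or `teardown_then_remind MY-CLASS helm`.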