On-prem compute
Enterprise only
On-prem compute is available to organizations on an Enterprise plan. Without it, the On-prem training cluster menu item is hidden. See Plan permissions.
Enterprise organizations can run training on hardware they operate: a private data centre, a GPU rack, or a Kubernetes cluster in their own VPC. Arena still hosts the UI, experiment state, and scheduling; jobs execute on your machines once you connect them with a setup bundle from the platform.
On-prem classes appear on the same experiment Resources step as cloud classes.
Who can use it
| Who | Requirement | What they can do |
|---|---|---|
| Manager on an Enterprise org | Profile menu → On-prem training cluster | Enable the provider, add and edit resource classes, download setup bundles, copy install commands |
| Any org member on Enterprise | Provider enabled and at least one class enabled | Pick on-prem classes when building or editing experiments |
Organizations that are not on Enterprise do not see On-prem training cluster in the profile menu. Members cannot open the training cluster page; they should ask a Manager to configure on-prem for the org.
Moving an org to Enterprise is a contract or billing change (see Credits and plans), not a self-serve toggle.
Prerequisites
Before you connect hardware:
An Enterprise organization with a Manager who can open On-prem training cluster
Arena CLI installed if you plan to run install from your laptop (pip install agilerl, then arena login)
Outbound network from your cluster to Arena (for the private tunnel Arena sets up in the install bundle)
Hardware that matches the class you define: for Advanced Training projects, worker classes need at least one GPU per worker node
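A quick way to sanity-check the outbound network prerequisite is to attempt a plain TCP connection from a cluster host. The sketch below is only an approximation: the helper name `can_reach`, the default port 443, and any hostname you pass in are assumptions for illustration, and a successful TCP connect does not prove that the private tunnel in the install bundle will come up.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper, not part of the Arena CLI. The port is an assumption;
    use whatever endpoint your own setup bundle actually targets.
    """
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Run it on a cluster host with the endpoint from your own environment, for example `can_reach("your-arena-endpoint", 443)`, where the hostname is a placeholder.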
Install with either Docker Swarm or Helm on Kubernetes; pick one path per cluster, not both. You only need to meet the requirements for the path you choose:
| Path | You need |
|---|---|
| Docker Swarm | A Swarm manager host and worker hosts reachable by SSH from where you run install |
| Helm | A Kubernetes cluster and a working local |
Multi-worker pools often need shared storage (typically NFS) so the Ray head and workers see the same paths. The download bundle does not set this up for you; see Shared storage in the install guide.
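Before installing, it can help to confirm that the shared path is mounted and writable on every node. The following is a minimal sketch under stated assumptions: `shared_path_is_writable` is a hypothetical helper, not an Arena tool, and it only tests the node it runs on, so you would run it on the head and on each worker with the same path.

```python
import uuid
from pathlib import Path


def shared_path_is_writable(path: str) -> bool:
    """Write, read back, and delete a small marker file under path.

    Hypothetical check: it confirms the directory exists and accepts writes
    on this node. It does not prove that other nodes see the same files.
    """
    directory = Path(path)
    if not directory.is_dir():
        return False
    # Unique name so concurrent checks from several nodes do not collide.
    marker = directory / f".share-check-{uuid.uuid4().hex}"
    try:
        marker.write_text("ok")
        ok = marker.read_text() == "ok"
        marker.unlink()
        return ok
    except OSError:
        return False
```

A result of `False` on any node usually means the mount is missing or read-only there, which is worth fixing before running install.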
Configure in Arena
Open profile menu → On-prem training cluster.
Click Enable on-prem (one provider per organization).
Click Add resource class and set Name, worker CPU/GPU/memory, and Number of nodes (see Resource classes).
You can add many resource classes to the same org. Each one is a separate worker pool with its own install bundle and its own Config install on your hardware. For example, you might create one class for an NVIDIA L4 rack and another for A100 machines, using Name (and the optional Description) to label the hardware (nvidia-l4, nvidia-a100). Arena has no GPU model picker for on-prem, so set CPUs, GPUs (count), and Memory per worker to match those machines, then run install separately for each class row.
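If you keep a record of your classes outside the UI, a small local model can catch mismatches before you type values into the table. This is an illustrative sketch only: `ResourceClass` and `validate` are hypothetical names, not an Arena API, the memory unit is assumed to be gigabytes, and the CPU, GPU, memory, and node figures are placeholders rather than recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceClass:
    """Local mirror of one class row; hypothetical, not an Arena object."""
    name: str
    cpus: int        # CPUs per worker
    gpus: int        # GPU count per worker
    memory_gb: int   # memory per worker (unit assumed for this sketch)
    nodes: int       # number of nodes
    description: str = ""


def validate(rc: ResourceClass, advanced_training: bool = False) -> list[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    if not rc.name:
        problems.append("name is empty")
    if rc.cpus < 1 or rc.memory_gb < 1 or rc.nodes < 1:
        problems.append("cpus, memory_gb, and nodes must be at least 1")
    if rc.gpus < 0:
        problems.append("gpus cannot be negative")
    if advanced_training and rc.gpus < 1:
        # Mirrors the prerequisite stated for Advanced Training projects.
        problems.append("Advanced Training needs at least one GPU per worker node")
    return problems


# Placeholder figures: replace with the real specs of your machines.
classes = [
    ResourceClass("nvidia-l4", cpus=8, gpus=1, memory_gb=32, nodes=4),
    ResourceClass("nvidia-a100", cpus=32, gpus=8, memory_gb=256, nodes=2),
]
for rc in classes:
    print(rc.name, validate(rc, advanced_training=True))
```

Keeping one entry per class row makes it easier to remember that each row needs its own install run.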
Disabling the provider hides on-prem classes from new experiment runs until you enable it again. Existing class definitions stay in the table.
Install on your hardware
Arena generates the connection settings in each class’s setup bundle, so you do not configure AgileRL’s gateway manually.
Pick one path:
| Path | When to use | Prerequisite |
|---|---|---|
| UI + Config panel | You created the class in the table and want copy-paste commands | Provider enabled, class saved; expand Config on the row |
| Arena CLI from terminal | Automate install from your laptop | CLI installed ( |
| Download setup (.tar) | Air-gapped hosts or no Arena CLI on your machine | Same as UI path: class must exist; extract on a bastion and run |
Re-run install or re-download when the class row shows Update recommended.
Full steps: Install a cluster.
After install
Confirm Setup bundle shows Current on the class row.
Start or edit an experiment and pick your on-prem class on the Resources step.
More detail
Install a cluster: end-to-end Swarm, Helm, CLI, and .tar paths
Training cluster page: provider toggle, class table, deployment panel
Resource classes: fields, enable/disable, experiment picker