Classic vs Advanced Training
When you create a project, the Create Project dialog includes a Training switcher with two options: Classic RL and Advanced Training. That choice is fixed for the life of the project. Every experiment in the project uses the same mode.
Classic RL (Training switcher on Classic RL)
Use this when training agents in Gym-style environments with standard RL algorithms.
Environment step: built-in environments plus org custom environments from Environments in the sidebar.
Agent step: classic algorithms (DQN, Rainbow DQN, PPO, Recurrent PPO, DDPG, and TD3, plus MADDPG, MATD3, and IPPO for multi-agent environments). The Neural network section holds layer settings; algorithm hyperparameters are split across the Standard and Advanced accordions. Optional hyperparameter mutation is configured on the HPO step.
Agents page: deployments appear under Classic RL.
No Pipelines tab on the project page.
The quickstart tutorial assumes Classic RL.
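As a rough illustration of what "Gym-style" means in the list above, the sketch below shows a toy environment that follows the common reset/step convention (the five-value `step` return used by Gymnasium). The names `CoinFlipEnv` and `run_episode` are hypothetical and exist only for this example; this is not the platform's custom-environment API, which this page does not describe.

```python
import random


class CoinFlipEnv:
    """Toy environment following the Gym-style reset/step convention (illustrative only)."""

    def __init__(self, max_steps=5, seed=0):
        self.max_steps = max_steps
        self._rng = random.Random(seed)
        self._steps = 0
        self._coin = 0

    def reset(self):
        # Start a new episode and return (observation, info).
        self._steps = 0
        self._coin = self._rng.randint(0, 1)
        return self._coin, {}

    def step(self, action):
        # Reward 1.0 when the action matches the coin the agent just observed.
        reward = 1.0 if action == self._coin else 0.0
        self._steps += 1
        self._coin = self._rng.randint(0, 1)
        terminated = False  # this toy task has no terminal state
        truncated = self._steps >= self.max_steps  # episode ends at the step limit
        return self._coin, reward, terminated, truncated, {}


def run_episode(env, policy):
    # Standard interaction loop: observe, act, collect reward, until the episode ends.
    obs, _ = env.reset()
    total = 0.0
    done = False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total
```

A trained agent plays the role of `policy` here: it maps each observation to an action, and the algorithm chosen on the Agent step determines how that mapping is learned.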
Advanced Training (Training switcher on Advanced Training)
Use this for dataset-driven and LLM workflows, simulation-style Gym training with advanced algorithms, and multi-stage pipelines.
Environment step: pick a dataset type (reasoning, preference, SFT, tabular, non-tabular) or, for some setups, a Gym environment inside an advanced experiment.
Agent step: LLM-oriented algorithms and extra wizard fields. GPU resources are required before algorithm selection.
Pipelines tab on the project page chains Advanced Training experiments into staged workflows.
Agents page: dataset-backed deployments under Advanced Training.
Datasets in the sidebar is where you upload and validate data for these runs.
Some dataset types and features depend on your plan. See Plan permissions.
Comparison

| | Classic RL | Advanced Training |
|---|---|---|
| Set at | Project creation (Training switcher) | Project creation (Training switcher) |
| Primary input | Gym environment | Dataset (or Gym in advanced flows) |
| Pipelines | No | Yes |
| Typical algorithms | DQN, PPO, TD3, … (nine classic trainers) | GRPO, DPO, SFT, Supervised, … |
| Move experiment to other project | Only within same mode | Only within same mode |
When you move an experiment, the experiment list offers only destination projects that use the same mode.
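The same-mode rule can be pictured with a small model. This is a minimal sketch under stated assumptions: `TrainingMode`, `Project`, and `eligible_destinations` are hypothetical names invented for illustration and do not correspond to any documented API. The frozen dataclass mirrors the fact that a project's mode cannot change after creation.

```python
from dataclasses import dataclass
from enum import Enum


class TrainingMode(Enum):
    CLASSIC_RL = "Classic RL"
    ADVANCED_TRAINING = "Advanced Training"


@dataclass(frozen=True)  # frozen: the mode is fixed once the project exists
class Project:
    name: str
    mode: TrainingMode


def eligible_destinations(source, projects):
    # Only other projects in the same mode can receive the experiment.
    return [p for p in projects if p.mode is source.mode and p != source]


projects = [
    Project("cartpole-benchmarks", TrainingMode.CLASSIC_RL),
    Project("lunar-lander", TrainingMode.CLASSIC_RL),
    Project("reasoning-finetune", TrainingMode.ADVANCED_TRAINING),
]
destinations = eligible_destinations(projects[0], projects)
```

In this example, an experiment in `cartpole-benchmarks` could move only to `lunar-lander`, because `reasoning-finetune` is an Advanced Training project.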
Choosing a mode
Choose Classic RL when you are benchmarking algorithms on standard or custom Gym environments.
Choose Advanced Training when you need Arena datasets, LLM training, reasoning rewards, preference data, or pipelines.
If you are unsure, start with Classic RL: its wizard is smaller and it gives you the built-in environment catalog. You can create a separate Advanced Training project later without touching existing work.
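The guidance above amounts to a simple decision rule, sketched here as a helper. The function name `suggest_mode` and the requirement labels are hypothetical, chosen only to mirror the wording of this section; they are not product settings.

```python
# Requirements that this section associates with Advanced Training (labels are illustrative).
ADVANCED_NEEDS = frozenset(
    {"datasets", "llm_training", "reasoning_rewards", "preference_data", "pipelines"}
)


def suggest_mode(needs):
    """Return the suggested mode for a collection of requirement labels."""
    if ADVANCED_NEEDS & set(needs):
        return "Advanced Training"
    # Gym benchmarking, or no stated needs (unsure): default to Classic RL.
    return "Classic RL"
```

For example, `suggest_mode(["pipelines"])` suggests Advanced Training, while `suggest_mode([])` falls back to Classic RL, matching the advice to start there when unsure.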