Supervised training

Some Advanced Training experiments minimize a supervised loss on labeled data instead of running a full RL loop. The Agent step offers the algorithms listed below for these experiments.

Enterprise only

Supervised and LatentPPO run on tabular and non-tabular datasets and require the organization to be on an Enterprise plan. SFT runs on language datasets and needs only a plan that includes Advanced Training. See Plan permissions.

Algorithms on Agent

Name	Typical use
Supervised	Tabular or non-tabular datasets with input and target columns
SFT	Language SFT datasets with prompt and target columns
LatentPPO	Latent module trained between pretrained blocks; shares much of the supervised wizard flow

Which dataset unlocks which algorithm

Dataset on Environment	Algorithms you usually see
Tabular	Supervised only
Non-tabular object detection (with a prior Supervised saved model)	Supervised and LatentPPO
Other non-tabular	Supervised
SFT	SFT
Reasoning or preference	RL-style options under LLM algorithms (not covered on this page)
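The table amounts to a simple lookup. The sketch below restates it in Python; `DatasetKind` and `usual_algorithms` are hypothetical names, not part of any platform API. The fallback for an object detection dataset without a prior Supervised saved model (Supervised only) is an assumption, since the table does not list that case.

```python
from enum import Enum


class DatasetKind(Enum):
    # Hypothetical labels for the dataset types in the table.
    TABULAR = "tabular"
    NON_TABULAR_OBJECT_DETECTION = "non_tabular_object_detection"
    NON_TABULAR_OTHER = "non_tabular_other"
    SFT = "sft"
    REASONING_OR_PREFERENCE = "reasoning_or_preference"


def usual_algorithms(kind: DatasetKind, has_prior_supervised_model: bool = False) -> list[str]:
    """Return the algorithms usually offered on the Agent step (hypothetical helper)."""
    if kind is DatasetKind.TABULAR:
        return ["Supervised"]
    if kind is DatasetKind.NON_TABULAR_OBJECT_DETECTION:
        # LatentPPO needs a previously saved Supervised model.
        # Assumption: without one, only Supervised is offered.
        if has_prior_supervised_model:
            return ["Supervised", "LatentPPO"]
        return ["Supervised"]
    if kind is DatasetKind.NON_TABULAR_OTHER:
        return ["Supervised"]
    if kind is DatasetKind.SFT:
        return ["SFT"]
    # Reasoning and preference datasets use RL-style LLM algorithms instead.
    return []


print(usual_algorithms(DatasetKind.NON_TABULAR_OBJECT_DETECTION, has_prior_supervised_model=True))
# ['Supervised', 'LatentPPO']
```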

Supervised and SFT still use the full wizard: Agent (algorithm and network), Environment (dataset binding), Training (steps, batch size, optimizer-related fields), and Resources when you need specific compute.
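The Training fields correspond to a standard minibatch loop: each step samples `batch_size` labeled examples and takes one optimizer update on the supervised loss. The sketch below is a minimal illustration, assuming plain SGD on mean squared error for a one-feature linear model. It is not the platform's trainer or its default optimizer, and `train_supervised`, `lr`, and `seed` are illustrative names.

```python
import random


def train_supervised(xs, ys, steps=500, batch_size=8, lr=0.05, seed=0):
    """Fit y ~ w * x + b by minibatch SGD on mean squared error (illustrative only)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(steps):
        # One step = one gradient update on a randomly sampled minibatch.
        batch = rng.sample(data, min(batch_size, len(data)))
        grad_w = grad_b = 0.0
        for x, y in batch:
            err = (w * x + b) - y
            grad_w += 2 * err * x / len(batch)
            grad_b += 2 * err / len(batch)
        # The learning rate stands in for the "optimizer-related fields".
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the fitted line on the given data."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [i / 10 for i in range(20)]
ys = [3 * x + 1 for x in xs]
w, b = train_supervised(xs, ys, steps=2000, batch_size=8, lr=0.05)
print(round(w, 2), round(b, 2))
```

Raising `steps` gives the optimizer more updates. Raising `batch_size` makes each gradient estimate less noisy, at a higher cost per step.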

Metrics on Results

Default charts depend on the dataset task type and the algorithm:

Algorithm	Typical default training charts
Supervised (classification)	Training score (accuracy), evaluation fitness, loss, validation loss, steps
Supervised (object detection)	Training score (mean IoU), pixel accuracy, validation pixel accuracy, loss, validation loss, steps
LatentPPO (object detection)	Composite reward (training score), evaluation mean IoU, loss decomposition (`loss_seg`, `loss_policy`, `loss_kl`), configured reward components (`mean_iou`, `dice`, `ce`, `boundary`), advantage mean, PPO clip fraction, pixel accuracy when applicable, steps
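The object detection charts report pixel-level scores, which suggests per-pixel class masks. Assuming that, the functions below compute pixel accuracy, mean IoU, and mean Dice from flattened predicted and target masks. They also compute the PPO clip fraction as it is commonly defined: the share of samples whose probability ratio falls outside `[1 - clip_eps, 1 + clip_eps]`. These follow common conventions, not confirmed platform definitions. For example, the handling of background and of classes absent from both masks may differ, and all function names are hypothetical.

```python
def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted class equals the target class."""
    assert len(pred) == len(target) > 0
    return sum(p == t for p, t in zip(pred, target)) / len(target)


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes present in either mask."""
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and t == c for p, t in zip(pred, target))
        union = sum(p == c or t == c for p, t in zip(pred, target))
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def mean_dice(pred, target, num_classes):
    """Mean Dice coefficient (2|A and B| / (|A| + |B|)) across present classes."""
    scores = []
    for c in range(num_classes):
        inter = sum(p == c and t == c for p, t in zip(pred, target))
        size = sum(p == c for p in pred) + sum(t == c for t in target)
        if size > 0:
            scores.append(2 * inter / size)
    return sum(scores) / len(scores) if scores else 0.0


def clip_fraction(ratios, clip_eps=0.2):
    """Share of PPO probability ratios outside [1 - clip_eps, 1 + clip_eps]."""
    return sum(abs(r - 1.0) > clip_eps for r in ratios) / len(ratios)


pred = [0, 0, 1, 1]    # flattened predicted mask
target = [0, 1, 1, 1]  # flattened ground-truth mask
print(pixel_accuracy(pred, target))        # 0.75
print(mean_iou(pred, target, 2))           # about 0.583 (7/12)
print(mean_dice(pred, target, 2))          # about 0.733 (11/15)
print(clip_fraction([0.7, 1.0, 1.1, 1.3])) # 0.5
```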

Score vs fitness: for Supervised, the score chart tracks the training fitness metric (for example accuracy, mean IoU, or MSE). For LatentPPO, the score is the weighted composite reward during training; use the fitness chart to track validation mean IoU.
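The LatentPPO score combines the configured reward components with weights. The sketch below is a minimal illustration that assumes a plain weighted sum over components already oriented so that higher is better (for instance, a cross-entropy term entered as a negated value). The platform's actual weighting, normalization, and sign conventions are not documented here. `composite_reward` is a hypothetical helper, and the values and weights are made up for illustration.

```python
def composite_reward(components, weights):
    """Weighted sum of reward components (hypothetical helper).

    Assumes every component is oriented so that higher is better.
    Components without a configured weight are ignored.
    """
    return sum(weights[name] * value for name, value in components.items() if name in weights)


# Illustrative per-batch component values and weights (not platform defaults).
components = {"mean_iou": 0.6, "dice": 0.7, "ce": -0.4, "boundary": 0.5}
weights = {"mean_iou": 0.5, "dice": 0.3, "ce": 0.1, "boundary": 0.1}

training_score = composite_reward(components, weights)
print(round(training_score, 2))  # 0.52
```

Because the score mixes several terms, it can rise while validation mean IoU stalls, so it is worth reading the fitness chart alongside it.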

After training, deploy Supervised or LatentPPO checkpoints from Agents → Advanced Training (use Connect, then the predict snippets). See Create and deploy an agent.
