Supervised training

Some Advanced Training experiments minimize supervised loss on labeled data instead of running a full RL loop. The Agent step exposes the catalog entries below.

Enterprise only

Supervised and LatentPPO run on tabular and non-tabular datasets, which require the organization to be on an Enterprise plan. SFT runs on language datasets and needs only a plan that includes Advanced Training. See Plan permissions.

Algorithms on Agent

| Name | Typical use |
| --- | --- |
| Supervised | Tabular or non-tabular datasets with input and target columns |
| SFT | Language SFT datasets with prompt and target columns |
| LatentPPO | Latent module trained between pretrained blocks; shares much of the supervised wizard flow |

Which dataset unlocks which algorithm

| Dataset on Environment | Algorithms you usually see |
| --- | --- |
| Tabular | Supervised only |
| Non-tabular object detection (with a prior Supervised saved model) | Supervised and LatentPPO |
| Other non-tabular | Supervised |
| SFT | SFT |
| Reasoning or preference | RL-style options in LLM algorithms, not this page |
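The dataset-to-algorithm mapping above can be read as a small lookup. The sketch below is illustrative only: the function name `algorithms_for`, its parameters, and the string labels are hypothetical and are not part of the product's API.

```python
from typing import List, Optional


def algorithms_for(
    dataset_kind: str,
    task: Optional[str] = None,
    has_supervised_saved_model: bool = False,
) -> List[str]:
    """Return the algorithms the table above lists for a dataset.

    All names and labels here are assumptions made for illustration.
    """
    if dataset_kind == "tabular":
        return ["Supervised"]
    if dataset_kind == "non-tabular":
        # LatentPPO is listed only for object detection datasets that
        # already have a prior Supervised saved model.
        if task == "object_detection" and has_supervised_saved_model:
            return ["Supervised", "LatentPPO"]
        return ["Supervised"]
    if dataset_kind == "sft":
        return ["SFT"]
    if dataset_kind in ("reasoning", "preference"):
        # RL-style options are covered under LLM algorithms, not here.
        return []
    raise ValueError(f"unknown dataset kind: {dataset_kind!r}")


print(algorithms_for("non-tabular", "object_detection", True))
```

Returning an empty list for reasoning and preference datasets reflects only that those options are documented elsewhere, not that no algorithm exists for them.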

Supervised and SFT still use the full wizard: Agent (algorithm and network), Environment (dataset binding), Training (steps, batch size, optimizer-related fields), and Resources when you need specific compute.

Metrics on Results

Default charts depend on dataset task type and algorithm:

| Algorithm | Typical default training charts |
| --- | --- |
| Supervised (classification) | Training score (accuracy), evaluation fitness, loss, validation loss, steps |
| Supervised (object detection) | Training score (mean IoU), pixel accuracy, validation pixel accuracy, loss, validation loss, steps |
| LatentPPO (object detection) | Composite reward (training score), evaluation mean IoU, loss decomposition (loss_seg, loss_policy, loss_kl), configured reward components (mean_iou, dice, ce, boundary), advantage mean, PPO clip fraction, pixel accuracy when applicable, steps |
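For readers unfamiliar with the object detection charts, the sketch below shows one common way to compute pixel accuracy and mean IoU from per-pixel class labels. It uses the standard textbook definitions and is an assumption about how such charts are typically derived; the platform's exact computation (for example, how it treats classes absent from a batch) is not documented here, and the function names are hypothetical.

```python
from typing import Sequence


def pixel_accuracy(pred: Sequence[int], target: Sequence[int]) -> float:
    """Fraction of pixels whose predicted class equals the target class."""
    correct = sum(1 for p, t in zip(pred, target) if p == t)
    return correct / len(target)


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Average intersection-over-union across classes.

    Classes that appear in neither pred nor target are skipped
    (an assumption; other conventions exist).
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Flattened 2x2 masks with two classes (0 and 1).
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
print(pixel_accuracy(pred, target))  # 0.75
print(mean_iou(pred, target, num_classes=2))
```

In this example class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is roughly 0.583 even though three of four pixels are correct, which is why the two charts can diverge.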

Score vs fitness: for Supervised, the score chart tracks the fitness metric computed on training data (accuracy, mean IoU, or MSE, depending on the task). For LatentPPO, score is the weighted composite reward during training; use the fitness chart for validation mean IoU.
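To make the "weighted composite reward" concrete, here is a minimal sketch that combines the configured reward components as a weighted sum. The weights, the component values, and the sign given to `ce` are invented for illustration; the platform's actual weighting, scaling, and normalization are not specified on this page.

```python
from typing import Dict


def composite_reward(components: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum over the components that have a configured weight."""
    return sum(weights[name] * components[name] for name in weights)


# Hypothetical per-step component values and weights.
components = {"mean_iou": 0.6, "dice": 0.7, "ce": 0.4, "boundary": 0.5}
weights = {"mean_iou": 0.5, "dice": 0.3, "ce": -0.1, "boundary": 0.1}

score = composite_reward(components, weights)
print(round(score, 4))  # 0.52
```

Because the score blends several terms, it can rise while validation mean IoU stays flat, which is the reason to check the fitness chart separately.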

After training, deploy Supervised or LatentPPO checkpoints from Agents > Advanced Training (Connect, then predict snippets). See Create and deploy an agent.