Logs and metrics

Once a run has left Draft, Arena splits training output into log lines and metric charts. Both attach to the experiment.

Training logs

Opening logs

From the experiments table row menu or results views, choose View logs. Arena opens a full-page log table for that experiment.

View logs does nothing useful for Draft experiments. Schedule training at least once first.

On the logs page

You get a searchable log table with a time range filter. Narrow start and end times to match a mutation or evaluation window, then read fitness lines for each agent. Use Refresh to load newer lines while a run is still going. Finished runs use the run’s end time; Running or Stopping runs extend the window to “now.”

Search filters log text (keywords or substrings). Search text is remembered while you stay on the page. Large queries can take a while; wait for the table to fill or refresh again.

Metrics

Metrics power the Results tab, checkpoint score columns, and the Resume Experiment checkpoint picker.

When charts appear

Status

What the UI does

Running or Stopping

Live metrics for in-progress runs

Succeeded, Successful, Stopped, Completed

Completed-run metrics

Failed, Draft, Pending, and others

No automatic score fetch for charts

Where you see them

  • Project Results tab: sections for active and finished runs.

  • Expanded checkpoint rows: Steps, Training Score, Evaluation Score, Size.

  • Resume Experiment dialog: optional scores per checkpoint.

  • Deploy checkpoint: Evaluation score: and Training score: (N/A when absent).

For Supervised and LatentPPO runs, Arena also fetches validation loss and (for object detection) pixel accuracy for checkpoint and resume views, even when those values are not shown as table columns.

Supervised vs LatentPPO score semantics

Metric

Supervised

LatentPPO

score (training)

Task fitness (accuracy, mean IoU, MSE)

Weighted composite reward

fitness (evaluation)

Validation fitness

Validation mean IoU

pixel_accuracy

Object detection training secondary

Object detection training secondary

mean_iou, dice, ce, boundary

Per-step reward components (subset configured in run spec); training series are means of per-step batch values, not interval-pooled like Supervised score

advantage_mean

Mean PPO advantage per reporting interval

ppo_clip_fraction

Fraction of samples clipped by the PPO surrogate per interval

LatentPPO object-detection runs are detected from the dataset-backed run spec (non-tabular environment plus a UNet-style decoder), not from algorithm.task_type alone.

When you land on Results from Train, a pipeline stage, or a row action, Arena briefly highlights that experiment’s row, then returns the table to normal selection.

Halting and logs

Metrics keep updating until the run reaches a terminal status.

Stopping: Train, halt, and resume (Stop training).