Logs and metrics¶
Once a run has left Draft, Arena splits training output into log lines and metric charts. Both attach to the experiment.
Training logs¶
Opening logs¶
From the experiments table row menu or results views, choose View logs. Arena opens a full-page log table for that experiment.
View logs has nothing to show for Draft experiments, because a Draft has not run yet. Schedule training at least once first.
On the logs page¶
You get a searchable log table with a time range filter. Narrow the start and end times to match a mutation or evaluation window, then read the fitness lines for each agent. Use Refresh to load newer lines while a run is still going. For finished runs, the time window ends at the run’s end time; for Running or Stopping runs, it extends to the current time.
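The end-of-window rule can be sketched as a small function. This is a minimal illustration, not Arena's code: the function name `log_window_end` and its parameters are hypothetical, and only the status names come from this page.

```python
from datetime import datetime, timezone
from typing import Optional

# Status names from this page; these runs are still producing log lines.
ACTIVE_STATUSES = {"Running", "Stopping"}


def log_window_end(
    status: str,
    run_end: Optional[datetime],
    now: Optional[datetime] = None,
) -> datetime:
    """Return the upper bound of the log time range (hypothetical helper).

    Active runs extend the window to the current time; finished runs
    stop at their recorded end time.
    """
    current = now or datetime.now(timezone.utc)
    if status in ACTIVE_STATUSES or run_end is None:
        # Assumption: with no recorded end time, fall back to "now".
        return current
    return run_end
```

Passing `now` explicitly keeps the sketch deterministic in tests; a real client would normally omit it.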
Search filters log text (keywords or substrings). Search text is remembered while you stay on the page. Large queries can take a while; wait for the table to fill or refresh again.
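As a rough mental model of substring search, the sketch below filters lines in memory. It is an assumption-laden illustration: `filter_log_lines` is a hypothetical name, case-insensitive matching is assumed rather than documented, and the sample lines in the usage note are invented.

```python
from typing import Iterable, List


def filter_log_lines(lines: Iterable[str], query: str) -> List[str]:
    """Keep lines that contain `query` as a substring (hypothetical helper).

    Assumption: matching ignores case. An empty query keeps every line.
    """
    needle = query.strip().lower()
    if not needle:
        return list(lines)
    return [line for line in lines if needle in line.lower()]
```

For example, with made-up lines `["agent 0 fitness 0.81", "mutation applied", "agent 1 fitness 0.77"]`, the query `"fitness"` keeps the first and third lines.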
Metrics¶
Metrics power the Results tab, checkpoint score columns, and the Resume Experiment checkpoint picker.
When charts appear¶
| Status | What the UI does |
|---|---|
| Running or Stopping | Live metrics for in-progress runs |
| Succeeded, Successful, Stopped, Completed | Completed-run metrics |
| Failed, Draft, Pending, and others | No automatic score fetch for charts |
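The status table reads naturally as a lookup. The sketch below encodes it under stated assumptions: the status strings come from the table, while the function name `metrics_mode` and the return values `"live"`, `"completed"`, and `"none"` are hypothetical labels, not Arena identifiers.

```python
# Status groups copied from the table; the mode labels are illustrative only.
LIVE_STATUSES = {"Running", "Stopping"}
COMPLETED_STATUSES = {"Succeeded", "Successful", "Stopped", "Completed"}


def metrics_mode(status: str) -> str:
    """Map a run status to the kind of metrics fetch described in the table."""
    if status in LIVE_STATUSES:
        return "live"
    if status in COMPLETED_STATUSES:
        return "completed"
    # Failed, Draft, Pending, and any other status: no automatic score fetch.
    return "none"
```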
Where you see them¶
Project Results tab: sections for active and finished runs.
Expanded checkpoint rows: Steps, Training Score, Evaluation Score, Size.
Resume Experiment dialog: optional scores per checkpoint.
Deploy checkpoint: the “Evaluation score:” and “Training score:” fields (N/A when absent).
For Supervised and LatentPPO runs, Arena also fetches validation loss and (for object detection) pixel accuracy for checkpoint and resume views, even when those values are not shown as table columns.
Supervised vs LatentPPO score semantics¶
| Metric | Supervised | LatentPPO |
|---|---|---|
| score (training) | Task fitness (accuracy, mean IoU, MSE) | Weighted composite reward |
| fitness (evaluation) | Validation fitness | Validation mean IoU |
| pixel_accuracy | Object detection training secondary | Object detection training secondary |
| mean_iou, dice, ce, boundary | — | Per-step reward components (subset configured in run spec); training series are means of per-step batch values, not interval-pooled like Supervised score |
| advantage_mean | — | Mean PPO advantage per reporting interval |
| ppo_clip_fraction | — | Fraction of samples clipped by the PPO surrogate per interval |
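Two rows of the table are easier to follow with numbers. The sketch below contrasts a mean of per-step batch means with a pooled mean over all samples in an interval, and computes a clip fraction using the usual PPO convention (a sample counts as clipped when its probability ratio falls outside `[1 - eps, 1 + eps]`). All function names, the `eps=0.2` default, and the sample values are assumptions for illustration; Arena's exact definitions may differ.

```python
from typing import List, Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def mean_of_step_means(batches: Sequence[Sequence[float]]) -> float:
    """Average each step's batch first, then average those step values."""
    return _mean([_mean(batch) for batch in batches])


def pooled_mean(batches: Sequence[Sequence[float]]) -> float:
    """Pool every sample in the interval, then take a single mean."""
    flat: List[float] = [x for batch in batches for x in batch]
    return _mean(flat)


def clip_fraction(ratios: Sequence[float], eps: float = 0.2) -> float:
    """Share of samples whose ratio lies outside [1 - eps, 1 + eps]."""
    clipped = sum(1 for r in ratios if abs(r - 1.0) > eps)
    return clipped / len(ratios)


# Invented values: two steps with different batch sizes.
steps = [[1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
print(mean_of_step_means(steps))  # each step weighs the same
print(pooled_mean(steps))         # each sample weighs the same
print(clip_fraction([0.7, 1.0, 1.1, 1.5]))
```

The two averages diverge whenever batch sizes vary across steps, which is why a LatentPPO component series and a pooled Supervised score should not be compared point for point.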
LatentPPO object-detection runs are detected from the dataset-backed run spec (non-tabular environment plus a UNet-style decoder), not from algorithm.task_type alone.
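The detection rule can be pictured as a predicate over the run spec. Every field name below (`environment`, `kind`, `decoder`, `type`) is hypothetical, since this page does not show the run spec schema; only the rule itself (non-tabular environment plus a UNet-style decoder, with `algorithm.task_type` not decisive on its own) comes from the text.

```python
from typing import Any, Mapping


def looks_like_object_detection(run_spec: Mapping[str, Any]) -> bool:
    """Hypothetical check: non-tabular environment AND a UNet-style decoder.

    The key names are placeholders for illustration. The value of
    algorithm.task_type is deliberately not consulted.
    """
    env = run_spec.get("environment", {})
    decoder = run_spec.get("decoder", {})
    non_tabular = env.get("kind") not in (None, "tabular")
    unet_style = "unet" in str(decoder.get("type", "")).lower()
    return non_tabular and unet_style
```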
When you land on Results from Train, a pipeline stage, or a row action, Arena briefly highlights that experiment’s row, then returns the table to normal selection.
Halting and logs¶
Metrics keep updating until the run reaches a terminal status.
To stop a run, see Train, halt, and resume (Stop training).