Logs and metrics

Once an experiment has left Draft status, Arena splits its training output into log lines and metric charts, both attached to the experiment.

Training logs

Opening logs

From an experiment’s row menu in the experiments table, or from a results view, choose View logs. Arena opens a full-page log table for that experiment.

View logs has nothing useful to show for Draft experiments; schedule training at least once first.

On the logs page

The page shows a searchable log table with a time range filter. To inspect a mutation or evaluation window, narrow the start and end times to match it, then read the fitness lines for each agent. For finished runs, the time range ends at the run’s end time; for Running or Stopping runs, it extends to the current time (“now”). While a run is still going, use Refresh to load newer lines.
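The end-of-window rule for finished versus live runs can be sketched as a small helper. This is a minimal Python sketch, not Arena’s implementation. The function `log_window`, its parameters, the assumption that the window starts at the run’s start time, and the treatment of a missing end time as “still running” are all hypothetical.

```python
from datetime import datetime
from typing import Optional, Tuple

# Status names taken from this page; the full set in Arena may differ.
LIVE_STATUSES = frozenset({"Running", "Stopping"})


def log_window(
    status: str,
    run_start: datetime,
    run_end: Optional[datetime],
    now: datetime,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Return (start, end) bounds for a log query (hypothetical helper)."""
    # Running or Stopping runs extend to "now"; finished runs stop at their
    # end time. A missing end time is treated as still running (assumption).
    if status in LIVE_STATUSES or run_end is None:
        window_end = now
    else:
        window_end = run_end
    # User-chosen bounds can only narrow the run's window, not widen it.
    lo = max(start, run_start) if start is not None else run_start
    hi = min(end, window_end) if end is not None else window_end
    if lo > hi:
        raise ValueError("start time is after end time")
    return lo, hi
```

Narrowing to a mutation or evaluation window then amounts to passing `start` and `end`. Refresh on a live run corresponds to calling the helper again with a later `now`.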

Search filters log lines by keyword or substring, and the search text is kept while you stay on the page. Large queries can take a while to return; wait for the table to fill, or refresh again.

Metrics

Metrics power the Results tab, checkpoint score columns, and the Resume Experiment checkpoint picker.

When charts appear

Status	What the UI does
Running or Stopping	Live metrics for in-progress runs
Succeeded, Successful, Stopped, Completed	Completed-run metrics
Failed, Draft, Pending, and others	No automatic score fetch for charts
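As a hedged illustration, the status table above can be expressed as a lookup. The `MetricsMode` enum and the `metrics_mode` function are hypothetical names. Exact, case-sensitive matching of status strings is also an assumption.

```python
from enum import Enum


class MetricsMode(Enum):
    LIVE = "live"            # live metrics for in-progress runs
    COMPLETED = "completed"  # completed-run metrics
    NONE = "none"            # no automatic score fetch for charts


# Status groups copied from the table above.
_LIVE = frozenset({"Running", "Stopping"})
_COMPLETED = frozenset({"Succeeded", "Successful", "Stopped", "Completed"})


def metrics_mode(status: str) -> MetricsMode:
    """Map an experiment status to how its charts are populated."""
    if status in _LIVE:
        return MetricsMode.LIVE
    if status in _COMPLETED:
        return MetricsMode.COMPLETED
    # Failed, Draft, Pending, and any unlisted status fall through here.
    return MetricsMode.NONE
```

Defaulting to `NONE` matches the table’s “and others” row: an unrecognized status never triggers a chart fetch.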

Where you see them

Project Results tab: sections for active and finished runs.
Expanded checkpoint rows: Steps, Training Score, Evaluation Score, and Size columns.
Resume Experiment dialog: per-checkpoint scores, when available.
Deploy checkpoint: Evaluation score: and Training score: values, shown as N/A when a score is absent.

For Supervised and LatentPPO runs, Arena also fetches validation loss and (for object detection) pixel accuracy for checkpoint and resume views, even when those values are not shown as table columns.
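Pixel accuracy is commonly defined as the fraction of pixels whose predicted class equals the ground-truth class. The sketch below applies that common definition to small integer label maps. This page does not say how Arena computes the value, so treat it as an assumption. `pixel_accuracy` is a hypothetical helper.

```python
from typing import Sequence


def pixel_accuracy(
    pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]
) -> float:
    """Fraction of pixels whose predicted class id equals the target id."""
    if len(pred) != len(target):
        raise ValueError("prediction and target heights differ")
    correct = total = 0
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("prediction and target widths differ")
        for p, t in zip(pred_row, target_row):
            correct += int(p == t)  # 1 when the predicted class matches
            total += 1
    if total == 0:
        raise ValueError("empty label map")
    return correct / total
```

For a 2×2 map where three of the four pixels match, the result is 0.75.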

Supervised vs LatentPPO score semantics

Metric	Supervised	LatentPPO
score (training)	Task fitness (accuracy, mean IoU, MSE)	Weighted composite reward
fitness (evaluation)	Validation fitness	Validation mean IoU
pixel_accuracy	Secondary training metric (object detection only)	Secondary training metric (object detection only)
mean_iou, dice, ce, boundary	—	Per-step reward components (the subset is configured in the run spec); training series are means of per-step batch values, not pooled over each interval as the Supervised score is
advantage_mean	—	Mean PPO advantage per reporting interval
ppo_clip_fraction	—	Fraction of samples clipped by the PPO surrogate per interval
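The table distinguishes between two ways of aggregating a reporting interval:

* Interval-pooled values, as used for the Supervised score, weight every sample equally.
* Means of per-step batch values, as used for the LatentPPO reward components, weight every step equally.

The two diverge when batch sizes vary. The sketch below shows both aggregations alongside the usual PPO diagnostics. All function names are hypothetical. `clip_fraction` uses the definition commonly reported by PPO implementations (ratio outside `[1 - eps, 1 + eps]`), which is an assumption about Arena.

```python
from statistics import fmean
from typing import Sequence


def pooled_mean(sums: Sequence[float], counts: Sequence[int]) -> float:
    """Interval-pooled value: every sample in the interval weighs the same."""
    return sum(sums) / sum(counts)


def mean_of_step_means(sums: Sequence[float], counts: Sequence[int]) -> float:
    """Mean of per-step batch values: every step weighs the same."""
    return fmean(s / c for s, c in zip(sums, counts))


def clip_fraction(ratios: Sequence[float], clip_eps: float = 0.2) -> float:
    """Share of samples whose ratio pi_new/pi_old lies outside [1-eps, 1+eps]."""
    if not ratios:
        raise ValueError("no samples in interval")
    return sum(abs(r - 1.0) > clip_eps for r in ratios) / len(ratios)


def advantage_mean(advantages: Sequence[float]) -> float:
    """Mean PPO advantage over one reporting interval."""
    return fmean(advantages)
```

Consider two steps with 9 of 10 and 1 of 2 samples correct. The pooled value is 10/12 ≈ 0.833, but the mean of step means is (0.9 + 0.5) / 2 = 0.7. This is why Supervised and LatentPPO curves are not directly comparable.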

LatentPPO object-detection runs are detected from the dataset-backed run spec (non-tabular environment plus a UNet-style decoder), not from algorithm.task_type alone.
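The detection rule can be sketched as a structural check on the run spec. The spec layout is entirely hypothetical, including the `algorithm.name`, `environment.type`, `environment.dataset`, and `model.decoder` keys and matching the decoder by name. This sketch does not consult `algorithm.task_type` at all. The real rule may combine it with the structural checks.

```python
from typing import Any, Mapping


def is_latentppo_object_detection(spec: Mapping[str, Any]) -> bool:
    """Structural check over a hypothetical run-spec layout."""
    algorithm = spec.get("algorithm", {})
    environment = spec.get("environment", {})
    model = spec.get("model", {})
    if algorithm.get("name") != "LatentPPO":
        return False
    # Dataset-backed, non-tabular environment (hypothetical keys).
    dataset_backed = bool(environment.get("dataset"))
    non_tabular = environment.get("type", "tabular") != "tabular"
    # UNet-style decoder, matched by name (hypothetical convention).
    unet_decoder = "unet" in str(model.get("decoder", "")).lower()
    # algorithm.task_type is not consulted, so a mislabelled task_type
    # cannot turn this on or off by itself.
    return dataset_backed and non_tabular and unet_decoder
```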

When you land on Results from Train, a pipeline stage, or a row action, Arena briefly highlights that experiment’s row, then returns the table to normal selection.

Halting and logs

After you stop a run, its metrics keep updating while it is Stopping, until the run reaches a terminal status.

To stop a run, see Train, halt, and resume (Stop training).