Upload an object detection dataset¶

Object detection datasets are non-tabular datasets whose labels are a folder of .tiff segmentation masks. After upload they train with Supervised (and LatentPPO) on Advanced Training.

Enterprise only

Non-tabular datasets require an Enterprise plan on the organization. See Tabular and non-tabular access.

Prerequisites¶

An Advanced Training project on an Enterprise organization.
A dataset directory on disk with a single image folder of samples plus a folder of .tiff masks. Each mask filename stem matches its image’s filename stem. The folder names are up to you — you’ll select which folder holds the targets in the modal. This tutorial uses images/ and targets/ as examples.

Expected layout¶

Object detection uses one image feature (the images/ folder, mapped to the model’s image modality) and a folder of mask targets:

Dataset Root/
  images/         # → "image" modality: sample_01.png, sample_02.png, …
  targets/        # .tiff masks: sample_01.tiff, sample_02.tiff, … (any folder name)

You select the mask folder on the Targets step, and preprocessing uses that selection — so the folder can have any name (targets/ here is just an example).

1. Open Datasets¶

Click Datasets in the sidebar, then New dataset.

Sidebar with Datasets highlighted — **Datasets** in the sidebar.¶

2. Choose the type¶

Pick category Other, then Non-tabular. Category Other is disabled unless the organization is on an Enterprise plan.

Create dataset modal with Other and Non-tabular selected — **Other** category and **Non-tabular** type with **Task Type** visible.¶

3. Set the task type¶

Set Task Type to Object detection. The modal switches to the mask-folder targets flow (a single .parquet targets file is not used for detection).

Task Type set to Object Detection — **Object Detection** selected as the task type.¶

4. Upload the whole directory¶

On Files, upload your dataset directory in one go — both the images/ feature folder and the targets/ folder of .tiff masks. The diagram shows the expected object-detection layout, and each subfolder is listed under Detected directories. Large uploads continue in the background while you keep working.

Directory upload with the detected images and targets folders — The `images` and `targets` directories detected after the directory upload.¶

5. Select the targets¶

On Targets, you don’t upload again — you pick which of the directories you just uploaded holds your targets. Arena pre-selects the most likely folder (here, targets); change it if needed.

Targets step with the targets folder selected from the upload — **Targets** step with the `targets` mask folder selected from your upload.¶

6. Map the feature and save¶

On Feature Mapping, pick a supported vision model (for example ResNet50) and map the images feature to the image modality. The targets folder you selected on the previous step is excluded from feature mapping automatically. Save the mapping.

Feature mapping with the images feature set to the image modality — **Feature Mapping** with a vision model and the `images` feature mapped to **image**.¶

7. Saved dataset¶

Confirm save. The dataset appears in the list for use in Advanced Training experiments.

Datasets list with new object detection dataset — The dataset listed after save.¶

8. Preprocess the dataset¶

Preprocessing runs the dataset’s selected encoder over the raw files once and stores the encoded result. Every experiment then trains on ready-made tensors instead of re-encoding images and masks on each run. For object detection, preprocessing encodes the feature images and prepares the mask targets. You can run a full rebuild or an incremental run that only encodes newly added files.

Open the dataset and switch to the Preprocessing tab. The tab appears for tabular and non-tabular datasets on Enterprise plans.

Preprocessing tab empty state with Preprocess dataset button — **Preprocessing** tab before the first run.¶

Click Preprocess dataset to schedule a job. Arena spins up a cluster, encodes the dataset, and reports progress on this tab; metrics appear when the job finishes. When the status shows Preprocessing completed successfully, the dataset is ready to attach in an Advanced Training experiment.

Preprocessing job in progress on the Preprocessing tab — Preprocessing job in progress.¶

Upload an object detection dataset¶

Prerequisites¶

Expected layout¶

1. Open Datasets¶

2. Choose the type¶

3. Set the task type¶

4. Upload the whole directory¶

5. Select the targets¶

6. Map the feature and save¶

7. Saved dataset¶

8. Preprocess the dataset¶

Related¶