Upload an object detection dataset

Object detection datasets are non-tabular datasets whose labels are a folder of .tiff segmentation masks. After upload they train with Supervised (and LatentPPO) on Advanced Training.

Enterprise only

Non-tabular datasets require an Enterprise plan on the organization. See Tabular and non-tabular access.

Prerequisites

  • An Advanced Training project on an Enterprise organization.

  • A dataset directory on disk with a single image folder of samples plus a folder of .tiff masks. Each mask filename stem matches its image’s filename stem. The folder names are up to you — you’ll select which folder holds the targets in the modal. This tutorial uses images/ and targets/ as examples.

Expected layout

Object detection uses one image feature (the images/ folder, mapped to the model’s image modality) and a folder of mask targets:

Dataset Root/
  images/         # → "image" modality: sample_01.png, sample_02.png, …
  targets/        # .tiff masks: sample_01.tiff, sample_02.tiff, … (any folder name)

You select the mask folder on the Targets step, and preprocessing uses that selection — so the folder can have any name (targets/ here is just an example).

1. Open Datasets

Click Datasets in the sidebar, then New dataset.

Sidebar with Datasets highlighted

Datasets in the sidebar.

2. Choose the type

Pick category Other, then Non-tabular. Category Other is disabled unless the organization is on an Enterprise plan.

Create dataset modal with Other and Non-tabular selected

Other category and Non-tabular type with Task Type visible.

3. Set the task type

Set Task Type to Object detection. The modal switches to the mask-folder targets flow (a single .parquet targets file is not used for detection).

Task Type set to Object Detection

Object Detection selected as the task type.

4. Upload the whole directory

On Files, upload your dataset directory in one go — both the images/ feature folder and the targets/ folder of .tiff masks. The diagram shows the expected object-detection layout, and each subfolder is listed under Detected directories. Large uploads continue in the background while you keep working.

Directory upload with the detected images and targets folders

The images and targets directories detected after the directory upload.

5. Select the targets

On Targets, you don’t upload again — you pick which of the directories you just uploaded holds your targets. Arena pre-selects the most likely folder (here, targets); change it if needed.

Targets step with the targets folder selected from the upload

Targets step with the targets mask folder selected from your upload.

6. Map the feature and save

On Feature Mapping, pick a supported vision model (for example ResNet50) and map the images feature to the image modality. The targets folder you selected on the previous step is excluded from feature mapping automatically. Save the mapping.

Feature mapping with the images feature set to the image modality

Feature Mapping with a vision model and the images feature mapped to image.

7. Saved dataset

Confirm save. The dataset appears in the list for use in Advanced Training experiments.

Datasets list with new object detection dataset

The dataset listed after save.

8. Preprocess the dataset

Preprocessing runs the dataset’s selected encoder over the raw files once and stores the encoded result. Every experiment then trains on ready-made tensors instead of re-encoding images and masks on each run. For object detection, preprocessing encodes the feature images and prepares the mask targets. You can run a full rebuild or an incremental run that only encodes newly added files.

Open the dataset and switch to the Preprocessing tab. The tab appears for tabular and non-tabular datasets on Enterprise plans.

Preprocessing tab empty state with Preprocess dataset button

Preprocessing tab before the first run.

Click Preprocess dataset to schedule a job. Arena spins up a cluster, encodes the dataset, and reports progress on this tab; metrics appear when the job finishes. When the status shows Preprocessing completed successfully, the dataset is ready to attach in an Advanced Training experiment.

Preprocessing job in progress on the Preprocessing tab

Preprocessing job in progress.