Non-tabular datasets

Non-tabular datasets are listed under Other. They are file trees (images, folders, and similar) used for supervised Advanced Training with a supported model and a per-feature mapping.

Enterprise only

Non-tabular (supervised) datasets require an Enterprise plan on the organization. See Tabular and non-tabular access.


Create

  1. New dataset > Other > Non-tabular

  2. Task Type: Choose one of the four task types, including Object detection.

  3. Files: Upload your whole directory at once, including both your feature folders and your targets (large uploads continue in the background). A diagram shows the layout expected for the selected task type, and every uploaded folder is listed under Detected directories.

  4. Targets: Select which of the uploaded directories holds your targets (Arena pre-selects the most likely one). For Object detection, this is a folder of .tiff masks whose filename stems match the samples. For the other task types, it is a single .parquet file with an id column plus a target or label column; you confirm the label column on this step. The targets folder or file can have any name, because preprocessing uses your selection rather than the name. An upload option appears only if your directory did not include targets.

  5. Feature Mapping: Pick a supported model and map the remaining detected directories to model modalities. Where the form shows them, set the optional encoder kwargs and ignore index. The directory you chose as the targets is excluded automatically.
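The shape rules on the Targets step can be checked locally before uploading. The sketch below is a hypothetical pre-upload helper (`check_targets` is not part of Arena). It assumes the task-type string "Object detection" and only checks that the targets are either a folder containing .tiff files or a single .parquet file. It does not read the parquet columns, because that would require a Parquet library such as pyarrow.

```python
from pathlib import Path


def check_targets(task_type: str, targets: Path) -> list[str]:
    """Hypothetical pre-upload check of the targets shape; not an Arena API.

    Returns a list of problems; an empty list means the shape looks right.
    """
    problems: list[str] = []
    if task_type == "Object detection":
        # Object detection: targets must be a folder holding .tiff masks.
        if not targets.is_dir():
            problems.append(f"{targets} should be a folder of .tiff masks")
        elif not any(p.suffix.lower() == ".tiff" for p in targets.iterdir()):
            problems.append(f"{targets} contains no .tiff files")
    elif not (targets.is_file() and targets.suffix.lower() == ".parquet"):
        # Every other task type: targets must be one .parquet file.
        # Its id and target/label columns are not inspected here.
        problems.append(f"{targets} should be a single .parquet file")
    return problems
```

For example, `check_targets("Object detection", Path("Dataset Root/masks"))` returns an empty list when that folder holds at least one .tiff mask.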

File structure

Each subfolder is one feature that maps to a single model modality, so different modalities go in different folders. For example, a multimodal dataset for an image-and-text model:

Dataset Root/
  image/             # → "image" modality
  text/              # → "text" modality
  labels.parquet     # targets: id + target/label (any name)

Naming a folder after a modality (image, text) lets Arena map it automatically. For object detection, upload a single image feature folder plus a folder of .tiff masks:

Dataset Root/
  images/            # → "image" modality
  masks/             # targets: .tiff masks, stems matching samples (any name)
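The stem-matching rule for masks can be checked with a short script. This is a hypothetical helper, not part of Arena. It assumes every file in the image folder is a sample that should have exactly one mask with the same filename stem (for example, masks/cell_01.tiff for images/cell_01.png), and it uses the example folder names from the layout above.

```python
from pathlib import Path


def unmatched_stems(images_dir: Path, masks_dir: Path) -> tuple[set[str], set[str]]:
    """Hypothetical check; returns (images without a mask, masks without an image)."""
    # Filename stems of the samples, e.g. "cell_01" for cell_01.png.
    image_stems = {p.stem for p in images_dir.iterdir() if p.is_file()}
    # Only .tiff files count as masks.
    mask_stems = {
        p.stem
        for p in masks_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".tiff"
    }
    return image_stems - mask_stems, mask_stems - image_stems
```

Two empty sets mean every sample has a mask and every mask has a sample.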

You pick the targets file/folder on the Targets step, and preprocessing uses that selection — so it can be named anything (the names above are just examples).
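The name-based mapping and the targets exclusion can be sketched as follows. `propose_mapping` is hypothetical and only approximates the Feature Mapping step: it matches folder names to modality names exactly, skips loose files and the selected targets, and returns every other folder for manual mapping. The modality names passed in are assumptions; the real set comes from the model you pick.

```python
from pathlib import Path


def propose_mapping(
    root: Path, targets_name: str, modalities: set[str]
) -> tuple[dict[str, str], list[str]]:
    """Hypothetical sketch: returns ({folder: modality}, folders left unmapped)."""
    mapped: dict[str, str] = {}
    unmapped: list[str] = []
    for entry in sorted(root.iterdir()):
        # Loose files are not features, and the targets selection is excluded.
        if not entry.is_dir() or entry.name == targets_name:
            continue
        if entry.name in modalities:
            mapped[entry.name] = entry.name  # folder named after a modality
        else:
            unmapped.append(entry.name)  # left for you to map by hand
    return mapped, unmapped
```

Because this sketch matches names exactly, it would leave images/ from the object-detection example for manual mapping; Arena's own matching may differ.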

Data tab

The Data tab shows a file browser plus an edit mode where you can change the feature mapping, task type, model, and labels. In edit mode you can also remove selected files from the dataset.

Object detection datasets need a labels folder and a model whose kwargs the form supports.

Preprocessing

The Preprocessing tab works as it does for tabular datasets: custom encoder code, a class picker, and Run Preprocessing. The tabular rule that columns must be selected does not apply; non-tabular jobs follow the file-based rules shown on that tab.

Experiments

Non-tabular datasets can be used only with Advanced Training. The wizard's environment flow picks up the model, paths, and supervised algorithms.

When to use non-tabular instead of tabular

Use non-tabular for images, multimodal encoders, or object detection. Use tabular when every feature and the label live as columns in one table file.

See also