Data management

After you have added data to the project, you need to distribute it across the train, validation, and test datasets, which are used to train, validate, and test the machine learning model.

To redistribute data, follow these steps:

  1. Open the project file.

  2. In the Data tab, click Redistribute Sets. The redistribute data tool window appears.


  3. Select which datasets to take the data from and choose how to configure the labels in the dataset. Keep the default presets.

  4. Use the graphical tool to distribute the data to the different datasets. The target size can be based on either Label Count or Annotated Time.


ℹ️

We recommend keeping the training set significantly larger than the validation and test sets. Common splits (train/validation/test) are 60/20/20 and 80/10/10. The more data you have collected, the smaller you can make the validation and test set target sizes.

  5. Click Redistribute. The data is distributed from the Unassigned dataset to the other datasets according to the values in the redistribution tool.
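The percentage splits in the note translate into per-set target sizes with simple arithmetic. The Python sketch below is illustrative only. It does not describe how the redistribution tool computes its targets internally, and the function names `count_targets` and `time_targets` are hypothetical. It shows how a 60/20/20 or 80/10/10 split maps onto a target size measured either in Label Count (whole labels) or in Annotated Time (assumed here to be in seconds).

```python
def count_targets(n_labels, split):
    """Hypothetical helper: integer Label Count targets per set.

    Uses largest-remainder rounding so the targets always add up to n_labels.
    `split` maps set names to percentages that must sum to 100.
    """
    if sum(split.values()) != 100:
        raise ValueError("split percentages must sum to 100")
    raw = {name: n_labels * pct / 100 for name, pct in split.items()}
    targets = {name: int(value) for name, value in raw.items()}
    leftover = n_labels - sum(targets.values())
    # Give the remaining labels to the sets with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda name: raw[name] - targets[name], reverse=True)
    for name in by_remainder[:leftover]:
        targets[name] += 1
    return targets


def time_targets(total_seconds, split):
    """Hypothetical helper: Annotated Time targets (in seconds) per set."""
    if sum(split.values()) != 100:
        raise ValueError("split percentages must sum to 100")
    return {name: total_seconds * pct / 100 for name, pct in split.items()}


if __name__ == "__main__":
    split_60_20_20 = {"train": 60, "validation": 20, "test": 20}
    split_80_10_10 = {"train": 80, "validation": 10, "test": 10}
    print(count_targets(1000, split_60_20_20))  # {'train': 600, 'validation': 200, 'test': 200}
    print(count_targets(1001, split_80_10_10))  # {'train': 801, 'validation': 100, 'test': 100}
    print(time_targets(3600, split_80_10_10))   # {'train': 2880.0, 'validation': 360.0, 'test': 360.0}
```

Rounding Label Count targets with the largest-remainder method avoids losing or double-counting labels when the percentages do not divide the total evenly. Annotated Time is continuous, so no rounding is needed.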