# Data Management (Your First Edge AI Project)

Note: This page is a part of the tutorial Your First Edge AI Project

# Background

Before we can build and train an Edge AI model, we need to organize the annotated data into separate data sets. These data sets are used for training, validating, and testing the models that we will build.

# Create a new project

Right-click the Workspace folder (HumanActivity in this example) in the Explorer and select Add -> New Project File... to create a new project file.

Next, choose the appropriate Project Type (Classification in this example), name your project, and click "OK".

# Import data in the Data tab

Now it is time to fill the new project with data.

In the Data tab, click on the "Add Segments..." button.

Select a folder containing your recording sessions ("Data" in our example).

Read more: Import pre-collected data into our .imsession format or use Imagimob Capture to capture data from a sensor or an edge device.

Import tool

The import tool locates all sessions, all corresponding labels, and data tracks within the folder.

Select the "label1" label track and the "accel" data track, then press "OK".

By default, the data is placed in the Unassigned dataset, similar to the view below.

Fig. segments overview.

# Data sets and data distribution

To get a valid project, we need to distribute the data across the train, validation, and test datasets, which are used for training, validating, and testing the AI models that you will create. To do that, open the Redistribute Sets dialog, shown below.

Fig. segments overview.

Here you can choose which data sets to take the data from and how to configure the labels in the dataset. Go with the default presets. Finally, use the graphical tool to distribute the data to the different datasets. The target sizes can be based on either Label Count or Annotated Time.
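To illustrate the difference between the two target-size measures, here is a minimal Python sketch. It is not Studio's implementation: the `segments` list and the `summarize` helper are hypothetical and assume each annotated segment is a label with a start and end time in seconds.

```python
from collections import defaultdict

# Hypothetical annotated segments: (label, start_seconds, end_seconds).
segments = [
    ("walking", 0.0, 4.0),
    ("running", 4.0, 6.0),
    ("walking", 6.0, 7.0),
]


def summarize(segments):
    """Return per-label segment counts and per-label annotated seconds."""
    counts = defaultdict(int)
    seconds = defaultdict(float)
    for label, start, end in segments:
        counts[label] += 1            # "Label Count": one per annotated segment
        seconds[label] += end - start  # "Annotated Time": total labeled duration
    return dict(counts), dict(seconds)


label_count, annotated_time = summarize(segments)
print(label_count)
print(annotated_time)
```

In this toy data, "walking" has twice as many segments as "running" but two and a half times as much annotated time, so the two measures can lead to different splits when segment lengths vary.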

As a rule of thumb, the training set should be significantly bigger than the other sets. Some standard splits are (train/validation/test) 60/20/20 or 80/10/10. Generally speaking, the more data you have collected, the smaller you can make your validation and test set target size.
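As a rough illustration of what a 60/20/20 split means, the following Python sketch shuffles a list of session names and divides it by ratio. This is an assumption-laden example, not how the Redistribute Sets dialog works internally: the `split_sessions` function and the session names are hypothetical.

```python
import random


def split_sessions(session_names, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle session names and split them into train/validation/test lists."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    names = list(session_names)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(names)
    n_train = round(len(names) * ratios[0])
    n_val = round(len(names) * ratios[1])
    return {
        "train": names[:n_train],
        "validation": names[n_train:n_train + n_val],
        "test": names[n_train + n_val:],  # whatever remains goes to test
    }


# Hypothetical session names, for illustration only.
sessions = [f"session_{i:02d}" for i in range(20)]
sets = split_sessions(sessions)
print({name: len(members) for name, members in sets.items()})
```

Splitting whole sessions, rather than individual samples, keeps data from one recording out of more than one set. Passing `ratios=(0.8, 0.1, 0.1)` gives the 80/10/10 variant.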

More information about datasets can be found at Wikipedia.

Click "OK". Once all the sessions (data + labels) have been imported, you will see an overview of the database contents, similar to the image below. Notice that the data has now been moved from the Unassigned dataset to the other datasets according to the values set in the Redistribute Sets dialog.

Fig. segments overview.

Note: Save your progress in Studio at any time by pressing 'Ctrl + S'.

Note: Read more about best practices, error handling, and other details of Data Management in Studio in Data Management In Depth.

Next Section (Model Building)