# Data Management (Your First Edge AI Project)

Note: This page is a part of the tutorial Your First Edge AI Project

# Background

Before we can build and train an Edge AI model, we need to organize the annotated data into different sets. These data sets will be used for training, validating, and testing the models that we build.
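
Conceptually, the split works like the minimal Python sketch below. This is only an illustration of the idea, not how Studio implements it; the session names, the fixed seed, and the 80/10/10 ratio are placeholder assumptions, and Studio performs the actual split for you in the Data tab.

```python
import random

# Placeholder session names; in Studio these would be the imported
# recording sessions, here they are made up for illustration.
sessions = [f"session_{i:02d}" for i in range(20)]

random.seed(42)      # fixed seed so the example is reproducible
random.shuffle(sessions)

# Illustrative 80/10/10 split between train, validation, and test.
n_train = int(0.8 * len(sessions))
n_val = int(0.1 * len(sessions))

train = sessions[:n_train]
validation = sessions[n_train:n_train + n_val]
test = sessions[n_train + n_val:]

print(len(train), len(validation), len(test))   # 16 2 2
```

The training set is what the model learns from, the validation set is used to tune and compare models during development, and the test set is held back for a final, unbiased evaluation.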

# Create a new project

Right-click the Workspace folder (HumanActivity in this example) in the Explorer and create a new project file by selecting Add -> New Project File....

Next, choose the right Project Type (Classification in this example), name your project and click "OK".

# Import data in the Data tab

Now it is time to fill the new project with data.

In the Data tab, click on the "Add Segments..." button.

Select a folder containing your recording sessions ("Data" in our example).

Read more: Import pre-collected data into our .imsession format, or use Imagimob Capture to record data from a sensor or an edge device.

Import tool

The import tool locates all sessions, together with their corresponding label and data tracks, within the folder.

Select the "label1" label track and the "accel" data track, then press "OK".

Once all the sessions (data + labels) have been imported, you will see an overview of the database contents.

Fig. segments overview.

The database is automatically divided into three different sets (train/validation/test) used for training, validating, and testing the AI models that you will create.

More information about data sets can be found on Wikipedia.

# Data sets and data distribution

Once the data has been imported, we can manage it in different ways to affect the training.

The first thing we are interested in is the split between the different data sets. This is controlled by adjusting the Target size (%) slider for each set, as shown below.

Note: As a rule of thumb, the training set should be significantly bigger than the other sets. Some standard splits (train/validation/test) are 60/20/20 or 80/10/10. Generally speaking, the more data you have collected, the smaller you can make your validation and test set target sizes.

After updating the target size(s), select a shuffling method from the drop-down list and click the button to apply your changes.

Shuffling strategies

  • Shuffle (Balanced Label Count): distributes the data between the sets based on the number of labels
  • Shuffle (Balanced Annotated): distributes the data between the sets based on the annotated (label) time

After shuffling, the Actual size matches the Target size.
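
Studio's exact shuffling algorithm is not described here, but the intent of the two strategies can be sketched roughly in Python: distribute whole sessions across the sets so that either the label count or the annotated time ends up close to the target percentages. Everything in this sketch (the session data, the greedy assignment, the `shuffle_balanced` function) is an assumption for illustration only.

```python
import random

random.seed(0)

# Made-up sessions: each has a number of labeled segments and a total
# annotated duration in seconds. Real values come from the imported database.
sessions = [
    {"name": f"session_{i:02d}",
     "label_count": random.randint(5, 20),
     "annotated_seconds": random.uniform(30.0, 120.0)}
    for i in range(20)
]

targets = {"train": 0.8, "validation": 0.1, "test": 0.1}

def shuffle_balanced(sessions, targets, weight_key):
    """Greedy sketch: assign each whole session to the set currently furthest
    below its target share of the chosen weight (label count or annotated
    time), then report each set's actual share of that weight."""
    total = sum(s[weight_key] for s in sessions)
    assignment = {name: [] for name in targets}
    current = {name: 0.0 for name in targets}
    for session in sorted(sessions, key=lambda s: s[weight_key], reverse=True):
        deficits = {name: targets[name] * total - current[name] for name in targets}
        best = max(deficits, key=deficits.get)
        assignment[best].append(session["name"])
        current[best] += session[weight_key]
    return assignment, {name: current[name] / total for name in targets}

# "Shuffle (Balanced Label Count)" corresponds to weighting by label_count,
# "Shuffle (Balanced Annotated)" to weighting by annotated_seconds.
for key in ("label_count", "annotated_seconds"):
    _, actual = shuffle_balanced(sessions, targets, key)
    print(key, {name: f"{share:.0%}" for name, share in actual.items()})
```

With a reasonable number of sessions, the actual shares land close to the targets, which is why the Actual size column matches the Target size after shuffling.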

The class table under the Classes tab shows the distribution of labels/symbols across the different data sets. If your database is successfully validated, you can move to the next part of the guide. See the example below.

A validated database
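
If you want to sanity-check the distribution yourself outside Studio, a tally like the one below gives the same kind of overview as the Classes tab. The label names and counts here are invented purely to show the shape of the table.

```python
from collections import Counter

# Invented label annotations per data set; in Studio these come from the
# imported label tracks, and the Classes tab computes this table for you.
labels_per_set = {
    "train":      ["walking"] * 120 + ["running"] * 110 + ["sitting"] * 130,
    "validation": ["walking"] * 15  + ["running"] * 14  + ["sitting"] * 16,
    "test":       ["walking"] * 16  + ["running"] * 13  + ["sitting"] * 15,
}

classes = sorted({label for labels in labels_per_set.values() for label in labels})

# One row per class, one column per data set.
print(f"{'class':<10}" + "".join(f"{s:>12}" for s in labels_per_set))
for cls in classes:
    row = [Counter(labels_per_set[s])[cls] for s in labels_per_set]
    print(f"{cls:<10}" + "".join(f"{c:>12}" for c in row))
```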

Note: Save your progress in Studio at any time by pressing 'Ctrl + S'

Note: Read more about best practices, error handling, and other details of Data Management in Studio in Data Management In Depth

Next Section (Model Building)