# Label Analysis & Distribution

Now that data labelling is finished and the data has been imported into Studio, the next step is label analysis. A label analysis shows how the symbols are distributed in each dataset and, most importantly, lets us confirm that the training, validation and test datasets are balanced.

# Data

In Imagimob Studio, you can set the desired size of each dataset. To do that, open the project file you created in the Data Labelling step and click Data.

Next, adjust the slider to set the Target Size for each dataset, then click Shuffle with one of the three balancing metrics to distribute your data across the datasets. As shown in the picture below, the Actual Size is close to the Target Size after shuffling. If you get the error "All symbols are not present in all sets", try a different balancing metric or adjust the dataset distribution manually under Segments.
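The condition behind that error can also be checked outside Studio. The following is a minimal sketch, not Studio's own implementation: it assumes a hypothetical list of `(dataset, label)` pairs (the dataset and label names are made up for illustration) and reports which symbols are missing from which set.

```python
from collections import defaultdict


def missing_symbols(segments):
    """Return {dataset: sorted symbols that are absent from that dataset}.

    `segments` is a hypothetical list of (dataset_name, label) pairs;
    Studio's project file format is not modelled here.
    """
    labels_by_set = defaultdict(set)
    for dataset, label in segments:
        labels_by_set[dataset].add(label)

    # Every symbol that occurs anywhere in the project.
    all_labels = set().union(*labels_by_set.values())

    # Keep only the datasets that lack at least one symbol.
    return {
        dataset: sorted(all_labels - present)
        for dataset, present in labels_by_set.items()
        if all_labels - present
    }


# Illustrative data: "walk" never lands in the validation set.
segments = [
    ("train", "jump"), ("train", "run"), ("train", "walk"),
    ("validation", "jump"), ("validation", "run"),
    ("test", "jump"), ("test", "run"), ("test", "walk"),
]
print(missing_symbols(segments))  # {'validation': ['walk']}
```

An empty result means every symbol is present in every set, which is the state a successful shuffle should reach.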

Under Labels, you can see the label count of each dataset, and under Length, the total duration of the data in each dataset. Annotated shows how much annotated data goes into each dataset and what percentage of the collected data it represents.
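To make these figures concrete, here is a small sketch of how such a per-dataset summary could be computed by hand. It is an illustration under stated assumptions, not how Studio calculates its values: the `recordings` structure, its field names, and the example data are hypothetical, and labels within a recording are assumed not to overlap.

```python
def summarize(recordings):
    """Per dataset: label count, total length, annotated time and annotated share.

    Each recording is a hypothetical dict with keys "dataset", "length_s"
    and "labels", where "labels" is a list of (label, start_s, end_s).
    Labels are assumed not to overlap within a recording.
    """
    summary = {}
    for rec in recordings:
        row = summary.setdefault(
            rec["dataset"], {"labels": 0, "length_s": 0.0, "annotated_s": 0.0}
        )
        row["labels"] += len(rec["labels"])
        row["length_s"] += rec["length_s"]
        row["annotated_s"] += sum(end - start for _label, start, end in rec["labels"])

    # Annotated time as a percentage of the collected data in each dataset.
    for row in summary.values():
        if row["length_s"]:
            row["annotated_pct"] = 100.0 * row["annotated_s"] / row["length_s"]
        else:
            row["annotated_pct"] = 0.0
    return summary


recordings = [
    {"dataset": "train", "length_s": 80.0,
     "labels": [("jump", 5.0, 10.0), ("run", 20.0, 35.0)]},
    {"dataset": "test", "length_s": 40.0,
     "labels": [("jump", 0.0, 10.0)]},
]
for dataset, row in summarize(recordings).items():
    print(dataset, row)
```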

# Classes

Next, open Classes, which shows how each class is distributed among the datasets.

You can also adjust the class weights here to balance the classes before training. Switch between the two display units to see either the annotated time or the label count of each class in each dataset.

# Class Weight

A well-chosen weight for each class can help improve the overall F1 classification score after training. The basic principle is to give a higher weight to the minority class and a lower weight to the majority class.

For example, in our case, the weights we chose for the different classes are based on the ratio of each class's annotated time in the training set. Alternatively, you can use the inverse of each class's annotated time in the training set as its weight.
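The inverse-time rule can be sketched in a few lines. This is one possible reading, not the exact calculation used in this tutorial: the class names and durations are hypothetical, and scaling the weights so that the majority class gets 1.0 is an assumption made here for readability.

```python
def inverse_time_weights(annotated_s):
    """Weight each class by the inverse of its annotated time in the training set.

    `annotated_s` maps class name -> annotated seconds (hypothetical values).
    Weights are scaled so the class with the most annotated time gets 1.0;
    that scaling is an assumption of this sketch, not a Studio requirement.
    """
    if not annotated_s:
        return {}
    if any(t <= 0 for t in annotated_s.values()):
        raise ValueError("every class needs a positive annotated time")

    longest = max(annotated_s.values())
    # longest / t is proportional to 1 / t, so minority classes weigh more.
    return {label: longest / t for label, t in annotated_s.items()}


train_time = {"walk": 120.0, "run": 60.0, "jump": 30.0}
print(inverse_time_weights(train_time))
# {'walk': 1.0, 'run': 2.0, 'jump': 4.0}
```

The resulting values can then be entered as class weights under Classes; whether they improve the F1 score should be verified by training and evaluating the model.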
