ML Assisted Labeling
Machine learning-assisted labeling (also called auto-labeling) is the process of using a trained machine learning model to generate labels for a dataset automatically. It saves time by speeding up the labeling process and can improve labeling accuracy.
ML-assisted labeling involves the following stages:
- Label a subset of the dataset manually
- Train the machine learning model
- Use the trained model to label additional data
- Evaluate and fine-tune the machine learning model
- Iterate
How does ML-assisted labeling work in Studio?
Start by manually adding labels to a subset of your dataset, then assign the labeled sessions to the train and validation sets. After distributing the data, train the initial machine learning model on your manually labeled data. Once the model is trained, use it to label the remaining data. Next, inspect and correct the labels generated by the model, in case the model missed some events. After evaluating the performance of the model, you can either use the same model to label more data or train a new model by adding more data to the 'train' and 'validation' sets. As you keep iterating, the model will label more and more of the data accurately.
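The loop described above can be sketched in a few lines of Python. This is a toy illustration only, not how Studio implements auto-labeling: the one-dimensional "events", the midpoint-threshold model, and the names `train`, `predict`, `review` and `assisted_labeling` are all made up for the example. The `review` function stands in for the manual inspection step and assumes, hypothetically, that the true class is 'up' for values of 4 or more.

```python
from statistics import mean

def train(labeled):
    """Fit a toy 1-D model: the midpoint between the two class means."""
    ups = [x for x, label in labeled if label == "up"]
    downs = [x for x, label in labeled if label == "down"]
    return (mean(ups) + mean(downs)) / 2

def predict(threshold, x):
    """Label one sample with the toy model."""
    return "up" if x > threshold else "down"

def review(x, proposed):
    """Stand-in for manual inspection: ignores the proposal and returns
    the correct label (hypothetical ground truth: x >= 4 means 'up')."""
    return "up" if x >= 4 else "down"

def assisted_labeling(manual, batches):
    """Train, auto-label a batch, correct it, add it to the training data, repeat."""
    labeled = list(manual)
    corrections = []  # number of labels fixed by hand in each round
    for batch in batches:
        threshold = train(labeled)
        proposed = [(x, predict(threshold, x)) for x in batch]
        checked = [(x, review(x, label)) for x, label in proposed]
        corrections.append(sum(p != c for p, c in zip(proposed, checked)))
        labeled += checked  # corrected labels become training data
    return labeled, corrections

manual = [(1.0, "down"), (2.0, "down"), (8.0, "up"), (9.0, "up")]
batches = [[0.5, 3.0, 4.5, 7.5], [4.6, 10.0, 2.5, 1.0]]
labeled, corrections = assisted_labeling(manual, batches)
print(corrections)  # one entry per round
```

In this toy run the first round needs one manual correction and the second needs none, which mirrors the idea that each corrected batch makes the next round of automatic labels more reliable.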
ML-assisted labeling is supported only for models that are trained in DEEPCRAFT™ Studio.
Prerequisites:
To get the most out of auto-labeling, consider the following:
- Organize most of your sessions so that each session contains instances of only one class. For instance, a session should contain either the class 'up' or the class 'down'. Do not mix different classes in the same session, except in a few sessions that will be used for testing. If you do not organize your sessions as recommended, you can still use this feature, but you will need to label more data manually before you see good results.
- Follow the same principle when you collect and import your data before using ML-assisted labeling. Organizing the sessions this way also helps with manual labeling, because it is clear which class is labeled in a particular session.
- Give the folders descriptive names before importing data into DEEPCRAFT™ Studio, so that the session names are easy to interpret. For instance, organize your data in 'background', 'up' and 'down' folders, which makes it clear which class is labeled in a session.
- Move all the data in your project that is not labeled to the 'Unassigned' set.
- Label your data one class at a time. For instance, first label all the events of the class down in a subset of the dataset, and then continue with the class up.
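With one class per session and descriptive folder names, the class of a session can be read directly from its path. The following is a minimal sketch of that idea, assuming a hypothetical folder layout and file names; the functions `class_of` and `group_by_class` are illustrative helpers and not part of Studio.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Folder names taken from the example above.
KNOWN_CLASSES = {"background", "up", "down"}

def class_of(session_path):
    """Return the class folder a session file sits in, or None if there is none."""
    for part in PurePosixPath(session_path).parts:
        if part in KNOWN_CLASSES:
            return part
    return None

def group_by_class(session_paths):
    """Group session paths by the class inferred from their folder name."""
    groups = defaultdict(list)
    for path in session_paths:
        groups[class_of(path)].append(path)
    return dict(groups)

# Hypothetical session files; the last one is not in a class folder.
sessions = [
    "data/up/session_01/data.wav",
    "data/down/session_02/data.wav",
    "data/up/session_03/data.wav",
    "data/misc/session_04/data.wav",
]
groups = group_by_class(sessions)
for name, paths in groups.items():
    print(name, len(paths))
```

Sessions that end up under the `None` key are the ones whose folder name does not reveal a class, so they are good candidates for renaming before import.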
How to label data using ML-assisted labeling?
To label data using ML-assisted labeling, follow these steps:
- Navigate to your project directory and double-click the project file (.improj).
The project file opens in a new tab.
- Decide which class to label first. For instance, let's start by labeling the class down.
- In the Data tab, open the session files and manually add labels to a part of the dataset. To learn how to add labels to the dataset, refer to Adding labels.
- After labeling the dataset, assign most of the newly labeled sessions to the train set and a few sessions to the validation set.
- In the Set column, select Train or Validation from the drop-down list for the respective sessions.
The training set is used to train the machine learning model, and the validation set is used to evaluate the model's performance on data it was not trained on.
After labeling approximately 80 events, you can start training the first model.
- In the Training tab, click Generate Model List and configure the parameters in the Model Wizard to generate a list of models. You can start with the default settings and change them if needed. To learn how to generate a model, refer to Generating model.
- Start a new training job and wait for it to complete.
- After the training job is completed, compare the models and select the one with the best validation statistics. The platform selects the best model for you. However, verify the selection by comparing the distribution of the Predicted Observations and the Actual Observations in the Validation Statistics tab.
- Download the trained machine learning model (.h5 file). You can now use this model to label additional data automatically.
- In the Data tab, select the unlabeled sessions that are not assigned to the training or validation set. To select multiple entries in the list, hold down the Shift key while clicking, or select the checkboxes to the left of each entry.
- Right-click and select Generate Labels from the list of options. The Generate Labels Wizard window appears.
- In the Generate Labels Wizard dialog, select Model Assisted and the model that you just downloaded:
- In Method, select Model Assisted as the option to generate labels.
- In Source, browse to and select the trained model that you downloaded.
- In Target track, select Existing Label Track as the Target type. In this example, all the sessions contain a label track named 'label', so the generated labels are placed in this track.
- Use the default settings and click Next.
- Wait for the ML-assisted labeling to complete and click OK.
- Open a newly auto-labeled session to inspect the labeling. If the model missed some events, add labels to those events manually. Inspect the other newly labeled sessions in the same way.
- Depending on the results of the ML-assisted labeling, you have two options:
- Use the same machine learning model to label more sessions.
- Inspect and correct the newly labeled sessions, add the sessions to the 'train' and 'validation' sets, train a new model, and then use the new model to label more sessions. As you iterate, the results get better because your model is trained on more and more data.
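The assignment of labeled sessions to the train and validation sets in the steps above is done per session, not per event. The sketch below illustrates that idea outside Studio. The 80/20 ratio, the session names and the event counts are assumptions for the example (the guide only says "most" and "a few"), and `split_sessions` is a hypothetical helper, not a Studio function.

```python
import random

def split_sessions(session_names, validation_fraction=0.2, seed=0):
    """Assign whole sessions to 'Train' or 'Validation'; a session is never split."""
    names = sorted(session_names)
    random.Random(seed).shuffle(names)  # fixed seed keeps the split reproducible
    n_val = max(1, round(len(names) * validation_fraction))
    return {
        name: ("Validation" if i < n_val else "Train")
        for i, name in enumerate(names)
    }

# Hypothetical project: 10 sessions of the class 'down', 8 labeled events each.
events_per_session = {f"down_{i:02d}": 8 for i in range(10)}
assignment = split_sessions(events_per_session)

# The guide suggests roughly 80 labeled events before training the first model.
total_events = sum(events_per_session.values())
ready_to_train = total_events >= 80

print(assignment)
print(total_events, ready_to_train)
```

Keeping complete sessions on one side of the split means the validation statistics are computed on recordings the model has not seen during training.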
- Labeling the other class
- ML-assisted labeling yields the best results when labeling one class at a time. After labeling the dataset with the class down, let's continue with the class up. Before you start labeling the next class, move all sessions in the project into the Unassigned set.
- Repeat the process from step 2 (deciding which class to label) as listed above to label the new class up. Label all the remaining classes in your dataset in the same way.
- If you see the warning Set does not contain labels from all classes..., ignore it and continue labeling. This warning appears because only one class at a time is included in the training. However, when you have finished labeling and want to train a model that can classify all classes simultaneously, you should resolve this warning.
- You can use the From existing label track method of the Generate Labels Wizard to modify manual or previously generated label tracks. This is useful if you want to offset multiple label tracks or change the size of the labels.
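To make "offset" and "change the size" concrete, the sketch below treats a label as a time interval with a class name and applies both operations. It is only an illustration of the concepts: the `Label` type, the use of seconds, and the choice to resize around the centre of the label are assumptions, and the wizard in Studio may behave differently.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Label:
    start: float  # seconds (assumed unit)
    end: float    # seconds (assumed unit)
    name: str

def offset(label, seconds):
    """Shift a label in time without changing its length."""
    return replace(label, start=label.start + seconds, end=label.end + seconds)

def resize(label, length):
    """Set a new length while keeping the label centred on the same point."""
    centre = (label.start + label.end) / 2
    return replace(label, start=centre - length / 2, end=centre + length / 2)

# Hypothetical label track with two 'down' events.
track = [Label(1.0, 2.0, "down"), Label(4.0, 5.0, "down")]

# Shift every label 0.5 s later, then widen it to 2 s.
adjusted = [resize(offset(label, 0.5), 2.0) for label in track]
for label in adjusted:
    print(label)
```

Applying the same transformation to every label in a track is what makes this kind of bulk edit quicker than dragging label boundaries one by one.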