# Cleaning up Labels & Data
# Analysing Data
After importing the data, we can see that it has some issues.
Let's tackle the first file. Notice that only 1% of its data is annotated. This is strange, since we used the tool to automatically label the whole file, and all the other files show 100% as expected. Let's mouse over the cross to find out more.
"Timestamps are not sorted" is the message you see when a data point has a timestamp that is earlier than the timestamp of the point before it. This is definitely not something we want in the data, and it most likely points to a problem with how the data was recorded.
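If you want to run the same kind of check outside the Studio, a minimal sketch could look like the following. It is not the Studio's own implementation; the function name `find_unsorted_timestamps` and the example values are made up for illustration, and it assumes the timestamps have already been read into a list of numbers.

```python
def find_unsorted_timestamps(timestamps):
    """Return every index i where timestamps[i] is earlier than timestamps[i - 1]."""
    return [
        i
        for i in range(1, len(timestamps))
        if timestamps[i] < timestamps[i - 1]
    ]


# Illustrative values only: two stray rows around 32 s, then the real recording.
example = [32.0, 32.2, 0.2, 0.4, 0.6]
print(find_unsorted_timestamps(example))  # [2]
```

An empty result means every timestamp is at least as large as the one before it.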
To look into this, right-click the row, select either the label file or the data file, and click Open Session.
We can notice two things here:
- Mousing over the exclamation mark shows the same message we saw earlier, confirming that something is wrong with the data. We could also have seen this by opening the file before importing the data into the project, but doing it this way gives us a better overview of all the data.
- Only 1% of the data is labelled because, due to the timestamp issue, the Studio thinks the data starts at 32 seconds, and it doesn't allow you to add labels where there is no data.
# Fixing the Problem
Now, let's actually fix the issue. First, go back to the project file view and, instead of opening the session file, open the directory.
Now, let's inspect the data. You can use whatever tool you want. I will use Excel.
Now the problem has become clear: our data recording has tossed in some random values. They appear to be leftovers from a previous recording or the result of some other bug. We can see that the first 2 rows start at around 32 seconds, and after that the data starts at 0.2 seconds. This means we need to delete the first 2 rows. Once that's done, hit save. Note: make sure that saving does not change the data format. In my case Excel added `"` characters to every line, so I opened the data in a text editor and did a find & replace to remove them.
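The same cleanup can also be scripted, which avoids the quoting problem altogether. The sketch below is only an illustration under several assumptions: the file is comma-separated, it has one header row, and the timestamp is in the first column. The function names, the `max_stray` limit and the column names `Time` and `x` are hypothetical, so adjust them to your own file.

```python
import csv
import io


def drop_leading_stray_rows(rows, time_col=0, max_stray=10):
    """Drop the rows before the first backwards jump in time.

    Only the first `max_stray` rows are checked, so a jump in the middle
    of the recording does not silently throw away a large part of the data.
    """
    for i in range(1, min(len(rows), max_stray + 1)):
        if float(rows[i][time_col]) < float(rows[i - 1][time_col]):
            return rows[i:]
    return rows


def clean_csv_text(text):
    """Remove leading stray rows and write the data back without quotes."""
    header, *rows = list(csv.reader(io.StringIO(text)))
    rows = drop_leading_stray_rows(rows)
    out = io.StringIO()
    writer = csv.writer(
        out, quoting=csv.QUOTE_NONE, escapechar="\\", lineterminator="\n"
    )
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


# Illustrative input: quoted fields and two stray rows at the top.
raw = '"Time","x"\n"32.0","1.0"\n"32.2","1.1"\n"0.2","0.5"\n"0.4","0.6"\n'
print(clean_csv_text(raw))
```

To apply it to a real file, read the file's text, pass it through `clean_csv_text`, and write the result to a copy first so that the original recording stays untouched.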
Opening the file again in Studio, we now see that the data problem is fixed. The label, however, is still wrong, so we will fix it manually.
To fix the label, move the mouse to the left edge of the label, then click and drag the label to the desired length.
We have now fixed the timestamp problem, but let's check whether anything else is left. Go back to the project file and hit Rescan Files to update the files.
The rest of the issues are usually labels that end exactly at the edge of the data or extend beyond it. It's good practice to keep the labels inside the data and not right up at the edge, because the behaviour at the edge points is hard to predict.
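As a rough illustration of this rule, the sketch below lists the labels that touch or cross the edge of the data. It does not use the Studio's label file format: it assumes each label has already been reduced to a `(start, end, name)` tuple in seconds, and the function name, the `margin` parameter and the example values are hypothetical.

```python
def labels_at_or_past_edge(labels, data_start, data_end, margin=0.0):
    """Return the labels that are not strictly inside the data range.

    A label is reported if it starts at or before `data_start + margin`
    or ends at or after `data_end - margin`.
    """
    return [
        (start, end, name)
        for start, end, name in labels
        if start <= data_start + margin or end >= data_end - margin
    ]


# Illustrative values only: data from 0.2 s to 10.0 s and three labels.
labels = [(0.2, 3.0, "a"), (4.0, 6.0, "b"), (8.0, 10.5, "c")]
print(labels_at_or_past_edge(labels, data_start=0.2, data_end=10.0))
# [(0.2, 3.0, 'a'), (8.0, 10.5, 'c')]
```

Setting `margin` to a small positive value also reports labels that sit very close to the edge without touching it.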
I trimmed the offending labels a bit at the start and end and rescanned, resulting in the following output.
[Next: Dataset Distribution (TBD)]