Data Collection
Data collection is one of the most important steps in building an efficient machine learning model. It is the process of creating new data or gathering relevant data from multiple sources to train the model. A model's performance depends on the data it was trained on, so collecting high-quality data is crucial for developing a robust model.
DEEPCRAFT™ Studio provides the following options for data collection:
- Collect real-time data using Graph UX: Graph UX allows you to collect real-time data from a built-in PC microphone, a camera, a sensor, or any development board. You can collect labeled or unlabeled data, depending on your requirements.
- Bring your own data: Studio lets you import existing datasets from your local hard drive, so you can reuse data you already have.
How to collect real-time data using Graph UX?
Here are the main steps to stream real-time data using different boards or sensors:
Step 1: Choose the Board
Select the board you want to use to stream data into Studio. You can choose the Infineon PSoC™ 6 Artificial Intelligence Evaluation Kit, the Infineon PSoC™ 6 Wi-Fi BT Pioneer Kit, or any other sensor or development kit.
Step 2: Flash the Streaming Firmware onto the Board
Before collecting data, you need to flash the streaming firmware onto the board. For the Infineon PSoC™ 6 Artificial Intelligence Evaluation Kit and the Infineon PSoC™ 6 Wi-Fi BT Pioneer Kit, we provide the streaming firmware as hex files. For other sensors or development boards, you need to implement your own custom firmware using Tensor Streaming Protocol Version 2. Refer to Tensor Streaming Protocol for Real-Time Data Collection for more details.
To learn how to flash the streaming firmware onto these boards, refer to Streaming Firmware for PSoC™ 6 AI Evaluation Kit and Streaming Firmware for PSoC™ 62S2 Wi-Fi BT Pioneer Kit, respectively.
For flashing instructions for non-Infineon boards, refer to the documentation for your specific board.
Step 3: Start collecting real-time data
Use Graph UX in Studio to stream data from any sensor or development board. Refer to the following sections to get started:
- Real-Time Data Collection with PSoC™ 6 AI Evaluation Kit
- Real-Time Data Collection from Sensors using old streaming firmware/protocol
- Implement your own firmware
Best practices
The following are some of the best practices for collecting data:
- Quality over quantity: Prioritize collecting high-quality data over large quantities. As the saying goes, “garbage in, garbage out.” Large datasets are valuable, but the accuracy and relevance of the data matter most for building better models.
- Balanced data: In classification projects, avoid imbalanced datasets. Aim for roughly equal representation of each class to reduce bias toward the majority class and improve model performance.
- Diverse data: Collect diverse data to build a robust model. Diversity in the data helps the model generalize better and perform well across different scenarios.
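One way to act on the balance recommendation is to count the labels in a dataset before training. The following is a minimal sketch in plain Python, independent of Studio: the function name `class_balance` and the example labels are hypothetical, and it assumes the labels have already been loaded into a list.

```python
from collections import Counter


def class_balance(labels):
    """Return each class's share of the dataset and the ratio between
    the largest and the smallest class (1.0 means perfectly balanced)."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    total = sum(counts.values())
    shares = {label: count / total for label, count in counts.items()}
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return shares, imbalance_ratio


if __name__ == "__main__":
    # Hypothetical label list for illustration only.
    labels = ["clap"] * 120 + ["snap"] * 100 + ["silence"] * 30
    shares, ratio = class_balance(labels)
    for label, share in sorted(shares.items()):
        print(f"{label}: {share:.1%}")
    print(f"largest/smallest class ratio: {ratio:.1f}")
```

A ratio well above 1 suggests that some classes need more recordings. What counts as acceptable depends on the project, so treat any threshold as a judgment call rather than a fixed rule.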