Status: Open
We estimate the final, edited video will run 10-15 minutes.
Goal: Show how to generate datasets for AI training (Computer Vision).
- Annotations can include bounding boxes, segmentation masks, depth maps, etc.
Outline/Video Flow:
- Introduce yourself and the goal of the video/what viewers will learn.
- What is synthetic data generation (SDG) and why is it important? (e.g., filling gaps where real data is expensive or scarce)
- How do you set up this project? What is its structure?
- How do you set up a scene and a Replicator script?
- Walk through the folder structure and briefly explain the role of each key file.
- How do you implement randomization and synthetic data generation?
- How do you randomize scene parameters such as lighting, materials, and object placement? Why is this randomization (domain randomization) important?
- How do you define annotations (e.g., bounding boxes, segmentation masks, keypoints)?
- How do you export the dataset?
- Demo: Generate a small dataset of x, y with randomized lighting and show the exported JSON + images.
- How would you use this dataset for AI computer vision training?
- How would you train a model with this dataset, e.g., using Roboflow or custom models such as Faster R-CNN or YOLO?
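The randomization, annotation, and export steps above could be sketched framework-agnostically in plain Python so the idea is clear before the Omniverse-specific demo; in the actual video this logic would live in an Omniverse Replicator (`omni.replicator.core`) script instead. All names here (`sample_scene`, `fake_bbox`, the COCO-style JSON fields) are illustrative assumptions, not the Replicator API:

```python
import json
import random

def sample_scene(rng):
    """Randomize scene parameters: lighting, material, object placement.
    Ranges are illustrative assumptions, not tuned values."""
    return {
        "light_intensity": rng.uniform(500.0, 5000.0),
        "material": rng.choice(["plastic", "metal", "rubber"]),
        "object_xy": [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)],
    }

def fake_bbox(rng):
    """Stand-in for the renderer's 2D bounding-box annotator output."""
    return [rng.randint(0, 500), rng.randint(0, 300),  # x, y
            rng.randint(40, 120), rng.randint(40, 120)]  # w, h

def generate_dataset(num_frames, seed=0):
    """Build a COCO-style dict: one image record + one annotation per frame."""
    rng = random.Random(seed)  # seeded for reproducible datasets
    images, annotations = [], []
    for frame_id in range(num_frames):
        images.append({"id": frame_id,
                       "file_name": f"rgb_{frame_id:04d}.png",
                       "scene_params": sample_scene(rng)})
        annotations.append({"id": frame_id, "image_id": frame_id,
                            "category_id": 1, "bbox": fake_bbox(rng)})
    return {"images": images, "annotations": annotations,
            "categories": [{"id": 1, "name": "target_object"}]}

dataset = generate_dataset(5)
print(json.dumps(dataset["images"][0], indent=2))
```

The same shape maps onto Replicator's built-in writers in the real pipeline: the randomizer fills in `scene_params`, and the annotator (not a random stub) produces the boxes.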
Please include:
- Takeaway: Understand how to prototype synthetic datasets for training models.
- Additional info:
- Best to build on top of existing tutorials from the NVIDIA docs and reference them.
- Please suggest a few additional synthetic data topics.