Status: Open
We estimate the final, edited video will run 10-15 minutes.
Goal: Show how to generate datasets for AI training (Computer Vision).
- Annotations can include bounding boxes, segmentation masks, depth maps, etc.
Outline/Video Flow:
- Introduce yourself and the goal of the video/what viewers will learn.
- What is synthetic data generation (SDG) and why is it important? (e.g., filling gaps where real data is expensive or scarce)
- How do you set up this project? What is its structure?
- How do you set up a scene and a Replicator script?
- Walk through the folder structure and briefly explain the role of each key file.
- How do you implement randomization and synthetic data generation?
- How do you randomize scene parameters such as lighting, materials, and object placement? Why is this randomization (domain randomization) important?
- How do you define annotations (e.g., bounding boxes, segmentation masks, keypoints)?
- How do you export the dataset?
- Demo: Generate a small dataset of x, y with randomized lighting and show the exported JSON + images.
- How would you use this dataset for AI computer vision training?
- How would you train a model with this dataset, e.g., using Roboflow or custom models such as Faster R-CNN or YOLO?
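The randomization, annotation, and export steps above could be sketched framework-agnostically in plain Python so the idea is clear before the Omniverse-specific demo; in the actual video this logic would live in an Omniverse Replicator (`omni.replicator.core`) script instead. All names here (`sample_scene`, `fake_bbox`, the COCO-style JSON fields) are illustrative assumptions, not the Replicator API:

```python
import json
import random

def sample_scene(rng):
    """Randomize scene parameters: lighting, material, object placement.
    Ranges are illustrative assumptions, not tuned values."""
    return {
        "light_intensity": rng.uniform(500.0, 5000.0),
        "material": rng.choice(["plastic", "metal", "rubber"]),
        "object_xy": [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)],
    }

def fake_bbox(rng):
    """Stand-in for the renderer's 2D bounding-box annotator output."""
    return [rng.randint(0, 500), rng.randint(0, 300),  # x, y
            rng.randint(40, 120), rng.randint(40, 120)]  # w, h

def generate_dataset(num_frames, seed=0):
    """Build a COCO-style dict: one image record + one annotation per frame."""
    rng = random.Random(seed)  # seeded for reproducible datasets
    images, annotations = [], []
    for frame_id in range(num_frames):
        images.append({"id": frame_id,
                       "file_name": f"rgb_{frame_id:04d}.png",
                       "scene_params": sample_scene(rng)})
        annotations.append({"id": frame_id, "image_id": frame_id,
                            "category_id": 1, "bbox": fake_bbox(rng)})
    return {"images": images, "annotations": annotations,
            "categories": [{"id": 1, "name": "target_object"}]}

dataset = generate_dataset(5)
print(json.dumps(dataset["images"][0], indent=2))
```

The same shape maps onto Replicator's built-in writers in the real pipeline: the randomizer fills in `scene_params`, and the annotator (not a random stub) produces the boxes.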
Please include:
- Takeaway: Understand how to prototype synthetic datasets for training models.
- Additional info:
- Best to build on top of existing tutorials from the NVIDIA docs and reference them.
- Please suggest a few additional synthetic data topics.