Scale AI gets into the synthetic data game

Scale AI’s path to becoming a $7.3 billion company was paved with real data from images, text, voice and video. Now, it is using that foundation to get into the synthetic data game, one of the hottest emerging categories in AI.

The company announced Wednesday an early access program for Scale Synthetic, a product that it says machine learning engineers can use to enhance their existing real-world data sets. Scale hired two executives to build out this new division of its business. Joel Kronander, who previously headed up machine learning at Nines and was a former computer vision engineer at Apple working on 3D mapping, joins as its new head of synthetic data. Vivek Raju Muppalla, previously director of engineering for AI and simulation at Unity Technologies, joins as its director of synthetic services.

Synthetic data is what it sounds like: fake data that has been created by machine learning algorithms rather than collected from the real world. It can be a powerful and handy tool for generating data, such as medical imaging, when privacy is a top concern. Developers can use synthetic data to add more complexity to their training models and help remove biases that can often be found in collected real-world data sets.

Scale initially combined software with real images, text, voice and video data labeled by people to give autonomous vehicle companies the labeled data needed to train machine learning models to develop and deploy robotaxis, self-driving trucks and automated bots used in warehouses and on-demand delivery. The startup has since morphed into a data management platform company with customers spanning government, finance, e-commerce, autonomous vehicle and enterprise industries.

Founder and CEO Alexandr Wang described the new offering as a hybrid approach to data, akin to lab-grown meat.

“We start with real data, just like how lab-grown meat starts from real animal cells, and then grow and iterate and build the product from there,” he told TechCrunch. By using real-world data as the base to create synthetic data, the company is able to offer customers something unique and powerful, Wang said, adding that this was a gap the company saw in the market.

Scale customers saw that gap as well. The company’s push into synthetic data was in response to demand from its customers, Wang told TechCrunch, adding that Scale started building out the product less than a year ago. Autonomous vehicle technology developer Kodiak Robotics, Tractable AI and the U.S. Department of Defense have all tapped Scale for its new synthetic data product, Wang said.

Scale, which today employs about 450 people, views synthetic data as a top priority in 2022 and an area that it will continue to invest in as it builds out its product line. But that doesn’t mean synthetic data will take over its real data business. Wang sees synthetic data as a complementary tool that will help developers “get more bang for their buck” out of their algorithms and other AI, particularly with edge cases.

For instance, autonomous vehicle companies typically use simulation to recreate scenarios from the real world and play them back to see how the autonomous system will handle them. But real-world data might not provide the scenario they’re looking for.

“You don’t run into scenarios in the real world too often where there might be, say 100 bicyclists crossing at once,” Wang explained. “We can start from real-world data and then synthetically add all the bicyclists or all the people and then that way, you can train the algorithm properly.”
