NVIDIA Releases New AI Models and Developer Tools to Advance Autonomous Vehicle Ecosystem

Autonomous vehicle (AV) stacks are evolving from many distinct models to a unified, end-to-end architecture that executes driving actions directly from sensor data. This transition to using larger models is drastically increasing the demand for high-quality, physically based sensor data for training, testing and validation.

To help accelerate the development of next-generation AV architectures, NVIDIA today released NVIDIA Cosmos Predict-2 — a new world foundation model with improved future world state prediction capabilities for high-quality synthetic data generation — as well as new developer tools.

Cosmos Predict-2 is part of the NVIDIA Cosmos platform, which equips developers with technologies to tackle the most complex challenges in end-to-end AV development. Industry leaders such as Oxa, Plus and Uber are using Cosmos models to rapidly scale synthetic data generation for AV development.

Cosmos Predict-2 Accelerates AV Training

Building on Cosmos Predict-1 — which was designed to predict and generate future world states using text, image and video prompts — Cosmos Predict-2 better understands context from text and visual inputs, leading to fewer hallucinations and richer details in generated videos.

Cosmos Predict-2 enhances text adherence and common sense in a generated scene of a stop sign at an intersection.

By using the latest optimization techniques, Cosmos Predict-2 significantly speeds up synthetic data generation on NVIDIA GB200 NVL72 systems and NVIDIA DGX Cloud.
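
For teams planning to experiment with the model, an inference call for a text- and image-conditioned world foundation model like Cosmos Predict-2 typically follows the pattern sketched below. This is a minimal sketch only: the cosmos_predict2 module, the Predict2Pipeline class, the model identifier and all arguments are hypothetical placeholders rather than the published API, so refer to the official Cosmos release for actual usage.

```python
# Hypothetical sketch of text + image conditioned future-video generation.
# The module, class and argument names below are illustrative placeholders,
# not the published Cosmos Predict-2 API.
from cosmos_predict2 import Predict2Pipeline  # hypothetical package and class

pipeline = Predict2Pipeline.from_pretrained(
    "nvidia/cosmos-predict2",        # hypothetical model identifier
    device="cuda",
)

# Condition on a text prompt plus a single frame, then ask the world model
# to roll the scene forward as a short future video clip.
video = pipeline(
    prompt="Ego vehicle approaches a four-way intersection with a stop sign "
           "in light rain; cross traffic yields.",
    image="dashcam_frame_0001.png",  # conditioning frame from real driving data
    num_frames=121,                  # length of the predicted future clip
    seed=0,
)

video.save("generated/stop_sign_scenario.mp4")
```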

Post-Training Cosmos Unlocks New Training Data Sources

By post-training Cosmos models on AV data, developers can generate videos that accurately match existing physical environments and vehicle trajectories, as well as generate multi-view videos from a single-view video, such as dashcam footage. The ability to turn widely available dashcam data into multi-camera data gives developers access to new troves of data for AV training. These multi-view videos can also be used to replace real camera data from broken or occluded sensors.

Post-trained Cosmos models generate multi-view videos to significantly augment AV training datasets.

The NVIDIA Research team post-trained Cosmos models on 20,000 hours of real-world driving data. Using the AV-specific models to generate multi-view video data, the team improved model performance in challenging conditions such as fog and rain.
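
Conceptually, the multi-view workflow takes a single front-facing clip and asks the post-trained model for additional, temporally aligned camera views. The sketch below illustrates that flow only; the cosmos_av package, the MultiViewGenerator class, the model name and the camera-rig labels are assumptions made for illustration, not a documented interface.

```python
# Illustrative sketch of single-view -> multi-view augmentation with a
# post-trained world model. All names below are hypothetical placeholders.
from pathlib import Path

from cosmos_av import MultiViewGenerator  # hypothetical post-trained AV model wrapper

# Cameras of the target rig for which synthetic views should be produced.
TARGET_VIEWS = ["front_left", "front_right", "rear", "side_left", "side_right"]

generator = MultiViewGenerator.from_pretrained("nvidia/cosmos-av-multiview")  # hypothetical

for clip in Path("dashcam_clips").glob("*.mp4"):
    # Generate the additional camera views conditioned on the dashcam clip,
    # keeping them temporally aligned with the source video.
    views = generator.generate_views(source_video=clip, views=TARGET_VIEWS)
    for name, view in views.items():
        view.save(f"multiview_out/{clip.stem}_{name}.mp4")
```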

AV Ecosystem Drives Advancements Using Cosmos Predict

AV companies have already integrated Cosmos Predict to scale and accelerate vehicle development.

Autonomous trucking leader Plus, which is building its solution with the NVIDIA DRIVE AGX platform, is post-training Cosmos Predict on trucking data to generate highly realistic synthetic driving scenarios, helping accelerate the commercialization of its autonomous solutions at scale. AV software company Oxa is also using Cosmos Predict to support the generation of multi-camera videos with high fidelity and temporal consistency.

New NVIDIA Models and NIM Microservices Empower AV Developers

In addition to Cosmos Predict-2, NVIDIA today also announced Cosmos Transfer as an NVIDIA NIM microservice preview for easy deployment on data center GPUs.

The Cosmos Transfer NIM microservice preview augments datasets and generates photorealistic videos using structured input or ground-truth simulations from the NVIDIA Omniverse platform. And the NuRec Fixer model helps inpaint and resolve gaps in reconstructed AV data.

NuRec Fixer fills in gaps in driving data to improve neural reconstructions.
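
Because NIM microservices are packaged as containers that expose an HTTP API, calling the Cosmos Transfer preview from a data pipeline generally reduces to a REST request against a deployed endpoint. The endpoint path and JSON fields below are hypothetical placeholders used to illustrate that pattern; the actual schema is defined by the microservice's published API.

```python
# Hedged sketch of calling a locally deployed NIM microservice over HTTP.
# The endpoint path and JSON fields are illustrative placeholders, not the
# documented Cosmos Transfer schema.
import base64

import requests

NIM_URL = "http://localhost:8000/v1/infer"  # hypothetical local deployment

with open("omniverse_groundtruth.mp4", "rb") as f:
    conditioning_video = base64.b64encode(f.read()).decode()

payload = {
    # Structured, ground-truth input rendered from Omniverse, used as the
    # conditioning signal for photoreal augmentation.
    "conditioning_video": conditioning_video,
    "prompt": "Urban street at dusk, wet asphalt, light fog, oncoming headlights.",
}

response = requests.post(NIM_URL, json=payload, timeout=600)
response.raise_for_status()

with open("augmented_clip.mp4", "wb") as f:
    f.write(base64.b64decode(response.json()["video"]))
```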

CARLA, the world’s leading open-source AV simulator, integrated Cosmos Transfer and NVIDIA NuRec — a set of application programming interfaces and tools for neural reconstruction and rendering — into its latest release. This enables CARLA’s user base of over 150,000 AV developers to render synthetic simulation scenes and viewpoints with high fidelity and to generate endless variations of lighting, weather and terrain using simple prompts.

Developers can try out this pipeline using open-source data available on the NVIDIA Physical AI Dataset. The latest dataset release includes 40,000 clips generated using Cosmos, as well as sample reconstructed scenes for neural rendering. With this latest version of CARLA, developers can author new trajectories, reposition sensors and simulate drives.
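
As a concrete example of the sensor-repositioning and scene-variation workflow described above, the snippet below uses CARLA's standard Python API to attach an RGB camera to an ego vehicle at a chosen mount point and to change the weather. It does not touch the new NuRec or Cosmos Transfer integrations, whose interfaces are covered in the CARLA release documentation.

```python
# Reposition a camera sensor and vary weather using CARLA's standard Python API.
import carla

client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

blueprints = world.get_blueprint_library()

# Spawn an ego vehicle at the first available spawn point.
vehicle_bp = blueprints.filter("vehicle.tesla.model3")[0]
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(vehicle_bp, spawn_point)
vehicle.set_autopilot(True)

# Mount an RGB camera at a custom position (roof-mounted, forward-facing);
# changing this transform is how the sensor is repositioned between runs.
camera_bp = blueprints.find("sensor.camera.rgb")
camera_bp.set_attribute("image_size_x", "1920")
camera_bp.set_attribute("image_size_y", "1080")
camera_tf = carla.Transform(carla.Location(x=1.5, z=2.4))
camera = world.spawn_actor(camera_bp, camera_tf, attach_to=vehicle)
camera.listen(lambda image: image.save_to_disk("out/%06d.png" % image.frame))

# Sweep lighting and weather conditions for additional scene variation.
world.set_weather(
    carla.WeatherParameters(cloudiness=80.0, precipitation=60.0, fog_density=30.0)
)
```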

Such scalable data generation pipelines unlock the development of end-to-end AV model architectures, as recently demonstrated by NVIDIA Research’s second consecutive win at the End-to-End Autonomous Grand Challenge at CVPR.

The challenge offered researchers the opportunity to explore new ways to handle unexpected situations — beyond using only real-world human driving data — to accelerate the development of smarter AVs.

NVIDIA Halos Advances End-to-End AV Safety

To bolster the operational safety of AV systems, NVIDIA earlier this year introduced NVIDIA Halos — a comprehensive safety platform that integrates the company’s full automotive hardware and software safety stack with state-of-the-art AI research focused on AV safety.

Bosch, Easyrain and Nuro are the latest automotive leaders to join the NVIDIA Halos AI Systems Inspection Lab to verify the safe integration of their products with NVIDIA technologies and advance AV safety. Lab members announced earlier this year include Continental, Ficosa, OMNIVISION, onsemi and Sony Semiconductor Solutions.

Watch the NVIDIA GTC Paris keynote from NVIDIA founder and CEO Jensen Huang at VivaTech, and explore GTC Paris sessions.