Accelerating Robotics Systems with Large Language Models: Introducing DrEureka

Large language models (LLMs) have the potential to revolutionize the field of robotics by accelerating the training process and bridging the gap between simulated environments and the real world, according to a recent study conducted by scientists at Nvidia, the University of Pennsylvania, and the University of Texas at Austin. The study introduces DrEureka, a technique that uses LLMs to automatically create reward functions and randomization distributions for robotics systems.

Traditionally, training a robotics model for a new task means training a policy in a simulated environment and then deploying it in the real world. The mismatch between simulation and reality, known as the “sim-to-real” gap, poses a significant challenge: getting the policy to perform well on the physical robot usually requires engineers to manually adjust the simulation and fine-tune the policy.

LLMs offer a solution to this problem by combining their vast knowledge and reasoning capabilities with the physics engines of virtual simulators. They can help robots learn complex low-level skills by designing reward functions that guide the robotics reinforcement learning (RL) system toward the correct sequences of actions for a given task.
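To make this concrete, an LLM-written reward function for a forward-locomotion task might look something like the sketch below. The function name, observation tensors, and weights are purely illustrative assumptions, not taken from the paper:

```python
import torch

def compute_reward(root_velocity, joint_torques, base_height, target_height=0.34):
    """Hypothetical LLM-generated reward for a forward-locomotion task.

    Rewards forward speed while penalizing energy use and deviation from a
    nominal base height. All terms and weights are illustrative only.
    """
    forward_reward = root_velocity[:, 0]                           # reward velocity along x
    energy_penalty = 0.001 * torch.sum(joint_torques ** 2, dim=-1)
    height_penalty = 2.0 * torch.abs(base_height - target_height)
    return forward_reward - energy_penalty - height_penalty
```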

However, transferring a policy learned in simulation to the real world typically requires extensive manual tweaking of reward functions and simulation parameters. This is where DrEureka comes in. DrEureka builds on a previous technique called Eureka, which uses an LLM to generate candidate reward functions from a task description. Policies trained with these reward functions are then evaluated in simulation, and the results are fed back to the LLM to refine and improve them.
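Conceptually, this loop alternates between LLM generation and simulated evaluation. The sketch below is a rough reconstruction of such an iteration, assuming a generic `llm` client and `simulator` interface; none of these method names come from the actual Eureka codebase:

```python
def eureka_style_loop(llm, simulator, task_description, num_iterations=5, samples_per_iter=8):
    """Illustrative sketch of an Eureka-style reward-search loop (hypothetical API)."""
    best_reward_code, best_score, feedback = None, float("-inf"), ""
    for _ in range(num_iterations):
        # 1. Ask the LLM for several candidate reward functions.
        candidates = [llm.generate_reward_code(task_description, feedback)
                      for _ in range(samples_per_iter)]
        # 2. Train a short RL run with each candidate and score it in simulation.
        results = [simulator.train_and_evaluate(code) for code in candidates]
        # 3. Keep the best candidate and turn its training statistics into
        #    textual feedback for the next round of generation.
        top_code, top_result = max(zip(candidates, results), key=lambda pair: pair[1].score)
        if top_result.score > best_score:
            best_reward_code, best_score = top_code, top_result.score
        feedback = top_result.training_summary
    return best_reward_code
```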

While Eureka’s reward functions are effective in training RL policies in simulation, they don’t account for the complexities of the real world. DrEureka addresses this issue by automating the configuration of domain randomization (DR) parameters. DR techniques randomize the physical parameters of the simulation environment to ensure that the RL policy can adapt to unpredictable perturbations in the real world.
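Domain randomization itself is simple to express: before each training episode, the simulator's physical parameters are drawn from ranges rather than held fixed. A minimal sketch with made-up parameter names and bounds is shown below; DrEureka's contribution is having the LLM choose such bounds instead of a human tuning them by hand:

```python
import random

# Hypothetical randomization ranges for a legged robot (illustrative values only).
DR_RANGES = {
    "friction":       (0.4, 1.2),   # ground friction coefficient
    "added_mass_kg":  (-1.0, 3.0),  # extra payload attached to the robot base
    "motor_strength": (0.8, 1.2),   # scale factor on actuator torque
    "push_force_n":   (0.0, 50.0),  # magnitude of random external pushes
}

def sample_domain_randomization(ranges=DR_RANGES):
    """Draw one concrete set of physics parameters for a training episode."""
    return {name: random.uniform(low, high) for name, (low, high) in ranges.items()}

# Example: apply a fresh draw at the start of every episode.
episode_params = sample_domain_randomization()
print(episode_params)
```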

DrEureka employs a multi-step process to optimize both reward functions and domain randomization parameters. It starts by generating reward functions based on the task description and safety instructions and training an initial policy in simulation. It then tests that policy under varying physics parameters to determine the ranges in which it can still perform the task. Using this information, the LLM selects suitable domain randomization configurations. Finally, the policy is retrained with the DR configurations to make it robust against real-world noise.
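Put together, the stages described above can be summarized in a few lines of orchestration code. The sketch below is a simplified reconstruction under the same assumed `llm` and `simulator` interfaces as earlier, not the authors' actual implementation:

```python
def dreureka_pipeline(llm, simulator, task_description, safety_instructions):
    """Illustrative end-to-end sketch of the DrEureka stages (hypothetical API)."""
    # Stage 1: the LLM writes a reward function from the task and safety text,
    # and an initial policy is trained with it in simulation.
    reward_code = llm.generate_reward_code(task_description, safety_instructions)
    initial_policy = simulator.train(reward_code)

    # Stage 2: probe the initial policy under varying physics parameters to
    # find the ranges in which it can still perform the task.
    feasible_ranges = simulator.sweep_physics_parameters(initial_policy)

    # Stage 3: the LLM selects domain randomization configurations within
    # those feasible ranges.
    dr_config = llm.generate_dr_config(task_description, feasible_ranges)

    # Stage 4: retrain the policy under the chosen randomization so it is
    # robust to real-world noise before deployment.
    robust_policy = simulator.train(reward_code, domain_randomization=dr_config)
    return robust_policy
```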

The researchers evaluated DrEureka on various robotic platforms and tasks with promising results. In quadruped locomotion, policies trained with DrEureka outperformed human-designed systems in forward velocity and distance traveled. In dexterous manipulation tasks, the best policy trained by DrEureka performed significantly better than human-developed policies.

One of the most interesting findings was the application of DrEureka to the task of having a robot dog balance and walk on a yoga ball. The LLM was able to design a reward function and DR configurations that allowed the trained policy to transfer to the real world without additional manual configuration.

The study also highlighted the importance of safety instructions in ensuring that the LLM generates reward functions whose learned behaviors transfer reliably to the real world. The researchers believe that DrEureka demonstrates the potential of LLMs to automate the design-intensive aspects of low-level skill learning, accelerating progress in robot learning research.

Overall, DrEureka offers a promising approach to bridging the sim-to-real gap and enhancing the capabilities of robotics systems for real-world applications. By leveraging LLMs’ ability to generate reward functions and optimize domain randomization parameters, companies can accelerate the training of robotics systems and outperform manually designed policies on a range of tasks.