The Next Ten Years of Robotics

As of 2023, the progress in deploying robots in the real world is hard to miss: autonomous vehicles actively drive passengers without safety drivers in San Francisco and Phoenix, personal drones for videography can autonomously track human movement despite hard-to-sense obstacles like tree branches, and lightweight robotic manipulators have become more accessible to people with motor impairments. Nevertheless, these robots still lack the generalization abilities of human drivers, fall short of our desired robustness to unforeseen environments and interactions, and do not possess the personalization abilities that human caretakers exhibit when working with patients. Looking towards the next decade of robotics, what fundamental research directions will help us deploy robots widely and safely throughout society in 2033?

We highlight several such directions—on topics such as uncertainty representation, interaction, modeling and evaluation—that researchers throughout academia, government, and industry can pursue to advance the next generation of reliable robots.

Real-world robot safety

An assurance of safety is critical to justify widespread deployment of any robotic system. Yet, safety can be defined in numerous ways, and there is a disconnect between the “system-level” view of regulators and the “actor-level” view of practitioners and researchers. Part of this disconnect is due to a lack of structure in key application areas: household robotic tasks and urban traffic are less predictable than other areas, such as air traffic control. An additional challenge is certifying safety of modern machine learning methods that are commonly applied for perception and decision making.

For roboticists, the frontier of safety research is both at the hardware level (e.g., soft or compliant robot design, advanced sensors such as robotic skin or tactile sensing), and at the algorithmic or mathematical level (e.g., verification of human prediction models, detecting out-of-distribution data so a robot can stop and ask for help mid-interaction). Future research should seek to incorporate safety in a realistic and context-dependent way (e.g., navigating around people demands different notions of safety than helping dress the elderly).
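As a toy illustration of the “stop and ask for help” idea (our own sketch, not a specific method from the safety literature), a robot could monitor whether incoming perception features are far from its training distribution using a simple Mahalanobis-distance test, deferring to a human when they are. The class name, threshold, and feature dimensions below are all hypothetical:

```python
import numpy as np

class OODMonitor:
    """Flags observations far from the training distribution
    (Mahalanobis distance in feature space) so the robot can
    stop and ask for help instead of acting on unfamiliar input."""

    def __init__(self, train_features: np.ndarray, threshold: float = 3.0):
        self.mean = train_features.mean(axis=0)
        cov = np.cov(train_features, rowvar=False)
        # Small diagonal term keeps the covariance invertible.
        self.cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
        self.threshold = threshold  # distance beyond which we defer to a human

    def is_ood(self, feature: np.ndarray) -> bool:
        d = feature - self.mean
        dist = float(np.sqrt(d @ self.cov_inv @ d))
        return dist > self.threshold


# Usage: fit on nominal data, then gate robot actions on the monitor.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 4))  # nominal perception features
monitor = OODMonitor(train)
print(monitor.is_ood(np.zeros(4)))       # near the training mean: proceed
print(monitor.is_ood(np.full(4, 10.0)))  # far out: stop and ask for help
```

Real deployments would use learned feature embeddings and calibrated thresholds rather than raw Gaussian features, but the gating structure is the same.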

Unified representations of uncertainty

Uncertainty is critical for real-world robot decision making.
A good model of uncertainty can enable information gathering actions, safer decisions during unexpected events, and adaptation to new information. Unfortunately, uncertainty is often modeled separately in each component of a robot’s autonomy stack (perception, planning, control, and hardware), leading to a cascade effect wherein a robot cannot take any action if simultaneously considering all sources of uncertainty. Even within one component, it remains unclear how to best model uncertainty (e.g., should perception uncertainty be represented at the pixel level, in bounding boxes or in the state of the robot?). We anticipate that fundamental research on uncertainty representations will enable more effective robot decision-making and can have impact across robotics sectors (e.g., uncertainty-aware algorithms for fruit picking may translate to personal robots).

Long-term interaction & co-adaptation

As robots repeatedly interact with people, humans will adapt to robot behavior. This raises the question: how are humans adapting to robots? We have already seen how humans adapt to other types of automation, such as recommendation systems built into social media, news, advertising and vehicle route guidance. A growing body of work has found that humans adapt in unexpected ways, often causing unintended consequences such as users becoming increasingly polarized through echo chambers and social influence. We anticipate a similar trend in embodied systems like robots. We should therefore formally study these co-adaptation effects, model them, and actively account for them when designing robots that interact with people.

Real-world human interaction datasets

Large real-world datasets have revolutionized computer vision and natural language processing, and are increasingly having impact in robotics domains like manipulation. Over the next decade, we are excited about the potential of real-world datasets in the human-robot interaction domain as well. However, this leap poses several difficult challenges.

The first challenge is capturing realistic multi-agent interactions beyond the lab setting. Recently, data released by autonomous driving companies such as Waymo has led to significantly more realistic predictive models of human driving behavior and driving simulators. We should similarly produce large-scale datasets for settings like robotics in the home or warehouse, where current datasets (e.g., HARMONIC and MoGaze) are still small and collected in highly controlled environments. This endeavor is more challenging than in the autonomous driving domain: home and warehouse environments are less structured and therefore will require very large or personalized datasets to capture the wide variety of events and interactions. An exciting frontier is to connect with the computer vision community and its recent advances in 2D-to-3D lifting technology. By converting in-the-wild (e.g., YouTube) videos of humans to full spatio-temporal meshes of human bodies, we can unlock larger and more realistic datasets of human-human interaction for building realistic simulators and behavior predictors that would otherwise not be possible.

Secondly, we need to acknowledge the difference between human-human interaction data and human-robot interaction data. The latter is harder to collect because it is system-dependent, but because it captures the closed-loop interaction between the human and the specific robot, it is more informative about how robot behavior influences people.

Interplay between data and models

Fundamentally, the reason that we collect datasets is to enable the design of models. For example, the ImageNet dataset has led to substantial innovation in deep convolutional neural models which have pushed the envelope in computer vision. However, in many robotics applications, the community has not settled on a single modeling structure, and therefore it can be difficult to reuse data to design and fit multiple models. For example, in the autonomous vehicle motion prediction domain, many datasets are structured to enable deep neural predictions of driver motion from raw pixel observations. Other paradigms, such as those which model drivers as rational actors in a game, require highly structured intermediate data representations—such as road graphs—and are not able to consume other forms of data like raw pixel observations so readily. We believe that substantial research effort is needed to identify common formats and primitives which will enable the broadest variety of models to be designed and compared on an even playing field.
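One hypothetical shape such a common format could take (all names below are our own, for illustration only) is a scene record that carries both raw sensing, for end-to-end learned predictors, and structured annotations like agent states and a road graph, for model-based or game-theoretic approaches:

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    """Minimal structured state for one traffic participant."""
    x: float
    y: float
    heading: float
    speed: float

@dataclass
class SceneRecord:
    """A single-format record usable by multiple modeling paradigms:
    raw pixels for deep end-to-end predictors, plus structured
    annotations (agents, road graph) for game-theoretic models."""
    timestamp: float
    camera_frame: bytes                  # raw sensor data, e.g., encoded image
    agents: list[AgentState] = field(default_factory=list)
    road_graph: dict[int, list[int]] = field(default_factory=dict)  # lane adjacency

# The same record can feed either kind of model.
rec = SceneRecord(
    timestamp=0.0,
    camera_frame=b"...",                 # placeholder pixel payload
    agents=[AgentState(0.0, 0.0, 0.0, 5.0)],
    road_graph={0: [1], 1: []},
)
print(len(rec.agents), len(rec.road_graph))  # 1 2
```

The design choice being illustrated is redundancy: storing both raw and structured views in one record costs storage but lets pixel-based and structured models be trained and compared on identical data.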

Evaluating progress in robotics

Right now, it is challenging to evaluate progress in robotics, especially when robots interact in unstructured environments and with human beings. Broadly, evaluation methods can be broken down into four categories:

  1. Small-sample user studies
  2. “Static” dataset evaluation
  3. “Closed-loop” simulation
  4. Real-world deployment

So far, (1) and (4) dominate most “traditional” robotics research, while (2) and (3) appear to be more common in “learning-oriented” robotics. We believe that each of these approaches gives different insights, and ultimately our community should invest in all of them. However, it is important that we find some consensus on the value of each technique and on how to interpret results. For example, our industry colleagues can be notoriously tight-lipped about the performance of new methods on physical robots or vehicles. Moreover, it is hard to know the right metrics for evaluating the “success” of a robot’s interaction with a human, because these metrics may be highly personal (e.g., reflecting different social or cultural norms) or hard to specify mathematically (e.g., trust).

Concluding thoughts

It is an exciting time to be a roboticist: we are starting to see robots interacting with real end-users and improving our everyday lives and societies. At the same time, the risks of current robot deployment have also become clearer; some of the most salient examples are autonomous vehicle crashes and unintended behaviors, like blocking fire trucks responding to emergencies. Thus, we have sought to highlight key threads of fundamental and applied research that are critical to ensuring we can bring robots out into the real world in a safe and beneficial way. We hope this inspires current and future robotics researchers to consider how they formulate, implement, and evaluate robot safety.

About the Authors

Andrea Bajcsy is an incoming assistant professor in the Robotics Institute at Carnegie Mellon University (joining Fall 2023). She studies safe human-robot interaction, particularly when robots learn from and learn about people.

David Fridovich-Keil is an assistant professor at the University of Texas at Austin. David’s research spans optimal control, dynamic game theory, learning for control, and robot safety. While he has also worked on problems in distributed control, reinforcement learning, and active search, he is currently investigating the role of dynamic game theory in multi-agent interactive settings such as traffic.

Sylvia Herbert is an assistant professor of Mechanical and Aerospace Engineering at the University of California San Diego. Her group works on the safety analysis and control of autonomous systems, with a focus on algorithms that blend rigor with efficiency and scalability.

Shreyas Kousik is an assistant professor in the George W. Woodruff School of Mechanical Engineering at the Georgia Institute of Technology. His research focuses on defining and implementing robot safety in the full autonomy stack, from hardware through to perception, planning, and control.

Disclaimer: Any views or opinions represented in this blog are personal, belong solely to the blog post authors and do not represent those of ACM SIGBED or its parent organization, ACM.