Ego-to-Exo: Interfacing Third Person Visuals from Egocentric Views in Real-time for Improved ROV Teleoperation

International Symposium on Robotics Research (ISRR 2024)

Overview


Proposed console interface

Underwater ROVs (Remotely Operated Vehicles) are unmanned submersible vehicles designed for exploring and operating in the depths of the ocean. Despite using high-end cameras, typical teleoperation engines based on first-person (egocentric) views limit a surface operator's ability to maneuver and navigate the ROV in complex deep-water missions.

In this paper, we present an interactive teleoperation interface that (i) offers on-demand “third”-person (exocentric) visuals from past egocentric views, and (ii) facilitates enhanced peripheral information with augmented ROV pose in real-time. We achieve this by integrating a 3D geometry-based Ego-to-Exo view synthesis algorithm into a monocular SLAM system for accurate trajectory estimation. The proposed closed-form solution only uses past egocentric views from the ROV and a SLAM backbone for pose estimation, which makes it portable to existing ROV platforms. Unlike data-driven solutions, it is invariant to the application scenario and to waterbody-specific scene characteristics.

We validate the geometric accuracy of the proposed framework through extensive experiments on 2-DOF indoor navigation and 6-DOF underwater cave exploration in challenging low-light conditions. A subjective evaluation by 15 human teleoperators further confirms the effectiveness of the integrated features for improved teleoperation. We demonstrate the benefits of Ego-to-Exo view generation and real-time pose rendering for remote ROV teleoperation by following navigation guides such as cavelines inside underwater caves. This new way of interactive ROV teleoperation opens up promising opportunities for future research in subsea telerobotics.


Field expedition at Devil's Springs, FL, US
Field expedition at Cueva del Agua, Murcia, Spain

Ego-to-Exo Pipeline


We design an efficient Ego-to-Exo (egocentric to exocentric) view generation framework integrated into a monocular visual SLAM system for underwater ROV teleoperation. The proposed Ego-to-Exo algorithm keeps track of the ROV camera poses and exploits a buffer of past egocentric views for exocentric view synthesis. We then transform and project a pre-sampled 3D model of the ROV, in the form of a point cloud, into those views to generate realistic augmented visuals with more peripheral information. Such views offer comprehensive information about the surrounding scene and its global semantics. In addition to the views, the integrated SLAM system provides real-time pose and map updates for atomic tasks such as obstacle avoidance, object following, and next-best-view planning.
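To make the projection step concrete, below is a minimal sketch of how the pre-sampled ROV point cloud could be transformed into a buffered past (EOB) view, assuming the SLAM backbone provides camera-to-world poses as 4x4 homogeneous matrices and a pinhole intrinsic matrix K; the function and variable names are illustrative, not the authors' exact implementation.

```python
import numpy as np

def render_rov_into_eob_view(rov_points_body, T_world_cam_now, T_world_cam_eob, K):
    """Project a pre-sampled ROV point cloud (expressed in the current camera/body
    frame) into a past egocentric view selected as the EOB viewpoint.

    rov_points_body : (N, 3) ROV point cloud in the current camera frame
    T_world_cam_now : 4x4 pose of the current camera in the world frame (from SLAM)
    T_world_cam_eob : 4x4 pose of the past (EOB) camera in the world frame
    K               : 3x3 pinhole intrinsic matrix
    """
    # Lift the ROV points to homogeneous coordinates.
    pts_h = np.hstack([rov_points_body, np.ones((rov_points_body.shape[0], 1))])

    # Current camera frame -> world frame -> past (EOB) camera frame.
    T_eob_now = np.linalg.inv(T_world_cam_eob) @ T_world_cam_now
    pts_eob = (T_eob_now @ pts_h.T).T[:, :3]

    # Keep only points in front of the EOB camera, then apply the pinhole model.
    in_front = pts_eob[:, 2] > 1e-6
    pts_eob = pts_eob[in_front]
    pix = (K @ pts_eob.T).T
    pix = pix[:, :2] / pix[:, 2:3]          # perspective division
    return pix, pts_eob[:, 2]               # pixel coordinates and point depths
```

Overlaying the projected points on the buffered EOB image yields the augmented exocentric view; since only a handful of matrix products are involved, the cost of this step is negligible compared to the SLAM front end.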

The monocular vision-only pipeline ensures generalized utility and computational efficiency. The core steps – Ego-to-Exo view synthesis, EOB viewpoint-based parameterization, and ROV point cloud projection – are carried out by closed-form solutions to ensure real-time performance. As opposed to data-driven approaches for exocentric view synthesis, the proposed framework is invariant to changes in waterbody style, scene geometry, and application scenario – making it transferable to existing underwater ROV platforms.


Proof of Concept: 2D Indoor Navigation


The proof-of-concept experiments are conducted with a TurtleBot4, a 2D ground robot that can be teleoperated with egocentric views from its front-facing monocular camera. It has only two degrees of freedom (DOF), linear and angular velocity, which simplifies the motion kinematics for tracking its instantaneous position and orientation. We teleoperate it to collect monocular visual data in office, laboratory, and hallway scenarios.
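For reference, tracking the robot's planar pose from its commanded linear and angular velocities reduces to integrating the standard unicycle model; a minimal sketch (our own notation, not the TurtleBot4 API) is shown below.

```python
import math

def integrate_unicycle(x, y, theta, v, omega, dt):
    """One Euler step of the 2-DOF unicycle model.

    v     : linear velocity along the robot's heading (m/s)
    omega : angular velocity about the vertical axis (rad/s)
    dt    : integration time step (s)
    """
    x_new = x + v * math.cos(theta) * dt
    y_new = y + v * math.sin(theta) * dt
    theta_new = (theta + omega * dt) % (2.0 * math.pi)
    return x_new, y_new, theta_new
```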

As shown in this figure, we use standard checkerboard corners as reference points in the egocentric views and then evaluate the reprojection errors of those points in the exocentric views. This test is iterated over different sets of past egocentric images, each corresponding to a different EOB distance. The checkerboard is thus viewed from different EOB distances (further back into the past), indicated by the parameter f, defined as the number of frames between the current egocentric view and the selected EOB view. We observe that the estimation is accurate for lower values of f and gradually degrades for f > 100.
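The error metric itself is straightforward: treating the corners detected directly in the EOB image as ground truth and the corners re-projected from the current egocentric view as estimates, the per-frame RMS reprojection error can be computed as in the sketch below (the checkerboard pattern size is an assumed value, and consistent corner ordering between the two sets is taken for granted).

```python
import numpy as np
import cv2

def checkerboard_reprojection_error(img_eob_gray, projected_corners, pattern_size=(7, 9)):
    """RMS reprojection error of checkerboard corners in a past (EOB) view.

    img_eob_gray      : grayscale EOB image in which the checkerboard is visible
    projected_corners : (N, 2) corner locations predicted by re-projecting the
                        corners observed in the current egocentric view
    pattern_size      : inner-corner grid of the checkerboard (assumed, not the
                        actual board used in the experiments)
    """
    found, detected = cv2.findChessboardCorners(img_eob_gray, pattern_size)
    if not found:
        return None
    detected = detected.reshape(-1, 2).astype(np.float64)
    errors = np.linalg.norm(detected - projected_corners, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))
```

Repeating this over image sets buffered at increasing f values yields the error-versus-EOB-distance trend described above.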

We also visualize the accuracy of the estimated ground plane in these experiments. As shown in the two sample images, the estimated ground plane (and the sample cube drawn on it) confirms the geometric accuracy for the f = 70 case. In contrast, the misaligned ground plane for the f = 260 case reveals the underlying errors in pose estimation as well as in the reprojection process.
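One common way to render such a sanity-check cube, assuming the pose of the estimated ground-plane frame relative to the EOB camera is available as an OpenCV rotation/translation pair (rvec, tvec), is sketched below; this is an illustrative visualization recipe rather than the paper's exact rendering code.

```python
import numpy as np
import cv2

def draw_ground_cube(img, K, dist, rvec, tvec, side=0.2):
    """Draw a wireframe cube sitting on the estimated ground plane.

    rvec, tvec : pose of the ground-plane frame in the (EOB) camera frame
    side       : cube edge length in meters (arbitrary choice)
    The plane frame is assumed to follow the OpenCV chessboard convention
    (z-axis pointing into the ground), so the cube extends along -z.
    """
    s = side
    cube = np.float32([[0, 0, 0], [s, 0, 0], [s, s, 0], [0, s, 0],
                       [0, 0, -s], [s, 0, -s], [s, s, -s], [0, s, -s]])
    pix, _ = cv2.projectPoints(cube, rvec, tvec, K, dist)
    pix = pix.reshape(-1, 2)
    edges = [(0, 1), (1, 2), (2, 3), (3, 0),      # base
             (0, 4), (1, 5), (2, 6), (3, 7),      # pillars
             (4, 5), (5, 6), (6, 7), (7, 4)]      # top
    for i, j in edges:
        p1 = (int(pix[i][0]), int(pix[i][1]))
        p2 = (int(pix[j][0]), int(pix[j][1]))
        cv2.line(img, p1, p2, (0, 255, 0), 2)
    return img
```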

Field Deployment: 3D Underwater Cave Exploration


Experimental setup. We extend our experiments to underwater cave exploration scenarios, where the ROV possesses full 6-DOF motion. For remote teleoperation, we consider scenarios where human operators maneuver an underwater ROV from the surface by following the caveline and other navigation markers as guides. The mission objective is to navigate the ROV 75-300 feet deep inside the cave through its complex structures, and then safely return it to the surface. In addition to evaluating geometric accuracy and robustness, we assess how informative the generated Ego-to-Exo views are compared to traditional consoles for underwater ROV teleoperation.

Real-time map update and teleoperation. In addition to the Ego-to-Exo view generation and ROV pose rendering, our framework simultaneously updates a 3D map with feature points extracted by the SLAM system. The above figure shows an ROV's trajectory mapped during an underwater cave mission in Devil's Springs, Florida. The popup frames show samples of: (a) egocentric images; (b) synthesized exocentric images with rendered ROV pose; (c) the underlying camera pose updates; and (d) an exocentric view of the 3D map. As seen, the generated Ego-to-Exo views embed significantly more peripheral information than the raw egocentric views. The exocentric view of the ROV pose and its relative distance from cave walls or overhead obstacles help surface operators with obstacle avoidance and efficient decision-making. Additionally, the 3D map shows the ROV's past trajectory and its current pose, which helps operators analyze mission progress; this is not possible in traditional teleoperation consoles. Such a global view of the trajectory map is also useful during emergency evacuation and recovery. Beyond underwater cave exploration, these features will be crucial in ROV-based subsea surveillance and search-and-rescue operations as well.


Subjective User Study


15 human subjects evaluate the ease of operation with our developed console and compare it to traditional consoles. Their feedback is recorded using the System Usability Scale (SUS), with our interface achieving an average SUS score of 77.5. We also formulate an independent set of questions that reflects the teleoperators' preferences for the novel features of our method. The individual questions and corresponding scores are presented in the table below; a brief sketch of how the SUS score is computed follows the table.


# Question Mean, Std
System Usability Scale (SUS)
1 I think that I would like to use this system frequently. 4.3, 0.6
2 I found the system unnecessarily complex. 2.0, 0.7
3 I thought the system was easy to use. 4.3, 0.4
4 I think that I would need the support of a technical person to be able to use this system. 2.0, 0.6
5 I found the various functions in this system were well integrated. 4.0, 0.8
6 I thought there was too much inconsistency in this system. 2.3, 0.7
7 I would imagine that most people would learn to use this system very quickly. 4.4, 0.5
8 I found the system very cumbersome to use. 2.0, 0.6
9 I felt very confident using the system. 3.7, 0.7
10 I needed to learn a lot of things before I could get going with this system. 1.4, 0.5
Custom Questions
11 The proposed exocentric view is beneficial for ROV teleoperation. 4.5, 0.5
12 I found the EOB distance tuning feature useful to get the best view. 4.5, 0.5
13 The generated 3D map provides a better understanding of the ROV's global location and its surroundings. 4.6, 0.9
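For reference, SUS shifts each odd-numbered (positively worded) item down by one, subtracts each even-numbered (negatively worded) item from five, and scales the total by 2.5. Because the formula is linear, applying it to the per-item means in the table reproduces the reported average score:

```python
# Mean responses to SUS items 1-10 from the table above.
means = [4.3, 2.0, 4.3, 2.0, 4.0, 2.3, 4.4, 2.0, 3.7, 1.4]

# Odd items (1-indexed) contribute (score - 1); even items contribute (5 - score).
contrib = [(m - 1) if i % 2 == 0 else (5 - m) for i, m in enumerate(means)]
sus = 2.5 * sum(contrib)
print(round(sus, 1))  # 77.5
```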

Practical Usage in Underwater ROV Teleoperation


Multiple augmented viewpoints. This figure shows our expedition in cave segments at Devil's Springs, Florida. The experiment reveals that when ROVs move slowly against strong currents, extending the viewpoint distance of the generated exocentric views can significantly improve teleoperation. This is achieved by tuning the queue parameters r, c, and n in the proposed TeleOp interface. We consistently find that exocentric views are most informative when generated from viewpoints about 5-10 seconds behind the ROV's current position during navigation. The multiple preceding views offered by our interface are particularly useful for mapping large structures such as newly discovered cave segments or shipwrecks. As this figure shows, the past ROV trajectories provide more spatial context in the augmented map, enabling operators to control the ROV efficiently around complex underwater structures.
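As one plausible realization of such a view buffer (the paper parameterizes its queue with r, c, and n, whose exact roles are not reproduced here), a fixed-size buffer of past frames and SLAM poses can serve several EOB viewpoints at increasing temporal offsets, roughly matching the 5-10 second range noted above:

```python
from collections import deque

class EgoViewBuffer:
    """Fixed-size buffer of past egocentric frames and their SLAM poses.

    maxlen and the default offsets are hypothetical values chosen for a
    30 fps camera; they are not the parameters used in the paper.
    """
    def __init__(self, maxlen=600):
        self.buf = deque(maxlen=maxlen)   # entries: (frame, T_world_cam)

    def push(self, frame, T_world_cam):
        self.buf.append((frame, T_world_cam))

    def eob_views(self, offsets=(60, 150, 300)):
        """Return EOB views at several frame offsets behind the current frame,
        e.g. roughly 2 s, 5 s, and 10 s back at 30 fps."""
        views = []
        for f in offsets:
            if f < len(self.buf):
                views.append(self.buf[-1 - f])
        return views
```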


Safer navigation in hazy low-light conditions. Underwater caves present a unique formation of silt and sediment on their floors, resulting from erosion over extended periods. The silt is susceptible to disturbance from external factors, such as the motion of underwater ROVs or the turbulence generated by their propellers. Although ROV operators pay close attention to avoiding contact with the floor and cave walls, it is often unavoidable due to buoyancy imbalance and strong water flow. Dislodging the sediment results in cloudy or hazy conditions that obscure visibility. Bright lights from the ROV reflect off these suspended particles, making it even more challenging to capture clear imagery of the surroundings. In such cases, third-person EOB views from behind the ROV offer a clearer and more informative perspective for navigation; see the figure on the right. They improve spatial awareness and help the operator safely move away from the sediment formations toward open, accessible areas while avoiding obstructing other scuba divers in the process.


Acknowledgments


This work is supported in part by the NSF grants 2330416, 1943205, and 2024741. The authors would like to acknowledge the help from Woodville Karst Plain Project (WKPP), El Centro Investigador del Sistema Acuífero de Quintana Roo A.C. (CINDAQ), Global Underwater Explorers (GUE), Ricardo Constantino, and Project Baseline in providing access to challenging underwater caves.