Underwater ROVs (Remotely Operated Vehicles) are unmanned submersible vehicles designed for exploring and
operating in the depths of the ocean. Despite using high-end cameras, typical teleoperation engines based on first-
person (egocentric) views limit a surface operator's ability to maneuver and navigate the ROV in complex deep-water
missions.
In this paper, we present an interactive teleoperation interface that (i) offers on-demand “third”-person (exocentric)
visuals from past egocentric views, and (ii) provides enhanced peripheral information with the augmented ROV pose in real time.
We achieve this by integrating a 3D geometry-based Ego-to-Exo view synthesis algorithm into a monocular SLAM system
for accurate trajectory estimation. The proposed closed-form solution only uses past egocentric views from the ROV and a
SLAM backbone for pose estimation, which makes it portable to existing ROV platforms. Unlike data-driven solutions, it is
invariant to applications and waterbody-specific scenes.
We validate the geometric accuracy of the proposed framework through extensive experiments on 2-DOF
indoor navigation and 6-DOF underwater cave exploration in challenging low-light conditions. A subjective evaluation with 15
human teleoperators further confirms the effectiveness of the integrated features for improved teleoperation. We demonstrate
the benefits of Ego-to-Exo view generation and real-time pose rendering for remote ROV teleoperation by following navigation
guides such as cavelines inside underwater caves. This new way of interactive ROV teleoperation opens up promising
opportunities for future research in subsea telerobotics.
Ego-to-Exo Pipeline
We design an efficient Ego-to-Exo (egocentric to exocentric) view generation framework integrated into a
monocular visual SLAM system for underwater ROV teleoperation. The proposed Ego-to-Exo algorithm keeps track
of the ROV camera poses and exploits a buffer of egocentric views for exocentric view synthesis. We then transform
and project a pre-sampled 3D model of the ROV, in the form of a point cloud, into those views to generate realistic
augmented visuals with more peripheral information. Such views offer comprehensive information of the
surrounding scene with global semantics. In addition to the views, the integrated SLAM system provides real-time pose
and map updates for atomic tasks such as obstacle avoidance, object following, next-best-view planning, etc.
The monocular vision-only pipeline ensures generalized utility and computational efficiency. The proposed Ego-to-Exo
solution, EOB viewpoint-based parameterization, and ROV point cloud projection are all carried out by closed-form
solutions to ensure real-time performance. As opposed to data-driven approaches for exocentric view synthesis, the
proposed framework is invariant to changes in waterbody style, scene geometry, and application scenario, which makes it
transferable to existing underwater ROV platforms.
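The projection step described above can be illustrated with a minimal pinhole-camera sketch. This is not the authors' implementation: the function names, the 4x4 pose convention (body-to-world and camera-to-world), and the intrinsic matrix `K` are all assumptions made for illustration, and lens distortion is ignored.

```python
import numpy as np

def project_points(points_world, R_cw, t_cw, K):
    """Project Nx3 world points with a pinhole model.

    R_cw, t_cw map world coordinates into the camera frame (assumed convention).
    Returns the Mx2 pixel coordinates of points in front of the camera and
    the boolean mask selecting those points.
    """
    pts_cam = points_world @ R_cw.T + t_cw      # world -> camera frame
    in_front = pts_cam[:, 2] > 1e-6             # drop points behind the camera
    pts_cam = pts_cam[in_front]
    uv_h = pts_cam @ K.T                        # apply intrinsics (homogeneous pixels)
    return uv_h[:, :2] / uv_h[:, 2:3], in_front

def render_rov_in_past_view(model_pts, T_w_rov, T_w_past, K):
    """Project a pre-sampled ROV point cloud into a buffered past view.

    model_pts: Nx3 points in the ROV body frame (hypothetical model).
    T_w_rov:   4x4 current ROV pose, body -> world (e.g. from SLAM).
    T_w_past:  4x4 pose of the selected past camera, camera -> world.
    """
    pts_world = model_pts @ T_w_rov[:3, :3].T + T_w_rov[:3, 3]
    T_past_w = np.linalg.inv(T_w_past)          # world -> past camera
    return project_points(pts_world, T_past_w[:3, :3], T_past_w[:3, 3], K)

# Usage: the ROV sits 2 m in front of a past camera placed at the world origin.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
T_w_rov = np.eye(4)
T_w_rov[2, 3] = 2.0
model = np.array([[0.0, 0.0, 0.0], [0.2, 0.0, 0.0]])
uv, visible = render_rov_in_past_view(model, T_w_rov, np.eye(4), K)
```

In this toy setup the model origin lands on the principal point (320, 240). In practice the projected points would be drawn over the buffered egocentric image to form the augmented exocentric view.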
Proof of Concept: 2D Indoor Navigation
The proof-of-concept experiments are conducted with TurtleBot4, a 2D ground robot that can
be teleoperated with egocentric views from its front-facing monocular camera. It has only two degrees of freedom
(DOF), linear and angular velocity, which simplifies the motion kinematics for tracking its instantaneous position and
orientation. We teleoperate it to collect monocular visual data in office, laboratory, and hallway scenarios.
As shown in this figure, we use standard checkerboard corners as reference points from egocentric views and then evaluate the
reprojection errors for those points from exocentric views. This test is iterated over different sets of past egocentric
images, each corresponding to a different EOB distance. A checkerboard is viewed from different
EOB distances (further back into the past) indicated by the parameter f. More specifically, f is the number of frames
between the current egocentric view and the selected EOB view. We observe that the estimation is accurate for
lower values of f, and gradually degrades for f > 100.
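The reprojection test can be summarized by a single number per value of f. The sketch below assumes the metric is the mean Euclidean pixel distance between the reprojected checkerboard corners and their reference locations; the text does not state the exact metric, and the function name is hypothetical.

```python
import numpy as np

def mean_reprojection_error(projected_uv, observed_uv):
    """Mean pixel distance between reprojected and reference corner locations.

    Both inputs are Nx2 arrays of (u, v) pixel coordinates in the same order.
    """
    projected_uv = np.asarray(projected_uv, dtype=float)
    observed_uv = np.asarray(observed_uv, dtype=float)
    if projected_uv.shape != observed_uv.shape:
        raise ValueError("point sets must have the same shape")
    return float(np.mean(np.linalg.norm(projected_uv - observed_uv, axis=1)))

# Usage with made-up corner coordinates (not data from the experiments).
reference = [[100.0, 100.0], [150.0, 100.0]]
reprojected = [[100.0, 100.0], [153.0, 104.0]]
error_px = mean_reprojection_error(reprojected, reference)
```

Evaluating this quantity once per EOB distance gives an error-versus-f curve of the kind discussed above.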
We also visualize the accuracy of the estimated ground plane
in these experiments. As shown in the two sample images, the estimated ground plane (and the drawn sample cube) validates the
geometric accuracy for the f = 70 case. On the other hand, the misaligned ground plane for the f = 260 case demonstrates
the underlying error in pose estimation as well as in the reprojection process.
Field Deployment: 3D Underwater Cave Exploration
Experimental setup.
We extend our experiments to underwater cave exploration scenarios, where the ROV possesses full 6-DOF motion.
For remote teleoperation, we consider the scenarios where human operators
maneuver an underwater ROV from the surface by following the caveline and other navigation markers as a guide.
The mission objective is to navigate the ROV 75-300 feet deep inside the cave through its complex structures, and
then safely return it to the surface. In addition to evaluating the geometric accuracy and robustness, we consider how
informative the generated Ego-to-Exo views are compared to traditional consoles for underwater ROV teleoperation.
Real-time map update and teleoperation.
In addition to the Ego-to-Exo view generation and ROV pose rendering, our
framework simultaneously updates a 3D map with extracted feature points from the SLAM system. The above figure shows an
ROV's trajectory mapped during an underwater cave mission in Devil’s Springs, Florida. The popup frames show samples of:
(a) egocentric images; (b) synthesized exocentric images with rendered ROV pose; (c) the underlying camera pose updates;
and (d) an exocentric view of the 3D map. As seen, the generated Ego-to-Exo views embed significantly more peripheral information
compared to the existing egocentric views. The exocentric view of the ROV pose and its relative distance from cave walls or overhead obstacles
are useful to surface operators for obstacle avoidance and efficient decision-making. Additionally, the 3D map shows
the ROV's past trajectory and its current pose, which help operators analyze mission progress; this is not possible
in traditional teleoperation consoles. Such a global view of the trajectory map is also useful during emergency evacuation
and recovery. Beyond underwater cave exploration, these features will be crucial in ROV-based subsea surveillance and
search-and-rescue operations as well.
Subjective User Study
Fifteen human subjects evaluate the ease of operation with our
developed console and compare it to traditional consoles.
Their feedback is recorded using the System Usability Scale
(SUS), with our interface achieving an average SUS
score of 77.5. We also formulate an independent set of
questions that reflect the teleoperator's preference for the
novel features of our method. The individual questions and
corresponding scores are presented in the table below.
| #  | Question | Mean, Std |
|----|----------|-----------|
|    | System Usability Scale (SUS) | |
| 1  | I think that I would like to use this system frequently. | 4.3, 0.6 |
| 2  | I found the system unnecessarily complex. | 2.0, 0.7 |
| 3  | I thought the system was easy to use. | 4.3, 0.4 |
| 4  | I think that I would need the support of a technical person to be able to use this system. | 2.0, 0.6 |
| 5  | I found the various functions in this system were well integrated. | 4.0, 0.8 |
| 6  | I thought there was too much inconsistency in this system. | 2.3, 0.7 |
| 7  | I would imagine that most people would learn to use this system very quickly. | 4.4, 0.5 |
| 8  | I found the system very cumbersome to use. | 2.0, 0.6 |
| 9  | I felt very confident using the system. | 3.7, 0.7 |
| 10 | I needed to learn a lot of things before I could get going with this system. | 1.4, 0.5 |
|    | Custom Questions | |
| 11 | The proposed exocentric view is beneficial for ROV teleoperation. | 4.5, 0.5 |
| 12 | I found the EOB distance tuning feature useful to get the best view. | 4.5, 0.5 |
| 13 | The generated 3D map provides a better understanding of the ROV's global location and its surroundings. | 4.6, 0.9 |
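For readers unfamiliar with SUS scoring, the sketch below applies the standard SUS formula to the ten mean item scores listed above. Odd-numbered items contribute (score - 1), even-numbered items contribute (5 - score), and the sum is scaled by 2.5. Because the formula is linear, applying it to the per-item means should give the same value as averaging per-subject scores, up to the rounding of the reported means. The function name is hypothetical.

```python
def sus_score(item_scores):
    """Standard SUS score (0-100) from ten item scores on a 1-5 scale."""
    if len(item_scores) != 10:
        raise ValueError("SUS requires exactly 10 item scores")
    total = 0.0
    for number, score in enumerate(item_scores, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (score - 1) if number % 2 == 1 else (5 - score)
    return total * 2.5

# Mean item scores for questions 1-10, copied from the table.
reported_means = [4.3, 2.0, 4.3, 2.0, 4.0, 2.3, 4.4, 2.0, 3.7, 1.4]
average_sus = sus_score(reported_means)
```

With these means the result is approximately 77.5, consistent with the average SUS score reported in the text. The custom questions 11-13 are not part of the SUS computation.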
Practical Usage in Underwater ROV Teleoperation
Multiple augmented viewpoints.
This figure shows our expedition in cave segments at Devil's Springs, Florida. The experiment reveals that when ROVs move slowly against strong currents,
extending the viewpoint distance of the generated exocentric views can significantly improve teleoperation. This is
achieved by tuning the queue parameters r, c, and n in the proposed TeleOp interface. We consistently find that
exocentric views are most informative when synthesized from egocentric frames captured about 5-10 seconds before the current ROV position.
The multiple preceding views offered by our interface are particularly useful for mapping large structures such as newly
discovered cave segments or shipwrecks. As this figure shows, the past ROV trajectories provide more spatial context in
the augmented map, enabling operators to control the ROV efficiently around complex underwater structures.
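The idea of looking back a fixed amount of time can be sketched with a simple bounded buffer of past views. This sketch does not model the interface's queue parameters r, c, and n, whose definitions are not given here. The class name, the frame rate, and the clamping behavior are all assumptions for illustration.

```python
from collections import deque

class EgoViewBuffer:
    """Bounded buffer of past egocentric views and their camera poses."""

    def __init__(self, max_frames):
        self._frames = deque(maxlen=max_frames)  # oldest entries drop automatically

    def push(self, image, pose):
        self._frames.append((image, pose))

    def lookback(self, f):
        """Return the (image, pose) stored f frames before the newest one.

        If fewer than f + 1 frames are buffered, the oldest frame is returned.
        """
        if not self._frames:
            raise LookupError("buffer is empty")
        if f < 0:
            raise ValueError("f must be non-negative")
        index = max(len(self._frames) - 1 - f, 0)
        return self._frames[index]

def seconds_to_frames(seconds, fps):
    """Convert a look-back duration into a frame offset f (fps is assumed known)."""
    return int(round(seconds * fps))

# Usage: look back 5 seconds at an assumed 20 frames per second.
buffer = EgoViewBuffer(max_frames=300)
for k in range(400):
    buffer.push(f"frame-{k}", None)   # placeholder images and poses
f = seconds_to_frames(5, fps=20)
past_image, past_pose = buffer.lookback(f)
```

A slower-moving ROV covers less distance per frame, so a larger look-back is needed for the same viewpoint distance, which matches the observation about strong currents above.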
Safer navigation in hazy low-light conditions.
Underwater caves present a unique formation of silt and sediment on their floor that results from erosion over extended periods.
The silt is susceptible to disturbance from external factors, such as the motion of underwater ROVs or the turbulence
generated by their propellers. Although ROV operators take great care to avoid contact with the floor and cave walls,
contact is often unavoidable due to buoyancy imbalance and strong water flow. Dislodging the sediments results in
cloudy or hazy conditions that obscure visibility. Bright lights from the ROV reflect from these suspended particles
and make it even more challenging to capture clear imagery of the surroundings. In such cases, third-person EOB views
from behind the ROV offer a clearer and more informative perspective for navigation; see the figure on the right. This perspective improves spatial
awareness and helps the operator safely move away from the sediment formations toward open, accessible areas and
avoid obstructing other scuba divers in the process.
Acknowledgments
This work is supported in part by the NSF grants 2330416, 1943205, and 2024741. The authors would like to acknowledge
the help from Woodville Karst Plain Project (WKPP), El Centro Investigador del Sistema Acuífero de Quintana
Roo A.C. (CINDAQ), Global Underwater Explorers (GUE), Ricardo Constantino, and Project Baseline in providing access
to challenging underwater caves.