CavePI: Autonomous Exploration of Underwater Caves by Semantic Guidance


Overview




Enabling autonomous robots to safely and efficiently navigate, explore, and map underwater caves is of significant importance to water resource management, hydrogeology, archaeology, and marine robotics. In this project, we demonstrate the system design and algorithmic integration of a visual servoing framework for semantically guided autonomous underwater cave exploration. We present the hardware and edge-AI design considerations to deploy this framework on a novel AUV named CavePI. The guided navigation is driven by a computationally light yet robust deep visual perception module, delivering a rich semantic understanding of the environment. Subsequently, a robust control mechanism enables CavePI to track the semantic guides and navigate within complex cave structures. We evaluate the CavePI system through field experiments in underwater caves and spring-water sites, and further validate its ROS-based digital twin in a simulation environment. Our results highlight how these integrated design choices facilitate reliable navigation under feature-deprived, GPS-denied, and low-visibility conditions.



CavePI System Design



The CavePI platform includes visual and acoustic sensors. A front-facing fisheye camera, housed within a transparent dome at the head of the AUV, captures forward visuals with a 160° field-of-view (FOV). The cylindrical enclosure is a 6" tube while the dome has a 4" diameter; a custom-built interface connects the two. A downward-facing low-light camera, mounted inside the computational enclosure, captures downward visuals with an 80° × 64° FOV for caveline detection and navigation. Additionally, a Ping2 sonar altimeter-echosounder from Blue Robotics™ is mounted on the underside of the robot; with a range of 100 meters, it detects obstacles and terrain beneath CavePI. These sensory components collectively provide robust environmental awareness for autonomous navigation in challenging underwater environments.
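For illustration, the sketch below polls the Ping2 sonar for altitude readings using Blue Robotics' ping-python (brping) library, which communicates via the Ping1D message protocol; the serial device path and baud rate are assumptions, not values from our setup.

```python
# Minimal sketch: polling the Ping2 altimeter-echosounder for distance
# readings via Blue Robotics' ping-python library (pip install bluerobotics-ping).
import time

from brping import Ping1D

sonar = Ping1D()
sonar.connect_serial("/dev/ttyUSB0", 115200)  # hypothetical device path and baud rate

if sonar.initialize() is False:
    raise RuntimeError("Failed to initialize Ping2 sonar")

while True:
    data = sonar.get_distance()  # dict with distance (mm) and confidence (%)
    if data:
        print(f"altitude: {data['distance'] / 1000.0:.2f} m "
              f"(confidence: {data['confidence']}%)")
    time.sleep(0.1)
```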




The computational and electronic components of CavePI are housed within an acrylic cylindrical enclosure with a wall thickness of 6.35 mm and a depth rating of 65 meters. The computational elements include a Raspberry Pi-5, an Nvidia™ Jetson Nano, and a Pixhawk™ flight controller. The Jetson Nano is dedicated to processing visual data from the cameras, performing image processing tasks critical for scene perception and state estimation. The Raspberry Pi-5 manages the planning and control modules, ensuring real-time underwater navigation. The Pixhawk flight controller acts as a bridge between hardware and software, receiving actuation commands from the Raspberry Pi-5 and transmitting them to the thrusters and lights via the MAVLink communication protocol. Additionally, the Pixhawk integrates a 9-DOF IMU, offering 3-axis gyroscope, accelerometer, and magnetometer measurements, which are used to estimate the attitude of CavePI during underwater operations. The enclosure also contains a 14.8 V (18 Ah) battery pack, voltage regulators, electronic speed controllers (ESCs) that drive the three-phase brushless motors in the thrusters, and a Bar-30 pressure sensor. The pressure data from the Bar-30 sensor is processed to determine CavePI's underwater depth, ensuring reliable and accurate interoceptive perception during operations. The end-to-end integration of CavePI ensures that each computational component operates in sync, tied to a ROS2 Humble-based middleware backbone. The modular design also allows for future upgrades, ensuring that CavePI can be tailored to meet evolving research needs in marine ecosystem exploration and monitoring.
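As a concrete illustration of the interoceptive pipeline, the sketch below converts an absolute Bar-30 pressure reading into depth using the hydrostatic relation; the density, gravity, and surface-pressure constants are assumptions (freshwater values, not taken from our calibration).

```python
# Minimal sketch: converting a Bar-30 pressure reading to depth via the
# hydrostatic relation depth = (P - P_atm) / (rho * g).
FRESHWATER_DENSITY = 997.0       # kg/m^3 (assumption; ~1025 for seawater)
GRAVITY = 9.80665                # m/s^2
ATMOSPHERIC_PRESSURE = 101325.0  # Pa at the surface (assumption)

def pressure_to_depth(pressure_pa: float) -> float:
    """Estimate depth (m) below the surface from absolute pressure (Pa)."""
    return (pressure_pa - ATMOSPHERIC_PRESSURE) / (FRESHWATER_DENSITY * GRAVITY)

# Example: a 1.5 bar absolute reading corresponds to roughly 5 m of freshwater.
print(f"{pressure_to_depth(150000.0):.2f} m")
```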



CavePI Digital Twin




We develop a digital twin model of CavePI using the Unified Robot Description Format (URDF), with links and joints carefully assigned to represent the various CAD components designed in SolidWorks. To replicate the sensor suite of the physical CavePI, Gazebo plugins are integrated to simulate the front-facing camera, down-facing camera, IMU, pressure sensor, and sonar. Additional plugins are employed to simulate environmental forces, including buoyancy, thrust, and hydrodynamic drag, thereby enhancing the physical realism. A controlled open-water scenario is created in Gazebo to simulate realistic missions, featuring a thin line arranged in a rectangular loop to mimic a caveline. Since the simulated environment lacks real-world perception challenges such as low light or turbid water, simpler edge detection and contour extraction techniques are used to identify the caveline from the down-facing camera feed instead of deploying computationally intensive deep visual learning models. The remaining navigation and control subsystems mirror the real-world implementation and operate via ROS nodes.
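Since the simulator provides clean imagery, the caveline detector can remain classical. A minimal OpenCV sketch of the edge-detection-and-contour pipeline described above follows; the thresholds, arc-length cutoff, and frame source are illustrative assumptions.

```python
# Minimal sketch: classical caveline detection in simulation via
# Canny edge detection followed by contour extraction.
import cv2

frame = cv2.imread("down_cam_frame.png")         # hypothetical simulated frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)      # suppress rendering noise
edges = cv2.Canny(blurred, 50, 150)              # thresholds tuned empirically

# Extract external contours and keep only long, line-like candidates.
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
caveline = [c for c in contours if cv2.arcLength(c, False) > 100.0]
print(f"{len(caveline)} caveline contour(s) detected")
```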


CavePI Navigation Pipeline




The CavePI AUV is designed to autonomously navigate underwater by following a caveline and other navigation markers. However, a caveline appears only a few pixels wide in the bottom-facing camera's FOV and is particularly challenging to detect in noisy low-light conditions. Object detection models generate bounding boxes around the target, which, given the caveline's irregular shape and orientation, can encompass large background regions, complicating the estimation of the heading angle. To address this, we opt for semantic segmentation, which provides pixel-level contours of the caveline for more precise tracking-by-detection. Onboard resource constraints of CavePI strongly influence the choice of model architecture. The Jetson Nano allocates GPU memory dynamically from a limited 2 GB pool, which must also accommodate the dual camera feeds and the Nano-Pi control signal communication. Considering these limitations, we selected a lightweight architecture with a MobileNetV3 backbone and a DeepLabV3 head, requiring only about 314 MB of GPU memory for its 11 million parameters; however, it at times proved insufficient for accurately segmenting the thin caveline in turbid water conditions.
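For reference, a model of this class can be instantiated directly from torchvision, as sketched below; the class count, input resolution, and absence of pretrained weights are illustrative assumptions rather than our exact training setup.

```python
# Minimal sketch: DeepLabV3 segmentation head with a MobileNetV3-Large
# backbone, as provided by torchvision.
import torch
from torchvision.models.segmentation import deeplabv3_mobilenet_v3_large

model = deeplabv3_mobilenet_v3_large(num_classes=2)  # caveline vs. background (assumption)
model.eval()

with torch.no_grad():
    frame = torch.rand(1, 3, 480, 640)   # placeholder camera frame
    logits = model(frame)["out"]         # (1, 2, 480, 640) per-pixel class scores
    mask = logits.argmax(dim=1)          # pixel-level segmentation map
print(mask.shape)  # torch.Size([1, 480, 640])
```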


Path planning uses a Pure Pursuit controller to track a caveline laid out in any shape. It begins by extracting caveline contours, C, from the segmentation map I, for semantic guidance. Since the caveline is often detected as fragmented contours, the center of the farthest contour is identified and selected as the next waypoint in CavePI's path. The heading angle to this waypoint is then calculated with respect to the image center, about the x-axis of the image frame, following the right-hand rule. This heading angle serves as a high-level navigation command, which is transmitted to the Raspberry Pi-5 for execution within the computational subsystem.
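The sketch below illustrates the waypoint and heading computation described above; it assumes "farthest" means farthest from the image center and that OpenCV contours are available from the segmentation map, so the function names and conventions are illustrative.

```python
# Minimal sketch: select the farthest caveline fragment as the next
# waypoint and compute its heading about the image x-axis (right-hand rule).
import math
import cv2
import numpy as np

def contour_center(contour: np.ndarray) -> tuple[float, float]:
    """Centroid of a contour via image moments."""
    m = cv2.moments(contour)
    if m["m00"] == 0:  # degenerate fragment; fall back to the point mean
        pts = contour.reshape(-1, 2)
        return float(pts[:, 0].mean()), float(pts[:, 1].mean())
    return m["m10"] / m["m00"], m["m01"] / m["m00"]

def heading_from_contours(contours, image_shape) -> float:
    """Heading angle (rad) from the image center to the farthest contour."""
    h, w = image_shape[:2]
    cx, cy = w / 2.0, h / 2.0
    centers = [contour_center(c) for c in contours]
    # The fragment farthest from the image center becomes the next waypoint.
    wx, wy = max(centers, key=lambda p: math.hypot(p[0] - cx, p[1] - cy))
    # Image y grows downward, so negate dy for a right-handed convention.
    return math.atan2(-(wy - cy), wx - cx)
```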



Experimental Analysis



Indoor Experimentation in the Laboratory Tank

The line-following experiments are initially conducted in a 2 m × 3 m laboratory water tank with a maximum depth of 1.5 m. In the first setup, a line is laid on the tank floor in a rectangular loop, without any depth variation. Subsequently, we add small obstacles and vertical slopes to the tank to simulate rough underwater terrain. Upon detecting the line via the down-facing camera, CavePI autonomously follows the loop while maintaining a specified depth. The system's depth-holding capability is assessed as the difference between the actual depth, measured from depth sensor data, and the target depth. The line-following accuracy is quantified by the tracking error, defined as the distance, on the plane of the line, between the line and the optical center of the down-facing camera.




Outdoor Experimentation in Open-Water Springs

These experiments are conducted in shallow riverine areas (2 m - 6 m depth) near spring outlets. They comprised 15 open-water trials in which a 20-meter rope line was laid in both irregular loop patterns and linear configurations along the uneven riverbed. The field results exhibit greater variation than the controlled laboratory experiments due to environmental disturbances such as strong currents. Despite these challenges, the depth controller demonstrated robust performance, maintaining depth within ±10 cm of the target without requiring trial-specific parameter tuning. In contrast, the heading controller proved less resilient under strong currents, resulting in considerably higher tracking errors.




Outdoor Experimentation in a Cave/Grotto


These experiments include 10 trials inside deep natural underwater grottos and caves (15 m - 30 m depth). The cave trials used an actual caveline, arranged in a linear pattern, which varied in color, texture, and thickness. Despite being equipped with two onboard lights, CavePI faced considerable challenges in low-light environments due to glare and backscattering effects, which impeded accurate semantic perception. Nighttime trials conducted in an underwater cave/grotto system revealed that the lightweight segmentation models struggled to detect the caveline from camera images. During two one-hour dives, support divers reported instances where submerged tree roots, mosses, and other thin structures were mistakenly identified as the caveline. However, once manually repositioned onto the caveline by the divers, CavePI was able to maintain tracking for up to one minute.





Acknowledgments



This work is supported in part by the UF research grant #132763 and NSF award #2330416. We are thankful to Dr. Nare Karapetyan, Ruo Chen, and David Blow for facilitating our field trials at Ginnie open-water springs and Blue Grotto.