Abstract:
Objective Fusion of visible light and infrared images of ship targets is a key technology for target recognition and characteristic research. In complex environments, traditional registration methods face significant differences in imaging conditions, difficult feature matching, and low accuracy, and they struggle to meet real-time and robustness requirements. Neural Radiance Fields (NeRF) have demonstrated substantial potential in 3D scene reconstruction and view synthesis; however, existing NeRF methods primarily target single-modal images and lack effective support for multi-modal registration and fusion. To address these limitations, this paper proposes an improved NeRF-based method for intelligent augmentation, registration, and fusion of ship visible light and infrared images, aiming to enhance detection and recognition accuracy.
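As general background on the rendering core that any NeRF-based reconstruction relies on, the color of a pixel along a camera ray r(t) = o + t d is given by the standard volume rendering integral from the original NeRF formulation (this is textbook background, not the specific network proposed here):

```latex
C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t),\mathbf{d})\,dt,
\qquad
T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right),
```

where \(\sigma\) is the volume density and \(\mathbf{c}\) is the view-dependent radiance predicted by the network.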
Methods This paper constructs a multi-modal image registration and fusion system based on an improved NeRF, with the overall workflow illustrated in Fig.1. A multi-modal data acquisition platform is established for ship target imaging experiments in the visible light and infrared bands, with acquisition scenes shown in Fig.3 and the system configuration in Fig.4. A multi-modal NeRF reconstruction network is developed to generate 3D point cloud models, with spatial relationships depicted in Fig.5 and reconstructed models in Fig.6. A cross-modal geometric consistency constraint mechanism is designed to refine the pre-registration point cloud relationship (Fig.7(a)) into a precisely aligned state (Fig.7(b)), achieving sub-pixel registration through a joint optimization loss function (a sketch of this loss and of the fusion weighting is given below). An adaptive weight fusion strategy incorporating multi-scale attention mechanisms is employed to dynamically adjust the contribution of each modality, and it is validated against traditional fusion methods (MS-SRIF, PCA-MSIF, CNN-LPIF), with fusion results shown in Fig.12. The system's registration accuracy and fusion quality are comprehensively verified on simulated and real ship datasets.
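A minimal sketch of what a joint optimization loss of this kind could look like, assuming PyTorch; the three component terms (photometric reconstruction, cross-modal geometric consistency between matched visible/infrared points, and a rotation regularizer) and all weights are illustrative assumptions, not the paper's exact implementation:

```python
import torch

def joint_registration_loss(rgb_pred, rgb_gt, pts_vis, pts_ir, transform,
                            w_photo=1.0, w_geo=1.0, w_reg=0.01):
    """Illustrative joint loss: photometric NeRF term + cross-modal
    geometric consistency between visible/infrared point clouds.
    Term names and weights are assumptions for exposition only."""
    # Photometric reconstruction loss on rendered vs. captured pixels.
    l_photo = torch.mean((rgb_pred - rgb_gt) ** 2)

    # Apply the current rigid transform (R, t) to the infrared point cloud.
    R, t = transform  # R: (3, 3) rotation, t: (3,) translation
    pts_ir_aligned = pts_ir @ R.T + t

    # Cross-modal geometric consistency: matched 3D points should coincide.
    l_geo = torch.mean(torch.norm(pts_vis - pts_ir_aligned, dim=-1))

    # Regularize the transform toward a valid rotation (R R^T ≈ I).
    eye = torch.eye(3, device=R.device)
    l_reg = torch.norm(R @ R.T - eye)

    return w_photo * l_photo + w_geo * l_geo + w_reg * l_reg
```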
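Likewise, a hedged sketch of an adaptive-weight fusion module with multi-scale context, assuming PyTorch; the layer sizes, the pooling-based multi-scale features, and the sigmoid weighting are assumptions standing in for the paper's actual attention network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFusion(nn.Module):
    """Illustrative adaptive-weight fusion: per-pixel weights predicted
    from multi-scale features of both modalities. Architecture details
    are assumptions, not the paper's exact network."""
    def __init__(self, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        # 2 input channels (visible + infrared, grayscale) per scale.
        self.weight_head = nn.Conv2d(2 * len(scales), 1, kernel_size=3, padding=1)

    def forward(self, vis, ir):
        # vis, ir: (B, 1, H, W) grayscale images in [0, 1].
        feats = []
        for s in self.scales:
            # Multi-scale context: downsample, then restore resolution.
            v = F.avg_pool2d(vis, s) if s > 1 else vis
            i = F.avg_pool2d(ir, s) if s > 1 else ir
            v = F.interpolate(v, size=vis.shape[-2:], mode="bilinear", align_corners=False)
            i = F.interpolate(i, size=ir.shape[-2:], mode="bilinear", align_corners=False)
            feats += [v, i]
        # Per-pixel weight in (0, 1) deciding each modality's contribution.
        w = torch.sigmoid(self.weight_head(torch.cat(feats, dim=1)))
        return w * vis + (1.0 - w) * ir
```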
Results and Discussions The multi-modal ship image fusion system based on the improved NeRF demonstrates strong experimental performance. Registration accuracy verification in Tab.3 shows that the point cloud registration success rate reaches 98.43%, indicating high geometric alignment accuracy. The NeRF-RGB model evaluation in Tab.5 further confirms strong consistency between reconstructed and reference images, with an SSIM of 0.8193 and a PSNR of 21.92 dB. Quantitative evaluation in Tab.6 reveals that CNN-LPIF excels in detail enhancement, achieving the leading information entropy, while PCA-MSIF outperforms the other methods in structural consistency with the best PSNR and SSIM values. In contrast, MS-SRIF exhibits more balanced performance but with relatively larger errors. Comparative experiments also show that the proposed model requires only 25 minutes of training time, achieving a favorable balance between accuracy and efficiency. Overall, among the compared fusion strategies, the comprehensive evaluation identifies PCA-MSIF as the top performer.
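For concreteness, the reported metrics follow standard definitions; the sketch below computes PSNR and a point-cloud match rate under a distance threshold (the 5 mm threshold comes from the paper, while the brute-force nearest-neighbor matching shown here is an assumption about how the success rate is counted):

```python
import numpy as np

def psnr(img, ref, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((img.astype(np.float64) - ref.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def point_cloud_match_rate(src, dst, threshold=0.005):
    """Fraction of source points whose nearest neighbor in the target
    cloud lies within `threshold` meters (0.005 m = 5 mm).
    Brute-force distances; adequate for small clouds."""
    # src: (N, 3), dst: (M, 3) registered point clouds in meters.
    d = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=-1)  # (N, M)
    return float(np.mean(d.min(axis=1) < threshold))
```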
Conclusions This paper proposes an improved NeRF-based method for intelligent augmentation, registration, and fusion of ship visible light and infrared images, constructing a multi-modal data acquisition and reconstruction system that achieves sub-pixel registration and high-fidelity fusion. Experimental results demonstrate that the system outperforms traditional methods in detail preservation, structural consistency, and error control, achieving the lowest comprehensive evaluation value (F = 0.0906) and high consistency in NeRF-RGB rendered images (SSIM = 0.8193). The point cloud matching rate remains above 98% under a 5 mm threshold. For various ship targets, the method generates clear fused images, improving the robustness and real-time performance of maritime surveillance and providing reliable technical support for multi-modal target recognition, with broad application prospects.