Intelligent augmentation, registration, and fusion of ship visible light and infrared images driven by an improved NeRF

  • Abstract: Multi-modal image fusion combines the strengths of multiple sensor sources and can significantly improve target detection and recognition accuracy. However, the differing imaging mechanisms of the modalities make high-precision spatial registration difficult: it demands high equipment accuracy, tight spatio-temporal synchronization, and costly data acquisition. To address this, this paper proposes an image augmentation and precise registration method based on an improved Neural Radiance Field (NeRF), which generates spatially registered multi-modal augmented images and thereby alleviates the shortage of training data for high-precision registration. First, multi-view real images are used for NeRF-based 3D point cloud reconstruction to obtain accurate scene geometry and texture. Second, camera poses are precisely constrained to ensure accurate registration of the multi-modal augmented images. Finally, the augmented, precisely registered images are fused with three algorithms: multi-scale sparse representation fusion (MS-SRIF), principal component analysis multi-scale fusion (PCA-MSIF), and convolutional neural network Laplacian pyramid fusion (CNN-LPIF). Experiments show that, in typical scenes, the cosine similarity between rendered and real images exceeds 99% and the point cloud matching rate reaches 96.66%, confirming that the augmented data are highly consistent in spatial position and texture detail and achieve high registration accuracy. Fusion results based on the augmented data perform well both visually and on objective metrics, demonstrating that high registration accuracy markedly improves fusion quality. Analysis of the fusion results shows that MS-SRIF excels at detail fidelity, PCA-MSIF at gradient and structural preservation, and CNN-LPIF at smoothness and noise robustness. In summary, the proposed improved NeRF method achieves high-precision registration of visible and infrared images and performs intelligent data augmentation on this basis, effectively addressing the registration and data-scarcity problems in multi-modal fusion and forming a complete multi-modal image processing framework.
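As a rough illustration only (the paper does not publish its evaluation code), the cosine-similarity check between a rendered image and the corresponding captured image can be computed on the flattened pixel vectors:

```python
import numpy as np

def image_cosine_similarity(rendered: np.ndarray, captured: np.ndarray) -> float:
    """Cosine similarity between a rendered and a captured image,
    computed on flattened pixel vectors (1.0 means identical direction)."""
    a = rendered.astype(np.float64).ravel()
    b = captured.astype(np.float64).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A similarity above 0.99, as reported in the abstract, indicates near-identical pixel-level content between the NeRF rendering and the real photograph.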

     

    Abstract:
    Objective Fusion of visible light and infrared images of ship targets is a key technology for target recognition and characteristic research. Traditional registration methods face significant differences in imaging conditions, difficult feature matching, and low accuracy in complex environments, making it hard to meet real-time and robustness requirements. Neural Radiance Fields (NeRF) show substantial potential in 3D scene reconstruction and view synthesis, but existing NeRF methods primarily target single-modal images and lack effective support for multi-modal registration and fusion. This paper therefore proposes an improved NeRF-driven method for intelligent augmentation, registration, and fusion of ship visible light and infrared images to enhance detection and recognition accuracy.
    Methods This paper constructs a multi-modal image registration and fusion system based on the improved NeRF, with the overall workflow illustrated in Fig.1. A multi-modal data acquisition platform is established for ship target imaging experiments in the visible light and infrared bands, with acquisition scenes shown in Fig.3 and the system configuration in Fig.4. A multi-modal NeRF reconstruction network is developed to generate 3D point cloud models, with spatial relationships depicted in Fig.5 and the reconstructed models in Fig.6. A cross-modal geometric consistency constraint mechanism is designed to optimize the pre-registration point cloud spatial relationship (Fig.7(a)) into a precisely aligned state (Fig.7(b)), achieving sub-pixel registration through a joint optimization loss function. An adaptive weight fusion strategy incorporating multi-scale attention mechanisms dynamically adjusts the contributions of the different modalities, and is validated against traditional fusion methods (MS-SRIF, PCA-MSIF, CNN-LPIF), with fusion results shown in Fig.12. The system's registration accuracy and fusion quality are comprehensively verified on simulated and real ship datasets.
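The point cloud matching (registration success) rate used to assess geometric alignment can be sketched as a nearest-neighbour hit rate under a distance threshold. The brute-force function below is an illustrative version, not the paper's implementation; the default `thresh` of 5 mm follows the threshold quoted in the Conclusions:

```python
import numpy as np

def point_cloud_match_rate(src: np.ndarray, dst: np.ndarray,
                           thresh: float = 0.005) -> float:
    """Fraction of source points whose nearest neighbour in the target
    cloud lies within `thresh` metres. Brute-force O(N*M) search over
    (N, 3) and (M, 3) arrays; a k-d tree would be used at scale."""
    dists = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=-1)  # (N, M)
    return float(np.mean(dists.min(axis=1) <= thresh))
```

In practice the two clouds would first be brought into a common frame by the cross-modal geometric consistency optimization before this rate is evaluated.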
    Results and Discussions The multi-modal ship image fusion system based on the improved NeRF demonstrates excellent experimental performance. Registration accuracy verification in Tab.3 shows that the point cloud registration success rate reaches 98.43%, indicating high geometric alignment accuracy. The NeRF-RGB model evaluation results in Tab.5 further confirm strong consistency between reconstructed and reference images, with SSIM reaching 0.8193 and PSNR achieving 21.92 dB. Quantitative evaluation in Tab.6 reveals that CNN-LPIF excels in detail enhancement, achieving leading information entropy metrics, while PCA-MSIF outperforms other methods in structural consistency with the best PSNR and SSIM values. In contrast, MS-SRIF exhibits more balanced performance but with relatively higher errors. Comparative experiments also demonstrate that the proposed model requires only 25 minutes of training time, achieving an optimal balance between accuracy and efficiency. Overall, comprehensive evaluation results identify PCA-MSIF as the top-performing method.
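For reference, two of the quantitative metrics discussed above, PSNR and information entropy, can be sketched in a few lines of NumPy (a minimal version assuming 8-bit grey-level images; the paper's exact evaluation code is not given):

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))

def information_entropy(img: np.ndarray) -> float:
    """Shannon entropy (bits) of an 8-bit image's grey-level histogram;
    higher values indicate richer detail in the fused result."""
    hist, _ = np.histogram(img.ravel(), bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))
```

Higher entropy rewards detail-rich fusion output (where CNN-LPIF leads), while PSNR together with SSIM rewards structural consistency with the reference (where PCA-MSIF leads).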
    Conclusions This paper proposes an improved NeRF-based method for intelligent augmentation, registration, and fusion of ship visible light and infrared images, constructing a multi-modal data acquisition and reconstruction system to achieve sub-pixel-level precise registration and high-fidelity fusion. Experimental results demonstrate that the system outperforms traditional methods in detail preservation, structural consistency, and error control, with the lowest comprehensive evaluation F value of 0.0906 and high consistency in NeRF-RGB rendered images (SSIM 0.8193). The point cloud matching rate remains above 98% under a 5 mm threshold. For various ship targets, the method generates clear fused images, enhancing robustness and real-time performance in maritime surveillance and providing reliable technical support for multi-modal target recognition, with broad application prospects.

     

