Abstract:
Objective: Infrared and visible image fusion aims to integrate the complementary information of infrared radiation and visible-light textures into a single image, enhancing both perceptual quality and downstream task performance. However, most existing encoder-decoder architectures struggle to extract modality-shared and modality-specific features simultaneously, leading to suboptimal fusion results. To address these limitations, this paper proposes MBAFuse, a novel end-to-end multi-branch autoencoder fusion network tailored for infrared and visible image fusion. The proposed method is designed to enhance the extraction of complementary features and to improve the quality and robustness of the fused image.
Methods: MBAFuse adopts a three-branch encoder architecture to separately extract shared and modality-specific features from infrared and visible images. A DenseBlock module extracts modality-invariant shared features, while an Invertible Neural Network (INN) and an Outlook Attention module capture modality-specific low-frequency and high-frequency details, respectively. To better disentangle the shared and unique components, a modality correlation loss is introduced. The decoder employs a Restormer module, which leverages multi-head self-attention to enhance reconstruction fidelity and detail retention. Furthermore, a composite loss function is formulated by combining structural similarity (SSIM) loss, mean squared error (MSE) loss, gradient loss, and correlation loss. The network is optimized with a two-stage training strategy that progressively improves feature learning and fusion quality.
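To make the composite objective concrete, the following PyTorch-style sketch shows one way the SSIM, MSE, gradient, and correlation terms could be combined. It is an illustrative assumption rather than the authors' implementation: the loss weights, the windowless (global) SSIM approximation, the Sobel-based gradient operator, the max-of-sources gradient target, and the specific correlation formulation are all placeholders; inputs are assumed to be single-channel images in [0, 1].

```python
# Minimal sketch (not the authors' code) of a composite fusion loss:
# SSIM + MSE + gradient + modality-correlation terms.
import torch
import torch.nn.functional as F


def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified global SSIM per image (no sliding window), for illustration only."""
    mu_x, mu_y = x.mean(dim=(2, 3)), y.mean(dim=(2, 3))
    var_x, var_y = x.var(dim=(2, 3)), y.var(dim=(2, 3))
    cov = ((x - mu_x[..., None, None]) * (y - mu_y[..., None, None])).mean(dim=(2, 3))
    s = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return s.mean()


def sobel_gradients(x):
    """Per-channel Sobel gradient magnitude (one possible gradient operator)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device, dtype=x.dtype).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    c = x.shape[1]
    gx = F.conv2d(x, kx.expand(c, 1, 3, 3), padding=1, groups=c)
    gy = F.conv2d(x, ky.expand(c, 1, 3, 3), padding=1, groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)


def correlation(a, b):
    """Pearson correlation between two flattened feature maps, per sample."""
    a = a.flatten(1) - a.flatten(1).mean(dim=1, keepdim=True)
    b = b.flatten(1) - b.flatten(1).mean(dim=1, keepdim=True)
    return (a * b).sum(dim=1) / (a.norm(dim=1) * b.norm(dim=1) + 1e-6)


def fusion_loss(fused, ir, vis, shared_ir, shared_vis, spec_ir, spec_vis,
                w_ssim=1.0, w_mse=1.0, w_grad=10.0, w_corr=1.0):
    """Composite objective: structure + intensity + gradient + modality correlation."""
    l_ssim = 0.5 * ((1 - ssim_global(fused, ir)) + (1 - ssim_global(fused, vis)))
    l_mse = 0.5 * (F.mse_loss(fused, ir) + F.mse_loss(fused, vis))
    # Encourage the fused image to follow the stronger gradient of either source.
    l_grad = F.l1_loss(sobel_gradients(fused),
                       torch.maximum(sobel_gradients(ir), sobel_gradients(vis)))
    # Shared features should correlate across modalities; modality-specific
    # features should be decorrelated from each other.
    l_corr = (1 - correlation(shared_ir, shared_vis)).mean() + correlation(spec_ir, spec_vis).abs().mean()
    return w_ssim * l_ssim + w_mse * l_mse + w_grad * l_grad + w_corr * l_corr
```

In a two-stage scheme of the kind described above, such a loss would typically be applied with reconstruction-oriented weights in the first stage and fusion-oriented weights in the second; the exact schedule used by MBAFuse is not specified here.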
Results and Discussions: Extensive experiments are conducted on three widely used public datasets: TNO, MSRS, and RoadScene. MBAFuse demonstrates superior performance in both qualitative and quantitative evaluations, outperforming seven state-of-the-art fusion methods. Specifically, MBAFuse achieves average improvements of 10.2%, 29.7%, and 7.5% in entropy (EN), standard deviation (SD), and visual information fidelity (VIF), respectively, across the three datasets. Beyond infrared-visible fusion, the proposed method is also validated on medical image fusion (e.g., MRI-CT) and RGB image fusion tasks, exhibiting strong generalization and robustness across domains. While MBAFuse delivers strong fusion performance, the complexity of its multi-branch architecture introduces computational overhead, which poses challenges for real-time applications and dynamic environments. Future work will focus on lightweight model design and efficiency optimization to enable real-time fusion for video streams and mobile scenarios.
Conclusions: This paper proposes MBAFuse, a novel multi-branch autoencoder fusion network for infrared and visible image fusion. The proposed method effectively combines shared and modality-specific feature extraction through a three-branch encoder architecture incorporating DenseBlocks, an Invertible Neural Network (INN), and an Outlook Attention module. By introducing a comprehensive loss function and a two-stage training strategy, MBAFuse significantly enhances both the visual quality and structural consistency of the fused images. Experimental results on the TNO, MSRS, and RoadScene datasets demonstrate that MBAFuse outperforms several state-of-the-art fusion methods in both objective metrics and subjective evaluations. Moreover, the method exhibits strong generalization across medical and RGB image fusion tasks. Despite the increased complexity introduced by the multi-branch structure, MBAFuse offers a robust and effective solution for multi-modal image fusion. Future work will optimize the network architecture for real-time performance and extend the approach to dynamic and video-based fusion scenarios.