Abstract:
Objective Existing UAV vision-based object detection models perform poorly in complex environments (e.g., fog, night, and dark night) because visible-light images alone provide insufficient visual features, and they suffer from catastrophic forgetting when incrementally adapting to new domains. This study therefore proposes a visible-infrared incremental domain adaptation object detection algorithm that enhances cross-domain generalization, mitigates catastrophic forgetting during incremental learning, and improves detection accuracy in dynamic and complex scenarios.
Methods The proposed algorithm employs a teacher-student collaborative network and comprises three key modules. 1) The Attention-guided Multi-modal Feature Fusion (AMFF) module dynamically fuses cross-modal features: channel attention and depth-wise convolution strengthen the key information within each modality, and cross-modal interaction promotes complementarity between modalities. 2) The Class-wise Feature Prototype Alignment (CFPA) module optimizes class prototypes: it extracts prototypes from target queries, builds an adversarial learning framework with gradient reversal layers and discriminators, applies contrastive learning to enhance intra-class consistency and inter-class separability, and stores historical prototypes in a prototype memory bank. 3) The Prototype-Guided Adaptive Restoration (PGAR) module evaluates parameter importance with the Fisher information matrix, combines prototypes of the previous and current domains from the memory bank, and resets inactive parameters through an adaptive restoration mechanism to alleviate catastrophic forgetting. In addition, dynamic threshold and dynamic skipping strategies generate high-quality pseudo-labels for supervision, and a KL divergence loss performs knowledge distillation from teacher to student.
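To make the fusion step concrete, the following is a minimal PyTorch sketch of attention-guided multi-modal fusion in the spirit of AMFF: channel attention and a depth-wise convolution strengthen each modality, then a learned gate mixes the two streams. The layer choices, channel sizes, and class names here are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention over one modality."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)  # reweight channels by learned importance

class AMFFSketch(nn.Module):
    """Strengthen each modality, then exchange information across modalities."""
    def __init__(self, channels: int):
        super().__init__()
        self.rgb_att = ChannelAttention(channels)
        self.ir_att = ChannelAttention(channels)
        # Depth-wise convolutions refine per-channel spatial detail.
        self.rgb_dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.ir_dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        # Cross-modal interaction: a gate decides, per location, how much of
        # each modality contributes to the fused feature.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1), nn.Sigmoid()
        )

    def forward(self, rgb, ir):
        rgb = self.rgb_dw(self.rgb_att(rgb))
        ir = self.ir_dw(self.ir_att(ir))
        g = self.gate(torch.cat([rgb, ir], dim=1))
        return g * rgb + (1 - g) * ir  # complementary fusion of both modalities
```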
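A hedged sketch of the class-wise prototype alignment idea behind CFPA follows: target query embeddings are mean-pooled into per-class prototypes, a gradient reversal layer plus discriminator aligns current prototypes with historical ones from the memory bank, and a contrastive term pulls same-class prototypes together. All shapes, the discriminator interface, and the temperature are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips (and scales) gradients backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

def class_prototypes(queries, labels, num_classes):
    """Mean-pool target query embeddings per class -> (num_classes, dim)."""
    protos = torch.zeros(num_classes, queries.size(-1), device=queries.device)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = queries[mask].mean(dim=0)
    return protos

def prototype_losses(protos, stored_protos, discriminator, lam=1.0, tau=0.1):
    # Adversarial term: the discriminator tries to tell current prototypes
    # from stored historical ones; the GRL makes the backbone fool it.
    rev = GradReverse.apply(protos, lam)
    d_cur = discriminator(rev)
    d_old = discriminator(stored_protos.detach())
    adv = F.binary_cross_entropy_with_logits(d_cur, torch.ones_like(d_cur)) + \
          F.binary_cross_entropy_with_logits(d_old, torch.zeros_like(d_old))
    # Contrastive term: each current prototype should match the stored
    # prototype of the same class (positives on the diagonal).
    sim = F.normalize(protos, dim=-1) @ F.normalize(stored_protos, dim=-1).T
    targets = torch.arange(protos.size(0), device=protos.device)
    con = F.cross_entropy(sim / tau, targets)
    return adv, con
```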
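The restoration step can be sketched as below: a diagonal Fisher estimate scores parameter importance in the old and new domains, and parameters that are unimportant in both are reset to free capacity for the new domain. This is a minimal sketch in the spirit of PGAR; the reset-to-initial rule and the quantile threshold are illustrative assumptions.

```python
import torch

def fisher_importance(model, loss_fn, data_loader, device="cpu"):
    """Diagonal Fisher estimate: expected squared gradient per parameter."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for inputs, targets in data_loader:
        model.zero_grad()
        loss_fn(model(inputs.to(device)), targets.to(device)).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(data_loader), 1) for n, f in fisher.items()}

@torch.no_grad()
def adaptive_restore(model, fisher_old, fisher_new, init_state, quantile=0.1):
    """Reset parameters that are inactive in BOTH domains to their initial
    values; parameters important to either domain are left untouched."""
    for n, p in model.named_parameters():
        score = torch.minimum(fisher_old[n], fisher_new[n])
        thresh = torch.quantile(score.flatten().float(), quantile)
        inactive = score <= thresh
        p[inactive] = init_state[n][inactive]
```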
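Finally, a brief sketch of the teacher-student supervision described above: the teacher's confident predictions become pseudo-labels under a per-class dynamic threshold, and a KL-divergence loss distills the teacher's class distribution into the student. The EMA-style threshold update and the temperature value are assumptions, not the paper's specified scheme.

```python
import torch
import torch.nn.functional as F

def dynamic_threshold_filter(teacher_probs, thresholds, momentum=0.99):
    """Keep predictions above each class's running confidence threshold."""
    conf, cls = teacher_probs.max(dim=-1)        # per-prediction confidence
    keep = conf >= thresholds[cls]
    # EMA update of per-class thresholds from the kept predictions.
    for c in cls[keep].unique():
        m = cls[keep] == c
        thresholds[c] = momentum * thresholds[c] + (1 - momentum) * conf[keep][m].mean()
    return keep, thresholds

def distill_kl(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T
```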
Results and Discussions Experiments were conducted on the DroneVehicle dataset with a four-stage domain-incremental learning setup (Sunny→Foggy→Night→Dark night). Compared with existing algorithms, the proposed algorithm achieved significantly higher mAP and lower RelGap in the Foggy, Night, and Dark night scenarios; in Dark night, for instance, its mAP reached 43.5% versus 30.8% for the Source method. Ablation experiments verified the effectiveness of each module: incrementally adding AMFF, CFPA, and PGAR consistently improved mAP and reduced RelGap. The results demonstrate that the algorithm adapts effectively to complex lighting and weather conditions, capturing targets more completely and reducing missed and false detections, especially where visible light is insufficient.
Conclusions This study proposes a visible-infrared incremental domain adaptation object detection algorithm based on prototype alignment and adaptive restoration, which effectively improves incremental detection of time-sensitive targets in UAV images under complex scenarios. The AMFF module improves the quality of cross-modal feature fusion; the CFPA module optimizes class prototypes to strengthen feature representation and cross-domain generalization; the PGAR module mitigates catastrophic forgetting to boost incremental adaptation. These contributions lay a theoretical foundation for cross-modal domain-incremental learning in UAV vision and provide technical support for practical UAV visual perception applications. Future work will explore more efficient cross-modal knowledge transfer mechanisms and expand multi-source datasets to enhance adaptability in extreme cross-modal scenarios.