Abstract:
Objective Existing UAV vision-based object detection models perform poorly in complex environments (e.g., fog, night, and dark night) because visible-light images alone provide insufficient visual features, and they suffer from catastrophic forgetting when incrementally adapting to new domains. This study therefore proposes a visible-infrared incremental domain adaptation object detection algorithm that enhances cross-domain generalization, mitigates catastrophic forgetting during incremental learning, and improves detection accuracy in dynamic and complex scenarios.
Methods The proposed algorithm employs a teacher-student collaborative network and comprises three key modules. 1) The Attention-guided Multi-modal Feature Fusion (AMFF) module dynamically fuses cross-modal features: channel attention and depth-wise convolution strengthen the key information within each modality, and cross-modal interaction promotes complementarity between modalities. 2) The Class-wise Feature Prototype Alignment (CFPA) module optimizes class prototypes: it extracts prototypes from target queries, builds an adversarial learning framework with gradient reversal layers and discriminators, applies contrastive learning to enhance intra-class consistency and inter-class separability, and stores historical prototypes in a prototype memory bank. 3) The Prototype-Guided Adaptive Restoration (PGAR) module evaluates parameter importance with the Fisher information matrix, combines prototypes of the previous and current domains from the memory bank, and resets inactive parameters through an adaptive restoration mechanism to alleviate catastrophic forgetting. In addition, dynamic threshold and dynamic skipping strategies generate high-quality pseudo-labels for supervision, and a KL divergence loss performs knowledge distillation from teacher to student.
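To make the fusion step concrete, the following is a minimal PyTorch sketch of attention-guided multi-modal fusion in the spirit of AMFF: channel attention and a depth-wise convolution strengthen each modality, then a learned gate mixes the two streams. The layer choices, channel sizes, and class names here are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention over one modality."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)  # reweight channels by learned importance

class AMFFSketch(nn.Module):
    """Strengthen each modality, then exchange information across modalities."""
    def __init__(self, channels: int):
        super().__init__()
        self.rgb_att = ChannelAttention(channels)
        self.ir_att = ChannelAttention(channels)
        # Depth-wise convolutions refine per-channel spatial detail.
        self.rgb_dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.ir_dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        # Cross-modal interaction: a gate decides, per location, how much of
        # each modality contributes to the fused feature.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1), nn.Sigmoid()
        )

    def forward(self, rgb, ir):
        rgb = self.rgb_dw(self.rgb_att(rgb))
        ir = self.ir_dw(self.ir_att(ir))
        g = self.gate(torch.cat([rgb, ir], dim=1))
        return g * rgb + (1 - g) * ir  # complementary fusion of both modalities
```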
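A hedged sketch of the class-wise prototype alignment idea behind CFPA follows: target query embeddings are mean-pooled into per-class prototypes, a gradient reversal layer plus discriminator aligns current prototypes with historical ones from the memory bank, and a contrastive term pulls same-class prototypes together. All shapes, the discriminator interface, and the temperature are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips (and scales) gradients backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

def class_prototypes(queries, labels, num_classes):
    """Mean-pool target query embeddings per class -> (num_classes, dim)."""
    protos = torch.zeros(num_classes, queries.size(-1), device=queries.device)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = queries[mask].mean(dim=0)
    return protos

def prototype_losses(protos, stored_protos, discriminator, lam=1.0, tau=0.1):
    # Adversarial term: the discriminator tries to tell current prototypes
    # from stored historical ones; the GRL makes the backbone fool it.
    rev = GradReverse.apply(protos, lam)
    d_cur = discriminator(rev)
    d_old = discriminator(stored_protos.detach())
    adv = F.binary_cross_entropy_with_logits(d_cur, torch.ones_like(d_cur)) + \
          F.binary_cross_entropy_with_logits(d_old, torch.zeros_like(d_old))
    # Contrastive term: each current prototype should match the stored
    # prototype of the same class (positives on the diagonal).
    sim = F.normalize(protos, dim=-1) @ F.normalize(stored_protos, dim=-1).T
    targets = torch.arange(protos.size(0), device=protos.device)
    con = F.cross_entropy(sim / tau, targets)
    return adv, con
```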
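The restoration step can be sketched as below: a diagonal Fisher estimate scores parameter importance in the old and new domains, and parameters that are unimportant in both are reset to free capacity for the new domain. This is a minimal sketch in the spirit of PGAR; the reset-to-initial rule and the quantile threshold are illustrative assumptions.

```python
import torch

def fisher_importance(model, loss_fn, data_loader, device="cpu"):
    """Diagonal Fisher estimate: expected squared gradient per parameter."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for inputs, targets in data_loader:
        model.zero_grad()
        loss_fn(model(inputs.to(device)), targets.to(device)).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(data_loader), 1) for n, f in fisher.items()}

@torch.no_grad()
def adaptive_restore(model, fisher_old, fisher_new, init_state, quantile=0.1):
    """Reset parameters that are inactive in BOTH domains to their initial
    values; parameters important to either domain are left untouched."""
    for n, p in model.named_parameters():
        score = torch.minimum(fisher_old[n], fisher_new[n])
        thresh = torch.quantile(score.flatten().float(), quantile)
        inactive = score <= thresh
        p[inactive] = init_state[n][inactive]
```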
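Finally, a brief sketch of the teacher-student supervision described above: the teacher's confident predictions become pseudo-labels under a per-class dynamic threshold, and a KL-divergence loss distills the teacher's class distribution into the student. The EMA-style threshold update and the temperature value are assumptions, not the paper's specified scheme.

```python
import torch
import torch.nn.functional as F

def dynamic_threshold_filter(teacher_probs, thresholds, momentum=0.99):
    """Keep predictions above each class's running confidence threshold."""
    conf, cls = teacher_probs.max(dim=-1)        # per-prediction confidence
    keep = conf >= thresholds[cls]
    # EMA update of per-class thresholds from the kept predictions.
    for c in cls[keep].unique():
        m = cls[keep] == c
        thresholds[c] = momentum * thresholds[c] + (1 - momentum) * conf[keep][m].mean()
    return keep, thresholds

def distill_kl(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T
```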
Results and Discussions Experiments were conducted on the DroneVehicle dataset with a four-stage domain-incremental learning setup (Sunny→Foggy→Night→Dark night). Compared with existing algorithms, the proposed algorithm achieved significantly higher mAP and lower RelGap in the Foggy, Night, and Dark night scenarios; in Dark night, for instance, its mAP reached 43.5% versus 30.8% for the Source method. Ablation experiments verified the effectiveness of each module: incrementally adding AMFF, CFPA, and PGAR consistently improved mAP and reduced RelGap. The results demonstrate that the algorithm adapts effectively to complex lighting and weather conditions, capturing targets more completely and reducing missed and false detections, especially where visible light is insufficient.
Conclusions This study proposes a visible-infrared incremental domain adaptation object detection algorithm based on prototype alignment and adaptive restoration, which effectively improves incremental detection of time-sensitive targets in UAV images under complex scenarios. The AMFF module improves the quality of cross-modal feature fusion; the CFPA module optimizes class prototypes to strengthen feature representation and cross-domain generalization; the PGAR module mitigates catastrophic forgetting to boost incremental adaptation. These contributions lay a theoretical foundation for cross-modal domain-incremental learning in UAV vision and provide technical support for practical UAV visual perception applications. Future work will explore more efficient cross-modal knowledge transfer mechanisms and expand multi-source datasets to enhance adaptability in extreme cross-modal scenarios.