An improved semi-supervised transfer learning method for infrared object detection neural network

Li Weipeng; Yang Xiaogang; Li Chuanxiang; Lu Ruitao; Huang Pan

doi:10.3788/IRLA20200511

To address infrared datasets of limited scale with few labeled samples, a semi-supervised transfer learning method is proposed for training infrared object detection neural networks. It aims to improve the training efficiency and generalization ability of object detection networks on small infrared datasets, and to increase the adaptability of deep learning models in scenarios with few training samples, such as infrared object detection. First, the role of unlabeled samples in improving model generalization and suppressing overfitting when labeled samples are scarce is described. Then, the semi-supervised transfer learning process for infrared object detection networks is proposed: a model is pre-trained on a large-scale RGB dataset and then fine-tuned using a small number of labeled infrared (IR) images together with unlabeled IR images. Moreover, a pseudo-supervised loss function with feature similarity weighting is proposed, in which the predictions within the same batch are used as labels for one another, thus making full use of the feature distribution of similar objects in unlabeled images. To reduce the computation of semi-supervised learning, the pseudo-supervised loss of each object is restricted to the objects within the neighborhood of its feature vector. Experimental results show that the test accuracy of object detection networks trained by the proposed method is higher than that of networks trained by supervised transfer learning: it achieves an improvement of 1.1% on Faster R-CNN and a significant improvement of 4.8% on YOLO-v3, which verifies the effectiveness of the proposed method.


0. Introduction

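Object detection is a prerequisite for high-level vision tasks such as scene understanding, and has been widely used in intelligent video surveillance, content-based image retrieval, visual navigation, and other tasks. Traditional object detection mainly uses hand-crafted features (such as HAAR^[1], HOG^[2], SIFT^[3], and SURF^[4]) and applies a classifier over sliding windows; representative methods include Adaboost-SVM^[5] and the deformable part model (DPM)^[6-8]. These methods pioneered practical object detection and are widely used in fields such as portable devices and robotics. However, limited by the performance of hand-crafted features, the accuracy of traditional methods has remained low, and they usually lack sufficient generalization ability on new images.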

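Compared with traditional object detection methods, methods based on convolutional neural networks (CNNs) have a significant advantage in accuracy. A CNN fits a wide variety of cases with a large number of parameters and abstracts object information step by step through a multi-layer architecture, which greatly improves the generalization ability of object detection. However, current research on CNN-based object detection concentrates on RGB image object detection and general image object detection, while deep learning methods for infrared object detection have received relatively little study. Two important reasons are: (1) publicly available infrared image datasets are far fewer than RGB image datasets; (2) infrared datasets are usually small and insufficiently annotated. These problems lead to a shortage of training samples for deep learning methods and make algorithms difficult to test, which severely restricts the development of deep-learning-based infrared object detection. To address these problems, this paper combines transfer learning with semi-supervised learning, aiming to train a high-accuracy infrared object detection network using a small number of labeled infrared images.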

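Transfer learning refers to reusing a model pre-trained on one task (after a small amount of further training) in another task. In deep learning this kind of transfer is called inductive transfer: by using a model suited to a different but related task, the search space of possible models is narrowed in a beneficial way^[9]. In object detection, Ref. [10] transferred the mid-level features of a classification network trained on ImageNet^[11] to an object detection network and successfully improved detection accuracy; Ref. [12] learned local image features through unsupervised training on STL-10 and transferred them to a UAV target recognition network. Given these successes, using a model pre-trained on a large RGB dataset for infrared object detection can be expected to significantly reduce the number of samples and the amount of training required, and to speed up model convergence.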

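Semi-supervised learning (SSL) is a learning method that combines supervised and unsupervised learning. It uses a large amount of unlabeled data together with some labeled data for pattern recognition, and has been successfully applied to training two-stage object detection networks for RGB images^[13] and SAR object detection networks^[14]. Ref. [15] introduced a self-supervised loss under noise perturbation into the semi-supervised training of two-stage object detection networks, further improving detection accuracy. Compared with fully supervised learning, semi-supervised learning exploits the sample distribution information provided by additional unlabeled data, so that the model fits a more realistic data distribution and generalizes better. In training infrared object detection networks, semi-supervised learning can make full use of the feature distribution of unlabeled infrared images, address the overfitting caused by small and insufficiently annotated infrared datasets, and improve detection accuracy at test time.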

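Considering that infrared datasets are small and contain few labeled samples, this paper proposes a semi-supervised transfer learning method for infrared object detection networks. It is mainly intended to improve the training efficiency and generalization ability of object detection networks on small infrared datasets, and to improve the adaptability of deep learning methods in infrared object detection scenarios with few training samples. Experimental results show that the test accuracy of object detection networks trained with this method is significantly higher than that of networks trained with supervised learning alone, which verifies the effectiveness of the proposed method.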

4. Conclusion

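This paper proposes a semi-supervised transfer learning method for infrared object detection networks, mainly intended to improve the training efficiency and generalization ability of object detection networks on small infrared datasets, and to improve the adaptability of deep learning methods in infrared object detection scenarios with few training samples. The paper first describes the role of unlabeled samples in improving model generalization and suppressing overfitting when labeled samples are scarce. On this basis, a semi-supervised transfer learning method for infrared object detection networks is proposed. The method first trains the original model on a large number of RGB images to obtain a pre-trained model, and then fine-tunes the network by semi-supervised learning on a small number of labeled infrared images together with unlabeled infrared images, yielding the infrared object detection network. To make full use of the distribution information of the samples, a pseudo-supervised loss function with feature similarity weighting is proposed, in which the predictions for samples in the same batch serve as supervision for one another, so that the feature distribution of similar objects in unlabeled images is fully exploited. To reduce the computation of semi-supervised training, when the pseudo-supervised loss is computed, each object takes as pseudo-labels only the predictions that fall within the neighborhood of its feature vector. Experimental results show that the test accuracy of object detection networks trained with this method is higher than that of networks trained under supervision with labeled samples only: it achieves an improvement of 1.1% on Faster R-CNN and a significant improvement of 4.8% on YOLO-v3, which verifies the effectiveness of the proposed method.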
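The pseudo-supervised loss summarized above can be illustrated with a minimal sketch. The exact formula is not given in this excerpt, so everything below is an assumption made for illustration: cosine similarity as the feature similarity measure, a fixed threshold `sim_threshold` as the neighborhood, cross-entropy between class-probability vectors as the per-pair loss, and the function names themselves. It should not be read as the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cross_entropy(target, pred, eps=1e-12):
    """Cross-entropy of a prediction against a (soft) pseudo-label."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, pred))


def pseudo_supervised_loss(features, probs, sim_threshold=0.8):
    """Similarity-weighted pseudo-supervised loss over one batch (hypothetical form).

    features: one feature vector per detected object in the batch
    probs:    one class-probability vector per detected object
    Each object i is supervised by the predictions of the other objects j
    whose features lie in its neighborhood (similarity >= sim_threshold),
    weighted by that similarity. Objects with no neighbor are skipped.
    """
    total, counted = 0.0, 0
    n = len(features)
    for i in range(n):
        weighted, weight_sum = 0.0, 0.0
        for j in range(n):
            if i == j:
                continue
            sim = cosine_similarity(features[i], features[j])
            if sim < sim_threshold:
                continue  # outside the neighborhood: costs nothing further
            weighted += sim * cross_entropy(probs[j], probs[i])
            weight_sum += sim
        if weight_sum > 0.0:
            total += weighted / weight_sum
            counted += 1
    return total / counted if counted else 0.0


# Toy batch: objects 0 and 1 have similar features, object 2 is isolated.
features = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
loss = pseudo_supervised_loss(features, probs)
print(f"pseudo-supervised loss: {loss:.4f}")
```

In this toy batch only the first two objects supervise each other, which mirrors the point of the neighborhood restriction: pairs with dissimilar features are discarded before any loss term is computed for them.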

References (18)

[1]	Lienhart R, Maydt J. An extended set of Haar-like features for rapid object detection[C]//International Conference on Image Processing, 2002: 900-903.
[2]	Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]//IEEE Computer Society Conference on Computer Vision & Pattern Recognition, 2005: 886-893.
[3]	Lowe D G. Distinctive image features from scale-invariant keypoints [J]. International Journal of Computer Vision, 2004, 60(2): 91-110.
[4]	Bay H, Ess A, Tuytelaars T, et al. Speeded-up robust features [J]. Computer Vision and Image Understanding, 2008, 110(3): 346-359.
[5]	Li X, Wang L, Sung E. AdaBoost with SVM-based component classifiers [J]. Engineering Applications of Artificial Intelligence, 2008, 21(5): 785-795.
[6]	Felzenszwalb P F, Huttenlocher D P. Pictorial structures for object recognition [J]. International Journal of Computer Vision, 2005, 61(1): 55-79.
[7]	Felzenszwalb P F, Girshick R B, McAllester D. Cascade object detection with deformable part models[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010: 2241-2248.
[8]	Felzenszwalb P F, Girshick R B, McAllester D A. Visual object detection with deformable part models[C]//The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010.
[9]	Pratt L, Thrun S. Machine Learning - Special Issue on Inductive Transfer[M]. Netherlands: Kluwer Academic Publishers, 1997.
[10]	Oquab M, Bottou L, Laptev I, et al. Learning and transferring mid-level image representations using convolutional neural networks[C]//IEEE Conference on Computer Vision & Pattern Recognition. IEEE, 2014.
[11]	Deng J, Dong W, Socher R, et al. ImageNet: A large-scale hierarchical image database[C]//IEEE Conference on Computer Vision & Pattern Recognition. IEEE, 2009.
[12]	Xie Bing, Duan Zhemin, Zheng Bin, Yin Yunhua. Research on UAV target recognition algorithm based on transfer learning SAE[J]. Infrared and Laser Engineering, 2018, 47(6): 0626001. (In Chinese)
[13]	Tang P, Wang X, Wang A, et al. Weakly supervised region proposal network and object detection[C]//Proceedings of the European Conference on Computer Vision. ECCV, 2018: 352-368.
[14]	Du L, Wei D, Li L, et al. SAR object detection network via semi-supervised learning [J]. Journal of Electronics and Information Technology, 2020, 42(1): 154-163.
[15]	Tang P, Ramaiah C, Wang Y, et al. Proposal learning for semi-supervised object detection[J]. arXiv preprint, arXiv, 2020: 2001.05086.
[16]	Ren S, He K, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6): 1137-1149.
[17]	Redmon J, Farhadi A. YOLOv3: an incremental improvement[J]. arXiv preprint, arXiv, 2018: 1804.02767.
[18]	Chen K, Wang J, Pang J, et al. Open MMLab detection toolbox and benchmark[J]. arXiv preprint, arXiv, 2019: 1906.07155.

Classification	Training (labeled)	Training (unlabeled)	Test	Total
Launcher	36	33	38	107
Tank	15	19	19	53
Airplane	42	41	42	125
Battleship	51	55	51	157
Total	144	148	150	442

Method	Network	Epochs	Launcher	Tank	Airplane	Battleship	mAP
Supervised transfer learning	Faster R-CNN	60	0.946	0.995	0.965	0.980	0.972
Supervised transfer learning	YOLO-v3	80	0.964	0.848	0.919	0.979	0.927
Semi-supervised transfer learning	Faster R-CNN	60	0.971	0.997	0.962	1.000	0.983
Semi-supervised transfer learning	YOLO-v3	80	1.000	0.973	0.936	1.000	0.975
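
The improvements quoted in the abstract and conclusion can be checked directly against the mAP column of the table above. The short sketch below does that arithmetic; the dictionary layout and the function name `map_gain_points` are illustrative choices, and the gains are read here as absolute differences in mAP, expressed in percentage points.

```python
# mAP values copied from the results table above.
MAP = {
    "Faster R-CNN": {"supervised": 0.972, "semi_supervised": 0.983},
    "YOLO-v3": {"supervised": 0.927, "semi_supervised": 0.975},
}


def map_gain_points(network):
    """Absolute mAP gain of semi-supervised over supervised transfer
    learning for one network, in percentage points."""
    scores = MAP[network]
    return round((scores["semi_supervised"] - scores["supervised"]) * 100, 1)


for name in MAP:
    print(f"{name}: +{map_gain_points(name)} percentage points")
```

Under this reading, the table is consistent with the reported gains of 1.1 for Faster R-CNN and 4.8 for YOLO-v3.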


Corresponding author: Chen Bin (陈斌), bchen63@163.com
