Domain adaptation for object detection in the frequency domain

Li Yuenan; Xu Haoyu; Dong Hao

doi:10.3788/IRLA20210638

Deep learning-based object detection has recently made significant progress and is widely used in robotics, autonomous driving, traffic surveillance, and related fields. However, because of the distribution discrepancy between training and testing datasets, off-the-shelf detectors pre-trained on data from a specific domain often degrade noticeably when applied to unconstrained real-world scenes. To address this problem, a domain adaptation method for object detection in the frequency domain is proposed. Exploiting the energy concentration property of the discrete cosine transform, the proposed algorithm performs domain adaptation by processing only a few of the most significant frequency coefficients, which reduces memory and computing resource consumption and alleviates the domain shift problem. The method consists of two stages. In the first stage, annotated training data are translated from the source domain to the target domain using unsupervised image-to-image translation. In the second stage, adversarial domain adaptation is applied to the object detection model to align the features of the translated data with those of the real data in the target domain. Experimental results for object detection under different weather conditions show that the proposed method achieves the highest mAP among the four algorithms tested. Compared with an object detection model trained only on source-domain data, it increases mAP by 33.9% in relative terms (from 29.8% to 39.9%).


0. Introduction

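Object detection is an important task in computer vision. In recent years, work based on convolutional neural networks (CNNs) has greatly improved detection accuracy. At present, the vast majority of object detection algorithms are trained in a supervised manner, and data annotation consumes substantial human effort. In addition, differences between training and test samples limit how well detectors generalize to new scenes. Taking detection under different weather conditions as an example, a detector trained on images captured in clear weather usually has low accuracy in hazy weather. Existing solutions to this problem fall into two main categories. The first uses unsupervised image translation to convert annotated images (the source domain) to the target domain and builds a new dataset for training. The second uses domain adaptation to map source-domain and target-domain data into a common feature space, reducing the gap between domains. However, both approaches have limitations. Constrained by computing resources and memory, unsupervised image translation can usually accept only low-resolution input (e.g., CycleGAN^[1] accepts only 256×256 and 512×512 input images). For high-resolution input, the usual practice is to downsample the original image before feeding it to the network and then upsample the output back to the original resolution. This loses detail, makes it difficult to obtain sharp output images, and hurts the subsequent detection task. The effectiveness of domain adaptation is likewise affected by the input image size.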

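To reduce the information loss caused by downsampling and to save computing resources, and inspired by the energy concentration property of the frequency domain, this paper combines unsupervised image translation with adversarial domain adaptation and proposes a frequency-domain domain adaptation method for object detection. The method has two stages. The first stage uses unsupervised image translation to transform annotated source-domain images (e.g., clear-weather images) into images close to the target domain (e.g., foggy images); the domain of the translated images is defined as the intermediate domain. The second stage uses adversarial domain adaptation to align intermediate-domain data with target-domain data (e.g., real foggy images) in feature space. Both stages are carried out in the frequency domain. Because different frequency bands of an image differ in visual importance, frequency coefficients are naturally compressible. After an image is transformed to the frequency domain, its energy concentrates in the low- and mid-frequency bands, so unsupervised translation and domain adaptation can be achieved by processing only a few frequency coefficients, which lowers the computing and storage requirements of training and testing. Experimental results show that the first-stage unsupervised image translation generates intermediate-domain images close to the target domain, and that the second-stage adversarial domain adaptation reduces the information loss caused by conventional downsampling and significantly improves detection performance in the target domain.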
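The energy concentration property can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the 8×8 block size, the synthetic ramp block, the `u + v < keep` low-frequency rule, and all function names are assumptions chosen for illustration. The sketch applies an orthonormal 2D DCT-II to a smooth block and measures how much of the total energy falls into a handful of low-frequency coefficients.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # result[u][v]


def low_freq_energy_fraction(coeffs, keep):
    """Share of total energy held by coefficients with u + v < keep."""
    total = sum(c * c for row in coeffs for c in row)
    low = sum(coeffs[u][v] ** 2
              for u in range(len(coeffs))
              for v in range(len(coeffs[0]))
              if u + v < keep)
    return low / total


# Hypothetical smooth 8x8 block: a gentle brightness ramp.
block = [[100.0 + 4.0 * r + 2.0 * c for c in range(8)] for r in range(8)]
coeffs = dct_2d(block)

# keep=3 retains only 6 of the 64 coefficients.
fraction = low_freq_energy_fraction(coeffs, keep=3)
print(f"energy in 6 of 64 coefficients: {fraction:.4f}")
```

For smooth content like this ramp, nearly all of the energy sits in the retained coefficients. Natural images with strong texture concentrate less sharply, so the number of coefficients worth keeping is a design choice that has to be tuned on real data.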

1. Related work

1.1. Object detection

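In recent years, the vast majority of object detection algorithms have been built on convolutional neural networks (CNNs)^[2]. These methods can be divided into two-stage methods based on region proposals and one-stage methods that produce detections directly. Among two-stage methods, R-CNN^[3] uses selective search to obtain candidate boxes and a support vector machine (SVM) to classify the extracted features. Fast R-CNN^[4] improves the prediction step by using a neural network for both box classification and regression. Faster R-CNN^[5] further improves on Fast R-CNN by replacing the time-consuming selective search with a region proposal network (RPN), moving detection toward real-time speed. Representative one-stage methods include SSD^[6], YOLOv3^[7], and RetinaNet^[8], which further improve real-time performance. Wu et al.^[9] combined depthwise separable convolution with a lightweight minimal feature-extraction unit to slim down SSD so that it can run on mobile devices. Di et al.^[10] projected video frames into the two-dimensional frequency domain and combined active filtering with image reconstruction to detect dim, small moving targets. Wu et al.^[11] extracted salient regions of the image and segmented the foreground with an adaptive double-Gaussian algorithm, improving detection accuracy against complex sky backgrounds. Other methods improve accuracy on complex backgrounds and small objects by modifying the detector structure^[12-13]. Although CNN-based detectors have reached high accuracy, existing models are sensitive to distribution mismatch between the training and test sets and generalize poorly to new scenes.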

1.2. Domain adaptation and unsupervised image translation

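Classical supervised learning usually assumes that the training and test sets share the same distribution, but real test data generally differ substantially from training data collected under ideal conditions. Transfer learning is the main technique for dealing with this problem.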

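Domain adaptation is a form of transfer learning. Its main idea is to map data from different domains (e.g., images taken in different weather) into a common feature space, reducing the gap between domains and improving the generalization and robustness of the model. Domain adaptation was first used for image classification and was later extended to tasks such as object detection. Overall, methods fall into those based on hand-defined constraints and those based on adversarial training. The former align source-domain and target-domain features by minimizing a distance measure between the two distributions; common measures include the KL divergence, the H-divergence, and the maximum mean discrepancy (MMD). Ganin et al.^[14] used an adversarial approach to make the network reduce the domain gap and proposed the gradient reversal layer (GRL). The GRL is placed between the features and the domain discriminator. In the forward pass it leaves the features unchanged; in the backward pass it reverses the sign of the gradient, so that the domain discriminator and the main task network are trained adversarially. This enables truly end-to-end training and avoids the alternating generator and discriminator updates used in generative adversarial nets (GANs). In recent years, several studies have improved domain adaptation through multi-stage training, multi-scale training, feature fusion, attention mechanisms, and disentangled learning^[15-18].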
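The behaviour of the gradient reversal layer can be sketched with hand-written forward and backward passes. This is a minimal illustration under stated assumptions, not the code of Ganin et al. or of this paper: the scalar "network", the squared-error domain loss, the scaling factor `lam`, and all names are hypothetical, and only the Python standard library is used so the chain rule stays visible.

```python
class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam on the way back."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, features):
        # Features pass through unchanged.
        return list(features)

    def backward(self, grad_output):
        # The sign flip is what makes the training adversarial.
        return [-self.lam * g for g in grad_output]


# Toy scalar model (all values are illustrative):
#   feature = w * x            feature extractor with weight w
#   logit   = v * feature      domain discriminator with weight v
#   loss    = 0.5 * (logit - domain_label)^2
x, w, v, domain_label = 2.0, 0.5, 1.5, 1.0
grl = GradientReversal(lam=1.0)

feature = [w * x]
grl_out = grl.forward(feature)
logit = v * grl_out[0]
loss = 0.5 * (logit - domain_label) ** 2

# Backward pass by hand.
grad_logit = logit - domain_label
grad_v = grad_logit * grl_out[0]                 # discriminator gets the usual gradient
grad_feature = grl.backward([grad_logit * v])    # reversed before reaching the extractor
grad_w = grad_feature[0] * x

print(f"loss={loss:.3f} grad_v={grad_v:.3f} grad_w={grad_w:.3f}")
```

A gradient-descent step on `v` lowers the domain loss, while the same step on `w` raises it, which pushes the feature extractor toward features the discriminator cannot separate. In an automatic-differentiation framework the same idea is usually written as a custom operation whose backward rule negates the incoming gradient.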

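Unsupervised image translation learns a mapping between unpaired image samples that carries images from one domain to another, and it can also be used for domain adaptation. CycleGAN^[1] introduced the cycle-consistency loss: an image is translated to the other domain and then mapped back by the inverse mapping, and the cycled image is required to match the input. Discriminators are introduced in both domains to constrain the corresponding mappings. The UNIT^[19] algorithm proposed the shared latent space assumption, under which images from different domains can be mapped to the same latent space. Based on this idea, the algorithm splits translation between domains into two sub-processes, latent-space encoding and decoding, and uses variational autoencoders to constrain the latent vectors, together with other constraints, to improve unsupervised image translation. Although unsupervised image translation can generate images very close to the target domain, under limited computing resources the translation network can often accept only low-resolution input. Moreover, because unsupervised image translation is itself an underdetermined problem, the distribution of the generated images cannot be guaranteed to match the target domain exactly, and downstream computer vision tasks still suffer from domain shift, that is, inconsistent feature distributions.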
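The cycle-consistency idea can be made concrete with a toy sketch. The per-pixel affine "generators" below stand in for the learned translation networks and are purely hypothetical, as are the function names and pixel values; only the structure of the loss (an L1 penalty on both the forward and backward cycles) follows the description above.

```python
def l1_mean(a, b):
    """Mean absolute difference between two equally sized pixel lists."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(x, y, g, f):
    """L1 cycle loss: x -> g -> f should return x, and y -> f -> g should return y."""
    forward_cycle = l1_mean(f(g(x)), x)
    backward_cycle = l1_mean(g(f(y)), y)
    return forward_cycle + backward_cycle


# Hypothetical stand-ins for the two generators (flattened grayscale pixels).
def g(img):
    """Source -> target, e.g. 'add haze': compress contrast and brighten."""
    return [0.5 * p + 0.4 for p in img]


def f_exact(img):
    """Target -> source, the exact inverse of g."""
    return [(p - 0.4) / 0.5 for p in img]


def f_rough(img):
    """Target -> source, an imperfect inverse of g."""
    return [(p - 0.3) / 0.5 for p in img]


x = [0.0, 0.25, 0.5, 1.0]   # source-domain sample
y = [0.4, 0.6, 0.9]         # unpaired target-domain sample

loss_exact = cycle_consistency_loss(x, y, g, f_exact)
loss_rough = cycle_consistency_loss(x, y, g, f_rough)
print(f"exact inverse: {loss_exact:.3f}, rough inverse: {loss_rough:.3f}")
```

The loss vanishes only when the two mappings invert each other. On its own it does not make the translated images look like the target domain, which is why the adversarial discriminators are needed as well.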

1.3. Deep learning and domain adaptation in the frequency domain

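Xu et al.^[20] were the first to propose training neural networks in the frequency domain, using the coefficients of the discrete cosine transform (DCT) as input, and applied the approach to image classification and segmentation.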

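Yang et al.^[21] proposed a learning-free approach: the fast Fourier transform (FFT) is applied to the source-domain and target-domain images separately, the central (low-frequency) region of the source amplitude spectrum is replaced with the corresponding amplitudes of the target image while the phase is kept unchanged, and the image is then recovered with the inverse fast Fourier transform (IFFT). The algorithm requires no training and can, to some extent, translate images between domains.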
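The amplitude swap described above can be sketched in a few lines. This is an illustrative approximation rather than the implementation of Yang et al.: a naive DFT from the standard library replaces the FFT, the spectrum is left unshifted (so the low frequencies sit at the corners instead of the centre), and the `radius` parameter, the 8×8 synthetic images, and the function names are all assumptions.

```python
import cmath
import math


def dft2(img, inverse=False):
    """Naive 2D DFT (or inverse DFT) of a small 2D list; O(N^4), demo only."""
    h, w = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            s = 0j
            for r in range(h):
                for c in range(w):
                    angle = 2 * math.pi * (u * r / h + v * c / w)
                    s += img[r][c] * cmath.exp(sign * 1j * angle)
            out[u][v] = s / (h * w) if inverse else s
    return out


def swap_low_freq_amplitude(src, tgt, radius=1):
    """Keep the source phase; take low-frequency amplitudes from the target."""
    h, w = len(src), len(src[0])
    fs, ft = dft2(src), dft2(tgt)
    mixed = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            # In an unshifted spectrum, index u and index h - u are the same
            # frequency magnitude, so low frequencies lie near the corners.
            low = min(u, h - u) <= radius and min(v, w - v) <= radius
            amp = abs(ft[u][v]) if low else abs(fs[u][v])
            mixed[u][v] = cmath.rect(amp, cmath.phase(fs[u][v]))
    return [[z.real for z in row] for row in dft2(mixed, inverse=True)]


def mean(img):
    return sum(sum(row) for row in img) / (len(img) * len(img[0]))


# Hypothetical 8x8 grayscale images: a textured "clear" source and a
# bright, low-contrast "hazy" target.
src = [[0.2 + 0.05 * r + 0.03 * c + 0.1 * ((r + c) % 2) for c in range(8)]
       for r in range(8)]
tgt = [[0.7 + 0.02 * r for c in range(8)] for r in range(8)]

out = swap_low_freq_amplitude(src, tgt, radius=1)
print(f"mean src={mean(src):.3f} tgt={mean(tgt):.3f} out={mean(out):.3f}")
```

Because the zero-frequency amplitude is among those swapped, the output takes on the overall brightness of the target while the source phase preserves the layout of the scene. Larger values of `radius` transfer more of the target's appearance but also tend to introduce more visible artifacts.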

4. Conclusion

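To improve the generalization of object detection when the test and training data follow different distributions, this paper proposes a frequency-domain domain adaptation method for object detection. Unsupervised image translation in the frequency domain generates high-resolution images that augment the data available in the domain of the test set. The algorithm also applies adversarial domain adaptation to further align the features of the augmented data with those of the test data, reducing the domain gap between training and test data. Experimental results show that, compared with spatial-domain domain adaptation and unsupervised image translation methods, the proposed method generates sharper, higher-resolution images during translation. By exploiting the energy concentration property of the frequency domain, it also preserves more of the original image information and reduces the domain gap caused by weather, clearly improving open-world object detection in applications such as traffic surveillance. Compared with a detection model trained only on clear-weather images, domain adaptation raises mAP by 33.9% in relative terms (from 29.8% to 39.9%).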

References

[1]	Zhu J Y, Park T, Isola P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 2223-2232.
[2]	Fan L, Zhao H, Hu H, et al. Survey of target detection based on deep convolutional neural networks [J]. Optics and Precision Engineering, 2020, 28(5): 1152-1164. (in Chinese)
[3]	Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014: 580-587.
[4]	Girshick R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 1440-1448.
[5]	Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[C]// Advances in Neural Information Processing Systems, 2015, 28: 91-99.
[6]	Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]//European Conference on Computer Vision, 2016: 21-37.
[7]	Redmon J, Farhadi A. YOLOv3: An incremental improvement [J]. ArXiv Preprint, 2018, ArXiv: 1804.02767.
[8]	Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 2980-2988.
[9]	Wu T, Zhang Z, Liu Y, et al. A lightweight small object detection algorithm based on improved SSD [J]. Infrared and Laser Engineering, 2018, 47(7): 0703005. (in Chinese)
[10]	Di X, Lin Z, Chen S. Dim moving object detection based on projection into the 2D frequency domain [J]. Infrared and Laser Engineering, 2013, 42(12): 3447-3452. (in Chinese)
[11]	Wu Y, Wang Y, Sun H, et al. LSS-target detection in complex sky backgrounds [J]. Chinese Optics, 2019, 12(4): 853-865. (in Chinese)
[12]	Gong X, Ouyang H. Improvement of tiny YOLOV3 target detection [J]. Optics and Precision Engineering, 2020, 28(4): 988-995. (in Chinese)
[13]	Wang C, An J, Jiang X, et al. Region proposal optimization algorithm based on convolutional neural networks [J]. Chinese Optics, 2019, 12(6): 1348-1361. (in Chinese)
[14]	Ganin Y, Lempitsky V. Unsupervised domain adaptation by backpropagation[C]//International Conference on Machine Learning, PMLR, 2015: 1180-1189.
[15]	Xie R, Yu F, Wang J, et al. Multi-level domain adaptive learning for cross-domain detection[C]//Proceedings of the IEEE International Conference on Computer Vision Workshops, 2019.
[16]	Hsu H K, Yao C H, Tsai Y H, et al. Progressive domain adaptation for object detection[C]//Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020: 749-757.
[17]	Zheng Y, Huang D, Liu S, et al. Cross-domain object detection through coarse-to-fine feature adaptation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020: 13766-13775.
[18]	Li H, Wan R, Wang S, et al. Unsupervised domain adaptation in the wild via disentangling representation learning [J]. International Journal of Computer Vision, 2021, 129(2): 267-283.
[19]	Liu M Y, Breuel T, Kautz J. Unsupervised image-to-image translation networks[C]//Advances in Neural Information Processing Systems, 2017: 700-708.
[20]	Xu K, Qin M, Sun F, et al. Learning in the frequency domain[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020: 1740-1749.
[21]	Yang Y, Soatto S. FDA: Fourier domain adaptation for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020: 4085-4095.
[22]	Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 7132-7141.
[23]	He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[24]	Paszke A, Gross S, Massa F, et al. Pytorch: An imperative style, high-performance deep learning library[C]//Advances in Neural Information Processing Systems, 2019, 32: 8026-8037.

Method	Bus	Bicycle	Car	Motor	Person	Rider	Train	Truck	mAP(@.5)
Cityscapes only	31.3%	33.8%	47.7%	20.2%	34.9%	40.5%	12.5%	17.8%	29.8%
MDA^[15]	41.8%	36.5%	44.8%	30.5%	33.2%	44.2%	28.7%	28.2%	36.0%
PDA^[16]	44.4%	35.9%	54.4%	29.1%	36.0%	45.5%	25.8%	24.3%	36.9%
CFF^[17]	43.2%	37.4%	52.1%	34.7%	34.0%	46.9%	29.9%	30.8%	38.6%
Proposed algorithm	48.1%	42.7%	61.9%	32.1%	43.1%	49.1%	17.7%	25.4%	39.9%

Algorithm	Bus	Bicycle	Car	Motor	Person	Rider	Train	Truck	mAP(@.5)
Cityscapes only	31.3%	33.8%	47.7%	20.2%	34.9%	40.5%	12.5%	17.8%	29.8%
Ours w/o stage 2	39.3%	38.5%	63.3%	28.0%	39.6%	42.4%	15.7%	23.6%	36.3%
Ours w/o stage 1	41.3%	39.0%	58.4%	28.6%	42.4%	44.7%	10.7%	23.6%	36.1%
Full model	48.1%	42.7%	61.9%	32.1%	43.1%	49.1%	17.7%	25.4%	39.9%
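
The 33.9% figure quoted in the abstract is a relative gain over the source-only baseline, not a difference in percentage points. The short check below recomputes it from the mAP column of the ablation table; the dictionary and variable names are illustrative only.

```python
# mAP(@.5) values copied from the ablation table, in percent.
map_scores = {
    "Cityscapes only": 29.8,
    "Ours w/o stage 2": 36.3,
    "Ours w/o stage 1": 36.1,
    "Full model": 39.9,
}

baseline = map_scores["Cityscapes only"]

# Absolute gain in percentage points and relative gain in percent.
gains = {
    name: (score - baseline, (score - baseline) / baseline * 100.0)
    for name, score in map_scores.items()
    if name != "Cityscapes only"
}

for name, (absolute, relative) in gains.items():
    print(f"{name}: +{absolute:.1f} points, +{relative:.1f}% relative")
```

For the full model this gives a gain of 10.1 percentage points, which corresponds to the 33.9% relative improvement.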




1. School of Electrical & Information Engineering, Tianjin University, Tianjin 300072, China

2. Tianjin Jinhang Institute of Technical Physics, Tianjin 300308, China