Image deblurring via multi-scale feature fusion and multi-input multi-output encoder-decoder

Zhao Qian; Zhou Dongming; Yang Hao; Wang Changcheng; Li Miao

doi:10.3788/IRLA20220018

A deblurring method that combines multi-scale feature fusion with a multi-input multi-output encoder-decoder is proposed for non-uniformly blurred images caused by camera shake, fast motion of the captured object, and low shutter speed. First, the initial features of the smaller-scale blurred images are extracted by a multi-scale feature extraction module, which uses dilated convolution to obtain a larger receptive field with fewer parameters. Second, a feature attention module adaptively learns useful information from features at different scales; it generates attention maps from the features of the small-scale images, which effectively reduces redundant features. Finally, a multi-scale feature progressive fusion module gradually fuses features at different scales so that the information they carry complements each other. Unlike recent multi-scale methods that stack multiple subnets, the proposed method uses a single network to extract multi-scale features, which reduces the training difficulty. To evaluate the deblurring quality and generalization performance of the network, the method is tested on the benchmark datasets GoPro and HIDE and on the real-world dataset RealBlur. It achieves peak signal-to-noise ratio (PSNR) values of 31.73 dB and 29.39 dB and structural similarity (SSIM) values of 0.951 and 0.923 on GoPro and HIDE, respectively, which are higher than those of the recent state-of-the-art deblurring methods it is compared with, and it also performs better on RealBlur, which contains real scenes. The experimental results show that the proposed method effectively restores the edge contours and texture details of images. In addition, it can improve the robustness of subsequent high-level computer vision tasks.

0. Introduction

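During image capture, camera shake, fast motion of the subject, or defocus can severely degrade image quality, a phenomenon known as image degradation. Techniques that recover a sharp image from a degraded one are called image restoration, and image deblurring is one such technique. Blurred images seriously impair subsequent computer vision tasks: high-level tasks such as object detection^[1-2], image segmentation^[3], and object tracking^[4] mostly assume blur-free input, and when the input is blurred they often fail to detect or segment the blurred objects accurately. Because deblurring can markedly improve the performance of vision tasks that receive blurred input, it has attracted wide attention from researchers in China and abroad^[5-6].

Traditional deblurring methods^[7-12] usually make assumptions about the blur kernel, model different types of uniform, non-uniform, and depth-aware blur under various constraints, use image priors to solve for the blur kernel, and finally recover the sharp image from the given blurred image. Although these methods are easy to implement, most of them rely on simple blur models and cannot adequately remove the complex non-uniform blur found in the real world. In addition, their inference is computationally complex and most of them need many iterations to optimize parameters, so processing an image takes too long, which limits their practical use.

As deep learning research has advanced, many deep-learning-based deblurring algorithms^[13-17] have been proposed. These methods do not depend on prior knowledge of natural images; they learn the nonlinear mapping between a blurred image and its sharp counterpart end to end, and so handle non-uniform blur in dynamic scenes better. Early deep learning methods^[14] mainly used single-scale network architectures, but a single-scale network has a small receptive field and encodes contextual information inefficiently, which makes it hard to extract comprehensive global and local features. Nah et al.^[5] therefore proposed a multi-scale deblurring network composed of several subnetworks, each of which takes a downscaled image as input, and which restores the sharp image gradually in a "coarse-to-fine" manner. Because the multi-scale approach proved effective, many multi-scale deblurring methods followed^[16-17]. However, these methods all stack multiple subnetworks, which makes training more complex and increases running time.

This paper therefore proposes a deblurring algorithm that combines multi-scale feature fusion with a multi-input multi-output encoder-decoder. Unlike the approach of Nah et al.^[5], in which multiple subnets take images at different scales as input, the multi-scale features here are fed into a single encoder, and the decoder outputs several sharp images at different scales, so the network is simpler while still removing blur effectively. Specifically, the contributions of this paper are as follows: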

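(1) A deblurring algorithm that combines multi-scale feature fusion with a multi-input multi-output encoder-decoder is proposed. Compared with other multi-scale deblurring methods that stack multiple subnets, the network has lower complexity.

(2) To make effective use of multi-scale feature information, a multi-scale feature extraction module (MFEM) and a multi-scale feature progressive fusion module are proposed. In addition, a feature attention module (FAM) is designed on the basis of SAM^[18] to enhance or suppress feature information at different scales, which improves the network's ability to learn and discriminate features.

(3) The proposed network is evaluated quantitatively with peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). Extensive experiments show that on the synthetic datasets GoPro and HIDE the proposed method achieves higher PSNR and SSIM than the other baseline methods. Results on the real-world datasets RealBlur-R and RealBlur-J show that it also generalizes better and is more robust.

(4) To further assess the value of the algorithm for subsequent high-level computer vision tasks, a pre-trained YOLOv4^[1] is used to perform object detection on blurred images and on their deblurred counterparts. The results show that the algorithm effectively improves the performance of subsequent high-level computer vision tasks.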
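The PSNR metric named in contribution (3) can be sketched as follows. This is a minimal illustration that assumes 8-bit images (peak value 255) stored as flat lists of pixel intensities; the function name `psnr` and the sample pixel values are hypothetical, and the paper's exact evaluation protocol is not given in this section.

```python
import math


def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical pixel values, for illustration only.
sharp = [52, 55, 61, 59, 79, 61, 76, 61]
deblurred = [54, 55, 60, 59, 77, 62, 76, 60]
print(f"PSNR: {psnr(sharp, deblurred):.2f} dB")
```

A higher PSNR means the restored image is closer to the reference in the mean-squared-error sense; SSIM complements it by comparing local structure rather than per-pixel error.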

1. Related work

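Yuan et al.^[7] estimated the blur kernel from a blurred image and a sharp but noisy image of the same scene, using the sharp detail in the noisy image to obtain a good initial kernel estimate, and proposed residual deconvolution to reduce the ringing artifacts inherent in image deconvolution. However, because the kernel estimate is overly simple and the blur kernel is assumed to be spatially invariant, the method cannot handle the non-uniform blur caused by real camera shake well. To address this, Whyte et al.^[8] proposed a parametrized geometric model that effectively handles non-uniform blur caused by camera rotation during shake. However, this method models only camera rotation and ignores blur caused by camera translation. For motion blur caused by camera translation, Xu et al.^[9] proposed a motion deblurring method based on L0 sparse representation, which needs no extra filtering during optimization and converges in a small number of iterations. Hu et al.^[11], noting that light streaks in blurred images carry rich blur information, proposed a low-light image deblurring method that models light streaks: it detects useful light streaks in the blurred image to estimate the blur kernel and thereby remove the blur. Pan et al.^[12] observed that the dark channel of a blurred image is less sparse, and proposed a blind image deblurring method based on the dark channel prior that can remove non-uniform blur without any complex blur kernel estimation.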
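The spatially invariant (uniform) blur assumption discussed above is commonly written as a single convolution, whereas non-uniform blur needs a kernel that changes with pixel position. The formulation below is a standard textbook one, not an equation taken from the source; the symbols B (blurred image), S (sharp image), K (blur kernel), and N (noise) are notation chosen here for illustration.

```latex
% Uniform blur: one kernel K shared by the whole image
B = K \ast S + N

% Non-uniform blur: a separate kernel K_{x} at each pixel position x
B(x) = \sum_{u} K_{x}(u)\, S(x - u) + N(x)
```

Kernel-based methods estimate K (or the family K_x) before deconvolving, which is why an inaccurate kernel estimate directly degrades the restored image.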

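In recent years, deep neural networks, with their strong feature learning and nonlinear modeling capabilities, have been widely applied to computer vision tasks such as object detection, object segmentation, and image restoration. Sun et al.^[13] were the first to apply a convolutional neural network (CNN) to image deblurring: their method predicts the probability distribution of motion blur on small image patches to estimate a non-uniform motion blur kernel, and uses a patch-level prior to remove the motion blur. However, training the CNN on small patches with uniform motion blur ignores the mapping between the blurred image and the motion blur kernel over larger regions, so its deblurring performance is poor. Later, Gong et al.^[15] represented the motion blur of the whole image as pixel-wise linear motion blur and used a CNN to estimate the motion flow in the blurred image directly, from which the sharp image is recovered. In general, early CNN-based deblurring methods mostly used the CNN to estimate the blur kernel and then restored the image, but when the kernel estimate is inaccurate these methods struggle to deblur well. Recent deblurring methods therefore mostly train kernel-free networks end to end to restore the image directly.

Nah et al.^[5] proposed a multi-scale CNN based on a Gaussian pyramid structure; this coarse-to-fine architecture can fully extract the multi-scale features of an image to restore a sharp image. However, training each scale with an independent subnetwork gives the network a large total parameter count and makes it hard to train. Tao et al.^[19] therefore shared parameters across the multi-scale feature extraction subnetworks, which improved deblurring quality while reducing both the number of parameters and the running time. But this method ignores the scale-variant nature of image features, and sharing parameters across all subnets may lose multi-scale feature information, so the network cannot effectively restore image details. Building on this work, Gao et al.^[20] proposed an effective selective parameter sharing network and introduced a new nested skip connection structure into the network's nonlinear transformation modules, in place of simply stacked convolution blocks, to improve performance. Kupyn et al.^[21-22] applied generative adversarial networks (GANs) to image deblurring and successively proposed DeblurGAN and DeblurGAN-v2, which use a GAN to map a blurred image directly to a sharp image; however, both methods have difficulty restoring non-uniform dynamic blur in complex scenes. Zhang et al.^[23] proposed an end-to-end multi-level deblurring network in which each subnet extracts detail features from small image patches produced by a different partitioning scheme, and the extracted features are fused level by level to restore the image. Cai et al.^[24] embedded dark and bright channel priors into a neural network to aggregate channel features, and applied sparse regularization to the network to improve performance. However, incorporating prior knowledge increases modeling complexity, and the model generalizes poorly to real blurred scenes. Zhang et al.^[25], considering that most current deblurring datasets are synthetic, used a GAN to generate realistic blurred images and thereby improve deblurring performance in real scenes. Park et al.^[26] proposed a single-image deblurring algorithm based on a multi-temporal recurrent neural network, which first decomposes severe blur into a series of milder blurs and then removes the blur progressively by iteration. Zou et al.^[27] proposed a dilated network with wavelet transformation for deblurring, which uses dilated convolutions with different dilation rates to obtain features with different receptive fields and uses a wavelet transform module to restore the texture details of the image.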
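The appeal of dilated convolution mentioned above, a larger receptive field without extra weights, can be checked with a little arithmetic. The sketch below assumes stride-1 layers with 3×3 kernels and dilation rates 1, 2, 3; those rates and all function names are illustrative assumptions, not values taken from any of the cited papers.

```python
def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(kernel_size, dilations):
    """Receptive field of stride-1 dilated conv layers applied in sequence."""
    rf = 1
    for d in dilations:
        # Each stride-1 layer widens the field by (effective size - 1).
        rf += effective_kernel_size(kernel_size, d) - 1
    return rf


def conv_weight_count(kernel_size, in_channels, out_channels):
    """Number of weights in a 2D conv layer (bias excluded).

    Dilation spreads the taps apart but adds no weights.
    """
    return kernel_size * kernel_size * in_channels * out_channels


rates = [1, 2, 3]  # assumed dilation rates, for illustration only
print("effective sizes:", [effective_kernel_size(3, d) for d in rates])
print("stacked receptive field:", stacked_receptive_field(3, rates))
print("weights per 3x3 layer (1 in, 1 out):", conv_weight_count(3, 1, 1))
```

Under these assumed rates, three 3×3 layers cover a 13×13 window, while each layer still holds 9 weights per input-output channel pair, the same as an undilated 3×3 layer.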

4. Conclusion

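To address the incomplete blur removal and loss of image detail that remain in most existing deblurring algorithms, this paper proposes a deblurring algorithm that combines multi-scale feature fusion with a multi-input multi-output encoder-decoder. First, a multi-scale feature extraction module based on dilated convolution extracts features from the smaller-scale images. Then a feature attention module assigns different weights, spatially and channel-wise, to the feature maps at different scales, which improves the network's ability to learn and discriminate features. A multi-scale feature progressive fusion module is also proposed that fuses features at different scales step by step, reducing the loss of high-frequency detail as information passes through the network. In addition, to reduce training complexity, instead of stacking multiple encoder-decoder subnets to input and output multi-scale images, the network uses a single encoder-decoder structure: multi-scale images are fed into and output from the same encoder-decoder, and the sharp image is restored gradually in a "coarse-to-fine" manner. Experimental results show that, compared with current state-of-the-art deblurring algorithms, the proposed algorithm achieves better objective scores and subjective visual quality on the benchmark datasets GoPro and HIDE and on the real-world dataset RealBlur, and that it can improve the performance of subsequent computer vision tasks.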
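The multi-scale inputs that a single multi-input encoder consumes can be pictured as an image pyramid. The sketch below is only an illustration under stated assumptions: it uses 2×2 average pooling and three scales, neither of which is specified in this section, and the names `downsample_2x` and `build_pyramid` are hypothetical.

```python
def downsample_2x(image):
    """Halve height and width by averaging each non-overlapping 2x2 block.

    `image` is a list of rows of grayscale values; a trailing odd row or
    column is dropped.
    """
    h, w = len(image), len(image[0])
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, w - 1, 2)
        ]
        for y in range(0, h - 1, 2)
    ]


def build_pyramid(image, num_scales=3):
    """Return [full, half, quarter, ...] resolution versions of `image`."""
    pyramid = [image]
    for _ in range(num_scales - 1):
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid


# A 4x4 toy image with values 0..15, for illustration only.
toy = [[4 * r + c for c in range(4)] for r in range(4)]
for level in build_pyramid(toy):
    print(f"{len(level)}x{len(level[0])}:", level)
```

In a coarse-to-fine scheme, the coarsest level is restored first and its result guides the finer levels; here all levels would enter one encoder rather than separate stacked subnets.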

References

[1]	Bochkovskiy A, Wang C Y, Liao H Y M. YOLOv4: Optimal speed and accuracy of object detection[DB/OL]. (2020-04-23)[2022-01-06]. https://doi.org/10.48550/arXiv.2004.10934.
[2]	Li Weipeng, Yang Xiaogang, Li Chuanxiang, et al. An improved semi-supervised transfer learning method for infrared object detection neural network [J]. Infrared and Laser Engineering, 2021, 50(3): 20200511. (in Chinese)
[3]	Zhang X, Xu H, Mo H, et al. DCNAS: Densely connected neural architecture search for semantic image segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 13956-13967.
[4]	Wang Z, Zheng L, Liu Y, et al. Towards real-time multi-object tracking[C]//Computer Vision–ECCV 2020, 2020: 107-122.
[5]	Nah S, Hyun Kim T, Mu Lee K. Deep multi-scale convolutional neural network for dynamic scene deblurring[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 3883-3891.
[6]	Wu Di, Zhao Hongtian, Zheng Shibao. Motion deblurring method based on DenseNets [J]. Journal of Image and Graphics, 2020, 25(5): 890-899. (in Chinese)
[7]	Yuan L, Sun J, Quan L, et al. Image deblurring with blurred/noisy image pairs [J]. ACM Transactions on Graphics, 2007, 26(3): 1-es.
[8]	Whyte O, Sivic J, Zisserman A, et al. Non-uniform deblurring for shaken images [J]. International Journal of Computer Vision, 2012, 98(2): 168-186.
[9]	Xu L, Zheng S, Jia J. Unnatural L0 sparse representation for natural image deblurring[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013: 1107-1114.
[10]	Wang Sha, Chen Yueting, Feng Huajun, et al. TwIST-TV regularization based image deblurring method [J]. Infrared and Laser Engineering, 2014, 43(6): 2000-2006. (in Chinese)
[11]	Hu Z, Cho S, Wang J, et al. Deblurring low-light images with light streaks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 3382-3389.
[12]	Pan J, Sun D, Pfister H, et al. Blind image deblurring using dark channel prior[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 1628-1636.
[13]	Sun J, Cao W, Xu Z, et al. Learning a convolutional neural network for non-uniform motion blur removal[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 769-777.
[14]	Mao X, Shen C, Yang Y B. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections [J]. Advances in Neural Information Processing Systems, 2016, 29: 2802-2810.
[15]	Gong D, Yang J, Liu L, et al. From motion blur to motion flow: A deep learning solution for removing heterogeneous motion blur[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 2319-2328.
[16]	Liu Pengfei, Zhao Huaici, Cao Feidao. Blind deblurring of noisy and blurry images of multi-scale convolutional neural network [J]. Infrared and Laser Engineering, 2019, 48(4): 0426001. (in Chinese)
[17]	Chen Qingjiang, Hu Qiannan, Li Jinyang. Image deblurring based on multi-scale alternating connection residual network [J]. Optics and Precision Engineering, 2021, 29(7): 1686-1694. (in Chinese)
[18]	Zamir S W, Arora A, Khan S, et al. Multi-stage progressive image restoration[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 14821-14831.
[19]	Tao X, Gao H, Shen X, et al. Scale-recurrent network for deep image deblurring[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 8174-8182.
[20]	Gao H, Tao X, Shen X, et al. Dynamic scene deblurring with parameter selective sharing and nested skip connections[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 3848-3856.
[21]	Kupyn O, Budzan V, Mykhailych M, et al. DeblurGAN: Blind motion deblurring using conditional adversarial networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 8183-8192.
[22]	Kupyn O, Martyniuk T, Wu J, et al. DeblurGAN-v2: Deblurring (orders-of-magnitude) faster and better[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 8878-8887.
[23]	Zhang H, Dai Y, Li H, et al. Deep stacked hierarchical multi-patch network for image deblurring[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 5978-5986.
[24]	Cai J, Zuo W, Zhang L. Dark and bright channel prior embedded network for dynamic scene deblurring [J]. IEEE Transactions on Image Processing, 2020, 29: 6885-6897.
[25]	Zhang K, Luo W, Zhong Y, et al. Deblurring by realistic blurring[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 2737-2746.
[26]	Park D, Kang D U, Kim J, et al. Multi-temporal recurrent neural networks for progressive non-uniform single image deblurring with incremental temporal training[C]//European Conference on Computer Vision, 2020: 327-343.
[27]	Zou W, Jiang M, Zhang Y, et al. SDWNet: A straight dilated network with wavelet transformation for image deblurring[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 1895-1904.
[28]	Wang Z, Bovik A C, Sheikh H R, et al. Image quality assessment: from error visibility to structural similarity [J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612.
[29]	Shen Z, Wang W, Lu X, et al. Human-aware motion deblurring[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 5572-5581.
[30]	Rim J, Lee H, Won J, et al. Real-world blur dataset for learning and benchmarking deblurring algorithms[C]//European Conference on Computer Vision, 2020: 184-201.
[31]	Huynh-Thu Q, Ghanbari M. Scope of validity of PSNR in image/video quality assessment [J]. Electronics Letters, 2008, 44(13): 800-801.

Method	GoPro PSNR (dB)	GoPro SSIM	HIDE PSNR (dB)	HIDE SSIM	RealBlur-R PSNR (dB)	RealBlur-R SSIM	RealBlur-J PSNR (dB)	RealBlur-J SSIM
Xu et al.^[9]	22.85	0.817	21.78	0.723	31.63	0.872	24.88	0.822
Whyte et al.^[8]	24.47	0.843	22.81	0.735	30.56	0.854	25.92	0.844
Pan et al.^[12]	24.73	0.876	23.92	0.763	32.92	0.891	25.79	0.854
DeblurGAN-v1^[21]	25.64	0.859	23.96	0.809	34.28	0.932	27.01	0.865
Nah et al.^[5]	27.83	0.915	25.73	0.874	33.92	0.947	27.11	0.876
DeblurGAN-v2^[22]	29.08	0.918	27.51	0.884	34.16	0.942	27.17	0.877
SRN^[19]	30.24	0.934	28.36	0.903	34.24	0.937	27.08	0.876
Gao et al.^[20]	30.96	0.942	29.10	0.913	34.06	0.943	26.82	0.868
MT-RNN^[26]	31.12	0.944	29.15	0.917	34.19	0.950	26.74	0.869
DBGAN^[25]	31.18	0.946	28.94	0.915	32.99	0.926	24.87	0.821
DMPHN^[23]	31.39	0.947	29.10	0.916	34.12	0.948	26.63	0.865
Ours	31.73	0.951	29.39	0.923	34.35	0.951	27.19	0.878

N	PSNR	SSIM
1	31.22	0.944
2	31.58	0.949
3	31.73	0.951
4	31.85	0.952

Module	Combination of different modules
MPFM	√	Sum	Concatenate	√	√
MFEM	√	√	√	×	√
FAM	√	√	√	√	×
PSNR (dB)	31.73	31.57	31.65	31.64	31.70
SSIM	0.951	0.945	0.948	0.948	0.950
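The module comparison table above can be summarized as the PSNR lost by each variant relative to the first column. The column labels in the sketch below are this example's own reading of the table (all modules enabled, MPFM replaced by summation or concatenation, MFEM removed, FAM removed) and are not names given in the source; only the PSNR values are copied from the table.

```python
# PSNR (dB) per configuration, copied from the table above.
# The keys are hypothetical labels for the five table columns.
ablation_psnr = {
    "full model": 31.73,
    "MPFM replaced by sum": 31.57,
    "MPFM replaced by concatenation": 31.65,
    "without MFEM": 31.64,
    "without FAM": 31.70,
}


def psnr_drop(results, baseline="full model"):
    """PSNR loss in dB of every variant relative to the baseline entry."""
    base = results[baseline]
    return {
        name: round(base - value, 2)
        for name, value in results.items()
        if name != baseline
    }


for name, drop in psnr_drop(ablation_psnr).items():
    print(f"{name}: -{drop:.2f} dB")
```

On this reading, every variant scores below the first column, with the largest gap for the summation variant.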

