There is no publicly available infrared-visible image patch matching dataset, so we collected the image pairs ourselves. In data acquisition, the visible-light camera is the one built into the DJI UAV, and the infrared camera is manufactured by FLIR; its wavelength range is 7.5 to 13.5 µm. The UAV acquires infrared and visible images at different altitudes, so the same target occupies about 0.8×, 0.5×, and 0.25× of the original image size, respectively. In preprocessing, we crop the target area from the original images and resize the network inputs to 224×224. Therefore, images of different resolutions are used during training and testing.
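The cropping and resizing step can be written as the following minimal sketch; the use of OpenCV and the crop-box convention are our own assumptions, since the paper only specifies the 224×224 input size.

```python
# Minimal preprocessing sketch (assumption: OpenCV; the paper does not name a
# library). Crop the target area from an original frame and resize it to the
# 224x224 network input.
import cv2

def make_patch(image_path, box, size=224):
    """box = (x, y, w, h) of the target area in the original image (assumed)."""
    img = cv2.imread(image_path)             # original UAV frame
    x, y, w, h = box
    patch = img[y:y + h, x:x + w]            # crop the target area
    return cv2.resize(patch, (size, size))   # resize to the network input size
```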
Our data set contains 2 000 images falling into 25 classes. For scene selection, the targets captured by the UAV should differ in shape and outline. The classes cover bridges, buildings, roads, parking lots, factories, houses, towers, gas storage tanks, etc., as shown in Fig.5. In the data set, the ratio of visible to infrared images is 1∶1. 80% of the images are used as training data, and the remaining 20% as test data. A sample consists of an infrared patch and a visible patch. If the two patches are similar, the pair is a positive sample with ground truth 1; otherwise, it is a negative sample with ground truth 0. In both the training and test sets, the ratio of positive to negative samples is 1∶1.
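The pair construction described above can be sketched as follows; the data structures and the way negatives are drawn are illustrative assumptions, not the authors' code.

```python
# Illustrative pair construction (structures and sampling are assumed).
# A positive sample pairs an infrared patch with the visible patch of the same
# target (label 1); a negative sample pairs patches of different targets (label 0).
import random

def build_samples(pairs, train_ratio=0.8):
    """pairs: list of (ir_path, vis_path) taken of the same target."""
    samples = [(ir, vis, 1) for ir, vis in pairs]                   # positives
    for ir, vis in pairs:                                           # one negative per positive (1:1)
        wrong_vis = random.choice([v for _, v in pairs if v != vis])
        samples.append((ir, wrong_vis, 0))
    random.shuffle(samples)
    split = int(train_ratio * len(samples))                         # 80% train / 20% test
    return samples[:split], samples[split:]
```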
InViNet trained in two stages performs better than a conventional classification network. In two-stage training, the feature network improves the feature representation, which significantly increases the accuracy of the metric network in the later stage. By comparing InViNet with and without shortcut connections, we confirm that the low-level spatial features are a useful complement to the high-level semantic information. We use the following settings to train our network in two stages.
The feature extraction network is trained in the first stage. The branches of the feature extraction network are initialized with VGG16 weights pretrained on the ImageNet dataset, and the new or modified layers are initialized with the Xavier method[20]. The low-level filters in VGG16 are known to capture general shallow features, while the higher-level features are more closely related to the specific task. Accordingly, the learning rate multiplier of each layer is set differently: the learning rate of a layer is the base learning rate multiplied by its multiplier. The base learning rate is 10⁻³. The multiplier is 0.01 in Block1, Block2, and Block3, 0.05 in Block4 and Block5, and remains 1 for FC6 and FC7 in VGG. Since all branches share weights, only one copy of the weights is kept in the feature extraction network. The optimizer is SGD with momentum 0.9, the mini-batch size is 16, the number of epochs is 2 000, and the weight decay is 10⁻⁴.
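The per-layer learning rates can be expressed as parameter groups. The paper configures them as Caffe lr_mult values; the following PyTorch-style sketch with assumed module names is only an illustration.

```python
# Stage-1 optimizer sketch. The paper sets these as Caffe lr_mult values; this
# PyTorch-style equivalent assumes a feature network `net` exposing the VGG16
# blocks as block1..block5 and the fully connected layers as fc6/fc7.
import torch

base_lr = 1e-3
param_groups = [
    {"params": [p for b in (net.block1, net.block2, net.block3)
                for p in b.parameters()], "lr": base_lr * 0.01},   # multiplier 0.01
    {"params": [p for b in (net.block4, net.block5)
                for p in b.parameters()], "lr": base_lr * 0.05},   # multiplier 0.05
    {"params": list(net.fc6.parameters()) + list(net.fc7.parameters()),
     "lr": base_lr * 1.0},                                         # multiplier 1
]
optimizer = torch.optim.SGD(param_groups, lr=base_lr,
                            momentum=0.9, weight_decay=1e-4)
```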
The metric network and shortcut connections are trained in the second stage. The branch weights learned in the first stage are used as initialization, so the branches change only slightly during this stage; their learning rate multipliers are less than 10⁻². The base learning rate is 10⁻³. The new layers in the metric network and the shortcut connections are initialized with the Xavier method, and their learning rate multipliers are 1. The number of epochs is adjusted to 2 500, and the remaining training parameters are the same as in the first stage.
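A corresponding sketch of the second stage, again with assumed module names and a representative branch multiplier of 10⁻², is given below.

```python
# Stage-2 sketch: Xavier-initialize the new metric and shortcut layers, keep
# the shared branch nearly frozen. Module names and the exact branch
# multiplier used here are assumptions.
import torch
import torch.nn as nn

def xavier_init(m):
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

net.metric_net.apply(xavier_init)
net.shortcut.apply(xavier_init)

base_lr = 1e-3
optimizer = torch.optim.SGD([
    {"params": net.branch.parameters(),     "lr": base_lr * 1e-2},  # branch: small multiplier (10^-2 here)
    {"params": net.metric_net.parameters(), "lr": base_lr},          # metric network: multiplier 1
    {"params": net.shortcut.parameters(),   "lr": base_lr},          # shortcut layers: multiplier 1
], lr=base_lr, momentum=0.9, weight_decay=1e-4)
```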
All experiments run on a computer equipped with an Nvidia TITAN XP GPU. Our implementation uses Caffe.
To validate our approach, we conduct the following experiments with different network architectures.
(1) Traditional method[9]. We enhance the object edges and use SURF to extract features from the infrared and visible images. Image similarity is measured by matching the feature points between the infrared and visible images.
(2) Baseline network. MatchNet[13] is used as the baseline. The whole network is optimized directly by the Softmax loss without two-stage training, and the two VGG16 branches are trained from scratch. The network over-fits quickly.
(3) MatchNet[13] (F). The MatchNet architecture improved with fine-tuning. Unlike the baseline network, its VGG16 branches are initialized with weights pretrained on the ImageNet dataset.
(4) Pseudo-SiamNet[14] (F). The pseudo-Siamese Deep Compare network architecture improved with fine-tuning. In the two VGG16 branches, the Conv1, Conv2, and Conv3 layers keep their own weights, whereas the Conv4 and Conv5 layers share weights. The model weights are also initialized from the pretrained VGG16 to avoid over-fitting.
(5) InViNet (F+C). InViNet with fine-tuning and contrastive loss. This network is trained in the two stages described in Sec. 2.2.
(6) InViNet (F+C+S). InViNet with fine-tuning, contrastive loss, and shortcut connections.
(7) InViNet (F+T+S). InViNet with fine-tuning, triplet loss, and shortcut connections. This network is mainly used to compare the triplet loss with the contrastive loss; a sketch of the two losses follows this list.
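For reference, the contrastive and triplet losses compared in settings (5) to (7) can be sketched as follows; the margin values are illustrative assumptions, not the values used in the paper.

```python
# Sketch of the two losses compared above; margins are illustrative.
import torch
import torch.nn.functional as F

def contrastive_loss(f1, f2, label, margin=1.0):
    """label = 1 for a matching pair, 0 for a non-matching pair."""
    d = F.pairwise_distance(f1, f2)
    return torch.mean(label * d.pow(2) +
                      (1 - label) * torch.clamp(margin - d, min=0).pow(2))

def triplet_loss(anchor, positive, negative, margin=1.0):
    d_pos = F.pairwise_distance(anchor, positive)   # anchor to matching patch
    d_neg = F.pairwise_distance(anchor, negative)   # anchor to non-matching patch
    return torch.mean(torch.clamp(d_pos - d_neg + margin, min=0))
```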
The ROC curve is commonly used to evaluate binary classification performance because it is insensitive to the imbalance between positive and negative samples. The commonly used evaluation metric is the false positive rate at 95% recall (Error@95%); the lower, the better. ROC curves are drawn for the different methods based on the experimental results; see Fig.6 for details.
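Under this definition, Error@95% can be computed from the matching scores as in the following sketch (the score convention, higher means more similar, is our own assumption).

```python
# Error@95%: the false positive rate at the score threshold that keeps 95%
# recall on the positive pairs (higher score = more similar is assumed).
import numpy as np

def error_at_95(scores, labels):
    scores, labels = np.asarray(scores), np.asarray(labels)
    pos = np.sort(scores[labels == 1])
    threshold = pos[int(0.05 * len(pos))]                     # 95% of positives score >= threshold
    return float(np.mean(scores[labels == 0] >= threshold))   # false positive rate at that threshold
```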
From these experiments, the following conclusions can be drawn.
(1) In infrared-visible image patch matching, it is hard for traditional methods to extract common features from infrared and visible images because of their different imaging principles, so the result is not satisfactory.
(2) Training from scratch on so few samples easily leads to over-fitting. With fine-tuning, all deep learning networks outperform the traditional algorithm; fine-tuning effectively avoids over-fitting.
(3) The pseudo-Siamese network performs better than the Siamese network. A likely explanation is that the low-level convolution layers do not share weights in the pseudo-Siamese network: because infrared and visible images follow different imaging principles, the two separate branches can extract their modality-specific shallow features.
To be concrete, we visualize the learned deep features with t-SNE[21], a common tool for visualizing high-dimensional data. As shown in Fig.7, our approach effectively reduces the intra-class distance and enlarges the inter-class distance, which is beneficial for patch matching.
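A minimal sketch of such a visualization is given below; the scikit-learn implementation and the perplexity setting are our own choices, since the paper only states that t-SNE is used.

```python
# Sketch of a t-SNE feature visualization; scikit-learn and the perplexity
# value are assumed choices.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(features, class_ids):
    """features: (N, D) deep features; class_ids: (N,) integer class labels."""
    embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
    plt.scatter(embedded[:, 0], embedded[:, 1], c=class_ids, cmap="tab20", s=5)
    plt.show()
```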
We show some top-ranking correct and incorrect results of InViNet in Fig.8. We find that the incorrect results could easily be misjudged by a human as well.
To further analyze our results, we report the mean average precision (MAP) on the test set, which contains five classes that never appear in the training process. As shown in Fig.9, our InViNet outperforms the other approaches.
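As an illustration, MAP over the held-out classes could be computed as in the following sketch; the use of scikit-learn and the per-class score arrays are assumptions.

```python
# Sketch of MAP over the held-out classes; scikit-learn is an assumed tool.
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(labels_per_class, scores_per_class):
    """One label/score array per class; labels are 1 for matching pairs."""
    aps = [average_precision_score(y, s)
           for y, s in zip(labels_per_class, scores_per_class)]
    return float(np.mean(aps))
```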