Three experiments were conducted in this section: a comparison experiment against other depth reconstruction methods, demonstrating the effectiveness and superiority of the proposed method; an ablation experiment, verifying the rationality and necessity of the proposed network structure and loss function; and an upsampling experiment, verifying the importance of the pre-upsampling strategy. All networks were trained on the NYU V2 dataset [15], and the quantitative results are reported with the Root Mean Square Error (RMSE) metric, in meters.
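For reference, the RMSE used for all quantitative results is assumed here to be the standard per-pixel definition over the reconstructed depth map:

$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(d_{i}-\hat{d}_{i}\right)^{2}}$$

where $d_i$ is the ground-truth depth of pixel $i$, $\hat{d}_i$ is the reconstructed depth, and $N$ is the number of pixels; the result is reported in meters.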
The test data were collected with the experimental setup described in Section 1.1; the TCSPC histograms obtained from the SPAD detector were preprocessed and then fed into the trained network. The ground-truth data were obtained by preprocessing the raw output of the SPAD detector, applying a median filter to remove the detector dark counts, and finally thresholding the image reflectivity, with pixels whose reflectivity falls below the threshold set as background.
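As a rough illustration of this ground-truth pipeline, the sketch below applies a median filter followed by reflectivity thresholding. The function name, kernel size, and threshold value are illustrative assumptions; the paper does not specify the exact parameters.

```python
import numpy as np
from scipy.ndimage import median_filter

def make_ground_truth(depth_raw, reflectivity, refl_threshold=0.05, kernel=3):
    """Sketch: median-filter the preprocessed SPAD depth map to suppress
    dark counts, then set low-reflectivity pixels to background."""
    depth = median_filter(depth_raw, size=kernel)   # removes isolated dark-count spikes
    depth[reflectivity < refl_threshold] = 0.0      # reflectivity below threshold -> background
    return depth
```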
The quantitative results of the comparison experiment are listed in Table 1; the proposed method achieves the best results in every scene.
Table 1. Quantitative results of the comparison experiment (RMSE, m)
The comparison experiment compares the proposed method with MLE, the method of He et al. [16], and the method of Lindell et al. [10]. Compared with the traditional methods, i.e., MLE and the method of He et al. [16], the neural-network-based approaches, i.e., the method of Lindell et al. [10] and the proposed method, can learn the complex nonlinear mapping between input and output and flexibly adapt to different imaging scenes. MLE does not adopt a sensor-fusion strategy; it assumes a fixed probability model and cannot remove the detector dark counts or the outliers produced during detection, so its reconstructions still contain substantial noise. The method of He et al. [16] locates object edges with a guidance image: the filter performs mean filtering in smooth regions of the image, while at edges it performs no filtering, or only slight filtering, thereby preserving object edges. This method, however, cannot remove noise at object edges. The data-driven method of Lindell et al. [10] adopts a sensor-fusion strategy and a multi-scale approach, but it fuses the intensity features only with the depth feature map at the largest scale; it does not make full use of the intensity information and causes severe depth-missing artifacts. As shown in Figure 3, the MLE method cannot completely remove the noise, while the method of He et al. [16] over-smooths object edges. The convolutional neural network of Lindell et al. [10] can reconstruct the scene, but it loses depth along some edges, especially for distant objects and regions with few depth values. The proposed method reliably recovers the scene depth and is also robust when reconstructing distant and small objects.
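The per-pixel MLE baseline can be sketched as follows, assuming a Poisson observation model with a known pulse shape and negligible background; the paper's exact MLE formulation may differ. Under this model, the log-likelihood of a time shift is maximized (up to shift-independent terms) by correlating the histogram with the log of the pulse.

```python
import numpy as np

def mle_depth(histogram, pulse, bin_width_s, c=3e8):
    """Sketch of per-pixel MLE depth: cross-correlate the TCSPC
    histogram with log(pulse) and take the best-matching time bin
    (up to a constant offset from the correlation alignment)."""
    log_pulse = np.log(np.maximum(pulse, 1e-12))     # avoid log(0)
    score = np.correlate(histogram, log_pulse, mode="same")
    t_bin = int(np.argmax(score))                    # most likely time-of-flight bin
    return 0.5 * c * t_bin * bin_width_s             # round trip -> one-way depth in meters
```

Because this estimator treats every pixel independently and cannot distinguish dark counts or outliers from signal, the residual noise noted above is expected.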
The ablation study on the network structure compares a network without the attention module and a network without intensity guidance; the results are shown in Figure 4(c) and (d). The network without the attention module pays equal attention to every part of the feature map, while the network without intensity guidance cannot extract finer detail features such as edges; both reconstruct poorly. The proposed network uses intensity guidance and introduces an attention mechanism, so it can learn detail features from the intensity image and focus on the feature-rich regions of the fused data; it removes the vast majority of the noise and produces sharp object edges.
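The paper does not give layer-level details at this point, but the role of the attention module in the fusion can be illustrated with a minimal PyTorch sketch; the module name and all layer shapes are assumptions.

```python
import torch
import torch.nn as nn

class FusionAttention(nn.Module):
    """Sketch: fuse depth and intensity features, then re-weight the
    fused map with a learned spatial attention mask so that
    feature-rich regions (e.g. edges) receive more attention."""
    def __init__(self, depth_ch, intensity_ch):
        super().__init__()
        self.fuse = nn.Conv2d(depth_ch + intensity_ch, depth_ch, 3, padding=1)
        self.attn = nn.Sequential(nn.Conv2d(depth_ch, depth_ch, 1), nn.Sigmoid())

    def forward(self, depth_feat, intensity_feat):
        fused = self.fuse(torch.cat([depth_feat, intensity_feat], dim=1))
        return fused * self.attn(fused)   # attention mask in [0, 1]
```

Without the mask, every spatial location would contribute equally, which matches the behaviour of the attention-free ablation above.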
Figure 4. (a) Input intensity image of the network; (b) Result of the proposed method; (c) Result of the network without the attention module; (d) Result of the network without intensity guidance; (e) Result of the network trained with the loss function without the ordinal regression loss; (f) Result of the network trained with the loss function without the KL divergence
The ablation study on the loss function trains the network with a loss function that omits the ordinal regression loss and with one that omits the KL divergence; the results are shown in Figure 4(e) and (f). The network trained without the ordinal regression loss cannot reconstruct complete object edges, because the KL divergence attends to the overall photon distribution over the TCSPC histogram: it only filters out background photons that differ markedly from the signal photons, and cannot remove background pixels at object edges that are weakly affected by echo photons. Conversely, the network trained without the KL divergence produces reconstructions with missing depth inside objects and jagged edges, because the ordinal regression loss considers only the local ordinal relationship between time bins and ignores the photon-count distribution over the whole temporal dimension. The loss function designed in this paper combines the KL divergence and the ordinal regression loss with different weights, attending both to the overall photon distribution in the temporal dimension and to the ordinal relationship between time bins. The network trained with this loss function yields reconstructions that preserve the object contours and have spatially continuous pixels.
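A rough sketch of such a combined loss is given below; the weights and the exact form of the ordinal regression term are illustrative assumptions, not the paper's values. The KL term matches the overall photon distribution along the time axis, while the ordinal term compares cumulative distributions over the time bins, which is sensitive to the bin ordering rather than only to the total mass.

```python
import torch
import torch.nn.functional as F

def histogram_loss(pred_hist, target_hist, w_kl=1.0, w_or=0.5):
    """Sketch of a KL + ordinal-regression loss for (B, T, H, W)
    TCSPC histogram volumes, with T the time-bin dimension."""
    log_p = F.log_softmax(pred_hist, dim=1)        # predicted photon distribution (log)
    q = F.softmax(target_hist, dim=1)              # target photon distribution

    # Overall distribution along the temporal dimension.
    kl = F.kl_div(log_p, q, reduction="batchmean")

    # Ordinal term: cumulative distributions over time bins should agree.
    cdf_p = torch.cumsum(log_p.exp(), dim=1).clamp(1e-6, 1 - 1e-6)
    cdf_q = torch.cumsum(q, dim=1).clamp(0.0, 1.0)
    ordinal = F.binary_cross_entropy(cdf_p, cdf_q)

    return w_kl * kl + w_or * ordinal
```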
Table 2 lists the quantitative results of the ablation experiments. The network structure with the attention module and intensity guidance, trained under the joint constraint of the KL divergence and the ordinal regression loss, i.e., the proposed method, achieves the best quantitative results.
Table 2. Ablation experimental quantitative results (RMSE, m)

| Scene | Without attention | Without intensity | KL + TV | OR + TV | Proposed |
| --- | --- | --- | --- | --- | --- |
| "N" and "J" | 0.7204 | 0.4510 | 0.6129 | 0.2432 | 0.1958 |
This paper uses a pre-upsampling strategy, i.e., the spatial resolution of the raw SPAD-array data is raised from 32×32 pixels to 128×128 pixels before the data enter the network. One visible effect of the upsampling is that the sparse point cloud becomes denser. Figure 5 compares the low-resolution point cloud, the point cloud produced by post-upsampling (the data are first processed by the network and then upsampled), and the point cloud produced by pre-upsampling: pre-upsampling increases the amount of information carried by the depth data and lets the network process more pixels, so the reconstructed pixels are spatially correlated and the edges are smooth.
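A minimal sketch of the pre-upsampling step is shown below; bilinear interpolation is an assumption, as this section does not name the exact interpolation scheme.

```python
import torch.nn.functional as F

def pre_upsample(spad_data, scale=4):
    """Sketch: raise the spatial resolution of the raw SPAD data
    (B, C, 32, 32) to (B, C, 128, 128) *before* the network, as in
    the pre-upsampling strategy described above."""
    return F.interpolate(spad_data, scale_factor=scale,
                         mode="bilinear", align_corners=False)
```

The post-upsampling alternative would instead call this after the network, leaving the network only 32×32 pixels to work with.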
Single-photon LiDAR imaging method based on sensor fusion network
doi: 10.3788/IRLA20210871
- Received Date: 2021-11-23
- Rev Recd Date: 2021-12-28
- Available Online: 2022-03-04
- Publish Date: 2022-02-28
Key words:
- LiDAR /
- single-photon imaging method /
- sensor fusion /
- SPAD array /
- convolutional neural network
Abstract: LiDAR systems with active illumination obtain depth information of the scene by using Single-Photon Avalanche Diode (SPAD) detectors to record the arrival times of photons reflected from the laser pulse. However, ambient light interferes with the measurements during the detection period. Sensor fusion is one of the effective methods for single-photon imaging. Recently, many data-driven methods based on intensity-LiDAR fusion have achieved gratifying results, but most of them use scanning LiDAR, whose depth acquisition is slow. The advent of the SPAD array overcomes this frame-rate limitation: the SPAD array allows multiple returned photons to be collected at the same time, which accelerates the information collection process. However, the spatial resolution of SPAD array detectors is typically low, and the detection process is also disturbed by ambient light. It is therefore necessary to break the inherent limitations of the SPAD array through an algorithm that separates the depth information from the noise. In this paper, for a SPAD array detector with an array size of 32×32 pixels, a convolutional neural network was proposed that can reconstruct a high-resolution, clean TCSPC histogram under the guidance of the intensity image. A multi-scale approach was adopted to extract input features, and the fusion of depth data and intensity data was further processed in the network based on the attention mechanism. In addition, a loss function combination suitable for TCSPC histogram processing networks was designed, in which the overall distribution of photons in the temporal dimension and the ordinal relationship between time bins are considered simultaneously. The proposed method successfully increases the spatial resolution of the depth data by a factor of 4; its efficacy is verified on real data, and it is superior to state-of-the-art methods both qualitatively and quantitatively.