The convolutional neural network (CNN) is a deep learning architecture first proposed by LeCun et al. [17]. During training, a 1DCNN reduces the number of model parameters: weight sharing across convolution kernels greatly lowers the computational load, and compressing the data dimensionality does not discard large amounts of temporal feature information. The structure is shown in Fig. 5.
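The weight sharing described above can be illustrated with a minimal 1-D convolution: one small kernel is reused at every position of the sequence, so the parameter count depends on the kernel length rather than the sequence length. An illustrative numpy sketch, not the paper's network:

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Slide one shared kernel along the sequence ('valid' padding)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

signal = np.array([0., 1., 2., 3., 2., 1., 0.])
edge_kernel = np.array([1., -1.])          # one 2-weight kernel shared over all positions
out = conv1d_valid(signal, edge_kernel)    # values: -1, -1, -1, 1, 1, 1
```

The same two weights respond to the rising and falling edges everywhere in the signal, which is why the 1DCNN can shrink the model without losing the ordering of the time series.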
RNNs are well suited to time-series data, but gradients can vanish when the input sequence is long. The long short-term memory (LSTM) network and the GRU refine the RNN to overcome the vanishing-gradient problem. The GRU is a variant of the LSTM [18] with one fewer gate function and far fewer parameters, so it trains faster and at lower cost. A GRU has two gates, an update gate and a reset gate. The update gate controls how much of the historical state and how much of the current candidate state are kept in the output at the current time step; the reset gate determines whether, and to what extent, the current candidate state depends on the network state at the previous time step [19]. The structure is shown in Fig. 6.
These operations can be expressed as:

$$ r_t = \sigma\left(W_r \cdot \left[h_{t-1}, x_t\right]\right) \tag{1} $$

$$ z_t = \sigma\left(W_z \cdot \left[h_{t-1}, x_t\right]\right) \tag{2} $$

$$ h_t = \left(1 - z_t\right) * h_{t-1} + z_t * \tilde{h}_t \tag{3} $$

$$ \tilde{h}_t = \tanh\left(W_{\tilde{h}} \cdot \left[r_t * h_{t-1}, x_t\right]\right) \tag{4} $$

where $[\,]$ denotes the concatenation of two vectors and $*$ denotes the element-wise (Hadamard) product; $r_t$ is the reset gate and $z_t$ the update gate; $W_r$, $W_z$ and $W_{\tilde{h}}$ are weight matrices; $x_t$ is the GRU input at time $t$; $\tilde{h}_t$ is the candidate activation of a GRU unit at time $t$; $h_t$ is the current activation of the GRU unit; and $\sigma$ is the sigmoid function.
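A minimal numpy sketch of one GRU step following Eqs. (1)-(4); the weights are random stand-ins for illustration, and bias terms are omitted as in the equations above:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(h_prev, x_t, W_r, W_z, W_h):
    """One GRU time step; each weight matrix acts on a concatenated vector."""
    hx = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    r = sigmoid(W_r @ hx)                        # Eq. (1): reset gate
    z = sigmoid(W_z @ hx)                        # Eq. (2): update gate
    h_tilde = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))  # Eq. (4)
    return (1.0 - z) * h_prev + z * h_tilde      # Eq. (3): new state

rng = np.random.default_rng(0)
hidden, feat = 4, 3                              # toy sizes, not the paper's 256 units
W_r = rng.standard_normal((hidden, hidden + feat))
W_z = rng.standard_normal((hidden, hidden + feat))
W_h = rng.standard_normal((hidden, hidden + feat))
h = gru_step(np.zeros(hidden), rng.standard_normal(feat), W_r, W_z, W_h)
```

With a zero initial state, the output is $z_t * \tilde{h}_t$, so every component stays inside $(-1, 1)$, reflecting the bounded tanh candidate.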
This work combines the complementary strengths of the 1DCNN and the GRU in a 1DCNN-GRU model for human-action classification. The model input is a 1×1200-dimensional feature vector. Because of the large number of feature points, three 1DCNN layers first perform deep feature extraction, reconstructing the features and reducing the feature-vector dimensionality while preserving its temporal ordering. The GRU then applies its nonlinear feature-learning capability to the convolved feature vectors to obtain a feature set that characterizes the human action. Finally, a fully connected layer and a softmax layer classify the result. The network structure is shown in Fig. 7.
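As a sanity check, the layer sizes in Table 1 can be reproduced from the standard Keras parameter-count formulas. The kernel sizes are not stated in the text; sizes of 5, 3 and 3 are inferred here from the counts themselves, and the GRU count matches the pre-2.0 Keras convention (`reset_after=False`), so treat these as assumptions:

```python
def conv1d_params(kernel, in_ch, filters):
    # one (kernel x in_ch) weight block plus one bias per filter
    return (kernel * in_ch + 1) * filters

def gru_params(in_dim, units):
    # three gate blocks, each (in_dim + units + 1) x units, reset_after=False
    return 3 * (in_dim + units + 1) * units

def dense_params(in_dim, units):
    return (in_dim + 1) * units

counts = {
    "conv1d_1": conv1d_params(5, 1, 256),     # 1536
    "conv1d_2": conv1d_params(3, 256, 128),   # 98432
    "conv1d_3": conv1d_params(3, 128, 64),    # 24640
    "gru_1":    gru_params(64, 256),          # 246528
    "dense_1":  dense_params(256, 32),        # 8224
    "dense_2":  dense_params(32, 5),          # 165
}
```

Every value agrees with the Parameter column of Table 1, which supports the inferred kernel sizes.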
The flow of the proposed action-recognition algorithm is shown in Fig. 8. First, the data collected by a group of PIR sensors are preprocessed to remove interference from other heat sources, and the channels are feature-fused to build the dataset. A deep neural network is then trained to establish the action-detection model, and finally a machine-learning classification algorithm completes the action recognition.
The dataset consists of the real-time output signals recorded while volunteers performed specified actions under the PIR sensors. To ensure diversity and rule out chance effects, five volunteers (3 male, 2 female) were recruited. Each volunteer performed five actions in turn, sitting, standing, walking, running and falling, within the central region of the two PIR sensors' fields of view. Obstacles were scattered irregularly around the test site, and each volunteer performed the five actions at different speeds, amplitudes and directions. Each acquisition by a sensor node lasted 6 s at a sampling frequency of 100 Hz, and the real-time data were viewed and saved in Matlab on a PC. The real-time outputs of the two sensors were concatenated to form one sample; the dataset contains 1500 samples, 300 per action. For each action, 200 samples form a 1000-sample training set, and the remaining 500 samples form the test set used to verify the effectiveness of the network. Fig. 9 shows the raw data of the five actions for one volunteer.
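Under the stated acquisition settings (6 s at 100 Hz per sensor, two sensor outputs concatenated per sample), each sample carries 2 × 600 = 1200 points, matching the 1×1200 network input. A sketch of the sample assembly and the per-action split, using random stand-ins for the real PIR signals:

```python
import numpy as np

FS, DURATION = 100, 6               # 100 Hz sampling, 6 s per acquisition
POINTS = FS * DURATION              # 600 points per sensor channel

rng = np.random.default_rng(1)
# stand-ins for the two PIR channels of one acquisition (real data in the paper)
pir_a = rng.standard_normal(POINTS)
pir_b = rng.standard_normal(POINTS)

sample = np.concatenate([pir_a, pir_b])   # one 1x1200 dataset sample

# per-action split: 300 samples -> 200 train / 100 test (x5 actions = 1000/500)
idx = rng.permutation(300)
train_idx, test_idx = idx[:200], idx[200:]
```

Splitting within each action keeps the five classes balanced in both the training and the test set.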
Because the five volunteers differ in habits and body shape, the numerical ranges of the samples vary, so each sample must be standardized. Min-max normalization is used to map each sample into the interval [0, 1]; the transform is:

$$ x_t' = \dfrac{x_t - \min\left(x_t\right)}{\max\left(x_t\right) - \min\left(x_t\right)} \tag{5} $$

where $x_t$ is a sample, $x_t'$ is the preprocessed sample, and $\min(x_t)$ and $\max(x_t)$ are the minimum and maximum of that sample. Normalizing the data before training the network accelerates the gradient-descent search for the optimum and shortens the training time. The two normalized signal channels are then fused by concatenating their feature vectors: given $n$ input feature vectors $x_1, x_2, \cdots, x_n$ with dimensions $p_1, p_2, \cdots, p_n$, the output feature vector $y$ has dimension $\displaystyle \sum_{i=1}^{n} p_i$. Finally, the dataset is labeled; the training set is fed into the neural network for training, and the test set is used to evaluate the trained model.
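A minimal numpy sketch of Eq. (5) followed by concatenation-based feature fusion; the values are toy data, not the measured PIR signals:

```python
import numpy as np

def min_max_normalize(x):
    """Eq. (5): map one sample into the [0, 1] range."""
    return (x - x.min()) / (x.max() - x.min())

def fuse(*channels):
    """Feature fusion by concatenation: output dim = sum of input dims."""
    return np.concatenate(channels)

a = min_max_normalize(np.array([2.0, 4.0, 6.0]))
b = min_max_normalize(np.array([10.0, 30.0]))
y = fuse(a, b)          # y == [0, 0.5, 1, 0, 1], dimension 3 + 2 = 5
```

Each channel is normalized against its own extrema before fusion, so one volunteer's stronger signal cannot dominate the concatenated feature vector.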
The network was implemented in the TensorFlow + Keras framework; Table 1 lists the detailed network settings. The model uses the adaptive moment estimation (Adam) optimizer to update the weights and biases. To prevent overfitting, the dropout rate [20] of the GRU layer is set to 0.05.
Table 1. Parameter settings of the 1DCNN-GRU model
| Layer (type) | Output shape | Parameters |
| --- | --- | --- |
| conv1d_1 (Conv1D) | (None, 1200, 256) | 1536 |
| max_pooling1d_1 (MaxPooling1D) | (None, 300, 256) | 0 |
| conv1d_2 (Conv1D) | (None, 300, 128) | 98432 |
| max_pooling1d_2 (MaxPooling1D) | (None, 75, 128) | 0 |
| conv1d_3 (Conv1D) | (None, 75, 64) | 24640 |
| max_pooling1d_3 (MaxPooling1D) | (None, 18, 64) | 0 |
| gru_1 (GRU) | (None, 256) | 246528 |
| dense_1 (Dense) | (None, 32) | 8224 |
| dense_2 (Dense) | (None, 5) | 165 |

Training was run with epochs set to 300, i.e. 300 passes over the training set. Fig. 10 shows the model loss after the 300 iterations, and Fig. 11 shows the training accuracy. The results show that as training proceeds, the loss decreases and the training accuracy rises without overfitting; after about 150 iterations the loss approaches zero and the training accuracy reaches 100%. The fluctuations during training indicate that the model is not always descending in the same direction, but both the accuracy and the loss eventually level off, i.e. the model converges.
Table 2 lists the test-set accuracy of the 1DCNN-GRU model. The experimental results show an average classification accuracy of 98.6%, a good recognition result. The recognition rate of the falling action is the lowest: falling is a complex whole-body action, so the heat-radiation signal collected by the PIR sensors carries more varied feature information, which lowers the recognition rate. The results show that the 1DCNN-GRU model can effectively distinguish the five basic actions studied here.
Table 2. Accuracy of the five action classes
| Action category | Recognition accuracy |
| --- | --- |
| Run | 98% |
| Walk | 100% |
| Sit | 100% |
| Stand | 100% |
| Fall | 95% |

The performance of different models was also compared; Table 3 lists the training time and recognition accuracy of each. Compared with the conventional LSTM, the GRU reduces the number of trainable parameters and converges faster while keeping the recognition accuracy of the hybrid model almost unchanged. Although the plain 1DCNN has the shortest training time, its recognition accuracy is comparatively low. Applying convolution and pooling before feeding the feature vectors into the GRU therefore simplifies the computation without losing signal features and yields the best-performing model. The final experimental results show that combining the 1DCNN with a recurrent network both reduces the computational load and better captures the temporal features of the experimental data, so the proposed method achieves good action-recognition performance.
Table 3. Performance comparison of different models
| Network model | Accuracy | Training time/s |
| --- | --- | --- |
| 1DCNN | 93.8% | 178 |
| GRU | 88.6% | 467 |
| LSTM | 91.8% | 543 |
| 1DCNN-LSTM | 98.6% | 245 |
| 1DCNN-GRU | 98.8% | 195 |
Human motion recognition method based on pyroelectric infrared sensor
Abstract: Aiming at the privacy exposure, high technical complexity, and low recognition accuracy of current human motion recognition technology, this paper proposes a human motion recognition method based on pyroelectric infrared (PIR) sensors. First, a set of ceiling-mounted PIR sensors with modulated fields of view collects the infrared heat radiation emitted by a moving human body; the analog voltage output by the sensors is filtered and amplified, then transmitted to a PC through a ZigBee wireless module and packaged into a raw dataset. Second, the outputs of the two sensor channels are fused at the feature level, and the fused data are standardized and packaged into training and test sets. Then, based on the characteristics of the data, a two-layer cascaded hybrid deep learning network is proposed as the human-motion classification algorithm: the first layer uses a one-dimensional convolutional neural network (1DCNN) to extract features from the data, and the second layer uses a gated recurrent unit (GRU) to retain historical input information and prevent the loss of valid features. Finally, the training set is used to train the network and obtain the classification model with the best parameters, and the correctness of the model is verified on the test set. The experimental results show that the proposed model classifies basic motions with an accuracy above 98%; compared with image-based or wearable-device motion recognition, it achieves real-time, convenient, low-cost and highly confidential high-precision human motion recognition.
[1] Patel C I, Labana D, Pandya S, et al. Histogram of oriented gradient-based fusion of features for human action recognition in action video sequences [J]. Sensors, 2020, 20(24): 7299. doi: 10.3390/s20247299
[2] Ma Shiwei, Liu Lina, Fu Qi, et al. Using PHOG fusion features and multi-class Adaboost classifier for human behavior recognition [J]. Optics and Precision Engineering, 2018, 26(11): 2827-2837. (in Chinese) doi: 10.3788/OPE.20182611.2827
[3] Li Qinghui, Li Aihua, Cui Zhigao, et al. Action recognition via restricted dense trajectories and spatio-temporal co-occurrence feature [J]. Optics and Precision Engineering, 2018, 26(1): 230-237. (in Chinese)
[4] Sandhya R S, Apparao N G, Usha S V. Kinematic joint descriptor and depth motion descriptor with convolutional neural networks for human action recognition [J]. Materials Today: Proceedings, 2020, 37(2): 3164-3173. doi: 10.1016/j.matpr.2020.09.052
[5] Pei Xiaomin, Fan Huijie, Tang Yandong. Action recognition method of spatio-temporal feature fusion deep learning network [J]. Infrared and Laser Engineering, 2018, 47(2): 0203007. (in Chinese) doi: 10.3788/IRLA201847.0203007
[6] Pei Xiaomin, Fan Huijie, Tang Yandong. Two-person interaction recognition based on multi-stream spatio-temporal fusion network [J]. Infrared and Laser Engineering, 2020, 49(5): 20190552. (in Chinese) doi: 10.3788/IRLA20190552
[7] Liu S Q, Zhang J C, Zhang Y Z, et al. A wearable motion capture device able to detect dynamic motion of human limbs [J]. Nature Communications, 2020, 11(1): 5615. doi: 10.1038/s41467-020-19424-2
[8] Su Benyue, Zheng Dandan, Tang Qingfeng, et al. Human daily short-time activity recognition method driven by single sensor data [J]. Infrared and Laser Engineering, 2019, 48(2): 0226003. (in Chinese) doi: 10.3788/IRLA201948.0226003
[9] Wang Zhenyu, Zhang Lei. Deep convolutional and gated recurrent neural networks for sensor-based activity recognition [J]. Journal of Electronic Measurement and Instrumentation, 2020, 34(1): 1-9. (in Chinese)
[10] Wang Y, Jiang X L, Cao R Y, et al. Robust indoor human activity recognition using wireless signals [J]. Sensors, 2015, 15(7): 17195-17208. doi: 10.3390/s150717195
[11] Liu Xiwen, Chen Haiming. Wi-ACR: a human action counting and recognition method based on CSI [J]. Journal of Beijing University of Posts and Telecommunications, 2020, 43(5): 105-111. (in Chinese)
[12] De P, Chatterjee A, Rakshit A. PIR sensor-based AAL tool for human movement detection: modified MCP-based dictionary learning approach [J]. IEEE Transactions on Instrumentation and Measurement, 2020, 69(10): 7377-7385. doi: 10.1109/TIM.2020.2981106
[13] Pourpanah F, Zhang B, Ma R, et al. Non-intrusive human motion recognition using distributed sparse sensors and the genetic algorithm based neural network[C]//2018 IEEE Sensors. IEEE, 2018: 1-4.
[14] Sun Q, Hu F. Dual-mode binary thermal sensing for indoor human scenario recognition with pyroelectric infrared sensors[C]//2019 IEEE Sensors. IEEE, 2019: 1-4.
[15] Guan Q, Li C, Qin L, et al. Daily activity recognition using pyroelectric infrared sensors and reference structures [J]. IEEE Sensors Journal, 2018, 19(5): 1645-1652.
[16] Yang Y, Yang H L, Liu Z X, et al. Fall detection system based on infrared array sensor and multi-dimensional feature fusion [J]. Measurement, 2022, 192: 110870. doi: 10.1016/j.measurement.2022.110870
[17] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition [J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324. doi: 10.1109/5.726791
[18] Cho K, Van Merrienboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[DB/OL]. (2014-06-03). https://arxiv.org/abs/1406.1078.
[19] Chung J, Gulcehre C, Cho K H, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[DB/OL]. https://arxiv.org/abs/1412.3555.
[20] Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: A simple way to prevent neural networks from overfitting [J]. Journal of Machine Learning Research, 2014, 15(1): 1929-1958.