A video anomaly detection method based on deep autoencoding Gaussian mixture model

Zhong Youkun^1, Mo Haining^2

doi: 10.3788/IRLA20210547

1. Physics and Mechanical & Electrical Engineering School, Hechi University, Yizhou 546300, China
2. HTC VIVEDU School of Technology, Guangxi University of Science and Technology, Liuzhou 545006, China

Funds: National Natural Science Foundation of China (61662007)

About the authors: Zhong Youkun, male, senior laboratory technician, master's degree; his research focuses on computers and their applications in electromechanical design.

Corresponding author: Mo Haining, male, associate professor, master's degree; his research focuses on data mining and information management.

CLC number: TP391.4

Abstract: Because the definition of an anomaly is vague and real data are complex, video anomaly detection is one of the most challenging problems in intelligent video surveillance. Frame reconstruction (of the current or a future frame) based on an autoencoder (AE) is a popular approach. With a model trained on normal data, the reconstruction error of abnormal scenes is usually much larger than that of normal scenes. However, such methods ignore the internal structure of the normal data and are relatively inefficient. To address this, a deep autoencoding Gaussian mixture model (DAGMM) is proposed. A deep autoencoder first produces a low-dimensional representation of the input video clip together with its reconstruction error, and both are fed into a Gaussian mixture model (GMM). The estimation network predicts the energy probability through the GMM, and anomalies are then judged from the energy density probability. DAGMM jointly optimizes the parameters of the deep autoencoder and the GMM in an end-to-end manner, balances autoencoding reconstruction, density estimation of the low-dimensional representation, and regularization, and generalizes well. Experimental results on two public benchmark datasets show that DAGMM reaches state-of-the-art performance, achieving frame-level AUCs of 95.7% and 72.9% on UCSD Ped2 and ShanghaiTech, respectively.

Key words:
- video surveillance
- anomalous event
- auto-encoding network
- Gaussian mixture model
- deep learning

Figure 1. Flow chart of the abnormal event detection method based on DAGMM

Figure 2. Examples of the detection results

Table 1. Overview of benchmark datasets

Attributes	UCSD Ped2	ShanghaiTech
Frames	4560	317398
Scene	Single	Multi
Labels	Spatial & Temporal	Spatial & Temporal
Resolution	360×240	856×480
Anomalies	Biker, cart, etc.	Chasing, brawling, sudden motion, etc.

Table 2. Comparison with state-of-the-art methods in terms of frame-level AUC (%)

Method	UCSD Ped2	ShanghaiTech
MPPCA^[3]	69.3%	-
MDT^[4]	82.9%	-
MT-FRCN^[5]	92.2%	-
Conv2D-AE^[10]	85.0%	60.9%
Conv3D-AE^[10]	91.2%	-
ConvLSTM-AE^[20]	88.1%	-
StackRNN^[21]	92.2%	68.0%
Baseline^[18]	95.4%	72.8%
Proposed method	95.7%	72.9%

Table 3. Influence of the number of Gaussian mixture components $K$ on the experimental results on the UCSD Ped2 dataset (frame-level AUC%)

$K$	AUC%
2	92.3%
4	94.5%
8	95.1%
16	95.7%
32	95.6%
64	95.7%
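
The frame-level AUC reported in the tables can be illustrated with a small sketch. The version below uses the standard rank-sum identity: the AUC is the probability that a randomly chosen anomalous frame scores higher than a randomly chosen normal frame. The function name `frame_level_auc` and the toy scores are illustrative assumptions, not the authors' evaluation code or data.

```python
import numpy as np

def frame_level_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    scores: per-frame anomaly scores, higher means more anomalous.
    labels: per-frame ground truth, 1 = anomalous, 0 = normal.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one normal and one anomalous frame")
    # Compare every anomalous frame with every normal frame; ties count 0.5.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Toy example with made-up scores (hypothetical, not data from the paper).
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_auc = frame_level_auc(toy_scores, toy_labels)  # 3 of 4 pairs are ordered correctly
```

The pairwise comparison needs memory proportional to the number of anomalous frames times the number of normal frames, so for a dataset the size of ShanghaiTech a sort-based implementation would be a more practical choice.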

[1]	Sabokrou M, Fayyaz M, Fathy M, et al. Deep-anomaly: Fully convolutional neural network for fast anomaly detection in crowded scenes [J]. Computer Vision and Image Understanding, 2018, 172: 88-97. doi: 10.1016/j.cviu.2018.02.006
[2]	Li C, Han Z, Ye Q, et al. Visual abnormal behavior detection based on trajectory sparse reconstruction analysis [J]. Neurocomputing, 2013, 119(7): 94-100.
[3]	Jiang F, Yuan J, Tsaftaris S A, et al. Anomalous video event detection using spatiotemporal context [J]. Computer Vision and Image Understanding, 2011, 115(3): 323-333. doi: 10.1016/j.cviu.2010.10.008
[4]	Li W, Mahadevan V, Vasconcelos N. Anomaly detection and localization in crowded scenes [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(1): 18-32.
[5]	Reddy V, Sanderson C, Lovell B. Improved anomaly detection in crowded scenes via cell-based analysis of foreground speed, size and texture [C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2011: 55-61.
[6]	Wang S, Zhu E, Yin J, et al. Video anomaly detection and localization by local motion based joint video representation and OCELM [J]. Neurocomputing, 2018, 277: 161-175. doi: 10.1016/j.neucom.2016.08.156
[7]	Kaur P, Gangadharappa M, Gautam S. An overview of anomaly detection in video surveillance [C]//International Conference on Advances in Computing, Communication Control and Networking (ICACCCN), 2018.
[8]	Schmidhuber J. Deep learning in neural networks: An overview [J]. Neural Networks, 2015, 61: 326-366.
[9]	LeCun Y, Bengio Y, Hinton G. Deep learning [J]. Nature, 2015, 521: 436-444. doi: 10.1038/nature14539
[10]	Hasan M, Choi J, Neumann J, et al. Learning temporal regularity in video sequences [C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[11]	Gong D, Liu L, Le V, et al. Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection [C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 1-8.
[12]	Ravanbakhsh M, Sangineto E, Nabi M, et al. Abnormal event detection in videos using generative adversarial nets [C]//Proceedings of the IEEE International Conference on Image Processing (ICIP), 2017: 1-5.
[13]	Ravanbakhsh M, Sangineto E, Nabi M, et al. Training adversarial discriminators for cross-channel abnormal event detection in crowds [C]//Winter Conference on Applications of Computer Vision, 2019: 1896-1904.
[14]	Narasimhan MG, S SK. Dynamic video anomaly detection and localization using sparse denoising autoencoders [J]. Multimedia Tools and Applications, 2018, 77(11): 13173-13195.
[15]	Sabzalian B, Marvi H, Ahmadyfard A. Deep and sparse features for anomaly detection and localization in video [C]//4th International Conference on Pattern Recognition and Image Analysis (IPRIA), 2019: 173-178.
[16]	Lin S, Yang H, Tang X, et al. Social MIL: Interaction-aware for crowd anomaly detection [C]//16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2019: 1-8.
[17]	Fan Y, Wen G, Li D, et al. Video anomaly detection and localization via Gaussian mixture fully convolutional variational autoencoder [J]. Computer Vision and Image Understanding, 2020, 195: 102920. doi: 10.1016/j.cviu.2020.102920
[18]	Liu W, Luo W, Lian D, et al. Future frame prediction for anomaly detection - a new baseline [C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 6536-6545.
[19]	Springenberg J, Dosovitskiy A, Brox T, et al. Striving for simplicity: The all convolutional net [C]//International Conference on Learning Representations, 2015.
[20]	Luo W, Liu W, Gao S. Remembering history with convolutional LSTM for anomaly detection [C]//IEEE International Conference on Multimedia and Expo (ICME), 2017: 439-444.
[21]	Luo W, Liu W, Gao S. A revisit of sparse coding based anomaly detection in stacked RNN framework [C]//IEEE International Conference on Computer Vision, 2017: 341-349.
[22]	Wang Dong, Zhang Xiaojun, Dai Lihua. Video anomaly detection and localization via deep Gaussian process regression [J]. Journal of Electronic Measurement and Instrumentation, 2021, 35(3): 158-164. (in Chinese)
[23]	She Bo, Tian Fuqing, Liang Weige. Fault diagnosis based on a deep convolution variational autoencoder network [J]. Journal of Electronic Measurement and Instrumentation, 2018, 39(10): 27-35. (in Chinese)

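0. Introduction

Video surveillance systems are increasingly deployed in public and private spaces to monitor human activity and prevent crime. This requires someone to watch the footage, judge when something deviates from normal, and raise an alarm. Such events are rare, however, so the person watching sees no abnormal behavior most of the time. These unusual events can be regarded as anomalies, defined as patterns that do not conform to normal behavior, and the task of finding them is called anomaly detection. Researchers have therefore been trying to design robust anomaly detection algorithms that automatically monitor surveillance video and detect abnormal events.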

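Anomaly detection is a challenging task^[1]. First, the definition of an abnormal event often depends on the context, so it is hard to separate normal and abnormal events precisely. Second, the range of possible anomalies is unbounded. Third, abnormal data points, especially in real-world data, often lie very close to points that would be considered normal. These factors make anomaly detection difficult, and researchers proposing new solutions over the past few years have had to take them into account.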

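Nearly a decade ago, most researchers focused on trajectory-based anomaly detection^[2-3]. The main idea is that a video is flagged as abnormal if an object of interest does not follow the learned normal trajectory patterns. A major drawback of this approach is occlusion, because it relies heavily on continuously detecting and tracking the objects of interest. Researchers therefore turned to low-level features, using appearance, motion, and texture for feature extraction^[4-6]. Many methods represent video with low-level motion features such as the social force model and histograms of optical flow, but motion-only features are insufficient. Features such as dynamic textures, optical-flow features describing space and motion, spatially local histograms of optical flow, and optical flow based on uniform local gradient patterns were then proposed^[6-7].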

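Although these traditional methods succeed on benchmark datasets, they generalize poorly and remain ineffective in other scenes. They also cannot adapt to anomalies that have never been seen before. For these reasons, researchers have explored deep neural networks for anomaly detection. Such networks learn useful discriminative features automatically, which removes the need to hand-craft features and makes them more adaptable across scenes. Deep learning has proved effective for many computer vision tasks^[8-9], such as image feature extraction, image classification, object detection, and video analysis. Deep learning techniques mainly focus on creating new network structures or designing components suited to a specific problem.

Existing deep-learning-based video anomaly detection methods fall into four categories. (1) Reconstruction-based methods^[10-11] assume that normal samples have lower reconstruction error because they are closer to the training data, whereas abnormal samples are expected to have higher reconstruction error. These methods are usually built on autoencoders, which encode the input into a more compact representation that retains the important discriminative features and can decode that representation back to the original form. (2) Future-frame-prediction methods^[12-13] predict future frames from existing frames and judge anomalies by whether the prediction matches the pattern of the existing frames. These methods are based on generative adversarial networks, which contain a generator and a discriminator: the former imitates the original data distribution, and the latter gives the probability that its input comes from the generator. (3) Classification-based methods^[14-15] treat the problem as directly classifying a video clip as normal or abnormal. Because positive and negative training samples differ by orders of magnitude in number, these methods concentrate on using convolutional neural networks to build compact, efficient, and robust features. (4) Anomaly-score-based methods^[16-23] cast the problem as regression, where the goal is to produce an anomaly score that is then used to decide whether a video clip or frame is abnormal.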
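
The scoring idea behind reconstruction-based methods can be sketched in a few lines: compute a per-frame reconstruction error and rescale it so that frames can be compared. This is a minimal illustration assuming mean squared error and min-max normalization; the cited methods may use other error measures, and `reconstruction_scores` is a hypothetical name. The "reconstructions" here are simulated rather than produced by a trained autoencoder.

```python
import numpy as np

def reconstruction_scores(frames, reconstructions):
    """Per-frame mean squared reconstruction error, min-max normalised to [0, 1]."""
    frames = np.asarray(frames, dtype=float)
    reconstructions = np.asarray(reconstructions, dtype=float)
    # Average the squared error over all non-batch axes (e.g. height, width).
    axes = tuple(range(1, frames.ndim))
    errors = ((frames - reconstructions) ** 2).mean(axis=axes)
    span = errors.max() - errors.min()
    if span == 0:
        # All frames are reconstructed equally well; no frame stands out.
        return np.zeros_like(errors)
    return (errors - errors.min()) / span

# Simulated data: five 8x8 "frames", with frame 3 reconstructed poorly.
rng = np.random.default_rng(0)
clean = rng.random((5, 8, 8))
recon = clean.copy()
recon[3] += 0.5
scores = reconstruction_scores(clean, recon)
```

A frame would then be flagged as abnormal when its score exceeds a threshold chosen on validation data.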

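Unlike these methods, this paper proposes a new anomaly detection method based on the deep autoencoding Gaussian mixture model (DAGMM). DAGMM contains a compression network and an estimation network. The compression network reduces the dimensionality of the input video clip with a deep autoencoder and feeds the low-dimensional features and reconstruction-error features into the estimation network. The estimation network predicts the energy probability with a Gaussian mixture model, and anomalies are then judged from the energy density probability. By simultaneously minimizing the reconstruction error from the compression network and the sample energy from the estimation network, a dimensionality-reduction structure can be trained jointly that directly helps the target density-estimation task. Unlike previous methods, the proposed method jointly optimizes the event representation (compression network) and the anomaly detection model (estimation network) and generalizes well. Experiments on two public benchmark datasets, UCSD Ped2 and ShanghaiTech, show that the detection performance of DAGMM reaches the state of the art.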
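
The role of the estimation network can be made concrete with a sketch of the energy computation. It assumes the standard DAGMM formulation, in which the soft memberships predicted by the estimation network give the mixture weights, means, and covariances in closed form, and the sample energy is the negative log-likelihood under the resulting mixture. The function names, the `eps` stabilizer, and the random stand-in memberships are assumptions for illustration; the authors' exact equations are not reproduced in this section.

```python
import numpy as np

def gmm_parameters(z, gamma, eps=1e-6):
    """Estimate mixture weights, means and covariances from soft memberships.

    z:     (N, D) inputs to the estimation network (code plus error features).
    gamma: (N, K) soft memberships predicted by the estimation network.
    """
    n_k = gamma.sum(axis=0)                       # (K,) effective counts
    phi = n_k / z.shape[0]                        # mixture weights
    mu = (gamma.T @ z) / n_k[:, None]             # (K, D) component means
    diff = z[:, None, :] - mu[None, :, :]         # (N, K, D)
    cov = np.einsum("nk,nkd,nke->kde", gamma, diff, diff) / n_k[:, None, None]
    cov += eps * np.eye(z.shape[1])               # keep covariances invertible
    return phi, mu, cov

def sample_energy(z, phi, mu, cov):
    """Negative log-likelihood of each sample under the mixture (higher = more anomalous)."""
    diff = z[:, None, :] - mu[None, :, :]         # (N, K, D)
    cov_inv = np.linalg.inv(cov)                  # (K, D, D)
    maha = np.einsum("nkd,kde,nke->nk", diff, cov_inv, diff)
    _, logdet = np.linalg.slogdet(2.0 * np.pi * cov)
    log_prob = np.log(phi) - 0.5 * (maha + logdet)    # (N, K)
    m = log_prob.max(axis=1, keepdims=True)       # log-sum-exp for stability
    return -(m[:, 0] + np.log(np.exp(log_prob - m).sum(axis=1)))

# Toy usage: random memberships stand in for the estimation-network output.
rng = np.random.default_rng(0)
z_train = rng.normal(size=(200, 3))
gamma = rng.dirichlet(np.ones(2), size=200)
phi, mu, cov = gmm_parameters(z_train, gamma)
energies = sample_energy(np.array([[0.0, 0.0, 0.0], [8.0, 8.0, 8.0]]), phi, mu, cov)
```

In this toy run the point far from the training data receives the larger energy, which is the property the anomaly decision relies on. Because the parameters follow directly from the memberships, no separate expectation-maximization loop is needed.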

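3. Conclusion

This paper proposes a DAGMM network that combines a deep autoencoder with a Gaussian mixture model for anomaly detection in surveillance video. DAGMM consists of two main parts, a compression network and an estimation network. The compression network projects samples into a low-dimensional space while preserving the key information for anomaly detection, and the estimation network evaluates the sample energy in that low-dimensional space under a Gaussian mixture modeling framework. DAGMM can be trained end to end: the estimation network predicts each sample's mixture membership, so the GMM parameters can be estimated without an alternating procedure, and the regularization introduced by the estimation network helps the compression network escape less attractive local optima and reach low reconstruction error. Experiments on two datasets show that the proposed method is competitive with state-of-the-art methods.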
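
For reference, the balance between reconstruction, density estimation, and regularization can be written as a single objective. The form below follows the standard DAGMM formulation and is an assumption with respect to this paper, whose own objective-function section is not reproduced here. The symbols $\theta_e$, $\theta_d$, $\theta_m$ (encoder, decoder, and estimation-network parameters), $\lambda_1$, $\lambda_2$ (weights), and $P(\hat{\Sigma})$ (covariance penalty) are introduced for illustration.

```latex
J(\theta_e, \theta_d, \theta_m)
  = \frac{1}{N} \sum_{i=1}^{N} L\left(x_i, x_i'\right)
  + \frac{\lambda_1}{N} \sum_{i=1}^{N} E(z_i)
  + \lambda_2 \, P(\hat{\Sigma}),
\qquad
P(\hat{\Sigma}) = \sum_{k=1}^{K} \sum_{j=1}^{d} \frac{1}{\hat{\Sigma}_{kjj}}
```

Here $L(x_i, x_i')$ is the reconstruction error of sample $x_i$, $E(z_i)$ is the sample energy of its low-dimensional representation $z_i$, and the penalty $P(\hat{\Sigma})$ discourages degenerate covariance matrices whose diagonal entries shrink toward zero.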

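Contents

1.1. Principle of the compression network
1.2. Principle of the estimation network
1.3. Objective function
1.4. Prediction
2.1. Experimental data and evaluation metrics
2.2. Experimental settings
2.3. Experimental results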