Optical fiber network  abnormal data detection algorithm based on deep learning

Liu Yunpeng; Huo Xiaoli; Liu Zhichao

doi:10.3788/IRLA20210029

The rapid identification of abnormal data from the massive data of large-scale optical fiber networks is a key issue of optical fiber communication technology. It is also an important research direction in optimizing optical fiber communication networks and improving communication accuracy in recent years. It mainly solves constraint relationship between the monitoring accuracy and convergence speed of abnormal data. Aiming at this problem, a monitoring algorithm based on the fusion of deep learning and genetic algorithm was proposed. The segmentation preprocessing of the initial data was completed through deep learning, and then the crossover probability and mutation probability with segmentation attributes was introduced into the genetic algorithm, thereby the retention of abnormal data features were enhanced. The original data was divided according to different attributes by segmentation preprocessing, thereby the amount of initial filtering data was reducing greatly, achieving the purpose of improving the detection speed of abnormal data; the segmentation attributes was introducd into the genetic factor of the genetic algorithm to make the results have a weighting effect, the separability of data was increased, thereby improving the monitoring accuracy. The proposed algorithm was compared with unoptimized genetic algorithm and clustering algorithm in the experiment. The results showed that the minimum relative errors of abnormal data volume of proposed algorithm, traditional genetic algorithm and clustering analysis algorithm were 0.029, 0.093 and 0.104, respectively; the average deviations were respectively 0.047, 0.155 and 0.156, the average convergence time were 5.84 s, 12.6 s and 9.32 s, respectively. It can be seen that this algorithm has been well optimized in terms of monitoring accuracy, stability and timeliness.

HTML

0. 引　言

随着光纤入户的不断推进，我国光纤互联网的规模与日俱增，而对网络数据通畅和网络安全的需求也越来越高^[1-3]。为高效维护光纤网络中的正常数据通信，研究网络异常数据快速识别方法具有重要意义。

针对光纤网络异常有多种检测方法，例如免疫算法(Immune Algorithm, IA)^[4]、统计分类(Statistical Classification, SC)^[5]算法、聚类分析(Cluster Analysis, CA)^[6]算法、人工神经网络(Artificial Neural Network, ANN) ^[7]算法等。免疫算法将异常数据作为“抗原”提取其数据特征，从而形成具有针对性的选择依据，其具有收敛速度快、准确性高的优点，但通常适用于具有明显特征的异常数据类型，普适性差；统计分类算法是由大量数据统计结果分类得到的，对异常数据的特征要求不高，具有更好的普适性，但处理数据量大，仅适用于统计数据集合限定的异常数据类型；聚类分析算法对不完整、不精确数据具有较好的包容性，但其稳定性差，易受干扰；人工神经网络算法具有学习能力，可以通过自组织完成异常数据的分类，同样具有较好的普适性，但数据量增大时陷入局部极值的可能性大幅提升。可见，每种算法各有所长，而实际需求往往是在识别精度与算法收敛速度之间选取最优点。参考文献[8]采用抽样融合改进算法对光纤网络数据进行分段聚类，该种方法进行多次随机抽样，虽然解决了数据属性分类的问题，但是分段效率较低。参考文献[9]通过不平衡数据分类实现不同权重数据的聚类，达到了很好的数据分类效果，但趋势预测过程时间较长。

综上所述，为了实现异常数据的快速识别与精确定位，文中借助机器学习(Machine Learning, ML) ^[10]的交互功能将深度学习(Deep Learning, DL) ^[11]与遗传算法(Genetic Algorithm, GA)^[12-13]相融合，利用GA扩展性好，对非线性数据包容性强的特点降低异常数据不确定性导致的识别困难，再通过DL的分类模式解决GA编码困难的问题，并利用DL自适应学习的特性提高算法的收敛速度。

1. 基于深度学习的数据预处理

由于光纤网络中数据量巨大^[14]，所以选择提取异常数据的合适范围对算法的时效性具有重要意义。由于异常数据与信号数据之间存在不相关的特性，所以采用相关运算可以完成对异常数据的鉴别，但是若对所有数据进行遍历对比时，计算量非常大，算法效率大幅下降，故文中采用深度学习技术完成异常数据特征的数据挖掘，完成对不相关数据剔除的降维处理。

首先对数据进行分段处理，对获取的光纤数据分段处理，将数据划分为五级，1级表示最大，5级表示最小，其中对于数据变化明显的敏感区域进行加权处理，利用组合和量化的手段完成数据提取侧重的偏向分析，这样可以在保证提取异常数据信息的同时降低算法运算量。对分段数据中所有数据点与故障状态条件的数据特征进行相关计算，当相关参数低于阈值时，将该数据点去掉，剩余数据构成相关数据集。

然后，构建相关特征邻域，由相关数据集采集的数据点特征计算得到，将相关数据集与特征集合并成光纤网络数据集合，其数据点就是网络故障变量参考值。再将数据集合中正常数据与异常数据之间的交界点作为焦点X，采用非线性离散的方法将数据分段，分段点可表示为：

式中：X为数据间的焦点；F为聚焦因子，F∈(0, 1)；n为数据分段数。由此光纤网络中的数据经非线性离散成为2n+1份。

最后，将所有数据集合都通过非线性离散处理再统一单边取值，并以0值作为最优值，这样可以大幅提高故障数据的检出精度，提高数据挖掘精度。

4. 结　论

文中提出了一种基于DL-GA的光纤网络异常数据监测算法。通过对初始数据分段匹配交叉与变异概率，提高了异常数据特征的遗传效率。针对5组不同异常数据类型的光纤数据进行对比实验，实验结果显示：在监测精度、稳定性及收敛时间几个方面，所提到的算法均优于两种传统算法。总之，所提算法在光纤网络异常数据监测领域具有一定实用价值。

Reference (16)

[1]	Liu Y X, Wang C Y, Wang C, et al. Online classification algorithm for uncertain data stream in big data [J]. Journal of Northeastern University (Natural Science Edition), 2016, 37(9): 1245-1249.
[2]	Wang Hui, Zhang Cuiyu. Differences between network data mining algorithm based on improved genetic algorithm [J]. Computer Simulation, 2015, 32(5): 311-314. (in Chinese)
[3]	Zhang Taijiang, Li Yongjun, Zhao Shanghong, et al. Design of space optical backbone network simulation platform based on OPNET and STK [J]. Journal of Applied Optics, 2019, 40(5): 901-909. (in Chinese)
[4]	Jia Qi. Location and monitoring of fiber optic line faults [J]. China New Telecommunications, 2017, 19(1): 74. (in Chinese)
[5]	Yeung S, Russakovsky O, Jin N, et al. Every moment counts: dense detailed labeling of actions in complex videos [J]. International Journal of Computer Vision, 2018, 126(24): 375-389.
[6]	Chen Yang, Zhao Shanghong, Wang Xiang, et al. BER analysis of high-altitude OFDM-FSO modulation system under exponentiated Weibull atmospheric turbulence model [J]. Laser & Infrared, 2018, 48(7): 832-837. (in Chinese)
[7]	Chen Y, Li L J. Very fast decision tree classification algorithm based on red-black tree for data stream with continuous attributes [J]. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2017, 37(2): 86-90.
[8]	Liu Yan, Wang Cunrui. An improved big data clustering method based on sampling fusion [J]. Microelectronics & Computer, 2017, 34(4): 17-21. (in Chinese)
[9]	Gu Xiaoqing, Jiang Yizhang, Wang Shitong. Zero-order TSK-type fuzzy system for imbalanced data classification [J]. Acta Automatica Sinica, 2017, 43(10): 1773-1788. (in Chinese)
[10]	Lee J, Lee S, Hwang I. Hybrid system modeling and estimation for arrival time prediction in terminal airspace [J]. Journal of Guidance Control & Dynamics, 2016, 39(4): 903-910.
[11]	Zhou Hongqiang, Huang Lingling, Wang Yongtian. Deep learning algorithm and its application in optics [J]. Infrared and Laser Engineering, 2019, 48(12): 1226004. (in Chinese)
[12]	Huang X, Wang Z, Li Y, et al. Design of fuzzy state feedback controller for robust stabilization of uncertain fractional-order chaotic systems [J]. Journal of the Franklin Institute, 2015, 351(12): 5480-5493.
[13]	Liu He, Wang Tao. Research on breakpoint fault detection method of optical fiber communication LAN [J]. Modern Electronics Technique, 2017, 40(16): 174-176. (in Chinese)
[14]	Ma Zongmei, Zhang Ruiping. Traffic anomaly identification of optical fiber communication based on big data background [J]. Laser Journal, 2019, 40(7): 75-78. (in Chinese)
[15]	Guan Lei, Hu Guangjun, Wang Zhuan. Research on network security situational awareness technology based on big data [J]. Netinfo Security, 2016, 1(9): 45-50. (in Chinese)
[16]	Guo H, Liu H, Wu C, et al. Logistic discrimination based on G-mean and F-measure for imbalanced problem [J]. Journal of Intelligent and Fuzzy Systems, 2016, 31(3): 1155-1166.

Test time/s	Actual abnormal data	Test abnormal data
		GA		CA		DL-GA
		Test value	Relative error	Test value	Relative error	Test value	Relative error
5	106	131	0.236	135	0.274	113	0.066
10	168	193	0.149	199	0.185	178	0.059
15	214	245	0.145	236	0.103	203	0.051
20	259	283	0.088	279	0.077	268	0.035
25	384	422	0.093	424	0.104	395	0.029
30	410	448	0.094	451	0.101	421	0.028
35	467	510	0.092	512	0.098	480	0.029
40	515	562	0.093	565	0.097	531	0.031
45	539	590	0.095	594	0.102	553	0.027
50	621	679	0.094	681	0.098	639	0.029

Optical fiber network abnormal data detection algorithm based on deep learning

doi: 10.3788/IRLA20210029

Abstract

References

Proportional views

通讯作者: 陈斌, bchen63@163.com

Article Metrics

Related

Proportional views