To verify the DL-GA algorithm's ability to monitor abnormal data in optical fiber networks, it was compared with two commonly used traditional anomaly-detection algorithms, the traditional genetic algorithm (GA) and the clustering analysis algorithm (CA), in terms of monitoring accuracy, algorithm stability, and convergence time.
Each of the three algorithms was used to detect abnormal data in the optical fiber network, and the detected abnormal data volume was then compared with the actual abnormal data volume (Table 1). To simulate the intermittent appearance of abnormal data in a real network, the abnormal data were not injected in fixed quantities, so that the machine-learning and classification-screening stages could not rely on a regular injection pattern.
| Test time/s | Actual abnormal data | GA test value | GA relative error | CA test value | CA relative error | DL-GA test value | DL-GA relative error |
|---|---|---|---|---|---|---|---|
| 5 | 106 | 131 | 0.236 | 135 | 0.274 | 113 | 0.066 |
| 10 | 168 | 193 | 0.149 | 199 | 0.185 | 178 | 0.059 |
| 15 | 214 | 245 | 0.145 | 236 | 0.103 | 203 | 0.051 |
| 20 | 259 | 283 | 0.088 | 279 | 0.077 | 268 | 0.035 |
| 25 | 384 | 422 | 0.093 | 424 | 0.104 | 395 | 0.029 |
| 30 | 410 | 448 | 0.094 | 451 | 0.101 | 421 | 0.028 |
| 35 | 467 | 510 | 0.092 | 512 | 0.098 | 480 | 0.029 |
| 40 | 515 | 562 | 0.093 | 565 | 0.097 | 531 | 0.031 |
| 45 | 539 | 590 | 0.095 | 594 | 0.102 | 553 | 0.027 |
| 50 | 621 | 679 | 0.094 | 681 | 0.098 | 639 | 0.029 |

Table 1. Detected abnormal data volume versus actual abnormal data volume
As Table 1 shows, the volume of artificially injected abnormal data grows as the test time increases. Since the actual abnormal data volume is known, the abnormal data volumes computed by the three algorithms can be analyzed statistically. The traditional genetic algorithm has a minimum relative error of 0.088, a maximum of 0.236, and an average of 0.1179; the traditional clustering analysis algorithm has a minimum of 0.077, a maximum of 0.274, and an average of 0.1239; the proposed algorithm has a minimum of 0.027, a maximum of 0.066, and an average of 0.0384. This comparison shows that the proposed algorithm is clearly superior to the two traditional methods in monitoring accuracy: at the same test time, its detected abnormal data volume is closer to the true value. Moreover, as the test time and thus the sample size increase, the relative error gradually decreases and levels off.
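The summary statistics quoted above can be checked directly against the per-row relative errors listed in Table 1. A quick sketch (the lists below simply copy the table's relative-error columns):

```python
# Per-row relative errors taken from Table 1, one list per algorithm.
rel_err = {
    "GA":    [0.236, 0.149, 0.145, 0.088, 0.093, 0.094, 0.092, 0.093, 0.095, 0.094],
    "CA":    [0.274, 0.185, 0.103, 0.077, 0.104, 0.101, 0.098, 0.097, 0.102, 0.098],
    "DL-GA": [0.066, 0.059, 0.051, 0.035, 0.029, 0.028, 0.029, 0.031, 0.027, 0.029],
}

# Minimum, maximum, and mean relative error for each algorithm.
for name, errs in rel_err.items():
    print(f"{name}: min={min(errs):.3f} max={max(errs):.3f} "
          f"mean={sum(errs) / len(errs):.4f}")
```

Running this reproduces the minima (0.088, 0.077, 0.027) and the averages (0.1179, 0.1239, 0.0384) cited in the text.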
Algorithm stability concerns how sensitive an algorithm is to errors that arise during computation, including rounding error and its tolerance of redundant input; that is, how strongly the result is affected when part of the input data does not follow the expected pattern. Here, stability is assessed through the fitness value, because the fitness value expresses the algorithm's output under different data conditions. Each of the three algorithms was used to iteratively approach the optimal solution, and the evolution of the maximum fitness of each was analyzed; the results are shown in Fig. 2.
As Fig. 2(a) shows, the maximum fitness of all three algorithms eventually levels off as the number of iterations grows, but DL-GA essentially stabilizes after about 200 iterations, whereas GA and CA stabilize at about 300 and 350 iterations, respectively; the proposed algorithm is therefore more stable than the traditional ones. As Fig. 2(b) shows, the larger the number of samples during iteration, the smoother the mean error. Comparing the mean deviations of the three algorithms, the average deviations are 0.047, 0.155, and 0.156, respectively, so DL-GA's mean error is both smaller and more tightly concentrated. The mean error effectively reflects the system's ability to identify abnormal data.
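The stabilization points read off Fig. 2(a) can also be determined programmatically from a fitness trace. A minimal sketch, using a synthetic trace rather than the experimental data (the `window` and `tol` parameters are illustrative choices, not values from the paper):

```python
def stabilization_generation(max_fitness, window=20, tol=1e-3):
    """First generation after which the max-fitness trace stays within
    `tol` of its final value for at least `window` generations."""
    final = max_fitness[-1]
    for g in range(len(max_fitness)):
        if all(abs(x - final) <= tol for x in max_fitness[g:g + window]):
            return g
    return len(max_fitness) - 1

# Synthetic trace: rises quickly, then flattens (a stand-in for Fig. 2(a)).
trace = [1 - 0.5 ** (g / 30) for g in range(400)]
print(stabilization_generation(trace, tol=1e-2))
```

Tightening `tol` pushes the detected stabilization point later, which is how curves like those of GA and CA (stabilizing near 300 and 350 iterations) would be distinguished from DL-GA's earlier plateau.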
To verify the timeliness of the algorithm, the convergence times of the three algorithms were compared for anomaly detection on the same data set. To make the test data more broadly representative, five groups of test data were prepared; each group mixes in different proportions of the various abnormal-data types while the total data volume is kept the same, which rules out the possibility that an algorithm converges well only because the abnormal data share a particular feature. The results of the five tests are compared in Fig. 3.
As Fig. 3 shows, although different types of abnormal data were used in the test sets, the convergence time of the DL-GA algorithm is clearly better than that of the two traditional algorithms. Run times were compared by calling the system clock on the compilation platform. The average convergence time is 5.84 s for DL-GA, 12.60 s for GA, and 9.32 s for CA. Under the same data volume, DL-GA therefore converges faster, indicating higher computational efficiency.
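The clock-based timing procedure can be sketched as follows. The detector below is a toy three-sigma filter standing in for the actual algorithms; only the timing harness illustrates the measurement method described above:

```python
import time

def timed(detector, data, repeats=5):
    """Average wall-clock run time of a detector over several repeats,
    analogous to the five-group comparison behind Fig. 3."""
    elapsed = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        detector(data)
        elapsed.append(time.perf_counter() - t0)
    return sum(elapsed) / len(elapsed)

# Toy stand-in detector: flags points more than 3 sigma from the mean.
def three_sigma(data):
    mu = sum(data) / len(data)
    sd = (sum((x - mu) ** 2 for x in data) / len(data)) ** 0.5
    return [x for x in data if abs(x - mu) > 3 * sd]

data = [float(i % 100) for i in range(10_000)] + [1e6]
print(f"avg time: {timed(three_sigma, data):.4f} s")
```

`time.perf_counter()` is used rather than `time.time()` because it is a monotonic, high-resolution clock intended for interval measurement.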
Optical fiber network abnormal data detection algorithm based on deep learning
doi: 10.3788/IRLA20210029
- Received Date: 2021-01-18
- Rev Recd Date: 2021-02-04
- Publish Date: 2021-06-30
Key words:
- optical fiber network
- network anomaly monitoring
- deep learning
- genetic algorithm
- clustering algorithm
Abstract: Rapidly identifying abnormal data in the massive data streams of large-scale optical fiber networks is a key issue in optical fiber communication technology, and an important research direction in recent years for optimizing optical fiber communication networks and improving communication accuracy. The central difficulty is the trade-off between the monitoring accuracy and the convergence speed of abnormal-data detection. To address this problem, a monitoring algorithm that fuses deep learning with a genetic algorithm was proposed. Segmentation preprocessing of the initial data was performed by deep learning, and crossover and mutation probabilities carrying segmentation attributes were then introduced into the genetic algorithm, enhancing the retention of abnormal-data features. The segmentation preprocessing divides the original data according to different attributes, greatly reducing the amount of data in the initial filtering and thereby improving the detection speed; introducing the segmentation attributes into the genetic factors weights the results and increases the separability of the data, thereby improving the monitoring accuracy. In the experiments, the proposed algorithm was compared with the unoptimized genetic algorithm and a clustering algorithm. The results show that the minimum relative errors of the abnormal data volume for the proposed algorithm, the traditional genetic algorithm, and the clustering analysis algorithm were 0.027, 0.088, and 0.077, respectively; the average deviations were 0.047, 0.155, and 0.156, respectively; and the average convergence times were 5.84 s, 12.6 s, and 9.32 s, respectively. The proposed algorithm is thus well optimized in terms of monitoring accuracy, stability, and timeliness.
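The fusion the abstract describes, namely crossover and mutation probabilities that depend on the data segment produced by the deep-learning preprocessing, can be sketched minimally. Everything below, including the segment names and the per-segment probabilities, is an invented illustration of the idea rather than the paper's actual implementation:

```python
import random

random.seed(0)

# Illustrative only: each segment (output of the deep-learning preprocessing
# step) carries its own crossover/mutation probability, so anomaly-rich
# segments are explored more aggressively. The weights are assumptions.
SEGMENT_PC = {"traffic": 0.9, "latency": 0.7, "payload": 0.5}    # crossover prob
SEGMENT_PM = {"traffic": 0.10, "latency": 0.05, "payload": 0.02}  # mutation prob

def crossover(parent_a, parent_b, segment):
    # Single-point crossover, applied with the segment's own probability.
    if random.random() < SEGMENT_PC[segment]:
        cut = random.randrange(1, len(parent_a))
        return parent_a[:cut] + parent_b[cut:]
    return parent_a[:]

def mutate(chrom, segment):
    # Bit-flip mutation with the segment's own per-gene probability.
    return [1 - g if random.random() < SEGMENT_PM[segment] else g
            for g in chrom]

a, b = [0] * 8, [1] * 8
child = mutate(crossover(a, b, "traffic"), "traffic")
print(child)
```

The segment-dependent probabilities are what give the genetic factors the "weighting effect" the abstract refers to; a plain GA would use a single fixed pair of probabilities for the whole population.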