High efficient activation function design for CNN model image classification task

Du Shengjie; Jia Xiaofen; Huang Yourui; Guo Yongcun; Zhao Baiting

doi:10.3788/IRLA20210253

Activation functions (AFs) play a key role in enabling convolutional neural networks (CNNs) to learn and fit complex functions. To help neural networks complete various learning tasks better and faster, this paper designs a new efficient activation function, EReLU. By introducing the natural logarithm function on the negative half-axis, EReLU effectively alleviates the problems of "dead" neurons and gradient dispersion (vanishing gradients). The mathematical model of EReLU was explored by analyzing the behavior of the activation function and its derivative in forward and backward propagation, and its specific form was then determined experimentally, improving accuracy and accelerating training. EReLU was subsequently tested on different networks and datasets. Compared with ReLU and its improved variants, EReLU improves accuracy by 0.12%-6.61% and training efficiency by 1.02%-6.52%, demonstrating its advantages in accelerating training and improving accuracy.


0. Introduction

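The main role of activation functions (AFs) is to introduce nonlinearity into convolutional neural networks (CNNs), giving the whole model sufficient nonlinear modeling power and thereby helping the CNN better learn and fit complex functions and accomplish various computer vision tasks.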
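A minimal sketch of why this nonlinearity matters, in plain Python with hypothetical scalar "layers" whose weights are arbitrary illustrative values (not taken from the paper): without an activation, two stacked linear layers reduce to a single linear layer, whereas inserting ReLU between them yields a piecewise-linear map that no single layer can reproduce.

```python
# Hypothetical scalar "layers" y = w*x + b; the weights are arbitrary
# illustrative values, not taken from the paper.
def linear(w, b):
    return lambda x: w * x + b


def relu(x):
    return x if x > 0 else 0.0


layer1 = linear(2.0, -1.0)
layer2 = linear(-3.0, 0.5)


def stacked_linear(x):
    # No activation: layer2(layer1(x)) = -3*(2x - 1) + 0.5 = -6x + 3.5
    return layer2(layer1(x))


def collapsed(x):
    # The single linear layer that the stack above is equivalent to.
    return -6.0 * x + 3.5


def stacked_relu(x):
    # ReLU between the layers: flat (0.5) for x <= 0.5, slope -6 after that.
    return layer2(relu(layer1(x)))


for x in (-2.0, 0.0, 1.0, 3.0):
    assert stacked_linear(x) == collapsed(x)

print([stacked_relu(x) for x in (-2.0, 0.0, 1.0)])  # [0.5, 0.5, -2.5]
```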

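In the early stage of CNN development, the commonly used activation functions were Sigmoid^[1] and Tanh^[2]. Both are S-shaped saturating functions, which easily cause vanishing gradients and make model training difficult. To address this problem, Krizhevsky et al.^[3] adopted the rectified linear unit (ReLU) and achieved outstanding results in that year's ImageNet ILSVRC competition. Compared with Sigmoid and Tanh, ReLU offers good sparsity and low computational cost; it not only mitigates the vanishing-gradient problem but also speeds up network training, so it quickly became the mainstream activation function in CNNs. However, ReLU also has a drawback: it easily causes neurons to "die" during training^[3], after which the "dead" neurons lose the ability to pass information for the rest of training, harming the model.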
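The sketch below, in plain Python, illustrates the three properties named above under illustrative assumptions (the input values and the 10-layer depth are arbitrary): the S-shaped functions have near-zero derivatives away from 0, ReLU's gradient is exactly zero for non-positive inputs, which is how a neuron whose pre-activations are always negative stops receiving updates, and ReLU outputs are sparse.

```python
import math


def d_sigmoid(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)  # peaks at 0.25 when x = 0


def d_tanh(x):
    return 1.0 - math.tanh(x) ** 2  # peaks at 1.0 when x = 0


def relu(x):
    return x if x > 0 else 0.0


def d_relu(x):
    # Gradient taken as 0 at x = 0 here; that choice is a convention.
    return 1.0 if x > 0 else 0.0


# 1) Saturation: far from 0, the S-shaped derivatives are nearly zero.
print(d_sigmoid(10.0), d_tanh(10.0))

# 2) Depth compounds it: even sigmoid's best case, 0.25 per layer,
#    shrinks a gradient by 0.25**10 (about 9.5e-7) over 10 layers.
print(0.25 ** 10)

# 3) ReLU passes gradients unchanged for x > 0 but blocks them for x <= 0.
#    A hypothetical neuron whose pre-activations are all negative
#    receives zero gradient on every sample: a "dead" neuron.
pre_acts = [-3.2, -0.7, -1.5, -0.1]
print(sum(d_relu(z) for z in pre_acts))  # 0.0: no update signal at all

# 4) Sparsity: on inputs symmetric about 0, half the outputs are exactly 0.
inputs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
print(sum(relu(z) == 0.0 for z in inputs) / len(inputs))  # 0.5
```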

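To solve the above problem, improved activation functions based on ReLU have been developed. For example, the LeakyReLU activation function studied by Dubey et al.^[4] adds a leaky unit on the negative half of the function, effectively alleviating the "dead neuron" problem; PReLU, a nonlinear rectified activation function proposed by He et al.^[5], introduces an extra learnable parameter, which not only addresses the "dead neuron" problem but also improves the fitting ability of the model; and ELU, proposed by Clevert et al.^[6], likewise outperforms ReLU. All of these improved functions mainly modify the constant-zero function on ReLU's negative half-axis and thereby compensate for ReLU's defect, but they introduce new problems, such as longer training time and greater training difficulty.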
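The negative-half-axis modifications described above can be sketched as scalar Python functions. The LeakyReLU slope of 0.01 and the ELU alpha of 1.0 are assumed common defaults, and PReLU's learnable slope is passed in as a plain number rather than trained.

```python
import math


def leaky_relu(x, slope=0.01):
    # Fixed small slope on the negative half-axis (0.01 is an assumed default).
    return x if x >= 0 else slope * x


def prelu(x, a):
    # Same shape as LeakyReLU, but the slope `a` is a trainable parameter;
    # here it is simply passed in as a number.
    return x if x >= 0 else a * x


def prelu_grad_a(x):
    # d prelu / d a: nonzero only for x < 0, which is what lets `a` be learned.
    return 0.0 if x >= 0 else x


def elu(x, alpha=1.0):
    # Exponential branch saturating toward -alpha as x -> -inf.
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)


# Unlike ReLU, each variant still returns a nonzero value (and gradient)
# for negative inputs.
x = -2.0
print(leaky_relu(x), prelu(x, a=0.25), elu(x))
```

The extra exponential in ELU and the extra parameter in PReLU are examples of the added cost that can lengthen training, although this sketch does not measure it.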

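Subsequently, researchers began to construct new activation functions by combining the advantages of several functions. For example, the combined activation function ReLU-Softplus proposed by Shi Qi^[7] and the ReLU-Softsign function proposed by Wang Hongxia et al.^[8] both perform better than single activation functions. Combined functions achieve better performance by letting their components complement each other, but they also bring new problems in practice: ReLU-Softplus is sensitive to the learning-rate setting, and ReLU-Softsign still leaves room for improvement in speed and accuracy.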
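This section does not give the formulas of ReLU-Softplus or ReLU-Softsign, so the sketch below is only a hypothetical illustration of the combination idea: keep ReLU's identity branch for x >= 0 and splice in Softsign, or a Softplus shifted down by ln 2, for x < 0. The `combine` helper and both spliced forms are assumptions, not the definitions used in [7] or [8].

```python
import math


def combine(pos_fn, neg_fn):
    """Piecewise activation: pos_fn for x >= 0, neg_fn for x < 0."""
    def act(x):
        return pos_fn(x) if x >= 0 else neg_fn(x)
    return act


def identity(x):
    return x  # ReLU's positive branch


def softsign(x):
    return x / (1.0 + abs(x))


def shifted_softplus(x):
    # Softplus ln(1 + e^x) shifted down by ln 2 so it passes through 0 and
    # joins the positive branch continuously (an assumed construction).
    return math.log1p(math.exp(x)) - math.log(2.0)


# Hypothetical forms; the exact definitions in [7] and [8] may differ.
relu_softsign = combine(identity, softsign)
relu_softplus = combine(identity, shifted_softplus)

print([relu_softsign(v) for v in (-3.0, -1.0, 2.0)])
print([round(relu_softplus(v), 4) for v in (-3.0, -1.0, 2.0)])
```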

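In summary, ReLU offers good sparsity and computational efficiency but suffers from the "dead neuron" problem; improved functions such as LeakyReLU remedy this defect by introducing a nonzero function on the negative half-axis, but at the cost of lower efficiency; and combined functions merge the advantages of different activation functions but introduce the new problem of harder training. It is therefore necessary, for current CNNs, to design an efficient activation function that overcomes the gradient dispersion (vanishing gradients) and "dead neuron" defects of earlier activation functions while speeding up training and improving performance without making training harder.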

4. Conclusion

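By analyzing the role of activation functions during network training, this paper proposes four schemes for improving the ReLU activation function: linear, power, fractional, and natural-logarithm types. Experiments show that introducing a natural logarithm function on the negative half-axis gives the best improvement in network performance, which determines the mathematical model of the EReLU function. Six activation functions, EReLU, ReLU, LeakyReLU, PReLU, ELU and ReLU-Softsign, were then tested in the ResNet18 and VGG16 networks on the CIFAR10, CIFAR100 and Fer2013 datasets. The results show that, compared with the other functions, EReLU improves accuracy by 0.12%-6.61% and efficiency by 1.02%-6.52%. EReLU is therefore more competitive than the other activation functions in improving the accuracy and efficiency of lightweight networks.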

In future work, EReLU will be applied to large neural networks and to datasets from other domains to further verify its advantages and applicability.

References

[1]	Hassell M P, Lawton J H, Beddington J R. Sigmoid functional responses by invertebrate predators and parasitoids[J]. The Journal of Animal Ecology, 1977: 249-262.
[2]	Kalman B L, Kwasny S C. Why tanh: Choosing a sigmoidal function[C]//Proceedings of the 1992 IJCNN International Joint Conference on Neural Networks. IEEE, 1992, 4: 578-581.
[3]	Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[4]	Dubey A K, Jain V. Comparative study of convolution neural network's ReLU and leaky-ReLU activation functions[M]//Applications of Computing, Automation and Wireless Systems in Electrical Engineering, 2019: 873-880.
[5]	He K, Zhang X, Ren S, et al. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1026-1034.
[6]	Clevert D A, Unterthiner T, Hochreiter S. Fast and accurate deep network learning by exponential linear units (ELUs)[J]. arXiv preprint arXiv:1511.07289, 2015.
[7]	Shi Qi. Research and verification of image classification optimization algorithm based on convolutional neural network[D]. Beijing: Beijing Jiaotong University, 2017. (in Chinese)
[8]	Wang Hongxia, Zhou Jiaqi, Gu Chenghao, et al. Design of activation functions in convolutional neural networks for image classification[J]. Journal of Zhejiang University (Engineering Science), 2019, 53(7): 1363-1373. (in Chinese)
[9]	Zhang X, Zou Y, Shi W. Dilated convolution neural network with LeakyReLU for environmental sound classification[C]//2017 22nd International Conference on Digital Signal Processing (DSP). IEEE, 2017: 1-5.
[10]	Xu L, Choy C, Li Y W. Deep sparse rectifier neural networks for speech denoising[C]//2016 IEEE International Workshop on Acoustic Signal Enhancement (IWAENC). IEEE, 2016: 1-5.
[11]	Liu Jun. Analysis of algorithm design and algorithm time complexity[J]. Computer Knowledge and Technology, 2008, 2(14): 878-879. (in Chinese)
[12]	Liu X, Zhang Y, Bao F. Kernel-blending connection approximated by a neural network for image classification[J]. Computational Visual Media, 2020, 6(4): 467-476.
[13]	Carrier P L, Courville A, Mirza M, et al. Challenges in representation learning: A report on three machine learning contests[C]//International Conference on Neural Information Processing, 2013: 117-124.
[14]	Wang H, Zhou J, Gu C, et al. Design of activation function in CNN for image classification[J]. Journal of Zhejiang University (Engineering Science), 2019, 53(7): 1363-1373.

Function	Definition
f₁	$f_1(x) = \begin{cases} x, & x \geqslant 0 \\ -x, & x < 0 \end{cases}$
f₂	$f_2(x) = \begin{cases} x, & x \geqslant 0 \\ -\dfrac{2}{3}(-x)^{\frac{3}{2}}, & x < 0 \end{cases}$
f₃	$f_3(x) = \begin{cases} x, & x \geqslant 0 \\ \dfrac{x}{1 - x}, & x < 0 \end{cases}$
f₄	$f_4(x) = \begin{cases} x, & x \geqslant 0 \\ -\ln(1 - x), & x < 0 \end{cases}$
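The four candidates in the table can be written as scalar Python functions for quick inspection. The correspondence with the linear, power, fractional and natural-logarithm types named in the Conclusion is assumed from the order of the list there; f₄ is the natural-logarithm form that becomes EReLU.

```python
import math


def f1(x):
    # Linear type: negative branch -x (so f1(x) = |x|)
    return x if x >= 0 else -x


def f2(x):
    # Power type: negative branch -(2/3) * (-x)^(3/2)
    return x if x >= 0 else -(2.0 / 3.0) * (-x) ** 1.5


def f3(x):
    # Fractional type: negative branch x / (1 - x), bounded below by -1
    return x if x >= 0 else x / (1.0 - x)


def f4(x):
    # Natural-logarithm type (EReLU): negative branch -ln(1 - x);
    # log1p(-x) computes ln(1 - x) accurately for x close to 0.
    return x if x >= 0 else -math.log1p(-x)


def relu(x):
    return x if x > 0 else 0.0


for name, f in (("ReLU", relu), ("f1", f1), ("f2", f2), ("f3", f3), ("f4", f4)):
    print(name, [round(f(v), 3) for v in (-3.0, -1.0, 0.0, 2.0)])
```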

Derivative	Expression
f₁’	$f_1'(x) = \begin{cases} 1, & x \geqslant 0 \\ -1, & x < 0 \end{cases}$
f₂’	$f_2'(x) = \begin{cases} 1, & x \geqslant 0 \\ \sqrt{-x}, & x < 0 \end{cases}$
f₃’	$f_3'(x) = \begin{cases} 1, & x \geqslant 0 \\ \dfrac{1}{(1 - x)^2}, & x < 0 \end{cases}$
f₄’	$f_4'(x) = \begin{cases} 1, & x \geqslant 0 \\ \dfrac{1}{1 - x}, & x < 0 \end{cases}$
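A quick numerical check of the derivative formulas, comparing each analytic derivative with a central finite difference at a few negative points (the sample points and step size are arbitrary choices). Note that differentiating f₂'s negative branch, $-\frac{2}{3}(-x)^{3/2}$, directly gives $+\sqrt{-x}$, so that is the form checked here.

```python
import math


def f2(x):
    return x if x >= 0 else -(2.0 / 3.0) * (-x) ** 1.5


def f3(x):
    return x if x >= 0 else x / (1.0 - x)


def f4(x):
    return x if x >= 0 else -math.log1p(-x)


# Derivatives of the negative branches (x < 0), by direct differentiation.
def df2(x):
    return math.sqrt(-x)  # d/dx[-(2/3)(-x)^(3/2)] = (-x)^(1/2)


def df3(x):
    return 1.0 / (1.0 - x) ** 2


def df4(x):
    return 1.0 / (1.0 - x)


def central_diff(f, x, h=1e-6):
    # Symmetric finite-difference estimate of f'(x).
    return (f(x + h) - f(x - h)) / (2.0 * h)


for f, df in ((f2, df2), (f3, df3), (f4, df4)):
    for x in (-3.0, -1.0, -0.2):
        assert abs(central_diff(f, x) - df(x)) < 1e-6, (f.__name__, x)

# Left-hand limits at 0: df3 and df4 approach 1 (the positive-branch slope),
# while df2 approaches 0.
print([round(df(-1e-9), 4) for df in (df2, df3, df4)])
```

The printed left-hand limits show that f₃ and f₄ join the positive branch with matching slope 1 at the origin, while f₂'s slope drops to 0 there. For large negative inputs, f₄'s derivative decays like 1/|x|, more slowly than f₃'s 1/x².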

Method	CIFAR10 accuracy	CIFAR10 training time/h	CIFAR100 accuracy	CIFAR100 training time/h
f₁	93.11%	1.332	74.82%	1.332
f₂	93.03%	1.335	74.27%	1.335
f₃	93.66%	1.290	75.23%	1.290
f₄	93.78%	1.262	75.87%	1.262
ReLU	92.90%	1.325	73.68%	1.325

Method	CIFAR10 accuracy	CIFAR10 training time/h	CIFAR100 accuracy	CIFAR100 training time/h
f₁	91.31%	1.225	58.91%	1.225
f₂	91.24%	1.248	58.35%	1.248
f₃	91.86%	1.243	59.23%	1.243
f₄	91.98%	1.175	59.95%	1.175
ReLU	91.15%	1.238	56.24%	1.238
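Assuming that accuracy gain is measured in percentage points and "training efficiency" means the relative reduction in training time (neither definition is stated in this excerpt), the sketch below computes f₄'s gains over ReLU from the two results tables. These per-table figures are not meant to reproduce the 0.12%-6.61% and 1.02%-6.52% ranges quoted in the abstract, which also cover comparisons with LeakyReLU, PReLU, ELU and ReLU-Softsign.

```python
def gains(acc_new, t_new, acc_base, t_base):
    """Accuracy gain (percentage points) and relative training-time saving (%).

    Defining "efficiency" as (t_base - t_new) / t_base is an assumption.
    """
    return acc_new - acc_base, (t_base - t_new) / t_base * 100.0


# (accuracy %, training time h) for f4 and for ReLU, copied from the tables.
rows = {
    "first table, CIFAR10": ((93.78, 1.262), (92.90, 1.325)),
    "first table, CIFAR100": ((75.87, 1.262), (73.68, 1.325)),
    "second table, CIFAR10": ((91.98, 1.175), (91.15, 1.238)),
    "second table, CIFAR100": ((59.95, 1.175), (56.24, 1.238)),
}

for name, ((acc4, t4), (acc_r, t_r)) in rows.items():
    d_acc, d_time = gains(acc4, t4, acc_r, t_r)
    print(f"{name}: {d_acc:+.2f} pp accuracy, {d_time:.2f}% less training time")
```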

Corresponding author: Chen Bin, bchen63@163.com
