Swish function

The swish function is a family of mathematical function defined as follows:

The swish function
[1]

where can be constant (usually set to 1) or trainable and "sigmoid" refers to the logistic function.

The swish family was designed to smoothly interpolate between a linear function and the Rectified linear unit (ReLU) function.

When considering positive values, Swish is a particular case of doubly parameterized sigmoid shrinkage function defined in [2]: Eq 3 . Variants of the swish function include Mish.[3]

Special values

For β = 0, the function is linear: f(x) = x/2.

For β = 1, the function is the Sigmoid Linear Unit (SiLU).

For β = 1.702, the function approximates GeLU.[4]

With β → ∞, the function converges to ReLU.

Thus, the swish family smoothly interpolates between a linear function and the ReLU function.[1]

Since , all instances of swish have the same shape as the default , zoomed by . One usually sets . When is trainable, this constraint can be enforced by , where is trainable.

Derivatives

Because , it suffices to calculate its derivatives for the default case.so is odd.so is even.

History

SiLU was first proposed alongside the GELU in 2016,[4] then again proposed in 2017 as the Sigmoid-weighted Linear Unit (SiL) in reinforcement learning.[5][1] The SiLU/SiL was then again proposed as the SWISH over a year after its initial discovery, originally proposed without the learnable parameter β, so that β implicitly equaled 1. The swish paper was then updated to propose the activation with the learnable parameter β.

In 2017, after performing analysis on ImageNet data, researchers from Google indicated that using this function as an activation function in artificial neural networks improves the performance, compared to ReLU and sigmoid functions.[1] It is believed that one reason for the improvement is that the swish function helps alleviate the vanishing gradient problem during backpropagation.[6]

See also

References

  1. ^ a b c d Ramachandran, Prajit; Zoph, Barret; Le, Quoc V. (2017-10-27). "Searching for Activation Functions". arXiv:1710.05941v2 [cs.NE].
  2. ^ Atto, Abdourrahmane M.; Pastor, Dominique; Mercier, Gregoire (March 2008). "Smooth sigmoid wavelet shrinkage for non-parametric estimation". 2008 IEEE International Conference on Acoustics, Speech and Signal Processing (PDF). pp. 3265–3268. doi:10.1109/ICASSP.2008.4518347. ISBN 978-1-4244-1483-3. S2CID 9959057.
  3. ^ Misra, Diganta (2019). "Mish: A Self Regularized Non-Monotonic Neural Activation Function". arXiv:1908.08681 [cs.LG].
  4. ^ a b Hendrycks, Dan; Gimpel, Kevin (2016). "Gaussian Error Linear Units (GELUs)". arXiv:1606.08415 [cs.LG].
  5. ^ Elfwing, Stefan; Uchibe, Eiji; Doya, Kenji (2017-11-02). "Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning". arXiv:1702.03118v3 [cs.LG].
  6. ^ Serengil, Sefik Ilkin (2018-08-21). "Swish as Neural Networks Activation Function". Machine Learning, Math. Archived from the original on 2020-06-18. Retrieved 2020-06-18.

Content Disclaimer

Informasi ini disarikan dari Wikipedia dan disajikan kembali untuk tujuan edukasi. Konten tersedia di bawah lisensi CC BY-SA 3.0. Kami tidak bertanggung jawab atas ketidakakuratan data yang bersumber dari kontribusi publik tersebut.

  1. The information displayed on this website is sourced in part or in whole from Wikipedia and has been adapted for the purpose of restating it. We strive to provide accurate and relevant information, however:
  2. There is no guarantee of absolute accuracy. Wikipedia is an open, collaborative project that can be edited by anyone, so information is subject to change.
  3. It is not intended to constitute professional advice. The content displayed is for informational and educational purposes only. For important decisions (e.g., medical, legal, or financial), please consult a professional.
  4. Content copyright. Wikipedia is licensed under the Creative Commons Attribution-ShareAlike License (CC BY-SA). This means that content may be reused with appropriate attribution and shared under a similar license.
  5. Responsible use. Any risk arising from the use of information from this website is entirely the responsibility of the user.