Activation Function
A mathematical function applied inside neural networks that introduces nonlinearity and lets models learn complex patterns.
An activation function is applied to a neuron's weighted input sum to produce its output, which is then passed to the next layer. Its job is to introduce nonlinearity, which is what lets neural networks learn complex relationships rather than behaving like a single linear transformation.
Without activation functions, stacking more layers would not make a model meaningfully more powerful: a composition of linear layers is itself just one linear layer. With activation functions, neural networks can learn curved decision boundaries, represent highly nonlinear patterns, and model language, vision, and audio at scale.
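The collapse of stacked linear layers can be seen directly in a minimal numpy sketch (the matrices here are arbitrary illustrative values):

```python
import numpy as np

# Two linear layers with no activation between them.
W1 = np.array([[1.0, -2.0],
               [3.0,  0.5]])
W2 = np.array([[0.5, 1.0]])
x = np.array([1.0, 1.0])

two_layers = W2 @ (W1 @ x)   # apply layer 1, then layer 2
one_layer = (W2 @ W1) @ x    # the product W2 @ W1 is itself a single linear layer
print(np.allclose(two_layers, one_layer))  # True: the extra layer added no power

# Inserting a ReLU between the layers breaks the collapse.
def relu(z):
    return np.maximum(z, 0.0)

with_relu = W2 @ relu(W1 @ x)
print(np.allclose(with_relu, two_layers))  # False: the network is now nonlinear
```

No matter how many linear layers are stacked, the first two lines of output would stay the same; only the nonlinearity in between changes what the network can represent.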
Common Activation Functions
- ReLU — fast and simple; standard in many deep networks
- GELU — widely used in transformers and LLMs
- Sigmoid — squashes values into (0, 1), making it common for probability outputs; less common in deep hidden layers because it saturates
- Tanh — squashes values into (-1, 1); older but still useful in some architectures, such as recurrent networks
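The four functions above are short enough to sketch directly. This is an illustrative implementation, not any particular library's; the GELU shown uses the common tanh approximation:

```python
import numpy as np

def relu(z):
    # max(0, z): cheap to compute, does not saturate for positive inputs
    return np.maximum(z, 0.0)

def gelu(z):
    # Tanh approximation of GELU, widely used in transformer implementations
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))

def sigmoid(z):
    # Squashes to (0, 1), hence its use for probability outputs
    return 1.0 / (1.0 + np.exp(-z))

# Tanh squashes to (-1, 1); numpy provides it directly as np.tanh.

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))                     # negatives clipped to 0
print(np.round(gelu(z), 3))        # smooth, slightly negative near 0
print(np.round(sigmoid(z), 3))     # values in (0, 1)
print(np.round(np.tanh(z), 3))     # values in (-1, 1)
```

Plotting these over a range of inputs makes the differences obvious: ReLU has a hard corner at zero, while GELU, sigmoid, and tanh are smooth everywhere.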
The choice of activation function affects training stability, speed, and final performance. Modern transformer-based models often rely on GELU or similar smooth variants because they train stably and perform well at large scale.