Activation Function
A mathematical function applied inside neural networks that introduces nonlinearity and lets models learn complex patterns.
An activation function is applied to a neuron's weighted input sum to produce its output, which is then passed to the next layer. Its job is to introduce nonlinearity, which is what lets neural networks learn complex relationships rather than behaving like a single linear transformation.
Without activation functions, stacking more layers would not make a model meaningfully more powerful: a composition of linear layers is itself just one linear layer. With activation functions, neural networks can learn curved decision boundaries, represent highly nonlinear patterns, and model language, vision, and audio at scale.
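The collapse of stacked linear layers can be seen directly in a minimal numpy sketch (the matrices here are arbitrary illustrative values):

```python
import numpy as np

# Two linear layers with no activation between them.
W1 = np.array([[1.0, -2.0],
               [3.0,  0.5]])
W2 = np.array([[0.5, 1.0]])
x = np.array([1.0, 1.0])

two_layers = W2 @ (W1 @ x)   # apply layer 1, then layer 2
one_layer = (W2 @ W1) @ x    # the product W2 @ W1 is itself a single linear layer
print(np.allclose(two_layers, one_layer))  # True: the extra layer added no power

# Inserting a ReLU between the layers breaks the collapse.
def relu(z):
    return np.maximum(z, 0.0)

with_relu = W2 @ relu(W1 @ x)
print(np.allclose(with_relu, two_layers))  # False: the network is now nonlinear
```

No matter how many linear layers are stacked, the first two lines of output would stay the same; only the nonlinearity in between changes what the network can represent.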
Common Activation Functions
- ReLU — fast and simple; standard in many deep networks
- GELU — widely used in transformers and LLMs
- Sigmoid — squashes values into (0, 1), making it common for probability outputs; less common in deep hidden layers because it saturates
- Tanh — squashes values into (-1, 1); older but still useful in some architectures, such as recurrent networks
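The four functions above are short enough to sketch directly. This is an illustrative implementation, not any particular library's; the GELU shown uses the common tanh approximation:

```python
import numpy as np

def relu(z):
    # max(0, z): cheap to compute, does not saturate for positive inputs
    return np.maximum(z, 0.0)

def gelu(z):
    # Tanh approximation of GELU, widely used in transformer implementations
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))

def sigmoid(z):
    # Squashes to (0, 1), hence its use for probability outputs
    return 1.0 / (1.0 + np.exp(-z))

# Tanh squashes to (-1, 1); numpy provides it directly as np.tanh.

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))                     # negatives clipped to 0
print(np.round(gelu(z), 3))        # smooth, slightly negative near 0
print(np.round(sigmoid(z), 3))     # values in (0, 1)
print(np.round(np.tanh(z), 3))     # values in (-1, 1)
```

Plotting these over a range of inputs makes the differences obvious: ReLU has a hard corner at zero, while GELU, sigmoid, and tanh are smooth everywhere.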
The choice of activation function affects training stability, speed, and final performance. Modern transformer-based models often rely on GELU or similar smooth variants because they train stably and perform well at large scale.