Deep Learning
A subset of machine learning using multi-layered neural networks to learn complex patterns from large datasets.
Deep learning is the engine behind most modern AI breakthroughs. It uses neural networks with many hidden layers (hence "deep") to learn hierarchical representations of data. Each layer learns increasingly abstract features — a vision model's early layers detect edges, middle layers detect shapes, and deep layers detect faces.
Deep learning requires large datasets and significant compute (GPUs/TPUs) to train effectively. But once trained, it can generalize to new inputs in ways that shallow models cannot. It powers image recognition, speech synthesis, language models, and drug discovery.
Key Deep Learning Architectures
- CNNs — Convolutional Neural Networks for images and video
- RNNs/LSTMs — Recurrent networks for sequential data (now largely superseded by transformers)
- Transformers — dominant architecture for language and multimodal models
- Diffusion Models — used for image and video generation
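The core operation of the transformer is scaled dot-product attention: each query scores every key, and the output is a weighted average of the values. A minimal pure-Python sketch, using tiny hand-picked matrices rather than learned projections:

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    # Scaled dot-product attention: for each query, score all keys,
    # normalize the scores, and mix the values by those weights.
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs (toy numbers).
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
result = attention(Q, K, V)
```

Because the query aligns with the first key, the output lies closer to the first value than the second. Production implementations add learned query/key/value projections, multiple heads, and masking, but the weighted-average core is the same.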
Training deep networks involves backpropagation — applying the chain rule to compute the gradient of the loss layer by layer, from the output back to the input, then adjusting the model weights to reduce that loss. The transformer architecture (2017) is the foundation of virtually all modern large-scale deep learning models.
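Backpropagation can be shown end to end on a deliberately tiny model: a two-layer network with one unit per layer and no nonlinearity, trained by gradient descent to fit the hypothetical target function y = 6x. The gradient at the output is pushed back through the second weight, then the first — the chain rule at its smallest scale.

```python
import random

# Two-layer toy network: y_hat = w2 * (w1 * x).
# Loss per example: 0.5 * (y_hat - y)**2.
random.seed(0)
w1, w2 = random.random(), random.random()
lr = 0.1
data = [(x, 6.0 * x) for x in [-1.0, -0.5, 0.5, 1.0]]  # target: y = 6x

for _ in range(300):
    for x, y in data:
        h = w1 * x                # forward pass: hidden layer
        y_hat = w2 * h            # forward pass: output layer
        err = y_hat - y           # dLoss/dy_hat
        grad_w2 = err * h         # chain rule: dLoss/dw2
        grad_h = err * w2         # gradient flowing back into h
        grad_w1 = grad_h * x      # chain rule: dLoss/dw1
        w2 -= lr * grad_w2        # gradient descent step
        w1 -= lr * grad_w1
```

After training, the product w1 * w2 approximates the target slope 6. Real frameworks automate exactly this bookkeeping (as automatic differentiation) across millions of parameters and arbitrary layer types.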