
LSTM

Long Short-Term Memory — a type of recurrent neural network designed to learn long-range dependencies in sequential data.

LSTM networks, introduced in 1997 by Hochreiter and Schmidhuber, solved the vanishing gradient problem that plagued standard RNNs. Their gated memory cells allow them to remember important information over many time steps and forget irrelevant details.

An LSTM cell uses three gates (input, forget, output) to control what information flows in, stays, and flows out. This design made LSTMs the dominant architecture for sequence tasks throughout the 2010s.
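The gating described above can be sketched as a single forward step in numpy. This is a minimal illustration, not a production implementation: the function name `lstm_step`, the packed weight layout, and the random toy weights are all choices made for this example. Besides the three gates, the step also computes a `tanh` candidate vector that the input gate scales before it is added to the cell state.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM cell step (illustrative sketch).

    W has shape (4*hidden, input+hidden), b has shape (4*hidden,);
    the four row-blocks are the input, forget, candidate, and output
    transforms, in that order.
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0*n:1*n])        # input gate: how much new info to write
    f = sigmoid(z[1*n:2*n])        # forget gate: how much old state to keep
    g = np.tanh(z[2*n:3*n])        # candidate values for the cell state
    o = sigmoid(z[3*n:4*n])        # output gate: how much state to expose
    c = f * c_prev + i * g         # updated cell state (long-term memory)
    h = o * np.tanh(c)             # updated hidden state (cell output)
    return h, c

# Toy usage: run a 5-step sequence through one cell with random weights.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(size=(4 * n_hid, n_in + n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x, h, c, W, b)
```

Note that the cell state `c` is updated additively (`f * c_prev + i * g`) rather than by repeated matrix multiplication; this additive path is what lets gradients survive across many time steps.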

Why LSTMs mattered: they were the first RNN variant that could reliably handle sequences of hundreds or thousands of steps.

LSTMs powered machine translation (Google Translate's neural system launched in 2016 used stacked LSTMs), speech recognition, and early language models. They have been largely displaced by transformers, but remain useful in low-latency or memory-constrained settings where a small recurrent model is cheaper to run.
