Transformer
A neural network architecture in which each token updates its representation by using attention to weigh the other tokens in the sequence.
- Intuition
- Think of a sentence as a table of word pieces. Attention lets each word piece ask which other pieces matter, then updates that piece's representation as a weighted mix of them.
- Used for
- The backbone of many language, coding, vision-language, retrieval, and generative systems.
- Watch for
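- Standard self-attention compares every token with every other token, so its compute and memory cost grows quadratically with context length. The architecture alone does not guarantee quality: the model still needs training data, evaluation, and guardrails.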
- Evaluate with
- Task quality, latency, memory use, context-length behavior, and robustness on examples outside the training distribution.
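To make the intuition concrete, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is an illustration under stated assumptions, not a production implementation: the function names (softmax, self_attention) and the toy token vectors are hypothetical, and the learned query, key, and value projections of a real Transformer are replaced by the identity to keep the example short.

```python
import math


def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention over a list of equal-length vectors.

    Assumption: queries, keys, and values are the token vectors themselves.
    Real models apply learned linear projections first.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    # scores[i][j] measures how relevant token j is to token i.
    # This table has n * n entries for n tokens.
    scores = [
        [sum(q * k for q, k in zip(qi, kj)) / scale for kj in tokens]
        for qi in tokens
    ]
    # Each row becomes a probability distribution over all tokens.
    weights = [softmax(row) for row in scores]
    # Each output vector is a weighted average of all token vectors.
    outputs = [
        [sum(w * v[c] for w, v in zip(row, tokens)) for c in range(d)]
        for row in weights
    ]
    return weights, outputs


# Three toy "word pieces", each represented by a 2-dimensional vector.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, outputs = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row sums to 1 and shows how much one token draws on every token, itself included. The scores table is where the quadratic cost comes from: doubling the number of tokens quadruples the number of entries to compute and store.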