Tag: Transformers


  • The Thinking Machine – Part 3

    Self-attention changed the way we think about sequence modeling. In this third part of The Thinking Machine, we explore how each token learns to look at every other – and creating richer, context-aware representations. We break down the mechanics, reveal design choices like positional encoding, and prepare the ground for what came next: the Transformer.