The Living Deep Learning Book
A community-maintained reference for modern deep learning.
What this is
This book began when a master's student went looking for an up-to-date treatment of Transformers, vision Transformers, and modern pretraining and found that the recommended textbooks were essentially a museum tour of 2021. The field moved on; this book tries to keep up. It is open to issues, pull requests, and corrections.
The first cut covers what a working researcher needs in order to read papers from 2024–2026 without backtracking through a decade of legacy machinery: rotary position embeddings, RMSNorm and SwiGLU, grouped-query attention and FlashAttention, vision Transformers and modern self-supervision, decoder-only LLM internals, post-Chinchilla scaling, mixture-of-experts, and the alignment ladder from supervised fine-tuning through DPO and GRPO.
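To give a flavor of what the component chapters cover, here is a minimal RMSNorm sketch in PyTorch. It is an illustrative example written for this page, not an excerpt from a chapter; the module name, the eps default, and normalizing over the last dimension are assumptions that follow common open-model conventions.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square norm: rescale activations by their RMS over the
    feature dimension, with a learned gain and no mean-centering or bias.
    (Illustrative sketch; defaults follow common open-model conventions.)"""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps                                # numerical-stability floor
        self.weight = nn.Parameter(torch.ones(dim))   # learned per-feature gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # 1 / sqrt(mean(x^2)) over the last dimension; keepdim for broadcasting
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * inv_rms * self.weight

# quick check: output has unit RMS (up to the gain) over the feature dimension
x = torch.randn(2, 16, 512)
y = RMSNorm(512)(x)
print(y.pow(2).mean(dim=-1).sqrt().mean())  # ~1.0
```

Compared with LayerNorm, RMSNorm drops the mean subtraction and bias term, making it slightly cheaper per token while matching quality in the large models that adopted it; trade-offs like this are exactly what the component chapters unpack.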
How to read
Chapters are ordered for sequential reading but designed to stand alone. If you are comfortable with the original Attention Is All You Need architecture, Chapter 1 is a refresher you can skim; the modern-component chapters (2–4) and onward are where this book diverges from older texts. Each chapter renders math with KaTeX and code with Pygments, and uses admonitions for asides. A language switcher in the header toggles between the English and Brazilian Portuguese editions.
Status
This page is part of the workspace bootstrapped by /book-skill:init. Chapters land here as they pass the BookSkill review pipeline. Until the first chapter is reviewed, the navigation will look sparse.