A Typed Named Tensor Notation for Deep Learning
Abstract
We propose a type discipline for deep learning architectures in which every tensor axis carries both a name and a sort, and in which the legal operations on an axis are determined by its sort. Our central thesis is prescriptive: the most common silent bugs in deep learning code (averaging across a batch axis instead of a feature axis, dotting along a token axis instead of a key axis, applying BatchNorm to a non-i.i.d. contrastive batch) are sort errors, and a sufficiently expressive sort algebra promotes them from runtime nuisances to surface-level syntax errors. The proposal extends the named-tensor framework of [CRB23] with an algebra of axis sorts inspired by the type discipline implicit in Einstein summation notation in general relativity, where a Lorentz vector index and a spinor index cannot contract, not because such a contraction is wrong, but because the notation makes it unwritable. We give axioms for the sort algebra; derive the legality table for contraction, broadcast, reduction, permutation, and concatenation; introduce a canonical attention glyph A that exposes self- vs. cross-attention, masking, multi-query, grouped-query, and rotary position embedding as typed specializations of a single operator; define residual and stack combinators that make the residual stream a typed invariant of the architecture, formalizing a folk theorem of mechanistic interpretability; and exhibit, as a worked example, the type checker catching the BatchNorm information-leak bug documented by [He+20] in contrastive self-supervised pretraining. We stress-test the resulting framework on Transformer, Vision Transformer, U-Net, Mamba, and Mixture-of-Experts architectures, and close with open problems in typed differentiation, stochastic axes, dependent shape types, and distributed-training sorts.
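To make the central claim concrete, the following is a minimal sketch, not the paper's formal notation: the Axis, Tensor, and contract names are hypothetical stand-ins, and the sort check is reduced to string equality. It illustrates how an axis that carries a sort alongside its name turns an illegal contraction into a construction-time TypeError instead of a silently wrong reduction.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Axis:
    name: str   # axis name, e.g. "query", "d_head"
    sort: str   # axis sort, e.g. "Sequence", "Feature", "Batch"
    size: int

@dataclass(frozen=True)
class Tensor:
    axes: tuple  # the tensor's named, sorted axes; values omitted for brevity

def contract(x: Tensor, y: Tensor, axis_name: str) -> Tensor:
    """Contract x and y along axis_name. Legal only when the two axes agree
    in name, sort, and size; a sort mismatch is rejected before any numeric
    work happens, so the illegal contraction is never constructed."""
    ax = next(a for a in x.axes if a.name == axis_name)
    ay = next(a for a in y.axes if a.name == axis_name)
    if ax.sort != ay.sort:
        raise TypeError(f"sort error: cannot contract {ax.sort} with {ay.sort}")
    if ax.size != ay.size:
        raise ValueError(f"size mismatch on axis {axis_name!r}")
    return Tensor(tuple(a for a in x.axes + y.axes if a.name != axis_name))

# Legal: queries and keys share a Feature-sorted "d_head" axis.
q = Tensor((Axis("query", "Sequence", 128), Axis("d_head", "Feature", 64)))
k = Tensor((Axis("key", "Sequence", 128), Axis("d_head", "Feature", 64)))
scores = contract(q, k, "d_head")  # ok: result has (query, key) axes

# Illegal: the same axis name declared with a different sort cannot contract.
bad = Tensor((Axis("key", "Batch", 128), Axis("d_head", "Feature", 64)))
try:
    contract(bad, k, "key")
except TypeError as e:
    print(e)  # sort error: cannot contract Batch with Sequence
```

In the paper's notation the rejection happens syntactically rather than by a runtime check; the sketch only emulates that discipline in a dynamically typed host language.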
Cite this paper
Sepulveda-Jimenez, Alfredo (2026). A Typed Named Tensor Notation for Deep Learning. Zenodo.