Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention (AI Paper Explained)

#transformer #nystromer #nystromformer

The Nyströmformer (also written Nystromformer, Nyströmer, or Nystromer) is a new drop-in replacement for self-attention in Transformers that approximates the attention matrix with memory and time requirements linear in the sequence length. It uses the Nyström method: a small set of queries and keys is subselected (or computed as segment means) to serve as so-called landmarks, and these landmarks are used to reconstruct the inherently low-rank attention matrix. This is relevant for many areas of machine learning, especially natural language processing, where it enables longer sequences of text to be processed at once.

OUTLINE:
0:00 - Intro & Overview
2:30 - The Quadratic Memory Bottleneck in Self-Attention
7:20 - The Softmax Operation in Attention
11:15 - Nyström-Approximation
14:00 - Getting Around the Softmax Problem
18:05 - Intuition for Landmark Method
28:05 - Full Algorithm
30:20 - Theoretical Guarantees
35:55 - Avoiding the Large Attention Matrix
36:55 - Subsampling Keys vs Negative Sampl
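
To make the landmark idea concrete, here is a minimal NumPy sketch of Nyström-style attention for a single head. It is an illustration under stated assumptions, not the paper's reference implementation: the function names (segment_means, nystrom_attention, full_attention) are hypothetical, the sequence length is assumed to be divisible by the number of landmarks, and the pseudoinverse is computed exactly with np.linalg.pinv for simplicity, which may differ from how the paper computes it. With n tokens and m landmarks, the sketch only ever builds n x m, m x m, and m x n softmax matrices, never the full n x n attention matrix.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def segment_means(x, num_landmarks):
    # Average consecutive rows of x (shape (n, d)) into num_landmarks rows.
    # Assumption: n is divisible by num_landmarks.
    n, d = x.shape
    if n % num_landmarks != 0:
        raise ValueError("sequence length must be divisible by num_landmarks")
    return x.reshape(num_landmarks, n // num_landmarks, d).mean(axis=1)


def nystrom_attention(Q, K, V, num_landmarks):
    # Approximate softmax(Q K^T / sqrt(d)) V using landmark queries and keys.
    d = Q.shape[-1]
    scale = 1.0 / np.sqrt(d)

    Q_l = segment_means(Q, num_landmarks)  # (m, d) landmark queries
    K_l = segment_means(K, num_landmarks)  # (m, d) landmark keys

    F = softmax(Q @ K_l.T * scale)    # (n, m): all queries vs landmark keys
    A = softmax(Q_l @ K_l.T * scale)  # (m, m): landmark queries vs landmark keys
    B = softmax(Q_l @ K.T * scale)    # (m, n): landmark queries vs all keys

    # Multiply right to left so no (n, n) intermediate is ever formed.
    # Assumption: exact pseudoinverse via np.linalg.pinv, chosen for simplicity.
    return F @ (np.linalg.pinv(A) @ (B @ V))


def full_attention(Q, K, V):
    # Standard quadratic attention, used here only as a reference.
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V
```

When num_landmarks equals the sequence length, every segment contains a single token, so the three softmax matrices all equal the full attention matrix S and the product S S⁺ S V reduces to S V, i.e. standard attention. Using fewer landmarks trades accuracy for lower memory and compute.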