Ben Edelman | Toward Demystifying Transformers and Attention
2/9/2022 | New Technologies in Mathematics Seminar
Speaker: Ben Edelman, Harvard Computer Science
Title: Toward Demystifying Transformers and Attention
Abstract: Over the past several years, attention mechanisms (primarily in the form of the Transformer architecture) have revolutionized deep learning, leading to advances in natural language processing, computer vision, code synthesis, protein structure prediction, and beyond. Attention has a remarkable ability to enable the learning of long-range dependencies in diverse modalities of data. And yet, there is at present limited principled understanding of the reasons for its success. In this talk, I’ll explain how attention mechanisms and Transformers work, and then I’ll share the results of a preliminary investigation into why they work so well. In particular, I’ll discuss an inductive bias of attention that we call sparse variable creation: bounded-norm Transformer layers are capable of representing sparse Boolean functions, with statistical generalization guarantees.
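For readers unfamiliar with the mechanism the abstract refers to, the following is a minimal sketch of single-head scaled dot-product self-attention in NumPy. It is an illustration under standard assumptions, not the speaker's code; all names (self_attention, W_q, W_k, W_v) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention.

    X: (T, d) sequence of T token embeddings.
    W_q, W_k, W_v: (d, d_k) query/key/value projection matrices.
    Returns a (T, d_k) array: each output row is a convex combination
    of value vectors, weighted by query-key similarity.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (T, T) attention logits
    A = softmax(scores, axis=-1)             # each row sums to 1
    return A @ V

# Toy usage: 5 tokens, embedding dim 8, head dim 4.
rng = np.random.default_rng(0)
T, d, d_k = 5, 8, 4
X = rng.normal(size=(T, d))
out = self_attention(X,
                     rng.normal(size=(d, d_k)),
                     rng.normal(size=(d, d_k)),
                     rng.normal(size=(d, d_k)))
print(out.shape)  # (5, 4)
```

Note how this connects to the abstract's notion of sparse variable creation: when a row of the attention matrix A concentrates its mass on a few positions, the corresponding output depends (approximately) on only those few tokens, i.e., it behaves like a sparse function of the input sequence.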