Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math

Explanation of the paper "Mamba: Linear-Time Sequence Modeling with Selective State Spaces".

In this video I will be explaining Mamba, a new sequence modeling architecture that can compete with the Transformer. I will first introduce the main sequence modeling architectures (RNN, CNN and Transformer) and then deep dive into State Space Models. To fully understand State Space Models, we need some background in differential equations, so I will give a brief introduction to them (in 5 minutes!) and then derive the recurrent formula and the convolutional formula from first principles. I will also prove mathematically (with the help of visual diagrams) why State Space Models can be run as a convolution. I will explain what the HiPPO matrix is and how it helps the model "memorize" the input history in a finite state.

In the second part of the video, I will explore Mamba and in particular the Selective Scan algorithm, along with the parallel scan and kernel fusion techniques that make it fast.
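As a quick preview of the math derived in the video, here are the two equivalent forms of a discretized State Space Model; the notation (A, B, C, the step size Δ, and the barred discretized matrices) follows the S4/Mamba papers:

```latex
% Continuous-time SSM:
h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t)

% Discretization (zero-order hold with step size \Delta):
\bar{A} = e^{\Delta A}, \qquad
\bar{B} = (\Delta A)^{-1}\,\big(e^{\Delta A} - I\big)\,\Delta B

% Recurrent form:
h_k = \bar{A}\,h_{k-1} + \bar{B}\,x_k, \qquad y_k = C\,h_k

% Unrolling the recurrence (with h_{-1} = 0) shows it is a convolution:
y_k = \sum_{j=0}^{k} C\,\bar{A}^{\,j}\,\bar{B}\,x_{k-j}, \qquad
\bar{K} = \big(C\bar{B},\; C\bar{A}\bar{B},\; \dots,\; C\bar{A}^{L-1}\bar{B}\big), \qquad
y = \bar{K} * x
```

And a minimal NumPy sketch (my own toy code, not from the paper or the video) that checks the two forms agree numerically; the matrices here are random stand-ins for trained, already-discretized parameters:

```python
import numpy as np

np.random.seed(0)
N, L = 4, 8                            # state size, sequence length
A_bar = np.random.randn(N, N) * 0.1    # stand-in for the discretized A
B_bar = np.random.randn(N, 1)          # stand-in for the discretized B
C = np.random.randn(1, N)
x = np.random.randn(L)                 # input sequence

# Recurrent form: h_k = A_bar h_{k-1} + B_bar x_k,  y_k = C h_k
h = np.zeros((N, 1))
y_rec = []
for k in range(L):
    h = A_bar @ h + B_bar * x[k]
    y_rec.append((C @ h).item())

# Convolutional form: kernel K_bar = (C B_bar, C A_bar B_bar, ..., C A_bar^{L-1} B_bar)
K = [(C @ np.linalg.matrix_power(A_bar, j) @ B_bar).item() for j in range(L)]
y_conv = [sum(K[j] * x[k - j] for j in range(k + 1)) for k in range(L)]

print(np.allclose(y_rec, y_conv))      # True: recurrence and convolution match
```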