#mixer #google #imagenet
Convolutional Neural Networks have dominated computer vision for nearly a decade, and that dominance might finally be coming to an end. First, Vision Transformers (ViT) showed remarkable performance, and now even simple MLP-based models reach competitive accuracy, provided sufficient data is used for pre-training. This paper presents MLP-Mixer, which arranges MLPs in a particular weight-sharing scheme to achieve a competitive, high-throughput model, and it raises interesting questions for future research about the nature of learning, inductive biases, and their interaction with scale.
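To make the weight-sharing idea concrete, here is a minimal numpy sketch of one Mixer layer as described in the paper: a token-mixing MLP shared across channels and a channel-mixing MLP shared across patches, each with a skip connection. All names and sizes are illustrative, and LayerNorm is omitted for brevity.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    # Two-layer MLP with (approximate) GELU, applied along the last axis.
    h = x @ w1 + b1
    h = 0.5 * h * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))
    return h @ w2 + b2

def mixer_block(x, tok_params, ch_params):
    """One Mixer layer on x of shape (patches, channels).

    Token mixing: transpose so the MLP mixes information across patches,
    with the same weights reused for every channel. Channel mixing: the
    MLP mixes across channels, with the same weights reused for every
    patch. (Pre-LayerNorm is omitted here for brevity.)
    """
    x = x + mlp(x.T, *tok_params).T   # token-mixing MLP + skip connection
    x = x + mlp(x, *ch_params)        # channel-mixing MLP + skip connection
    return x

# Toy sizes (illustrative, not from the paper): 16 patches, 32 channels.
rng = np.random.default_rng(0)
P, C, H = 16, 32, 64
tok = (rng.normal(size=(P, H)) * 0.02, np.zeros(H),
       rng.normal(size=(H, P)) * 0.02, np.zeros(P))
ch = (rng.normal(size=(C, H)) * 0.02, np.zeros(H),
      rng.normal(size=(H, C)) * 0.02, np.zeros(C))
x = rng.normal(size=(P, C))
y = mixer_block(x, tok, ch)
print(y.shape)  # → (16, 32)
```

Note that the token-mixing MLP is the only place where information moves between patches; everything else is per-patch, which is what makes the model so simple compared to attention.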
OUTLINE:
0:00 - Intro & Overview
2:20 - MLP-Mixer Architecture
13:20 - Experimental Results
17:30 - Effects of Scale
24:30 - Learned Weights Visualization
27:25 - Comments & Conclusion
Paper:
Abstract:
Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In
MLP-Mixer: An all-MLP Architecture for Vision (Machine Learning Research Paper Explained)