GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (Paper Explained)
Google builds a 600 billion parameter transformer to do massively multilingual, massive machine translation. Interestingly, the larger model scale ...
4 years ago
01:13:04