Knowledge Distillation - Keras Code Examples

This Keras Code Example shows you how to implement Knowledge Distillation. Knowledge Distillation has led to advances in model compression, in training state-of-the-art models, and in stabilizing Transformers for Computer Vision. All you need to do to build on this example is swap out the Teacher and Student architectures. I also think the example of how to override the training step and combine two loss functions, weighted by an alpha hyperparameter, is very useful; a minimal sketch of that pattern is included after the timestamps below.

Content Links
Knowledge Distillation (Keras Code Examples):
DistilBERT:
Self-Training with Noisy Student:
Data-efficient Image Transformers:
KL Divergence: https://en.wikipedia.org/wiki/Kullback–Leibler_divergence

0:00 Beginning
0:44 Motivation, Success Stories
2:47 Custom
11:18 Teacher
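For reference, here is a minimal sketch of the alpha-weighted two-loss pattern described above, assuming TensorFlow 2.x / tf.keras. The Distiller class, the alpha and temperature defaults, and the teacher/student placeholders are illustrative assumptions, not code taken from the video.

```python
import tensorflow as tf
from tensorflow import keras


class Distiller(keras.Model):
    """Wraps a frozen teacher and a trainable student for knowledge distillation."""

    def __init__(self, student, teacher, alpha=0.1, temperature=3.0):
        super().__init__()
        self.student = student
        self.teacher = teacher
        self.alpha = alpha              # blends hard-label loss vs. distillation loss
        self.temperature = temperature  # softens the logits before comparing them

    def compile(self, optimizer, student_loss_fn, distillation_loss_fn, metrics=None):
        super().compile(optimizer=optimizer, metrics=metrics)
        self.student_loss_fn = student_loss_fn
        self.distillation_loss_fn = distillation_loss_fn

    def train_step(self, data):
        x, y = data
        # Teacher predictions serve as fixed soft targets; no gradients flow through them.
        teacher_logits = self.teacher(x, training=False)

        with tf.GradientTape() as tape:
            student_logits = self.student(x, training=True)
            # Hard-label loss against the ground-truth labels.
            student_loss = self.student_loss_fn(y, student_logits)
            # Soft-label loss against the temperature-scaled teacher distribution.
            distillation_loss = self.distillation_loss_fn(
                tf.nn.softmax(teacher_logits / self.temperature, axis=1),
                tf.nn.softmax(student_logits / self.temperature, axis=1),
            )
            # Alpha weights the two objectives into one training loss.
            loss = self.alpha * student_loss + (1.0 - self.alpha) * distillation_loss

        grads = tape.gradient(loss, self.student.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.student.trainable_variables))
        self.compiled_metrics.update_state(y, student_logits)

        results = {m.name: m.result() for m in self.metrics}
        results.update({"student_loss": student_loss, "distillation_loss": distillation_loss})
        return results


# Hypothetical usage: plug in your own teacher/student models and data.
# distiller = Distiller(student=small_cnn, teacher=big_cnn, alpha=0.1, temperature=3.0)
# distiller.compile(
#     optimizer=keras.optimizers.Adam(),
#     metrics=[keras.metrics.SparseCategoricalAccuracy()],
#     student_loss_fn=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
#     distillation_loss_fn=keras.losses.KLDivergence(),
# )
# distiller.fit(x_train, y_train, epochs=3)
```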