Serving BERT Models in Production with TorchServe | PyData Global 2021

Speakers: Adway Dhillon, Nidhin Pattaniyil

Summary

This talk is for data scientists and ML engineers looking to serve their PyTorch models in production. It covers post-training steps for optimizing a model, such as quantization and TorchScript, and walks through packaging and serving the model with Facebook's TorchServe.

Description

Intro (10 mins)
- Introduce the deep learning BERT model.
- Walk through the notebooks in the Google Colab setup.
- Show the final served model along with a sample inference.

Review of Some Deep Learning Concepts (10 mins)
- Review sample trained PyTorch model code
- Review the model's transformer architecture
- Tokenization / pre- and post-processing

Optimizing the Model (30 mins)
- Two modes of PyTorch: eager vs. script mode
- Benefits of script mode and PyTorch JIT
- Post-training optimization methods: static and dynamic quantization, distillation
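The eager-vs-script distinction above can be sketched with a toy module (the `Classifier` class and its sizes are illustrative, not from the talk). `torch.jit.script` compiles the module, preserving Python control flow in the forward pass, which tracing alone would flatten to a single recorded path:

```python
import torch
import torch.nn as nn


class Classifier(nn.Module):
    """Toy stand-in for a trained model (hypothetical, for illustration)."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Data-dependent control flow: torch.jit.script keeps this branch,
        # whereas torch.jit.trace would bake in only one execution path.
        if x.sum() > 0:
            return self.fc(x)
        return self.fc(-x)


# Compile the eager-mode module to TorchScript.
scripted = torch.jit.script(Classifier().eval())
out = scripted(torch.randn(2, 8))  # shape (2, 3)
```

The scripted module can then be saved with `scripted.save("model.pt")` and loaded in a Python-free runtime (e.g. by TorchServe or libtorch).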
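Of the post-training optimizations listed, dynamic quantization is the lowest-effort one for BERT-style models: weights of `nn.Linear` layers are converted to int8 ahead of time, while activations are quantized on the fly. A minimal sketch on a hypothetical two-layer model (not the talk's actual model):

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical model standing in for a BERT classifier head."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 32)
        self.fc2 = nn.Linear(32, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(torch.relu(self.fc1(x)))


model = TinyModel().eval()

# Replace every nn.Linear with a dynamically quantized int8 version.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

out = quantized(torch.randn(1, 16))  # same output shape as the fp32 model
```

For transformer models, where most parameters live in linear layers, this typically shrinks the weight footprint to roughly a quarter of fp32 with little accuracy loss; static quantization additionally quantizes activations but requires a calibration pass.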
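Packaging and serving with TorchServe might look roughly like the following (file names such as `model.pt` and `handler.py` are placeholders, not artifacts from the talk): the model archiver bundles the serialized model and its handler into a `.mar` file, which `torchserve` then loads from a model store.

```shell
# Bundle the TorchScript model and its pre/post-processing handler
# into a .mar archive (all file names here are hypothetical).
torch-model-archiver --model-name bert_classifier \
    --version 1.0 \
    --serialized-file model.pt \
    --handler handler.py \
    --export-path model_store

# Start the server and register the model from the store.
torchserve --start --model-store model_store \
    --models bert_classifier=bert_classifier.mar

# Query the inference endpoint with a sample input file.
curl http://127.0.0.1:8080/predictions/bert_classifier -T sample.txt
```

The handler script is where tokenization and output decoding live, so the client can send raw text rather than tensors.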