Large Scale Distributed Deep Learning on Kubernetes Clusters - Yuan Tang & Yong Tang

Large Scale Distributed Deep Learning on Kubernetes Clusters - Yuan Tang, Ant Financial & Yong Tang, MobileIron The focus of this talk is the deployments of large scale distributed deep learning with Kubernetes. The usage of operators to manage and automate training processes for machine learning are discussed. We share our experiences and compare two open source Kubernetes operators, tf-operator and mpi-operator in this talk. Both operators manage training jobs for TensorFlow but they have different dist
Back to Top