BERT for Video

Can we learn joint representations of vision and language? We can! This video explains VideoBERT, a model that builds on the ideas of BERT to learn such deep representations. The learned vision-language model can then be used to solve many downstream tasks. Enjoy the video!

Papers cited in the video:
- VideoBERT: A Joint Model for Video and Language Representation Learning

#VideoBERT #languagemodels #BERT #neuralnetworks #deeplearning #nlp #computervision