Yury Kashnitsky: Firing a cannon at sparrows: BERT vs. logreg

Data Fest Online 2020, Catalyst Workshop Track

There is a golden rule in NLP, at least when it comes to classification tasks: "always start with a tf-idf + logreg baseline". That is, build a logistic regression model on top of a tf-idf (term frequency - inverse document frequency) text representation. This typically works fairly well, is far simpler to deploy than a neural network, and is often what is already deployed and running day and night while you are still struggling with fancy transformers. In this presentation, we will go through a couple of real-world text classification problems and discuss the reasons to resort to BERT as opposed to good old tf-idf & logreg. Along the way, we will walk through a Catalyst text classification pipeline built with HuggingFace.
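For reference, here is a minimal sketch of the kind of tf-idf + logreg baseline the talk refers to, using scikit-learn; the toy texts, labels, and hyperparameters are illustrative placeholders, not taken from the presentation:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy data standing in for a real labeled corpus.
texts = [
    "great product, works well",
    "terrible, broke after a day",
    "absolutely love it, would buy again",
    "waste of money, very disappointed",
]
labels = [1, 0, 1, 0]

# tf-idf features (word uni- and bigrams) feeding a logistic regression.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=50_000)),
    ("logreg", LogisticRegression(C=1.0, max_iter=500)),
])
clf.fit(texts, labels)
print(clf.predict(["solid build quality"]))
```

A pipeline like this trains in seconds on CPU and serves as the yardstick any transformer-based model has to beat.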