Data Selection for Data-Centric AI - Cody Coleman | Stanford MLSys #53
Episode 53 of the Stanford MLSys Seminar Series!
Data selection for Data-Centric AI: Data Quality Over Quantity
Speaker: Cody Coleman
Abstract:
Data selection methods, such as active learning and core-set selection, improve the data efficiency of machine learning by identifying the most informative data points to label or train on. Across the data selection literature, there are many ways to identify these training examples. However, classical data selection methods are prohibitively expensive to apply in deep learning because of the larger datasets and models. This talk will describe two techniques to make data selection methods more tractable. First, “selection via proxy” (SVP) avoids expensive training and reduces the computation per example by using smaller proxy models to quantify the informativeness of each example. Second, “similarity search for efficient active learning and search” (SEALS) reduces the number of examples processed by restricting the candidate pool for labeling to the nearest neighbor
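To make the proxy idea in the abstract concrete, here is a minimal sketch of scoring examples with a small model's predictions and keeping the most informative ones. It is an illustration under assumptions, not the speaker's implementation: the abstract does not say which informativeness measure is used, so predictive entropy stands in as one common choice, and the function names and the `proxy_probs` values are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(proxy_probs, budget):
    """Return indices of the `budget` examples with the highest entropy.

    `proxy_probs` holds one class-probability list per unlabeled example,
    assumed to come from a small, cheap proxy model (hypothetical here).
    """
    scores = [entropy(p) for p in proxy_probs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Made-up proxy outputs for four unlabeled examples and three classes.
proxy_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],  # near-uniform prediction, high entropy
]

selected = select_most_uncertain(proxy_probs, budget=2)
print(selected)  # [3, 1]
```

The selected indices would then be sent for labeling or used to train the larger target model. The scoring pass only needs forward predictions from the proxy, which is where the per-example savings described in the abstract would come from.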