publications
publications by categories in reversed chronological order. generated by jekyll-scholar.
- Semdedup: Data-efficient learning at web-scale through semantic deduplication. Best Paper Award @ The Multimodal Representation Learning Workshop at ICLR 2023
- Datacomp-lm: In search of the next generation of training sets for language models. NeurIPS 2024
- Effective pruning of web-scale datasets based on complexity of concept clusters. ICLR 2024; Oral Presentation @ DataComp Workshop at ICCV 2023
- Progress and limitations of deep networks to recognize objects in unusual poses. AAAI 2023
- Sieve: Multimodal dataset pruning using image captioning models. ICLR 2024
- A comparison between humans and AI at recognizing objects in unusual poses. TMLR 2024