view article Article Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders By thomwolf and 1 other • 5 days ago • 527
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25, 2024 • 98
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 70 items • Updated Apr 7 • 132
SpaceByte: Towards Deleting Tokenization from Large Language Modeling Paper • 2404.14408 • Published Apr 22, 2024 • 7