EleutherAI
Datasets
OpenWebText2
An enhanced version of OpenWebTextCorpus.
The Pile
A large, diverse, open-source language modeling dataset.