Language Modeling
GPT-Neo
An implementation of model- and data-parallel autoregressive language models with Mesh TensorFlow for distributed TPUs.
GPT-NeoX
An implementation of autoregressive language models with 3D parallelism (data, pipeline, and tensor) for distributed GPUs.
Mesh Transformer JAX
An implementation of model- and data-parallel autoregressive language models with JAX and Haiku for distributed TPUs.
OpenWebText2
An enhanced version of OpenWebTextCorpus.
The Pile
A large, diverse, open-source language modeling dataset.