GLM-130B: An Open Bilingual Pre-Trained Model
Introducing GLM-130B: an open-source bilingual language model that outperforms GPT-3 175B on several key benchmarks and, unlike proprietary models such as GPT-3 and PaLM 540B, is freely available to download and run.
There is a new open-source language model that seems to have mostly flown under the radar. GLM-130B is a bilingual (English and Chinese) model with 130 billion parameters, pre-trained with the General Language Model (GLM) algorithm on over 400 billion text tokens (200 billion each for English and Chinese), and it has some impressive capabilities.
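For context, the GLM objective differs from plain left-to-right language modeling: spans of the input are blanked out and the model learns to regenerate them autoregressively. The toy sketch below is not code from the GLM-130B repository; the function name and the [MASK]/[sop]/[eop] placeholders are illustrative, following the paper's description of the objective.

```python
# Toy illustration of GLM-style autoregressive blank infilling.
# NOT code from the GLM-130B repository; names and special tokens
# ([MASK], [sop], [eop]) are illustrative only.

def make_blank_infilling_example(tokens, spans):
    """Build (corrupted input, infill targets) for one training example.

    tokens: list of token strings
    spans:  non-overlapping (start, end) index pairs to blank out
    """
    spans = sorted(spans)
    corrupted, targets, cursor = [], [], 0
    for start, end in spans:
        corrupted.extend(tokens[cursor:start])
        corrupted.append("[MASK]")  # each blanked span collapses to one [MASK]
        # the model regenerates the span left-to-right after a start token
        targets.append(["[sop]"] + tokens[start:end] + ["[eop]"])
        cursor = end
    corrupted.extend(tokens[cursor:])
    return corrupted, targets


if __name__ == "__main__":
    toks = ["GLM-130B", "is", "a", "bilingual", "English", "and", "Chinese", "model"]
    inp, tgt = make_blank_infilling_example(toks, [(3, 7)])
    print(inp)  # ['GLM-130B', 'is', 'a', '[MASK]', 'model']
    print(tgt)  # [['[sop]', 'bilingual', 'English', 'and', 'Chinese', '[eop]']]
```

In actual GLM training the corrupted text and the target spans are packed into a single sequence (with bidirectional attention over the corrupted part and causal attention over the spans), but the span construction shown here is the core idea.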
First and foremost, GLM-130B outperforms several other state-of-the-art models on critical benchmarks. In English, it beats GPT-3 175B (+4.0%), OPT-175B (+5.5%), and BLOOM-176B (+13.0%) on LAMBADA, and edges out GPT-3 175B (+0.9%) on MMLU. In Chinese, it significantly outperforms ERNIE TITAN 3.0 260B on 7 zero-shot CLUE datasets (+24.26%) and 5 zero-shot FewCLUE datasets (+12.75%). In short, GLM-130B achieves strong performance on a wide variety of tasks in both languages.
Another key feature of GLM-130B is its inference speed. It supports fast inference with both SAT (SwissArmyTransformer) and NVIDIA FasterTransformer, with the FasterTransformer backend reaching up to 2.5X faster inference on a single A100 server. This makes it a practical choice for users who need to run large-scale language processing tasks with low latency.
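If you want to sanity-check the speed claims on your own hardware, a crude timing harness is enough to compare backends. The sketch below is a generic benchmark helper, not part of the GLM-130B codebase; it assumes you supply a `generate(prompt)` callable wired to whichever backend (SAT or FasterTransformer) you are serving the model with.

```python
import time

def time_generation(generate, prompts, warmup=2, repeats=5):
    """Crude average-latency benchmark for a text-generation callable.

    `generate` is assumed to be a function prompt -> text backed by the
    serving setup you want to measure; this harness itself depends on
    nothing beyond the standard library.
    """
    for p in prompts[:warmup]:          # warm up kernels / caches
        generate(p)
    start = time.perf_counter()
    for _ in range(repeats):
        for p in prompts:
            generate(p)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(prompts))


if __name__ == "__main__":
    # Stand-in backend so the harness runs on its own; replace with a real
    # call into your GLM-130B serving setup.
    def dummy_generate(prompt):
        return prompt + " ..."

    avg = time_generation(dummy_generate, ["Hello", "你好"], warmup=1, repeats=3)
    print(f"average latency per prompt: {avg * 1000:.3f} ms")
```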
But perhaps the most exciting aspect of GLM-130B is that it is open source. Anyone can obtain the model weights and code (subject to the license linked below) and run the model on their own hardware for free. This is a major advantage over proprietary models like GPT-3 and PaLM 540B, which are available only through paid APIs or not publicly available at all. The open release also makes it easy for researchers and developers to build upon and improve the model, further advancing the field of NLP.
In addition to its strong performance and fast inference, GLM-130B is reproducible and cross-platform. All of the reported evaluation results can be reproduced with the open-sourced code and model checkpoints, and the model supports training and inference on a variety of hardware platforms, including NVIDIA GPUs, Hygon DCU, Ascend 910, and Sunway.
Overall, GLM-130B is a promising new development in the NLP community. Its strong performance, fast inference, openness, and cross-platform support make it a valuable resource for researchers, developers, and anyone working with large language models. If you want to try GLM-130B for yourself, the model weights and code are available at https://github.com/THUDM/GLM-130B.
Links:
https://arxiv.org/abs/2210.02414
https://github.com/THUDM/GLM-130B/blob/main/LICENSE
It seems to have been authored by Tsinghua University together with Zhipu.AI, a Chinese startup. The data it was trained on was probably heavily censored.