# GLM-4-9B
## Model Introduction
GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. On evaluation datasets covering semantics, mathematics, reasoning, code, and knowledge, **GLM-4-9B** and its human-preference-aligned version **GLM-4-9B-Chat** both show performance superior to Llama-3-8B. In addition to multi-round conversation, GLM-4-9B-Chat also offers advanced features such as web browsing, code execution, custom tool calling (Function Call), and long-text reasoning (supporting up to 128K context). This generation of models adds multilingual support for 26 languages, including Japanese, Korean, and German. We have also released the **GLM-4-9B-Chat-1M** model, which supports a 1M context length (about 2 million Chinese characters), and the multimodal model **GLM-4V-9B** based on GLM-4-9B.
**GLM-4V-9B** supports dialogue in both Chinese and English at a high resolution of 1120×1120. In various multimodal evaluations, including comprehensive Chinese and English abilities, perception and reasoning, text recognition, and chart understanding, GLM-4V-9B demonstrates superior performance compared to GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.
We evaluated the GLM-4-9B base model on some typical tasks, and the results are as follows:
| Model               |   MMLU   |  C-Eval  |   GPQA   |  GSM8K   |   MATH   | HumanEval |
|:--------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:---------:|
| Llama-3-8B          |   66.6   |   51.2   |    -     |   45.8   |    -     |     -     |
| Llama-3-8B-Instruct |   68.4   |   51.3   |   34.2   |   79.6   |   30.0   |   62.2    |
| ChatGLM3-6B-Base    |   61.4   |   69.0   |    -     |   72.3   |   25.7   |     -     |
| GLM-4-9B            | **74.7** | **77.1** | **34.3** | **84.0** | **30.4** | **70.1**  |
**This repository is the base version of GLM-4-9B, supporting an 8K context length.**
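As a quick orientation, the snippet below is a minimal sketch of loading the base model with Hugging Face `transformers` for plain text completion. The repository ID `THUDM/glm-4-9b` and the use of `trust_remote_code=True` are assumptions based on common usage of this model family rather than details stated in this README; adjust them to match the actual repository.

```python
# Minimal sketch, not an official quick start.
# Assumptions: Hugging Face repo ID "THUDM/glm-4-9b" and custom model code enabled
# via trust_remote_code. GLM-4-9B is a base model, so it is used here for plain
# text completion rather than chat-style prompting.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/glm-4-9b"  # assumed repository ID

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit the 9B weights on a single GPU
    device_map="auto",
    trust_remote_code=True,
).eval()

inputs = tokenizer("GLM-4-9B is", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is the pre-trained base model rather than GLM-4-9B-Chat, prompts should be written as text to be continued; conversational use is better served by the Chat variants described above.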
## LICENSE
The weights of the GLM-4 model are available under the terms of [LICENSE](LICENSE).
## Citations
If you find our work useful, please consider citing the following papers.
```
@article{zeng2022glm,
  title={GLM-130B: An Open Bilingual Pre-trained Model},
  author={Zeng, Aohan and Liu, Xiao and Du, Zhengxiao and Wang, Zihan and Lai, Hanyu and Ding, Ming and Yang, Zhuoyi and Xu, Yifan and Zheng, Wendi and Xia, Xiao and others},
  journal={arXiv preprint arXiv:2210.02414},
  year={2022}
}
```
```
@inproceedings{du2022glm,
  title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
  author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={320--335},
  year={2022}
}
```