Qwen/Qwen2.5-7B

Qwen/Qwen2.5-7B is a 7.61 billion parameter causal language model developed by the Qwen team, built on a transformer architecture with RoPE, SwiGLU, RMSNorm, and attention QKV bias. Compared with its predecessor, Qwen2, it offers significantly improved knowledge, coding, and mathematics capabilities. It supports a context length of 131,072 tokens and is released as a pretrained base model, intended as a foundation for further fine-tuning rather than for direct conversational use.
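Since this is a plain causal language model published on Hugging Face, it can be loaded with the standard transformers AutoModel classes. The snippet below is a minimal sketch, assuming transformers >= 4.37 (which includes Qwen2 support), the accelerate package for device placement, and enough GPU memory for the 7.61B-parameter weights; the prompt is purely illustrative.

```python
# Minimal sketch: load Qwen/Qwen2.5-7B and continue a raw text prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # use the dtype stored in the checkpoint config
    device_map="auto",    # spread layers across available devices (needs accelerate)
)

# Base (pretrained) model: feed it raw text to continue, not a chat template.
prompt = "The key differences between RoPE and learned positional embeddings are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For instruction-following or chat behavior, a fine-tuned variant such as an instruct model would normally be used instead of this base checkpoint.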

Status: Warm
Visibility: Public
Parameters: 7.6B
Quantization: FP8
Context length: 131,072 tokens
License: apache-2.0
Weights: Hugging Face