Qwen/Qwen2.5-7B
Qwen/Qwen2.5-7B is a 7.61 billion parameter causal language model developed by Qwen, featuring a transformer architecture with RoPE, SwiGLU, RMSNorm, and attention QKV bias. It offers significantly improved knowledge, coding, and mathematics capabilities compared to its predecessor, Qwen2, and supports a context length of 131,072 tokens. As a pretrained base model (not instruction-tuned), it is intended to serve as a foundation for further post-training, such as fine-tuning for specific applications, as in the sketch below.
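A minimal sketch of loading the base model for plain text completion, assuming the Hugging Face transformers library is used (the upstream Qwen2/Qwen2.5 integration assumes a reasonably recent transformers release); the prompt string is illustrative only:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B"

# Load tokenizer and model weights from the hub.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place weights on available accelerator(s)
)

# Base (non-instruct) model: feed plain text rather than a chat template.
prompt = "The Qwen2.5 series of language models"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is a base checkpoint, it is generally better suited to completion-style prompting or as a starting point for fine-tuning than to direct conversational use.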