Qwen/Qwen2.5-3B

Qwen2.5-3B is a 3.09-billion-parameter causal language model developed by the Qwen team, featuring a transformer architecture with RoPE, SwiGLU, and RMSNorm. This base model offers a 32,768-token context length and significantly improved capabilities in coding, mathematics, instruction following, and generating structured outputs such as JSON. It also provides robust multilingual support for over 29 languages, making it well suited to further fine-tuning for specialized applications.

Parameters: 3.1B
Tensor type: BF16
Context length: 32,768 tokens
License: qwen-research
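
A minimal quickstart sketch for loading the model with the Hugging Face `transformers` library; the prompt and generation settings below are illustrative choices, not values prescribed by the model card, and `device_map="auto"` assumes the `accelerate` package is installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-3B"

# Load the tokenizer and the BF16 weights as published.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # picks up the BF16 checkpoint dtype
    device_map="auto",    # requires `accelerate`
)

# This is a base (non-instruct) model, so use plain text completion
# rather than a chat template.
prompt = "The three laws of thermodynamics are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is the pretrained base model rather than an instruction-tuned variant, it is best used for raw completion or as a starting point for fine-tuning; for conversational use, an instruct-tuned checkpoint is the more appropriate choice.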