Qwen/Qwen2.5-3B

Qwen2.5-3B is a 3.09-billion-parameter causal language model developed by the Qwen team, featuring a transformer architecture with RoPE, SwiGLU, and RMSNorm. This base model offers a 32,768-token context length and significantly improved capabilities in coding, mathematics, instruction following, and generating structured outputs such as JSON. It also provides robust multilingual support for over 29 languages, making it well suited to further fine-tuning for specialized applications.

Parameters: 3.1B
Tensor type: BF16
Context length: 32,768 tokens
License: qwen-research
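
A minimal quickstart sketch for loading the model with the Hugging Face `transformers` library; the prompt and generation settings below are illustrative choices, not values prescribed by the model card, and `device_map="auto"` assumes the `accelerate` package is installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-3B"

# Load the tokenizer and the BF16 weights as published.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # picks up the BF16 checkpoint dtype
    device_map="auto",    # requires `accelerate`
)

# This is a base (non-instruct) model, so use plain text completion
# rather than a chat template.
prompt = "The three laws of thermodynamics are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is the pretrained base model rather than an instruction-tuned variant, it is best used for raw completion or as a starting point for fine-tuning; for conversational use, an instruct-tuned checkpoint is the more appropriate choice.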