Qwen/Qwen2-72B

Qwen2-72B is a 72.7 billion parameter dense decoder-only Transformer language model developed by the Qwen team. It features SwiGLU activation, attention QKV bias, and grouped-query attention (GQA), along with an improved tokenizer that is adaptive to multiple natural languages and code. This base model demonstrates strong performance across language understanding, generation, multilingual tasks, coding, mathematics, and reasoning benchmarks, often surpassing other open-source models and competing with proprietary alternatives.

Status: Warm
Visibility: Public
Parameters: 72.7B
Quantization: FP8
Context length: 131,072 tokens
License: tongyi-qianwen
Source: Hugging Face
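
As a base (completion) model rather than an instruction-tuned chat model, Qwen2-72B is prompted with plain text. Below is a minimal sketch of loading and sampling from it with the Hugging Face transformers library; it assumes a recent transformers release with Qwen2 support (the upstream model card recommends transformers >= 4.37.0), the accelerate package for device placement, and enough GPU memory for a 72.7B-parameter model.

```python
# Minimal sketch: text completion with Qwen2-72B as a base model.
# Assumptions: transformers >= 4.37.0 (Qwen2 support), accelerate installed,
# and sufficient GPU memory; dtype and device settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-72B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference; adjust to your hardware
    device_map="auto",           # shard layers across available GPUs (requires accelerate)
)

# Base models continue text rather than follow chat turns, so use a plain prompt.
prompt = "The key architectural features of modern Transformer language models include"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```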
