Qwen/Qwen2-72B
Qwen2-72B is a 72.7 billion parameter dense decoder-only Transformer language model developed by the Qwen team. Architecturally, it features SwiGLU activation, attention QKV bias, and grouped-query attention (GQA), along with an improved tokenizer that is adaptive to multiple natural languages and code. This base model demonstrates strong performance across language understanding, generation, multilingual tasks, coding, mathematics, and reasoning benchmarks, often surpassing other open-source models and competing with proprietary alternatives.
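A minimal usage sketch with the Hugging Face `transformers` library, assuming the `transformers` and `accelerate` packages are installed and enough GPU memory is available for a 72B checkpoint; the prompt and generation settings are illustrative, not prescribed by the model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-72B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # pick the dtype recorded in the checkpoint config
    device_map="auto",    # shard the weights across available GPUs
)

# Qwen2-72B is a base (non-instruct) model, so it is suited to plain
# text continuation rather than chat-style prompting.
prompt = "The three primary colors are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For chat-style interaction, the post-trained Qwen2-72B-Instruct variant is the more appropriate choice.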