Qwen/Qwen2-72B
Qwen2-72B is a 72.7 billion parameter dense decoder-only Transformer language model developed by the Qwen team. Architecturally, it features SwiGLU activation, attention QKV bias, and grouped-query attention (GQA), along with an improved tokenizer that is adaptive to multiple natural languages and code. This base model demonstrates strong performance across language understanding, generation, multilingual tasks, coding, mathematics, and reasoning benchmarks, often surpassing other open-source models and competing with proprietary alternatives.
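A minimal usage sketch with the Hugging Face `transformers` library, assuming the `transformers` and `accelerate` packages are installed and enough GPU memory is available for a 72B checkpoint; the prompt and generation settings are illustrative, not prescribed by the model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-72B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # pick the dtype recorded in the checkpoint config
    device_map="auto",    # shard the weights across available GPUs
)

# Qwen2-72B is a base (non-instruct) model, so it is suited to plain
# text continuation rather than chat-style prompting.
prompt = "The three primary colors are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For chat-style interaction, the post-trained Qwen2-72B-Instruct variant is the more appropriate choice.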