Qwen/Qwen2-0.5B
Qwen2-0.5B is a 0.5-billion-parameter base language model in the Qwen2 series from the Qwen team. The Transformer-based architecture uses the SwiGLU activation, attention QKV bias, and grouped-query attention, together with an improved tokenizer that adapts to multiple natural languages and code. It is a foundation model intended for further post-training (SFT, RLHF, or continued pretraining) rather than for direct text generation.
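As a quick sanity check that the checkpoint loads, here is a minimal sketch using the Hugging Face transformers library (assuming a recent version with Qwen2 support, roughly 4.37 or later). Since this is a base model rather than an instruction-tuned one, the example is plain next-token completion, not chat; the prompt string is illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-0.5B"

# Load tokenizer and model weights from the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Base model: plain completion, no chat template applied.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For instruction following or chat, the corresponding post-trained variant (e.g. Qwen2-0.5B-Instruct) is the more appropriate starting point.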