Qwen/Qwen3-8B
Qwen3-8B is an 8.2 billion parameter causal language model developed by Qwen, featuring a native context length of 32,768 tokens, extendable to 131,072 tokens with YaRN. This model uniquely supports seamless switching between a 'thinking mode' for complex reasoning, math, and coding, and a 'non-thinking mode' for efficient general dialogue. It demonstrates enhanced reasoning capabilities, superior human preference alignment for creative tasks, and strong agentic abilities with multilingual support for over 100 languages.