unsloth/DeepSeek-R1-0528-Qwen3-8B

DeepSeek-R1-0528-Qwen3-8B is an 8-billion-parameter language model from DeepSeek AI, built on the Qwen3 architecture with a 32,768-token context length. It was created by distilling the chain-of-thought of the larger DeepSeek-R1-0528 model into Qwen3-8B, substantially improving the base model's reasoning, particularly in mathematics and programming. The distilled model achieves state-of-the-art results among open-source models on benchmarks such as AIME 2024, making it well suited to complex reasoning and code-generation tasks.
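
The model can be loaded like any other causal language model from the Hugging Face Hub. Below is a minimal sketch using the transformers library; the repo id comes from this listing, while the dtype/device settings and the sample prompt are illustrative assumptions, not official usage instructions:

```python
# Minimal sketch: loading and querying the model with Hugging Face transformers.
# Assumes `transformers` (and `torch`) are installed and a GPU is available.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/DeepSeek-R1-0528-Qwen3-8B"  # repo id from this listing
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick up the checkpoint's native precision
    device_map="auto",    # place layers on available devices
)

# Chat-style prompt; the model emits its chain-of-thought before the answer.
messages = [{"role": "user", "content": "What is 17 * 24? Explain briefly."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
# Strip the prompt tokens and decode only the newly generated text.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because the model is reasoning-distilled, generations can be long; `max_new_tokens` should be set generously so the chain-of-thought is not truncated before the final answer.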

Status: Warm
Visibility: Public
Parameters: 8B
Precision: FP8
Context length: 32,768 tokens
License: MIT
Source: Hugging Face