deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

DeepSeek-R1-Distill-Qwen-32B is a 32.8 billion parameter language model developed by DeepSeek-AI, based on the Qwen2.5 architecture and distilled from the larger DeepSeek-R1 model. It is fine-tuned on reasoning data generated by DeepSeek-R1 and excels at complex reasoning, mathematics, and coding tasks, with a context length of 131,072 tokens. The model demonstrates strong performance across a range of benchmarks, often outperforming larger models thanks to its specialized distillation process.
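
The model can be run locally with Hugging Face transformers. The sketch below is a minimal example, assuming a GPU setup with enough memory for the 32B weights; the bf16 dtype and the sampling temperature of 0.6 are assumptions (the latter follows DeepSeek's published usage guidance), not settings confirmed by this page.

```python
# Minimal local-inference sketch with Hugging Face transformers.
# Assumes sufficient GPU memory; FP8-quantized serving is not shown here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits your hardware
    device_map="auto",           # spread layers across available GPUs
)

# Reasoning models like this one are typically prompted via the chat template.
messages = [{"role": "user", "content": "What is the derivative of x^2 * sin(x)?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,  # assumption: DeepSeek's recommended sampling range
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```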

Status: Warm
Visibility: Public
Parameters: 32.8B
Quantization: FP8
Context length: 131,072 tokens
License: MIT
Source: Hugging Face
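
Since the listing indicates a hosted, warm endpoint, the model can presumably also be queried through an OpenAI-compatible API. The sketch below assumes such an endpoint; the `base_url` and the `PROVIDER_API_KEY` environment variable are placeholders for your provider's actual values, not details confirmed by this page.

```python
# Hypothetical call through an OpenAI-compatible chat endpoint.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",    # assumption: provider endpoint
    api_key=os.environ["PROVIDER_API_KEY"],   # assumption: provider API key
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    messages=[{"role": "user", "content": "If 3x + 7 = 22, what is x?"}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```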
