ertghiu256/Qwen-3-1.7b-deepseek-r1-0528-distillation

ertghiu256/Qwen-3-1.7b-deepseek-r1-0528-distillation is a 1.7 billion parameter language model based on the Qwen3 architecture. It was trained through a distillation process on DeepSeek-R1-0528 data, transferring behavior from the larger teacher model into the smaller Qwen3 base. It supports an extended context length of up to 40,960 tokens (32,768 natively), making it suitable for tasks that require long-range contextual understanding.
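A minimal usage sketch with the Hugging Face `transformers` library, assuming the repository ships standard safetensors weights and a chat template; the prompt and generation settings below are illustrative, not taken from the model card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ertghiu256/Qwen-3-1.7b-deepseek-r1-0528-distillation"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 tensor type listed below
    device_map="auto",
)

# Build a chat-formatted prompt; the question here is purely illustrative.
messages = [{"role": "user", "content": "Summarize what model distillation is."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```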

- Visibility: Public
- Parameters: 1.7B (listed as 2B, rounded)
- Tensor type: BF16
- Context length: 32,768 tokens
- Hosted on: Hugging Face