ertghiu256/Qwen-3-1.7b-deepseek-r1-0528-distillation
ertghiu256/Qwen-3-1.7b-deepseek-r1-0528-distillation is a 1.7-billion-parameter language model based on the Qwen 3 architecture (the name and base model indicate 1.7B parameters). It was trained by distilling DeepSeek-R1-0528 outputs into the smaller Qwen 3 base. It supports an extended context length of 40,960 tokens, making it suitable for tasks that require extensive contextual understanding.
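A minimal usage sketch with the Hugging Face `transformers` library, assuming the model follows the standard causal-LM loading path of its Qwen 3 base (the prompt text and generation settings below are illustrative, not from the model card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model card values: repository id and extended context length.
MODEL_ID = "ertghiu256/Qwen-3-1.7b-deepseek-r1-0528-distillation"
MAX_CONTEXT = 40960  # tokens

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model (downloads weights on first use) and generate a completion."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Hypothetical prompt; any text within the 40,960-token context fits here.
    print(generate("Explain model distillation in one sentence."))
```

Because the model is a DeepSeek-R1-style distillation, responses may include chain-of-thought reasoning before the final answer; budget `max_new_tokens` accordingly.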