ertghiu256/Qwen-3-1.7b-deepseek-r1-0528-distillation
ertghiu256/Qwen-3-1.7b-deepseek-r1-0528-distillation is a 1.7 billion parameter language model based on the Qwen 3 architecture. It was trained by distilling DeepSeek R1 0528 outputs into the smaller Qwen 3 base. The model supports an extended context length of 40960 tokens, making it suitable for tasks that require long-range contextual understanding.
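When working with the 40960-token window, long prompts still need to leave room for generation. Below is a minimal sketch of budgeting a prompt against the context length; the whitespace split stands in for the real Qwen 3 tokenizer (an assumption, so actual token counts will differ), and `fit_to_context` is an illustrative helper, not part of the model's API.

```python
MAX_CONTEXT = 40960  # context length stated in the model card

def fit_to_context(tokens, max_new_tokens=512, max_context=MAX_CONTEXT):
    """Trim the prompt so prompt + generation budget fits in the window."""
    budget = max_context - max_new_tokens
    if len(tokens) <= budget:
        return tokens
    # Keep the most recent tokens, dropping the oldest overflow.
    return tokens[-budget:]

# Illustrative oversized "prompt": 50000 placeholder tokens.
prompt = ["tok"] * 50000
trimmed = fit_to_context(prompt)
print(len(trimmed))  # 40448 = 40960 - 512
```

Keeping the most recent tokens is one common truncation policy; depending on the task (e.g. retrieval over documents), keeping the beginning or a summary may be more appropriate.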