unsloth/DeepSeek-R1-Distill-Llama-8B

DeepSeek-R1-Distill-Llama-8B is an 8-billion-parameter language model developed by DeepSeek AI, distilled from the 671B-parameter DeepSeek-R1 and built on Llama-3.1-8B. It specializes in reasoning, transferring patterns learned by the much larger teacher model to achieve strong performance on math, code, and general reasoning tasks. The model offers a 32,768-token context length and brings advanced reasoning capabilities to a smaller, more efficient architecture.

Status: Warm
Visibility: Public
Parameters: 8B
Quantization: FP8
Context length: 32,768 tokens
License: llama3.1
Source: Hugging Face
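
A minimal usage sketch follows, assuming the model is loaded locally with the Hugging Face transformers library. The repo id comes from the title above; the dtype, device placement, and sampling settings are illustrative assumptions, not recommendations from this card.

```python
# Sketch: load and query unsloth/DeepSeek-R1-Distill-Llama-8B with transformers.
# Settings below (dtype, device_map, temperature) are assumptions for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/DeepSeek-R1-Distill-Llama-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place weights on available GPU(s)/CPU
)

# R1-style distilled models emit an explicit chain of thought before the final
# answer, so leave generous room in max_new_tokens (well within the 32,768-token context).
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The main practical consideration when serving this model is budgeting output tokens for the chain-of-thought portion of each response before the final answer appears.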
