unsloth/DeepSeek-R1-Distill-Llama-8B

DeepSeek-R1-Distill-Llama-8B is an 8-billion-parameter language model developed by DeepSeek AI, distilled from the 671B-parameter DeepSeek-R1 and built on Llama-3.1-8B. It specializes in reasoning, transferring patterns learned by the much larger teacher model to achieve strong performance on math, code, and general reasoning tasks. The model offers a 32,768-token context length and brings advanced reasoning capabilities to a smaller, more efficient architecture.

Status: Warm
Visibility: Public
Parameters: 8B
Quantization: FP8
Context length: 32,768 tokens
License: llama3.1
Source: Hugging Face
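
A minimal usage sketch follows, assuming the model is loaded locally with the Hugging Face transformers library. The repo id comes from the title above; the dtype, device placement, and sampling settings are illustrative assumptions, not recommendations from this card.

```python
# Sketch: load and query unsloth/DeepSeek-R1-Distill-Llama-8B with transformers.
# Settings below (dtype, device_map, temperature) are assumptions for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/DeepSeek-R1-Distill-Llama-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place weights on available GPU(s)/CPU
)

# R1-style distilled models emit an explicit chain of thought before the final
# answer, so leave generous room in max_new_tokens (well within the 32,768-token context).
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The main practical consideration when serving this model is budgeting output tokens for the chain-of-thought portion of each response before the final answer appears.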
