RLHFlow/LLaMA3-SFT-v2

RLHFlow/LLaMA3-SFT-v2 is an 8 billion parameter instruction-tuned causal language model developed by RLHFlow, based on Meta-Llama-3-8B. This model is a supervised fine-tuning (SFT) checkpoint, specifically designed for use within the RLHFlow/Online-RLHF project. It demonstrates strong performance across academic benchmarks, particularly excelling in mathematical reasoning (GSM8K, MATH) and code generation (HumanEval) compared to its base model.

Warm

Public

Model Size: 8B

Quant: FP8

Ctx length: 8192

Hugging Face