RLHFlow/LLaMA3-SFT-v2

RLHFlow/LLaMA3-SFT-v2 is an 8 billion parameter instruction-tuned causal language model developed by RLHFlow, based on Meta-Llama-3-8B. This model is a supervised fine-tuning (SFT) checkpoint, specifically designed for use within the RLHFlow/Online-RLHF project. It demonstrates strong performance across academic benchmarks, particularly excelling in mathematical reasoning (GSM8K, MATH) and code generation (HumanEval) compared to its base model.

Warm
Public
8B
FP8
8192
Hugging Face