RLHFlow/LLaMA3-SFT-v2
RLHFlow/LLaMA3-SFT-v2 is an 8-billion-parameter instruction-tuned causal language model released by RLHFlow. It is a supervised fine-tuning (SFT) checkpoint of Meta-Llama-3-8B, designed specifically for use within the RLHFlow/Online-RLHF project. Relative to its base model, it performs strongly across academic benchmarks, most notably in mathematical reasoning (GSM8K, MATH) and code generation (HumanEval).