OpenRLHF/Llama-3-8b-rlhf-100k

OpenRLHF's Llama-3-8b-rlhf-100k is an 8 billion parameter Llama 3 model fine-tuned using Reinforcement Learning from Human Feedback (RLHF) for 100,000 samples. This model builds upon a Llama-3-8b-sft base and a Llama-3-8b-rm reward model, demonstrating improved conversational performance over its SFT base. It is optimized for generating more aligned and helpful responses in chat-based applications.

Warm
Public
8B
FP8
8192
Hugging Face

No reviews yet. Be the first to review!