OpenRLHF/Llama-3-8b-rlhf-100k

OpenRLHF's Llama-3-8b-rlhf-100k is an 8-billion-parameter Llama 3 model fine-tuned with Reinforcement Learning from Human Feedback (RLHF) on 100,000 samples. It starts from a Llama-3-8b-sft base and is trained against a Llama-3-8b-rm reward model, and it shows improved conversational performance over that SFT base. It is intended for chat-based applications, where the RLHF stage targets more aligned and helpful responses.

Status: Warm

Visibility: Public

Model Size: 8B

Quantization: FP8

Context length: 8192 tokens
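
The size, quantization, and context-length figures above translate into a rough memory budget for serving the model. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes the standard Llama 3 8B architecture (32 layers, 8 key/value heads, head dimension 128) and assumes the KV cache is kept in 16-bit precision, which this listing does not state. The function names are illustrative only, and activations and runtime overhead are not counted.

```python
# Rough serving-memory estimate for an 8B-parameter model stored in FP8.
# Architecture values are the published Llama 3 8B numbers (assumed to apply here).

PARAMS = 8e9            # "Model Size: 8B" (nominal; the exact count is slightly higher)
BYTES_PER_WEIGHT = 1    # FP8 = 8 bits = 1 byte per weight
CTX_LEN = 8192          # context length from the listing, in tokens

N_LAYERS = 32           # Llama 3 8B transformer layers
N_KV_HEADS = 8          # grouped-query attention: 8 key/value heads
HEAD_DIM = 128          # 4096 hidden size / 32 attention heads
KV_BYTES = 2            # ASSUMPTION: KV cache stored in 16-bit precision

GIB = 1024 ** 3


def weight_bytes(params=PARAMS, bytes_per_weight=BYTES_PER_WEIGHT):
    """Bytes needed to hold the model weights alone."""
    return params * bytes_per_weight


def kv_cache_bytes(tokens=CTX_LEN, kv_bytes=KV_BYTES):
    """Bytes of KV cache for one sequence of `tokens` tokens."""
    # Factor of 2 covers keys and values, stored per layer and per KV head.
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * kv_bytes * tokens


if __name__ == "__main__":
    print(f"weights:  {weight_bytes() / GIB:.2f} GiB")
    print(f"KV cache: {kv_cache_bytes() / GIB:.2f} GiB per full-length sequence")
```

Under these assumptions the weights come to about 7.45 GiB and a single full 8192-token sequence adds 1 GiB of KV cache, so the cache grows linearly with the number of concurrent full-length sequences.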

Hugging Face