TriAiExperiments/SFR-Iterative-DPO-LLaMA-3-8B-R

SFR-Iterative-DPO-LLaMA-3-8B-R is an 8-billion-parameter instruct model developed by Salesforce, built on the LLaMA-3 architecture with an 8192-token context length. It is trained with an iterative-DPO-based online RLHF method, which lets it outperform similarly sized models, as well as many larger open-source and proprietary models, on instruct benchmarks such as Alpaca-Eval-V2, MT-Bench, and Chat-Arena-Hard. The model is optimized for instruction following and general conversational AI tasks, and it achieves this performance without relying on additional human or GPT-4 labeling.
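The weights are hosted on Hugging Face, so the model can be run locally with the transformers library. The sketch below is a minimal, assumed usage example: the repo id is taken from this page's title, and bfloat16 is used as a safe local precision (the FP8 listed below refers to the hosted serving precision, not a requirement for loading the checkpoint).

```python
# Minimal sketch: load the model from Hugging Face and run one chat turn.
# Repo id is assumed from this page's title; swap in your own path if needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TriAiExperiments/SFR-Iterative-DPO-LLaMA-3-8B-R"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed local precision; FP8 is serving-side
    device_map="auto",
)

# LLaMA-3 instruct models ship their chat template with the tokenizer,
# so apply_chat_template produces the correctly formatted prompt.
messages = [{"role": "user", "content": "Summarize iterative DPO in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) is used here for reproducibility; for open-ended chat you would typically enable sampling with a temperature.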

State: Warm
Visibility: Public
Parameters: 8B
Precision: FP8
Context length: 8192
Source: Hugging Face
