princeton-nlp/Llama-3-Base-8B-SFT-DPO

princeton-nlp/Llama-3-Base-8B-SFT-DPO is an 8-billion-parameter language model from Princeton NLP, built on the Llama-3-8B base model. It was first instruction-tuned with supervised fine-tuning (SFT) and then aligned with Direct Preference Optimization (DPO), which optimizes the policy directly on preference pairs against a frozen reference model instead of fitting a separate reward model. The checkpoint was released as the DPO baseline accompanying the SimPO preprint ("SimPO: Simple Preference Optimization with a Reference-Free Reward"); the reference-free SimPO recipe itself corresponds to the separate ...-SFT-SimPO checkpoint, not this one. The model inherits Llama 3's 8192-token context window.
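The checkpoint loads through the standard Hugging Face transformers interface. The sketch below is illustrative, not official usage from this page: the dtype and device settings are assumptions about available hardware, and it assumes the tokenizer ships a chat template (the SFT models in this family are chat-tuned); if it does not, format the prompt manually.

```python
# Minimal inference sketch for princeton-nlp/Llama-3-Base-8B-SFT-DPO.
# Assumptions: a GPU with enough memory for an 8B model, bf16 support,
# and the `accelerate` package installed for device_map="auto".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Llama-3-Base-8B-SFT-DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits the hardware
    device_map="auto",
)

# Format the conversation with the tokenizer's chat template
# (assumed present) and ask the model to continue as assistant.
messages = [{"role": "user", "content": "Explain DPO in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```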

Status: Warm
Visibility: Public
Parameters: 8B
Quantization: FP8
Context length: 8192 tokens
Source: Hugging Face
