princeton-nlp/Llama-3-Base-8B-SFT-DPO
princeton-nlp/Llama-3-Base-8B-SFT-DPO is an 8-billion-parameter language model from Princeton NLP, built on the Llama-3-8B base model. Starting from a supervised fine-tuned (SFT) checkpoint, it was further trained with Direct Preference Optimization (DPO), which aligns the model to human preferences using a frozen reference model. It was released as a DPO baseline in the SimPO (Simple Preference Optimization with a Reference-Free Reward) project; note that, unlike SimPO itself, DPO does rely on a reference model. The model inherits Llama 3's 8192-token context window, and its training setup is detailed in the SimPO preprint.
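Below is a minimal usage sketch with the Hugging Face transformers library. It assumes the checkpoint is available on the Hub under the identifier above, that the tokenizer ships a chat template, and that accelerate is installed for `device_map="auto"`; the prompt and generation settings are illustrative, not taken from the model card.

```python
# Minimal sketch: load the DPO-aligned checkpoint and generate a reply.
# Assumptions: Hub access to the model id, a tokenizer with a chat
# template, and enough GPU/CPU memory for bf16 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Llama-3-Base-8B-SFT-DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits your hardware
    device_map="auto",           # requires the accelerate package
)

# Format a single-turn conversation with the tokenizer's chat template.
messages = [{"role": "user", "content": "Summarize DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding keeps the example deterministic; tune as needed.
outputs = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```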