princeton-nlp/Llama-3-Base-8B-SFT-ORPO

princeton-nlp/Llama-3-Base-8B-SFT-ORPO is an 8-billion-parameter language model based on the Llama 3 architecture, released by princeton-nlp. It is fine-tuned from the Llama-3-Base-8B-SFT checkpoint using ORPO (Odds Ratio Preference Optimization), and was released as one of the preference-optimization baselines accompanying the SimPO research. ORPO is reference-model-free: it aligns the model using the odds ratio between preferred and rejected responses under the policy's own likelihoods, without requiring a separate reference model.
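
Below is a minimal sketch of loading and querying the model with the Hugging Face transformers library. The generation settings and the bfloat16/device_map choices are illustrative assumptions rather than recommendations from the model card, and the snippet assumes the checkpoint ships a chat template (if it does not, tokenize a plain prompt string instead).

```python
# Minimal usage sketch; dtype, device placement, and decoding
# parameters are assumptions, not values from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Llama-3-Base-8B-SFT-ORPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits on one modern GPU
    device_map="auto",
)

# Assumes a chat template is bundled with the tokenizer.
messages = [
    {"role": "user", "content": "Explain preference optimization in one sentence."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```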

Status: Warm
Visibility: Public
Parameters: 8B
Precision: FP8
Context length: 8,192 tokens
Source: Hugging Face