princeton-nlp/Llama-3-Instruct-8B-ORPO

princeton-nlp/Llama-3-Instruct-8B-ORPO is an 8-billion-parameter language model from princeton-nlp, built on the Llama 3 architecture with an 8,192-token context length. As its name indicates, it is fine-tuned with ORPO (Odds Ratio Preference Optimization), a preference optimization method that, like SimPO (Simple Preference Optimization), does not require a reference model. It was released with the preprint "SimPO: Simple Preference Optimization with a Reference-Free Reward" as one of the baseline checkpoints that SimPO is compared against.
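The model name refers to ORPO, which adds an odds-ratio preference term to the ordinary supervised (negative log-likelihood) loss, so no frozen reference model is needed. Below is a minimal per-example sketch of that objective in plain Python. It assumes per-token log-probabilities for the chosen and rejected responses are already available. The function names and the weight `lam=0.1` are illustrative assumptions, not the training code or hyperparameters used for this checkpoint.

```python
import math
from typing import Sequence


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalized log-likelihood: the average per-token log-probability."""
    if not token_logprobs:
        raise ValueError("response must contain at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def log_odds(avg_logprob: float) -> float:
    """log(p / (1 - p)) with p = exp(avg_logprob), computed stably."""
    if avg_logprob >= 0:
        raise ValueError("average log-probability must be strictly negative")
    # log(1 - p) = log(-expm1(log p)) avoids cancellation when p is near 1
    return avg_logprob - math.log(-math.expm1(avg_logprob))


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def orpo_loss(
    chosen_logprobs: Sequence[float],
    rejected_logprobs: Sequence[float],
    lam: float = 0.1,  # hypothetical weight on the odds-ratio term
) -> float:
    """ORPO-style loss: SFT loss on the chosen response + lam * odds-ratio loss."""
    chosen = mean_logprob(chosen_logprobs)
    rejected = mean_logprob(rejected_logprobs)
    # Supervised term: average negative log-likelihood of the chosen response
    sft_loss = -chosen
    # Odds-ratio term: push the odds of the chosen response above the rejected one
    log_odds_ratio = log_odds(chosen) - log_odds(rejected)
    or_loss = -log_sigmoid(log_odds_ratio)
    return sft_loss + lam * or_loss


if __name__ == "__main__":
    chosen = [-0.5, -0.7, -0.3]   # toy per-token log-probs
    rejected = [-1.2, -1.5, -0.9]
    print(orpo_loss(chosen, rejected))
```

Because both terms use only the policy's own probabilities, the loss can be computed from a single model's forward passes. The same property is what makes SimPO reference-free, although SimPO uses a length-normalized reward margin rather than an odds ratio.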

Availability: Warm
Visibility: Public
Parameters: 8B
Precision: FP8
Context length: 8192 tokens
Source: Hugging Face
