princeton-nlp/Llama-3-Instruct-8B-CPO

princeton-nlp/Llama-3-Instruct-8B-CPO is an 8-billion-parameter language model released by Princeton NLP, built on the Llama-3 Instruct architecture. It is fine-tuned with CPO (Contrastive Preference Optimization) and was released as one of the baseline models in the group's SimPO (Simple Preference Optimization with a Reference-Free Reward) research preprint, which compares preference-optimization methods. The model is intended for instruction-following and chat tasks.
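Since the model follows the standard Llama-3 Instruct chat format, it can be loaded and queried with the Hugging Face `transformers` library. The sketch below is illustrative, not part of the model card: it assumes `transformers` and `torch` are installed and a GPU with enough memory is available; the helper `build_messages` and the sampling settings are this sketch's own choices.

```python
"""Minimal sketch: generating with princeton-nlp/Llama-3-Instruct-8B-CPO.

Assumes `transformers` and `torch` are installed and sufficient GPU memory
is available. Heavy imports are kept inside main() so the file can be
inspected or unit-tested without those dependencies present.
"""

MODEL_ID = "princeton-nlp/Llama-3-Instruct-8B-CPO"


def build_messages(user_prompt: str) -> list[dict]:
    """Wrap a single user turn in the chat-message format that
    tokenizer.apply_chat_template expects."""
    return [{"role": "user", "content": user_prompt}]


def main() -> None:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # half precision to fit an 8B model
        device_map="auto",
    )

    messages = build_messages("Explain preference optimization in one sentence.")
    # apply_chat_template inserts the Llama-3 special tokens and the
    # assistant header so the model continues as the assistant.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
    # Strip the prompt tokens and decode only the generated continuation.
    reply = tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
    print(reply)


if __name__ == "__main__":
    main()
```

For multi-turn use, extend the list returned by `build_messages` with alternating `assistant` and `user` entries before re-applying the chat template.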

Model details:
- Visibility: public
- Parameters: 8B
- Quantization: FP8
- Context length: 8192 tokens
- Hosted on: Hugging Face