princeton-nlp/Llama-3-Instruct-8B-CPO

princeton-nlp/Llama-3-Instruct-8B-CPO is an 8-billion-parameter language model released by Princeton NLP, built on the Llama-3 Instruct architecture. It is fine-tuned with CPO (Contrastive Preference Optimization) and was released as one of the baseline models in the group's SimPO (Simple Preference Optimization with a Reference-Free Reward) research preprint, which compares preference-optimization methods. The model is intended for instruction-following and chat tasks.
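Since the model follows the standard Llama-3 Instruct chat format, it can be loaded and queried with the Hugging Face `transformers` library. The sketch below is illustrative, not part of the model card: it assumes `transformers` and `torch` are installed and a GPU with enough memory is available; the helper `build_messages` and the sampling settings are this sketch's own choices.

```python
"""Minimal sketch: generating with princeton-nlp/Llama-3-Instruct-8B-CPO.

Assumes `transformers` and `torch` are installed and sufficient GPU memory
is available. Heavy imports are kept inside main() so the file can be
inspected or unit-tested without those dependencies present.
"""

MODEL_ID = "princeton-nlp/Llama-3-Instruct-8B-CPO"


def build_messages(user_prompt: str) -> list[dict]:
    """Wrap a single user turn in the chat-message format that
    tokenizer.apply_chat_template expects."""
    return [{"role": "user", "content": user_prompt}]


def main() -> None:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # half precision to fit an 8B model
        device_map="auto",
    )

    messages = build_messages("Explain preference optimization in one sentence.")
    # apply_chat_template inserts the Llama-3 special tokens and the
    # assistant header so the model continues as the assistant.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
    # Strip the prompt tokens and decode only the generated continuation.
    reply = tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
    print(reply)


if __name__ == "__main__":
    main()
```

For multi-turn use, extend the list returned by `build_messages` with alternating `assistant` and `user` entries before re-applying the chat template.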

Model details:
- Visibility: public
- Parameters: 8B
- Quantization: FP8
- Context length: 8192 tokens
- Hosted on: Hugging Face