princeton-nlp/Llama-3-Instruct-8B-SimPO-v0.2
Llama-3-Instruct-8B-SimPO-v0.2 is an 8-billion-parameter instruction-tuned language model developed by princeton-nlp, based on the Llama 3 architecture. It is fine-tuned with SimPO (Simple Preference Optimization), a preference-optimization method whose reward is reference-free: it is computed from the policy model's own log-probabilities, so no separate reference model is needed during training. The model is designed for general instruction-following tasks, and its 8,192-token context length lets it process longer inputs.
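To make the reference-free reward concrete, here is a minimal sketch of the SimPO objective for a single preference pair, written in plain Python. It assumes you already have per-token log-probabilities for the chosen and rejected responses under the model being trained. The function names and the `beta` and `gamma` defaults are illustrative assumptions, not the settings used to train this model.

```python
import math


def simpo_reward(token_logprobs, beta):
    """Implicit reward: beta times the mean per-token log-probability.

    The length normalization (mean rather than sum) is what lets the
    reward be defined without a reference model.
    """
    return beta * sum(token_logprobs) / len(token_logprobs)


def simpo_loss(chosen_logprobs, rejected_logprobs, beta=2.0, gamma=1.0):
    """Loss for one pair: -log(sigmoid(r_chosen - r_rejected - gamma)).

    beta and gamma are hypothetical example values; gamma is the target
    margin by which the chosen reward should exceed the rejected one.
    """
    margin = (
        simpo_reward(chosen_logprobs, beta)
        - simpo_reward(rejected_logprobs, beta)
        - gamma
    )
    # Numerically stable evaluation of -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Made-up log-probabilities, for illustration only.
chosen = [-0.1, -0.2, -0.3]
rejected = [-1.0, -2.0]
loss = simpo_loss(chosen, rejected)
```

The loss shrinks as the chosen response's average log-probability rises above the rejected one's by more than the margin, which is the behavior preference training is meant to encourage.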