princeton-nlp/Llama-3-Base-8B-SFT-KTO
princeton-nlp/Llama-3-Base-8B-SFT-KTO is an 8-billion-parameter language model from princeton-nlp, built on the Llama-3 architecture. As the name indicates, it starts from a supervised fine-tuned (SFT) checkpoint and is then aligned with KTO (Kahneman-Tversky Optimization), a preference optimization method that learns from binary desirable/undesirable feedback on individual responses rather than from paired preference data. This makes it suited to alignment settings where only unpaired human feedback is available, offering a distinct alternative to pairwise methods such as DPO.
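
For context, KTO optimizes a prospect-theoretic objective rather than a pairwise ranking loss. The sketch below roughly follows the notation of the KTO paper (Ethayarajh et al., 2024); details such as the exact batch estimator for the reference point z_0 are simplified here, so treat it as illustrative rather than the exact training objective used for this checkpoint.

```latex
% Rough sketch of the KTO objective (simplified; see the KTO paper for details).
\[
r_\theta(x, y) = \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
z_0 \approx \mathrm{KL}\!\left(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right)
\]
\[
v(x, y) =
\begin{cases}
\lambda_D \, \sigma\!\bigl(\beta \, (r_\theta(x, y) - z_0)\bigr) & \text{if } y \text{ is desirable} \\
\lambda_U \, \sigma\!\bigl(\beta \, (z_0 - r_\theta(x, y))\bigr) & \text{if } y \text{ is undesirable}
\end{cases}
\]
\[
\mathcal{L}_{\mathrm{KTO}}(\pi_\theta; \pi_{\mathrm{ref}})
= \mathbb{E}_{(x, y) \sim \mathcal{D}}\bigl[\lambda_y - v(x, y)\bigr]
\]
```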
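
A minimal usage sketch with the Hugging Face transformers library. It assumes transformers, torch, and accelerate are installed and a GPU with enough memory for an 8B model in bfloat16 is available; the prompt and generation settings are purely illustrative, and no particular chat template is assumed.

```python
# Minimal usage sketch (assumes transformers, torch, and accelerate are installed
# and a GPU with enough memory for an 8B model in bfloat16 is available).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Llama-3-Base-8B-SFT-KTO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit the 8B weights on one GPU
    device_map="auto",           # let accelerate place the weights automatically
)

# Illustrative prompt; adapt the formatting (e.g. a chat template) to your use case.
prompt = "Explain the difference between paired and unpaired preference data."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```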