albertfares/MNLP_SFT_DPO
albertfares/MNLP_SFT_DPO is a 0.8-billion-parameter causal language model fine-tuned from Qwen/Qwen3-0.6B-Base using filtered Direct Preference Optimization (fDPO) on the MNLP M3 DPO dataset of approximately 69,000 samples. The model targets tasks that benefit from preference-based fine-tuning, and its weights are stored in the SafeTensors format, which loads faster than pickle-based checkpoints and avoids their arbitrary-code-execution risk.
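
The model can be loaded through the standard Hugging Face `transformers` API. The sketch below is a minimal usage example, not part of the original card: the prompt text and decoding parameters are illustrative assumptions, and it presumes `transformers` and `torch` are installed.

```python
# Minimal usage sketch: load the model and generate a short completion.
# The prompt and sampling settings below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "albertfares/MNLP_SFT_DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain Direct Preference Optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")

# Modest sampling settings (assumed; the card does not specify generation defaults).
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```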