berkeley-nest/Starling-LM-7B-alpha
Starling-LM-7B-alpha is a 7-billion-parameter language model developed by Banghua Zhu, Evan Frick, Tianhao Wu, Hanlin Zhu, and Jiantao Jiao, fine-tuned from OpenChat 3.5 (which is in turn based on Mistral-7B-v0.1). The model is optimized with Reinforcement Learning from AI Feedback (RLAIF) using the advantage-induced policy alignment (APA) method, trained on the GPT-4-labeled Nectar ranking dataset. It achieves an MT-Bench score of 8.09, outperforming many models in its class in both helpfulness and harmlessness.
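As a usage sketch (not part of the description above): assuming the model is available on the Hugging Face Hub as `berkeley-nest/Starling-LM-7B-alpha` and, since it is fine-tuned from OpenChat 3.5, accepts OpenChat's single-turn chat template, inference with the `transformers` library might look like this. The prompt template and the `RUN_INFERENCE` flag are illustrative assumptions.

```python
def build_prompt(user_message: str) -> str:
    # Starling-LM-7B-alpha is fine-tuned from OpenChat 3.5, so we assume
    # prompts follow the OpenChat single-turn chat template.
    return f"GPT4 Correct User: {user_message}<|end_of_turn|>GPT4 Correct Assistant:"

# Set to True to download the ~7B model and run generation; left False so
# the prompt-formatting helper can be exercised without heavy dependencies.
RUN_INFERENCE = False

if RUN_INFERENCE:
    from transformers import AutoTokenizer, AutoModelForCausalLM

    model_id = "berkeley-nest/Starling-LM-7B-alpha"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = build_prompt("Explain RLAIF in one sentence.")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    # Decode only the newly generated tokens, not the echoed prompt.
    print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                           skip_special_tokens=True))
else:
    print(build_prompt("Explain RLAIF in one sentence."))
```

The helper keeps the chat template in one place so multi-turn variants can extend it later.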