mlxha/Qwen3-4B-grpo-medmcqa

The mlxha/Qwen3-4B-grpo-medmcqa model is a 4-billion-parameter language model based on the Qwen/Qwen3-4B architecture, fine-tuned by mlxha. It was trained with GRPO (Group Relative Policy Optimization) on the medmcqa-grpo dataset, specializing it for medical multiple-choice question answering. GRPO is a reinforcement-learning method that scores each sampled answer relative to other answers drawn for the same question, which strengthens the model's reasoning in this specialized domain.
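The core idea behind GRPO can be sketched briefly: for each question, a group of candidate answers is sampled and scored, and each answer's advantage is its reward normalized against the rest of its group. The reward scheme below (1.0 for a correct choice, 0.0 otherwise) is an illustrative assumption, not the exact recipe used for this model.

```python
# Hedged sketch of GRPO's group-relative advantage computation.
# Assumption: a simple 0/1 correctness reward per sampled answer.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: 4 sampled answers to one MCQA item, two of them correct.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers receive positive advantages and incorrect ones negative, so the policy update pushes probability mass toward answers that beat their own group's average, without needing a separate value model.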

Serving status: Warm
Visibility: Public
Parameters: 4B
Precision: BF16
Context length: 32768 tokens
Source: Hugging Face
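A minimal sketch of querying the model with the Hugging Face `transformers` library is shown below. The prompt layout (lettered options followed by "Answer:") is an assumption for illustration, not the dataset's official template; `run_example` is a hypothetical helper name.

```python
# Hedged sketch: formatting a medical MCQA item and generating an answer.
# The prompt format is an assumption, not a documented template.

def build_prompt(question, options):
    """Format a multiple-choice question as a single prompt string."""
    lines = [question]
    lines += [f"{label}. {text}" for label, text in zip("ABCD", options)]
    lines.append("Answer:")
    return "\n".join(lines)

def run_example():
    # Requires `transformers` and `torch`; downloads the 4B checkpoint.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mlxha/Qwen3-4B-grpo-medmcqa"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

    prompt = build_prompt(
        "Which vitamin deficiency causes scurvy?",
        ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"],
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=16)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loading in BF16 matches the precision listed above; the 32768-token context leaves ample room for longer clinical vignettes in the question text.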
