mlxha/Qwen3-4B-grpo-medmcqa

mlxha/Qwen3-4B-grpo-medmcqa is a 4-billion-parameter language model fine-tuned by mlxha from the Qwen/Qwen3-4B base model. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method, on the medmcqa-grpo dataset, which specializes it for medical multiple-choice question answering. The reinforcement learning stage is intended to improve the model's reasoning in this domain.

Model Size: 4B

Quantization: BF16

Context length: 32,768 tokens
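
For readers unfamiliar with GRPO, the following is a minimal sketch of its core idea as it might apply to multiple-choice answers: several completions are sampled for the same question, each is scored, and each score is normalized against the group's mean and standard deviation. The reward function, the answer-extraction regex, and the sample completions are hypothetical illustrations, not the training code or reward used for this model, and normalization details vary between implementations.

```python
import re
from statistics import mean, pstdev

# Hypothetical answer extractor: a standalone option letter A-D.
ANSWER_RE = re.compile(r"\b([A-D])\b")


def correctness_reward(completion: str, gold: str) -> float:
    """Assumed reward: 1.0 if the last option letter in the text matches gold."""
    found = ANSWER_RE.findall(completion)
    return 1.0 if found and found[-1] == gold else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four made-up completions sampled for one question whose correct option is "B".
completions = [
    "Answer: B",
    "Answer: C",
    "I think the answer is B",
    "Answer: D",
]
rewards = [correctness_reward(c, "B") for c in completions]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average receive a positive advantage and are reinforced, while those below it are discouraged. If every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.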
