lastmass/Qwen3_Medical_GRPO
lastmass/Qwen3_Medical_GRPO is a 4-billion-parameter language model based on Qwen3, developed by lastmass and fine-tuned for the medical domain. Training combines multi-stage Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO) using accuracy-based reward functions, with the aim of improving medical knowledge, logical reasoning, and reliability. The model is intended for understanding complex medical problems, providing detailed logical analysis, and delivering structured solutions in healthcare contexts.
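To make the GRPO step more concrete, here is a minimal sketch of how an accuracy-based reward can be turned into group-relative advantages. It is not the training code used for this model: the function names `accuracy_reward` and `group_relative_advantages`, the substring-match scoring rule, and the toy completions are all assumptions made for illustration. The core idea it shows is the one GRPO is generally described with: several completions are sampled for the same prompt, each is scored, and each score is normalized against the mean and standard deviation of its own group.

```python
from statistics import mean, pstdev


def accuracy_reward(completion: str, reference: str) -> float:
    """Hypothetical accuracy reward: 1.0 if the reference answer
    appears in the completion (case-insensitive), otherwise 0.0."""
    return 1.0 if reference.strip().lower() in completion.lower() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    Uses the population standard deviation; real implementations
    may use the sample standard deviation instead (assumption).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy group of completions sampled for one prompt (illustrative only).
reference = "appendicitis"
completions = [
    "The most likely diagnosis is appendicitis.",
    "This presentation suggests gastroenteritis.",
    "Findings are consistent with acute appendicitis.",
    "The likely cause is a urinary tract infection.",
]

rewards = [accuracy_reward(c, reference) for c in completions]
advantages = group_relative_advantages(rewards)

for text, adv in zip(completions, advantages):
    print(f"{adv:+.3f}  {text}")
```

Completions that match the reference receive a positive advantage and the others a negative one. If every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.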