lastmass/Qwen3_Medical_GRPO

lastmass/Qwen3_Medical_GRPO is a 4-billion-parameter language model based on Qwen3, developed by lastmass and fine-tuned for the medical domain. Training combines multi-stage Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO), using accuracy-based reward functions to strengthen the model's medical knowledge, logical reasoning, and reliability. The model is designed to work through complex medical problems, lay out its reasoning in detail, and return structured solutions in healthcare contexts.
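To make the GRPO step more concrete, the sketch below shows how rewards for a group of sampled answers to one prompt can be turned into group-relative advantages. It is illustrative only: the model's actual reward functions and training code are not described here, so `accuracy_reward`, `group_relative_advantages`, the exact-match rule, the use of the population standard deviation, and the sample answers are all assumptions made for this example.

```python
from statistics import mean, pstdev


def accuracy_reward(predicted: str, reference: str) -> float:
    """Hypothetical accuracy reward: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if predicted.strip().lower() == reference.strip().lower() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Score each reward relative to the mean and spread of its own group.

    GRPO compares completions sampled for the same prompt against each other
    instead of relying on a separate learned value function.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation (an assumption of this sketch)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up group of four sampled answers to a single prompt.
completions = ["Pneumonia", "pneumonia ", "Bronchitis", "Asthma"]
reference = "pneumonia"

rewards = [accuracy_reward(c, reference) for c in completions]
advantages = group_relative_advantages(rewards)

for text, reward, adv in zip(completions, rewards, advantages):
    print(f"{text.strip():<12} reward={reward:.1f} advantage={adv:+.3f}")
```

Answers that beat the group average receive a positive advantage and are reinforced, while those below it are discouraged. If every answer in a group earns the same reward, all advantages are zero and that group contributes no learning signal.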

Visibility: Public
Parameters: 4B
Precision: BF16
Context length: 40960 tokens
License: apache-2.0
Hugging Face