TachyHealth/Gazal-R1-32B-GRPO-preview

Gazal-R1-32B-GRPO-preview is a 32.8 billion parameter causal language model developed by TachyHealth, built upon Qwen 3 32B. It is specifically designed and fine-tuned for medical reasoning and clinical decision-making, leveraging a two-stage training pipeline including Group Relative Policy Optimization (GRPO). This model excels at diagnostic reasoning, treatment planning, and prognostic assessment, achieving state-of-the-art performance on medical benchmarks like MedQA and MMLU Pro (Medical).

Cold
Public
32B
FP8
32768
License: apache-2.0
Hugging Face