dmis-lab/llama-3.1-medprm-reward-v1.0

dmis-lab/llama-3.1-medprm-reward-v1.0 is an 8-billion-parameter Process Reward Model (PRM) developed by dmis-lab for the medical domain, with a 32,768-token context length. It uses retrieval-augmented generation (RAG) to bring clinical knowledge into its verification of reasoning. The model is aimed at test-time compute scaling on complex medical reasoning tasks: it outperforms majority-voting ensembles and achieves a score above 80 on the MedQA (4-option) benchmark when combined with llama-3-meerkat-8b-v1.0.
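To make the comparison with majority voting concrete, here is a minimal sketch of how a process reward model can be used for test-time scaling: several candidate solutions are sampled from a policy model, each is given per-step scores, and the highest-scoring candidate is selected. The `Candidate` type, the example scores, and the choice of taking the minimum step score as a solution's overall score are all assumptions made for illustration; they are not taken from the model's documentation, and no actual model is called.

```python
from collections import Counter
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One sampled solution (hypothetical structure for illustration)."""
    answer: str               # final answer option, e.g. "A".."D"
    step_scores: List[float]  # assumed per-step PRM scores in [0, 1]


def majority_vote(candidates: List[Candidate]) -> str:
    """Baseline: return the most frequent final answer."""
    counts = Counter(c.answer for c in candidates)
    return counts.most_common(1)[0][0]


def best_of_n(candidates: List[Candidate]) -> str:
    """PRM-guided selection: return the answer of the best-scored solution.

    Assumption: a solution is only as good as its weakest step, so the
    minimum step score is used as the solution-level score.
    """
    best = max(candidates, key=lambda c: min(c.step_scores))
    return best.answer


# Made-up scores: two weakly supported "B" solutions and one
# consistently well-scored "C" solution.
candidates = [
    Candidate("B", [0.9, 0.4, 0.3]),
    Candidate("B", [0.8, 0.5, 0.2]),
    Candidate("C", [0.9, 0.9, 0.8]),
]

print(majority_vote(candidates))  # B
print(best_of_n(candidates))      # C
```

In this toy case the two strategies disagree: majority voting follows the more frequent answer, while the reward-guided selection prefers the single solution whose every step scored well. Other aggregations (product or last-step score, for example) are equally plausible choices.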

Cold
Public
8B
FP8
32768
License: MIT
Hugging Face
