dmis-lab/llama-3.1-medprm-reward-v1.0
dmis-lab/llama-3.1-medprm-reward-v1.0 is an 8-billion-parameter Process Reward Model (PRM) developed by dmis-lab for the medical domain, with a 32,768-token context length. It integrates clinical knowledge through retrieval-augmented generation (RAG) to strengthen its verification capabilities. The model is built for scaling test-time computation on complex medical reasoning tasks: it outperforms majority-voting ensembles and, when combined with llama-3-meerkat-8b-v1.0, achieves a score above 80 on the MedQA (4-option) benchmark.
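To illustrate why a PRM can beat majority voting at test time, here is a minimal sketch of best-of-N selection with step-level scores. It does not call this model or reproduce its prompt format; the per-step scores are hypothetical numbers standing in for a PRM's outputs, and aggregating them with `min()` (a solution is only as strong as its weakest step) is an assumed choice, not necessarily the one dmis-lab uses. The helper names `majority_vote`, `solution_score`, and `prm_best_of_n` are illustrative.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Each candidate is (final_answer, per_step_scores). The scores are hypothetical
# values in [0, 1] standing in for a PRM's per-step correctness estimates.
Candidate = Tuple[str, Sequence[float]]


def majority_vote(candidates: List[Candidate]) -> str:
    """Return the most common final answer (ties go to the first one seen)."""
    counts = Counter(answer for answer, _ in candidates)
    return counts.most_common(1)[0][0]


def solution_score(step_scores: Sequence[float]) -> float:
    """Assumed aggregation: score a solution by its weakest step."""
    return min(step_scores)


def prm_best_of_n(candidates: List[Candidate]) -> str:
    """Return the answer of the candidate with the highest aggregated PRM score."""
    best_answer, _ = max(candidates, key=lambda c: solution_score(c[1]))
    return best_answer


# Hypothetical example: two weak reasoning chains agree on "B",
# while one chain with consistently well-scored steps answers "C".
candidates: List[Candidate] = [
    ("B", [0.9, 0.4, 0.3]),
    ("B", [0.8, 0.5, 0.2]),
    ("C", [0.95, 0.9, 0.85]),
]

print(majority_vote(candidates))  # prints "B"
print(prm_best_of_n(candidates))  # prints "C"
```

Majority voting only counts final answers, so frequent but poorly reasoned answers win; a verifier that scores each step can instead select the candidate whose reasoning holds up throughout.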