dmis-lab/llama-3.1-medprm-reward-v1.0

dmis-lab/llama-3.1-medprm-reward-v1.0 is an 8-billion-parameter Process Reward Model (PRM) developed by dmis-lab for the medical domain, with a 32768-token context length. It uses retrieval-augmented generation (RAG) to bring clinical knowledge into its verification of reasoning steps. The model is suited to test-time compute scaling on complex medical reasoning tasks, where it outperforms majority-voting ensembles and reaches a score above 80 on the MedQA (4-option) benchmark when combined with llama-3-meerkat-8b-v1.0.
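The difference between majority voting and PRM-guided selection described above can be illustrated with a minimal sketch. It does not call the model. The per-step scores are made-up placeholder values, and aggregating them with the minimum is an assumption chosen for illustration, not necessarily what the authors use. The function names `majority_vote` and `best_of_n` are hypothetical.

```python
from collections import Counter


def majority_vote(answers):
    """Return the answer that appears most often among the samples."""
    return Counter(answers).most_common(1)[0][0]


def best_of_n(candidates):
    """Return the answer of the candidate with the best aggregated PRM score.

    Each candidate is (final_answer, step_scores). The aggregation used here
    is the minimum step score (an assumption): a reasoning chain is treated
    as only as reliable as its weakest step.
    """
    return max(candidates, key=lambda c: min(c[1]))[0]


# Hypothetical sampled solutions to one 4-option question.
# The step scores are placeholders standing in for PRM outputs.
candidates = [
    ("A", [0.90, 0.40, 0.30]),
    ("A", [0.80, 0.50, 0.20]),
    ("C", [0.90, 0.85, 0.95]),
]

voted = majority_vote([answer for answer, _ in candidates])
selected = best_of_n(candidates)
print(voted, selected)
```

In this toy case the most frequent answer and the answer backed by the highest-scored reasoning chain differ, which is the situation where a step-level verifier can help over simple voting.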

Cold

Public

Model Size: 8B

Quant: FP8

Ctx length: 32768

License: MIT

