Gen-Verse/ReasonFlux-PRM-1.5B

Gen-Verse/ReasonFlux-PRM-1.5B is a 1.5-billion-parameter, trajectory-aware process reward model (PRM) for evaluating reasoning traces. It combines step-level and trajectory-level supervision to assign fine-grained rewards aligned with structured chain-of-thought data. The model supports both offline and online reward supervision, so it can be used for training-data selection, as a reward signal in reinforcement learning, and for reward-guided test-time scaling. Its small size keeps inference cost low, which suits resource-constrained applications and edge deployment.
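The uses listed above (offline data selection and reward-guided test-time scaling) can be sketched as follows. This is a minimal illustration under stated assumptions, not the model's API: `score` is a hypothetical stand-in for a call to the PRM, and the `aggregate` blend of the mean step-level reward with a trajectory-level reward (weighted by a hypothetical `alpha`) is an assumption for illustration, not the model's documented formula.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer: returns one scalar reward for a (question, reasoning trace) pair.
# It stands in for querying the PRM; the real model's input/output format is not shown here.
ScoreFn = Callable[[str, str], float]


def aggregate(step_rewards: Sequence[float], traj_reward: float, alpha: float = 0.5) -> float:
    """Assumed blend: alpha * mean(step rewards) + (1 - alpha) * trajectory reward."""
    if not step_rewards:
        return traj_reward
    step_mean = sum(step_rewards) / len(step_rewards)
    return alpha * step_mean + (1.0 - alpha) * traj_reward


def best_of_n(question: str, candidates: Sequence[str], score: ScoreFn) -> Tuple[str, float]:
    """Best-of-N test-time scaling: return the highest-reward candidate and its score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    scored = [(c, score(question, c)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


def select_top_k(examples: Sequence[Tuple[str, str]], score: ScoreFn, k: int) -> List[Tuple[str, str]]:
    """Offline data selection: keep the k (question, trace) pairs with the highest reward."""
    ranked = sorted(examples, key=lambda ex: score(ex[0], ex[1]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy scorer for demonstration only: longer traces score higher.
    toy_score: ScoreFn = lambda q, trace: float(len(trace))
    print(best_of_n("2+2?", ["4", "2+2=4", "so 4"], toy_score))
    print(aggregate([1.0, 0.0], 1.0))
```

With the toy scorer, running the module prints `('2+2=4', 5.0)` and `0.75`; in practice the scorer would wrap real PRM calls. Best-of-N is only the simplest form of reward-guided test-time scaling; step-level search methods are other options.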

Public

Model Size: 1.5B

Quant: BF16

Ctx length: 131,072 tokens

License: MIT
