nvidia/Qwen3-Nemotron-32B-RLBFF

The nvidia/Qwen3-Nemotron-32B-RLBFF is a 32 billion parameter large language model developed by NVIDIA, built upon the Qwen/Qwen3-32B foundation. It is fine-tuned using Reinforcement Learning from Binary Flexible Feedback (RLBFF) to enhance the quality of LLM-generated responses in a default thinking mode. This research model excels at generating responses to multi-turn user queries, demonstrating improved performance on benchmarks like Arena Hard V2, WildBench, and MT Bench compared to its base model.

Warm
Public
32B
FP8
32768
License: other
Hugging Face

No reviews yet. Be the first to review!