nvidia/Qwen3-Nemotron-32B-RLBFF

nvidia/Qwen3-Nemotron-32B-RLBFF is a 32-billion-parameter large language model developed by NVIDIA and built on Qwen/Qwen3-32B. It is fine-tuned with Reinforcement Learning from Binary Flexible Feedback (RLBFF) to improve response quality in its default thinking mode. It is a research model aimed at answering multi-turn user queries, and it shows improved performance over its base model on benchmarks such as Arena Hard V2, WildBench, and MT Bench.

Warm

Public

Model Size: 32B

Quant: FP8

Ctx length: 32768

License: other

Hugging Face
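
As a rough illustration of what the listed specs imply, the sketch below estimates the size of the weights and the token budget left for generation. It rests on assumptions the listing does not confirm: that every weight is stored in FP8 (1 byte each), that "32B" is a nominal parameter count, and that the 32768-token context window covers prompt and output together (including any thinking tokens). The function names are hypothetical helpers, not part of any library.

```python
# Back-of-the-envelope sizing from the listed specs. Weights only:
# KV cache, activations, and runtime overhead are NOT included.

PARAMS = 32e9          # "Model Size: 32B" (nominal; the exact count may differ)
BYTES_PER_PARAM = 1    # "Quant: FP8": 8 bits = 1 byte per weight (assumes all weights are FP8)
CTX_LENGTH = 32768     # "Ctx length: 32768" tokens


def weight_bytes(params: float, bytes_per_param: float) -> float:
    """Approximate storage for the weights alone, in bytes."""
    return params * bytes_per_param


def max_new_tokens(prompt_tokens: int, ctx_length: int = CTX_LENGTH) -> int:
    """Tokens left for generation, assuming the window covers prompt plus output."""
    return max(ctx_length - prompt_tokens, 0)


if __name__ == "__main__":
    size = weight_bytes(PARAMS, BYTES_PER_PARAM)
    print(f"FP8 weights: ~{size / 1e9:.0f} GB ({size / 2**30:.1f} GiB)")
    print(f"Room left after a 30000-token prompt: {max_new_tokens(30000)} tokens")
```

Under these assumptions the weights alone come to roughly 32 GB, so actual serving memory will be higher once the KV cache and runtime overhead are added.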