nvidia/Nemotron-Cascade-8B
Nemotron-Cascade-8B is a powerful general-purpose language model developed by NVIDIA, post-trained from Qwen3-8B-Base. It is trained with a sequential, domain-wise reinforcement learning (Cascade RL) pipeline and can operate in both "thinking" and "instruct" modes. This 8-billion-parameter model achieves best-in-class performance across diverse benchmarks, including knowledge reasoning, alignment, mathematics, and competitive programming, notably matching the LiveCodeBench scores of much larger models such as DeepSeek-R1-0528 (671B). Its primary use case is complex reasoning and instruction following, with robust support for long contexts up to 64K tokens via YaRN scaling.
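Since the model derives from Qwen3-8B-Base, long-context support via YaRN is typically enabled by adding a `rope_scaling` entry to the model's `config.json`, as in other Qwen3-based models. The snippet below is a hedged sketch of what such a configuration might look like; the exact field names and the native context length (assumed here to be 32K, scaled by a factor of 2.0 to reach 64K) should be verified against the official model card before use.

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 2.0,
    "original_max_position_embeddings": 32768
  }
}
```

Note that static YaRN scaling applies to all inputs regardless of length, so it is generally recommended to enable it only when your prompts actually exceed the native context window.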