BytedTsinghua-SIA/DAPO-Qwen-32B

DAPO-Qwen-32B is a 32.8B-parameter language model from BytedTsinghua-SIA, built on the Qwen2.5-32B base model and trained with the DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization) reinforcement-learning algorithm, with a focus on mathematical problem solving. With a context length of 131,072 tokens, the model is suited to complex reasoning tasks that require long, step-by-step mathematical solutions.

Status: Warm
Visibility: Public
Parameters: 32.8B
Quantization: FP8
Context length: 131,072 tokens
License: apache-2.0
Weights: Hugging Face
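
Since the checkpoint is published on Hugging Face and derives from Qwen2.5-32B, it should load with the standard transformers API. The snippet below is a minimal usage sketch under that assumption; the prompt and generation settings are illustrative, not documented defaults.

```python
# Minimal sketch: load DAPO-Qwen-32B with the standard transformers API
# and ask it a step-by-step math question. Prompt and generation settings
# are illustrative assumptions, not values documented for this model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BytedTsinghua-SIA/DAPO-Qwen-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # shard across available GPUs (requires accelerate)
)

# Assumes the tokenizer ships a chat template, as Qwen2.5-family checkpoints do.
messages = [
    {"role": "user",
     "content": "Solve step by step: what is the sum of the first 100 positive integers?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

A 32.8B-parameter model at this context length will not fit on a single consumer GPU; `device_map="auto"` spreads the weights across whatever devices are available, or a hosted endpoint can be used instead.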