rl-research/DR-Tulu-8B

DR-Tulu-8B is an 8-billion-parameter deep research agent developed by rl-research. It is built on the DR-Tulu-SFT-8B model and was further trained with reinforcement learning (RL) for tool use within the dr-agent-lib framework. It is designed for complex, research-oriented tasks. On benchmarks such as SQAv2, HealthBench, and DeepResearch Bench, it outperforms both its SFT counterpart and other 8B models.

Parameters: 8B
Quantization: FP8
Context length: 32768 tokens
License: apache-2.0