Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404
Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404 is a 3-billion-parameter language model from Menlo Research, built on the Llama-3.2-3B backbone with a 32,768-token context length. It is trained with reinforcement learning (GRPO) to develop effective search behaviors: the model interacts with a synthetic search engine, refining its queries until it finds the exact answer. The emphasis is on persistent, adaptive information retrieval rather than static memorization of facts.
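The retrieve-and-refine behavior described above can be sketched as a simple agent loop. Everything in this example is an assumption for illustration: the `<search>`/`<answer>` tag format, the stub standing in for the LLM, and the toy in-memory index are not taken from the released model or its training code.

```python
# Hypothetical sketch of the query-refinement loop the model is trained for.
# The tag format, the stub "model", and the toy index are all assumptions.

def toy_search(query, index):
    """Toy retrieval: return documents containing every query term."""
    terms = query.lower().split()
    return [doc for doc in index if all(t in doc.lower() for t in terms)]

def stub_model(question, results):
    """Stand-in for the LLM policy: broaden, then refine, then answer."""
    if not results:
        return "<search>capital France</search>"            # initial broad query
    if len(results) > 1:
        return "<search>capital of France city</search>"    # refined query
    return f"<answer>{results[0].split(': ')[1]}</answer>"  # exact answer found

def run_loop(question, index, max_turns=4):
    """Alternate search actions and retrieval until an answer is emitted."""
    results = []
    for _ in range(max_turns):
        action = stub_model(question, results)
        if action.startswith("<answer>"):
            return action[len("<answer>"):-len("</answer>")]
        query = action[len("<search>"):-len("</search>")]
        results = toy_search(query, index)
    return None  # gave up without an answer

index = [
    "capital of France city: Paris",
    "capital France history: overview",
]
print(run_loop("What is the capital of France?", index))
```

The stub here hard-codes one refinement step; the actual model learns this policy from reward signals, retrying and reformulating queries until retrieval succeeds.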