tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1

tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1 is a 27-billion-parameter pre-trained language model from tokyotech-llm, built on the Gemma 2 architecture. It features a 32,768-token context length and was continually pre-trained on approximately 200 billion tokens, substantially strengthening its Japanese language capabilities while preserving strong English performance. The model excels at Japanese language tasks, outperforming its Gemma 2 base and other models in its size class across a range of benchmarks.

Visibility: Public
Parameters: 27B
Precision: FP8
Context length: 32768 tokens
License: gemma
Source: Hugging Face
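
As a hedged illustration, the sketch below shows one way the checkpoint could be loaded and queried with the Hugging Face Transformers library. It is not an official example from the model card: the prompt string, dtype, and generation settings are illustrative assumptions, and running a 27B model in bfloat16 requires roughly 54 GB of accelerator memory (plus the `accelerate` package for `device_map="auto"`).

```python
# Minimal usage sketch (assumptions: transformers, torch, and accelerate are
# installed, and the hardware can hold a 27B checkpoint in bfloat16).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory versus fp32; adjust to your hardware
    device_map="auto",           # spread layers across available GPUs/CPU
)

# This is a pre-trained (base) checkpoint, so prompt it for plain text
# continuation rather than chat-style instructions. The Japanese prompt
# below is an arbitrary example, not taken from the model card.
prompt = "東京工業大学の主なキャンパスは、"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```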
