tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1

tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1 is a 27-billion-parameter pre-trained language model from tokyotech-llm, built on the Gemma 2 architecture. It features a 32,768-token context length and was continually pre-trained on approximately 200 billion tokens, substantially strengthening its Japanese language capabilities while preserving strong English performance. The model excels at Japanese language tasks, outperforming its Gemma 2 base and other models in its size class across a range of benchmarks.

Visibility: Public
Parameters: 27B
Precision: FP8
Context length: 32768 tokens
License: gemma
Source: Hugging Face
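
As a hedged illustration, the sketch below shows one way the checkpoint could be loaded and queried with the Hugging Face Transformers library. It is not an official example from the model card: the prompt string, dtype, and generation settings are illustrative assumptions, and running a 27B model in bfloat16 requires roughly 54 GB of accelerator memory (plus the `accelerate` package for `device_map="auto"`).

```python
# Minimal usage sketch (assumptions: transformers, torch, and accelerate are
# installed, and the hardware can hold a 27B checkpoint in bfloat16).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory versus fp32; adjust to your hardware
    device_map="auto",           # spread layers across available GPUs/CPU
)

# This is a pre-trained (base) checkpoint, so prompt it for plain text
# continuation rather than chat-style instructions. The Japanese prompt
# below is an arbitrary example, not taken from the model card.
prompt = "東京工業大学の主なキャンパスは、"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```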
