tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1

The Gemma-2-Llama-Swallow-2b-pt-v0.1 model from tokyotech-llm is a 2.6-billion-parameter pre-trained language model built on the Gemma 2 architecture, with a context length of 8192 tokens. It was continually pre-trained on roughly 200 billion tokens drawn from a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia, and mathematical and coding content. This continual pre-training substantially improves Japanese language capabilities while retaining strong English performance, making the model well suited to bilingual applications that require robust understanding and generation in Japanese and English.

Parameters: 2.6B
Precision: BF16
Context length: 8192 tokens
License: gemma
Hosted on: Hugging Face
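
Since this is a standard Gemma 2 checkpoint, it can be loaded with the Hugging Face transformers library. Below is a minimal sketch of loading the model in BF16 (matching the weights listed above) and generating a short Japanese continuation. The prompt text and generation settings are illustrative assumptions, and the snippet assumes a transformers release with Gemma 2 support plus enough memory for a 2.6B-parameter model; note that the "-pt-" variant is a base model, so it continues plain text rather than following chat-style instructions.

```python
# Minimal sketch: load the model in BF16 and generate a continuation.
# Assumes a transformers version with Gemma 2 support and, for
# device_map="auto", the accelerate package installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 weights listed above
    device_map="auto",           # place weights on GPU if one is available
)

# Illustrative Japanese prompt; as a base (pre-trained) model, it will
# simply continue the text.
prompt = "東京工業大学の主なキャンパスは、"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```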