tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1
The Gemma-2-Llama-Swallow-2b-pt-v0.1 model by tokyotech-llm is a 2.6-billion-parameter pre-trained language model built on the Gemma 2 architecture, with a context length of 8192 tokens. It was continually pre-trained on approximately 200 billion tokens, drawn from a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia, and mathematical and coding content. This continual pre-training substantially enhances the model's Japanese capabilities while retaining the strong English performance of the base Gemma 2 model, making it well suited to bilingual applications that require robust understanding and generation in both languages.
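Because this is a pre-trained (not instruction-tuned) checkpoint, it is used for plain text completion rather than chat. Below is a minimal loading-and-generation sketch assuming the standard Hugging Face transformers API for Gemma 2 checkpoints; the sampling parameters and the Japanese prompt are illustrative assumptions, not values taken from the model card.

```python
# Minimal usage sketch (assumes: pip install transformers torch accelerate).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma 2 models are commonly run in bfloat16
    device_map="auto",
)

# Base model: plain completion, no chat template.
# Illustrative prompt: "The main campuses of Tokyo Institute of Technology are ..."
prompt = "東京工業大学の主なキャンパスは、"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,       # illustrative sampling settings
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since the model was continually pre-trained on both Japanese and English data, the same completion loop works with prompts in either language.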