tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1
The Gemma-2-Llama-Swallow-2b-pt-v0.1 model by tokyotech-llm is a 2.6-billion-parameter pre-trained language model built on the Gemma 2 architecture, with a context length of 8192 tokens. It was continually pre-trained on approximately 200 billion tokens, drawn from a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia, and mathematical and coding content. This continual pre-training substantially enhances the model's Japanese capabilities while retaining the strong English performance of the base Gemma 2 model, making it well suited to bilingual applications that require robust understanding and generation in both languages.
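Because this is a pre-trained (not instruction-tuned) checkpoint, it is used for plain text completion rather than chat. Below is a minimal loading-and-generation sketch assuming the standard Hugging Face transformers API for Gemma 2 checkpoints; the sampling parameters and the Japanese prompt are illustrative assumptions, not values taken from the model card.

```python
# Minimal usage sketch (assumes: pip install transformers torch accelerate).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma 2 models are commonly run in bfloat16
    device_map="auto",
)

# Base model: plain completion, no chat template.
# Illustrative prompt: "The main campuses of Tokyo Institute of Technology are ..."
prompt = "東京工業大学の主なキャンパスは、"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,       # illustrative sampling settings
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since the model was continually pre-trained on both Japanese and English data, the same completion loop works with prompts in either language.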