tokyotech-llm/Llama-3.1-Swallow-8B-v0.1
Llama-3.1-Swallow-8B-v0.1 is an 8-billion-parameter large language model released by tokyotech-llm, built by continual pre-training of Meta's Llama 3.1 8B with a 32K context length. Training used approximately 200 billion tokens drawn from a large Japanese web corpus, Japanese and English Wikipedia articles, and mathematical and coding content. The model is designed to strengthen Japanese language capabilities while maintaining strong English performance, making it suitable for applications that require robust understanding and generation in both languages.
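
Below is a minimal usage sketch with Hugging Face transformers, assuming the model ID tokyotech-llm/Llama-3.1-Swallow-8B-v0.1 shown above. Since this is a base (non-instruct) checkpoint, it is prompted with plain text completion; the Japanese prompt and the generation parameters are illustrative choices, not values taken from the original card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Llama-3.1-Swallow-8B-v0.1"

# Load tokenizer and model; bfloat16 keeps the 8B weights within ~16 GB of GPU memory
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Base model: plain text completion, in Japanese or English (illustrative prompt)
prompt = "東京工業大学の主なキャンパスは、"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Sampling parameters such as `temperature` and `top_p` can be tuned per application; for deterministic output, set `do_sample=False` instead.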