tokyotech-llm/Llama-3.1-Swallow-8B-v0.1
Llama-3.1-Swallow-8B-v0.1 is an 8-billion-parameter large language model released by tokyotech-llm, built by continual pre-training of Meta's Llama 3.1 8B with a 32K context length. Training used approximately 200 billion tokens drawn from a large Japanese web corpus, Japanese and English Wikipedia articles, and mathematical and coding content. The model is designed to strengthen Japanese language capabilities while maintaining strong English performance, making it suitable for applications that require robust understanding and generation in both languages.
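
Below is a minimal usage sketch with Hugging Face transformers, assuming the model ID tokyotech-llm/Llama-3.1-Swallow-8B-v0.1 shown above. Since this is a base (non-instruct) checkpoint, it is prompted with plain text completion; the Japanese prompt and the generation parameters are illustrative choices, not values taken from the original card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Llama-3.1-Swallow-8B-v0.1"

# Load tokenizer and model; bfloat16 keeps the 8B weights within ~16 GB of GPU memory
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Base model: plain text completion, in Japanese or English (illustrative prompt)
prompt = "東京工業大学の主なキャンパスは、"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Sampling parameters such as `temperature` and `top_p` can be tuned per application; for deterministic output, set `do_sample=False` instead.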