tokyotech-llm/Llama-3.1-Swallow-8B-v0.1

Llama-3.1-Swallow-8B-v0.1 is an 8-billion-parameter large language model developed by tokyotech-llm (the Swallow project at Tokyo Institute of Technology), built by continual pre-training of Meta's Llama 3.1 8B, with a 32,768-token context length. It was trained on approximately 200 billion additional tokens drawn from a large Japanese web corpus, Japanese and English Wikipedia, and mathematical and coding content. The model is designed to strengthen Japanese language capabilities while preserving the strong English performance of the base model, making it well suited to bilingual applications that require robust understanding and generation in both languages.

Status: Warm
Visibility: Public
Parameters: 8B
Quantization: FP8
Context length: 32,768 tokens
License: llama3.1
Available on: Hugging Face
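
Below is a minimal sketch of loading the model for plain text completion with the Hugging Face transformers library. Note that this is the base (non-instruct) model, so it continues a prompt rather than following chat-style instructions. The dtype, device placement, prompt, and sampling settings here are illustrative assumptions, not settings taken from this listing; the FP8 figure above refers to how the model is served here, while the weights on Hugging Face load in a standard floating-point format.

```python
# Minimal completion example for Llama-3.1-Swallow-8B-v0.1 (base model).
# Assumptions: transformers and torch are installed, and the hardware
# has enough memory for the 8B model in bfloat16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Llama-3.1-Swallow-8B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative choice, not from the listing
    device_map="auto",
)

# A Japanese prompt; the base model simply continues the text.
prompt = "東京工業大学の主なキャンパスは、"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,  # sampling parameters are illustrative
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```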
