tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3

The Llama-3.1-Swallow-8B-Instruct-v0.3 model by tokyotech-llm is an 8 billion parameter instruction-tuned large language model, built by continual pre-training from Meta's Llama 3.1. Trained on approximately 200 billion tokens drawn from Japanese web corpora, Wikipedia, and technical content, it significantly enhances Japanese language capabilities while retaining strong English performance. The model excels at multi-turn Japanese dialogue, achieving state-of-the-art results on Japanese MT-Bench among open-source LLMs of comparable size.

Parameters: 8B
Quantization: FP8
Context length: 32768
License: llama3.1
Available on Hugging Face
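As a minimal sketch of how the model might be queried locally with the Hugging Face `transformers` library: the helper below builds a chat-format message list (the Japanese system prompt is an illustrative assumption, not part of this listing) and a second function runs generation. A GPU and a roughly 16 GB model download are assumed.

```python
MODEL_ID = "tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3"


def build_messages(user_text: str) -> list[dict]:
    """Build a chat-format message list for the instruct model.

    The system prompt here is an assumed example, not prescribed by the model card.
    """
    return [
        {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。"},
        {"role": "user", "content": user_text},
    ]


def generate_reply(user_text: str, max_new_tokens: int = 256) -> str:
    """Generate a reply; downloads the model on first use (heavy imports kept local)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # Render the messages with the model's chat template and generate.
    input_ids = tokenizer.apply_chat_template(
        build_messages(user_text), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)


# Example (requires GPU and model download):
# print(generate_reply("東京工業大学について教えてください。"))
```

Keeping the `transformers` and `torch` imports inside `generate_reply` lets the message-building helper be used without those dependencies installed.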