tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3

The Llama-3.1-Swallow-8B-Instruct-v0.3 model by tokyotech-llm is an 8 billion parameter instruction-tuned large language model, built by continual pre-training from Meta's Llama 3.1. Trained on approximately 200 billion tokens drawn from Japanese web corpora, Wikipedia, and technical content, it significantly enhances Japanese language capabilities while retaining strong English performance. The model excels at multi-turn Japanese dialogue, achieving state-of-the-art results on Japanese MT-Bench among open-source LLMs of comparable size.

Parameters: 8B
Quantization: FP8
Context length: 32768
License: llama3.1
Available on Hugging Face
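As a minimal sketch of how the model might be queried locally with the Hugging Face `transformers` library: the helper below builds a chat-format message list (the Japanese system prompt is an illustrative assumption, not part of this listing) and a second function runs generation. A GPU and a roughly 16 GB model download are assumed.

```python
MODEL_ID = "tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3"


def build_messages(user_text: str) -> list[dict]:
    """Build a chat-format message list for the instruct model.

    The system prompt here is an assumed example, not prescribed by the model card.
    """
    return [
        {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。"},
        {"role": "user", "content": user_text},
    ]


def generate_reply(user_text: str, max_new_tokens: int = 256) -> str:
    """Generate a reply; downloads the model on first use (heavy imports kept local)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # Render the messages with the model's chat template and generate.
    input_ids = tokenizer.apply_chat_template(
        build_messages(user_text), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)


# Example (requires GPU and model download):
# print(generate_reply("東京工業大学について教えてください。"))
```

Keeping the `transformers` and `torch` imports inside `generate_reply` lets the message-building helper be used without those dependencies installed.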