tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.1
Llama-3.1-Swallow-8B-Instruct-v0.1 is an 8-billion-parameter instruction-tuned large language model from tokyotech-llm, built on Meta Llama 3.1. Its Japanese capabilities were enhanced through continual pre-training on a 200-billion-token Japanese web corpus, while the strong English performance of the base model is retained. With a context length of 32,768 tokens, the model excels at Japanese language tasks and is well suited to applications requiring robust bilingual understanding and generation.
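A minimal usage sketch follows, assuming the model is loaded through the standard Hugging Face transformers chat-template API; the prompt text, dtype, and generation parameters are illustrative choices, not values specified by this listing.

```python
# Minimal sketch: load the instruction-tuned model and generate a reply.
# Assumptions: transformers + torch installed, bf16 weights fit on your device.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; use float16/float32 as needed
    device_map="auto",
)

# Instruction-tuned models expect chat-formatted input; the Japanese prompt
# below ("Briefly explain Japan's four seasons.") is an example only.
messages = [
    {"role": "user", "content": "日本の四季について簡単に説明してください。"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```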