tokyotech-llm/Llama-3.1-Swallow-8B-v0.2
Llama-3.1-Swallow-8B-v0.2 by tokyotech-llm is an 8-billion-parameter language model built by continual pre-training of Meta Llama 3.1. The model significantly enhances Japanese language capabilities while maintaining strong English performance, trained on approximately 200 billion tokens drawn from a large Japanese web corpus, Wikipedia, and mathematical/coding content. It features a 32,768-token context length and demonstrates improved Japanese benchmark scores compared to its Llama 3.1 base.
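As a minimal sketch of how the model might be used, assuming it is published on the Hugging Face Hub under the ID above and loadable with the standard transformers API (the prompt, dtype, and sampling settings below are illustrative, not taken from the model card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Llama-3.1-Swallow-8B-v0.2"

# Load tokenizer and model; bfloat16 and device_map="auto" are illustrative choices.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# This is a base (non-instruct) model, so use a plain continuation prompt.
prompt = "東京の観光名所を紹介します。まず、"  # hypothetical Japanese prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```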