tokyotech-llm/Llama-3.1-Swallow-8B-v0.2
Llama-3.1-Swallow-8B-v0.2 by tokyotech-llm is an 8-billion-parameter language model built by continual pre-training of Meta Llama 3.1. The model significantly enhances Japanese language capabilities while maintaining strong English performance, trained on approximately 200 billion tokens drawn from a large Japanese web corpus, Wikipedia, and mathematical/coding content. It features a 32,768-token context length and demonstrates improved Japanese benchmark scores compared to its Llama 3.1 base.
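As a minimal sketch of how the model might be used, assuming it is published on the Hugging Face Hub under the ID above and loadable with the standard transformers API (the prompt, dtype, and sampling settings below are illustrative, not taken from the model card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Llama-3.1-Swallow-8B-v0.2"

# Load tokenizer and model; bfloat16 and device_map="auto" are illustrative choices.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# This is a base (non-instruct) model, so use a plain continuation prompt.
prompt = "東京の観光名所を紹介します。まず、"  # hypothetical Japanese prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```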