xtuner/llava-llama-3-8b

xtuner/llava-llama-3-8b is an 8-billion-parameter LLaVA model developed by XTuner, fine-tuned from Meta-Llama-3-8B-Instruct and CLIP-ViT-Large-patch14-336. It pairs the Llama 3 language model with the CLIP visual encoder, so it can process and understand both text and images. It is designed for multimodal tasks and shows improved performance over LLaVA-v1.5-7B on various visual question answering and perception benchmarks.
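As a rough illustration of what the visual encoder contributes, the sketch below computes how many patch embeddings a ViT with 14-pixel patches produces for a 336x336 input, as the encoder name CLIP-ViT-Large-patch14-336 suggests. The function name `clip_patch_tokens` is hypothetical, and the assumption that each patch becomes one token in the language model's input follows the usual LLaVA design rather than anything stated on this page.

```python
def clip_patch_tokens(image_size: int = 336, patch_size: int = 14) -> int:
    """Number of non-overlapping square patches a ViT extracts from a square image.

    Defaults are read off the encoder name (patch14, 336); the helper itself
    is illustrative and not part of any library.
    """
    if image_size <= 0 or patch_size <= 0:
        raise ValueError("image_size and patch_size must be positive")
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size  # 336 // 14 = 24
    return patches_per_side * patches_per_side   # 24 * 24 = 576


image_tokens = clip_patch_tokens()
print(image_tokens)  # 576
```

Under that assumption, a single image occupies 576 positions of the model's input sequence before any text is added.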

Warm

Public

Model Size: 8B

Quant: FP8

Ctx length: 8192
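The figures above allow a back-of-the-envelope estimate of weight memory and of the text budget left after one image. This is a sketch under stated assumptions, not a measurement: it treats "8B" as exactly 8,000,000,000 parameters, assumes FP8 stores one byte per parameter, ignores the KV cache and activations, and assumes one 336x336 image consumes (336 / 14)^2 positions of the context window.

```python
# Nominal values taken from the listing; the real parameter count may differ.
PARAMS = 8_000_000_000          # "Model Size: 8B", treated as exact (assumption)
BYTES_PER_PARAM_FP8 = 1         # FP8 = 8 bits per weight (assumption: no overhead)
CONTEXT_LENGTH = 8192           # "Ctx length: 8192"
IMAGE_TOKENS = (336 // 14) ** 2 # one token per 14x14 patch of a 336x336 image (assumption)

# Memory needed for the weights alone, in bytes and in GiB.
weight_bytes = PARAMS * BYTES_PER_PARAM_FP8
weight_gib = weight_bytes / 2**30

# Context positions left for the text prompt and the generated reply.
text_budget = CONTEXT_LENGTH - IMAGE_TOKENS

print(f"weights: {weight_gib:.2f} GiB")
print(f"tokens left after one image: {text_budget}")
```

Actual memory use will be higher once the KV cache and runtime buffers are included, so these numbers are best read as lower bounds.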

Hugging Face
