Cohere For AI Releases Aya Vision: Open-Weights Models for Multilingual Understanding
Cohere For AI, the open research arm of Cohere, has announced the release of Aya Vision, a pair of state-of-the-art, open-weights vision models designed to significantly improve AI's multimodal performance across diverse languages. Addressing the persistent gap in how well AI handles combined text and image tasks globally, Aya Vision expands these capabilities to 23 languages, spoken by over half the world's population.
The models excel at tasks like image captioning, visual question answering, and generating text descriptions from images, aiming to foster greater cross-cultural understanding.
State-of-the-Art Multilingual Vision Performance
Aya Vision was released in two sizes, 8 billion (8B) and 32 billion (32B) parameters, both demonstrating exceptional performance and efficiency:
- Aya Vision 8B: Outperforms leading open-weights competitors in its class (including models from Qwen, Google, Meta, and Pangea), achieving win rates of 70-79% on multilingual multimodal benchmarks (AyaVisionBench, m-WildVision). Notably, it even surpasses much larger models such as Llama-3.2 90B Vision.
- Aya Vision 32B: Sets a new standard for open-weights multilingual vision models, outperforming significantly larger models (Llama-3.2 90B, Molmo 72B, Qwen2-VL 72B) with win rates of 64-72%.
This highlights Cohere For AI's focus on achieving top performance with greater compute efficiency, benefiting the broader research community.
Innovations and Evaluation
The models incorporate several algorithmic breakthroughs developed by Cohere For AI, including techniques like synthetic annotations, scaling multilingual data via translation, and multimodal model merging. Alongside the models, the team is also open-sourcing the Aya Vision Benchmark, a new evaluation suite featuring nuanced, open-ended questions across 23 languages to better reflect real-world interactions.
Open Weights and Broad Access
In line with its commitment to open research, Cohere For AI has released the Aya Vision 8B and 32B model weights on Kaggle and Hugging Face. To further democratize access, the models are also freely available via WhatsApp, allowing users worldwide to try these multimodal capabilities easily.
Aya Vision represents a significant step forward in creating AI systems that can understand and interact with the world's diverse linguistic and visual information, empowering researchers and users globally.
Learn more on the Cohere For AI Blog.