Voyage AI Launches Voyage 4 Embedding Family with Industry-First Shared Embedding Space and MoE Architecture
SAN FRANCISCO — Voyage AI on Wednesday introduced the Voyage 4 series, a new family of text embedding models that for the first time offer fully compatible embeddings across multiple model sizes and capabilities. The lineup includes the flagship voyage-4-large, which uses a mixture-of-experts (MoE) architecture to achieve state-of-the-art retrieval accuracy at 40% lower serving costs than comparable dense models, along with voyage-4, voyage-4-lite and the open-weight voyage-4-nano.
The release targets two primary audiences: existing customers seeking higher retrieval accuracy than previous Voyage models, and developers building context-engineered agents that demand high retrieval quality alongside low latency and cost for high-volume query workloads. All four models produce embeddings in a shared vector space, enabling "asymmetric retrieval," where documents can be indexed with a more powerful model while queries are processed with smaller, cheaper variants without sacrificing compatibility.
According to Voyage AI's official announcement, the shared embedding space represents an industry-first capability for production embedding models. This design allows developers to optimize document embeddings for maximum accuracy — a one-time or infrequent cost — while independently tuning query embeddings for latency and ongoing serving costs. For example, a production system could embed its entire document corpus once with voyage-4-large and then serve queries using voyage-4-lite or even the open-weight voyage-4-nano, benefiting from the larger model's document representations while keeping per-query costs low.
Model Variants and Capabilities
The Voyage 4 family consists of four models with distinct performance and efficiency characteristics:
- voyage-4-large: The flagship model leverages a mixture-of-experts architecture, marking the first production-grade embedding model to use MoE. It delivers better retrieval accuracy than its predecessor voyage-3-large while maintaining serving costs approximately 40% lower than comparable dense models.
- voyage-4: A balanced model that approaches the retrieval quality of voyage-3-large while operating with the efficiency of a mid-sized model.
- voyage-4-lite: Delivers accuracy approaching that of voyage-3.5 but with significantly fewer parameters, enabling high-quality embeddings at reduced computational cost.
- voyage-4-nano: Voyage AI's first open-weight model, released under the Apache 2.0 license on Hugging Face. It is positioned for local development and prototyping, with a clear path to production deployment.
All models in the series support Matryoshka Representation Learning (MRL), allowing users to generate embeddings at 2048, 1024, 512 or 256 dimensions with minimal quality degradation. They also support multiple quantization options: 32-bit floating point, signed and unsigned 8-bit integer, and binary precision. These features can substantially reduce storage and compute costs in downstream vector databases.
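In practice, Matryoshka truncation and binary quantization can be applied as simple post-processing steps on returned vectors. Below is a minimal NumPy sketch of the standard recipes (truncate-and-renormalize for MRL, sign thresholding for binary precision); the Voyage API may also perform these server-side, and the random vector here merely stands in for a real embedding:

```python
import numpy as np

def truncate_mrl(embedding: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` Matryoshka dimensions and re-normalize
    so cosine similarity remains meaningful."""
    shortened = embedding[:dim]
    return shortened / np.linalg.norm(shortened)

def binary_quantize(embedding: np.ndarray) -> np.ndarray:
    """Collapse each dimension to one bit (its sign), cutting
    storage 32x relative to float32."""
    return (embedding > 0).astype(np.uint8)

rng = np.random.default_rng(0)
full = rng.standard_normal(2048).astype(np.float32)
full /= np.linalg.norm(full)  # unit-length, like a typical embedding

vec_512 = truncate_mrl(full, 512)  # 4x smaller, still unit-length
bits = binary_quantize(full)       # 2048 bits instead of 8192 bytes

print(vec_512.shape, bits.shape)
```

Truncating to 512 dimensions shrinks vector-database storage 4x on its own; combined with binary precision, the footprint drops by more than two orders of magnitude relative to full-dimension float32.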
Technical Innovation: Mixture-of-Experts for Embeddings
The use of MoE architecture in voyage-4-large represents a significant technical milestone. While MoE has become increasingly common in large language models for improving compute efficiency, applying it successfully to embedding models at production scale presented unique challenges. Voyage AI claims this implementation breaks through what it calls the "dense ceiling" in embedding model performance.
The company states that voyage-4-large achieves frontier-level retrieval accuracy with substantially lower serving costs than dense alternatives of similar capability. A companion technical blog post referenced in the announcement describes a scaling study showing a 75% reduction in the number of active parameters while maintaining nearly equivalent retrieval accuracy compared to dense embedding models.
This efficiency gain is particularly valuable for high-volume retrieval-augmented generation (RAG) applications and semantic search systems where embedding inference costs can become significant at scale.
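The active-parameter arithmetic behind that claim can be illustrated with a toy top-1 routed layer. This is a generic MoE sketch, not Voyage's actual architecture, and the dimensions are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
d, n_experts = 64, 8

# A router plus 8 expert weight matrices. A dense layer of equal total
# capacity would multiply the input by all n_experts * d * d parameters.
router = rng.standard_normal((d, n_experts))
experts = rng.standard_normal((n_experts, d, d))

def moe_forward(x: np.ndarray) -> tuple[np.ndarray, int]:
    """Route the input to its single best-scoring expert (top-1)."""
    expert_id = int(np.argmax(x @ router))
    return x @ experts[expert_id], expert_id

x = rng.standard_normal(d)
y, chosen = moe_forward(x)

total_params = n_experts * d * d
active_params = d * d  # only the chosen expert's weights were used
print(active_params / total_params)  # 1/8 of the dense total
```

Each forward pass touches only one expert's weights, so compute scales with active parameters rather than total parameters, which is the mechanism behind MoE's serving-cost advantage over dense models of similar capacity.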
Shared Embedding Space and Asymmetric Retrieval
The core innovation of the Voyage 4 family is the shared embedding space across all model sizes. Traditional embedding models require the same model for both document indexing and query embedding to maintain compatibility. Voyage AI's approach eliminates this constraint.
This enables sophisticated cost-accuracy tradeoffs. As described in the announcement, the typical production workload pattern — documents embedded once or infrequently, queries embedded continuously — makes asymmetric retrieval particularly advantageous.
Voyage AI recommends the following workflow for high-query-traffic applications:
- Vectorize the document corpus once using voyage-4-large for maximum retrieval accuracy.
- Begin with voyage-4-lite for query embeddings during development and early production to minimize serving costs.
- Upgrade query embeddings to voyage-4 or voyage-4-large as accuracy requirements increase, without needing to re-embed the document corpus.
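The workflow above can be simulated offline with toy embedders standing in for the two models. The model names in the comments come from the announcement; everything else here, including the bag-of-words embedder and the noise level, is invented purely to illustrate why a shared space makes mixed-model scoring work:

```python
import numpy as np

rng = np.random.default_rng(7)
DIM = 256
_vocab: dict[str, np.ndarray] = {}

def _word_vec(word: str) -> np.ndarray:
    if word not in _vocab:
        _vocab[word] = rng.standard_normal(DIM)
    return _vocab[word]

def _bag_embed(texts: list[str]) -> np.ndarray:
    """Toy bag-of-words embedder; both 'models' share its vocabulary."""
    vecs = []
    for t in texts:
        v = sum((_word_vec(w) for w in t.lower().split()), np.zeros(DIM))
        vecs.append(v / (np.linalg.norm(v) + 1e-9))
    return np.array(vecs)

def embed_large(texts: list[str]) -> np.ndarray:
    """Stand-in for voyage-4-large: used once, at indexing time."""
    return _bag_embed(texts)

def embed_lite(texts: list[str]) -> np.ndarray:
    """Stand-in for voyage-4-lite: same shared space, slightly
    noisier vectors, much cheaper per query."""
    base = _bag_embed(texts)
    noisy = base + 0.05 * rng.standard_normal(base.shape)
    return noisy / np.linalg.norm(noisy, axis=1, keepdims=True)

# 1) Embed the corpus once with the large model.
docs = ["mixture of experts embeddings",
        "vector database pricing tiers",
        "legal contract clause retrieval"]
doc_index = embed_large(docs)

# 2) Serve queries with the lite model; no re-indexing needed.
query_vec = embed_lite(["mixture of experts"])[0]

# 3) Cosine scores are meaningful because both models share one space.
scores = doc_index @ query_vec
print(docs[int(np.argmax(scores))])
```

Because both embedders map into the same space, the dot product between a lite-model query vector and a large-model document vector remains a valid similarity score, which is exactly what lets the query side be upgraded later without re-embedding the corpus.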
The company evaluated the asymmetric retrieval capabilities of these models across eight domains: medical, code, web, finance, technical documentation, long documents, conversations, and law. Each evaluation dataset included both a corpus and corresponding queries.
Benchmark Performance
Voyage AI evaluated the general-purpose retrieval quality of the Voyage 4 models using all 29 datasets in the comprehensive Retrieval Embedding Benchmark (RTEB). While specific numerical scores were not detailed in the primary announcement, the company positions voyage-4-large as establishing a new state-of-the-art in retrieval accuracy at its price point.
The models were also specifically tested on their asymmetric retrieval performance, validating that mixing document and query embeddings from different family members maintains strong retrieval quality.
Industry Context and Availability
Voyage AI has positioned itself as a specialist provider of high-performance embedding models, competing with offerings from OpenAI, Cohere, and various open-source alternatives. The introduction of production-grade MoE for embeddings and the shared embedding space concept differentiates the Voyage 4 family from existing solutions.
The voyage-4-nano model is immediately available as open weights on Hugging Face under the Apache 2.0 license. The announcement implies that the proprietary models (voyage-4-large, voyage-4, and voyage-4-lite) are available through Voyage AI's API platform, though specific pricing details were not disclosed in the announcement.
MongoDB has already announced integration of the Voyage 4 series into its vector search capabilities, highlighting the models' readiness for production RAG, semantic search and agentic applications.
Impact on Developers and the Industry
For developers, the Voyage 4 family offers unprecedented flexibility in balancing accuracy, latency and cost. The ability to independently optimize document and query embeddings could significantly reduce operational costs for large-scale retrieval systems while maintaining or improving quality.
The open-weight voyage-4-nano model lowers the barrier to experimentation and enables fully local development workflows with a supported migration path to production cloud models. Combined with Matryoshka embeddings and aggressive quantization options, this family provides powerful tools for managing vector database costs.
The successful application of MoE to embedding models may accelerate adoption of sparse architectures in other retrieval-focused AI components, potentially influencing the broader embedding model landscape.
What's Next
Voyage AI indicated that additional technical details about the MoE implementation and scaling studies would be shared in follow-up blog posts. The company is expected to publish more comprehensive benchmark results and best practices for asymmetric retrieval in the coming weeks.
As vector search and RAG applications continue to scale across industries, the shared embedding space approach introduced by Voyage 4 could become a standard feature that other embedding providers will need to match.
The models are available immediately through Voyage AI's platform, with voyage-4-nano accessible on Hugging Face for local use and experimentation.

