Voyage AI Launches rerank-2.5 and rerank-2.5-lite, First Instruction-Following Rerankers
SAN FRANCISCO — Voyage AI on Monday introduced rerank-2.5 and rerank-2.5-lite, a new generation of reranking models that deliver significant gains in retrieval accuracy while adding native instruction-following capabilities for the first time in the reranker category.
The models improve retrieval accuracy by 7.94% and 7.16% respectively over Cohere Rerank v3.5 across a standard suite of 93 retrieval datasets spanning multiple domains, according to Voyage AI’s announcement. On the Massive Instructed Retrieval Benchmark (MAIR), the performance gap widens to 12.70% for rerank-2.5 and 10.36% for the lite version. Both models also expand context length to 32K tokens — eight times that of Cohere Rerank v3.5 — with no increase in pricing.
Rerankers play a critical role in modern retrieval pipelines by taking an initial set of candidate documents from faster but less precise first-stage retrieval methods — such as BM25 lexical search or embedding-based vector similarity — and re-scoring them for relevance. Voyage AI, now part of MongoDB, positions the new models as superior to using large language models directly as rerankers, a claim the company says it will explore in an upcoming technical blog post.
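The two-stage pattern described above can be sketched in a few lines. The scoring functions below are toy stand-ins, not Voyage AI's models: a cheap lexical first stage shortlists candidates, then a more careful (here, proximity-based) re-scorer reorders the shortlist, which is the role a cross-encoder reranker plays in a real pipeline.

```python
# Minimal two-stage retrieval sketch (illustrative only): a fast lexical
# first stage shortlists candidates, then a "reranker" re-scores them.

def first_stage_score(query: str, doc: str) -> int:
    """Toy lexical score: total occurrences of each query term in the doc."""
    return sum(doc.lower().count(t) for t in query.lower().split())

def rerank(query: str, docs: list[str]) -> list[str]:
    """Toy reranker: favors docs where query terms appear close together.
    A production pipeline would call a cross-encoder model here instead."""
    terms = set(query.lower().split())
    def proximity(doc: str) -> float:
        words = doc.lower().split()
        hits = [i for i, w in enumerate(words) if w in terms]
        if len(hits) < 2:
            return float(len(hits))
        return len(hits) + 1.0 / (hits[-1] - hits[0])  # tighter span wins
    return sorted(docs, key=proximity, reverse=True)

corpus = [
    "The jaguar is a large cat native to the Americas.",
    "Jaguar unveiled a new electric car model this year.",
    "Annual report on automotive sales and car exports.",
]
query = "jaguar car"

# Stage 1: shortlist the top candidates cheaply.
candidates = sorted(corpus, key=lambda d: first_stage_score(query, d),
                    reverse=True)[:2]
# Stage 2: re-score the shortlist with the more expensive reranker.
ranked = rerank(query, candidates)
print(ranked[0])  # the car-related document ranks first
```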
Technical Advances and Instruction Following
The rerank-2.5 series was developed using an improved mixture of training data and advanced distillation techniques drawn from Voyage AI’s larger in-house instruction-following models. The most notable addition is the ability for users to steer relevance scoring through natural language instructions appended or prepended to their queries.
This capability allows developers to define custom notions of relevance without retraining or fine-tuning. Voyage AI provided several practical examples:
- In academic search: “Prioritize the title and ignore the abstract”
- In legal research: “Retrieve regulatory documents and legal statutes, not court cases”
- In e-commerce: “This is an e-commerce application about cars” to disambiguate “Jaguar” as the vehicle brand rather than the animal
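Since the announcement describes instructions as plain text appended or prepended to the query, steering the reranker can be as simple as string composition. In this sketch, `build_instructed_query` is a hypothetical helper, and the commented-out client call is illustrative only (it requires an API key and network access):

```python
# Sketch of steering a reranker with a natural-language instruction.
# build_instructed_query is a hypothetical helper for this example.

def build_instructed_query(query: str, instruction: str) -> str:
    """Prepend a relevance instruction to the user's query."""
    return f"{instruction} {query}".strip()

query = build_instructed_query(
    "jaguar maintenance schedule",
    "This is an e-commerce application about cars.",
)

# Illustrative call via the voyageai Python client (not executed here):
# import voyageai
# vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment
# result = vo.rerank(query=query, documents=docs,
#                    model="rerank-2.5", top_k=5)

print(query)
```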
On a set of 24 domain-specific in-house instruction-following evaluation datasets covering web, tech, legal, finance, conversational, medical, and code domains, the instruction-following feature delivered additional accuracy gains of 8.13% for rerank-2.5 and 7.55% for rerank-2.5-lite.
Both models support a 32K token context window, doubling the capacity of the previous rerank-2 and enabling reranking of much longer documents without truncation. The company emphasized that this expanded context comes at no additional cost compared to prior versions.
Evaluation Methodology
Voyage AI evaluated the models across nine domains: technical documentation, code, law, finance, web reviews, multilingual (51 datasets across 31 languages), long documents, medical, and conversations. The standard benchmark results reflect performance without instructions, while MAIR and the in-house datasets specifically test instruction-following behavior.
The evaluation protocol involved reranking results from four different first-stage retrievers, including BM25 lexical search. This approach provides a realistic view of how the models perform in typical hybrid search pipelines.
The announcement represents Voyage AI’s continued focus on specialized embedding and reranking models optimized for enterprise retrieval use cases. Since its acquisition by MongoDB, the company has integrated its technology more deeply into database-powered AI applications, where accurate retrieval from large document collections is essential.
Industry Context
Rerankers have become increasingly important as retrieval-augmented generation (RAG) systems have moved into production. While first-stage retrievers prioritize speed and scale, rerankers provide the precision needed for high-stakes applications in legal, medical, financial, and technical domains.
Cohere’s Rerank v3.5 has been a popular choice among developers, making Voyage AI’s reported gains noteworthy. The introduction of instruction following also addresses a longstanding limitation: the difficulty of encoding complex, domain-specific relevance criteria into a purely embedding-based system.
By allowing natural language instructions, Voyage AI aims to make sophisticated retrieval behavior more accessible to developers who may not have machine learning expertise. The feature effectively turns the reranker into a more controllable component that can adapt to application-specific requirements on a per-query basis.
Impact on Developers and Enterprises
For developers building AI applications, the new models offer several practical advantages. The 32K context window reduces the need to chunk very long documents, potentially simplifying pipeline architecture and preserving more context for relevance decisions.
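One way this simplification shows up in practice is in the chunking decision itself. The sketch below uses a crude word-based token estimate (a real pipeline would use an actual tokenizer) to decide whether a document fits the 32K window whole or still needs splitting; `prepare_for_rerank` and its halving strategy are assumptions for illustration:

```python
# Sketch: with a 32K-token reranker window, many long documents can be
# passed whole instead of being chunked first.

CONTEXT_LIMIT = 32_000  # rerank-2.5 context window, per the announcement

def estimate_tokens(text: str) -> int:
    """Crude estimate: ~1.3 tokens per whitespace-separated word."""
    return int(len(text.split()) * 1.3)

def prepare_for_rerank(doc: str, limit: int = CONTEXT_LIMIT) -> list[str]:
    """Pass the document whole if it fits; otherwise split into halves
    (a stand-in for whatever chunking strategy the pipeline already uses)."""
    if estimate_tokens(doc) <= limit:
        return [doc]
    words = doc.split()
    mid = len(words) // 2
    return [" ".join(words[:mid]), " ".join(words[mid:])]

short_doc = "word " * 10_000  # ~13K estimated tokens: fits in one pass
long_doc = "word " * 40_000   # ~52K estimated tokens: still needs chunking
print(len(prepare_for_rerank(short_doc)), len(prepare_for_rerank(long_doc)))
```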
The instruction-following capability could reduce reliance on custom fine-tuning or complex prompt engineering at the LLM level. Instead, developers can encode business logic and domain expertise directly into the retrieval stage, where it can influence which documents reach the generative model.
MongoDB’s involvement suggests tighter integration with its vector search capabilities in the future, potentially offering end-to-end retrieval solutions for customers using MongoDB Atlas. The no-price-increase announcement makes the upgrade particularly attractive for existing Voyage AI users.
The models also outperform general-purpose LLMs used directly as rerankers, according to Voyage AI. This aligns with industry observations that smaller, specialized models often deliver better accuracy and cost efficiency than frontier LLMs for narrowly defined tasks such as relevance scoring.
What’s Next
Voyage AI indicated it will publish a deeper technical analysis comparing its rerankers to LLMs used for the same purpose. The company has not announced specific dates for broader availability beyond the initial release, but the models are referenced in its current documentation, suggesting they are already accessible via the Voyage AI API.
The introduction of instruction-following rerankers may influence how other providers approach retrieval model development. As RAG systems handle increasingly nuanced queries across specialized domains, the ability to dynamically guide relevance criteria could become a standard feature rather than a differentiator.
Enterprises with complex search requirements in regulated industries may find the combination of higher baseline accuracy, longer context, and instruction steering particularly valuable for building compliant and accurate knowledge systems.
The release continues the rapid pace of innovation in the retrieval stack, where improvements in embedding models, rerankers, and hybrid search techniques are delivering measurable gains in application quality without necessarily requiring larger generative models.

