core/reranker.py
Vibe Coding Guide · Mar 9, 2026 · 6 min read

# Build a Production-Grade RAG Pipeline with Voyage rerank-2.5 Instruction-Following

Why this matters for builders

Voyage AI just shipped rerank-2.5 and rerank-2.5-lite — the first rerankers with true instruction-following capabilities. They deliver +7.94% and +7.16% retrieval accuracy over Cohere Rerank v3.5 across 93 datasets, jump to +12.7% on the Massive Instructed Retrieval Benchmark (MAIR), support a 32K-token context (8× Cohere's), and come at no extra cost.

The game changer is the natural-language instruction field. You can now tell the reranker exactly how to interpret relevance (“Prioritize regulatory documents and legal statutes, ignore court cases”, “This is an e-commerce site about cars — treat Jaguar as the brand”, “Focus only on the methodology section of papers”). This turns static reranking into a controllable, domain-specific relevance layer.

For builders shipping real apps, this means fewer brittle prompt hacks, better precision in legal, medical, finance, and technical search, and a cleaner separation between retrieval and generation.

When to use it

Use rerank-2.5 when:

  • You already have a first-stage retriever (BM25, Voyage embeddings, or vector search in MongoDB, Pinecone, Weaviate, etc.)
  • Your domain has nuanced relevance rules that change per user, tenant, or product vertical
  • Documents are long (>4k tokens) and you need full context
  • You want to reduce hallucination by feeding the LLM only the truly relevant chunks

Use the -lite variant for lower latency/cost in high-QPS consumer apps.
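The choice between the two variants can be made explicit in code. The thresholds below are purely illustrative assumptions, not Voyage guidance; benchmark both models on your own traffic before hard-coding anything:

```python
def choose_reranker(p95_latency_budget_ms: float, qps: float) -> str:
    """Pick a rerank model for a given latency budget and query volume.

    The 150 ms / 50 QPS cutoffs are made-up placeholders for illustration;
    replace them with numbers from your own load tests.
    """
    if p95_latency_budget_ms < 150 or qps > 50:
        return "rerank-2.5-lite"  # lower latency/cost for high-QPS paths
    return "rerank-2.5"           # full model when accuracy matters most
```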

The full process — from idea to shipped feature

Here’s a battle-tested workflow you can follow with Cursor, Claude, or any strong coding assistant.

1. Define the goal (10 minutes)

Write a one-paragraph spec:

“Build a legal research assistant that retrieves from a corpus of statutes, regulations, and case law. For every user query, the system must bias results toward regulatory documents and statutes while de-emphasizing court opinions. Use Voyage rerank-2.5 with the standing instruction ‘Retrieve regulatory documents and legal statutes, not court cases.’ Return the top 5 most relevant passages to the LLM for synthesis. Support documents up to 25k tokens. Measure nDCG@5 before and after adding the instruction.”

2. Shape the spec & prompt your AI coder

Give your coding assistant this starter prompt (copy-paste ready):

You are an expert RAG engineer. We are adding Voyage AI rerank-2.5 (instruction-following) to an existing retrieval pipeline.

Requirements:
- First stage: hybrid search (BM25 + Voyage voyage-3 embeddings) against MongoDB Atlas Vector Search or a simple list for prototyping
- Second stage: rerank with rerank-2.5 using the instruction: "Prioritize regulatory documents and legal statutes, not court cases."
- Support 32k context — do NOT truncate documents
- Return top 5 results with relevance scores
- Provide a simple FastAPI endpoint: POST /legal-search with { "query": "...", "instruction": "..." }
- Include evaluation harness using nDCG@5 on a small golden dataset
- Use official Voyage SDK (check docs for exact method signature)

Output structure:
1. requirements.txt + .env.example
2. core/reranker.py with VoyageReranker class
3. api/routes.py
4. eval/evaluate.py
5. README with local testing instructions

3. Scaffold the project

Run the generated scaffold, then refine with follow-up prompts:

from typing import List, Optional

import voyageai

class VoyageReranker:
    def __init__(self, model: str = "rerank-2.5"):
        # Reads VOYAGE_API_KEY from the environment by default
        self.client = voyageai.Client()
        self.model = model

    def rerank(self, query: str, documents: List[str],
               instruction: Optional[str] = None, top_k: int = 5):
        # Fold the instruction into the query — rerank-2.5 follows
        # natural-language instructions embedded alongside the query.
        # Prepend or append both work; prepend is often cleaner.
        if instruction:
            query_with_instruction = f"{instruction}\n\nQuery: {query}"
        else:
            query_with_instruction = query

        response = self.client.rerank(
            query=query_with_instruction,
            documents=documents,
            model=self.model,
            top_k=top_k,
        )
        return response.results  # each result has index, relevance_score, document

4. Implement the full pipeline

Next prompt for your AI pair programmer:

“Now implement the hybrid retrieval + rerank flow. First retrieve top 30 candidates with MongoDB Atlas $vectorSearch and BM25, then rerank with rerank-2.5 using the legal instruction. Add a fallback to rerank-2.5-lite on timeout. Include async support.”
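Merging the BM25 and vector candidate lists into a single top-30 set before reranking is the one piece the prompt leaves implicit. A standard technique for this (not specific to Voyage) is reciprocal rank fusion; a minimal sketch:

```python
from collections import defaultdict

def reciprocal_rank_fusion(bm25_ids, vector_ids, k=60, limit=30):
    """Merge two ranked ID lists into one, ordered by RRF score.

    Each list contributes 1 / (k + rank + 1) per document; documents
    ranked well in both lists accumulate the highest fused score.
    """
    scores = defaultdict(float)
    for ranking in (bm25_ids, vector_ids):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    fused = sorted(scores, key=scores.get, reverse=True)
    return fused[:limit]

# "c" ranks first in both lists, so it leads the fused ranking.
print(reciprocal_rank_fusion(["c", "a", "b"], ["c", "b", "d"]))  # ['c', 'b', 'a', 'd']
```

The fused IDs then feed the full document texts into `rerank-2.5` as the second stage.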

Validate that the code respects the 32K limit by passing full document text.
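A cheap pre-flight guard makes silent truncation visible instead of letting oversized documents degrade quietly. The helper name and the 4-characters-per-token ratio below are rough assumptions for English text, not part of the Voyage SDK; use a real tokenizer for exact counts:

```python
def assert_within_context(documents, max_tokens=32_000, chars_per_token=4):
    """Fail loudly if a document likely exceeds the 32K-token context.

    chars_per_token=4 is a crude English-text heuristic; documents that
    fail this check should be split upstream, not truncated here.
    """
    for i, doc in enumerate(documents):
        approx_tokens = len(doc) / chars_per_token
        if approx_tokens > max_tokens:
            raise ValueError(
                f"Document {i} is ~{approx_tokens:.0f} tokens; "
                f"split it upstream instead of truncating."
            )
    return True
```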

5. Validate with real metrics

Create a small evaluation set (10 queries with known good passages).
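One entry of such a golden dataset might look like the sketch below. The field names and document IDs are invented for illustration — any format works as long as the eval harness can map reranked results back to known-relevant IDs:

```json
[
  {
    "query": "statutory limits on biometric data collection",
    "instruction": "Retrieve regulatory documents and legal statutes, not court cases.",
    "relevant_doc_ids": ["statute-001", "regulation-014"],
    "irrelevant_doc_ids": ["court-opinion-273"]
  }
]
```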

# eval/evaluate.py
from core.reranker import VoyageReranker

reranker = VoyageReranker("rerank-2.5")

# Toy corpus — replace with passages from your golden dataset
docs = [
    "Regulation text: data governance obligations for AI training data ...",
    "Court opinion: appellate ruling on fair use of training data ...",
    "Statute: statutory factors governing fair use ...",
]

def test_instruction_impact():
    query = "legal implications of AI training data"
    instruction = "Retrieve regulatory documents and legal statutes, not court cases."

    # Run twice — once with, once without the instruction
    results_with = reranker.rerank(query, docs, instruction)
    results_without = reranker.rerank(query, docs, None)

    print("With instruction top-3 relevance:", [r.relevance_score for r in results_with[:3]])
    print("Without instruction top-3 relevance:", [r.relevance_score for r in results_without[:3]])

Typical gain: 7–13% in relevance alignment on domain-specific data.

6. Ship it safely

Production checklist:

  • Add retry + fallback to rerank-2.5-lite
  • Cache reranking results for identical (query + instruction) pairs for 5 minutes
  • Monitor average rerank latency and cost per 1k tokens (pricing unchanged)
  • A/B test the instruction version vs baseline for 1–2 weeks
  • Log the exact instruction used per request for debugging
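The 5-minute cache from the checklist can be sketched with nothing but the standard library. The class name and key scheme are assumptions, not part of any SDK; in production you would likely back this with Redis instead of a process-local dict:

```python
import hashlib
import time

class RerankCache:
    """In-memory TTL cache keyed on (query, instruction, document set)."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable for deterministic testing
        self._store = {}

    def _key(self, query, instruction, documents):
        # \x1f (unit separator) avoids collisions between field boundaries
        blob = "\x1f".join([query, instruction or "", *documents])
        return hashlib.sha256(blob.encode()).hexdigest()

    def get(self, query, instruction, documents):
        key = self._key(query, instruction, documents)
        hit = self._store.get(key)
        if hit and self.clock() - hit[0] < self.ttl:
            return hit[1]
        self._store.pop(key, None)  # expired or missing
        return None

    def put(self, query, instruction, documents, results):
        key = self._key(query, instruction, documents)
        self._store[key] = (self.clock(), results)
```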

Pitfalls and guardrails

  • Don’t put the instruction in the documents field — it belongs with the query.
  • Overly long instructions can dilute signal — keep under 100 words.
  • Test your instruction on a few dozen examples before shipping. Vague instructions (“make it better”) perform worse than specific ones.
  • The model still returns relevance scores; higher score = better match to both query AND instruction.
  • If you see degraded performance, try prepending vs appending the instruction (some domains prefer one).
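To make the prepend-vs-append experiment cheap, isolate query construction in one helper so an A/B test only flips a flag. The function name and templates are illustrative, not a Voyage convention:

```python
def build_query(query, instruction=None, placement="prepend"):
    """Combine instruction and query; flip `placement` to test both orders."""
    if not instruction:
        return query
    if placement == "prepend":
        return f"{instruction}\n\nQuery: {query}"
    return f"Query: {query}\n\nInstruction: {instruction}"
```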

What to do next

  1. Replace your current reranker (Cohere, Voyage rerank-2, or LLM-as-reranker) with rerank-2.5
  2. Add one standing instruction per product vertical or user persona
  3. Measure nDCG@5 or human preference on your real traffic
  4. Experiment with dynamic instructions generated by a small LLM based on user session context
  5. Write a follow-up blog post once you have before/after metrics

This single change — adding a controllable reranker — often yields bigger relevance gains than upgrading embeddings or prompt engineering the final LLM.


Original Source

blog.voyageai.com
