core/reranker.py
Vibe Coding Guide · Mar 9, 2026 · 6 min read

# Build a Production-Grade RAG Pipeline with Voyage rerank-2.5 Instruction-Following

Why this matters for builders

Voyage AI just shipped rerank-2.5 and rerank-2.5-lite — the first rerankers with true instruction-following capabilities. They deliver +7.94% and +7.16% retrieval accuracy over Cohere Rerank v3.5 across 93 datasets, jump to +12.7% on the Massive Instructed Retrieval Benchmark (MAIR), support a 32K-token context (8× Cohere's), and come at no extra cost.

The game changer is the natural-language instruction field. You can now tell the reranker exactly how to interpret relevance (“Prioritize regulatory documents and legal statutes, ignore court cases”, “This is an e-commerce site about cars — treat Jaguar as the brand”, “Focus only on the methodology section of papers”). This turns static reranking into a controllable, domain-specific relevance layer.

For builders shipping real apps, this means fewer brittle prompt hacks, better precision in legal, medical, finance, and technical search, and a cleaner separation between retrieval and generation.

When to use it

Use rerank-2.5 when:

  • You already have a first-stage retriever (BM25, Voyage embeddings, or vector search in MongoDB, Pinecone, Weaviate, etc.)
  • Your domain has nuanced relevance rules that change per user, tenant, or product vertical
  • Documents are long (>4k tokens) and you need full context
  • You want to reduce hallucination by feeding the LLM only the truly relevant chunks

Use the -lite variant for lower latency/cost in high-QPS consumer apps.
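The choice between the two variants can be made explicit in code. The thresholds below are purely illustrative assumptions, not Voyage guidance; benchmark both models on your own traffic before hard-coding anything:

```python
def choose_reranker(p95_latency_budget_ms: float, qps: float) -> str:
    """Pick a rerank model for a given latency budget and query volume.

    The 150 ms / 50 QPS cutoffs are made-up placeholders for illustration;
    replace them with numbers from your own load tests.
    """
    if p95_latency_budget_ms < 150 or qps > 50:
        return "rerank-2.5-lite"  # lower latency/cost for high-QPS paths
    return "rerank-2.5"           # full model when accuracy matters most
```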

The full process — from idea to shipped feature

Here’s a battle-tested workflow you can follow with Cursor, Claude, or any strong coding assistant.

1. Define the goal (10 minutes)

Write a one-paragraph spec:

“Build a legal research assistant that retrieves from a corpus of statutes, regulations, and case law. For every user query, the system must bias results toward regulatory documents and statutes while de-emphasizing court opinions. Use Voyage rerank-2.5 with the standing instruction ‘Retrieve regulatory documents and legal statutes, not court cases.’ Return the top 5 most relevant passages to the LLM for synthesis. Support documents up to 25k tokens. Measure nDCG@5 before and after adding the instruction.”

2. Shape the spec & prompt your AI coder

Give your coding assistant this starter prompt (copy-paste ready):

You are an expert RAG engineer. We are adding Voyage AI rerank-2.5 (instruction-following) to an existing retrieval pipeline.

Requirements:
- First stage: hybrid search (BM25 + Voyage voyage-3 embeddings) against MongoDB Atlas Vector Search or a simple list for prototyping
- Second stage: rerank with rerank-2.5 using the instruction: "Prioritize regulatory documents and legal statutes, not court cases."
- Support 32k context — do NOT truncate documents
- Return top 5 results with relevance scores
- Provide a simple FastAPI endpoint: POST /legal-search with { "query": "...", "instruction": "..." }
- Include evaluation harness using nDCG@5 on a small golden dataset
- Use official Voyage SDK (check docs for exact method signature)

Output structure:
1. requirements.txt + .env.example
2. core/reranker.py with VoyageReranker class
3. api/routes.py
4. eval/evaluate.py
5. README with local testing instructions

3. Scaffold the project

Run the generated scaffold, then refine with follow-up prompts:

from typing import List, Optional

import voyageai

class VoyageReranker:
    def __init__(self, model: str = "rerank-2.5"):
        # Reads VOYAGE_API_KEY from the environment by default
        self.client = voyageai.Client()
        self.model = model

    def rerank(self, query: str, documents: List[str],
               instruction: Optional[str] = None, top_k: int = 5):
        # Fold the instruction into the query — rerank-2.5 follows
        # natural-language instructions embedded alongside the query.
        # Prepend or append both work; prepend is often cleaner.
        if instruction:
            query_with_instruction = f"{instruction}\n\nQuery: {query}"
        else:
            query_with_instruction = query

        response = self.client.rerank(
            query=query_with_instruction,
            documents=documents,
            model=self.model,
            top_k=top_k,
        )
        return response.results  # each result has index, relevance_score, document

4. Implement the full pipeline

Next prompt for your AI pair programmer:

“Now implement the hybrid retrieval + rerank flow. First retrieve top 30 candidates with MongoDB Atlas $vectorSearch and BM25, then rerank with rerank-2.5 using the legal instruction. Add a fallback to rerank-2.5-lite on timeout. Include async support.”
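Merging the BM25 and vector candidate lists into a single top-30 set before reranking is the one piece the prompt leaves implicit. A standard technique for this (not specific to Voyage) is reciprocal rank fusion; a minimal sketch:

```python
from collections import defaultdict

def reciprocal_rank_fusion(bm25_ids, vector_ids, k=60, limit=30):
    """Merge two ranked ID lists into one, ordered by RRF score.

    Each list contributes 1 / (k + rank + 1) per document; documents
    ranked well in both lists accumulate the highest fused score.
    """
    scores = defaultdict(float)
    for ranking in (bm25_ids, vector_ids):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    fused = sorted(scores, key=scores.get, reverse=True)
    return fused[:limit]

# "c" ranks first in both lists, so it leads the fused ranking.
print(reciprocal_rank_fusion(["c", "a", "b"], ["c", "b", "d"]))  # ['c', 'b', 'a', 'd']
```

The fused IDs then feed the full document texts into `rerank-2.5` as the second stage.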

Validate that the code respects the 32K limit by passing full document text.
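A cheap pre-flight guard makes silent truncation visible instead of letting oversized documents degrade quietly. The helper name and the 4-characters-per-token ratio below are rough assumptions for English text, not part of the Voyage SDK; use a real tokenizer for exact counts:

```python
def assert_within_context(documents, max_tokens=32_000, chars_per_token=4):
    """Fail loudly if a document likely exceeds the 32K-token context.

    chars_per_token=4 is a crude English-text heuristic; documents that
    fail this check should be split upstream, not truncated here.
    """
    for i, doc in enumerate(documents):
        approx_tokens = len(doc) / chars_per_token
        if approx_tokens > max_tokens:
            raise ValueError(
                f"Document {i} is ~{approx_tokens:.0f} tokens; "
                f"split it upstream instead of truncating."
            )
    return True
```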

5. Validate with real metrics

Create a small evaluation set (10 queries with known good passages).
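One entry of such a golden dataset might look like the sketch below. The field names and document IDs are invented for illustration — any format works as long as the eval harness can map reranked results back to known-relevant IDs:

```json
[
  {
    "query": "statutory limits on biometric data collection",
    "instruction": "Retrieve regulatory documents and legal statutes, not court cases.",
    "relevant_doc_ids": ["statute-001", "regulation-014"],
    "irrelevant_doc_ids": ["court-opinion-273"]
  }
]
```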

# eval/evaluate.py
from core.reranker import VoyageReranker

reranker = VoyageReranker("rerank-2.5")

# Toy corpus — replace with passages from your golden dataset
docs = [
    "Regulation text: data governance obligations for AI training data ...",
    "Court opinion: appellate ruling on fair use of training data ...",
    "Statute: statutory factors governing fair use ...",
]

def test_instruction_impact():
    query = "legal implications of AI training data"
    instruction = "Retrieve regulatory documents and legal statutes, not court cases."

    # Run twice — once with, once without the instruction
    results_with = reranker.rerank(query, docs, instruction)
    results_without = reranker.rerank(query, docs, None)

    print("With instruction top-3 relevance:", [r.relevance_score for r in results_with[:3]])
    print("Without instruction top-3 relevance:", [r.relevance_score for r in results_without[:3]])

Typical gain: 7–13% in relevance alignment on domain-specific data.

6. Ship it safely

Production checklist:

  • Add retry + fallback to rerank-2.5-lite
  • Cache reranking results for identical (query + instruction) pairs for 5 minutes
  • Monitor average rerank latency and cost per 1k tokens (pricing unchanged)
  • A/B test the instruction version vs baseline for 1–2 weeks
  • Log the exact instruction used per request for debugging
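The 5-minute cache from the checklist can be sketched with nothing but the standard library. The class name and key scheme are assumptions, not part of any SDK; in production you would likely back this with Redis instead of a process-local dict:

```python
import hashlib
import time

class RerankCache:
    """In-memory TTL cache keyed on (query, instruction, document set)."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable for deterministic testing
        self._store = {}

    def _key(self, query, instruction, documents):
        # \x1f (unit separator) avoids collisions between field boundaries
        blob = "\x1f".join([query, instruction or "", *documents])
        return hashlib.sha256(blob.encode()).hexdigest()

    def get(self, query, instruction, documents):
        key = self._key(query, instruction, documents)
        hit = self._store.get(key)
        if hit and self.clock() - hit[0] < self.ttl:
            return hit[1]
        self._store.pop(key, None)  # expired or missing
        return None

    def put(self, query, instruction, documents, results):
        key = self._key(query, instruction, documents)
        self._store[key] = (self.clock(), results)
```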

Pitfalls and guardrails

  • Don’t put the instruction in the documents field — it belongs with the query.
  • Overly long instructions can dilute signal — keep under 100 words.
  • Test your instruction on a few dozen examples before shipping. Vague instructions (“make it better”) perform worse than specific ones.
  • The model still returns relevance scores; higher score = better match to both query AND instruction.
  • If you see degraded performance, try prepending vs appending the instruction (some domains prefer one).
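To make the prepend-vs-append experiment cheap, isolate query construction in one helper so an A/B test only flips a flag. The function name and templates are illustrative, not a Voyage convention:

```python
def build_query(query, instruction=None, placement="prepend"):
    """Combine instruction and query; flip `placement` to test both orders."""
    if not instruction:
        return query
    if placement == "prepend":
        return f"{instruction}\n\nQuery: {query}"
    return f"Query: {query}\n\nInstruction: {instruction}"
```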

What to do next

  1. Replace your current reranker (Cohere, Voyage rerank-2, or LLM-as-reranker) with rerank-2.5
  2. Add one standing instruction per product vertical or user persona
  3. Measure nDCG@5 or human preference on your real traffic
  4. Experiment with dynamic instructions generated by a small LLM based on user session context
  5. Write a follow-up blog post once you have before/after metrics

This single change — adding a controllable reranker — often yields bigger relevance gains than upgrading embeddings or prompt engineering the final LLM.


Original Source

blog.voyageai.com
