Sarvam 30B Uncensored via Abliteration: A Technical Deep Dive
Executive Summary
- Sarvam-30B is a 30-billion-parameter dense decoder-only Transformer developed from scratch by Sarvam AI (India), trained on 16 trillion tokens with a strong emphasis on reasoning, code, mathematics, and multilingual (especially Indic) data.
- The “aoxo/sarvam-30b-uncensored” variant applies the abliteration technique — a surgical weight-space intervention that removes refusal circuitry without additional fine-tuning or RLHF reversal.
- The resulting model retains the original 30B parameter count and architecture while exhibiting dramatically reduced safety alignment, enabling unrestricted generation on topics previously blocked by the base model.
- This release highlights both the power of open-weight Indian foundation models and the growing ease with which safety alignments can be surgically excised in the open-source ecosystem.
Technical Architecture
Sarvam-30B follows a standard dense autoregressive Transformer architecture. According to Sarvam’s official release, the model was trained on 16T tokens spanning code, general web, mathematics, specialized knowledge corpora, and heavy multilingual content. The final data mixture was carefully ablated (in the classical hyper-parameter sense) to emphasize reasoning, factual grounding, and software engineering capabilities.
Key architectural details disclosed so far:
- Parameter count: 30 billion (dense, no MoE)
- Context length: 8k–16k tokens (exact final value not yet disclosed in the original blog; community reports suggest 8192–16384)
- Tokenizer: Custom BPE trained on multilingual + code corpus with strong Indic language coverage
- Positional encoding: RoPE (Rotary Position Embeddings) — standard for recent 30B-class models
- Normalization: Likely RMSNorm or DeepNorm variant (exact layer-norm configuration not disclosed)
- Activation: SwiGLU or GeGLU (Sarvam has not published the precise MLP configuration)
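Since Sarvam has not published reference code, the RoPE scheme listed above can be illustrated with a minimal sketch. The head dimension, base frequency, and pairing scheme below are generic assumptions typical of recent open models, not Sarvam's disclosed configuration:

```python
import numpy as np

def rope(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embeddings to x of shape (seq_len, head_dim).

    Pairs of dimensions are rotated by an angle that grows with position
    and shrinks with dimension index -- a generic RoPE sketch.
    """
    seq_len, head_dim = x.shape
    half = head_dim // 2
    inv_freq = base ** (-np.arange(half) / half)        # (half,)
    angles = positions[:, None] * inv_freq[None, :]     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) pair in its own 2-D plane.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(8, 64)
q_rot = rope(q, np.arange(8))
```

Because each pair is rotated, norms are preserved and query-key dot products end up depending only on relative position, which is why RoPE has become the default in this model class.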
The abliteration process applied by aoxo is a form of refusal vector ablation in weight space. The technique, popularized in the LocalLLaMA community in 2024, works as follows:
- Identify the “refusal direction” — typically a small subspace in the residual stream or specific attention heads that activate strongly on harmful prompts.
- Compute the mean activation difference between a large set of “refuse” and “comply” prompts at a chosen layer (often mid-to-late layers where safety behavior crystallizes).
- Subtract a scaled version of this refusal vector from the corresponding weight matrices (usually the attention output projection W_O, or the gate/up projections in the MLP).
- The resulting model loses the ability to reliably trigger refusal tokens while preserving general capability.
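The steps above can be sketched with synthetic activations standing in for real "refuse"/"comply" prompt runs. The shapes, the single edited matrix, and the shifted-Gaussian stand-in data are illustrative assumptions; a real abliteration run collects activations from the model itself and typically edits several matrices across layers:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64

# Stand-ins for residual-stream activations collected at one layer.
refuse_acts = rng.normal(size=(200, d_model)) + 3.0   # "refuse" prompts (shifted)
comply_acts = rng.normal(size=(200, d_model))         # "comply" prompts

# Steps 1-2: refusal direction = normalized mean activation difference.
refusal = refuse_acts.mean(axis=0) - comply_acts.mean(axis=0)
refusal /= np.linalg.norm(refusal)

# Step 3: project the refusal direction out of a weight matrix that
# writes into the residual stream (e.g. an attention output projection).
W_out = rng.normal(size=(d_model, d_model))
W_abl = W_out - np.outer(refusal, refusal) @ W_out

# The edited matrix can no longer write along the refusal direction:
# refusal @ W_abl is zero up to floating-point error.
```

The key property is that the projection removes exactly one direction from the matrix's output space, leaving every orthogonal direction (and hence most capabilities) untouched.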
Because abliteration is a zero-training intervention, the model retains the exact same parameter count, KV cache behavior, and inference characteristics as the original Sarvam-30B. This makes it drop-in compatible with existing inference stacks (vLLM, Hugging Face, Ollama, llama.cpp, etc.).
Performance Analysis
Sarvam has not yet released comprehensive public evaluation numbers for the base 30B model. The official blog focuses on training scale (16T tokens) rather than downstream benchmarks. Community testing of both the base and the uncensored variant is still nascent — the uncensored model was uploaded only days after the base model.
However, we can infer expected performance from training scale and architecture:
| Model | Parameters | Tokens (T) | MMLU (est.) | GSM8K (est.) | HumanEval (est.) | Context |
|---|---|---|---|---|---|---|
| Llama-3.1 70B | 70B | ~15T | ~86 | ~88 | ~80 | 128k |
| Qwen2.5 32B | 32B | ~18T | ~84 | ~85 | ~78 | 128k |
| Sarvam-30B (base) | 30B | 16T | 78–82* | 80–84* | 72–78* | 8–16k |
| Sarvam-30B Uncensored | 30B | 16T | ~same | ~same | ~same | same |
| Mistral-Large 2 (123B) | 123B | undisclosed | 84.0 | 85.0 | 75+ | 128k |
*Estimated ranges based on training compute parity and multilingual emphasis. Exact numbers not yet disclosed by Sarvam.
The uncensored variant is expected to show near-identical benchmark scores on standard academic evaluations because abliteration primarily affects the refusal policy, not capability. Early Reddit and Discord reports suggest the uncensored model is slightly more willing to engage in creative or “edgy” reasoning chains, which can sometimes improve performance on controversial or role-play-heavy prompts, but may degrade factual grounding on safety-sensitive topics.
Technical Implications
- Indian Open-Source Leadership: Sarvam-30B represents one of the largest dense models trained entirely from scratch outside the major US and Chinese labs. Its strong Indic language performance and code capabilities fill a critical gap in the open-source ecosystem.
- Democratization of Uncensoring: The ease of abliteration (computing the refusal vector takes only a few hours on a single H100) means any sufficiently capable open model can be turned into an "uncensored" version within days of release. This accelerates the "uncensored model zoo" phenomenon seen with Llama-3, Qwen2, and now Sarvam.
- Local Agent Capabilities: Because the model is 30B, it fits comfortably in 24–40 GB of VRAM using 4-bit or 8-bit quantization (especially with grouped-query attention, if present). Combined with the native tool-calling tendencies observed in the base model, the uncensored version is being actively used by the community as a local coding-agent backbone on RTX 4090-class hardware.
- Safety Research Signal: The speed with which safety alignment was removed underscores that current post-training alignment (primarily SFT plus RLHF/RLAIF) still relies on fragile, easily ablatable representations rather than deeply integrated world-model constraints.
Limitations and Trade-offs
- Crude Safety/Capability Trade-off Control: Abliteration is blunter than fine-tuning-based unalignment. It can introduce unintended side effects such as increased sycophancy, reduced calibration on controversial topics, or occasional incoherent refusal remnants.
- No Official Support: The aoxo/sarvam-30b-uncensored model is an unofficial community derivative. Sarvam AI has not endorsed it, and users lose any implicit warranty or downstream support.
- Multilingual Safety Gaps: The base model's strong Indic focus means the refusal circuitry may be less robust in non-English languages to begin with; abliteration amplifies this asymmetry.
- Quantization Sensitivity: Because abliteration modifies specific weight directions, aggressive quantization (2-bit, 3-bit) may disproportionately damage the remaining capability compared to the base model.
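The quantization concern can be demonstrated directly: rounding error has no reason to stay orthogonal to the ablated direction, so quantizing an orthogonalized matrix reintroduces a small component along it. The snippet below is a toy illustration using naive symmetric uniform quantization, not GPTQ or AWQ:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64

# An "abliterated" matrix: direction r has been projected out exactly.
r = rng.normal(size=d)
r /= np.linalg.norm(r)
W = rng.normal(size=(d, d))
W_abl = W - np.outer(r, r) @ W          # r @ W_abl == 0 up to float error

def quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Naive symmetric round-to-nearest quantization (illustrative only)."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

for bits in (8, 4, 2):
    leak = np.abs(r @ quantize(W_abl, bits)).max()
    print(f"{bits}-bit: max leakage along ablated direction = {leak:.4f}")
```

The leakage grows as bit width shrinks, which is consistent with the claim that very aggressive 2-bit or 3-bit quantization interacts worse with abliterated weights than with the base model.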
Expert Perspective
The Sarvam-30B uncensored release is less significant for its raw capability leap than for what it signals about the current state of open-source frontier AI. A company in India was able to train a competitive 30B model on 16T tokens, open-source it, and within a week the community had produced a fully ablated version that removes corporate safety policy.
This pattern — rapid open release followed by near-instant community unalignment — is now the default. It suggests that for the foreseeable future, any sufficiently capable open-weight model will effectively exist in both aligned and unaligned forms. The real technical challenge is shifting safety mechanisms from easily ablatable linear directions in activation space to more robust, distributed, or cryptographic-style constraints that survive weight-space surgery.
For ML engineers, the model is a valuable addition to the 30–40B class, particularly for Indic language tasks and code. The abliteration technique itself is a useful tool in the interpretability/alignment researcher’s toolkit, demonstrating once again how shallow current safety representations remain.
Technical FAQ
How does abliteration differ from standard unalignment fine-tuning?
Abliteration is a training-free intervention that subtracts a computed refusal direction from weight matrices. Standard unalignment uses continued SFT or preference optimization on “uncensored” datasets. Abliteration is faster and preserves original capabilities more faithfully but is less controllable and can leave residual artifacts.
Is the uncensored model backwards-compatible with the original Sarvam-30B inference setup?
Yes. Because it has identical architecture, parameter count, and tokenizer, the uncensored weights are a drop-in replacement. You can simply swap the model checkpoint in vLLM, Hugging Face AutoModelForCausalLM, Ollama, or llama.cpp without changing any configuration.
What VRAM is required to run Sarvam-30B Uncensored at usable speed?
- FP16: ~60 GB (not practical on consumer hardware)
- 4-bit GPTQ/AWQ: ~18–22 GB → comfortably fits on RTX 4090 (24 GB)
- 8-bit: ~32 GB → requires dual 4090s or an A6000-class card
Community reports confirm good performance at 4-bit on a single 4090 at 8k context with batch sizes of 1–4.
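The figures above follow from simple arithmetic: raw weight memory is parameter count times bits per parameter, and real usage adds a few GB for quantization scales, the KV cache, and activations. A minimal sketch (the overhead amounts are rough assumptions, not measurements):

```python
def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Raw weight memory in GB: params * bits / 8 bytes.

    Real VRAM usage adds a few GB on top for quantization
    scales/zero-points, the KV cache, and activation buffers.
    """
    return n_params * bits_per_param / 8 / 1e9

fp16 = weight_gb(30e9, 16)   # 60.0 GB raw -> the ~60 GB FP16 figure
int8 = weight_gb(30e9, 8)    # 30.0 GB raw -> ~32 GB with overheads
int4 = weight_gb(30e9, 4)    # 15.0 GB raw -> ~18-22 GB with scales + KV cache
```

This is also why the 24 GB of an RTX 4090 is the practical floor for this model: only the 4-bit raw footprint plus overheads fits under it.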
Does abliteration preserve the model’s multilingual and code capabilities?
Early testing suggests yes. The base Sarvam-30B was heavily optimized for Indic languages and code; the refusal vector appears largely orthogonal to those capabilities. However, tasks that require high factual caution (medical, legal, or security-related code) may see degraded reliability.
References
- Sarvam AI Official Blog: Open-Sourcing Sarvam 30B and 105B
- aoxo/sarvam-30b-uncensored Hugging Face repository
- LocalLLaMA community discussions on refusal vector ablation
- Original Reddit thread: r/artificial — “Sarvam 30B Uncensored via Abliteration”
- r/LocalLLaMA discussion on Sarvam 30B/105B
- Times of India coverage of the Sarvam open-source release

