Run NVIDIA Nemotron 3 Nano as a fully managed serverless model on Amazon Bedrock
Breaking News · Mar 9, 2026 · 6 min read


Featured: Amazon, NVIDIA

NVIDIA Nemotron 3 Nano Now Available as Serverless Model on Amazon Bedrock

Key Facts

  • What: NVIDIA Nemotron 3 Nano, a 30B-parameter Mixture-of-Experts (MoE) small language model with only 3B active parameters, is now available as a fully managed, serverless model on Amazon Bedrock.
  • Architecture: Hybrid Transformer-Mamba with MoE layers, supporting up to 256K token context length.
  • Performance: Leads open models (≤30B MoE) on benchmarks including SWE Bench Verified, AIME 2025, Arena Hard v2, IFBench, Humanity's Last Exam, and RULER, particularly excelling in coding, reasoning, math, and tool calling.
  • Availability: Fully open-weight model accessible through Amazon Bedrock’s serverless inference, powered by the Project Mantle distributed inference engine.
  • Benefits: Eliminates infrastructure management while providing high efficiency, low latency, and OpenAI API compatibility.

Amazon Web Services and NVIDIA have expanded their collaboration with the launch of NVIDIA Nemotron 3 Nano as a fully managed and serverless model on Amazon Bedrock. The addition enables developers and enterprises to access the open-weight small language model (SLM) without managing underlying infrastructure, building on previous Bedrock support for earlier Nemotron 2 Nano variants.

The move reflects growing demand for efficient, specialized models suitable for agentic AI workloads. Nemotron 3 Nano’s hybrid architecture and MoE design deliver strong reasoning capabilities while maintaining high throughput and low token consumption, making it attractive for production applications that require fast, accurate inference at scale.

Model Architecture and Technical Details

NVIDIA Nemotron 3 Nano is a 30 billion parameter model in which only 3 billion parameters are active at any given time, thanks to its Mixture-of-Experts routing. The architecture combines Transformer layers for precise attention mechanisms, Mamba layers for efficient long-range sequence modeling with low memory overhead, and MoE layers that activate only the relevant experts for each token.

This hybrid Transformer-Mamba-MoE backbone is specifically engineered to balance efficiency, reasoning accuracy, and scalability. According to the announcement, the design is particularly well-suited for agent clusters running many concurrent, lightweight workflows. The model supports a context length of up to 256K tokens, enabling it to handle complex, long-document tasks common in enterprise environments.
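To make the sparse-activation idea concrete, the sketch below shows a toy top-k Mixture-of-Experts layer in which a router selects only a few experts per token, so most parameters stay idle on every forward pass. This is an illustrative example, not the actual Nemotron 3 Nano implementation; the layer sizes, expert count, and top-k value are arbitrary placeholders.

```python
# Toy top-k MoE routing: each token is processed by only k of the available
# experts, which is why a 30B-parameter MoE model can activate ~3B parameters
# per token. Illustrative only -- not NVIDIA's implementation.
import torch
import torch.nn as nn


class TopKMoE(nn.Module):
    def __init__(self, d_model: int = 64, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(num_experts)
        ])
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.router(x)                            # (num_tokens, num_experts)
        weights, expert_idx = scores.topk(self.top_k, -1)  # keep only the k best experts
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


# Example: 10 tokens pass through the layer, but each token touches only 2 of 8 experts.
layer = TopKMoE()
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```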

The model accepts text input and produces text output. It includes a “token budget” feature designed to improve accuracy while preventing overthinking, which helps control inference costs and latency in agentic systems.

Benchmark Leadership and Efficiency Gains

Nemotron 3 Nano demonstrates leading performance among open models with 30 billion or fewer MoE parameters. It excels on coding and reasoning benchmarks including SWE Bench Verified, AIME 2025, Arena Hard v2, IFBench, Humanity's Last Exam, and RULER. The model also scores highly on instruction following, tool calling, and general chat capabilities.

Independent evaluation from Artificial Analysis places Nemotron 3 Nano in the most favorable quadrant on the Openness Index versus Intelligence Index, highlighting both its transparency and capability. The model scored 52 on the Intelligence versus Output Speed Index, a significant improvement over its predecessor, Nemotron 2 Nano.

This combination of high accuracy and efficiency is increasingly important as agentic AI systems drive higher token consumption. The model’s ability to reach correct answers quickly while using fewer tokens addresses a critical need in production deployments.

Industry Use Cases and Customer Adoption

The announcement highlights several practical applications across industries. In finance, organizations can use Nemotron 3 Nano to accelerate loan processing through data extraction, income pattern analysis, fraud detection, and risk assessment. Cybersecurity teams can leverage it for vulnerability triage, malware analysis, and proactive threat hunting.

Software development teams benefit from code summarization and related assistance capabilities, while retailers can apply the model to inventory optimization and real-time personalized recommendations.

BridgeWise, a financial services company, has already adopted Nemotron models on Amazon Bedrock to enhance internal agentic workflows that test and validate model outputs, according to supporting coverage of the launch.

Powered by Project Mantle Inference Engine

The availability of Nemotron 3 Nano on Bedrock is powered by Project Mantle, a new distributed inference engine developed for large-scale machine learning model serving on Amazon Bedrock. Project Mantle streamlines the onboarding of new models, delivers high-performance and reliable serverless inference, and includes sophisticated quality-of-service controls.

The engine also provides automated capacity management, higher default customer quotas through unified inference pools, and out-of-the-box compatibility with OpenAI API specifications. This compatibility significantly reduces the friction for developers already familiar with OpenAI’s ecosystem.
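As a rough illustration of what that compatibility could look like in practice, the snippet below points the standard OpenAI Python client at a Bedrock runtime endpoint. The base URL, API-key authentication, and model identifier shown here are assumptions for illustration; the actual values should come from the Amazon Bedrock documentation and model catalog.

```python
# Hedged sketch: using the OpenAI Python client against a Bedrock
# OpenAI-compatible endpoint. Endpoint URL, auth, and model ID are assumed.
from openai import OpenAI

client = OpenAI(
    base_url="https://bedrock-runtime.us-east-1.amazonaws.com/openai/v1",  # assumed endpoint
    api_key="<your-bedrock-api-key>",                                       # assumed auth scheme
)

response = client.chat.completions.create(
    model="nvidia.nemotron-3-nano",  # hypothetical model ID
    messages=[{"role": "user", "content": "Summarize the key risks in this loan file: ..."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```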

Impact on Developers and Enterprise AI Adoption

For developers and organizations, the serverless availability of Nemotron 3 Nano removes traditional barriers to using advanced open models. Users gain access to Bedrock’s extensive features including guardrails, knowledge bases, agents, and model evaluation tools without needing to provision or manage GPUs.
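For teams that prefer Bedrock's native SDK path, a minimal sketch using the boto3 Converse API might look like the following. The model identifier is hypothetical and should be replaced with the Nemotron 3 Nano ID listed in the Bedrock model catalog for your region.

```python
# Hedged sketch: invoking a serverless Bedrock model via the boto3 Converse API.
# The modelId below is a hypothetical placeholder, not a confirmed identifier.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="nvidia.nemotron-3-nano-30b-v1:0",  # hypothetical model ID
    messages=[
        {
            "role": "user",
            "content": [{"text": "Extract the income sources listed in this statement: ..."}],
        }
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```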

The fully open nature of the model — including open weights, datasets, and training recipes — provides transparency that many enterprises require for auditing and governance. This openness, combined with Bedrock’s enterprise-grade security and compliance features, makes Nemotron 3 Nano suitable for regulated industries.

The launch strengthens the competitive position of both companies. NVIDIA continues to expand distribution of its open Nemotron family across major cloud providers, while AWS further enriches the model catalog on its flagship generative AI service.

What’s Next

The announcement follows NVIDIA’s broader release of the Nemotron 3 family and earlier Bedrock availability of Nemotron 2 Nano models. AWS has also made Nemotron 3 Nano 30B available in Amazon SageMaker JumpStart, providing customers with multiple deployment options across the AWS AI/ML portfolio.

Organizations interested in testing the model can access it immediately through Amazon Bedrock. The announcement post, co-written by the AWS and NVIDIA teams, includes technical guidance for getting started on generative AI applications with Nemotron 3 Nano.

As demand for efficient agentic AI systems continues to grow, the combination of Nemotron 3 Nano’s architectural innovations and Bedrock’s serverless delivery is expected to accelerate development of specialized AI solutions across industries.

Sources

Original Source

aws.amazon.com
