How to Run NVIDIA Nemotron 3 Nano as a Fully Managed Serverless Model on Amazon Bedrock
TL;DR
- Access NVIDIA Nemotron 3 Nano (30B total, 3B active parameters) instantly through Amazon Bedrock’s serverless inference—no infrastructure to manage.
- Invoke the model using the Bedrock Converse API or OpenAI-compatible endpoint for text generation, coding, reasoning, and agentic workflows.
- Start testing in minutes via the AWS Console, SDK, or CLI by requesting model access and using your existing Bedrock IAM permissions.
Prerequisites
Before you begin, ensure you have the following:
- An active AWS account with billing enabled.
- IAM permissions to use Amazon Bedrock (specifically bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream).
- The AWS CLI v2 installed and configured, or the latest AWS SDK for Python (boto3), JavaScript, or your preferred language.
- Model access to NVIDIA Nemotron 3 Nano—request it in the Bedrock console (it is now generally available in supported regions).
- Basic familiarity with large language model APIs.
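A minimal identity-based policy granting the two invoke permissions might look like the following sketch; the Resource ARN pattern here is illustrative, so scope it down to the exact model ARN shown in your console:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:us-east-1::foundation-model/*"
    }
  ]
}
```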
Step 1: Request Model Access in the Amazon Bedrock Console
- Sign in to the AWS Management Console and navigate to Amazon Bedrock.
- In the left navigation pane, choose Model access.
- Click Manage model access and locate NVIDIA Nemotron 3 Nano.
- Select the model and request access. Once approved (usually instant for GA models), the status changes to Access granted.
Tip: If you don’t see the model, verify you are in a supported region (check the official Bedrock documentation for the latest availability).
Step 2: Test the Model Directly in the Bedrock Playground
The fastest way to experiment is through the console playground:
- In the Bedrock console, go to Chat or Text playground.
- From the model dropdown, select NVIDIA Nemotron 3 Nano.
- Enter a prompt such as:
Summarize the following code and suggest three improvements:
def calculate_loan_payment(principal, rate, periods):
    return principal * rate / (1 - (1 + rate) ** -periods)
- Adjust inference parameters:
- Temperature: 0.7 (for balanced creativity)
- Top-p: 0.9
- Max tokens: 2048
- Click Run and review the response.
This step requires zero code and lets you validate the model’s coding, reasoning, and instruction-following abilities immediately.
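As a quick sanity check on the sample prompt, the amortization formula it contains can be run locally before you ask the model to critique it (the loan figures below are illustrative):

```python
def calculate_loan_payment(principal, rate, periods):
    # Standard amortization formula: fixed payment per period
    # for a loan at a constant per-period interest rate.
    return principal * rate / (1 - (1 + rate) ** -periods)

# $10,000 at 1% per month over 12 months
payment = calculate_loan_payment(10_000, 0.01, 12)
print(round(payment, 2))  # → 888.49
```

Having the expected output on hand makes it easy to judge whether the model's suggested improvements preserve correctness.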
Step 3: Invoke Nemotron 3 Nano Using the AWS SDK (Python Example)
Use the Bedrock Runtime client with the Converse API for the best experience.
import boto3

bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')  # change region as needed

model_id = "nvidia-nemotron-3-nano"  # exact model identifier; check the console for the full ARN if needed

def invoke_nemotron(prompt, max_tokens=2048, temperature=0.7):
    response = bedrock_runtime.converse(
        modelId=model_id,
        messages=[{
            "role": "user",
            "content": [{"text": prompt}]
        }],
        inferenceConfig={
            "maxTokens": max_tokens,
            "temperature": temperature,
            "topP": 0.9
        }
    )
    return response['output']['message']['content'][0]['text']

prompt = "Explain how the hybrid Transformer-Mamba-MoE architecture in Nemotron 3 Nano improves efficiency for agentic workflows."
result = invoke_nemotron(prompt)
print(result)
Note: The exact model identifier string can be found in the Bedrock console under model details. Some regions may use a longer ARN format.
Step 4: Use the OpenAI-Compatible API Endpoint (Optional)
Project Mantle powers Nemotron 3 Nano on Bedrock and provides out-of-the-box OpenAI API compatibility. This is ideal if you have existing code using OpenAI client libraries.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_BEDROCK_API_KEY",  # use AWS SigV4 auth or a Bedrock-specific setup
    base_url="https://bedrock-runtime.amazonaws.com/v1"  # adjust per region/docs
)

response = client.chat.completions.create(
    model="nvidia-nemotron-3-nano",
    messages=[{"role": "user", "content": "Write a Python function to detect anomalies in financial transactions."}],
    max_tokens=1024,
    temperature=0.6
)

print(response.choices[0].message.content)
Consult the latest Bedrock OpenAI compatibility guide for authentication setup using AWS SigV4.
Step 5: Build an Agentic Workflow with Tool Calling
Nemotron 3 Nano excels at tool calling and instruction following. Here’s a basic example structure:
tool_prompt = """
You are a financial assistant. Use the available tools when necessary.
Available tools:
- get_account_balance(account_id)
- detect_fraud(transaction_data)
Query: Analyze this transaction for fraud risk: amount=12500, account=ACC-3921, location=unusual.
"""
response = invoke_nemotron(tool_prompt, temperature=0.3)
print(response)
The model’s strong performance on IFBench and tool-calling benchmarks makes it suitable for building lightweight agents.
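Instead of describing tools inside the prompt text, the Converse API also accepts structured tool definitions through its toolConfig parameter. The sketch below builds one such definition; the tool name and input schema are hypothetical, so adapt them to your own functions:

```python
def build_tool_config():
    # Structured tool definition in the Bedrock Converse API's
    # toolConfig shape; the tool name and schema here are hypothetical.
    return {
        "tools": [
            {
                "toolSpec": {
                    "name": "detect_fraud",
                    "description": "Score a transaction for fraud risk.",
                    "inputSchema": {
                        "json": {
                            "type": "object",
                            "properties": {
                                "amount": {"type": "number"},
                                "account_id": {"type": "string"},
                                "location": {"type": "string"}
                            },
                            "required": ["amount", "account_id"]
                        }
                    }
                }
            }
        ]
    }

# Passed alongside the messages, e.g.:
# bedrock_runtime.converse(modelId=model_id, messages=[...], toolConfig=build_tool_config())
print(build_tool_config()["tools"][0]["toolSpec"]["name"])  # → detect_fraud
```

When the model decides to call a tool, the response contains a toolUse content block that your application executes before returning the result in a follow-up message.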
Tips and Best Practices
- Token efficiency: Nemotron 3 Nano uses a 256K context window. Take advantage of long-context capabilities for code repositories, long documents, or multi-turn agent conversations.
- Cost control: Because it activates only 3B parameters per token via MoE, it typically delivers lower latency and better throughput than dense models of similar quality.
- Prompt engineering: Start with clear, structured prompts. The model performs exceptionally well on coding, math, and reasoning tasks when given step-by-step instructions.
- Streaming responses: Use converse_stream for real-time UI feedback in web applications.
- Monitoring: Enable Amazon CloudWatch logging for Bedrock to track latency, token usage, and invocation patterns.
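The streaming call takes the same request shape as converse. A minimal sketch follows; the helper only assembles the request arguments (so it runs without AWS credentials), while the actual network call and event loop are shown in comments:

```python
def build_stream_request(model_id, prompt, max_tokens=2048, temperature=0.7):
    # Assemble keyword arguments for bedrock_runtime.converse_stream;
    # same shape as the non-streaming converse call.
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": temperature},
    }

# With a boto3 'bedrock-runtime' client, text arrives as contentBlockDelta events:
# stream = bedrock_runtime.converse_stream(**build_stream_request(model_id, "Hello"))
# for event in stream["stream"]:
#     if "contentBlockDelta" in event:
#         print(event["contentBlockDelta"]["delta"]["text"], end="", flush=True)
```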
Common Issues
Why am I getting "AccessDeniedException"?
Ensure your IAM role or user has the bedrock:InvokeModel permission for the specific model. Update your policy and retry.
Model identifier not found error
Double-check the exact model ID in the Bedrock console. Model names sometimes include version suffixes or full ARNs.
High latency on first call
Serverless models may experience a brief cold-start delay on the very first invocation. Subsequent calls are much faster due to Bedrock’s managed scaling.
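If cold starts or throttling surface as transient errors in your application, a simple retry with exponential backoff smooths them over. A minimal sketch; the exception types worth retrying depend on your SDK setup (for boto3, ClientError with a throttling error code is the usual candidate):

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0, retriable=(Exception,)):
    # Call fn, retrying on the given exception types with exponential
    # backoff: waits base_delay, then 2*base_delay, and so on.
    for attempt in range(attempts):
        try:
            return fn()
        except retriable:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the last error
            time.sleep(base_delay * (2 ** attempt))

# e.g. result = with_retries(lambda: invoke_nemotron("Hello"), retriable=(RuntimeError,))
```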
Poor performance on simple tasks
Try lowering temperature to 0.3–0.5 for factual or coding tasks. Nemotron 3 Nano’s “think fast” design works best when not forced to overthink.
Next Steps
After successfully running your first inferences:
- Explore advanced agentic patterns using Nemotron 3 Nano’s strong tool-calling and reasoning capabilities.
- Integrate the model into production applications such as automated loan analysis, vulnerability triage, or code assistance tools.
- Compare performance against other models available in Bedrock using the same evaluation prompts.
- Experiment with the 256K context length for document-heavy or repository-wide analysis tasks.
- Monitor the NVIDIA Nemotron family for future variants and updates on Bedrock.

