Vibe Coding Guide · Mar 10, 2026 · 6 min read

How to Fine-Tune Llama with Oumi on EC2 and Deploy to Amazon Bedrock Custom Model Import

Why this matters for builders
Oumi + Amazon Bedrock Custom Model Import lets you fine-tune an open-source Llama model on EC2 (optionally generating synthetic data with Oumi), store the artifacts in S3, and import the resulting model into Bedrock for fully managed, pay-per-use inference, with no serving infrastructure to operate.

This workflow removes the biggest friction for teams that want a custom model but don’t want to run their own vLLM/GPU fleet in production. You keep full control of the training data and fine-tuning recipe while getting Bedrock’s enterprise-grade scaling, logging, and security model.

When to use it

  • You need domain-specific behavior (support tickets, legal clauses, internal knowledge) that base Llama doesn’t have
  • You want to stay within the AWS ecosystem for compliance and billing
  • You already have an EC2 training budget but don’t want to operate SageMaker or self-hosted endpoints
  • You want to experiment quickly with synthetic data generation inside the same toolkit
  • You plan to serve the model through Bedrock’s existing InvokeModel API so downstream apps require zero changes

The full process

1. Define the goal (30 minutes)

Write a one-page spec before touching any code.

Good spec example:

Goal: Create a Llama-3.1-8B model that answers customer support questions for our SaaS product in the tone of our knowledge base.

Success criteria:
- Fine-tune on 4,000 high-quality examples (mix of real tickets + synthetic)
- Achieve < 8% hallucination rate on 200 held-out questions
- Deploy to Bedrock us-east-1 with < 400 ms p50 latency at 10 concurrent users
- Total training cost < $180 on g5.12xlarge

Turn this into a prompt you can give an AI coding assistant (Claude, Cursor, etc.):

You are an MLOps engineer. Create a complete project structure and step-by-step README for fine-tuning Llama-3.1-8B with Oumi on EC2, generating synthetic data with Oumi if needed, uploading to S3, and importing into Amazon Bedrock Custom Model Import. Include cost estimation, required IAM roles, and validation steps.
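The cost line in the spec is easy to sanity-check before launching anything. A back-of-envelope sketch (the g5.12xlarge hourly rate below is an assumed on-demand figure; check current EC2 pricing for your region):

```python
# Back-of-envelope training budget check.
# HOURLY_RATE_USD is an *assumed* on-demand rate for g5.12xlarge; verify it.
HOURLY_RATE_USD = 5.67
BUDGET_USD = 180.0  # from the spec above

def max_training_hours(budget: float, rate: float) -> float:
    """Hours of instance time the budget buys at the given hourly rate."""
    return budget / rate

hours = max_training_hours(BUDGET_USD, HOURLY_RATE_USD)
print(f"Budget covers about {hours:.1f} instance-hours on g5.12xlarge")
```

If two epochs over 4,000 examples fit comfortably inside that window, the budget line is realistic; if not, revise the spec before writing any pipeline code.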

2. Scaffold the project

Use the AI coding tool to generate the skeleton. Typical layout:

oumi-bedrock-llama/
├── oumi_config.yaml          # main Oumi training config
├── data/
│   ├── raw_tickets.jsonl
│   ├── synthetic/
│   └── train_valid_split.py
├── scripts/
│   ├── 01_generate_synthetic.py
│   ├── 02_train.sh
│   ├── 03_package_for_bedrock.py
│   └── 04_import_to_bedrock.py
├── terraform/                # IAM, bucket, EC2 launch template
├── requirements.txt
└── README.md
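As a concrete starting point, the data/train_valid_split.py placeholder might look like this sketch: a deterministic split of one JSONL file into train and validation sets (the 95/5 ratio and fixed seed are assumptions to adapt):

```python
# data/train_valid_split.py -- deterministic train/validation split of a JSONL file.
import json
import random
from pathlib import Path

def split_jsonl(src: Path, train_out: Path, valid_out: Path,
                valid_fraction: float = 0.05, seed: int = 42) -> tuple[int, int]:
    """Shuffle records with a fixed seed and write train/valid JSONL files."""
    records = [json.loads(line) for line in src.read_text().splitlines() if line.strip()]
    random.Random(seed).shuffle(records)  # fixed seed => reproducible split
    n_valid = max(1, int(len(records) * valid_fraction))
    valid, train = records[:n_valid], records[n_valid:]
    train_out.write_text("\n".join(json.dumps(r) for r in train) + "\n")
    valid_out.write_text("\n".join(json.dumps(r) for r in valid) + "\n")
    return len(train), len(valid)
```

A fixed seed matters here: if the split changes between runs, your validation metrics stop being comparable across experiments.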

Prompt template for scaffolding:

Generate a production-ready folder structure and all placeholder files for the Oumi + Bedrock workflow described in the AWS blog "Accelerate custom LLM deployment: Fine-tune with Oumi and deploy to Amazon Bedrock".

3. Implement the data & training pipeline

Oumi uses a declarative YAML config. Here’s a minimal starter you can copy and adapt:

model:
  model_name: "meta-llama/Llama-3.1-8B"
  model_max_length: 8192
  torch_dtype: "bf16"

data:
  train:
    - path: "data/train.jsonl"
      dataset_type: "jsonl"
  validation:
    - path: "data/valid.jsonl"
      dataset_type: "jsonl"

training:
  trainer_type: "trl_sft"
  num_train_epochs: 2
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 8
  learning_rate: 2e-5
  output_dir: "./output"
  save_strategy: "epoch"
  logging_steps: 10

  # FSDP recommended on multi-GPU EC2 instances (add EFA for multi-node);
  # exact keys vary by Oumi version, so check the current docs
  fsdp: "full_shard"
  fsdp_config:
    fsdp_activation_checkpointing: true
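The JSONL files referenced under data: hold one chat example per line. A minimal record in the common messages format (the exact schema depends on your Oumi dataset settings, so treat this as illustrative):

```python
# One training record in the common chat "messages" JSONL format.
# The exact schema Oumi expects depends on your dataset configuration.
import json

line = json.dumps({
    "messages": [
        {"role": "user", "content": "How do I reset my API key?"},
        {"role": "assistant",
         "content": "Go to Settings > API Keys and click Regenerate."},
    ]
})

record = json.loads(line)  # round-trips cleanly: one JSON object per line
print(line)
```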

Prompt for AI assistant:

Write an Oumi training config for Llama-3.1-8B-Instruct using SFT on a JSONL chat dataset. Optimize for g5.12xlarge with 4x A10G GPUs. Include synthetic data generation script using Oumi’s built-in capabilities.

Run the training script on EC2:

# scripts/02_train.sh
oumi train -c oumi_config.yaml

4. Package and upload artifacts

After training, Bedrock Custom Model Import expects the model in Hugging Face format: safetensors weight shards, config.json, and the tokenizer files, all at the root of a single S3 prefix.

# scripts/03_package_for_bedrock.py
import shutil
from pathlib import Path

output_dir = Path("./output")
bedrock_dir = Path("./bedrock_model")

# Copy the final checkpoint (safetensors shards, config.json, tokenizer
# files) into a flat directory: Bedrock Custom Model Import reads
# everything from the root of the S3 prefix.
bedrock_dir.mkdir(exist_ok=True)
shutil.copytree(output_dir / "final", bedrock_dir, dirs_exist_ok=True)

Then upload the packaged directory:

# Upload
aws s3 cp --recursive ./bedrock_model s3://your-bucket/models/my-llama-v1/
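Before uploading, a small preflight check can catch layout problems that would otherwise fail the import much later. This sketch assumes the typical Hugging Face export file names:

```python
# Preflight: verify the packaged directory has the files Bedrock
# Custom Model Import typically requires (Hugging Face format).
from pathlib import Path

def check_bedrock_layout(model_dir: Path) -> list[str]:
    """Return a list of missing required pieces (empty list means OK)."""
    missing = []
    if not list(model_dir.glob("*.safetensors")):
        missing.append("*.safetensors weight shards")
    if not (model_dir / "config.json").exists():
        missing.append("config.json")
    if not ((model_dir / "tokenizer.json").exists()
            or (model_dir / "tokenizer.model").exists()):
        missing.append("tokenizer.json or tokenizer.model")
    return missing
```

Running this against ./bedrock_model before the s3 cp is much cheaper than waiting 30+ minutes for an import job to reject the prefix.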

5. Import to Bedrock

Use the AWS SDK or console to create the custom model import job.

# scripts/04_import_to_bedrock.py
import boto3

bedrock = boto3.client("bedrock")

response = bedrock.create_model_import_job(
    jobName="llama-oumi-v1",
    importedModelName="my-saas-support-llama",
    roleArn="arn:aws:iam::...:role/BedrockCustomModelRole",
    modelDataSource={
        "s3DataSource": {
            "s3Uri": "s3://your-bucket/models/my-llama-v1/"
        }
    },
)

# Poll bedrock.get_model_import_job(jobIdentifier=response["jobArn"])
# until the status reaches "Completed".
print(response["jobArn"])

After the import finishes (usually 30–90 minutes), you get a model ARN you can call with InvokeModel via the bedrock-runtime client, exactly like any other Bedrock model.

6. Validate before shipping

Run this checklist:

  • Run 200 held-out examples and calculate exact match / ROUGE / hallucination rate
  • Compare latency and cost vs base Llama on Bedrock
  • Test with realistic concurrent load (use Locust or Artillery)
  • Verify IAM least-privilege and S3 bucket policies
  • Confirm the model appears in the Bedrock model catalog with correct name
  • Add monitoring (CloudWatch invocation metrics + custom guardrails)
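The first checklist item is easy to script. A minimal exact-match harness, where invoke is a placeholder you would wire to bedrock-runtime's InvokeModel with your imported model ARN:

```python
# Minimal exact-match evaluation over held-out examples.
# `invoke` is a placeholder: in practice, wire it to bedrock-runtime's
# InvokeModel with your imported model's ARN.
from typing import Callable

def exact_match_rate(examples: list[dict], invoke: Callable[[str], str]) -> float:
    """Fraction of examples where the model output matches the reference exactly."""
    if not examples:
        return 0.0
    hits = sum(
        1 for ex in examples
        if invoke(ex["question"]).strip() == ex["answer"].strip()
    )
    return hits / len(examples)
```

Exact match is a floor, not a ceiling: supplement it with ROUGE and a hallucination check as the checklist says, since support answers can be correct without matching verbatim.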

Pitfalls and guardrails

### What if the Oumi training job fails with out-of-memory?
Reduce per_device_train_batch_size to 1 and raise gradient_accumulation_steps to keep the effective batch size. Use bf16 and enable activation checkpointing. If it still fails, switch to a parameter-efficient method such as LoRA, or stay on Llama-3.1-8B; only move up to 70B if you have p4d/p5 instances.

### What if the Bedrock import job is rejected?
Bedrock is strict about the folder layout. The root of your S3 prefix must contain the safetensors weight shards (model-00001-of-0000X.safetensors, …), config.json, and the tokenizer files (tokenizer.json or tokenizer.model, plus tokenizer_config.json). Double-check the exact layout in the official docs.

### What if synthetic data quality is poor?
Use Oumi’s data generation with a strong prompt template and a small amount of seed real examples. Always include a human review step for at least 10% of synthetic data.
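One way to make that review step repeatable is to pick the review subset deterministically by content hash rather than by random sampling, so reviewers always see the same records across runs. This scheme is a suggestion, not an Oumi feature:

```python
# Deterministically select ~10% of synthetic records for human review.
# Hash-based selection is stable across runs, unlike random sampling.
import hashlib
import json

def needs_review(record: dict, fraction: float = 0.10) -> bool:
    """True for a stable ~`fraction` subset of records, keyed by content hash."""
    payload = json.dumps(record, sort_keys=True).encode()
    digest = hashlib.sha256(payload).digest()
    return digest[0] / 256 < fraction  # first hash byte as a uniform [0,1) draw
```

Because selection depends only on the record's content, re-running the synthetic pipeline never silently swaps out the reviewed sample.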

### What if latency is higher than expected?
Custom Model Import currently uses a fixed inference configuration. You cannot change tensor parallel or quantization at import time. Test with the exact concurrency you need before going live.

What to do next

  1. Pick one narrow vertical (support, sales, legal) and ship the first version this week.
  2. Instrument Bedrock invocation logs and start collecting real user feedback.
  3. Build a small evaluation harness that runs nightly against new tickets.
  4. Iterate: add preference optimization (DPO) in Oumi on the collected feedback data.
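For step 4, the DPO stage reuses Oumi's declarative style. A hedged sketch of the config change (key names vary across Oumi versions; check the current docs before using):

```yaml
# Sketch of a DPO follow-up config -- key names may differ in your Oumi version.
model:
  model_name: "./output/final"           # start from the SFT checkpoint

data:
  train:
    - path: "data/preferences.jsonl"     # chosen/rejected pairs from user feedback
      dataset_type: "jsonl"

training:
  trainer_type: "trl_dpo"
  num_train_epochs: 1
  learning_rate: 5e-7                    # DPO typically wants a much lower LR than SFT
  output_dir: "./output_dpo"
```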

You now have a repeatable, mostly-automated path from raw data → custom model running on managed Bedrock.

Sources

Original source: aws.amazon.com
