✅ Production Validated · S3 Storage · LoRA Adapters

Checkpoint Storage & Usage

Where trained LoRA adapters live on S3, how to download and load them for inference, and how to hot-swap adapters at runtime without reloading the base model.

📄 10 min read 🎯 All Levels 📅 March 2026

Overview

Every training run produces a LoRA adapter checkpoint — a compact ~50 MB file containing the low-rank weight deltas. These adapters are stored on AWS S3 and can be loaded at inference time on top of the frozen ACE-Step 1.5 base model. The system supports:

  • Multiple concurrent adapters — Different styles share one base model instance
  • Hot-swapping — Switch adapters without GPU restart or model reload
  • Versioned storage — S3 versioning preserves all checkpoint history
  • Serverless loading — Auto-download from S3 on container startup

Production Adapter

✅ Current Production Checkpoint

Adapter: multi-style-gen-c
Preset: C (rank=64, alpha=192, LR=5e-5, 100 epochs)
Validation: 11/11 WTA tests passed
Dataset: ~100 multi-genre tracks
Trained on: Lambda Cloud A100 40GB

Adapter Contents

File                       Size    Description
adapter_model.safetensors  ~50 MB  LoRA weight deltas for all 48 DiT layers
adapter_config.json        1 KB    Rank, alpha, target modules, scaling
training_args.json         2 KB    Full training hyperparameters for reproducibility
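The adapter_config.json fields can be inspected directly once an adapter is downloaded. A minimal sketch, assuming the standard PEFT field names (`r`, `lora_alpha`) and the directory layout shown in this document:

```python
import json
from pathlib import Path

def load_adapter_config(adapter_dir: str) -> dict:
    """Read a PEFT adapter_config.json and compute its effective scale.

    Assumes the standard PEFT schema; `scaling` is derived, not stored.
    """
    cfg = json.loads((Path(adapter_dir).expanduser() / "adapter_config.json").read_text())
    cfg["scaling"] = cfg["lora_alpha"] / cfg["r"]  # LoRA applies alpha/r to B @ A
    return cfg
```

For the production preset (rank=64, alpha=192) the effective scale works out to 192/64 = 3.0.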

S3 Locations

s3://lumina-data-foldartists/
├── models/
│   └── ace-step-1.5/                # Base model (frozen)
│       ├── model.safetensors
│       └── config.json
├── lora/
│   ├── multi-style-gen-c/           # ✅ Production adapter
│   │   ├── adapter_model.safetensors
│   │   ├── adapter_config.json
│   │   └── training_args.json
│   ├── loo-subsets/                 # LOO validation adapters
│   │   ├── loo-blues/
│   │   ├── loo-classical/
│   │   ├── loo-country/
│   │   └── ...                      # One per GTZAN genre
│   └── experimental/                # Work-in-progress adapters
└── datasets/
    ├── multi-style-hf/              # Production HF dataset
    └── gtzan-hf/                    # GTZAN validation dataset

💡 Versioning Enabled

S3 versioning is active on the lora/ prefix. Every overwrite creates a new version. Use aws s3api list-object-versions to retrieve previous checkpoints if needed.

Download & Load

Step 1: Download from S3

# Download the production adapter
aws s3 sync \
    s3://lumina-data-foldartists/lora/multi-style-gen-c/ \
    ~/adapters/multi-style-gen-c/

# Verify files
ls -la ~/adapters/multi-style-gen-c/
# adapter_model.safetensors  (~50 MB)
# adapter_config.json        (~1 KB)

Step 2: Load in Python

import os
import torch
from peft import PeftModel
from transformers import AutoModel

# Load frozen base model (expand ~ manually; from_pretrained does not)
base_model = AutoModel.from_pretrained(
    os.path.expanduser("~/models/ace-step-1.5"),
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Apply LoRA adapter on top
model = PeftModel.from_pretrained(
    base_model,
    os.path.expanduser("~/adapters/multi-style-gen-c"),
    is_trainable=False
)

# Ready for inference
model.eval()

Full Inference Example

import torch
from ace_step.pipeline import ACEStepPipeline

# Initialize pipeline with base model
pipeline = ACEStepPipeline(
    model_path="~/models/ace-step-1.5",
    device="cuda",
    dtype=torch.bfloat16
)

# Load production adapter
pipeline.load_lora("~/adapters/multi-style-gen-c")

# Generate music
audio = pipeline.generate(
    tags="jazz, piano trio, smooth, walking bass, 120bpm",
    lyrics="[Instrumental]",
    duration=60,           # seconds
    num_inference_steps=100,
    guidance_scale=3.5,
    seed=42
)

# Save output
pipeline.save_audio(audio, "output.wav", sample_rate=48000)

💡 Tags Drive Style

The model was fine-tuned with rich style tags. Use descriptive tags matching the training data for best results: genre, instruments, mood, tempo, key, vocal type.
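A small helper for composing tag strings in that style can keep prompts consistent. This is an illustrative sketch only — the pipeline itself just takes a plain string, and the field set and ordering here are our assumptions:

```python
def build_tags(genre, instruments=(), mood=None, key=None, vocal=None, bpm=None):
    """Compose a comma-separated style tag string (illustrative helper).

    Mirrors the tag categories mentioned above: genre, instruments,
    mood, key, vocal type, tempo.
    """
    parts = [genre, *instruments]
    for field in (mood, key, vocal):
        if field:
            parts.append(field)
    if bpm:
        parts.append(f"{bpm}bpm")
    return ", ".join(parts)

# build_tags("jazz", ("piano trio",), mood="smooth", bpm=120)
# → "jazz, piano trio, smooth, 120bpm"
```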

Hot-Swapping Adapters

The base model stays in VRAM — only the adapter weights (~50 MB) are swapped. This takes < 1 second and avoids reloading the 815M-parameter base model.

# Start with jazz adapter
pipeline.load_lora("~/adapters/multi-style-gen-c")
jazz_output = pipeline.generate(tags="jazz, piano, smooth, 120bpm", ...)

# Hot-swap to a different adapter — NO base model reload
pipeline.unload_lora()
pipeline.load_lora("~/adapters/electronic-exp-01")
electronic_output = pipeline.generate(tags="techno, synth, 140bpm", ...)

# Or merge multiple adapters (experimental)
pipeline.load_lora("~/adapters/adapter-a", adapter_name="style_a")
pipeline.load_lora("~/adapters/adapter-b", adapter_name="style_b")
pipeline.set_adapter_weights({"style_a": 0.7, "style_b": 0.3})
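Conceptually, weighted merging blends each adapter's low-rank delta: the update applied to a base weight W0 is the sum over adapters of w · (alpha/r) · (B @ A). A toy NumPy sketch of that arithmetic — shapes and names are illustrative, not the pipeline's internals:

```python
import numpy as np

def merged_delta(adapters, weights):
    """Blend LoRA deltas: sum of w * (alpha / r) * (B @ A) per adapter.

    adapters: name -> (A, B, alpha), A of shape (r, d_in), B of (d_out, r).
    weights:  name -> blend weight, e.g. {"style_a": 0.7, "style_b": 0.3}.
    """
    delta = None
    for name, (A, B, alpha) in adapters.items():
        r = A.shape[0]
        term = weights.get(name, 0.0) * (alpha / r) * (B @ A)
        delta = term if delta is None else delta + term
    return delta
```

With a single adapter at weight 1.0 this reduces to the ordinary LoRA update, which is why swapping and merging share one code path.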

Performance Comparison

Operation            Time         VRAM Impact
Full model reload    ~30 seconds  16 GB allocated fresh
LoRA swap            < 1 second   ~50 MB delta
Multi-adapter merge  ~2 seconds   ~100 MB for 2 adapters

Serverless Engine

The ACE-Step 1.5 Engine runs as a Docker container that auto-downloads the base model and adapter from S3 at startup. This enables fully serverless inference on Lambda Cloud, RunPod, or Modal.

# Run the serverless engine with a specific adapter
docker run --gpus all \
    -e AWS_ACCESS_KEY_ID=$AWS_KEY \
    -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET \
    -e ADAPTER_S3_PATH=s3://lumina-data-foldartists/lora/multi-style-gen-c/ \
    -e MODEL_S3_PATH=s3://lumina-data-foldartists/models/ace-step-1.5/ \
    -p 8080:8080 \
    lumina-engine:v1
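Inside the container, the startup download can be sketched roughly as follows. This is a hedged sketch using boto3 — the helper names and local paths are assumptions, not the engine's actual entrypoint code:

```python
import os
from urllib.parse import urlparse

def parse_s3_uri(uri):
    """Split s3://bucket/prefix/ into (bucket, prefix)."""
    p = urlparse(uri)
    return p.netloc, p.path.lstrip("/")

def sync_prefix(uri, local_dir):
    """Download every object under an S3 prefix, preserving relative paths
    (mirrors what the engine does for MODEL_S3_PATH and ADAPTER_S3_PATH)."""
    import boto3  # third-party; pip install boto3

    bucket, prefix = parse_s3_uri(uri)
    s3 = boto3.client("s3")
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith("/"):
                continue  # skip directory markers
            dest = os.path.join(local_dir, os.path.relpath(obj["Key"], prefix))
            os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
            s3.download_file(bucket, obj["Key"], dest)
```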

The engine exposes a REST API:

# POST /generate
curl -X POST http://localhost:8080/generate \
    -H "Content-Type: application/json" \
    -d '{
        "tags": "jazz, piano, smooth, 120bpm",
        "lyrics": "[Instrumental]",
        "duration": 60,
        "seed": 42
    }'
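The same call can be made from Python with only the standard library. A sketch — the payload fields mirror the curl example above, and the response body is whatever the engine returns:

```python
import json
import urllib.request

def build_generate_request(host, tags, lyrics="[Instrumental]", duration=60, seed=42):
    """Build the POST /generate request shown in the curl example."""
    payload = json.dumps({
        "tags": tags,
        "lyrics": lyrics,
        "duration": duration,
        "seed": seed,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{host}/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To send it (requires a running engine):
# with urllib.request.urlopen(build_generate_request(
#         "http://localhost:8080", "jazz, piano, smooth, 120bpm")) as resp:
#     audio_bytes = resp.read()
```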

Versioning & History

S3 Versioning Commands

# List all versions of the production adapter
aws s3api list-object-versions \
    --bucket lumina-data-foldartists \
    --prefix lora/multi-style-gen-c/adapter_model.safetensors

# Download a specific version
aws s3api get-object \
    --bucket lumina-data-foldartists \
    --key lora/multi-style-gen-c/adapter_model.safetensors \
    --version-id "abc123def456" \
    adapter_model_v1.safetensors
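To automate rollback in Python, the list-object-versions response can be filtered for the newest VersionId. A sketch — the response dict shape follows the AWS ListObjectVersions API, and the helper name is ours:

```python
def latest_version_id(response, key):
    """Pick the most recent VersionId for one key from a list-object-versions
    response. LastModified may be a datetime (boto3) or ISO string (CLI JSON);
    both sort chronologically.
    """
    versions = [v for v in response.get("Versions", []) if v["Key"] == key]
    if not versions:
        return None
    return max(versions, key=lambda v: v["LastModified"])["VersionId"]
```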

Naming Convention

Pattern               Example            Use
{style}-gen-{preset}  multi-style-gen-c  Production adapters
loo-{genre}           loo-blues          Validation experiments
{name}-exp-{nn}       electronic-exp-01  Experiments
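The convention can be enforced programmatically. A sketch — the regexes encode the three patterns above, and the category labels are ours:

```python
import re

NAME_PATTERNS = {
    "production": re.compile(r"^[a-z0-9-]+-gen-[a-z]$"),    # {style}-gen-{preset}
    "validation": re.compile(r"^loo-[a-z]+$"),              # loo-{genre}
    "experiment": re.compile(r"^[a-z0-9-]+-exp-\d{2}$"),    # {name}-exp-{nn}
}

def classify_adapter(name):
    """Return the naming-convention category for an adapter name, or None."""
    for category, pattern in NAME_PATTERNS.items():
        if pattern.match(name):
            return category
    return None
```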