Overview
Every training run produces a LoRA adapter checkpoint: a compact ~50 MB file containing the low-rank weight deltas. These adapters are stored on AWS S3 and can be loaded at inference time on top of the frozen ACE-Step 1.5 base model. The system supports:
- Multiple concurrent adapters: different styles share one base model instance
- Hot-swapping: switch adapters without a GPU restart or model reload
- Versioned storage: S3 versioning preserves the full checkpoint history
- Serverless loading: adapters are auto-downloaded from S3 on container startup
Production Adapter
- Adapter: multi-style-gen-c
- Preset: C (rank=64, alpha=192, LR=5e-5, 100 epochs)
- Validation: 11/11 WTA tests passed
- Dataset: ~100 multi-genre tracks
- Trained on: Lambda Cloud A100 40GB
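Preset C's alpha/rank pair implies a fixed multiplier on the low-rank update: in standard LoRA the delta is applied as W + (alpha / rank) · BA. A quick check of the effective scaling using the preset values above (this is the generic PEFT formula, not ACE-Step-specific code):

```python
def lora_scaling(alpha: int, rank: int) -> float:
    """Effective multiplier applied to the low-rank update B @ A."""
    return alpha / rank

# Preset C: rank=64, alpha=192
print(lora_scaling(alpha=192, rank=64))  # 3.0
```

A scaling of 3.0 amplifies the adapter's contribution relative to the common alpha = rank default of 1.0.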
Adapter Contents
| File | Size | Description |
|---|---|---|
| adapter_model.safetensors | ~50 MB | LoRA weight deltas for all 48 DiT layers |
| adapter_config.json | ~1 KB | Rank, alpha, target modules, scaling |
| training_args.json | ~2 KB | Full training hyperparameters for reproducibility |
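The ~50 MB figure is consistent with what a rank-64 adapter over 48 layers occupies in bf16. A back-of-the-envelope sketch; the per-layer module count and projection dimensions below are illustrative placeholders, not ACE-Step's real shapes:

```python
def lora_param_count(num_layers: int, modules_per_layer: int,
                     rank: int, d_in: int, d_out: int) -> int:
    # Each adapted module stores A (rank x d_in) and B (d_out x rank).
    return num_layers * modules_per_layer * rank * (d_in + d_out)

def size_mb(params: int, bytes_per_param: int = 2) -> float:
    # safetensors in bf16: 2 bytes per parameter
    return params * bytes_per_param / (1024 ** 2)

# Hypothetical shapes: 48 layers, 4 target modules, 1024-dim projections
params = lora_param_count(num_layers=48, modules_per_layer=4,
                          rank=64, d_in=1024, d_out=1024)
print(size_mb(params))  # 48.0 (MB), in the ballpark of the checkpoint size
```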
S3 Locations
```
s3://lumina-data-foldartists/
├── models/
│   └── ace-step-1.5/              # Base model (frozen)
│       ├── model.safetensors
│       └── config.json
├── lora/
│   ├── multi-style-gen-c/         # ← Production adapter
│   │   ├── adapter_model.safetensors
│   │   ├── adapter_config.json
│   │   └── training_args.json
│   ├── loo-subsets/               # LOO validation adapters
│   │   ├── loo-blues/
│   │   ├── loo-classical/
│   │   ├── loo-country/
│   │   └── ...                    # One per GTZAN genre
│   └── experimental/              # Work-in-progress adapters
└── datasets/
    ├── multi-style-hf/            # Production HF dataset
    └── gtzan-hf/                  # GTZAN validation dataset
```
S3 versioning is active on the lora/ prefix; every overwrite creates a new version. Use aws s3api list-object-versions to retrieve previous checkpoints if needed.
Download & Load
Download from S3
```bash
# Download the production adapter
aws s3 sync \
  s3://lumina-data-foldartists/lora/multi-style-gen-c/ \
  ~/adapters/multi-style-gen-c/

# Verify files
ls -la ~/adapters/multi-style-gen-c/
# adapter_model.safetensors  (~50 MB)
# adapter_config.json        (~1 KB)
```
Load in Python
```python
import os

import torch
from peft import PeftModel
from transformers import AutoModel

# Load frozen base model (expand ~ so the path resolves as a local directory)
base_model = AutoModel.from_pretrained(
    os.path.expanduser("~/models/ace-step-1.5"),
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Apply LoRA adapter on top
model = PeftModel.from_pretrained(
    base_model,
    os.path.expanduser("~/adapters/multi-style-gen-c"),
    is_trainable=False,
)

# Ready for inference
model.eval()
```
Full Inference Example
```python
import torch

from ace_step.pipeline import ACEStepPipeline

# Initialize pipeline with base model
pipeline = ACEStepPipeline(
    model_path="~/models/ace-step-1.5",
    device="cuda",
    dtype=torch.bfloat16,
)

# Load production adapter
pipeline.load_lora("~/adapters/multi-style-gen-c")

# Generate music
audio = pipeline.generate(
    tags="jazz, piano trio, smooth, walking bass, 120bpm",
    lyrics="[Instrumental]",
    duration=60,              # seconds
    num_inference_steps=100,
    guidance_scale=3.5,
    seed=42,
)

# Save output
pipeline.save_audio(audio, "output.wav", sample_rate=48000)
```
The model was fine-tuned with rich style tags. Use descriptive tags matching the training data for best results: genre, instruments, mood, tempo, key, vocal type.
Hot-Swapping Adapters
The base model stays in VRAM; only the adapter weights (~50 MB) are swapped. This takes under one second and avoids reloading the 815M-parameter base model.
```python
# Start with the jazz-capable production adapter
pipeline.load_lora("~/adapters/multi-style-gen-c")
jazz_output = pipeline.generate(tags="jazz, piano, smooth, 120bpm", ...)

# Hot-swap to a different adapter -- NO base model reload
pipeline.unload_lora()
pipeline.load_lora("~/adapters/electronic-exp-01")
electronic_output = pipeline.generate(tags="techno, synth, 140bpm", ...)

# Or merge multiple adapters (experimental)
pipeline.load_lora("~/adapters/adapter-a", adapter_name="style_a")
pipeline.load_lora("~/adapters/adapter-b", adapter_name="style_b")
pipeline.set_adapter_weights({"style_a": 0.7, "style_b": 0.3})
```
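If set_adapter_weights expects blend weights that sum to 1 (as the 0.7/0.3 example suggests), normalizing user-supplied weights first avoids accidentally rescaling the combined update. A small hypothetical helper:

```python
def normalized_weights(weights: dict[str, float]) -> dict[str, float]:
    """Scale adapter blend weights so they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("adapter weights must sum to a positive value")
    return {name: w / total for name, w in weights.items()}

print(normalized_weights({"style_a": 7, "style_b": 3}))
# {'style_a': 0.7, 'style_b': 0.3}
```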
Performance Comparison
| Operation | Time | VRAM Impact |
|---|---|---|
| Full model reload | ~30 seconds | 16 GB allocated fresh |
| LoRA swap | < 1 second | ~50 MB delta |
| Multi-adapter merge | ~2 seconds | ~100 MB for 2 adapters |
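The timings above are easy to reproduce with a tiny helper that wraps any call with a wall-clock measurement (generic Python, not tied to the pipeline API):

```python
import time

def measure(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# e.g. _, swap_s = measure(pipeline.load_lora, "~/adapters/multi-style-gen-c")
```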
Serverless Engine
The ACE-Step 1.5 Engine runs as a Docker container that auto-downloads the base model and adapter from S3 at startup. This enables fully serverless inference on Lambda Cloud, RunPod, or Modal.
```bash
# Run the serverless engine with a specific adapter
docker run --gpus all \
  -e AWS_ACCESS_KEY_ID=$AWS_KEY \
  -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET \
  -e ADAPTER_S3_PATH=s3://lumina-data-foldartists/lora/multi-style-gen-c/ \
  -e MODEL_S3_PATH=s3://lumina-data-foldartists/models/ace-step-1.5/ \
  -p 8080:8080 \
  lumina-engine:v1
```
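At startup the engine has to split ADAPTER_S3_PATH and MODEL_S3_PATH into a bucket and key prefix before it can sync them down. A minimal sketch of that parsing (the function name is an assumption, not the engine's actual code):

```python
def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split s3://bucket/prefix/ into (bucket, prefix)."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, prefix = uri[len("s3://"):].partition("/")
    return bucket, prefix

print(parse_s3_uri("s3://lumina-data-foldartists/lora/multi-style-gen-c/"))
# ('lumina-data-foldartists', 'lora/multi-style-gen-c/')
```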
The engine exposes a REST API:
```bash
# POST /generate
curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{
    "tags": "jazz, piano, smooth, 120bpm",
    "lyrics": "[Instrumental]",
    "duration": 60,
    "seed": 42
  }'
```
Versioning & History
S3 Versioning Commands
```bash
# List all versions of the production adapter
aws s3api list-object-versions \
  --bucket lumina-data-foldartists \
  --prefix lora/multi-style-gen-c/adapter_model.safetensors

# Download a specific version
aws s3api get-object \
  --bucket lumina-data-foldartists \
  --key lora/multi-style-gen-c/adapter_model.safetensors \
  --version-id "abc123def456" \
  adapter_model_v1.safetensors
```
Naming Convention
| Pattern | Example | Use |
|---|---|---|
| {style}-gen-{preset} | multi-style-gen-c | Production adapters |
| loo-{genre} | loo-blues | Validation experiments |
| {name}-exp-{nn} | electronic-exp-01 | Experiments |
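These patterns are regular enough to enforce in CI. One possible validator, where the exact character classes are an assumption about which names are allowed:

```python
import re
from typing import Optional

PATTERNS = {
    "production": re.compile(r"^[a-z0-9-]+-gen-[a-z]$"),
    "validation": re.compile(r"^loo-[a-z]+$"),
    "experiment": re.compile(r"^[a-z0-9-]+-exp-\d{2}$"),
}

def classify_adapter(name: str) -> Optional[str]:
    """Map an adapter name to its naming-convention category, or None."""
    for kind, pattern in PATTERNS.items():
        if pattern.match(name):
            return kind
    return None

print(classify_adapter("multi-style-gen-c"))   # production
print(classify_adapter("loo-blues"))           # validation
print(classify_adapter("electronic-exp-01"))   # experiment
```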