Overview
Every training run produces a LoRA adapter checkpoint: a compact ~50 MB file containing the low-rank weight deltas. These adapters are stored on AWS S3 and can be loaded at inference time on top of the frozen ACE-Step 1.5 base model. The system supports:
- Multiple concurrent adapters: different styles share one base model instance
- Hot-swapping: switch adapters without a GPU restart or model reload
- Versioned storage: S3 versioning preserves the full checkpoint history
- Serverless loading: adapters are auto-downloaded from S3 on container startup
Production Adapter
- Adapter: multi-style-gen-c
- Preset: C (rank=64, alpha=192, LR=5e-5, 100 epochs)
- Validation: 11/11 WTA tests passed
- Dataset: ~100 multi-genre tracks
- Trained on: Lambda Cloud A100 40GB
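Preset C's alpha/rank pair implies a fixed multiplier on the low-rank update: in standard LoRA the delta is applied as W + (alpha / rank) · BA. A quick check of the effective scaling using the preset values above (this is the generic PEFT formula, not ACE-Step-specific code):

```python
def lora_scaling(alpha: int, rank: int) -> float:
    """Effective multiplier applied to the low-rank update B @ A."""
    return alpha / rank

# Preset C: rank=64, alpha=192
print(lora_scaling(alpha=192, rank=64))  # 3.0
```

A scaling of 3.0 amplifies the adapter's contribution relative to the common alpha = rank default of 1.0.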
Adapter Contents
| File | Size | Description |
|---|---|---|
| adapter_model.safetensors | ~50 MB | LoRA weight deltas for all 48 DiT layers |
| adapter_config.json | ~1 KB | Rank, alpha, target modules, scaling |
| training_args.json | ~2 KB | Full training hyperparameters for reproducibility |
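The ~50 MB figure is consistent with what a rank-64 adapter over 48 layers occupies in bf16. A back-of-the-envelope sketch; the per-layer module count and projection dimensions below are illustrative placeholders, not ACE-Step's real shapes:

```python
def lora_param_count(num_layers: int, modules_per_layer: int,
                     rank: int, d_in: int, d_out: int) -> int:
    # Each adapted module stores A (rank x d_in) and B (d_out x rank).
    return num_layers * modules_per_layer * rank * (d_in + d_out)

def size_mb(params: int, bytes_per_param: int = 2) -> float:
    # safetensors in bf16: 2 bytes per parameter
    return params * bytes_per_param / (1024 ** 2)

# Hypothetical shapes: 48 layers, 4 target modules, 1024-dim projections
params = lora_param_count(num_layers=48, modules_per_layer=4,
                          rank=64, d_in=1024, d_out=1024)
print(size_mb(params))  # 48.0 (MB), in the ballpark of the checkpoint size
```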
S3 Locations
```
s3://lumina-data-foldartists/
├── models/
│   └── ace-step-1.5/              # Base model (frozen)
│       ├── model.safetensors
│       └── config.json
├── lora/
│   ├── multi-style-gen-c/         # ← Production adapter
│   │   ├── adapter_model.safetensors
│   │   ├── adapter_config.json
│   │   └── training_args.json
│   ├── loo-subsets/               # LOO validation adapters
│   │   ├── loo-blues/
│   │   ├── loo-classical/
│   │   ├── loo-country/
│   │   └── ...                    # One per GTZAN genre
│   └── experimental/              # Work-in-progress adapters
└── datasets/
    ├── multi-style-hf/            # Production HF dataset
    └── gtzan-hf/                  # GTZAN validation dataset
```
S3 versioning is active on the lora/ prefix; every overwrite creates a new version. Use aws s3api list-object-versions to retrieve previous checkpoints if needed.
Download & Load
Download from S3
```bash
# Download the production adapter
aws s3 sync \
  s3://lumina-data-foldartists/lora/multi-style-gen-c/ \
  ~/adapters/multi-style-gen-c/

# Verify files
ls -la ~/adapters/multi-style-gen-c/
# adapter_model.safetensors  (~50 MB)
# adapter_config.json        (~1 KB)
```
Load in Python
```python
import os

import torch
from peft import PeftModel
from transformers import AutoModel

# Load frozen base model (expand ~ so the path resolves as a local directory)
base_model = AutoModel.from_pretrained(
    os.path.expanduser("~/models/ace-step-1.5"),
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Apply LoRA adapter on top
model = PeftModel.from_pretrained(
    base_model,
    os.path.expanduser("~/adapters/multi-style-gen-c"),
    is_trainable=False,
)

# Ready for inference
model.eval()
```
Full Inference Example
```python
import torch

from ace_step.pipeline import ACEStepPipeline

# Initialize pipeline with base model
pipeline = ACEStepPipeline(
    model_path="~/models/ace-step-1.5",
    device="cuda",
    dtype=torch.bfloat16,
)

# Load production adapter
pipeline.load_lora("~/adapters/multi-style-gen-c")

# Generate music
audio = pipeline.generate(
    tags="jazz, piano trio, smooth, walking bass, 120bpm",
    lyrics="[Instrumental]",
    duration=60,              # seconds
    num_inference_steps=100,
    guidance_scale=3.5,
    seed=42,
)

# Save output
pipeline.save_audio(audio, "output.wav", sample_rate=48000)
```
The model was fine-tuned with rich style tags. Use descriptive tags matching the training data for best results: genre, instruments, mood, tempo, key, vocal type.
Hot-Swapping Adapters
The base model stays in VRAM; only the adapter weights (~50 MB) are swapped. This takes under one second and avoids reloading the 815M-parameter base model.
```python
# Start with the jazz-capable production adapter
pipeline.load_lora("~/adapters/multi-style-gen-c")
jazz_output = pipeline.generate(tags="jazz, piano, smooth, 120bpm", ...)

# Hot-swap to a different adapter -- NO base model reload
pipeline.unload_lora()
pipeline.load_lora("~/adapters/electronic-exp-01")
electronic_output = pipeline.generate(tags="techno, synth, 140bpm", ...)

# Or merge multiple adapters (experimental)
pipeline.load_lora("~/adapters/adapter-a", adapter_name="style_a")
pipeline.load_lora("~/adapters/adapter-b", adapter_name="style_b")
pipeline.set_adapter_weights({"style_a": 0.7, "style_b": 0.3})
```
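If set_adapter_weights expects blend weights that sum to 1 (as the 0.7/0.3 example suggests), normalizing user-supplied weights first avoids accidentally rescaling the combined update. A small hypothetical helper:

```python
def normalized_weights(weights: dict[str, float]) -> dict[str, float]:
    """Scale adapter blend weights so they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("adapter weights must sum to a positive value")
    return {name: w / total for name, w in weights.items()}

print(normalized_weights({"style_a": 7, "style_b": 3}))
# {'style_a': 0.7, 'style_b': 0.3}
```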
Performance Comparison
| Operation | Time | VRAM Impact |
|---|---|---|
| Full model reload | ~30 seconds | 16 GB allocated fresh |
| LoRA swap | < 1 second | ~50 MB delta |
| Multi-adapter merge | ~2 seconds | ~100 MB for 2 adapters |
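The timings above are easy to reproduce with a tiny helper that wraps any call with a wall-clock measurement (generic Python, not tied to the pipeline API):

```python
import time

def measure(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# e.g. _, swap_s = measure(pipeline.load_lora, "~/adapters/multi-style-gen-c")
```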
Serverless Engine
The ACE-Step 1.5 Engine runs as a Docker container that auto-downloads the base model and adapter from S3 at startup. This enables fully serverless inference on Lambda Cloud, RunPod, or Modal.
```bash
# Run the serverless engine with a specific adapter
docker run --gpus all \
  -e AWS_ACCESS_KEY_ID=$AWS_KEY \
  -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET \
  -e ADAPTER_S3_PATH=s3://lumina-data-foldartists/lora/multi-style-gen-c/ \
  -e MODEL_S3_PATH=s3://lumina-data-foldartists/models/ace-step-1.5/ \
  -p 8080:8080 \
  lumina-engine:v1
```
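At startup the engine has to split ADAPTER_S3_PATH and MODEL_S3_PATH into a bucket and key prefix before it can sync them down. A minimal sketch of that parsing (the function name is an assumption, not the engine's actual code):

```python
def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split s3://bucket/prefix/ into (bucket, prefix)."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, prefix = uri[len("s3://"):].partition("/")
    return bucket, prefix

print(parse_s3_uri("s3://lumina-data-foldartists/lora/multi-style-gen-c/"))
# ('lumina-data-foldartists', 'lora/multi-style-gen-c/')
```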
The engine exposes a REST API:
```bash
# POST /generate
curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{
    "tags": "jazz, piano, smooth, 120bpm",
    "lyrics": "[Instrumental]",
    "duration": 60,
    "seed": 42
  }'
```
Versioning & History
S3 Versioning Commands
```bash
# List all versions of the production adapter
aws s3api list-object-versions \
  --bucket lumina-data-foldartists \
  --prefix lora/multi-style-gen-c/adapter_model.safetensors

# Download a specific version
aws s3api get-object \
  --bucket lumina-data-foldartists \
  --key lora/multi-style-gen-c/adapter_model.safetensors \
  --version-id "abc123def456" \
  adapter_model_v1.safetensors
```
Naming Convention
| Pattern | Example | Use |
|---|---|---|
| {style}-gen-{preset} | multi-style-gen-c | Production adapters |
| loo-{genre} | loo-blues | Validation experiments |
| {name}-exp-{nn} | electronic-exp-01 | Experiments |
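These patterns are regular enough to enforce in CI. One possible validator, where the exact character classes are an assumption about which names are allowed:

```python
import re
from typing import Optional

PATTERNS = {
    "production": re.compile(r"^[a-z0-9-]+-gen-[a-z]$"),
    "validation": re.compile(r"^loo-[a-z]+$"),
    "experiment": re.compile(r"^[a-z0-9-]+-exp-\d{2}$"),
}

def classify_adapter(name: str) -> Optional[str]:
    """Map an adapter name to its naming-convention category, or None."""
    for kind, pattern in PATTERNS.items():
        if pattern.match(name):
            return kind
    return None

print(classify_adapter("multi-style-gen-c"))   # production
print(classify_adapter("loo-blues"))           # validation
print(classify_adapter("electronic-exp-01"))   # experiment
```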