LUMINA Technical Paper
Gradient-Based Influence Attribution for AI Music Generators
1 Introduction
When an AI music generator produces audio, rightsholders need answers to three critical questions:
- Which training songs influenced the output?
- How much did each song contribute?
- How confident are we in these attributions?
A model's gradients encode which parameters would change to better fit a sample. By comparing gradient signatures, we can identify which training songs share "influence DNA" with a generated output.
From raw signal to fair influence share — how we find who really taught the model what it used.
2 Mathematical Foundations
Cross-Entropy Teacher Forcing
LUMINA uses teacher forcing with cross-entropy loss to extract gradient signatures. Given audio codes from EnCodec:
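The loss equation itself is not reproduced in this draft; a standard teacher-forced cross-entropy over the EnCodec token sequence would read as follows (the symbols T for sequence length, K_b for the number of codebooks, and p_θ are notation assumed for this sketch, not taken from the source):

```latex
\mathcal{L}(\theta) \;=\; -\frac{1}{T\,K_b} \sum_{t=1}^{T} \sum_{k=1}^{K_b} \log p_\theta\!\left(x_{t,k} \,\middle|\, x_{<t}\right)
```

The per-song gradient signature is then the gradient of this loss, ∇θ𝓛, evaluated at the trained weights and restricted to the probed layers.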
Chunked Processing
Audio is processed in 10-second chunks with gradients averaged across chunks:
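A minimal sketch of the chunk-and-average step, assuming a 32 kHz sample rate and a hypothetical `grad_signature(chunk)` callable that returns one fingerprint vector per chunk (neither is specified in the source):

```python
import numpy as np

def chunk_audio(samples: np.ndarray, sr: int = 32000, chunk_sec: int = 10):
    """Split a 1-D sample array into consecutive 10-second chunks."""
    size = sr * chunk_sec
    return [samples[i:i + size] for i in range(0, len(samples), size)]

def averaged_signature(samples: np.ndarray, grad_signature) -> np.ndarray:
    """Average the per-chunk gradient signatures into one fingerprint."""
    sigs = [grad_signature(chunk) for chunk in chunk_audio(samples)]
    return np.mean(sigs, axis=0)
```

Averaging keeps the fingerprint dimensionality fixed regardless of track length.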
Attribution via Cosine Similarity
Like spotting which teacher taught the exact method a student used on the test. Each training song leaves a unique gradient fingerprint: a record of how it shaped the model's weights. Cosine similarity measures how closely aligned two fingerprints are; high alignment is treated as evidence of influence, which the leave-one-out ablation in Section 6 then tests causally.
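A sketch of scoring every training fingerprint against the output fingerprint by cosine similarity (function and variable names are illustrative, not LUMINA's API):

```python
import numpy as np

def cosine_scores(K: np.ndarray, g: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row of K (N x D training
    fingerprints) and the generated output's fingerprint g (length D)."""
    K_unit = K / np.linalg.norm(K, axis=1, keepdims=True)
    g_unit = g / np.linalg.norm(g)
    return K_unit @ g_unit   # one score per training song, in [-1, 1]
```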
Kernel Regression (LUMINA-WTA Aligned)
To account for correlations between training songs, we use kernel regression:
where K is the N×D matrix of training gradient fingerprints and λ = 0.01 is the regularization parameter.
If two teachers taught the same lesson, they share the credit rather than both getting full marks. Kernel regression decorrelates overlapping training samples — when two songs taught similar patterns, the regularized inverse (KKᵀ + λI)⁻¹ attributes proportionally rather than double-counting.
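The regression formula is not reproduced in this draft; a standard regularized form consistent with the surrounding text is a = (KKᵀ + λI)⁻¹ K g, where g is the output fingerprint and a the attribution vector (both symbol names are assumptions of this sketch):

```python
import numpy as np

def kernel_attribution(K: np.ndarray, g: np.ndarray, lam: float = 0.01) -> np.ndarray:
    """Regularized attribution a = (K K^T + lam*I)^-1 K g.
    Correlated training fingerprints split the credit instead of
    each receiving its full cosine score."""
    N = K.shape[0]
    return np.linalg.solve(K @ K.T + lam * np.eye(N), K @ g)
```

For example, if two training fingerprints are identical and both match the output, each receives roughly half the credit rather than a full score apiece.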
3 Statistical Confidence
In high-dimensional space (d=216 per channel), random vectors cluster near zero cosine similarity. The noise floor σ is calibrated empirically using 50 GTZAN control tracks (5 per genre, outside training data). Attribution requires signals significantly above this empirical baseline.
Songs must achieve ≥ 95% confidence (~1.65σ) to qualify for attribution.
Only the top performers make the finals. The noise floor is measured empirically from GTZAN control tracks — not assumed from theory. The confidence gate (≥ 95% via the error function) ensures we only attribute influence to songs whose signal is statistically significant — not random coincidence.
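A sketch of the empirical gate, assuming (as the text implies) that control-track similarities cluster around zero, that the noise floor σ is their standard deviation, and that confidence is the one-sided normal CDF computed via the error function:

```python
import math

def noise_floor(control_scores) -> float:
    """Empirical sigma from cosine scores of out-of-training control tracks."""
    n = len(control_scores)
    mu = sum(control_scores) / n
    return math.sqrt(sum((s - mu) ** 2 for s in control_scores) / n)

def confidence(score: float, sigma: float) -> float:
    """One-sided confidence that `score` rises above the noise floor,
    assuming the noise distribution has mean ~0."""
    return 0.5 * (1.0 + math.erf(score / (sigma * math.sqrt(2.0))))

def qualifies(score: float, sigma: float, gate: float = 0.95) -> bool:
    return confidence(score, sigma) >= gate   # gate sits at ~1.645 sigma
```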
4 Dual-Channel Attribution
LUMINA separates influence into two distinct rights channels:
| Channel | Gradient Source | Layers | Tensors (per layer) | Captures |
|---|---|---|---|---|
| Composition (P) | All gradients (upper) | 42–47 (upper) | self_attn.in/out, cross_attn.in/out, linear1, linear2 | Melody, Harmony, Structure |
| Production (M) | All gradients (lower) | 12–17 (lower) | self_attn.in/out, cross_attn.in/out, linear1, linear2 | Timbre, Texture, Sound Design |
Each tensor produces 6 summary statistics (mean, std, L2 norm, max, min, skew). Both channels use all 6 tensors but from different layer ranges. Channel P: 6 layers × 6 tensors × 6 stats = 216D. Channel M: 6 layers × 6 tensors × 6 stats = 216D. Total fingerprint: 432D.
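A sketch of the per-tensor summary statistics and how they concatenate into one 216D channel fingerprint; skew is computed directly rather than via SciPy, and the layer/tensor iteration is illustrative:

```python
import numpy as np

def tensor_stats(t: np.ndarray) -> list:
    """Six summary statistics for one gradient tensor:
    mean, std, L2 norm, max, min, skew."""
    x = t.ravel().astype(float)
    mu, sd = x.mean(), x.std()
    skew = ((x - mu) ** 3).mean() / sd ** 3 if sd > 0 else 0.0
    return [mu, sd, np.linalg.norm(x), x.max(), x.min(), skew]

def channel_fingerprint(layer_grads) -> np.ndarray:
    """Concatenate stats over 6 layers x 6 tensors x 6 stats = 216 dims."""
    return np.array([s
                     for layer in layer_grads        # 6 layers in the range
                     for tensor in layer             # 6 tensors per layer
                     for s in tensor_stats(tensor)])
```

Running this once per channel (layers 42–47 for P, 12–17 for M) and concatenating yields the 432D total fingerprint.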
6 Validation
LUMINA has been validated against 10,000 generation cycles.
- Reproducibility: 100% deterministic — identical inputs always produce identical per-channel 216D fingerprints (P_L2 = 1.0000, M_L2 = 1.0000).
- Baseline Confidence: > 68% at 1σ qualification gate across 99-song corpus. Noise floor calibrated empirically via 50 GTZAN control tracks.
- Causal Link: Leave-one-out ablation testing confirms output similarity drops ≥ 2σ when top-attributed song is removed from training.
7 Version History
Intellectual Property Notice
This document contains proprietary and confidential information belonging to Fold Artists Research. The methods, algorithms, and technical implementations described herein are protected intellectual property. Unauthorized reproduction, distribution, or disclosure of this document or its contents is strictly prohibited and may violate applicable trade secret and intellectual property laws.