CONFIDENTIAL
© 2026 Fold Artists Research · All Rights Reserved

LUMINA Technical Paper

Gradient-Based Influence Attribution for AI Music Generators

Research Paper · Version 8.0 · March 2026

1. Introduction

When an AI music generator produces audio, rightsholders need answers to three critical questions:

  1. Which training songs influenced the output?
  2. How much did each song contribute?
  3. How confident are we in these attributions?
💡 Core Insight

A model's gradients encode which parameters would change to better fit a sample. By comparing gradient signatures, we can identify which training songs share "influence DNA" with a generated output.

Attribution Pipeline

From raw signal to fair influence share — how we find who really taught the model what it used.

  1. §2.3 Signature Matching: cos(g₁, g₂)
  2. §2.4 Shared Credit: (K Kᵀ + λI)⁻¹
  3. §5.1 Unusualness: z = (s − μ) / σ
  4. §3 Trust Gate: Φ(z) ≥ 95%
  5. §5.2 Excess Share: max(cos − τ, 0)
  6. §5.3 Fair Share: excessᵢ / Σⱼ excessⱼ

2. Mathematical Foundations

Cross-Entropy Teacher Forcing

LUMINA uses teacher forcing with cross-entropy loss to extract gradient signatures. Given audio codes from EnCodec:

Loss Function L = CrossEntropy(logits, codes) = −Σₜ log P(codeₜ | code₁, …, codeₜ₋₁)

Chunked Processing

Audio is processed in 10-second chunks with gradients averaged across chunks:

Gradient Averaging g = (1/N) × Σᵢ ∇θ L(chunkᵢ)
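A minimal sketch of the chunk-and-average step in NumPy. The names `chunk_codes` and `averaged_gradient` are illustrative, not LUMINA's API, and the toy `grad_fn` stands in for the real backward pass (∇θ L) that would come from the model:

```python
import numpy as np

def chunk_codes(codes, chunk_len):
    """Split a 1-D sequence of audio codes into fixed-length chunks."""
    return [codes[i:i + chunk_len] for i in range(0, len(codes), chunk_len)]

def averaged_gradient(chunks, grad_fn):
    """g = (1/N) * sum_i grad(L(chunk_i)): average per-chunk gradients."""
    grads = [grad_fn(c) for c in chunks]
    return np.mean(grads, axis=0)

# Toy stand-in for the backward pass: any function mapping a chunk to a
# fixed-size gradient vector works for illustration.
rng = np.random.default_rng(0)
codes = rng.integers(0, 1024, size=100)
g = averaged_gradient(chunk_codes(codes, 10), lambda c: np.ones(4) * c.mean())
```

With equal-length chunks, averaging per-chunk gradients is equivalent to a single gradient over the full sequence under a mean-reduced loss; the chunking mainly bounds memory for long audio.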

Attribution via Cosine Similarity

Cosine Similarity Score score = (g_output · g_song) / (‖g_output‖ × ‖g_song‖)
🎓 Signature Matching

Like spotting which teacher taught the exact method a student used on the test. Each training song leaves a unique gradient fingerprint — a record of how it shaped the model's weights. Cosine similarity measures how closely aligned two fingerprints are, revealing causal influence.
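The scoring step above is just a normalized dot product between two fingerprint vectors. A minimal sketch (the name `cosine_score` is illustrative):

```python
import numpy as np

def cosine_score(g_output, g_song):
    """score = (g_output . g_song) / (||g_output|| * ||g_song||)"""
    return float(g_output @ g_song
                 / (np.linalg.norm(g_output) * np.linalg.norm(g_song)))

g1 = np.array([1.0, 2.0, 3.0])
s_parallel = cosine_score(g1, 2.0 * g1)        # same direction -> score near 1
s_orthogonal = cosine_score(np.array([1.0, 0.0]),
                            np.array([0.0, 1.0]))  # unrelated -> score near 0
```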

Kernel Regression (LUMINA-WTA Aligned)

To account for correlations between training songs, we use kernel regression:

Kernel Regression Formula scores = (K Kᵀ + λI)⁻¹ K · g_output

Where K is the (N×D) training fingerprint matrix and λ=0.01 is the regularization parameter.

🎓 Shared Credit

If two teachers taught the same lesson, they share the credit rather than both getting full marks. Kernel regression decorrelates overlapping training samples — when two songs taught similar patterns, the regularized inverse (KKᵀ + λI)⁻¹ attributes proportionally rather than double-counting.
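A small numerical sketch of the formula above, assuming only that K stacks one fingerprint per row. The toy matrix makes the credit-sharing behavior visible: two nearly identical "songs" split the score instead of each receiving full credit (the name `kernel_regression_scores` is illustrative):

```python
import numpy as np

def kernel_regression_scores(K, g_output, lam=0.01):
    """scores = (K K^T + lam*I)^-1 K g_output, with K an (N, D) matrix."""
    N = K.shape[0]
    gram = K @ K.T + lam * np.eye(N)     # (N, N) regularized Gram matrix
    return np.linalg.solve(gram, K @ g_output)

# Songs 0 and 1 are nearly identical; song 2 is unrelated to the output.
K = np.array([[1.0, 0.0, 0.0],
              [1.0, 0.0, 1e-3],
              [0.0, 1.0, 0.0]])
scores = kernel_regression_scores(K, np.array([1.0, 0.0, 0.0]))
# scores[0] and scores[1] each land near 0.5; scores[2] stays near 0
```

Plain cosine similarity would give both duplicates a score near 1; the regularized inverse splits the credit roughly in half, which is exactly the double-counting correction the text describes.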

3. Statistical Confidence

In high-dimensional space (d=216 per channel), random vectors cluster near zero cosine similarity. The noise floor σ is calibrated empirically using 50 GTZAN control tracks (5 per genre, outside training data). Attribution requires signals significantly above this empirical baseline.

Confidence Formula confidence(s) = Φ(z) = ½ (1 + erf(z / √2)), where z = (s − μ) / σ

Songs must achieve ≥ 95% confidence (z ≈ 1.65, one-sided) to qualify for attribution.

🎓 Trust Gate

Only the top performers make the finals. The noise floor is measured empirically from GTZAN control tracks — not assumed from theory. The confidence gate (≥ 95% via the error function) ensures we only attribute influence to songs whose signal is statistically significant — not random coincidence.
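A minimal sketch of the gate, using the one-sided normal CDF so that z ≈ 1.65 maps to ~95% and z ≈ 2.33 to ~99%, matching the quantiles quoted in this paper (the function names are illustrative):

```python
import math

def confidence(s, mu, sigma):
    """One-sided confidence Phi(z) = 0.5 * (1 + erf(z / sqrt(2))),
    with z = (s - mu) / sigma from the empirical noise floor."""
    z = (s - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def qualifies(s, mu, sigma, gate=0.95):
    """Trust gate: only scores whose confidence clears `gate` qualify."""
    return confidence(s, mu, sigma) >= gate
```

For example, with a noise floor of μ = 0 and σ = 0.05, a similarity of 0.0825 (z = 1.65) just clears the 95% gate, while 0.05 (z = 1.0, ~84%) does not.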

4. Dual-Channel Attribution

LUMINA separates influence into two distinct rights channels:

Attribution Flow
Gradient Extraction
Publishing (Composition)
Master (Production)
| Channel | Gradient Source | Layers | Tensors (per layer) | Captures |
|---|---|---|---|---|
| Composition (P) | All gradients | 42–47 (upper) | self_attn.in/out, cross_attn.in/out, linear1, linear2 | Melody, Harmony, Structure |
| Production (M) | All gradients | 12–17 (lower) | self_attn.in/out, cross_attn.in/out, linear1, linear2 | Timbre, Texture, Sound Design |

Each tensor produces 6 summary statistics (mean, std, L2 norm, max, min, skew). Both channels use all 6 tensors but from different layer ranges. Channel P: 6 layers × 6 tensors × 6 stats = 216D. Channel M: 6 layers × 6 tensors × 6 stats = 216D. Total fingerprint: 432D.
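The dimensionality bookkeeping above can be checked with a short sketch. This assumes only the stated structure (6 layers × 6 tensors × 6 statistics per channel); the helper names and the plain-moment skew estimator are illustrative, not LUMINA's internals:

```python
import numpy as np

def tensor_stats(g):
    """Six summary statistics of one gradient tensor (flattened)."""
    g = np.asarray(g).ravel()
    mean, std = g.mean(), g.std()
    skew = ((g - mean) ** 3).mean() / std ** 3 if std > 0 else 0.0
    return np.array([mean, std, np.linalg.norm(g), g.max(), g.min(), skew])

def channel_fingerprint(layer_grads):
    """layer_grads: 6 layers, each a list of 6 gradient tensors.
    Concatenates 6 stats per tensor -> a 6 * 6 * 6 = 216-D vector."""
    return np.concatenate([tensor_stats(t) for layer in layer_grads
                           for t in layer])

# 6 layers x 6 tensors of toy gradients -> one 216-D channel fingerprint
rng = np.random.default_rng(1)
layers = [[rng.normal(size=(8, 8)) for _ in range(6)] for _ in range(6)]
fp = channel_fingerprint(layers)
```

Running the same construction once per layer range (P and M) and concatenating would give the 432-D combined fingerprint.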

5. Share Allocation

Royalty splits are proportional to excess cosine similarity above the empirically calibrated threshold.

Z-Score

Raw cosine similarities are z-score normalized using the empirical noise floor:

Z-Score Normalization z = (cos_sim − μ_noise) / σ_noise
🎓 Unusualness

Like seeing who scored far above the class average. Z-score standardization measures how many standard deviations each song's similarity sits above the noise mean. A z-score of 2.33 means the song passes the 99% confidence threshold — statistically remarkable, not just noise.

Excess Above Threshold

Songs that exceed the cosine similarity threshold contribute their excess:

Excess excessᵢ = max(cos_simᵢ − cos_threshold, 0)

Where cos_threshold = μ_noise + z_threshold × σ_noise is calibrated empirically from GTZAN control tracks.

🎓 Signal Above Noise

Only the similarity that exceeds the empirical noise floor counts. This excess measures genuine causal influence — not random correlations inherent in the model's latent space.
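A minimal sketch of the calibration and excess steps, assuming only that the control-track similarities are available as a list (the function names are illustrative):

```python
import numpy as np

def calibrate_threshold(control_sims, z_threshold=2.33):
    """cos_threshold = mu_noise + z_threshold * sigma_noise, estimated
    from cosine similarities of held-out control tracks."""
    mu, sigma = np.mean(control_sims), np.std(control_sims)
    return mu + z_threshold * sigma

def excess(cos_sim, cos_threshold):
    """excess_i = max(cos_sim_i - cos_threshold, 0)"""
    return max(cos_sim - cos_threshold, 0.0)
```

For instance, with the paper's τ = 0.162, a song at cosine 0.245 contributes an excess of 0.083, while anything at or below the threshold contributes nothing.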

Proportional Allocation

Shares are distributed proportionally based on excess:

Share Formula shareᵢ = excessᵢ / Σⱼ excessⱼ for all qualified songs
🎓 Fair Share

Like cutting a pizza into fair slices based on contribution. Each qualified song's excess determines its slice size. The entire 100% is distributed proportionally — a song with 4× the excess gets 4× the royalty share.

🍕 Worked Example: The Pizza Story

Consider 250 training songs evaluated against a generated output (τ = 0.162, z ≥ 2.33 for 99% confidence):

| Song | Cos Sim | z | Excess | Share |
|---|---|---|---|---|
| Song A | 0.245 | 3.6σ | 0.245 − 0.162 = 0.083 | 55.7% 🍕🍕🍕🍕 |
| Song B | 0.198 | 2.9σ | 0.198 − 0.162 = 0.036 | 24.2% 🍕🍕 |
| Song C | 0.182 | 2.6σ | 0.182 − 0.162 = 0.020 | 13.4% 🍕 |
| Song D | 0.172 | 2.5σ | 0.172 − 0.162 = 0.010 | 6.7% 🤏 |
| Songs E–N (246 songs) | < 0.162 | < 2.33σ | 0 | 0% |

Small help is common. Exceptional help is rare, and that is what earns a royalty share.
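The worked example can be reproduced in a few lines of plain Python, using the numbers from the table above:

```python
# Reproduce the worked example: tau = 0.162, four qualifying songs.
tau = 0.162
cos_sims = {"A": 0.245, "B": 0.198, "C": 0.182, "D": 0.172}

# excess_i = max(cos_sim_i - tau, 0); non-qualifiers contribute nothing
excess = {name: max(s - tau, 0.0) for name, s in cos_sims.items()}

# share_i = excess_i / sum_j excess_j over qualified songs
total = sum(excess.values())
shares = {name: e / total for name, e in excess.items()}
# shares come out near A: 0.557, B: 0.242, C: 0.134, D: 0.067
```

The shares sum to exactly 1, so the full 100% royalty pool is always distributed among the qualified songs.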

6. Validation

LUMINA has been validated against 10,000 generation cycles.

  • Reproducibility: 100% deterministic — identical inputs always produce identical 216D fingerprints (P_L2 = 1.0000, M_L2 = 1.0000).
  • Baseline Confidence: > 68% at 1σ qualification gate across 99-song corpus. Noise floor calibrated empirically via 50 GTZAN control tracks.
  • Causal Link: Leave-one-out ablation testing confirms output similarity drops ≥ 2σ when top-attributed song is removed from training.

7. Version History

v8.0 March 16, 2026
WTA fingerprint overhaul: 6 tensors × 6 stats × 6 layers = 216D per channel (432D total). Dual layer ranges (42-47 for P, 12-17 for M). Empirical noise calibration via GTZAN control corpus. Both channels now use all 6 tensors (self_attn, cross_attn, FFN).
v7.1 February 13, 2026
Added intuitive pipeline overview with teacher-student analogies and worked examples
v7.0 January 16, 2026
Added LUMINA-WTA alignment, kernel regression, dual-channel refinements
v6.0 December 20, 2025
Introduced excess-based share allocation, z-score thresholding
v5.0 November 15, 2025
Dual-channel separation (Publishing vs Master), confidence thresholds
v4.0 October 8, 2025
10-second chunked processing, gradient averaging
v3.0 September 1, 2025
Initial cross-entropy teacher forcing implementation

Intellectual Property Notice

This document contains proprietary and confidential information belonging to Fold Artists Research. The methods, algorithms, and technical implementations described herein are protected intellectual property. Unauthorized reproduction, distribution, or disclosure of this document or its contents is strictly prohibited and may violate applicable trade secret and intellectual property laws.