LUMINA Technical Due Diligence

216D × 2

Fingerprint (P + M)

Empirical

Noise Floor (GTZAN calibrated)

6 × 6

Tensors × Stats per Layer

~60s

E2E Attribution Time

System Design

LUMINA Architecture

End-to-end pipeline from audio generation to rightsholder attribution, leveraging gradient-based signatureing and dual-channel analysis.

🎵

Training Data

99 licensed songs

→

🧠

MusicGen

Transformer Model

→

📊

Gradient Capture

P/M Channels

→

⚖️

Attribution

Kernel Regression

Dual-Channel Analysis

Channel P & Channel M

Attribution is computed through two complementary signal pathways, each capturing different aspects of musical influence.

🎹

Channel P (Composition)

Source: All gradients from layers 42–47 (upper transformer).

Tensors: self_attn.in/out, cross_attn.in/out, linear1, linear2.

Captures: Melody, harmony, structure → 216D (6 layers × 6 tensors × 6 stats).

🎚️

Channel M (Production)

Source: All gradients from layers 12–17 (mid transformer).

Tensors: self_attn.in/out, cross_attn.in/out, linear1, linear2.

Captures: Timbre, texture, sound design → 216D (6 layers × 6 tensors × 6 stats).

Core Technology

LUMINA-WTA Gradient Extraction

The engine uses cross-entropy teacher forcing — computing how well MusicGen would predict existing audio tokens rather than generating new audio. This provides stable, reproducible influence signatures.

# LUMINA-WTA Core Algorithm (lumina-engine)
with torch.no_grad():
    codes, _ = compression_model.encode(audio_chunk)

# Teacher forcing: LM predicts codes from codes
lm_output = lm.compute_predictions(codes=codes, conditions=attrs)
logits, mask = lm_output.logits, lm_output.mask

# Cross-entropy loss with masking
loss = F.cross_entropy(logits.flatten(), codes.flatten())
loss = (loss * mask.flatten()).sum() / mask.sum()

# Backpropagate to extract gradients
loss.backward()

# Collect 6 stats per tensor per layer for each channel
# Channel P: all 6 tensors from layers 42-47
# Channel M: all 6 tensors from layers 12-17
stats = [mean, std, L2_norm, max, min, skew]

⏱️

10s Chunked Processing

Audio is split into 10-second windows. Gradients are accumulated and averaged across chunks. This provides temporal stability while fitting in ~11GB VRAM.

🧬

Why Teacher Forcing Works

Gradients encode how the model would change to better predict each sample. Songs with similar gradients share "influence DNA" — the model represents them internally the same way.

Statistical Basis

Significance Thresholds

Thresholds are derived from the expected cosine similarity distribution The noise floor σ is calibrated empirically using 50 GTZAN control tracks (5 per genre, outside training data).

Threshold	Sigma Level	Confidence	Meaning
< 1σ	< 1σ	< 68%	Indistinguishable from noise
≥ 1σ	≥ 1σ	≥ 68%	Qualified Influence
≥ 2σ	≥ 2σ	≥ 95%	High Confidence
≥ 3σ	≥ 3σ	≥ 99.7%	Definitive Proof

Causal Attribution

Share Allocation

Influence shares are computed from excess cosine similarity above the empirically calibrated threshold — no tanh, no potency scores, just proportional excess.

📊

Z-Score

z = (cos_sim − μ) / σ
Empirical mean and standard deviation from GTZAN control tracks.

🔥

Excess Share

excess = max(cos_sim − cos_threshold, 0)
share = excess / Σ(excess)

Mathematical Foundation

Why The σ Rules Apply

In 216D space, random unit vectors are nearly orthogonal. Their dot products follow a tight Gaussian distribution around 0 with σ calibrated empirically from GTZAN control tracks. This makes outlier detection robust.

Royalty Calculation

Attribution Share System

Songs with cos_sim ≥ cos_threshold (empirically calibrated) qualify. Shares are proportional to excess. Share_i = excess_i / Σ(excess).

Benchmarks

System Performance

On NVIDIA H100 SXM5 (80GB).

~24s

Generation

~80ms

Extraction

~1ms

Attribution

11GB

VRAM

Quality Assurance

Validation Measures

Rigorous safeguards implemented to ensure attribution accuracy, prevent false positives, and handle edge cases.

⚠️

Low-Energy Filter

Problem: Silent or low-volume segments (e.g., a capella breaks) can produce random high-variance gradients.

Solution: Audio segments with RMS energy below -50dB are strictly excluded from attribution.

✓

Causal Verification

Method: "Ablation Testing". We remove the top attributed song from training and regenerate.

Pass Condition: Output similarity to the removed song must drop by at least 2σ.

🔄

Reproducibility

Guarantee: 100% Deterministic.

Fixed seed 422024 ensures that the same audio always produces the exact same 216D fingerprint per channel, essential for legal audits.

⚡

Positive-Only Policy

Rule: Negative cosine similarity is ignored.

Rationale: "Anti-influence" (doing the opposite of a song) does not constitute copyright infringement or influence.