Our Methodology: How We Test AI Music Tools Against Real Distributors

Same protocol, every tool, every quarter. Tools bought, tracks submitted, outcomes scored. Here is exactly what we do, what we measure, and where our testing reaches its limits.

By Editorial team · Updated 2026-05-20 · 5 min read · Methodology · How we test

Key takeaways

48 Suno tracks plus 12 Udio tracks generated on real paid subscriptions
Every track submitted to 6 major distributors under real artist accounts
Every paid tool purchased on our own card with our own receipts
Quarterly re-runs to catch classifier drift


What this page is

The full methodology behind every test, ranking, and recommendation on this site. If you want to understand why we say what we say, this is the page.

We update this methodology whenever we change the protocol. Material changes are recorded in the versioned update log at the bottom of this page.

Test design

Source material

Suno tracks. Generated on a real Suno Pro subscription during March and April 2026. Forty-eight tracks total, spanning these genres: indie folk (8), electronic (8), lo-fi (8), ambient (8), vocal-led pop (8), instrumental (8). Track lengths between 1:30 and 3:45. Each track generated as a single continuous output, not stitched from multiple generations.

Udio tracks. Generated on a real Udio Standard subscription during the same window. Twelve tracks total, weighted toward Udio's strength categories: vocal-led contemporary (4), R&B and soul (4), ambient and electronic (4).
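A quick way to see the scale of one test round is to encode the generation plan and multiply it out. This is a minimal sketch, assuming every track goes through each of the five processing conditions and to each of the six distributors listed further down this page; the dictionary layout and names are illustrative, not our internal tooling.

```python
# Generation plan for one round; genre labels and counts come from the text above.
SUNO_PLAN = {
    "indie folk": 8, "electronic": 8, "lo-fi": 8,
    "ambient": 8, "vocal-led pop": 8, "instrumental": 8,
}
UDIO_PLAN = {
    "vocal-led contemporary": 4, "R&B and soul": 4, "ambient and electronic": 4,
}

# Assumption: every track is run through every condition and sent to every distributor.
CONDITIONS = ["Undetectr", "SongSubmit", "AI-Music-Cleaner", "DIY (Audacity)", "No processing"]
DISTRIBUTORS = ["DistroKid", "TuneCore", "CD Baby", "Amuse", "Ditto", "RouteNote"]


def grid_size() -> dict:
    """Return the track count and the total number of submissions in the full grid."""
    tracks = sum(SUNO_PLAN.values()) + sum(UDIO_PLAN.values())
    return {
        "tracks": tracks,
        "submissions": tracks * len(CONDITIONS) * len(DISTRIBUTORS),
    }


print(grid_size())  # {'tracks': 60, 'submissions': 1800}
```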

Format. All tracks were exported at the highest quality each platform's interface offered. For Suno, that meant 16-bit 44.1 kHz WAV. For Udio, it meant the closest equivalent among its download options.

Storage. Source tracks are stored locally and on an encrypted backup. Every event in a track's lifecycle (generation, processing, submission) is timestamped and logged.
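The event log can be as simple as an append-only JSON Lines file. The sketch below is hypothetical. The `TrackEvent` fields, the `log_event` helper, and the file name illustrate the idea and do not describe our actual storage. Timestamps are recorded in UTC so events from different machines sort correctly.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Hypothetical event kinds mirroring the lifecycle named above.
EVENT_KINDS = {"generation", "processing", "submission"}


@dataclass(frozen=True)
class TrackEvent:
    track_id: str   # hypothetical internal ID, e.g. "suno-ambient-03"
    kind: str       # one of EVENT_KINDS
    detail: str     # free text: generator, tool name, distributor, etc.
    timestamp: str  # ISO 8601, UTC


def log_event(path: str, track_id: str, kind: str, detail: str) -> TrackEvent:
    """Append one timestamped event to a JSON Lines log and return it."""
    if kind not in EVENT_KINDS:
        raise ValueError(f"unknown event kind: {kind!r}")
    event = TrackEvent(track_id, kind, detail, datetime.now(timezone.utc).isoformat())
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(event)) + "\n")
    return event
```

Append-only writes mean a later event can never silently overwrite an earlier one.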

Variables we control

Input track. Same source file used for every tool's processing pass. No drift between tests.
Output format. Each tool's output is exported in the same format (WAV 16-bit 44.1 kHz) before submission.
Metadata. Identical metadata (artist name, track title, genre, year, ISRC handling) across every submission. We do not game metadata to influence screening outcomes.
Submission timing. Submissions go in batches within the same 24-hour window per round of testing. This reduces variance from distributor queue effects.
Account history. Each distributor account had clean history before testing began. No previous AI-related flags.
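Two of these controls (one 24-hour submission window per round, identical metadata per track) are mechanical enough to check before a round is scored. A minimal sketch, assuming each submission is a dict with hypothetical `track_id`, `submitted_at`, and `metadata` keys:

```python
from datetime import datetime, timedelta

# Window stated in the protocol above.
BATCH_WINDOW = timedelta(hours=24)


def round_is_valid(submissions: list[dict]) -> bool:
    """Check a round against two controlled variables: every submission falls
    inside one 24-hour window, and each track carries identical metadata across
    every tool and distributor it was submitted with."""
    if not submissions:
        return True
    times = [s["submitted_at"] for s in submissions]
    if max(times) - min(times) > BATCH_WINDOW:
        return False
    first_seen: dict[str, dict] = {}
    for s in submissions:
        # The first metadata seen for a track becomes the reference copy.
        reference = first_seen.setdefault(s["track_id"], s["metadata"])
        if s["metadata"] != reference:
            return False
    return True
```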

Variables we do not control

Distributor classifier updates. Distributors update their internal models on their own schedules. A tool that passes today may not pass next quarter. We acknowledge this by re-testing each quarter.
Distributor server-side errors. Occasional submission failures unrelated to AI screening (file format issues, metadata validation) are excluded from outcomes and re-submitted.
Network conditions. We verify that each upload completed before recording a screening outcome.
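In practice, every submission attempt is classified before it counts. A sketch assuming a hypothetical four-way `Result` label: only screening decisions are scored, and server errors or incomplete uploads go back into the re-submission queue.

```python
from enum import Enum


class Result(Enum):
    SHIPPED = "shipped"                      # passed screening, routed to stores
    REJECTED = "rejected"                    # rejected at screening
    SERVER_ERROR = "server_error"            # failure unrelated to AI screening
    UPLOAD_INCOMPLETE = "upload_incomplete"  # upload did not finish


# Only screening decisions count as outcomes; everything else is re-submitted.
SCORED = {Result.SHIPPED, Result.REJECTED}


def partition(attempts: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split attempts into (scored, to_resubmit), preserving order."""
    scored, to_resubmit = [], []
    for attempt in attempts:
        (scored if attempt["result"] in SCORED else to_resubmit).append(attempt)
    return scored, to_resubmit
```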

Tools tested

In the current iteration of the methodology, we evaluate five processing conditions: three commercial tools that claim to address AI music artifact removal or detection, a DIY baseline, and an unprocessed control. We name the tools publicly even when results are unfavorable.

Undetectr. Commercial fingerprint removal tool. Lifetime tier purchased on our card.
SongSubmit's processing pipeline. Subscription purchased on our card.
AI-Music-Cleaner. Subscription purchased on our card.
DIY processing (Audacity). Free, open-source audio editor. We attempt manual frequency-domain editing following published technical writeups.
No processing (control). Raw export submitted as-is.

We add tools to the test set as they enter the market. We retire tools that shut down or stop responding. Both events are documented in the update log below.

Vendor submissions. We accept evaluation requests from tool vendors. Vendors do not get advance access to results, do not get to revise their tool before re-testing, and have no input into how we describe outcomes.

Distributors tested

The six distributors that cover the bulk of independent artists shipping to Spotify, Apple Music, YouTube Music, and Tidal:

DistroKid (annual subscription, USD pricing)
TuneCore (subscription, USD)
CD Baby (per-release fee, USD)
Amuse (free tier with revenue share + paid tier)
Ditto (annual fee, GBP)
RouteNote (free tier with revenue share + paid tier)

Each was tested under a real artist account with no special arrangement, no complimentary service, and no insider contact. From each distributor's perspective, we are ordinary paying customers.

Outcomes we measure

Primary outcome. Did the track ship to streaming platforms after distributor screening, yes or no.

Secondary outcomes.

Time from submission to outcome (minutes to hours)
Any feedback the distributor surfaces in the rejection email
Whether the rejection consumed an upload credit or not
Whether the rejection affected account standing

What we do not measure.

Streaming royalty performance after release (that depends on marketing, not on the tool)
Listener subjective preference between processed and unprocessed audio (no formal A/B; we report informal blind-test impressions but do not present them as test results)
Other platform-level questions (Content ID on YouTube, store-front placement) beyond the screening event
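The primary and secondary outcomes map naturally onto one record per submission. The pass rate per tool and distributor is computed from the primary field alone. The `Outcome` fields below are a hypothetical encoding of the lists above. Secondary fields are optional because distributors do not always surface them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    tool: str
    distributor: str
    shipped: bool                              # primary outcome
    minutes_to_outcome: Optional[float] = None
    feedback: Optional[str] = None             # text from the rejection email, if any
    credit_consumed: Optional[bool] = None
    standing_affected: Optional[bool] = None


def pass_rates(outcomes: list[Outcome]) -> dict[tuple[str, str], float]:
    """Share of submissions that shipped, per (tool, distributor) pair."""
    shipped: dict[tuple[str, str], int] = defaultdict(int)
    total: dict[tuple[str, str], int] = defaultdict(int)
    for o in outcomes:
        key = (o.tool, o.distributor)
        total[key] += 1
        shipped[key] += int(o.shipped)
    return {key: shipped[key] / total[key] for key in total}
```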

Affiliate disclosure as a methodological commitment

We earn affiliate commissions on signups to tools we recommend. The dominant relationship is with Undetectr, which placed first in this round of testing.

Affiliate revenue does not enter our methodology. The protocol above runs identically whether or not the tested tool has an affiliate program. A tool that fails the protocol stays failed. A tool that joins an affiliate program after testing does not improve in our ranking unless its tested performance also improves.
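Put another way, the ranking is a function of test outcomes and nothing else. A minimal sketch, assuming outcomes are grouped per tool as a list of shipped/rejected booleans. Ties are broken alphabetically here only so the ordering is stable; that tie-break is an assumption, not a published rule.

```python
def rank_tools(results: dict[str, list[bool]]) -> list[tuple[str, float]]:
    """Rank tools by overall pass rate across all scored submissions.

    The only input is test outcomes; there is no affiliate field to consult,
    so joining an affiliate program cannot change a tool's position."""
    rates = {tool: sum(r) / len(r) for tool, r in results.items() if r}
    # Highest pass rate first; alphabetical among equal rates.
    return sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))
```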

We publish the order of operations as a commitment: testing first, ranking second, affiliate relationship third. We will publicly correct any case where this order is breached.

How we update the site

Quarterly cycle, plus event-driven out-of-cycle updates.

Quarterly: Re-run the protocol against all currently-evaluated tools. Update tool rankings, distributor outcome tables, and "as of" dates on pages. Refresh the update log below.

Event-driven: we update out of cycle for vendor pricing changes, lawsuit rulings affecting subscribers, distributor policy announcements, new tool launches, tool shutdowns, and classifier updates we detect through routine spot checks.
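Because the same source files are re-submitted each round, a result that flips on an identical input points to a change on the distributor side rather than in the track. A hypothetical sketch of that comparison, assuming each round is stored as a dict keyed by `(track_id, tool, distributor)` with a shipped boolean as the value:

```python
from collections import Counter


def drift_report(previous: dict, current: dict) -> tuple[Counter, Counter]:
    """Count flipped results per (tool, distributor) between two rounds.

    Only keys present in both rounds are compared. Returns
    (newly_rejected, newly_shipped)."""
    newly_rejected: Counter = Counter()
    newly_shipped: Counter = Counter()
    for key in previous.keys() & current.keys():
        _, tool, distributor = key
        if previous[key] and not current[key]:
            newly_rejected[(tool, distributor)] += 1
        elif not previous[key] and current[key]:
            newly_shipped[(tool, distributor)] += 1
    return newly_rejected, newly_shipped
```

A cluster of newly rejected tracks at one distributor, across several tools at once, is the pattern that would suggest a retuned classifier.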

Limits of our testing

Sample size is fixed at 48 + 12 tracks. This is not a research-grade dataset, but it is sufficient to distinguish robust tools from broken ones.
Two AI generators. Suno and Udio. Other generators (Riffusion, MusicGen, ElevenLabs Music) are tested less rigorously and noted in their own pages.
Six distributors. Smaller distributors and regional services are not tested. Outcomes for those may differ.
English-language metadata only. We have not tested Asian-market distributors, regional-language metadata, or non-Latin script handling.
Single-account testing. We submit from one account per distributor. Account-level variation in classifier behavior would not be visible in our data.
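To make "sufficient to distinguish robust tools from broken ones" concrete: assuming all 60 tracks go through every tool and distributor, each pair sees 60 submissions. A 95% Wilson score interval shows what that sample can and cannot resolve. We chose this standard interval for illustration. It is not a statistic we currently publish.

```python
import math


def wilson_interval(passed: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate; z = 1.96 gives roughly 95% coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = passed / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - margin), min(1.0, centre + margin)


print(wilson_interval(60, 60))
print(wilson_interval(30, 60))
```

With n = 60, a tool that ships all 60 tracks has a lower bound of about 0.94, while one that ships 30 of 60 sits at roughly 0.38 to 0.62. Those intervals do not overlap, so the gap between them is real. Two tools that differ by only a few tracks produce overlapping intervals, and this sample cannot reliably tell them apart.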

Contact

For tool submissions, corrections, methodology questions, or to flag a divergence between our published results and current behavior: editorial@sunowatermarkremover.com.

We read every message. Replies prioritize correction requests, then methodology questions, then tool submissions.

Update log

2026 Q2 (current): Baseline methodology established. 48 Suno + 12 Udio tracks across 6 distributors. Five tools tested.

This page is itself part of the editorial record. Changes to methodology are tracked here and the version dates on individual pages reference this log.

Frequently asked questions

How many tracks did you test?

48 Suno tracks and 12 Udio tracks across multiple genres. Genres included indie folk, electronic, lo-fi, ambient, vocal-led pop, and instrumental. Track lengths ranged from 1:30 to 3:45.

Which distributors did you test against?

DistroKid, TuneCore, CD Baby, Amuse, Ditto, and RouteNote. These are the six largest by independent-artist market share in 2026 and cover the streaming platforms that matter for royalty collection.

How do you generate the test tracks?

On real paid Suno Pro and Udio Standard subscriptions. We submitted prompts across genre tags supported by each platform. We exported at the highest quality the platform's interface offered.

How do you submit to distributors?

Through real artist accounts on each distributor. Same metadata schema each time: real artist name, real track titles, accurate genre tags. No metadata gaming or distributor-policy violations beyond the question of AI screening itself.

What do you measure?

The binary outcome: did the distributor accept the upload and route it to streaming platforms, or did it reject the upload? Time to outcome (how fast the screening fires). Any classifier feedback the distributor surfaces in the rejection email.

How often do you re-test?

Quarterly minimum. Distributors update their classifiers regularly. A tool that passed in March may not pass in June if the classifier was retuned. We document drift in our update log.

Are your results reproducible?

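In principle, by any individual artist running the same workflow. We treat classifier outputs as deterministic per input, but we cannot verify this directly: distributors do not publish their models, and because we submit from one account per distributor, account-level variation would not be visible in our data. The most likely source of divergence between our results and yours is a classifier update between our test runs and yours.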

What are the limits of your testing?

Six distributors and two AI generators is not exhaustive. We do not test every smaller distributor, every regional service, or every AI generator in existence. We focus on the platforms and tools most independent artists actually use.

Ready to release your Suno tracks?

Undetectr was the only tool that passed every distributor in our testing. Clean your first track in under 60 seconds.
