
Real or fake: Identifying AI-generated music and voices

WRITTEN BY Diana Pfeil

Jul 19, 2023

Though the use of Artificial Intelligence (AI) in music has been growing for many years, there has been a recent explosion of conversation around AI-generated music. The latest crop of AI-generated music to dominate the conversation involves deepfakes, where the vocals in a musical work are swapped with the AI-generated voice of a known artist. There is a growing number of examples: Blur’s “Song 2” with vocals replaced by AI-generated Kurt Cobain, Billy Joel’s “Piano Man” sung by AI-generated Paul McCartney, and a Rickroll with Rick Astley’s voice replaced by AI-generated Michael Jackson, to name a few.

There are also new songs being generated by AI that use the vocals of popular artists. In April, Ghostwriter977 released his song “Heart on My Sleeve” with AI-generated Drake and The Weeknd performing the vocals; it gained millions of views before being removed. It’s clear that AI technology has reached a level of quality and accessibility that’s both interesting and frightening to many creators and music fans.

As these tracks are uploaded to platforms, the music industry is grappling with big existential questions, including how music will evolve as the capabilities of AI grow, whether this transformation will be a good thing or a bad thing, and how creators can make money in a changing industry. There are so many unknowns and opinions about AI-generated music: is it copyrightable? (Not in the U.S., as of now. But the boundaries of what constitutes AI-generated vs AI-assisted music are very murky.) Is it ethical? (Depends.) Can you successfully sue for name and likeness if your voice is used by AI? (We’ll see.)

While we wait to see what courts decide about training AI models with copyrighted content, creators are busy releasing AI-generated music. At Pex, we are constantly evolving our technology to keep up with new trends and identify the most challenging content. With the rise of AI-generated music, we’re using our tech to help artists identify uses of their voice and their works, and to help music companies identify AI songs that infringe on copyright.

Distinguishing between AI-generated and human-created music

There are three main types of AI-generated music right now:

Music created by humans using AI tools to assist the songwriting process
Music created by humans with AI-generated vocal swaps, and
Music that is entirely AI generated

In any of these scenarios, is it possible to determine whether the music was created by a human or AI? The music industry desperately wants the answer to be yes, but we’ll see below why the answer is “no, but”.

Historically, the very definition of AI has been linked to the Turing test. In this test, a human evaluator is asked to distinguish between an AI and another human after interacting with both and freely asking them the same questions. If the evaluator is unable to tell the human from the AI, the AI has passed the test. In other words, if an AI can create content (words, images, videos) that is indistinguishable from content a human would create, it’s considered ‘good’.

All of the recent public attention surrounding AI is a result of AI quickly becoming very good – it is increasingly hard to distinguish between human-created and AI-generated content. For example, just a couple of years ago, AI-generated artwork contained obvious giveaways that made it easily distinguishable from human-created works. An AI-generated portrait would have an extra finger (it was really hard to get AI to draw realistic hands!), or earrings would be mismatched (AI wasn’t great at keeping context across a larger area – it would “forget” that the left earring was green). These details are much less of a challenge now. As AI improves, the “signals” that could previously be used to tell that a work is AI-generated are trained out.

AI-generated music is undergoing this same phenomenon. Although it may be currently possible to distinguish between AI-generated and human-generated works based on small signals in the media, next month the creators of the AI models will improve them and those signals will no longer be present. It’ll be a game of cat and mouse, until the long term (which might arrive quickly!), when it will simply be impossible to determine the provenance of media (whether human- or AI-generated) based on the media alone.

How does AI technology continuously evolve to become indistinguishable from human-created works? One way is through the training process itself (another is increasingly large and powerful models, but we won’t get into that here). In the class of AI music generators called GANs (generative adversarial networks), the generative model is trained by pairing a generator with a discriminator. The job of the discriminator is to correctly determine whether a piece of music is human-generated (drawn from a training data set of known music) or created by the generator (the AI). A discriminator, like any detector, makes decisions based on signals. In images, these might be an extra finger, mismatched earrings, or unusual patterns in pixels. In music, these signals may be visual differences in the spectrogram corresponding to a vocal stem, or robotic-sounding vocals. As the discriminator learns to identify these signals, the generator is trained to avoid them, and the AI’s output becomes increasingly human-like.
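To make the feedback loop concrete, here is a toy sketch in Python. It is not a real GAN: actual GANs use neural networks trained by gradient descent on audio or spectrograms. Here each track is reduced to a single hypothetical "artifact score", the detector is a simple threshold, and the generator has one adjustable parameter. All names and numbers are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical "artifact scores" (0 = clean, 1 = obviously synthetic).
# These numbers are invented for illustration, not measured from real audio.
REAL_SCORES = [0.05, 0.10, 0.15, 0.20]
JITTER = [-0.075, -0.025, 0.025, 0.075]  # fixed spread around the generator's bias


def generate(bias):
    """Generator: emits tracks whose artifact score is `bias` plus fixed jitter."""
    return [bias + j for j in JITTER]


def fit_detector(real, fake):
    """Detector: a threshold halfway between the mean real and mean fake score."""
    return (mean(real) + mean(fake)) / 2


def accuracy(threshold, real, fake):
    """Fraction of tracks labeled correctly (score above threshold means 'AI')."""
    correct = sum(s <= threshold for s in real) + sum(s > threshold for s in fake)
    return correct / (len(real) + len(fake))


def train(rounds=20, step=0.5, start_bias=0.9):
    """Alternate detector fitting and generator updates; return final bias and accuracies."""
    bias = start_bias
    history = []
    for _ in range(rounds):
        fake = generate(bias)
        threshold = fit_detector(REAL_SCORES, fake)
        history.append(accuracy(threshold, REAL_SCORES, fake))
        # Generator update: shift output toward the side the detector calls "human".
        bias -= step * (mean(fake) - threshold)
    return bias, history


final_bias, accuracy_per_round = train()
print(accuracy_per_round[0], accuracy_per_round[-1])
```

In this toy setup the detector starts out labeling every track correctly and ends at chance level (half right), because each round the generator removes part of the signal the detector was relying on.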

The good news is that we aren’t working with media in a vacuum, and music AI companies such as Pex have built technology that presents some options for rightsholders.

Identifying AI-generated music today

Automated Content Recognition (ACR) and Music Recognition Technology (MRT) can be used to identify uses of existing AI-generated music today. By creating a digital fingerprint of a known AI song and comparing it against fingerprints of other content – such as videos uploaded to social media or a registry of known works – new uses of an existing song can be identified. Pex uses these technologies and others to help rightsholders identify AI-generated music. Our solutions can recognize new uses of existing AI tracks, detect impersonations of artists, and help determine when music is likely to be AI-generated.
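As a rough illustration of fingerprint comparison, the Python sketch below reduces each track to a sequence of integers (a stand-in for per-frame spectral peaks), builds a fingerprint from overlapping runs of those integers, and looks for the registry entry that shares the most runs with an upload. This is a simplified assumption for illustration only, not Pex's algorithm; the track names, frame values, and threshold are all hypothetical.

```python
def fingerprint(frames, n=4):
    """Set of overlapping n-frame runs; a stand-in for real spectral hashes."""
    return {tuple(frames[i:i + n]) for i in range(len(frames) - n + 1)}


def containment(query_fp, ref_fp):
    """Fraction of the query's hashes that also appear in the reference."""
    return len(query_fp & ref_fp) / len(query_fp) if query_fp else 0.0


def best_match(query_frames, registry, threshold=0.5):
    """Return (title, score) for the best registry match, or (None, score) if too weak."""
    query_fp = fingerprint(query_frames)
    scored = [
        (containment(query_fp, fingerprint(frames)), title)
        for title, frames in registry.items()
    ]
    score, title = max(scored)
    return (title, score) if score >= threshold else (None, score)


# Hypothetical registry of known tracks, each a sequence of quantized peak bins.
REGISTRY = {
    "known_ai_track": [3, 7, 7, 2, 9, 4, 4, 8, 1, 6, 5, 3],
    "other_track": [1, 1, 5, 9, 2, 6, 8, 8, 3, 0, 7, 4],
}

# An upload containing an excerpt of the known track padded with unrelated frames.
upload = [0, 0] + REGISTRY["known_ai_track"][2:10] + [0, 0]
unrelated = [5, 5, 5, 5, 5, 5]

print(best_match(upload, REGISTRY))
print(best_match(unrelated, REGISTRY))
```

Containment is used rather than a symmetric similarity so that a short clip embedded in a longer upload can still score well against the full reference.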

Pex Discovery

With Pex Discovery, artists or rightsholders can identify reuploads of a song or video across more than 25 digital platforms, including YouTube, Facebook, Instagram, and TikTok. Discovery can be used to track reuploads of known AI-generated tracks, including tracks that impersonate artists. Unlike YouTube’s Content ID, Discovery is open to all, and it is not necessary to own a work to be able to track it. We tracked the viral AI song “Heart on My Sleeve” and have found nearly 4,000 matches.

Pex Search

Pex Search can identify AI-generated works that retain instrumental stems from the original work, even when the stem has been subjected to modifications like speed or pitch changes, or the vocal stem has been changed (like Billy Joel’s “Piano Man” sung by AI-generated Paul McCartney). Search creates fingerprints of audio files and compares them to Pex’s music registry, which contains millions of registered works. Because our ACR technology can be used to identify music stems or their combinations, it will identify the instrumentals in Blur’s “Song 2” even when AI-generated Kurt Cobain has replaced the original vocal stem. The vocal stem can then be extracted and compared to the original recording to show it is not a match to Blur’s original vocals.
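One generic way to make a match survive a pitch change is to fingerprint the steps between notes rather than the notes themselves, since shifting everything up or down leaves those steps unchanged. The Python sketch below shows that idea on hypothetical pitch sequences. It is an assumption for illustration, not a description of how Pex Search works, and it does not handle speed changes.

```python
def intervals(pitches):
    """Steps between consecutive pitches; unchanged by a uniform pitch shift."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def shingles(seq, n=3):
    """Set of overlapping length-n runs of a sequence."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def stem_similarity(a, b):
    """Jaccard similarity of the interval shingles of two pitch sequences."""
    fa, fb = shingles(intervals(a)), shingles(intervals(b))
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0


# Hypothetical instrumental stems as semitone numbers (invented for illustration).
reference = [60, 62, 64, 65, 67, 65, 64, 62, 60, 67]
pitch_shifted = [p + 3 for p in reference]            # same stem, three semitones up
different = [60, 60, 67, 67, 69, 69, 67, 65, 65, 64]  # an unrelated stem

print(stem_similarity(reference, pitch_shifted))
print(stem_similarity(reference, different))
```

The shifted copy scores as a perfect match because its interval sequence is identical to the reference, while the unrelated stem shares no interval runs with it.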

DSPs, music distributors, and social sharing platforms can leverage Search to determine if uploads contain AI-generated vocals of known artists. Search can even be integrated to identify content during the upload process, so infringing AI content could be blocked.

Voice identification

At Pex, we are developing voice identification technology, trained on a corpus of known voices, to identify the voice in a given sound recording. This technology can then be layered with Pex Search to identify that the voice in the example above is that of Kurt Cobain. If the track is then compared to known recordings of Kurt Cobain’s voice and none of them match, that is an indication that the track is likely to be AI-generated.
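The layering described here can be sketched in a few lines of Python. The sketch assumes each voice has already been turned into a numeric vector (an "embedding") by some model, which is a common approach to speaker identification but is not confirmed as Pex's method. The artist names, vectors, threshold, and the `matches_artist_catalog` flag are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_voice(embedding, known_voices, threshold=0.9):
    """Return the closest known voice, or None if nothing is similar enough."""
    name, score = max(
        ((n, cosine(embedding, v)) for n, v in known_voices.items()),
        key=lambda pair: pair[1],
    )
    return name if score >= threshold else None


def likely_ai_generated(embedding, known_voices, matches_artist_catalog):
    """Flag a track whose voice is a known artist's but whose audio is not in
    that artist's catalog (the catalog check is assumed to be done elsewhere)."""
    artist = identify_voice(embedding, known_voices)
    return artist is not None and not matches_artist_catalog


# Hypothetical voice embeddings; real ones would have far more dimensions.
KNOWN_VOICES = {
    "artist_a": [0.9, 0.1, 0.2],
    "artist_b": [0.1, 0.8, 0.5],
}
query_voice = [0.88, 0.12, 0.22]   # close to artist_a
unknown_voice = [0.5, 0.5, 0.5]    # not close to anyone in the corpus

print(identify_voice(query_voice, KNOWN_VOICES))
print(likely_ai_generated(query_voice, KNOWN_VOICES, matches_artist_catalog=False))
```

A voice that matches a known artist is not suspicious on its own. The flag only fires when that voice appears on audio that does not match any of the artist's registered recordings.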

Identifying AI-generated music going forward

Because it will soon not be possible to distinguish AI-generated works from those that are human created, and because the lines between these will be increasingly blurred as musicians continue to use AI to assist the songwriting process, the best way forward in protecting artists’ rights is through content recognition technology.

New musical works being uploaded can be compared to an existing catalog of registered works, and this technology can then tag when there is a match to the audio (or audio stem), melody, or voice of known works and artists. Pex is at the forefront of identifying digital and AI-generated content. Want to see our tech in action? Reach out to schedule a demo.