R&D spotlight: How Pex identifies singers and voices in AI-generated content

WRITTEN BY Jakub Galka

Mar 14, 2024

At Pex, we are always evolving our content identification technologies to stay ahead of trends and new challenges. As part of our Research and Development efforts, we have been honing our tech to identify AI-generated content, especially music and voices. With the rapid advancement of generative AI, voice cloning and swapping are already very popular among audio creators, and are becoming increasingly important issues for music artists and rightsholders.

Many of the songs using AI-generated voices to mimic famous singers are likely created through AI models trained on copyrighted content without proper licensing. This means the original artists aren’t getting credit or compensation for the use of their works or voices, and we’re on a mission to change that. Our company vision of attribution for all may be made harder by AI voice cloning, but by combining our latest Voice ID technology with our Automatic Content Recognition (ACR), we can continue to help copyright owners protect their IP, including their voices.

Watermarking and artifact detection are insufficient for identifying AI-generated content

The variety of methods, approaches, AI models, and tools will keep growing rapidly, making it difficult to clearly distinguish what is AI-generated content and for identification technology to keep up with advancements. Currently, AI-generated content is most often detected using two approaches: watermarking and artifact detection. However, neither is fully reliable.

Watermarking
The first approach relies on detection of added watermarks. During the AI-content generation process, either the AI model or the AI-generation platform embeds imperceptible watermarks into the generated content itself. When the watermark characteristics are known, they can be detected by other technologies, and the content can be identified as AI-generated. The main flaw of watermarking is that no known watermarking technique can reliably survive removal or modification attempts. Because a watermark is intended to be discoverable by detection algorithms, those same algorithms can be used to work out how to remove or modify the watermark embedded in the audio so that it is no longer discoverable. For this reason, watermark detection cannot be considered a reliable method for identifying AI-generated content, especially when the creator or publisher of such content is acting in bad faith or intends to spread misinformation.
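To make the removal argument concrete, here is a deliberately naive sketch: a least-significant-bit watermark on integer audio samples. This is a toy scheme chosen for illustration, not one any AI platform is known to use, and the names `PATTERN`, `embed`, `detect`, and `scrub` are hypothetical. It shows that once the detector reveals where the mark lives, a change of at most one quantization step per sample is enough to hide it.

```python
import random

PATTERN = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical watermark bit pattern


def embed(samples, pattern=PATTERN):
    """Overwrite the lowest bit of each sample with the repeating pattern."""
    return [(s & ~1) | pattern[i % len(pattern)] for i, s in enumerate(samples)]


def detect(samples, pattern=PATTERN, min_agreement=0.95):
    """Report a watermark when nearly all low bits agree with the pattern."""
    hits = sum((s & 1) == pattern[i % len(pattern)] for i, s in enumerate(samples))
    return hits / len(samples) >= min_agreement


def scrub(samples, rng):
    """Randomize the lowest bit, changing each sample by at most one step."""
    return [(s & ~1) | rng.randint(0, 1) for s in samples]


rng = random.Random(0)
audio = [rng.randint(-32768, 32767) for _ in range(4000)]  # stand-in for 16-bit PCM
marked = embed(audio)
print(detect(marked))              # True: the pattern is present
print(detect(scrub(marked, rng)))  # False: about half the low bits now disagree
```

Production watermarks are far more sophisticated than this, but the sketch mirrors the reasoning above: knowing how detection works tells an attacker what to alter.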

Artifact detection
The second approach relies on the detection of AI-specific artifacts in the generated content. For example, in early image generation solutions, human hands were often misshapen or earrings would be mismatched. Such artifacts were easy to spot, but have since been trained out by advancements in AI. Direct AI detection tools that rely on finding specific artifacts and features of AI-generated content are themselves deep neural network-based models (in fact, AI models), which must be trained on huge amounts of both real and AI-generated examples. By then it is often already too late to prevent the widespread use of AI-generated content, because the remedy (the detection tool) must be built using the very content it is supposed to detect. Artifact detection will therefore always lag behind AI-generated content, which makes it an unsuitable solution for identification.

How Pex Voice ID and ACR technology solves today’s challenges

The flaws of watermarking and artifact detection make identifying AI-generated music a particularly difficult task. At Pex, we believe automatic content recognition (ACR) technologies are more accurate, scalable, and deception-proof methods for identifying AI-generated content. ACR works by comparing the qualities of one piece of content against those of another, and determining if there is a match. It does not look for artifacts or watermarks, but examines entire files to find overlapping qualities. By combining our audio, melody, phonetic, and now voice matching technology, Pex can identify AI-generated cover songs, known recordings with AI vocal swaps, and unreleased or new songs with AI-generated voice impersonations.

Identification of AI-generated music using Pex Voice ID and ACR technology

Pex’s Voice ID technology can identify singers by matching the biometric traits of their voices, whether they are singing, rapping, or speaking. Our voice identification technology is designed to match both human and AI-generated voices against one another (singer matching), and to determine the identity of a voice in a given sound recording (singer identification).

Voice ID: Singer matching
Singer matching determines if one or multiple recordings have the same singer or singers, regardless of musical style or language. Singer matching can determine which segments of two or more songs contain the same singer, even if the voices match for as little as 10 seconds. The singer’s identity does not need to be known in advance and the model doesn’t need to be trained on any voice samples in order to identify matches.
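As a rough illustration of segment-level matching, the sketch below assumes each recording has already been split into fixed-length windows and that some speaker model has turned each window into an embedding vector. The article does not describe Pex's model, so the embeddings, the window length, the threshold, and the names `cosine` and `match_segments` are all assumptions made for this example.

```python
from math import sqrt

WINDOW_SECONDS = 10    # assumed window length, echoing the 10-second figure above
MATCH_THRESHOLD = 0.8  # hypothetical similarity cutoff, not a Pex value


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_segments(embeddings_a, embeddings_b, threshold=MATCH_THRESHOLD):
    """Return (start_a, start_b, score) for every window pair that matches.

    Each input is a list of per-window voice embeddings in time order.
    Start times are seconds from the beginning of each recording.
    """
    matches = []
    for i, emb_a in enumerate(embeddings_a):
        for j, emb_b in enumerate(embeddings_b):
            score = cosine(emb_a, emb_b)
            if score >= threshold:
                matches.append((i * WINDOW_SECONDS, j * WINDOW_SECONDS, round(score, 3)))
    return matches


# Toy 3-dimensional embeddings standing in for real model output.
song_one = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1]]
song_two = [[0.0, 0.2, 0.9], [0.88, 0.15, 0.05]]
print(match_segments(song_one, song_two))
```

With these toy vectors, the only pair above the threshold is the first window of `song_one` and the second window of `song_two`. Note that nothing in the comparison needs a singer's name, which is the property described above: matching works without knowing the identity in advance.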

For example, we compared two music videos from recording artist Lizzy McAlpine and determined that the voices were the same. This is the most straightforward of examples since both songs are original Lizzy McAlpine recordings. Although the actual identity of the singer was not known in advance, we can confidently say that the singer in video one is the same as the singer in video two.

Visualization of matching voice segments in video one and video two.

Just as we compared these two voices and determined they are the same, we can compare an original voice to an AI-generated voice and determine whether they are the same. A great example is the viral AI track “Heart on my Sleeve”, which uses AI-generated vocals impersonating Drake and The Weeknd. Pex Voice ID matched Drake’s voice in “Hotline Bling” to the voice in the AI song, and The Weeknd’s voice in “Save Your Tears” to the voice in the AI song.

Visualization of Pex Voice ID matching “Hotline Bling” against “Heart on my Sleeve”.

Visualization of Pex Voice ID matching “Save Your Tears” against “Heart on my Sleeve”.

Voice ID: Singer identification
Singer identification determines the identity (i.e. the name) of a singer or singers in an audio file. To identify particular singers, their voice and identity must be part of a reference database which can be used for matching. Voice ID technology can then be used to extract the singer’s biometric vocal traits in the form of a digital fingerprint. No original audio material would be kept once the voice fingerprint was created, and registered fingerprints could only be used to identify singers, not to re-generate their voices. With the database of voice fingerprints, a known singer could be identified in any audio recording, including songs using AI-generated vocals.

For example, if we used singer identification on the recording of “The Boy is Mine”, which features two recording artists, we would be able to identify which segments feature Brandy and which feature Monica.

Using the same “Heart on my Sleeve” example above, we could compare the AI-generated song to a fingerprint of Drake’s voice biometrics and determine specifically that it’s Drake’s voice being used in “Heart on my Sleeve.”
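The reference-database lookup described in this section can be sketched as a nearest-fingerprint search. This is a minimal illustration under stated assumptions: fingerprints are modeled as plain numeric vectors, and `FingerprintRegistry`, `identify`, the threshold, and the placeholder singer names are hypothetical rather than part of any Pex API.

```python
from math import sqrt

IDENTIFY_THRESHOLD = 0.85  # hypothetical cutoff, not a Pex value


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class FingerprintRegistry:
    """Stores one voice fingerprint per registered singer, and no audio."""

    def __init__(self):
        self._fingerprints = {}

    def register(self, singer, fingerprint):
        self._fingerprints[singer] = list(fingerprint)

    def identify(self, query, threshold=IDENTIFY_THRESHOLD):
        """Return the best-matching singer name, or None if nobody is close enough."""
        best_name, best_score = None, threshold
        for name, reference in self._fingerprints.items():
            score = cosine(query, reference)
            if score >= best_score:
                best_name, best_score = name, score
        return best_name


registry = FingerprintRegistry()
registry.register("Singer A", [0.9, 0.1, 0.2])
registry.register("Singer B", [0.1, 0.8, 0.5])

print(registry.identify([0.85, 0.15, 0.25]))  # close to Singer A's fingerprint
print(registry.identify([0.0, 0.0, 1.0]))     # far from both registered singers
```

The first query returns "Singer A" and the second returns `None`, since an unregistered voice should not be forced onto the nearest name. Running the same lookup per segment would give the kind of per-singer breakdown described for a duet.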

Voice ID + Automatic Content Recognition (ACR)
Voice ID alone is not capable of distinguishing between real and AI vocals, but we can find AI-generated music by using ACR technology in conjunction with Voice ID.

Diagram: Pex Voice ID combined with ACR.

Check out what we’ve identified using our audio, melody, and phonetic matching technologies along with Voice ID.

Identifying a known original song with an AI vocal swap

Pex’s Voice ID along with audio matching technology was able to identify Michael Jackson’s voice in this AI version of “Careless Whisper”, originally recorded by George Michael. By identifying this audio as “Careless Whisper” and separately identifying Michael Jackson’s voice, we can determine this song is using AI vocals.

Original recording

AI vocal swap

Identifying a novel/unreleased song with an AI-generated voice

Creators aren’t just swapping vocals on original songs; they are creating entirely new songs with AI and generating vocals in the style of well-known artists. We identified this AI song using vocals that impersonate popular artists Drake, Travis Scott, and 21 Savage. Each artist has a distinct vocal style represented at various points in the track, and our advanced technology is able to identify each voice and the specific spots in the song in which they are singing or rapping.

Visualization: Pex Voice ID AI-generated voice detection.

Identifying a cover song with an AI-generated voice

We also used a combination of our Voice ID technology and ACR to identify this AI cover of Johnny Cash singing “Barbie Girl” by Aqua. The vocal styles and genres could not be more different here, but thanks to the underlying melody and lyrics, we were still able to find this cover. We then identified the voice as Johnny Cash to further determine that this song was created with AI.

AI Johnny Cash cover song

Original Aqua recording
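The three scenarios above can be read as one decision rule that combines an ACR result with a Voice ID result. The sketch below is only an illustration of that reasoning. `AcrMatch`, `classify`, the `match_type` values, and the output labels are hypothetical. The labels say "possible" because a real pipeline would also need catalog knowledge, for example whether the identified singer ever released that work, which is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AcrMatch:
    """Hypothetical ACR result: which known work matched, and how."""
    title: str
    original_artist: str
    match_type: str  # "audio" (same recording) or "melody" (cover or re-performance)


def classify(acr: Optional[AcrMatch], singer: Optional[str]) -> str:
    """Label a track from an ACR match and a Voice ID singer name (either may be None)."""
    if singer is None:
        return "no registered voice found"
    if acr is None:
        return f"new song with a voice matching {singer}: review for impersonation"
    if singer == acr.original_artist:
        return f"consistent: {singer} on '{acr.title}'"
    if acr.match_type == "audio":
        return f"possible AI vocal swap: {singer} on the recording of '{acr.title}'"
    return f"possible AI cover: {singer} performing '{acr.title}'"


print(classify(AcrMatch("Careless Whisper", "George Michael", "audio"), "Michael Jackson"))
print(classify(AcrMatch("Barbie Girl", "Aqua", "melody"), "Johnny Cash"))
print(classify(None, "Drake"))
```

The key design point is that neither signal is sufficient alone: the ACR match says which work is present, the voice match says who appears to be performing it, and only the disagreement between the two suggests AI involvement.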

Identify singers and AI-generated voices with Pex

Our Voice ID technology can match singing voices to their identities, even when there are multiple voices in a recording or when the voice is AI-generated, and it is effective across various genres and languages.

Pex Voice ID is capable of:

Analyzing the voice, independent of lyrics, spoken words, or any textual information or metadata
Recognizing the exact same voice across different recordings, regardless of whether there is one or multiple voices
Matching different songs, rap performances, or speech segments performed by the same, even if previously unknown, individual
Identifying any previously registered singer in a given song
Matching real and AI-generated voices to help detect AI-generated music

Are you interested in Voice ID for your company or catalog? Reach out to our team to learn more and discuss how we can work together.