Identify more compositions and AI-generated voices with Pex phonetic matching

WRITTEN BY Jakub Galka
Nov 27, 2023

The top 100 songs on Spotify, Pandora, and Apple Music may have different genres, styles, and instrumentation, but they all have one thing in common: lyrics. Lyrics help songs top music charts and make them go viral on social media. But because lyrics are so often changed or sung as part of cover versions, it can be difficult to find uses of a song’s lyrics in digital content. In a world driven by video and social sharing, monitoring the use of lyrics online for monetization is particularly difficult for songwriters, and missed uses can stifle royalty payments.

At Pex, we use AI-powered content recognition technology to track the use of music online. With our audio and melody matching technologies, we’re able to effectively track the use of recordings and compositions across platforms like YouTube, Facebook, TikTok, and Instagram. But to help more songwriters track the use of their copyrights, we need to go a step beyond audio and melody matching. 

  • Audio matching looks for recordings that match 1:1 with the original, even if they’ve been sped up, sampled, or pitch-shifted.
  • Melody supplements audio to track a song’s distinct pattern of music, to find cover versions or recordings of live performances.
  • In many cases though, because two versions of one song can be so drastically different, the only part of a song that is really used by another creator will be the lyrics. If this is the only matching element between two recordings, then lyric tracking technology is needed.

Fortunately, Pex can track lyrics with phonetic matching, which is based on the vocal expression of lyrics (aka singing or speaking them). With phonetic matching, new recordings of lyrics can be traced back to their original recording, so that songwriters and lyricists can see where their compositions are being used, and receive credit for their work. 

Pex phonetic fingerprinting and matching 

Pex’s phonetic fingerprinting and matching enables the identification of two different audio files that contain the same underlying words, regardless of whether they are sung or spoken. Our technology analyzes the underlying phonetic content of the audio recording itself (both individual words and combined phrases) and creates a unique representation of the phonetics, independent of the unique vocal style and cadence being used by the singer or speaker. 
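To make the idea of a representation that ignores cadence more concrete, here is a minimal sketch. It is not Pex’s implementation. It assumes some upstream recognizer has already labeled each short audio frame with a phoneme symbol (the frame labels below are made up), and the function names are hypothetical. Collapsing consecutive repeats discards how long each sound was held, so a slowly sung rendition and a quickly spoken one reduce to the same sequence.

```python
from difflib import SequenceMatcher
from itertools import groupby


def collapse_frames(frames):
    """Drop consecutive repeats so the duration of each sound no longer matters."""
    return [label for label, _ in groupby(frames)]


def phonetic_similarity(frames_a, frames_b):
    """Return a 0..1 similarity between two frame-level phoneme label sequences."""
    a = collapse_frames(frames_a)
    b = collapse_frames(frames_b)
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


# Hypothetical frame labels for the word "hello": sung slowly vs. spoken quickly.
sung = ["HH", "HH", "EH", "EH", "EH", "EH", "L", "L", "OW", "OW", "OW", "OW"]
spoken = ["HH", "EH", "L", "OW"]
other = ["W", "ER", "ER", "L", "D"]

print(phonetic_similarity(sung, spoken))  # same sounds once timing is removed
print(phonetic_similarity(sung, other))   # mostly different sounds
```

Note that nothing in this comparison depends on a dictionary or a language model, only on the sequence of sounds. A real system would also need to tolerate recognition errors and genuinely repeated sounds, which this toy version does not handle.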

Unlike other solutions that rely on available song metadata (such as transcribed lyrics or text), Pex’s phonetic matcher is language-agnostic and produces highly accurate results across all regional languages and dialects. It is independent of the actual text transcriptions, vocabularies and language models required for text-based searches, which can only be applied to a closed set of known languages.

This method of tracking lyrics helps identify as many uses of a composition as possible. Let’s look at some examples of challenging lyric changes that Pex is able to identify with phonetic matching.

Identifying lyrics sung in various – even fictional – languages

Pex’s phonetic matching technology matches content accurately in common languages like English and Spanish, but also in less widespread languages like Basque (Euskera) and Finnish (Suomi). Our technology can even identify matching lyrics in fictional languages like Elvish, created as part of The Lord of the Rings series, and Na’vi, from the movie Avatar. Because our phonetic matching focuses on verbal sounds and structures, there is no need to register or train a specific language before Pex can identify two songs using the same words.

“Pero Quererte Jamás” – Spanish language matches

We queried the live version of the Spanish-language song “Pero Quererte Jamás” by Javier Rosas and were able to identify a studio recording of the same song by Regino Aguilar. Because the instrumentation, melody, and voices differ between these two recordings, neither audio nor melody matching would have identified the Aguilar recording, but phonetic matching found it based on the lyrics alone.

Query audio we searched against

Audio we matched

Because we’re matching against verbal sounds and structures, we can identify lyrics in any language, even fictional ones. In this example, we queried a cover version of “The Songcord”, which is an original song written and recorded for the film “Avatar: The Way of Water.” The song is sung in Na’vi, a language created entirely for the movie. We were able to identify another cover version of the song based on the lyrics, despite both being sung in this fictional language and by different artists.

Query audio we searched against

Audio we matched

Identifying lyrics despite different musical styles and genres

Pex phonetic matching can also match two recordings with the same underlying lyrics, even when the musical styles or genres are different. In this example, we queried a reggae cover version of “Another Day in Paradise” and matched it to the original recording by Phil Collins, which is considered pop/rock.

Query audio we searched against

Audio we matched

Despite being different genres, these two recordings are still quite similar. But since we are matching against lyrics, we can identify even drastic variations of a song, like in the example below, where the two recordings are far more different than they are similar. We queried a bluegrass version of “I Love to Tell the Story” that was recorded in 2003 and matched it to a version described as “authentic period music” from the American Civil War, which took place from 1861 to 1865. Because these recordings use the same lyrics, we were able to match them.

Query audio we searched against

Audio we matched

Identifying partial lyric matches

Pex’s phonetic matching process has a precise time resolution, enabling us to identify matching segments or sections of lyrics, even when the lyrics occur at completely different points in a song. This enables us to identify songs that share a verse or chorus, but have differing structures otherwise.

Take these two songs, for example. In the first song, the chorus lyrics appear numerous times, at 0:15, 1:01, 1:39, and elsewhere. These same chorus lyrics appear in our second song, where they make up the entirety of the short 44-second track. The first recording also contains additional non-matching lyrics, and the structures of the two recordings are dramatically different, with the second featuring additional layered vocalists and no instruments whatsoever.

Query audio we searched against

Audio we matched
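The partial matching described in this section can be illustrated with a small sketch. This is an assumption-laden toy, not Pex’s algorithm: it supposes each recording has already been reduced to a list of (phoneme label, start time in seconds) pairs, uses exact matching of short label runs (n-grams), and all names and data are hypothetical. It reports every place where a segment of one recording begins inside the other, along with the time in each.

```python
from collections import defaultdict


def timed(labels, step=0.5):
    """Attach a start time to each label, assuming evenly spaced sounds."""
    return [(label, i * step) for i, label in enumerate(labels)]


def shared_segments(a, b, n=3):
    """Find where runs of n labels from recording b also occur in recording a.

    Returns sorted (seconds_in_a, seconds_in_b) pairs marking where each
    shared segment begins.
    """
    labels_a = [label for label, _ in a]
    labels_b = [label for label, _ in b]

    # Index every n-gram of a by the positions where it starts.
    index = defaultdict(list)
    for i in range(len(labels_a) - n + 1):
        index[tuple(labels_a[i:i + n])].append(i)

    # Collect every (position in a, position in b) pair sharing an n-gram.
    hits = set()
    for j in range(len(labels_b) - n + 1):
        for i in index.get(tuple(labels_b[j:j + n]), []):
            hits.add((i, j))

    # Keep only hits that begin a segment, not continuations of (i-1, j-1).
    starts = sorted((i, j) for i, j in hits if (i - 1, j - 1) not in hits)
    return [(a[i][1], b[j][1]) for i, j in starts]


# Made-up phoneme labels: a full song with two choruses, and a chorus-only clip.
verse_1 = ["DH", "AH", "K", "AE", "T"]
chorus = ["L", "AH", "V", "M", "IY"]
verse_2 = ["S", "IH", "NG", "OW", "N"]
long_track = timed(verse_1 + chorus + verse_2 + chorus)
short_track = timed(chorus)

print(shared_segments(long_track, short_track))
```

Here the chorus-only clip lines up with both chorus occurrences in the longer track, each at a different time offset, while the verses produce no hits.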

Using phonetic matching to track AI-generated voices

Pex’s phonetic matching can help detect AI-generated voices, a growing pain point in the music industry. Nowadays, many cover songs are being created and distributed using AI models trained on famous artists’ voices. From Freddie Mercury covering Celine Dion’s “My Heart Will Go On” to a dramatic Adele cover by SpongeBob SquarePants’ Patrick Star, these AI tracks are becoming more and more popular across all major digital service providers. Phonetic matching can help identify the use of the same lyrics in recordings that have been modified with AI-generated voices.

When we queried the official audio for “Take On Me” by the band A-ha, we identified a match to an AI cover using Frank Sinatra’s voice. While the voices and styles are different in these two renditions, we were able to match the tracks based on the lyrics. From there, it is easy to determine that the vocals are AI-generated, since we can confirm Sinatra never recorded this song.

Query audio we searched against

Audio we matched
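The reasoning in this example (a lyric match, followed by a check on who is known to have recorded the song) could be sketched as below. This is only an illustration under stated assumptions: the catalog dictionary, its contents, and the function name are hypothetical, and a result of True means "worth a closer look", not proof that a voice is AI-generated.

```python
def needs_ai_voice_review(matched_song, credited_voice, known_performers):
    """Return True when the credited voice has no known recording of the matched song."""
    return credited_voice not in known_performers.get(matched_song, set())


# Hypothetical, deliberately incomplete catalog: song title -> known performers.
known_performers = {"Take On Me": {"A-ha"}}

print(needs_ai_voice_review("Take On Me", "Frank Sinatra", known_performers))
print(needs_ai_voice_review("Take On Me", "A-ha", known_performers))
```

In practice, the quality of such a check depends entirely on how complete the catalog of known recordings is, since a legitimate but uncatalogued cover would also be flagged.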

Find more content with Pex phonetic matching

Many of the examples above wouldn’t be identified with just audio or melody matching technology because the recordings or musical structures are different. In order to find heavily modified uses like these, multiple approaches are required, which is why at Pex we use all three technologies – audio, melody, and phonetic matching – to find the most uses of music. Phonetic matching can find uses of compositions that rightsholders would never otherwise be able to find, and that can translate to increased revenue. While identifying music is only going to become more challenging, especially in the face of AI-generated content, there are many ways technology can help music rightsholders find uses of their works. We are always improving our technology and finding new ways to identify copyrighted content, even AI-generated music and voices.

If you’re a songwriter or music publisher, you’re probably missing out on royalties from unidentified uses of your compositions. Reach out to our team to learn more about phonetic matching and see what you’ve been missing.
