Building Attribution Engine: How R&D fuels our groundbreaking products

WRITTEN BY Diana Pfeil

Feb 10, 2022

This is the second installment in a series of blogs that detail how and why we built our Attribution Engine. Missed the first blog on our product principles? Catch up here.

At Pex, we are committed to investing in research and development and to building state-of-the-art content identification technology into our product. I find it incredibly exciting to be part of a company that not only develops the fastest and most accurate content identification on the market, but is also dedicated to continuously improving our algorithms so that we stay at the leading edge. Our team, with backgrounds in both industry and academia, is tuned in to the latest developments in the research community and to the biggest challenges faced by copyright owners. The result is a groundbreaking product that we are always making even better.

We use our technology to identify uses of copyrighted works online, so that the right creators or companies can receive attribution and compensation for the use of their works. We believe in attribution for all, and we can’t deliver on that vision without technology capable of handling the complexities of digital content.

To better understand our technology and the kind of problems we solve through our Attribution Engine, we need to start at the beginning.

What is content identification?

Content identification, often called automated content recognition or ACR, is the process of scanning pieces of content (audio, video, images, text, or other formats) and matching them against a reference file. Content identification isn’t a new technology – you may be familiar with it from the popular music app Shazam – but it has become harder and harder to do accurately and quickly: new types of content have emerged, the amount of content on the internet has exploded, and uploaders have learned to skate around common identification practices.

Advanced content identification

What makes content identification so challenging, and also so fun to work on?

With simple identifications, it’s quite straightforward to compare two identical pieces of content and determine if they are the same. As humans, we can just listen to two songs and determine if they are the same or not. But what if we want to answer more complex questions, such as:

Does this piece of audio match any known copyrighted work ever created, even if just for a few seconds?
Is this a distorted version of our reference file, where the volume or tempo has been changed, or background noise has been added?

When trying to answer these complex questions, it becomes infeasible to compare content without technology. And we have to make sure that when we compare, we aren’t fooled by any modifications to the original, which uploaders often use to avoid having their content discovered. Then we need to think about scale: billions of pieces of content are uploaded daily. How can we identify content accurately, granularly, and at scale, while remaining invariant to distortions? These are the challenges that make it so fun to work at Pex and to work on our identification technology.

Pex’s technology

At Pex, we use digital fingerprinting to enable content identification. A fingerprint is a compact representation of a piece of content that allows us to robustly and efficiently match against other pieces of that same content type. Our fingerprinting tech powers our identification capabilities, which are the base of our products, including our Attribution Engine.
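To make the idea of a fingerprint concrete, here is a deliberately tiny sketch. It is not Pex's algorithm; it is a toy, classic-style energy-difference fingerprint, and every name in it (`frame_energies`, `fingerprint`, `similarity`, the synthetic signal) is made up for illustration. It reduces 4,096 audio samples to 63 bits and shows that those bits do not change when the volume is halved.

```python
import math


def frame_energies(samples, frame_size):
    """Energy (sum of squares) of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def fingerprint(samples, frame_size=64):
    """One bit per pair of adjacent frames: 1 if energy rises, else 0."""
    energies = frame_energies(samples, frame_size)
    return [
        1 if later > earlier else 0
        for earlier, later in zip(energies, energies[1:])
    ]


def similarity(fp_a, fp_b):
    """Fraction of positions where two equal-length fingerprints agree."""
    if len(fp_a) != len(fp_b) or not fp_a:
        raise ValueError("fingerprints must be non-empty and equal length")
    return sum(a == b for a, b in zip(fp_a, fp_b)) / len(fp_a)


# Synthetic stand-in for a recording: a tone whose loudness swells and fades.
original = [math.sin(0.3 * t) * (1.0 + math.sin(0.011 * t)) for t in range(4096)]
# The same content at half the volume (a simple distortion).
quieter = [0.5 * x for x in original]

score = similarity(fingerprint(original), fingerprint(quieter))
print(score)
```

Because each bit only records whether energy went up or down between neighboring frames, scaling the whole signal leaves the fingerprint untouched. Real systems have to survive far harsher changes (tempo shifts, added noise, cropping), which is where the research effort goes.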

Attribution Engine relies on fingerprinting to process and identify content en masse and at the scale of the Internet. Our advancements in this technology allow us to identify content in real time, as it’s uploaded to content-sharing platforms, so we can fundamentally change the way content is attributed and published. Identifying, attributing, and licensing the use of copyrighted content before it’s shared online balances the creator economy by ensuring rights are respected, copyright use is compensated, and content is still able to flow freely.
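One common way to make lookups fast at scale is an inverted index: instead of comparing an upload against every reference one by one, short runs of fingerprint values are used as keys that point straight to the references containing them. The sketch below illustrates that general technique only. The class `FingerprintIndex`, the helper `shingles`, and the integer fingerprints are hypothetical, and nothing here describes how Attribution Engine is actually built.

```python
from collections import Counter, defaultdict


def shingles(fingerprint, width=4):
    """Overlapping runs of `width` fingerprint values, used as lookup keys."""
    return [
        tuple(fingerprint[i:i + width])
        for i in range(len(fingerprint) - width + 1)
    ]


class FingerprintIndex:
    """Toy inverted index: shingle -> set of reference asset ids."""

    def __init__(self, width=4):
        self.width = width
        self._postings = defaultdict(set)

    def add(self, asset_id, fingerprint):
        """Register a reference fingerprint under every shingle it contains."""
        for key in shingles(fingerprint, self.width):
            self._postings[key].add(asset_id)

    def query(self, fingerprint):
        """Return (asset_id, shared_shingle_count) pairs, best candidate first."""
        votes = Counter()
        for key in shingles(fingerprint, self.width):
            for asset_id in self._postings.get(key, ()):
                votes[asset_id] += 1
        return votes.most_common()


index = FingerprintIndex()
index.add("song_a", [3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
index.add("song_b", [2, 7, 1, 8, 2, 8, 1, 8, 2, 8])

# A short excerpt taken from the middle of song_a.
excerpt = [4, 1, 5, 9, 2]
print(index.query(excerpt))  # [('song_a', 2)]
```

The excerpt is only five values long, yet it lands on the right reference without scanning the whole catalog. That is the property that matters when a match may last only a few seconds.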

Attribution Engine – identifying, attributing, and licensing in seconds

The need for this balance in the creator economy is why we continuously and obsessively improve our approach to fingerprinting and matching content. We are always iterating in order for our algorithms to:

Identify content faster and at enormous scale
Find more true positive matches to correctly identify as many matches as possible
Find fewer false positive matches, targeting zero because that is critical for copyright
Process more content types in addition to audio, video, and melody, which we currently support
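The second and third goals are usually tracked as recall (how many real matches we found) and precision (how many reported matches were real). A minimal sketch, with made-up counts and a hypothetical helper named `match_quality`:

```python
def match_quality(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a batch of reviewed matches."""
    reported = true_positives + false_positives  # everything we flagged
    actual = true_positives + false_negatives    # everything we should have flagged
    precision = true_positives / reported if reported else 1.0
    recall = true_positives / actual if actual else 1.0
    return precision, recall


# Illustrative numbers only: 90 correct matches, no false claims, 10 misses.
precision, recall = match_quality(
    true_positives=90, false_positives=0, false_negatives=10
)
print(precision, recall)  # 1.0 0.9
```

Targeting zero false positives means holding precision at 1.0, and the ongoing work is raising recall without giving that up.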

First-of-its-kind melody matching

We also prioritize research and development to solve major industry problems. We are incredibly excited about our melody matching system, which aims to address the challenge of attribution for music composition rightsholders, one of the major challenges in building a global rights database. Music composition rightsholders (songwriters and their publishing companies) receive lower royalty rates than those of recording artists or record labels. On top of this, they struggle to find uses of their copyrighted works and claim them for additional revenue because of shortcomings in audio identification.

Audio matching for sound recordings is the bread and butter of the content identification world, but matching against only sound recordings leaves gaps in attribution. For example, composition rightsholders have no way to be attributed or compensated when someone uploads a cover song of their work, because the sound recordings are different.

To help solve this problem, we developed melody matching, which allows us to match two pieces of audio when they represent the same underlying composition, even when the sound recordings differ. Our melody matching can also help us link sound recordings with compositions in our database, which has been a sticking point for others who have attempted to create a global rights database like Pex’s Registry. Our system leverages our deep knowledge and experience with audio identification, together with neural networks and concepts from the forefront of the machine learning research community, to identify and match using the melody of a song.
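To see why the melody can match when the recordings do not, consider a much simpler idea than the neural approach described above: represent a tune by the steps between its notes rather than by the notes themselves. This is a textbook illustration, not our system, and it assumes the notes have already been extracted as MIDI numbers. The helpers `interval_profile` and `same_melody` are hypothetical.

```python
def interval_profile(midi_notes):
    """Semitone steps between consecutive notes; unchanged by transposition."""
    return [b - a for a, b in zip(midi_notes, midi_notes[1:])]


def same_melody(notes_a, notes_b):
    """True when two note sequences trace the same melodic contour."""
    return interval_profile(notes_a) == interval_profile(notes_b)


# Opening of "Twinkle Twinkle Little Star": C C G G A A G (MIDI 60 = middle C).
original = [60, 60, 67, 67, 69, 69, 67]
# A cover sung a fourth higher: every note differs, the tune does not.
cover = [note + 5 for note in original]
# A different tune, for contrast.
other = [60, 62, 64, 60, 60, 62, 64]

print(same_melody(original, cover))  # True
print(same_melody(original, other))  # False
```

A cover recorded in a different key, by a different singer, with different instruments shares no audio with the original, but the composition underneath is the same. Real covers also change tempo, rhythm, and ornamentation, which is why this problem calls for learned models and not an exact comparison like this one.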

We have developed trail-blazing algorithms for melody matching that will bring new revenue and proper attribution to an underserved part of the music industry. These are the research problems we love to work on and prioritize at Pex.

The building blocks of Attribution Engine

Our identification technology is just one part of our Attribution Engine. While it is the foundation for all that we do, there are many other key modules that combine to create this all-in-one copyright solution for the creator economy. Stay tuned as we dive into more areas of product development and how we built this game-changing solution.

Want to work with us to solve complex problems? Check out open roles.
