Pex by the numbers – 2023

David Southwell

In February, Pex celebrated its ninth anniversary, and to mark the occasion we took a look at our accomplishments by the numbers. For roughly nine years our Discovery product has searched the Internet for audio and video content. Once it finds new content, we fingerprint that content in preparation for matching it to the assets our customers select for tracking. Over the years we’ve processed an incredible amount of content.

Given my role at the company, I’d love to say that our system was available and processing at full power for every one of those roughly 283,860,000 seconds over the past 9 years, but I can’t. The early years were full of learning and growing, which we did tremendously well, and every year since then we’ve had periods of renewed learning.

  • Searching for content at scale is hard.
  • Fingerprinting and matching content at scale is hard.
  • If it weren’t for our novel architecture, which I touch on later, all of this would be financially infeasible.

Nonetheless, through all of the challenges we’ve managed to succeed, and we have some pretty impressive statistics that are worth celebrating.

11,016,020,147, roughly 11 Billion matches  

That’s how many matches we identified for our customers’ tracked assets, resulting in:

  • 11 Billion chances to learn where a rightsholder’s content is used.
  • 11 Billion chances to monetize, or take other action on owned content.
  • 11 Billion chances to get to know an audience (aka customers) better.

23,768,228,864 fingerprints  

That’s how many pieces of content we’ve fingerprinted to track our customers’ content. Breaking this number down a bit adds context about the figure itself, about our users’ preferences, and about the Internet more broadly.

Percent of fingerprinted content by platform

You may have heard, or already know, that YouTube is big, and our data confirms it: nearly 40% of all the content we’ve fingerprinted originated there. Twitter, Instagram and Facebook also loom large in the data. It’s worth clarifying that we are incentivized to prioritize certain sites over others as our users’ preferences dictate, so this is not an entirely unbiased assessment of the amount of content on each site relative to others.

Smartphones have given us a lot, including a shared understanding of just how important data storage is, especially for multimedia files. As even the technological layperson likely knows by now, media files such as images, video, and audio are larger than most other file types. You might then wonder how much storage we consume when processing 23 billion pieces of content.

337 TB of storage

That’s the amount of storage used to hold 23 billion fingerprints. 337 TB is a lot, but still a tiny fraction of the size of the original files used to generate those fingerprints. For some context, it would take roughly 267 years to play 337 TB of music encoded at 320 kbps.
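As a rough sanity check on the storage figures, here is a minimal Python sketch of the arithmetic. It assumes that "TB" means decimal terabytes (10^12 bytes), that the audio is a constant 320 kbps stream, and that a year is 365.25 days. These are assumptions for illustration, not details taken from our systems, and the variable names are made up for the example.

```python
# Back-of-the-envelope arithmetic for the storage figures.
# Assumption: 1 TB = 10**12 bytes (decimal terabytes).
STORAGE_BYTES = 337 * 10**12
FINGERPRINTS = 23_768_228_864
BITRATE_BPS = 320_000                      # 320 kbps, assumed constant bit rate
SECONDS_PER_YEAR = 365.25 * 24 * 3600      # assumed average year length

# Average storage consumed by a single fingerprint.
bytes_per_fingerprint = STORAGE_BYTES / FINGERPRINTS

# How long 337 TB of 320 kbps audio would take to play back.
playback_seconds = STORAGE_BYTES * 8 / BITRATE_BPS
playback_years = playback_seconds / SECONDS_PER_YEAR

print(f"{bytes_per_fingerprint / 1000:.1f} kB per fingerprint")  # 14.2 kB per fingerprint
print(f"{playback_years:.0f} years of audio at 320 kbps")        # 267 years of audio at 320 kbps
```

Under these assumptions, the average fingerprint works out to about 14 kB, which is why the total stays so small compared with the source media.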

2.3 Trillion messages

I mentioned earlier that our unique architecture is what allows us to process this amount of data so quickly without going broke. Without going into all the details, a key element of that architecture is a message queue. Over the years, we’ve used several message queue technologies, including RabbitMQ, Google PubSub, and Apache Pulsar. If the architecture is a human body, a good way to think of our message queue is as the circulatory system. Just like the human circulatory system, our message queue pumps lots of information (blood) around.

Here again, our numbers aren’t 100% accurate, as our record keeping in the early days wasn’t as good as it has been in more recent years. But we estimate that we’ve processed 2.3 Trillion messages. 2.3 Trillion over 9 years averages out to roughly 8,100 per second. For some context, that’s roughly 1.6x the number of pieces of mail handled by the USPS every second.
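The average message rate can be worked out in a few lines of Python. This is a minimal sketch that assumes the 2.3 trillion estimate is spread evenly over nine years of 365.25 days each; real traffic is bursty and grew over time, so the peak rate will be well above this average. The variable names are illustrative only.

```python
# Average message throughput implied by the lifetime estimate.
TOTAL_MESSAGES = 2.3e12                    # estimated lifetime message count
YEARS = 9
SECONDS_PER_YEAR = 365.25 * 24 * 3600      # assumed average year length

total_seconds = YEARS * SECONDS_PER_YEAR
messages_per_second = TOTAL_MESSAGES / total_seconds
messages_per_day = messages_per_second * 24 * 3600

print(f"{messages_per_second:,.0f} messages per second")        # 8,098 messages per second
print(f"{messages_per_day / 1e6:.0f} million messages per day")  # 700 million messages per day
```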

Birthdays, not unlike New Year’s Day, often encourage us to reflect on what’s been done and what lies ahead. It’s interesting to look back on the past 12 months, but it’s often more enlightening to look further back and reflect on the entire corpus of experience. Remember, “Life moves pretty fast. If you don’t stop and look around once in a while, you could miss it.” – Ferris Bueller from Ferris Bueller’s Day Off. Happy 9th Birthday, Pex!

Interested in helping us scale? View our open roles.
