Powering Digital Rights and Scalable Streaming: A Success Story with Pex and DataStax Astra Streaming
Ensuring fair and transparent use of copyrighted works is crucial in the fast-paced digital landscape. Pex enables copyright owners to protect and monetize their works across user-generated content platforms. But what drives its success? How does Pex handle massive data volumes and scale its operations seamlessly? I recently spoke with our latest Digital Champion, Dave Southwell, director of infrastructure engineering at Pex. He shared insights on being a leader, Pex's vision, the role of data, and the benefits of partnering with DataStax.
1) Congrats on being named a Digital Champion! What does being a Digital Champion mean to you as a leader?
I'm very honored to be recognized as a Digital Champion, and I have to acknowledge that it's really been a team effort. We have been building a system to search the Internet for music and video content for the past nine years. One way to think about it is that we are like Google, but for music and video content. Many people said this was not possible: the technology either didn't exist, or they didn't think it would scale well or be cost-effective. But through the use of digital streaming message queues and related technologies, we were able to build a system that scales in a cost-effective way.
2) What are the big challenges you are focused on addressing?
I am on the infrastructure side and have partners and collaborators on the pure development side. My main focus is on how I can take the solutions that we want to develop and operate them at scale cost-effectively.
We, like a lot of companies, leverage the cloud for our deployments. This makes a lot of sense when you’re building a product and need to make adjustments quickly as your customers provide input about what features they want and don’t want. One of the main trade-offs of deploying to a cloud provider is cost. It’s very important that we find novel ways to optimize our cloud spend, and we did that by making certain architectural decisions that enabled us to take advantage of ephemeral compute. A key element of this architecture is the use of a streaming data service in the form of a message queue.
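To make the pattern concrete, here is a minimal sketch of why a message queue pairs well with ephemeral compute. It is an illustration only, not Pex's code or the Astra Streaming API: the `AckQueue` class, the `run` helper, and the `preempt_on` parameter are all hypothetical names, and the queue is a toy in-memory stand-in. The idea it shows is that a message is removed only after a worker acknowledges it, so work held by an instance that is reclaimed mid-task is redelivered rather than lost.

```python
from collections import deque


class AckQueue:
    """Toy in-memory queue: a message is removed only after it is acknowledged."""

    def __init__(self, messages):
        self._pending = deque(messages)  # waiting for delivery
        self._in_flight = {}             # delivered, not yet acknowledged
        self._next_id = 0

    def receive(self):
        """Hand out the next message as (id, payload), or None if none is pending."""
        if not self._pending:
            return None
        msg_id, self._next_id = self._next_id, self._next_id + 1
        self._in_flight[msg_id] = self._pending.popleft()
        return msg_id, self._in_flight[msg_id]

    def ack(self, msg_id):
        """The worker finished: forget the message."""
        del self._in_flight[msg_id]

    def nack(self, msg_id):
        """The worker failed or vanished: put the message back for redelivery."""
        self._pending.append(self._in_flight.pop(msg_id))

    def empty(self):
        return not self._pending and not self._in_flight


def run(queue, preempt_on):
    """Process every message, simulating preemption on the given delivery numbers."""
    done, deliveries = [], 0
    while not queue.empty():
        item = queue.receive()
        if item is None:
            break
        msg_id, payload = item
        deliveries += 1
        if deliveries in preempt_on:
            queue.nack(msg_id)  # the ephemeral instance was reclaimed mid-task
            continue
        done.append(payload)
        queue.ack(msg_id)
    return done, deliveries
```

For example, `run(AckQueue(["a", "b", "c"]), preempt_on={2})` finishes all three payloads in four deliveries: the second delivery is lost to a simulated preemption and the message is handed out again later. Real brokers expose the same idea through acknowledgements and redelivery, with details that differ by product.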
4) What possibilities did you envision for Pex when you first began exploring this field, and how have they evolved over time?
The foundations were already there: leveraging ephemeral compute, using some kind of orchestration system, and a message queue. We've used a few different message queues over the years. My vision was to see how far we could take it. How many other kinds of workloads and products might we want to build? How many of those could we mold to that paradigm and run in production, at scale, at a low cost?
5) How did you evaluate different streaming technologies before you settled on Pulsar?
Over the years, we evaluated various message queue and streaming systems to meet our growing needs. We used RabbitMQ but outgrew it in terms of scalability. Then we switched to Google’s Pub/Sub, which provided more flexibility but still had some limitations. When transitioning from Google to Microsoft Azure, we explored different solutions and came across Apache Pulsar. We were attracted by its ability to handle functions within the message queue. We started evaluating Pulsar on our own and eventually found Kesque, a managed service for Pulsar. After DataStax acquired Kesque [in 2021], we continued our partnership, leading to the adoption of Astra Streaming. The transition from self-managed Apache Pulsar to Astra Streaming has been highly successful, and we’re excited about the product's growth and the introduction of new features.
6) What are the key aspects or features of Astra Streaming that differentiate it and make it valuable for your organization?
Astra Streaming is essential. It is a foundational building block of our infrastructure. If our message queue is down, our system is not processing. A couple of systems might be able to continue without it, but very, very few. So its uptime is essential to us.
On average, we process around 50,000 messages per second at Pex. These messages range from a few bytes to 8 to 10 megabytes. Handling such a wide range of message sizes efficiently is challenging, and not many streaming solutions can do it. We collaborate closely with the DataStax team to fine-tune our client-side library and server-side configurations so that our system can handle this workload. It's remarkable that we don't need to worry excessively about massive traffic spikes, such as when we discover a substantial amount of new content. The system simply scales up to handle the load, we continue to process the data, and it scales down once processing is completed.
Astra Streaming stands out in its ability to handle this load and provide seamless scalability. We have encountered situations where our workloads surprised even the DataStax team, but their support and responsiveness have been commendable. We strive to be proactive in notifying them of any potential scale-up in our workloads.
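The scale-up and scale-down behavior described in this answer can be sketched as a simple backlog-driven rule. This is an assumption about one common approach, not a description of how Pex or Astra Streaming actually size their fleets: the `desired_workers` function and all of its parameters (`per_worker_rate`, `drain_seconds`, `min_workers`, `max_workers`) are hypothetical.

```python
import math


def desired_workers(backlog, per_worker_rate, drain_seconds,
                    min_workers=0, max_workers=100):
    """Return how many workers are needed to drain `backlog` messages
    within `drain_seconds`, given each worker handles `per_worker_rate`
    messages per second. The result is clamped to [min_workers, max_workers].
    """
    if per_worker_rate <= 0 or drain_seconds <= 0:
        raise ValueError("per_worker_rate and drain_seconds must be positive")
    # Each worker can clear per_worker_rate * drain_seconds messages in the window.
    needed = math.ceil(backlog / (per_worker_rate * drain_seconds))
    return max(min_workers, min(max_workers, needed))
```

With an empty backlog the rule returns `min_workers`, so the fleet shrinks once processing is done; a sudden spike in backlog raises the target until it reaches `max_workers`.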
By embracing Astra Streaming and partnering with DataStax, Pex has been able to drive innovation while working to achieve its vision of attribution for all, and continue its mission of enabling fair and transparent use of copyrighted works on the internet.