NETINT Technologies on Scalable Distribution in the Age of DRM: Key Challenges and Implications
Welcome to Voices of Video. I'm Jan Ozer. This is where we explore critical streaming-related topics with experts in the field. If you're watching and have questions, please post them as a comment on whichever platform you're watching, and we'll answer live if time permits. Today's episode is all about distribution at scale. Our guest is Alex Zambelli, technical product manager for video platforms at Warner Brothers Discovery. I've known Alex at least 15 years,
going back to his history with Microsoft, and we'll start there, where he was a codec evangelist and a producer of events like the Olympics and NFL football. We'll hear about some of his experiences there. Then we'll walk through the various points in his career on the way to Warner Brothers. There are a lot of stops that are worth chatting about. Then I'm known, I think, as a codec theorist. I do a lot of testing and I render conclusions. That's
useful in a lot of ways, or at least I hope it is, but it's not real world. Alex just has a ton of real-world experience that he's going to share with us today, things as high level as where the industry needs to go to make it simpler for publishers like Warner Brothers to focus on content as opposed to compatibility, and issues as deep diving as what's his percentage of VBR? Is it 200% constrained VBR, 300% constrained VBR? Particular to what I'm interested in, when does a company like Warner Brothers look at adopting a new codec? I think Alex is going to talk about the decision that they're in the process of making, which is whether to integrate AV1. So Alex just has a ton of real-world experience in live event production at huge scales, as well as premium content encoding and delivery with some of the biggest names in the industry. So I'm way excited to have Alex joining us today. Alex, thanks for being here. Jan, thank you so much for having me. Real pleasure. I'm looking forward to the next hour talking to you. Yeah. We don't get a chance to do this that often. Let's dive in. I'm not intimately familiar with your CV. Did you start in streaming at Microsoft,
or was there a stop before that? I did start my career at Microsoft. So that was my very first job out of college, actually. So this was back in 2002. I started out as a software tester. So I started as a software test engineer in Windows Media Player. I worked on both Windows Media Player and then the codec team at Microsoft as a software tester for about five years. And so, it was during that second phase of my software testing role there working on the codecs where I started working with the VC-1 codec, which at the time was a new codec for Microsoft in the sense that it was the first codec that Microsoft had standardized. So there was a codec called Windows Media Video 9, WMV 9,
and Microsoft took that through SMPTE to basically get it standardized. And so, that became VC-1. Some folks may recall that that was basically one of the required codecs for both HD DVD and Blu-ray at the time. And so, that's what put it on the map. And so, during that time where I was testing the VC-1 encoder, I started interacting a lot with Microsoft's external customers and partners. And so, that then transitioned me into my next job at Microsoft, which was technical evangelism. So I ended up doing technical evangelism for VC-1 for a few years. Then my scope broadened to include really all Microsoft media technologies that were at the time available and could be used for building large online streaming solutions. And so, when I started at Microsoft working
in digital media, I mean in 2002, it was still mostly dominated by physical media. So we're still talking about CDs, DVDs, Blu-rays. By the time I transitioned into this technical evangelism job, which was around 2007 or so, streaming was really starting to pick up steam. And so,
from that point on, really to this day, my career has been focused on streaming, really, because that has become the dominant method of distribution for digital media. And so, I mentioned that starting around 2007 or so, I started doing technical evangelism for a whole bunch of different Microsoft media technologies. So at the time, Silverlight was a technology Microsoft was developing. That was a competitor to Flash. And so, it was seen as a solution for building rich webpages, because everything was still primarily online through websites and browsers at the time. Mobile applications hadn't even started picking up yet.
And so, really the primary way of delivering streamed media at the time was through the browser, and this is where Silverlight came in. It was a plugin that allowed both rich web experiences to be built, but also really great premium media experiences as well. And so, that included even things like digital rights management, so using PlayReady DRM to protect the content and so on.
How did that transition to actual production in your work at the Olympics and with the NFL? Yeah. So at the time, Microsoft was partnering with NBC Sports on several projects. The first one that I was involved with was the 2008 Olympics in Beijing. And so, NBC Sports had the broadcast rights to the Olympics, still does, and they wanted to basically put all of the Olympics content online for essentially any NBC Sports subscriber to be able to access.
That was, I think, a first, where that was really the first attempt to put all of Olympics streaming online. So up until that point, if you wanted to watch an event, you had to wait for it to be broadcast on either your local NBC station or one of the cable channels. And so, if it wasn't broadcast in live linear, you could never see it. It wasn't available. And so, NBC Sports had the idea to put all of that content online. So the very first version of the NBC Olympics site that we built in 2008 was still using Windows Media for livestreaming, but was starting to use Silverlight in what at the time was actually the very first prototype implementation of adaptive streaming at Microsoft to do on demand. Then the next project we did with NBC Sports in 2009 was supporting Sunday Night Football. For that, we built a fully adaptive streaming-based website. So that was the origins of Microsoft's
Smooth Streaming technology. So Microsoft had taken that prototype that was built during the 2008 Olympics and essentially productized that into Smooth Streaming. So we had both live streams in HD, which was, again, a breakthrough at the time, to be able to do HD at scale. Now we just take it for granted. But in 2009, that was really seen as a big deal.
Then 2010 Vancouver Olympics, that's when, really, we went full-on Smooth Streaming. Everything was basically available on demand and live in Smooth Streaming. So, yeah, those are some really, I would say, groundbreaking events that we did. We ended up being nominated for a few sports Emmys, technical Emmys at the time. I don't remember which years we
won or didn't win, but, yeah, it was nice to get recognized by the industry for pushing the envelope. I'm remembering ... And I don't want to mix you up with another technology, but I'm remembering either Monday or Sunday Night Football with a player that had four different views that you could page through. Was that you guys? Yup. That was us. Yeah, that was us. Yup, that was Sunday Night Football. So, yeah, we had basically ... You could watch multiple camera
angles simultaneously. One of the cool things about that is that we used Smooth Streaming to do that where it was actually a single manifest that had all four camera angles in the same manifest. And so, switching between the camera angles was completely seamless because it was similar to switching bitrates the way you do in DASH or HLS today. So it was a very cool solution that ... Actually, I don't think we've even rebuilt it since then. It was a feature that we developed in 2009 and then lost to history.
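The multi-angle trick can be sketched in a few lines of Python. This is a toy model with hypothetical URLs and track names, not the actual Smooth Streaming manifest format: because all angles live in one manifest, an angle change is just requesting the next segment from a different track, exactly like a bitrate switch.

```python
# Toy model of the single-manifest multi-angle idea: camera angles sit in
# one manifest alongside bitrate variants, so an angle switch is handled
# exactly like a bitrate switch (fetch the next segment from another track).
# Track names and URL scheme are hypothetical.
MANIFEST = {
    "angles": ["cam1", "cam2", "cam3", "cam4"],
    "bitrates_kbps": [3500, 2000, 1000],
}

def next_segment_url(angle: str, kbps: int, index: int) -> str:
    assert angle in MANIFEST["angles"] and kbps in MANIFEST["bitrates_kbps"]
    return f"/video/{angle}/{kbps}k/segment{index}.m4s"

# Seamless switch: same segment timeline, different track.
assert next_segment_url("cam1", 2000, 42) == "/video/cam1/2000k/segment42.m4s"
assert next_segment_url("cam3", 2000, 43) == "/video/cam3/2000k/segment43.m4s"
```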
Did you actually go to the Olympics or were you working back in the plumbing in Redmond? We were on the backend side of it. So I did get a chance to go to one Olympic event at the Vancouver Olympics since they were close to Seattle where I live. But other than that, yeah, we spent most of those projects in windowless rooms in data centers, mostly in Redmond, sometimes in Las Vegas, because we were working closely with iStreamPlanet at the time as well, who were based out of Las Vegas. Spent a lot of time in New York as well at 30 Rock, because NBC Sports was still at the 30 Rock location at the time. So, yeah, it was a fun time. What were the big takeaways? If you met somebody on a plane and they asked, "Gosh, I'm doing a livestreaming event that's huge. What did you learn from the Olympics? What are
the high-level things that you took away from that that you've implemented throughout your career?" One of the, perhaps, obvious takeaways was that livestreaming is hard in that it's not on demand. Everything you know about on-demand streaming, you have to throw that out the window when you start working on livestreaming because you're dealing with very different issues, you're dealing with real-time issues. And so, even something as simple as packets getting lost on the way from your origin encoder to your distribution encoder, and dealing with packet loss and then dealing with segment loss on the publishing side and figuring out how do you handle that and handling blackouts and ad insertions. And so, everything's under a lot more pressure, because if you are doing on-demand streaming and if there's something wrong with the content, if there's something wrong with the origin or any part of your delivery chain, you have a little bit of leeway in that you've got time to address it, and hopefully you'll address it very quickly. But if the content goes down for a few hours, it's fine. People will come back later. Whereas with live, you don't have that luxury. You really have to be on top of it. And so, my memory of it is that every time we were doing these events, it was all hands on deck. I mean we had everyone from Microsoft to NBC, to Akamai, to iStreamPlanet. All the
different companies were involved in these projects. We would just have everyone on calls ready to go fix whatever needed to be fixed in real time, because that was the nature of it. So that was a big learning lesson there: live is not on demand. You have to really give it a lot more focus, give it a lot more attention than you would necessarily give to on demand.
Does live ever get easy? I mean even at events like what we're doing today, it seems like there's always something that breaks or there's always the potential for it. You never feel comfortable with it. I think that's a great way to describe it. It's just you're never comfortable because, yeah, something could go wrong, and then you can't just say, "Well, we'll fix it sometime in the next 24 hours." You have to fix it right now. And so, it's like, yeah, if our Zoom link went down right now, we'd be in trouble, right? No backup for that. So you jumped from the frying pan into the fire. I think your next stop was iStreamPlanet, where you were doing live events all the time. So tell us about that.
At the very end of 2012, I left Microsoft and I joined iStreamPlanet. iStreamPlanet, for those not familiar with the company, was a startup out of Las Vegas, started by Mio Babic. They built a reputation for themselves as being a premium live event streaming provider. At the time, they wanted to get into live linear and they wanted to also start building their own technology. And so, 2012 was when Mio started a software engineering team in Redmond. And so, the next year, I joined that software engineering team. What I worked on was the very first live encoder that iStreamPlanet built in-house.
And so, one of the ideas at the time was to build it all on commodity hardware. So, again, something that we now take for granted because now we're accustomed to things running in the cloud. And so, we assume that, yeah, of course you can go spin up a live encoder in the cloud and it's running on just commodity hardware that's there. But in 2012, 2013, that was not the case. It was mostly hardware-based encoders that you had to actually put in a data center and maintain. And so, the idea that Mio had was like let's
run it on commodity hardware. Let's build a cloud-based live encoder. And so, I worked on that product for about four, four and a half years. In 2015, if my memory serves me correctly, I think it was 2015 or 2016, iStreamPlanet got acquired by Turner, and Turner was part of WarnerMedia. And so, iStreamPlanet became a subsidiary of WarnerMedia. And so, that was a pretty nice ending to that story as well. Real briefly if you can, I'm trying to ... So
we had Silverlight here and then we had Flash here, and somehow we ended up with both of those going away. I guess it was the whole HTML5 thing, and that brought HLS and ... Smooth is in there. But when did you transition from VC-1 to H.264 and how did that work? When Silverlight launched, originally the only video codec it supported was VC-1, and then I think it was the third or fourth version of Silverlight- That's right, yeah. ... where H.264 support was added. I think Flash added it around the same time. I think it was literally within a month of each other. So the challenge with basically building any streaming solution in HTML around that time, so, again, going back to the 2007, 2008 timeframe, the challenge was that HTML was just not ready.
There were basically no APIs in HTML that would allow you to do streaming with the level of control that was needed. And so, there were some workarounds where, for example, Apple went and ... When they came out with HLS as their streaming protocol, they baked it into the Safari browser. And so, if you used the video tag in HTML in Safari,
you could basically just point it at an M3U8 playlist and it would just work. But that was an exception rather than the rule. I mean most other browser implementations, whether it was Chrome or Firefox, or Internet Explorer at the time, did not do that. And so, there was this
challenge of, well, how do you stream? And so, what basically Flash and Silverlight, I think, brought to the table at that time was an opportunity to really leapfrog HTML, to basically just advance it, even if it was a proprietary plugin, but advance the technology to a point where it was usable. And so, one of the innovations that Silverlight brought was the concept of a media stream source, which today now exists in HTML. So when you go build a solution in HTML today that's a streaming solution, you're using the Media Source Extensions and the Encrypted Media Extensions portions of the HTML spec. At the time, that was not yet in HTML5. So Silverlight had that approach of, well, we're not going to bake any particular streaming protocol into the plugin. We're going to basically open up an API that allows you to go handle your own downloading of segments and parsing of segments, and then you essentially just pass those video and audio streams into a media buffer, and then the plugin goes and decodes and renders that and handles the rest.
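The flow described here, where the application downloads and parses segments itself and appends the media to a buffer that the platform then decodes and renders, is the same model Media Source Extensions use today. A rough Python simulation of that player loop; the types, durations, and buffer target are illustrative, not any real API:

```python
from dataclasses import dataclass, field

@dataclass
class MediaSegment:
    index: int
    duration: float  # seconds of media in this segment
    payload: bytes

@dataclass
class SourceBufferModel:
    """Toy stand-in for an MSE SourceBuffer: the app appends parsed
    segments; the platform (not modeled here) decodes and renders."""
    buffered: float = 0.0
    segments: list = field(default_factory=list)

    def append(self, seg: MediaSegment) -> None:
        self.segments.append(seg)
        self.buffered += seg.duration

def download(index: int) -> MediaSegment:
    # Stand-in for an HTTP fetch of one fMP4/CMAF media segment.
    return MediaSegment(index, duration=4.0, payload=b"\x00" * 16)

# The player loop: keep a target amount of media buffered ahead of playback.
buf = SourceBufferModel()
TARGET_BUFFER = 12.0
next_index = 0
while buf.buffered < TARGET_BUFFER:
    buf.append(download(next_index))
    next_index += 1

print(len(buf.segments), buf.buffered)  # 3 segments, 12.0 seconds buffered
```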
Then another crucial part, I think, of what Silverlight brought to the table was DRM, because that was something that, again, HTML just didn't have a good solution for content protection. The reality of the industry that we work in is that if you want to provide premium content to audiences, you have to protect it. Generally, content owners, studios will not let you go stream their content just in the clear. And so, it was a big deal that Silverlight could
both enable streaming but also enable content protection of the content. Then Flash ended up doing the same with Flash DRM, Adobe DRM as well. And so, it was around 2011, 2012, if I remember, when both Silverlight and Flash went away and were replaced by HTML. It was because by that point, HTML had matured enough that it was feasible. There were still some growing pains there. I remember there was a period where it was like we were neither here nor there. But by, I would say, 2014, 2015, HTML5 had all the needed
APIs to enable basic stuff like implementing DASH and HLS and Smooth Streaming in the browser and protecting it with DRM. So that's where we are today and, yeah, it took a while to get there. Real quickly, what do you do at WarnerMedia? So I'm hearing when ... Were you a programmer or were you a live video producer? You started testing, which is ... So what's your skillset? So I mentioned that earlier when I started my career, I started in engineering, and then transitioned to technical evangelism. By the time that I moved over to iStreamPlanet, so my job at that point became product management. And so, I've been a product manager since then, so for the past 10 years. So after iStream, I went to Hulu,
and I was a product manager for the video platform of Hulu for five years. Then my most recent job, so for the past two years, I've been at Warner Brothers Discovery, also product managing the video platform here as well. So what my responsibilities are as a product manager is I focus on the video platform itself. Specifically today, I focus on mostly transcoding, packaging. So for the most recent launch of Max, which is the new service that combines Discovery+ and HBO Max, that just launched last week. So I was the product manager for the VOD transcoding and packaging platform there. And so, that involved essentially defining the requirements of what are the different codecs and formats we need to support, what the workflows should look like, how do we get content in from the media supply chain, what are all the different permutations or formats we need to produce, what kind of signaling needs to be in the manifest so the players would be able to distinguish between HDR and SDR. So all those types of technical details, those are part of my job.
Let's do a speed round of some technical encoding issues. You're the real-world expert here. Where are you on encoding cost versus quality? That would translate to: are you using the placebo or the very slow preset? I don't know if you use x264, but do you use that to get the best possible quality per bitrate irrespective of encoding cost, or do you do something in the middle? I'm sure you're not in the ultra-fast category.
But real quick, where are you in that analysis? So, yeah, we currently do use x264 and x265 for a lot of transcoding at Warner Brothers Discovery. So we typically use either the slow or slower presets for those encoders. Though one of the things we have been discussing recently is that we perhaps shouldn't necessarily use the same preset across all bitrates or even across all content. And so, that's an idea that we've been exploring where if you look at your typical encoding ladder, you've got, let's say, 1080p or 2160p at the top. But at the bottom of your ladder, you'll have 320 by 180. 360, yeah.
You might have a 640 by 360. And so, then the question becomes, well, why use the same preset for both those resolutions? Because x264, very slow, is going to take a lot less time on your 640 by 360 resolution than on your 1080p resolution. And so, that's one of the ideas that we've been looking at is like, okay, we should probably apply different presets for different resolutions, different complexities. Then not all content is necessarily the same in the sense that it's not equally complex. So perhaps not everything requires the very slow
preset. Then not all content is equally popular. If there's a particular piece of content that's watched by 20 million viewers versus something that's watched by 10,000 viewers, the one that's watched by 20 million probably should get the more complex preset, the slower preset, because whatever extra compute you spend on that is going to be worth it, because it'll hopefully translate to some CDN savings on the other side. So, yeah, hopefully that answers your question. You talked about x265, that's HEVC. When did you add that and why, or were you even there? Did Warner add it before you got there? Yeah. So HBO Max had already been using HEVC. So we obviously continued using it for Max as well. On the Discovery+ side, we had been using HEVC for some 4K content, but there
wasn't a lot of it. And so, it was really mostly all H.264 on the Discovery+ side. But with Max, we are using obviously H.264 still and we are using HEVC as well for both SDR and HDR content. Okay. And so, right now, for example, if you go play something on Max, on most devices, it's actually going to play back in HEVC. So even if it's SDR, it will be 10-bit HEVC. Then obviously if it's HDR, it'll definitely be HEVC. How many encoding ladders do you have for a typical piece of content? So the way we define ... And when you say how many encoding ladders,
you mean different variations of encoding ladders, or do you mean steps within the ladder? Different variations of encoding ladders. I'm literally looking at the spreadsheet right now, and I think it's about six or eight different variations right now. And so, what we've tried to do is build an encoding ladder where, depending on the source resolution, we don't have to necessarily have different permutations of the ladders. And so, we have a UHD ladder where, depending on what the source resolution is, that determines where you stop in that ladder, but doesn't necessarily change the ladder itself.
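The stop-where-the-source-stops approach can be sketched as a simple filter over one master ladder. The rung values below are illustrative, not any service's actual ladder:

```python
# One UHD ladder; the source resolution only determines where we stop.
# Rung values are illustrative, not any streaming service's actual ladder.
LADDER = [  # (width, height, kbps), highest rung first
    (3840, 2160, 16000),
    (2560, 1440, 10000),
    (1920, 1080, 6000),
    (1280, 720, 3500),
    (960, 540, 2000),
    (640, 360, 1000),
    (320, 180, 400),
]

def rungs_for_source(src_width: int, src_height: int):
    """Keep every rung at or below the source resolution (no upscaling);
    the ladder itself never changes, only where it starts."""
    return [r for r in LADDER if r[0] <= src_width and r[1] <= src_height]

assert len(rungs_for_source(3840, 2160)) == 7       # UHD source: full ladder
assert rungs_for_source(1920, 1080)[0][1] == 1080   # HD source stops at 1080p
```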
Where the permutations come in is things like frame rates. So if the source is 25p or 30p or 24p, that's going to use a different ladder than if the source is 50p or 60p, because one of the things we've done for Max that wasn't supported before, for example, is high frame rates. So previously everything was capped at 30 FPS. Most of that was due to the fact that there wasn't really a lot of source content on HBO Max, for example, that required more than 30 FPS. But now that the content libraries of Discovery+ and HBO Max are combined, there's a lot more reality TV on the Discovery+ side. A lot of
that is shot at 50 FPS if it's abroad or 60 FPS if it's US. And so, we wanted to preserve that temporal resolution as much as possible. And so, we've started to support high frame rates as well. And so, we have different encoding ladders for different frame rates. Then, of course, there's different encoding ladders for SDR versus HDR. Even within HDR, we have different encoding ladders for HDR10 versus Dolby Vision 5, for example.
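Ladder-variant selection like this amounts to a lookup keyed on frame-rate family and dynamic range. A hedged sketch with made-up variant names:

```python
# Hypothetical variant selection: the frame-rate family and the dynamic
# range pick the ladder variant. Variant names are illustrative only.
def ladder_variant(source_fps: float, dynamic_range: str) -> str:
    family = "hfr" if source_fps > 30 else "sfr"  # 50/60p vs 24/25/30p
    dr = dynamic_range.lower()
    if dr not in {"sdr", "hdr10", "dovi5"}:
        raise ValueError(f"unknown dynamic range: {dynamic_range}")
    return f"{dr}-{family}"

assert ladder_variant(23.976, "SDR") == "sdr-sfr"
assert ladder_variant(59.94, "HDR10") == "hdr10-hfr"
assert ladder_variant(50, "DoVi5") == "dovi5-hfr"   # Dolby Vision profile 5
```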
What about for different devices? So if I'm watching on my smart TV and then I transition to my smartphone, am I seeing the same ladder, or do you have different ladders for different devices? At this moment, they're the same ladders for all the devices. We might deliver different subsets of the ladder for certain devices, but that's typically capping on the high end of the ladder. So if, for example, some device cannot handle 60 FPS or if it cannot handle resolutions above 1080p, for example, then we might intentionally cap the manifest itself that we're delivering to that device. But in terms of different bitrates and different encodings, we're not differentiating it yet between different devices. So I'll give you my personal take on that question, which is that in most cases it's not really necessary, in my opinion, to have different encoding ladders for different devices, because your 1080p should look great no matter whether you're watching it on an iPhone or Apple TV. And so, having two different 1080p encodes doesn't necessarily make sense.
I've definitely heard people say, well, perhaps on the lower end of the bitrate ladder, where you have your lower bitrates, lower resolutions, that's where you need to have differentiation. But, again, in my opinion, there's no harm in delivering 100, 200 kilobit per second bitrates in a manifest to a smart TV, because most likely it's never going to play them. And so, you can put them in the manifest. You can deliver it to the TV or to the streaming stick. In a vast majority of cases, it's never even going to touch that variant. It's just going to skip right over it, go straight for the HD and the UHD. The only time you might ever see that low bitrate is if something catastrophic happens to your network and the player is struggling so badly that it needs to drop down to that level.
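Per-device capping of a shared ladder can be modeled as trimming the top of the manifest rather than producing separate encodes. The rungs below are again illustrative:

```python
# Same encodes everywhere; per device we only cap the top of the manifest.
# (height, fps) pairs, highest first; values are illustrative.
LADDER = [(2160, 60), (2160, 30), (1080, 60), (1080, 30), (720, 30), (360, 30)]

def manifest_for(device_max_height: int, device_max_fps: int):
    """Deliver the subset of the ladder the device can handle; the
    low-bitrate rungs stay in because capable players just skip them."""
    return [(h, f) for (h, f) in LADDER
            if h <= device_max_height and f <= device_max_fps]

assert (2160, 60) in manifest_for(2160, 60)     # capable TV: full ladder
assert manifest_for(1080, 30)[0] == (1080, 30)  # 1080p30-limited device
```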
What's your VBR maximum rate on a percentage basis? So when we started out, it was CBR. So your max was 100% of your target. Where are you now with your VBR for your premium content? So we've taken an approach with x264 and x265 of relying primarily on the CRF rate control, but it's a CRF rate control that uses a bitrate and a buffer cap. So when you're writing your command line in FFmpeg, you can set the CRF target, but you can also specify a VBV buffer size and a VBV max rate. Right.
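The capped-CRF recipe maps directly onto real x264 options in FFmpeg (`-crf`, `-maxrate`, `-bufsize`). A sketch that assembles the argument list; the numeric targets are illustrative, not any service's actual settings:

```python
def capped_crf_args(crf: int, maxrate_kbps: int, bufsize_kbps: int,
                    preset: str = "slow") -> list:
    """Capped CRF for libx264: quality-driven rate control, with peaks
    bounded by the VBV max rate and buffer. Values here are illustrative;
    real targets vary per rung, per codec, and per codec level."""
    return [
        "-c:v", "libx264",
        "-preset", preset,       # could also vary by rung resolution
        "-crf", str(crf),
        "-maxrate", f"{maxrate_kbps}k",
        "-bufsize", f"{bufsize_kbps}k",
    ]

args = capped_crf_args(crf=21, maxrate_kbps=10000, bufsize_kbps=20000)
# Would be spliced into a command line as: ffmpeg -i in.mov <args> out.mp4
assert "-crf" in args
assert args[args.index("-maxrate") + 1] == "10000k"
```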
And so, we are doing that. The reason behind that is we want to make sure that we're controlling essentially the codec level at each resolution and each bitrate and that the peak's also constrained that way. I can give you an example where if it's something like, let's say, HEVC and it's 1080p, you might want to stay at codec level 4 rather than codec level 4.1, because 4.1 might ... Or that one actually maybe is not as big of a deal. But, for example, what if you're choosing between level 5 and level 5.1, there are certain devices that might not support 5.1,
for example. And so, in order to stay under codec level 5 for HEVC, you have to stay under a certain buffer size. And so, that's what ends up driving a lot of the actual caps that we set. Circling back, I mean CRF gives you a measure of per-title encoding as well. So is that intentional? Yeah. That's part of it, yeah, is that with CRF, really when you specify your VBV max rate, you're just specifying your highest average bitrate, really, for the video. And so, as long as you're comfortable with that max rate, then you can also
count on CRF probably bringing your average bitrate below that max rate most of the time. And so, if we set, for example, 10,000 kilobits per second as the max rate, most of the time the CRF target is really going to bring in that average bitrate much lower, around five or six megabits. And so, that is a way of getting per-title encoding in a way and achieving CDN savings without sacrificing quality, because depending on the complexity of your content, it's either going to be way below your max rate or it's going to hit against the max rate. Then at least you're capping the highest possible bitrate that you'll have for that video. That's a pretty creative way to do it. What's the impact of DRM on the encoding ladder,
if anything? So I know there's a difference between hardware and software DRM and there are some limitations on content you can distribute with software-based DRM. So can you encapsulate ... We're running a bit short of time, but can you encapsulate that in a minute or two? The way most of the content licensing agreements are structured, typically under the content security chapter, there are requirements around what kind of, essentially, security levels are required to play back certain resolutions, and then often what kind of output protection is required. And so, typically what you'll see is something like Widevine L1, which is a hardware-based security level of Widevine, or hardware-based protection.
Then on the PlayReady side, something like SL3000, which is also the hardware-based implementation of PlayReady. Those will be required for 1080p and above, for example. So a lot of the content licensing agreements will say unless you have hardware-backed DRM on the playback client, you cannot play anything at 1080p and above. Then, typically, they'll have similar requirements around each level. So they'll group the resolutions, typically into SD, HD, full
HD, UHD, and each one of those will have different DRM requirements in terms of security levels. Also requirements around HDCP, whether that needs to be enforced or not, whether it's HDCP 1, HDCP 2. And so, what that essentially means in practice then is that when you're doing your ABR ladder, you have to define those security groups based on resolution and you have to assign different content keys to those groups. And so, your video streams up to, let's say, 720p might get encoded with one encryption key, and then between 720p and 1080p gets a different encryption key. Then everything above 1080p gets another encryption key, and audio gets a different encryption key.
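The key-group scheme can be sketched as mapping each rung's resolution to a group key, with audio keyed separately. The group boundaries and key IDs below are illustrative, not any agreement's actual terms:

```python
# Hypothetical key-group assignment: ladder rungs are grouped by
# resolution and each group gets its own content key, so license
# policies (HDCP, hardware DRM) can differ per group at playback time.
import uuid

GROUPS = [  # (group name, max rung height), illustrative boundaries
    ("sd", 576),
    ("hd", 720),
    ("fullhd", 1080),
    ("uhd", 2160),
]

def key_ids_for_ladder(heights):
    keys = {name: str(uuid.uuid4()) for name, _ in GROUPS}
    keys["audio"] = str(uuid.uuid4())  # audio gets its own key too
    assignment = {}
    for h in heights:
        for name, max_h in GROUPS:
            if h <= max_h:
                assignment[h] = keys[name]  # first group the rung fits in
                break
    return keys, assignment

keys, assignment = key_ids_for_ladder([180, 360, 540, 720, 1080, 2160])
assert assignment[360] == assignment[540] == keys["sd"]
assert assignment[720] == keys["hd"]
assert assignment[2160] == keys["uhd"]
```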
Wow. And so, what you essentially accomplish by doing that is that at playback time, when the licenses are being requested by the players for each of those bitrates, because they're using different keys, you can now associate different playback policies with each key. And so, you can say, well, this SD content key, for example, has a policy that doesn't require HDCP to be enforced and doesn't require a hardware level of protection, whereas the HD group or the UHD group might require those. So that's really something that we do today in response to the way the content licensing agreements are structured. And so, in the future, that might change. My impression is
that we're actually moving in a direction of more DRM rather than less DRM. So even as recently as three, four years ago, some studios, some content owners were still allowing certain resolutions to be delivered in the clear, like SD, for example. A lot of that's going away, where now essentially it's like, look, if you're going to do DRM, you might as well do DRM across the board, because it actually makes it less complicated that way. One of the things I've also noticed is that when it comes to HDR, for example, it's the strictest requirements for all of HDR. And so, even with HDR, where you have an encoding ladder that ranges from UHD all the way down to 360p or something, the requirements in the agreements are, well, you must use hardware-based DRM and you must use HDCP 2.3 for the whole HDR ladder. And so,
it seems that that's the trend of the industry: we're actually moving just towards using DRM for everything. What's the difference between hardware and software? Hardware, is that a browser versus mobile device thing? Where is software DRM and where is hardware? So the difference is in the implementation of the DRM client itself. And so, if you basically want to get the highest security certificate from either Google or Microsoft for their DRM systems, you essentially have to bake their DRM clients into the secure video path of the system. So that typically means tight coupling with the hardware decoder as well, so that essentially when you send a video stream to the decoder, once it goes past the decoder, there's no getting those bits back.
So essentially once you send it to the decoder, at that point it's secured decoding and secured decryption. Well, first, I guess, secure decryption then secure decoding. Then it goes straight to the renderer. And so, there's no API call that you can make as an application that says now that you've decrypted and decoded these bits, hand them back to me. And so, that's typically called a secure video path or secure media path. And so, that's what you get with a hardware-based DRM. Software-based DRM does either some or all of those aspects of decoding and decryption in software and, therefore, there's a risk that somebody could essentially hack that path at some point and get those decoded bits back and be able to steal the content. So if I'm watching 265 on a browser without
hardware support, I'm likely to be limited in the resolution I can view if it's premium content, because the publisher says I don't want anything larger than 360p going to software. Exactly, yeah. Today, for example, if you're using Chrome, for example, so Widevine DRM is available in Chrome, but only L3, which is the software-based implementation of Widevine. And so, oftentimes if you're using Chrome, you actually get worse video quality with some of the premium streaming services than if you're using Edge or Safari, for example, because both Safari on Mac and Edge on Windows do support hardware DRM, because they're just more tightly integrated with the operating system. And so, they're able to essentially achieve
that secure video path between the browser, the operating system, and the output. So let's jump to packaging. Are you in the HLS, DASH, or CMAF camp these days? Both. At both Warner Brothers Discovery and my previous job at Hulu, we've been using both HLS and DASH, and, interestingly enough, the distribution split between the two is almost identical. We use HLS for Apple devices and DASH for streaming to all other devices. What's common to them is the CMAF format. And so,
one of the things that I get a little annoyed about in our industry is when people refer to CMAF as a streaming protocol, and I always feel like I need to correct it and say, "No, no, it's not a streaming protocol," because CMAF is really two things. CMAF is, on one hand, a standardized version of what we frequently call fragmented MP4, the ISO base media file format. What the CMAF spec did is basically define: look, if you're going to use fMP4 in HLS and DASH, here are the boxes you need to have, here's how common encryption gets applied, and so on. So it's really just a more buttoned-down version of what we've always called fMP4. In many cases, if you have been packaging either DASH or HLS with fMP4 media segments, you're most likely already CMAF-compliant. You're already using CMAF. But the other thing CMAF is: the spec also defines a hypothetical logical media presentation model. It essentially describes what, when you read between the lines, will sound a lot like HLS or DASH without HLS or DASH.
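That box-level view of CMAF can be made concrete with a few lines that walk the top-level boxes of an fMP4 segment. Every ISO-BMFF box starts with a 4-byte big-endian size (which includes the 8-byte header) followed by a 4-byte type code; the buffer below is synthetic, built just to exercise the parser:

```python
import struct

def list_boxes(buf):
    """Walk top-level ISO-BMFF boxes: 4-byte big-endian size (including
    the 8-byte header) followed by a 4-byte ASCII type code."""
    boxes, pos = [], 0
    while pos + 8 <= len(buf):
        size, btype = struct.unpack(">I4s", buf[pos:pos + 8])
        boxes.append(btype.decode("ascii"))
        pos += size
    return boxes

# A synthetic buffer imitating a CMAF media segment's top-level boxes:
# 'styp' (segment type), then a 'moof'/'mdat' movie-fragment pair.
# Payloads are dummy bytes; a real segment carries full box contents.
seg = b"".join(
    struct.pack(">I4s", 8 + len(payload), name) + payload
    for name, payload in [(b"styp", b"cmfs\x00\x00\x00\x00"),
                          (b"moof", b""), (b"mdat", b"\x01\x02")]
)
print(list_boxes(seg))  # ['styp', 'moof', 'mdat']
```

A packager that already emits segments shaped like this for both HLS and DASH is, as Alex says, most likely already CMAF-compliant.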
It's really defining the relationship between tracks and segments and fragments and chunks, and how you address all those different levels of the media presentation. So you can think of HLS and DASH as the physical manifestations of that hypothetical presentation model. There's a really great spec that CTA authored, CTA-5005 I believe, which is the HLS-DASH interoperability spec. It's heavily based on CMAF, using CMAF as the unifying model, and it describes how both HLS and DASH plug into CMAF and how you can express the same concepts in both. And so, it's almost like
HLS and DASH are just programming languages describing the same pseudocode. I want to come back to some other topics, but one that's important to you: is CTA the organization that's going to make it simpler for publishers to publish content, and just focus on content development rather than compatibility? Because it seems like that's a pretty compelling issue for you. I hope that CTA will make some efforts in that space. A lot of what they've been doing is trying to improve interoperability in the streaming industry, so it does feel like CTA WAVE is the right arena for that. One of the issues that makes deploying streaming solutions really complex and challenging today is that we have a lot of different application development platforms. Just before
this call, I went and counted the number of app platforms we build for at WBD just for Max, and it's basically a dozen to 16 different application development platforms. Now, there's overlap between some of them. Android TV and Fire TV are more or less the same thing with slight differences. But at the end of the day, you're looking at, at the very least, half a dozen different app development platforms. Worst-case scenario, you're looking at upwards of 20 or so, especially once you start considering set-top boxes made in Europe or Asia that might be HbbTV-compatible and so on.
And so, that's a lot of complexity, because the same app needs to be built over and over and over again in different programming languages, using different platform APIs. I think, as an industry, we're unique in that sense. I'm not actually aware of any industry other than streaming that needs to develop that many applications for the same thing. If you're working in any other industry, fintech or anything else, you typically have to develop three applications, a web app, an iOS app, and an Android app, and you're done. And so, it's crazy that in our industry we have to go build over a dozen different applications.
But the practical challenge that brings, when it comes to things like encoding and packaging, is that it's hard to know what the devices support, because there is no spec, no standard, that specifies APIs every device platform could call and expect standardized answers from. So when we talk about the media capabilities of a device, what are we talking about? We need to know what decoders are supported for video, for audio, but also for images and for timed text. We need to know what segment formats are supported. Is it CMAF? Is it TS? What brand of CMAF? CMAF has this nice concept of brands, but nobody's really using it. In order for that concept to be useful,
you need to be able to query a device and say, well, what CMAF brands do you support? Then manifest formats: there are different versions of HLS, different profiles of DASH, different DRM systems. These are all things we need to know if we want to play something back on a device and play it well. So how do we standardize the playback side? Probably one of the key steps we need to take is to standardize device media capability detection APIs. There have been some efforts in W3C to define those types of
APIs in HTML, for example. But, again, not every platform uses HTML. And so, when it comes to Roku, when it comes to Media Foundation and other media app development platforms, we need essentially the same API to be present on every platform. Then once we have standardized APIs for detecting media support, we also need a standardized method of signaling those capabilities to the servers, because if you want to, say, target specific devices based on their capabilities, the next question becomes, well, how do you express that? How do you signal it to the backend? How do you act on it? How do you do things like manifest filtering based on it? So I think there's a lot of room for standardization there. And so, yeah, I'm hoping that CTA WAVE or one of the other industry organizations
will take some steps in that direction. Final topic is going to be AV1, or new codec adoption generally. You're in charge of choosing which technologies you're going to support. When does a technology like AV1 come on your radar screen? I mean, you've heard of it since it was announced, obviously, but when does it come on your radar in terms of actually supporting it in a Warner Brothers product? The first thing I typically look at is device adoption, because that's really the most crucial requirement: there have to be enough devices out there that we can actually deliver media to with a new codec to make it worthwhile, because there's going to be cost involved in deploying a new codec. First, cost comes from just the R&D associated with investigating a new codec, testing it, measuring quality, then optimizing your encoding settings and so on. And so,
that's both time and then also either manual or automation effort that needs to be done to be able to just understand what is this codec? Is it good? Do I want to use it? Then if you suddenly decide you want to deploy that codec, there's going to be compute costs associated with that. There's going to be storage costs associated with that. Then in some cases there might be licensing costs as well. If you're using a proprietary encoder, maybe you're paying them, or if you're using an open source encoder, well, you still might owe some royalties on just usage. You're pretty familiar with that. I read one of
your recent blog posts. So I know that you've spent a lot of time looking at royalties and the different business models that different codecs now have. So in order to justify those costs, to make them actually worthwhile, there need to be enough devices out there that can be reached by the new codec. So the first question, really, is: what percentage of active devices on the service are capable of using that codec? Interesting. This goes back to that previous question you asked about device capabilities and how we improve those things. Without good, healthy data coming back from players, coming back from these apps, telling us what's supported on the platforms, it's hard to plan what the next codec you want to deploy is. Right now, for example, if I wanted to estimate
the number of AV1 decoders out there, my best resource would be to go study all the hardware specs of all the different devices and figure out which ones support AV1, or VVC, or LCEVC, and then try to extrapolate from that data: okay, what does that mean? How do we project that onto our particular active device base? So, yeah, it's not straightforward today, but I'm hoping that if we can improve device capability detection and reporting, then we can get to the point where we can just run a simple query and say, "Okay, tell me what percentage of devices the service has seen in the last week supports AV1 decoding," and maybe specifically AV1 decoding with DRM support, or AV1 decoding of HDR. There are nuances even beyond just which codec is supported. What kind of pressure do you get, if any, from your bosses or your coworkers about new codecs? Because we love to talk about them, we read about them all the time, but are people pounding on you and saying, "Where's AV1 support? When's VVC?", or do they not care? Is that not part of what they're thinking about? I would say there's not a lot of pressure from leadership to support specific codecs. I think they're more interested in cost savings, looking at things like how do we lower CDN costs. But one of the things I always explain to them is that it's not a perfect one-to-one relationship between deploying a new codec and CDN cost savings. Even if you save, for example, 20% on your encoding bitrate,
with a new codec, that doesn't necessarily translate into 20% of CDN cost savings, because if somebody's on a three-megabit connection, say they're on 4G and the most they can get is three megabits per second, you being able to lower your top bitrate from 10 to six megabits per second is not really going to affect them. They're still going to be pulling the same amount of data. And so, that's why it's not a clear one-to-one mapping. But, yeah, I would say most of the demand for new codecs comes from that direction, rather than somebody saying, "Well, we have to support VVC because it's the latest, greatest thing out there." Generally that's not the case. If anything, I'm usually the one pushing for it and saying, "Well,
we really should be moving on from H.264 to the next generation of codecs, because at some point you do have to leave old codecs behind and slowly deprecate them as you move on to the new technology." I mean, do you have a sophisticated financial analysis for doing this, or do you do the numbers on the back of an envelope? It's more of a back-of-the-envelope thing right now. Yeah, it would be based on, again, the number of devices supported, compared against average bitrate savings, compute costs, and potentially licensing costs. So, yeah, it is a back-of-a-napkin calculation at this point, but I think the factors are well-known. It's really about coming up with the data that feeds into those variables.
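That napkin math, and in particular Alex's point that bitrate savings don't map one-to-one to CDN savings, can be sketched in a few lines. Every number here is invented for illustration; the model simply assumes each viewer pulls the highest bitrate their connection allows:

```python
# Back-of-the-envelope sketch of why a 40% bitrate saving doesn't mean 40%
# CDN savings: viewers capped by their connection speed were never pulling
# the top bitrate anyway. All numbers are made up for illustration.

def delivered_mbps(ladder_top_mbps, connection_mbps):
    """A viewer pulls at most the top rendition their connection can sustain."""
    return min(ladder_top_mbps, connection_mbps)

def cdn_savings(viewer_caps_mbps, old_top, new_top):
    """Fractional drop in total delivered bits when the top bitrate is lowered."""
    old = sum(delivered_mbps(old_top, c) for c in viewer_caps_mbps)
    new = sum(delivered_mbps(new_top, c) for c in viewer_caps_mbps)
    return 1 - new / old

# Half the audience on fast broadband, half capped at 3 Mbps on 4G.
caps = [25, 25, 3, 3]
# Dropping the top bitrate from 10 to 6 Mbps is a 40% encoding saving...
print(f"{cdn_savings(caps, old_top=10, new_top=6):.0%}")  # ...but only ~31% on CDN
```

The more of the audience that sits below the old top bitrate, the further actual CDN savings fall short of the encoding-bitrate savings, which is exactly the caveat Alex gives his leadership.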
A couple of questions. What about LCEVC? Are you doing enough live, or is that even a live-versus-VOD kind of decision? With LCEVC, I don't think it's even a live-versus-VOD decision. What's interesting about that codec is that it's an enhancement codec, a codec that piggybacks on top of other codecs
and provides better resolution or better dynamic range at bitrates that would typically be associated with lower resolutions or narrower dynamic ranges. The way LCEVC works is that there's a pre-processor that essentially extracts the detail that would otherwise be lost when the video is scaled down. So you can start with a 1080p video, scale it down to, let's say, 540p, encode it as 540p, and then the LCEVC decoder on the other end can take that sideband data and attempt to reconstruct the full 1080p source signal. And so, that concept works
the same whether the base codec you're using is H.264 or H.265 or VVC or AV1. I think that's what's interesting about that codec: it can always keep you a step ahead of whatever the latest generation of codecs provides. The other nice thing is that there's a backwards-compatibility option, because if a decoder doesn't recognize the sideband data that's specific to LCEVC, it'll just decode your base signal, which might be half or quarter resolution. And I think it can be very applicable in ABR, because you typically have a lot of different resolutions in your ladder. If you could deliver that 360p rung of your ladder at 720p to an LCEVC decoder, then why not? Well, we've got a technical question here. Are you
able to deliver one CMAF package using one DRM, or do you have to have different packages for Apple and the rest of the delivery platforms? Yeah, that's a great question. Right now what we do is encrypt every CMAF segment twice, once in CBCS encryption mode and once in CTR (CENC) encryption mode. The CBCS-encrypted segments are the ones we deliver with HLS to FairPlay devices. At the moment, the CTR segments are the ones we package with DASH and use with both PlayReady and Widevine. That said, both Widevine and PlayReady introduced support for CBCS a while ago; it's actually been probably over five years at this point. So, theoretically, we could deliver those CBCS
encrypted segments to all three DRM systems and it would work. The challenge at the moment is that not all devices with Widevine or PlayReady clients have been updated to the latest version of PlayReady or Widevine, because in a lot of cases these are hardware implementations. Without firmware updates from the device manufacturer, they're never going to be up to date with the latest DRM client. And so, we're waiting to see when those
last CTR-only Widevine and PlayReady clients are going to be deprecated and slowly age out of the lifecycle. Once the vast majority of the PlayReady and Widevine clients out there are CBCS-compatible, that opens up the path to delivering CBCS-encrypted segments everywhere. Final question: AV1 this year or not? What do you think? I think probably not this year. We might do some experimentation, some research into encoder quality and optimization with AV1 this year, but I wouldn't expect deployment of AV1 this year. Not because of lack of support, because I think the support is really starting to be there in significant numbers. I think the latest Samsung and LG TVs, for example, now include AV1 decoders as well.
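The "simple query" Alex wished for earlier, what percentage of the active fleet supports AV1 decoding, with DRM, with HDR, might look like the following over per-device capability telemetry. The record schema and the fleet itself are hypothetical:

```python
# Sketch of a codec-reach query over device capability telemetry.
# The field names and the toy fleet below are illustrative assumptions.

devices = [
    {"model": "smart-tv-a", "av1": True,  "av1_drm": True,  "av1_hdr": True},
    {"model": "stick-b",    "av1": True,  "av1_drm": True,  "av1_hdr": False},
    {"model": "phone-c",    "av1": True,  "av1_drm": False, "av1_hdr": False},
    {"model": "settop-d",   "av1": False, "av1_drm": False, "av1_hdr": False},
]

def reach(fleet, *flags):
    """Fraction of devices where every requested capability flag is true."""
    hits = sum(all(d[f] for f in flags) for d in fleet)
    return hits / len(fleet)

print(f"AV1 decode:      {reach(devices, 'av1'):.0%}")                         # 75%
print(f"AV1 + DRM:       {reach(devices, 'av1', 'av1_drm'):.0%}")              # 50%
print(f"AV1 + DRM + HDR: {reach(devices, 'av1', 'av1_drm', 'av1_hdr'):.0%}")   # 25%
```

The point of the nuance Alex raises shows up immediately: headline decoder support overstates the reachable audience once DRM and HDR requirements are layered on.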
Yeah, yeah. Often people will look at mobile as the indicator of codec adoption, especially Apple. People will be like, "Okay. Well, if Apple adopts it in iOS, then clearly it's here." But when it comes to premium streaming services, whether it's Max or Hulu or Amazon Prime or Netflix, most of that content is watched in living rooms. And so, really the devices to watch are
smart TVs and connected streaming sticks. So once those devices have support for a particular codec, then, in my opinion, that's really the big indicator that, yeah, it might be ready. We're running over, but this is a question I need the answer to. What's the HDR
picture for AV1, and how clear does that have to be? Because it seems like there are a bunch of TV sets out there that we know play Dolby Vision and HDR10+ with HEVC. Do we have the same certainty that an AV1-compatible TV set will play AV1 in HDR? I don't think that certainty is there yet. I do need to do some more research into that particular topic, because I've been curious about the same thing. So I think some standardization
efforts have been made. I can't remember off the top of my head if it's CTA or some other ... No, HDR10+ is now a standard for AV1. I just don't know if TVs out there will automatically support it. Right, yeah. Then if it doesn't automatically work for you, you've got to make sure, you've got to test. Yeah, yeah. Then with Dolby Vision, it's like, well, until Dolby says so, it's not a standard. So, yeah, I think that's an excellent question, in that there's nothing
from a technical perspective that should be stopping somebody from using AV1 or VVC or any other new codec with HDR, because there's nothing codec-specific that HDR needs. It's really just a matter of standardization, and a matter of companies implementing that standard. So, yeah, I'm with you on this one: it should work, but until it's been tested, and tested on many different devices, it's not a real thing, right? Listen, we are way out of time. Alex,
I don't think we've ever done this for an hour, but it's great. I really appreciate you spending time with us, being so open and honest about how you're producing your video, because I think that helps everybody. Thanks. This has been great. Absolutely. Thank you so much for having me. Yeah, this has been really great. I feel like we could probably keep talking for another hour or two, and I think we'd have still plenty of topics to discuss. Yeah. I was taking some notes while we were doing this, and, yeah, I think I have notes for another hour at least.
Okay. We'll talk to Anita about that. I'll see you at IBC? You're going to go to that show? Yeah, I think I'll be at IBC, so I most likely will see you there. Cool. Take care, Alex. Thanks a lot. All right. Thanks so much.