Meteor Lake Tech Tour Deep Dive


Hey, everyone. I'm Mark Hachman of PCWorld. I'm here with Dr. Ian Cutress of TechTechPotato, and we are here to talk about, well, among other things, Meteor Lake. Intel sent us to Malaysia as part of its Intel Tech Tour, from August 22nd to the 25th, in Penang and Kulim.

Full disclosure: I was flown out courtesy of Intel, who paid for my hotel and meals as well. We didn't want to do it this way, but that was the only way we were able to get there. With these tours, that tends to be the way Intel does things: if you don't go on their dime, they don't let you in. Things like that, right? Okay — so Intel has fab operations around the world: development fabs in Hillsboro, 10-nanometer fabs in Israel, and an Intel 4 fab in Ireland.

More on that later. And the upcoming Intel 20A fab in Chandler, Arizona. But we also saw the additional part of the process, which is chip assembly — getting the dice off the wafer and into packages.

So we're going to talk a little bit about that as well. Dice from wafer into package — we'll get into it right away. I think that's right.

So Ian is going to be here to help correct me, make sure I stay on track, make sure I'm accurate and so forth; we're just going to go back and forth throughout this whole thing. Intel had a tech tour last year where it sent reporters to Israel to look at the fabrication operations. This is like chapter two of that operation. So what we're going to do is talk about Meteor Lake first.

Actually, just to give you a little bit of a timeline: we spent a full day on Meteor Lake, and we spent two days at the Intel Penang assembly and test facility, PGAT, and the Intel Kulim die sort, die prep facility. So we're going to talk about everything from how Meteor Lake works to how it gets assembled and packaged. It's all very interesting, but the one thing we aren't going to be able to address are some of the questions you probably have specifically in mind.

Intel talks about new processors in two stages. It talks about the architecture first, and then it later gives us all the information that we really want to know: speeds, feeds, costs, and the whole works. So we're starting with phase one right now.

Let's be honest: that's all we know. That's right, that's all we know, exactly. So we're not going to be able to answer even some of the basic questions. I mean, Intel confirmed to me — and this isn't any great big deal — that this is the 14th-generation Core chip, but they wouldn't even really answer questions about, say, whether this is mobile-only.

You know, the reported Raptor Lake refresh — they didn't address that at all. Yeah. So, I mean, we know a little bit about this mobile chip, and that's pretty much it.

So let's talk about, first off, what Meteor Lake is — let's just hit the high points and then we'll dive in a little bit deeper. My reaction to this was that it felt a little bit like Qualcomm Snapdragon messaging. I mean, we're talking about an emphasis on low power, an emphasis on AI — which is something Snapdragon chips have had —

and, at least in the mobile space, a leap ahead in graphics performance. And the fourth thing is this disaggregation. We've moved to Intel 4, and we've moved away from this sort of Northbridge/Southbridge model to — well, I want to say four tiles and Ian says five tiles; we can talk about that a little later. But yes, we're talking about a disaggregated architecture where all these different functions are separated, with a couple of new features. We have low-power E-cores now, and we'll talk about those a little later.

So let's just start here: what were your top-level takeaways from all of this? It's Intel, and Intel is finally doing two things with its product portfolio that kind of everyone else in this space already has. One of them is chiplets on the consumer platform — or tiles, whatever you want to call them — and then advanced packaging.

This does both. And I guess the third one is EUV — you know, the EUV process actually drives down cost and makes stuff yield higher in the fab. So the fact that Meteor Lake does all three in one go means this is arguably a big step function in Intel's design, features, process, and manufacturing. Regardless of where we land on performance, the key takeaway I got from the event and the tech day is that they are trying to do a lot. They recognize that the market is changing quite fast.

Obviously, some of this stuff had to be planned two years ago, so some prediction is needed, and they're trying to target a market where they think a lot of these features will be useful — because there is something to be said that if you build an all-singing, all-dancing chip, you don't need it in every device, right? Absolutely. And so we think there's some differentiation here, where Intel typically does multiple dies depending on the different SKUs. But that's kind of changed a little bit here as well. And in terms of strategy — for the end user, it's still going to be a computer chip.

It's still going to run your software, it's still going to run your games. How well it does, we'll find out when they actually get launched and we get some of those details. But for the end user, almost nothing changes.

It's just next-gen. So, next-gen. Absolutely. You know, I think one of the things consumers are going to be maybe a little bit concerned about — I talked to Michelle Johnson Holthaus, who is the head of CCG, and she sort of put it — MJ's amazing, by the way.

Yeah, she really is. And she put it basically — I mean, this didn't involve any pricing — she was saying that CPU performance is going to be about equal to Raptor Lake. So, you know, just set expectations: don't expect some sort of dramatic improvement in CPU performance.

So it's the tick-tock model: you're changing the process node, not necessarily the architecture. That's exactly right. Two times the graphics performance is what she said. And then of course, as we mentioned earlier, the integrated NPU — bringing the first iteration of integrated AI into the Intel consumer CPU. So I think what we decided we were going to do is talk about this from an individual tile perspective; it's probably easier that way. Let's start with the CPU tile first. We've got the new P-cores, Redwood Cove, and we've got the new E-cores, Crestmont, but we also have these low-power E-cores as well.

And let's just talk maybe about those to start out with, then. So the CPU tile — we can't really just say "the CPU" anymore, because this is a disaggregated architecture, so we talk about the CPU tile. We have the big P-cores, which are called Redwood Cove; there are going to be six of them in the design. This will be paired with eight E-cores, which are the Crestmont architecture, right? There's a slight bump in their general engine — they said about 8% in terms of IPC.

So 6 to 8% IPC? Just making sure our notes align. No, that's fine.

I want to double-check myself. Yeah. And so, sort of 4 to 6%. Okay. So we have this six-plus-eight design.

We're going to have Hyper-Threading on the P-cores again, as we used to know it; no Hyper-Threading on the E-cores. While the cores are equal in terms of the instruction set they support — that was the same on the desktop parts before, except we don't have this bulkier AVX-512 unit that's physically there but disabled — it's actually a unified architecture design. They're also implementing some additional instructions, the VNNI extensions. We've already seen them on the enterprise platforms; now they're coming to mobile. This is just some additional acceleration for on-board features that are CPU-based rather than GPU- or NPU-based.

And that's kind of the CPU tile in a nutshell. It's connected by a die-to-die interconnect to the main SoC tile, and there are going to be two additional E-cores on that IO tile — sorry, on the SoC tile. Yeah.

There are so many tiles, I can't get my names straight. And these two E-cores are the same E-cores that are in the CPU tile, right — it's still Crestmont, just designed for a lower performance point and a better efficiency point. The idea being that if your laptop is on idle — and mine just went to sleep, so I'm going to have to log back in here — these are designed for low-voltage, low-frequency background tasks where you're not in need of immediate feedback, right? That's always been kind of the thing in the ARM ecosystem when it comes to compute.

So having these two additional E-cores separate from the CPU tile means you don't need to wake up the CPU tile. Right — and that's the important bit. Yeah, that's where a lot of the power savings are going to come in. It's the same core between the two tiles, but they're actually built on different processes, and as a result they're going to be configured slightly differently in silicon, just to optimize for power and performance.

Kind of like — if you know about Zen 4 and Zen 4c — AMD made a denser design for Zen 4c because they didn't need to hit the high frequency points. It's that sort of thing here, but for the low-power E-cores. And I think, just for clarification, I'm going to call them LP E-cores. So you have 6+8+2, and internally, based on what we heard, this die is called the H 6+8+2. Hmm.

And we expect a lower core count design coming later, which may have a different number. Right. Exactly.

Well, that's going to be — we don't know yet. And let me just back up and restate what Ian said to me, because it's the kind of thing that took me a while to parse as well. So Redwood Cove: as far as the changes Intel's talked about, the only things I heard were improved performance efficiency and improved bandwidth. There's a larger L2 cache on it, although I'm not sure if they actually disclosed what size it is. And then with Crestmont, again, the IPC gains over Gracemont, I believe, are 4 to 6%, and the low-power E-cores are simply a different implementation of Crestmont.

That's my understanding as well. But the key, as Ian has pointed out, is this: one of the things they talked about was that with the old ways of doing it — monolithic, or even the sort of Northbridge/Southbridge setup where you had the PCH with the IO and memory stuff — yeah.

PCH — yes, PCH, exactly — the problem was that if you touched the logic, you woke it up. One of the key things about these low-power E-cores is that they want to perform background tasks on them, and some of the tasks you think of as background tasks really are background tasks. I mean, they showed playing back Tears of Steel as a low-power E-core function. And we do — and so does everyone else — battery life testing on laptops, and when you have a battery life test that touches the main CPU, that's going to be a quite different experience than one that just touches one of those little E-cores.

So with the low-power E-cores, battery life is going to be a lot better — or you're simply just not turning on silicon. That's right: totally power-gating parts of the chip, and the tiled design allows you to have finer control over that power gating.

As a result, they had to essentially redesign their power distribution model from the ground up — no easy task by any means. However, they knew they had to do it, because they understand that this is the future of computing as we go to more advanced process nodes.

I mean, part of the power efficiency gain here is that the CPU tile itself is built on Intel 4 — so not "Intel 4 nanometer," as people keep calling it. Yeah, exactly. But this is Intel's first process node to use EUV, extreme ultraviolet lithography. So while TSMC has been using it for three, almost four generations now, this is Intel's first.

So they're going to get a step-function benefit in terms of density and efficiency here. The density stuff will go unnoticed by most people, but I think the power efficiency, along with the new power distribution model, along with the tiled model — these are going to be, if not additive, multiplicative to what people are going to experience in battery life. I've kind of been delaying upgrading until I get one, just to see how it's going to perform. I don't necessarily think consumers are going to care so much about EUV, but let's just explain it briefly. So EUV stands for extreme ultraviolet.

Extreme ultraviolet. So when you create a computer chip, you essentially shine light through a mask onto a series of chemicals; either some of it hardens or some of it dissolves, and then you wash it away and build up a metal layer. But realistically, you're using a wavelength of light that's larger than the feature you're trying to print on the silicon. Exactly.

And depending on what wavelength — I say light, it's electromagnetic radiation — the wavelength you use determines how fine a pitch you can make those features. We were on 193-nanometer wavelength light for the last 20 years or so, since the early 1990s, and it wasn't until, I think, 2014, 2015 that the first production extreme ultraviolet lithography tools came out, reducing the wavelength of the light down to 13.5 nanometers. There are benefits and negatives.
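(To put rough numbers on the wavelength point: a common first-order estimate of the smallest printable feature is the Rayleigh criterion, CD = k1 × λ / NA. Here's a minimal sketch — the k1 and NA values are typical published figures for each class of tool, not anything Intel disclosed.)

```python
# Rough Rayleigh-criterion estimate of minimum printable feature size:
#   CD = k1 * wavelength / NA
# k1 (process factor) and NA (numerical aperture) are typical published
# values for each tool class, not Intel-specific numbers.
def min_feature_nm(wavelength_nm: float, na: float, k1: float = 0.30) -> float:
    return k1 * wavelength_nm / na

# 193 nm immersion (ArF, NA ~ 1.35) vs. 13.5 nm EUV (NA ~ 0.33)
print(f"193i single exposure: ~{min_feature_nm(193.0, 1.35):.0f} nm")  # ~43 nm
print(f"EUV single exposure:  ~{min_feature_nm(13.5, 0.33):.0f} nm")   # ~12 nm
```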

The main thing is that this was a technology first dreamt of 35 years ago. They thought it would only take 15 years to implement, and we had to wait another 20. So there is that.

It does mean you're actually using megawatt-class lasers to generate the light for some of these features. A series of mirrors, too, right? Yeah. So efficiency-wise, it's one of the least efficient things on the planet to do, but the point is you end up making very efficient chips at the end of it. It helps increase the density, which increases the number of transistors, and with more transistors —

you can do more things in silicon. So one of the aspects I know I've covered, and I know you've covered, is this trend of putting different amounts of accelerators on board to help with, say, video encoding, decoding, and playback. If you have more transistors, you can do more of that as well. So it's a benefit. However, I'm quite bullish in saying that EUV is kind of the last major lithography technology in this whole arena, and that's just due to the physics of it all.

Absolutely. It requires a hard vacuum too, which I thought was kind of cool. You can't just do it in air.

The systems are really tall, there's only one company that makes them, and they're really expensive. I've touched one, I think.

Really? Yeah — I've got a short on YouTube where I touch one and run away, and somebody runs after me. The comments on it are all "I blame Ian for the chip shortage." ASML, is that right? Yeah, ASML is the company that makes the machines.

But ASML is essentially a conglomerate of all the big fabs as well. So yeah, I guess that's fine. Now, the one thing I think is important to mention, as far as EUV and Intel 4 and so forth, is that they did have Bill Grimm, the VP of logic technology development, on stage. And one of the things he pointed out, which kind of makes sense, is that if you're producing not just one massive chip but tiles and packaging them together, the efficiency in doing so actually improves. Not only are we talking about power efficiency — the process technology, according to them, gives you 20% better power efficiency — but the yields are better.

And that's what really struck me: according to Grimm, this is going to be the best-yielding product at time zero — which I guess means launch — in more than a decade. So we're going back past 14-nanometer, 10-nanometer, the whole works. That's a big deal. So one of the benefits there: imagine the features you're printing on the chip. If you're using the old technology — 193-nanometer wavelength light — in order to make a design feature that's, say, 30 nanometers, you have to use multiple exposures.

And, you know, there's special math and physics involved in that. Each time you do an exposure, that's an opportunity to lose yield. With extreme ultraviolet light you can replace four exposures with one, because EUV allows you to be more accurate the first time. So when he says yield is improving, it's literally a function of the fact that they require fewer steps to make the product.

Right — on top of all the different design features that allow for power efficiency, there's also creating smaller chiplets: the smaller your chip, in general, the higher your yield. The number I like to quote is TSMC's N7 process, which is why it has been so widely used for graphics. That process has a defect rate that equates to about 45 defects per wafer.

Right. So if you make really, really small chips — say you're making 500 chips on a wafer — and only 45 have defects, you're fine. Whereas if you're making massive chips, like a big CPU, and you've only got 80 chips on a wafer and there are 45 defects, suddenly you can't use 45 of them. Right — well, it's a bit more complicated than that.
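(A back-of-the-envelope version of that yield argument, using the roughly 45-defects-per-wafer figure quoted above and assuming randomly scattered defects — a simple Poisson model, which, as Ian says, glosses over the real complexity:)

```python
import math

# Toy yield model from the figures in the conversation: ~45 random defects
# spread across a 300 mm wafer. Yield for a die is the Poisson probability
# that zero defects land on it. Real fab yield math is more complicated.
DEFECTS_PER_WAFER = 45
WAFER_AREA_MM2 = math.pi * (300 / 2) ** 2 * 0.9  # assume ~90% usable area

def die_yield(die_area_mm2: float) -> float:
    defect_density = DEFECTS_PER_WAFER / WAFER_AREA_MM2  # defects per mm^2
    return math.exp(-defect_density * die_area_mm2)

for area in (50, 150, 600):  # small tile vs. mid-size die vs. big monolithic chip
    print(f"{area:>4} mm^2 die: ~{die_yield(area):.0%} good")
# Smaller dies lose far fewer candidates to the same 45 defects.
```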

And just to be clear, Intel is not making everything in Meteor Lake. No — it's making its own CPU tile, but as you mentioned before, some of these tiles are being manufactured by TSMC. Right. So which tiles are being manufactured on which process? Yeah. We're all very familiar with the struggles Intel has had with its manufacturing — the issues with 10-nanometer and then the subsequently renamed 7-nanometer — and now Intel 4 is kind of the first jump into that EUV market.

As a result, in order to remain competitive on the product side, Intel has had to contract out some of its manufacturing to TSMC. Now, Intel has already been a top-four customer of TSMC for a decade, so this isn't really new. What's really new here is that that silicon is going directly into the CPUs that people buy. So we have the CPU tile on Intel 4, and we have a GPU tile — the latest-generation graphics — which is on TSMC N5.

Then we have an SoC tile — the system-on-chip — which is on TSMC N6, and an IO tile dealing with some of the IO functions, also on N6. So those are the four tiles Intel officially lists, right? There is technically a base tile on top of that — like a passive interposer — that's not given a process name, but it is made by Intel on what is tentatively called P1227. If you go back many years through slide decks, you will actually find references to this, but that's part of their advanced packaging.

This is what they call Foveros die-to-die stacking, which we can get into a little bit. Right. But yeah — so five tiles: technically two are on Intel and three are on TSMC, and the four main ones are all using EUV. And just to make sure I understand correctly: we have the individual tiles talking to one another over a new fabric, the part that replaces the old direct media interface, I guess. Yes.

So previously, when Intel was a monolithic design, you'd have this thing called the ring bus. And the ring bus would have stations: a station at every CPU core, a station at the DRAM, a station at the graphics, and stations at all the IO. These rings could be like 15, 16, 17 stops long, and there were trade-offs with power and latency.

When you get to that size — and whether they're unidirectional or bidirectional — I think in the previous generation they actually doubled the bandwidth by doubling the number of rings. This time, because you've got this disaggregated tile structure, you can essentially have a mini ring inside the CPU tile, and then pathways in the SoC tile and the GPU tile, which makes it more efficient within a tile. There are some drawbacks, because it's a tiled design: typically you'd like all the electrons to stay inside the silicon, but with tiles you have to go from one piece of silicon to another to get your communications across. There is a power penalty for doing that — I believe they said 0.3 picojoules per bit, which is really low, because on silicon it's next to zero.

But actually, if you look at AMD's solution, it's more like 1 to 2 picojoules per bit. So the fact that this is really low power is really great, because it matters when you're dealing with bandwidth to the CPU and the GPU. But it also allows for a lot of different control in the platform, especially when moving data — you don't want to wake up a massive ring just to fire up the video decoder, for example.
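(What a picojoule-per-bit figure means in watts is just energy-per-bit times bits-per-second. A quick sketch — the 64 GB/s traffic figure is purely illustrative, not a disclosed Meteor Lake number:)

```python
# Link power = energy per bit * bits per second. The bandwidth below is an
# illustrative assumption, not a disclosed Meteor Lake spec.
def link_power_watts(pj_per_bit: float, gigabytes_per_sec: float) -> float:
    bits_per_sec = gigabytes_per_sec * 1e9 * 8
    return pj_per_bit * 1e-12 * bits_per_sec

BW = 64  # GB/s of hypothetical die-to-die traffic
print(f"0.3 pJ/bit @ {BW} GB/s: {link_power_watts(0.3, BW):.2f} W")  # ~0.15 W
print(f"1.5 pJ/bit @ {BW} GB/s: {link_power_watts(1.5, BW):.2f} W")  # ~0.77 W
```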

Now the video decoder is in the SoC tile. Right — which is actually a great segue, because that's what I want to talk about next. The SoC tile is honestly, maybe arguably, the most interesting tile they have. It's the complex tile, because it has the low-power E-cores, it's where the NPU AI engine is located, and we've got Wi-Fi 6E and Wi-Fi 7.

We can talk about that a little bit later, but that's a hint that we're talking about advanced technology. There's the media engine; the display controller, which is actually DisplayPort 2.1 and HDMI 2.1; and the integrated memory controller, which they didn't actually detail at first.

But as we asked later — I'm not sure who it was — it's DDR5 and LPDDR5. Yeah. I'm pretty sure this platform isn't supporting DDR4, right?

But that's just new technologies moving on, and such. So yeah — for the SoC tile, let's talk about the two things that are probably the most interesting, and let's start with the NPU, because Intel has made a big deal of the fact that this is its first introduction to the PC.

Pat Gelsinger said a year ago at Intel Innovation that this was going to be — again — sort of a big deal. And then they talked about how this is going to introduce AI to the consumer PC; it's going to be a foundation for years to come.

It's interesting, of course, because to date, AI has pretty much not been a local thing. It's been something that lives in the cloud.

I mean, everything from Bing Chat to ChatGPT is, you know, you talking to a server. Unless you have a smartphone. That's true, yes. Unless you're actually running Stable Diffusion, which runs on your local hardware — you know, you can create all sorts of weird cat pictures or whatever you want to do. And that's one of the things they are talking about: privacy.

The fact that eventually you're going to want an AI that knows everything you want to give it, but doesn't talk to anything else. It doesn't exist on a server, so it can't be hacked. So this could be something that does your finances, for example. That's the way machine learning is being spoken about today, right? So there are two types of machine learning.

You have training, which is where you need the massive GPUs and the big data, and then you have inference. Once you have a trained model, you show it a picture and it identifies it for you, or you give it a prompt and it creates a new image for you. The expectation is that the amount of work given to inference workloads is going to be so much more than training in the future. And as a result, there's this desire to move it away from high power in the cloud — or, you could say, to high efficiency on the device — so you don't spend that extra power sending the data off the device.

So how much can you do on the device? That's why we're seeing not only Intel but also AMD and Qualcomm starting to have these AI engines, or NPUs — Intel actually changed the name from VPU to NPU on our recommendation. Thank you, Intel. And just because NPU has become a staple in the Apple world. So, just so everyone understands what it's bringing to the table here: this NPU, this little piece of silicon, is actually derived from an architecture Intel acquired called Movidius. Right. So if you follow the mergers and acquisitions, Movidius happened a few years ago, basically.

So they've got all of that, and they launched a few products under that brand. Now they've decided the Movidius architecture has evolved to a point where they can integrate it into a laptop mobile CPU, and it does more than simply vision processing — which is why it was called a VPU initially. It can do much more. It can do convolutional neural networks on audio and video.

It can do transformers for generative AI purposes. I don't think they have disclosed exactly what the performance of this NPU is, but they're working closely with Microsoft to integrate it into the Microsoft experiences — you know, the camera effects that I know you guys have played with before on the channel. The idea is that we get a unified experience at the Microsoft level between AMD, Intel, and Qualcomm. And then Intel is also opening this up through OpenVINO to do really cool demonstrations and things. We saw a demo at the event called Riffusion — this was one of the highlights of the event for me. So they took a music track.

They applied one machine learning model to separate the vocals from the melody, right? And then they applied Riffusion, which took the melody's spectrogram as a picture. Yeah, exactly.

And then they applied Stable Diffusion to that picture to change it from one type of music to another, put the audio back in, and now suddenly your track is country music. Exactly. That's right — with the same vocals, now country music, to advertise their live karaoke. But the funny thing is, it was Stable Diffusion, so it wasn't working on the audio as a wave.

It was working on a picture of the spectrogram, and the spectrogram was read back in — mind-blowingly insane as a model, great as a demo. I've been telling other people about this demo ever since. Absolutely.
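(For the curious, here's a minimal sketch of the spectrogram-as-image trick behind that demo. This is not Intel's or Riffusion's actual code: restyle_spectrogram() stands in for the Stable Diffusion image-to-image step, the input file name is hypothetical, and the real pipeline uses mel spectrograms with its own tuned parameters.)

```python
import numpy as np
import librosa

def audio_to_image(path: str, n_fft: int = 2048, hop: int = 512):
    """Turn audio into a magnitude spectrogram -- the 'picture'."""
    y, sr = librosa.load(path, sr=44100)
    return np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)), sr

def restyle_spectrogram(mag: np.ndarray) -> np.ndarray:
    # Placeholder: in the demo, Stable Diffusion img2img repaints this
    # picture toward a prompt like "country music".
    return mag

def image_to_audio(mag: np.ndarray, hop: int = 512) -> np.ndarray:
    # Phase is lost when the spectrogram is treated as an image, so it is
    # re-estimated with Griffin-Lim before playback.
    return librosa.griffinlim(mag, hop_length=hop)

mag, sr = audio_to_image("melody.wav")  # hypothetical separated-melody track
new_audio = image_to_audio(restyle_spectrogram(mag))
```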

But this is only the start of it. That's right. Well, one of the things we've talked about, too, is that the NPU is not the end-all, be-all here.

Because, for example, if you run Stable Diffusion right now, you tend to run it on the GPU, and VRAM is actually one of the gating factors. So they showed a demonstration of running Stable Diffusion 1.5 at 20 iterations, and basically ran it against some different configurations. And there's not really any easy answer to this.

They showed it all on the CPU, and it took 43 seconds and consumed 40 watts. They ran it on the GPU: 14.5 seconds, 37 watts. All on the NPU — which I assume is the one that's in Meteor Lake — 20.7 seconds, 10 watts. So that was the most efficient one. But the most effective one, in terms of actually getting it done, was to combine the GPU and the NPU, which did it in 11.3 seconds at 30 watts. So it did consume a little more power, but it got the job done faster.
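(One way to read those numbers is energy per image — watts times seconds. Treat the figures as demo-level claims from the session, not a formal benchmark:)

```python
# Energy per generated image = power (W) * time (s), using the quoted figures.
runs = {
    "CPU only":  (43.0, 40),  # seconds, watts
    "GPU only":  (14.5, 37),
    "NPU only":  (20.7, 10),
    "GPU + NPU": (11.3, 30),
}
for name, (seconds, watts) in runs.items():
    print(f"{name:>9}: {seconds * watts:6.0f} J per image")
# NPU only is by far the most energy-efficient (~207 J); GPU + NPU is the
# fastest while still using less energy (~339 J) than the GPU alone (~537 J).
```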

And one thing worth noting: they didn't stretch the same workload over both. They separated the machine-learning workload into two parts, one on one and one on the other — but they did it simultaneously, I believe. And ultimately there's a lot of conversation about: well, we have a CPU, a GPU, and now an NPU — are other workloads going to get stretched across them? I think "stretched" is the wrong term; long-term, they're going to be disaggregated among the set. So: disaggregated tiles, disaggregated workloads. And yeah, it'll be interesting to see, though. I mean, you mentioned working with Microsoft, and I think they are trying to get their OpenVINO libraries working with what Microsoft's doing, their APIs.

Yeah — I think there's more here that we're not hearing about from the behind-the-scenes work, because I'm not exactly sure which parts. I mean, this suggests basically that Windows is going to be accelerated by AI — but which parts of it? That's something we don't quite know yet. So I think there's definitely more to that particular story.

Well, I'd love search to be accelerated by machine learning. Of course, everyone would, because everybody knows how search is performing right now. But they did also talk about a lot of partners that are using — maybe not OpenVINO, but a variety of different routes — to get access to the NPU.
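(As a concrete example of one such route, here's a minimal OpenVINO sketch of picking a target device at compile time. The device string and the model path are assumptions — device naming has varied across OpenVINO releases, so check available_devices on your install.)

```python
# Minimal OpenVINO sketch: compile the same model for different devices.
# "model.xml" is a hypothetical OpenVINO IR file; device names vary by
# release, so query available_devices rather than hard-coding them.
from openvino.runtime import Core

core = Core()
print(core.available_devices)                # e.g. ['CPU', 'GPU', 'NPU']

model = core.read_model("model.xml")
compiled = core.compile_model(model, "NPU")  # or "GPU", "CPU", or "AUTO"
# compiled(inputs) now runs inference; "AUTO" lets the runtime pick a device.
```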

So Adobe was using one of them. I'm sure we can get the slides — there's a really nice slide showing how these are separated out — but of the four different ways for most ML models in Windows to talk to the OS or to the hardware, they showed off examples of each, and OpenVINO is one. Now, can you help me a little bit here? As I was saying to you before we started: I think most people understand what makes for a good CPU — core counts, threads, clock speeds, power and so forth, even caches, when we're talking about V-Cache with AMD. With the AI engine they've got here, they're talking about two neural compute engines, each with its own MAC array and its own SHAVE DSPs, tied to a shared scratchpad RAM. That doesn't mean a whole lot to me right now.

And I think the question we're going to be asking in the future is: what makes for a good NPU? What's going to be the thing people need to focus on? Is there anything we can say right now about the direction Intel's heading and what we can look for in the future — anything? Unfortunately, the correct answer is no. So — I follow the AI hardware market, more so the data center type stuff.

Yeah. And there are a lot of different architectures in play, whether that's Google's TPU systolic arrays, or dataflow architectures, or massive wafer-scale architectures. There's a mix of approaches, just because of how dynamic the machine learning industry is right now on the software side. Every week there are new models coming out, and new ways for those models to be computed.

Machine learning is all about having matrices and layers, and most of the time the way you arrange those matrices, the way you arrange those layers, what you do with backpropagation and feedback — it all changes. So how do you develop an architecture that can do all of those at speed and efficiently? You could run everything on a CPU, but as you said with those numbers, everything on the CPU runs hot and runs slow. So you need an energy-efficient architecture that can approach most of these in a very power-efficient way. Intel — essentially the Movidius team — have said that they think this architecture has hit the nail on the head for the broad set of workloads currently in use on consumer platforms today. Will that be the same set of workloads for the consumer next year? We don't know, because it's such a fast-paced environment. So while they're trying to have some acceleration for vectors and matrices and what have you, they still need to make it flexible enough.

In the computing industry we often have this term DSP, digital signal processor, and a digital signal processor is like an ultimate ASIC: it is geared towards one workload and does it really well — like a video decoder. If you know that all you're taking in is H.264 video, and that's the only workload you have, then you can pipeline those exact instructions to get the best efficiency out of your design. Machine learning is not there yet, so it's a mix. What they're doing here specifically, I think, is worth highlighting: the NPU is technically a dual-core NPU, so they can either run one big workload across both cores or two independent workloads concurrently.

Right. Or, if they're busy, you could arguably offload to the GPU, if you've compiled it that way. So there are all these little things you have to consider in how you design your machine learning engine — and you had to consider them two years ago.

So I'm very interested in what they're deciding today about next year's product. Yeah — I guess my question was more along the lines of: do you think they're going to simply make it, say, a four-neural-compute-engine design? Can we expect more parallelism, or is that just — yeah. So with this neural engine: they have in the past offered it on an M.2 card for embedded and industrial designs, right? And I've told MJ — Michelle, who we mentioned earlier —

you need to make that M.2 card available for consumer systems, because if I've now got a motherboard with four M.2 slots, you've got an SSD in one, maybe two — maybe I want a high-powered NPU in another one of those. And if we have the right software to support that, then I can run machine learning on Intel hardware on any system with an M.2 slot. So the idea is that the architecture is scalable. I think internally they call this version 2.7 of the architecture, right? Yeah. And the next generation is going to be version 3 or version 4.

So they are adding new features for future-generation products. And yeah, I'm really hoping they can scale it out beyond simply being integrated into the chip, because I've seen demos of this thing running special features in OBS for streaming, right? I'd love to have a system that doesn't necessarily have an NPU in the CPU, but just has an NPU on an add-in card that I can stick in and do all of that at low power — hopefully low cost — and just offload it from the CPU. So that's the hope. So yeah, from what you're saying — and I think I'd agree — they're trying to create a sort of rising-tide-lifts-all-boats foundation, something fundamental, and as they start to figure out, hey, this is what people are using it for, then they can start to specialize and accelerate those particular functions.

Yeah — they're trying to build an ecosystem around it. And I think having the hardware is stage one; having the software support, having the software developers on board, is stage two, and they're going to be working through that. This isn't an overnight success. Regardless of what's happened in the Apple or smartphone space, this is going to be a multi-year progression.

Now, a cynic will say: well, wasn't that the same thing with AVX-512? The answer being yes — except that the desire for machine learning, macroeconomically, is so high right now, and it was never that high for AVX-512. Speaking of AVX-512: as far as instruction acceleration is concerned, they talked a little bit about DP4a, which is a special AI instruction acceleration that I think they first launched with Xe-LP in Tiger Lake. I'm not sure how big of a deal it is, but it seems to be something they're talking about.

Well, yeah — it's a GPU-type instruction that was in the Xe-LP graphics. Right. So I'll agree with that one as well. Yeah. And this is part of their XeSS supersampling upscaling — again, a dedicated instruction to help with machine learning, and they can use it in that context as well. Yeah.
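(For reference, DP4a is a four-element INT8 dot product accumulated into an INT32 — a basic building block of quantized inference. A scalar emulation of the semantics, not the actual vector instruction:)

```python
# Scalar emulation of DP4a semantics: dot four signed 8-bit pairs and
# accumulate into a 32-bit integer. The real instruction does this across
# whole vector registers in one operation.
def dp4a(acc: int, a: list[int], b: list[int]) -> int:
    assert len(a) == len(b) == 4
    return acc + sum(x * y for x, y in zip(a, b))

print(dp4a(10, [1, -2, 3, 4], [5, 6, -7, 8]))  # 10 + (5 - 12 - 21 + 32) = 14
```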

It's just about simplifying the throughput and increasing the efficiency and performance. Now, since we're still floating around the SoC tile, I did want to talk about something we may have skipped over earlier that I think is important to mention, and that's what Intel is doing with Thread Director and its low-power cores. There's a big change in Thread Director for this particular implementation. Previously, Intel would look to put new tasks — especially ones that demanded a little bit more — on the P-cores first. Now they're going in exactly the opposite direction. So, according to — we'll see if we can show the slides here.

Yeah, exactly — if we can show the slide, we'll try to show this later. What Intel is trying to do is take a new task and put it on the low-power E-cores first, and make sure those are always essentially in use and saturated.

And then, if something requires more, bump it up to the regular E-cores, and then finally the P-cores. That's designed to save power, but it's also a real change in mindset from previous generations. Yeah — so in the past, Intel has always preached this race to sleep.

Right — let's get the workload done, let's bring the CPU to sleep, and let's save power that way. And with the desktop version of Thread Director, you would take a workload, as you said, and put it on the P-core, the high-performance core, because if you're typing, if you're opening up the Start menu, you want that to happen as quickly as possible.

So put it on the high-performance part. This obviously consumes power, and if the core was idle before, you obviously have to ramp up its speed as well. So for a mobile platform, at least — I'm not sure if this will actually change on the desktop platform; we'll have to wait and see — with this mobile platform:

let's just put all the workload on the low-power cores first, and then, if it meets a threshold for performance as determined by the CPU, the CPU will tell the operating system: this is a good task to increase the priority of. As the priority increases, it gets moved to the regular E-cores, then to the P-cores, and gets high performance. Now, the question that comes to mind is whether there's additional latency in this step-up process. Yeah — because there's already some concern, going back to Alder Lake, that there's a latency differential when you first start a program, due to the P-cores and the E-cores, and now you've got LP to E to P. They kind of answered that and said it shouldn't be noticeable, the idea being that they can switch tasks within a single frame.

You know, if you're trying to hit that 33 or 16 milliseconds, that should be plenty of time to switch tasks, right? But I think the problem there is that if you have a lot of tasks starting on the LP cores straight away, they're going to be fighting for resources before they're promoted. Mm hmm. And that could be a potential worry, because we're in this mindset of programming now where, rather than writing very linear code, each new block of code becomes an independent thread. So these threads are going to be spawned within a task, and they're all going to start on the LP cores.

So I really hope Intel has done their background work and their trace analysis and can figure this out, because the ultimate step back would be to reverse that process through an update. So yeah — the thing in this design that I would want proof works properly is Thread Director, because most of us enthusiasts want performance, while those of us on the road want battery life, and on some level, I don't think you can have both. Yeah — it sounded like a policy decision, like they could change the scheme for gaming, or change the scheme for whatever it might be. And that suggested to me that, yes, we could see a reversal like you suggested, if something either went wrong or they just decided this was not the way to go.

Yeah — and to be clear here, there are two agents in play: there's the CPU itself, and there's the operating system, right? Only the operating system can migrate a task to a high-performance core. Technically the CPU could, but as part of the arrangement with the operating system, Intel specifically said: we only provide hints. Right. So the OS has to know what's going on.

So the OS is essentially the arbiter of all the workload detection. Say you've got a streaming video decode happening and you still need cores active for that: the CPU can detect that and give hints to the OS, but the OS actually has to make the change. And this is why there was also a mix of responses at the event regarding Windows and Linux. And — Charlie, we know you're really into that —

they said work is being done on Linux and on upstreaming; they just wouldn't provide a date. That's right. Exactly right.
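(A toy sketch of the flow as described — new tasks land on the LP E-cores, the hardware only hints, and the OS decides whether to migrate up the ladder. The thresholds and telemetry here are invented for illustration; the real Thread Director interface is far richer.)

```python
# Toy model of the LP-E -> E -> P promotion flow described above.
# Thresholds and the "ipc_demand" metric are invented for illustration.
LADDER = ["LP E-core", "E-core", "P-core"]

def hardware_hint(task: dict) -> float:
    # Stand-in for Thread Director telemetry: how much would this task
    # benefit from a faster core?
    return task["ipc_demand"]

def os_schedule(task: dict) -> str:
    level = 0  # every new task starts on the LP E-cores in the SoC tile
    while level < len(LADDER) - 1 and hardware_hint(task) > (level + 1) / 3:
        level += 1  # the OS, not the CPU, performs the migration
    return LADDER[level]

for t in ({"name": "background sync", "ipc_demand": 0.1},
          {"name": "video call",      "ipc_demand": 0.5},
          {"name": "game thread",     "ipc_demand": 0.9}):
    print(f"{t['name']:>15} -> {os_schedule(t)}")
```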

Okay. So we've talked about the CPU tile. We've talked about the SoC tile. I think we've hit most of the topics on that.

So, past the SoC tile — you were talking about graphics. Yes, graphics. So go ahead.

Yeah, I was going to say: Intel, very well known for their graphics hardware, is bumping it up for this generation. They claimed a 2x performance uplift, gen on gen, and the question I had is: are you comparing that to today's drivers, or to launch drivers? Right, exactly.

Because Intel has made such progress on drivers over the last year and a half. And they said no, it's a today-to-today statement — today's previous gen versus today's new gen — and they're expecting a doubling in performance. That comes from an increase in frequency — let's see if we can show that graph as well; a really good increase in frequency — and a reduction in voltage to reduce the power. So at the same power: 2x performance.

And then I believe they were showing almost 2.5x. The graph, I think, was very sneaky, because I think the axis actually started the frequency at half, and it went up to like 2.5 — so a 2.5 GHz Intel graphics clock, when I've been so used to seeing 1.1, 1.2 on this type of thing.

Yeah — it's going to be good to see 2.5 at the same power. Ray tracing, too. Yeah — integrated ray tracing, though, I think is going to be a little hit-and-miss, depending on how much ray-tracing performance they can provide. It's at the whim of the game developers optimizing for it, and I don't think they're going to be optimizing for mobile anytime soon.

They're more likely to optimize for console, and the GPU in Meteor Lake is still not at the performance of a console. So — I guess we can run down a few of the specs they showed. Eight Xe cores — "Xe cores" is actually the official name. Thank you. I know. Yeah — spelling-wise, that's true.

And pronunciation — it's not something I say offhand. 128 vector engines, two geometry pipelines, eight samplers, four pixel backends — that's 16 256-bit vector engines per Xe core — and DirectX 12 Ultimate optimized.

They didn't show performance data, but they showed Forza Horizon 5 running — at what resolution and what quality, I don't know. I was sitting fairly near the front; did you see that? It looked like at least 1080p. However, they didn't mention whether it was supersampled up or not. Right.

Yeah, I know — so it could have been 540p upscaled to 4K. Mm, no, it wasn't. I'd be happy saying that was at least 1080p medium, if not 1080p high. Um — it is Forza. Yeah, right, exactly. It's not a Cyberpunk, and it's not a Starfield.

No, that's right. They didn't show off performance, like I said, but they showed off some of the — oh goodness, I forgot the exact term — synthetic data. Thank you. Yeah.

The tests we used to do back in the day when GPUs were really bad. Good point. Yeah — things like vertex processing data, triangle fill-rate data.

Yeah — somewhat arbitrary numbers, but still 2x to 2.6x gains. We can show you the graph on that. You're always going to have synthetic data that does that; however, the way games are built these days, it's going to be a mix of all of it. Yeah.

So: a higher maximum clock, a lower minimum voltage, roughly two times the frequency at the same voltage, and also a 20% power reduction when using machine-learning workloads on the GPU. Yep. A couple of things that I liked: 8K 60 HDR, and 4x 4K 60 HDR — so we're talking about driving four monitors. I actually have four 4K monitors at home, because I test Thunderbolt docks. So yeah, it's things like that.

It's interesting to see what happens with that. 1080p at 360 Hz, 1440p at 360 Hz — so again, some nice numbers for gamers. You can run your Counter-Strike at 360. Exactly — off your laptop. That's right. So that's always a plus.

I mean, that way my son can stick to this laptop and not have to bug me for mine. Which is good. What else have we got here? There are a couple of other numbers we can throw at you — the ray tracing was 2.64 times what it was in Raptor Lake, in the Classroom scene in Blender — and a couple of other things, with support for various codecs; I'm not sure if they matter or not. For the supersampling, they didn't give a performance metric, but they gave a power metric: they're saving about 30% power per frame with XeSS. I assume there's also a new algorithm involved, not only the efficiency of the architecture. And I think the other thing here: one of the new features they're adding is this endurance gaming control feature.

So you're used to these features in other GPU software, where you can cap the frame rate, whether it's 120 or 60, and typically you do that in a device like this because you don't want to use more power than you have to, and you don't want to raise the temperature more than you have to. So as part of the endurance gaming control, they will enable a max battery, a balanced, or a relaxed setting, each capping the frame rate differently. And the number they gave in the pitch was that Rocket League can run at — is it the 30 fps max battery setting? — at one watt.

I think that's right. So you can play Rocket League with a one-watt GPU. Yeah. We're going to have to measure that —

Rocket League isn't the most demanding of games, and we'll have to see what happens in a big multiplayer workload. As far as power goes, too: one of the things they were talking about was the new enhancements in panel self-refresh. So if there's a repeated frame, they just don't even wake the core. Yeah, right. They do some frame queueing as well. Some smart decisions, I guess.

So, on video decode: what we're talking about here is, rather than decode a frame, show it, decode a frame, show it — it used to be that you'd decode a frame and then, even if it wasn't time to show that frame, you'd start decoding the next frame.

And then you'd have a little bit of idle time. Now they can decode 16 frames ahead and go idle until it's time to decode the next one, frame 17. So ultimately you can have an idle period of over half a second with 24 fps content — which the film fanatics will love — but it also means that when you're watching video on an airplane, it extends the battery life a lot.
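(The half-second figure falls straight out of the frame math:)

```python
# Decode a batch of frames ahead, then idle until the next one is due.
frames_ahead = 16
fps = 24.0
print(f"~{frames_ahead / fps:.2f} s of decoder idle per batch at {fps:.0f} fps")
# -> ~0.67 s, i.e. "over half a second" with 24 fps film content
```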

I don't know if this is important to mention, but the GPU has its own tile, while the media engine and the display engine live in the SoC tile — so they're separate — and then the display PHYs live in the IO tile. So again, they're separating these things out to try to minimize power consumption. Yeah — it's because the GPU tile is pretty big, and video decode and encode is kind of a different process altogether. It's still graphics, but it's a different process.

There's no need to wake up the GPU tile for something it isn't designed for. All right, we've gone through a lot. Anything else with Meteor Lake? So — a little bit on the IO, maybe.

Yeah, the IO. So the IO tile — which is the little one in the corner — that simply holds Thunderbolt and PCIe. Right.

And it looks like there will be different variations of that, depending on the design. That's true, right. Yeah — we talked about PCIe Gen 5. Right. Thunderbolt 4 is what I heard. Yep. Yeah.

It's not the 80-gig stuff, unfortunately, but that could come in the future, I guess. Yeah — the IO tile. It should be fun. Well, that's the whole point of this disaggregated system, right?

If they build a different function, a different feature — like Thunderbolt 5, or they need to fix something in a DisplayPort engine, for example — they can just swap out tiles. Obviously, you can't send your chip in to get it changed, but I think what we'll see from this point on is that with gen-on-gen advancements, you may not see everything in the chip updating: you may get better CPU cores one gen, and a better memory interface the next gen. And I don't think this configuration with the five tiles will be the final configuration. It's good for now.

I really want them to change some things from a business perspective, but that doesn't mean they can't do this effectively — and obviously there will be different SKUs. I mean, we've said broadly in this video that this is the H 6+8+2, right? There could be a different one coming along later. I'll tell you one thing: as far as actually putting this stuff in print and listing the various specifications, right now it takes the better part of three lines just to describe the chip, and as they start putting in different tiles, it's going to get even more complex. It's a challenge.

I do wonder how many generations we'll have to speak about tiles before it becomes common knowledge — not anytime soon, especially because they'll be doing enhancements generation on generation to drive down the power, to do better integration and packaging, and all that fun stuff. And then what we're waiting on from this point is actual performance data, products, and lead partner devices. As for exactly when it will hit the market: Intel is being kind of consistent in saying by end of year, right? Though we're not sure in what volume — I think CES will be the big launch for that.

So don't count on getting one before Christmas, but come Q1 we should start seeing a lot more in the market, and I think that's going to be pretty important. So that was Meteor Lake. We're now going to shift gears a little bit and talk about what Intel showed us in Penang and Kulim on assembly — part of the manufacturing process, but not one that normally gets talked about. Malaysia was also the first country Intel expanded to, more than 50 years ago.

It's important for both the company and its employees to show off what they can do. I think we should talk about the process. Yeah, absolutely.

Right now, everybody listening should be familiar with this: you get a silicon wafer, you print a design on it, and that's what the fab gives you. Now what do you do with it, right? So part of it is die sort, die prep: you actually need to cut the wafer into bits and pull all the different dies out.

Then you need to test to see which ones are good, either electrically or functionally. And then you actually need to package them. Those three steps are just as important as the manufacturing of the chip itself.

Because without them, nothing happens. Obviously, after you assemble them, you have to test them again. So it's die sort, die prep, testing, assembly, test again. That's it. And every CPU that goes through Intel does this.

I think the site said they've been through 500 million computer chips since 2020. I think so. Yeah — so 500 million in three years.

Just for this — just for Intel, in Penang and Kulim combined. Right.

An insane amount of volume — and they shut it down for a couple of days for us to go see it. Exactly. Finance people, don't ask how much that costs. So, just to give you a bit of a description of what the process was like: we took a bus out to Kulim. It's very tropical — it reminded me a lot of Hawaii. Lots of palm trees and so forth.

Just a very lush, wet environment — and then all of a sudden you have these really advanced facilities, advanced fabs, just sprouting from the grass. Now, this is a high-security place.

So we did not get to take cameras in; in some cases we couldn't even take pen and paper. So some of this is being done from memory, and from my standpoint it's not necessarily going to be entirely accurate. They almost threatened to take our eyeballs. That's right. We convinced them to let us keep them.

What amused me is we had to drive to one car park in a big coach and then transfer to smaller buses, because the coach wouldn't fit down the road. Right? Yeah. I was like —

hang on, this is a multi-billion-dollar facility, and they couldn't even get the roads done, so we're having to go down the road in what is the equivalent of a Rascal van. That's what it was, yeah, exactly. I sat in the seat for pregnant mothers, because it had leg room.

Yeah, right — I don't think pregnant mothers are allowed in the fab anyway. So, yeah, exactly. So anyway, for some of the places we had to get into bunny suits. You've been in a bunny suit before? If you go to my channel, TechTechPotato, you will find a video on how to put on a bunny suit. Oh, well, there you go.

That was at Intel in Oregon. Oh, perfect — well then, you have more experience than I do. So yeah, we had to submit — at least I had to submit — my shirt size, pants size and so forth. And in some places we had more PPE. Thank you — that's the term.

Then they actually had our suits with our own names on them, in vacuum-sealed bags. That's not like every facility I've been in.

Typically you just pick one off the shelf, but they had this tour down to a tee. There was so much manpower devoted to assisting us — getting our bunny suits on and off, getting the masks on. Where normally you'd have, like, traffic cones,

they had individual people stationed so we wouldn't go wandering off elsewhere. And it wasn't just one or two — it was like eight. Every time you walked down a corridor, at every potential left or right, there was somebody there. Absolutely.

Yes — they wanted to make sure we weren't going to go wandering off. Yeah — maybe me more than anyone. But yes — did you actually get a wafer from this? I don't think you did. Sorry — did you get a wafer?

So during our tour — and this is perhaps a little bit of an aside — one person on our tour did drop a wafer. They actually dropped one.
