High-Performance Computing: Supercomputers, AI, Quantum, and Cooling | Intel Technology


(inquisitive electronic music) - [Announcer 1] Welcome to "What That Means with Camille," companion episodes to the InTechnology podcast. In this series, Camille asks top technical experts to explain in plain English commonly used terms in their field. Then she dives deeper, giving you insights into the hottest topics and arguments they face.

Get the definition directly from those who are defining it. Now, here is Camille Morhardt. - We're doing "What That Means," high performance computing or supercomputing, today.

I'm really looking forward to this topic. I've got with me James Reinders, who's a supercomputing or high performance computing engineer at Intel. He is also the oneAPI evangelist for Intel. Welcome to the show, James. - Thank you for having me. - So as usual on "What That Means," can you please define for us what is high performance computing, or HPC? And also, is it interchangeable with supercomputing? - It's been known by a lot of terms over time.

Very simply put, it's the concept of building kind of the biggest, baddest, fastest computer you can to solve very large, complex engineering and scientific computational problems. - So has the architecture of the machine changed over time, or is it just more and more and more sort of cores, if you will? - Like everything else in computing, it changes over time. If we think way back to the earliest days of computing, computing was used to solve scientific and compute problems, but we just called them computers or mainframes and so forth. The idea of a supercomputer came about maybe in the mid to late seventies.

There started to be concepts that maybe we could build a machine that was a little more expensive, a little more complex, than your average machine that did your business processing, including reading your punch cards and balancing the books. And out of that there have been many different changes in supercomputing over time. One of them: by the late nineties, we settled one argument. There was a concern that you couldn't really scale by adding lots and lots of cores. So there were some people trying to build individual processors that were super, super fast, and very few of them, and others saying, "Hey, I can hook together thousands of them." By the late nineties, the standard supercomputer changed from being an exotic, custom-built machine to one that consisted of thousands of off-the-shelf processors, and that kind of ended the argument.

Of course, we've seen two big changes since then. One is we've gone to multi-core and the next one has been accelerators. Specifically, GPUs have been an important addition to supercomputers for a big part of the last decade.

- I should've mentioned in your intro too, you have been part of the team or responsible for a number of different supercomputers that have been ranked at the top in the world, some of them for many years running. Is that right? - Yes. One of them was ASCI Red, which when we assembled it in 1996 became the number one supercomputer in the world by the TOP500 ranking, and it was a multiprocessor machine with a little over 9,000 P6s in it. And that settled the argument about whether computers were going to be massively parallel or not. - The other thing I'm wondering is, can you talk about the evolution, sort of, of use cases or workloads that supercomputers have been used for, from the time they started to now, and projections into the future? - In the early days of supercomputers, one of the biggest motivations for them was what you would loosely call military, whether it's computing ballistics or doing weapon design of the most horrendous proportions in terms of destruction capability.

That was the reason people wanted to do a lot of computation, and that continued pretty well into the nineties. Since then, supercomputers get used for a lot more than that. In fact, I think the majority of use for supercomputers or high performance computing these days is not military. We see things like energy exploration, trying to figure out where to drill for oil, or how to design a wind turbine better, or how to build a better vacuum cleaner. Instead of building physical wind tunnels, you're able to do aircraft design, automotive design, and really refine it.

Rug simulations, all of these things. What they have in common is they do simulations, they do simulations of the real world, and the more compute power we get, the more realistic our simulations can be, whether that's to tell us what the weather will be, to do climate forecasting and look into what may happen with climate change, to design an aircraft, or to try to figure out more things about COVID so we can combat it. All of those are problems that you'll see people tackling with supercomputers these days.

- Some of this kind of modeling of very complex things, like weather or disease spread or mutation, I've heard referenced as future use cases for quantum compute. So can you explain, is HPC or high performance computing going to become eclipsed by quantum, or do they somehow fit together in a future world? - For a long time we called them supercomputers because they were exactly that. They were the superest computers we could build, if you'll excuse the broken English. But somewhere along the line somebody decided the term high performance computing was even cooler. I don't know, again, referencing high performance.

So the concept with quantum computing is to be able to use this very spooky action at a distance, entanglement, as the basis of computation for certain types of problems. I just think of it as another cool way to build a supercomputer. That said, quantum computing is pretty specific in the types of problems it can solve. It may not be the best way to solve every problem; time will tell. Of course we need to figure out how to build them at scale, but they stand the promise of being phenomenally amazing at modeling the real physical world. Some of the first uses will clearly be modeling of molecular dynamics and different things in chemistry, and those are incredibly important in solving problems.

So yeah, I think quantum computing, as it matures, will become another form of supercomputing. I don't think it'll displace all the other architectures; it'll just join the fold. - So it sounds, James, like you're saying supercomputing or high performance computing is kind of a generic term.

Like you said earlier, the biggest, baddest computer. It doesn't matter what technology is in it, whether that shifts over to become quantum in the future, or it's some sort of a hybrid, or you'll have different ones for different kinds of approaches. - Absolutely.

- Is high performance computing something that exists, like, in every big company and university? Is this something that a consumer would have access to via a cloud? Where are these things located, and how many are there? - There's a couple of ways to look at it. One is, if you're looking for one of the top 500 machines in the world, obviously those are fairly rare. There are a lot of ways to make those accessible. So, some of them end up in government labs with limited access; some end up in national labs or universities with considerable access. I work with a lot of different places around the world, like the Texas Advanced Computing Center in Austin. They're part of a program that makes that compute power available to students all over the place who can apply for grants.

And the grants basically give them compute time, where somebody has had to pay for buying the machine and pays to keep it turned on, but then they turn around and make that time available for lots of different science projects. So if you're in a lot of programs around the world, getting access to supercomputers is something you may find at your universities or your institutions. The other way I look at it: a cell phone qualifies as what a supercomputer was capable of 25 years ago in terms of compute.

You look at the amount of compute power we can pack into a laptop or into a small desktop machine, it's amazing. And lots of companies have small racks of systems that have very powerful processors, or very powerful processors with GPUs, and these are fairly affordable. So we're seeing things that I would've thought of as supercomputers even a decade ago being things you can go click on a website and order up from your favorite compute vendor, and it's amazing. And so lots of small companies are using these for simulations. We talk about crash simulations, which are a way to test the straining of metals or the straining of materials of any type. And people who build products look for their durability, and rather than building prototypes these days, people draw 'em up in CAD tools and apply different crash methods to 'em, and it's an affordable way for a lot of companies to do what they used to do physically.

So we do see a lot of that, and frankly I consider that supercomputing, although it wouldn't show up on the TOP500 list anymore. - Can you rent out time at a server farm of supercomputers? Is that a thing? - Absolutely. That's been a fairly niche field. You might go see if you could get access to a supercomputing facility, or you might find a specialty company. But now, more and more, we see the cloud vendors offering some HPC capabilities.

And the thing that really distinguishes an HPC computer from just a normal large cluster is how high performance the connections are between the compute capabilities. So I have a lot of different nodes, that's what we tend to call 'em in a computer, where we have several processors, maybe some processors with some GPUs, that all share memory in a way. They might be on one board, but when we connect them together, typically the connections are fairly low performance; in a supercomputer, you invest more expense in connecting them together fast.

That's what enables us to write a program that uses thousands, tens of thousands, sometimes hundreds of thousands of cores, whether CPUs, GPUs, or FPGAs. In order to do that, they have to be able to communicate with a lot of bandwidth, low latency, a lot of performance. We're seeing cloud vendors often offer instances like that.
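To make that concrete, here's a minimal sketch of the kind of tightly coupled, many-node program being described, written in Python with the mpi4py package (my choice for illustration; real HPC codes are more often C, C++, or Fortran, and this example is not from the episode). Each rank computes part of a numerical integral, and a single collective exchange combines the results; it's exactly that exchange step where interconnect bandwidth and latency matter.

```python
# Minimal MPI sketch (assumes mpi4py is installed); hypothetical example.
# Each process ("rank") integrates part of 4/(1+x^2) over [0,1] to estimate pi,
# then one collective operation combines the partial results.
# Run with, e.g.: mpirun -n 4 python pi_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # which process am I?
size = comm.Get_size()   # how many processes are cooperating?

n = 10_000_000           # number of integration intervals
h = 1.0 / n
local_sum = sum(4.0 / (1.0 + ((i + 0.5) * h) ** 2)
                for i in range(rank, n, size)) * h

# The allreduce is where the fast interconnect earns its keep:
# every rank contributes its piece and receives the combined answer.
pi = comm.allreduce(local_sum, op=MPI.SUM)

if rank == 0:
    print(f"pi ~= {pi:.10f} using {size} ranks")
```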

You name it, all the cloud vendors are venturing into that, and it's a very affordable way to get access to high performance computing. - Is there a difference when we think about security for a supercomputer or high performance computing versus another kind of computer? Or, I'll extend it to include privacy and trustworthiness. I mean, is there any different way of looking at that kind of compute? - I don't think fundamentally there is. In other words, when I think about running a workload and securing the data and keeping it private, a supercomputer fundamentally is very similar. There's a couple of things I think of, though, that make it a little different. One thing is supercomputers don't tend to allow multiple applications to run on the same node.

That provides a little security. Obviously you still have to secure the borders, but with supercomputers, since people are trying to get the ultimate in performance, they don't tend to wanna share their nodes and multitask; that's inefficient. So when you're actually running a workload, you run it full bore on at least the part of the machine you're on. But the other thing is, supercomputing, of course, by its nature is the most computationally intensive capability we can put in one place. Exploiting or tearing through privacy, those concerns get multiplied when you put more compute power together. So I think among HPC folks, talking about privacy and ethics is a very important topic, and one that I hope all of us as engineers are concerned about.

- Can you talk a little bit about how this intersection of artificial intelligence, AI workloads, and machine learning workloads is coming together with HPC? - Yeah, it's super exciting, but you won't find me calling them AI workloads. I look at AI as a technique. So I will very commonly say AI is a technique, not a workload.

And to explain that a little, look at problems that we might wanna solve, like molecular dynamics, which is simply the concern of simulating the world of a lot of molecules bouncing around. Maybe some of those molecules make up a cell membrane, some of them make up a virus, some of them make up a drug that's trying to interact with the virus and stop it from going through the cell wall. You bounce those around, and you have to inject some randomness.

And in those simulations we tend to do things called Monte Carlo operations. There's been some very interesting work, some of it from CERN, looking at whether you can replace the Monte Carlo operations with a neural network that was trained; that's AI. And basically they took a neural network, a GAN network, or I should say GAN, 'cause the N stands for network. And they trained it by letting it watch Monte Carlo operations, and then they basically plugged it in and said, "Behave like what you saw."

And the results were really exciting. It was able to do simulations that seemed to give us comparable answers at a fraction of the compute power. So they're looking at using that, possibly, to simulate the next generation of Hadron Collider detectors, so that they can keep looking for what happens when you split matter into subatomic particles.
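As a rough illustration of the idea (a toy, not CERN's actual code), a traditional Monte Carlo step draws many random samples of some physical process; once a generative network has been trained to mimic those samples, it can be dropped in as a much cheaper sampler. The functions and the distribution below are invented purely for the sketch:

```python
# Toy contrast between a costly Monte Carlo sampler and a learned surrogate.
# Everything here is hypothetical and only illustrates the swap-in idea.
import numpy as np

rng = np.random.default_rng(42)

def expensive_monte_carlo(n_samples: int) -> np.ndarray:
    """Stand-in for a costly physics simulation that samples an observable."""
    return rng.gamma(shape=2.0, scale=1.5, size=n_samples)

def surrogate_sampler(n_samples: int) -> np.ndarray:
    """Placeholder for a trained generator (e.g. a GAN) that has 'watched'
    the simulator and now mimics its output far more cheaply.
    Here we simply reuse the known distribution to keep the sketch short."""
    return rng.gamma(shape=2.0, scale=1.5, size=n_samples)

n = 1_000_000
print(f"Monte Carlo mean: {expensive_monte_carlo(n).mean():.4f}")
print(f"Surrogate mean:   {surrogate_sampler(n).mean():.4f}")
```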

We've seen this in weather simulations. People work very hard at algorithms to ascertain what the weather's going to be, but they also have done AI training on critical parts of the weather model and seen it perform the same or better, detecting things like atmospheric rivers and other phenomena that affect our weather. And so AI is making its way into what people would call traditional HPC workloads, solving parts of the problems that were solved other ways before. So I find it a very exciting intersection, if you will, of AI techniques with traditional HPC techniques.

But I would just eventually call all of it high performance computing, because it's all about solving those problems. - So high performance computing is centralized by its definition, right? You're putting a bunch of stuff physically together, a bunch of compute physically together. How does it operate, or is it just completely orthogonal to distributed computing? Is there any way to have distributed high performance computing? - The ultimate thing to do is a lot of computations that are independent, and when they're independent I can say, "Hey, have this part of the computer solve part of the problem, and this part solve another." Those can be as far away as you want them to be, very distributed. The problem is, when they get their solutions and go forward with the problem, we usually have to exchange a little data, do a little communication, and that's where, by physics the way we know it today, they have to be physically close together.

Light, which is, we believe, the fastest thing right now, only travels about a foot in a nanosecond. When you're trying to exchange data, you can't afford for your wires to be very long or the distances traveled to be very far. So supercomputers often have somewhat of a circular layout, or they're very tight. People are very concerned about how far apart the parts are, how long the wires are that connect them, and so forth. And when you're designing your problem, if you're using tens of thousands of cores across the machine and many cabinets, you try to exchange data with nodes that are close to you, because if you go from one end of the computer to the other, it takes longer. So you can distribute the problem, but it will slow it down, it will slow down that exchange of data, and that's why supercomputers tend to be all in one place, at least the part of the supercomputer that is gonna run a tightly connected computation.
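A quick back-of-the-envelope check of that "foot per nanosecond" point. The cable velocity factor below is an assumed, typical value; signals in copper or fiber travel at roughly 0.6 to 0.7 times the speed of light in vacuum, which makes cable length matter even more than the vacuum figure suggests.

```python
# Back-of-the-envelope propagation-delay check; numbers are illustrative.
C_VACUUM = 299_792_458          # speed of light in vacuum, m/s
NS = 1e-9                       # one nanosecond, in seconds
FOOT = 0.3048                   # one foot, in meters

per_ns = C_VACUUM * NS
print(f"Light in vacuum covers {per_ns:.3f} m (~{per_ns / FOOT:.2f} ft) per ns")

# One-way delay over a cable, assuming ~0.65c signal velocity (a typical figure).
for cable_m in (1, 10, 100):
    delay_ns = cable_m / (0.65 * C_VACUUM) / NS
    print(f"{cable_m:>3} m of cable: ~{delay_ns:.1f} ns one-way, before any switch hops")
```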

- How much heat do these things generate, and how do you cool them? - We used to, several decades ago, assume that the cost of air conditioning or cooling a computer was about the same as the cost of running it. Nowadays there's a ratio for this, and I actually don't remember exactly what it's called, but that case would be, I think, a 1.0, or well, maybe that's called a 2.0. But the concept is it takes one unit to run the computer and one unit to cool it. Nowadays people try to talk about 1.4 or less than that,

trying to get the cooling costs down. One of the ways is, we don't cool computers as cold as we used to. We let them run a little warm, and people talk about warm air instead of cold air. That has its disadvantages, but it lowers the power consumption.
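The ratio being reached for here is commonly known as PUE, power usage effectiveness: total facility power divided by the power that actually goes into the computing equipment. "One unit to run it, one unit to cool it" works out to a PUE of 2.0. A tiny worked example, with kilowatt figures invented purely for illustration:

```python
# PUE (power usage effectiveness) = total facility power / IT power.
# The kW values below are made up to illustrate the ratio, not real facilities.
def pue(it_power_kw: float, cooling_and_overhead_kw: float) -> float:
    return (it_power_kw + cooling_and_overhead_kw) / it_power_kw

print(pue(1000, 1000))  # 2.0 -> "cooling costs as much as computing"
print(pue(1000, 400))   # 1.4 -> the kind of figure mentioned above
print(pue(1000, 100))   # 1.1 -> an aggressively optimized, warm-cooled facility
```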

Some computers are still air cooled. A lot of them are water cooled, and what that exactly means can vary a lot. In some of them, the water is piped in to a unit very close by and then turned into cold air and blown over the system. In others, the compute board will have a heat sink on it, a very flat one, and another board with water running through it, and they'll be clamped together to do the heat exchange.

I'm not aware of any now that drip water over the circuitry, but there have been computers built like that in the past. They're kind of a nightmare. I knew a repairman that once said he felt like a plumber more than an electrician when he worked on those computers, 'cause he'd have to shut down. But yeah, water is a better way to evacuate the heat. So water is often involved to pull the heat away, in order to keep the computers dense enough so that we don't stretch those wires out too far and slow down the computer. - So what other kind of major challenges like that are there when you're dealing with high performance compute? - The biggest cost in running a computer is moving data around.

It used to be the computation, but now moving the data from one processor to another, to a memory, to a disk, the moving of the data dominates the power consumption in a machine. In some very real ways, there are a lot of things going on to try to make the memories higher performance and closer to the processor, trying to reduce the power or increase the efficiency. So a very hot topic is things like high bandwidth memories and how you connect those. You know, nowadays the things we used to call chips are not a single piece of silicon anymore.

There are devices out there with over a hundred pieces of silicon in 'em, all connected together. We still call 'em a chip because we're so accustomed to that. And the reason there's so many is you have processing capabilities, you have data storage, memories, caches, high bandwidth memory, and you're trying to put them all together in a package, and everybody's doing some form of that. It's a very exciting development, but the desire for that is to shorten the leads, increase the bandwidth, and try to reduce the power that's consumed moving data around, so that you can lower the cost of running the machine and increase its performance at the same time.
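To give a feel for why data movement dominates, here's a tiny comparison using rough, order-of-magnitude energy figures of the kind often quoted in computer architecture talks. The exact picojoule numbers below are assumptions that vary widely by process node and design; the point is the ratios, not the values.

```python
# Illustrative, assumed energy costs (picojoules) for ~64 bits of work or movement.
# Real values depend heavily on the silicon process and the design.
ENERGY_PJ = {
    "64-bit floating-point operation": 20,
    "read 64 bits from nearby on-chip SRAM": 50,
    "move 64 bits across the die": 200,
    "read 64 bits from off-chip DRAM": 2000,
}

flop = ENERGY_PJ["64-bit floating-point operation"]
for action, pj in ENERGY_PJ.items():
    print(f"{action:40s} ~{pj:5d} pJ  (~{pj / flop:5.0f}x a flop)")
```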

- Really fascinating. James Reinders, am I saying your name correctly? - Absolutely. - James Reinders, like after the Rhine River, from a former conversation we had. Thank you so much for joining us.

You are a supercomputing, high performance computing engineer at Intel and also a oneAPI evangelist. - [Announcer 1] Never miss an episode of "What That Means with Camille" by following us here on YouTube, or search for "InTechnology" wherever you get your podcasts. - [Announcer 2] The views and opinions expressed are those of the guests and author and do not necessarily reflect the official policy or position of Intel Corporation.

