The future of liquid cooling for data centers
Hello, thanks for coming. I'm Jason Zeiler, the liquid cooling product manager here at HPE, and today we're going to talk about the future of liquid cooling. I hope you enjoy the session. If you have any questions afterwards, I'll be just off the side of the stage, so feel free to come grab me.

One of the things I always like to start with, to set the stage for what's happening in the industry, broadly in both enterprise and HPC, is that chips are changing. For the last 20 years, and really even the last 5 to 10, we've seen a fairly flat power curve for chip TDP, the thermal design power. Think about what a high-end Intel CPU, AMD CPU, or NVIDIA GPU looked like three or four years ago: CPUs were around 200 watts, and that was considered hot. Today CPUs are around 300 watts and GPUs around 500 watts, and there are some very interesting things happening in the market right now. This power trend is increasing density very quickly. The companies making these amazing products are moving to new design methodologies, really 3D silicon stacking, so we're starting to see more packed into a small footprint, which means the power is going up.

But something else is happening with cooling. GPU power is going up, CPU power is going up; that's not a big surprise if you've listened to presentations from different groups, and we're going to see 1,000-watt and beyond GPUs very quickly. What's not often talked about is Tcase, the silicon temperature: the maximum temperature these chips are allowed to reach. Chips from the last generation could run very hot, at 90, 95, nearly 100 degrees Celsius, and keep running. I've now seen Tcase limits as low as 60 degrees Celsius; that's a corner case, but the limit definitely isn't going up. Chips are becoming less tolerant of very high heat because there is so much happening inside the package, and that's a big part of why this makes such a good case for liquid cooling, which is what we'll talk about today.
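To make that shrinking-headroom point concrete, here is a minimal sketch of the case-to-coolant thermal resistance a cooling solution has to achieve; the wattages, case limits, and inlet temperatures below are illustrative assumptions, not figures from this talk.

```python
# Illustrative only: rough thermal-headroom arithmetic, not vendor data.
# Required case-to-coolant thermal resistance (degC per watt):
#   R_required = (T_case_max - T_coolant_in) / P_chip
# As TDP rises and T_case_max falls, the budget shrinks dramatically.

def required_thermal_resistance(t_case_max_c: float, t_inlet_c: float, power_w: float) -> float:
    """Maximum allowable case-to-coolant thermal resistance in degC/W."""
    return (t_case_max_c - t_inlet_c) / power_w

# Assumed last-generation chip: 200 W, 95 degC case limit, 30 degC inlet air.
legacy = required_thermal_resistance(95, 30, 200)     # ~0.325 degC/W

# Assumed near-future accelerator: 1000 W, 65 degC case limit, 32 degC facility water.
future = required_thermal_resistance(65, 32, 1000)    # ~0.033 degC/W

print(f"legacy air-cooled budget : {legacy:.3f} degC/W")
print(f"future liquid-cooled need: {future:.3f} degC/W (~{legacy / future:.0f}x tighter)")
```

With roughly ten times less thermal resistance to work with, an air heatsink simply runs out of room, which is the argument for putting liquid directly on the package.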
Now, whenever I talk about the value proposition, why you should care about liquid cooling, the top of this triangle, performance, has always been the easy part of the story for me to tell: if you want a 350-watt CPU running flat out all the time, liquid cooling is the way to go; it cools very consistently and there's a lot of performance goodness there. But efficiency is becoming a really important part of the story. I'm getting more and more questions from folks in the finance department or the CFO's office asking how we can use less power and pay less in opex over time, and efficiency is a really good part of that story too. Liquid cooling, for the most part, uses substantially less power to exchange that heat, both at the rack and in the data center overall. Think about all the fans inside the servers: what if we could remove them, or run them at idle? That has a really good efficiency element. The other part of this is density. When we talk with customers about rack densities today, 17 kW is often the high end for them, which means many of those racks are nearly empty. What if we could fully populate racks? We'd need more power and more cooling, but we'd need a far smaller data center overall. So density is the third part of the story we talk about with liquid cooling.

It's always interesting: we talk about all these big systems we build at HPE, all this amazing exascale-class technology, really big systems, and what's interesting is that they're liquid cooled. We build them rack-and-roll and ship them directly to customers. For enterprise that hasn't always been very important; for AI it's going to be. We're going to build very high density racks, build them liquid cooled, and we want customers to have a really positive experience; we want the systems ready to go when they show up. That's where we're going to leverage all of this experience building these really big exascale-class systems and supercomputers and apply it to AI.

I also wanted to show some basic information about what's table stakes, what you should expect, with liquid cooling. We did a quick comparison on our XD2000 server, a higher-density HPC product. We took an air-cooled server and a liquid-cooled server, and right away when we benchmarked them against each other there was nearly a 15% decrease in chassis power. That's not that surprising: the fans run at idle. We put cold plates on the CPUs and the GPUs, so the fans don't have to work very hard. This does not include what happens at the data center level; if we don't have to use perimeter cooling, the savings are much more substantial, but here I'm only showing what happens at the chassis level. We also noticed a mild increase in performance, because the chips are being cooled so consistently: there were no hot spots in the rack and we saw very consistent cooling at the chip, so there was a bit of a bump in performance. When we combine those numbers, we get about 20% more performance per kilowatt: for each kilowatt we put into the rack, because so much less goes to the cooling infrastructure, we get to use it for performance. It just tells a nicer story at the end of the day.

Taking a higher-level view, it's always interesting to talk about the real benefits of liquid cooling at a more macro level. On the left, looking just at cooling costs, air-based cooling versus DLC, we did an example with a 10,000-server cluster and assumed some variables for the electricity cost; this is primarily a North American or US power rate. A European example would look drastically different, since power costs there can be nearly four times as high. The electricity cost just to do the cooling for an air-based data center in this configuration was just over $2 million per year. When we looked at the liquid cooling costs, it was a fraction of that, about $300,000. That's the magnitude of ROI we'd expect, and the reason the CFO's office would want to move toward liquid cooling; there's a lot of goodness baked into the energy costs. What we're also talking a lot about in the industry today is carbon emissions, carbon footprint, energy neutrality. From a CO2 emissions standpoint the difference is also substantial: about 8,700 tons of CO2 released to generate all that electricity, versus 1,200 tons. We're simply taking the power consumption and relating it to CO2 emissions.
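As a rough back-of-the-envelope, here is one way that kind of cooling-cost and CO2 comparison can be sketched; the per-server power, cooling-overhead fractions, electricity price, and grid carbon intensity below are all assumptions chosen only to land in the same ballpark, not the actual inputs behind those figures.

```python
# Back-of-the-envelope cooling cost and CO2 comparison for a 10,000-server cluster.
# Every input below is an assumption for illustration, not measured data.

SERVERS         = 10_000
SERVER_KW       = 0.6          # assumed average IT power per server
HOURS_PER_YEAR  = 8_760
USD_PER_KWH     = 0.10         # assumed North American electricity rate
KG_CO2_PER_KWH  = 0.41         # assumed grid carbon intensity

# Assumed cooling energy as a fraction of IT energy
# (air: chillers/CRAHs plus hard-working server fans; DLC: pumps plus idle fans).
COOLING_FRACTION = {"air-cooled": 0.40, "direct liquid cooling": 0.06}

it_kwh = SERVERS * SERVER_KW * HOURS_PER_YEAR

for tech, frac in COOLING_FRACTION.items():
    cooling_kwh = it_kwh * frac
    cost_musd   = cooling_kwh * USD_PER_KWH / 1e6
    co2_tonnes  = cooling_kwh * KG_CO2_PER_KWH / 1_000
    print(f"{tech:>22}: ${cost_musd:.2f}M/yr for cooling, ~{co2_tonnes:,.0f} t CO2/yr")
```

Under these assumed inputs the air-cooled case lands near $2M and roughly 8,600 tons of CO2 per year, and the DLC case near $300K and roughly 1,300 tons, which is the order of magnitude described above.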
This one is always fun: talking about households. A lot of these big data centers today will use about the same power as 2,000 homes. That by itself is pretty large, but if we can reduce it to the equivalent of 280 homes, that's something we're a lot happier with for our energy consumption. And the last one, probably one of my favorites, is density; we talked about the triangle and why density is interesting. For this same system, 10,000 servers, using that power rate and some assumptions about how much we can actually pack into a rack, we would need almost five times as many server racks in an air-cooled configuration as we would with DLC. So that's a quick example of what this means: smaller data centers overall and greater efficiency.
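Here is a similar back-of-the-envelope for that rack-count claim; the per-rack power ceilings and per-server power are assumptions for illustration, not the actual sizing model, and only power (not rack units or weight) is considered.

```python
# Illustrative rack-count comparison for the same 10,000-server cluster.
# Per-rack power ceilings and per-server power are assumptions.
import math

SERVERS     = 10_000
SERVER_KW   = 0.6      # assumed average per-server power
AIR_RACK_KW = 17       # the air-cooled ceiling mentioned in the talk
DLC_RACK_KW = 80       # assumed fully populated liquid-cooled rack

def racks_needed(rack_limit_kw: float) -> int:
    servers_per_rack = math.floor(rack_limit_kw / SERVER_KW)
    return math.ceil(SERVERS / servers_per_rack)

air_racks = racks_needed(AIR_RACK_KW)   # 28 servers/rack -> 358 racks
dlc_racks = racks_needed(DLC_RACK_KW)   # 133 servers/rack -> 76 racks
print(f"air-cooled: {air_racks} racks, DLC: {dlc_racks} racks (~{air_racks / dlc_racks:.1f}x)")
```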
Now, why HPE? I always like showing this visual because it shows our involvement and what has been happening in the industry over long periods of time. Liquid cooling may seem like a new idea, but HPE, Cray, and a lot of our partners have been doing this for a long time. You can see, way back in the '60s, IBM was doing some interesting work with immersion, and where I start putting the bubbles on the timeline is where Cray Research entered the space: Cray was doing refrigeration-based cooling, Cray was doing immersion, and we were doing a lot of cold plate designs. Today we're a combination of all three of those companies: Cray, SGI, and HP have been building innovative liquid-cooled systems for decades, and that's really where we are today. Everything we do in HPC, high-performance computing, we trickle down to all of our enterprise applications: the same cold plate designs, the same CDUs, the coolant distribution units, the same kinds of manifolds. We test it and prove it in HPC and then roll it out to the broader market.

So, tangibly, what is liquid cooled today? I'll show you the servers and then the bigger hardware. In the ProLiant group, the DL360, DL365, DL380, and DL385 are available with liquid cooling; you can check them out at the booth today, where we have all the cold plate systems. In HPC, there are the XD2000 and the XD6500: the 2000 is a CPU-based platform and the GPUs are in the 6500. Those are some of the most popular AI systems we're shipping a lot of today; we can do four-way and eight-way systems, all liquid cooled, and they're going to be one of our front-runners. And then everything we've done in Cray treats this as table stakes: those systems are only liquid cooled and have no fans, and I'll show you a bit about them. That's where a lot of the industry is going: if we can, we want to reduce the reliance on air cooling in the server as much as possible and move to cold plate cooling, because of all the energy efficiency reasons I mentioned.

What's really exciting is that I have almost everything here on the HPC show floor; if you go toward the AI section, this stuff is there, you can touch it and feel it and see how big it is. We'll start on the right and move toward the left. When we think about liquid cooling today, the right side is really the perfect example: a lot of cold plates, coolant distribution units, no fans; this is the absolute best density and energy efficiency story. In the middle is HPE Cray XD and ProLiant today: we have a lot of flexibility because we're using direct liquid cooling on all the hottest components, the GPUs and CPUs, but air cooling for the DIMMs, capacitors, VRs, and all the other little parts. It's a bit cheaper, and it allows a lot more flexibility in SKUs and in what you'd like to put in the racks. And on the left is liquid-to-air cooling. This is where there are a lot of different opinions in the industry, so here's mine: anything that requires facility water, where the building needs water and it gets pumped to the racks, I consider liquid cooling. So rear door heat exchangers and ARCS, to me, are also liquid cooling; we're not using cold plates, but we are taking facility water, plumbing it directly to this hardware, and creating cold air very close to the IT. And if you're able to cross that barrier of adding liquid into the data center, you have a lot of options; all of this is in play.

Let's talk about the spread. Today, especially in enterprise, almost all racks are below 20 kW; 7 kW to 17 kW is where a lot of that IT sits. But where we're going with higher chip densities, 350- and 500-watt CPUs, we can quickly move into that 40 kW range, and AI is easily going to be between 60 and 80 kW depending on whether we fully populate racks. What is popular in HPC today is going to become very common in AI as well. Customers absolutely can buy one box or two boxes, but for maximum efficiency, putting a lot of equipment in a rack and building a smaller data center, we're going to create higher density racks, which means we'll need to think about cooling. You can see the spread of how these different technologies play at different power levels, but I think the sweet spot where a lot of AI systems are going to land is between 60 and 80 kW, which means you have a lot of choice in the cooling technologies you want to use.
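Purely to summarize those power bands, here is an illustrative rule of thumb; the thresholds echo the ones mentioned above, and the mapping is a simplification rather than official sizing guidance.

```python
# Rule-of-thumb mapping from rack density to cooling approach, using the
# power bands described above. Simplified and illustrative only.

def suggest_cooling(rack_kw: float) -> str:
    if rack_kw < 20:
        return "traditional room air cooling (typical enterprise racks today)"
    if rack_kw < 40:
        return "rear door heat exchanger to neutralize exhaust in an existing room"
    if rack_kw < 60:
        return "ARCS-style liquid-to-air pod, or DLC on the hottest components"
    if rack_kw <= 100:
        return "direct liquid cooling (~70% capture) plus rear door or ARCS for the rest"
    return "fully fanless DLC infrastructure (Cray EX class cabinets)"

for kw in (7, 17, 40, 70, 400):
    print(f"{kw:>4} kW rack -> {suggest_cooling(kw)}")
```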
So let's talk about each one, especially if you've never seen them before; again, you can go and touch them. Liquid-to-air cooling, rear door heat exchangers: the main takeaway, because I'm going to go through this quickly, is that rear door heat exchangers are all about neutralizing the hot air coming out of the rack, decreasing the thermal footprint of these solutions in the data center. Imagine today you're using 10 kW racks and the data center's air handling is designed to handle about that much, and you want to add some AI racks that are now 40 kW; 30 kW of additional heat can often overwhelm what the data center's air handling was designed for. Rear door heat exchangers take facility water in to cool down a large coil, the servers push their hot exhaust air through it, and it cools back down to a room-neutral temperature. As far as the data center is concerned, this has zero thermal footprint. You still need cold aisle management and hot aisle containment, because air is moving around, but it allows the rack to exist in an older data center without overburdening its air infrastructure. I always like to include a quick CFD model; it helps drive home how the technology really works.

Going one level up, we start talking about HPE ARCS, the Adaptive Rack Cooling System. Similar to rear door heat exchangers, we take facility water in and cool down a coil, but the major difference is that this generates cold air on the inlet side, and then the ARCS sucks in all the hot air. It's pretty similar to the in-row coolers you see in data centers today, where for every few racks there might be something making cold air, except those pump it out to the data center broadly. ARCS actually ties one cooling unit, sitting in the middle, to four racks, and they're contained in a pod-like structure. It's very quiet to walk near one of these, because they're acoustically sealed and they trap all of the heat. We've had customers deploy them in warehouses, very non-typical data center environments, because they need no external air handling. As before, for data centers struggling with air heat this also has zero thermal footprint from an air handling perspective, but we can go a lot higher in rack densities. In the quick visual you can see the ARCS unit creating cold air and providing it to the racks on both sides, left and right; the racks pull it in like a normal data center, but the ARCS also pulls in all of the exhaust heat and pushes that temperature rise into the water, so this is essentially a 100% liquid-to-air solution.

As we start talking about DLC, this is the really interesting stuff with cold plates, and it carries the really strong energy efficiency story. We have these on the floor; you can touch and feel them. As a general statement, we capture about 70% of the server heat into liquid, and 30% still needs to be handled by some kind of air handling, your hot aisle containment, or you can marry this with a rear door heat exchanger or ARCS; but about 70% goes into direct liquid cooling. Now, when we look inside this technology, here's my one thing to give you today, for when you're at the bar talking about direct liquid cooling: what's happening inside these cold plates. The technology most often used in the industry is skived-fin cold plates. Think about an air-cooled server today: you can look at any heatsink and visually count the fins; it's easy. Skived-fin cold plates also use fins, but they are much, much smaller; often we'll have around 100 fins per inch. When you look at these cold plates it can actually be difficult to see that there are gaps between the fins, but that density lets us create a massive surface area, and that's why liquid cooling is so effective: we take a very small piece of hardware and effectively make it very large, because a lot of liquid touches the fins, and that's where all the thermal transfer happens. So tonight you'll sound really smart talking about skived-fin cold plates and how they're made; it's worth looking up on YouTube. They really do take a copper block and a sharp blade and peel individual fins back, over and over and over again. It's not that complicated, but if you didn't know it, nobody else will either; it's my one gift to you for tonight.
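To see why 100 fins per inch matters, here is a minimal sketch of the wetted surface-area gain; the fin height and cold-plate footprint are assumed values, since real plate geometry varies by design.

```python
# Rough wetted-surface-area gain from a skived-fin cold plate.
# Fin geometry and footprint are assumed for illustration; real designs vary.

FINS_PER_INCH = 100        # fin density mentioned in the talk
PLATE_W_MM    = 60.0       # assumed wetted footprint width
PLATE_L_MM    = 60.0       # assumed wetted footprint length
FIN_HEIGHT_MM = 3.0        # assumed skived fin height

fins      = FINS_PER_INCH * (PLATE_W_MM / 25.4)
base_area = PLATE_W_MM * PLATE_L_MM                 # flat plate area, mm^2
fin_area  = fins * 2 * PLATE_L_MM * FIN_HEIGHT_MM   # two wetted faces per fin

print(f"fins: {fins:.0f}")
print(f"flat area  : {base_area / 100:.0f} cm^2")
print(f"finned area: {(base_area + fin_area) / 100:.0f} cm^2 "
      f"(~{(base_area + fin_area) / base_area:.0f}x more wetted surface)")
```

With these assumed dimensions the finned plate exposes on the order of 25 times more copper to the coolant than a flat block of the same footprint, which is where the thermal transfer comes from.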
Now, when we talk about the other building blocks inside every rack, there are going to be two loops of liquid. This is one of the things most folks want to learn about: how liquid is actually moved through these racks. The main takeaway is that there's a liquid cooling loop on the primary side, the building side; that's what the building is responsible for, and it's almost always water. What's happening inside the rack is its own closed loop, and all of the solutions we ship use propylene glycol: 25% PG, 75% water. The main reason we do that is long-term health; we want these systems to run very reliably, we don't want anything strange growing in them, and we can tackle that really easily with propylene glycol. The other piece in all of these systems is a coolant distribution unit. Mounted inside the rack is a CDU; they weigh about 150 lbs and are 4U in height. They manage the pumping, the temperature, a reservoir, and the pressure, so they're really the heart of any liquid cooling system, watching what's happening inside the racks.

What's really interesting is that for customers going even beyond 40 kW a rack, who want to go higher and want 100% heat capture to liquid, with no heat management left in the data center, we will often marry direct liquid cooling, which captures about 70% per rack, with an ARCS system, so ARCS manages the remaining 30% across all of those racks. Think about four 100 kW racks: we can take 70 kW of each of those racks into direct liquid cooling and then manage the other 30 kW with ARCS. That's an extreme example, and you can scale it up or down any way you'd like, but when we combine them, all of that heat ends up in facility water. That's a really good energy efficiency story and really high density.
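Here is a small sketch of that DLC-plus-ARCS sizing arithmetic, using the roughly 70/30 capture split and the four-rack example from above; the numbers are illustrative only.

```python
# Heat budget when pairing direct liquid cooling (DLC) with an ARCS unit,
# using the roughly 70/30 capture split from the talk. Illustrative only.

DLC_CAPTURE = 0.70   # fraction of rack heat taken by the cold plates

def split_heat(rack_kw: float, racks: int) -> tuple[float, float]:
    to_liquid = rack_kw * DLC_CAPTURE * racks
    to_air    = rack_kw * (1 - DLC_CAPTURE) * racks
    return to_liquid, to_air

# The four 100 kW racks from the example above, sharing one ARCS pod:
liquid_kw, air_kw = split_heat(rack_kw=100, racks=4)
print(f"cold plates carry {liquid_kw:.0f} kW; ARCS handles the remaining {air_kw:.0f} kW")
# Either way, 100% of the heat ends up in facility water.
```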
Now I'll touch on this briefly: we do a lot of very high-end systems, everything we've done in exascale, and this is that infrastructure; we have it on display as well. This is the Cray EX4000: each of these cabinets is 100% liquid cooled at 400 kW. That's about as dense as we go; very power hungry but very capable systems. We also build a smaller version, introduced just two years ago, and we have one on the floor to look at: the EX2500, single-rack, scalable, exascale technology, fully liquid cooled but shipped as a single rack, so a bit of a different spin on the technology. If you look at what these blades look like, there are no fans; there's not a single fan in this infrastructure, which allows us to reach very high energy efficiency. Antonio talked at the keynote about the TOP500: the majority of the top systems on that list are this infrastructure, but we're also dominating the top 5 and the top 100 of the Green500 list because they're so energy efficient. That's one of the nice things about liquid cooling: you get energy efficiency, performance, and density all as part of the product. And lastly, in Cray EX we also do liquid cooling for the fabric; all of the Slingshot switches are 100% liquid cooled as well, which is something interesting.

As I wrap up, one thing I always like to talk about a bit is immersion. Immersion is very popular right now; the industry is talking about it a lot. At HPE we primarily do direct liquid cooling, and the main reason is that those racks, for the most part, behave like an air-cooled rack: density grows vertically, we can build very dense racks very easily, service them, rack-and-roll them, and ship them to customers fully cabled, with all the hoses filled with liquid and all the networking in place. We can't do the same thing with immersion. We do have immersion offerings with customers today, but the primary systems we build use direct liquid cooling, and I believe that's going to be the future for the next several years.

Now, the last thing: how do you get into liquid cooling? It's not hard to get people excited about direct liquid cooling and liquid solutions, but after the excitement settles, the question is: how do we actually do this? Our data center has X capacity and we want to go beyond that. The first option, obviously, is retrofitting; it can be expensive, but it can be done. One thing about a lot of existing data centers is that there is liquid somewhere in the building, maybe being used for perimeter cooling or some kind of refrigerant loop, and often we can use a lot of that infrastructure to start tying into liquid cooling systems. The second thing we talk a lot about today is planning for new data centers, and that doesn't always have to be a brand new building with concrete and foundations: we do a lot of work with pods, data centers inside shipping containers, fully self-contained including their liquid-to-air heat exchange out to the atmosphere, and that's something really interesting we talk about. The third option is colocation. I believe this is where we're going to see very fast growth in the market: colocation data centers that enable very high-powered racks with liquid cooling. Like I said, I don't find it hard to get people excited about liquid cooling, and a lot of companies believe this is the path, but administratively it can be difficult to get a new building built and all the permits approved. If we can sell a system and fully deploy it in a colocation data center, that often removes some of the hurdles around where this 60-kilowatt amazing machine, with all its cooling and power requirements, is going to live. And the fourth option is not buying the hardware at all: we can do it through HPE cloud, or supercomputing as a service, where it's really a pay-per-use service and you don't have to own the hardware. So there are really a few ways you can play in this space.

I hope you enjoyed that fast and furious lesson on liquid cooling. Like I said, I'm here off to the side to answer any questions, and I hope you found it valuable. Thanks so much.