Breaking boundaries in advanced cooling: technologies of the future


Perfect. Hey, good morning. My name's Jason Zeiler, and I'm going to be talking today about breaking boundaries in advanced cooling: the technologies of the future that HPE and other groups in the industry are looking at. I'm the liquid cooling product manager and also the product manager for next-generation infrastructure, so I look at how we're planning for the future, from exascale down to enterprise and everything in between.

I'm going to level set right at the beginning. A thing we always talk about is expectations for energy usage and what's happening in the data center. We always like to jump back to 2006, when the EPA released a report essentially saying that data center power was going to rise at a dramatic pace. That really set off the alarm bells for the DOE and for anyone who generates power: they were looking at data centers as major consumers of power around the globe, and even today you hear a lot of that in the news. The trend line was showing some alarming rates; data centers were going to have runaway energy usage, and we were going to have to do something about it.

The nice thing is that when we jump ahead to 2016, ten years later, we could see that the trend line had really plateaued. It flattened out, and this was the result of two driving factors. One was multi-core technology: there was a lot of innovation happening at the chip level to reduce energy usage, so we were able to produce much more energy-efficient chips and not have the same kind of runaway power issues in the data center. But data center innovation itself was also happening, with more innovation around in-row coolers, free cooling, and adiabatic cooling. There was a lot of change happening in the data center.

Where we are today looks very similar to that plateaued line, but there is a major change happening in the data center, and it's the yellow line I'm about to show: the power wall trend. Again, we see single-core technology from 2000 to 2010 and multi-core from 2011 to 2017, but where we're going today is much different. Our friends at AMD, Nvidia, and Intel have some amazing new products coming out, but there are power implications that we are seeing. You see it in the headlines with exascale technology and very high-end supercomputers, but this is also an important issue for enterprise: one or two servers inside a rack can generate a tremendous amount of heat, and we're going to talk a bit about that.

When we talk specifically about GPU and CPU power today, and even in the last generation of chip technology, a high-end CPU might be in the 200 to 250 watt range; those would be high-TDP processors. Next generation, we're going to see CPUs beyond 500 watts. GPUs today max out at around 500 to 700 watts, and we're going to see GPUs exceed 1,000 watts in the next five years. That by itself is not a major surprise; TDP has been increasing at a pretty steady rate for the last decade. What is changing is how the chips are being designed. Some amazing technology is going into these chips, and it is increasing the importance of cooling while shrinking their thermal headroom. The line I show here is the silicon temperature, the T-case; in layman's terms, this is the maximum case temperature that these CPUs and GPUs can withstand. Today a pretty typical T-case might be around 95 Celsius: a CPU can get very hot and still run, nothing blows up, everything runs very smoothly. In next-generation technology there are going to be chips, not everything, but a lot of chips, with a much lower T-case. I've seen T-cases as low as 55 Celsius, and a 40 degree drop is significant.

If you do some quick napkin math, imagine a purely air-cooled data center today where the facility air temperature is, say, 25 to 30 Celsius. If the T-case used to be 90, that's an enormous gap between those two temperatures; the CPU could get pretty hot, and did you really need the most efficient cooling technology? Maybe not. But if the T-case is 55 to 60, that gap is much narrower, and we need much more effective cooling to capture the heat and carry it away.
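To put some numbers on that napkin math, here is a small sketch of the case-to-inlet thermal resistance a cooling solution has to achieve; the wattages and inlet temperature are illustrative assumptions, not vendor specifications.

```python
# Napkin math: required case-to-inlet thermal resistance (K/W) for a chip.
# All numbers are illustrative assumptions, not vendor specifications.

def required_thermal_resistance(t_case_c, t_inlet_c, power_w):
    """Maximum allowed case-to-inlet thermal resistance, in K/W."""
    return (t_case_c - t_inlet_c) / power_w

inlet_air_c = 25.0  # typical air-cooled facility supply temperature

# Current-generation part: ~95 C T-case at ~250 W
print(required_thermal_resistance(95, inlet_air_c, 250))   # 0.28 K/W

# Next-generation part: ~55 C T-case at ~500 W
print(required_thermal_resistance(55, inlet_air_c, 500))   # 0.06 K/W
```

The second figure is several times tighter than the first, which is the gap that becomes very difficult and expensive to close with air alone.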
And that's why I built this box labelled "unsustainable with air cooling." A lot of this technology is still going to be possible to cool with air, but it is going to be very difficult and very expensive. So today's talk is really about liquid cooling, and this is where we want to spend our time: what are the value propositions of liquid cooling, where is the industry going, and why are we doing this?

Traditionally this has been an easy sell for HPC, where performance has been the name of the game: we can run very high-TDP processors in turbo mode for very long periods of time. But what we talk a lot about today is efficiency, especially for any CFOs or financial folks in the audience. This is all about OpEx: how can we reduce electricity costs throughout the data center, every day? And this is really about cooling. When we talk about PUE, power usage effectiveness, it is the total energy used measured against the energy used for compute, and we want those to be as close as possible; with liquid cooling we can get very close. Another important element is density. In the simplest terms, if we can build more compact, higher-density, fully populated racks, we can build smaller data centers overall, and that's more of a CapEx story: how can we build better data centers, lower electricity usage, and capture more heat?
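Since PUE is the metric behind that OpEx story, here is a minimal sketch of the calculation; the facility overheads are hypothetical, chosen only to show the shape of the comparison.

```python
# Power Usage Effectiveness: total facility power divided by IT power.
# The closer to 1.0, the less energy goes to anything other than compute.
# The example loads below are hypothetical.

def pue(it_kw, cooling_kw, other_kw=0.0):
    total_kw = it_kw + cooling_kw + other_kw
    return total_kw / it_kw

# Hypothetical air-cooled hall: chillers plus CRAH fans are a large overhead.
print(round(pue(it_kw=1000, cooling_kw=450, other_kw=50), 2))   # 1.5

# Hypothetical liquid-cooled hall: warm-water loops and fewer, smaller fans.
print(round(pue(it_kw=1000, cooling_kw=80, other_kw=50), 2))    # 1.13
```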
What I want to focus on now, because we talk a lot about the value propositions of liquid cooling, is some tangible pictures: what is liquid cooling, and what are some of the things to be thinking about short term and long term? On the left are technologies that are very prevalent today, and all of these require facility water. This is liquid-to-air cooling: it essentially takes facility water and creates cold air, similar to data center technologies today. These don't require any special servers; they are air-cooled servers, and we're just delivering cold air very close to the server. We're going to talk about each one of these individually. On the far right is fanless direct liquid cooling, all cold plates; this is what we find in the Cray EX today, exascale computing, the very highest energy efficiency and very high-TDP processors. In the middle is a hybrid, or a mix, where we do direct liquid cooling on about 70 percent of the components and use air for the rest, which gives us a tremendous amount of flexibility in which components we choose. And then also in the middle here is immersion, where servers are fully immersed in a dielectric fluid, capturing all of the heat, but it has its own pros and cons.

So, quickly, on each one, especially if these are newer technologies to you. Rear door heat exchangers are exactly that: they mount on the rear of a rack, and they're all about neutralizing the hot air that exits the rack. If you can imagine the cold aisle infrastructure, the data center is already making cold air, and it's being pushed through the front of the racks like normal. Where that red arrow appears, that's the hot air coming out of the servers; it's then immediately neutralized through the rear door. The main value proposition for rear door heat exchangers is reducing the thermal footprint in the data center. If your data center already has tremendous strain on its air-cooled infrastructure, the CRACs or CRAHs around the perimeter, this lets you deploy some of these higher-density racks with better energy efficiency and very little thermal footprint on the data center. What it doesn't do is provide inlet air, so you still have to put this in a very standard data center location in order to have liquid cooling.

ARCS is, I would say, rear doors taken to the next level. ARCS, very similar to rear doors, takes facility water into a coil; you can see the CFD model spinning at the top here. But unlike a rear door, which only neutralizes the back end, it is also creating cold air on the front, which gets pulled into the servers, and then the hot air is pulled back into the rear of the ARCS. It's also a highly energy-efficient solution, and you don't have to worry about perimeter cooling in the same way. Another nice thing is that these can be deployed in very non-typical data center environments, because they don't require any air handling outside of the racks. They're completely sealed, with acoustic foam, so they're quiet to work beside, and they capture all of the heat.

When we start talking about direct liquid cooling, the way HPE and much of the industry deploys these technologies is in closed-loop racks. You can see this rack itself is completely closed; it contains a 25 percent propylene glycol mix, a very stable fluid with excellent biocide protection and excellent corrosion protection, but it is one closed-loop system. On the bottom is the primary side: this is the facility side, and it is most often water, though it could be ethylene glycol. The water from the primary side is pumped from the facility into the bottom of the rack, into a CDU, a coolant distribution unit. An important concept to always get across is that these two fluids never mix. They interface through a plate-to-plate heat exchanger, which is where all the heat exchange happens, but they are completely separate fluids. Why this is so important comes back to some of the points I highlighted here: we can rack and roll these solutions, and I think that's one of the big value propositions liquid cooling is leaning into in this generation of HPC, enterprise, and exascale. We can build these systems in a factory, fully cable them, fill them with fluid, test them, and ship them direct to customers, versus scratch-building things on site. That's a trend I really see: deploying more modular technology that's ready to go. You plug in power, networking, and water, and these systems run.
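As a rough feel for what the CDU on that closed loop has to move, here is a sketch that sizes the secondary-loop flow from the rack heat load and the loop temperature rise; the propylene glycol properties and the rack figures are approximate assumptions.

```python
# Rough CDU secondary-loop sizing: flow needed to carry a rack's heat load.
# Fluid properties for a 25% propylene glycol / water mix are approximate,
# and the rack loads and temperature rise are assumptions for illustration.

RHO_KG_M3 = 1020.0    # approximate density of a 25% PG mix
CP_J_KGK = 3900.0     # approximate specific heat of a 25% PG mix

def required_flow_lpm(heat_kw, delta_t_k):
    """Volumetric flow (litres per minute) to absorb heat_kw at a delta_t_k rise."""
    mass_flow_kg_s = (heat_kw * 1000.0) / (CP_J_KGK * delta_t_k)
    m3_per_s = mass_flow_kg_s / RHO_KG_M3
    return m3_per_s * 60.0 * 1000.0

# A 40 kW rack with a 10 K rise across the loop
print(round(required_flow_lpm(40, 10), 1))    # ~60 L/min

# A 200 kW fanless rack at the same rise needs roughly five times the flow
print(round(required_flow_lpm(200, 10), 1))   # ~300 L/min
```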
On direct liquid cooling itself, the image I'm showing here is one of the Cray XD servers, and it looks very similar for ProLiant liquid cooling. Essentially, the aluminum air heat sinks are no longer used. We take copper cold plates (I'll show you what a copper cold plate looks like on the inside) and put them right on top of the CPUs and GPUs to capture the majority of the heat, and sometimes we also have the option for memory cooling, depending on how much heat capture the end user is looking for. In a nutshell, the difference with or without memory cooling is about 10 percent: CPU cooling on its own can capture about 65 percent of the total server heat load into the cold plates, and with memory cooling we can add about 10 percent more.

One of the big things we always talk about with liquid cooling is that it has to be resilient and very easy to use, so everywhere we use these direct liquid cooling systems we deploy dripless quick disconnects. They can be connected and disconnected with one hand, with no drips within the racks, and they let you service the servers very easily. Beyond the inlet and outlet lines, servicing is exactly the same as for an air-cooled server, and the tubes are all flexible, so they can be rotated out of the way to access the CPU, the GPU, and the memory as well.

As for the cold plates themselves, this is one of the things I like to show because you can all leave here feeling a little bit smarter and talk about it at lunch: what's inside a direct liquid cooled system? The technology that is most prevalently used is called skiving. It sounds like a fancy word, but essentially we take a copper block and a very sharp blade and peel back very thin fins, over and over and over again. If you think of what an air heat sink looks like, the gap between the fins is quite large; with skived fins we can often get about 100 to 150 fins per inch, so they are extremely dense, and that lets us create a tremendous amount of surface area for the coolant to capture all of the heat. This is what is inside of most liquid cooling systems that use cold plates. There are other vendors that use different technology, but this is what is inside the Cray EX today, inside the Cray XD, and inside ProLiant. So skiving is a bit of a buzzword you can take away today.
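To see why skiving buys so much surface area, here is a back-of-the-envelope sketch. The plate footprint and fin height are illustrative guesses, the fins-per-inch figure comes from the talk, and fin tips and the base between fins are ignored.

```python
# How much wetted area skived fins add versus the bare cold plate footprint.
# Plate size and fin height are illustrative guesses; 100-150 fins per inch
# is the range mentioned in the talk.

MM_PER_INCH = 25.4

def finned_area_mm2(width_mm, length_mm, fins_per_inch, fin_height_mm):
    """Approximate wetted area of a skived fin field (ignores fin tips and base)."""
    n_fins = int(width_mm * fins_per_inch / MM_PER_INCH)
    per_fin = 2.0 * fin_height_mm * length_mm   # two faces per fin
    return n_fins * per_fin

flat = 40.0 * 40.0                               # bare 40 mm x 40 mm plate
finned = finned_area_mm2(40.0, 40.0, 120, 3.0)   # ~45,000 mm^2
print(round(finned / flat, 1))                   # roughly 28x more area
```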
Immersion is a technology we see a lot of today. One of its big advantages is 100 percent heat capture: we can fully dunk servers into immersion tanks and capture all of the heat. They're also very quiet, so similar to ARCS, it's a really easy system to work alongside. The downside is that these are not off-the-shelf servers. In every case we have to customize them: take out some components, avoid spinning disks, and check compatibility down to the cables, even networking cables. How are these cables going to live long term, what are the brittleness factors, what is compatible? So there is a lot more complexity with immersion systems today. HPE offers no off-the-shelf immersion systems, but we do have a wonderful OEM group that works directly with customers when they are motivated by an immersion vendor they want to work with; we can have a three-legged process with our OEM group to retrofit the servers and work within their immersion environment. But immersion is not one of the technologies HPE offers off the shelf today.

I'll finish by talking about fanless direct liquid cooling. This is where I think a lot of the industry is going, and it is, I would say, the holy grail, though there are trade-offs as we work towards it. It's really wonderful that we have one of these demo systems here; over at the compute area we have one of our HPC Cray EX systems that is pretty much this exact system. There are no fans within the system, 200 kilowatt rack densities, very dense, all liquid cooling. The pro is that it is obviously extremely quiet to work alongside, you can have a very normal discussion next to it, and power supplies, CPUs, memory, everything is liquid cooled. It is also very heavy, and that is one of the trade-offs of these systems: as data center operators start designing for liquid cooling, some of the infrastructure becomes a bit heavier, so that's something we always have to consider.

The other thing is flexibility in the design, and that's where we'll jump with this shot here. As we go towards higher energy efficiency and more liquid cooling, there is often a trade-off in design parameters. For customers with fixed design parameters, who know where they're going and are designing full racks at a time, rack-and-roll direct liquid cooling works wonderfully, because we can do all the work in advance, have excellent thermal performance, and it just works really well within the data center. The trade-off is that you have a lot less flexibility in picking components and rapidly changing things out. For example, if you are playing with PCIe cards or different storage form factors, liquid cooling really doesn't work in a way that lets you swap them all out and run a heterogeneous setup within one rack. That is one of the issues we continue to explore: today, direct liquid cooling is really a homogeneous environment, where racks are all the same thing, either all compute or all accelerator. But we are starting to see more development, especially within our groups, towards very heterogeneous setups: how can we control flow, thermals, and even density within a rack?

That brings me to the areas I wanted to end on today: what are we looking towards for the future of liquid cooling across the industry? These really apply to exascale, enterprise, and everything in between. Customers are very concerned about increasing thermals. I started with the shock and awe at the beginning: 500 watt CPUs and 1,000 watt GPUs are quite concerning, especially if your groups are looking to use top-TDP processors. But even at the bottom end, 200 watt CPUs are going to become table stakes very quickly, and when you fully populate a rack, that's a lot of heat; we can be in the 40 kilowatt space very quickly. Increasing chip densities are one of the discussions we have every day; the rest of my day today will be filled with meetings specifically about this.
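As a quick sanity check on how fully populated racks reach that 40 kilowatt space, here is a napkin-math sketch; the server counts and per-node overheads are assumptions for illustration.

```python
# Why "table stakes" 200 W CPUs still add up to tens of kW per rack.
# Server configuration and counts are assumptions for illustration.

def rack_power_kw(servers, cpus_per_server, cpu_w, other_w_per_server):
    """other_w_per_server covers memory, drives, NICs, fans, PSU losses, etc."""
    per_server_w = cpus_per_server * cpu_w + other_w_per_server
    return servers * per_server_w / 1000.0

# 32 dual-socket 1U servers, 200 W CPUs, ~400 W of everything else per node
print(rack_power_kw(32, 2, 200, 400))   # 25.6 kW

# Same rack with 350 W CPUs and a bit more memory and I/O per node
print(rack_power_kw(32, 2, 350, 500))   # 38.4 kW
```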
Rising OpEx and CapEx is another one that's really important. OpEx is quite simply electricity usage: how do we power the fans that are going to be cooling these 50, 60, 70 kilowatt racks? That gets a little bit easier with liquid cooling. But there is also CapEx: how are we going to build these data centers, which are quite expensive and use very expensive chips? Land is becoming more expensive, too. When we talk with our friends in Europe, there is not a growing amount of real estate to build on, so the name of the game is capitalizing on what we have, and even in the U.S. that is becoming a more prevalent discussion. Universities and research centers are not always willing to move to another state just so they can build; they want to work within their existing land leases and see what they can get out of that real estate, and also what they are budgeted for power. That's been an important topic.

Maintenance has been a very big discussion as we start adding liquid cooling to systems. If you have one liquid-cooled rack, it's quite easy to take care of. What happens when you have 100 liquid-cooled racks? At that density the health of the coolant is really important: how do we take a 25 percent propylene glycol mix and ensure its health across 100 racks and hundreds of gallons? That's something we're starting to explore with customers entering the liquid cooling space for the first time. Quite honestly, once they cross that barrier it becomes table stakes; data centers build out their planning and their health and safety manuals around how to handle liquid cooling, which is an inert, very safe liquid, but it still requires a new level of training.

The other one is the lack of infrastructure standards. When you're going to buy or build many, many racks with liquid cooling, over time you may want to mix different vendors, so how are you going to do that? The one thing I would say with confidence is that vendors will not mix within the rack, but vendors will mix within the data center. You could have OEM vendor one and OEM vendor two in different racks within the same data center, but within the rack itself I foresee no mixing, and warranty, the big W, is going to be the biggest point driving that.

I'll end today on what customers are going to require. For customers already using direct liquid cooling, the temperature of their facility water is very commonly between 17 and 27 Celsius. Here in the United States, customers want to go beyond 32 Celsius; in Europe, they will not answer our bids unless it is 40 Celsius, which is very hot. The reason is energy usage: if we can substantially decrease our reliance on chillers, it makes an enormous difference in electricity usage, and it opens up heat reuse. Can we reuse that heat, pump it somewhere else in the building? That's a very interesting use case. There is also more room-neutral cooling: how can we ensure that whatever we add for more racks in the data center doesn't have a significant impact on the other racks that are there? When we talk about rear doors or ARCS, that's the name of the game, ensuring there is no additional thermal footprint.

So I'll end there today. I would love to have any follow-up chats; we're chatting right on the side, in our session over here, so if you have any questions, please follow up. Thanks for attending the session.
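As a footnote to the facility-water temperatures mentioned at the end of the talk, here is a rough sketch of why warmer supply water matters for chiller reliance and heat reuse; the supply temperature, rack load, and flow rate are assumed for illustration.

```python
# Warm-water cooling and heat reuse, as rough numbers.
# Supply temperature, rack load, and flow are assumptions for illustration.

CP_WATER = 4186.0   # J/(kg K)

def return_temp_c(supply_c, heat_kw, flow_kg_s):
    """Facility-water return temperature after absorbing heat_kw."""
    return supply_c + (heat_kw * 1000.0) / (flow_kg_s * CP_WATER)

# A 200 kW rack fed with 40 C facility water at ~5 kg/s (about 300 L/min)
print(round(return_temp_c(40.0, 200.0, 5.0), 1))   # ~49.6 C

# Water returning near 50 C can often be rejected with dry coolers instead of
# chillers, and is warm enough to be worth recovering for building heat.
```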

2023-06-23
