John Fanelli and Maurizio Davini Dell Technologies CUBE Conversation October 2021


(techno beats) >> Hello, and welcome to this special CUBE Conversation here in Palo Alto, California. I'm John Furrier, host of theCUBE. We have a conversation around AI for the enterprise and what it means. We've got two great guests: John Fanelli, vice president, Virtual GPU at NVIDIA; and Maurizio Davini, CTO, University of Pisa in Italy. Practitioner, customer, partner.

Got VMworld coming up, a lot of action happening in the enterprise. John, great to see you. Maurizio, nice to meet you remotely, coming in from Italy for this remote conversation. >> Hey John, thanks for having us on again. >> Nice to meet you! >> I wish we could be in person face-to-face, but that's coming soon, hopefully.

John, we were just talking before we came on camera about AI for the enterprise. And the last time I saw you in person was in a CUBE interview, you were talking about some of the work you guys were doing in AI. It's gotten so much stronger and broader in the execution of NVIDIA and the success you're having.

Set the table for us. What is the AI for the enterprise conversation frame? >> Sure. So we've been working with enterprises today on how they can deliver AI or explore AI or get involved in AI in a standard way, in the way that they're used to managing and operating their data center running on top of their Dell servers with VMware vSphere, so that AI feels like a standard workload that an IT organization can deliver to their engineers and data scientists. And then the flip side of that, of course, is ensuring that engineers and data scientists get the workloads positioned to them or have access to them in the way that they need them.

So it's no longer, you know, a trouble ticket that you have to submit to IT and count the hours or days or weeks until you can get new hardware. By being able to pull it into the mainstream data center, IT can enable self-service provisioning for those folks. So we make AI more consumable or easier to manage for IT administrators. And then for the engineers and the data scientists, etc., we make it easy for them to get access to those resources so they can get to their work right away.

>> Quite a lot of progress in the past two years. Congratulations on that, and it's only the beginning; it's day one. Maurizio, I want to ask you what's going on as the CTO of the University of Pisa, what's happening down there? Tell us a little bit about what's going on. You have this center of excellence there. What does that mean? What does that include? >> You know, the University of Pisa is one of the biggest and oldest in Italy.

To give you some numbers, it's around 50,000 students and 3,000 staff, between professors, researchers, and administrative staff. So we run the data center operations that support the university, especially scientific computing. And this is our daily work, let's say, and it takes a lot of our time.

But you know, we are able to reserve a percentage of our time for R&D. And this is where the center of excellence comes in. So we are always looking into new kinds of technologies that we can put together to build new solutions, to do next-generation computing, as we always say.

We are looking for the right partners to do things together. And at the end of the day, it's work that is good for us, good for our partners, and typically ends up in a production system for our university. So it's the evolution of the scientific computing environment that we have. >> Yeah, and you guys have a great track record and reputation for R&D, testing software-hardware combinations, and sharing those best practices.

You know, with COVID impacting the world, certainly we see it in the supply chain side. And John, we heard Jensen, your CEO at NVIDIA, talk at multiple keynotes now about software, NVIDIA being a software company. Dell, you mentioned Dell and VMware, you know, COVID has brought this virtualization world back, and now hybrid! (laughs) Those are words that we used basically in the tech industry; now you're hearing "hybrid" and "virtualization" kicked around in the real world. So it's ironic that, you know, VMware and Dell and theCUBE, eventually all of us together, are doing more virtual stuff.

So with COVID impacting the world, how has that changed you guys? Because software is more important; you've got to leverage the hardware you've got, whether it's Dell or in the cloud. This is a huge change. >> Yeah, so as you mentioned, organizations and enterprises, you know, they're looking at things differently now. The idea of hybrid, when you talk to tech folks and we think about hybrid, we always think about, you know, how the different technologies work together.

What we're hearing from customers is "hybrid," you know, effectively translates into two days in the office, three days remote, you know, in the future, when they actually start going back to the office. So hybrid work is actually driving the need for hybrid IT, or the ability to share resources more effectively, and to think about having resources wherever you are. Whether you're working from home or you're in the office that day, you need to have access to the same resources. And that's where the ability to virtualize those resources and provide that access makes that hybrid part seamless. >> Maurizio, your world has really changed; you have students and faculty. Things used to be easy in the old days: physical, this network, that network. Now virtual is here; you must really be seeing a heavy impact.

>> Yeah, of course, as you can imagine, there was a big impact on every kind of IT offering: designing new networking technologies, deploying new networking technologies, new kinds of operations. We found that we were no longer able to do bare-metal operations directly. But from the IT point of view, we were, how can I say, prepared, in the sense that for three or four years we have been running a parallel environment: we have bare metal and virtual. So, as you can imagine, a traditional bare-metal HPC cluster, DGX machines, multi-GPU systems, and so on. But in parallel we have developed a virtual environment that at the beginning was, as you can imagine, used for traditional enterprise applications or VDI. We have a significant Horizon farm for remote desktops and remote workstations that we are using, for example, for virtual classrooms and virtual workstations.

And so this is the typical operation that we did in the virtual world. But on the same infrastructure, we were able to develop HPC in the virtual world first: virtualization of the HPC resources for our researchers. And in the end, an AI offering and AI software for our researchers. You can imagine our virtual infrastructure as a sort of whiteboard, where we are able to design new solutions quickly, without losing too much performance. And in the case of AI, we will see that the performance is almost the same as bare metal, but with all the flexibility that we needed in the COVID-19 world, and in the future world too.

>> So a couple of things there; I wanted to get John's thoughts as well. Performance, you mentioned. You mentioned hybrid, virtual; how do VMware and NVIDIA fit into all this as you put it together? Because you bring up performance, and that's now table stakes. Scale and performance are really on the table; everyone's looking at it.

How do VMware and NVIDIA, John, fit in with the university's work? >> Sure. So I think you're right. When it comes to enterprises, or mainstream enterprises, beginning their initial foray into AI, performance and scale, of course, and also ease of use and familiarity all come into play when an enterprise starts to think about it. And we have a history with VMware of working on this technology.

So, in 2019, we introduced our Virtual Compute Server with VMware, which allowed us to effectively virtualize the CUDA compute driver. At last year's VMworld in 2020, the CEOs of both companies got together and made an announcement that we were going to bring AI, our entire NVIDIA AI platform, to the enterprise on top of vSphere. And we did that starting in March this year.

We finalized that with the introduction of VMware's vSphere 7, Update 2, and the early access at the time of NVIDIA AI Enterprise. And we have now gone to production with both of those products. And so, you know, customers like the University of Pisa are now using our production capabilities.

And whenever you virtualize, in particular in something like AI, where performance is really important, the first question that comes up is, does it work? And how quickly does it work? Or, you know, from an IT audience, a lot of times you get the, "How much did it slow down?" And so we've worked really closely from an NVIDIA software perspective and a VMware perspective. And we really talk about NVIDIA AI Enterprise with vSphere 7 as "optimized, certified, and supported." And the net of that is we've been able to run the standard industry benchmarks for single-node as well as multi-node performance with maybe a 2% degradation in performance, depending on the workload, of course; it varies. But effectively you're able to trade that performance for the accessibility, the ease of use, and even, you know, using things like vRealize Automation for self-service for the data scientists. And so that's kind of how we've been pulling it together for the market.

>> Great stuff. Well, I've got to ask you, I mean, people have that reaction about the performance; I think you're being polite in how you said that. It shows the expectation; people are kind of skeptical.

And so I've got to ask you, the impact of this is pretty significant. What is it now that customers can do that they couldn't, or didn't feel they could, before? Because if the expectation was, well, does it work well? I mean, does it go fast? It means it works, but performance is always a concern. What's different now? What's the bottom-line impact on what customers can do now that they couldn't do before? >> So the bottom-line impact is that AI is now accessible for the enterprise across what I would call their mainstream data center. Enterprises typically use consistent building blocks, like, you know, the Dell VxRail products, right? Where they have to use servers that are common and standard across the data center. And now with NVIDIA AI Enterprise and VMware vSphere, they're able to manage their AI in the same way that they're used to managing their data center today.

So there's no retraining, there are no separate clusters, there isn't a shadow IT. So this really allows an enterprise to deploy AI efficiently and cost-effectively, because there's no performance degradation, without compromising what their data scientists and their researchers are looking for.

And then the flip side is, for the data scientists and researchers, using some of the self-service automation that I spoke about earlier, they're able to get a virtual machine today that maybe has half a GPU. As their models grow and they do more exploring, they might get a full GPU, or two GPUs, in a virtual machine. And their environment doesn't change, because it's all connected to the backend and storage. So for the developer and the researcher, it makes it seamless. So it's really kind of a win for both IT and the user.
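The growth path John describes, half a GPU, then a full GPU, then two GPUs per VM, can be sketched as a toy allocator. This is a hypothetical illustration only (the class, method, and VM names are made up): real vGPU sharing uses fixed NVIDIA vGPU profiles assigned through vSphere, not arbitrary fractions granted by Python code.

```python
# Toy model of fractional GPU allocation to virtual machines.
# Hypothetical sketch: real deployments use fixed vGPU profiles.

class GpuPool:
    def __init__(self, num_gpus):
        self.free = [1.0] * num_gpus   # capacity left per physical GPU
        self.assignments = {}          # vm name -> [(gpu index, share)]

    def allocate(self, vm, share):
        """Grant `vm` a slice of a GPU (share < 1) or whole GPUs (share >= 1)."""
        needed = share
        grants = []
        for i, cap in enumerate(self.free):
            if cap <= 0:
                continue
            # a fractional slice must fit entirely on a single GPU
            if needed < 1.0 and cap < needed:
                continue
            take = min(cap, 1.0, needed)
            self.free[i] -= take
            grants.append((i, take))
            needed -= take
            if needed <= 1e-9:
                self.assignments.setdefault(vm, []).extend(grants)
                return True
        for i, take in grants:         # roll back a partial grant on failure
            self.free[i] += take
        return False

pool = GpuPool(num_gpus=2)
pool.allocate("ds-vm-1", 0.5)   # a data scientist starts with half a GPU
pool.allocate("ds-vm-1", 0.5)   # the model grows: another half
pool.allocate("ds-vm-2", 1.0)   # a second VM takes a whole GPU
```

The point of the sketch is the seamless growth path: the VM keeps its identity and its connection to backend storage while its GPU share changes underneath it.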

And again, the University of Pisa is doing some amazing things in terms of the workloads that they're running, and they are validating that performance. >> Maurizio, weigh in on this; share your opinion on, or your reaction to, what you can do now that you couldn't do before. Could you share your experience? >> Our experience is, of course, that if you go to your data scientists or researchers, the idea of sacrificing performance for flexibility is, at the beginning, not so well accepted. It's okay for IT management. As John was saying, you have people who know how to deal with the virtual infrastructure, so nothing changed for them.

But at the end of the day, we were able to verify with our data scientists and researchers that the performance was almost the same, really around 95% of bare-metal performance, for our internally developed workloads. So we are not dealing with benchmarks; we have some workloads that are internally developed and applied to healthcare, a music generator, and some other strange projects that we have inside. And we were able to show that the performance in the virtual and bare-metal worlds was almost the same, with the addition that in the virtual world you are much more flexible. You are able to reconfigure everything much faster. You are able to design solutions for your researchers in a more flexible, more effective way.

We were able to use the latest technologies from Dell Technologies and NVIDIA, as you can imagine: the latest PowerEdge servers, the latest network cards from NVIDIA, like the BlueField-2, the latest switches, to set up an infrastructure that at the end of the day is a winning platform for our data scientists. >> Great; it's a great collaboration. Congratulations, it's exciting.

Get the latest and greatest and get the new benchmarks out there, new playbooks, new best practices. I do have to ask you, Maurizio, if you don't mind me asking: why look at virtualizing AI workloads? What was the motivation? >> For the sake of flexibility. Because you know, in the last couple of years, AI resources are never enough. So if you go for bare-metal installation only, you are going into a world that is developing very fast, and of course you cannot afford all the bare-metal infrastructure that your data scientists are asking for.

So we decided to integrate our virtual infrastructure with AI resources, in order to be able to use them in different, more flexible ways. Of course, we have two parallel worlds. We still have a bare-metal infrastructure.

We are growing the bare-metal infrastructure. But at the same time, we are growing our virtual infrastructure, because it's flexible, because our staff are happy about how the platform behaves, and they know how to deal with it. So they don't have to learn anything new.

So it's a sort of comfort zone for everybody. >> I mean, no one ever got hurt virtualizing things. It makes things go better and faster, building on existing workloads. John, I've got to ask you, you're on the NVIDIA side, you see this real up close with NVIDIA. Why do people look at virtualizing AI workloads? Is it the unification benefit? I mean, AI implies a lot of things.

It implies you have access to data. It implies that silos don't exist. I mean, that's hard. I mean, is this real, are people actually looking at this? How's it working? >> Yeah.

So again, for all the benefits and activity that AI brings, AI can be pretty complex, right? It's complex software to set up and to manage. And with NVIDIA AI Enterprise, we're really focusing in on ensuring that it's easier for organizations to use. For example, I'd mentioned we had introduced our Virtual Compute Server, VCS, two years ago, and that has seen some really interesting adoption in some enterprise use cases. But what we found is that at the driver level, it still wasn't accessible for the majority of enterprises. And so what we've done is we built upon that with NVIDIA AI Enterprise. And we're bringing in prebuilt containers that remove some of the complexities.

You know, AI has a lot of open-source components. And trying to ensure that all the open-source dependencies are resolved, so you can get the AI developers and researchers and data scientists actually doing their work, can be complex. And so what we've done is we've brought these prebuilt containers that allow you to do everything from your initial data preparation and data science, using things like NVIDIA RAPIDS, to your training, using PyTorch and TensorFlow, to optimizing those models using TensorRT, and then deploying them using what we call NVIDIA Triton Inference Server, our inference server. So really helping make that AI loop accessible, and that AI workflow something an enterprise can manage as part of its common core infrastructure. >> Yeah, having the performance and the tools available, it's just a huge godsend; people love that.
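The container-based workflow John outlines, data prep, then training, then optimization, then serving, can be sketched as a pipeline of stages. The functions below are pure-Python stand-ins (all names and logic are hypothetical, not the real RAPIDS, PyTorch, TensorRT, or Triton APIs); they just make the shape of the flow concrete and runnable anywhere.

```python
# Conceptual sketch of the four-stage enterprise AI workflow.
# Each function is a toy stand-in for the container John names.

def prepare(raw):
    """Data prep stage (RAPIDS in the real stack): drop nulls, min-max scale."""
    cleaned = [x for x in raw if x is not None]
    lo, hi = min(cleaned), max(cleaned)
    return [(x - lo) / (hi - lo) for x in cleaned]

def train(data):
    """Training stage (PyTorch/TensorFlow): here, just fit the mean."""
    return {"mean": sum(data) / len(data)}

def optimize(model):
    """Optimization stage (TensorRT): e.g. reduce numeric precision."""
    return {"mean": round(model["mean"], 3)}

def deploy(model):
    """Serving stage (Triton): return a callable 'endpoint'."""
    return lambda x: abs(x - model["mean"])   # toy anomaly score

# The whole loop, end to end:
endpoint = deploy(optimize(train(prepare([3, None, 1, 2, 5]))))
score = endpoint(0.9)   # distance of a new point from the learned mean
```

The design point the prebuilt containers address is exactly the seams between these stages: in the real stack, each hand-off drags in open-source dependencies that the containers resolve for you.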

It makes them more productive and again, scales up existing stuff. Okay great stuff, great insight. I have to ask what's next with this collaboration? This is one of those better together situations.

It's working. Maurizio, what's next for your collaboration with Dell, VMware, and NVIDIA? >> For sure, we will not stop here. We are just starting to work on new things, looking for new developments, looking for the next piece to come.

You know, the virtual world is something that is moving very fast, and we will not stop here, because the outcome of this work has been very big for our research groups. And as John was saying, the whole software stack for AI has been simplified.

It's something that has been accepted very well. Of course, as you can imagine, research means developing new things. But for people who need an integrated workflow, the work that NVIDIA has done in developing the software packages, in developing containers that give the end user the ability to run their workloads, is really something that some years ago was unbelievable. Now everything is really easy to manage. >> John mentioned open source, obviously a big part of this. A quick follow-up, if you don't mind: are you going to share your results so people can look at this and have an easier path to AI? >> Oh yes, of course.

All the work that is done at the IT level at the University of Pisa is here to be shared. As much as we have time to write it down, we are trying to find a way to share the results of the work that we are doing with our partners Dell and NVIDIA. So for sure it will be shared.

>> All right, we'll get that link into the comments, John, your thoughts, final thoughts, on the collaboration with the University of Pisa and Dell VMware and NVIDIA, where does this all go next? >> Sure. So with University of Pisa, we're absolutely, you know, grateful to Maurizio and his team for the work they're doing and the feedback that they're sharing with us. We're learning a lot from them in terms of things we could do better and things we could add to the product. So that's a fantastic collaboration.

I believe that Maurizio has a session at VMworld, so if you want to actually learn about some of the workloads they're doing, like music generation, COVID-19 research, multilevel deep learning training, there's some really interesting work there. And so we want to continue that partnership with the University of Pisa, again, across all four of us: the university, NVIDIA, Dell, and VMware. And then on the tech side, for our enterprise customers, one of the things that we actually didn't speak much about was, I mentioned that the product is "optimized, certified, and supported."

And I think that support cannot be overstated, right? So as enterprises start to move into these new areas, they want to know that they can pick up the phone and call NVIDIA or VMware or Dell, and they're going to get support for these new workloads as they're running them. We are also continuing to think about, we spent a lot of time today on the developer side of things in developing AI. But the flip side of that, of course, is that when those AI apps are available, or AI-enhanced apps, right? Pretty much every enterprise app today is adding AI capabilities, all of our partners in the enterprise software space. And so you can think of NVIDIA AI Enterprise as having a runtime component. So that way, as you deploy your applications into the data center, they're going to automatically take advantage of the GPUs that you have there. And so we're seeing this future, as you were talking about the collaboration going forward, where the standard data-center building block is still going to be something like a VxRail server.

But instead of just CPU, storage, and RAM, they're all going to come with CPU, GPU, storage, and RAM, and that's going to be the norm. And every enterprise application is going to be infused with AI and be able to take advantage of GPUs in that scenario. >> Great stuff, AI for the enterprise. This is a great CUBE conversation. Just the beginning; we'll be having more of these. Virtualizing AI workloads is real; it impacts data scientists, and it impacts the compute, the edge, all aspects of the new environment we're all living in.

John, great to see you. >> Thank you, John. >> Maurizio, good to meet you, all the way in Italy. Look forward to meeting in person. And good luck in your session. I just got a note here on the session. It's at VMworld; it's Session 2263, I believe.

And so if anyone's watching, want to check that out, love to hear more. Thanks for coming on, appreciate it. >> Thanks John for having us; thanks Maurizio.

>> Thank you. >> It's a CUBE Conversation; I'm John Furrier, your host. Thanks for watching. We'll talk to you soon. (mellow tech beats)

