Simplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes
Justin: Hi, and welcome to this talk on Spark, Kubernetes, and VMware vSphere. My name is Justin Murray, and I'll be introducing my co-speaker in one second. We're very glad to be here at the Spark + AI Summit 2020. Our title today is "Simplify and Boost Apache Spark Deployments with Hypervisor-Native Kubernetes." That's quite a mouthful, but we're going to be talking here about a very tight link between Kubernetes and the VMware hypervisor as a basis for running Apache Spark. My co-speaker, on the next slide, is Enrique Corro, and I'll ask Enrique to introduce himself briefly here. Enrique: Happy to be here, thank you. I belong to the Cloud Services business unit within VMware, which is actually running vSphere technology, our core hypervisor technology, on VMware Cloud on AWS. We'll mention that briefly here, as well as running it on premises, and that will be a little novelty in this talk for you. Justin: Thanks. Let's move to the next one. So, what's our motivation for this talk? VMware, as you know, has been in business for about 20 years or so, and has really served the needs of the IT administrator. To be quite frank, we've given scalable infrastructure, hybrid cloud infrastructure, and by that I mean both on-premises and in the cloud, on VMware Cloud on AWS and on other hyperscalers. We've provided that infrastructure for years and made it easy to manage and easy for users to consume. But we know there's a community of developers out there, shown in the next section, and those developers really need a place to build their applications that is reliable, scalable, and cost-efficient. Largely, developers and DevOps people are building containers today, using tools of their own to do that, and the major platform on which you run containers is Kubernetes.
This talk really centers on running Spark and Kubernetes together, which you could argue is displacing Hadoop from its traditional big data role. What we've done in the vSphere 7 release is integrate the Kubernetes services that you see on the bottom left here with all of the other services that you find there: network services, compute services, storage services. They're all first-class citizens in VMware now, and Kubernetes is a first-class citizen as well; it's tightly integrated into the control plane of VMware. The idea is to integrate and harmonize the infrastructure for both the administrator and the developer.
The developer can then say to a vSphere environment: give me a Kubernetes cluster, please. I want to use it for about a week or two, build my application, deploy it into Kubernetes, and then tear it down and put it into production on a different Kubernetes cluster. All of that is now on target for VMware vSphere, and I'll hand over to my colleague Enrique to describe it in more detail. Enrique: OK, I'm going to start by talking about VMware vSphere with Kubernetes, which is a new platform designed to bridge the gap between infrastructure and application development. From version 7, vSphere incorporates Kubernetes as a series of native processes within the hypervisor. This allows the rapid provisioning of developer services such as the container runtime and registry, networking, and persistent storage volumes. All these services are consumable from the standard Kubernetes API, which is very important for developers nowadays. The integration of Kubernetes and the hypervisor improves vSphere administrative productivity and allows IT operations teams to focus on critical infrastructure attributes such as performance, security, availability, cost, and troubleshooting. At the same time, DevOps teams get self-service environments that allow them to code, test, deploy, and support modern applications with great agility. Consider also that the container orchestration approach offered by Kubernetes applies to Spark too: Kubernetes is officially supported as an orchestrator as of Spark version 3. Now I will talk about the importance of a new platform designed to build, run, and manage modern applications, such as Spark, on top of properly managed, enterprise-grade Kubernetes platforms.
At the heart of VMware Tanzu, we have the Tanzu Kubernetes Grid, also known as TKG. The Tanzu Kubernetes Grid provides a consistent, upstream-compatible implementation of Kubernetes, which gets tested, signed, and supported by VMware. You can deploy Tanzu Kubernetes Grid clusters across your vSphere clusters and also across Amazon EC2 instances. We are working to extend TKG support to more public cloud providers besides AWS, and we are also planning to support more Kubernetes flavors in the future. The Tanzu Kubernetes Grid has native awareness of multi-cluster paradigms, which allows you to manage any number of Kubernetes clusters from a centralized location; this has many administration advantages. In this installation, IT operations teams can manage their Tanzu Kubernetes clusters from the vSphere 7 user interface. On the left panel, you can see the hierarchical organization of the data center. Following a top-down order, we find the physical hosts, grouped by vSphere clusters. Inside these, we see a new grouping component called namespaces. You can think of a namespace as a pool of resources dedicated to one or multiple Tanzu Kubernetes clusters. The right panel shows details about the status and number of Kubernetes clusters running on the cloud infrastructure. The panel also shows the capacity allocated for the namespace and how much of the allocated resources is being used. To simplify the deployment and operations of vSphere 7 and Tanzu Kubernetes clusters, VMware includes all the infrastructure pieces together within a hyperconverged platform called VMware Cloud Foundation. Here is a view of the physical architecture of the platform; you can deploy Cloud Foundation on a wide range of supported hardware vendors. In the past two years, we have worked with Intel; the solution leverages different Intel acceleration technologies for machine learning and big data.
Spark may greatly benefit from these hardware components, seeing incremental performance gains. Cloud Foundation integrates the computing, networking, and storage layers of the hybrid cloud infrastructure, following a standardized, validated architecture. The infrastructure gets automatically deployed, and lifecycle management components are included with the solution. On the left side of the picture, we see the operations module of Cloud Foundation, called the management domain.
From that point, IT operations gets all the tools needed to operate the hybrid cloud environments, including the Tanzu Kubernetes clusters, as shown on the right side of the picture. Development teams, such as data engineering and data science, can take control of the Kubernetes resources using a standard API. Here we see a typical view of an end-to-end analytics pipeline, with Apache Spark at the core. With Kubernetes clusters available for developers, it is possible to deploy many open-source applications using the Bitnami Helm charts. If you are not familiar with Helm, you can think of it as an open-source package management solution for Kubernetes. Helm charts allow you to deploy and remove software using very simple command-line instructions. As a complement, Bitnami continuously monitors and updates a catalog of more than 100 open-source applications, to ensure development stacks are always up to date and secure. Here we show part of the catalog of open-source technologies offered by Bitnami; as you can see on the right, Apache Spark is also part of it. Now let's see how all of this works together in a brief demo. Here is an overview of the demo plan. First, we will explore the new capabilities in the vSphere 7 interface designed to manage Kubernetes resources. Then we will deploy a new Kubernetes cluster using the command-line interface, and we will verify the status of this newly created cluster. Then we will explore the Bitnami Helm charts catalog, which includes a chart for Apache Spark. Next, we'll deploy an Apache Spark cluster using the Helm chart. Finally, we'll verify the functionality of the newly created Spark cluster. OK, let's explore the new Kubernetes capabilities incorporated in the vSphere 7 management interface. Here is a view of the cloud infrastructure components. At the top, we have the data center objects and the typical resources they manage.
Within the data center object, we see a new element called namespaces, which aggregates the Kubernetes clusters. From here, you can monitor the status of the Kubernetes components, the number of cluster deployments, and the resource capacity that the Kubernetes clusters are consuming. Now let's deploy a Kubernetes cluster named k8-for-spark using just one TKG command. Here we see the tkg create cluster command. In this case, it is running in dry-run mode, so we can verify the cluster specification before it is built. The specification defines things like the Kubernetes version to be used, the configuration of the network and storage services, and the number of control plane and worker nodes that will support the cluster operation. Now let's run the TKG command for real, to spin up the new cluster. Notice that a manual deployment of Kubernetes can take a good bunch of commands, and here we only need to run one command to do the job. We wait for a moment, and the new Kubernetes cluster for Spark gets created. Now we can use the tkg get command to check the status of the new cluster. We do this several times, until we see that the control plane and the two worker nodes are reported as running. Now let's verify the Kubernetes cluster operation by deploying a couple of NGINX pods on it. It is time to use the kubectl command to deploy NGINX from a YAML file. Once kubectl gets executed, we get confirmation that the NGINX pods got deployed. Then we use kubectl a couple of times to check the pods' status, until they get reported as running. Now let's visit the Bitnami catalog of Helm charts, which includes charts for Apache Spark. Bitnami provides a catalog of curated containers and charts for thousands of open-source applications, with Apache Spark included. Here we see the options available to deploy Spark, either on Docker or on Kubernetes. Clicking it takes us to the GitHub repository for the Spark Helm chart.
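The cluster creation and smoke test just described boil down to a handful of commands. The sketch below is illustrative only: the exact tkg flag names vary by TKG CLI version, and the NGINX manifest file name is a hypothetical stand-in for the one used in the demo.

```shell
# Preview the cluster spec without building anything (dry run),
# then create the cluster for real with a single command.
# Flag names are illustrative; check `tkg create cluster --help` for your version.
tkg create cluster k8-for-spark --plan=dev --worker-machine-count=2 --dry-run
tkg create cluster k8-for-spark --plan=dev --worker-machine-count=2

# Poll the cluster status until the control plane and both workers report running
tkg get cluster

# Smoke-test the new cluster with a couple of NGINX pods
kubectl apply -f nginx-deployment.yaml   # hypothetical manifest file
kubectl get pods                         # repeat until STATUS shows Running
```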
Here we can see an example of the two helm commands required to deploy Spark on Kubernetes. The deployment can be customized by modifying the Spark chart's configuration parameters. The list of parameters includes things like the image registry, the network service port numbers, CPU and memory options for the master and workers, and the number of worker replicas. There is a total of 97 parameters available to tailor the deployment to your needs. Now let's deploy Apache Spark on the Kubernetes cluster previously created for this purpose. We will install Spark using only two helm commands. We start by adding the Bitnami charts repository to the local Helm repos. Next, we run the helm install command to make a new deployment called spark. After several seconds, we get confirmation that Spark was deployed, and we are given some references about how to launch the web UI and also how to submit jobs. Next, we use kubectl to verify that the Spark pods are working. We keep doing this until we see that the master and the workers are all up and running. Then we switch to the web UI to verify the Spark cluster state. From this interface, we see that no applications are running or completed, because the cluster is new, and we can confirm that the cluster status is ALIVE. Finally, let's verify that the Spark cluster deployed on Kubernetes is operational, by executing a job. Here we use the kubectl exec command to submit a Pi estimation job, available from the examples JAR file that comes with Spark. The estimation task gets launched for a total of 100 iterations. When the iterations complete, the result is printed on the screen, as you can see. Then we switch back to the Spark web UI and see the last application.
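The two helm commands and the verification steps from the demo look roughly like this. The pod name, service name, and jar path below are the Bitnami chart's defaults around the time of the talk; treat them as assumptions to verify against the notes the chart prints after installation.

```shell
# Two commands: register the Bitnami repository, then install the Spark chart
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install spark bitnami/spark

# Watch the master and worker pods until they are all Running
kubectl get pods

# Submit the bundled SparkPi example from inside the master pod
# (pod name, service name, and jar path are chart defaults; verify them first)
kubectl exec -ti spark-master-0 -- \
  spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://spark-master-svc:7077 \
    /opt/bitnami/spark/examples/jars/spark-examples_2.12-3.0.0.jar 100
```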
We click on the app ID, verifying that the job state is FINISHED, which indicates that the job concluded in a normal way. Justin: So now that we've seen how to deploy Spark on Kubernetes, let's take the testing up a little bit, into heavier workloads. We did that in our performance engineering lab, and I'm going to describe that now. This is testing Spark on Kubernetes for performance. We wanted to test Spark on Kubernetes versus Spark standalone, that is, Spark running outside of Hadoop, outside of YARN, just using the Spark cluster manager to manage it. We had the same hardware setup for both: same virtual machines, same hardware, same conditions, same test suite. But in one test we ran Spark standalone, and in the subsequent tests we ran Spark on Kubernetes. We were trying to find out whether there would be any impact on performance, and also to see what benefits we got from Spark running on Kubernetes. As I mentioned before, Kubernetes is a resource manager, so it's largely taking the place of legacy big data systems here. Here's the architecture of Spark on Kubernetes, as you see it here. In this case, we ran spark-submit not against the Spark master but against the API server in Kubernetes, which is now acting as the resource manager. We ran the Spark driver, a little differently from this diagram, on the same virtual machine as the Kubernetes master, but the executors were spun up on the fly by the spark-submit command. You'll see this a bit more on the next slide. This is just the same picture blown up. You can choose whether your Spark driver runs in a pod in your Kubernetes cluster, or you can run your driver on the client side; that's called client mode, and we actually used client mode here, but the functionality was the same.
Client mode allows you to execute the driver remotely, outside your Kubernetes cluster, while cluster mode would allow you to run the driver within your cluster and have everything together. The communication going on here, to schedule a pod and so on, is all being done within the same virtual machine in our Kubernetes case, but the executors are running in pods, and they're being fired up on the fly. Next slide. This is the architecture at the hardware level and the software level, all in one. The four rows here, hosts one through four, represent four second-generation Intel Xeon Cascade Lake servers: quite powerful servers, with two sockets in each one, Intel Platinum 8260 at 2.4 GHz, with hyper-threading on, which we recommend. That gives 96 logical cores, or hyper-threads, and 768 gigabytes of memory per host, so decent-sized machines here, but not the biggest machines in the world by any means.
Ran For spark, worker virtual, machines and on, the first host we ran the spark master, and smart driver together, as I mentioned the spark driver is now, outside. The cluster, to some extent so. Spark. For the for the spark master, VM we had eight virtual, CPUs, and 64, gigs of memory quite, a small virtual, machine actually and for the spark workers, we gave them a little more power they had sixteen virtual CPUs, or V CPUs and 120. Gigs of memory each and so. In total, on the first host we had four, times 120. That's for 80 gigs for the workers and another, 64. For the SPARC master, making five, hundred and forty four gigs allocated, on that first host now, we're going to fill those empty slots, on the host two three and four in when, we deploy kubernetes, onto this and that's going to be the next picture, that you'll see. Remember. The same hosts, the, same virtual machines in, all cases it's. Just now that instead. Of being just a spark worker the, individual, VMs, for VMs, that are look, alike on each host are now kubernetes, workers, so. Same, hardware. But this time we have three kubernetes. Masters, this is to simulate, highly. Available system, and we have an H a proxy, running on post for there in the first DN so. We, had three extra we, have we have three extra virtual, machines in this case in the, first slot on each host and. These. These kubernetes. Workers. Same, sized VMs. The. Masters had eight virtual CPUs, the workers at sixteen and notice. In red on the bottom left hand side one, virtual machine. Represents. One kubernetes, worker and we, assigned one spark, executive, pod to each worker, node and, one. Or more spark executives, of course can run inside, an executor, pod so. Very, simple, design here of a very. Simple. Approach to doing, this for. Uniformity. Across the two environments so. That's that's how we set this up now a few notes on the on the next one. The. SPARC submit, which. We typically. Supply. 
To the SPARC master, it can call a kubernetes, master, instead of a spark master by putting kata as. The prefix to the URL or, your IU given, what, we did in preparation. For that was create a private namespace. Just, as you do in regular kubernetes, with, which we call spark, and then, in that namespace we, created a service account also. Called spark and we created a cluster role binding, to allow that role, or that, service account to actually, edit and therefore. Create pods in the cluster within. That namespace so these, are standard procedures that you would apply if, you're setting up our back for, your kubernetes. Cluster nothing, unusual here. So. We. Used we, we use cluster, mode here which is the spark driver runs in the cluster we, also use, client, mode another experiment. So both worked fine on vSphere. Next. Slide please, so. These. Are the results of the tests and this was ResNet 50 which is an image classification test. Running, on top of spark. If Intel, big-deal libraries, and, a program written using Intel big GLS the driver. Enrique, mentioned some Intel. Software at, the start we worked closely with Intel, on, increasing. Performance. Both. Both, running on the same machines, with a varying number of virtual machines higher. Higher, is better on these charts and the blue represents, spots stand alone the orange represents spark, and kubernetes as you can see they're within 1 percent of each other now, the, number of images per second here is very low because this is not GPU, enhanced, deep.
This is regular CPU-based deep learning, and it's really an experiment to drive a lot of traffic through the system rather than a test of deep learning; it's trying to saturate the system as much as we could, and you will see that when we go to the next one. But my main point in this section is that performance is roughly the same whether you're running Spark standalone in virtual machines or Spark on Kubernetes in virtual machines. OK, so having done that, we wanted to look at some other things. Here's the Kubernetes console, and the purpose of showing you this is really to show, under the CPU requests there, that these CPUs are working very hard: they're at 95 percent and above. It also shows that you can use a standard Kubernetes dashboard to look at your virtualized Kubernetes, just as you would if it were running elsewhere. We also have a console of our own called Tanzu Mission Control. The Tanzu brand that Enrique mentioned at the beginning is a whole family of products, including Tanzu Mission Control, which can look at your Kubernetes clusters whether they're running on VMware vSphere, running natively in the cloud on AWS, or running on VMware Cloud on AWS; any of those can be controlled by Tanzu Mission Control. OK, let's go to the next one. Having done that performance test, we wanted to go back to training and ask: could we use Spark for training on VMware? We took an example of a tool here which does training, and took the output from that tool, which is a Java object. You see the setup here; this is actually VMware Cloud on AWS and its user interface, although I am using the light background rather than the dark background that Enrique was using. You can tell this is VMware Cloud on AWS by what's shown right in the center of the screen.
Domain in which we're operating which is uswest and then, on the top left hand side the, address mentioned is VMware VMC comm which means this, is VMware running on the public cloud on AWS, hardware, and those. Six. Machines on the top left-hand side of the navigation with. Their IP addresses, $10 cetera those, are physical machines, in an AWS, data center running VMware vSphere but. The reason that I highlighted the in red here is this is the virtual machine running the. Machine, learning training, tool that I'm going to show you in a second it's not an unusual virtual. Machine it's just got four virtual. CPUs, in it and. 50. Gigs of memory so, it's not a. Typical. Virtual, machine is quite a normal one and we, brought this across from the on-premises without, changing the virtual machine we run it on premises, and then run it run, it on VMware tyldum AWS, as well so.
Here's the user interface of that tool. It's a very nice user interface, but I'm not going to go through it in detail. This is H2O's Driverless AI tool, which does training based principally on tabular data, and we wanted to show you two forms of data being processed here. Image data is what a lot of deep learning is about, but tabular data is very common in business; a lot of business runs on tables. This is tabular data for credit cards, and we're trying to predict whether somebody will default on their next payment; that's the left-hand column. I'm not going to go through the details of the training here. Instead, we're going to hit the deploy button in the middle of the top there, generate a Java object from this training session, and deploy it into Spark. When we hit deploy, we get a Java object, which in H2O's terminology is called a model-optimized Java object, or a MOJO. Having got that pipeline, that MOJO (you see it on the third line of the Dockerfile on your right-hand side there), we're going to copy it into our container, and then we're going to run a REST server in which that pipeline is going to execute, just for testing purposes, just to simulate the life of a data scientist here. So we created our Docker image, we tagged it, and we pushed it to a repository. (By the way, there's a registry inside VMware's Kubernetes as well, called Harbor, part of the Tanzu family.) Then we tested that Docker container on its own by simply doing a docker run. But more interesting than that was deploying that same container image into Kubernetes.
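The build, tag, push, and standalone-test flow just described might look like the sketch below. All file, image, and registry names here are hypothetical stand-ins, not the ones from the demo; the actual Dockerfile shown on the slide copies the MOJO on its third line, as mentioned above.

```shell
# Dockerfile sketch (shown as comments; all names are hypothetical):
#   FROM openjdk:8-jre
#   WORKDIR /app
#   COPY pipeline.mojo /app/pipeline.mojo    # the trained MOJO
#   COPY scorer.jar    /app/scorer.jar       # Spring Boot REST scoring server
#   ENTRYPOINT ["java", "-jar", "/app/scorer.jar"]

# Build the image, tag it, and push it to a registry (Harbor, for example)
docker build -t harbor.example.com/demo/mojo-scorer:1.0 .
docker push harbor.example.com/demo/mojo-scorer:1.0

# Quick standalone test of the container before moving to Kubernetes
docker run -p 8080:8080 harbor.example.com/demo/mojo-scorer:1.0
```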
You can see a kubectl apply there on the second-from-last line, and a kubectl apply of a NodePort on the last one. The first one deploys the scorer that we've just built into a Docker image, and the second one surrounds it with a NodePort service, so we can get at it from outside the Kubernetes cluster. This simulates what a data scientist might do to bring up a test, in Kubernetes, of their future Spark object or future Spark container. Now let's move on to a more serious deployment of that. (This, by the way, is the REST server running, and the lines at the bottom indicate that the prediction, the scorer, is working; this is Spring Boot executing a REST server within the container, being exercised against the VM here.) Now let's go back to Spark. H2O happens to have a flavor of their technology that works with Spark, called Sparkling Water (Spark plus H2O), and you see it here. We deployed that same pipeline, that same model-optimized Java object, into Sparkling Water and into standalone Spark, both running in virtual machines with Kubernetes. This proved to us that the end-to-end process, from training right through to model deployment, could be done on Spark on VMware.
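The two kubectl apply steps for the scorer, mentioned earlier on this slide, might look like this. The manifest file names, service name, and REST path are hypothetical, for illustration only.

```shell
# Deploy the scorer image, then expose it with a NodePort service so it is
# reachable from outside the Kubernetes cluster (manifest names are illustrative)
kubectl apply -f mojo-scorer-deployment.yaml
kubectl apply -f mojo-scorer-nodeport.yaml

# Find the assigned NodePort, then call the REST scorer from outside the cluster
kubectl get svc mojo-scorer
curl -X POST http://<any-node-ip>:<node-port>/score -d @sample-row.json
```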
Spark is typically used in training, but we used it here for inference as well as training, just to show that it could be done. So that's Sparkling Water and standalone Spark. Finally, what came out of that predictor, or scorer, was the set of rows you see in the middle of the screen and the set of rows you see at the bottom of the screen. They both have "default payment next" .0 and .1 as their titles: .0 means no default for that particular customer in the next month, and .1 means there is a potential default for that customer in the next month. This just shows you the scorer actually working, based on the training that we did in H2O's tool, Driverless AI, at the beginning. That tool, by the way, is among a set of tools for automated ML that we encourage our partners, or third-party companies, to work with us on. All right, so now to conclude. What you saw from the very beginning, in Enrique's section, was a unified hybrid cloud platform. We call that VMware Cloud Foundation, or VCF. It runs both on premises and in VMware Cloud on AWS, and on other clouds. It gives you the agility of Kubernetes with the enterprise capabilities of vSphere. Many, many thousands of companies run VMware vSphere to support all their applications today; now they can run Kubernetes on it in an integrated way, and run Spark on top of Kubernetes. That makes a pretty compelling development and deployment system. Kubernetes definitely simplifies our methods of deploying Spark: the Spark workers came up with Spark automatically in the Kubernetes case, whereas they had to be installed along with Spark itself in the standalone case.
You can easily get started with Spark using the Bitnami Helm charts, and Enrique showed you that in his demo. Then we went on to testing the performance of Spark on VMs with and without Kubernetes, and they're about equal, within 1% of each other. Kubernetes, definitely, from our perspective, is becoming the method of choice for deploying both the training and inference parts of machine learning, for both machine learning and deep learning applications. We've also tested deep learning applications on Kubernetes; they deploy very well onto vSphere, onto Kubernetes with vSphere. And just a reminder that the world is not all about deep learning: there's a lot of tabular, structured data in the world, and that should also be an important part of your machine learning deployments; we showed that in our demo here with the H2O tool.
OK. All of what I described in the performance part is covered in the first URL here; we'll come back to this URL so you can take a picture of it. There is a general blog site at VMware, blogs.vmware.com/apps/ml, for machine learning, where you can find tons and tons of information about how to use GPUs with VMware and how to do Spark on VMware. We've also done a lot of testing of Hadoop and Spark together on VMware, as well as the standalone Spark that you saw earlier, and we've got many papers written about big data and vSphere in general, which you can see here in the last three references. OK, so please give us your feedback; we welcome your feedback and questions, and we look forward to your questions after we're done here. Please rate the session and review it for us, so we can improve for next time. We really appreciate your attention. So, on behalf of my colleague Enrique Corro, who is from VMware's Office of the CTO, and Justin Murray here: thank you very much for your time, and we'll get to your questions coming up. Thank you.