Welcome to our session. In this session we will present RayOnSpark, a new feature we provide in Analytics Zoo. It allows users to write Ray code directly inline with their Spark code, so that they can run those new, emerging AI applications on their existing big data clusters in a distributed fashion. This is the agenda for the talk: we will first give a quick overview of the background on Analytics Zoo and Ray, and then go into the details of RayOnSpark, as well as some of the real-world use cases.

At Intel, we have been working on a lot of initiatives to bring AI to big data. One example is BigDL, a distributed deep learning framework we open-sourced in 2016. It allows users to write new deep learning applications as standard Spark programs, running on top of existing big data systems. We have also open-sourced Analytics Zoo, a unified data analytics and AI platform which allows users to apply AI technologies, such as TensorFlow, PyTorch, Keras and so on, to big data platforms, for instance Spark, Flink, Ray and so on.

This slide gives you a very high-level overview of the Analytics Zoo technology stack. As I mentioned before, Analytics Zoo is built on top of existing deep learning frameworks, such as TensorFlow, PyTorch, Keras and BigDL, as well as distributed analytics systems like Apache Spark, Apache Flink, Ray and so on. You can run Analytics Zoo on your single laptop and then transparently scale it out to your cluster, such as a Kubernetes cluster or a big data cluster.

Inside Analytics Zoo there are three layers. At the bottom there is what we call the integrated data analytics and AI pipeline, a horizontal layer that allows users to apply AI models to big data clusters in a distributed fashion; for instance, a lot of our users are running distributed TensorFlow on Spark to process very large datasets. On top of this pipeline layer, there is an automated machine learning workflow layer, in which we try to automate a lot of the machine learning tasks users run into when building end-to-end pipelines, for instance AutoML for time series analysis, automated distributed model serving, and so on. At the top layer we also provide a set of built-in models and algorithms for common use cases, for instance recommendation, time series analysis and so on.
Users can directly use those built-in models in the underlying pipelines and workflows. In addition, we also allow users to use any standard TensorFlow or PyTorch model, such as the ones published by the community; you can just use a standard model in Analytics Zoo.

Before going through the details, I would like to provide a high-level overview of what our objective is. The goal we have when building Analytics Zoo is to provide a unified data analytics and AI platform, so that users can easily scale their applications from a single laptop to a distributed big data cluster. If you think about the lifecycle of a data science project, it usually starts with prototyping on some sample data on your laptop; for instance, maybe you are writing a Python notebook on your laptop. If you are happy with the prototype, then you may want to experiment with your historical data; those are large datasets which are usually stored on a big data cluster. And then, if you are happy with the experiments, you may want to deploy your pipeline to your production environment, say, for testing. Today, going from a laptop to a cluster or a production environment is a very cumbersome, error-prone process: you will need to rewrite your code, transfer your data, convert your model and so on. What we try to accomplish with Analytics Zoo is to allow users to transparently scale from the laptop to the distributed cluster: you can directly build the end-to-end pipeline on your laptop, then process your production data on your big data cluster, and with almost no code changes you can run your single-node notebook on your cluster in a distributed fashion. That is the goal we want to achieve with Analytics Zoo, and that is exactly the reason why we provide RayOnSpark in Analytics Zoo. Next, my colleague will explain what Ray is, how we implemented RayOnSpark, and how to use it.

Okay, thanks Jason for giving an impressive opening and a high-level overview of our work over these years enabling the latest AI technologies on big data, especially the Analytics Zoo project that we have been working on. I will continue this session and focus on introducing the RayOnSpark functionality of Analytics Zoo; I will elaborate on how to use RayOnSpark to run emerging AI applications on big data clusters.

Okay, so I will get started. At the very beginning, in case some of you may not be that familiar with Ray, I will first give a quick introduction to Ray. Ray is a fast and simple framework open-sourced by UC Berkeley which is particularly designed for building and running distributed applications. Ray Core provides simple primitives and a friendly interface to help users easily achieve parallelism. Python users only need to add several lines of Ray code to run Python functions or class instances in parallel. This page shows some code segments of using Ray. First of all, to start the Ray services, users just need to import ray and call ray.init().
Let's take a close look at the left part of the code. It shows how to use Ray to run Python functions in parallel. We are given an ordinary Python function f here, which computes the square of a given number. Normally, if you call this function in a for loop five times, then these five function calls are executed sequentially, one after another. However, if you add the @ray.remote decorator to this function, then this function becomes a Ray remote function that can be executed remotely by Ray. So again, if you call the remote function in your for loop five times, then these five remote function calls are executed in parallel. The only difference in coding you need to pay attention to is that instead of just calling the function name as you would normally do, you need to add remote to the function name when you call remote functions; so here you need to call f.remote instead of just f. Finally, you can call ray.get to invoke the execution of the remote functions and retrieve the corresponding return values.

In addition, as you may notice here, when we add the @ray.remote decorator you can specify the amount of resources needed for this function, for example how many CPU cores are needed to run it, and if you specify this, Ray internally will allocate that amount of resources for you.
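The code on the slide isn't captured in this transcript, so below is a minimal sketch of the remote-function example being described, using Ray's public API (the function f and the num_cpus value are just illustrative):

```python
import ray

ray.init()  # start the Ray services

# An ordinary Python function: these five calls run sequentially.
def f(x):
    return x * x

sequential_results = [f(i) for i in range(5)]

# The same function as a Ray remote function; the optional num_cpus
# argument shows how resources can be requested per invocation.
@ray.remote(num_cpus=1)
def f_remote(x):
    return x * x

# .remote() returns a future immediately; the five calls run in parallel.
futures = [f_remote.remote(i) for i in range(5)]

# ray.get blocks until the results are ready.
print(sequential_results, ray.get(futures))  # [0, 1, 4, 9, 16] twice
```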
Similarly, for Python classes that are stateful, you can also add the @ray.remote decorator to the Python class to make it a Ray actor, and Ray will instantiate instances of the actor remotely. The example on the right creates five remote Counter objects, with the count value as their state. Now these counters are all Ray actors, and we can increment the values of the five counters at the same time. Again, you need to add remote when you create the Counter object as a Ray actor, and you need to add remote as well when you call the methods of an actor.
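Likewise, here is a minimal reconstruction of the Counter actor example described above (the class body is illustrative; only the @ray.remote and .remote() usage is the point):

```python
import ray

ray.init()

# Adding @ray.remote to a class turns it into a Ray actor; each
# instance runs in its own remote worker process and keeps its state.
@ray.remote
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count

# Create five actor instances; note .remote() on construction.
counters = [Counter.remote() for _ in range(5)]

# Method calls also use .remote() and run in parallel across actors.
print(ray.get([c.increment.remote() for c in counters]))  # [1, 1, 1, 1, 1]
```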
These are two simple examples of using Ray to achieve simple parallelism with changes of only several lines of code, and such Ray applications can either run locally or scale to a large cluster.

Actually, Ray is more powerful and useful than simply writing such trivial code: Ray is packaged with several high-level libraries to accelerate machine learning workloads. First of all, Ray Tune is a Python library built on top of Ray for experiment execution and hyperparameter tuning at any scale. Secondly, RLlib provides a unified API for a variety of deep reinforcement learning applications. And RaySGD implements thin wrappers for TensorFlow and PyTorch for ease of data-parallel distributed training. These libraries can be useful for you to build emerging AI applications easily. This is just a quick overview of Ray, and if you want to know more about Ray, you can visit its website for more details.

So Ray is quite a good framework for building and running emerging AI applications, such as hyperparameter tuning and reinforcement learning, and in the industry there is now more and more demand to embrace these emerging technologies and apply them on production data to bring benefits. However, we observe that developers face several challenges when they try to do this. First of all, in the production environment the production data is usually stored and processed on big data clusters, but quite a lot of effort and steps are required to directly deploy Ray applications on existing Hadoop or Spark clusters. Secondly, it can be a concern for PySpark or Ray users to prepare the Python environment on each node without introducing side effects to the existing cluster. Last but not least, conventional approaches set up two separate clusters, one for big data applications and the other for AI applications, and this inevitably introduces expensive data transfer overhead and additional effort to maintain separate systems and workflows in production. So it would be great and cost-saving if we could build a unified system for big data analytics and advanced AI applications, and that is why we took the opportunity to build RayOnSpark. RayOnSpark easily enables users to inject advanced AI applications written in Ray into existing big data processing pipelines.

Okay, next I will talk about the design and implementation details of RayOnSpark. We developed RayOnSpark to allow distributed Ray applications to seamlessly integrate into Spark data processing pipelines; as the name indicates, RayOnSpark runs Ray on top of PySpark on big data clusters. In the following discussion I will take a YARN cluster as an example, but the same logic can be applied to other clusters, such as Kubernetes or Mesos clusters, as well.

First of all, for the environment preparation, we leverage conda-pack and the YARN distributed cache to automatically package and distribute the Python dependencies across all the nodes in the cluster at runtime. In this way, users do not need to pre-install the necessary dependencies on all nodes beforehand, and the cluster environment remains clean after the tasks finish. The figure on the right gives an overview of the architecture of RayOnSpark.
In a Spark application, as we are all quite familiar, we create a SparkContext object on the driver node, and the SparkContext launches multiple executors across the YARN cluster to perform tasks. In our RayOnSpark implementation, we additionally create a RayContext object on the Spark driver, and it utilizes the existing SparkContext to automatically launch the Ray processes across the YARN cluster. The Ray processes exist alongside the Spark executors; one of the Ray processes is the Ray master process, and the remaining ones are Ray slave processes, also called raylets. In addition, the RayContext is also responsible for creating a Ray manager inside each Spark executor to manage the Ray processes; that is to say, the Ray manager will automatically shut down the Ray processes and release the corresponding resources after the application finishes. In the setting of RayOnSpark, the Ray processes and the Spark processes exist in the same cluster, which makes it possible for in-memory Spark RDDs or DataFrames to be directly streamed into Ray applications for advanced AI purposes. So this is basically the architecture of RayOnSpark.

Okay, with regard to the usage of RayOnSpark, users only need to add several lines of code to directly run Ray applications on YARN clusters. There are three steps. First of all, you need to import the corresponding packages from our Analytics Zoo project and create a SparkContext object using the init_spark_on_yarn API we provide (of course, you can use an existing SparkContext if you wish). This API creates a SparkContext on the underlying YARN cluster, and it helps to package and distribute the specified conda environment, with all the Python dependencies, across all the Spark executors. When calling this function you can also specify Spark configurations, such as the number of executors and the number of executor cores. After we create the SparkContext, step two is to create a RayContext object; the RayContext is the contact point between Ray and Spark. You can also pass in some Ray-specific configurations, such as the object store memory, when you create the RayContext object, and then you call its init method to start all the Ray processes across the YARN cluster. Now, after these two steps, we have both Spark and Ray ready in the YARN cluster, and we can directly write some Ray code and run it on the YARN cluster. The red boxes on the slide are the RayOnSpark code you need to add, and the black box is the pure Ray code that you have already seen in the previous slide, which creates several Ray actors and does the increments. After the Ray applications finish, you can call the stop method of the RayContext to shut down the Ray cluster. So this is basically the code you need to add to use RayOnSpark, which should be straightforward and easy to learn, and if you want more instructions on running RayOnSpark, you can visit our documentation page for more details.
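The slide code itself isn't in the transcript; the sketch below shows the three steps as described, following the Analytics Zoo RayOnSpark API of that time (import paths and parameter names such as num_executors may differ slightly across versions, and the paths here are placeholders):

```python
from zoo import init_spark_on_yarn
from zoo.ray import RayContext
import ray

# Step 1: create a SparkContext on YARN; the specified conda environment
# is packaged via conda-pack and distributed to all executors.
sc = init_spark_on_yarn(
    hadoop_conf="/path/to/hadoop/conf",  # placeholder path
    conda_name="ray-env",                # name of the local conda env
    num_executors=2,
    executor_cores=4)

# Step 2: create a RayContext (the contact point between Ray and Spark)
# and launch the Ray processes across the YARN cluster.
ray_ctx = RayContext(sc=sc, object_store_memory="4g")
ray_ctx.init()

# Step 3: run pure Ray code, e.g. the Counter actors from earlier.
@ray.remote
class Counter:
    def __init__(self):
        self.count = 0
    def increment(self):
        self.count += 1
        return self.count

counters = [Counter.remote() for _ in range(5)]
print(ray.get([c.increment.remote() for c in counters]))

# Shut down the Ray cluster when the application finishes.
ray_ctx.stop()
```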
In the last part of this session, I am going to share some advanced, real-world use cases that we have built on top of RayOnSpark, which I suppose many of you may be more interested in. First of all, we have built AutoML in Analytics Zoo for scalable time series prediction. The AutoML framework automates the process of feature generation, model selection and hyperparameter tuning for time series applications, and we already have some initial customer cooperations on AutoML. Actually, in this conference my colleagues have another session to specifically discuss AutoML and its use cases, so here I won't go into the details; but if you are interested in our AutoML work, you can visit our GitHub page to find more details and related use cases.
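The AutoML internals are not shown in this session, but since the framework is built on Ray, a minimal Ray Tune snippet gives a flavor of the kind of hyperparameter search it automates; the train_ts_model function and its config keys here are hypothetical stand-ins:

```python
from ray import tune

# Hypothetical trainable: fits a time series model with the given
# hyperparameters and reports a validation loss back to Tune.
def train_ts_model(config):
    lr = config["lr"]
    hidden_size = config["hidden_size"]
    # ... build and fit a forecasting model here ...
    val_loss = 1.0 / (lr * hidden_size)  # placeholder metric
    tune.report(loss=val_loss)

# Tune launches the trials in parallel on the Ray cluster and tracks
# the best hyperparameter combination found.
analysis = tune.run(
    train_ts_model,
    config={
        "lr": tune.grid_search([0.001, 0.01, 0.1]),
        "hidden_size": tune.choice([32, 64, 128]),
    })
print(analysis.get_best_config(metric="loss", mode="min"))
```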
Apart from AutoML, we have built a data-parallel deep learning model training pipeline on top of RayOnSpark. The pipeline first of all supports users to use either PySpark or Ray for parallel data loading and processing. Then we implement wrappers for different deep learning frameworks to automatically handle the setup of the distributed environment on big data clusters using RayOnSpark. The RaySGD library I mentioned before has already done some of this work, which we can extend and refer to. In addition to using the native distributed modules provided by TensorFlow or PyTorch, which are based on the parameter server architecture, we also support users choosing the Horovod framework from Uber as another backend for distributed training. With such a data-parallel distributed training pipeline, users do not need to worry about the complicated setup of distributed training on big data clusters: what they need to do is just write a training script for a single node, and we do the work to make the distributed training happen; you only need to modify several lines of your original code to achieve this.
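The wrapper APIs themselves aren't shown in the talk, so the sketch below only illustrates the underlying idea of data-parallel training with Ray actors; all the names are hypothetical, and real backends such as RaySGD or Horovod synchronize gradients far more efficiently than this naive weight averaging:

```python
import numpy as np
import ray

ray.init()

@ray.remote
class TrainingWorker:
    """Hypothetical worker holding one data shard and a model replica."""
    def __init__(self, data_shard):
        self.data = data_shard

    def train_epoch(self, weights, lr=0.001):
        # One local pass of SGD on this worker's shard of the data.
        w = weights.copy()
        for x, y in self.data:
            grad = (w @ x - y) * x
            w -= lr * grad
        return w

# Four shards of a toy linear regression problem (y = x[1] exactly).
shards = [[(np.array([1.0, float(i)]), float(i)) for i in range(k, k + 5)]
          for k in range(0, 20, 5)]
workers = [TrainingWorker.remote(s) for s in shards]

weights = np.zeros(2)
for epoch in range(50):
    # Broadcast weights, train in parallel, then average the replicas
    # (a naive stand-in for parameter-server or allreduce synchronization).
    replicas = ray.get([w.train_epoch.remote(weights) for w in workers])
    weights = np.mean(replicas, axis=0)
print(weights)  # moves toward [0, 1]
```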
Lastly, I would like to share the successful cooperation between Intel and Burger King to build a recommendation system for Burger King's drive-thru scenario using RayOnSpark. First of all, drive-thru is a common scenario in the fast food industry where the guests purchase food without leaving their cars. The guests browse the menu on the outdoor digital menu board and talk to the cashier inside the restaurant through a microphone system to place their orders, and the guests are given recommendations, displayed on the outdoor digital menu board, while they place their orders.

As a world-famous fast food company, Burger King collects a large number of transaction records every day, and they use Spark to perform ETL, or data cleaning and preprocessing steps, on their own big data clusters. After the data is processed, they conduct distributed training on these data, and they chose MXNet as their deep learning framework. Before cooperating with us, they would allocate a separate GPU cluster dedicated to distributed MXNet training, but they found that such a solution is not quite efficient, since in the end-to-end pipeline a large portion of the total time was spent on copying data from the big data cluster to the GPU cluster. They also needed quite a lot of additional effort to maintain the GPU cluster regularly, and it is often the case that, for many companies, GPU resources are relatively limited compared with CPU server resources.

After adopting the RayOnSpark solution, their entire pipeline becomes more efficient and easier to maintain, since we run the distributed MXNet training on exactly the same cluster where the big data is stored and processed. Similar to RaySGD, we implement a lightweight wrapper layer around the native MXNet modules to handle the complicated setup of distributed MXNet on the YARN cluster. Each MXNet worker takes a portion of the dataset from Spark on its local node and trains the recommendation model. The MXNet workers and servers both run as Ray processes, and they communicate with each other through the distributed key-value store natively provided by MXNet. In this way, the entire pipeline runs on a single cluster, and there is no extra data transfer needed. This solution has been successfully deployed into Burger King's production environment to serve their drive-thru customers, and it has been proven to be efficient, scalable and easy to maintain.
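The wrapper layer's code isn't shown in the session either; this hypothetical sketch illustrates the idea of running MXNet's distributed roles as Ray actors, wired together through the DMLC environment variables that MXNet's native key-value store uses (exact initialization behavior varies by MXNet version):

```python
import os
import ray

ray.init()

@ray.remote
class MXNetProcess:
    """Hypothetical actor wrapping one process of a distributed MXNet job."""
    def __init__(self, role, root_uri, root_port, num_workers, num_servers):
        # MXNet's native KVStore (ps-lite) discovers the scheduler and
        # its peers through these DMLC_* environment variables.
        os.environ["DMLC_ROLE"] = role  # "scheduler", "server" or "worker"
        os.environ["DMLC_PS_ROOT_URI"] = root_uri
        os.environ["DMLC_PS_ROOT_PORT"] = str(root_port)
        os.environ["DMLC_NUM_WORKER"] = str(num_workers)
        os.environ["DMLC_NUM_SERVER"] = str(num_servers)

    def run(self, data_shard=None):
        # Initializing MXNet with the variables above makes scheduler and
        # server processes enter their serving loops, while workers join
        # the distributed KVStore and run the training loop.
        import mxnet as mx
        if os.environ["DMLC_ROLE"] == "worker":
            kv = mx.kv.create("dist_sync")
            # ... train the recommendation model on data_shard, pushing
            # gradients to and pulling weights from the servers via kv ...
            return kv.rank

# One scheduler plus two servers and two workers, all as Ray processes
# co-located with the Spark executors on the same cluster.
roles = ["scheduler"] + ["server"] * 2 + ["worker"] * 2
actors = [MXNetProcess.remote(r, "127.0.0.1", 9000, 2, 2) for r in roles]
handles = [a.run.remote() for a in actors]  # scheduler/servers keep serving
```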
So here comes the end of this session. As a conclusion, in this session we mainly talked about our work on RayOnSpark. We developed RayOnSpark to enable users to directly run emerging AI applications on big data platforms, and I introduced the implementation details of RayOnSpark. Our RayOnSpark solution has been adopted by Burger King in their production environment, and we are also cooperating with other customers to seek more use cases for RayOnSpark. If you want to review the details of RayOnSpark, don't hesitate to take a look at our blog post on RayOnSpark, with the link given here.

RayOnSpark is a key feature of Analytics Zoo, and we have developed Analytics Zoo as a unified platform for data analytics and AI. If you are interested in Analytics Zoo, you can go to our GitHub page or documentation page for more details, and I am sure you may find its other functionalities useful as well. If you have a GitHub account, please kindly give us a star, so that you can find us on GitHub whenever you need. For future work, we are now working on full support and more out-of-the-box solutions for easily scaling end-to-end AI pipelines from a single node to a cluster based on RayOnSpark, and we would be glad to share our progress and more use cases in the future if we have the opportunity.

The last page here is an overview of the Intel-optimized end-to-end data analytics and AI pipeline. Intel is devoted to helping our customers build optimized solutions on Intel platforms, from the bottom hardware architectures to the software optimizations. If you want to know more about how Intel can help you build your pipeline, you can go to our website at intel.com/ai or software.intel.com for more details. That's pretty much it for this session. Thank you all for choosing this session, and we hope that what Jason and I have talked about will be helpful to you. Thank you so much, and if you have any questions, we welcome you to raise them. Have a good day. Thank you.