Vectorized Deep Learning Acceleration and Training on Apache Spark in SK Telecom
Hello. My name is [inaudible], and I'm from SK Telecom, the largest telecommunication company in Korea. Today I'm going to introduce our collaboration work with Intel, focusing on our network quality analysis and prediction use case. Our presentation will follow this order. First, I'm going to show a quick demo of our network quality analysis visualization. Next, I will briefly cover network quality analysis at SK Telecom and our own in-memory data store for Apache Spark, called FlashBase. Then I will explain how we accelerated deep learning pre-processing by using aggregation pushdown and vectorization. After that, I'm going to hand it over to Jason, who is a senior principal engineer at Intel, an Apache Spark PMC member, and the creator of Intel Analytics Zoo.

In this demo, we visualize network quality analysis with data points in different colors on the map and dynamic charts on the right side. The closer a point is to blue, the better the quality; the closer it is to red, the worse the quality. If we move the location or choose a different time span, the graphs and charts are updated right away according to each location and time range. The dataset has 100 million records, and zooming out means querying more data to aggregate and visualize the points and charts. This instant update is possible because we accelerated query processing by using aggregation pushdown and vectorized processing. Okay, let's see the demo.

Next, let me briefly introduce our use case and our in-memory data store for Apache Spark, named FlashBase, by summarizing our previous talks. As I told you earlier, SK Telecom is the largest telecommunication company in Korea; about half of our country's population are our subscribers. To support them, we operate hundreds of thousands of cell towers, and in each cell tower, the network devices that connect our subscribers to our core network generate
a huge volume of logs in a very short time period, every 10 seconds, and each row has geospatial tags according to the cell tower's location. In 2016, we tried to build a real-time network quality analysis system that could ingest 1.4 million records per second, meaning about 120 billion records every day, and at the same time return network quality analysis results within a few seconds for a specific time and a specific location. Looking back, these were challenging requirements that could not be met with the big data technologies of the time. First, we tried to solve this problem with HDFS and Apache Spark, but HDFS could not efficiently handle numerous partitions of very small data, each reflecting a specific location and a specific time. So we set out to build a new data store for Apache Spark, named FlashBase, which could support many more partitions with much smaller data sizes and high IOPS.

We gathered the best-known open-source software: Apache Spark for the SQL interface and distributed query engine, Redis as a DRAM key-value store, RocksDB as an SSD key-value store, and SSDs for shorter latency and small, parallel I/Os. I thought we could easily build the new data store just by integrating them. However, it has been very challenging to breathe life into this collection of software and make it a working in-memory database system, but we have all chipped in for the last four years. As a result, FlashBase has the features that are mostly required of commercial in-memory database systems, such as recovery, replication, scale-out, and so on. In terms of ingestion performance, FlashBase can ingest 500,000
records per second on a single node, and each ingestion operation is handled atomically. In particular, our data store can hold millions of partitions by taking advantage of key-value stores and in-memory storage devices. This can be used to greatly reduce the search space of big data queries with predicates on high-cardinality columns like time spans and locations. This is a query example of network quality analysis for a specific time and a specific location. By using extreme partitioning and pushdown filters, we could finally reduce the query time over 0.1 trillion records to one second.

So far, I have introduced our own in-memory data store for Apache Spark. From now on, I'm going to introduce our deep learning use case. Since SK Telecom is a telecommunication company, we need to predict various network quality indicators for anomaly detection and real-time network infrastructure management. We first tried to predict network quality five minutes ahead by using the well-known sequence-to-sequence model, but the model could not predict sudden peak changes very well. Therefore, we came up with a new memory-augmented model, which can learn periodic changes of particular patterns by using its own memory. For the details of this model, you can refer to last year's talk. This is the test result: unlike the sequence-to-sequence model above, our memory-augmented model below can predict the sudden changes very well.

In the legacy data pipeline architecture for deep learning, we had inefficient bottlenecks between the data source and deep learning inference, such as exporting data to disk files and serializing and deserializing data between different formats. To address this problem, we collaborated with Intel and implemented this new architecture, a pure in-memory data pipeline based on FlashBase and Intel Analytics Zoo. Through
this pipeline, pre-processing and deep learning inference and training can be integrated into one single Spark cluster. Furthermore, the whole pipeline is based on Apache Spark RDDs without any collect operation. After the pre-processing job is done by FlashBase, the results are delivered to Intel Analytics Zoo as Spark RDDs, and Analytics Zoo performs inference with vectorized processing by exploiting Intel's SIMD instruction set. Consequently, we could reduce the end-to-end inference time substantially. More details will be covered by my co-speaker, Jason, later in this talk.

Okay, so far I've covered our deep learning use case. Now I'm going to explain how we accelerated deep learning pre-processing by using aggregation pushdown and vectorization. So, what is pushdown? It means offloading computations to lower-level computing modules that are closer to the original data source; in other words, near-data processing. Currently, Apache Spark can push down only filter predicates and projected columns to external data sources. When an aggregation query is executed, Spark pushes down the filters and projected columns to the data source first, and then only the filtered data is retrieved to the Spark executors. Finally, the Spark executors handle the aggregation operations. But what if aggregation could be done in the external data sources? Pushdown filters can substantially reduce the data size to be transferred, but with aggregation pushdown, Spark executors can retrieve fully pre-aggregated results, thereby greatly reducing the data size for later processing and data shuffling. Moreover, vectorized processing in the data source can utilize modern CPUs' native SIMD instruction sets, such as AVX, AVX2, and AVX-512, which can process up to 16 single-precision floating-point values in a single instruction (512 bits / 32 bits = 16). This is our design.
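To make the contrast between filter pushdown and aggregation pushdown concrete, here is a minimal pure-Python sketch. It assumes a hypothetical columnar source holding (timestamp, cell, metric) columns; `ToyColumnarSource`, `scan_filtered`, `aggregate_pushed`, and `fold` are illustrative names, not FlashBase or Spark APIs, and SIMD vectorization is not modeled. It only shows how much data leaves the source in each case.

```python
"""Toy illustration of aggregation pushdown vs. filter-only pushdown.

Assumption: ToyColumnarSource, scan_filtered, aggregate_pushed and fold are
hypothetical names for this sketch; they are not FlashBase or Spark APIs,
and no SIMD vectorization is modeled here.
"""
from typing import Dict, List, Tuple

Agg = Tuple[float, float, float, int]  # (min, max, sum, count) per group


def fold(acc: Dict[str, Agg], key: str, value: float) -> None:
    """Update the running (min, max, sum, count) for one group key."""
    lo, hi, total, n = acc.get(key, (value, value, 0.0, 0))
    acc[key] = (min(lo, value), max(hi, value), total + value, n + 1)


class ToyColumnarSource:
    """A data source that stores each column as its own list."""

    def __init__(self, ts: List[int], cell: List[str], value: List[float]):
        self.ts, self.cell, self.value = ts, cell, value

    def scan_filtered(self, t_lo: int, t_hi: int) -> List[Tuple[str, float]]:
        """Filter + projection pushdown: every matching row is shipped back."""
        return [(c, v) for t, c, v in zip(self.ts, self.cell, self.value)
                if t_lo <= t < t_hi]

    def aggregate_pushed(self, t_lo: int, t_hi: int) -> Dict[str, Agg]:
        """Aggregation pushdown: group and aggregate next to the data and
        ship back one partial aggregate per group."""
        acc: Dict[str, Agg] = {}
        for t, c, v in zip(self.ts, self.cell, self.value):
            if t_lo <= t < t_hi:
                fold(acc, c, v)
        return acc


def aggregate_in_engine(rows: List[Tuple[str, float]]) -> Dict[str, Agg]:
    """Work the query engine must still do when only filters are pushed."""
    acc: Dict[str, Agg] = {}
    for c, v in rows:
        fold(acc, c, v)
    return acc


if __name__ == "__main__":
    ts, cell, value = [], [], []
    for t in range(0, 3600, 10):             # one hour of 10-second samples
        for c in ("A", "B", "C"):            # three hypothetical cells
            ts.append(t)
            cell.append(c)
            value.append(float((t // 10) % 7))  # synthetic metric
    src = ToyColumnarSource(ts, cell, value)
    rows = src.scan_filtered(0, 1800)
    pushed = src.aggregate_pushed(0, 1800)
    assert pushed == aggregate_in_engine(rows)  # same answer either way
    print(len(rows), "rows vs", len(pushed), "aggregates transferred")
    # prints: 540 rows vs 3 aggregates transferred
```

Returning (sum, count) rather than an average is a deliberate choice in this sketch: partial results from several partitions or nodes can then be merged exactly, and min and max merge the same way.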
For aggregation, push down, because. The original pasta does, not provide, aggregation. Push down feature we. Use a pasta. Catalyst. Optimization. Rules. We. Defined a custom, optimization, Rule named. Propagate. Aggregation. Rule and, appended.
it to the original set of rules. This custom rule is executed after the original optimization rules in the Catalyst optimizer have finished. This is an example query that retrieves aggregate values grouped by year. The original plan is executed in the following order. First, the Relation plan builds an RDD from the data source, and the LogicalRelation defines the attributes in the data source. Next, the data of the RDD is refined by the Filter plan, and the columns are selected by the Project plan. Finally, the pruned and filtered data is aggregated by the Aggregate plan.

In order to push down aggregation operations, we created a new relation class named RelationForAggregation and replaced the original relation class with it. Then we added new logic that lets Apache Spark push down the expression trees of the group-by and aggregate functions to the new relation. This looks simple, but we faced many problems implementing the idea, because current Apache Spark is not designed to push down aggregation operations conveniently. In more detail, our custom rule transforms the expression trees into attributes and wraps them inside a Project plan. The new RelationForAggregation then receives the pushed-down expression trees for aggregation, encapsulated by the attributes in the Project plan, and the LogicalRelation is also changed to reflect the updated attributes in the Project plan. The box at the bottom right of this slide shows the query's EXPLAIN output, which shows the final optimized logical plan customized by our new optimization rule.

Inside our data source, FlashBase handles the pushed-down aggregation operations and expression trees. FlashBase has a columnar data structure with a data layout similar to Apache Arrow, so the computation operations can be easily vectorized. FlashBase forms groups, and for each group, the aggregation operations are executed, accelerated by Intel's SIMD instruction set.

Okay, these are the test results. This is the deep learning training dataset for network quality prediction, with a number of columns and 2.7 billion records. Using this dataset, we conducted our experiment, and the test results are as follows. For normalization in deep learning, we need the minimum, maximum, and average values over the entire dataset. As you can see in this table, the min, max, and average aggregations are accelerated by aggregation pushdown and vectorization, and the performance gain was up to eight times. Next, to produce average values for each time interval from the original raw data, which has a 10-second interval, group-by aggregation operations are required. Since group-by aggregation requires creating a hash table inside the FlashBase data store, the performance gain was smaller, but it was still about two times faster than the original. This is ongoing work, and the performance keeps improving.

We also broke down the performance gain of the aggregation test. The gain from vectorized processing with SIMD instructions was about 1.5 times. The dominant factor of the acceleration is aggregation pushdown itself, at about five times, by reducing the data transfer size and the later computations; if the two factors compose multiplicatively, 1.5 × 5 ≈ 7.5, consistent with the up-to-eight-times gain above. Theoretically, SIMD instructions can enhance performance by two to three times. We discovered that the smaller gain from our vectorized processing came from inefficiency
in FlashBase scan operations. We are addressing this issue now, so we expect more gain from vectorization. Okay, that's all for my presentation. From this point, I'm going to hand it over to Jason. Thank you.

Thank you. As mentioned, we have been working with SKT to enable unified, end-to-end big data analytics and AI pipelines for SKT's network quality prediction, so that the entire analytics, deep learning training, and inference pipeline can run on the same Spark cluster, scale out transparently across a large cluster, and do everything in memory. First, I'll talk about the unified software architecture we have been building with SKT using Analytics Zoo. At Intel, we have been focusing on bringing AI and deep learning to big data. We have open-sourced BigDL, a distributed deep learning framework for Apache Spark: just as big data users write normal Spark code with Spark MLlib, they can now use BigDL to write new distributed deep learning applications on Spark. In addition to BigDL, we have also open-sourced a newer project, Analytics Zoo, which is a high-level software platform built on top of different deep learning frameworks like TensorFlow, PyTorch, Keras, and BigDL, as well as distributed analytics frameworks like Apache Spark, Ray, and Apache Flink. Through it, we bring those deep learning technologies, whether TensorFlow or PyTorch, to the big data platform and run them in an integrated and distributed fashion. This slide gives a very quick overview of Analytics Zoo. As I mentioned, it is built on top of different deep learning frameworks like TensorFlow, PyTorch, OpenVINO, and so on, and it utilizes Spark, Flink, and Ray to build distributed, end-to-end data analytics and AI pipelines.
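The distributed training that BigDL and Analytics Zoo bring to Spark follows, in essence, the synchronous data-parallel pattern described later in this talk: replicate the model, compute local gradients on each partition, and combine them with an in-memory AllReduce. The following is a toy, single-process Python sketch of that pattern, assuming a 1-D linear model and a simulated all-reduce; `local_gradient`, `all_reduce_mean`, and `train` are hypothetical names, and none of this is BigDL or Analytics Zoo API.

```python
"""Toy synchronous data-parallel SGD with a simulated all-reduce step.

Assumptions (illustrative only, not BigDL or Analytics Zoo APIs):
- each worker holds one in-memory partition of (x, y) pairs,
- every worker keeps an identical replica of a 1-D linear model y = w*x + b,
- all-reduce is simulated in one process by summing per-worker results.
"""
from typing import List, Tuple

Partition = List[Tuple[float, float]]


def local_gradient(part: Partition, w: float, b: float) -> Tuple[float, float, int]:
    """Summed squared-error gradients for one partition, plus its size."""
    gw = gb = 0.0
    for x, y in part:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw, gb, len(part)


def all_reduce_mean(grads: List[Tuple[float, float, int]]) -> Tuple[float, float]:
    """Combine per-worker sums into the global mean gradient that every
    replica would receive after a real all-reduce."""
    count = sum(n for _, _, n in grads)
    return (sum(g for g, _, _ in grads) / count,
            sum(g for _, g, _ in grads) / count)


def train(partitions: List[Partition], lr: float = 0.02,
          steps: int = 2000) -> Tuple[float, float]:
    w, b = 0.0, 0.0  # same initial replica on every worker
    for _ in range(steps):
        # On a real cluster these local computations would run in parallel.
        grads = [local_gradient(p, w, b) for p in partitions]
        gw, gb = all_reduce_mean(grads)
        # Every replica applies the same update, so replicas stay in sync.
        w -= lr * gw
        b -= lr * gb
    return w, b


if __name__ == "__main__":
    data = [(float(x), 3.0 * x + 1.0) for x in range(10)]  # y = 3x + 1
    w, b = train([data[:4], data[4:7], data[7:]])
    print(round(w, 3), round(b, 3))  # prints: 3.0 1.0
```

Because the all-reduce divides the summed gradients by the total record count, splitting the data into three partitions yields the same updates (up to floating-point rounding) as training on a single partition, which is what lets this kind of pipeline scale out without changing the model code.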
Inside Analytics Zoo, there are three layers. The bottom layer is the so-called integrated analytics and AI pipeline, a horizontal layer that allows users to apply AI and deep learning models, like TensorFlow or PyTorch, on Spark in a distributed fashion. On top of the bottom layer, there is an automated ML workflow layer, which provides automated processing for many machine learning tasks; for instance, we have built AutoML support for time series analysis. At the top layer, there is also a set of built-in models, such as recommendation models, time series models, and so on, which people can directly use in their workflows and pipelines. One can also use standard TensorFlow or PyTorch models directly with Analytics Zoo. Analytics Zoo is an open-source project, and you can refer to the GitHub link on the slides.

So let's look at some concrete examples of what we mean by applying deep learning or AI models to your Spark data pipeline in a distributed fashion. This example is actually the SKT use case. We allow users to apply distributed TensorFlow models directly: you write your TensorFlow code inline with your Spark code, and it then runs distributed across your Spark cluster. As you can see, users can use normal Spark code to process the data and get an RDD or a DataFrame. We then provide an API that takes that distributed dataset across the cluster and converts it into, essentially, an RDD of TensorFlow tensors, all in memory and distributed across the cluster. Then you can use standard TensorFlow code, in this case some TF-Slim code, to build a TensorFlow model, and after that, you can use the API provided by Analytics Zoo for distributed training or inference. Under the hood, we essentially
replicate the TensorFlow model and ship it across the cluster. On each worker, we fill the partition with the TensorFlow tensors we prepared previously and feed that data into the TensorFlow model, so we can compute the local gradients on each worker. We also provide an in-memory AllReduce layer, built on top of RDDs and the Spark block manager, to perform in-memory AllReduce for parameter synchronization across the workers. So, essentially, we can transparently run distributed training or inference with just standard TensorFlow models, while using Spark for data processing, with an RDD or DataFrame as the input. In addition, a lot of our users are really, you
know, Spark users who write their programs with Spark DataFrames and Spark ML pipelines, and we allow those users to directly use deep learning models, like Keras models in this example, inside their Spark ML pipelines.

So let's look at the SKT use case that was just mentioned. After all the data is stored in your big data cluster, FlashBase in SKT's case, you can process your data using Spark SQL. Previously, you would need to somehow export the data, maybe to a CSV file, as a pre-processing step, and then copy it to a different GPU cluster, moving it between those clusters. Then you process your data, maybe with pandas, maybe with Dask, and run the training and inference on that data. So essentially, you need to manage multiple frameworks and multiple separate clusters, and you basically have two separate workflows that need some glue code to tie them together.

We worked with SKT to transform that into a unified, end-to-end architecture. You can run the pre-processing using Spark SQL, with everything in memory, and get a set of RDDs; after that, you use Analytics Zoo to take the TensorFlow model and run distributed training or inference directly on those RDDs, again in memory. As you can see, there are a lot of benefits to running a unified, end-to-end pipeline with Analytics Zoo. First of all, you reduce a lot of overhead in terms of data export, copying, and so on. Second, it makes the entire pipeline much more productive, because you are essentially writing one program, a single Spark job, which runs on your Spark cluster, and everything is automated. So let's
look at some of the performance improvements from using Analytics Zoo compared to using a separate GPU cluster. On the left side is the comparison of inference performance. You can see that as we move from a separate GPU cluster to an integrated pipeline with Analytics Zoo, even on a single node, we get about a 3x speedup for the end-to-end inference pipeline. Analytics Zoo also allows you to transparently scale out your inference pipeline: users do not need to change any code, and we can leverage Spark to transparently run the inference pipeline on three nodes, which gives an additional speedup. On the right side is the comparison of training performance. You can see that using one server, the training performance is about the same as one GPU card. Again, the benefit of Analytics Zoo is that it allows you to transparently scale out your training pipeline; users do not need to change any code, and it can transparently scale to distributed training on three nodes. In this case, we again get about a 3x speedup.

So we have been working with SKT on this end-to-end Spark analytics and deep learning, in particular distributed TensorFlow, in an integrated and unified pipeline with Analytics Zoo. As future work, we are looking at additional solutions to improve time series analysis in a new project called Project Zouwu, which we have just kicked off. It's an open-source project built on top of Analytics Zoo, and it provides a high-level application framework for users to easily develop time series analysis applications. As mentioned before, Analytics Zoo provides integrated pipelines, ML workflows, and built-in models; on top of that, we have put together a set of time series use cases and built-in models for time series analysis, as well as a module called AutoTS, which is essentially an AutoML framework specifically for time series analysis. It automatically generates the features, models, and hyperparameters for your time series analysis. Project Zouwu is an open-source project, and you can check it out via the link at the bottom right. This project allows you to easily build your distributed
time series analysis pipelines, including feature engineering as well as different models, with Spark. In particular, we have built a lot of use cases for common telco applications, such as network traffic forecasting, anomaly detection, and so on. It is optimized and scalable, using Spark running on large clusters of Xeon servers; it integrates libraries such as TensorFlow, PyTorch, and Keras, and it scales out transparently to a large cluster. So yes, that's future work, work in progress we're currently doing, and you're welcome to look at the details on our website. Thanks for listening to our presentation.