Hi, good afternoon, and thanks for coming to the session today. We're going to share the success story between Intel, AWS, and DBS on running this solution, and you're going to hear a lot from our speakers about the project itself. Unfortunately, the DBS executive director, Gangu, had a last-minute change and is not able to make it, so he recorded a video, and our AWS expert will finish the presentation on his behalf.

Before I start, I just want to give you a bit of an overview of the AWS and Intel relationship. We have been together since the start of AWS. You can imagine that 18 years ago the only silicon you could talk about was Intel, so today about 60 to 70% of the instances inside AWS run on Intel. The work we do with them is not just EC2: we help with optimization, and we also have new services rolling out from here. I'm quite sure my next speaker, Akanksha, will talk a lot about that. For those who are looking at jumping into the cloud journey with AWS, we have programs to support you. We have a program for partners and a program for users, and basically they give you a bit of funding to address some of your concerns when you move into the cloud journey. Without further ado, I'm going to invite my next speaker, and we'll delve into the details of the AWS and DBS partnership.

Thank you, Jason. Hello everyone, and thank you for coming today. It's late in the afternoon on Thursday, so hopefully after this you're going to go and enjoy re:Play. My name is Anan. I lead the worldwide HPC and quantum specialist solutions architect team. We are part of the advanced computing team. Our team works closely with customers in many industries and helps them migrate their HPC workloads to AWS. In addition, we work with partners like Intel to help the customer, in this case DBS Bank, migrate and build their workloads on AWS.
Today I want to briefly talk about financial risk management and risk modeling, which are some of the most popular workloads for migration and bursting to the cloud. Financial services organizations like banks rely on HPC calculations to calculate their risk, run portfolio analysis, and provide reports to internal control or external regulators. Hedge funds leverage HPC for complex quantitative risk analysis in particular areas like high-frequency trading or AI-driven strategies. There are a number of regulations in financial services that require large-scale simulation using HPC. For example, FRTB, the Fundamental Review of the Trading Book, requires a 3x to 10x increase in compute capacity in the next few years, and CCAR, or IFRS 17 for insurance, all require large-scale simulation. We have many different financial services customers running on AWS, and here are some examples of customers that have migrated their HPC workloads to AWS.

While driving this transformation, we have listened to customers' pain points in running their HPC workloads on AWS. Customers have told us that they want to reduce runtime, innovate faster, meet regulatory requirements, and increase the number of simulations for market risk. They want to take advantage of large-scale simulation using the latest and greatest CPUs, GPUs, and storage, and also fast networking, on AWS. They also want to modernize their algorithms, for example by rewriting their CPU applications for GPUs. They want to focus on their business and, last but not least, reduce IT complexity and save cost.

At AWS we have a complete set of HPC services, from HPC orchestration to cluster management, like AWS Batch or AWS ParallelCluster, which many financial services customers have been using to run their grid workloads. We also provide a large set of HPC-optimized instances, the low-latency, high-throughput EFA network, and different types of file systems such as FSx for Lustre, EFS, or S3. They all meet FSI customer requirements. I want to
go briefly over AWS Batch. We introduced AWS Batch quite some time ago. AWS Batch is a fully managed batch computing service that plans, schedules, and runs your large-scale HPC workloads on AWS across the full range of AWS compute offerings, such as Amazon ECS, Amazon EKS, AWS Fargate, and Spot and On-Demand Instances. We have customers running millions of cores. For example, the [unclear] Institute ran 2.1 million cores on the Graviton chipset to do DNA analysis, and we have many financial customers using AWS Batch for their risk simulation, for example AQR and Numeric.

Another solution that we have is AWS ParallelCluster. This is an open-source, Slurm-based HPC solution that we provide for users. You can download it and run AWS ParallelCluster from the command line, integrated with our AWS services like FSx for Lustre, EC2, EFA, and NICE DCV. We have many financial customers running AWS ParallelCluster, either quickly prototyping with Slurm or migrating their on-prem workloads to AWS.

In addition to our current offering of AWS services and solutions like AWS Batch and AWS ParallelCluster, and to address the new requirements and business drivers that I mentioned previously, we have just launched a new managed HPC service called AWS Parallel Computing Service, or PCS. PCS simplifies cluster setup, administration, and access management for the user. Some of the key features: we manage cluster updates and version upgrades for you; we support the most popular schedulers, like Slurm, and in the future we are going to integrate more schedulers into PCS; we provide unified compute and remote visualization management; and we allow users to dynamically provision and scale resources. Last but not least, we provide a rich set of telemetry, cost management, and budgeting. How does AWS PCS address the key customer needs? It allows HPC... I'm sorry, it's, it's
automatically... I'm sorry, I'm very sorry, everyone, give me a few seconds. All right, so I can do this. It allows HPC administrators to maintain the cluster and support users with rich telemetry and diagnostics, and also to seamlessly upgrade the... Do you want to stop for a few minutes until I get it fixed? No, I think we can keep going. Sorry about this.

All right, so last but not least, one important thing I want to point out is that security at AWS is our job zero. We provide the most secure infrastructure and also a full set of security automation for users to build secure infrastructure and applications on AWS.

All right. AWS won six awards at Supercomputing in Atlanta a few weeks ago, and best HPC cloud platform for seven years running. I would also like to congratulate DBS Bank on winning the best use of HPC in financial services.

Hello everybody, it's Gangu from DBS Bank. In this video I would like to quickly run through how we transformed our quant and risk pricing engine in the past few years with help from AWS and Intel technologies. This is today's agenda. I will introduce a bit about DBS and our team. Then I will share the business challenge facing our industry, which is the pricing needs. Then I will share how we developed QPE, our in-house solution to address all the challenges. Then I'll share the design, the outcome, etc. Finally, I'll share the Intel and AWS technologies that have enabled this development. That's it.

Okay, so this slide is a quick introduction to my bank, DBS. We are the biggest bank in Southeast Asia, and in the past 10 years we have invested heavily in digitalization and have achieved a lot of outcomes that are well recognized in the industry. My department, Global Financial Markets, handles DBS's market trading. We are a leading player in many asset classes,
providing full service to our banking customers, from retail to corporate to financial institutions. My own team, the quant and tech team, builds pricing models and pricing engines to enable and scale up our trading business. Later I'll introduce a bit more of our business scenario. Let me proceed.

Okay, I need to share with you what pricing is. When the bank sells a financial product to a customer, or when we do a trade with a counterparty, we first need to know the fair value of the product. How do we achieve this? The industry practice is to use math or AI models, so a lot of algorithms and a lot of calculation are involved in all this pricing. The numbers that come out of the pricing are very important, as they decide whether the bank can sell or trade at a suitable price, and whether our customers can purchase a financial product at a fair price. Even after the trade is done, let's say the trade runs 3 years or 10 years or even 50 years, our traders and risk managers still need to do pricing every day, or even every moment, so that they know the current value of our positions. With that, our traders can do risk management. Let's say the market moves: they need to reprice to know our exposure, and then they can hedge. Without proper hedging, we may end up like Lehman Brothers. Okay, let me proceed again.

Let me introduce why this quant pricing load is a suitable load for HPC on the cloud. Firstly, we use a lot of simulation, Monte Carlo simulation, to do the pricing. This pricing demands a lot of calculation, because we may use 30,000 or even 100,000 simulation paths, which means we do a lot of iterations to determine the price. With this you can see that a lot of calculation capacity is required. Then, every day we have a very large amount of pricing, whether it is customer pricing or revaluation for traders or risk managers. We do hundreds of millions of pricings a day, and each pricing can be a
pricing job of some size in itself. Then, all the pricings are independent, meaning pricing A, pricing B, and pricing C can run 100% separately. This also enables us to move the load to the cloud, because we don't need to link them together; we can distribute the jobs to different places and different servers. Then, our load is very dynamic. When customers want to price something, or when the market moves and we need to do a lot of risk revaluation, there is a big pricing load, and at that point we need big computing capacity. It's dynamic, and you can see why it's suitable for the cloud, because the cloud is on demand. Lastly, the bank restricts customer or counterparty information from being shared with external parties, but luckily pricing involves only numbers and numeric data, so we can pass all of it to a cloud provider for calculation without any worry about data breach issues.

Okay, then let me share the technology challenges my industry faces. Due to growing business needs, we need to develop more and more complex quant and math models, and these models require a lot more computing resources. Also, because the market is now more and more digitized, it's easier for our customers to compare prices between different providers, so they demand very fast pricing. Our traders and risk managers also demand real-time P&L and risk calculation so that they can take action immediately. All this leads to the need for a powerful and flexible pricing engine.

But we had some challenges with our on-prem infra and our legacy pricing engines. Why? Previously we spent a lot to build our on-prem infra, but the issue is that the bank can only have a certain capacity, because it's fixed. We cannot grow, and if we need to grow, we need to clear a lot of procedures, and the time to market is usually quite long. It's also quite costly to maintain on-prem: you need to support it yourself, with support staff, the server room,
the data patches, the server patches, whatever. Also, we cannot have access to the latest infra, because the bank takes time to bring it in. As for the legacy pricing engines, because they are vendor-provided, sometimes it takes very long for the vendors to deliver new products to us, and they also charge us licensing costs. It's also difficult to scale up performance, because, for example, a lot of software is charged per CPU. With this limitation you cannot scale up as you wish: because we have only paid for, let's say, 1,000 or 2,000 copies, we cannot scale up easily. So this limited our growth.

How do we address that? Our solution is QPE. We built a pricing engine by ourselves, 100% in-house, and we integrated our existing math and AI/ML models. We built it to be very flexible, with APIs and microservices, so it can be easily integrated into DBS's existing trading systems. We built it on AWS with Intel technologies, so it's 100% cloud native, which means we can scale up our performance without any limitation. We already see the performance: we are doing 100 to 200 million pricings a day, and at the peak we can support more than 10,000 per second. We also use GPUs, because AWS enables us to use the latest GPU technology, so we built our CUDA speed-up. With that we can achieve 500 times acceleration, so our customer pricing can be much faster. Finally, QPE in DBS has been powering more than a dozen different trading systems, so it's already generating a lot of business outcomes.

Okay, so this is our journey since 2018, since we embarked on this journey together with AWS and Intel. We started with some asset classes. Then, with the quick win and the success seen by our management and our traders, we got more resources to quickly develop more products and more asset classes. Through the years we have covered almost all of DBS's trading pricing needs, so now it's a central
pricing engine for the bank. We cover most product pricing, even XVA, even data calibration, covering FX, EQ, IR, credit, etc. And along the way we have always been building new products and new features.

This slide is a quick and simplified view of our architecture. You can see it's not very complex, because we prefer to keep it simple. We use Redis, which is ElastiCache, to serve as our message queue, and also as our data cache and job queue. Then we use different CPU and GPU workers that talk to Redis to fetch data, fetch jobs, and return results. We have a very lightweight system design so that we can achieve very high pricing efficiency. Later we'll talk about that more. In the design we employ different AWS services, from ECS and EC2 to ElastiCache and others.

So again, thank you, Gangu. As you have seen, Gangu walked through the journey of building the cloud-native QPE application and HPC platform on the cloud with AWS and Intel technology. Before I go deeper into the technology, I would like to highlight one thing. HPC has five pillars: application, orchestration, compute, data storage, and network. Your HPC cluster or application will run only as fast as the slowest of those five pillars. The DBS team has spent a lot of time and effort optimizing each pillar to maximize performance.

QPE uses many different technologies. Some highlights: the application is developed in C++, [unclear], and CUDA to implement the pricing models. The application is also REST API driven, and the team uses a Redis ElastiCache cluster for in-memory caching and data loading.

Resilience framework. DBS has implemented a very robust resilience framework for its QPE platform. There has been no production downtime since 2019 and no need for regular maintenance. To increase service availability, they use multiple instances for high availability and also automatically detect a failed instance and replace it. To
deal with job failure, which is very important for HPC applications, they built their application with job-failure protection and auto-resume when the instance comes back online. In addition, they implemented a well-architected infrastructure on AWS by using multiple AZs in Singapore, combined with their on-premises private cloud running on OpenShift.

Performance requirements for a modern pricing engine. DBS worked backward from the business drivers. They need to price RFQs super fast to stay competitive, they need to process super large loads for risk monitoring, they have little tolerance for job failure, and they are also very cost sensitive.

Potential performance bottlenecks. As I mentioned before, identifying bottlenecks in HPC is a very critical step in maximizing performance. To understand their job characteristics, the DBS team ran C++ code profiling with Intel's C++ compiler and optimization tools, and also adopted GPUs to accelerate application performance. They also did memory profiling to optimize the memory footprint. Networking is also critical for scaling and increasing application performance; there they use message sharding and message compression. Last but not least, to reduce total reads and writes, they implemented the Redis ElastiCache cluster for fast data loading, which also speeds up the application.

Another performance-tuning topic is the QPE scaling method. This is very important, because an HPC team needs to know when to scale up and when to scale down. The first technique the DBS team uses is what they call CPU-utilization-based scaling. It is optimized for total throughput, like risk reporting, balances cost and performance, and is suitable for large clusters. Another method they use is time-based scheduling, optimized for minimal response time and RFQ pricing, and more suitable for smaller clusters. The last method they use is fixed capacity. This method is called the hot
pool technique, meaning that they pre-warm a pool of capacity before the job runs so that processing can start immediately. This is for when you have an immediate need for capacity and are not cost sensitive; the job is going to run immediately. Looking at this graph, you will see the QPE platform auto scaling: when the load goes up, it scales up, and when there is no job running, the infrastructure scales down completely to zero.

As I mentioned previously, the DBS team has done multiple optimizations end to end, from CPU to memory to networking. They also paid attention to data loading time by implementing the Redis cluster for faster loading and to save compute cost. These are some of the ways they tune the Redis cluster: for example, using Redis monitoring on the cluster to analyze the load on Redis, analyzing the CPU and I/O of the Redis nodes, and using a partitioning technique to subdivide hot keys on the Redis ElastiCache cluster.

Performance: maximize CPU utilization. One of the most important goals in HPC is to achieve 100% CPU utilization. We have seen that many on-prem HPC clusters barely achieve 25 to 40% CPU utilization. Consequently, they waste a lot of CPU cycles and also memory. In the graph here, the lower graph is the load from the application, and as you can see, the QPE platform always achieves 100% CPU utilization.

Performance: speed-up from Intel technology. The DBS quant team has been working closely with the AWS and Intel teams to optimize the QPE application for Intel architecture. In the past two years QPE got a boost from Intel technology, for example newer CPUs and the C/C++ compiler. Here is a summary: they are using C7i instances on AWS, powered by Intel
4th Generation Xeon Scalable processors, code-named Sapphire Rapids. They also use Intel CPU parallel processing capability, C++ compiler technology, and Intel profiling tools that help them optimize application performance.

Speed-up from GPUs. This is very interesting because, as I mentioned before, customers would love to use the different CPU and GPU architectures on AWS, and the DBS team has used GPUs to speed up their pricing models, especially EQ and FX derivative RFQ pricing. Some highlights. By nature, a GPU uses thousands of small cores to run parallel computing, which is suitable for algorithms with many independent calculations, like Monte Carlo simulation. They also have to consider cost versus speed: on GPUs they can achieve 50 to 500x faster performance, but it is also 50 to 200x more expensive, so they need to balance cost and performance. They also need to build special models that can run on GPUs. Some of the metrics are on the slide: RFQ pricing was reduced from 1 second to the order of milliseconds, run times were reduced from minutes to under a second, and the FXO MLV calibration was reduced from hours to minutes by running on GPUs.

Additional techniques. As you can see, they are tuning the application, orchestration layer, compute, networking, and storage. They also apply other techniques like bandwidth throttling, load balancing, overlapping async workflows to avoid bottlenecks, and cross-job data sharing to reduce duplicate computing.

Balance is critical. We cannot design a system in a single go, because there must be a balance, including taking build effort into consideration. The DBS team has put a lot of effort into balancing business drivers against technical optimization. For example, they need to balance performance, reliability, maintainability, build effort, and infrastructure cost. Some examples: CPU versus GPU, cost versus speed,
whether to cache more data or less data, and how they decide between scaling up and scaling down based on the application requirements and on how cost sensitive the workload is.

QPE performance. Some numbers: most customer RFQ pricing can be processed in a fraction of a second, up to about 1 second. Daily they price hundreds of thousands of RFQs for customers, and they can serve hundreds of millions of risk revaluations, with overhead of less than 2%.

All right. QPE cuts vendor licensing cost. Because QPE is a 100% in-house solution, DBS successfully removed vendor software licensing and saved a lot on licensing cost.

QPE shortens the time to market for building new products. Because they are an in-house team, communication between teams is quick and efficient, so they can align on priorities quickly. With modeling capability built over the past two decades, they can reuse a lot of existing work from the previous generation. They also built a robust foundation: a reliable, resilient framework and a flexible design using APIs and microservices for expandability.

How QPE enables business growth. By using the QPE infrastructure and HPC on AWS, they get fast pricing speed and real-time customer RFQ pricing, which helps increase customer satisfaction and helps DBS strengthen its market value. Real-time refresh also enables traders to better manage their risk and respond to market movements. It also allows them to come to market faster, with a lower infrastructure bill from the cost savings.

How QPE development benefits from AWS and Intel technology. First of all, flexibility: the flexibility to choose from CPU and GPU instances, virtually unlimited capacity, and access to the latest technology. Using these different options, they can build a better application and HPC solution for the bank. They also use Amazon
ECS to reduce the integration work on AWS. So now I would like to invite Akanksha from Intel to come up and talk about the AWS and Intel services engagement. Thank you.

Thank you, Anan. Good afternoon, everybody. Wow, do you guys want to come in a little closer? This is a very large room with very few people. I guess we're standing between this session and re:Play, so we'll try to make the rest of this as valuable as what we have already kicked off with Gangu and Anan. I run Team Amazon at Intel, and I'm really excited to be here. I know it's the last day, but it's been an amazing journey at AWS re:Invent. I hope you guys have had fun, and I hope you have learned with us.

In the industry, time and time again, we recognize that when a technology is accessible and consumable, and when the barriers to its adoption are low, it gives rise to a tremendous amount of demand, and through that demand arise various business models, various usage models, and ultimately opportunity. Cloud computing continues to bend the curve on what is possible, and AWS is no exception, so AWS is one of our favorite partners. There's also something else that we recognize as Intel. In the world of cloud, there are three things that Intel brings to the table to make cloud actually come into being. Number one, performance. Number two, performance. Number three, any guesses? Performance. Performance, as well as cost and TCO efficiency, is so important when you guys are taking your journey on the AWS cloud.

Exactly to address that, we at Team Amazon do two kinds of things with AWS. One side of us makes sure that our latest and greatest processors are customized specifically for AWS's cloud. As Jason mentioned, we have close to 400 or 450 instances, and you would have heard Matt Garman talk about a couple of new instances that we've launched: we launched the I7ie instance today, as well as the U7 instance for SAP. But basically what we do with AWS is make sure
that they get access to the latest and greatest of our instances and our technologies, so that we are able, as we have been for the past 18 years, to customize for your requirements, your application needs, and what you want to move from your different worlds into the AWS cloud. Regardless of which service you're using, even if you don't want to know the specific instances, just know that Intel is normally under the hood of the EC2 environments that you guys play around with. If you look at the slide that I have, that holds regardless of the applications you are driving or using within your environment, whether it's SAP, storage, memory optimization, or the HPC services that we've talked about.

So, as you can see, we are making sure that we have optimized the journey. But once we have optimized AWS's cloud with our instances, there is the other side of Team Amazon at Intel, which works with you guys, our customers and partners, to help you with the optimization journey. You saw a beautiful example right here. Finance continues to be a very strong industry in adopting cloud, and the work that we've done with DBS is pretty special. This exact optimization journey is what Team Amazon, on the other side of the world, wants to make sure we are offering in partnership with our favorite friends, AWS.

The typical journey that DBS took, and of course the rest of our customers too, is really threefold. Again, back to your response: if we do the benchmarking, do the optimization, execute the move on your cloud journey, and then validate it by modernizing the availability of your applications, we can help you get the best cost benefit as well as TCO for your cloud environment or cloud infrastructure. It doesn't really matter which application: there's always some Intel
instance under there that we can unlock and unleash the juice of, with the features that are built into our instances, and make sure you see that end journey come alive.

Now, we're at AWS re:Invent, and this has probably been the war cry, or rallying cry, of every event that you guys have been attending. re:Invent has definitely been talking about how AI becomes a reality and how gen AI is going to transform the world. But I'd like to take this from a very different perspective: while we have AI, it's really all about data. We've got a Forrester survey that says that by 2026 there are going to be 500 billion devices and over 7,000 billion sensors smartly connected over the internet, and 45% of those devices are going to communicate machine to machine. So whether it's machine learning, deep learning, or gen AI, we've got to figure out how to make that data actually work for you and give you the insight to build better products and be more successful in the market. That makes it all the more crucial that you validate the use cases, validate the reason why you're taking on a gen AI project, and then make sure you've got a good community of people with the right skill sets to make it happen. Regardless of which industry you sit in, gen AI is a reality, and machine learning and deep learning are a reality.

Now, what does Intel bring to the table for AI? Anan actually talked about it, and so did Gangu: it's the conversation about choice, cost efficiency, and performance. Whenever you guys pick up a gen AI project, the automatic default is a GPU instance. Every GPU instance has a CPU under it, and when you use Intel CPUs, we have a feature called AMX, Advanced Matrix Extensions. These are AI features that, if we unlock them for your applications across your entire trained-model journey, will help you boost your
inference. Once you have faster inference, your end production is going to be cheaper and faster, just because you've unlocked a little bit of juice from our CPU instances. That's the myth we want to bust: not every journey, not every LLM, requires heavy, expensive accelerators. You can run them in your existing environment within the CPU space.

In addition to the fact that AI is everywhere and has been democratized, another field has been democratized as well. I was just at Supercomputing, holding the award along with Ian, the HPC award that AWS won. HPC has also become mainstream. Are any of you guys in the HPC world? Weather, education, research? Okay. Well, it has become so mainstream that you possibly need HPC within your world. If you have a lot of number crunching to do, especially when there is data, you're possibly going to have to do a little more HPC adoption, whether it's predicting the weather, curing cancer, doing better drug testing, or making better products, or benches and chairs, in manufacturing.

Regardless of the HPC application, and I know, Anan, that you shared the compilers and the performance analyzers that we have, this is the actual stack that we applied with Gangu and his technical team at DBS to make sure that the performance optimization I've been talking about comes alive. A lot of you might not even know that while Intel has silicon, what makes the silicon work is the software that sits on top of it. For, I think, about 50 years, Intel has been building the best compilers on this planet. But in addition to the compilers, under our oneAPI toolkit we also have performance analyzers, we have math kernel libraries that make sure your math functions are optimized too, and we have cluster tools to make sure that, if you've got clusters, you get the right
recommendations to ensure that your cluster is working fast, far, and wide. Working with the oneAPI toolkit in your environments would be a super easy thing for us to do and enable for you. So again, join us on the engineering journey. AWS has all of the toolkits, and so do we.

So really, the key takeaways from today's session, you guys: we're very excited every single time we have a success story like DBS on stage, but we'd love to see more. While AWS takes you on the migration journey that you are embarking on, we want to make sure that, if there are Intel instances under the hood, we work, collaborate, and partner with AWS to make that journey more successful, faster, and more performant for you. So we would love to open the stage up so that you guys can come in and collaborate with us. We appreciate the partnership from AWS; thank you, Anan, and thank you to the HPC team. We appreciate the partnership from DBS as well, and we hope we can do more with the workloads they move to AWS. And let's make sure that you guys take advantage of the existing solutions and recipes that we've built across the different industries and customers we have worked with. Really, that is the big point at the end of the day.

In addition to that, I would like to invite Anan and Jason back on stage to see if there are any questions.

Okay, awesome. Well, thank you so much, you guys. I know we've got two more hours. Come visit us at the booth, because you don't want to just get ready for re:Play instead. We've got some cool demos to show you of the different optimizations across different industries. Learn more about us at amazon.com/intel, and let's keep collaborating. Have a good rest of your day. Thank you, thank you.
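
The pricing workload described in the talk (Monte Carlo simulation over tens of thousands of paths, with every pricing request independent of the others) can be illustrated with a minimal sketch. This is not DBS's QPE code. The product (a European call under a lognormal model), the function names `mc_european_call`, `black_scholes_call`, and `price_batch`, and all parameter values are assumptions chosen purely for illustration. A thread pool stands in for the separate CPU and GPU workers mentioned in the talk.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def mc_european_call(spot, strike, rate, vol, maturity, n_paths=30_000, seed=0):
    """Monte Carlo price of a European call under a lognormal model (illustrative)."""
    rng = random.Random(seed)  # private generator: each job is reproducible and independent
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)
    payoff_sum = 0.0
    for _ in range(n_paths):
        # Simulate one terminal price and accumulate the call payoff.
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        payoff_sum += max(terminal - strike, 0.0)
    # Discount the average payoff back to today.
    return math.exp(-rate * maturity) * payoff_sum / n_paths


def black_scholes_call(spot, strike, rate, vol, maturity):
    """Closed-form reference price, used only to sanity-check the simulation."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t

    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return spot * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d2)


def price_batch(requests, workers=4):
    """Price independent requests concurrently; results come back in request order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda req: mc_european_call(**req), requests))


if __name__ == "__main__":
    requests = [
        {"spot": 100.0, "strike": strike, "rate": 0.02, "vol": 0.2, "maturity": 1.0, "seed": i}
        for i, strike in enumerate((90.0, 100.0, 110.0))
    ]
    for req, price in zip(requests, price_batch(requests)):
        reference = black_scholes_call(
            req["spot"], req["strike"], req["rate"], req["vol"], req["maturity"]
        )
        print(f"strike={req['strike']:.0f} mc={price:.3f} closed_form={reference:.3f}")
```

Because no request reads another request's state, the batch can be split across any number of workers without coordination, which is the property the talk gives as the reason this load suits the cloud. Python threads do not give real CPU parallelism here; in a real deployment each request would go to a separate worker process or machine.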
2024-12-14 04:41