AWS re:Invent 2024 - Generative AI–powered graph for network digital twin (TLC202)

Hi, welcome everybody. Every few years, technology presents an opportunity to completely reshape and transform how we build networks, run them, and operate them, and that's what we're going to talk about today. Welcome to TLC202. My name is Robin Harwani. I lead the worldwide technology teams focused on our customers and partners within the Telecommunications IBU, and with me I have Olivier. Why don't you introduce yourself, Olivier?

Yeah, hello, nice to meet you. I'm Olivier Simon, VP of smart networks and data at Orange.

Thank you. And Imen, the visionary who started this effort within AWS.

Hi everyone, my name is Imen Grida Ben Yahia, principal AI/ML solutions architect in the IBU, in charge of the AI and network domain.

Thank you, Imen. Thank you, Olivier. So, what we're going to look at today: generative AI-powered graph for network digital twins, building smarter networks by leveraging technologies available today that were just not possible before. If you look at a telecommunications operator that runs a worldwide network across multiple countries, such as Orange, operating in 23 countries, they have unique challenges: problems that are geographical, policy, governance, and so many different facets that they need to plan for. What we're going to talk about today is the technology collaboration that we have had with Orange over the last years and the problem set that Orange is dealing with across the world. This is by far a very, very complex network that operates across multiple countries, multiple geographies, different parts of the world, across continents. We're going to talk about the challenges we face there, and Olivier is going to walk us through that. After that, Imen is going to walk us through the problem framing and the solution we are looking at, from graphs and neural networks as well as generative AI technologies and how they come into it, and then demonstrate how all of this works for a network that has
thousands and thousands of network elements across access, transport, and core networks. That's going to come up in the mobility scenario, but it can be applied to any network: fixed, enterprise, mobility, you name it. So we're looking forward to sharing what we have learned in this space and then going into the depth.

Orange and AWS have been partners at least for the last six or seven years that I have been at AWS, and we've had a deep partnership across Orange Business and Orange technology teams as well as within the network domain. Over the last few years, we have been collectively working towards supporting and enabling the Orange team in their aim to reach level four in terms of autonomous networks by 2025, and I'll share in just a little bit what that means. On solution collaboration for paths to production, we want to leverage the latest and greatest technologies, the capabilities available in the network as well as in AWS, to bring the autonomous network vision to life.

So what does autonomous network truly mean? Autonomous networks, based on the guidelines from industry bodies like TM Forum, describe a level of maturity where the network is able to detect breaches and make sure that there is automation within observability, management, understanding of faults, and root cause analysis, bringing to life failure detection and, eventually, failure remediation. These are stages of maturity that have been discussed in the industry, but the advent of artificial intelligence and generative AI, in conjunction with technologies like graph neural networks, makes it possible to turn this into a reality. I'll hand it over to Olivier to brief us on how they see their network, the challenges they face across these 23 countries, and how they have started the innovation with us on AI.

Sure, thank you, Robin. So just a quick look at Orange for those who are not familiar with us. We are
a telco operator. We operate mobile and fixed networks across 26 countries. These 26 countries are mainly in Europe and in the MEA region, mostly in Africa for the MEA region, plus Jordan and Egypt, which are a bit aside. We have almost 300 million customers (hopefully we will cross the bar soon), more than 100,000 employees, and we are the eighth global telecom brand in the world.

What I think you need to understand about telcos, if you are not familiar, is the size of our network. The first thing is that we have a lot of different equipment in different network domains, like transport, core network, and radio access network. We have more than 1 million pieces of equipment, and we have to manage them: we have to understand when they are healthy and when they are unhealthy. The complexity is that their profiles are really different. If you consider a base station in Paris, for instance, in a very dense city environment, and a base station in Guinea, maybe in a rural environment, the shape and the patterns of the traffic are clearly very different. It means that detecting anomalies with a static methodology, for instance, just does not work, because the patterns are so different. This is one first reason why we need AI: the network complexity is getting bigger and bigger, and we still need to manage this equipment, but we clearly need new methodologies that can accommodate these differences between equipment.

Another thing is the volume of data and the number of types of data that we have. Just another example is what we call the call detail record. Call detail records are the way operators used to bill customers. In the old days of analog calls, a call detail record was generated when I called Imen, for
instance. At the end of the call I had a call detail record that said: okay, you called Imen, this is your number, this is her number, and you called her for maybe two minutes or two hours, I don't know, it depends on the day. Now you have call detail records for everything. When you go to the internet, when you go to Google Maps, and so on, thousands of call detail records are generated. Overall for Orange, the number of call detail records we have every day is more than 1,000 billion, 1 trillion if you can imagine. The volume is also huge: the cumulated volume of all the data generated by equipment and probes is above one petabyte every day. Obviously we are not recording everything, but you can imagine that it's a challenge to manage this level of data. And we have even more types of data that we are not looking at so much, like syslog and call trace records, and there is a huge potential with this data to continue to improve what we're doing with the network.

Finally, we also have millions of alarms. You could ask me: why do you have millions of alarms if your network works properly? The thing is that an alarm is not necessarily an incident. Just another example: if you're driving and you go into a tunnel, maybe you will have a call drop. It generates some alarms, but it's not an incident. So we also have to manage this huge volume of alarms, which cannot be looked at by humans, and this is part of the project that we are doing.

So just a quick look at where we are using AI. The first area, historically, where we started to use AI is network investment. It's about the decision making of where we're putting a new site and where we are adding capacity. To make these decisions correctly, you need a methodology to understand the way the traffic will evolve site by site and
even cell by cell sometimes. AI is good at doing that. You also need the capability to understand the way the customers will evolve. For instance, if there is a shopping mall that will be created in one year, you need to invest there; if you have enterprises that are moving, you need to change your investment plans, and so on. This is the first area where we started to use AI, with a very large return on investment. If you are a telco and you're not doing that, you should: it's a matter of tens or hundreds of millions.

The second area is predictive network maintenance. What does it mean? It means that we have a lot of operations to do on the network, and they can be reactive or predictive. It's better to be predictive, and in order to be predictive you need to be able to predict, and AI is very good at prediction. Here again there is a very large source of savings. One specific area is intervention. Obviously, with these millions of pieces of equipment across a lot of geographies, we have a lot of people going on site to maintain this equipment, to change boards, to perform operations, and so on. For an operator like Orange, the expense that goes in this direction is about two billion a year. If you can avoid useless interventions, then you have immediate benefits in your opex, obviously, and also in your carbon footprint, because in many countries this is the first source of carbon footprint for operators, sometimes, depending on the country, ahead of the network itself.

Then network optimization. We are not using AI only to avoid or manage incidents; we are also using AI to tune the network, especially the radio access network. A radio access network has thousands of parameters. Obviously you can tune them manually, but you do not get the last 10 or 20% of benefits that
you can get with AI, and we've started that.

Finally, network change, which is another process. Network change means that every day we need to introduce new equipment, change some equipment, make software updates, and this kind of thing. Changing something is always a risk, and in order to manage this risk you need to be able to simulate what will happen: what could be the impact, which customers could be impacted, whether they are very high-value customers, so that you apply different procedures. Here again, having a digital twin and doing simulation is a must-have.

All right, just a couple of examples of use cases that we are running in countries. On the left side you have the Network Operation Center: these are the teams that monitor the network and manage some changes remotely. On the right side you have field operations, with a couple of use cases where people have to travel to the different sites. Just one or two examples. Network capacity I talked about already with investment optimization. The project we are working on with AWS is about using a graph representation of the network to do root cause analysis; we will go into the details of this one. We are also using AI to manage ticketing. Ticketing is a basic process that you have at every operator, especially when you have something to do in the field. Tickets, especially incident tickets, can be really complex. It's not just: okay, you have to go to this spot, remove this board, and put in that board. Usually a ticket includes a lot of things about the anomalies that were detected, the alarms, and so on, and a ticket could be 20 pages. It's not like a metro ticket, you know; it's like a book sometimes. And we have started to use gen AI
to basically summarize what is important in the ticket, summarize what is important for the technician to know and do on site, and also enrich the ticket with some documentation, for instance about the brand or the type of equipment that needs to be touched. Maybe one last example: we have also been using AI for two or three years now to detect incidents in voice quality. The idea of this detection is to look at time series of KPIs and to detect whether there's something unexpected in the pattern of a KPI, and this is usually a way to start an investigation. And just one thing that I have not said: we are doing that across all the geographies. France is our first country, and it is also the biggest country and where we have most of the teams, so we usually use France to start projects, but we are obviously doing that in our 26 countries.

So let's focus now on the discussion that we will have today. Here you have a typical example of the way incidents are managed with AI for an operator like Orange. The first step is to detect the incident. It could seem obvious, but it's not obvious at all. I told you some minutes ago that we have millions of alarms, so it's not just looking at a red button starting to flash, because you have red buttons flashing everywhere; it's a Christmas tree, basically. So the first thing is to look at the different metrics that you can collect and to detect that there is something strange, unexpected, or unusual happening, and AI is very good at that. When you are sure that there is something weird happening, the second step is always the same: okay, there is something weird happening, but what is the source of that? What is the
equipment that is problematic, or the system that is problematic? So root cause analysis is the second step. It's clearly the step on which we concentrate the most today, because the first step is now kind of mastered and we are deploying it widely. Root cause analysis is a bit more complex, and Imen will talk about that. The last step is: okay, you have detected that it is this specific equipment that needs to be fixed, but what is to be fixed? What should you do? Do you need to restart, do you need to change the equipment, do you need to make a software update? This is the remediation. AI is starting to be useful here, but I would say it's kind of experimental today. Most of the remediation actions that operators take today are at least confirmed by humans, and usually triggered by humans.

All right, so root cause analysis. This RCA that you see everywhere means root cause analysis; this is the second step that I have described. The way it works today is that first we use time series. These could be time series of KPIs, for instance the call drop rate: you monitor the number of call drops, or the ratio of call drops, and you try to detect that there is something unusual. You can do that with one time series, but usually you need to combine several KPIs, because in order to be sure that there is an incident, one KPI is usually not enough. Finally, you can combine that with other pieces of information, like alarms for instance, and you can also include in this investigation what we call expert rules. So this is basically the way it works today, but there are limitations.

The first limitation that we have is these expert rules. This is really the way operators have historically managed their Network Operation Centers: they collect a lot
of data and they apply some rules on this data. For instance, if this specific KPI crosses this threshold and you have another KPI that is below that threshold, then you need to do this. With time they enrich these rules and try to maintain them, but typically, for a Network Operation Center at Orange, you now have thousands of rules in place. The problem is that this is super difficult to maintain, it's heavily impacted when you introduce a new technology, and it's very static and kind of inefficient: with this methodology it takes a lot of time, especially to do the root cause analysis.

Another problem is that we use correlation a lot. As I said, we have different KPIs and we can correlate them together, and this correlation can help with root cause analysis. But the problem is that correlation is not causation. You can have a correlation between, let's say, eating chocolate and being a Nobel Prize winner; it's probably not because you eat chocolate that you will win a Nobel Prize, you know. It's a bit the same in the network: correlation can somehow help, but it's not at all a good methodology to be sure of your root cause.

Finally, another problem is that a lot of the data we collect comes from individual equipment. But what we have discovered with new methodologies over the last two or three years is that you actually learn more from the relations between equipment than from what is extracted individually from a specific piece of equipment. So we need to analyze the whole system together, with all the relations between the systems, including with a multi-domain approach. Historically you had some people operating the transport network, other people operating the enterprise network, other people operating the radio access network, the core network, and so on, kind of in silos. Really, with the complexity that is
increasing, that is not a sustainable approach. Now we need an approach that puts all the systems together and analyzes the relationships, and this is really the core of what we're doing with AWS. So with that, I give the remote control to Imen.

Thank you, thank you, Olivier. So as Olivier was mentioning, correlation is not causation, and the network dependencies, meaning how the network nodes are connected together, were lost in the previous decade in a lot of tabular data. So instead of continuing with anomaly detection, static aggregation, and a lot of subject matter expert rules put into a rule engine, approaches that have been used for years, we stepped back and redefined the problem differently. Our mindset here is that we need to rethink the network data. The network data is not only tabular data, and it is not unconnected data. It is connected data that goes through, for example, the radio access network, transport, the core network, and also the user's home network.

So how are we doing this? If you work in the network domain, I don't need to convince you that the network is a graph. But if you are not from the network domain, let me give you an example. I am a user. My phone is connected to a cell. The cell is connected to a mobility manager that captures where I am in terms of location. This mobility manager is served by routers and is connected to other databases where my profile is described. So all the network nodes are connected together, and they are connected through what we call network interfaces. On the network interfaces you can measure the latency and the jitter; you are measuring how the link is doing so that your call or your streaming goes well. All those measurements can raise alarms if you have, for example, degradation in your call or in the service that you are using. So this is basically a very simple
example of mobile networks where you can see the different nodes in the network, and the links in between are the network interfaces, which are called edges in the graph domain. Here you can see that you can have persons, databases, routers, and base stations. So by definition, the network as a graph is what is called a heterogeneous graph. A heterogeneous graph means the network nodes are all different: they can be humans, routers, databases, et cetera. The other criterion is that the network evolves over time. You can have new nodes: when moving from 4G to 5G we got new nodes, and moving from 5G to 6G we will have new technology added to the network. So it is changing over time from a topological perspective, but each node also has properties, which are the measurements, the KPIs, and the alarms, and those are also changing over time. As Olivier was mentioning, it's not one or two KPIs; it's thousands of KPIs per node. Until today there were databases for KPIs and databases for alarms, and the topology, the inventories, and the dependencies were completely lost when doing AI. So keep in mind that the network is a heterogeneous and temporal graph. This is the first foundational element of our approach.

Now, how are we using graphs for root cause analysis, or why is the graph important for root cause analysis? Again we come back to the concept: if you Google "correlation is not causation", you will see that this is a topic everyone has been trying to solve for 20 years. Here we are revisiting it with graph technologies to bring the connections in the network together. So this is the first component: we are leveraging the connections that natively exist between the network nodes across the different network segments. The second part, and this is the innovation we put in place, is that we
are capturing the temporal changes, because if you don't capture the temporal changes of the network, you will miss the current state; you will miss the exact current behavior of your network nodes in the graph. And the last part: the graph as an approach is meant to scale. It can go from a couple of nodes to billions of nodes, and you can capture all the connections there. The advantage of building this on AWS is that our techniques are optimized to run in milliseconds with billions of nodes in the graph. Here you can look at, for example, Amazon Neptune, Amazon Neptune Analytics, and the latest advances on GraphRAG that were announced this morning by Swami.

So the other question is: okay, now you have your network represented as a graph, with all your nodes represented there. You can go from a simple network node down to a very low level of granularity, representing even the cards in the node and, for example, configuration files. You can go to whatever granularity you would like, and here we helped Orange find a trade-off between what needs to be modeled to get to the root cause analysis and not overloading the graph with a lot of noise. That's also very important in supporting the modeling part.

So once you have the graph as a database with all the network nodes represented, which technique are you going to use on top of it? Once you have this graph representation in a digital, machine-readable format, which we call the digital twin, you need an optimized technique to traverse that graph. Let me take an example from the network domain: you want to predict congestion in the cellular network. Today you will use classical machine learning to detect anomalies, and you will have a lot of noise when you are detecting the
anomalies, because you are not taking into account, for example, the cell interference or the distances between the cells. That information can simply be represented in your graph, and to capture it, if you use graph neural networks, which are simply deep learning on graphs, you can do forecasting and anomaly detection while at the same time feeding the dependencies to the machine learning. This is a game changer for the network domain. The other part is scale. Of course, you want deep learning techniques that can scale to millions of nodes and to changing network domains and network representations. This is what Neptune machine learning provides: it is the layer with built-in models, all of them open source and optimized by AWS, running with the Amazon Neptune graph database. They are built in for you to use for the different use cases, for example regression, classification, clustering of graphs, et cetera.

The last part is that if you work in the network domain, you know that many customizations are needed. So here we have SageMaker to help you use, for example, an open source graph neural network from the family of what we call spatio-temporal graph neural networks, which capture the topology and also the temporal aspect. You can take those models and inject them into the SageMaker platform, and then use all the MLOps features, the model evaluation, and the different criteria for running this in production. So you have different services and different support to run graph techniques and graph neural network techniques on AWS, specifically for the network domain and its use cases.

So remember, the first part is the graph modeling, the second part is the graph machine learning, and the most important part is the graph analytics. So what is the difference between graph neural networks and
graph analytics? Graph analytics are not necessarily machine learning models, so you don't need to train them. They are statistical or probabilistic models that run on the graph, traverse it, and bring you the insights. For example, you will know what the most important node is, you will know which is the most important network node connected to the others, and you will be able to create communities in your network. For example, as Olivier was mentioning about the traffic, you can cluster your network per traffic pattern, or by the volume of alarms occurring in different geographical areas, and you can have different insights in a couple of seconds. This was made GA last year at the last re:Invent. Here too the models are all open source, so you know the models and you know what you are running; there is a lot of transparency, and it is all optimized to run in milliseconds for billions of nodes.

So we have three components that are the foundation of our approach: the graph modeling, the graph machine learning, and the graph analytics. With those foundational components we built the network digital twin for Orange, for MasOrange, with the network operations team and the support of Olivier's teams. Before going to the features, I want to recall the problems we are solving. We are addressing the fact that the inventories are not complete: today the network inventories are missing information. We are breaking the silos between the RAN, the core, the transport, and the microwave network, because we need the data to be seen as one by the machine learning and by the graph analytics. And we are scoring the different nodes to find and isolate the problem in the network in a couple of seconds. In summary, we will show you a topology across three network domains from different network vendors, so we also break the silos in terms of data model and in terms of differences
between the vendors. We will also show you the temporal network graph. This is one of the first representations worldwide of how your network moves over time, with the changes in terms of KPIs, occurrences of alarms, and so on. And the last part, of course: generative AI consumes those insights. The generative AI is connected to the graph analytics endpoint, to the graph database, and to the graph ML, and it consumes those insights to deliver one consolidated report to the network operations team.

So with that, let's see this in action. The idea is to show you those features with a video demo running on the Pontevedra region for MasOrange in Spain. We worked on one region to capture all the differences in that region and identify the RCA. This is what the UI of the tool looks like, first of all; this is how the network operations team will navigate and use it. The first feature is a network topology explorer. Graph Explorer is already available with Neptune, our AWS graph database, where you can visualize your network topology and identify the insights. Here, for example, the network operations team selects a node, which is a wireless device. In a couple of clicks they can unfold it and find that this device, which is from the wireless domain, is connected to an aggregation router from the transport domain. So this goes beyond the silos that they used to have between the network tools and the OSSs; here it brings those views together. As you can see, each time they click, it unfolds and discovers the topology, and you can see that it goes from the microwave to the transport to the radio access network.

Now the other feature. When you model the network as a graph, there are different languages to access the graph, for example openCypher, Gremlin, or RDF.
The network teams do not necessarily use those languages in general, so this is why generative AI is very useful here: they can navigate the graph and extract the insights from it without having to learn those languages. Here you can see that it was showing all the different alarms from that graph.

The other feature is the core of the topic. Each of the scenarios that you see here takes seven-plus hours for the network operations team to find the root cause. Here we are showing you example number three. These are sequences of alarms, and in a couple of seconds you will see that the graph analytics calculated the score. What does it mean? It means the algorithm looked at all the nodes with alarms and is telling you that all the ones in purple are not the problem. It finds the problem, which is the microwave device in yellow, in a couple of seconds; the blue ones are the radio network nodes that are impacted. And this is the root cause. It is as simple and as fast as you see it here. This example took the ops team seven hours to find, and here the algorithm spots it directly in a couple of seconds. So all the alarms are in the RAN, but the problem is coming from the microwave, and the algorithm was able to denoise the other alarms and point to the problem directly.

Generative AI is triggered automatically once this calculation is done to share the report. If you check the report, with just a couple of seconds to read, you can see that it goes through the heartbeat alarms that are appearing in the RAN, it tells you when those alarms appeared, and it brings you the root cause, which is from a completely different domain. So why were they spending hours until today? Because they have tools for the microwave, tools for the RAN, tools for the core, and they need to navigate those tools separately; there is nothing bringing the dependencies in between. So this is the power of
connecting everything together.

This is a more complex scenario, and here you see the list of alarms. I am intentionally showing the list so you can see how different those alarms are, and this is only a subset of what is occurring. When you look at that, it is impossible to find the root cause. Again, the algorithm takes all those alarms, maps them onto the topology, and runs the graph analytics algorithm. You can see that the problem here is the router in yellow. The nodes around it in green and blue are the ones that are most impacted, and all the others that are far away in the graph are just noise and not really impacted by the problem. So here again the root cause is identified: it is the router that is causing the problem in the network and causing trouble for the radio access network nodes. The model can then, as Olivier was mentioning, go as far as recommending what can be done in the network and proposing how to troubleshoot. That is basically the difference between rule-based approaches and graph analytics: when you capture the dependencies, you can get to the root cause quickly.

Now the last feature to show you. This is, for example, the view of the Pontevedra subset of nodes, running; by running I mean this is the topology evolving over time with the KPIs. We will zoom in on the place where you have the red links, to show you that this is a red link. Now you have your graph analytics: you click the button and you know the root cause in a couple of seconds. So you go from, let's say, a lot of screens with many KPIs to follow against thresholds, to a temporal topology that tells you where the problem is, and you are augmenting this with generative AI and graph techniques.

So what is behind the scenes? Behind this magic there is a workflow, there is a lot of data preparation, and there is a data pipeline, which is the graph data pipeline
in this use case. We worked with the MasOrange teams and Olivier's teams to first understand the logic: what granularity they want to identify and model in their network. This is of course extensible if they need more network nodes modeled.

The first part is modeling the network topology and automating it. This means that each time data comes from the network, it is automatically represented as a graph in the graph database. It is a kind of topology discovery automation that captures the changes over time.

The second step, which is very important, is that you need to infer the missing links. If you are in the network domain, you know that network inventories are not complete, so you need to infer the missing links. Here we have techniques like link prediction, which is a graph machine learning model, or graph neural network model, built into Amazon Neptune ML. You can trigger it and it will infer the missing links for you to complement your topology.

Once you have finished steps one and two, you have, let's say, a completed end-to-end topology, and of course you keep updating and verifying it over time. This is a data pipeline, so it is always changing. Then you move to the third part, which is graph analytics from the Neptune Analytics engine, SageMaker for the graph neural network, and generative AI accessing all those insights.

Another view is more of the architecture from a data-driven perspective. This is a high-level view of the architecture, and if you want more information, please reach out. Let me describe it block by block to simplify the view. The first part is data preparation. Network data is heterogeneous: you have XML files, YAML files, CSVs, tabular data, structured and unstructured data, and you need to make sense of all of this
because all of this is connected at the end of the day. So to build the graph model, we start with a lot of data engineering, which we automated and put in place to transform raw data into graph format and build all the analysis from there. That is the first block: data preparation.

The second block is the intelligence. Here we combine graph neural networks and graph analytics to extract insights from the graph topology and from all the alarms and KPIs that were mapped onto it.

The last block is the construction of what we call the network knowledge: your graph database with all the temporal data connected to each and every node, and of course generative AI to access this information and make sense of it for root cause analysis and for change management. It actually applies to several scenarios and can also work for IT networks. For the visualization of the temporal knowledge graph we used Tom Sawyer, which is a partner of AWS, to visualize this over time and connect directly to the AWS database.

Now let me let Olivier talk about the benefits and how this approach helped them in root cause analysis.

Thank you, Imen. It was really brilliant. This is a project that is fully in my heart, so I am very happy to be here today. Just before wrapping up, a summary of the benefits that we get from this project.

The first one is obviously the time to root cause analysis, which is a proxy for the time to remediation, because the remediation itself can be short, depending on what there is to do. For root cause analysis, in the two examples that Imen showed, we were at several hours as an order of magnitude. It was seven and nine hours, I think, because we really made the exercise of doing that manually,
and the system that Imen presented was able to deliver the right root cause analysis in a couple of seconds. Maybe 10 seconds, maybe 20 seconds; that is not really important. What you should keep in mind is that we go from hours to seconds. This is the first benefit.

Another benefit, which could look like a side benefit but is not, is that this system is also able to deliver the end-to-end, multi-layer topology of the network. Again, the way networks are built, what we call the OSS, the IT systems managing the network, is still very siloed, which means it is difficult to have an end-to-end view of, let's say, the full network. In this case we have done the job with the transport network and the radio access network combined, building the capability to do root cause analysis on this whole system. In the future we could add all domains.

The last benefit in this summary is alarm management. Again, we are talking about thousands of alarms every day. This system includes the addition of gen AI to create the automated report and also to drill down, because you can ask the gen AI a question about an incident if you do not have the complete information in the report. Even for people who are not skilled with SQL or other technologies to really dig into the database, the gen AI that we put on top was able both to deliver the report and to interact with the network operations teams.

It is not done; we still have a lot of things to do. It is a project that I may be happy to come back to next year, to give the last word about it. The first thing we need to do is make it really live, in a sense. We did this as an experiment, and to make it fast we used extractions.
It is real network data, obviously, but it is based on extractions. This means that to get the full benefit you need a live pipeline, because what you want is to detect an anomaly and find the root cause a couple of minutes at most after the incident happens. Obviously, if you rely on batch, you do not get this benefit. So this is the first thing on our table: make it live.

The second thing we need to work on is scaling to a full country. Currently, as Imen said, we have used data from a region called Pontevedra in Spain, and we should go to multiple cities, multiple regions, or even the full country. We also need to continue working on this spatio-temporal graph neural network. We have a very good basis and it is working very well, but with going live and increasing the scale we will probably run into some new issues that we have not had yet on this project, and that will be the opportunity to make this tooling even better than it is today. I think that's it.

I just want to say thank you to the Orange team for all the work, and to Imen, over the last year, for coming together and bringing this entire collaboration to life: from the data management perspective, the network topology discovery perspective, bringing together graph attentional neural networks, and then making it real with generative AI. The proof is in the pudding, as we saw. Scaling these primitives that the AWS cloud architecture provides, with an operator that is leaning into the vision of autonomous networks, is how I think we can make the future real. So thank you so much, I appreciate it.

2024-12-22 14:32
