How to integrate and verify time-critical applications on DO-178C multicore platforms


First of all, a very brief introduction to provide some context for the use of multicore processors in safety-critical avionics applications. Many of you may be aware that multicore processors have seen widespread use in a range of industries over the last 20 to 25 years, maybe a little more, but as yet they have only just started to penetrate the avionics market. Why is this only happening now? Multicore processors are fundamentally very complex and, at least in terms of timing behaviour, they are non-deterministic. There has also been a lack of certification guidance, and this lack of guidance has meant significant extra risk, and therefore potential cost, associated with certifying a multicore-based system; and because there has been no formal guidance, there has been no possibility of a commercial solution to enable successful certification.

So why are multicore processors suddenly becoming a thing now? First of all, the size, weight and power (SWaP) improvements that are possible by collocating many different systems onto, for example, different cores of a multicore processor can be very significant. There is also a long-term availability gap: while, for example, 16-bit microcontrollers have been available as single-core parts for 30 years, and I can guarantee you that 30 years in the future they will still be available, at the high-performance end of the market every other industry has been going multicore for quite a few years now; new high-performance processors are all multicore these days, and there are increasingly few single-core options available for high-performance parts. And there is now formal guidance: CAST-32A was released in 2016, and while that was not formal guidance (it said so on every page), AMC 20-193, which we'll talk about in a couple of slides' time, is formal guidance, and therefore there is a clear road map to a successful
certification of a multicore processor.

So, AMC 20-193. We use the brackets because it's an Acceptable Means of Compliance in Europe, under EASA's purview, and an Advisory Circular (AC 20-193) in FAA parlance, but the two documents are technically harmonized, and AMC 20-193 formalizes the objectives that were first introduced in CAST-32A, with a couple of minor changes and clarifications. A very brief overview of these objectives: there are two planning objectives, Planning 1 and Planning 2, which are really about what you're going to do and how you're going to do it. There are some resource usage objectives: the first two are concerned with configuration settings for the processor, and the second two are related to interference channels and resource capacity; we'll come back to these topics in a few slides' time. There are two software objectives, Software 1 and Software 2, which concern themselves with worst-case execution time and with data coupling and control coupling respectively. Then we have a safety net objective, which is all about what happens when things go wrong: how do we detect errors, and how do we recover from them? And the accomplishment summary is just about summarizing the results. Those eagle-eyed among you may have spotted that Resource Usage 2 has been crossed out: it has been removed in AMC 20-193, on the basis that it duplicates something already covered in AMC 20-152A.

Throughout today's webinar we'll be using examples from an industrial use case: a flight control system, plus associated monitoring, that is going to be hosted on a T1042, a quad-core platform from NXP, and the execution environment will use the ASTERIOS engine, a real-time kernel that Guillaume will introduce to you in a bit more detail. The NXP T1042, as I said, is a quad-core PowerPC. Each core has a private Level 1 and Level 2 cache, but there is a shared interconnect that all of the cores use, and there is some shared DDR3 or DDR4 memory;
there's a shared platform cache, and of course all of the peripherals connected to the platform are also, in principle, sharable between the different cores. The particular platform we'll be using as an example uses four gigabytes of shared DDR4 memory.

We're going to be using some components from Rapita's MACH178 workflow. MACH178, as Andrew very briefly introduced, is Rapita's multicore certification solution, and what we've attempted to do with this is to create a DO-178C-style collection of procedures, templates and checklists that facilitate, in particular, the timing-related aspects of multicore certification. The entry point for all of this is our Plan for Multicore Aspects of Certification, at the top of the diagram on this slide. This PMAC is a companion document to a more conventional PSAC, the Plan for Software Aspects of Certification, and as such it details, at a high level, the plans for the rest of the certification. Then we have our MSVP, the Multicore Software Verification Plan, which is a companion document to the more conventional SVP and is the next level down in terms of detail for the proposed verification process. You'll notice that this diagram then splits into two: on the left we have our platform-related, hardware-related activities, and on the right our timing verification, software-related activities. I will be talking mainly about the platform side of things, and Guillaume will be talking mainly about the software side.

Our platform characterization is broken down into different steps, each of which has its own procedure, template and checklist associated with it. First we have hardware resource identification, where we identify all of the resources on the platform that may contain interference channels. Then we proceed to identify all of those interference channels, which we'll talk about in a bit more detail in a few slides' time. We identify critical configuration
settings associated with those interference channels, and we characterize those interference channels to determine how significant they are: which interference channels have we successfully mitigated, and which do we still need to be concerned with? Then we write this up into a Multicore Platform Characterization Report.

You may also notice two activities I haven't talked about yet: HEM identification and HEM validation. These are related to identifying the hardware event monitors (performance counters) that we use to verify the resource usage of our software and to test that we're exercising the interference channels we think we are. We need to identify them, and this is explicit in AMC 20-193, because they are implemented as a handy debug feature: they are not intended for high-assurance applications or for providing assurance evidence. There is also a HEM validation step, where we test every hardware event monitor that we're using to verify that it does exactly what we expect. For example, if we want to use a counter that tells us how many cache accesses we have made, then we need to test this counter before we can rely upon it.

On the software side, I will leave Guillaume to talk a bit more about the sort of activities we expect there, but primarily we're looking at performing some analysis of the timing requirements and then characterizing the software to make sure that these requirements are being met even when we have multicore interference.

MACH178 is split into several different components. All of the procedures, templates and checklists, along with some white papers (I think there are six white papers at the moment), are bundled into what we call Foundations, which is really the entry point to using MACH178. Then we have what we're calling the Blueprint, which is an off-the-shelf upskilling or learning platform for multicore certification. There are all the tools, some of which
we'll be demonstrating later on today, and finally there are the services, where skilled Rapita engineers can help in whatever way is most useful for a multicore certification project using MACH178.

So, we're going to talk through some parts of a multicore development process. Like any avionics development process, we start with some high-level system design and architecture; this is then broken down into some hardware work and some software work, and an integration process follows, in which the hardware and software are integrated. What we're talking about today is, on the hardware side, identifying timing interference; on the software side, mitigating interference; and on the integration side, verifying that all of these mitigations are effective and ensuring that our system is schedulable.

First of all, then, a deeper dive into the hardware side of things. I promised I would introduce interference channels, so here we go. The MCP Resource Usage 3 objective in AMC 20-193 specifies that the applicant has identified the interference channels that could permit interference to affect the software applications hosted on the MCP cores, and has verified the applicant's chosen means of mitigation of the interference. I have put here the definition of an interference channel from AMC 20-193: it is a platform property that may cause interference between software applications or tasks. Note that it is a platform property: it is not a shared resource, and it is not an intersection of two paths between initiators and targets.

I'll give some examples from the different classifications of interference channel. First of all we have direct interference channels: if you look at a block diagram for a multicore processor and identify the shared resources, those shared resources are all likely to contain one or more direct interference channels. So,
examples here could be competition for bandwidth on an interconnect, competition for bandwidth on a shared cache, competition for space in a shared cache, or competition for slots in a queue or a buffer.

Then we have indirect interference channels. These are perhaps less obvious. For example, if we have a network-on-chip interconnect or a ring interconnect, there may be more than one possible routing between any two nodes, and the exact route chosen may depend upon what other traffic is currently present on that interconnect. Another good example of an indirect interference channel is a cache coherency mechanism, which keeps private caches in different cores synchronized, such that if the same data is present in two private caches and one core modifies it, the cache coherency mechanism will make sure that the data is either updated in, or evicted from, the other cache. This is another mechanism by which something executing on one core may affect the execution time of something executing on a different core.

And finally there are unknown interference channels. These are cases where, perhaps, the behaviour of the hardware does not match the documentation, or maybe there's a silicon bug. An example might be the NXP P4080: it is now well known that the cores of this PowerPC processor do not have the same priority on the shared interconnect, and core 0 will always win arbitration for that shared interconnect. It's very important, when performing tests to characterize interference channels, that if something doesn't make sense or is not as expected, it is investigated properly, because it could be pointing towards one of these unknown interference channels.

So, we've identified some interference channels; now we need to be able to characterize them, and we characterize our interference channels using RapiDaemons. RapiDaemons are what we at Rapita call our
interference generators, which are designed to generate configurable and repeatable contention on specific hardware resources; the idea is that you use these to measure the possible impact of interference. In the diagram you see here on the slide, we have a victim (an application or a RapiDaemon) running on core 0 and accessing our DDR memory device; on the other cores we can also execute RapiDaemons, trying to generate contention on that shared resource and exercise that interference channel, and we will hopefully be able to see some difference between when our victim on core 0 is executing alone and when the RapiDaemons are generating contention. We would repeat this process for each interference channel we have identified.

Together with our friends from ASTERIOS Technologies, we've been putting together what we're calling a multicore management toolbox. Down the left here we have the activities we've just been talking about that we will typically need to perform, and where each fits in the multicore management toolbox. First of all we have interference channel identification, which is completely covered by the MACH178 hardware analysis side of things. When it comes to interference channel characterization, that is, using RapiDaemons to exercise interference channels and make on-target measurements, the RapiDaemons and the on-target measurements come from Rapita, using RapiTime, our timing analysis tool, but ASTERIOS Technologies provides the technology to schedule the RapiDaemons, using the ASTERIOS engine, and also provides some timing profiling capability. In terms of interference channel mitigation, the MACH178 product does not provide anything for mitigating interference channels, other than perhaps advice on how it can be performed, but the ASTERIOS toolchain allows, for example, advanced memory configuration, exclusion configuration and a few other things,
which Guillaume will talk about in a bit more detail later. When it comes to multicore deployment and sizing, working out time budgets for example, there is the timing analysis side of things, using tools like RapiTest and RapiTime to drive tests and execute them, to understand the behaviour and worst-case execution time of the software; and ASTERIOS provides things like the ability to specify time budgets and to compute inherently schedulable systems, providing some schedulability assurance, the high-level idea being to integrate a system which is inherently verifiable and which we can have confidence will be schedulable before we've done any on-target testing. And then the verification side of things: again, software timing analysis, verifying that our deadlines are being met, is covered by the MACH178 tools, and the design verification and runtime verification Guillaume can talk about in a bit more detail later, from the ASTERIOS side.

So, applying this to the T1042: we're now going to jump to a quick video segment looking at what some of the characterization for a particular interference channel in the DDR controller of the T1042 might look like. This particular interference channel is competition for row buffers in the DDR controller. The DDR controller contains a finite number of these things called row buffers, each capable of buffering a single row of data from the DDR memory device, and it is possible (in fact, it is commonly a significant interference channel in DDR controllers) for different tasks executing on different cores to make requests of the DDR controller that cause data to be evicted from row buffers belonging to other cores, and this increases the latency associated with making DDR transactions fairly significantly. I'll now jump to this quick video segment.

Here we have the RapiTime GUI, and we can see we have some results for four
different tests. Under Timing we have measurements for when we have no contender, that is, a RapiDaemon running on one core while the other three cores of the T1042 are idle; then a second test with one contending RapiDaemon, then with two contending RapiDaemons, and finally with three contending RapiDaemons, where all four cores are loaded. You can see we have statistics for the minimum, average, high-water-mark and maximum execution times observed in all of those cases.

Now we can drill down a little deeper and look at the execution profile. First we'll have a look here at what we call an exceedance plot: this tells us the proportion of executions that exceeded a particular time threshold. You can see that 100% of execution times were above 5.4 million cycles, give or take, and that by 5.5 million cycles there were no observed measurements greater than that. We can also plot this as a histogram, like so. What's interesting is that we can plot all four scenarios: in green we have our baseline scenario with no contenders, and as we add an extra DDR RapiDaemon on the first core, then on the first and second, then on the first, second and third cores, we can see the impact that has had, both in increasing the execution time and in making the distribution of execution times wider.

Now, of course, when we're exercising an interference channel we need to have some confidence that we're exercising the interference channel we think we are. In this case we're exercising an interference channel in the main DDR memory, so we have gathered a few metrics that will help us demonstrate this. We have stores translated: we know we're not generating stores, and in fact there are a few stores here, a couple of hundred, which are related to the instrumentation overhead. Loads translated: we're intending to do loads, and we're seeing just
over fifty-one and a half thousand loads, and we're seeing the same number in all cases, within a very tight margin. Now, every load that misses our Level 1 cache should cause an access to our Level 2 cache, and we see here that the number of Level 2 accesses is very, very similar to the number of loads translated; this tells us that virtually all of the loads we're executing are missing the Level 1 cache and accessing the Level 2 cache. Then we have our Level 2 data cache hits, and we're seeing a very small number of those, a couple of percent of the number of accesses; this tells us that the vast majority of the accesses we're making, across all of our test cases, are in fact missing both levels of cache and going straight through to main memory. So this is how we gain some confidence that our test case is actually exercising the interference channel we believe it should be.

The second half of the video concerned itself with using the performance counters on board the T1042 to provide evidence that the RapiDaemons we were executing were indeed performing data loads, and that these data loads were both missing the Level 1 cache and missing the Level 2 cache, and therefore going right through to the DDR memory controller.

I have one final slide before I hand over to Guillaume. It's just to point out that the RVS tools being used here (RapiTest to drive the tests, and RapiTime to collect the information and display the results) have DO-330/ED-215 qualification kits, and the RapiDaemons also have qualification kits to go with them. One key point here is that the RapiTime instrumenter itself is qualified, which means there is no need to perform code reviews on instrumented source code. And one last note
before I hand over: I have just had, on our internal chat, the promise that we will be able to share links to the videos, so you can watch them in your own time after the webinar. And with that, I will pass the buck to Guillaume.

Thanks, Sam. So now we are going to have a look at the software part, and we'll focus more particularly on the question of functional interference. Usually, functional interference mitigation is handled as part of the integration process, when considering the actual deployment on target, but we will show here that it is possible to deal with it earlier, as part of the software process, thanks to ASTERIOS.

As introduced earlier on, the ASTERIOS solution is developed and commercialized by ASTERIOS Technologies, and its main goal is to support the development of critical real-time software, together with its integration on complex hardware platforms, in particular multicore ones. For this it offers a programming language to help the user describe the application's timing dynamic architecture, based on high-level requirements, and then link it with the functional code; and it comes with a toolchain which allows the dynamic architecture to be compiled and a specific integration to be configured for a specific target. In a first step, the resulting application can be tested on a PC using the ASTERIOS simulator, without having to deploy the application on the actual hardware target; then, in a second step, it can be run on target thanks to the ASTERIOS engine, which in particular enforces the dynamic architecture at run time. Of course, there is one ASTERIOS engine provided per hardware target, with current support for different kinds of architecture (PowerPC, Arm and others), and note that there are additional tools which come with the ASTERIOS toolchain to help support the certification of an application developed with ASTERIOS.

So now let's see what ASTERIOS can help with when developing a multicore application. One of the objectives targeted by AMC
20-193 is related to the concept of functional determinism. For example, in the case of concurrent accesses to shared resources, timing interference can occur, meaning possible additional timing overheads which could impede the execution of the different tasks on the different cores. As a result, we could end up with some task on one core which is delayed and executed after another task on another core, and this could result in different data or control coupling. So one major concern is: how can we achieve determinism when developing and integrating multicore applications? And when we talk about determinism here, we mean the ability to produce a predictable outcome, and to produce it in a repeatable way, as defined by AMC 20-193.

To deal with this determinism issue we can rely on the ASTERIOS programming model. Thanks to that model, the user can describe the dynamic architecture of the application solely based on high-level timing requirements, that is to say, without considering the actual integration on target. As a result, we have a separation of concerns between, on the one hand, the timing architecture of the application and, on the other hand, its actual implementation on a specific target, be it a single-core or multicore integration.

So how does it work in practice? The ASTERIOS programming model is based on the synchronous logical execution time paradigm, or SLET. Under this paradigm, the physical execution times of the different computations are abstracted by the concept of elementary actions. One elementary action corresponds to some portion of the user's functional code whose execution is bound to occur within a specific logical time window: it has to be executed after some release date and before the corresponding terminate date. Moreover, data exchanges between different elementary actions can only occur at the boundaries of their respective logical time windows. This means that
a datum produced by an elementary action is only made visible to the rest of the system from its terminate date, and as a consequence the actual execution of the different elementary actions cannot have any impact on data-flow determinism, as long, of course, as each execution fits into its logical time interval at run time.

So, thanks to the synchronous logical execution time paradigm, we can achieve determinism by design, and this has a huge impact on integration, because the same dynamic architecture, that is to say the same SLET design, can be deployed over several different hardware targets, single-core or multicore, without any impact on determinism; and actually, as I said before, it can even be tested in simulation. The functional determinism of the application is solely related to its logical timing design, not to actual physical execution times.

So how is an integration on hardware actually handled by ASTERIOS? From the user's timing design, the ASTERIOS toolchain automatically generates an integration-specific configuration for the targeted hardware platform. In particular, this generated configuration will ensure the correct execution of the dynamic architecture, and the ASTERIOS toolchain guarantees that we get an integration for a given hardware target only if that integration complies with the dynamic architecture. In particular, generating an integration means computing a physical schedule which ensures that the functional code associated with each elementary action will be executed after its release date and before the corresponding terminate date, to enforce determinism; to do so, as we'll see later on, we rely on robust time partitioning.

Now, we were supposed to illustrate this integration and how determinism can be preserved, considering either a single-core or a multicore integration, but as before it's done with a video. In this demo
we will illustrate a transparent distribution from single-core to multicore, thanks to the ASTERIOS solution. For the purpose of the demonstration, we focus on a single application, the flight control system (FCS). The FCS application is in charge of commanding actuators to follow a target speed and altitude. The application is divided into several functions (acquiring inputs, computing the commands, and so on), each mapped to a separate task; note that the aircraft dynamics are simulated using an additional task. For the purpose of the demonstration, we focus on the main functional chain which, from a speed and altitude input, computes the corresponding commands to be applied to the actuators and retrieves the actual aircraft speed and altitude. Five tasks are involved in this functional chain, all 20-millisecond periodic. Note that we consider a simple temporal pattern here for the sake of the demonstration, but complex multi-rate dynamic architectures can easily be implemented with ASTERIOS as well.

Data between the different tasks of the FCS application are exclusively exchanged using ASTERIOS communications, which follow the synchronous logical execution time paradigm; thus communication data are only visible between tasks at predetermined dates corresponding to the logical time interval boundaries. This means that, thanks to ASTERIOS, we have a time-predictable design for the FCS application, independent from the actual integration on a hardware target.

We will now consider two different integrations of this SLET design, one single-core and the other multicore. The FCS application is first deployed on a T1042 hardware board, with all tasks allocated to a single CPU core. Here is a view of the integration tool provided with ASTERIOS. As you can see, we deal with a single partition, which corresponds to the FCS application; for this first integration, all seven tasks are allocated to CPU core 0. The application can be built from
the ASTERIOS integration tool in one click. The FCS dynamic architecture design, as well as the functional code, is automatically compiled and linked by the ASTERIOS toolchain; note that specific configuration settings can be provided if needed, for example to modify the memory mapping. As part of the build process, the ASTERIOS toolchain automatically generates a configuration for the targeted platform to enforce the dynamic architecture's determinism; in particular, a static schedule is computed to enforce the temporal design. In the case of our current integration, this results in a single-core schedule. The ASTERIOS-computed schedule consists of a head part, which is executed only once, followed by a loop which is repeated periodically; the CPU time on core 0 is divided into several execution slots, one for each of the FCS tasks.

Then we execute our FCS application on the T1042 hardware board. At the same time, we monitor the speed and altitude computed by the FCS in response to an input scenario: the target speed and altitude are depicted in yellow, while the speed and altitude computed by the FCS are displayed in green. These traces will be used as a reference for comparison with the multicore integration.

Now that we have our single-core reference trace, we deploy the FCS application again on the T1042 board, but this time distributing the tasks across all four cores. Thanks to the ASTERIOS integration tool, task allocation to the different CPU cores can easily be done; for the demo, we consider a multicore integration with three tasks on core 0, two on core 1, and a single task on each of cores 2 and 3. Once task allocation has been done, the application can simply be rebuilt, without any additional activity from the user. This time the ASTERIOS toolchain generates a multicore configuration for the targeted platform to enforce the dynamic architecture's determinism; as a result, a multicore static schedule is computed. This means that the CPU time on
every core is divided into execution slots, one for each of the FCS tasks. As previously, the ASTERIOS-computed schedule has a head part followed by a loop. Once again we execute our FCS application on the T1042 hardware board, using the exact same input scenario as for the single-core integration. This time, the speed and altitude computed by the FCS for the current multicore integration are dynamically compared with the single-core trace obtained previously: as for the single-core execution, the target speed and altitude are depicted in yellow, with the previously computed outputs of the single-core integration shown in green, and the speed and altitude computed by the current multicore integration displayed in blue. We can see that the output of the multicore integration perfectly matches that of the previous single-core integration. So even though the two integrations are different, and the tasks can execute in a different order, the data and control coupling of the application remains unaffected. This means that, thanks to ASTERIOS, we can achieve functional determinism by design, whatever the underlying integration.

The basic idea here is to demonstrate that we can simply design our application using SLET, the ASTERIOS programming model, then deploy it on different integrations by mapping the tasks onto the different cores, and end up with the exact same temporal behaviour. As Sam said earlier, we will send you the video links afterwards, so you can have a look at them.

As we've just said, the ASTERIOS programming model allows us to ensure determinism independently of the actual integration, single-core or multicore, and that means improved testability as well as higher scalability and modularity. With regard to multicore certification, this means that some of the AMC 20-193 objectives are natively fulfilled at design level; we are talking here about MCP Software 1, which deals with
functional correctness verification, but also, as we said earlier, data and control coupling verification in MCP_Software_2.

Perfect. Now that we have considered both the hardware and software parts, we'll have a closer look at the integration part, and in particular we are going to focus on the issue of time sizing as part of this integration process. For a multicore integration in the context of AMC 20-193, this means in particular dealing with time interference. As we've just seen, the determinism of an ASTERIOS application is fully defined by the dynamic timing architecture, which is designed by the user from high-level timing requirements. Let's also recall that, under the ASTERIOS programming model, the functional code is encompassed into elementary actions, with each elementary action's execution bounded to a particular logical time interval. Of course, when dealing with integration on an actual hardware target, we have to consider physical time, and to be able to integrate our timing design we have to consider in particular the physical execution time of the functional code embedded into each elementary action. For that, the ASTERIOS solution offers the concept of a time budget, which corresponds to the maximum amount of CPU processing time allocated to the execution of an elementary action at one time. It is up to the user to provide a time budget value per elementary action, because while the dynamic architecture is derived from high-level timing requirements, time budget values are highly dependent on the actual target — the CPU frequency, for example — but also on the integration, whether single-core or multicore. Moreover, time budget evaluation clearly depends on the user's needs: for example, if you are dealing with a critical application, you have to be sure that the time budgets are large enough for the corresponding elementary actions to complete their execution, and that means
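To make the time-budget idea concrete, here is a minimal sketch in Python. This is not the ASTERIOS API — the class, field names and values are hypothetical, purely for illustration. It captures the one invariant implied above: a budget can never exceed the length of its logical interval.

```python
from dataclasses import dataclass

@dataclass
class ElementaryAction:
    """Hypothetical model of an elementary action (not the ASTERIOS API)."""
    name: str
    interval_start_ms: float  # logical time interval, from the timing design
    interval_end_ms: float
    time_budget_ms: float     # max CPU time allocated on this target (user-provided)

    def budget_fits_interval(self) -> bool:
        # A time budget larger than the logical interval can never be scheduled.
        return 0 < self.time_budget_ms <= self.interval_end_ms - self.interval_start_ms

# Two elementary actions of a flight-control task (illustrative values only)
eas = [
    ElementaryAction("fcs_read_sensors", 0.0, 5.0, 1.2),
    ElementaryAction("fcs_control_law",  5.0, 20.0, 6.0),
]
assert all(ea.budget_fits_interval() for ea in eas)
```

Note that the budget is per activation, not per period: the same logical design can be re-budgeted for a different target (or for multicore) by changing only the `time_budget_ms` values.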
considering an upper bound on the worst-case execution time of the different elementary actions. As you may well know — this has been discussed a lot in recent years — the execution time of a piece of software at run time may vary considerably. That is due to different paths through the code (conditional branches, for example), but also to the use of the hardware by the different tasks; in particular on multicore, as Sam discussed earlier, contention on shared hardware resources may cause additional overheads on task execution. So the idea is to bound all possible execution times, but without being too pessimistic, to avoid wasting too much CPU bandwidth.

To integrate an application's dynamic architecture onto a specific hardware target, we said that ASTERIOS relies on robust time partitioning. From the timing design and the time budgets provided by the user, the ASTERIOS toolchain automatically computes a static schedule — if one is possible, of course. We rely on static scheduling because it offers strong guarantees on predictability, and so is well suited to safety-critical systems. The schedule computed by the ASTERIOS toolchain is referred to as the RSF, which stands for Repetitive Sequence of Frames. Indeed, it corresponds to a cyclic schedule of execution frames, with each frame corresponding to an allocation of CPU time to a single elementary action. To comply with the dynamic architecture, the ASTERIOS toolchain ensures that the frames for an elementary action are scheduled such that its execution takes place within the interval defined by the logical timing design. And, as we said earlier, the toolchain generates the schedule only if the platform has enough resources to fully execute all elementary actions within the specified boundaries, which means that the correctness of the platform sizing is ensured by
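A toy version of the check such a toolchain must perform can be sketched as follows — again a hypothetical model, not the actual RSF format: every frame must fall inside its elementary action's logical interval, and frames on the same core must not overlap.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """One execution frame of the cyclic schedule (hypothetical representation)."""
    ea: str          # elementary action this frame is allocated to
    core: int
    start_ms: float
    end_ms: float

# Logical intervals from the timing design (illustrative names and values)
intervals = {"ea_sensors": (0.0, 10.0), "ea_law": (10.0, 20.0)}

def schedule_is_valid(frames: list) -> bool:
    # 1. Every frame must lie inside its elementary action's logical interval.
    for f in frames:
        lo, hi = intervals[f.ea]
        if not (lo <= f.start_ms and f.end_ms <= hi):
            return False
    # 2. Frames on the same core must not overlap in time.
    per_core = {}
    for f in frames:
        per_core.setdefault(f.core, []).append(f)
    for fs in per_core.values():
        fs.sort(key=lambda f: f.start_ms)
        for a, b in zip(fs, fs[1:]):
            if b.start_ms < a.end_ms:
                return False
    return True

rsf = [Frame("ea_sensors", 0, 0.0, 4.0), Frame("ea_law", 0, 10.0, 16.0)]
assert schedule_is_valid(rsf)
```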
construction.

Static scheduling thus offers good guarantees regarding timeliness and predictability, and in particular, with regard to AMC 20-193, it provides some evidence towards fulfilling the MCP_Software_1 objective. Of course, this holds only if the time budgets are upper bounds on the worst-case execution times of the corresponding elementary actions. As you may have wondered, this leaves the question of how time budgets should be evaluated to provide such a guarantee. To evaluate safe time budgets we have to consider both the software variability — the different possible execution paths in the user code — and the variability due to the particular integration on the hardware. One possible way to perform such a time budget evaluation is to proceed incrementally, starting from a timing evaluation in single-core; this allows us to handle the different issues separately. Single-core timing analysis can focus on software variability; then, based on those results, we can conduct some hardware and software characterization to assess potential interference sources, as well as their actual impact on the application. The results of the different characterizations can be used either to derive an additional margin on the single-core time budgets, to account for time interference overheads, or to decide on some mitigations; in both cases we end up with a multicore time budget.

Once again we'll play a video to illustrate how we could use the Rapita tools to characterize one interference channel and then mitigate it using ASTERIOS.

In this demo we'll show how the ASTERIOS and Rapita tools can be used to provide evidence for the AMC 20-193 MCP_Resource_Usage_3 and 4 objectives, which both deal with interference channels. This evidence can also be used as part of the justification process for the MCP_Software_1 and 2 objectives, in particular to determine WCETs. For the purpose of the demo we focus on a single interference channel related to the DDR shared memory: it corresponds to competition for
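The incremental approach just described can be summed up in one formula: scale the single-core WCET by the observed interference factor, then add a safety margin. A hedged sketch, with purely illustrative numbers:

```python
def multicore_budget(single_core_wcet_ms: float,
                     interference_factor: float,
                     safety_margin: float = 0.2) -> float:
    """Derive a multicore time budget from a single-core WCET estimate.

    interference_factor: worst slowdown observed under contention
    (e.g. 1.5 means +50% execution time with contenders on other cores).
    All numbers here are illustrative, not measured values.
    """
    return single_core_wcet_ms * interference_factor * (1.0 + safety_margin)

# Single-core WCET of 2 ms, 1.5x slowdown observed, 20% extra margin:
budget = multicore_budget(2.0, 1.5)
assert abs(budget - 3.6) < 1e-9
```

The alternative mentioned above — mitigation — amounts to driving `interference_factor` back towards 1.0 instead of inflating the budget.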
row buffers in the DDR controller.

For the demo we'll consider once more the use case introduced at the beginning of this webinar. For the purpose of the demonstration we again focus solely on the flight control system application. The FCS application is developed and integrated with other applications on a single T1042 hardware platform thanks to ASTERIOS; a multicore integration is considered for the FCS application on all four cores of the T1042. On the T1042, the four cores have access to the shared DDR memory. The DDR memory device stores data in rows: when data is requested from a particular row, the DDR controller buffers it in a row buffer. While the row is in the buffer, many reads and writes may be performed on that row. When the row has been finished with, when the buffer is needed for another transaction, or when a timeout has been reached, the row is written back into the DDR device. Thus it is likely that competition for row buffers may cause interference.

To deal with this potential interference channel, we'll proceed in two steps: first we'll quantify the actual impact of this interference channel on our FCS application, then we'll show how we can mitigate this interference. First, we conduct a software characterization to evaluate the impact of the DDR row buffer interference channel on the software application; indeed, depending on the actual use of hardware resources by the different tasks, interference could be negligible for some interference channels. To conduct a software characterization for an ASTERIOS application, a connection between the Rapita tools and the ASTERIOS solution is required: it supports the execution of RapiDaemons with ASTERIOS and allows RVS to derive ASTERIOS-relevant timing results. The tool-automation workflow for software characterization includes three stages: creating a test harness; running tests and collecting test results; and finally analyzing the results. The first stage consists in creating a test harness that can run tests to investigate interference effects on the multicore platform. RapiDaemons help
assess these interference effects through the application of a configurable degree of contention on specific shared hardware resources, by generating high traffic on them. In our case we consider a RapiDaemon specifically tailored to target the DDR controller's row buffers. RVS instruments the multicore software so that timing measurements, as well as metric values from the hardware's performance monitoring counters, can be collected during the tests. In the case of our ASTERIOS application, the instrumentation is done for each task at the start and exit of every elementary action, as we are interested in deriving time budget estimates per elementary action. Tests are then used to specify the different RapiDaemon configurations, as well as the behavior intended for each test — in particular the metrics to be collected. To quantify the impact of the DDR row buffer controller for the demo, we consider a set of four test scenarios: one isolation scenario, where our FCS application runs alone on core zero, and three scenarios where RapiDaemons run in parallel with the FCS application on one, two or three of the other cores. For each test scenario we'll collect execution time measurements, as well as four additional metrics from performance monitoring counters related to DDR accesses. Finally, RapiTest turns the input tests, RapiDaemons and instrumented code into a test harness that can be run on target to collect results.

The next stage in the workflow is to run the tests on target to collect measurements for the different metrics. The different test scenarios can be run on target directly from within the RVS user interface. For our demo, the four scenarios are automatically run one after the other, with results being collected for each one. Results for the different scenarios are automatically merged by RapiTime so they can be easily compared across scenarios. After the multicore tests have been run, the results can be analyzed with RVS: RapiTime summarizes the different timing results and the values from the performance monitoring
counters observed during the tests. Results from different tests can easily be compared to identify the impact of interference on multicore timing behavior. The results for the demo show that the FCS application is very sensitive to contention on the DDR controller row buffers: compared to the isolation scenario, where no RapiDaemon is running, tests run with a RapiDaemon targeting the DDR row buffers on one, two or three of the other cores increase the FCS tasks' execution times by up to 600% for the AG alarm task.

Now that the DDR controller row buffers have been identified — thanks to the software characterization step — as a major interference source for the FCS application, some mitigation should be considered. The mitigation process includes three stages: identifying mitigations, implementing mitigations, and finally verifying those mitigations. The first stage consists in identifying possible mitigations for the specific hardware target and software application under consideration. Hardware mechanisms could be used to mitigate multicore interference: for example, on some hardware platforms we could configure cache partitioning, or even disable some hardware device to remove the corresponding interference channels. Mitigation could also be achieved through modifications to the software architecture: for an ASTERIOS application, logical time intervals could be redesigned to ensure that elementary actions from different tasks are exclusive with one another. Finally, mitigation could be achieved through specific configuration when integrating the software application on the hardware target. On the one hand, spatial partitioning could be enforced by defining an appropriate data and code placement in memory for the different software tasks; on the other hand, time partitioning at scheduling level could be used to achieve exclusion between some tasks' executions, and thus avoid concurrent accesses to a shared hardware resource. For our demo, some hardware configuration could be possible, but without completely
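The comparison against the isolation scenario reduces to a simple slowdown calculation. The sketch below uses made-up numbers — not the demo's actual measurements — chosen so that the worst case reproduces the +600% figure mentioned above:

```python
# Max observed execution time (ms) of one task per scenario (illustrative numbers)
measurements = {
    "isolation": 1.0,   # FCS task alone on core 0
    "1_daemon":  3.5,   # RapiDaemons targeting the DDR row buffers
    "2_daemons": 5.2,   # on one, two or three of the other cores
    "3_daemons": 7.0,
}

baseline = measurements["isolation"]
# Slowdown relative to isolation, as a percentage increase
slowdown_pct = {s: 100.0 * (t - baseline) / baseline for s, t in measurements.items()}

assert slowdown_pct["isolation"] == 0.0
assert slowdown_pct["3_daemons"] == 600.0  # 7x the isolation time = +600%
```

A task whose slowdown stays near zero across all contention scenarios is, for this channel at least, not a significant victim — which is exactly the discrimination the characterization step is after.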
mitigating the interference channel. Moreover, we do not want to modify the dynamic architecture of the FCS application, in order to preserve functional determinism. At integration level, spatial partitioning could be considered, but it would be cumbersome to implement for a shared-memory architecture like the T1042, so for the demo we'll go with time partitioning.

Once mitigation means have been identified, they have to be put into practice. For our demo, we'll focus on mitigating multicore interference for the AG alarm task, as it can be heavily impacted by such interference, as observed during the software characterization step. With ASTERIOS, time partitioning at integration level can simply be achieved by specifying a list of exclusion groups: elementary actions from different tasks that are contained in the same exclusion group are guaranteed to be exclusive with one another at any time. As a result, a common resource accessed by those elementary actions — in our case the DDR controller row buffers — can only be accessed by at most one elementary action from the exclusion group at a time. Once those exclusion groups have been defined, the FCS application can simply be rebuilt, without any additional activity required from the user: the ASTERIOS toolchain enforces the specified exclusion groups by generating an appropriate static schedule. We can see that, in the case of the FCS application with the previously defined exclusion groups, the multicore schedule generated by ASTERIOS shows a time exclusion between the frames allocated to the AG task and the frames allocated to the other tasks; thus the executions of those tasks will be disjoint.

Finally, the implemented mitigation means have to be verified, to ensure that the considered interference channel has actually been mitigated. To do so, we conduct another timing analysis, but this time considering the actual multicore deployment targeted for the FCS application; no RapiDaemon is involved in this case, as we are interested here in timing results under the
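The guarantee provided by an exclusion group can be checked mechanically: no two frames belonging to the group may overlap in time, even across different cores. A small sketch — the task and frame names are hypothetical, not the demo's actual configuration:

```python
# Frames as (elementary_action, core, start_ms, end_ms); names are hypothetical.
frames = [
    ("ag_alarm_step", 0, 0.0, 2.0),
    ("nav_step",      1, 2.0, 4.0),   # shifted so it cannot overlap ag_alarm_step
    ("display_step",  2, 0.0, 2.0),   # not in the group: may run concurrently
]
exclusion_group = {"ag_alarm_step", "nav_step"}

def group_is_exclusive(frames, group):
    """True if no two frames of the group overlap in time, regardless of core."""
    gf = sorted((f for f in frames if f[0] in group), key=lambda f: f[2])
    return all(b[2] >= a[3] for a, b in zip(gf, gf[1:]))

assert group_is_exclusive(frames, exclusion_group)
```

Note that `display_step` still runs in parallel with the alarm: exclusion is imposed only where the characterization showed it is needed, preserving parallelism everywhere else.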
final intended configuration. From the timing measurements collected on target, RapiTask can be used to see the task-level scheduling behavior of the application. For the demo, we can see that, as enforced by the ASTERIOS-generated static schedule, the execution of the AG alarm task is strictly exclusive with the other tasks' executions. By comparison, in an integration of the FCS application without exclusion groups, the AG alarm task can execute concurrently with most of the other tasks. To further evaluate and validate the benefits of the implemented mitigation, the execution times computed by RapiTime from the on-target measurements can be compared with results from the timing analysis conducted without any mitigation. For our FCS application demo, we see that the execution time of the AG alarm task in multicore can be significantly decreased thanks to the time partitioning mitigation. In this demo we illustrated how the AMC 20-193 MCP_Resource_Usage_3 and 4 objectives, dealing with interference channels, can be addressed thanks to the combined use of the ASTERIOS and Rapita tools.

Very basically, the idea was to do something quite similar to what Sam was presenting before the video. You don't really see it in the video, but the idea is that we use RapiDaemons, this time to try to exercise some interference channel — in this case only one, because it's a demo — and this time we run our actual software application on one core, just one core, and we see how the different execution times vary due to the different contenders, the different RapiDaemons running in parallel. That allows us, for example, to discriminate between cases — because you could have one task that does not really use some hardware resource, and so is not impacted by one interference channel, at least — and also to see the amount of interference each task can suffer. Then, depending on the results we get, we can decide whether to implement mitigations or not. And regarding
mitigations, of course, there are lots of different possible mitigations. You could, for example, deactivate some hardware resource — in our case, since we're dealing with the DDR memory, that's not really feasible. If you are using caches, you can do cache partitioning and that kind of thing. Or you can try some configuration during the integration process, for example a different memory mapping, to avoid having accesses to the same pages in memory, and so on. Another way is to have temporal exclusion, and for that ASTERIOS offers different mechanisms for mitigating those time interferences. The first one — maybe the most obvious one if you have followed the rest of the presentation — is that we can use the ASTERIOS programming model to design a dynamic architecture with explicit exclusion between some of the elementary actions' logical time intervals. That's fine, but of course it means modifying the design for a particular integration, and we have to be careful not to impact the determinism — that is, to keep the exact same data flows between the different elementary actions. But there is also another possibility, and that is to configure exclusion between elementary actions at integration level. In that particular case we rely on the ASTERIOS toolchain to enforce the time partitioning in the generated static schedule: the user just specifies some configuration saying, okay, I don't want this particular elementary action to be executed in parallel with those elementary actions, because during the characterization step you've seen that, for example, all those elementary actions access the same resource in parallel and could have a huge impact on one another. You just specify those exclusions, and then the ASTERIOS toolchain will try to compute a schedule where the corresponding execution slots cannot happen at the same time. Such a mechanism can be used to achieve
time interference mitigation, and so help answer the AMC 20-193 MCP_Resource_Usage_3 objective. To further support this AMC objective, ASTERIOS also offers a checker tool, to automatically verify the generated platform sizing; in particular, it verifies that the generated static schedule enforces the time partitioning by scheduling the different frames appropriately — in particular, in accordance with the exclusions configured by the user.

Okay, we've now come to the end of our webinar, and I hope we've shown you how the Rapita and ASTERIOS solutions can be used together to help integrate and validate multicore applications. Just to summarize: on the one hand we have ASTERIOS, offering a separation of concerns between high-level requirements and the physical implementation on a specific target; we've seen that it allows us to achieve determinism by design, as well as an easy integration process for complex hardware platforms. On the other hand, we have the Rapita tools and expertise to conduct timing analysis on multicore, in particular for time interference identification and characterization, with ASTERIOS mechanisms that can be used to mitigate those interferences. So the combined use of the ASTERIOS and Rapita tools could help provide evidence to support several of the AMC 20-193 objectives. And now I think it's time for the question session, so back to you, Andrea.

Thank you, Guillaume. So, on to our live Q&A. Thank you to Guillaume and Sam for their presentations. I'll try to get through as many questions as possible, and we will try to follow up by email on any questions we don't get to. The first question we have was from Javier, and his question was: with ASTERIOS, do you only run one application at a time, to avoid interference channels? Is that right?

You're muted, Guillaume.

Excuse me — I'll try again with the sound on now. It actually depends; we can do different things. We can have
several applications running in parallel if, as we said, we have good mitigation means, or if we see that they do not exercise the same interference channel, for example — so we are not compelled to do that. But of course, one mitigation means could be to say, okay, I have one application in one time slot and then another application in another slot. Once again, it's very dependent on what you want to achieve and how you want to achieve it. But no — just to summarize, you can run several applications at the same time on different cores with ASTERIOS.

Thank you, Guillaume. Another question: can you give some examples of the TORs for RapiDaemons?

Yes — I mentioned that we have a qualification kit for RapiDaemons, and we have both user TORs and developer TORs for them. The developer TORs go into a huge amount of detail. Take the example we gave of a RapiDaemon that tries to cause evictions from the row buffers in the DDR controller: that RapiDaemon probably has 15 or 20 developer TORs, which are concerned with such things as not only "is it accessing DDR memory", but also, in terms of the virtual-to-physical address translations, "are these translations always hitting in the TLB". I'm not going to go into a huge amount of detail, but it's a lot more complicated than just "yes, it misses some caches and it hits the DDR memory". The user TORs are typically a lot simpler, and are concerned mainly with ensuring that the deployment of the RapiDaemons is in a correct environment, plus some high-level checks — yes, is this valid, is this correct.

Thanks, Sam. We have a question from Antoine: temporal exclusion looks natural for peripheral access, but what about DDR access? As far as I know, from an application point of view, you can only state that a piece of code tries to execute loads and stores, but you never know if it actually does, due to cache mechanisms, DDR
internal buffers and all the new CPU features. So how could you precisely characterize the DDR accesses of an application, to better exclude hungry apps?

I can certainly answer part of this — and Guillaume, feel free to buzz in if there are any further nuances. For DDR access, this is actually quite doable. As you mention, modern processors typically have between two and four levels of cache between the processors and the DDR memory, and provided this cache is partitioned in a way that prevents one application from evicting data belonging to another, we can make the following argument: if I execute a piece of software on one core and its caches are partitioned, I can use the performance counters — I can instrument it using RapiTime — and work out what its resource usage is. I can demonstrate: well, I may be making 10 million memory accesses, but most of those are being served by the cache hierarchy, and only a thousand accesses actually fall through to my DDR memory. Provided my partitioning of the memory hierarchy is robust enough, that allows me to make a very strong argument that this piece of software is not going to cause any significant accesses to the DDR memory. Meanwhile, for another piece of software whose data set does not fit in the levels of cache, for example, and which keeps missing cache and going to main memory, we can again use RapiTime to instrument it for its resource usage and say: this is something which can be quite aggressive on some of my DDR-related interference channels. So I certainly think that answers your question — if I can actually read it, it's quite small on my screen — about how to precisely characterize the DDR accesses of an application. In fact, we didn't talk about this, but the NXP T1042 has the capability not only to tell you that there are these memory accesses; it can also tell you about things like row closures due to refresh cycles, row closures due to
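The argument Sam describes can be sketched as a simple ratio over performance-counter readings. The counter names, values and threshold below are assumptions for illustration — they are not actual T1042 event names, and any real threshold would need project-specific justification:

```python
# Raw performance-counter values read around one task's execution
# (hypothetical event names and illustrative numbers).
counters = {
    "load_store_ops": 10_000_000,  # all memory accesses issued by the task
    "l2_cache_misses":    12_000,  # accesses missing the partitioned cache levels
    "ddr_reads":           1_000,  # accesses actually reaching the DDR controller
}

# Fraction of the task's accesses that fall through to DDR memory.
ddr_ratio = counters["ddr_reads"] / counters["load_store_ops"]

# Threshold below which we argue the task is a negligible DDR contender
# (assumed value; the cache-partitioning argument must hold for this to be valid).
is_ddr_hungry = ddr_ratio > 1e-3
assert not is_ddr_hungry  # here: 1000 / 10_000_000 = 0.0001
```

The same instrumentation run on a cache-thrashing task would yield a high ratio, flagging it as an aggressor on the DDR interference channels rather than a safe co-runner.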
expiration, and so on, in the DDR controller itself.

Thanks, Sam. A question from Steve: does the ASTERIOS worst-case execution time budgeting require that applications on other cores behave as intended — that no applications on the cores go rogue?

Yes — that's something I didn't talk about during the presentation, but there is a mechanism, a sort of error-management system, implemented by the ASTERIOS engine, which ensures that an elementary action cannot exceed its allocated time. If, at some point, one elementary action has not completed within its execution frame — or frames, if it can be preempted — then, depending on the policy you've implemented, you can either kill just the corresponding task, or kill the whole application, or part of it; it depends, you can do different combinations. But we do guarantee that you cannot exceed the time that has been allocated in the schedule, which is computed from the time budgets, and that ensures that at least the ordering — the determinism — of the rest of the system is preserved. So I hope that answers your question.

Thank you, Guillaume. One more question from Antoine: how do you demonstrate to the certification authority that all the unknown interferences are identified — for example in the DDR controller, without knowing the details of the DDR controller's internal design? I hope that makes sense.

Yes, it does. That's a very good question, and there are actually quite a few nuances to my answer, so bear with me. The first thing I would say is that this is fundamentally not just a multicore problem: with any commercial off-the-shelf processing hardware, there is always the possibility that some silicon bugs exist; there's always the possibility — in fact, the likelihood — that the documentation is not 100% correct and accurate; and, to be honest, even when

2025-02-20 00:46