Improving app performance with the new Memory Profiler | Unite 2023
Okay, hello everyone, welcome to my talk about Memory Profiler 1.1 and how you can use it to improve the performance of your application. A little bit about myself: I am Anton krov, senior software developer on the profiler team at Unity. My primary field of work is all the low-level stuff in the Unity Profiler and Memory Profiler, and that's why, as an advance warning, this talk will be quite a lot about low-level stuff and how memory works on the operating system, with quite a lot of theory. So no video, no audio, no funny stuff, just white-on-black text.

So let me tell you a bit about what this talk is going to cover. I will start with an overview, a short introduction to what we will look at, what the talk will cover and what it won't. The next topic will be resident memory. It's a new addition in Memory Profiler 1.1, and I will show you why we added it and what it is useful for. Then we'll analyze a memory snapshot by looking at each category in it, and we will talk about all of them in the context of the newly available information about resident memory. And the last one will be about the Untracked category: I want to dig into what Untracked is, what usually hides there and how you can analyze it.

So my goal by the end of this talk is just for us together to understand what Memory Profiler shows you: what allocations are, what's in physical memory, why the new resident memory was added and what it gives you, and how all of that affects your application's memory footprint and performance.

A question to the audience: how many of you use Memory Profiler? Could you please raise your hand? Oh my God, okay. Well, it makes us happy; it means you use our tools, which is nice. So let's begin.

I will start with the overview. For this talk I've made a snapshot which I'm going to use through all my slides. It was taken from an Android game and, obviously, because it was taken from an Android game, it has Android bits, but apart
from this small Android-specific section, all the rest is universal and applies to any platform. I'm not going to cover managed, garbage-collected memory much, as this topic is well covered. I think we have quite a lot of YouTube videos, tutorials, documentation and blog posts about that, and there will be a link at the end of the presentation. This talk will be more focused on low-level stuff. It is less about what you allocate directly and more about what is being allocated for you, because Unity, the operating system and third-party libraries can all allocate, and that's what we're going to focus on. It's also about allocations which might look scary in the Summary view but in reality might not be what is actually affecting your memory footprint or application performance. We will talk a bit about how the operating system manages memory, which will be the theoretical part, and why some allocations can be ignored or are not as scary as they look.

Before we dig into the captured memory snapshot, as I mentioned, we need to spend some time on theory, and the theory is mostly about two particular terms. Throughout this talk I will be using the terms allocated and resident. We had a set of updates to Memory Profiler where we changed which terms we use, because we used committed and then realized it's not what we actually show. So the latest profiler uses specifically two terms, allocated and resident, and I want to make sure that we're on the same page about what they actually mean.

In simple terms, allocated is just how much memory has been requested. When you request it, the operating system makes a promise that it will be able to provide you that memory. It does so by committing physical resources, like system memory or space on disk. For that reason, because the operating system needs
to commit resources, some platforms call this memory committed, specifically Windows. And if for some reason the operating system doesn't have enough resources to commit, you will get out of memory. Here, as always, we have multiple platforms, so I have to make a note, because that's not completely true on Linux: Linux allows you to overcommit, but we will get to that later.

The next big thing is resident. Resident, strictly speaking, is what is actually currently present in your physical memory, and not all allocated memory is actually resident. That can be for quite a lot of reasons. First, the memory could just be empty: you or somebody allocated memory but never used it. Operating systems are quite clever, so they know how to mark these memory regions as empty, and they don't contribute to your application's memory consumption because they are never present in actual physical memory. The second most common reason is that the memory hasn't been accessed for a long time. This is much more platform dependent, because it can be moved to secondary storage like a swap file, which is commonly found on desktop platforms, but it could also be in compressed memory, which exists on desktop and mobile as well, because on iOS you have compressed memory. And the last, most important reason is that the operating system needs physical memory right now for something else, and that something else is more important to the operating system than your application.

If that happens, you're probably already running low on memory, because the operating system needs to start juggling what it can load and unload. Why this is important is that even before you get to that point you actually start experiencing performance regressions, because in a modern operating system there is no such thing as free memory. All the memory which
is free is mostly used for different caches, and even if it's not about caches, you have mapped files which are made resident because you access them, and once they've been discarded they need to be reloaded again, and any access to secondary storage is pretty slow. We're going to cover that in more detail when we get to the Executables and Mapped category in our snapshot.

And the last one, as you can see, is dirty. For the rest of the presentation we only use the terms allocated and resident, but because in some tools you will see this term used, I think it's important to cover it as well. In short, dirty just means modified. It's the part of memory currently present in physical memory that the operating system can't easily discard, because it doesn't exist anywhere else. From the point of view of the operating system, not all resident memory is dirty. For example, whenever you have mapped files, the operating system loads part of that mapped file into memory, but it can discard it at any time, because the same data that is loaded into memory exists in a file on disk. So just consider dirty as the limit that the operating system cannot go below in any case, unless the memory can be swapped out into a swap file.

To summarize, if you want a short and simple version: consider allocated as what you have requested, and resident as what is currently physically present in memory. It's very simplified, but if we focus on this too much we're probably going to spend the entire day discussing it.

Before we jump into looking at the actual snapshot, I want to talk about what actually allocates memory apart from your scripts. Before Memory Profiler 1.0 and Unity 2022.3, we mostly showed you only Unity allocations and Mono/IL2CPP
virtual machine state, like the heap. But for Memory Profiler 1.0 and 1.1 we wanted to
reduce to a minimum the difference between what you see in Memory Profiler and what you can see in any native or operating system profiling tool, such as Task Manager, Activity Monitor or any profiling tool of your choice, so that if you open Task Manager you will see roughly the same amount of memory. Roughly, because there are different ways to measure what the total memory is, but they will be quite close. For that we need to show you all allocations, not just what Unity makes directly or what you make directly: all allocations associated with your application. And most importantly, when we talk about all of memory, you obviously don't just want to learn how much, because that gives you information but it's not very actionable. You also want to know who allocated it, which allows you to act on it.

Talking about allocation sources, the first one I want to talk about is shared libraries. Your application relies on quite a lot of external shared libraries. These shared libraries might in turn rely on other libraries, and so on. All of them contribute to your total allocated memory and, obviously, to your application's memory footprint. A simple example: on Windows you use DirectX, which uses some graphics drivers, and drivers use system libraries, and so on. All these libraries will be loaded and mapped into your application's address space. They will use some amount of physical memory, and they will also do allocations to maintain their own state.

The next important one is the runtime environment. It's not present on all platforms, but in some cases the operating system might need a special virtual machine for your application to run. The most common and well-known example would be Android
Runtime. On consoles we can think about SDK libraries that provide the operating system environment as this kind of runtime as well, because they do allocations too.

Another big source can be drivers. Operating system drivers for different devices, like video, network and audio drivers, might allocate in direct reaction to your actions (you requested a texture, and the driver allocates the texture for you and does allocations for that), or just to maintain their internal state, for example to communicate with a device.

And the last important source, apart from Unity and the previously known ones, let's call them that, is objects that the operating system needs to maintain its context: thread stacks, heaps and so on. All of these contribute to the total allocated memory of your application, obviously need to be accounted for, and might be quite significant. We will talk about this in detail in Untracked, but I will just give you an example of a recent case where people asked me to investigate why we have almost 2 GB of untracked memory on Linux in an empty project. Unity by default creates roughly two and a half threads per core, and on a machine with, let's say, a Threadripper you can easily get to two or three hundred threads by default. Unfortunately, Linux requests around 8 megabytes of stack for each thread, so you can multiply and you will get almost 2 GB allocated just for maintaining threads. So this information is quite important and can raise a lot of questions about why so much memory is being used.

So let's move to the practical part and take a look at an actual memory snapshot. As I mentioned, I took the snapshot from an Android game. For the talk we assume that the game targets a device with 2 GB of memory, just to set the environment and expectations, and
let's imagine that we open that snapshot and look at the Summary page, as you can see here. As I mentioned, we agreed the device will have 2 GB of memory, and immediately we see in the allocated memory distribution that the total allocated memory is, surprisingly, 2.2 GB. We just agreed our device has just 2 GB, so are we in trouble or not? It does look like our game is not going to run on our target device. And it's quite a small game, but we can see that Android Runtime and Executables and Mapped are taking surprisingly a lot of memory, something like 60 to 70%, and you might think, okay, that's where the trouble is. But if we take a look at the section above, memory usage on device, it doesn't look like we need so much space. It says that the amount of physical memory being used is just around 500 megabytes, which is well within the limit that we set for ourselves. So the question now is, are we in trouble or not? Because it does look a bit confusing. So let's take a look at all the categories and see what's in there, to understand whether we're actually in trouble or not.

We start from the biggest one, Android Runtime. Android Runtime is just a virtual machine which runs Java bytecode on Android platforms. It's similar to Unity's Mono or IL2CPP virtual machine: it manages heaps, has a garbage collector, compiles bytecode into native code and so on. All Android applications, well, apart from system ones probably, all typical Android applications share the Android Runtime heap, and because applications use it, we need to show it in Memory Profiler to show you the full picture. Applications might be different: some might be quite clean on the Java side, but some can create objects
in Java or Kotlin, allocate in the Java heap, or use third-party SDKs and toolkits which have quite a significant Java part. Unfortunately, if you read about Android Runtime, you actually don't have any control over its size. It's defined by your phone vendor, and the size is set at initialization of the operating system, so it's allocated when your phone starts and doesn't change after that.

So does it mean that Android Runtime contributes a lot to your memory consumption? If you look at the resident memory size, it's just 4 megabytes. So if only a tiny fraction of it is resident, why do we even include it? Well, first of all, all applications are different, and you might have a bigger chunk here, in which case you will need to analyze it. Second, we just agreed we're going to show you all allocations which are associated with your application, and Android Runtime is definitely associated with your application because it's mapped into your application's address space.

So does it mean we don't need to analyze it? The answer is no, but it depends. In this particular case you need to focus on resident size rather than allocated size, because allocated size doesn't give you any information. In our application's case it is just 4 megabytes, and probably we won't be able to reduce it any further. Compared to the rest of the 400 it does seem like a pretty small size, so it doesn't seem to be a problem for my application.

If it's not a problem, let's take a look at the next big category, Executables and Mapped. We already had Executables and DLLs in previous versions of Memory Profiler. However, you need to understand the difference, because before Unity 2022.3 and Memory Profiler 1.0 it actually was kind of fake: previously it showed you just a collection of
binary files in your build folder, which obviously doesn't map to what your memory consumption is. We upgraded it, so starting from Unity 2022.3 it actually shows you this in the same
way as the operating system sees it. Now it shows you all executable files, all shared libraries and all mapped files that contribute to your application's memory footprint, as seen by the operating system. You can see them all listed here; for example, at the top there is the OpenGL library, some frameworks and the application executable.

When we talked about dirty, this is the section that is affected by it the most. These files are just loaded; we don't modify them, so the operating system always has a copy of them. Whenever you map a file, the operating system brings just the part which is being accessed into physical memory, and because it's not dirty, not modified, the data in memory can be discarded at any moment. That means that despite the fact that all of this contributes to resident memory, it might not contribute to your out-of-memory issue directly, but it will contribute to your performance. As soon as you get close to the maximum memory footprint that your operating system sets, the operating system will start to discard stuff, but if you keep accessing the files, it will need to reload them and discard something else. So you will start experiencing performance regressions well before you hit the out-of-memory issue, and you need to look out for it.

For simplicity we haven't included dirty in Memory Profiler, because I think it just adds too much, but in most cases resident is always bigger than dirty and a much better indicator of your application's healthy memory needs. So this again means that resident size, in this case in particular, as well as for Android Runtime, is more important for analysis compared to allocated, as
it's the part that the operating system considers important to keep in memory. If we look again at the list for our application, it doesn't look like anything in particular is super large. Probably you can get rid of OpenGL if you don't need OpenGL. I did some extra profiling, just thinking about what can be reduced, and here most of the time you are looking for SDKs that you haven't expected to see, or a Unity binary that is too large, which is usually a sign that you are probably abusing generics with IL2CPP. These are the kinds of things which are of particular interest in this category. But generally just assume that if you see too much resident memory in this particular category, it might need additional investigation.

So the next big one is Graphics. If the previous categories felt confusing, because of what you need to look at, with some of them being pre-allocated for you by the operating system, this one is actually the most confusing, so stay focused. First of all, just remember that Memory Profiler shows you only system memory information, so it never shows you what's on your GPU. At this point you might feel it's strange, because you have Graphics, so what about that? And that's the thing: Graphics isn't about GPU local memory. It's about what's actually in your system memory, and it's not about this infamous Read/Write Enabled flag.

So again a bit of theory on why that is. There are two common memory architectures when it comes to the platforms we work on. The first one is UMA, unified memory architecture: all graphics allocations are in system memory, so the GPU shares memory with the CPU. It's also called graphics stolen memory, when the GPU just steals memory from the CPU. The second one is, obviously, non-UMA, when you usually have a dedicated graphics card which has dedicated memory. So I mean it's kind of obvious
with UMA: because we allocate from system memory and the GPU shares memory with the CPU, it's all in system memory, no questions. So why do we show it for non-UMA platforms as well? The thing is, when we have a GPU with dedicated memory, usually on desktops, the operating system needs to share that local memory between multiple applications. The way it's designed, the operating system reserves the right to evict any resources out of the GPU at any given moment, because it needs the memory for another application, and it's the driver's responsibility to make sure that all the resources that your application needs are maintained in a valid state. What drivers do is allocate memory for any resources that you use, which is called a local backing store if you profile on Windows; I don't remember what it's called on macOS. The fact is that drivers maintain a kind of shadow copy of your resources for cases when they are asked to evict all resources associated with your application. So just keep in mind that even if it's not UMA, your application will receive these allocations; they will be associated with your application and count towards your total allocated memory.

Another nail in this is that it's all hidden: it's somewhere within the kernel and we can't see these allocations, and if we can't see these allocations, we can't track them. So why and how are we showing them? We actually kind of fake them, and that's why, to be absolutely clear, we now write estimated. We take information about resources we know have been allocated, like texture width, height, bits per pixel and any additional information about the format, and we estimate how much memory we think they will take. Altogether that allows us to estimate how much
memory has been allocated by graphics resources. And just to make sure that we're not creating any additional memory, whenever we do an estimation we subtract it from Untracked. That's why Untracked has an asterisk next to it, to indicate that we made some magical compensation to be able to show you this graphics memory.

Because we estimate, we don't have any information about the whereabouts of these resources in memory, and because we don't know their whereabouts, we can't say how much of them is actually resident. So if you switch to the resident view in Memory Profiler, to be absolutely truthful with you, we will just show N/A to indicate that we don't know how much of that memory is actually resident. All of that memory will be counted towards Untracked, because the operating system reports that this memory has been allocated and how much of it is actually resident, but we don't know who actually allocated it. So it doesn't spoil the total allocated, but unfortunately it makes it less informative.

On the positive side, with the latest Unity we spent quite a lot of time improving the estimation code, so now it takes into account, I think, everything: 3D textures, MSAA, memoryless render targets, compressed textures. All of them should now, I hope, be estimated correctly. So even though they are still estimates, this information is quite useful if you need to estimate how much memory is being consumed by graphics resources.

The next one is Native, and Native is so much easier compared to all the previous ones. It's everything that Unity allocates for you outside of the managed heap but inside of Unity. We have a memory manager which performs all allocation management inside of Unity, so all allocations are labeled and pooled. Whenever you do an allocation, we allocate inside a pool; if we don't have
enough space inside the pool, we allocate another chunk and allocate inside of that chunk. That's why you can see Reserved: we allocate a chunk and allocate inside of this chunk, so some of that space, despite the fact that the operating system thinks it has been allocated, is not being used by anything. It's been reserved and will be used the next time you allocate something. You've probably seen Reserved previously on the Summary view; we moved it here, into the Breakdown view, because now you can get into Reserved and see which allocators in particular are not being properly used, I would say.

I have to mention that in this particular section, as well as resident, which is direct information about your memory footprint, allocated is just as important, because all of these objects are used quite often, and allocated is a very good indicator of how much resident memory the application needs to function properly. Even though it's not 100% resident, you should expect that it might be at any point.

Talking about our Android game, the only part which looked suspicious when I analyzed it was Reserved, because it's quite large, to be honest. As I mentioned, now you can just expand it to see what's in there, and as you can see the biggest offender is ALLOC_PROFILER. You won't see it in an actual release game; it just means that the profiler needs to do allocations to function, and that's what caused this particular allocator to expand. The second biggest is ALLOC_DEFAULT. This one is probably a bit more troublesome. Usually you allocated something that caused the Unity memory pool to expand, and then you released it. Theoretically speaking, if you released everything in the chunk, the chunk will be released, so it's not like the Mono heap, which never contracts. But if you mix together long-term and short-term allocations, it means the
chances of memory fragmentation are quite high. If you have a 16-megabyte chunk with an 8-byte allocation inside of it, this chunk will still reside in memory, so to say, marked as allocated. It probably won't affect your resident memory size as much, but as I mentioned, anything may become resident at any particular time. As you can see, quite a significant part of Reserved is actually resident as well; it's just an indication that it was used a moment before we took the snapshot.

We provide information on how to customize this, so it's not completely unactionable, but unfortunately you probably need a deep understanding of how the Unity memory manager works to do it properly. We have a documentation page about memory allocator customization which allows you to do that, and there will be a link at the end. You can do that, but to be honest I discourage you from doing it, because you might make it worse.

So the next category is Managed, and that's well known and we've talked about it a lot. The Managed category shows you all allocations related to the Mono or IL2CPP virtual machine. I won't go much deeper into the details; as I mentioned at the beginning, I think it's pretty well described. So let's just go briefly through what you can see here. The first one is Virtual Machine, which is any allocations that Mono or IL2CPP makes to maintain its internal state. Reserved in this particular case is slightly different from the previous one: the managed heap has sections which are not being used, so anything inside the managed heap that is not being used by managed objects goes into Reserved. And Managed Objects is just managed objects, the things you allocate in C# or we allocate for you. There's an interesting bit about this section: as you can see, it's not big, but it's 100% resident. Why is that?
It's the way the Boehm garbage collector works. It scans your memory looking for pointers to objects, to make sure that it can mark objects as alive. That means it actually touches everything all the time and makes sure that all of your managed heap is actually present in physical memory. So keep this in mind. That's actually why just targeting the managed heap is not such a bad strategy: if you are script heavy, which many games are, this is usually the biggest offender. Not in our case, but still. And if you're interested just in managed objects, I would recommend the Unity Objects view, because it allows you to laser-focus on just managed objects, and it provides a bit of extra functionality to quickly find duplicates or just look at the data differently.

And the last category I want to talk about is Untracked. Let's talk about the most common sources of Untracked, so why we get it. The first one, as I mentioned at the beginning, is thread stacks. In our particular case it's not that large, because Android is probably not as generous as Ubuntu, so in our case it's just 8 megabytes, which is pretty healthy. But as I mentioned at the beginning, different platforms do things differently, and sometimes you can see a lot in Untracked just because it falls into thread allocations.

Next I want to mention third-party plugins, and third-party libraries used by Unity, which do allocations. Not all libraries give us a way to override the allocator, and if we can't, unfortunately the allocations go into Untracked as well. If you use third-party plugins, we now have an API which lets third-party plugins use the Unity memory manager, so that we can track the allocations, but not many plugins actually use it. So most often, if these plugins just use normal
allocation functions for C and C++, like malloc and new, you will see them here as heap or anon:libc_malloc, depending on whether it was managed through libc or system heaps. Generally speaking, that's usually the way they look.

And when we talked about Graphics, I mentioned that graphics allocations can't be tracked in the resident memory view. So make sure you understand that whenever you switch to just resident, or to allocated and resident, all your graphics resources will be shown as N/A and will go somewhere in here, most commonly into private. Sometimes you have, for example, /dev/mali in this case, that's the Mali GPU because we're on Android; on iOS you will see IOAccelerator. Here we show you different tags, and these tags come from the operating system: some operating systems provide functionality to name allocations and associate them with a certain subsystem. Unfortunately it's not very actionable, but it provides at least some information about who could have allocated this memory. It's a last resort: we haven't found anything in the Mono heap, Unity allocations or executables, so let's show you at least something. Maybe, because you have much more context about your game, you can guess what these allocations are.

So we've spent quite a significant amount of time on Untracked, and unfortunately, if all these tags don't help you much, what can you do? The only option you have in this case is native platform tools. I don't want to go deep into how they work inside; I just want to list them in case you decide to use them. On Windows it's Windows Performance Recorder and Windows Performance Analyzer. It's slow and not very user friendly, but it allows you to profile user and kernel
memory allocations, so you can see graphics allocations as they happen, and it's totally free. On Linux there is Intel VTune, and it was mentioned during the previous talk that do Trace actually allows you to profile on vun and should be working with Unity as well, but everything else I tried actually crashes or doesn't work with Unity at all. On macOS and iOS, obviously, it's called Instruments, a nice tool that works out of the box. On Android you have Android Studio; unfortunately you need a rooted phone to get it working properly, so it might work, but most of the time it doesn't unless you have a rooted phone.

So let me jump into the summary, because we're quite close to the time limit. First of all, resident memory is the key. Use the new resident memory column for memory footprint analysis; it's your source of truth about what is in physical memory. It also helps you predict out-of-memory issues, because you're in trouble if you're close to 60% of your target device's physical memory. It might vary between platforms, but the general rule is that if you are close to that limit, first of all you will start experiencing performance issues, because the operating system will start to actively manage what is in physical memory, and secondly some platforms simply have a hard limit on how much memory you can use.

Mind the context: as I mentioned, each category needs to be looked at, so there is no silver bullet, and different categories require different approaches. Sometimes you need to pay attention to resident only and ignore allocated entirely; sometimes both are important. And remember, the moment you touch any object it becomes resident.

With so much focus on resident, you might wonder why we keep allocated at all. It works as a mental anchor: for example, if you allocated 1 GB of memory, you want to see this 1 GB in the memory profiling tool to understand that this is exactly the object you have
allocated, and that allows you to build the mental bridge: this is the object I allocated, and that's how much memory it actually takes in physical memory. And obviously resident memory doesn't work with graphics allocations, so if we want to show graphics allocations we need to keep allocated for that as well.

And use Memory Profiler 1.1: it provides more detailed and correct information compared to the older versions. You can use it even with old snapshots, so it doesn't specifically require the latest Unity. It just gives you much more information with the latest Unity, but you can use the better, updated UI even if you are on Unity 2019-something. And as I mentioned, use the Unity Objects view if the only thing you're interested in is the managed heap, because in that case it's much simpler and much more focused, and it will help you with simple cases.

These are the links I mentioned: the 1.1 package, allocator customization, my blog post about memory footprint, and my colleague's post about the Unity Objects view and how to inspect object memory. Thank you very much for being present here. Do we have time for questions? [Applause]
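
The thread stack example from the talk (about two and a half threads per core, about 8 megabytes of stack per thread on Linux) can be checked with simple arithmetic. The following is a minimal sketch in Python, not Unity code. The function names and the 96-core machine are hypothetical and chosen only for illustration; the per-core and per-thread figures are the ones quoted in the talk.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def estimated_thread_count(logical_cores, threads_per_core=2.5):
    """Rough default thread count; 2.5 per core is the figure quoted in the talk."""
    return round(logical_cores * threads_per_core)


def thread_stack_reservation(thread_count, stack_size_bytes=8 * MIB):
    """Bytes reserved for thread stacks. This is allocated memory, not resident:
    pages of a stack only become resident once the thread touches them."""
    return thread_count * stack_size_bytes


if __name__ == "__main__":
    cores = 96  # hypothetical high-core-count workstation
    threads = estimated_thread_count(cores)
    reserved = thread_stack_reservation(threads)
    print(f"{threads} threads reserve {reserved / GIB:.2f} GiB of stack space")
```

With a few hundred threads the reservation lands close to the "almost 2 GB" of untracked memory described for the empty Linux project, while the resident part stays much smaller because most stack pages are never touched.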
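
The Graphics category is described in the talk as an estimate built from texture width, height, bits per pixel and format information. The sketch below shows one simple way such an estimate could be computed. It is an assumption for illustration only and is not Unity's actual estimation code: the function name is hypothetical, and it ignores block compression, MSAA, alignment and driver padding.

```python
def estimate_texture_bytes(width, height, bits_per_pixel, mip_chain=False):
    """Estimate the size of an uncompressed 2D texture in bytes.

    With mip_chain=True, every mip level down to 1x1 is added, halving each
    dimension per level (never going below 1).
    """
    total = 0
    w, h = width, height
    while True:
        # Round each level up to a whole number of bytes.
        total += (w * h * bits_per_pixel + 7) // 8
        if not mip_chain or (w == 1 and h == 1):
            break
        w, h = max(1, w // 2), max(1, h // 2)
    return total


if __name__ == "__main__":
    base = estimate_texture_bytes(1024, 1024, 32)
    with_mips = estimate_texture_bytes(1024, 1024, 32, mip_chain=True)
    print(f"base level: {base} bytes, with mip chain: {with_mips} bytes")
```

An estimate like this says how much memory a resource should take, but nothing about where it lives, which is why the resident column shows N/A for graphics resources.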
2024-02-07 06:17