Highly Technical Talk with Hanselman and Toub BRK194


How are you, sir? I'm doing very well, how are you? I am very well. Are you excited for a highly technical talk? I'm shocked that they let us put that in the title, but I'm feeling pretty excited. I am too. I don't know what it means, but I'm very excited for it. I wasn't sure whether the quotes implied that it's not technical at all. Highly technical. Highly technical. So I am Scott Hanselman, and this is the incomparable Stephen Toub, who has been doing Deep .NET. Big hand for Stephen. Stephen called a couple of months ago and said, I've just got so much .NET, I just want to get it out to the people, what can I do? And I said, call me and I'll push record. So far we've done five fantastic episodes of Deep .NET, with more to come, and we're going to start to include folks from all over the .NET ecosystem who will be sharing their knowledge as well. We're going to consider this a live taping of Deep .NET, and we'll put this video up on that playlist too; it will of course be available on demand at build.microsoft.com. Cool, so let's get into it. Demos. Right. I'm sad, you were zero minutes to demos yesterday. I know, this was like 30 seconds to demo; it's unacceptable. It is unacceptable. So, a bunch of the comments on Deep .NET were about

things like specific topics, but then also: how do you do what you do? How do you find these performance optimizations, and how do you figure out what you want to talk about? So what I thought we would do here today is actually improve something. On the plane on the way here, just before I got on, I was exploring .NET looking for really cool projects we could look at, and I grabbed Humanizer. Humanizer is a project that's super popular, it's used all over the place, it's had something like 50 million downloads from NuGet, and it's used by Roslyn. We can even see what it does with Roslyn: if I have a class called Person, for example, and then I say List of Person, you can see it suggests the name "people". Right, that's Humanizer. The picking of the word "people": it's not Copilot; "people" was Roslyn, and that's Humanizer. And Humanizer is Claire's. It is, yep, and she did that, so big hand and big shout-out to Claire for letting us use her library. Absolutely, 50 million downloads, it's huge, super impactful. But the key thing is that across the ecosystem, no matter what library you're talking about, there's always room for improvement, including my own; I write a fairly lengthy article each year on the improvements to my own code. Everyone has opportunity for improvement. I have not changed the code in Humanizer at all yet. The only thing I've done to make my life a little easier: it multitargets all the things, .NET 8, .NET 7, .NET 6, .NET Standard, whatever, and I changed that to target only .NET 8 so I don't have to worry about the rest. Okay, so for now we're going to optimize for .NET 8 for the purposes of a 45-minute talk. Exactly. So there's a lot of varied functionality across Humanizer; let's take a look at a very simple thing it does. I can go up to my Console.WriteLine here. It's got a function called Truncate: I can say Truncator, for example FixedLength, and then I can say Truncate and give it a string like "hello build, how are you" (this is not the keyboard I'm used to), and you give it the length you want to truncate it to, like eight, and then what you want to truncate with. If I run that with Ctrl+F5 (where's F5?), it builds, and you see it truncated the string to just eight characters. And that's stuff we do all the time; otherwise you have to go write IndexOf and count from the left and from the right. It's a very human API. So, anytime you're looking to make something faster, you first need to understand where you are. Let me zoom out a little. I'm going to take this call here, and we're going to use a test class and paste in, Ctrl+V, the same code that I had. Now, to make this work with BenchmarkDotNet: typeof (wow, this keyboard is hard; you want to switch? I do; I'm going to go like this and pretend it's me; it's perfect), BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args), and we can run that.
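A hypothetical sketch of the setup being described here: the Humanizer call pasted into a benchmark class and driven by BenchmarkSwitcher. The class and method names are made up for illustration; the Truncator.FixedLength call shape follows the description in the talk.

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Humanizer;

public class Tests
{
    // The same call from the Console.WriteLine demo: truncate to 8 characters,
    // ending the cut with "...". "hello build how are you" becomes "hello...".
    [Benchmark]
    public string Test() =>
        Truncator.FixedLength.Truncate("hello build how are you", 8, "...");
}

public class Program
{
    // No explicit benchmark wiring needed: BenchmarkSwitcher finds the [Benchmark]
    // methods in the assembly, builds a runner application, and executes it.
    public static void Main(string[] args) =>
        BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);
}
```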

Actually, before I do that, I'm going to add a couple of attributes to my test class. One of them, because we have limited time here, tells it to run more quickly than it normally would, and I'm also going to ask for memory information: amount of allocation, GCs, and so on. We've talked about BenchmarkDotNet before: you can see there's no public static void Main there, it's all implied by that Run call. It's a test framework for performance, it's testing the current type, and it grabs the assembly from that type. Exactly. And you can see what it did here: it built a simple application that included my test, it's running it, and we get results really fast, saying that each invocation took about 15 nanoseconds and allocated 96 bytes. Nanoseconds, not milliseconds. Billionths of a second. Billionths? A lot. Is my math right? I think so, billionths. So not a picosecond; that's even smaller. So it took about 15 nanoseconds, which is pretty fast, but we can do better. The interesting thing for me, though, was the 96 bytes, because the result was "hello...": eight characters, two bytes apiece in .NET, so this should be 16 bytes plus a little overhead for the string object itself, and 96 is more than that. So what I want to do is profile this to understand where that's coming from. I'm just going to write a little loop here: i equals zero, i less than 10,000, i++, and to make this easier for myself I'll say Tests t = new, and then t.Test(). What I'm doing here is setting up something for the profiler to profile, and I'm doing it 10,000 times because there's a fair amount of allocation that happens at .NET application startup and I want to be able to separate the signal from the noise. Okay. So now we'll profile this. I'll go to Debug, Performance Profiler. There's a whole bunch of profilers here that are hard to see: there's a CPU usage profiler, there's a file I/O profiler, and there's an allocation profiler, which is the one I'm going to choose. All of this ships today and has shipped for years. And you should think about this as the audience: what percentage of you use the debugger every day, how many times do you use it every week, and how often have you spent real time in the profiler? Right. So the interesting thing we notice now, because I used 10,000 iterations, is that we can basically divide by 10,000 to figure out what's happening in each call, and you can see there are these two lines here: we're allocating 20,000 strings and 10,000 of these FixedLengthTruncator objects. Which is a little strange: that's three objects per call, when I was sort of expecting just one for the string. Okay, so let's figure out why. If I double-click on the allocation I care about, you can see the back trace for where it came from; it's coming from my test function. Now, I didn't actually have a call to that allocation in my test function; what happened was something got inlined into it. And if I go to see what that was, I'm going to go to the FixedLength property. See any issues? Yeah: every time I'm accessing this property, it's allocating a new object.
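The shape being described, and the cached version Stephen switches it to a moment later. The interface name is Humanizer's ITruncator; the concrete truncator class name is a guess based on the profiler output he reads aloud.

```csharp
// Before: an expression-bodied property; every get allocates a new truncator.
public static ITruncator FixedLength => new FixedLengthTruncator();

// After: a get-only auto-property. The compiler emits a static readonly backing
// field, initializes it once, and the getter just returns that cached instance.
public static ITruncator FixedLength { get; } = new FixedLengthTruncator();
```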
Now, there might be a reason for that: it might be that this object holds state, and we don't want to hand back something that multiple threads might mutate. But if I dive into this object, I can see that there's no state; it's just a function. Right, so that static class is effectively becoming a factory class without maybe intending to. Exactly. My guess is whoever wrote this intended it to be the equivalent of this instead. And if we were to profile again, just by that one change we've now cached this effectively immutable, stateless object so that we don't pay for it each time. Okay. And this new syntax here, this is fairly new C#? It's a few years old. It's called an auto-property: the C# compiler creates a backing field, a readonly field of that type, in this case a static readonly field, initializes it to this object, and then the FixedLength property just returns that field every time. A very nice, clean way to get a backing field without having to see it and use up a bunch of lines. Exactly. Now, the other two allocations were coming from this FixedLengthTruncator, and we certainly expected it to allocate a string, because I was asking it to create something new; the signature returns string. But why was it producing two? Well, we can see the default here for TruncateFrom is TruncateFrom.Right. I could have changed that, but if we scroll down to the code that handles truncating from the right, you can get a sense for where these two allocations are coming from. This plus here is exactly equivalent, and in fact the C# compiler will translate it into this, to having written string.Concat instead. Right, because the plus is an operator overload; it's syntactic sugar for string.Concat.
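The substring-plus-concatenation shape being described, next to the AsSpan form it's about to be changed to. Parameter names are guesses; only the shape matters here.

```csharp
// Before: Substring allocates an intermediate string that exists only to be concatenated.
return value.Substring(0, length - truncationString.Length) + truncationString;

// After: slice with AsSpan instead. string.Concat has span-based overloads, so the only
// allocation left is the string this method has to return anyway.
return string.Concat(value.AsSpan(0, length - truncationString.Length), truncationString);
```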
And so the C# compiler lowers it into the equivalent of that, and now we can see where those two allocations were coming from: we're creating a new substring and we're concatenating. Now, we need the concatenation, because this function needs to return a string, but the substring is pure overhead. When I looked to see when this code was written, it was written 10 years ago, and 10 years ago that's as good as you could do. Today we can just say AsSpan. And this is a really important point: these are smart people, and they're great sports, whether it be Claire or Simon and all the folks who have worked on Humanizer. It was as good as it could get 10 years ago. That's right; span didn't exist in this way. Exactly, and things like AsSpan are now available to us, and everybody wins. So now we can go back to our program; I'm going to comment the loop back out and uncomment our BenchmarkSwitcher, and we'll run this again. Do you remember what the numbers were? I think they were 15 nanoseconds and 96 bytes. Does that sound about right? Now that we've made these changes, we'll run it again, and what we should see as the numbers start spewing out is that our changes have measurably improved both throughput and allocation. And in fact, when this finishes in three, two, one: you can see that we're down from 15 nanoseconds to 8 nanoseconds, and the allocation has dropped too, effectively cutting both allocation and time in half. Exactly. And this is a really interesting point, because when people think of performance optimization for .NET, they think, oh, I'm optimizing throughput, so why aren't you starting with the CPU usage profiler and focusing on that? But in my experience, focusing on allocations first accrues benefits to both, and there's a variety of reasons for that. Why do you start there? The first is that allocation has some cost. For small objects it's a small cost; often the GC is able to just bump a pointer, but it still needs to zero out the memory if it hasn't been zeroed, and for a larger object like an array it might have to zero out a lot of space. There's also, for whatever I'm constructing, a constructor to invoke, so there's arbitrary code running as a side effect of allocating, and that constructor can take a lot of time depending on what it is. The other two interesting reasons: one is that anywhere you're allocating, there's often something interesting happening, so when you focus on the allocation first you often end up seeing things around it that you can also improve, and that often ends up taking out code. And the fourth, which is the least technical but I think the most interesting, is that for me performance optimization is addictive. It is super addictive: you get a little taste of it and you want more. And looking at these allocations, you saw we were able to just pick off a few like that, and now I want more. But do you need more? I mean, eight billionths of a second. Well, maybe not for this function, but there are tons of other files we can go look at. I can quit anytime I want, man. Exactly. So let's do that, let's look at another one. There's a lot of other functionality in Humanizer. There's a type called To with an API called Transform: you give it some text and then say you want to transform it in a certain way.
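For reference, the call being set up here uses Humanizer's To.SentenceCase transformer; a minimal usage sketch:

```csharp
using Humanizer;

// Apply sentence casing; note the input here already starts with an uppercase letter,
// which is the "already sentence-cased" case discussed next.
string result = "Hello build how are you".Transform(To.SentenceCase);
```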
So let's say "Hello build how are you", and the transform I want to apply is sentence casing, for example. It's got lots of different kinds of casing; sentence casing basically makes sure the phrase you give it starts with an uppercase character. Okay. There's title casing, lower casing, upper casing, and so on. So let's run this through the benchmark. And is it necessary to do things 10,000 times? How many people are calling that 10,000 times? Most of us aren't calling it more than once or twice. Well, you're probably only calling it a handful of times while rendering your web page, but how many times is your web page being hit? Right, it adds up, and there's your 10,000. Probably way more than that. So this particular operation took about 28 nanoseconds, but that's a pretty large number for a sentence, especially for a sentence that was already sentence-cased; it already begins with an uppercase letter. That's interesting. Oh yeah, you could just go, nope, I don't even need to be here; you bail immediately. So let's run another profile of this to see exactly what's going on. We can kind of guess, but we want to actually see. We go to the Performance Profiler, run the exact same thing we just did, and it finishes quickly; 10,000 allocations don't take very long. We go to our allocation view and zoom in. Oh wow. Right, what is going on? There are a few interesting things here. Now we have four objects being allocated when, again, I sort of expected there to be one or none: we have two strings, we have an array being created for some reason, and we have an enumerator object. This SZGenericArrayEnumerator is the type you get back when you call GetEnumerator on a T[] array. So let's drill into these, focusing first on the strings. Is that almost two megs of strings? Yeah, for these 10,000 iterations. So I'm going to double-click on this to see where they're coming from, zoom in over here, and we can see they're coming from two places: one is coming from this ToSentenceCase transform, and the other is coming from string.Concat.

So let's focus on this first one. I'm just going to right-click on it and go to source. Oh, that's not the one I wanted; I wanted to go to source on the line below that, where string.Concat actually was. Just to let people know what you just did there: you said go to source, and you dropped straight into the source file. For my code, for code you wrote? Exactly. And in that moment, that symbol loading: you've got symbol servers and source servers all set up here, and it went straight to the code for the .NET runtime. Nice, because I asked it to. So, the thing we observed is that I was passing in a string that was already sentence-cased, yet it was concatenating a new string with a substring even though the input was already what it needed to be. So let's just special-case this. This is another form of optimization: you find something that's super common and you special-case it. I can say: if char.IsUpper(input[0]), bail, just return the original string. We'll deal with the rest of it later. That takes care of one of the allocations. If I ran the profiler again now... actually, I'm not going to run it again; we'll just address everything in one go. Let's go to this other interesting case over here: I'm calling this Transform function, so let's go to the source for that. "The source file has changed"... oh, because I just changed it. Let's back up one and go to it. Is ToSentenceCase a little more sophisticated than just giving the first character a capital letter? In this particular case, no, it's not. Okay, just curious. In this particular case, if we go to the implementation, it was just uppercasing the first letter and concatenating it with the substring of the rest. Okay. So let me go to what this Transform function is. I would think with a span you could just flip that first character to uppercase and get out. You would think so. Let's see where we end up. So this is interesting; a couple of things jump out at me here as a performance-minded person. First, params. With params, when I call this, I can call it as I did out here, just passing in whatever transforms I want to apply. But if I look at what function that call site is actually calling, it's this params-array overload, so it has to allocate an array. That's where the array came from. Right, and then index into it multiple times. Exactly, except it's not even indexing directly: it's got this call to LINQ's Aggregate method, which needs to enumerate all of the transformers. That's where the enumerator came from. And in this case there's just one. In this case there's just one, right. So there's a variety of ways I could fix this; the way I'm going to address it is to add a new overload. And this is a good case where I don't own this open source project, so if I wanted to contribute these fixes, which I plan to do from the comfort of my hotel room later, I would need to confer with the maintainers of the repo. As an aside, are you okay with me making these kinds of changes? I think it's cool, but let's talk about it. Now, instead of a params array, there's this really cool feature in C# 13, which I'm using here; that's the one other change I made, I told the project to use LangVersion preview. I can change this parameter from an array to a ReadOnlySpan of IStringTransformer. And to be clear, C# 13 is not out yet. Not out yet, but you can grab the bits today and play with them, and that's exactly what I'm doing. If I go back to the call site, we can see that instead of calling the params-array version it's now calling the params-span version, and this is incredibly efficient from the call site's perspective: it stores the arguments into a local, wraps them in a ReadOnlySpan referring to that exact memory, and passes that in. Zero allocation. Now, back in my code I have a squiggle, because I'm trying to use LINQ, and Aggregate doesn't work over a ReadOnlySpan. I do need to change this code so it aggregates over the read-only span a different way, but that's a super cheap and easy thing to write. With arguably less complication than the Aggregate overload, where you have to know what the delegate is doing and how it's being called, I can just do what Copilot suggests: foreach over the transformers, transform the input I currently have, store it back, and then return it. And the squiggle is gone. Nice.
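A sketch of the two overload shapes being discussed. This is a reconstruction from the description, not Humanizer's verbatim code, and the container class name is made up; in the talk the new overload is added inside Humanizer itself.

```csharp
using System;
using System.Linq;
using Humanizer;

public static class TransformSketch
{
    // Existing shape as described: params T[] forces an array allocation at the call
    // site, and LINQ's Aggregate adds an enumerator allocation plus a delegate call
    // per transformer.
    public static string Transform(this string input, params IStringTransformer[] transformers) =>
        transformers.Aggregate(input, (current, transformer) => transformer.Transform(current));

    // New overload: C# 13 "params collections" allow params ReadOnlySpan<T>, so the
    // call site just builds a span over the arguments, and a plain foreach replaces Aggregate.
    public static string Transform(this string input, params ReadOnlySpan<IStringTransformer> transformers)
    {
        foreach (IStringTransformer transformer in transformers)
        {
            input = transformer.Transform(input);
        }

        return input;
    }
}
```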
Now, it is worth noting that one of the things Humanizer is really well known for, which you mentioned at the very beginning, is that it builds for every version of .NET, backwards and forwards, up and down, sideways, and you've customized it for the purposes of our talk to target only .NET 8 and turned on LangVersion preview. If you were going to work with the team to add this improvement, you'd have to ask yourself which versions of .NET support the thing you want, how you'd want to write it differently if you were targeting something older, and whether you #if your way around that in some cases. This particular change wouldn't actually require it, it would just work, but yes, in other cases that we'll see, we would want an #if. And you can use the new version of C# while targeting an older version of .NET. Yep. Good, so you get all the benefits of the syntactic sugar and the goodness of the new C# 13, and you can still maintain compatibility. Lovely. So let's run our benchmark again. We're still using the sentence that is already sentence-cased, but in that case we should now see a dramatically different answer. Before, it was about 30 nanoseconds and about 216 bytes of allocation, and you can already see in the numbers coming back that this is going to be about 3 or 4 nanoseconds. There we go: three nanoseconds and zero allocation for the case where the input was already sentence-cased. Thank you. But wait, there's more. Remember, the rule of scale is: if you can do nothing, you can do it infinitely. A lot of performance optimization is finding things you don't need to do and not doing them anymore. But we can still fix the case where we do have work to do, exactly as we saw previously. If I go to definition and drill into it... it hurts when I do that. Then don't do that, stop doing that. So again, this is the exact same pattern we saw before, and we can apply the exact same solution: I can say string.Concat, put a comma here, and make this AsSpan instead of Substring, and I'm not making the code any more complicated. Now, you notice I'm getting a squiggle, and that's because ToUpper on a char produces a char, not a string, and there's no string.Concat overload that takes a char. What was happening previously is that there's an optimization that came into the C# compiler a couple of months ago where, for concatenation, the compiler recognizes that it can create a span directly around that character and use it directly. So you've got a char and a string here. Yeah. So what I'm going to do is apply that optimization myself, and hopefully in the future the compiler will be able to do this here as well. I'm just going to pass in a span myself: new ReadOnlySpan of char, passing in that c. And for the people who are not var fans: if you hover over var, what does the compiler think it is? It thinks it's a char. There you go, it is identifying it as a char. Exactly. And this also handles the case we don't need to run again, because we already saw it doing amazing stuff; we've now eliminated all the allocations except for the string this function actually returns. Fantastic.
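Putting both changes to the sentence-case transformer together, a sketch of roughly where the method ends up. The type and member names are approximations from the talk, not the library's exact code.

```csharp
using System;

// Hypothetical stand-in for Humanizer's sentence-case transformer implementation.
internal sealed class SentenceCaseTransformerSketch
{
    public string Transform(string input)
    {
        // Special case: already sentence-cased, nothing to do and nothing to allocate.
        if (char.IsUpper(input[0]))
        {
            return input;
        }

        // char.ToUpper yields a char, and string.Concat has no char overload, so wrap the
        // char in a one-element span ourselves (the .NET 8 ReadOnlySpan<T>(in T) constructor
        // makes this allocation-free); only the final result string is allocated.
        char c = char.ToUpper(input[0]);
        return string.Concat(new ReadOnlySpan<char>(in c), input.AsSpan(1));
    }
}
```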
So we're 20 minutes in, and we've already made a whole bunch of things better, and we're also giving people a framework to look at their own string-heavy applications, whether they use Humanizer or something in a similar space. How do you decide when to stop? Because you said it's addictive, and you're making huge improvements here, but really the profiler should be used to look at the hot path of your entire application. Exactly. Now, I'm cheating here a little bit, because normally you don't just do what I'm doing, which is pick things to make faster without any context. Normally you're profiling some application or some user scenario; you have a reason, and then you're profiling that and fixing the things within it. I don't have that here, so I'm making do without it. Where do you stop? Well, what are your scenarios, and what's good enough for them? If a performance optimization doesn't make the code any worse and doesn't make any scenario worse, then sure, if you can prove it's better, it's pure goodness. In general, though, the optimizations that are pure wins are relatively rare. A lot of optimizations involve trade-offs where, for example, you make something way faster in exchange for making something else a little slower, and then you need to evaluate that: the hope is that the thing that got a little slower is very rare and the thing that got faster is very common, or that it got so much faster that it counteracts any downsides. We make these trade-offs all the time, and we evaluate them all the time. So you're not only a plumber: you and the team are designing pipes, new kinds of pipes, while other people in the audience might be building houses that use your pipes. Are they expected to know about this level? How much driving stick shift are you doing here, and would you encourage people to spend time learning this? Will it make their text-boxes-over-data, line-of-business apps better?
I expect them to benchmark and profile their application. Agreed. And then things from all levels of the stack pop out. Yep. And if they can address their scenarios with stuff at their own level, that's where they should start. If they can't, they should come talk to us. Hopefully we've fixed the stuff at the lower levels, but if we haven't, come play with us in the .NET runtime, come play with us in ASP.NET Core, and let's fix it together. But the interesting thing is that most applications in the .NET ecosystem don't just depend on the .NET runtime and the core libraries and nothing else; they depend on a plethora of libraries from across NuGet. Here we have an example of an amazing library that's been around for a long time, is used by tons of people, 50 million downloads, and there are still opportunities for improvement that can then accrue to those applications. So we can all come together, Kumbaya moment, and fix these things across the ecosystem. Fantastic. Let's take a look at another one, a more complicated one; so far we're kind of just playing with characters. Are you plugged in? I was supposed to be plugged in. Is that your plug-in thingy there? Yeah, this one goes over here; that wasn't plugged in. Go plug that in. Great, now we've got to do the whole thing again, faster. Okay, cool. So I'm going to close some of my windows here just to keep this nice and tight. We're going to declare tab bankruptcy. Tab bankruptcy, I like that; it's kind of like your desktop bankruptcy, where you just move everything into a folder called Desktop. I moved it back when I got back to the hotel. That's awesome. All right, so I was browsing around seeing what kind of functionality it has, and one of the cool things I saw is that it's got functionality for Roman numerals. I've seen this; Claire showed me this a while back. This is one of the coolest things you didn't know you needed in your life, and then you play with it. So you can say FromRoman. I'm going to have to do this in my head: when I was a kid watching the credits on old movies, they would end with something like MCMLVII and I'd be like, 1957. Exactly. But the challenge is to get it before it scrolls off. So I can give it the numeral for 2024, that's the current year, right? And let's find out; it should give me back 2024.
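For reference, a minimal sketch of the Humanizer calls being exercised here, using its FromRoman and ToRoman extension methods:

```csharp
using Humanizer;

int year = "MMXXIV".FromRoman();   // parses the Roman numeral: 2024
string numeral = 2024.ToRoman();   // formats the other direction: "MMXXIV"
```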
So let's run this through our benchmark. Now, any kind of parsing like this should be basically free, right? There's no reason it needs to allocate, and the amount of work it needs to do... parsing should be free, or pretty close to it. Right, we're looking at six characters and figuring out the number they translate to. But it allocated 216 bytes and took 240 nanoseconds. That's not free if I'm doing it a lot. Now, I'm not sure parsing Roman numerals is going to show up on your ASP.NET hot path, but let's pretend it does: you're building, you know, a Roman mythology site. Well, if you're thinking about the Roman Empire basically five or six times a day, as one does. Or 50 billion times a day, as one does. So let's profile this, because it was taking a lot more time. I'm actually going to switch this time and use a different profiler, because I'm curious to know why it was taking so much time. Before, we were looking at allocations because it allocated a lot of memory; in this case it took a chunk of time. Exactly. When it was 16 nanoseconds it probably wasn't worth CPU profiling; at 250, I'm curious what that's going to show. So we're going to select CPU Usage, and I'm going to come back over here and uncomment this loop. I'm noticing, by the way, that this is something you're doing often enough that it makes me wonder whether something could be added to the benchmark to easily switch back and forth between these two modes. Now, for allocation profiling I generally want a number small enough that it doesn't overwhelm the system with data but large enough that it escapes the noise. For CPU profiling I want it to run a lot longer, because the way CPU profiling works is that it basically takes a sample of every thread once a millisecond, so I need it to run long enough to collect enough data. And a quick aside, we've talked about this before: the underscore is the comma for the North Americans, the period for everybody else; it's the thousands separator you put there to make yourself comfortable, so that at a glance you can see what the number is. In reality it's ignored. Yeah, I could put it anywhere I wanted and it wouldn't affect anything, but I do it so I can clearly see, oh yeah, this is 100 million. So you do 100 million or something. Yeah. So now we're going to run that profiler: CPU profiler, start. You can see in the background here that it's running. I've got 16 logical cores here, so that's about one sixteenth of the capacity; you didn't parallelize it, so this is just running on one core. Exactly. I'm going to let it run for about 15 seconds, stop it, let it collect all the data it wanted, it processes it, and now it gives me a report showing the most interesting things in this trace. This is super interesting, because we did a whole hour and a half on regex. We did. And you'll notice it's telling me, hey, you're spending a significant portion of your time in a regex. So it's your fault. It is my fault, but I'll fix it. We're also spending a huge amount of time in dictionary lookups; basically all of our time here is being spent in dictionary lookups and in regex processing. That's pretty interesting. That is interesting. So let's figure out why. I could click on these things and drill in, but I'm just going to dive right into the code, because it's a simple function, FromRoman (man, laptops are hard). So far so good: we've got a string, and oh, we're processing a span, so some of this code has been updated a little since span was introduced, which is pretty cool. I come in here, I'm trimming it, and then there's this IsInvalidRomanNumeral check. That's interesting, let's look at that. And here's our regex, so we're doing an IsMatch. This is them trying to get out as soon as they can if it doesn't look like a valid Roman numeral: if it doesn't have Ms and Cs and Vs and Is, bail. And it has a big regex for that. It is a big regex. Now, I'm not going to stand up here and try to optimize this regex by hand just from looking at it at a glance.
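Backing up a step, a hypothetical sketch of the profiling driver just described: the iteration count is much larger than for the allocation run because the sampling profiler only collects roughly one sample per thread per millisecond, and the underscores are just digit separators that the compiler ignores.

```csharp
using Humanizer;

// Run long enough for the CPU sampler to gather useful data; 100_000_000 reads as
// "100 million" at a glance, and the underscores have no effect on the value.
for (int i = 0; i < 100_000_000; i++)
{
    "MMXXIV".FromRoman();
}
```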
I mean, I know regexes, but I also know that if you have a regex and it passes all the tests, you should look away, because they sense fear. They do sense fear. Yes, it works; I don't want to make it angry. But I'm going to play nicely with it; we're going to become friends, and it's going to tell me about itself. So I've just made the class partial, and I'm going to take advantage of the regex source generator to help me do two things: one, it's going to optimize this a little better than it could be before, because it does the work at build time rather than at runtime, and two, it's going to give me information I can use to gain some additional insight. So I'm going to say GeneratedRegex. The difference here is that rather than compiling the regex at runtime, it source-generates it and actually shows you the source. Regexes are a little programming language within your programming language, and this turns one into a whole chunk of C# you can go read; check out our Deep .NET series where we dig into exactly that. And he's not changing the pattern. I'm not changing it, I'm just moving some stuff around. There's actually an analyzer and fixer that would have done what I just did automatically for me. Oh really? Yeah, there is, but just before we came on stage I was getting really annoyed with some of the style choices in this code base and the analyzers giving me errors when I didn't agree with the style, so I turned off all analyzers. Right. So I've converted this into a source-generated regex, and then I just need to fix up one call site down here, because I made this a method, so I need to replace the field reference with a method call. But the interesting thing you saw popping up there, getting in the way of my cursor, that's amazing and should be called out as amazing. Definitely. What we see here is that if I hover over this, you can see a description of exactly what this regex does. That is a generated description; the generator has already run and you're looking at docs. Exactly. If I go to the definition, we can see that the source generator, in addition to spitting out all the code to implement this regular expression, also spit out those docs describing it. I want to make sure everybody understood what just happened there: he had a regex, and he made it not one that's constructed at runtime but one that's generated at build time, and all of this was generated passively in the background, live, as you were typing. As I was typing, exactly. Now, one of the interesting things is that this helps me learn about the expression, and it also makes certain things pop out. If I double-click here and highlight the word "capture", you see all that yellow on the right-hand side. I see a lot of "greedy" and a lot of "capture". So there is a whole lot of capturing going on. And the interesting thing about regular expressions, if you look at the expression up here, is that parentheses mean multiple things: they're not just a convenient group, they're capture groups, so there's actual work associated with every set of parentheses in a regular expression. Except all I'm calling is IsMatch. So in my code here, all the places yellow is popping up, that's all work associated with doing captures, not related to matching. Not related to matching, because all I'm asking is: did this match? Not: tell me what the matches were.
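A rough sketch of the conversion being described. The class and method names are guesses based on the talk, and the actual validity pattern is deliberately elided here; the point is the shape of the change, not the pattern.

```csharp
using System.Text.RegularExpressions;

public static partial class RomanNumeralExtensions
{
    // Before: a Regex field constructed (and compiled or interpreted) at runtime.
    // private static readonly Regex ValidRomanNumeral = new Regex("<pattern>", ...);

    // After: the regex source generator emits the matching code at build time, along
    // with generated XML docs that describe the pattern. The pattern itself is unchanged.
    [GeneratedRegex("<same pattern as before>")]
    private static partial Regex ValidRomanNumeralRegex();
}
```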
So I'm going to go back to where that regular expression was, and I'm going to add one more option to the RegexOptions: ExplicitCapture. The parentheses were serving double duty; this says have them only be about grouping and not about capture groups. Okay. Right, so let's run this. Now, Humanizer is known for its tests; when you make changes like this, you're going to want to run those tests and make sure you didn't break any actual functionality. I will, and there's a whole bunch of tests over here that, if I were a good citizen and had more than nine minutes and forty seconds remaining, I would run. I just want to call out our friends who work on Humanizer: 13,000-plus tests. Huge. Fantastic. So let's run this again. I remember it was about 250 nanoseconds, I believe, before we made this one change, adding RegexOptions.ExplicitCapture.
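Continuing the previous sketch, the one-option change just described might look like this (combined with whatever options the pattern already used):

```csharp
// ExplicitCapture tells the engine that bare (...) groups are grouping-only, not
// capturing, so IsMatch no longer pays for capture bookkeeping it never reports.
[GeneratedRegex("<same pattern as before>", RegexOptions.ExplicitCapture)]
private static partial Regex ValidRomanNumeralRegex();
```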

Did it matter that you changed it to a source-generated regex? The result would be about the same either way: the regex source generator and the regex compiler are pretty close in the output they produce. The source generator is a little better, but in this particular case it's not going to matter a whole lot. So just by adding that one option, we've now shaved off about a fifth of the cost, thanks to the insight from the profiler saying, hey, go check this out. Now, I could probably optimize the regex further, but I have bigger fish to fry, because if we look at where the time was going, the profiler said a huge amount of it was being spent in dictionary lookups. It was. Why is that? Well, we go down to the main processing loop and we see we have a dictionary, and we also see some other unfortunate things, like calling ToString on every character that's passed in. So we've got an individual character, and we're calling ToString on it in order to do a dictionary lookup: pulling out chars and then converting them into strings. Now, if you look at the dictionary, it's pretty clear why that choice was made, because there are things in the dictionary that are more than a single character, and it's using OrdinalIgnoreCase, so it's trying to handle lowercase as well as uppercase. Why is it handling things that are longer, if we were just doing individual character lookups? Well, when I went and explored, this dictionary is used in a couple of places: not only in this function but also in the ToRoman function, which enumerates over the dictionary in order to produce the right output. And if it were only single characters, it could be a faster lookup; it could be lowered to an array index. But these keys are sometimes one character and sometimes two. Right. So what we realize is that we actually have two different purposes joined together in the same dictionary: we have an enumeration purpose, which would be better served by an array, and we have a purpose that's just individual character lookups, which would be much better served by not having to ToString each character. So we're going to fix that: we're going to copy this whole thing, create another one, and create things that are each destined for their own purpose. Single responsibility principle. Single responsibility. So instead of a dictionary, I'm going to have an array of KeyValuePair from string to integer, and we'll call this the Roman numeral sequence. Isn't a dictionary already a key-value pair, Stephen Toub? It is, but it is much faster to enumerate an array than to enumerate a dictionary. So now I'm going to quickly delete a whole bunch of things here, and hopefully I don't delete the wrong thing; this is where tests would come into play, because it's probably pretty easy for me to accidentally delete one of these lines. What could go wrong? Exactly. We can use collection expressions, and then I'm going to take advantage of one of my favorite features that saves me so much time in Visual Studio: if I Alt+drag, I get a cursor on each line, and as long as I don't hit a key while I'm dragging, I can then type the same thing on every line at once: KeyValuePair.Create.
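A sketch of the enumeration-only table being built here, as a field inside the class. The field name is a guess; the values are just the standard Roman numeral table, written with a collection expression and KeyValuePair.Create as described.

```csharp
// Ordered from largest to smallest value, which is what the ToRoman formatting loop relies on.
private static readonly KeyValuePair<string, int>[] RomanNumerals =
[
    KeyValuePair.Create("M", 1000),
    KeyValuePair.Create("CM", 900),
    KeyValuePair.Create("D", 500),
    KeyValuePair.Create("CD", 400),
    KeyValuePair.Create("C", 100),
    KeyValuePair.Create("XC", 90),
    KeyValuePair.Create("L", 50),
    KeyValuePair.Create("XL", 40),
    KeyValuePair.Create("X", 10),
    KeyValuePair.Create("IX", 9),
    KeyValuePair.Create("V", 5),
    KeyValuePair.Create("IV", 4),
    KeyValuePair.Create("I", 1),
];
```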
Right, and I'll do the same thing on the other side. Everyone clapping has never used vi or Emacs at any point in their lives. Seriously though, knowing that kind of stuff... if you want to make people at work feel bad about themselves, just do it casually, throw it into a meeting. So then we're just going to use this array down here for the ToRoman case, and that's going to make that case better. While I'm here, you know I said we also see things around the allocations we're optimizing: look at this loop. This loop is basically going from the highest-value entries to the lowest-value entries, saying use as many of the higher-value ones as I can and then move on to the next. Order matters, right. It's saying: while the input divided by the value is greater than zero. So it's doing all the Ms, then all the CMs, and so on. There's a faster way to do that than dividing: when it comes to numerical operations, plus, minus, times, divide, divide is the most expensive. For some reason it was written this way when it could have just been written as: as long as what's left of the input is greater than or equal to the value. So we can save a few cycles there as well. Not a big deal, but these things show up around the allocations. Now we have this dictionary to deal with. We already said we're only doing lookups with an individual character, so I'm just going to get rid of all the two-character entries. Oops; again, that's where tests are valuable. Ah, I'm doing something wrong. And it's your laptop, so you can't complain about the keyboard. I know, I can complain. Well, you and our desktop people... we should have brought our mechanical keyboards. There you go. All right, and I'm just going to get rid of all these extra braces as well. Okay. I keep hitting the... I'm just impressed that things are happening. All right, so we're down to the data we care about. Now, we could keep this as a dictionary, and I could come back in and do my little trick of replacing these things, going from character to value. But C# has a built-in construct that does this, and I don't even need a dictionary: it has a switch statement, which has been around since, well, C. Yeah, but if you do a switch then you do gotos, and then it's a slippery slope. No, Stephen loves his gotos; you can be a goto hater, but they're faster than heck. So let's just have a character come in here... I'll tell you what: rather than saying switch statements are a gateway drug, we won't use a switch statement, we'll use a switch expression. How's that? Okay, great, impressive. So we'll just say c switch, and for each of these characters... oh, Copilot will do it; there we go, we'll take Copilot's suggestion, perfect. And then we need to change this to be the integer value. Okay. Now we do have one difference... oh, readonly, right, it's no longer a dictionary. There is one difference in behavior here: we had that OrdinalIgnoreCase comparer, which we don't have here. There are a variety of ways we could tackle that. I could come in here and say, for each case, 'M' or lowercase 'm'... well, this is where it gets interesting: you can do that for M, D, C, but when you get down to the I, you've got the Turkish I problem.
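A sketch of where the per-character lookup ends up, folding in the ASCII case trick Stephen describes next. The helper name and the throwing default arm are my choices for the sketch, not the library's code.

```csharp
private static int RomanCharToValue(char c)
{
    // ASCII trick from the talk: clearing bit 0x20 maps a lowercase ASCII letter to its
    // uppercase form and leaves uppercase letters alone. This is safe here only because
    // the validity regex has already constrained the input to Roman numeral letters.
    c = (char)(c & ~0x20);

    return c switch
    {
        'I' => 1,
        'V' => 5,
        'X' => 10,
        'L' => 50,
        'C' => 100,
        'D' => 500,
        'M' => 1000,
        _ => throw new ArgumentException($"'{c}' is not a Roman numeral character."),
    };
}
```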
The Turkish I problem, which is a whole thing. Yeah, but for these purposes I'm not going to worry about it. Turkish has four I's, by the way; if you're Turkish you know this, and if you've done any kind of internationalization... and Humanizer is very well known for its support for multiple languages. If you have a lowercase i with a dot over it and you ToUpper it, in English and most languages it becomes an uppercase I; in Turkish it becomes an uppercase I with a dot on it. Yeah, U+0130 or 0131, whatever the value is. So you just... I had a whole speech there, and you had to make me look bad by calling out the actual Unicode code point. Well, I might be wrong; do you actually know? I'll look it up after. I'm not going to worry about that for now, though, because I want to call out the techniques; let's assume we're only dealing with ASCII, the first 128 characters of UTF-8. That came about in the 1960s and 1970s via a variety of standards, but the people who designed it were brilliant, and they put a really cool thing in place: the uppercase Latin English characters differ from the lowercase ones by a single bit that you turn on and off. So there's a really fun trick you can play. In fact, I bet we can even ask Copilot: the ASCII value of A is 65. Great, that's right. Oh, it's already telling me about lowercase a, and the difference... okay, dude, back off, you get the idea. There's a bit trick we can play, which is: if I have a lowercase letter (and I stick the landing in 90 seconds, yeah) and I AND it with a mask that clears that one bit, it becomes the uppercase version, and if it's already the uppercase version it stays uppercase. That might not work for a really broad set of letters, but the only ones we care about are Roman numeral letters, so it's a very constrained thing. You can probably get away with this. Exactly, and the input has already been validated by that regex we saw earlier. So I'm just going to come in here, delete that... don't delete that... will this compile? We'll try. We'll come back over here and run this again. Is that the Mission Impossible theme? No, it's not, it's unrelated; I'm just talking, just random notes. Sure you are. Okay, you have 42 seconds. So we were at 250 nanoseconds with, I think, 216 bytes of allocation, and we're down to a fifth of that: five times as fast, with zero allocations, and 30 seconds left. Very nicely done, sir. Thank you. Woohoo, fantastic. How fun is that? This was a highly technical talk. Well, thank you. If you enjoyed this highly technical talk, I would encourage you to rate it and share it with your friends. Be sure to subscribe to the .NET YouTube channel and take a look at our Deep .NET series, and have a great time tomorrow; we've got much more for you here at Microsoft Build. Thank you. That was fun.

2024-05-31
