Memory Safe Languages DO NOT PREVENT MEMORY LEAKS


If you've been watching my Cosmic Alpha streams, which you should be because they are very fun, you'll know I find some bugs that no normal person would run into. On the other hand, I also run into bugs that are relatively easy to find; they're just slightly outside of the normal expected, hey, I'm the developer, I know what sort of stuff I'm going to look for behavior. Right now, especially in the latest stream, there has been quite a bit of talk about memory leaks. As of the Alpha 5 release, there are some very, very severe leaks, which, if you accidentally run into them, make the desktop literally unusable. Combine that with the fact that there is a leak that happens upon Cosmic closing or crashing, where it doesn't seem to clean up processes correctly, and something just runs absolutely wild.

You know, Alpha Software will be Alpha Software. My understanding is there is talk of doing a 5.1 release, so hopefully some of those things get dealt with before 6 happens or before the beta happens. But I thought Rust was memory safe. How are you leaking memory if it's written in a memory safe language? Rust bad! Rust developers are trying to ruin C! C is the best language! Only write things in C! Well, think of it like the difference between the average everyday usage of the word theory versus the scientific usage. If you're not a developer, memory safety probably doesn't mean what you think it does in that technical context.

And in fact, during the Rust beta, there actually was discussion of trying to make the absence of memory leaks part of the memory safety guarantee of the language. What they realized, however, is that doing so is pretty much impossible. Ignoring pure developer error, which is the one we're going to start with, even outside of that, you can't really do this. So let's talk about what a memory leak is, and how that is different from what memory safety is. At a fundamental level, a memory leak is when your code does not release memory that it no longer needs. The very simplest kind is something that no language could ever possibly deal with, and if you're using a tool that detects memory leaks, this isn't even going to come up as a memory leak, because this is basically the behavior of a programming language.

Okay, let's say that we have a list, and this has a number of elements in it. It doesn't really matter how many that's going to be. And let's say this list is, I don't know, a scoreboard system or something else like that, and every 30 seconds, we ping a server to say, hey, who are the current people that are on the scoreboard? And then we're going to display it on some sort of scoreboard like this. It's going to have, like, one, two, three, four, so on and so forth, however many people we want to display on the board. Okay. Now, instead of updating the content of the list, what you do is go and use the add function. You make the mistake of calling the wrong function.

All of a sudden, all of that extra data is added onto the list. Then you do the same thing again. And again. And again.

And the list keeps getting longer and longer and longer and longer. What you were supposed to do from the start is replace the list. You made the mistake of adding data onto the list. This is a kind of memory leak that is completely unsolvable at the language level, because whether this is intended behavior is entirely dependent on the context of the application. For one application like this, replacing the list might be what you're intending to do. For another application, though, making that list grow longer and longer and longer and longer, that is the intended behavior.
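To make the scoreboard mistake concrete, here is a minimal Rust sketch. The `Scoreboard` type, its method names, and the player names are all made up for illustration; the point is only that both methods are perfectly valid code, and nothing in the language can tell which one the application actually wanted.

```rust
/// Hypothetical scoreboard that is refreshed from a server every 30 seconds.
struct Scoreboard {
    entries: Vec<String>,
}

impl Scoreboard {
    fn new() -> Self {
        Scoreboard { entries: Vec::new() }
    }

    /// The mistake: every refresh appends, so old entries are never released.
    fn refresh_by_adding(&mut self, latest: &[&str]) {
        self.entries.extend(latest.iter().map(|name| name.to_string()));
    }

    /// What was intended here: every refresh replaces the old contents.
    fn refresh_by_replacing(&mut self, latest: &[&str]) {
        self.entries = latest.iter().map(|name| name.to_string()).collect();
    }
}

/// Simulates `refreshes` polls and returns the final length for each approach
/// as (appending, replacing).
fn simulate(refreshes: usize) -> (usize, usize) {
    let latest = ["alice", "bob", "carol", "dave"];
    let mut buggy = Scoreboard::new();
    let mut correct = Scoreboard::new();
    for _ in 0..refreshes {
        buggy.refresh_by_adding(&latest);
        correct.refresh_by_replacing(&latest);
    }
    (buggy.entries.len(), correct.entries.len())
}

fn main() {
    let (buggy_len, correct_len) = simulate(1000);
    println!("appending: {buggy_len} entries, replacing: {correct_len} entries");
}
```

After 1000 refreshes of four names, the appending version holds 4000 entries while the replacing version still holds four.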

The only way to stop this kind of memory leak is knowing the context of the application. This case can only be solved by writing a test suite for that specific application. However, whilst to the user this is going to display as a memory leak, since the application is going to use more and more and more memory the longer it is running, when we start getting into the more technical sense, this isn't really considered a leak.

Because, again, there is no way to generically detect whether this growth is intended. A true memory leak is when the memory is either partially or completely inaccessible. Now, you'll see people say, oh, Rust people say that Rust solves memory leaks. Obviously, it doesn't solve memory leaks if it can leak memory.

Maybe someone says it, but they're wrong. Because there is literally a section in the Rust programming documentation on how to leak memory: "Rust's memory safety guarantees make it difficult, but not impossible, to accidentally create memory that is never cleaned up (known as a memory leak). Preventing memory leaks entirely is not one of Rust's guarantees, meaning memory leaks are memory safe in Rust."

Now, to keep this a lot more accessible, I won't talk about Rust-specific libraries or terminology or anything like that. If you want to go see that, actually go and read the Rust documentation.

Instead, I'll talk about this in a more generic computer science sense. What we are going to do here is create a cyclic reference between lists. So we make variable A, and this right now points to a list that has two elements in it. The first one, let's just make it number five.

Doesn't really matter what it is. Could be a string, could be anything else, but we'll just use a number. And the second spot is empty.

So this is a null, a nil, whatever your language calls it. At this very moment, the A list has a single reference to it. This variable made before, that is all.

Okay, let's make the second list. This we'll call the B list, and this is going to have two elements in it as well. So let's have another number here, and then the second element.

We won't make this nil. Instead, we will make this A. So this is going to have a reference to this list over here. But now something has changed with the references. List A no longer just has a single reference.

Now it has both the variable and this reference over here. So it has two references, and list B just has a single one. But what if we go and update list A? So now instead of being nil or null here, it has a reference to list B. So we then update the count for list B, because that is now two. And here is where the problem happens. So at the end of the main block, we clean up the B variable.

But because there is still a reference to it from A, we don't yet clean up the B list. So this is now down to a single reference. And then we go and clean up the A variable.

But because there is a reference to A from the B list, this still has a single reference to it. Now we have a problem, because now we have a section of memory that has no variables pointing to it, but each of those lists still has one reference to it. They are self-referential. This memory has now been leaked.

There is no way to get rid of this memory. If you then make the mistake of calling the tail function, so get the last element on list A, okay, so we're going to go from the start, that's not the end. We'll go to B, that's not the end.

This is not the end. This is not the end. This is not the end. This is not the end. Not the end. Not the end.

Not the end. Not the end. And it just keeps going around and around and around until you have a stack overflow. This is obviously a toy example, and you would have to be really stupid to actually do this in production code. However, there are much larger cyclic situations where this can absolutely happen, and just this basic thought experiment shows that it is absolutely possible to do in Rust, and that is why it is in the Rust documentation.
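For anyone who does want to see the cycle in Rust, here is a minimal sketch using the standard library's reference-counted pointer, `Rc`, with `RefCell` so the second slot can be updated after creation. The `Node` type and the value 7 for list B are assumptions for illustration; the reference counts follow the walkthrough above.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// A two-slot "list": a number plus an optional reference to another list.
struct Node {
    value: i32,
    next: RefCell<Option<Rc<Node>>>,
}

/// Builds the A <-> B cycle, lets both variables go out of scope, and returns
/// a weak handle to A plus the reference counts seen just before that.
fn build_cycle() -> (Weak<Node>, usize, usize) {
    let a = Rc::new(Node { value: 5, next: RefCell::new(None) });
    // B's second slot refers to A, so A now has two references.
    let b = Rc::new(Node { value: 7, next: RefCell::new(Some(Rc::clone(&a))) });
    // Updating A's second slot to refer to B closes the cycle.
    *a.next.borrow_mut() = Some(Rc::clone(&b));

    let a_count = Rc::strong_count(&a);
    let b_count = Rc::strong_count(&b);
    // A weak handle does not keep A alive; it only lets us observe it later.
    let observer = Rc::downgrade(&a);
    (observer, a_count, b_count)
    // `b` and then `a` are cleaned up here; each count goes from 2 to 1, never 0.
}

fn main() {
    let (observer, a_count, b_count) = build_cycle();
    println!("counts before cleanup: a = {a_count}, b = {b_count}");
    // Both variables are gone, yet the allocation is still alive.
    match observer.upgrade() {
        Some(node) => println!("leaked node still holds {}", node.value),
        None => println!("node was freed"),
    }
}
```

The weak handle is only there to observe the leak. In real code, making one of the two links a `Weak` reference is the usual way to break a cycle like this.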

This is known as an indirect memory leak, because the memory is still accessible, but it's only accessible from other leaked blocks. You don't have direct access to that memory. So with there being an indirect leak, you might reasonably expect there to be a direct leak as well. And yes, that is a thing.

So let's make A be a pointer to just some block of memory. It doesn't really matter what it is. And this is in a local scope. So we're calling this inside of, like, a function or something like that. And we don't go and free this memory after that local scope. So that pointer, we no longer have access to it, but that block of memory was never freed.

So this is completely unreachable. We don't have any way to deal with this. This is a direct memory leak. This is unreachable memory. Now, in a language like Rust, if you're just writing regular code, this situation just doesn't happen.

If we're dealing with something like C, however, this is very easy to do. All you do is exactly what I said. You make a pointer to some block of memory and then don't free the memory. And the pointer is just not accessible now. And you just have a problem. You've leaked some memory.

However, there are some very specific scenarios where doing this does need to be possible. So it's not correct to say that it's impossible with a language like Rust. It's just not possible unless you explicitly say you want it to happen. So there is a whole system in Rust about making memory and then losing access to it.
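As a rough sketch of what that explicit opt-in can look like, the standard library has `Box::leak` and `std::mem::forget`. The function name `leak_buffer` and the buffer size are made up for the example.

```rust
/// Leaks a heap allocation on purpose and returns a reference that stays
/// valid for the rest of the program.
fn leak_buffer() -> &'static mut Vec<u8> {
    let buffer = Box::new(vec![0u8; 1024]);
    // `Box::leak` is the explicit opt-in: the box is never freed.
    Box::leak(buffer)
}

fn main() {
    let leaked = leak_buffer();
    leaked[0] = 42;
    println!("leaked buffer of {} bytes, first byte {}", leaked.len(), leaked[0]);

    // `std::mem::forget` is another explicit opt-in: the destructor never
    // runs, so the heap buffer behind this String is never released.
    let text = String::from("never freed");
    std::mem::forget(text);
}
```

Neither call needs an `unsafe` block, which lines up with leaks being considered memory safe.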

But if you want to use this, you have to explicitly say, do this. It's not default behavior of the language. It is something you have to actively go out of your way to activate. Now, let's go back to a thing I said before. Memory leaks are memory safe in Rust. What is the difference between a memory leak and a memory safety error? And what does a language like Rust or any other memory safe language out there actually protect you from? So memory safety is about dealing with problems that in the majority of cases are just mistakes.

These are not things that are like, oh, in a normal application, you'd probably want to do this. These are things which in most cases are just bad and just errors and just shouldn't be done. These can be broken down into two main categories.

Spatial errors, accessing memory in illegal areas. And temporal errors, accessing memory at illegal times, either before its creation or after its deletion. Said another way, memory safety is about dealing with undefined memory behaviors.

If you make a cyclic reference, this is not undefined. It might be stupid, but it's not undefined. If you go and make a list that's way too big, that's not undefined. It's just a really big list. If you make a pointer to a memory block and then don't free the memory, that's not undefined.

You just made a mistake. Now, here's one you've very likely heard of, a buffer overflow. If we have an array, an array is just a fixed size list. So when we create the array, this is the size of the array. This is how much can fit in the array.

Let's say I want to ignore that and I want to add an element right here, just outside of the array. Well, I can't do that. That's outside of the array. This is undefined behavior. This is going to try to write over some random other thing that just happens to be in that block. What that's going to do? Literally anything.
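For contrast, here is a minimal sketch of how a memory safe language handles the same attempt, using Rust. The helper names `write_at` and `read_at` are made up; `get` and `get_mut` are the standard checked accessors on slices and arrays.

```rust
/// Tries to write `value` at `index`. Returns true only if the index is
/// inside the array.
fn write_at(array: &mut [i32; 4], index: usize, value: i32) -> bool {
    match array.get_mut(index) {
        Some(slot) => {
            *slot = value;
            true
        }
        // Out of bounds: nothing is written anywhere.
        None => false,
    }
}

/// Reads are checked the same way.
fn read_at(array: &[i32; 4], index: usize) -> Option<i32> {
    array.get(index).copied()
}

fn main() {
    let mut array = [1, 2, 3, 4];
    println!("write at 3: {}", write_at(&mut array, 3, 9));
    println!("write at 4: {}", write_at(&mut array, 4, 9));
    println!("read at 4: {:?}", read_at(&array, 4));
    // Plain indexing such as `array[index]` with an out-of-range index panics
    // instead of touching whatever happens to sit next to the array.
}
```

Either way the outcome is defined: a `None` or a panic, never a write into some random neighbouring memory.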

This is exactly what we mean by undefined behavior. It's going to do something. But what that's going to do is going to entirely depend on what was run on the system earlier in the day, what is currently running, and all manner of other variables that are completely unknowable. Now, another one is a buffer overread.

Oftentimes, buffer overread isn't really said and people just call it a buffer overflow. This is the exact same thing as an overflow, where instead of doing a write here, we try to read here. Again, this is going to try to read whatever memory happens to be there. Now, let's talk about these temporal issues. The simplest one is uninitialized variables. Now, in most languages, like, I don't know, Java or something, if I make, say, int a, even if I don't set a value for this, it is going to be given a default value.

Usually for an int, it's zero. For something like a string (I can't spell, excuse me, I'm Australian), that's going to be an empty string.

There are other cases. Different languages are going to do things in different ways. Sometimes it's a null. Again, depends on the variable you're making.

However, say you have a truly uninitialized variable, so there is a, and it is set to nothing. Not null, not zero, not empty string. You have not set anything in it.

This is uninitialized. What does it point to? Something. Whatever happens to be there. Related to this is a wild pointer.

So normally a pointer will point at some block of memory, like a number, a string, an object, whatever the language is going to be. But we make a pointer and then don't initialize what it is pointing to. Like with the uninitialized variable, it just points to whatever happens to be there.

This is bad. This could do anything. Now let's say we have another uninitialized variable. Again, it points to nothing.

And let's say that we go and try to free the memory that this variable points to. Normally, this would free whatever is here. But we've not assigned anything to it. So what is it going to free? Something.
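As a small sketch of how one memory safe language closes off the uninitialized variable and wild pointer cases, here is some Rust. The function names are made up. The compiler refuses to read a variable before every path has assigned it, and "points at nothing" has to be written out as an `Option`.

```rust
/// Deferred initialization: the compiler checks that `a` is assigned on
/// every path before it is read.
fn pick(flag: bool) -> i32 {
    let a: i32;
    if flag {
        a = 5;
    } else {
        a = 0;
    }
    a
}

/// "No target yet" is spelled out with Option instead of a wild pointer.
fn describe(target: Option<&i32>) -> String {
    match target {
        Some(value) => format!("points at {value}"),
        None => String::from("points at nothing"),
    }
}

fn main() {
    // let b: i32;
    // println!("{b}"); // rejected at compile time: `b` is not initialized
    println!("{}", pick(true));
    let number = 10;
    println!("{}", describe(Some(&number)));
    println!("{}", describe(None));
    // Explicit defaults exist too: 0 for i32 and an empty String.
    println!("{} {:?}", i32::default(), String::default());
}
```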

Now when we talk about memory safety, the two poster children are use after free and double free. So let's do something completely valid this time. Let's have pointer A pointing to a block of memory.

Nothing is wrong here. In this state, it is totally fine. Let's go and free this block of memory, but still have pointer A lying around. Now, we make the mistake of trying to use pointer A.

What does this point to? Well, it doesn't point to what it's supposed to point to, but it still points to that same area of memory. But at this stage, anything else could be there. This is known as a use after free. Now, a double free should be pretty self-explanatory.

So we call free on this block of memory here. Okay, memory is now freed. Memory is now all cleared up. It's all good. Then we say, well, just call free on the same place again.

Well, the memory has already been freed. We don't have to deal with it anymore. And this is going to attempt to free whatever just happens to be there.

And whatever happens to be there is the exact thing we don't want to be dealing with, because that is completely undefined behavior. And that is exactly what memory safety is about dealing with. Dealing with undefined memory operations. You want to make sure when you're dealing with memory, it always behaves consistently. You don't want to mess with things outside of your application's control.
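Here is a minimal sketch of how Rust's ownership rules shut out use after free and double free. The `Block` type and its counter are made up so the number of frees can be observed; `drop` is the standard function that takes ownership of a value and releases it.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical resource that counts how many times it has been released.
struct Block {
    frees: Rc<Cell<u32>>,
}

impl Drop for Block {
    fn drop(&mut self) {
        // Runs when the owner goes away, the closest thing here to `free`.
        self.frees.set(self.frees.get() + 1);
    }
}

/// Creates a block, frees it explicitly, and reports how many frees ran.
fn free_once() -> u32 {
    let frees = Rc::new(Cell::new(0));
    let a = Block { frees: Rc::clone(&frees) };
    drop(a); // ownership moves into `drop`, which releases the block
    // drop(a);          // double free: rejected, `a` was already moved
    // let _ = &a.frees; // use after free: rejected for the same reason
    frees.get()
}

fn main() {
    println!("number of frees: {}", free_once());
}
```

Uncommenting either of the two lines turns this into a compile error rather than a runtime surprise, because `a` no longer owns anything after the first `drop`.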

Now, I want to be clear that everything I've talked about so far with memory safety is in the context of single-threaded development. It also applies to multi-threaded development. However, there are additional problems that you also have to deal with. So there is a thing known as a data race. Okay, keeping this simple, let's say we have two threads.

We have thread one, which goes on for some time, and thread two, which goes on for some time. Okay, we have variable a, and this is set to one. Now, on thread one, we are going to do a plus five. Okay, and on thread two, we're going to do a times two. So, let's say that on thread one, a plus five happens first, and then on thread two, a times two happens way over here. This is going to mean a equals one plus five, so it is six at this point, and then when the second operation happens, it equals 12.

But what if that happens in the other order? Let's say a times two happens first, and then a plus five happens over here. So now it is one times two, so two, plus five, seven. So depending on the order of these operations, which are happening on different threads, so depending on what else is happening on the system, they could execute in any order. The result is going to be different.
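The two orderings can be sketched in Rust like this. One caveat: Rust will not compile the unsynchronized version at all, so this sketch wraps `a` in a `Mutex`. That removes the undefined behavior, but which thread gets there first is still up to the scheduler. The function names are made up.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Applies the two updates one after the other in a chosen order.
fn apply_in_order(add_first: bool) -> i32 {
    let mut a = 1;
    if add_first {
        a += 5; // 1 + 5 = 6
        a *= 2; // 6 * 2 = 12
    } else {
        a *= 2; // 1 * 2 = 2
        a += 5; // 2 + 5 = 7
    }
    a
}

/// Runs the two updates on separate threads; the scheduler picks the order.
fn run_on_threads() -> i32 {
    let a = Arc::new(Mutex::new(1));
    let a1 = Arc::clone(&a);
    let a2 = Arc::clone(&a);
    let t1 = thread::spawn(move || {
        *a1.lock().unwrap() += 5;
    });
    let t2 = thread::spawn(move || {
        *a2.lock().unwrap() *= 2;
    });
    t1.join().unwrap();
    t2.join().unwrap();
    let result = *a.lock().unwrap();
    result
}

fn main() {
    println!("plus five first: {}", apply_in_order(true));
    println!("times two first: {}", apply_in_order(false));
    println!("on threads: {}", run_on_threads());
}
```

The threaded run ends up as either 12 or 7, depending on which thread takes the lock first.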

And don't be mistaken to think that this is the only kind of data race. We could do something like, say, A is a pointer to some block of memory. So let's say over on thread one, we call free. And then on thread two, we try to do something with A. Well, now we have a use after free. Let's say we go and call free on A on thread one, and then sometime later, we go and assign a new value over to A.

But on thread two, in the meantime, we call free again. What is that? That is a double free. Now, there is certainly some debate about whether you should or shouldn't include thread safety in the definition of memory safety. And there are a lot of arguments on both sides, which I think are perfectly reasonable.

For the sake of this video, though, I'm just going to talk about it in a single thread context, because that's the way it's typically discussed. Now, with that in mind, whilst Rust people do talk quite a bit about memory safety, it's not the only memory safe language. In fact, it's actually easier to list out the languages that are not memory safe than the ones that are. Because with the exception of C, C++, Assembly, and a couple of others, any language you would reasonably consider using is memory safe.

Now, before somebody gets on my case about C++, being like, oh, modern C++ is memory safe. And yes, you can 100% write memory safe code in C++. You can write memory safe code in any language, but C++ does actually provide facilities to safely do things and ensure that dumb mistakes are not going to happen. However, you can also choose to write basically C code in C++, as opposed to other languages like Java, C#, and of course, Rust, where if you want to write unsafe code, you need to explicitly say, this is going to be unsafe. I know what I'm doing. This is intentional.

Don't stop me. And importantly, in the case of Rust, even if you use the unsafe keyword, it does not disable the borrow checker. So there are still some kinds of memory unsafety that are simply not possible in the language. So the TL;DR is, a memory leak is when your code does not free memory that it no longer needs, and you use more and more and more memory. Memory safety is about dealing with unsafe memory operations, which in most cases you do not want to do. Rust and other memory safe languages, which is most of them, do not stop memory leaks.

They might stop certain kinds of memory leaks depending on the language, but memory leaks will always be possible. A memory safe language is about stopping unsafe memory behavior, although in cases like Rust, there are ways to do unsafe behavior in specifically defined unsafe blocks. So I never want to hear about this topic ever again. If anyone ever talks about it, I am just going to link my video.

That is all. So if you like the video, go like the video. And if you really like the video and you want to become one of these amazing people over here, check out the Patreon, SubscribeStar, and Liberapay linked in the description down below. That's going to be it for me. And I guess at some point I had to do a programming video and I had to put my software engineering degree to use.

2025-01-21 07:02
