TW presents: Ready for Rust
- Let's get started. It feels a bit strange for me to do this. I've done webinars before, but this is a Thursday night before a long holiday. Conventional wisdom would, of course, be never to hold an event at such a time, but these are not normal times. So, starting with the actual presentation: I hope you can all see the title slide with the title and my Twitter handle.
Stephanie has already mentioned a number of things, but there is one thing I want to be upfront with you about: I'm not a Rust expert.
I've programmed in many languages, also commercially, for many years, but I don't program commercially in Rust, and I'm certainly not an expert. Still, I think what I'm showing you tonight, a personal overview with some of my first impressions from learning Rust and becoming a reasonably competent Rust programmer, will probably be quite helpful if you're also new to the language. Let's start with the why. Why is Rust a topic we want to talk about? Some of you may recognize this.
This is the Stack Overflow developer survey, which they send out once a year. It has a number of different categories, and one of them covers the most loved, dreaded, and wanted languages.
What we see here is the most loved language, and you can see that Rust sits in the top spot. Interestingly, Rust has held that top spot for four years in a row. This is the 2019 survey, and Rust has only been out for four years.
Rust 1.0 was released in 2015. So that's quite an achievement: for essentially its entire real existence, it has been the most loved programming language.
I think part of that is due to the excellent community. It's an amazing community. When I wrote a blog post, which was the precursor of this talk, people's responses were really helpful, and they were enthusiastic about helping.
I think that's a real sign of people loving the language they work with. But loving the language is one thing. Let's look at the flip side: there's also a section on the most popular technologies.
If you look at that (I'll spare you the searching), Rust is much, much further down, at 3.2%. So we still see a huge discrepancy: on the one hand, Rust is really loved by the people who use it; on the other hand, it hasn't spread far in commercial use and is not that popular right now. It's just not widely used. If you squint, you can see... actually, you don't have to squint, you're sitting in front of your computers, not looking at a big screen in an auditorium.
You can see there's also a Professional Developers tab. Trust me, it doesn't look very different there either. So Rust just isn't hugely popular at the moment. Stepping back a little: what do these three companies have in common? Apple, Google, and Mozilla.
All three of them have written large amounts of software in C and C++, and none of them particularly likes the experience. That's why all three have, over roughly the last decade, created their own new programming languages, almost exclusively to get away from C and C++. Apple created Swift, Google created Go, and Mozilla created Rust.
They're quite different languages, and comparing them would be a talk in itself, so I won't do that; I'll focus on Rust. But I want to return to Apple briefly to highlight further why people are excited about Rust as a programming language.
This is the knowledge base document accompanying an Apple security update. I didn't select it specially; it just happened to be the most recent one when I wrote the first version of this talk last summer. So it isn't chosen to make my point, and I think it's quite representative. I'm not leaving anything out except two vulnerabilities at the beginning.
They're ordered alphabetically. This document tells me what Apple fixed in the security update. What we see here, for Core Data, is the description "An out-of-bounds read was addressed with improved..."
Then a memory corruption issue and a buffer overflow issue. So that's three memory-related problems on the first page.
Scrolling down, I see another four. All the ones I've marked with pink arrows are memory-related issues. And again, I'm not leaving anything out, I'm just scrolling. The next page looks very similar: a large number of the problems fixed in this security update had to do with memory. So that's the Silicon Valley view.
What about that large company from Seattle? Interestingly, AWS also uses Rust internally in some high-performance parts of its cloud offering, but I'm talking about Microsoft here.
Where are they at? Last summer, Visual Studio Magazine, of all places, ran an article titled "C++ Memory Bugs Prompt Microsoft to Eye Rust Instead". Microsoft was quite far ahead in the early 2000s, when it came up with C#, a language that was almost a replacement for C and C++. C# is much closer to Java, though, and it hasn't really proven successful in the areas where C and C++ are strong. So while Microsoft created an alternative very early, it isn't really in the race here.
So for more systems-level programming, they are eyeing Rust instead. And this isn't just because some engineer said so: they had the Microsoft Security Response Center analyze their own security issues, and this is the summary. I'll let you read it. They say the majority, and I remember from the article that it's about 70%, of the vulnerabilities fixed with a CVE assigned, a formal vulnerability report, are caused by developers inadvertently introducing memory corruption bugs. And this is Microsoft.
They have a long history and a big attack surface. They literally wrote the book on it, "Writing Secure Code". They made security a big focus, and they have some of the best tools available, such as static code analyzers. And yet the majority of their vulnerabilities are memory-related. So avoiding these bugs seems to be beyond what most human developers can reliably do.
Personally, I took a slightly different route to get here. At ThoughtWorks, we are increasingly working with embedded software, where the software is only part of a larger solution: generally some hardware that contains a small computer, or a larger one like the one you see here. This is an NVIDIA Jetson board, which some of our clients use.
This is a developer board; the one that goes into the actual devices has a different form factor but is essentially the same thing. These boards are quite powerful. So we were thinking: okay, you're building something new from scratch.
Why repeat the pain of the past and keep doing this in C or C++, or even in MATLAB code transpiled to C, which is also not uncommon? In fairness, these boards in particular are quite powerful, and I know some of our colleagues deploy Docker containers on them and write services in Go. But my motivation was: if we're moving to bigger hardware and trying new things in embedded, why not also look at a new programming language? One company that is not a ThoughtWorks client, but is public about this, is (indistinct), the company that also does trades.
They have talked publicly about looking into using Rust in such a context. Okay, let's get to the code. Well, not straight to the code; first, let's get ready for it. It's worth saying that a new programming language should build on experience, standing on the shoulders of giants, on what other companies and organizations have done before. And in my experience, having programmed in many languages, Rust got almost everything right here.
The first thing you do is download the script that sets up the Rust installation, which can live in your home directory. After that you have the command I've highlighted up here, rustup, and with rustup you can update your entire Rust installation. Of course, that's nothing special, but I think that's the good thing: in my experience it just works, unlike some other environments I've seen. Once you have Rust on your system, it also comes with a built-in package manager.
It's called Cargo. The names, by the way, won't help you: googling for Rust and Cargo when you have issues turns up lots of other things, because rust and cargo are common English words. That's a bit of a downside sometimes, but never mind. Cargo is the tool you use to build, and it manages your dependencies.
It also lets you create scaffolding, as we can see here. I'm running cargo new hello-rust, which creates a new project scaffold called hello-rust. And what it creates is absolutely minimal.
main.rs is the main source file, like main.c in C or the main class in Java. Cargo.toml is the file that describes the project as well as its dependencies. It's a bit like package.json, or the Gradle or Maven files you may know from other languages. Some background on how I approached this: someone I know asked me to give this as a conference presentation, as an introduction to Rust. That's really quite tough, because Rust as a whole is a large language compared to others, such as Go. Go could almost be seen as a minimalist language, and it's easy to learn.
Rust, on the other hand, is quite large. How can I do it justice? It's really tough. At the conference I tried to do it in 45 minutes. Today I have the luxury of an hour, plus time for questions if you want to stay around. But it's still not easy.
So I really didn't want to do a hello world. Instead, I took a piece of software I wrote when I was learning Rust myself. It's a hobby project that I originally wrote in Clojure; I'll come back to that much later in the talk.
Clojure being the Lisp dialect on the JVM. The software simulates agents, combining genetic programming (more specifically genetic programming than genetic algorithms) with artificial life. The agents run around in a world and bump into each other.
They have a strategy. The strategy can be something like turn left, go ahead, look around, eat some food. They have to eat food to survive. When they bump into each other, they create offspring.
Then genetic programming kicks in: the strategies go through a genetic programming algorithm that gives the offspring a new strategy. That's basically all you need to know about the sample code. It's a nice example because there isn't much UI or I/O; it's really about computation. The idea is that you have agents running around in a world, with strategies expressed as little programs.
And genetic programming is used to create offspring; we won't even get into the details of that.
You may have noticed that I'm using IntelliJ here, which many of you probably know as a Java IDE. These days, Visual Studio Code is normally my go-to editor, or IDE if you will, and I did try it first for Rust; it was a good experience. But I also have a full IntelliJ license, and I tried it because somebody recommended it. Personally, I found I was even more productive in IntelliJ, even though, as I said, it's a Java IDE. JetBrains has a number of different IDEs, but for me the features were there and they worked really well.
Not everything works as well as you'd expect. Automated refactorings such as inlining code, for example, don't work, or don't work very well. But on the whole I found it a really good experience. What's also interesting here is the declaration of dependencies further down. You can see I'm declaring a dependency on something called chrono, a date and time library.
A bit further down, in line 10, you can see that the random number generator is also an external library. And in line 11, serde, the serialization/deserialization framework, is an external dependency too. Even more interesting, in the following lines I'm declaring dependencies on serde_json and serde_derive. So even the dependencies are quite modular:
I can use serialization and deserialization without having to pull in JSON support. One reason for this is that Rust, which I haven't mentioned so far, compiles into a single binary that includes everything it needs to run. No runtime environment has to be installed; it's a self-contained binary. And of course we want to keep that binary small. It's easier to transmit.
It's easier to roll out. And it reduces the attack surface: the less code there is in your binary, the fewer possible attack vectors. Also interesting is the last line, uuid. I want to generate UUIDs, and some of these libraries know about each other. You can see at the end of the line that I'm explicitly asking uuid for two features.
I want the version 2 UUIDs and I want Serde. So I'm telling the uuid library: I want your feature for serializing and deserializing UUIDs. You can work at that level: a library can define features that the projects depending on it can choose to use or not. Again, all in the name of creating small binaries and reducing the amount of code that has to be included in the deliverable.
One last thing that is worth mentioning is in line 5. I can explicitly specify an edition. And here I'm saying, I want Rust edition 2018. I breathed a sigh of relief when I saw this after my experience with Swift.
With Swift, Apple also created a language that evolves over time, and I think that's the right approach. But Apple created a real mess for developers. If you check out Swift code that you last touched three years ago, as I did last year, you can't even compile it in the current version of the IDE. In one case I had to go back to an older version of the operating system. Rust, by contrast, evolves in clearly defined steps called editions.
You can say which edition you want to target, and it works with today's toolchain. So they evolve the language but preserve backward compatibility. That may again seem like a small thing: if you've never seen anything else, you think, yes, that's how you do it. But it's in contrast to what some other languages do. So, very quickly: cargo build builds the application.
You can see it running. If you've worked with compiled languages, really compiled languages, you'll notice it's not super fast. This is running on a MacBook Pro; it's okay, but not super fast. The good thing, of course, is that it supports incremental builds.
That means that if I make a change, only the parts I've changed need to be recompiled. A release build, that is, an optimized build, takes a bit longer. You might also notice, since this is the build for the hobby project I talked about, that there aren't many dependencies, and this list includes the transitive dependencies. That is slightly deceptive.
The moment you include a web server, it looks much more like an npm build: you literally get hundreds of transitive dependencies. The list is only this short because it's a small project. One last thing: all these dependencies are compiled from source, because everything goes into a single binary.
That means the target directories, where everything is assembled, can get quite big. If you do regular backups of your machine, please exclude them. Otherwise you'll be backing up gigabytes of build artifacts that you really don't need, because they can obviously be recreated.
Next, cargo test. Testing is, of course, a first-class citizen. One thing that's quite curious is that unit tests live in the same file as the functions and methods you're testing. Rather than sitting in a separate directory, as in Java and other languages, they're in the same source file. That took me a while to get used to, but I can now see that it's a viable way of doing it.
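To make the same-file convention concrete, here is a minimal sketch of how unit tests typically sit next to the code in a Rust source file. The `is_alive` function and its energy rule are hypothetical, not taken from the talk's project; the `#[cfg(test)]` module and `#[test]` attributes are standard Rust.

```rust
// Hypothetical function under test: an agent is alive while it has energy left.
pub fn is_alive(energy: i64) -> bool {
    energy > 0
}

fn main() {
    println!("alive with 3 energy: {}", is_alive(3));
}

// Unit tests conventionally sit in the same file, in a module that is
// only compiled when running `cargo test`.
#[cfg(test)]
mod tests {
    use super::*; // bring is_alive into scope

    #[test]
    fn agent_with_energy_is_alive() {
        assert!(is_alive(5));
    }

    #[test]
    fn agent_without_energy_is_dead() {
        assert!(!is_alive(0));
    }
}
```

Because the test module is private to the file, it can also exercise private helper functions without any extra visibility tricks.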
I guess it's mostly a matter of habit. What you may have seen is that running the tests is actually quite fast, even in the terminal console. The combination of incremental builds and really fast unit test execution worked well for me; it never pulled me out of the flow.
I'm quite sensitive to this. I really do practice test-driven development, I love it, and I'm sensitive to being interrupted and pulled out of my flow. But even though the compiler may not look that fast, when you're actually working and making small changes, the red-green-refactor flow of test-driven development works very well, at least from what I've seen so far. So, 15 minutes in, let's have a look at some code.
First, if this is the first time you've seen Rust, you may notice a certain tendency toward brevity. Almost everywhere, keywords are abbreviated: pub instead of public, fn instead of function, impl instead of implementation, u64 for an unsigned 64-bit integer.
Similarly, you see struct and impl. Interestingly, even though I'll talk first about the more object-oriented aspects of Rust, it really isn't an object-oriented language. Here I'm declaring a structure, the World structure, which represents the world in which the agents, the little creatures, live. And that could be it: like in C, it's a structure that can exist on its own.
I can then "attach", and I'm putting air quotes around that, implementations to it to create something that resembles a class in other programming languages. That's what we see in the bottom half of the screen: impl World. Further down are methods, if you will, related to the structure I've declared. But the structure can stand alone; you can just create it in memory.
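A minimal sketch of the struct-plus-impl pattern just described. This `World` with `width` and `height` fields is a hypothetical simplification; the talk's actual struct is not reproduced here.

```rust
// Hypothetical, trimmed-down World; the talk's real struct has more fields.
// A plain struct can exist on its own, much like a C struct.
pub struct World {
    width: u32,
    height: u32,
}

// An impl block "attaches" functions to the struct.
impl World {
    // Taking &self makes this callable as a method: world.area()
    pub fn area(&self) -> u32 {
        self.width * self.height
    }
}

fn main() {
    // The struct can be created directly with a struct literal.
    let world = World { width: 10, height: 5 };
    println!("area: {}", world.area());
}
```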
Other noteworthy things: Rust has an Option type. Hooray, no more null pointers! What I'm saying here is that the name of the World is an Option of a String.
So it can be None, as in the second line, which is actual Rust code: there's nothing in it, no name given.
Or it can be Some, and then I can say "My first world". This really reminds me of how Scala does it; it's almost identical. And I'll show you later that Rust has a lot of syntactic sugar that makes working this way quite nice.
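A small sketch of working with an optional name, assuming a hypothetical `World` that holds only `name: Option<String>`. It shows two common ways of unwrapping: a combinator chain with a fallback value, and an explicit `match` that forces both cases to be handled.

```rust
// Hypothetical World holding only an optional name.
pub struct World {
    name: Option<String>,
}

impl World {
    // as_deref turns &Option<String> into Option<&str>;
    // unwrap_or supplies a fallback when the name is None.
    pub fn display_name(&self) -> &str {
        self.name.as_deref().unwrap_or("unnamed world")
    }

    // A match makes both cases explicit.
    pub fn describe(&self) -> String {
        match &self.name {
            Some(name) => format!("World '{}'", name),
            None => String::from("an unnamed world"),
        }
    }
}

fn main() {
    let unnamed = World { name: None };
    let named = World { name: Some(String::from("My first world")) };
    println!("{}", unnamed.describe());
    println!("{}", named.display_name());
}
```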
You always have to unwrap it. Here you see the first function declaration. It's a public function,
named new, and it has two parameters. The first is called name, and its type is &str.
Ignore that for the moment and assume it's something like a String; there is a reason it doesn't say String. The second parameter is, confusingly, called params, and it's of type Params. I apologize, but that's just the way it is.
When I create this World, I pass in parameters such as how big the world is, what shape it has, and so on. Then we see the arrow notation, which you probably know from other languages, and the return type of this function is World. So I specify the inputs and their types, then the return type. That's the syntax.
Nothing special here, I think. I should say, though, that naming it new is just the convention for constructors, or constructor-like functions. It's what people happen to do; the language itself doesn't require it.
The actual construction of an object, or rather a structure, looks like this. There's special syntax for it: I write the name of the type and then, in curly braces, give the values of all the fields. Notice that I have to specify all the fields.
I can't just omit one or rely on some arbitrary default initialization. What's interesting are the lines I've marked now. The random field is assigned RNG::new, so I'm using the same convention to construct a random number generator. And at the bottom you see log.
There I'm also creating something new that the world can log events into. Also interesting are these two lines, params and terrain: there's a shorthand.
I could have written terrain: terrain, because what I want to assign to the terrain field of the World structure is the local variable terrain, declared further up in line 26. But instead of writing terrain: terrain, Rust lets you write the name just once. I'm highlighting this because it's a design choice I noticed in Rust.
The designers have added a fair amount of extra syntax that is confusing for beginners. That's what people mean by a steep learning curve, or a language being harder to learn. But it's good for readability in the long term, because here you can't make a mistake: the intention is clear, the field and the variable are the same.
I can't make subtle spelling mistakes, and it's more concise. But in the end, it's a bit more hostile to somebody picking up the language, because there are more things you need to know. In all fairness, when you write the code, you can write terrain: terrain and the compiler will certainly accept it. It's only when you read somebody else's code that you think: what is going on here? This is one of a fair number of examples where Rust has nice shortcuts for experienced developers that may be slightly confusing for people who haven't programmed in Rust before.
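Putting the constructor discussion together, here is a hedged sketch of a `new` function that takes `&str` and `Params` and builds the struct with the field init shorthand. `Params`, its `size` field and the `Vec<u8>` terrain are hypothetical stand-ins; the random number generator and log fields from the slide are left out. Taking `&str` rather than `String` lets callers pass either a string literal or a borrowed `String`, which is a common reason for that choice.

```rust
// Hypothetical stand-ins: the talk's Params, terrain, random and log types are not shown.
#[derive(Debug, Clone)]
pub struct Params {
    pub size: usize,
}

#[derive(Debug)]
pub struct World {
    name: Option<String>,
    params: Params,
    terrain: Vec<u8>,
    cycle: u64,
}

impl World {
    // `new` is only a naming convention for constructor-like functions.
    pub fn new(name: &str, params: Params) -> World {
        // Local variable with the same name as the field.
        let terrain = vec![0; params.size * params.size];
        World {
            name: Some(name.to_string()),
            params,   // shorthand for `params: params`
            terrain,  // shorthand for `terrain: terrain`
            cycle: 0, // every field must be given a value
        }
    }
}

fn main() {
    let world = World::new("My first world", Params { size: 4 });
    println!("{:?}", world);
}
```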
Moving on, here's a very simple implementation, what other programming languages would call a method. What it does is run a number of simulation cycles of the World. One thing you might notice is that you have to pass in self. This is a bit like Python, where a parameter called self is passed in explicitly.
It's also somewhat like Objective-C, where self is hidden but still passed in. Here, though, you have to be explicit: self is passed in, and that is what distinguishes a standalone function from something that would be considered a method. For now, ignore the &mut modifier; we'll come back to it later. Then you see the second parameter, num.
It's the number of cycles the simulation should run, and it's an unsigned 64-bit integer. Rust is quite specific about number types: there are signed and unsigned variants in 32-bit, 64-bit and other sizes, as well as usize, and so on.
We also see some modern constructs here. There's the underscore in the for loop: we're iterating up to num, and I put an underscore there because I don't need the loop variable. If I called it i, the compiler would warn me about an unused variable; the underscore signals: throw it away. And with 0..num we see that ranges are a first-class type.
They're not just some arbitrary structure; they're a first-class part of the language. In the next line I simply call another method on myself, self.do_one_cycle(). This makes it a nice example, because even if you've never seen Rust, I guess it's very clear. So, let's get to something more complicated.
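A minimal sketch of the `run` method described above, assuming a hypothetical `World` whose only state is a cycle counter; the real `do_one_cycle` would advance the simulation rather than just incrementing a number.

```rust
// Hypothetical minimal World: its only state is a cycle counter.
pub struct World {
    cycle: u64,
}

impl World {
    pub fn new() -> World {
        World { cycle: 0 }
    }

    // &mut self: the method is allowed to modify the World it is called on.
    pub fn run(&mut self, num: u64) {
        // `_` discards the loop variable; 0..num covers 0 up to num - 1.
        for _ in 0..num {
            self.do_one_cycle();
        }
    }

    // Placeholder for the real simulation step (hypothetical).
    fn do_one_cycle(&mut self) {
        self.cycle += 1;
    }

    pub fn cycles_run(&self) -> u64 {
        self.cycle
    }
}

fn main() {
    let mut world = World::new();
    world.run(3);
    println!("ran {} cycles", world.cycles_run());
}
```

Note that `world` must itself be declared `mut` for `run(&mut self, ...)` to be callable on it.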
I'm going to step it up to the point where it may get uncomfortable for you. This is a utility function that I use to count the cycles of a program. I pass in the World parameters, which I mentioned before and which define the world and the simulation, and I pass in the program, which is basically an array of instructions. The instruction definitions shown here come from a different part of the code, the parameters.
This is the definition. You can see that a command like EAT takes 10 of these virtual cycles, and the MOV command takes five. At the bottom you see branch-if-food-ahead: if there's food ahead, branch in the program.
That costs one cycle. What's interesting to highlight is that I'm declaring a hash map here, and Rust doesn't have hash map literals. So where does this come from? You probably didn't notice it in the dependencies I showed you in the Cargo.toml file, but there was a reference to something called maplit, for map literals.
Rust has a macro system, and it's powerful enough to extend the syntax so that you can use a feature like this. The hashmap! here invokes a macro, which lets me have literal maps even though the Rust language itself doesn't have them.
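To illustrate what such a macro does, here is a simplified, hand-written stand-in for maplit's `hashmap!`. It is not maplit's actual implementation, just a `macro_rules!` sketch that expands `key => value` pairs into `HashMap::insert` calls. The `"BFA"` key is a hypothetical label for the branch-if-food-ahead instruction.

```rust
// Simplified stand-in for maplit's hashmap! (not maplit's real code).
// Each `key => value` pair becomes one insert call; a trailing comma is allowed.
macro_rules! hashmap {
    ($($key:expr => $value:expr),* $(,)?) => {{
        let mut map = ::std::collections::HashMap::new();
        $( map.insert($key, $value); )*
        map
    }};
}

fn main() {
    // Instruction costs in virtual cycles, following the slide.
    let instr_cycles = hashmap! {
        "EAT" => 10,
        "MOV" => 5,
        "BFA" => 1,
    };
    println!("EAT costs {:?} cycles", instr_cycles.get("EAT"));
}
```

Because the expansion happens at compile time, the result is ordinary Rust code; the macro only changes how the map is written down.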
This is actually really difficult for IDE developers, and as you can imagine, neither Visual Studio Code nor IntelliJ really supports it. They do some basic highlighting based on what they see, but they can't fully understand it because it's not normal Rust. But for a language that is also designed for large programs, this is powerful.
Macro systems, which we've learned to love in other programming languages, are really good for creating internal DSLs and all sorts of other extensions. Rust's support for higher-level abstractions is quite good. So, going back: this cycle_count function is meant to add up the instruction cycles that each command in the program takes. What I'm showing you in the next line is more of the functional side of Rust.
So, prog (you can see I'm getting used to all the abbreviations Rust uses): I'm using an iterator, which simply says: iterate over this. Then I use a simple left fold, as it's known in many programming languages.
I pass zero as the initial accumulator, and fold then successively calls this anonymous function for each of the instructions in the program array.
You can see the syntax: the vertical bars declare the parameter list of the anonymous function, a closure, which is passed the accumulator and the current instruction. Then I take the accumulator and add the instruction cycles for the current instruction.
With successive calls, this adds everything up. Something I didn't mention earlier: Rust usually doesn't use return statements. The return value of a function, in this case a u64, is simply the value of the last expression in the function body. And again, Rust has sugar, not only syntactic sugar but also different ways of doing the same thing. And of course, what I did up there is very common.
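A sketch of the cycle counting described here, under stated assumptions: the `Instr` enum, the `Params` struct with an `instr_cycles` map, and both function names are hypothetical. The first version uses the left fold with an explicit accumulator; the second uses the map-and-sum idiom. Both return the value of their last expression, without a `return` statement.

```rust
use std::collections::HashMap;

// Hypothetical instruction set: EAT, MOV and BFA ("branch if food ahead").
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Instr {
    Eat,
    Mov,
    Bfa,
}

// Hypothetical stand-in for the World parameters: cycle cost per instruction.
struct Params {
    instr_cycles: HashMap<Instr, u64>,
}

fn sample_params() -> Params {
    Params {
        instr_cycles: HashMap::from([(Instr::Eat, 10), (Instr::Mov, 5), (Instr::Bfa, 1)]),
    }
}

// Left fold: start with 0 and add each instruction's cost to the accumulator.
fn cycle_count_fold(params: &Params, prog: &[Instr]) -> u64 {
    prog.iter().fold(0, |acc, instr| acc + params.instr_cycles[instr])
}

// Same result: map each instruction to its cost, then sum.
fn cycle_count(params: &Params, prog: &[Instr]) -> u64 {
    prog.iter().map(|instr| params.instr_cycles[instr]).sum()
}

fn main() {
    let params = sample_params();
    let prog = [Instr::Eat, Instr::Mov, Instr::Mov, Instr::Bfa];
    // Both versions add up 10 + 5 + 5 + 1.
    println!("{} {}", cycle_count_fold(&params, &prog), cycle_count(&params, &prog));
}
```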
I don't want to write a basic left fold myself; I'm really just mapping something. So now it gets easier: I use the map function to map each instruction to its cycles.
At the end you can see another built-in function, sum, which adds up all the instruction cycles. No temporary array is actually created: iterators are lazy, so sum consumes the values as map produces them. There are a lot of convenience functions like this. Once you know them you'll love them, but if you don't, you're probably cursing a little. Now, I talked a lot in the introduction about memory management and why it draws a lot of people to Rust.
And there really is something special here, because Rust neither has a garbage collector nor makes you manage memory manually. I was corrected by Brian Watts on this: they didn't invent the idea. It was known in academia, but the people behind Rust were the first to make this concept, called borrowing, as in lending and borrowing, viable for practical purposes. So how does it work? Rather than shoehorning it into my application, I'm showing you examples straight from the Rust book. The Rust book is a book you can buy, but it's also a website you can read, and it's a good resource for learning Rust.
Here I'm declaring a string, and the variable s now owns the memory holding the string "hello". As long as the variable is in scope, that memory stays allocated; the moment s goes out of scope, the memory is reclaimed. So far, so good. What happens if I do the following? Whatever is highlighted in red won't even compile.
Here I create a variable s1 that owns the string, and then I say let s2 = s1, assigning the string to another variable. What should happen when s1 goes out of scope? Can I free the memory or not? Do both of them have to go out of scope first? That's really not a good situation. So what the compiler does, or rather what the designers of Rust told it to do, is transfer ownership. After this assignment, s1 is invalid.
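Following the style of the Rust book's ownership examples, this sketch shows a `String` being freed at the end of its scope and ownership moving on assignment. The function names `scope_demo` and `move_demo` are made up for illustration; the line that would fail to compile is commented out.

```rust
// s owns the heap memory holding "hello" until the end of the function.
fn scope_demo() {
    let s = String::from("hello");
    println!("{}", s);
} // s goes out of scope here, so its memory is freed

// Moves the string from one variable to another and returns it.
fn move_demo() -> String {
    let s1 = String::from("hello");
    let s2 = s1; // ownership moves from s1 to s2; s1 is now invalid
    // println!("{}", s1); // would not compile: s1 was moved into s2
    s2
}

fn main() {
    scope_demo();
    println!("{}", move_demo());
}
```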
The compiler then gives you an error for the last line, where I try to print s1: after ownership has been transferred to s2, I can't use s1 anymore. Now, what happens when I call functions? If I passed s1 directly into the calculate_length function, I'd have an immediate problem: the outer code you see here would no longer own s1. I would have the length of the string, but I wouldn't own the string anymore.
That really sucks, and it isn't a good approach. So instead, I can pass a reference into the function.
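This sketch, modeled on the Rust book's `calculate_length` example, contrasts handing ownership into a function and back with simply lending a reference. `calculate_length_owned` is a hypothetical name used only to show both versions in one file.

```rust
// Taking ownership: the String must be handed back, or the caller loses it.
fn calculate_length_owned(s: String) -> (String, usize) {
    let length = s.len();
    (s, length) // return the string so the caller owns it again
}

// Borrowing: s is a &String, so this function never owns the string.
fn calculate_length(s: &String) -> usize {
    s.len()
} // only the reference goes out of scope here; the String is not dropped

fn main() {
    let s1 = String::from("hello");
    let (s1, len) = calculate_length_owned(s1);
    println!("The length of '{}' is {}.", s1, len);

    let len = calculate_length(&s1); // lend s1 without giving up ownership
    println!("The length of '{}' is {}.", s1, len);
}
```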
For those of you who know C or C++, the ampersand is used to create references, much as in those languages. Inside the calculate_length function, s isn't a String anymore; you can see it has a different type.
It's a reference to a String. So at the end of calculate_length, when s goes out of scope, only the reference goes out of scope. The string itself was never owned by calculate_length; it remains owned by the outer code and therefore stays in memory. For those of you who look confused (I can't tell, because I can't see everybody on screen) and haven't done much systems programming in C or C++, this is roughly what happens.
Well, not really. This is just for illustration. It's not exactly how Rust works, or how C works, and it's definitely not how our Intel CPUs represent memory.
But it's close enough to explain the idea. What we're seeing here is a memory dump. On the left-hand side you see the addresses, then the actual bytes in hex.
On the right-hand side you see the same bytes in ASCII. When I do let message = get_message(), message now owns the block of memory I've highlighted, these bytes, and on the right-hand side you see their ASCII representation.
What happens when I create a reference to message? What does that actually do? If you program in Java or similar languages, there's no difference; it's the same thing. But here a reference is different from the real thing. The reference here is only these four bytes in memory, 006 9674. And if you look closely, that's conveniently the address of the blue box in memory. So messageRef also takes up some space, but it only points at a location in memory, and that's what we call a pointer in C and C++.
That means when messageRef goes out of scope, only those bytes circled in pink go away. When message goes away, the entire message, all the blue bits, go away. And again, this is not how it really works; the details of where things live, on the stack or on the heap, are different. It's a whole different thing if you know this stuff. But I think the idea is clear.
A reference is a very different thing. It does exist somewhere in memory, but it's a different thing from the actual bytes of the message. So, on to other things you can try with your evil programmer hat on.
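The slide's memory picture is deliberately simplified. As a small, hedged illustration: a reference is pointer-sized whatever the length of the string, and it holds the address of the value it borrows. (In real Rust, the String value is itself a small header that points at bytes allocated on the heap.) The function names are hypothetical.

```rust
use std::mem::size_of;

/// A reference is just an address: it is pointer-sized, however long the string is.
fn reference_size() -> usize {
    size_of::<&String>()
}

/// The reference stores the address of `message`, so it "points at the blue box".
fn ref_points_at_message() -> bool {
    let message = String::from("Hello, World!");
    let message_ref = &message;
    // std::ptr::eq compares addresses, not string contents.
    std::ptr::eq(message_ref, &message)
}
```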
So I'm taking a reference to s and passing it into this function. I'm only giving change a reference, but it tries to change the string by pushing something onto the end of it. The compiler will not let me do this, because the outer function only gave the change function a reference and didn't allow it to change anything. In order to do that, I need to be explicit. And this is another key concept in Rust. First off, I need to say that the string is mutable in the first place.
So you see let mut s in the main function. And even when I'm calling the change function, I can't simply pass a reference to s. I have to say that I want to pass a reference that the receiving function is actually allowed to use to change the string. So I write &mut. And after that, the compiler is happy. So what happens if we take this further? What if I create two mutable references? The compiler will not let you do it.
Because if you create two mutable references, you can keep them around, you can pass them to different threads, and the behavior of your program gets really, really difficult to reason about. So the Rust compiler will not let you do it. You cannot have two mutable references at the same time.
You can have two immutable references, as shown at the bottom. r1 is a reference to s, r2 is a reference to s, and neither of them is mutable. And I can pass them to different threads, because the compiler keeps track and makes sure that all the references that exist are immutable. That's really quite nice for performance reasons, because you don't have to copy everything around.
But the moment I try to create r3, a mutable reference, the compiler says you cannot do that, you've already handed out immutable references. So the rule is clear: you can have as many immutable references as you want or, and it's a strict or, one and only one mutable reference. This is quite cool; I didn't mention it earlier, but there's an ISO safety standard for the automotive industry.
And at the highest safety level, it says you can't even pass anything by reference. You're not allowed to do it, because they know programmers always get it wrong and can't reason about the code. In Rust, the compiler can do this for us, and it allows us to actually pass references because it can check whether that's safe or not. And then, last but not least, the third problem that we often see is so-called dangling pointers.
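A sketch of the mutable-borrow example described above, following the Rust book's change function; the wrapper name mutable_borrow_example is hypothetical. The commented-out lines show the two-mutable-references case that the compiler rejects.

```rust
/// Takes a mutable reference, so it is allowed to modify the caller's String.
fn change(some_string: &mut String) {
    some_string.push_str(", world");
}

fn mutable_borrow_example() -> String {
    let mut s = String::from("hello"); // the binding itself must be `mut`
    change(&mut s); // ...and the borrow must be explicitly `&mut`

    // Two simultaneous mutable borrows are rejected (error[E0499]):
    // let r1 = &mut s;
    // let r2 = &mut s;
    // println!("{}, {}", r1, r2);
    s
}
```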
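A sketch of the "many readers or one writer" rule, assuming Rust 1.63 or later for std::thread::scope, which lets threads borrow local data because every thread is joined before the scope returns. The function name shared_then_mutable is hypothetical.

```rust
use std::thread;

/// Shares a String immutably with two scoped threads, then mutates it once they are done.
fn shared_then_mutable() -> (usize, usize, String) {
    let mut s = String::from("hello");

    let r1 = &s;
    let r2 = &s;
    // Any number of shared (&) references may coexist, even across threads.
    let (a, b) = thread::scope(|scope| {
        let h1 = scope.spawn(|| r1.len());
        let h2 = scope.spawn(|| r2.len());
        (h1.join().unwrap(), h2.join().unwrap())
    });

    // Creating `&mut s` while r1 or r2 were still used later would be rejected
    // (error[E0502]). Their last use is above, so this mutable borrow is fine.
    let r3 = &mut s;
    r3.push('!');
    (a, b, s)
}
```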
So if you look at the function, aptly named dangle, I'm creating a string s in there and trying to return a reference to s. But the compiler is smart enough to see that there's a problem. At the end of the dangle function, s goes out of scope, so the memory would be reclaimed, but I've returned a reference to it.
So if you think back to the previous slide, I would still have the pink bit pointing at the blue box, but that memory has been reclaimed. This is the dangling pointer problem, and it is a huge issue. And again, the Rust compiler can detect this and will simply stop you from doing something like that. So that sounded easy, right? Let's have a look at some real code and see what that actually looks like. Here we have a function that I took from the Terrain class. That is the actual model of the terrain the creatures run around in.
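A sketch of the dangle example and the usual fix, following the Rust book; no_dangle is the book's name for the fixed version, not necessarily the name used on the slide.

```rust
// Does not compile: `s` is dropped at the end of `dangle`, so the returned
// reference would point at freed memory (error[E0106], missing lifetime specifier).
// fn dangle() -> &String {
//     let s = String::from("hello");
//     &s
// }

/// The fix: return the String itself, transferring ownership to the caller.
fn no_dangle() -> String {
    let s = String::from("hello");
    s
}
```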
Well, not a class; it's a method. It takes a function that is meant to be executed for each creature in the terrain. What we can see here is that there's this ominous F in there.
So I'm passing in a function that mutates, and the type of the function is plainly F. What does that really mean? I thought we had said that Rust is quite specific about types. The reason they did this is that writing the full type inline gets too confusing.
It is similar to C#, if you know it, where you can also pull these generic type constraints out and spell them out separately, in a where clause. So really what we're saying here is that the function called func that you're passing in takes a mutable reference to the terrain and a mutable reference to the actual creature. It also takes a tuple, and tuples are first-class too, holding the two coordinates of where the creature sits in the world. And the return value of the function is an Option of a tuple.
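A hedged sketch of what such a signature can look like with a where clause. Terrain, Creature, the creatures field and the name do_with_creatures_mut are hypothetical simplifications. The talk doesn't show how the real Terrain stores its creatures, so this version moves them out temporarily so that func can receive &mut Terrain and &mut Creature at the same time.

```rust
/// Hypothetical, simplified creature.
struct Creature {
    age: u32,
    energy: u32,
}

/// Hypothetical terrain holding (position, creature) pairs.
struct Terrain {
    creatures: Vec<((usize, usize), Creature)>,
}

impl Terrain {
    /// Calls `func` once per creature. `Some(pos)` is the creature's new position;
    /// `None` means it died and is removed.
    fn do_with_creatures_mut<F>(&mut self, mut func: F)
    where
        F: FnMut(&mut Terrain, &mut Creature, (usize, usize)) -> Option<(usize, usize)>,
    {
        // Move the creatures out first, so `func` can borrow the terrain mutably
        // without also aliasing the creature it is working on.
        let creatures = std::mem::take(&mut self.creatures);
        let mut survivors = Vec::with_capacity(creatures.len());
        for (pos, mut creature) in creatures {
            if let Some(new_pos) = func(&mut *self, &mut creature, pos) {
                survivors.push((new_pos, creature));
            }
        }
        // Keep any creatures that `func` may have added to the terrain meanwhile.
        survivors.append(&mut self.creatures);
        self.creatures = survivors;
    }
}
```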
And the semantics here are: if the tuple is Some, it is the new position of the creature. If it's None, the creature has died. So what we're seeing here is that I can pass in the function and declare quite specifically what the function does. So far, so good, I hope. So what happens when I invoke the mutable do_with_creatures function on Terrain?
This is now code from the world. It is a small piece of code in the world that says process all the creatures. What I'm doing here is: the world knows its terrain, and I'm calling the do_with_creatures function on it. And I'm passing in an anonymous function that adheres to the specification you saw on the previous slide. So it gets passed a terrain, a creature and a position, and then it does things.
This is now being called for each creature, right? You can see that the first line decreases the creature's energy points to simulate some form of metabolism. And then the simplest check is: if the creature's age is larger than the max_age, return None, and the creature dies. Or if the creature's energy points are zero, it also dies, because it has starved.
What happens in this case, though, if you try to compile this? I'm not sure whether you would have guessed it, but this does not compile. The compiler gives me this error message. This is the actual error message, including all the ASCII art and everything else.
The compiler is really trying to be super helpful here with the ASCII art, trying to explain to me what has gone wrong. By the way, the errors are numbered. You can see in the top left corner that it's error E0501 that I'm triggering here. And the IDE puts a hyperlink in. It's quite cool: you can click on it, and it takes you to a page on the website that explains the error in more detail. Or, if you want to, as you can see at the bottom, you can run the Rust compiler, rustc, with --explain E0501 and get a deeper explanation of this error.
Well, what happened here? What is it trying to tell me? It says "closure construction occurs here" and "borrow occurs here". What is the problem it's trying to point out? This is the code we saw earlier.
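The per-creature logic described above, written as a plain function so it can be tested on its own; in the talk it lives inside the closure. Creature, Params and metabolism_step are hypothetical names, and the one-point energy cost per step is an assumption.

```rust
/// Hypothetical creature state.
struct Creature {
    age: u32,
    energy: u32,
}

/// Hypothetical simulation parameters.
struct Params {
    max_age: u32,
}

/// Returns the creature's (unchanged) position while it lives, or `None` once it dies.
fn metabolism_step(
    params: &Params,
    creature: &mut Creature,
    pos: (usize, usize),
) -> Option<(usize, usize)> {
    // Metabolism: each step costs one energy point (saturating, so it never underflows).
    creature.energy = creature.energy.saturating_sub(1);
    if creature.age > params.max_age {
        None // died of old age
    } else if creature.energy == 0 {
        None // starved
    } else {
        Some(pos)
    }
}
```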
What is happening here is, inside this function that I'm passing to the terrain, I'm in the world now, remember, I'm using self.params. But I'm also further up saying self.terrain. And the terrain can mutate. Which means it can change the state of the world.
Which means I shouldn't really hand out further references to the world, because I can't be sure that the world won't be changed. And this is exactly what the compiler told me: the closure that I'm passing in requires unique access to self.
But self is already borrowed, because the moment I wrote self.terrain and called a mutating method on it, part of self was borrowed mutably. This gives you a glimpse of how complicated these memory rules can get. And of course I picked a complicated example.
But what I want to say is that while the rules are simple, their impact on programming is profound. The solution is also simple. In my case, I know that the two things I need here, the parameters (which contain the maximum age) and the cycle, the number of steps the simulation has run, do not change while the closure is being used. So I can just pull them out into local variables up here. Now the closure can reference those local variables, I can use params and cycle, and the compiler is happy with me, because the closure I'm passing in no longer has a reference to the world.
It only has local variables, which the compiler can easily capture and, if necessary, keep around as long as the closure lives. This sounds simple now, but it took me a while to understand it and get into the flow. And you might also curse the borrow checker at first. But it really, really provides a huge amount of value, because it eliminates entire classes of problems that you would otherwise struggle with once your software is deployed in the field.
With Rust, you do the struggling while you're sitting in front of the computer, programming. Which brings me directly to the next topic: parallelism. Actually, let me go back for a moment.
I said I wrote the simulation. And when you do simulations, you want to run many of them, in this case tens of thousands. I didn't even know the right world parameters.
In modern-day machine learning, you would call them hyperparameters. Is the world a torus? Is it a square? What is the best max age? What should the instruction cycles be? I didn't really know. It's a huge space of hyperparameters, and I didn't know where the good regions were or what even counted as a viable configuration. So what you normally do is run a Monte Carlo simulation. You define certain ranges of values. You create random values.
You run the simulation with the random values and see which ones work. So you want to run tens of thousands of simulations. And this is exactly where I stopped with this experiment when I did it in Clojure, for reasons which will become apparent later: it wasn't fast enough. But the good thing was, Rust was.
The machine that I ran this on has eight CPU cores, which meant I could run this in parallel. I briefly tried to parallelize a single simulation. That didn't make much sense, because the simulations don't take very long to determine whether they are viable sets of hyperparameters or not. And the coordination overhead, because the creatures can see each other and so on, was just way too high. So I'm basically running one simulation on one core, another simulation on the second CPU core, and so on.
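A sketch of the problem and the fix, using hypothetical, simplified World, Terrain, Creature and Params types; deriving age from a born cycle is an assumption made so that both params and cycle are used. The commented-out closure reflects the 2018-edition behaviour the talk ran into. Since the 2021 edition, closures capture disjoint fields, so a closure that only reads fields like this may compile as-is; copying the values into locals works in every edition.

```rust
/// Hypothetical, simplified types; only the borrowing pattern matters here.
struct Creature {
    born: u64, // cycle in which the creature was born (an assumption of this sketch)
    energy: u32,
}

#[derive(Clone, Copy)]
struct Params {
    max_age: u64,
}

struct Terrain {
    creatures: Vec<((usize, usize), Creature)>,
}

impl Terrain {
    /// Same shape as the talk's signature: Some(pos) = survives, None = dies.
    fn do_with_creatures_mut<F>(&mut self, mut func: F)
    where
        F: FnMut(&mut Terrain, &mut Creature, (usize, usize)) -> Option<(usize, usize)>,
    {
        let creatures = std::mem::take(&mut self.creatures);
        let mut survivors = Vec::new();
        for (pos, mut creature) in creatures {
            if let Some(new_pos) = func(&mut *self, &mut creature, pos) {
                survivors.push((new_pos, creature));
            }
        }
        self.creatures = survivors;
    }
}

struct World {
    terrain: Terrain,
    params: Params,
    cycle: u64,
}

impl World {
    fn process_creatures(&mut self) {
        // Under the 2018 edition this closure captures all of `self` while
        // self.terrain is mutably borrowed, which triggers error[E0501]:
        // self.terrain.do_with_creatures_mut(|_t, c, pos| {
        //     if self.cycle - c.born > self.params.max_age { None } else { Some(pos) }
        // });

        // The fix: copy what the closure needs into local variables first.
        let params = self.params;
        let cycle = self.cycle;
        self.terrain.do_with_creatures_mut(|_terrain, creature, pos| {
            creature.energy = creature.energy.saturating_sub(1); // metabolism
            let age = cycle.saturating_sub(creature.born);
            if age > params.max_age || creature.energy == 0 {
                None // died of old age or starved
            } else {
                Some(pos)
            }
        });
        self.cycle += 1;
    }
}
```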
And I needed to make no changes because the compiler had made sure that my software was thread-safe and no memory issues occurred. So this is how I did this multiverse function. First off, run_multiverse now runs many simulations at the same time. I'm passing in here, you can see this at the top, I'm passing in a function called the world function that when I invoke it, creates a new world. You can see it's a function that takes no arguments, no parameters, and always returns a World.
And here you can see that I can actually inline the declaration. I don't have to use an F and declare it on the next line, because the signature of this function is so simple: no arguments and just a return value. So what I'm doing in there is iterating over the number of threads, as many threads as I want, and spawning a new thread for each. You can see that in line 26.
What do I pass into the spawn function? An anonymous function. The two vertical bars with nothing between them are nothing special; it's just an empty parameter list. And then in the curly brackets, you see the entire body of the function that I'm passing to the thread that I've just spawned. What it does is iterate over all the simulations that this thread needs to run.
So number of simulations is the total number of simulations. If I'm running them in many threads in parallel, I divide the number of simulations by the number of threads. And that gives me how many simulations each thread should run. And then I'm just saying, run the world with the thread number, the simulation number and the world function. And then of course the world function here has the round brackets which means I'm invoking it here.
So the world function creates a World and that is passed into run_world. As simple as that. What is quite cool is there's this keyword in the beginning called move.
And what that does is transfer ownership from the thread that does the spawning, the main thread, into the subthread that the actual function runs in. So again, there's a very clear way of handling ownership in a scenario with multiple threads. Normally you would also create a communication queue between the two threads so they can send messages back and forth. That again is the topic of an entire talk. But in this case it is so simple that I don't need to do that.
The last thing I want to show you briefly is something simple, but it shows you how nicely the syntactic sugar works. Here thread::spawn returns a handle called h, and I'm storing these handles in a vector. A vector is similar to an array, with some subtle differences. But like in many languages, I can of course misuse an array, a vector or any similar sequential collection as a stack.
So I'm pushing the handle onto this handles stack, if you will. So far, so good, right? What do I want to do at the end? At the end, I want to terminate cleanly. So basically I need to wait until all the threads have finished and collect them. And when the last thread has finished, the program should exit.
And this is how I do it. The handle that is pushed into the handles vector in that line is later popped from the stack. And here in this line, line 34, you see a fair amount of nice syntactic sugar coming together.
So, handles.pop(). What is the signature of a pop method? It has to do two things, right? It returns the topmost element from the stack, but it also somehow needs to be able to signal that the stack is empty, that I have taken all the elements off. In many programming languages it would return null to signify that the stack is empty. Remember from the beginning that Rust has Option types. So what it does is return Some of a handle as long as there is something on the stack.
As long as pop returns Some, the pattern Some(h) matches and destructures it, so h is not the Option anymore, but the actual handle. If the stack is empty, pop returns None and the destructuring doesn't work.
The let doesn't match, and the while loop then knows it needs to finish. So you can see how this comes together: pop returns an Option, and the while loop can finish because pop eventually returns None. The let, together with the destructuring, lets me access h without having to unwrap it. But there is another unwrap in that line.
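A minimal sketch of the run_multiverse shape described above, with hypothetical World and run_world stand-ins that just return a score. world_fn is a plain fn() -> World, which is Copy and Send, so each move closure gets its own copy. As in the talk, num_simulations / num_threads is integer division, so any remainder is not run. The handles are joined with an iterator here; the while let pop() loop from the talk is an alternative.

```rust
use std::thread;

/// Hypothetical stand-in for the simulation world.
struct World {
    seed: u64,
}

/// Hypothetical: runs one simulation and returns a score for it.
fn run_world(thread_no: usize, sim_no: usize, world: World) -> u64 {
    world.seed + (thread_no * 1000 + sim_no) as u64
}

/// Runs `num_simulations` worlds spread evenly across `num_threads` threads.
fn run_multiverse(num_threads: usize, num_simulations: usize, world_fn: fn() -> World) -> Vec<u64> {
    let mut handles = Vec::new();
    for thread_no in 0..num_threads {
        // `move` transfers ownership of the captured values (thread_no, world_fn, ...)
        // into the new thread, so nothing is shared with the spawning thread.
        let h = thread::spawn(move || {
            let mut scores = Vec::new();
            for sim_no in 0..num_simulations / num_threads {
                // world_fn() creates a fresh World for every simulation.
                scores.push(run_world(thread_no, sim_no, world_fn()));
            }
            scores
        });
        handles.push(h);
    }
    // Wait for every thread and gather its scores, in thread order.
    handles
        .into_iter()
        .flat_map(|h| h.join().unwrap())
        .collect()
}
```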
It's a different artifact from something else. But I can use h immediately. This is something that we've seen in Lisp. Swift offers something similar with the while let.
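A small, self-contained illustration of the while let pattern on a plain vector used as a stack; drain_stack is a hypothetical name. pop() returns Option<T>, and the loop stops at the first None.

```rust
/// Pops every element off a vector used as a stack, in last-in, first-out order.
fn drain_stack(mut stack: Vec<i32>) -> Vec<i32> {
    let mut popped = Vec::new();
    // pop() returns Some(top) while elements remain and None when the vector is empty.
    // `while let Some(x)` destructures the Some, binding x to the plain i32,
    // and ends the loop as soon as pop() returns None.
    while let Some(x) = stack.pop() {
        popped.push(x);
    }
    popped
}
```

The loop is roughly shorthand for loop { match stack.pop() { Some(x) => ..., None => break } }.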
It's definitely something that I first saw in the Lisp family of languages, where you basically write a little if statement, but also bind a new name to something depending on whether that if succeeded. Okay, this is basically all I felt I should show you about Rust as a programming language in 45 to 60 minutes. Let me spend five minutes on closing remarks about performance. What you're seeing here, and I'm not sure how well this works over Zoom, is a video of the simulation, the Clojure version. And you can see the agents running around.
The green bits are the nutrients that they can eat. You can see them run around, and sometimes you can see new ones pop up. It's actually harder to see with this one here. The Clojure version did about 110,000 cycles per second. That is, if one creature lives one cycle, that counts as one; it's not 110,000 cycles for all creatures.
So 110,000 per second. That sounds reasonably fast. But if you think about the number of creatures on this, if you think about the fact that you want to probably simulate a million cycles to make sure that the hyperparameters are okay, and then you want to run like 10,000 simulations in a Monte Carlo simulation, then that can become quite burdensome and it will take weeks to run this. And also the Clojure version didn't work that well across multiple CPU cores.
Mostly because I wasn't as disciplined with the code, because the compiler didn't push me to be disciplined. So what I did then was basically port the entire program to Rust. I did this by copying and pasting the acceptance tests from the Clojure version, making them compilable Rust code, and then implementing against those acceptance tests. So I'm reasonably sure that when I did this comparison, the two simulations, the Clojure one and the Rust one, did exactly the same thing.
But as you can see, the Rust version, at 3.5 million cycles per second, was significantly faster. This was roughly, based on the experience I had, what I was expecting. I hold the JVM in very high regard. I think it's a super good Java virtual machine. Sorry, a super good virtual machine implementation.
There's a lot of really cool things. I know of high-frequency trading applications implemented in the JVM. It is not a sloth, it is actually quite fast. But when it comes to something like that, of course, something that is compiled into machine code, will generally have the edge.
And we should also not forget, I was programming this in Clojure, and not in Java. In Clojure, I did this, I'm a reasonably competent Clojure programmer I would say. I did this in an idiomatic way.
Which means the agents, basically, were more or less like dictionaries. And in Clojure, things are immutable. So when I want to change the state of an agent, I have to copy parts of the dictionary. Clojure does that very efficiently. But obviously not at the level that you get from a language which can change parts in memory. One thing though, I realized at some point I should probably do a release build and not the debug build.
And then I thought: what? 25 million cycles per second. That's the change I got from going from a debug build to a release build, where the compiler does its optimizations.
And the cool thing here is that Rust uses the same underlying toolchain, LLVM, that Swift and Clang, the C compiler, use. So everything that comes after the translation into LLVM's intermediate code benefits from all the optimization work Apple does for the Swift compiler, as well as everything that is done for C and C++, because it is the same compiler backend. That gives us this degree of optimization and this level of performance.
The one thing I do want to highlight, though: I profiled the code, and that was actually not that easy; I'm really spoiled by Java when it comes to profilers. When I did some profiling on the Rust code, I realized that it was spending 50% of its time in the random number generator. Rust tries to be safe by default, and the random number generator that you use by default produces cryptographically sound random numbers. I didn't need that to decide whether a creature turns left or right. I didn't need the level of rigor that you need for cryptography.
So I changed to a simple one, I think it was an XOR shift random number generator, and that brought down the share of time spent creating random numbers. That, of course, made Rust faster. I didn't do the same optimization in Clojure, but I don't think it would have made a huge difference. Is this just a total (indistinct)? Is it just me not being able to write fast Clojure? Is it Clojure being super slow? I don't know.
I don't want to go out here and say I have conclusive evidence that Rust is faster, but I want to leave you with another piece of anecdotal evidence. This is from Bryan Cantrill, one of the creators of DTrace.
That's a tool from Solaris, also available on macOS, for tracing what the system is doing, including system calls. He wrote a blog post about this. If you follow the (indistinct), you get there. He had experiences similar to mine, I guess.
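The talk doesn't say which XOR shift implementation or crate was used. As an illustration only, here is a hand-rolled xorshift64 with Marsaglia's 13/7/17 shift constants: very cheap per number, and explicitly not suitable for cryptography. XorShift64 and turn_left are hypothetical names.

```rust
/// A minimal xorshift64 generator (Marsaglia's 13/7/17 variant).
/// Fast, but NOT cryptographically secure: fine for "turn left or right", not for keys.
struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // The state must never be zero, or the generator only ever returns zero.
        XorShift64 { state: if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed } }
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }

    /// Hypothetical example use: a coin flip for a creature's direction.
    fn turn_left(&mut self) -> bool {
        self.next_u64() & 1 == 1
    }
}
```

The same seed always yields the same sequence, which also makes simulation runs reproducible.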
He had some frustrations trying to implement this because of I guess, the borrow checker. And so he gave up on performance. He just said, I just wanna get it to work.
So what happened when he did this? Actually, let me go back. What he did was write a piece of software to generate statemaps from traces. The example he gave in his blog post was 229 states and around 4 million transitions. The original version, which he implemented in Node.js, took 83.1 seconds.
And that was unacceptable to him. And I always make this joke at this point. Whoever says Node.js is blazingly fast has generally never programmed anything but Node.js. So what he did is he wrote a hybrid.
So now, remember, he knew this code, and that was the code Bryan was translating to Rust. And, like me, he struggled with the borrow checker. I'm sure he saw all those error messages that I showed you. He was getting frustrated and said, let me just get this to work. And this is where it ended up: it was actually faster than the C version.
He dug into it. He's more like a low-level kernel type of guy than I am. And he looked at it and looked at the code. He said, how can that be? How can the Rust code be faster than the C code? And he saw something.
He saw that the emitted machine code was making more effective use of the CPU. I'm not sure of the details. I mean, I know a little bit about CPUs, not that much. They're really complex beasts these days, with branch prediction, things they can do at the same time, and so on.
And it really looked like the LLVM code that the Rust compiler emitted could be translated into machine code that was more friendly, if you will, to the CPU, which in the end resulted in higher execution speed. Most benchmarks, and as we all know, benchmarks are a really, really tricky topic, which is why I'm shying away from making any absolute statements, show that Rust ends up at a performance level that's really comparable to C. Not in the "blazingly fast, I have a micro benchmark that shows something" sense: on a large scale, it really is comparable to C code.
And, at the expense of a little bit of pain at compile time, it gives you much, much safer code in the end. So with all that said, thank you. I saw some questions flying past in the Zoom chat already. But there's something else I want to do. Today is a little experiment.
And I've prepared the following. I don't know whether you've seen it before. I've only learned about this a while ago, a tool called Slido. So if you wouldn't mind, take out your mobile phones.
Scan the QR code and open the webpage. It doesn't download an app or anything; it just brings you to a webpage.
And what it should do is, if you scan the QR code and open it, it should get you to a page that has two tabs. One called Ideas and the other one called Polls. And let's do the poll one first.
And what I've asked you to do here is, this list scrolls a little bit, pick the three programming languages that you use on a weekly basis. Which programming languages are you, who are listening to this webinar now, using? I will show you the results in a minute or two. In parallel, when you're bored or don't care about that question, you can go to the other tab, Ideas, and answer the question that is highlighted there: what do you want to do with Rust? Why did you come to the talk today? Why are you interested in Rust? Do you have something specific in mind that you want to use Rust for? And we'll see. As far as I've seen in Slido, you can comment on each other's ideas.
I think you can even upvote each other's ideas. So this, to me, is an experiment. I've participated in a call or a webinar where somebody used Slido, but I've not used it as a presenter. So bear with me if I don't get this right. But I think it's a nice way of trying something different in an online space, something we probably wouldn't do in a physical meetup.
So with that said, I'm gonna stop sharing this screen, so you can see our faces for the moment. And I will switch to Slido in a minute, after those of you who want to play with it have had a chance to enter something. In the meantime, I can see the document on my screen where Jenna has tried to collect the questions, so let me answer them, because they're flying through. In the "getting ready" part, there was a question that says: I tried Rust too, and what I did not like is the slow build with Cargo.
Is there a way to optimize that, from your experience? I don't think there is a way to optimize it. Don't do a release build; do a debug build. That is faster, because the compiler doesn't have to optimize. And on the other hand, it is incremental.
So if you're doing test-driven development, the first time you take the full hit, because it compiles all your dependencies and their transitive dependencies. But generally, with a normal code base, my experience is that the compiler doesn't have to compile everything again, only small bits. It is a general problem that I know from other languages. I know there are build farms even for C++. And a friend of mine who works in the gaming industry told me that they sometimes put things in a header file they know is the wrong one, because they can already predict in their mind what changing a header file would trigger as a recompilation step, and they choose the cheaper option. This is a problem in these languages, but I don't know a way to optimize it.
Other than buying an even faster computer, I guess. How well would Rust integrate with Microsoft C/C++ libraries, for example to decrypt files? And how well would a tool written in Rust integrate with Java programs? (indistinct) Integrating with Java: from Java, I guess you would have to call it as a native function. How well does it integrate with C and C++? I can't talk about Microsoft specifically. I did this on macOS and Linux, and it was fantastic. I mean, Rust was developed at Mozilla, with Firefox in mind.
And they are not rewriting Firefox from zero. They are gradually moving the code base from C/C++ over to Rust. And the C interop is fantastic. In Rust, you can call a C function almost as if it were a Rust function.
You can even do the following. Rust has annotations, called attributes, which I didn't show you. You can annotate language elements like a struct definition and say, dear compiler, please lay out this Rust struct in memory the way the C compiler would.
That means you can pass a reference, also known as a pointer, to a C function, the C function can fill in the structure, and the result comes back to Rust. So that worked really, really well. One thing I haven't tried is interop with C++ and Swift, because both of them use name mangling: the names that the linker sees are not the same as the ones in the source code.
I'm not really sure how Rust deals with that. If worst comes to worst, I guess you could always write a little wrapper in C around it. But calling C functions was completely painless for me. Of course, for those calls you're giving up the safety guarantees. In fact, in Rust there's a special keyword called unsafe. You have to write unsafe, curly bracket, then the code, and then close the curly bracket, just to say, compiler, forget about this.
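A minimal sketch of calling a C function from Rust, using abs from the C standard library as the Rust book does; c_abs is a hypothetical wrapper name. The call needs an unsafe block because the compiler cannot check foreign code. In the 2024 edition, the declaration block must be written unsafe extern "C".

```rust
// Declare a function from the C standard library. Rust's standard library links
// the platform's C library on common targets, so `abs` resolves without extra setup.
extern "C" {
    fn abs(input: i32) -> i32;
}

/// Safe wrapper: the compiler cannot verify foreign code, so the call needs `unsafe`.
fn c_abs(value: i32) -> i32 {
    unsafe { abs(value) }
}
```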
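A sketch of the #[repr(C)] pattern described here. To keep it runnable without a C compiler, fill_point is a hypothetical stand-in written in Rust with the C ABI; in real code it would be declared in an extern "C" block and implemented in C. Point and filled_point are also hypothetical names.

```rust
/// `#[repr(C)]` asks the compiler to lay this struct out exactly as a C compiler would,
/// so a pointer to it can be handed to C code.
#[repr(C)]
#[derive(Debug, Default, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

/// Hypothetical stand-in for a C function `void fill_point(Point *p)`.
///
/// # Safety
/// `p` must be a valid, non-null pointer to a `Point`.
unsafe extern "C" fn fill_point(p: *mut Point) {
    unsafe {
        (*p).x = 3;
        (*p).y = 4;
    }
}

fn filled_point() -> Point {
    let mut point = Point::default();
    // `&mut point` coerces to `*mut Point`, the pointer a C function would receive.
    unsafe { fill_point(&mut point) };
    point
}
```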
Look away, I'm doing something here. Because the compiler can't