CS50 Cybersecurity - Lecture 0 - Securing Accounts
[MUSIC PLAYING] DAVID MALAN: This is CS50's introduction to cybersecurity. My name is David Malan. And this week, let's focus on securing accounts. You and I have so many accounts nowadays, be it for websites or apps or the like. And we'll focus today on exactly what the threats are to all of those accounts but more importantly, what some of the defenses are so that you can keep those accounts secure.
But let's first consider what we mean by security in the physical world, for instance. Whether you live in a home, an apartment, a dormitory or the like, odds are you have a key that lets you into that building. Now, that key, of course, lets you in through that locked door, and then you have access to the entire habitat. But the catch is that if someone else gets that key, of course, they, too, can let themselves into that system or into that same building. So let's consider now, in the digital world, what some of the building blocks of security are so that we can focus on exactly what those threats and those defenses are. So first, allow me to propose that we think about the security of our accounts in terms of authentication.
So authentication refers to this process, digitally, of proving who you are, that I, for instance, am David. But that alone isn't necessarily enough to keep a system secure because just because I'm David doesn't necessarily mean I should have access to your entire home. Perhaps I should not have access at all. Perhaps I should just have access to the entryway or some narrower form of access.
And so there's this related topic when it comes to the security of locations or systems known as authorization. So authorization speaks to whether or not you should have access to something once you have proven who you are: having proven that I am David, should I, in fact, have access to the door that I just walked through? Now, when it comes to our accounts in the digital world, we, of course, don't use physical keys, but very frequently nowadays we use usernames, which, of course, can be public. It might be a username like David.
It might be a username like Malan. Or it might be more commonly, even an entire email address that presumably uniquely identifies you in the world. But even though that's public, the thing that you and I ideally keep private is, of course, our password. And nowadays, you and I must have dozens, maybe even hundreds of passwords that are hopefully distinct and not reused across all of those different websites, but more on that in a moment.
And so it's really this password that ultimately allows you to authenticate yourself, to demonstrate who you are, because while anyone can know my username or email address, presumably I'm the only one in the world who knows both my username and this here password. And so the presumption is that if I type both of those values into some app or some website, I must, in fact, be David Malan. Of course, it's not good enough to just have a password. You need to have a good password.
Now, what do we mean by a good password? Well, ideally, this password is not going to be in a dictionary, like, literally a dictionary of English words or whatever your human language might be. Why? Well, there's this threat known as a dictionary attack. An adversary, a hacker who wants to get into your account, could just start typing randomly to try to figure out what your password is. But they're a little smarter than that. They'll actually use a dictionary attack.
That is they'll open a physical book of words, or more likely they'll open a file on their computer containing a whole lot of actual English words or in some other human language, and then just one at a time, try this word as your password, this word as your password, this word as your password and so forth. Because if you and I have chosen a pretty guessable password, one that is an actual word in a dictionary, they're going to get into your account much faster. But even if you and I are clever-- and odds are by this point in life you know that you shouldn't just choose a simple English or some other language word, but rather you should probably have some numbers, some letters, some punctuation, or the like-- you're still vulnerable, as am I, to what we would call a brute force attack.
Brute force sort of evokes memories of yesteryear, when someone might have used a big tree branch as a battering ram to try to get into a castle. But brute force attacks in the digital world mean something analogous, whereby you're using software to digitally try all possible passwords. And so here, too, you are vulnerable, because if your password is too short, even if it's random with letters, numbers, and symbols, odds are an adversary or a hacker who has enough time and enough technical savvy can just try every possible password in the world.
And eventually, they might very well get into your system. So how do we go about defending against these kinds of attacks? Well, we use these passwords, but these passwords of course come in different forms. And the bar that is set by default on a lot of devices nowadays is still pretty low. For instance, on your phone, if you'd like to chime in here in the chat, how many characters, or digits in particular, are typically required? Well, I would conjecture that very often when I set up a phone, I'm only asked for a passcode, a numeric password, of four digits alone. Now, if you have a four-digit passcode, or password more generally, how secure is that? And how do we even go about thinking about how secure that password is? Well, I would propose that we could measure the security of a password that has just four digits, not with fancy math but with some basic heuristics, by considering: how many possible four-digit passcodes are there? So perhaps if you'd like to chime in here in the chat, how many possible passwords are there if they're all decimal digits, 0 through 9, and you only have four of them? How many possibilities are there? I'm seeing 1,000. I'm seeing 10,000.
I'm seeing 9,999. And I'm seeing a whole range. And I think a lot of you are spot on. It's 10,000. It's 10,000. Why? Well, if we just think about this numerically, if I've got four decimal digits, 0 through 9, the smallest password I could come up with, so to speak, would be 0000.
And the largest password I could come up with would be 9999. Now, you might think, OK, well, that's obviously 9,999 possibilities. But not quite, because if you include 0000, that's the 10,000th possibility. So indeed there are 10,000 possible passwords if we're using four digits specifically. So how do we think about that more generally, especially so that we can figure out the math for larger passwords as well? Well, if you've got 0 through 9 as the first possible digit, and 0 through 9 as the next, and 0 through 9 as the third, and 0 through 9 as the fourth, you have 10 possibilities times 10 possibilities times 10 times 10. This, of course, if we write it out more mathematically, is 10 to the fourth power, the exponent being 4.
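The counting argument above can be checked in a few lines of Python, the language used later in this lecture. This is a minimal sketch; the helper name keyspace is hypothetical, not something from the lecture.

```python
def keyspace(alphabet_size: int, length: int) -> int:
    """Count the passwords of exactly `length` symbols, each drawn from an
    alphabet of `alphabet_size` symbols: one factor per position."""
    return alphabet_size ** length


# Four decimal digits: 10 choices x 10 x 10 x 10.
print(keyspace(10, 4))  # 10000

# The intuitive count agrees: 0000 through 9999 inclusive is 10,000 values,
# one more than 9999 because 0000 counts too.
print(len(range(0, 9999 + 1)))  # 10000
```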
And that, of course, gives us 10,000 as well. So that might be the more mathematical way of approaching it, versus the more intuitive observation that 0000 can go all the way up to 9999. Now, again, a question for the group: how long do you think it might take for an adversary or a hacker to get into my device, for instance, my phone, if I do have a four-digit password? If I've got a four-digit password, they might have to try as many as 10,000 possibilities because, in the best case for them, sure, they get lucky, and my password is still the default 0000. But in the worst case, I chose 9999, and they don't get to that until the very end of their attempts. Or maybe I chose something there in between. I'm seeing 10 seconds, less than a second, milliseconds, 10 seconds, a day, 4 hours.
So the responses are all over the place. So how can we go about actually measuring this or estimating this? Well, let me propose this. I'm going to go over to my computer here. And even if you've never written any code before, let's go ahead and write some code together here. I'm going to go ahead and open a program called VS Code, Visual Studio Code, which is a free program that we use in CS50 more generally that allows me to write code on my Mac or PC or really any internet-based device. And I can actually write code, and not only write it, but run it.
And I'm going to write code, in this case, in a language called Python. This is just a very popular language, but I could use any of a dozen or more different programming languages. And the goal here is not to learn Python (for that, we have whole other classes) but to just demonstrate what an adversary, what a hacker, would need to do if they want to get into, for instance, your iPhone or Android device or anything that has just a four-digit password. Now, my presumption here, for demonstration's sake, is that I'm going to go ahead and write code that just prints all possible passwords on the screen.
But you could imagine, if I had a USB cable or maybe a Lightning cable, that I could connect this phone to this laptop. Especially if it's your phone that I just swiped from a table, I could quickly plug it into my computer here, run the code that I'm about to write, and maybe automatically send all 10,000 possibilities to your device before you even realize the phone is gone. Now, here's how I'm going to do this. I'm going to write the code in a text file called crack.py, where "crack" is actually a term of art. It just means to figure out what a password is, to brute force your way in.
From a library called string, I'm going to go ahead and import digits. Now, this is a very easy way of just giving me access to the numbers 0 through 9. I could obviously type them all out on my keyboard, but this is a little faster because digits is simply a string of the ten characters I care about, 0123456789. Now, there's a bunch of different ways I can write this code.
But what I really want to do intuitively is try all possible digits for the first value, all possible digits for the second, then for the third, then for the fourth. So one way of doing this might be as follows. I'm going to use a keyword in Python called for, which just means do something for each item in some collection, here each of those digits. And then I'm going to give myself a variable, like in math, just so I can use something to keep track of each number.
And I'm going to use a conventional name, i, for integer. And then I'm going to say that for each value i in those 10 digits, I want to do the following. Well, for each of those values of i, the first digit, I want to do for j in digits as well. And then for my third placeholder, I might do something like for k in digits. And then lastly, I might do for l in digits.
So this is admittedly not the best design. And those of you who've programmed before are probably cringing that I have this indentation, indentation, indentation. But it's a simple way of demonstrating, especially for those unfamiliar with programming, how we can try all possible first digits, all possible second, all possible third, all possible fourth. And all I'm going to do, buried inside of this code now, is print out the value of i, j, k, and l so that, iteratively, we should see on the screen 0000 and then everything all the way up to 9999. So if you assume that I've connected my phone to this laptop, ideally, we'll then have an estimate of how long it might take until we've actually cracked into the device.
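Here is a reconstruction of the four-digit crack.py as described above. It's a sketch, so the lecture's exact code may differ; for instance, this version concatenates the four digits so that each line reads like 0000 rather than 0 0 0 0.

```python
from string import digits  # the string "0123456789"

# One loop per position: the outermost loop picks the first digit and the
# innermost picks the fourth, so the output runs 0000, 0001, ..., 9999.
for i in digits:
    for j in digits:
        for k in digits:
            for l in digits:
                print(i + j + k + l)
```

Running python crack.py prints all 10,000 codes, one per line. An actual attack would send each code to the device instead of printing it.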
So let's go ahead and do this. I'm going to open up a separate window on my screen here called a terminal window. And I'm going to go ahead and run Python of crack.py. So in just a moment we're going to see is it going to take a few minutes, a few milliseconds, a day, four hours, or-- here we go.
1, 2, 3, go. So those of you who estimated just a few milliseconds were spot on. So what's the takeaway here? Well, apparently using a four-digit password is not very secure at all because look how quickly I, the adversary, the hacker in the story, was able to get into your phone. And in fact, I could probably unplug it at that point because I've gotten whatever data I care about off your phone. And you might be none the wiser. So how can we go about improving upon this system? Well, let me propose that instead of using a four-digit passcode, let's use four letters instead.
And we'll use English because that's what I speak well. And in English, we have 26 letters of the alphabet, A through Z. But you know what? That might give us initially 26 possibilities for the first position, times 26, times 26, times 26 for the second through fourth. But let me propose that we actually use lowercase and uppercase letters. So that gives me not 26, but 52 possibilities for each location.
So if I do 52 possibilities, that's 52 to the fourth power. And does anyone want to estimate how many possible passwords there are if I'm using four English letters now, uppercase or lowercase? I'm seeing 26 to the fourth power. But that's not right if we're using uppercase and lowercase. It's indeed 52 to the fourth power.
And I'm seeing "a lot." But here we have estimates along the lines of, indeed, 7 million as well; it's 7,311,616, to be exact. So with more than 7 million possibilities, you might think, OK, surely, that's going to be a lot better.
And it's going to take the adversary a lot longer to hack into this phone. But let's try that. Let me go back to my terminal window here. Let me reopen now my code file, and let's go ahead and use not digits, but let's go ahead and use ASCII letters. For those unfamiliar, ASCII letters are simply the letters A through Z in both uppercase and lowercase. Now, here I have to go ahead and change this from digits to ASCII letters, from digits to ASCII letters, from digits to ASCII letters, and lastly, from digits to ASCII letters.
Again, there's an easier way I could implement this code to be more succinct and less duplicative, but it involves some features that we'll introduce in another class altogether. But now I have all possible ASCII letters from my first placeholder to the last. Let's go ahead and open up that same terminal window. Let's run Python of crack.py. And here now is the answer to how long might it take an adversary to get into your phone if you're using four letters of the English alphabet for your password instead.
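The more succinct approach alluded to above isn't shown in the lecture. One option, sketched here, uses itertools.product from Python's standard library to generate every combination without hand-written nested loops. It assumes, for illustration only, that we know the target so we can count how many guesses it takes; the function name crack and its parameters are hypothetical.

```python
from itertools import product
from string import ascii_letters, digits
from typing import Optional


def crack(target: str, alphabet: str, length: int) -> Optional[int]:
    """Try every `length`-symbol combination of `alphabet`, in order, and
    return how many guesses it took to reach `target` (None if never)."""
    # product(alphabet, repeat=length) yields tuples in the same order as
    # `length` nested for loops: the last position changes fastest.
    for guesses, combo in enumerate(product(alphabet, repeat=length), start=1):
        if "".join(combo) == target:
            return guesses
    return None


print(crack("9999", digits, 4))         # 10000: the worst case for four digits
print(crack("aaab", ascii_letters, 4))  # 2: "aaaa" is tried first
print(len(ascii_letters) ** 4)          # 7311616 four-letter passwords
```

Because ascii_letters lists the lowercase letters before the uppercase ones, the search runs through lowercase combinations first, matching the order seen on screen.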
So this time, I have enough time to walk all the way over to the screen here. And you can see that we're going in alphabetical order, first lowercase, now uppercase. But in just a moment, we are done.
And we're down all the way to ZZZZ. So that was a few seconds, which is indeed slower, but that really wasn't that much effort at all. So presumably, then, even four letters of the alphabet might not be enough to keep us secure. So let me go ahead and do what we all are told to do anyway, which is to go into your phone or whatever device is in question and use four characters of any kind instead. So not just letters, not just digits, but let's toss in some punctuation as well. And once we include punctuation, at least on a US English keyboard, there are typically 94 possibilities for letters, numbers, and punctuation, because we have 26 lowercase letters, 26 uppercase letters, 10 decimal digits, 0 through 9, and another 32 punctuation symbols that we can add into the mix as well.
So that gives me 94 possible keys that I can hit here, or 94 to the fourth power. And does anyone want to estimate what this is? We've gone from 10,000 to 7 million to I'm seeing it in the chat, roughly 78 million possibilities, so in some sense, 10 times more secure. Let's go back to my terminal window, open up my code here, and import not only ASCII letters, but also the digits from before, and also, this time, some punctuation as well.
Now let's go ahead and change just the ASCII letters alone to ASCII letters plus those digits plus that punctuation. And just to save some time, I'm going to highlight and copy what I just typed, and I'm going to change the second position, the third position, and the fourth position, as well, to use that combination of 94 possibilities. I'm going to open up my terminal window again. I'm going to run Python of crack.py. And this time, because we have 10 times as many possibilities, I kind of had 10 times the amount of time to walk over to the screen because indeed we're still in the lowercase Es, Fs, Gs, Hs.
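The combined alphabet can be built from the same string module. A quick sketch to confirm the counts quoted above:

```python
from string import ascii_letters, digits, punctuation

# Letters, digits, and punctuation, concatenated into one alphabet.
alphabet = ascii_letters + digits + punctuation

print(len(ascii_letters), len(digits), len(punctuation))  # 52 10 32
print(len(alphabet), len(set(alphabet)))  # 94 94 (no symbol appears twice)
print(len(alphabet) ** 4)                 # 78074896 four-character passwords
```

Note that string.punctuation does not include the space character, so this alphabet covers exactly the 94 visible keys described above. Swapping this alphabet in for digits in each of the four loops is the whole change.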
And now it looks a bit like a Hollywood movie maybe. You can perhaps see, even though it's going across the screen fast, that there's a lot of cryptic output here because we're running through all of the letters, the digits, and the punctuation. So it looks a little fancy at that. Now, you might recall, too, from Hollywood movies, too, that they tend to be very dramatic.
And so instead of just doing this, which is iterating from left to right very slowly, the movies and TV shows tend to very dramatically get, like, the third character right, then the first character right, then the fourth character right. And then just in time, you get the second one as well. That's not really how brute force works. You tend to do things methodically, not jumping around from symbol to symbol. But this is clearly taking a long time.
And I'm not even going to finish waiting for this to go because we still have to get through all the punctuation. So this is to say, ultimately, that 78 million possibilities is actually getting up there pretty fast. But honestly, if we come back in like a minute or so, I bet that will be finished nonetheless. And none of us hopefully has a password that's only four characters nowadays, letters, numbers, and punctuation. Odds are it's at least a conventional eight characters.
And indeed most websites and apps require as much of you as well. Now, the math here is pretty straightforward too. If you have 94 possibilities but eight characters in total now, that's 94 to the eighth power. And does anyone want to ballpark just how many passwords are possible with only eight characters, which isn't even that long? Too many, comes back one answer. Too much to count.
I see that we've given up here. But, oh, I did see one in the chat. It's roughly 6,095,689,385,410,816 possible passwords, which is actually a little hard to read. So this is, let's see, millions, billions, trillions, quadrillions. So this is 6 quadrillion possibilities. So now we're talking.
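Putting the four scenarios so far side by side makes the growth easier to see. A small sketch; the descriptions are just display labels.

```python
# Each scenario: (description, symbols per position, password length).
scenarios = [
    ("4 digits", 10, 4),
    ("4 letters, either case", 52, 4),
    ("4 letters/digits/punctuation", 94, 4),
    ("8 letters/digits/punctuation", 94, 8),
]

for description, symbols, length in scenarios:
    # The "," format option groups digits by thousands: millions, billions, ...
    print(f"{description:30} {symbols ** length:>25,}")
```

The last row prints 6,095,689,385,410,816, the roughly 6 quadrillion possibilities mentioned above.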
Now the adversary is probably going to run out of time, run out of energy, run out of money, run out of lifetime if it's going to take this much time to try to crack, so to speak, your particular password. And so here's one of our first takeaways when it comes to cybersecurity and securing our accounts. It really is this game of relativity and resources.
What we're really doing here by adding in digits and letters and punctuation is not something fundamentally different. It's still sort of the same formula, the same approach to our passwords. But as we add complexity, and as we make passwords longer and longer, we're raising the bar for the adversary.
Why? Well, so long as you and I don't do something dumb like still choose 00000000, so long as we're choosing something that's pretty random from that range of 6 quadrillion possibilities, it's going to take the adversary way more time than it would otherwise to brute force their way into that password. And so by the time they finally get into the account, you might have changed the password already, you might not be using the account anymore, or you or the adversary might not even be on this planet anymore. And that's indeed the goal. But there is a downside, of course. The longer your password gets, and the more complicated it gets, the more likely you and I are to not even be able to remember what that password is.
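Because each added character multiplies the work by 94, the rough timing from the demo can be scaled up. This sketch assumes, loosely following the "a minute or so" estimate for the four-character run, that trying all 94^4 passwords takes about one minute on this laptop; that rate is an assumption, not a measurement.

```python
SYMBOLS = 94                # letters, digits, and punctuation
MINUTES_FOR_FOUR_CHARS = 1  # assumption: the rough "a minute or so" estimate

# Going from 4 to 8 characters multiplies the keyspace, and with it the
# worst-case time at a fixed guessing rate, by 94 ** 4.
slowdown = SYMBOLS ** 8 // SYMBOLS ** 4
minutes = MINUTES_FOR_FOUR_CHARS * slowdown
years = minutes / (60 * 24 * 365)

print(f"{slowdown:,} times longer: about {years:.0f} years")
```

Under that assumption it prints 78,074,896 times longer: about 149 years. Faster hardware shrinks the absolute numbers but not the ratio, which is why length raises the bar so effectively.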
And so here is that sort of balancing act: trying to figure out the balance between the usability of the account, just how user-friendly it is to access, and the security of that account. And where exactly to strike that balance is typically a matter of personal preference or of corporate policy. Well, let me pause here and see if there are any questions now on securing our accounts via passwords alone. AUDIENCE: I was wondering. A couple of years ago, there were devices, USB devices, with fingerprint recognition.
How come that's not more frequently used? Or are they too expensive or-- DAVID MALAN: A really good question, and we'll come to this topic in a little bit on biometrics more generally. But your intuition is pretty much right. It's expensive to have another device. And most consumers are not going to bother wasting money on something just for them. Some companies might. But if they have a lot of employees, that could get very costly.
But you might be glad to know that one of the topics we'll end on today is a new technology called passkeys that actually leverages a device you most likely already have, a phone, that might use your fingerprint or might use your face or some other form of biometrics. That's becoming more common nowadays, or soon will, even for your laptop and desktop which will talk to that phone, in some form. How about one more question here on securing accounts.
AUDIENCE: My question is, if a four-digit password is so unsafe, why do some programs, not websites but programs, still use it? DAVID MALAN: Oh, really good question. Why are some programmers using this? So it's a trade-off between usability and security. If you are the programmer designing the system, you presumably want users to use the system and to come back and to keep using it. But if you make it too hard for them to access that account, if you increase the probability that they're going to constantly forget their password, lose their password, they might just stop using your system or your app or your website altogether.
And so that's probably not a good thing. Other reasons might include just unawareness, or not having taken a class on cybersecurity, or not having really thought through the implications of having such short passcodes. So nowadays, industry is starting to nudge us in better and better directions.
But we'll see today and in the rest of this class that there are still going to be a lot of trade-offs, again, between usability and security. So what can we do to defend ourselves against these brute force attacks? Well, at least here in the US, there's an organization called the National Institute of Standards and Technology, otherwise known as NIST, that actually issues recommendations for how we as consumers or companies or, more generally, humans can go about securing our accounts more effectively. And we thought we'd share just some of these recommendations so that they inform not only your own behavior as an individual citizen or consumer, but perhaps, if you're in a place of business where you can influence your own company's policies, here is generally what's considered best practice nowadays. So a quote from their recommendations: "memorized secrets shall be at least eight characters in length." So that at least corroborates the quick math and the test that we ourselves just ran: only once we got up to, like, 6 quadrillion possibilities did it feel like it was going to take a very long time to actually hack into someone's device.
So consider that for your own accounts, even on your phones. You might have to go through a few menu options to upgrade from just four digits to something more. But odds are you'll benefit from this additional layer of security.
This one's more of a mouthful but helpful as well. "Verifiers," meaning the website or app that's verifying your input when you authenticate with your username and password, "should permit subscriber-chosen memorized secrets of at least 64 characters in length. All printing ASCII characters as well as the space character should be acceptable in memorized secrets. Unicode characters should be accepted as well."
Now, if you're not yourself a computer person, there's a bit of jargon within this. But first of all, the takeaway is that websites and applications should let you and me use passwords as long as 64 characters, if not longer. Now, that's pretty long, but that's exactly the point, particularly as it's gotten more difficult for you and me to remember all of our passwords, let alone come up with very complex ones. The reality is you and I might very well be better off, on the whole, by just choosing an easier-to-remember but much longer password, for instance, a sentence, a quote, a phrase that you can more easily keep in your human mind, that doesn't necessarily have a crazy amount of punctuation or digits, but that is long, perhaps even approaching those 64 characters. So even if an adversary tries a dictionary attack, trying all possible English words or words in some other language, or even a brute force attack, it's going to take them way too long, unless, of course, you do something foolish like choose a 64-character password that's just 64 zeros or the like. So you still want to be original within that space.
Now, more technically, this recommendation is referring to ASCII. ASCII generally refers to US English symbols on a US English keyboard, as was the origin of this code system. So that includes A through Z, 0 through 9, and the punctuation I alluded to earlier. But websites and apps nowadays should also support Unicode, which includes things like emoji as well as accented characters and other symbols that you might have in languages beyond English. Unfortunately, this is not really common practice, I dare say. Just yesterday, I created an account on a new website for the first time, and it made me jump through hoops, so to speak, figuring out the right number of uppercase letters, lowercase letters, and punctuation, and even then it told me I could only use certain punctuation.
So I had to think about now which symbols I'm using. That is a lot of friction that is not good for usability. And it's of questionable value for the security of the system if I can't even remember the thing afterward.
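On the verifier's side, these recommendations amount to a check that is mostly about length rather than composition; NIST's guidance also advises against rules that demand a mix of character types. A minimal sketch: the constant and function names are hypothetical, the 64-character ceiling is the smallest one the recommendation allows, and the NFKC normalization step is this sketch's way of treating equivalent Unicode input consistently, an approach NIST's guidance also describes.

```python
import unicodedata

MIN_LENGTH = 8   # "at least eight characters in length"
MAX_LENGTH = 64  # must permit at least 64; a larger ceiling would also comply


def acceptable_length(password: str) -> bool:
    """Accept any characters at all (letters, digits, punctuation, spaces,
    emoji, accented letters) and judge only the length."""
    # Normalize so that equivalent Unicode input, such as a precomposed "é"
    # versus "e" followed by a combining accent, is counted the same way.
    normalized = unicodedata.normalize("NFKC", password)
    return MIN_LENGTH <= len(normalized) <= MAX_LENGTH


print(acceptable_length("0000"))                               # False: too short
print(acceptable_length("correct horse battery staple"))       # True: spaces are fine
print(acceptable_length("mañana, café, and crème brûlée 🍮"))  # True: Unicode is fine
```

There are deliberately no rules here requiring uppercase letters or particular punctuation, which is exactly the friction described above.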
So keep this in mind in general: your password should be at least eight characters, and any apps and websites you might yourself develop moving forward should allow much longer passwords as well. And you as the human can use longer passwords if systems allow them. Now, here's another set of recommendations from NIST. "Verifiers" (the websites or apps that we're using) "shall compare the prospective secrets" (the passwords you're choosing) "against a list that contains values known to be commonly used, expected, or compromised." That is to say, when you type in a password, if it's a commonly used password, if it's very easily guessable, the website or app should probably say, uh, pick a better password than that, just to decrease the probability that an adversary is going to get into that account. Specifically, NIST's examples of what belongs on that list start with, one, "passwords obtained from previous breach corpuses," which is a fancy way of saying that if some website's database has been hacked, and that database contains usernames and passwords, and those passwords have now been uploaded to the internet for adversaries or anyone to download and browse, then you should not be allowed to pick a password from that list, because it's essentially an alternative dictionary.
It essentially is a list of passwords that a smart adversary should just start with before they even bother resorting to brute force, which we've seen would take much more time. Two, dictionary words. This we've already stipulated would be a good thing to avoid because it's just much too easy for an adversary to go through a big list of English words, or words in some other language, and try those first. Three, repetitive or sequential characters: aaaaaa or, slightly more creatively but still not good enough, 1234abcd. I would also add 0000 and so forth into that category as well. It's just too easy for the adversary to guess that maybe you're doing something repetitive like that too.
And then lastly, context-specific words, such as the name of the service, the username, and derivatives thereof. This is to say, if you sign up for a Gmail account for the first time, you should not be allowed by Google to choose a password like "Gmail password," quote unquote. If you sign up for an Amazon account, you should not be allowed by Amazon to have your password be "Amazon password" or some such variant thereof because smart adversaries are going to try those same heuristics. And that's the catch too. If you can think of it, even if you think you're being clever, odds are a just-as-clever adversary can think of that heuristic as well and prioritize those tricks before they resort to brute force, like I did on my own laptop.
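A verifier might screen a proposed password against those four categories roughly like this. This is a sketch: the tiny word lists are hypothetical placeholders for the large breach corpuses and dictionaries a real system would load, and the rule that four ascending characters in a row count as "sequential" is an assumed heuristic, not NIST's definition.

```python
from typing import List

# Hypothetical placeholders: a real verifier would load large lists, such as
# passwords from published breach corpuses and a full dictionary.
COMMON_OR_BREACHED = {"password", "123456", "qwerty", "letmein"}
DICTIONARY_WORDS = {"dragon", "monkey", "sunshine", "football"}


def is_repetitive(password: str) -> bool:
    """One character over and over, like 'aaaaaa' or '0000'."""
    return len(set(password)) == 1


def has_sequence(password: str, run: int = 4) -> bool:
    """Assumed heuristic: `run` characters in a row, each one code point
    after the last, like '1234' or 'abcd'."""
    streak = 1
    for previous, current in zip(password, password[1:]):
        streak = streak + 1 if ord(current) == ord(previous) + 1 else 1
        if streak >= run:
            return True
    return False


def problems(password: str, service: str, username: str) -> List[str]:
    """List the reasons, if any, to reject `password` for this account."""
    lowered = password.lower()
    found = []
    if lowered in COMMON_OR_BREACHED:
        found.append("commonly used or breached")
    if lowered in DICTIONARY_WORDS:
        found.append("dictionary word")
    if is_repetitive(password) or has_sequence(password):
        found.append("repetitive or sequential")
    # Skip empty strings, which would otherwise "match" every password.
    if any(word and word.lower() in lowered for word in (service, username)):
        found.append("context-specific")
    return found


print(problems("1234abcd", "Gmail", "david"))       # ['repetitive or sequential']
print(problems("GmailPassword", "Gmail", "david"))  # ['context-specific']
```

An empty list means the password passes these particular checks, not that it is strong; it still needs to be long and original.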
A few other recommendations as well. "Memorized secret verifiers shall not permit the subscriber to store a hint that is accessible to an unauthenticated claimant. Verifiers shall not prompt subscribers to use specific types of information, for instance, 'What was the name of your first pet?' when choosing memorized secrets." So there are a lot of companies, a lot of websites, a lot of applications that violate this recommendation nowadays. And in fact, I bet you can think of one or more accounts that you have where you've had to tell them, for instance, the name of your first pet or your first car or your mother's or father's name or the like.
That's not good to collect either, nor is a hint a good thing because frankly, you and I are all too often in the habit of maybe typing into that hint field, if it's available, a question that's meant to help you remember what your password was. But if your hint is something like, my password is the name of my first pet, well, you're now just leaking information to the world. And anyone who can go online and figure out that information can now get into your account as well. And so that's really the threat, in this case.
If you start using personally identifiable information in this age of social media and websites like LinkedIn and the like, there's just so much information out there about us that can be discovered. You don't want to fill all of these databases, all of these systems, with each of these tidbits about you, because a smart adversary with enough time and enough focus on you can probably figure out all of those same values. So what more? "Verifiers shall not require memorized secrets to be changed arbitrarily, for instance, periodically." This, too, is a recommendation that a lot of companies still violate. If you're in a corporate workplace, in particular, and you're being required by the system administrators to change your password every month, maybe every three months, six months, or every year, that's not generally recommended anymore, even though not too long ago, it did sound like a best practice.
But why is this not recommended anymore, to have you forcibly change your password periodically, like every few months? AUDIENCE: Because the password will be easily forgotten and be more vulnerable to brute force attacks. DAVID MALAN: Why would it be more vulnerable? AUDIENCE: Because the hackers can get access to all passwords and get hints about the new password. DAVID MALAN: OK. So yes, one danger of forcing you and me to change our password too frequently is that you and I do not tend to exert much effort when we're required to do so.
For instance, if my password today is, for instance, password 1, well, you know what my password might be in three months? Password 2 or in another three months, password 3. You and I might exert the minimal amount of energy to change the password so that we meet the company's requirements but so that it's not too hard for you and me to remember what the new password is. And so indeed, if information about my past passwords leaks out and some adversary sees, oh, well, wait a minute, your password was password 2 last month, I'm just going to guess heuristically that this month it's password 3. We might indeed be leaking information. What's another reason that you might not want to require humans to change their passwords arbitrarily on some schedule like this? AUDIENCE: We keep forgetting our passwords, too, if we change it too frequently.
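The password 1, password 2, password 3 pattern is exactly the kind of heuristic an adversary can automate once an old password leaks. A sketch; the function name likely_successors is hypothetical, and appending 1, 2, 3 to a password with no trailing number is an extra assumption about how people rotate.

```python
import re
from typing import List


def likely_successors(old: str, how_many: int = 3) -> List[str]:
    """Guess the next passwords in a lazily rotated series,
    e.g. 'password2' -> 'password3', 'password4', ..."""
    # Split into a stem and a trailing number, if there is one.
    match = re.fullmatch(r"(.*?)(\d+)", old)
    if match is None:
        # Assumption: with no number yet, people often just append one.
        return [f"{old}{n}" for n in range(1, how_many + 1)]
    stem, number = match.group(1), int(match.group(2))
    return [f"{stem}{number + step}" for step in range(1, how_many + 1)]


print(likely_successors("password2"))  # ['password3', 'password4', 'password5']
```

A handful of guesses like these costs the adversary almost nothing, which is why forced rotation can leak more than it protects.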
So that is not a good practice for a website. DAVID MALAN: Exactly. If you make me change my password too frequently, honestly, I'm probably going to forget what my next password is because I'm going to get confused with last month's or the previous month's or the like.
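To make that "password 2, password 3" pattern concrete, here is a toy Python sketch of the heuristic an adversary might apply to one leaked old password, assuming, hypothetically, that the user simply increments a trailing number whenever forced to rotate. The function name `guess_next_password` is made up for illustration; a real attacker would try many such variations, not just one.

```python
import re


def guess_next_password(old_password: str) -> str:
    """Guess a 'rotated' password by incrementing a trailing number.

    Toy heuristic only (hypothetical): it models the habit of going from
    "password1" to "password2" when a policy forces a change.
    """
    match = re.search(r"(\d+)$", old_password)
    if match is None:
        # No trailing number: guess that the user just appended a 1.
        return old_password + "1"
    prefix = old_password[: match.start()]
    return f"{prefix}{int(match.group(1)) + 1}"


print(guess_next_password("password2"))  # password3
```

The point is not the code's cleverness but its lack of it: a forced rotation that users satisfy this way adds almost no work for an adversary who has seen an old password.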
And so there are these sociological effects on us humans, just being human: we're not very good at remembering not only the first, very complex password you made me choose but the second and the third as well. And so, generally, you should not come up with such a scheme anymore because of these adverse side effects. And how about one more recommendation here? "Verifiers shall implement a rate-limiting mechanism that effectively limits the number of failed authentication attempts that can be made on the subscriber's account." Now, what do we mean by this? Well, odds are this is something you yourselves might have experienced if, for instance, you forgot your password or you kept typing it slightly wrong.
Maybe your phone screen was wet, so it wasn't registering your fingertips properly. It turns out you can lock yourself out of your own phone. And you might have, in fact, seen something like this on iPhone. Android has a similar screen, if, for instance, you type in the wrong passcode 10 times in a row. The presumption, by Apple and Google and others, is that if after 10 tries you still haven't entered your passcode correctly, it's more likely that you are not David, that you are not you, but rather that it's someone else who has taken your phone, stolen your phone, and is trying to get into it. Now, that's not always the case.
You can imagine situations where you were just being absent-minded. You were half asleep. Maybe you weren't really focusing on it, and you locked yourself out. And so there, too, is again a trade-off between usability and security. But after 10 wrong attempts, the more likely explanation tends to be an adversary trying to get in and not you.
But what's the point of this? Beyond annoying the adversary and, maybe more significantly, really annoying you when it happens by accident, what this effectively does is slow the adversary down. In other words, it increases the cost of this attack to the adversary. Why? Well, we saw a moment ago that a smart adversary who knows a little bit of Python code and steals your phone can try 10,000 possible passwords in less than a second. However, if you now slow them down by having this feature on your iPhone or Android device that pumps the brakes, so to speak, letting the adversary try no more than 10 at a time, that significantly slows them down. Now they might have to spend at least 10 seconds, 20 seconds, an hour, a day, or longer, especially since Android and iPhone also tend to increase this time limit.
The first time you mess up, it's 1 minute. If you mess up another 10 times, it's now 2 minutes, maybe 5 minutes, maybe 10 minutes. Maybe the phone even deletes itself, wipes itself, if that, too, is a feature that you or your company has enabled. So again, the right way to think about this, beyond the usability trade-off, is that we're just trying to raise the bar for the adversary. We're trying to make the attack more expensive, more costly, maybe more risky for the adversary by slowing them down.
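Even a flat one-minute wait after every 10 wrong guesses would stretch the 10,000 guesses for a four-digit passcode into 1,000 batches with 999 waits between them, 59,940 seconds or roughly 16.7 hours, and escalating waits stretch it much further. The escalating lockout described above can be sketched as a small Python class. This is a hypothetical policy (10 failures, then a lockout that doubles each time, capped at an hour), not Apple's or Google's actual schedule; `LoginThrottle` and its parameters are names made up for this example. The clock is injectable so the behavior can be demonstrated without really waiting.

```python
import time
from typing import Callable


class LoginThrottle:
    """Toy rate limiter for failed logins with escalating lockouts.

    Hypothetical policy: after `max_failures` consecutive wrong guesses,
    refuse all attempts for a while; each lockout doubles, up to a cap.
    """

    def __init__(
        self,
        max_failures: int = 10,
        base_lockout: float = 60.0,
        max_lockout: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.base_lockout = base_lockout
        self.max_lockout = max_lockout
        self.clock = clock
        self.failures = 0        # wrong guesses since the last lockout
        self.lockouts = 0        # lockouts since the last successful login
        self.locked_until = 0.0  # clock time at which guessing may resume

    def attempt(self, password_ok: bool) -> str:
        """Record one login attempt; return "locked", "ok", or "wrong"."""
        now = self.clock()
        if now < self.locked_until:
            return "locked"  # reject without even checking the password
        if password_ok:
            self.failures = 0
            self.lockouts = 0
            return "ok"
        self.failures += 1
        if self.failures >= self.max_failures:
            delay = min(self.base_lockout * 2 ** self.lockouts, self.max_lockout)
            self.locked_until = now + delay
            self.lockouts += 1
            self.failures = 0
        return "wrong"


# Simulate time with a fake clock instead of really waiting.
fake_now = [0.0]
throttle = LoginThrottle(clock=lambda: fake_now[0])
for _ in range(10):
    throttle.attempt(False)    # the 10th wrong guess starts a 60 s lockout
print(throttle.attempt(True))  # locked, even with the right password
fake_now[0] = 61.0
print(throttle.attempt(True))  # ok
```

Refusing every attempt during a lockout, even a correct one, is what actually caps the adversary's guessing rate; the cost is that an absent-minded owner gets locked out too.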
And by more risky, I mean if this is like a Hollywood moment, and someone's just stolen the phone from your table at Starbucks or some other coffee shop and plugged it into their laptop, trying desperately to crack into it before you come back to the table, well, by slowing them down, you significantly increase the risk that they are caught in the act. And hopefully, too, the goal is just to get them to lose interest in your phone, lose interest in your account, and move on to, ideally, no one else's, but at least, barring that, someone else's instead of yours.

So what are other defenses against these kinds of brute force attacks, or even these dictionary attacks? Well, there's a technology that you and I are increasingly able to turn on, and increasingly required to turn on as well, which is, in general, probably a good thing: two-factor authentication, or 2FA, more generally known as multifactor authentication. With it, in addition to one factor that you use to log in, like your password, as is tradition, you also have a second factor, or maybe more, that you must use in order to log in. But these factors don't generally mean one password, two passwords, three passwords, or the like.
They're fundamentally different types of factors. And in general, they're broken down into these three categories. One is a knowledge category.
A knowledge factor is just something like your password that, ideally, you keep secret and no one else knows, and that's why it enables you to authenticate yourself, to prove that you are you, because you and only you, hopefully, have that knowledge. But a second type of factor would be a possession factor, something that you physically have. So you might be in the habit at work of carrying around one of those little key fobs that has a little code on it that changes. Now, those things can be expensive. So increasingly the world is just using our own phones, your own Android phone, your own iPhone, that maybe has SMS support on it, text messaging, or maybe a specific app that displays a short code that you type in.
The presumption is that if you challenge the user not only for a knowledge factor, like their password, but also for a second factor, like something they possess, you significantly decrease the probability that an adversary is going to be able to get into that account. Because whereas anyone on the internet, millions of people, can be a threat to you by just figuring out or finding your password, a possession factor really narrows the scope of the threat to only the other customers in Starbucks or the coffee shop, only the other people physically near you, because they would have to physically obtain that possession factor. And then a third type of factor nowadays might be an inherence factor, something that is unique to you specifically, more generally described as biometrics.
So maybe it's your fingerprints. Maybe it is your face nowadays. Something that's inherent to you can be a third factor nowadays because the presumption is that only you, ideally, in the world have exactly that factor as well.
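The three categories above can be modeled directly, which makes one point precise: what makes authentication multifactor is the number of different categories involved, not the number of prompts. A minimal Python sketch; `Factor` and `is_multifactor` are hypothetical names for illustration, not any real library's API.

```python
from enum import Enum
from typing import Iterable


class Factor(Enum):
    KNOWLEDGE = "something you know"   # e.g., a password or PIN
    POSSESSION = "something you have"  # e.g., a phone or key fob
    INHERENCE = "something you are"    # e.g., a fingerprint or face


def is_multifactor(factors_used: Iterable[Factor]) -> bool:
    """True only if at least two different categories of factor were used."""
    return len(set(factors_used)) >= 2


# Two passwords are two steps, but still only one category of factor.
print(is_multifactor([Factor.KNOWLEDGE, Factor.KNOWLEDGE]))   # False
print(is_multifactor([Factor.KNOWLEDGE, Factor.POSSESSION]))  # True
```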
Now, this is a little different from what some companies, some websites, some apps describe as two-step authentication, where two steps might actually just be two passwords of some sort. But two-factor authentication, more technically, refers to using two (or, more generally, multiple) of these fundamentally different types of factors, two being the most common in our case here. Now, when it comes to those possession factors, those key fobs or the apps or the codes that you receive, what you're receiving in those models is generally known as a One-Time Password, or OTP. The idea is that this is not a password that you know and keep remembering and keep using again and again. It's literally used one time, because it's texted to you, or it's sent to an app via push notification, or it's actually displayed on something on your keychain, like this here key fob, for instance. Nowadays, your company can buy these.
And what happens is, on the screen here, this one-time password constantly changes every few seconds. And it's synchronized somehow with a server, so that the presumption is, if I am carrying around this device and I type in this particular code when prompted, and that code matches the synchronized code that's on the server, I should be allowed into the account, because the presumption is that it's indeed David carrying this around and not some adversary. It might also be possible to plug it in via USB or some other technology, thereby removing the human from the equation so that the device itself can just authenticate using special software on the system instead.
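The device and server are "synchronized somehow." One widely used, standardized way to do that is the time-based one-time password algorithm (TOTP, RFC 6238), built on HOTP (RFC 4226). The device and the server each hold a copy of the same secret key and each compute the code from the current time, so they agree without talking to each other as long as their clocks roughly agree. The sketch below assumes HMAC-SHA1, 30-second steps, and six digits, the common defaults; whether any particular key fob works this way is an assumption, since some vendors use their own algorithms.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password, as specified in RFC 4226."""
    message = struct.pack(">Q", counter)  # counter as 8 big-endian bytes
    digest = hmac.new(secret, message, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F            # "dynamic truncation" offset
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP of the current time step."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify(secret: bytes, submitted: str, at: Optional[float] = None,
           step: int = 30, window: int = 1) -> bool:
    """Server-side check allowing +/- `window` time steps of clock drift."""
    now = time.time() if at is None else at
    current = int(now // step)
    for counter in range(current - window, current + window + 1):
        if counter >= 0 and hmac.compare_digest(
            hotp(secret, counter, len(submitted)), submitted
        ):
            return True
    return False


secret = b"12345678901234567890"     # RFC test key; real keys are random
print(totp(secret, at=59, digits=8))  # 94287082, per RFC 6238's test vectors
print(verify(secret, totp(secret)))   # True
```

The `window` parameter trades a little security for usability: accepting adjacent time steps tolerates clock drift, but it also means each code stays valid a bit longer.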
Nowadays, though, you can download special apps, whether it's one from Google or other companies, that allow you to manage, all in one place, all of these one-time passwords that you might automatically see updating on the screen. And you can type any or all of them in when you're actually prompted by a website or app. But even in this space of one-time passwords and possession factors, it's worth keeping in mind that some of these technologies are more secure than others. Now, it's very common for websites or apps nowadays to want to send you one of these one-time passwords via text message, for instance.
And you receive it via SMS. And then you can type in that six-digit code, as is often the case. More secure, though, would be something like an actual app that you install from the App Store or the Google Play Store, one that generates the code on the device itself or talks directly to some server over the internet, and does not rely on the cellular phone network's text messaging. Why is that? Well, as you might know, in your phone is typically a SIM card, either a physical card, a little chip, or nowadays it might actually be built into the phone as well.
But that SIM card has a unique identifier. And when you sign up for phone service, typically with a company, they need to know the unique identifier of your SIM card, be it a physical card or one hardwired into your device. Why? Because that's how they associate your phone number with that specific device.
The catch is that it's all too possible, and in some cases all too easy, to trick even the phone companies into swapping your SIM, not necessarily physically per se, but by convincing the mobile phone carrier to update its system to say, oh, David now has this SIM card and not that original one. That is to say, if I'm an adversary and I just have any old phone with any old SIM card, and I figure out what its unique ID is, and maybe I call up David's mobile phone provider and somehow convince them that I am David, as by telling them all of that personal information about him, I might be able to convince them to swap my SIM card, the adversary's, for the one that is already on file. The implication is that when David subsequently gets text messages, they don't actually go to me, the real David. They go to the adversary's phone instead, because they're tied to that new SIM card. So in general nowadays, if a website or app gives you a choice between SMS, that is, text-based messaging, and a native application that you install onto your phone or other device, you should generally prefer the latter, some first-class piece of software that uses push notifications or your data plan and does not rely on SMS text messaging, because of this potential threat.
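To make the SIM-swap threat concrete, here is a deliberately simplified Python model. `ToyCarrier`, the phone number, and the SIM IDs are all hypothetical, and real carriers' systems are far more complex; the model only captures the one fact that matters here, namely that a text message follows the carrier's record of which SIM owns a number, not the physical phone the real owner is holding.

```python
class ToyCarrier:
    """Toy model of a mobile carrier's records (hypothetical, greatly simplified).

    Each phone number maps to the ID of one SIM; a text message sent to a
    number is delivered to whichever SIM is currently on file.
    """

    def __init__(self) -> None:
        self.sim_for_number = {}  # phone number -> SIM ID
        self.inbox = {}           # SIM ID -> list of messages received

    def activate(self, number: str, sim_id: str) -> None:
        # Whoever convinces the carrier to run this controls the number.
        self.sim_for_number[number] = sim_id
        self.inbox.setdefault(sim_id, [])

    def send_sms(self, number: str, text: str) -> None:
        self.inbox[self.sim_for_number[number]].append(text)


carrier = ToyCarrier()
carrier.activate("+1-555-0100", "SIM-DAVID")
carrier.send_sms("+1-555-0100", "Your code is 123456")

# After one convincing phone call, the carrier "updates its records"...
carrier.activate("+1-555-0100", "SIM-ADVERSARY")
carrier.send_sms("+1-555-0100", "Your code is 654321")

print(carrier.inbox["SIM-DAVID"])      # ['Your code is 123456']
print(carrier.inbox["SIM-ADVERSARY"])  # ['Your code is 654321']
```

Nothing in this model checks who is holding the original phone, which is why convincing the carrier is enough.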
So what are still other threats when it comes to these systems? Well, it turns out that it's very possible, unfortunately, for adversaries to somehow get software, malicious software, otherwise known as malware, onto your Mac, onto your PC, perhaps even onto your phone. This might be because you installed a piece of software that you shouldn't have trusted. This might be because your phone or your device is infected with something like a virus or a worm.
But in general, you might be vulnerable to malware, including software that logs all of your keystrokes. Key logging refers to exactly that: some piece of software, most likely malicious, that is literally recording everything you type or everything you tap into that device. And what does this software do with those keystrokes? Very often it will upload them, if there's an internet connection, maybe to a server that the adversary controls.

Now, what's the implication of this key logging threat? Well, if you're typing in your username, that's not such a big deal, because usernames are generally public. But if you're typing in your password, and that's being automatically uploaded to the adversary's server, now they know your username and your password. Worse, suppose the adversary also sees you typing in that six-digit code, the one-time password that you might have received on your phone or some other device. If they are fast enough and smart enough to log what you're typing and send it to their server, perhaps even before you yourself hit Enter, maybe they can use not only your username and your password but even that one-time password. They could pretend to be you on their own phone or their own laptop or desktop, typing in, or more realistically having software automatically type in, those same values, and access your account before you've even had a chance to do so yourself.
Now, what are the defenses against that particular threat? Really, just to be generally paranoid about what computers you yourself use. For instance, nowadays I will rarely, if ever, actually use an internet cafe's computer or even a lab computer here on Harvard's campus, or frankly, even a friend's computer, because I don't know just how safe they are when it comes to best practices for using a device on the internet. I will, in general, only log into websites and apps on my own personal devices, which isn't to say that I, too, am perfect, but at least I'm reducing the probability that I lose control over my data by using some other device that I myself don't oversee and whose owner might not be adhering to best practices. Now, even then, I will admit that logging keystrokes, getting them up to an adversary's server, and inputting them faster than you can is a pretty sophisticated and difficult attack. But it's worth keeping in mind that these attacks are certainly possible, at least in theory. And if you yourself are targeted for some reason, these are absolutely things you should be mindful of.
So in general, if you have the luxury of only using your own device and not some shared device, that, too, tends to be best practice, I would say. Any questions on these here attacks? AUDIENCE: Regarding using long passwords, do you recommend using Google passwords or Apple passwords for the system to remember the passwords for us? DAVID MALAN: A really good question. Short answer, yes, but we'll come to that in more detail in just a few minutes. So what are other attacks that we should be mindful of? So this one has kind of a funny name, but there's this attack known as credential stuffing. And we've come across a reference to this already in this discussion thus far. Credential stuffing (a credential is something like a username and password) refers to the process of an adversary having found a whole bunch of usernames and passwords, maybe online, maybe in some database that they or someone else attacked and posted for the whole world to download.
Credential stuffing means not using dictionaries, not using brute force, but just literally using a list of already known usernames and passwords, maybe from some other application or website, and trying to stuff them into a different website to see if, well, if David's using this username and password over here, he's probably using the same username and password over there. So credential stuffing is the threat that, I daresay, many of you are vulnerable to. Now, you don't need to raise your hands and admit to this right now. But if you are using the same username and the same password on 2 websites, 3 websites, 30 websites, all websites, you are vulnerable to this attack today.
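Credential stuffing needs no cleverness at all, just replaying pairs that worked somewhere else. A toy Python sketch of the idea; `ToySite`, `stuff_credentials`, and all of the accounts are hypothetical, and passwords are stored in plaintext only to keep the example short (a real site should never do that).

```python
class ToySite:
    """A hypothetical website with its own user accounts.

    Passwords are kept in plaintext only to keep this toy short;
    a real site should never store them that way.
    """

    def __init__(self, name: str, accounts: dict) -> None:
        self.name = name
        self.accounts = accounts  # username -> password

    def login(self, username: str, password: str) -> bool:
        return self.accounts.get(username) == password


def stuff_credentials(site: ToySite, leaked: list) -> list:
    """Replay leaked (username, password) pairs as-is; no guessing involved."""
    return [user for user, password in leaked if site.login(user, password)]


# Credentials leaked from some *other* breached site.
leaked = [("david@example.com", "hunter2"), ("alice@example.com", "correct horse")]

# On this site, David reused his password; Alice did not.
shop = ToySite("shop", {"david@example.com": "hunter2",
                        "alice@example.com": "T9#vq!r2Lp"})
print(stuff_credentials(shop, leaked))  # ['david@example.com']
```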
To be clear, if any one of those websites or apps is compromised by some adversary and they figure out all of the usernames and passwords on that system, what a smart adversary is going to do next is try that same username and password they found for you on Amazon, on Gmail, on any other website or app that you, with high probability, might be using just because those services are popular. So what's the takeaway? Ideally, if you want to be immune to this kind of credential stuffing attack, where someone takes your credentials over here and tries to stuff them into these other services over here, you have to use different credentials on each and every website, on each and every app. You cannot, should not, be reusing the same password on multiple websites or apps. Username? Yes, especially if it's your email address. But passwords, no. Now, this is admittedly easier said than done.
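Remembering dozens of unique passwords is the hard part; generating them is easy. A minimal sketch using Python's `secrets` module, which is designed for security-sensitive randomness, assuming a 20-character length and an alphabet of ASCII letters, digits, and punctuation (both arbitrary choices here). The `vault` dictionary is a hypothetical stand-in for wherever the passwords would actually be stored.

```python
import secrets
import string

# Assumed character set: ASCII letters, digits, and punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Return a random password chosen with a cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# One independent password per site, so a leak at one site reveals
# nothing that would let an adversary stuff credentials into the others.
sites = ["amazon.com", "gmail.com", "paypal.com"]
vault = {site: generate_password() for site in sites}

for site, password in vault.items():
    print(site, len(password))  # print lengths, not the secrets themselves
```

Some sites restrict which punctuation they accept, so in practice you may need to adjust `ALPHABET`.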
So we'll see soon how we can try to achieve this, avoiding credential stuffing by having unique passwords, by at least having some help when it comes to managing the same. But there's another attack too that's come up indirectly here already known as social engineering. Social engineering isn't a technical attack per se, but rather a social one, an attack among humans. For instance, let me go ahead and suggest the following. If you have a piece of paper near you and a pen or pencil, go ahead and write down, if you could, on that piece of paper one of your passwords.
Any of them is fine. Just go ahead on this piece of paper in front of you and write down one of your passwords, including any letters or digits or punctuation. Now, I'm seeing in the chat some resistance. I'm seeing some heads down, though, and some scribbling, which is exactly the point. Why would you take my suggestion and write down your password on a piece of paper, even though I'm presumably someone you should trust in a cybersecurity class? Those of you who reached for a pen or pencil and just wrote down one of your passwords on a sheet of paper were just socially engineered, because a circumstance was created where you believed or trusted the person who was asking or telling you to do something, and you took at face value that you should do that thing. Moving forward, if any teacher ever asks you to write down a password on a piece of paper, one takeaway for today is just don't do it.
That would be social engineering. And in general, if someone calls you on the phone or sends you an email and tries to get information from you, even if it seems and sounds legitimate, moving forward, especially after today, you should always have a healthy skepticism, if not just enough paranoia to be healthy, in the interest of protecting your accounts. If someone asks you something like that, or something even a little more nefarious, like trying to figure out what your first pet was, that should kind of perk your ears up. Your Spidey senses should go up, so to speak, in the context of Spider-Man. And you should wonder, wait a minute, why do they need that information? Let me see how this plays out before I share anything about myself. And indeed, if you were duped, bless your hearts.
But if you did jot down your password with a pen or pencil, remember that feeling of being duped, because you do not want that to happen when it actually matters. Now, after this, I know you'll believe nothing I say. But go shred or tear up or flush whatever piece of paper has that password on it. The point was not to share it with anyone, just to prove that particular point. Now, beyond social engineering, there's another threat that's a variant of it but is more technical in nature.
And that is phishing. And most of you have probably heard about phishing in this context. And you can think about it sort of physically, like going fishing, but trying to hook a sucker, someone like me or you who is duped into providing information that they shouldn't. And this very often happens via email. Odds are, if you go through your spam folder sometime, you will see emails that seem to be coming from paypal.com or seem to be coming from Google or maybe a politician or the like. And very often those emails are encouraging you to click a link and maybe make a donation, click a link and maybe change your password, click a link and verify your information.

Phishing is all about trying to use social engineering, in this case in a technical way, to convince you through very convincing-looking emails and even websites that it is a legitimate email from paypal.com, or a legitimate email from a politician, or a legitimate email or request from a teacher here at Harvard. But it's not. What they're trying to do, the adversaries in this case, is prey on your trust in those particular companies or persons. They're trying to prey on your comfort with familiar user interfaces, things that you've seen before. But unfortunately, it's all too easy for an adversary to make a very official-looking email, to make a very legitimate-looking website, even one that looks identical to paypal.com, identical to Gmail, or other services. And frankly, if you take some of CS50's other courses, it kind of boils down to copy and paste in the simplest of scenarios, just copying and pasting some legitimate website and pretending that you own it too.

So how might phishing manifest itself in the real world? Well, consider one of those social media posts online that tend to invite you to comment with your favorite song from childhood. Sometimes those posts have a million responses from people you know and people you don't even know. But more than being interested in what your childhood favorite song was, those posts are very often phishing for personal information, because suppose that you, or at least someone among those commenters, is actually using their favorite childhood song as their answer to some website's or app's secret question. Now the author of that post, not to mention everyone else, knows the answer to the same.
So how else might phishing manifest itself in the real world? If you were to visit a screen later today that looks a little something like this, well, this looks like Gmail's login page, at least here in the US when using English. And frankly, I've seen this so many times that I might be inclined to just blindly type my email address into the form and then, after that, probably my password. But it's important to begin to develop an intuition or a suspicion for when these sites might not be legitimate. How might you do that? Well, you should minimally be looking at the URL bar and making sure that it is gmail.com, or probably google.com, or google dot whatever your country code is, depending on where you live in the world, making sure that it looks legitimate and that it's a site you've actually been to before.
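Reading the URL bar carefully mostly comes down to finding the real hostname and checking that it ends in the domain you expect, not merely that it contains it. A minimal Python sketch of that check using the standard library's `urllib.parse.urlsplit`; `is_on_domain` is a hypothetical helper and the example URLs (such as `login-check.example`) are made up. It deliberately ignores harder cases, such as look-alike Unicode characters or country-code variants.

```python
from urllib.parse import urlsplit


def is_on_domain(url: str, domain: str) -> bool:
    """True if the URL's host is exactly `domain` or a subdomain of it."""
    host = (urlsplit(url).hostname or "").rstrip(".")  # hostname is lowercased
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


print(is_on_domain("https://mail.google.com/mail/", "google.com"))            # True
print(is_on_domain("https://google.com.login-check.example/", "google.com"))  # False
print(is_on_domain("https://accounts-google.com/", "google.com"))             # False
print(is_on_domain("https://google.com@evil.example/", "google.com"))         # False
```

The last case is a classic trick: everything before the `@` is treated as login information, so the browser actually goes to `evil.example`.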
When you hover over links, you can very commonly look at the bottom left-hand corner, or some corner, of your browser window, and you can see what URL a link will actually take you to. Even though the words on the screen might say one thing, the actual link might take you somewhere else. Now, even then, it's hard sometimes to discern these kinds of things. But these are just best practices. You don't need to be so worried that you don't go anywhere on the internet.
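Hovering compares what a link says with where it goes. The same comparison can be sketched in Python with the standard library's `html.parser`, flagging links in an email's HTML whose visible text looks like one web address while the `href` points to a different host. `LinkAuditor` and the sample email are hypothetical, and this is a crude heuristic: it only examines link text that looks like an address, and it will flag harmless differences such as `www.paypal.com` versus `paypal.com`.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkAuditor(HTMLParser):
    """Collect links whose visible text names a different host than the href."""

    def __init__(self) -> None:
        super().__init__()
        self.current_href = None
        self.current_text = []
        self.suspicious = []  # list of (visible text, actual href)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.current_href = dict(attrs).get("href")
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            text = "".join(self.current_text).strip()
            # Only compare when the visible text itself looks like an address.
            if "." in text and " " not in text:
                shown = urlsplit(text if "://" in text else "https://" + text).hostname
                actual = urlsplit(self.current_href).hostname
                if shown != actual:
                    self.suspicious.append((text, self.current_href))
            self.current_href = None


email_html = """
<p>Please verify your account at
<a href="https://paypal.com.account-verify.example/login">paypal.com</a>.</p>
<p>Or visit <a href="https://www.example.org/">www.example.org</a>.</p>
"""
auditor = LinkAuditor()
auditor.feed(email_html)
print(auditor.suspicious)
# [('paypal.com', 'https://paypal.com.account-verify.example/login')]
```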
But you should learn to keep an eye out for these kinds of things. And in general, with phishing, rather than trusting any link in an email that you receive, especially when it's about something private, like a bank account, something medical, something personal, open a new tab and manually go to paypal.com, Enter, or manually go to gmail.com, Enter. Don't just blindly trust these links.
Now, here again, we see a trade-off between usability and security. It's a little annoying if I can't just click on a link and go to the place I want to go. You have to manually open the page, type it in manually, and so forth. But again, it depends on what's more important to you, the usability of that service or the security of your account therein. This is even more worrisome when it comes to two-step verification.
And Google takes some liberties with the wording here. This is usually better described as two-factor authentication. But again, suppose the most sophisticated of adversaries sent you a phishing email and tricked you into visiting a website that looks like Gmail but is not Gmail.
They could theoretically even prompt you for a two-factor code like this. And then, if they're smart and savvy enough with code, they could automatically send your username, your password, and your two-factor code to the real gmail.com, log into your account, and change your password before you've even gotten up and running therein. It's a more sophisticated threat, and it's not something you need to worry about as much. But it's this principle of not just trusting screens and requests that are presented in front of you.
You should have this healthy skepticism and at least some technical savvy to know how you can decide for yourself, yes, I am comfortable proceeding with this step. Now, there's another type of attack, too, that's more sophisticated and not one you need to worry about as frequently as some of the earlier ones. It's generally known as a machine-in-the-middle attack. If you're on the internet, there are, suffice it to say, very often many other machines between you and whatever website or app you're visiting. Often those machines might be things like routers or servers that internet service providers, companies, universities, maybe even your own home, own and control. But all of your data is passing through those machines in the middle, so to speak.
If any of them are malicious and are maybe storing your data, looking at your data, it's possible that you m