Internet Technologies - Computer Science for Business Professionals - by CS50 at Harvard
So, odds are you're on the Internet these days, but what does that actually mean? Indeed, this Internet that we use very often these days for messaging, for email, for browsing the web and other services has a whole infrastructure that underlies it, one that is increasingly powering new ideas, new startups, new companies, new businesses, as well as new forms of communication among humans. And yet, like most every topic we've explored, you'll realize that while it's very complex up here, or certainly seems complex up here, if we begin with some of the fundamentals and then layer and layer and layer on top of those, we pretty quickly get back to today's technology, but with a much better understanding of what's going on from the ground up.

So here's a bit of alphabet soup. Odds are you might have seen one or more of these acronyms to date: IP, DHCP, DNS, TCP, UDP, ICMP, and so many more. These are all examples of something called protocols, where protocols are kind of like languages that computers speak with one another. They're not programming languages, so they're not used by humans to make computers do things or follow instructions per se. A protocol is really a set of conventions that two computers, or two computer programs, might use when intercommunicating.

So what's an example of a protocol in the real world? Well, we humans have some silly protocols, one of which, culturally, is when you meet someone to extend your hand, and then he or she presumably extends their hand, and you do this for who knows what reason, and now you've sort of completed that social transaction. But it's a protocol in the sense that when I extend my hand, most any polite
other person knows that they're probably supposed to extend their hand as well, embrace for a moment, and then complete. And the protocol says, too, you probably shouldn't do this for terribly long. So there are these rules of thumb, or actual rules, that you follow when implementing protocols. And so computers, great as they are at following rules, very often use protocols when they intercommunicate in order to get data from one place to another.

So let's tell exactly that story. If you're on the Internet right now, what does that actually mean, and how can it help us solve problems, ultimately, having access to this internetworked infrastructure? Well, let's consider what happens when I first visit my favorite web page, for instance. If I go ahead and visit something like facebook.com, I go ahead and log in, and I'm immediately presented with my news feed. Or maybe your favorite website is Gmail, or Bing, or any number of other places you might go on the web, all of which take in as input a request from you and produce, ultimately, output: the screen that you ultimately see. But how does that data get from one location to another?

Let's begin to draw a picture, perhaps. This picture might be representative of your own home network, or maybe your campus network, or maybe your office network. But generally speaking, you are on the Internet, maybe with your phone, your laptop, or your desktop device, and we'll just depict that as this sort of abstract laptop here. So that laptop somehow wants to communicate with a web server elsewhere, Facebook, Google, Bing, whatever, and we're just going to represent that as way over here in the picture, in a really big corporate office building, perhaps. And inside of that building are the servers that represent that
website. But how do I get data from that server, which, if it's Google or somewhere else, might be all the way in California or halfway across the world, and back to my laptop? Well, somehow I have to be able to send messages to it and receive messages from it. And of course, in between me and this resulting website is what we'll generally call the Internet. It's kind of conveniently drawn as a cloud here, which is another semi-technical term that's come into vogue in recent years. The cloud really just refers to Internet services these days; it's not a technical term unto itself, it's just a sexier term than saying my business is on the Internet. That's an oversimplification, and we'll come back to that before long, but you can assume, here, that the Internet is somehow this delivery mechanism. It somehow gets data from point A to point B and back.

But how does that work? If my data is coming in as input, and it's eventually reaching its destination, and then a response is coming back in this direction, what's actually going on underneath the hood there, especially since, in this story at hand, all I've typed is something like facebook.com or gmail.com or the like? Well, it turns out that these days, when you first turn your computer on and you connect to, like, the Wi-Fi in a room, or you connect with an Ethernet cable to the wired network, your computer receives some information automatically. Your computer speaks a protocol called DHCP, typically: Dynamic Host Configuration Protocol. But in most of these cases the acronym isn't really what's important; certainly it's what the protocol itself does. And in this case, this Dynamic Host Configuration Protocol dynamically configures hosts via a protocol, if you will. So what does this mean, essentially?
DHCP says this: when you turn on your computer, or you take out your phone for the first time and you're connected on Wi-Fi or to a wired network, it says, hello world, I am alive, I would like to be given an address so that I can communicate with other computers on the Internet. It's not quite that verbose, perhaps, but it is a question: hey, computers around me, please give me an address. And what it gives you is what's called an IP address, for Internet Protocol. So just as in the real world, where physical buildings have historically been uniquely addressed with postal addresses, like Harvard's computer science building is at 33 Oxford Street, Cambridge, Massachusetts, USA, 02138 being the more precise zip code as well, and that uniquely identifies that building in the world, so does my computer need an address. And it's not gonna be some free-form address like that in words; it's actually gonna be a numeric address. Specifically, I'm gonna get an IP address of the form number dot number dot number dot number: four numbers separated by dots.

Each of those four numbers happens to be a byte long, or eight bits, so each of these numbers is therefore between 0 and 255. This means, long story short, that the total address is 32 bits, 8 plus 8 plus 8 plus 8, and that means there are roughly 4 billion possible addresses in the world. That's great, because people have got a lot of computers and a lot of laptops and a lot of desktops and servers these days. But it turns out we're actually running out, because we have so many such devices. So there's a newer version of IP that's increasingly being used, called IP version 6. We're talking here about IP version 4, since it's so omnipresent, and IP version 6, just so you know, uses 128 bits for its addresses, way more than 32, so we'll be good to go for some time.
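The arithmetic behind "four numbers, one byte each, 32 bits in total" can be checked with a few lines of Python. This is only an illustrative sketch: the function name `dotted_to_int` is made up for this example, and the address 5.6.7.8 is the same placeholder "from" address used on the envelope in this lecture.

```python
import ipaddress


def dotted_to_int(address: str) -> int:
    """Pack a dotted IPv4 address into one 32-bit integer, one byte per number."""
    parts = [int(piece) for piece in address.split(".")]
    if len(parts) != 4 or any(not 0 <= part <= 255 for part in parts):
        raise ValueError(f"not an IPv4 address: {address!r}")
    value = 0
    for part in parts:
        # Shift what we have so far left by 8 bits, then add the next byte.
        value = (value << 8) | part
    return value


# The standard library's ipaddress module agrees with the hand-rolled packing.
assert dotted_to_int("5.6.7.8") == int(ipaddress.IPv4Address("5.6.7.8"))

print(dotted_to_int("5.6.7.8"))  # 84281096
print(2 ** 32)                   # 4294967296 possible IPv4 addresses, roughly 4 billion
print(2 ** 128)                  # the size of the IPv6 address space
```

The "roughly 4 billion" figure is simply 2 to the 32nd power; IPv6's 128 bits give 2 to the 128th.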
DHCP gives me this address, an IP address of the form something dot something dot something dot something, and the purpose of this address is to help my data get from point A to point B. Indeed, anytime my computer sends a request out on the Internet, like, Facebook, please show me my news feed, or Gmail, please show me my inbox, my computer has to use that IP address. So, much like sending a letter in the real world, you might have an otherwise blank envelope, and you might want to send a message to somewhere else in the world. You might write their physical address, but in the computer world we might write something like 1.2.3.4 in the "to" field, assuming that this is the IP address to which we want to send this data. Meanwhile, my "from" address might be 5.6.7.8, so I'll write that in the top left-hand corner, by convention, whereby that indicates to the whole Internet: this is where this request came from.

Now, I know my origin address, the source address here at top left, because DHCP told me. But how do I know 1.2.3.4? How do I know the IP address of facebook.com or gmail.com? Right, we don't live in the world of 800 numbers anymore, where you dial 1-800-843-9166. [A DNS server can] answer that question and say, oh, facebook.com, it's 1.2.3.4;
use that address instead. Now, thankfully, my computer can write that number on its virtual envelope, so to speak, and then pass that envelope out to the Internet, and because of these numeric addresses it will be properly, hopefully, routed across the Internet to its destination. Because it turns out that inside of the Internet here, interconnecting everything in between point A and B, are things called routers, or gateways. I could draw this picture in any number of ways, but the point is that it's just so darn interconnected, and indeed there might be even more pathways still, or maybe even fewer pathways. On the Internet there are often multiple ways for data to get from one point to another, some shorter, some longer, but there's this resilience, this redundancy. This was a feature back in the day, especially insofar as the Internet had militaristic origins: it was meant to be redundant so as to withstand failures of one or more of these nodes, these dots in the picture.

Now, each of these dots is just a server, really, a special server called a router or gateway, whose purpose in life is to do exactly that: to route data. Upon receiving a virtual envelope like that one, it looks at the "to" address and realizes, ooh, this is destined for 1.2.3.4; I know that that address is over this way. Meanwhile, if it gets another envelope from someone else, it might say, ooh, this is some other address; it's gonna go this way. And so routers have multiple cables, or they have multiple virtual network connections elsewhere, or wireless connections; any number of possible connections might they have to other routers. So a router can route the envelope to its next hop, so to speak, and generally, on the Internet, within 30 hops, within 30 transmissions from router to router to router, will
your data get from one point to another. It might not follow the same path each time, but it will traverse this so-called Internet. And so that's kind of what the Internet is: it's this collection of routers and this collection of networks, a network of networks, that is incredibly interconnected in different ways.

So DHCP gives me an IP address, so I have a unique IP address. DHCP, it turns out, also tells me what the IP address is of my local DNS server, so I know whom to ask to convert domain names to IP addresses. But once I have that, I can now use a protocol called TCP to send my data, ideally reliably, typically, from one point to another. So whereas IP is responsible for a few things, one of its most important functions is this notion of addressing, and standardizing how things are addressed. As for TCP, one of its most salient features is to guarantee, with high probability, delivery. What I mean by that is that bad stuff can happen in the middle of the Internet. These routers can get really busy; they can get really congested and overloaded, and so routers might literally, well, virtually, drop packets. They might receive so many packets at once that they just can't, like a human, deal with it all at one time, because they have a finite amount of memory, or RAM, or disk space, and so they drop them, so to speak: they just delete them, and they expect the senders to resend them. TCP is a protocol, another agreement between computers, that if the receiving computer realizes, I got some of your packets but not all of them, TCP
mandates, much like our human handshake, that something next should happen. TCP says my laptop should retransmit that virtual envelope.

But TCP allows us to do something more than guarantee, with high probability, delivery of data. It also allows us to multiplex among services or, put more simply, it allows a server to receive different types of data for different types of services: for instance, web services on the server, email services, chat services, and the like. And so it turns out that on this virtual envelope that gets sent from a computer to a server, it's actually not sufficient for there to be the return address and the IP address of the destination. I also need to specify what type of information is inside this envelope or, equivalently, what kind of service I'm trying to contact. I could do this by specifying in words what's inside this envelope; maybe it's something like HTTP, the prefix that you're familiar with from the web, maybe it's an email, maybe it's a chat message or the like. But if it is in fact something like HTTP, it turns out the convention is not to use words but to use numbers. So in fact I need to put one other piece of information on this envelope, which is a so-called port number, a TCP port number, which is numerically printed after a colon on a virtual envelope like this. In this case I wrote 80, because 80 happens to be, by human convention, the number we humans agreed on some years ago identifies web services on servers. This means that if the server I'm sending this to, 1.2.3.4, actually has other services on it, like a chat server and email server and the like, this won't get confused with an email that I or someone else is sending to the server, or a chat message. The server will know upon receipt of this, oh, this is a request for a web page; let me send this virtual envelope to the web server. But TCP isn't the only such protocol.
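The role of the port number on the envelope can be sketched as a toy model in Python. Everything here is hypothetical and for illustration only: the `Envelope` class, the `SERVICES` table, and the `deliver` function are invented names, not part of any real networking library. Port 80 for web traffic comes from the lecture; port 25 for email is the conventional SMTP port, which the lecture does not mention.

```python
from dataclasses import dataclass


@dataclass
class Envelope:
    """A toy stand-in for a packet: addresses, a port number, and a message."""
    source_ip: str
    dest_ip: str
    dest_port: int
    payload: str


# Hypothetical table of which service on one server listens on which port.
# 80 is the web convention from the lecture; 25 (SMTP) is an added example.
SERVICES = {80: "web server", 25: "email server"}


def deliver(envelope: Envelope) -> str:
    """Decide which service on the receiving machine should get the envelope."""
    service = SERVICES.get(envelope.dest_port)
    if service is None:
        return "no service listening on that port"
    return service


request = Envelope("5.6.7.8", "1.2.3.4", 80, "please show me a web page")
print(deliver(request))  # web server
```

The design point is that the destination address alone picks the machine, while the port number picks the program on that machine, so one server can host web, email, and chat services without confusing their traffic.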
There's something called UDP, which is common in some circles as well. UDP works a little differently, insofar as its feature is to not guarantee delivery. If some data gets lost, packets get dropped, so to speak, for whatever reason, malfunction, technical difficulties, routers are overloaded, UDP says our protocol shall be not to retransmit that data. That's a strange thing, because it sounds worse, and yet this protocol has been around for quite some time and is still used, quite appropriately, in some contexts.
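The difference in policy between the two protocols can be sketched with a toy model. This is not how real TCP or UDP are implemented; it assumes, purely for illustration, that each piece of data carries a simple number so the receiver can tell what is missing, and the function names `tcp_like` and `udp_like` are invented for this example.

```python
def missing_parts(received: dict[int, str], total: int) -> list[int]:
    """List the part numbers (1..total) that never arrived."""
    return [number for number in range(1, total + 1) if number not in received]


def tcp_like(received: dict[int, str], total: int) -> list[int]:
    """TCP-style policy: report which parts the sender should retransmit."""
    return missing_parts(received, total)


def udp_like(received: dict[int, str], total: int) -> str:
    """UDP-style policy: carry on with whatever arrived, skipping the gaps."""
    return "".join(received[number] for number in sorted(received))


# Four parts were sent, but part 3 was dropped along the way.
arrived = {1: "AB", 2: "CD", 4: "GH"}
print(tcp_like(arrived, 4))  # [3]
print(udp_like(arrived, 4))  # ABCDGH
```

With the TCP-style policy the receiver waits until part 3 has been resent; with the UDP-style policy it moves on with an incomplete result, which is the trade-off at issue here.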
But in what context would you actually want to just forge ahead irrespective of getting complete information? Well, a go-to here is something like video conferencing, or audio conferencing, or live TV on the Internet. Watching a game, like a football game, for instance, if you want to watch it in real time, you might prefer that the stream, the bits that are coming from the NFL or wherever to your computer, don't actually buffer, don't actually stall. You would rather miss a second so that at least you stay current, in real time, with that game. Or video conferencing, even more so: it'd be kind of annoying if you have a bad connection, or some packets get dropped, and you just have to wait and wait for the person's voice or image to be retransmitted. You'd rather just say, what did you say? Could you repeat yourself? Say again? You could just use human protocols to deal with that too. So sometimes you have live streaming applications for whatever purpose, and you want the data just to keep coming. As much of it as can make it through is great, but you don't necessarily want it to be resent.

So data is going from one point to another, but how long does all this take? My god, this is kind of a long story just to get data there. Well, let's do an experiment. Let's go ahead and pull up a program that uses a different protocol altogether, ICMP, and there are other protocols still. This one's a little more technical, but it's wonderfully revealing in a few ways. I'm on my Mac here, in the so-called terminal window, though you can pull up something similar on Windows and other operating systems as well. What I'm gonna do is literally trace the route between my laptop here and some foreign server, for instance one on the west coast of the US: Berkeley's web server. So let me do that: traceroute www.berkeley.edu, Enter. And, curiously,
we start to see a whole bunch of lines of output, most of them numerical, and indeed notice that each of these is an IP address. But what is it an IP address of? Well, we have, like, 18 of these between me and Berkeley, apparently. It turns out those represent routers between me and Berkeley, California. Each of them has an IP address, and each of them has a measurement of how long it took my data to get from my Mac to that router. It's highly variable; notice it's kind of all over the place. In fact, this is just weird: this took, like, three thousand milliseconds, or three seconds, so I'm guessing that that router in row eight was congested for some reason, some kind of network issue there temporarily, but then my data actually went through. And it's not cumulative; these are individual tests from my Mac to each of these routers, iteratively, one at a time. You can kind of get an aggregate sense, therefore, of how long it takes for data to get from the east coast to the west coast. If we look at some of the later numbers, they're kind of variable, but they seem to be around 75 milliseconds. So this is kind of extraordinary: if you want to fly from Boston, Massachusetts to, like, San Francisco, it's gonna take you five, six, seven hours. You want to send an email or send a packet? It's gonna take you 75 milliseconds.
That's astonishing, how quickly the data can transmit. Now, notice this is not all that enlightening, knowing these IP addresses, but eventually some of them have domain names, just because the humans controlling those routers decided, we are going to give these routers actual names, domain names, as opposed to just having IP addresses. And you can often, but not always, infer from the domain names where they are. So I'm gonna guess that, at least in row 11 here, and I don't know what "xe7 ... rtsw" is, but "losa.net": Los Angeles, in California. I'm guessing my data kind of came into Southern California first. But then notice what happens next: a couple of nameless servers, then LAX, so maybe that's the airport. Indeed, routers, for historical reasons, tend to be named after nearby airport codes. I'm not sure what this next one is here, but I do recognize Oakland, and UCB, UC Berkeley, so I'm guessing one of the next routers is actually in Oakland or near Oakland. So that's a pretty long cable, or interconnection, essentially, between LA and Berkeley, but the result, ultimately, is that my data makes its way to Berkeley, this time via this path. If I ran it again now, or in a day, or a week, the path might be a little different based on congestion and interconnectivity. But the data actually gets there, and cutely enough, it looks like Berkeley's web server is called calweb-farm-prod, "prod" for production, dot ist dot berkeley.edu. Seventy-five milliseconds only.

But what about this: what if we don't stop at the edge of this continent, but keep going? What's going to happen? Well, let me try to trace the route to, say, www.cnn.co.jp, the domain name for what I presume is going to be the Japanese version of CNN's website, in Japan. Here
too we have a bunch of nameless servers, just with IP addresses; it gets through them pretty quickly. We seem to have some lull: sometimes the routers won't respond to these queries, so they remain essentially anonymous. But now this is quite interesting. Oh my god, we went from routers 12, 13, 14, 15 taking about 30, 60-some milliseconds, give or take, to 193 milliseconds, which isn't a blip, because it stays around that value: 180 milliseconds, 160 milliseconds,
177 milliseconds. That's a big jump of a hundred-some milliseconds just between routers 15 and 16. Why might that be? What could be between routers 15 and 16? Well, if you know your geography, it might well be the Pacific Ocean. There's quite a bit of distance; there's quite a bit of cabling that actually connects the west coast of the country to Japan and other areas in Asia and beyond. And that's what's pretty amazing: not only is there interconnectivity on the Internet these days via cabling, and via Wi-Fi signals, and via satellite signals, via microwave signals and the like; you have so many different ways for data to be transmitted, and it's absolutely astonishing, and exciting, I daresay, just how interconnected the world now is. In fact, thanks to this animation online, let's take a look and appreciate just how extensive this network actually is.

All right, so let's actually solve a problem now with this Internet. The Internet, as you've probably heard, is filled with cats, and yet these cat images can be pretty big, and indeed, bigger still than images are things like video files from Netflix and the like. So there are huge amounts of traffic transmitting over those kinds of interconnections. So how do we ensure, at least with high probability, that data can actually get through? How can we ensure that there's some form of fairness, if not net neutrality, so that my data can get to a destination just as readily as your data can get there? Well, sometimes it's opportune to actually take big packets of information and chop them up. So indeed, what a computer will often do, thanks to TCP/IP, the combination of these protocols, is take large files, large images in this case, and tear them into, say,
roughly, whoops, roughly equal-sized parts, like this here, and then tear them down even further, perhaps, to get them into smaller bite-sized pieces, and then send not only one packet of information over the Internet but, instead, put one piece of information in this packet here, put one other piece of information in this packet here, whose addressing, both "to" and "from", is identical, and then do the same thing for the two other pieces, so that ultimately we have four packets, each of which contains one portion, one quarter in this case, of the resulting message, all of which are destined for the same destination.

But the problem to be solved now is, what do you do with this information if I have four seemingly identical envelopes, but inside of which are disparate pieces of information that somehow need to be reassembled? Let's put on our proverbial engineering hats. How do you solve this problem? Is this sufficient information on the envelope, so that if I send this out on the Internet, toward Berkeley or Stanford or Facebook or wherever, the recipient knows what to do with it? What would you, the human, do if you have not virtual but physical envelopes? Well, here too is an opportunity, really, to bring to bear human intuition on a problem that seems fairly technical and well beyond one's own technical understanding, and yet it really is just a technical manifestation of a real-world problem. I need to keep these in order somehow. You know what? I'm gonna say something like "one of four" on the first one, like this. On the next one I'm gonna say "two of four", like this, and then I'm gonna say "three of four", and then on the next one here I'm gonna put "four of four". And what's the takeaway now? Now, whoever is the recipient of these several envelopes,
as I send them out on the Internet, and indeed they don't have to follow the same path: one can go this way, one can be routed that way, another can go to this router, another can go to that router. Because they're all addressed, and because all of these routers are somehow interconnected, all four of those packets will hopefully get to their destination. But if they don't, the recipient can look at that additional detail I wrote on the envelope and see, oh, I got part one, I got part two, I got part three, but where is part four of four? It didn't arrive; because of congestion, it literally got dropped on the floor and not picked up. So the computer that's supposed to be receiving that data, thanks to TCP, recall, can say, hey, please send me again packet
four of four. And so, as technical as the Internet might seem, it really, again, is just some fairly intuitive solutions to problems like this, albeit translated to a more technical context, more technical protocols, and more technical languages.

But let's look at some more user-facing protocols. The ones we've discussed thus far are fairly low level, if you will, and indeed there's this whole Internet hierarchy of protocols, layer on layer on layer of protocols, so that what we humans really tend to care about, if we're not the engineers but we're really the software developers and the users of applications, is application-layer protocols, that is, those right between the human and all of those lower-level protocols. For instance, these, at least one of which has gotta jump out at you: HTTP. Odds are you've seen this; odds are you've typed this, though decreasingly do you have to still type it, because browsers will just add it for you. HTTPS, the secure or encrypted version of HTTP. IMAP, for email inbound. SMTP, for email outbound. SFTP, for secure file transfer. SSH, for secure shell, an encrypted textual channel between two computers. And many more.

But HTTP, let's focus on that one, because that is Hypertext Transfer Protocol. Or HTTPS, the same, but the S stands for, not "savings", secure, so it's actually encrypted in this case. So what does this actually mean? Well, at the end of the day, HTTP is a protocol that governs what kinds of messages go inside of those envelopes that I've been preparing for the Internet. And it turns out the simplest message that a computer sends through this whole Internet, inside of that virtual envelope, is quite often, thanks to HTTP, if I'm trying to request a cat from the Internet, literally a message like this: GET, for instance, /cat.jpg, for
JPEG, and maybe some additional text after that, maybe some additional text below that. But at the end of the day, inside the virtual envelope, if I am on the Internet and I'm going on Google Images and I want to find a picture of a cat, inside of my envelope, if I am a web browser speaking HTTP, is going to literally be a textual message that says GET /cat.jpg, if I know that's where the image is on some server. The response is going to be what was just inside of those four envelopes, back from the server to me, chopped up maybe into multiple pieces, but in a way where I can then realize, wait a minute, you sent me only three of four; please resend me the fourth one. So it works in both ways. Whether it's me sending a cat to someone or receiving a cat from someone, this protocol, HTTP, governs how the messages are formatted and what language, so to speak, is spoken between web browser and server. So indeed, HTTP is entirely about having a web browser communicate with the server.

We can see this in action. I'm going to go ahead and pull up a so-called terminal window again, this textual command prompt on my computer, and I'm gonna pretend to be a browser. So I'm not gonna just trace the route between point A and point B; I'm actually going to request a web page as though I am Chrome, or Edge, or Firefox, or Safari, or whatever your favorite browser is. But of course, as before, all I know is that I want to visit my favorite website, facebook.com, for instance, but I don't know its IP address, necessarily. So let's go through that step: how do I look up its IP address? Well, my Mac already has an IP address because of DHCP. I'm already powered up, I'm already connected to the Wi-Fi here on campus, and so I already have my own IP address, and I also have the IP address of a DNS server, so my Mac just knows that. But I can use that capability now to look up the name, the
IP address for the name Facebook, and I'm gonna do that as follows: nslookup, for name server lookup, and I'm gonna go ahead and type in www.facebook.com, Enter. And, interestingly, we get back this somewhat cryptic response, but let's make some sense of it. It looks like the server that this response came back from is 10.0.0.0, a type of address here on campus that you might have in your own company or university or even home network. Then a non-authoritative answer is this: www.facebook.com, whose canonical name is, curiously, star-mini.c10r.facebook.com. Well, it turns out that companies like Facebook absolutely have many, many, many different web servers, and they might not necessarily have just one IP address, but we might just be seeing one IP address, depending on where I am in the world and depending on how Facebook has configured its infrastructure. The takeaway, then, is that apparently, so far as my Mac is concerned, www.f [...]
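The two steps a browser takes here, looking up a name and then writing a textual request, can be sketched in Python. This is a minimal illustration under assumptions: the function names `look_up` and `build_get_request` are invented for this example, `www.example.com` is a placeholder host, and the header lines after the first line are one plausible form of the "additional text" mentioned earlier, not a record of what was actually sent in the lecture.

```python
import socket


def look_up(name: str) -> str:
    """Ask the system's configured DNS resolver for an IPv4 address.

    Roughly what nslookup does; it needs network access, and the answer
    can differ depending on where in the world the question is asked.
    """
    return socket.gethostbyname(name)


def build_get_request(host: str, path: str) -> str:
    """Format the text a browser would place inside the virtual envelope."""
    return (
        f"GET {path} HTTP/1.1\r\n"   # the request line: method, path, version
        f"Host: {host}\r\n"          # which website on that server is wanted
        "Connection: close\r\n"      # ask the server to hang up afterwards
        "\r\n"                       # a blank line ends the request
    )


print(build_get_request("www.example.com", "/cat.jpg"))
```

Calling `look_up("www.facebook.com")` on a networked machine would return one of the site's addresses. Sending the request text itself would additionally require opening a TCP connection to that address on port 80, which this sketch leaves out.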
[...] here. And indeed, what you are seeing is a language called HTML. Inside of the virtual envelope, if you're requesting not a cat image but a web page that has your news feed, or your inbox from Gmail, or your search results from Google, is the language called HTML. HTML is not a programming language. Indeed, it's not as cryptic-looking as this; Google is being very, or rather Facebook is being very efficient when it comes to showing me this information, just getting rid of as much formatting as they can to save space, to save on Internet bandwidth, the transmission thereof. But it's a language that comes back in this virtual envelope that a browser knows how to display. It's a markup language, in the sense that it's going to tell the browser what to show on the screen: where to show the cat, where to put words, whether to make those words big or bold or italics or centered, or any number of other things. And indeed, what you are seeing is this. This is www.facebook.com graphically, as we see it in the browser. Underneath the hood is that black-and-white, seemingly nonsensical Greek, if you will, that at first glance there's no way most of us would understand. But that's because we're looking at it here. We need to dive in a little deeper: take a look at what HTML is, how it's actually structured, make the simplest of web pages, a hello world of web pages, if you will, and then can we realize, and build back up to this point, exactly what composes pages like Facebook and Gmail and Google and Bing and others. Because at that point we'll have understood not only how the Internet works, but how you can use it as a delivery vehicle for your ideas, for your programs, for your products, for your companies, and more, and actually deliver information, and deliver cats, and much more, to your users on this Internet.