CS50 2020 - Lecture 8 - HTML, CSS, JavaScript
[Music] [Music] all right this is cs50 and this is week eight the week of all hallows eve indeed thanks to our friends here in the american repertory theater the stage looks amazing today with some special lighting and some special characters of course speaking of characters this past week you all explored 50ville for the very first time looking for the rubber duck that had gone missing and thankfully the culprits have been found and allow me to say that a little someone would like to say hello yes even he has rather dressed up for the occasion but thank you for all the hard work there so this week of course we transitioned to the world of web programming the motivation being that for the past many weeks pretty much all of the code we have written has been focused on command line programs compiling your code interpreting your code but generally just interacting with a fairly mundane uh blinking prompt textually but of course the software that you and i use every day these days is in the form of laptops and desktops in browsers or on mobile devices or apps and today we begin to transition to a set of languages and a set of technologies by which you can start to apply all of the past week's knowledge and mental models for procedural programming to a much more familiar a much more graphical domain indeed over the course of the next couple of weeks we focus on web programming and the use of languages called html and css and javascript with which today's websites are made and increasingly with which today's mobile applications or apps on your phone are made as well but in order to get to that point in the story we need to consider what the uh the framework is on top of which we're gonna run these websites or these web applications and so that invites the question of the internet exactly what is the internet all of us use it every day but let's take a couple of volunteers from the audience just to define for us what we mean by the internet all of us are literally on the 
internet right now but if you take a step back and think about it what is the internet how might you define it for someone less technical than you or someone less familiar sophia how would you define it the network of all the computers around the world that are kind of taking in information from the network and also giving it information perfect the internet is this network of networks so if you have a small network in your home a small network or a large network at your company or your university and you start to interconnect all of those networks using cables or some kind of wireless technology you get the internet so to speak a network of networks and this is really the infrastructure if you will on top of which all of today's applications are run so when you use the web when you use chat when you use slack when you use video conferencing zoom or the like you're using the internet but think of the internet really as the lower level plumbing that gets the zeros and ones from you to someone else and back and the applications on top of that are all implemented ultimately in software and so if we consider then that we've got all of these computers interconnected somehow it stands to reason that we need to somehow decide as a global community how to get data from points a to point b and beyond and so throughout the internet are these computers called routers and the at the end of the day they're probably a little bigger than the desktops and laptops with which you and i are familiar but at the end of the day they're the same kinds of devices with cpus central processing units the brains inside of the computer that do all the thinking ram or memory where all of the values are stored and hard disks and where data is persisted and pictured here for instance is an image from mit that depicted a few years back what some of the most significant peering points on the internet throughout the united states so each of the red dots here represents essentially one router or one 
very important place into which a lot of cables come in and then go out and interconnect all points of the country and then the story continues well beyond the united states these days using oceanic cables and other wireless or satellite technologies or the like so suffice it to say there's sort of this mesh this interconnection of all of these different computers and in turn networks throughout the world which is to say that there's many different paths that data can take to go from point a to point b there isn't necessarily a line between you and facebook.com or stanford.edu rather there's a whole bunch of routers sometimes a handful sometimes as many as 30 that will relay your data from left to right to up to down or in some other direction in order to get data from you to the web server that you're trying to contact and then back to you with the server's response so how does all of this work well decades ago humans essentially had to get together and decide as a group what standards they were going to use or more specifically what protocols all of these computers are going to speak a protocol isn't so much a language as it is a set of conventions right back in healthier times you and i if we were meeting each other in person might extend a hand and if i did this you would immediately know that you should probably extend your hand too and we would have a physical handshake and that's like a human protocol i initiate a communication with you by extending my hand you acknowledge that communication by extending your hand and then that sort of interaction is complete so we have these human protocols in the world of computers there are similar protocols but obviously it's all zeros and ones so if the first computer sends this pattern of zeros and ones the other computer should reply with a different set of zeros and ones and so these protocols we're about to discuss just standardize what those patterns of zeros and ones are or really what all of the messages are going 
back and forth and two of the protocols most commonly used to get data on the internet from point a to point b are called tcp ip tcp and ip are two separate protocols but they're so often used together that you typically mention them in one breath tcpip and this is these are acronyms you've probably seen maybe on your mac or pc or somewhere on your phone settings and it refers to essentially two sets of conventions that computers use to get data from one point to another so what do we mean by data and what do we mean by moving things between point a and point b we'll just consider it sort of as an old school envelope whereby if you wanted to send a letter to someone else in the world you and i would probably reach for a piece of paper back in the day we would pick up an envelope and we would write our note on that piece of paper put the paper in the envelope and then the most important step after writing the actual message would be to address the envelope and of course in the real world you would put the recipient's address typically in the middle of the envelope you might put your return address in the top corner of the envelope and then maybe postage or something like that but we humans have pretty much standardized through all of the postal systems that kind of convention when using envelope so the metaphor here is that the envelope and the message they're in are generally thought of or referred to as packets packets of information and this would be the physical incarnation of what computers ultimately are just going to do using zeros and ones so let's tease apart the two sets of conventions they use for actually putting data in these envelopes addressing these envelopes and sending them out from point a to point b let's consider first ip ip stands for internet protocol and pretty much any mac and pc and iphone and ipad and android device these days has been designed by apple or google or someone else to understand ip it's as though those companies have written 
software running on those devices that makes sure that those devices all support ip just like i was taught presumably by some human this human convention of shaking hands back in the day ip internet protocol simply standardizes how computers address each other so in our physical human world if you wanted to send me an envelope for instance you might write to harvard's computer science department at 33 oxford street cambridge massachusetts 02138 usa that is presumably a unique postal address that addresses the computer science building on campus so that if you drop an envelope in the mail in california or anywhere abroad it should eventually via some number of hops and mail carriers and the like make its way to that particular address computers then have similarly unique addresses known as ip addresses and so when your computer mac pc phone whatever sends data from itself to another server the address that it writes on the outside of that virtual envelope is the ip address of the remote server so for instance if i were to send a message to you i would figure out what your ip address is i would write that ip address virtually on the outside of this envelope i would probably write my own ip address on the top left hand corner of this metaphorical envelope and then i would send it out on the internet and what does that mean it would mean i take that envelope and i hand it to the nearest router so it turns out when you're at home you actually have a router of your own it's that device that connects to your cable modem or dsl modem or something like that if you're on campus at a place like harvard or yale harvard and yale have their own routers so your computer when on campus just knows to hand data off to that and if you're elsewhere in the world like in starbucks or an airport similarly there are routers there so your computers generally know where the closest router is and then a router's purpose in life is again to 
figure out does this packet go left right up down so to speak in order to get it closer to its destination but this is sort of a chicken and egg problem if i want to send you a piece of information i need to know your ip address but i don't really know your ip address until i know where you are so there is this uh other system that you've probably seen an acronym for too called dns domain name system and this is a technology that's deployed throughout the internet that's supported by macs pcs and phones these days that just translates what you and i would typically call domain names or fully qualified domain names from those english-like or human readable characters to the corresponding ip addresses right there's a reason that companies do not advertise their websites as being a numeric ip address none of us would ever remember them they instead advertise them as microsoft.com and google.com and newyorktimes.com dns is a technology that your mac and pc and phone support so that when a human types in one of those human readable addresses a domain name dns converts those names to the ip addresses so literally if you type in harvard.edu or yale.edu enter into your web browser your mac or pc quickly looks up the ip address of that web server using the software that came with the mac or pc and converts it to the corresponding ip address and then writes virtually on the outside of the envelope the ip address of harvard or yale's web server before sending it out on the internet so these are just services dns is a service that your own isp internet service provider provides when you're on campus it's harvard or yale when you're at starbucks it's probably starbucks when you're in an airport it's the airport when you're at home it's your own internet service provider like verizon or comcast or the like so the world just decided to use that technology as well and lastly one other acronym for now tcp tcp or transmission control protocol is a solution to a couple of problems 
one of which is that it tends to be pretty convenient for individual servers on the internet to be able to do multiple things right there's lots of things the internet can do servers can host email they can host websites they can host chat servers video conferencing i mean that's already a growing list of features of software that you can use on the internet and it would be nice financially administratively if one server could do multiple things at once and indeed they can so when a computer receives one of these virtual envelopes and that computer that server happens to support multiple services email web chat video whatever it looks at the envelope for one additional piece of information and that piece of information is known as a port number p-o-r-t number which is just a small integer that the world has decided represents specific services so for instance in the world of tcp the world decided years ago that our computers should virtually write the number 80 on these envelopes after the ip address to signify that this is a request for a web page or 443 on the outside of the envelope if it's a secure request for a web page using something called https more on that in a bit and there's other numbers as well email has its own unique numbers zoom has its own unique numbers and all these other internet services that you and i might use every day have their own unique tcp ports so that companies and people can have one server doing multiple things but upon receipt of one of these envelopes the server can look at it and realize oh this is a request for email this is a request for a web page this is a request for chat or something else altogether now notably too tcp also handles delivery and it's the part of the protocol that also ensures that when you send data from point a to point b if any data gets lost because literally something's wrong with one of those routers or because maybe uh one of those routers got overwhelmed and just 
received way more packets at once than it can handle that could happen because these computers have of course finite memory if you send too much data through one the internet might get congested your video might buffer and a whole bunch of other symptoms might arise so tcp also handles the process of re-transmitting data as needed if any of these packets is lost on the internet literally tcp will also compel your mac or pc or phone to re-send that data as well but what's notable about the internet is that data doesn't necessarily follow one specific path in fact if you send multiple packets from one person to another those packets might actually take different routes each time and this is actually a feature not a bug so to speak because you can imagine servers getting congested or problems needing to be routed around and so tcp also supports with other protocols an adaptive solution to this problem whereby maybe your data will go this way sometimes maybe it'll go this way some other times but this is in part why sometimes your internet speeds are variable because again these routers in between might be different or might be a little bit overloaded so we thought we'd try to tell this story by enlisting the help of some of cs50's staff in fact brian let me start with you um would you mind taking on the role in just a moment of playing a web browser someone's own mac or pc or phone and requesting of me maybe something silly like asking me for a picture of a cat yeah sure so if i want to ask you some web server for a picture of a cat i need to send a message to you in order to send that request to you so i might write down my request on a sheet of paper and i'll just put that request inside of an envelope and then i would have to label that envelope with all the information we talked about in particular with your ip address that i might look up with dns and then i can send that envelope off all right and i think we need a little bit of help here 
because brian and i are in different places and so he and i uh can't just hand the envelope from one to the other so let's go ahead and enlist the help of cs50 staff here uh also who have uh chimed in here on zoom and see if we can't route this web this request from brian who's playing the role of a web browser to me who'll play the role of a web server in order to receive this request for a cat so here we go let's see if we can enlist the team here [Music] [Music] all right well thank you to phyllis for having handed me this envelope and what we have now is the request that brian sent me i'm gonna go open it up and i did see a message inside requesting a picture of a cat which is not uncommon on the internet so now if i'm the web server and i actually have an archive of pictures of cats i'm going to go ahead and respond to brian with one of those cats but to do so i'm going to go ahead and have to look up on my hard drive or somewhere in the computer that picture of a cat and and here's one here so i'm going to go and send brian this very happy cat i've got some envelopes of my own and i'm going to go ahead and write brian's ip address on the middle of this envelope i'm going to put my ip address on the top left of this envelope and then maybe any other identifying information i need and then i'll go ahead and put the cat into the envelope but of course this isn't really going to fit and this is actually quite commonly the case anytime a computer is trying to transmit a decent amount of data whether it's a big image or maybe it's an even bigger video file for equity's sake it tends to be good for computers to chop up large packets into multiple smaller packets in fact you might have heard of something called net neutrality or more technical topic known of quality of service in a nutshell net neutrality speaks to just what kinds of decisions computers should make when it comes to prioritizing data and a common convention is historically that all of us should chop 
up our large packets into smaller packets send them out so that they can get then co-mingled with other people's packets and we all sort of reach our destinations at the same rate net neutrality as an aside is all about an interest by some parties in prioritizing maybe the data from certain companies that pay a bit more and so this really speaks to just use or maybe abuse of these basic primitives here but this is not fair for me to try to cram this one big image into an envelope so i'm going to literally go ahead and tear the picture in half essentially chop the packet into two let me go ahead now and put this into the envelope because it'll fit a little more easily so i've got one packet of information for brian i've got now let's see one more packet of information for brian that i'll fit the other half of this image into but i think i'm gonna have to do something else before i drop this out on the internet and hand it back to phyllis to send out back to brian i might need some additional information on these envelopes i've already got brian's ip in the to field i've got my ip address in the from field i've also jotted down the port number that i should use for brian and my own return port number and those are decided typically by my mac or pc but i feel like i probably need a little more information what more should i virtually write on the outside of this envelope to make sure that the data is received as intended any intuition no familiarity with tcpip assumed here but if brian's about to now get two envelopes what additional data should i perhaps give him great brian might confuse the top of the photo with the bottom so you need somehow to tell brian that this is a top and this is the bottom a link maybe to converge them perfect and so we need to make sure brian knows the order in which these packets should be reassembled so that he indeed gets the cat the right way and not the wrong way for instance so you know what probably suffices is for me to add what 
we'll call a sequence number to each of these packets which is essentially a number which you can think of as one of two and on the other one two of two so that brian knows in what order to reassemble the packets but also more importantly in case one of the packets or both of them gets lost or somehow dropped by one of the routers along the way there's enough information on those packets to enable me and him to recover that and resend packet one and or two as needed so let's go ahead and do this let me go ahead and enlist the help of the team starting with phyllis here and phyllis if you'd like to go ahead here and all right of course that's only half of the problem so i'm going to go ahead now and send the second packet finally in an ideal world i would actually send these out in parallel but there's no reason that they couldn't still follow different paths in fact this one i worry might take a little bit more time let's see [Music] [Applause] [Music] [Applause] uh amazing brian do you want to go ahead and open up your envelopes and reassemble them yeah so i now have two envelopes i guess i'll open up the one that says one of two first and it is the top half of the cat and then i'll open up the other envelope which is two of two and that is the bottom half of the cat and so together i think i now have the full cat wonderful well thank you to brian and to the whole team and so to recap ip is this protocol the set of conventions that standardizes what gets written on these envelopes it's how computers uniquely address each other with numbers of some sort tcp governs a few different things but among them is this numbering of services like 80 for insecure web traffic or 443 for secure web traffic that ensures that the data gets from one point to another and is handled by the right application running on that particular server dns then is what we used to begin with if brian had his own domain name my computer would have had to look up his ip address or conversely he 
would have had to look up mine so that we humans who are actually using the internet in a human-friendly way don't have to remember ip addresses which again are just numbers but instead can remember things like harvard.edu yale.edu
and the like so that then is the internet the fundamental infrastructure the plumbing on top of which we now have the ability to get data from point a to point b and so in some sense if you're comfortable with that we can now sort of abstract the internet away and just think of it as being a mechanism that gets data from one point to another and so long as we can now assume that we have this fundamental public service that gets data from one point to another now we can start to build on top of it in terms of software and other languages and actually use it for interesting things but before we forge ahead to do those things any questions or confusion we can clear up on tcp or ip or dns or the internet or routers or any of these other new terms grid back to you i have a question so um does chopping the information create any problem because um i don't know a piece of information can go there for two seconds and another one for three seconds does it create any problem for the user really good question these packets can take different durations of time and even though i did stipulate that they should go out to phyllis's hands roughly at the same time even if she needs to pass them in two different directions there can absolutely be delays and in fact typically you and i as humans will start to notice delays if packets take more than 200 milliseconds to get from point a to point b after that it looks like there's a bit of delay and certainly if it's two or three seconds you'll really notice it at that point it's not necessarily a problem brian hopefully would patiently wait for the second half of the cat for some amount of time if he only received one packet eventually he as a human and in turn he as a computer would probably get a little anxious and would ask me to retransmit a packet if it doesn't arrive after 5 seconds 10 seconds 30 seconds these timeouts can typically be specified by the software running on the person's computer but at that point you and i would 
certainly notice the difference all right so if we now have this ability fundamentally to get data from point a to point b what is actually inside of the envelope that brian sent me and what was inside the envelope i sent him besides just the picture of a cat well for that we transitioned to another language or another protocol rather called http hyper text transfer protocol and this is an acronym you've probably seen or typed bunches of times it's of course what appears at the beginning of urls uniform resource locators which are uh the tools that you and i use to actually figure out what uh what website or what image we actually want to request of the internet so the web you know the world wide web is really just one of many services that run on top of the internet the web gives us web pages zoom gives us video conferencing uh other tools give us text chatting voice chatting and the like so the web is really just an application on top of the internet it's hands down the most popular application but it really is just an application it's a service that's using that underlying plumbing so http is a different protocol that really governs what goes inside of these envelopes tcp governs what goes outside the envelopes http governs what goes inside of the envelopes assuming we are talking about web browsers and web servers and not video conferencing or something else so with http it comes with a few different commands or a pretty limited vocabulary two of which are the most important terms to know which is get and post these are literally english verbs and they are two of the commands if you will that http supports and what brian probably did inside of that envelope is he probably literally wrote down get cat or something like that post is used for other applications that we'll get to before long but get is the operative word and it literally is how a browser will request or get information from a server so somewhere in the envelope brian sent me was the english word 
get probably followed by cat.jpeg or something like that there's probably a bit more information but the essence of http means that if brian wants something from me and he's the browser and i'm the server he should start his request with the standardized verb get followed by the name of the file that he wants to get so let's put this now into the context of one of the more familiar urls so here's for instance a canonical format of a url and let's highlight a few features of it so first https increasingly you're seeing this on the web even if you don't type it it's often automatically appearing in the address bar of your browser because browsers or web servers are adding it for you the s just refers to a secure version of http and we'll come back to this topic of security next week and beyond too but in the context of http this just means that the data between me and brian and vice versa is encrypted somehow it's way better than caesar or other ciphers it's way more mathematically sophisticated but it essentially just scrambles the information so that brian knows he's asking for a cat i the web server know he's asking for a cat but if any of you or any of the tfs who were playing the role of routers sort of maliciously or nosily opened the envelope instead of handing it off to the next staff member they wouldn't understand what's inside the envelope because it would look similar to caesar and other ciphers sort of like random zeros and ones so https just means that the contents of these packets are encrypted what else is salient about these urls well here's what we call a domain name odds are most everyone knows what a domain name is and it's typically two phrases something dot something else and example.com is of course an example here but harvard.edu yale.edu
and millions of others these days at the end of that though is what we would typically call the top level domain or tld this is just uh the type of website historically that you're trying to visit dot com meant commercial dot edu meant education dot net meant some kind of network dot org is an organization that's no longer really the case in fact there's hundreds perhaps even thousands of top level domains nowadays that you can buy domains in that try to categorize things sometimes but there's no hard rules around most of those top level domains you have to be an accredited educational institution to use edu you have to be in the us military to use mil there are similar constraints in other countries who have their own two character country code tlds like dot uk for united kingdom dot jp for japan and many others each country is free to standardize as it sees fit but you and i can buy a dot com a dot org a dot net a dot us there's many many many others if you go on wikipedia you can see a nearly exhaustive list but this just tends to categorize the type of website that it is besides that there's this prefix generally known as a host name and www is just a human convention years ago pretty much any server on the internet that had a human friendly name like this www.example.com this was just meant to connote to the user that oh www this must be the address of a web server and not a mail server not a chat server or something else it's not strictly required it's just human convention and odds are when you visit websites you probably don't even bother typing this in anymore but it is a historical feature that allows a visual clue typically to the humans as to what type of server it is so besides that there's this one hidden piece of information as well if you just want to visit example.com's homepage you might just type this url or even just type example.com and hit enter and let the browser redirect you so to speak take you to this canonical 
form of the url but very often you're technically requesting a specific file and if not mentioned that file name is typically index.html it can be other things as well depending on the language or the server technology that someone's using but implicit at the end of urls is often the name of a file brian might have specifically requested cat.jpg but if he were requesting not a picture of a cat but a full-fledged web page with text and other information odds are there's an implicit file name there like index.html
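
To make the anatomy of a url described above concrete, here is a minimal Python sketch that splits a url into the pieces the lecture names: the scheme, the host name prefix such as www, the domain name, the top level domain, and the path with an implied default file. The function name describe_url and the choice of index.html as the default file are assumptions made for illustration; a real server may use a different default, and treating the last two labels as the domain name is a simplification that does not hold for country code domains like example.co.uk.

```python
from urllib.parse import urlsplit


def describe_url(url, default_file="index.html"):
    """Break a URL into the parts named in the lecture (illustrative only)."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")

    # An empty path means the browser asks for "/", the site's root.
    path = parts.path or "/"
    # Assumption: a path ending in "/" is served from a default file.
    if path.endswith("/"):
        path += default_file

    return {
        "scheme": parts.scheme,
        "secure": parts.scheme == "https",
        # Prefix such as "www", if present (simplified: anything before
        # the last two labels counts as the host name prefix).
        "host_prefix": ".".join(labels[:-2]) or None,
        "domain": ".".join(labels[-2:]),
        "tld": labels[-1],
        "path": path,
    }


if __name__ == "__main__":
    for example in ("https://www.example.com/", "http://example.com/cat.jpg"):
        print(example, "->", describe_url(example))
```

Calling describe_url on https://www.example.com/ reports the scheme https, the prefix www, the domain example.com, the top level domain com, and the implied path /index.html, whereas a url that names cat.jpg explicitly keeps that path unchanged.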
and this is now important because when we look inside this envelope this is a piece of information that needs to then be in there so let's take a look at some sample http requests and responses a more technical dive into what brian and i and the staff acted out a moment ago technically speaking when brian sent me a request for that cat he wrote inside this envelope not only the keyword get and something like cat.jpeg he also specified a couple of other things and let's generalize it now away from cats and just propose this inside of an http request that is any of these virtual envelopes is literally a request like get followed by slash if you don't want a cat you just want the default home page followed by a mention of what version of http the browser and server should speak 1.1 is pretty common two is increasingly common three is even now out there but there's just different versions of the protocol it's like humans have refined what it means to shake hands these versions of protocols evolve over time but there's also a line like this host colon www.example.com because just
in case i am a particularly fancy server that supports not only example.com but maybe harvard.edu and yale.edu it's possible long story short for companies nowadays to host multiple websites and multiple domains on the same server this little clue inside the envelope makes sure that it goes to example.com or harvard.edu or yale.edu if all of these
entities are sharing the same physical server so more specifically a request might instead look like this if you're not just requesting the default home page but you want a specific file it might say slash index.html instead what does my response look like so i've gotten brian's envelope now i'm going to go ahead and respond with my own one or two or more envelopes inside of mine yes are going to go pieces of that cat but some additional information as well per the protocol so my response just like in the human world i might extend my hand if i see brian initiating a handshake i'm gonna respond with something like this http 1.1 which just reminds the browser what version i'm speaking then a number which is the status code followed by a shorthand summary like okay 200 okay means i got you i found the cat here it comes piece by piece in these envelopes and i also put in the envelope a mention of the content type if it's a web page i'm going to put text slash html if it's a jpeg i might instead say image slash jpeg and there are different content types otherwise known as mime types for all different file formats in the world well that's not always going to be the case that the response is as simple as that whereby your browser requests information and the server responds with the requested information sometimes the users make their way to the wrong place so for instance suppose that a browser visits www.harvard.edu the response might not necessarily be okay initially it might not be status code 200 and in fact we can see
this let me go ahead and open up on my screen here a browser window that's going to take me to let's say harvard.edu and i'm going to go ahead and type into the url bar http colon slash slash www.harvard.edu enter now all this happened pretty quickly but if i click on the url bar which has been simplified or shortened by chrome at the moment notice where i actually ended up somehow or other my browser did not keep me at http it redirected me so to speak to https this is probably intentional on harvard's part they would rather that i be visiting them securely so that if i'm reading articles or other content that's really nobody's business except mine and harvard's certainly no routers in between should be able to see this so somehow harvard redirected me from http to https well how can i see this well it turns out embedded in chrome and edge and firefox and safari all of today's browsers there are often developer tools that sometimes you have to enable via a certain menu but these developer tools are so powerful and they allow you the user or now you the programmer to actually see and understand what's going on underneath the hood of these browsers and servers so i'm going to do this in chrome specifically i'm going to go to view developer and then i'm going to go to developer tools and odds are if you're a chrome user this menu option has always been there even if you never noticed it so feel free to play along at home and then notice this pops up on the top right here i'm going to go ahead and move it down to the bottom just by clicking the dot dot dot menu and move the developer tools to the bottom of my screen just so we can see things a little wider and i'm going to go ahead and click on the network tab up here and when i click on the network tab here i'm going to see a whole bunch of information related to my last request so i'm going to go ahead and do this request again let me go ahead and go back to the url bar and let me go ahead and actually just for good
measure let me do this in incognito mode and even though you perhaps are in the habit of using incognito mode if you don't want the browser to remember where you've been or what you've logged in as incognito mode is incredibly powerful as a developer's tool so that you can sort of reset the browser state to like a first condition without any previous network browsing showing up in your history so i'm going to do this again now in incognito mode after having opened developer tools http colon slash slash www.harvard.edu enter and a whole bunch of stuff just flew by the window some of which is this chart information which shows me the performance so to greg's question earlier about noticing the amount of time you can see that some of the requests that were just induced vary between a few milliseconds and over 1 000 milliseconds but what i care about for now is this fairly arcane listing down here a whole lot of stuff just flew across the screen and indeed if i zoom in on the bottom simply visiting harvard.edu induces 70 http requests per this mention in the bottom left-hand corner it resulted in 6.8 megabytes of information being transferred and in total it took rather atrociously 11.95 seconds so greg
like that is slow relatively speaking well absolutely speaking so what's the takeaway here well anytime you visit a webpage there's not just the one web page itself with all of the text in it there's probably images maybe videos maybe music and other things all of those get downloaded separately so if brian had asked me for a full web page like the course's home website i might respond not with a single envelope or two envelopes i might respond with 70 envelopes containing the responses to every piece of media that composes cs50's own website or in this case harvard's but for now let's focus only on the first of these requests if i look at the first row here in chrome i will see a reminder of where i visited first but notice the status column over here is 301 301 moved permanently it turns out that there's numbers besides 200 that tell browsers what to do 200 just means okay here's the data you requested 301 means whatever you requested has moved permanently to a different url so let me go ahead and click this first row and you'll see that a whole different set of tabs pops up i'm going to click headers here and now let me define a term when brian and i are using http inside of these envelopes and i write something like get slash http 1.1 or host colon www.example.com each of those lines of text is what we'll call an http header it's a line of text inside of the envelope so what we're seeing here is chrome summary of all of the headers that were inside of these envelopes let me go ahead and look at my request headers first i'm going to click view source and i can literally see the raw request that my browser sent to www.harvard.edu get slash http 1.1 host colon www.harvard.edu and then a bunch of other stuff which we'll ignore for now but those are all http headers but if i scroll back up here let's look at the response headers now what came back in a different envelope from harvard to my laptop notice here that it's http 1.1 but it's
not 200 okay it's 301 moved permanently this is a hint to my browser that there's nothing at the url you visited you need to visit a different location instead to know where i need to go i need to scroll down and find this header here notice that the third line in the response is location colon https colon slash slash www.harvard.edu so this is how the envelope that comes back contains a clue to me to say we have moved permanently to the secure version of the website and if i zoom out now and click this little x to close those tabs you'll see that the next request that my browser automatically sent on its own was to instead if i scroll down here to this request url https colon slash slash www.harvard.edu and the response it got this time under this general summary here was now indeed 200 so this is just a simple mechanism that allows a browser and server to intercommunicate in a way that can send them from one location to another and let me make this a little more familiar odds are you have not seen this before explicitly because you as a human would rarely if ever see the number 301 or moved permanently until today now that you're a programmer who's using these developer tools but odds are you've seen another number maybe in the chat if you want to just chime in if you're thinking about web pages and numbers has anyone seen quite often probably a number that maybe now makes a little more sense brian what are you seeing a lot of people saying 404 i also saw a 500 and a 502 yeah so 404 is the code that humans adopted years ago that just signifies not found so if you visit an incorrect url or an old url that no longer exists on a server for maybe an old cat that's been deleted the server will respond not with 200 okay but with 404 not found thereby telling your browser to display some kind of error message weirdly browsers years ago weren't especially user friendly and they just told us humans 404 404 which frankly is not very user friendly but all it boils down
to is this little hint inside of the response envelope coming back that indicates that something went wrong that something was not found and there's a whole list of these status codes and this is certainly not something you need to memorize but as we focus more and more on web programming you'll just get naturally familiar with some of these there's other ways of redirecting the user from one place to another 302 and 307 can be used for efficiency servers can sometimes respond with 304 which essentially means you already asked me that question the cat has not changed on the server use your own copy of the cat so long story short if brian's own browser were smart it would cache c-a-c-h-e that is remember the cat that he just downloaded from me so that if brian hits reload or he comes back to that same website again and wants to see the cat again his browser just loads the local copy instead of bothering me the web server and wasting milliseconds sending another cat 304 would just say the cat is the same use your own local copy then there's others you might have seen 401 or 403 before which refer to like not being logged in correctly or something like that 500 is actually bad and in fact i can pretty much guarantee that over the next couple of weeks all of you will experience your very first of several http 500 errors that's going to mean next week that you screwed up with your code and you actually wrote buggy python code that just meant the whole server didn't know what to do and that's an internal server error fixable and we'll help you debug it but indeed that's quite common as well 503 just means the server might be overloaded in some way and so service is unavailable and there's others dot dot dot as well so we can actually have a little bit of fun with this in a couple of different directions it turns out that if we send this http request we can take a look at what comes back and let me go ahead and do this instead of using my browser i'm going to
use a command line tool which tends to just be a little cleaner because i don't have to fuss around with all these buttons let me go ahead and use a program called curl and curl's purpose in life is just to connect to a url and it's not going to bother showing me the webpage or any of the content it's just going to show me the http headers if i use a command line argument of dash capital i and now i'm going to go ahead and do http colon slash slash safetyschool.org and i'm going to go ahead and hit enter and this is my mac now sending one envelope to safetyschool.org containing get that verb requesting the home page they are presumably going to respond to me with another envelope inside of which is some kind of response maybe it's a 200 maybe it's something else all right it looks like forgive me that safetyschool.org
has moved permanently per this 301 to this new location www.yale.edu sorry and in fact we can do this if i copy this url and let me go into a browser i'll use incognito again so that i don't have any past history i'm going to go ahead and hit enter and voila the visual effect is just as real as the headers would imply so indeed the funny thing about this joke is that someone on the internet has been paying for the domain name safetyschool.org for like 20 years now for this joke and the only thing it does is redirect one domain name to another now fair is fair let me go ahead and transition away from safetyschool.org to harvardsucks.org which also exists and someone on the other side has been hosting this website for some time and in fact if you visit that url let's go to harvardsucks.org enter you'll actually see a whole website so the yalies really went all out here and you can actually see an amazing hack here whereby at harvardsucks.org
there's an old youtube video of an amazing hack or prank that was pulled at one of the harvard yale football games some years ago where yale to their credit tricked us into spelling out with a bitmap of all things we suck so fair's fair so in any case a bit of a stretch to connect those to underlying http messages but it all indeed relates to these very simple primitives let me point out one other thing as well we might also see in the form of http requests even more sophisticated first lines where you're not requesting just slash the default home page you're not requesting slash cat.jpg or slash index.html there might also be question marks and equal signs and notice this is an excerpt from an envelope my mac or pc or phone might send to google.com requesting pictures of cats and in fact let me go ahead and do this on my browser let me go to https i'm not going to bother using the insecure version at all i'm going to go explicitly to google.com
slash search question mark q equals cats so this is the human version of the url that my mac will translate into this lower level message that's going to be shoved inside of the virtual envelope so i'm going to go ahead and hit enter and voila i now see indeed a whole bunch of pictures of cats including some more horrific photos from a movie that didn't fare well as well so that is to say that it seems that once you understand url formats you can begin to pass input to servers and here's now where we bridge past weeks to future weeks thus far when we visited web pages like harvard.edu and yale.edu and the like we're just visiting static web content we're not actually providing user input like you would using get_string or input or any kinds of command line programs we've written but it turns out that urls do support user input and they are standardized if you see a question mark and then the name of a variable like q and then an equal sign and then a word like cats that's like the web based analog of a command line program having asked you what is the value of q and the human typing in cats so this is to say there is a way using urls that will actually allow us to pass input to a web server and indeed that's what's happening when you're visiting google.com but it just boils down to understanding these urls and before we begin to build some of our own solutions on top of this infrastructure any questions now or confusion on http or status codes or anything we've seen thus far anything at all yeah over to santiago when you want to for example publish a web page why is it that you have to buy a domain name is that because you're kind of like using memory in some server yeah it's a really good question why do you have to buy a domain name it kind of boils down to capitalism to be honest there is a non-zero cost to running certain aspects of the internet certainly or really all aspects of the internet there are some nonprofit organizations
and volunteers that have historically helped govern it increasingly though there's overhead to operationalizing the internet running things like the main dns servers and other features and so there are what are called internet registrars much like a university registrar whose purpose in life is to allow people to essentially rent domain names on an annual basis and indeed when you buy a domain name it's not yours permanently instead you're paying a yearly fee or a renewal fee every one or two or three years or the like it might range from a couple of dollars to hundreds or even thousands of dollars we can go down the rabbit hole talking about domain name squatting whereby if you think of a really cool word and you buy the domain name and someone else comes along and wants it there's capitalism at play there potentially an opportunity for you to sell a domain name to someone else but in part it helps just regulate exactly who can sign up for domain names and presumably put some downward pressure on all of them just disappearing if you could just sign up for free for as many as you want other questions or clarifications on not just http but also tcp ip dns or anything else from today's alphabet soup a question came in in the chat if you have multiple packets that you're trying to send from one place to the other do they have to be sent out one after the other or can you send all the packets out at the same time really good question we did not think that we humans could do that very well choreographically using zoom a bit ago so we sent one packet at a time through the teaching fellows but yes a computer would typically dump all of those packets out at the same time they would be serialized one after the other but it would happen very quickly and by chance they might all follow the same route through the teaching fellows as routers or they might go in different directions depending on just how congested or how busy the internet is at that moment in time
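Because packets can take different routes, the receiver cannot rely on arrival order and has to use the number written on each envelope. The following is only a toy sketch in Python of that idea, not how TCP is actually implemented: the function names and the "number, total, chunk" labeling are assumptions made for illustration.

```python
import random


def split_into_packets(data, size):
    """Cut data into (sequence_number, total, chunk) tuples, numbered from 1."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [(n, len(chunks), chunk) for n, chunk in enumerate(chunks, start=1)]


def reassemble(packets):
    """Put chunks back in sequence order; return None if any packet is missing."""
    if not packets:
        return None
    total = packets[0][1]
    by_number = {n: chunk for n, _, chunk in packets}
    if set(by_number) != set(range(1, total + 1)):
        return None  # a real receiver would ask for the missing piece again
    return b"".join(by_number[n] for n in range(1, total + 1))


cat = b"pretend these bytes are cat.jpg"
packets = split_into_packets(cat, 8)
random.shuffle(packets)  # simulate packets arriving out of order
assert reassemble(packets) == cat
```

Shuffling stands in for the different routes the envelopes might take; the sequence numbers are what make the original order recoverable.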
the packets might arrive out of order but indeed that's why brian needs to know what the sequence number is on the outside of the envelope so he can rearrange them in the correct order anything else on your end brian how do the routers know which way to send any particular packet of data really good question how do the routers know so back in the day and in some cases still it's literally hard-coded you can think of a router as having essentially like an excel spreadsheet in its memory with at least two columns one of which is an ip address the other of which is like the direction it should go out on like right left up and down like the cables aren't going in four different directions certainly but you can think of it metaphorically in that way it tells the router that if you receive data for this ip address send it out on this cable or if it's for this ip address send it out on that cable and all of these cables are connected to other routers in the same city in different cities across an ocean to some other endpoint that would be very painful though if humans had to manually configure all of the interconnections we saw on mit's map just a bit ago and so it turns out there's other protocols out there that we won't spend time on in this class but that routers rely on in order to dynamically adapt so long story short there are protocols that will figure out if all of a sudden my packets are not getting through to brian i'm going to start routing around that dynamically and the routers are going to figure out that does not seem to be a good destination because i'm not getting any response or it's just taking way too long to hear back so there are protocols that govern how you can decide whether to start dynamically changing those so-called routing tables the spreadsheet to which i referred earlier all right so we have now at this point an infrastructure known as the internet that allows us to send packets of information from point a to point b by writing addresses and
port numbers on the outside of those envelopes we have another protocol called http which is specifically used for web browsers and web servers separate from video conferencing and chat which have their own set of conventions and protocols but we have a mechanism for get requesting information and responding with information and we know from problem set four how you can respond with a cat it's just a sequence of bits whether it's a bitmap or a jpeg or something else but we haven't yet seen what an actual web page looks like and indeed if we look a little deeper in the envelope that i'm sending to brian and he's sending to me and you're getting back from harvard and we're getting back from yale we're going to see another language altogether it's not a programming language per se it's what's known as a markup language which just means it's more about aesthetics than it is about logic and there's going to be a couple of other languages tucked in there css cascading style sheets javascript which is a proper programming language but let's go ahead and take a five minute break here and when we come back we'll learn to make web pages themselves all right so when you visit a website requesting the home page or a specific file on the website exactly what is inside of the virtual envelope a little deeper down below the http headers that you get back from the server well that language is known as html hypertext markup language which indeed is not a programming language which means there's no loops there's no conditions there's no functions or variables per se it's just text that tells a browser fairly pedantically top to bottom left to right what to display and how so let's take a look at some examples an html page is going to contain really two different concepts inside of it what we'll call tags or elements and also attributes well what are those well here is perhaps the simplest web page we can make and this is html itself and you'll see that it's structured in kind of a 
symmetric way some things are indented like in a proper programming language but there is some symmetry to what's going on here so let's tease apart top to bottom exactly what we're looking at here this very first line is known as a document type declaration long story short whenever making a modern web page this should just be the very first line of your file no matter what it signifies that you and i are using the latest version of html which is version 5.
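The slide itself is not part of this transcript, so the page below is reconstructed from the spoken walkthrough (doctype, an html tag with lang equal to en, a head with a title, and a body); its exact indentation is an assumption. As a rough sketch, Python's built-in `html.parser` can read that text top to bottom and report the declaration, each open tag with its attributes, each close tag, and the text in between. The class name `Outline` is made up for illustration, and this is far simpler than what a real browser does.

```python
from html.parser import HTMLParser

# Reconstructed from the lecture's description; layout is an assumption.
PAGE = """<!DOCTYPE html>

<html lang="en">
    <head>
        <title>
            hello, title
        </title>
    </head>
    <body>
        hello, body
    </body>
</html>
"""


class Outline(HTMLParser):
    """Record what the parser sees, indented by nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def _add(self, text):
        self.lines.append("    " * self.depth + text)

    def handle_decl(self, decl):
        self._add("declaration: " + decl)

    def handle_starttag(self, tag, attrs):
        self._add("open " + tag + (" " + str(dict(attrs)) if attrs else ""))
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1
        self._add("close " + tag)

    def handle_data(self, data):
        if data.strip():  # skip whitespace that only separates tags
            self._add("text: " + data.strip())


outline = Outline()
outline.feed(PAGE)
outline.close()
print("\n".join(outline.lines))
```

The printed outline makes the symmetry visible: every open line has a matching close line at the same indentation, with the text sitting one level deeper.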
in the future this line will probably change as html itself the language evolves as humans add more and more features to it below that notice is a pair of what we're going to call tags tags are things between angle brackets that start with a word like html or some succinct phrase like that optionally with something like this word and an equal sign and maybe something in quotes after that but highlighted in yellow here is the first of our html tags and coincidentally this tag is the html tag and the way it works is as follows when a browser receives an envelope containing text like this it first reads that first line and says okay this file contains html version 5 what comes after it oh here is the contents of the web page it says hey browser here comes some html notice down here is sort of the opposite of that statement when you get to the end of this file you'll see a similar looking tag but there's a forward slash in front of the same word html that's what we'll call a close tag if we think of this as an open tag or if you think of this as a start tag this is an end tag and most tags indeed have that symmetry whereby when you open them once you should eventually close them ideally in the appropriate order notice that you don't have to repeat other stuff when you close a tag you just mention the name of the tag to keep it fairly succinct and that means hey browser that's it for the html all right what's inside of that if we look down below this you'll see that there's this thing here which is what's going to be called an attribute attributes tend to be short succinct phrases that have some special meaning for that particular tag this particular attribute if you read the documentation for the language html will say that if you add lang equals quote unquote something to your html tag that's going to be a clue to the browser that says hey browser here comes html and by the way the contents of this webpage are going to be in english at least in this case by
default for en every language in the world has its own two character or three character code that can be placed inside these quotes that will standardize exactly what the browser interprets it as useful these days if you have like translation enabled in your browser it knows what language the page is written in so that it can help you translate it to your own spoken language all right below that there's two sets two pairs of tags the head tag here and the body tag here and i've highlighted them both at the same time because you can think of these as both children of the html tag so if we borrow our metaphor of a family tree and some kind of hierarchy here if you think of like the html tag as being like the parent so to speak this parent has two children a head tag and a body tag each of which is respectively opened and closed let's consider the first one the head tag what's inside of that so to speak inside of that is the title tag which as you might guess by now is going to represent the title of the webpage we're writing specifically the title of this webpage is going to be literally and just goofily hello comma title so that's what you would see in like the tab of this web page let's back up a little bit and look now at the second child of the html tag the so-called body tag this is going to be the big rectangular region of the web page otherwise known as the body or viewport and here we see that the contents of that rectangular region of the page is going to be literally hello comma body so that is to say this is the html for a fairly simplistic web page whose title bar in the tab is hello comma title and whose body in the big rectangular region is quite simply hello comma body and it's perhaps helpful now to call out explicitly that we can think of this a la week 5 as really a data structure even though it's just text inside of that envelope that gets read top to bottom left to right what the browser is actually
going to do on your laptop or desktop or phone is actually build a data structure in memory so microsoft who wrote edge or google who wrote chrome or apple who wrote safari wrote code that reads html top to bottom left to right like a big long string parses it that is analyzes it and builds up in the computer's memory a tree-like data structure like this much like for problem set five you built up your own hash table in memory for what was otherwise just a big text file of words so you can see the hierarchy here if you think of the whole file as being the so-called document we'll draw a node so to speak in this tree here the very first and only child of that is the html tag indeed every web page has to start with that html tag it has two children as i proposed head and body respectively and then head has a title child and that has a child itself which is just text and just to be a little nitpicky i've deliberately drawn these nodes in slightly different shapes just to connote that html head title and body are indeed all tags opened and closed these ovals here are just text those are not tags themselves that's just raw text here and here and then the document node is the one random one this is the only thing that's going to start with an exclamation point typically unless you have what we'll call comments in html which are just notes to self that we saw in c and in python there's similar syntax for those all right with that said if this is the simplest web page we can make where do we make it how do we make it so you could certainly just open up your mac or pc and open up something like textedit or notepad.exe and type this out copy and paste it save the file and open it in your browser but that's not that interesting because if you just save an html file on your mac or pc you are going to be literally the only one in the world who can visit it so ideally you want a server on which you can write and save your html so that other people
your users your customers can visit the file via the internet now thankfully we all have access to a tool already called cs50 ide which itself is a web-based tool for writing code and the code we'll start right now just happens to be in html so let me go ahead and do that let me go ahead and open up a new file i'll go ahead and call this say hello dot html dot html being the conventional file extension and then let me just go ahead and retype that so doctype html says hey browser here comes version 5. html lang equals quote unquote en and now notice what the ide is doing for me for better or for worse depending on your preferences it's going to try to complete your thoughts for you so you can just type less this is increasingly a feature of ides integrated development environments because now i can type roughly half as much now i'm going to go ahead and open the head of the page notice it got automatically closed i'm going to go ahead and open the title of the page that will automatically close as well and let me go ahead and just do something like hello title and then down here outside of the head tag i'll do my body tag and do hello comma body now strictly speaking this indentation is not necessary if i wanted to be a little more terse and not use as many lines this is totally reasonable as well and it's probably reasonable up to a point if i had a crazy long title i probably should move it to a line of its own but again these details are not going to matter to the computer to the browser reading this but they certainly make it prettier and easier for me the human and presumably you to read as well so i've gone ahead and saved this file and in the past i would have used like make for c or i would have used python for python but neither of those is applicable because we're not writing or running code i now want to visit this web page and how do i do that well i need a browser and i'm all set there obviously i can use chrome safari whatever on my own mac but i also 
need a server and it turns out that cs50 ide insofar as it is already a web server that we use to write code we can use it as a web server to serve our
2021-01-02 02:47