Hello, and welcome to this lecture on networks. My name is Richard Harvey and I'm one of the professors here at Gresham College, and greetings live from Barnard's Inn, which is the HQ of the Gresham enterprise. As usual when you're talking about a big topic like networks, there's a lot of selectivity that goes on, so let me get that out of the way first: I'm not going to talk about everything, but I'm going to talk about what I think are some key developments in the development of the internet.

I thought I'd start right at the beginning, as it were, with this fascinating record taken from the Library of Congress, which you might not be able to read on the screen. It's one of Samuel Morse's first telegraphs, which was sent in May 1844. He demonstrated the telegraph in the US Congress and he sent a message to his collaborator Alfred Vail, who was in Baltimore, and these were the strange words that he chose: "What hath God wrought". It was a question that I think he took from what sounds like the Old Testament, Numbers, doesn't it? That is the message he sent, and to prove it was received, Alfred Vail sent it back: he sent back the same message. That's an example of ARQ, which is an important part of the internet and which we will discuss in a moment.

So that was telegraphy, and I picked that out to point out that data comms, or data communication, is considerably older than the internet. But if we pull forward to the start of the internet, I've got a little clip of one of the pioneers of the internet, one of the so-called fathers of the internet, a chap called Kleinrock, and this is him talking about how the internet first started. I should say Leonard Kleinrock is based at UCLA in California, and he's discussing the first link, really, of what was then called ARPANET and later went on to become the internet.

...simply was to type in "log". Now, up at the other end there was another
programmer waiting to watch all of this, and we had a telephone connection between these two so they could talk to each other. What happened is Charlie typed the L and he asked, "You get the L?" and the answer was, "Got the L." He typed the O. "You get the O?" "Got the O." He typed the G. "You get the G?" Whacko! The system crashed. This machine went down. So the very first message on the internet ever was "lo", as in "lo and behold".

Yeah, so that was Leonard Kleinrock explaining how another rather portentous message was the start of the internet, and that was a communication between UCLA and an organization called SRI, the Stanford Research Institute. What Leonard also mentioned in that clip was that they were using standard telephony in order to check whether this thing was working. Telephony, and data communication using telephony-type circuits, has a much older history than the internet, and I thought I would briefly draw your attention to the difference between the two types of communication.

So on the left-hand side, your right-hand side, we've got what might apply when we're talking about telephony. Telephony tends to be about streams of data, video, audio and so on, and the order in which they arrive matters; that's what I mean by causal. If we change the order of words in this sentence, it doesn't make sense: they have to arrive in the right order. But you can stuff the signals down a finite bandwidth and still not corrupt them very much, and indeed some signal loss is tolerable in voice communication, which is helpful. If there are some interruptions you say, "I'm sorry, could you say that again?", or you can guess what was said.

Data aren't like that at all, really. Data communication tends to be very bursty: the computer doesn't do anything for a while, then suddenly it needs to dump huge amounts of data on you. The order in which things arrive might matter, but it often doesn't. We
just want the data. We don't mind whether the last page arrives first; it doesn't really matter, because we're going to assemble it all at the other end. There's a benefit in having more bandwidth: we can send more data and we can send it more quickly, whereas if we've only got one telephone call to make we only need one telephone line; we don't need any more for that. But data loss is generally intolerable in the computer world.

So these rather different ways of communicating caused a bit of a problem in the early design of computer networks. Point-to-point communication between computers was well known, and generally what you would do is use a telephone line, sometimes called a leased line. If you were a consumer, you might buy one of these big boxes. It's called a modem; I don't know if you remember those. This is, I think, the first British modem, actually; it's called the Datel Modem Model 1A. It wasn't the first: the first one was probably something called the Bell Model 101.
I didn't trust any of the photographs on the internet of the Bell 101; it seemed to look like a fridge, which didn't seem to me likely, so I've shown you this one instead. If you were a consumer you might communicate using something called a bulletin board. You would dial up some remote computer, you would read all the messages on the bulletin board, you might add your own, and then you'd ring off. It was a sort of blog, but you only had intermittent access to it. They were very, very important in the mid history of the internet.

For business-to-business communication this was also quite common. If you had a business with multiple sites you would lease a line, in this country from BT or in the USA from AT&T, at vast expense, and then you would either permanently connect these two computers or you would dial them up using a box that looks like this. And that was really what the telcos, the telecommunication companies, thought computer communication should be like. Partly it was technological: they had invested heavily in what's called circuit switching. If you think of old-fashioned telephone operators saying "Who do you want to be connected to?" and plugging you in, that's essentially how the telephone system works even now. That circuit-switching idea was dominant, but the financials around circuit switching were dominant too: telephone companies existed to charge you by the minute for voice calls, and they didn't really know what to do with data. So this was all coming to a crunch in the 70s, when people realized that there was a potential for computation.

Before we get to the invention of the internet, however, I'd like to call out a network that most people regard as the precursor to the internet: a network that wasn't the internet, but is incredibly important and influential. To do that we have to zoom
across to Hawaii. The University of Hawaii in those days, and I'm talking 1969 or so, had four campuses in Hawaii, and there weren't any leased lines or telephone lines running between them that were easily obtainable, but there were radio communications. The main campus, which is shown here in pink, had a mainframe computer, and what the scientists wanted to do was give people at the remote campuses access to that mainframe. So they devised this network called ALOHAnet. The person behind it was Norman Abramson. Now I should just point out that I've mentioned Norman Abramson and I haven't mentioned the four or five collaborators who worked with him. That's really for want of time; almost all of these developments were large team developments, so do go and consult Wikipedia, Google, all the usual sources if you want the full list of influencers here.

Now, how did this work? Well, it had two radio channels, and the first channel was very conventional. They would send out a little message from the first station, which was where the mainframe was, and that would radiate and hit all of the substations. That's really easy; that's like broadcast radio, it's just broadcasting some data, no problem. What about the return channel, though, or the backhaul, as it's sometimes called? Well, they only had two frequencies available to them and one of them was used for the transmit, so all of the other stations had to share the other frequency.

Now, they could have done it as I've just drawn it on this slide. Did you notice that they didn't all transmit at the same time? It was as if they were in a rather synchronized, orderly dance: after you, after you, after you. That's called time multiplexing, where you divide the time into slots and you say your slot is here, and then your slot is here. That would have been a very conventional solution. The problem with time
multiplexing, of course, is that most of the time computers are quiet, so the slots are unoccupied, and then one of the computers has this desperate need to burp out all of its data immediately and it's only got one fiddly little time slot. So it doesn't make efficient use of the bandwidth. Nevertheless, that would have been the conventional solution. Or they could have used a different frequency for each station; that would have been another solution, called frequency-division multiplexing.

What they designed was something that was just complete heresy at the time. What they decided was that they wouldn't care if the pulses overlapped. If one of the stations overlapped with another station, of course there would be a great big mess and nobody would hear anything; you'd just hear a jumble of signals. It's called a collision in our parlance. But what would happen is both stations would then back off for a random amount of time and try again. It's a bit like one of those conversations where you both talk at the same time. Instead of doing "after you, after you, after you", what you should do if you want efficiency is throw the dice and say, "Oh yes, you got a one, I got a six. You wait one second, I'll wait six seconds. You win, you go."

Now, that random back-off was one of the innovations of ALOHAnet which we have carried forward to the internet. And this idea of what's called a contention network, where each party shouts for space and, if they get the shout, everyone else keeps quiet because they can hear them, has been taken forward on networks across the whole of the known planet. Quite radical, quite interesting and rather anarchic; definitely a highly anarchic system.

Okay, now a brief deviation. This problem of how to communicate in noisy or corrupted
environments has actually been generalized into what's now called the Byzantine generals problem. To be fair, the Byzantine generals problem is a bit more general than what I'm talking about here. The true Byzantine generals problem is as follows. There are multiple generals on your side and they are all looking to attack. In this case the citizens here in blue are on the tops of the hills and they wish to attack the enemy, who are in pink at the bottom of the hill. If they coordinate their attack they will win. If they fail to coordinate they will be slaughtered. So it's very important that they work out how to coordinate an attack properly. Now, in the general Byzantine generals problem there are multiple generals, and some of them are turncoats and some of them are liars.

Now you can see why it has to be called Byzantine generals. Leslie Lamport, who wrote quite a lot on this problem, was desperate to rename it, because I think originally it was called the Chinese generals, and it's not very flattering being thought of as a turncoat, is it? So he was going to rename it Albanian generals, on the idea that there were no Albanians about, Albania being a closed society at the time, and then his friend pointed out there were lots of Albanians and they would be very upset. So it became Byzantine generals.

Now how does the Byzantine generals problem help us? Well, actually, as I've formulated it, the Byzantine generals problem does not have a solution: there isn't a perfect communication scheme that can circumvent it. But I can illustrate one of the features that was also a feature of ALOHAnet and has become a feature of the internet. It works like this. General one decides to send a message off to general two, and general one knows full well that the message might be intercepted and it might be corrupted. So the message says something like "Let's attack at five o'clock this afternoon", and this general
then acknowledges this message and sends it back. So in this case, so long as you have received an ACK back from the general which says "I acknowledge your message saying that we will attack at 5 pm", then you have a little more security that you are both able to coordinate. Now as I said, this is not a perfect solution. If you've got really Byzantine generals, of course it might be a lie, of course they might have spoofed the message, but the ACK makes it much less likely that you will have a problem.

In general this idea is called ARQ, automatic repeat request: the idea that in order to have effective communication between two parties you essentially repeat part of the message, or you acknowledge part of the message using some special signal. It's a very important part of the internet protocols, and again it was first developed in ALOHAnet.

There are various ways you can deal with ACKs. On some mediums, like wires, you can listen while you transmit to check for collisions, so if you don't hear your own voice coming back at you, as it were, you know there's been a problem. On other ones you use receipts: you say, "Well, I sent this out but I didn't get an ACK back, therefore probably something's happened, so I'll send it again." So ARQ can work like that.

Now how does this all tie in to the internet? Well, the internet is of a slightly later vintage than ALOHAnet, and it started with a network known as ARPANET, named after ARPA, the Advanced Research Projects Agency, which was essentially a way for the US military to fund things that they thought were important. There have been various conspiracy theories about why ARPA funded this network. They've all been denied; it wasn't some part of global domination or anything like that. As far as we can tell, they funded it because they were funding lots of research groups in different parts of the US and they got
frustrated that they weren't communicating properly, and so they said, "Well, we'll fund a network that allows them to connect." The early versions of ARPANET didn't use the protocol that we're going to talk about, but the real innovation was this protocol called IP, the Internet Protocol. IP, and all of the stuff associated with the internet, is half about the standards and what you do, and half about the bureaucracy of how those standards are introduced and regulated. I think it'd be fair to say that lots of internet pioneers would be horrified to hear me use the word bureaucracy; they don't think of themselves as bureaucratic at all. But the standardization of these protocols is very interesting and probably worth a lecture in its own right.

What actually happens is these documents are published, usually in plain text like this, and they're called RFCs, Requests for Comments. Well, I suppose you could comment on them, really, but they are in essence a statement of what is going to happen. What's interesting about them is that they are a permanent record of what happened. They don't get rescinded, though they might get replaced, and if that happens you can track the replacement on the website of the IETF, the Internet Engineering Task Force. There's a fascinating lecture on YouTube, actually, about the bureaucracy of the internet and how it was briefly tussled away from the Americans, from ARPA, who were responsible for it.

Anyway, this is the RFC for IP, the Internet Protocol, and what IP does is specify the format in which data should be sent. This is actually a quote from the most famous RFC in the world, and it looks like this. I'm sorry it's such a grotty figure; this is how they write them, and I think in the interests of historic veracity you should see the original. I'll just quickly explain it, because you can see some of the
features of your own internet connection in these diagrams. The first thing that comes is a number, and the number says whether we are working with Internet Protocol version four, which almost everyone was until recently, or six; if it's six, it's a six. The next one is just a number that tells us how long the header is going to be. The next thing I want to comment on is how long the header plus the data is. That's a sort of check, because if that doesn't add up to what you've really got then something's happened. Then we've got some flags or counters as to whether something can be fragmented. This is quite important: in the internet world you can't rely on every part of the internet having exactly the same size of computer or machine, so you need a way of splitting the data into chunks. That's another difference from the circuit-switched or telecommunications idea. Telecommunications tends to be associated with very rigid protocols to which everyone agrees. This is not a rigid protocol; this is a marker saying are you allowed to fragment this or not.

I'll talk about this later: this is an example of an error check over here. This is a flag telling us whether this bit of data is to do with controlling the network or whether it's got data in it at all. And then this is an interesting one: this is how long this block of data is allowed to live in the network, nominally measured in seconds. That's important for something like voice over IP. When you're trying to communicate with someone by voice, if your packets get lost in the network they're not going to be delivered on time, so you just mark them with a time to live, and the router sees them and says, "Oh no, you've been around too long, sorry," and it kills it. TTL. What's this one? This is an identifier that allows you to reassemble fragments if you need to. Down here we have some security options, and I'll talk a little bit about those later. And then it all ends on the end of a 32-bit block,
and the bit that I've probably forgotten to talk about, but it's the most important bit, is the source address and the destination address.

Now what is implicit in all this, but perhaps isn't obvious, is this: if you want to communicate on this network that we're calling the internet, your computer should construct a block of data like that and attach it to the data that it wants to send. It's a bit like a postcard. It's got the information on the card that you want to send and it's got the address on the front of it, and it just slings it off into the network, just like a postcard. Just as you don't particularly worry about how your postcard got to you, you don't worry how the post office chose to deliver it, you just worry whether it arrives or not, this is exactly the same idea. This is what we mean by a packet of data. You'll see the word datagram used, which I'm not very fond of, but it is a popular word: data combined with gram, as in telegram. So this is what the header of a packet of data looks like.

Now what does the actual data look like? Well, a couple of observations about this before we move on. This is obviously quite complicated; it's fiddly and intricate, isn't it? You look at it and you think, good grief, imagine writing a program to strip out all of that stuff. Getting all these fiddly bits together was the job of the RFC editor, who was a man called Jon Postel. He's famous for Postel's law, which is the way you should deal with these packets coming into a computer. Basically it says if they're a bit of a mess coming in, you should listen to them and try to work out what's been going on, but you should not send out packets that are malformed. That's Postel's law, which is the famous law of internet engineers.

Now then, the Internet Protocol is the simplest of protocols, and you can see already that it's got some issues. The first one that may be evident to you is:
well, my computer's got lots of things going on. It's got a web browser going and it's got music being listened to and all those sorts of things. How do all these packets get routed when they get into my computer? Ah, well, to do that we need some additional addressing, and this is where the User Datagram Protocol comes in, which is the simplest protocol I could find. This yellow stuff is the bit I've just been talking about, so that's the header; I've called it the IP header. What we've got down here in green are two additional destinations. They're called ports, and they are internal addresses within your computer. So, for example, port 80 is conventionally used for your web traffic, so if something arrives and it's got port 80 in here then your computer knows to send it to your web browser, so that your web browser can deal with it and assemble the packets into something meaningful. We've also got some additional stuff here, like how long it's meant to be, and the checksum. There are quite a few of these little checks around the place, you'll have noticed, and I'll talk a bit about them. And then comes the data. It doesn't specify what the data is; that's entirely up to the two computers that are talking to each other. Of course there are protocols for certain types of data, web data and so on.

Now what about these checksums and checks and lengths and all that sort of stuff? That's a fascinating topic in its own right; it's called error detection and correction. So here's the question: how do you know if some data in front of you has been corrupted? Interesting question, isn't it? What you would do normally, if you were, say, an English reader, is look at the text and say, "This looks like gibberish." You're using your semantic understanding of the text to say, "Well, there isn't a word like that; I can work out what's happened: they've replaced an A or they've missed this letter." That's a sort of semantic
check. Computers don't do that, at least at this level they don't. What they do is augment the data with checks and balances. The simplest check you can think of is something called parity. You count up the number of ones in the data block, and if it's an odd number then you add a one. I always get this the wrong way round: that's even parity, because you're making the total number of ones even. Or there's odd parity, where you make the number up to be odd. Or you could sum the number of ones in the data; that's called a checksum. So you could say, "Well, there are 500 ones in this data, so I'll write the number 500 into one of the checksum fields."

In fact there's an interesting set of codes that not only allow you to know that something's been corrupted, they carry enough information for you to correct it. How cool is that? That's like a postcard that's got all smudged, and you give it a shake and somehow it rewrites itself to be as it was written originally. They're fascinating; there's a whole lecture to be written about those codes. They're called forward error-correcting codes. The person who thought of them was a guy called Richard Hamming, and Hamming was an interesting man: he shared an office with Claude Shannon, the man who invented information theory.

Now I'm sure you've noticed that so far this doesn't quite fit with what you're used to on the internet. What I'm describing here is something that slings packets out, and they might come back, they might not. It's a fire-and-forget protocol. You might feel like that's happening sometimes on the internet, but it isn't; mostly you've got reliable communication. In order to do that you need another protocol, and that's called TCP, Transmission Control Protocol. You'll often hear them said together in polite company, TCP/IP; they go together like Morecambe and Wise, they're a pair. Now, the
TCP is like a bucket brigade. If you don't know, a bucket brigade is a way that a fire service might put out a fire: you form a great big long line of firefighters, and one gets the water out of here and they pass the water from one to the other and pass it over there. I couldn't find a YouTube video of a bucket brigade, so I found something similar; this is my visual illustration of TCP. Perfect. Now, I'm not sure that's approved by British building practices, but you get the idea. What's happening there is that there is some complex interaction between the members of the bucket brigade: over to you, and so on and so on. That's what TCP does. TCP is an ARQ protocol; it has these acknowledgements built into it.

It looks a bit repulsive on the screen, doesn't it, but it's not that bad. This is the old IP header that I explained earlier, and this isn't a UDP header, this is a new header, though it looks a bit like one; it's called a TCP header. It has these ports, which say where to route it in your block of flats, if you like, but it has some additional features in here, namely an acknowledgement number. TCP allows computers to do what's called a handshake with each other, so this is an example of the famous three-way handshake that goes on in TCP. If A wants to communicate with B, what happens is A says to B, "I want to communicate with you and my sequence number is x." That means the number I'm assigning to the first one of my packets is x. Then B says back to A, "Ah yes, I acknowledge your sequence number is x, thank you." That's the ARQ; if there are any Byzantine generals lurking around, that's dealing with interference and all that sort of thing. Then B also says to A, "And my sequence number is y," and A says, "Ah, I acknowledge your sequence number is y." So we've got this double, treble sort of handshake going on, and at the end of it they've established communication. So now when A or B sends a packet it has this unique number in it, the
sequence number here, and the packets can be assembled in the right order at the other end. This is very important because it means that, as far as the programmer is concerned, TCP is a bit like opening a telephone circuit. You can now just fire packets at a TCP port and they get assembled as if by magic at the other end. They're not traveling through the network in order, far from it; some of them are going via Moscow and some of them are going via, I don't know, Miami, but they all get assembled at the far end. So as far as the programmer's concerned you've got this easy-to-handle programming model, and that's really, I think, why TCP/IP is so popular.

Well, TCP originally was implemented on a system called an IMP, and this thing on the right-hand side here, which is the size of a refrigerator, is the Interface Message Processor, the IMP. It was devised by a very interesting company called Bolt Beranek and Newman, who are now part of Raytheon, I think. BBN were given one of the presidential medals, I think under the last but one president, Obama, because they were so important. They actually started as an acoustics company, but for various reasons, which you'll have to go and read about yourself, they got into the internet and were very influential and did some brilliant engineering at very short notice.

But it's also been implemented over carrier pigeon. This is the only picture I could find of a carrier pigeon at short notice; this is Speckled Jim from Blackadder. So it's fundamentally this block-based protocol that gives a convincing impression of continuous data flow as far as programmers go.

Right, ladies and gentlemen, you now know as much as you need to know, and I think probably more, about TCP/IP. Now, you've probably got some questions; I know I had when I went through this. The first question is: how does anyone know an address? That's
a bit like the post office question: how do I know your address to send you a letter? Well, "you tell me it" is one answer. There's a central registry is the answer to that one, called the Domain Name System, and the Domain Name System allows you to convert human forms of addresses, like www.google.com, into a number. What about these collisions? I haven't said very much about that; well, I'll talk about that in a moment. And then another one that might occur to you is that everything's in the open here: the data, the addresses, the ports. Anyone can listen and read. Isn't that slightly alarming? So here are my answers to those things.

The addressing protocol is not really very difficult to deal with. There's a distributed set of address servers, and the idea is that your computer is pointed at the nearest available one, and if it doesn't know the address it asks another address server, and so on and so on, until either the name is resolved or it isn't. So that's an easy one.

However, congestion, that is an interesting question, particularly as almost everyone on this call will have spent months recently on Zoom or Teams calls bemoaning their internet bandwidth, and presumably, I was going to say, shouting at their children upstairs to stop internet gaming. But of course it might be the other way around: maybe the internet gamers are watching this and they're shouting at their parents to get off a stupid Zoom call with work.

So this is a map of the ARPANET as it was in 1984.
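Going back to the address lookup described a moment ago, here is a minimal sketch in Python of a name server that answers from its own records and otherwise asks another server. The `NameServer` class, the host names and the addresses (taken from the range reserved for documentation) are all made up for illustration; real DNS has its own wire protocol and is considerably more involved.

```python
class NameServer:
    """Toy name server: answers from its own table or asks another server."""

    def __init__(self, records, upstream=None):
        self.records = dict(records)   # name -> address this server knows
        self.upstream = upstream       # the next server to ask, if any

    def resolve(self, name):
        # Known locally: answer straight away.
        if name in self.records:
            return self.records[name]
        # Nobody left to ask: the name is not resolved.
        if self.upstream is None:
            return None
        # Otherwise pass the question along, and so on and so on.
        return self.upstream.resolve(name)


# Hypothetical two-level setup: a nearby server that forwards to a far one.
far_server = NameServer({"www.example.com": "192.0.2.10"})
nearest_server = NameServer({"printer.example.com": "192.0.2.20"},
                            upstream=far_server)
```

Asking `nearest_server.resolve("www.example.com")` is answered by the far server, while a name that neither server knows comes back as `None`, which is the "not resolved" case.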
It moved around a bit in the early years, and the eagle-eyed might spot that poor old Britain isn't even on the network. We dropped on and off ARPANET and IP because a lot of Europe couldn't decide whether this IP thing was a good idea. There was a bit of a debate early on as to whether the internet should be based on Internet Protocol. There was great suspicion about these contention networks: people didn't really like them, they're not very controlled, and the telecommunications companies didn't like them.

Anyway, there's a little cluster up here on the left-hand side which connects the University of California at Berkeley, Lawrence Livermore Laboratory, and LBL, which stands for the Lawrence Berkeley Laboratory. LBL and Berkeley are about 400 metres apart, but they're connected by a bit of a circuitous route going via Lawrence Livermore. And there was a bit of a curiosity going on, because in the 80s the internet just died; the internet crashed. So I want to talk a little bit about that and what was happening.

In order to do that I want to say a little about how TCP communicates using this ARQ protocol. What you might imagine happens with TCP is that A sends off a packet to B and B sends back an acknowledgement, so you've got this to and fro of information. This is an example of communication using a single-packet window, and it's horrendously inefficient, because in the amount of time it takes for this packet to cross, say, the ocean or some satellite link, I could have sent more packets. So this is a terrible communication mechanism, in the sense that one side goes "Hello", "Oh, hello", and there are great big pauses. So there's tremendous pressure not to do this and to use larger TCP windows, which would be more efficient. So let's put, say, four of them together and only send back a single
acknowledge when we get all four now we can do this with tcp because the packets are ordered so although they might not travel in like a little train we can assemble them so they look a bit like a train so that's a sort of practical uh proposition and we often say that tcp therefore is self-timed using the acknowledgement signals so larger windows make more efficient use of the link but and i'm sure you're ahead of me the larger you send these bursts the larger these bursts are that you send the more you're sort of dominating the whole link so the more likely it is that you're either locking out other people or causing congestion and when you've got congestion um the way the internet works is you back off so it's oh i didn't get an acknowledgement back oh well i'll wait for a bit of time and i'll resend the data so it's a bit like one of those irritating people at dinner parties who keep telling you the same old thing again you know because because you failed to give them a positive acknowledgement you sort of went yes yes they tell you it again very irritating my recommendation is to be more like the internet and say yes thank you we've heard that long enough uh that will as a positive acknowledgement it might shut them up so none of this was really sorted out and in 1986 the internet crashed and uh it was this guy van jacobson who realized what was going on and he produced this little early little graph of the problem and what he's got here what he's plotted here is the send time across one of these links and the packet sequence number so tcp has sequenced packets so what you'd expect this is to go up you wouldn't expect it to do this this is a re-transmit it's oh no all the way back to packet 10 again oh so back in 50. 
and this had sort of essentially taken quite a speedy link down to almost nothing and that was the beginning of congestion control and congestion control is a hot topic in the um internet of today i'll just briefly explain how it works the idea is that um so why don't we just start slinging data down the uh the network the answer to that is we don't really know how much bandwidth we've got available because it's a distributed network and it could change so that leads us to an algorithm and this is jacobson's first uh algorithm it's they're named after places so this is tcp tahoe and it's an additive increase multiplicative decrease uh algorithm aimd so what you do is you start with a single packet and you say here's my first packet and you get an acknowledgement back and you think oh great that worked next time i'll send two packets i got an acknowledgement back next time i send three packets next time i'll send four packets and we keep doing that until we reach the advertised buffer size of the receiver and i don't expect you to remember this but there was in the protocol a possibility for the receiver to say this is this is how much data i'm expecting uh each time so once that has reached that maximum buffer size that's fantastic and we're now throwing data down and we're getting good channel utilization as people would say we keep doing that until we don't get an acknowledgement back the moment we don't get an acknowledgement back we we're going to we assume we've got congestion and a sort of safe assumption with congestion is you had sole use of the link so if someone else came along and caused congestion there's now two of you trying to contend for the link so let's halve the window size so we'll give that other person 50 percent so very sort of fair and socialist sort of thinking so that tends to lead to a pattern of transmission and back off that looks like a sawtooth you've got this increase and the drop an increase and a drop and you 
put all these sawtooths together and that's how the internet that's how it works and uh when you're doing that properly it's called riding the sawtooth so if you hear people talk about riding the sawtooth that's what they're talking about they're talking about uh adaptive congestion control beautifully adapting to the uh the stochastic nature of the channel now you might be saying oh come on richard you know this is all very sort of arcane and wasn't it all sorted out in the in the 80s it can't really be a topic for current conversation but that's not true it is very much a topic for conversation so um if you're if you're wondering why your um internet isn't working at home i mean there might be multiple reasons but i mean have a look at bufferbloat.net which is a fascinating website run i think by jim gettys and dave taht who are just sort of you know enthusiasts really looking at congestion problems and what they pointed out is um this situation can easily occur even with modern congestion control so let's imagine you're doing a backup from your computer to a server so you're going to send out some packets and those packets pretty much fill up the buffer of the server and they fill up the channel i don't think i said what a buffer is a buffer is a queue right it's a block of memory that serves as a a queue first in first out queue now once it's got here i'm sort of indicating that we've reached your modem probably has a big buffer in it as well so as we start to fill up that big modem the modem is communicating back to your computer and it's giving you acknowledgements and at the moment it's giving positive acknowledgements it's saying well i'm not full keep sending me the data keep sending it to me and your computer's loving this you know it's just piling stuff into the data and as fast as it can go so you're giving this very convincing impression of everything working perfectly meanwhile you decide to do a bit of web surfing because it's taking a bit of 
a while to do this backup you know now web surfing also takes place over tcp so that's got a bi-directional protocol both of you are sending acknowledgements opening a web page has quite a few little handshakes going on as to what address is this then have you received this data yes i've received it all that sort of stuff the trouble is your buffer's full with all of this backup stuff so your acknowledgements don't get through and because they don't get through um the web server that's trying to communicate with you just says oh well he must be on a bit of wet string i'll just throttle back my bandwidth because i don't need to deal with you so that's buffer bloat and it's an example of how sort of dark buffers in the internet can cause havoc and to be honest they're not causing havoc with bandwidth bandwidth isn't really the issue here most people's internet connections are big enough you know if you if you work out what sort of internet connection you need in order to be able to stream a netflix film do a zoom call do a couple of internet telephonies and a bit of gaming you you've probably got enough already the problem is latency and latency which is these delays between people that is what you know buffer bloat is a latency problem and it's a it's caused by because all of the data is sitting there in the queue waiting to get there so if you want a quick analogy um you're going to a restaurant and um there's a queue outside the restaurant that's the buffer and it doesn't look so big so you you say well let's get in this queue great then the the maitre d'hotel sort of takes you from the head of the queue and weaves you into the restaurant and in the restaurant you discover there's this vast queue huge queue well you'd never have gone in if you'd known there was this huge queue and um you absolutely wait there for ages meanwhile there are takeout orders and those guys i mean i'm sure you know what happens with the takeout it gets cooked it gets put on the side and 
by the time you get it it is freezing cold so they really need to be serviced immediately they're the equivalent of your voice over ip packets arriving at this buffer saying we need to get through the buffer's full the queue is full so that's what's happening now what's the solution to this it's a very interesting one um and one that's very common to you the solution is just as the solution with the restaurant the solution with the restaurant is one not to have unscrupulous maitre d's who take people into secret queues but also to put out a sign which says we're full go away right that's called tail drop and tail drop is where we deliberately shorten the buffers and we just drop packets as soon as we drop packets uh the upstream version of the internet your computer and so on will realize there's a problem and they will throttle back to make sure that the buffers don't get over full and the bandwidth control all works again so dropped packets trigger the multiplicative decrease tail drops are i say it's fascinating because there's a very interesting book by brian christian and others on them algorithms to live by which talks about how understanding computer science algorithms can help your life and he he draws perhaps rather stretched parallels with tail drop with other things the one that i like is um you know you're probably your employer says when you go away on holiday you should have an out of office reply on your email and it will say something like i'm sorry i'm away you know enjoying myself um but your email will be attended to when i return right that's the equivalent of a vast buffer i don't know if you've ever returned from holiday to find 800 emails in your inbox why did i promise to attend to these things it's madness what you should have said was tail drop you said your email will not be attended to right find someone else right that is the correct protocol and tail drop is an important protocol for a lot of things and is perhaps 
underused now security you've got these packets whizzing around between people and they're human readable so if that was all that was happening two people connected via wire we wouldn't worry but in practice there's often a third person and they might be malicious an interloper like this and so they or they might be not malicious at all they might just be accidentally in the chain and there's lots of ways if you're interested in computer security there are lots of ways that you can uh insert yourself in there so if you if you're on a wi-fi network for example you set up something called a wi-fi pineapple and a wi-fi pineapple is a is a a wi-fi service that looks like the open wi-fi network that you thought you were going to connect to you know so you're in an airport it seems so long ago in the pandemic doesn't it i mean you're in an airport you know you see a bt open zone or an at&t uh server there and you you think well i'll connect to it and it's not a genuine one it's something pretending to be the at&t server it presents all the web pages but it captures your data it's called a man in the middle attack so what can be done about this well you know the the obvious sort of solution uh is encryption so just sort of thinking about this packet structure again we could encrypt this bit which is the data because that's people argue that's the bit that matters well lots of things do that when you use https on the website that's what it does it encrypts this but it doesn't encrypt this bit here the header and that doesn't give you superb security because it means anyone who can sit there can work out what's going on you know so let's call that doing that it's called traffic analysis so the solution two which is often combined with encryption is to control the routing the internet is fundamentally sort of anarchic routing you know the idea is that packets can go anywhere and you shouldn't worry about it but a virtual private network or vpn is an attempt to 
control the routing and the way it works is you're not allowed to you're not encouraged to connect to the internet you connect to a machine that is in some strong room somewhere virtual strong room usually in your employer's basement and you absolutely you this connection here is totally encrypted meaning they can see that your people interlopers can see you're communicating with this machine but they can't see any of this any of the addresses that you're subsequently using because they are themselves wrapped in a packet this thing strips off the packet and sends it out into the internet or wherever so that's how a vpn works it essentially rewrites the headers of your um internet traffic and then the service that you're trying to access sends it back to this vpn machine and it then sends it back to you and you then decrypt it and it's hell on earth and for a lot of employees because these things are slow you know in my experience everyone said no vpn yeah you try using the ones that most people have access to they're really slow but controlling the routing in this way is one attempt at security however i thought i'd talk about one that was just a little bit more current than the vpn and to do that i just need to briefly explain something about how this routing works so if you are in some remote country and i've picked a country that happens to make a great use of this technology syria syrians are very keen on this technology for reasons that you might imagine your packets go through a series of hops so the other day i measured my path to google from my house and i measured 10 hops to get to google and it took them about six milliseconds to get there which is an appallingly long time i must say shouldn't take 10 milliseconds to get to six milliseconds to get to anywhere but you'd probably go up to the moon and back in that time but you know that's what it is this path is quite convoluted and obviously you're worried about this path because anyone along this path 
could read your packets and you might think that just asking for news from a bbc website is not a very you know who cares uh there are quite a few parts of the world where people do care very much whether you're accessing that news website so the the solution uh proposed is up here on the slide it's called the onion router or tor and you'll see people refer to tor browsers and or it's sort of rather graphically referred to as the dark web um i'm not quite sure why it's the dark web but you know it's used for security the original ideas here were funded by the office of naval research in the united states of america and they needed a way for people in foreign territories that weren't that might be hostile to the united states to be able to communicate safely even though the path might be controlled by others and it makes use of a protocol called socks which is a proxy protocol and what a proxy does is it rewrites your internet headers to make it look like they came from that device so i send say my web traffic to a proxy the proxy strikes out my address which is on the packet puts its address in its place sends it off and when it comes back by a fancy bit of technology it says oh yes i know that was one of richard's packets i'll send it back so that's fine because proxy can obscure the the both the the back of the chain if you like so that's good but what about if the proxy is owned by some uh foreign power or somebody who wants to do some harm or just listen to you um what can you do this was the clever thing about tor what tor did was your computer uh it has a path through which it's going to send its things and it communicates with each one of these to get a secret key that allows it to encrypt the packet so firstly it encrypts its packet with the secret key for node four then node three and node two and node one off goes its packet it hits node one node one removes the encryption and sends it onto node two node two doesn't know about your computer 
at this point it only knows it's come from node one and the only thing node two can do is strip off the encryption and what it discovers when it strips off the encryption is yet more encryption so if node two was captured by an enemy agent they can't get they don't know who you are they don't know where you're going they know where this came from but this this also has to be captured by an enemy agent so that's the basis of of tor it's very tricky to capture the whole of this network so you can see how it works you just send off these packets and the onion is the the onion leaves of encryption and these are quite long and convoluted paths which is why tor is quite a slow uh thing to work and eventually uh in this case our service gets a a request for some news and the whole thing happens in reverse on the way back great um and tor is sort of um it's a high it's highly disliked by a number of politicians who feel that they jolly well ought to be able to see everything on the internet um but often and the american security services have themselves complained about the use of tor for nefarious things the irony is is you know it's a curious one isn't it the system developed by them has in fact become known for uh protecting people's uh privacy it's a it's a neutral technology in the sense it has positive and negative benefits but that's the that's the basis of tor now then there seems to be a lot to me that i haven't talked about in this lecture on networks and some things i regret not talking about i hardly said anything about wireless and wireless is really important there's tremendous consumer pressure for wireless communication nobody wants wires dangling around their houses but obviously the problems of congestion and security are much more serious when it comes to wireless and they're not solved you know there's a lot of work going on at the moment on better security and access protocols for wireless latency we have talked about is the achilles heel of 
tcp things take a while if you've ever tried to have a choir practice on the internet it's a nightmare because not only does it take a while for your audio to reach the other side each person has a different latency so you're all singing literally to the different almost to a different song sheet i haven't talked about the internet of things which is a fascinating topic in its own right if you're interested in that there's a very good gresham lecture by martin thomas which talks about the internet of things perhaps a little bit gloomy says there are some security issues with iot and that's that's true as of now but they might get solved security and privacy i've touched upon but i think it's such a fascinating topic that it really deserves a lecture in its own right and in fact that is the topic of the next lecture which is the future of computer security if you want a primer on computer security i thoroughly recommend a lecture by tarah wheeler which took place here at gresham college a few weeks ago and she was talking about whether these sorts of nefarious deeds can be considered to be war crimes thank you thank you very much professor harvey my name is simon thurley i'm the provost of the college and um i've been scooping up a couple of questions for you um first of all you will be amused to hear a piece of chat um anna g from the usa says um here in the us sometimes it's the parents playing games online and the children attending classes on zoom as soon as i said that i knew it would be the other way around i knew that there would be it would be the other way around it would be the kids doing zoom and the parents uh playing candy crush so um here we have someone who didn't quite catch what you said just wants a bit of clarification what is the name of the lecture online about the development of rfcs or something and something about being tussled away ah yes so there's um i wish i could remember you'll have to search on youtube there's a um it's the history of 
the ietf is a uh is a rather long presentation by one of the founder members of the internet and i'm afraid the name escapes me but i will post it in the in the chat when i when i get back home great if you could that'd be fantastic um and the other question we have this evening is what recommendations do you have for further learning about the underpinnings of the internet including its technologies and their security well you know the rfcs i mean it seems bizarre to say read the rfcs but they're very readable and if you read them in historical order they are super great and that's what i went back to to prepare this lecture because what i discovered was there's a lot of people giving overly simplified versions of the internet which is a bit unacceptable so i recommend actually the rfcs start from there professor harvey thank you very much thank you
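A note on the congestion control described earlier in the lecture: the additive increase multiplicative decrease idea can be sketched in a few lines of Python. This is only a minimal illustration of the simplified scheme as the lecture tells it (grow the window by one packet for each acknowledged round, up to the receiver's advertised window, and halve it when an acknowledgement is missed). It is not the real TCP Tahoe implementation, which has further phases the lecture does not go into. The function name `aimd_windows` and its parameters are hypothetical, chosen for illustration.

```python
def aimd_windows(advertised_window, loss_rounds, rounds):
    """Return the window size used in each round of a simplified AIMD sender.

    advertised_window: the most packets the receiver says it will accept per round
    loss_rounds: set of round indexes in which no acknowledgement comes back
    rounds: how many send rounds to simulate
    """
    window = 1  # start with a single packet
    history = []
    for r in range(rounds):
        history.append(window)
        if r in loss_rounds:
            # no acknowledgement: assume congestion and halve the window
            # (multiplicative decrease), never going below one packet
            window = max(1, window // 2)
        else:
            # acknowledgement received: send one more packet next time
            # (additive increase), capped at the advertised window
            window = min(advertised_window, window + 1)
    return history


if __name__ == "__main__":
    # Hypothetical scenario: acknowledgements go missing in rounds 9 and 15.
    print(aimd_windows(advertised_window=8, loss_rounds={9, 15}, rounds=20))
```

Printing the history shows the window climbing to the advertised size, dropping by half at each missed acknowledgement, and climbing again, which is the sawtooth pattern the lecture refers to.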
2021-05-01