FWD50 Extras: Technology Crash Course with Alistair Croll
Good morning everybody. And welcome to another FWD50 Extras. It's a little bit terrifying. This is a bunch of new stuff, and I've actually been working on this content for the better part of half a year now. We were originally going to do this session earlier in the year.
Things kept coming up that were more interesting than this. And so we pushed it forward. But I am going to try in the next 90 minutes to give you a framework for how to think about technology in what we're calling a technology crash course. I will say that this started out as a bunch of lessons on different technologies and that's still in here, but as I tried to find common threads across all the different lessons, I realized that there are some underlying questions and concepts that people need to understand that aren't necessarily about a technology, but about technology itself. And when I asked people, many of them said, you know, I have no idea what that is.
So if you're watching and you're in the comments I have a question: can you really describe the difference between analog and digital? You don't have to try. I'm just curious if you're in LinkedIn watching this and you want to type that into chat. Tell me if you can tell the difference, like, could you walk up to someone and clearly explain the difference between analog and digital, because so much hinges on that. Now, before we get started, I want to talk about where we all are. I'm personally joining you from the unceded Indigenous lands of the Kanien'kehá:ka Nation. But as we're meeting in a virtual environment, we also want to acknowledge from coast to coast, the ancestral and unceded territory of all First Nations, Inuit and Métis people that call this land home.
And today I'm going to give my presentation in English, but if you have questions in French, ask them. I'm confident I can answer them in both languages, and I'm happy to talk with you or discuss this in French or in English. Okay, here we go. First of all, for some context, when I was 11 years old, my father died.
My mother bought me a computer and shortly after a modem, and I spent my summers as a young kid building bulletin board systems, which are a precursor to modern sites like Discord. They were really, really slow. And I can remember reaching inside my computer to push the actual physical chips into the motherboard when my computer started acting weirdly. I remember picking up the phone and listening to the carrier sound, the boot of that modem. And I could go into my computer and type something and hit Enter and I could hear that tone change as the letter I had just typed got sent down that wire.
So I was born with one foot in the analog world and one foot in the digital world. There were half as many humans on the planet the year I was born, which is astonishing in itself. And none of us expected to receive news any faster than a letter. In fact, when we made transatlantic phone calls, we would sit with a timer because the phone company would bill you by the minute.
And we didn't want to get billed for that extra minute. So we would actually time our phone calls, and the latency would be high because the signals would be bouncing off geosynchronous satellites. I'm incredibly lucky to have grown up in that analog world, because it has helped me understand and stay relevant in a rapidly changing world today. And until recently, I didn't really realize what a privilege it was to understand the stack on which tech was built.
My 11-year-old daughter is lucky as well. She has a tablet and a stylus, and really geeky parents. She has access to tools that I could only dream of. And she navigates Discord easily. She pinches and swipes and scrolls as easily as we read and write.
But these technologies are not something we should take for granted. They're changing what it means to be human and leaving behind many as they become more mandatory just to participate in modern society. And I think we're waking up to this huge divide.
So I want to take a step way, way back to really understand what's going on here. If I cut myself and then I eat some cottage cheese, my body turns that cottage cheese into repaired skin. It does it for all of your bodies too, and we don't have to think about it. It just does it. The information for how to turn cottage cheese into repaired skin that looks like the old skin is contained in every single cell of your body. I don't think we spend enough time really thinking about that.
Your entire body is made from atoms that came from the earth, and some of those elements were formed in stars. And the only reason that I, Alistair, am in this particular form, able to turn cottage cheese into skin or speak or think, is because of information that has assembled this particular set of atoms into neurons and created chemical pathways that turn food and oxygen into the electricity that flows through them, giving me cognition. That's freaky. I mean, life literally is information. The difference between this pile of atoms sitting on the floor and this pile of atoms sitting here speaking is information about how to organize them. Wait a minute, you say.
This is supposed to be where I come to learn about cloud computing and AI and stuff. Did I come to the wrong talk? And I'm here to tell you, no, you haven't. Stay with me here.
Information isn't just how creatures exist. It's also how societies function. It's how we communicate. Humans differ from other species in one really important way: language, which is shared information.
I can talk about using a rock as a tool rather than picking up a rock to demonstrate. And this is far, far more efficient in terms of time or energy or risk. In fact, I'm going to show you a tool right now.
This is a Stone Age tool. This is an actual ax head. It's between 400,000 and 1.8 million years old.
I'll explain why I'm holding it and why I have one later. But remember that we created this particular tool because it was more efficient: it was a better way to focus energy. Language and the sharing of information let humans transcend time and space. It's no exaggeration to say that language is the foundation of human society.
Our brains already live in a multiverse where we can speculate on possible outcomes and other worlds. And if you want to nerd out about this more, Stephen Fry has a podcast called Great Leap Years, which is an amazing foundation for this kind of thinking. But for most of human history, technology, which is essentially our ability to exchange information and build societies using tools, was analog.
The main job of technology was to make energy use more efficient. At first we focused our energy. I mean, I'm not strong enough to pull a horse, but if you give me a pulley and enough rope, I can pull that horse. Ramps and levers and stones and so on are technology. This stone tool has an edge, which allows me to focus energy on a specific point. Later on, we outsourced our energy in the form of slavery, beasts of burden, waterwheels, steam engines, uranium reactors.
These were ways of first focusing our energy and second harnessing it. And most technology throughout most of human history has been in the service of focusing and harnessing energy, essentially focusing it and outsourcing it from elsewhere. Today, we're going to take a much narrower view, because we're not going to talk about technology in general.
We're going to talk about digital technology, and that's super important. Analog technology, which is what has been happening for most of human history, is fundamentally different from digital technology. The word analog shares its root with the word analogy.
An analog thing is an analog of, or something similar to, something else. Think about a record player: vibrations from a voice make a needle shake, and it leaves a groove in the record. If you put a needle on a record and amplify its vibrations, you get a slightly imperfect copy of that sound back. If you scream loudly while recording a vinyl record, you can actually look up close and see the scream on the vinyl. If you look at this illustration here in the top right corner, you can see that some of those lines are thinner and straighter and some of them wave a lot. The wobbly ones are where the record is loudest.
You literally have an analog of the vibration. George W. Johnson who was born in 1856 was a pop star in his day. He was the first African-American to sing on record and he sold more than 25,000 wax cylinders. And to do this, he would literally play the same song sometimes 50 times a day, while many recording devices were pointed at him because that's how you copied analog.
And we've come a long way since that time. But all analog technology is still based on that fundamental idea that one physical thing is analogous to another, whether that's a bunch of magnetic particles on a tape, which is what you see under an electron microscope here, or the width and height of electromagnetic pulses in a radio broadcast. But analog is a terrible way to work with information. To illustrate this, I'm gonna tell you a story. Imagine that I wanted to record the number of sales I made each day, and I was gonna use bricks to do this. So on a slow day, I'd only have a few bricks in the pile, and on a busy day I'd have many.
Now imagine you want a copy of my sales records. You need to get your own bricks and you need to make matching piles. And the size of the piles would be analogous to the sales. You need more bricks to represent a busy day. Maybe you'd make a mistake or copy it wrong, or run out of bricks, or get bored and estimate.
And then if someone copies your record or estimates it, the number gets degraded even further. Imagine you needed to transport my sales records to show the tax authorities how much I earned. You'd need to carry a bigger pile of bricks on the busy days than on the slow ones. And nobody would do that.
Right? We just write down the digits that represent each day's sales. So I might write down 4 for the slow day, and I might write down 20 for the busy day, and so on. The number 111 takes less ink to write than the number 88, but it's still a bigger number. Digits have nothing to do with the amount.
So you can think of analog as a knob. You can turn it in tiny increments, but if someone else turns their knob to the same amount, it won't quite be the same. Digital, by contrast, is a specific number. The knob isn't turned to a position; it's set to a value, say 11. And so digital at its root just means I count something rather than making a scale representation of it.
And digital really goes back to the era of mathematics and Descartes, and the desire to quantify and abstract the world. An hourglass, for example, is an analog clock: if you want to count higher, you've got to add more sand. So we count, and counting is way better. And in the big scope of human history, we only really started counting things recently. Homo sapiens has been around for about 200,000 years.
And the first known use of numbers was in Mesopotamia around 5,400 years ago. They actually had different numbering systems for counting beer or milk or wooden objects. If human history were a whole day, we only started numbering things around 11:20 PM. This is a super important concept, but it's often poorly understood. Really knowing the difference between analog and digital is at the root of understanding what's happening to humanity right now. Now, there are plenty of numbering systems.
That Mesopotamian one had a unique symbol for every number from one to 60. Today, we happen to use decimal because we have 10 fingers, which was how we counted. So our digits range from zero to nine. But it turns out that the simplest, most powerful numbering system is binary, in which each digit has only two possible values, zero and one. And there are lots of cool things that you can do with binary mathematically. Digital systems are not proportional to what they're representing.
It's barely harder to store 10,000 than it is to store 10, whereas 10,000 bricks is a lot more than 10 bricks. When you look at this CD, you can't tell where the loud parts and the quiet parts are. All you see is a bunch of little ones and zeros, or at least that's what the laser sees and turns back into music. So digital just means counting.
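To make the "not proportional" point concrete, here is a small Python sketch that writes the same sales figure in base 60 (echoing the Mesopotamian system), base 10, and base 2. The `digits` and `bricks_needed` helpers are hypothetical names for illustration. The brick count grows with the number itself, while the digit count grows far more slowly: 10 needs 4 binary digits, and 10,000 needs only 14.

```python
def digits(n: int, base: int) -> list[int]:
    """Return the digits of a non-negative integer n in the given base, most significant first."""
    if n == 0:
        return [0]
    out: list[int] = []
    while n > 0:
        n, remainder = divmod(n, base)
        out.append(remainder)
    return out[::-1]


def bricks_needed(sales: int) -> int:
    # Analog (the brick-pile ledger): the representation grows with the quantity itself.
    return sales


for sales in (10, 10_000):
    print(
        f"{sales:>6} sales: {bricks_needed(sales):>6} bricks, "
        f"{len(digits(sales, 10))} decimal digits, "
        f"{len(digits(sales, 2))} binary digits, "
        f"{len(digits(sales, 60))} base-60 digits"
    )
```

Multiplying the number by a thousand multiplies the bricks by a thousand but adds only about ten binary digits.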
And in practice, digital means counting in binary, with ones and zeros. Most importantly, binary can be represented as ones and zeros, true or false, yes or no: instead of a knob, it's a switch. So while digital just means counting, we usually equate it with binary, because on and off can be stored with electricity, and electricity is cool because its signals travel at close to the speed of light. It can be altered very, very, very cheaply. Remember, all of human history, all of technology, is about using energy more efficiently, whether that's planning with language or focusing energy with a pulley or the edge of an ax, or using the energy from something else with a solar panel.
In this case, binary stored with electricity allows us to work with information more efficiently than anything else. By turning information into ones and zeros and transmitting them electronically, we can do very interesting things with it, unprecedented things, things that change the nature of how information works and how life and society work. Notice that nothing I've said so far is hyperbolic.
Everything's just science. I haven't made any weird predictions. I'm not selling you anything. It's natural.
These changes will take generations to wash through humanity. After all, the invention of the printing press, which was a very analog thing, gave us the Reformation and the Republic, but it took the better part of a century to go from the first printing press to Martin Luther calling for the Reformation. And unfortunately many institutions haven't caught up with the inevitable changes that are going to happen as we move to digital, electrical, binary information technology.
So we're going to talk about tech fundamentals today, but I'm not going to try and make you an expert in those technologies. Instead, I'm going to show you how digital technologies change the laws of information and why that means you need to adjust the way you think. As a species, we're in the midst of another tectonic shift, from atoms to bits, and all of us on this call are in the middle of that mutation. Like any mutation, there will be calamity, and many of its offspring will be mangled. This is tragic, but normal, because we are part of natural systems. The thing is, while the printing press changed the rules of language and the radio changed the rules of information dissemination, the switch to digital is far stranger, because digital changes the rules of how rules are made. Certain fundamental laws about time and distribution and the cost of a copy and more, which we took as unchangeable until now, have been completely upended. We aren't just talking about a new medium that organizations have to embrace, like newspaper or print.
We're talking about new organizations. So I'm going to take a pause and a sip of coffee here. And I'm going to say to you, if you're watching this online feel free to chime in.
There's way more content in this slide deck than I can present today. And so I've chosen a few topics and technologies that I'm going to get to, but if you have specific ones you'd like me to dive into more deeply just comment, and I will be happy to try and tailor the talk to those. All right.
Some fundamental concepts. I want to explain these fundamental laws before we get into the specific technologies. The biggest idea I want you to take away today is that matter, or atoms, is different from information, or bits.
As we move much of our society from an analog world, where we represent things with an analogy, to a digital world, where we count them and can transmit them instantaneously, much of what we understand as risk, what we understand as scarcity, and what we understand as true changes. So you need to step back and ask: with digital, what is now abundant and what is now scarce? What has become cheap and what has become expensive? What's fact, and what's fiction?
What's recorded and what's forgotten? And so on. Let me give you some examples of this. Most of you are probably wearing a shirt, although I know with COVID and your camera off, you don't have to. A shirt used to take months to make; it was considered a luxury item.
Today, most people in the world can get a shirt with a click of a button. And I do mean most humans. This is something that we have massively upended technologically. Shirts got cheaper because we invented things like mass production, but they're still not free. In a digital world, copies are free. I mean, mostly free: for most people there's very little direct cost.
There are definitely externalities in terms of pollution and energy consumption. And some people don't have access to tech. But if I send you a copy of something, that copy is identical to the original, and I haven't lost anything; I still have my original. The high cost of an additional physical unit, what's called marginal cost, is why we invented the assembly line: to make that t-shirt available.
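A minimal Python sketch of the "identical copy, nothing lost" point, using only the standard library: a SHA-256 fingerprint shows the copy holds exactly the same bits, and editing the copy leaves the original untouched. The ledger contents are made up for illustration.

```python
import hashlib

# Hypothetical ledger, echoing the brick example: a slow day and a busy day.
original = bytearray(b"slow day = 4, busy day = 20")
duplicate = bytearray(original)  # a new, separate object holding the same bits

# Identical: the fingerprints of the two byte sequences match exactly.
identical = hashlib.sha256(original).hexdigest() == hashlib.sha256(duplicate).hexdigest()

# Nothing taken away: changing the copy leaves the original intact.
duplicate[-2:] = b"99"
print(identical, duplicate is not original, bytes(original))
```

Contrast this with the bricks: copying a pile means buying more bricks, and every hand-made copy risks drifting from the original.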
But the tiny cost of a digital unit is why we invented digital rights management to protect songs from copying, or NFTs to create artificial scarcity in a digital world where copies are free. That might seem obvious, but think it through. If copies are free, you can undo things, because a copy is cheap. I can save a copy at any point in time and load it back in, which is kind of like a time machine.
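The "save a copy and load it back in" idea is one simple way an undo feature can work. Below is a minimal sketch assuming a hypothetical `Draft` class that keeps a full copy of the text before every change; many real editors store smaller change records instead of whole copies, but the principle is the same.

```python
import copy


class Draft:
    """Hypothetical editor that saves a full copy before every change, so any change can be undone."""

    def __init__(self) -> None:
        self.lines: list[str] = []
        self._snapshots: list[list[str]] = []

    def _save_copy(self) -> None:
        # Copies are cheap, so keep one for every state we pass through.
        self._snapshots.append(copy.deepcopy(self.lines))

    def type_line(self, text: str) -> None:
        self._save_copy()
        self.lines.append(text)

    def undo(self) -> None:
        # "Load it back in": restore the most recent saved copy, if there is one.
        if self._snapshots:
            self.lines = self._snapshots.pop()


draft = Draft()
draft.type_line("Dear tax authority,")
draft.type_line("Teh sales were low.")  # a typo we regret
draft.undo()
print(draft.lines)
```

On a typewriter there is no `undo`; the only fallback is white-out or retyping the page.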
If you had a typewriter and you typed out a page, you were careful. And if you made a mistake, you went back and used white-out to correct it. In a word processor, by contrast, you're kind of reckless and careless; you don't care nearly as much about getting it right, because you can backspace and change it. And so with digital, you have to ask yourself: is the thing I'm doing a one-way door, meaning there's no way I can go back, or is it something I can undo easily? Is it something I can experiment with? For every decision you make, you now have to ask: would you think differently about risk if you could go back and change the past? So one consequence of copying is the ability to undo things. But there's more to it than that. Copying also means elasticity, and elasticity means I can easily... sorry.
I'm looking at the comments here. It looks like people are having lag. If you're having lag, try reloading, because I've had problems with the LinkedIn stream as well. Elasticity means I can scale things up and scale things down.
I can shrink things and I can grow things. I can create an account or a computer virtually for nothing. And as a result, the cost of running an experiment, the cost to get started, is tiny. In the old days, if you were setting up a business, you had to go get a business license and all sorts of things.
Now you can go create an Instagram account, try to sell things, and see if you get attention. So we really have changed what matters and what the costs are. The costs are no longer in the upfront investment; they're much more in the operating costs, because of the ability to copy something a hundred times, which is scaling up, or to save those things and shrink back down to one, which is scaling down.
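Elasticity follows from cheap copies. Here is a toy sketch, assuming a hypothetical `Pool` of identical virtual servers stamped out from one template: scaling up means making more copies, and scaling down means discarding them. Real cloud platforms expose this through their own autoscaling services, which this does not model.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical pool of identical virtual servers, all copied from one template."""
    template: dict
    instances: list[dict] = field(default_factory=list)

    def scale_to(self, count: int) -> None:
        if count < 0:
            raise ValueError("count must be non-negative")
        # Scaling up: stamp out more copies of the template.
        while len(self.instances) < count:
            self.instances.append(dict(self.template))
        # Scaling down: discard the extras.
        del self.instances[count:]


shop = Pool(template={"app": "storefront", "memory_gb": 1})
shop.scale_to(100)  # a busy day
shop.scale_to(1)    # a slow day
print(len(shop.instances))
```

Because each instance is just a copy, the cost follows what you are running right now rather than what you had to buy up front.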
Another really important concept here is permanence. In the physical world, things exist by default. I have to actually try to destroy something. Matter is permanent and unique and hard to change and expensive to copy; it's permanent unless we destroy it. On the other hand, information is ephemeral. It's copyable and fungible, and it vanishes unless we actively maintain it. So this has all kinds of consequences for truth. If I can put fakery out there and make a thousand versions of something, versus the one true version that's physically provable, I have all these issues about veracity as well.
I have all these issues about veracity as well. There's another important concept. And I'm going to come back to this in a few minutes, which is the difference between hierarchy and structure versus hashtags and stuff.
In the old model, we put everything into a hierarchy. Yahoo, in the early days of the internet, was a giant list of hierarchies with categories like Arts & Humanities, News, Reference, and so on. And you would click down and navigate through these things to find the category you were looking for. Google came along and said, no, no, just type a word and we'll find it for you. This is radically different.
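The difference between the two approaches can be sketched in a few lines of Python. The category names come from the talk; the page titles and helper names are hypothetical. Browsing only works if you already know the path, while an inverted index (a map from each word to the pages containing it) lets you find a page from a single word. Real search engines add ranking, stemming, and much more.

```python
from collections import defaultdict

# Yahoo-style directory: you must know the path through the hierarchy.
DIRECTORY = {
    "Arts & Humanities": {"Photography": ["Darkroom basics"]},
    "News": {"Weather": ["Storm tracker"]},
    "Reference": {"Dictionaries": ["Online thesaurus"]},
}


def browse(path: list[str]) -> list[str]:
    node = DIRECTORY
    for step in path:
        node = node[step]  # raises KeyError if you guessed the structure wrong
    return node


# Google-style: pour every page in and build an inverted index (word -> pages).
def build_index(pages: list[str]) -> dict[str, set[str]]:
    index: dict[str, set[str]] = defaultdict(set)
    for page in pages:
        for word in page.lower().split():
            index[word].add(page)
    return index


ALL_PAGES = [title for sub in DIRECTORY.values() for titles in sub.values() for title in titles]
INDEX = build_index(ALL_PAGES)


def search(word: str) -> set[str]:
    return INDEX.get(word.lower(), set())


print(browse(["News", "Weather"]))  # you had to know where to look
print(search("storm"))              # you only had to know a word
```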
We've gone from having to know the structure of things and fit the world into the model that we have, to just pouring all the information in and letting the algorithms search it for us. There's another really important concept, which is iteration. Imagine that you're building a battleship. You need to get your battleship right the first time: you've got to build it, and then you launch it and it goes into the water.
It's very hard to bring the battleship back to shore and change its structure without significant costs. So you've got to get it right, which means you plan, and then you launch the thing when it's ready. With digital, it's easy to make changes. Remember, you can make a copy and modify it, and if it works, you can distribute it. Because pushing out the software or the website or the service is essentially just copying bits, digital means you can have constant small improvements.
It's much easier to update information than it is to update matter. So that means you can release a first version and then alter or improve it. Nothing's final, and nothing's finished, which means you should spend much less of your time planning and much more of your time doing. Strategy is delivery. Going ahead and starting with something, getting it to a first version from which you can decide what to do next, and testing it and getting feedback, is much more important than having a 10,000-page document that outlines every possible outcome for something you don't even know anybody wants yet.
Another important concept that results from this transition to digital is analytics. Humans are terrible at writing things down. Machines have no choice but to do so. So this means that in a digital world, everything is tracked. I can make a copy of the records of every single transaction, and then I can slash the cost of analyzing those records with algorithms.
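Because every digital transaction leaves a record, much of the analysis is just counting over logs. A minimal sketch with a hypothetical log that mirrors the brick example: 4 sales on the slow day and 20 on the busy day, each assumed to be worth 10 units of currency.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    day: str
    amount: float


# Hypothetical transaction log: each sale is recorded automatically as it happens.
log = [Sale("slow day", 10.0)] * 4 + [Sale("busy day", 10.0)] * 20

sales_per_day = Counter(sale.day for sale in log)

revenue_per_day: dict[str, float] = {}
for sale in log:
    revenue_per_day[sale.day] = revenue_per_day.get(sale.day, 0.0) + sale.amount

print(sales_per_day, revenue_per_day)
```

Nobody had to stack bricks or copy a ledger by hand: the record existed as a side effect of doing business.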
So that means that in a digital world, we should expect transparency, metrics, and accountability to be the norm, and that we have to go out of our way not to share the information, as opposed to going out of our way to capture, analyze, and share it. And one more important concept: parallelism. Because I can make copies of bits, send them out to dozens of places, and then reassemble them, I can work in parallel.
You've probably already seen this when you use something like Google Docs. Google Docs is not actually a word processor in the traditional sense. It's a whole bunch of little transactions between users, where each person makes changes to the central document, and what you're looking at is sort of a time-ordered compilation of all those changes. You can have lots of people working on a document at once, and Google Docs will resolve the changes based on who's editing them.
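Here is a deliberately simplified Python sketch of the "many little transactions" idea: each edit is a small record, and the document you see is the replay of all of them in order. The `Edit` fields and the central ordering are assumptions for illustration. Real collaborative editors, Google Docs included, use more involved techniques such as operational transformation to reconcile edits that happen at the same moment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    timestamp: int  # hypothetical ordering assigned by a central server
    user: str
    position: int   # line index where the text is inserted
    text: str


def replay(edits: list[Edit]) -> list[str]:
    """Compile the document by applying every small transaction in time order."""
    lines: list[str] = []
    for edit in sorted(edits, key=lambda e: (e.timestamp, e.user)):
        lines.insert(min(edit.position, len(lines)), edit.text)
    return lines


# Edits from two people can arrive in any order; replaying them in order sorts it out.
incoming = [
    Edit(3, "alice", 2, "Conclusion"),
    Edit(1, "alice", 0, "Title"),
    Edit(2, "bob", 1, "Introduction"),
]
print(replay(incoming))
```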
Things can be done in parallel because Google Docs is breaking that document up into a bunch of little transactions between all the people working on it at the same time. This is also how search works. There's no one computer in the world that contains all of the information on the web. There are dozens and dozens, actually thousands and thousands, of computers, each of which knows a little bit about the web. And every time you do a Google search or a Bing search or whatever tool you're using, you share that request with all those machines, they each chime in with the answers they have, and then that's shown back to you. This also means that things like crowdsourcing are possible, where you can push something out on a social network or ask people to work on a task and get the answer. Okay. That was quite a lot to digest.
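The search fan-out described above can be sketched as a scatter-gather: send the same query to every shard at once, then merge the partial answers. The shards and URLs below are hypothetical, and threads stand in for what would really be separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each "machine" indexes only a slice of the web.
SHARDS: list[dict[str, set[str]]] = [
    {"cottage": {"example.org/recipes"}},
    {"cottage": {"example.org/cheese"}, "ax": {"example.org/stone-tools"}},
    {"ax": {"example.org/archaeology"}},
]


def query_shard(shard: dict[str, set[str]], word: str) -> set[str]:
    """One machine answers with whatever it knows about the word."""
    return shard.get(word, set())


def fan_out_search(word: str) -> set[str]:
    # Send the same request to every shard concurrently, then merge the partial answers.
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partial_answers = list(pool.map(query_shard, SHARDS, [word] * len(SHARDS)))
    return set().union(*partial_answers)


print(sorted(fan_out_search("ax")))
```

No single shard knows the full answer, but the merged result does.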
So I'm going to go through them and very quickly reiterate some of those points about what changed and what didn't. First of all, in the physical world, copying was hard; making a copy cost money. In the virtual or digital information world, copying is easy, because an identical, perfect copy costs nothing.
As a result, undoing something in the physical world was hard to do. You changed atoms. In the virtual or digital world, it's trivial. You just roll back to the earlier version. Amplifying something in the physical world was hard.
You had to own a newspaper, because the costs of pushing out copies of the newspaper, or of using up scarce parts of the analog spectrum for radio or television, were high. And so they were regulated. In the digital world, amplifying something is easy.
You just make lots of copies. Again, copies are free. In the physical world, experimentation was hard, because when you did an experiment, you were actually changing the physical world.
Experimenting in the virtual world is easy. You can run dozens of simulations and play with the copies. You can try folding proteins or running Monte Carlo simulations.
You're making copies and playing with them and comparing them. In the physical world, collaboration was hard. You had to be with other people because you couldn't easily share the thing you were working on.
So you might paint a canvas with 10 other people, as long as you weren't painting the same section. Whereas in the virtual world, it's easy, because everybody works on a copy and they pull them back together. Permanence in the physical world is easy. That ax that I showed you earlier is hundreds of thousands of years old because it's permanent. Nobody had to pay the power bill to keep it working.
Whereas in the virtual or digital world, permanence is really hard. Things vanish unless we work to keep them there and pay for them. Truth works differently too.
In the physical world, it's hard to lie, because you have biometrics. You can create proof. If someone's in front of me, I know they're not also somewhere else. In the virtual world, there's a separation between what I'm seeing, the facts I'm using to discern whether someone is who they claim to be, and what that person actually is.
And as a result, it's easy to lie in the virtual world, because of things like deepfakes that can manufacture appearances. Personalization is very hard in the physical world. If I want to make something bespoke and custom, like a tailored suit, I have to go and tailor it to you.
And that costs extra money. In the virtual world, Facebook puts out a billion different newsfeeds every day, tailored to each person. So personalization is easy, because again, what we're doing is making a small variant of the virtual content. It's an algorithm plus bits, and that makes it easy to create personalized content. Finding stuff in the physical world is hard. You put it in a folder and then you forget which folder you put it in. Whereas in the virtual world, it's easy, because we can search, again because of parallelism, which is a consequence of copies.
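Personalization as "an algorithm plus bits" can be sketched like this: one shared pool of posts, ranked differently for each reader. The posts, topics, and overlap-based ranking are hypothetical and far simpler than how Facebook's News Feed actually ranks content.

```python
# Hypothetical pool of posts, each tagged with topics. Every reader draws from the same bits.
POSTS: list[tuple[str, set[str]]] = [
    ("Stone tools go on display", {"history"}),
    ("New laptop benchmarks", {"tech"}),
    ("Vinyl pressing plants reopen", {"music", "history"}),
]


def personalized_feed(interests: set[str]) -> list[str]:
    """Rank the shared posts by overlap with one reader's interests; drop unrelated ones."""
    ranked = sorted(POSTS, key=lambda post: len(post[1] & interests), reverse=True)
    return [title for title, topics in ranked if topics & interests]


print(personalized_feed({"music"}))
print(personalized_feed({"history", "music"}))
```

Producing a different feed for each person costs one more run of the function, not another tailor.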
And finally, analysis. In the physical world, it's hard to analyze what happened, because often there are no records, and even when there are, it takes time to go through them by hand. Whereas in the digital world, it's very easy to analyze something. We have algorithms and we have lots of data.
So there are a lot of foundational ideas on this slide that you can apply to everyday technology: AI, cloud computing, analytics, cryptography, and so on. And we're going to talk about those in a minute, but before we do, I want to talk about some serious consequences that come from all this stuff, because there's a lot going on behind the scenes here. For example, and this is very recent news, one of the courts in California just ruled that the state may make a copy of your digital content without your consent, and you have no recourse. In the physical world, the law would say: hey, if you take someone's written records, that's seizure, because you're depriving that person of those records. In other words, if I have a book with my notes, I can say: hey, you can't take that, because that thing is my possession, and you're depriving me of it.
The court just said that you don't have any rights in that regard, because by making an identical copy, the state hasn't deprived you of the original.
So we've just come up with a precedent-setting ruling that says that because digital content is exactly the same when duplicated, it has different rights under possession law from something written down. In other words, if you have written notes, they have stronger legal protections than digital ones. That's pretty scary. Here's something more lighthearted to think about. There are 60,000 songs uploaded to Spotify every day.
There were years when there weren't 60,000 songs published. If you're an artist, that's an astonishing number of songs to compete with. The reason is that the cost of making a song on a digital workstation and uploading it to Spotify has vanished. Instead of a recording studio and an audience and instruments and a publishing contract with a major label, you can take GarageBand and, with a few clicks, get your song onto Spotify. So abundance and scarcity, possession, duplication, and copying: these are all fascinating societal questions that we are only starting to tackle.
And so I wanted to give you this as an initial framework, because that's how I think about technology today. It's not just here's how AI works or here's how cloud computing works; understanding the underlying nature of atoms and bits, and what that means for things like scarcity and privacy and copies, allows us to think about technologies from those kinds of first principles. So what technologies should we look at? Well, a while back I asked a bunch of people on Twitter what concepts would be important to understand if you wanted to work and thrive in a digital world. I got a lot of feedback, so I went through and analyzed it, along with all the messages I got from people, and I made a list. I'm not going to talk about all of them today, because many of these things are interwoven.
People asked about the internet, the web, open source, databases, cloud computing, AI, digital identity, front end and back end, microservices and APIs, performance and latency, encryption, and blockchain. These are all pretty technical concepts. I'm going to try to explain them in non-technical ways. First of all, let's talk about the internet at a very simple level. The internet is just a way to send things reliably to anyone else that's connected to it.
Everyone connected to the internet has an address. And the internet actually works with a bunch of chunks called packets that say where they're from, where they're going, and which application they're for, like your web browser. And perhaps more importantly, those packets can have numbers on them. That means that if I send you packets 1, 2, 3, 4, and they get there in a different order, you can put them back in order.
If I send you one and two, and then three gets thrown away and four arrives, you can say, hey, I'm missing number three. So if you want a good analogy for understanding the internet, it's the postal service. I say the postal service, even though that seems awfully old-fashioned, because the postal service has different layers to it. All you have to worry about with the postal service is the address and the stamp.
You put on a stamp, and the postal system basically guarantees that the thing will get to some postal code. Well, that's kind of like the internet. The internet just says: hey, if you give me something, I'll do my best to get it to the destination. And then the mail carrier brings it to your house.
That's kind of like your ISP. They take it off the internet, from any one of a number of trucks and vans and airplanes, and bring it to your house from there. So the ISP, your local provider, is kind of like the mail carrier that gets it to your house. Now, you may have an internet address, and that address may be temporary or it may be permanent. There may be many people at that address, but that's kind of like the mailbox the carrier sticks the envelope in. The envelope has a little more information on it.
The envelope doesn't just have your postal code and your address; it also says who it's to and who it's from. That's kind of like an internet packet. And in theory, you could write something on the front of it that said, this is envelope one of 10. Finally, there's the message. The message is what it's about, and that's really the application, like web or mail or chat.
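To make the layers concrete, here is a toy model of the letter-inside-an-envelope idea. The class and field names (`Message`, `Packet`, `deliver`) are hypothetical and far simpler than real IP packets; the point is that the delivery step looks only at the outer address and never opens the message.

```python
from dataclasses import dataclass

@dataclass
class Message:
    """Application layer: the letter itself, and which app it belongs to."""
    app: str    # e.g. "web", "mail" or "chat"
    body: str

@dataclass
class Packet:
    """Internet layer: the envelope, with who it's from, who it's to, and 'n of total'."""
    source: str
    destination: str
    number: int
    total: int
    message: Message

def deliver(packet: Packet, my_address: str) -> Message:
    # The postal layer only checks the address on the outside of the envelope.
    if packet.destination != my_address:
        raise ValueError("this envelope is not addressed to this machine")
    # Only after delivery does anyone open the envelope and read the message.
    return packet.message

# Addresses come from the ranges reserved for documentation examples.
letter = Packet("192.0.2.10", "198.51.100.7", 1, 10, Message("web", "GET /"))
print(deliver(letter, "198.51.100.7").app)  # prints "web"
```

Because `deliver` never inspects the message, a brand-new kind of app could be added without changing it, which is the kind of separation that lets new applications appear without anyone's permission.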
And so there are all these different layers working with one another. This concept of layers comes up a lot in information technology. Because you have a simple set of rules that says, if you give me an envelope, I will send this envelope to its destination, you don't need to think about how it gets there.
And one of the beautiful things about the internet is that this separation has allowed us to invent new things. Nobody needed permission to build a web browser, because we already had the internet and someone went, well, I'm just going to send things back and forth to these web servers. And that was okay. If tomorrow you wanted to wake up and build an entirely new application that relied on the internet, you could do so because of that separation.
That's huge. One of the reasons the internet has been able to scale the way it has is that each of these layers is independent of the others. I can replace my ISP and everything else keeps working. There are very clear rules about how your ISP talks to the internet, how your ISP gives you an IP address, and how that IP address receives and sends internet packets. As long as each of those rules is working okay, you can go replace your router without breaking anything. That's very different from remodeling your kitchen and having to replace the stove and everything else.
In this case, each piece is separate and atomic. Now I'm going to show you something that looks a little nerdy, but don't worry, it isn't. These are the machines between me and the Canadian government's website. At the top, you can see my address; that's the machine I'm on. Then comes my ISP, and then you can see where the data goes on the internet, through Cogeco and a bunch of other networks.
Eventually it gets to Bell in Toronto, and then it goes to canada.ca. This tells us that the canada.ca website is hosted by Bell Canada, or at least connected through Bell. These are simply the stops along the way. You can think of one of them as the mail carrier, the next one as a truck, the next one as an airplane, and so on. Some of these are close and some of them are far away, but you can actually go and look at the connections between you and somewhere else.
Now, if you want to get really nerdy, you can go inside your computer and say, show me all the connections my computer has going on right now. I did this when I was putting the talk together, and you can see TCP over on the left. Don't worry about that; it's just the thing that gives you pipes on the internet that you can send things over. But I can see in here a bunch of things that say EC2-something. That's Amazon's Elastic Compute Cloud.
That means one of the websites I'm using is getting stuff from Amazon. You can also see things like my computer talking to something on the same network. That's like my computer talking to the printer in my house. And then you can see this thing called localhost. That just means the computer is talking to itself.
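Those three kinds of connection (out to a distant server, to another device on your own network, and to localhost) can usually be told apart from the address alone. Here is a small sketch using Python's standard `ipaddress` module. The `describe` function and the sample addresses are illustrative assumptions; `8.8.8.8` is just a well-known public address (Google's public DNS) standing in for a machine out on the internet.

```python
import ipaddress

def describe(address: str) -> str:
    """Roughly classify who your computer is talking to, from the address alone."""
    ip = ipaddress.ip_address(address)
    # Check loopback first: 127.0.0.1 also counts as "private" in this module.
    if ip.is_loopback:
        return "this machine talking to itself (localhost)"
    # Private ranges such as 192.168.x.x are used inside homes and offices.
    if ip.is_private:
        return "another device on the local network"
    return "a machine out on the internet"

for addr in ["127.0.0.1", "192.168.1.20", "8.8.8.8"]:
    print(addr, "->", describe(addr))
```

This is only a rough guide: a private address might also belong to an office network or a VPN rather than your house.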
So it's not just that we use the internet to talk to other machines. Sometimes we use it to talk to machines in our house, and sometimes even to other applications on the same machine. This means anything can talk to anything cheaply. And because messages are broken into tiny chunks, you can't easily cut the wire.
If one mail carrier gets sick, the postal service doesn't stop. If one mail truck is delayed, the service doesn't stop. Because of this modularity, the internet is just a machine for making copies and getting them where they're going: little tiny chunks that are sent all over the place. That's it. That's all you really need to know about it. Now, why does this matter? Most people can't look inside their machines.
When I looked at that list of connections, I had a little freak-out. I was like, wait, do I know what all of these are? Maybe I've been hacked. Maybe I have. We genuinely don't know what's going on under the hood. So malware and cybersecurity are real risks, and the digital divide is real and growing, now that having an internet connection is a requirement for participation in modern society.
It's also so easy to make things for the internet that much of what's connected to it is outdated. That means that when you buy a new device, you may find that it doesn't get updated, and then it's vulnerable. I'm seeing Vanessa asking about how the message is the app. So I will explain that to you, Vanessa. Let's say that you received three letters from me. One of the letters said, regarding our last conversation about education at Dawson College, here are my thoughts. The next one said, here's that bill I owed you. And the next one said, here are some photos of our time in the Laurentians last week.
Those are three different conversations. One's got pictures, one is a financial transaction, and one is a part of an ongoing conversation. That's really something where you would go, Hey, this photo belongs in my photo album. This letter is something I have to respond to.
This financial information is something that I need to add to a spreadsheet. And so in a human analogy here, you have different apps. You're running photos, correspondence, spreadsheets. And when you take that message out of the envelope, you look at it and decide which one it's for.
On the internet, the envelope actually has a little number on it, called a port, that tells you what it's for. So if it's marked 80, it's for the web; if it's marked 443, it's for the secure web; and there are other numbers for email and DNS and so on. So that little envelope would actually say, hey, this is for Vanessa, but it's not just for Vanessa, it's for Vanessa's photo album. That's a bit of a simplification, but it explains how the envelope tells the computer which app to send things to. So what is the web? I talked about the internet there, but the internet can carry lots and lots of different things.
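Here is a sketch of that "which app is it for" step, written as a lookup from port number to application. The four entries are standard well-known port assignments; the dictionary and the `app_for_port` function are hypothetical, and in practice a server can listen on any port it likes, so the number is a convention rather than a guarantee.

```python
# Standard well-known ports and the kind of application usually behind them.
WELL_KNOWN_PORTS = {
    25: "email (SMTP)",
    53: "DNS",
    80: "web (HTTP)",
    443: "secure web (HTTPS)",
}

def app_for_port(port: int) -> str:
    """Decide which 'department' an incoming envelope belongs to."""
    return WELL_KNOWN_PORTS.get(port, "some other application")

print(app_for_port(443))   # prints "secure web (HTTPS)"
print(app_for_port(8081))  # prints "some other application"
```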
The web is not actually the same as the internet. I know that seems weird, but the web is just the part that's easiest to use, and the part most of us use. And the way the web works is pretty simple.
A browser talks to a server and the server gives it stuff back. Actually, it's not quite that simple. What actually happens is this. The browser goes, hey, I would like to talk to you. This is called a synchronize, or TCP SYN, request, and the server goes, sure, let's talk.
And then the browser says, okay, now we're talking. This is called a three-way handshake, and that way each side knows it has heard from the other and there's a connection. Now the computers might go, hey, someone may be listening, so let's share some secret keys that allow us to encrypt stuff.
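The three-way handshake can be sketched as a simulation of its three messages. In real TCP your operating system does this for you whenever a program opens a connection; this toy version (the function name and the tuple layout are hypothetical) only shows the core rule: each side picks a random starting number, and each acknowledgement is the other side's number plus one.

```python
import random

MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around

def three_way_handshake():
    """Return the three handshake messages as (sender, flag, seq, ack) tuples."""
    client_isn = random.randrange(MOD)  # client's initial sequence number
    server_isn = random.randrange(MOD)  # server's initial sequence number
    return [
        # 1. "Hey, I would like to talk to you."
        ("client", "SYN", client_isn, None),
        # 2. "Sure, let's talk": the server acknowledges the client's number + 1.
        ("server", "SYN-ACK", server_isn, (client_isn + 1) % MOD),
        # 3. "Okay, now we're talking": the client acknowledges the server's number + 1.
        ("client", "ACK", (client_isn + 1) % MOD, (server_isn + 1) % MOD),
    ]

for step in three_way_handshake():
    print(step)
```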
I will talk more about encryption in a minute. Now that those keys have been exchanged, we can communicate securely. Your browser goes, hey, can I have the homepage please? And, oh, by the way, I was here before; here's a cookie.
So you can remember me from last time, because that way I'm already logged in or whatever. The responding server goes, all right, that's fine, good to see you: HTTP 200. That just means, yep, I've got the information you want. You've probably heard of other HTTP status codes: 404 means the content isn't there, and 500 means something went wrong on the server.
There's a whole bunch of them. And then, after thinking for a while, the server will send back the message you want. It may also send back other things it decides you might need, and you may look at that message and go, hey, that's great.
But I'd like to have an image that was on that page. Oh, okay. Here's the image.
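Here is roughly what the browser's request looks like on the wire, plus a helper that turns a status line into its standard meaning using Python's built-in `http.HTTPStatus`. The `build_request` and `explain_status` helpers are hypothetical, the path and cookie value are made up for illustration, and a real browser sends many more headers than this.

```python
from http import HTTPStatus
from typing import Optional

def build_request(host: str, path: str = "/", cookie: Optional[str] = None) -> str:
    """Build a bare-bones HTTP/1.1 GET request: the 'can I have the homepage?' message."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    if cookie:
        # "I was here before; here's a cookie."
        lines.append(f"Cookie: {cookie}")
    # HTTP lines end in CRLF, and a blank line ends the headers.
    return "\r\n".join(lines) + "\r\n\r\n"

def explain_status(status_line: str) -> str:
    """Turn a status line like 'HTTP/1.1 404 Not Found' into 'code: standard phrase'."""
    code = int(status_line.split()[1])
    return f"{code}: {HTTPStatus(code).phrase}"

print(build_request("www.canada.ca", "/", cookie="session=abc123"))
print(explain_status("HTTP/1.1 200 OK"))            # prints "200: OK"
print(explain_status("HTTP/1.1 404 Not Found"))     # prints "404: Not Found"
print(explain_status("HTTP/1.1 500 Server Error"))  # prints "500: Internal Server Error"
```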
This conversation goes on and you get the pages you want. And then at the end you go, I'm done, thank you very much. That's what's actually happening. For every single website you visit, these conversations are happening constantly, in real time. And this protocol, this structure, has served us incredibly well since Tim Berners-Lee first defined the web.
There are standards bodies that update this stuff, we constantly fix these things, and there are lots of risks to it, but it has served us well and connected billions and billions of humans. Unfortunately, the web is kind of dying, because applications give organizations much more control. Most of the way we access the internet today is not through web browsers but through apps. We're loading Facebook or Twitter or whatever else, and apps are kind of taking over from websites, which is a little bit scary. You can't inspect an app the way you can a website.
If you're on a website, you can right-click, view the source and see what's there. But in an app, if you try to right-click on an image, you can't select that image and save it. You might be able to screenshot it, but you can't save or inspect the content. The web made the internet so easy, but we now rely on websites to do other things. So you probably use a web-based client to chat or to use email, or you use the app you've got for chat or for email.
So this idea that we are now running a web and an email and a chat client, as opposed to having an app that does it, or using something like Gmail that's web-based, is really blurring the separation of the different layers I showed you earlier. As soon as you collapse those layers, as soon as you force a monolith out of them, you get rid of the ability for one layer to change independently of the others. It's almost anti-competitive in some ways. Now, it does make things much easier. It's far easier for someone to use an app than to set up their own mail server and stuff.
But the risk here is that because apps give organizations a lot more control, they're seductive. At the same time, we get rid of the biodiversity that was possible because of these layers of the internet. It's much harder to build a new thing. So that's a little bit about the internet and the web in general, but I want to give you a little more detail on the next topic.
And the next topic is databases. If the internet is how we talk to things and the web is how we kind of visualize things, what we're doing behind the scenes is looking at data. Data is incredibly important because, while the internet is just the technology, data is about humans. It's about us, and it has incredibly important consequences for things like privacy and scale, and even marginalization. A database is simply a way of storing information. It's kind of like a filing cabinet, but unlike filing cabinets, databases store information about that information.
So I may store Vanessa's name, but I may also store how I know Vanessa, where the information about Vanessa came from, how often I've looked at that information, whatever that thing is. That data about the data is called metadata. And that's really important, because if I have data about myself, that's interesting. But if I have data about myself and how I'm related to all of my friends, I now have a way of understanding my social graph and the people I'm related to. Once upon a time we stored information in libraries, then we stored it in databases, and today we store it in what are called data lakes.
And the way we collect data has changed dramatically in the last 50 years. In the past, when we collected information, we knew what it was and how we'd use it before we collected it. Then we put that information into databases, and we knew what it was for. If we were collecting quarterly sales figures by store, by sales rep and by product, we would create a table with columns in it. Think of a spreadsheet with a column for store, one for sales rep and one for product. We worked this way because storage was expensive, and so data warehouses took time to manage.
And what lived in our databases had structure. I basically got my filing cabinet, made all the folders beforehand and labeled them beforehand, and then I started putting my filing into it. This is schema-first, or structure-first. Big data, an expression that really sums up the fact that we can store and analyze data very, very quickly, is a totally different approach.
Big data gathers everything. It's collected on faith: drink from the fire hose. We don't know how we'll use it yet. We store it because we think it'll be useful later.
And we have good reason to think so, because we can analyze it quickly. And the data we're storing is not just the columns and rows you might see in a spreadsheet; there are a lot of different structures of data. The thing on the right here is something LinkedIn used to offer, a map of your LinkedIn network, that showed me and all the people I was connected to, without me telling it anything about them.
It had inferred which people I knew from different jobs or different conferences or different organizations. Data can be stored in lots of different ways. It could be a key-value database, a time-series database or an object database. For example, a graph database is a way of storing people or entities and their relationships to other things.
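A graph database stores things and the relationships between them, so a question like "who do my contacts know?" becomes a short walk along the connections. This is a toy in-memory sketch, not any real graph database's API; the `TinyGraph` class, its methods and the people in it are hypothetical.

```python
from collections import defaultdict

class TinyGraph:
    """Toy graph store: people are nodes, labelled relationships are edges."""

    def __init__(self):
        # person -> set of (relationship, other person)
        self.edges = defaultdict(set)

    def relate(self, a: str, relationship: str, b: str) -> None:
        # Store each relationship in both directions so it can be walked from either end.
        self.edges[a].add((relationship, b))
        self.edges[b].add((relationship, a))

    def contacts(self, person: str) -> set:
        return {other for _, other in self.edges[person]}

    def contacts_of_contacts(self, person: str) -> set:
        """People two steps away who aren't already direct contacts."""
        direct = self.contacts(person)
        two_steps = set()
        for contact in direct:
            two_steps |= self.contacts(contact)
        return two_steps - direct - {person}

g = TinyGraph()
g.relate("Alistair", "worked with", "Priya")
g.relate("Priya", "met at a conference", "Sam")
print(g.contacts_of_contacts("Alistair"))  # prints {'Sam'}
```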
So we have data, and we have lots of different kinds of filing cabinets to put it in, but what you really need to understand is how we've changed from a model where we ask first and then collect to a model where we collect first and then ask. In the old way, I would say, hey, I've got to record the widgets I've sold by city, by size and by color. So I define a schema. Think of this as making a spreadsheet. You put in three columns, one for the size, one for the color, one for the city. You collect the data from all your stores, put it into your data warehouse, and then you make a report.
Well, great. Why are sales down in a certain city? I dunno. There's nothing in there to explain that. Maybe it's the weather. So then you say, could you go get more data and make a new column for what the weather was like? And then we go collect that data and start again.
It's exhausting. The new way is to collect first and then ask. You collect all the things, everything you possibly can. And then you can ask the data a question: give me a report of widgets by size and color. At which point the database goes and finds those things and structures them. This is called an emergent schema, or schema on read.
It just means we figure out what the data is about when we ask questions of the data, rather than before we save the data. So maybe it's the weather. I ask for widgets by weather, and that doesn't show anything. Maybe it's political affiliation.
So I go find political affiliation and ask for widgets by political party. Now I've got something actionable: I know I can only sell this stuff to a particular political party. This is the kind of exploration and iteration we see in modern databases. And that means the cycle really changed here.
Right? You used to ask a question, define the schema, collect the data, and then answer the question or find the problem. But now it's collect the data, ask a question, get the schema, explore the data, get the schema, explore the data, and so on, and then answer the question. This is really a collect-first, ask-questions-later approach.
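The collect-first, schema-on-read idea can be sketched in a few lines: raw records keep whatever fields happened to be captured, and the "columns" are chosen only when a question is asked. The sales records and the `report` function are hypothetical, and real systems do this at far larger scale with query engines rather than a Python loop.

```python
from collections import Counter

# Collect first: each raw record keeps whatever fields happened to be captured.
raw_sales = [
    {"size": "L", "color": "red", "city": "Ottawa", "weather": "rain"},
    {"size": "L", "color": "red", "city": "Halifax"},
    {"size": "S", "color": "blue", "city": "Ottawa", "weather": "sun"},
    {"size": "L", "color": "blue", "city": "Ottawa", "weather": "rain"},
]

def report(records, *fields):
    """Ask later: count widgets grouped by whichever fields the question needs."""
    counts = Counter()
    for record in records:
        # Records that never captured one of the fields are left out of this answer.
        if all(field in record for field in fields):
            counts[tuple(record[field] for field in fields)] += 1
    return counts

print(report(raw_sales, "size", "color"))
print(report(raw_sales, "weather"))  # a new question, no new column needed
```

Note the trade-off the sketch exposes: the Halifax sale silently drops out of the weather report because nobody recorded the weather there, which is exactly the kind of gap the critical questions about data need to catch.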
And it has some interesting and serious privacy consequences, because when you're arbitrarily capturing everything, that's great, but maybe you've captured things that should be private. So there are a lot of privacy regulations trying to govern this stuff. Here's why I think this is really important, and it has to do with the Medici. The Medici were an Italian family of merchants. They were among the first to employ accounting techniques, and many of the things they came up with are still with us today, hundreds and hundreds of years later. One of the concepts they had was the idea of a general ledger in accounting, where you would know where to file things.
Take a cup of coffee. This is my cup of coffee. Normally you would know that this gets filed in a certain place. For example, you could file it under people/Alistair, or under beverages/coffee, or under containers/cup. Let's say you put it in the filing system under containers/cup: you take a piece of paper that says Alistair had a cup of coffee, and you put it in the containers folder. That's great.
But now if I want to search by person, how do I see the things that Alistair has drunk? If I want to measure how much coffee we're consuming, how do I check that? I can't. If I make copies of that record and put one in the coffee folder, one in the Alistair folder and one in the cup folder, then when I change one of those records, say the one in the cup folder, I don't change the other two. Now my database is corrupt. That's bad. But we have a new thing called hashtags. I could just take this cup and tag it with Alistair and coffee and cup.
And then, because search is so cheap, I could just say, show me all instances of Alistair, or all instances of coffee, or all instances of cup, and get that data. This is a big change, because the accounting industry is worth trillions of dollars, and it has not caught up with this kind of change.
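The difference between one-folder filing and tagging can be shown with a toy record store. A single record carries several tags, and every search finds that same record, so there are no copies to drift out of sync. The `Record` class and `find` function are hypothetical illustrations, not how any particular accounting or search system works.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    text: str
    tags: set = field(default_factory=set)

def find(records, tag):
    """Search by label instead of walking down a folder hierarchy."""
    return [r for r in records if tag in r.tags]

coffee = Record("Alistair had a cup of coffee", {"Alistair", "coffee", "cup"})
tea = Record("Sam had a cup of tea", {"Sam", "tea", "cup"})
records = [coffee, tea]

print(len(find(records, "cup")))          # prints 2
# Change the one record; every search sees the change, because there is only one copy.
coffee.text = "Alistair had two cups of coffee"
print(find(records, "Alistair")[0].text)  # prints "Alistair had two cups of coffee"
print(find(records, "coffee")[0].text)    # prints "Alistair had two cups of coffee"
```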
So thinking about data in terms of labels, because hashtags and tags are a way of doing things cheaply and abundantly, is massively different. If you're building a government website, for example, why are you still forcing people to navigate down a hierarchy instead of taking the thing that applies to them and letting them find it wherever it's relevant? We have to rethink the way we structure and store information, moving from hierarchies to searches and hashtags.
And that's massively different. If you were to overhaul the accounting systems of the world, that's trillions of dollars of outdated software, because it's still using a filing approach that was invented by a bunch of Italians 600 years ago. Maybe it's time we updated that. Another secret of data that you need to think about is that cleaning is most of the work: 80% of the time people spend working with data goes into cleaning it up.
And we often ignore this. We're also terrible at statistics. This is something called Anscombe's Quartet. I'm not going to bore you with all the details, but basically the four charts you see all have the same average and pretty much the same standard deviation. If you analyze them statistically, using math, they look the same, but when you plot them on a chart, you see they aren't.
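Anscombe's Quartet is small enough to check yourself. The numbers below are the four data sets Francis Anscombe published in 1973; the script computes summary statistics with Python's standard `statistics` module and a hand-written Pearson correlation (the `pearson` helper is ours, not a library function). Each set comes out with an x mean of 9, a y mean of about 7.50 and a correlation of about 0.82, yet plotted they look nothing alike.

```python
from statistics import mean, variance

X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared x values for sets I to III
QUARTET = {
    "I": (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson(xs, ys):
    """Pearson correlation coefficient, written out by hand."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

for name, (xs, ys) in QUARTET.items():
    print(f"{name:>3}: mean x={mean(xs):.2f} mean y={mean(ys):.2f} "
          f"var x={variance(xs):.2f} var y={variance(ys):.2f} r={pearson(xs, ys):.3f}")
```

The numbers agree to two decimal places; only a chart (set III's single outlier, set IV's one far-off point) reveals how different the data really are.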
And so it's actually really, really dangerous to rely on statistics alone to understand what's going on. One of the things data really requires us to do, now that it's so free and available, is to think critically. Let's say we want to reduce traffic accidents. The first thing you might ask is, what causes traffic accidents? Well, we look up the data and it says, hey, it turns out more accidents happen near your home.
So maybe we should have a campaign to get people to drive safely near home, or we shouldn't have people driving near home. But that's just counting accidents. Of course more accidents happen near your home, because most of the time you spend driving is near your home. So is it that there are more accidents in urban areas caused by suburban drivers? Well, maybe, but that could be unfamiliarity. Are there more traffic interactions in dense urban areas? Is it that suburban people driving downtown have accidents, or is it that in dense urban areas there are more signs, more passing cars, more turning and merging? Then you've got to ask yourself, what data would I need to collect to find that out?
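The "more accidents near home" trap is a counting-versus-rate problem: raw counts follow wherever people drive the most. A minimal sketch, using made-up figures (purely hypothetical, chosen only to show the effect), divides accidents by distance driven, which is one simple way to normalize by exposure; the `per_million_km` function is hypothetical too.

```python
def per_million_km(accidents: int, km_driven: int) -> float:
    """Accident rate normalized by exposure: accidents per million km driven."""
    return accidents * 1_000_000 / km_driven

# Hypothetical figures, invented purely to illustrate the effect.
driving = {
    "near home": {"accidents": 60, "km_driven": 600_000},
    "far from home": {"accidents": 40, "km_driven": 200_000},
}

for place, d in driving.items():
    rate = per_million_km(d["accidents"], d["km_driven"])
    print(f"{place}: {d['accidents']} accidents, {rate:.0f} per million km")
```

With these numbers, "near home" has more accidents in total but half the rate per kilometre, so the raw count and the rate point in opposite directions.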
I would need to collect things like accidents per driving event, such as passing a car or passing a sign, and then say, well, how do I get that data? It's really hard to get a good answer to what causes traffic accidents, even though it might seem simple, because each time you get the data, you have to ask yourself: could I be deluding myself? Could this be a false correlation? Could this be something I'm not properly understanding? Another thing I really want to bring up is the incredible risk of revisionist history in data. As I mentioned earlier, it's hard to change physical things because they're permanent, but it's easy to change digital things because they're transient. We have concrete proof of this happening.
We've discovered unmarked graves throughout Canada, who knows how many, that were essentially erased from the record. We can only find them because the physical traces are still there. The fact that things are searchable with precision means they can also be erased with precision. Ada Palmer has an amazing post on speaking freely where she asks how we protect digital history from those who would hide and revise it. This is a very real risk to society, to truth, to reconciliation, and to making sure that we don't gloss over or revise the atrocities of the human past.
There's also some really scary stuff that can happen. This is a feature called Facebook Graph Search, which Facebook launched a while ago and then took down; I had access to it for a while. This one is married people who like prostitutes, and their spouses. This was a search you could actually run on Facebook. Here's Islamic men who live in Tehran and are interested in men, and their friends, and where they've worked. Clearly data plus algorithms poses a tremendous risk to society, and we have to figure out how to better regulate this stuff in order to protect the most vulnerable.
Okay. We talked about the internet and the web. We talked about big data. Now I want to talk about cloud computing briefly. I know this is a lot of content and I'm hoping you're all still with me. Let's talk about clouds.
Cloud computing is simply paying for computing work instead of computing hardware. The reality is that you're not using something all the time. In England cars are parked 96% of the time.
We're not using cars efficiently. Well, we're not using computers efficiently either. My computer, even though I'm live streaming this to people, is not fully busy.
Pooling resources gives access to other things as well. So when you rent computing from a service, instead of owning a server, you get access to technical expertise you can afford. There's redundancy because there's lots of spare machines.
You have machines in multiple physical locations, which makes you more resilient. But most importantly, cloud computing reduces the unit cost. If each of these blue squares is a computer and your usage is the red line, you can see that it's much more efficient to be able to scale up and down across parts of machines like that. Scaling up and down matters a lot because of spiky workloads. Here's one of my favorite charts: this is water consumption in Edmonton during the Olympic gold medal hockey game in 2010.
And you can see the spikes here: between periods, people went to the bathroom. Clearly there are spikes in demand for everything, from water use when people use the toilet between periods of a hockey game to the use of computing systems. So when workloads like computing are spiky (think of elections, a sudden rise in demand, a breaking news story or a sudden surge in traffic), having non-dedicated resources makes a lot more sense. Today, there are other reasons for doing it. If you go look at Microsoft or Amazon or some other cloud computing offering, they have a ton of different services.
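Why pooling helps with spiky workloads can be shown with a little arithmetic: if two services spike at different times, dedicated machines must each be sized for their own peak, while a shared pool only needs to cover the biggest combined moment. The hourly demand numbers and the `capacity_needed` function below are hypothetical.

```python
def capacity_needed(workloads, pooled: bool) -> int:
    """Servers required to cover every hour of demand.

    Dedicated: each workload gets its own machines, sized for its own peak.
    Pooled: workloads share machines, so only the peak of the combined load matters.
    """
    if pooled:
        return max(sum(hour) for hour in zip(*workloads))
    return sum(max(workload) for workload in workloads)

# Hypothetical hourly demand, in servers, for two services that spike at different times.
election_results = [2, 2, 10, 2]
breaking_news = [1, 8, 1, 1]

print(capacity_needed([election_results, breaking_news], pooled=False))  # prints 18
print(capacity_needed([election_results, breaking_news], pooled=True))   # prints 11
```

Both setups serve exactly the same demand, but the pooled one needs 11 servers instead of 18, which is the kind of unit-cost saving a provider gets by sharing machines across many customers whose spikes don't line up.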
These providers don't just have computers. They also have storage, content delivery networks, authentication and messaging. In fact, Amazon has dozens and dozens of services that you can use on demand. So you don't need