Will Machine Learning Take Your Job?
Hey, everyone, thanks for joining. My name is Stephanie Wong, customer engineer here at Google, and we have here, of course, Jonathan. Welcome to GCP Online Meetup number 38. We're super excited today because we have Addison Howard over here from Kaggle. He's a program manager, so he's going to be talking about everything he knows about Kaggle and the community involved in their own learning as well. So I'm going to let him take it away and introduce himself. Addison, are you live?

I am alive. Alive and well, live from Denver, Colorado. So glad to be a part of this, really excited. I'm Addison Howard, a program manager at Kaggle. We've been a part of Google now for about six months. We're the home for data science, and I'm super excited to be part of this.

Thanks, Addison. So, I don't know, Stephanie, when the topic came up, "will they take our jobs," I was a little concerned. What do you think, will our jobs be lost?

I don't know. I feel like I'd be pretty OK with having half my job covered by an automated service. I could put my feet up.

That's true. Work part time.

Yeah, part time sounds pretty good.

Sounds pretty good. But in all seriousness, I think the topic of automation, labor, the economy, AI and ML, how all these things fit together and the impact of it, is a serious concern. Even politicians are talking about it right now, so I thought it was a good topic to bring up. And I know Addison has been dealing with a lot of competitions, a lot of work with large companies and small companies around predictions and automation. I'm guessing if anybody knows the answer to this, Addison probably has at least the best idea about what he sees in the future.
Well, we'll see, because I would say the answer to the question "will machine learning take your job?" is kind of a big fat maybe. When it comes to the future of work, though, almost one in every two jobs does have a high risk of being automated by machines. Now, when you say that, I always think of Terminator or something like that, but no, it's things that can be automated and repeated over and over again. On the flip side, machines cannot do things that require a novel approach or are based on something that they haven't seen many, many times before.

Well, what do you think about it?

I always find it fascinating that the term "data science" itself didn't even exist until like 2008. Gosh, I was a junior in college at the time when data science finally came around as a term, and it has really accelerated in the past few years. But you've already seen it in some common things in the world. Think about mail, snail mail. Back in the day, somebody's job was to look at every single letter and say, "Well, I think that's a one, I think that's a seven," and divide it up. Now they have machine learning algorithms that will scan handwritten numbers and sort them. So that's one example.

Now take that to a newer extent, and you're seeing machines that can read retina scans of eyes. While one ophthalmologist may look at maybe fifty thousand scans over the course of their career, we are now training machines to do the same thing in a matter of hours. So overall I will say it kind of depends, but ultimately, be reminded that machines cannot do things that require a novel approach or are based on something that they haven't really seen many times before.

Yeah. Maybe a different way to look at it is that machines aren't necessarily replacing
your jobs. I mean, they might be replacing portions of your job, but it might actually be saving you time so that you can be doing more interesting work instead of manual work.

Exactly, exactly. In the example of medicine, we're now freeing up the time that many ophthalmologists spent looking at scans over and over again, so they're now able to do more research and do more groundbreaking
things in the world.

Yeah. I see with my customers that that's exactly right. Taking over the entire job or an entire role is pretty rare. It's more about helping workers be more productive. We still need mailmen, but instead of looking at numbers and trying to figure out what zip code it is, they're actually putting mail inside your mailbox.

Right, right. Or they're looking at other ways to be more efficient with their shipping and routes and things like that.

Yeah, cool. So I guess that's the net-net of the answer. I'd love to hear from the audience what your thoughts are. Have you seen an impact yourself? What are your thoughts in terms of the impact on the economy? I'd love to hear what you guys think. But Addison, I know you had a couple of other things you wanted to go through there. What does Kaggle do, and how has it impacted the industry as a whole? Because I feel like it has.

Sure. Kaggle is known as the home of data science. Kind of a fun fact: the term "Kaggle" is literally just an amalgamation of sounds that didn't have a domain name taken yet, and that's how Kaggle came to be. There's some fun internal team folklore around what the other alternatives would have been. But it has basically become the home of data science. Originally, we came into the market about six or seven years ago running machine learning competitions, where our CEO, Anthony Goldbloom, wanted to predict the results of Eurovision. If you don't know Eurovision, it's kind of a big American Idol-style competition where every country in Europe sends a representative, and there are sort of alliances and things like that that occur. He wanted to host a competition to do that, and that quickly expanded
to people coming to Kaggle to learn how to do data science. Again, data science hasn't been around for much more than a decade. A lot of people who are currently in the engineering community, maybe the computer science community, maybe the mathematical community, are wanting to get involved in this. There is a whole lot of research out there now; there wasn't at the time. And as many of us can likely relate to, one of the best ways to learn is to go figure it out and try it out. So what winds up happening through competitions, for people interested in data science on the community side, is that people come and have a really unique problem they can then go test their knowledge on: "I'm trying to predict X, I'm trying to guess Y." We'll go through a couple of those here in a bit.

On the business side of the house, what we're seeing is that a lot of companies don't have a big data science group yet. Maybe they have one or two folks, or maybe they do have a couple dozen data scientists, and they're saying, "We've been trying to solve a couple of problems, but we need some additional help. We need to throw it over to the community." And Kaggle now has, I think we just surpassed 1.3 million people in our community who have come to Kaggle saying, "We want to do more, we want to learn more." Since competitions, we've now started adding in things like data sets and kernels. Data
sets being, you know, you always hear that 80% of the work is just cleaning up and munging data together. This now provides a wide variety of data for people to look at, play around with, and analyze. And then kernels, which we can get to later as well, is a bit of a code-sharing platform.

Yeah. I think one of the daunting things about it is trying to do something by yourself, especially when it comes to data science, if you have little to no experience. So just having that community forum, and people to validate your creations with, is something that we haven't really seen in the industry, like you said, a decade ago, or even just in the past few years.

Mm-hmm. So, outside of the obvious reason why a company or organization would use Kaggle: an obvious reason would be, "Hey, I want to hedge my bet. Instead of hiring a bunch of ML data scientists, I want to see if this is even plausible or feasible. Let me put out an award, like ten thousand dollars, and see what the results look like." I feel like that's the main reason why people use Kaggle. Are there any other reasons outside of that why Kaggle would be useful for an organization?

Yeah. I think two come to mind. One is, I'd say, fresh perspective. For example, one competition that we're currently running is a Zillow competition. If you're familiar with Zillow, they have the Zestimate, and some people maybe complain, saying, "Oh well, that's not what my house is worth." And Zillow says, "Yeah, we know, we get it. We understand it's not always correct, but we want to do better." They have a team of data scientists who have been working on this algorithm, and there are some things that are more technical, around overfitting, or tweaking the same thing to the point where it doesn't become very useful anymore. They
said, "We have a big team, but let's get some fresh eyes on this and see: is there more meat left on the bone, to get some value out of what we're creating?" I think that's one common area. The second area is novel approaches. For those who are familiar with machine learning, there have been different trends across the industry around what techniques are useful. A big, popular one in the beginning was called random forests, but there are other ones that have come up throughout the years, and Kaggle has kind of been the place where those have been elevated. One of the really big, common ones now is called XGBoost.

It is. I hear it everywhere.

Exactly. And that's something that has really come to the forefront as something that's useful through Kaggle. Because, like you mentioned, it's such a big community, instead of somebody holding on to their methodology, they'll win the competition and then they'll say, "Hey guys, here's what I did. Data scientists, here's my methodology. I tried this new technique; why don't you guys play around with it and see what you can do with it as well?"

What does the community generally look like, from what you've seen, in terms of how teams are formed and the people that are really interested in getting involved in contests? Are they students, or other professionals?

Yeah, it's kind of a mix of both. One cool thing that we've recently released is actually a university-style platform, where a lot of teachers are saying, "We can really teach data science well in our classroom with the competition-type format." So we have something called InClass that is pretty much just for students to have that type of learning ground. At the same time, it's really a wide variety of people from all types of backgrounds who become a part of Kaggle. What we see is, as competitions start, people
start, well, we have discussion forums that are very, very prevalent for us, and people start flooding them with: hey, I found these three articles that might be helpful. Or: hey, this
is my professional and educational background, anybody want to be on my team? And we'll literally see people saying, "Sure, message me. I'm in Germany, I'm in Bangladesh, I'm in Hawaii, let's create a team together." It's cool to see people who've never met each other in person, and sometimes, when they win the competition and I introduce them to one of these clients, they're meeting for the first time via video call. They're saying, "Hey, I'm here to learn, and I want to learn from anybody, no matter where they are, skill-wise or geographically."

Yeah. I know some companies are using it as a recruiting platform, actually.

Yeah, it's actually pretty neat. A lot of big players, and a couple of them were pre-acquisition, so it might be some brands that would be rivals to us. But if you even look at our website, you'll see that the Walmarts and Facebooks and Allstates have come to us on a very regular basis, saying this is a great way for them to have almost an accelerated job interview. Instead of putting people in a room, or doing a quick hackathon, let's give them three months to go figure out what they can do, and give them a problem similar to what they would do on a regular basis. They can attract people who are interested and give them a day in the life.

I actually have a question from online. Do you know how community engagement is encouraged? I mean, besides just organically, how people comment and want to be involved in a project, are there other interesting ways that you've tried to encourage different community topics to be discussed, or people to start voting on different solutions?
Yeah. So I will say a majority of it is organic. For some people, they're incentivized by rankings. We have rankings that say how good of a competitor you are, how good of a kernel
author, or code-sharing person, you are, and how good of a discussion person you are. You can get different ranks based off of that: Expert, Master, Grandmaster. We have occasional private competitions just for our Masters and Grandmasters. But I think a lot of it is that people just genuinely come to learn, and people generally come to say, "Hey, if I share my knowledge here, then somebody else can glean off that and expand upon it." We've occasionally offered some prizes for best kernels and best new datasets to continue that engagement, but I've been very surprised and very pleased at how organic that has been.

You were telling us earlier about the Twitter conversations that you're having right now about machine learning. Talk about that a little bit.

Yeah, that's pretty neat. We do have a full team at Kaggle that is just a community team, looking for ways to continue to get people involved, both inside competitions, which again is what we have been known for historically, and now as Kaggle has expanded to become truly a broader home for data science. For the past three weeks (we have one coming up next week; I'm not sure how many more we have after that, going into the holidays) we've been hosting debates on Twitter that pose a certain problem or question within data science and say, "Let's talk about this." The very first one we had was with a guy who used to be a Kaggle Grandmaster and discussion Expert, and then he came over; we basically poached him from our own community and brought him on board. He was answering the question, "OK, AI: where is it over-hyped, where is it under-hyped, and what has it really accomplished?" Last week we had one with another one of our team members talking about bias, biased machine learning. One
example I was not familiar with: say I were to create an algorithm that takes stock photos of people in professions and says, "OK, is this a doctor or a lawyer?" If every picture of a doctor that I bring in is an old white guy, then all of a sudden the algorithm only thinks that doctors are all old white guys. So there are a lot of questions around bias that have been recurring. And one that just happened last night, with one of our community leads, was: which is more important, the algorithm or the data? Is having a good algorithm and bad data, or vice versa, going to get you better or worse results? They were kind of talking through that. So we're going to be hosting those every Tuesday. You can follow the hashtag, #MLdebate I believe is what it is, and check out our Twitter account for that as well.

Cool. Yeah, actually, I forget what podcast, but I was listening to something about how sometimes ML can perpetuate negative stereotypes because of that.

Right, right.

And sometimes it's very controversial, so it's just really interesting.

What's interesting about those problems is that sometimes you just don't have the capability to get all the data you want. Especially when you start working with human individuals, it's hard to get a truly representative sample. So we often need to be conscious that machine learning isn't always designed to be the perfect solution. Maybe the algorithm is created to go from step zero to eight, and then maybe we take it the rest of the way, getting from eight to ten.
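
The stock-photo example above can be made concrete with a small sketch. This is only an illustration, not something shown in the session: the photo counts, the group names, and the helper functions `train_majority_model` and `doctor_rate_by_group` are all made up here, and the "model" is a simple per-group majority vote rather than a real image classifier. What it tries to show is that when every doctor in the training data comes from one group, the learned rule never predicts "doctor" for anyone else.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made training set of (group, profession) pairs.
# Every "doctor" photo comes from a single group, mirroring the skewed
# stock-photo collection described in the discussion.
TRAINING_PHOTOS = (
    [("older_white_man", "doctor")] * 40
    + [("older_white_man", "lawyer")] * 10
    + [("everyone_else", "lawyer")] * 50
)


def train_majority_model(examples):
    """Learn, for each group, the most frequent label seen in training."""
    counts = defaultdict(Counter)
    for group, label in examples:
        counts[group][label] += 1
    return {group: c.most_common(1)[0][0] for group, c in counts.items()}


def doctor_rate_by_group(examples):
    """Share of training examples labelled 'doctor' within each group."""
    totals = Counter(group for group, _ in examples)
    doctors = Counter(group for group, label in examples if label == "doctor")
    return {group: doctors[group] / totals[group] for group in totals}


model = train_majority_model(TRAINING_PHOTOS)
rates = doctor_rate_by_group(TRAINING_PHOTOS)

for group in model:
    print(f"{group}: predicted={model[group]}, doctor share in data={rates[group]:.2f}")
```

With these made-up counts the rule predicts "doctor" only for the group that supplied every doctor photo, which is a property of the sample rather than of the professions. Checking the per-group label shares before training, as `doctor_rate_by_group` does, is one cheap way to spot that kind of skew.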
We do have a question from the chat: can you cover how someone with no ML experience could use Kaggle?

I'll answer that from my perspective, and Addison, maybe I'll let you answer, and maybe Stephanie. So I actually had no ML experience when Google acquired Kaggle. I went to Kaggle and thought, first of all, what is Kaggle? And then I noticed that, OK, it's a bunch of competitions or something. But
then I got really interested, because basically there's a problem statement and you have to solve it. So what I did was, I went into these competitions and started looking at all these things called kernels, which are kind of like open notebooks that people write in to develop some kind of algorithm or software to take the dataset that's provided by the customer and do some manipulation, modeling, and training on it. Having the ability to look through hundreds of people's notebooks and see how they take a perspective on the problem, that's how I learned about ML. I'm definitely not the expert, but it was just a wealth of knowledge sitting in all these Jupyter notebooks. I don't know, Stephanie, does that answer that?

Yeah, I agree with that. I'm also getting started with learning much more about machine learning and how to leverage a lot of the tools that people are using, and being able to see and learn from example is key, as you said, because it's just difficult to start with nothing. I don't even know where to get started sometimes, and a lot of it can go over your head. So having those example datasets, and what people have thought of out of the box, is just incredibly helpful. And I'm sure Addison...

Oh yeah, just one more note: the top-rated kernels usually have very, very good documentation, meaning that the person took the time to write out exactly what they did and why. Just reading all of that is a good learning experience, and there are a lot of comments and a lot of community engagement, which is why I think it's a great place to learn ML. So if you don't know ML, Kaggle is the place to go, and if you're awesome at ML, it's also the place to go.
Yeah, I think you're right, and you said it well: I think it provides a lot of opportunities based on whatever level you're at. I will say, just to play a little bit of devil's advocate against myself, I don't have a computer developer background at all. I'm more of a business guy, so I have experience in math, econ, that side of the house. I'm teaching myself Python right now, and I'm getting up to that level. But a lot of times you may say, "OK, I have these technical skills, and I don't know where to apply them or test them." We have a couple of different introductory competitions as one way of doing that. For example, the most popular competition we've ever held is one on the Titanic, where we basically ask: is this person going to survive or not? We give you a whole passenger list, and you can think about it just on an intuitive level. If you just said men don't survive and women and children do, you can probably get about 80% right. Then maybe you start toggling things: OK, what if they were wealthy, what if they weren't? What if they were on the top part of the ship, or they weren't? It gives you the opportunity to play around with that in a very free, low-ego, low-risk environment. And then we start having some more advanced things: OK, can you predict housing
prices, in a more advanced sort of way? Or again, if you want to get more into image recognition, we have competitions for that which are a little introductory. So that's one of what I think are the best ways. Jonathan mentioned kernels, which I think is great as well. Kernels, for us, are not just code sharing. What we like to do is take out a lot of the dependencies. If someone were to share their code with you otherwise, you'd maybe have to go download all 16 of these different libraries and versions and things like that. We try to remove a lot of that to keep sharing pretty easy. And then thirdly, we're continuing to develop a more tried-and-true educational curriculum. We have a team that is entirely rooted in creating education-based documentation that we hope to be releasing here soon.

That's really cool. So I know we have about ten minutes left, and we'll leave some room for Q&A. Do you mind just taking a couple of minutes to walk through the user interface? I feel like there are people who have never looked at or used Kaggle, so maybe show them where the kernels are and where the competitions are.

Yeah, let me go ahead and share my screen here.

So while you do that, Stephanie, how have you used Kaggle so far?

I actually use it to try to find public data that people have already offered, or that's already natively on the site, just so that I can have some way of testing out my own operations. I've actually used it in a workshop as well, just to see what we have available online. So I've already found it pretty useful. And I get email notifications of different contests that are being offered by other commercial companies
that are offering contests and prizes, so I do often see some really interesting use cases. For example, the Zillow one, and there's one for holiday gift exchanging and how you can predict how someone could get the right gift that they've wanted.

Yeah. It's been really great for us, being a part of the Google team for close to nine months now, actually. We have the chance to have a little more fun with competitions and provide some more unique items there as well. But when you log in to Kaggle, if you're already logged in, this is the page you'll see. Actually, let's see here, I think I have the other one pulled up as well. So if you're not logged in, you'll see this page. You get more of "what is Kaggle," and you get some very high-level elements: "OK, how can I learn to do data science?", talking about exactly the same things we were just mentioning, like surviving the Titanic, or if you want to study some benchmark models that already exist. We have plenty of blogs as well, where we interview previous winners. For example, this Data Science Bowl was around heart disease, and we interviewed winners and asked, "What did you do to win?" so we can talk through that. And there's a wide variety of things here you can see. Obviously,
I mentioned playing with data; you'll see that before you log in. But once you log in, you have a dashboard, and this is going to be just like a standard news feed, based off of things you respond to and things you're a part of. And then up here is where people spend most of their time, in these five, primarily these three. Competitions, like I mentioned before, will have a list of all the active competitions that we have, by prize.

So we have one for a passenger screening algorithm. This is with TSA and the Department of Homeland Security. You know about going to the airport, putting your hands up, and spinning around in the machine? Well, they took 3D images of people and basically strapped different objects to their bodies and said, "All right, is there an item on this person's lower right thigh, yes or no, and what is the likelihood that there is?" So there's a wide variety of competitions that can be really interesting, entirely depending on what your interest is, both in terms of skill set (are you interested more in image-based data, 3D imaging, text, audio, tabular data?) and industry. You can see here we have things like security and real estate. This is Mercari, which I know is expanding a lot in the US right now; they're kind of like an OfferUp or a Craigslist-style app. We have one that's image-based, literally from satellites in space, saying, "Is this a picture of a ship or an iceberg?" That way, as ships are planning their navigation routes, they can try to better plan how to get to their locations. We do a whole slew of different things. Recruit is a Japanese company that is similar to what we have in the US with something like a Groupon as well as an OpenTable, kind of both of those, and they're asking, "What's the likelihood that someone's going to go to a restaurant today or not?" And
as you mentioned, a couple of fun ones. We have this Spooky Author Identification, which says: based off of people's writing styles, can you guess who wrote it? Now, obviously this is one that's a little bit more for fun. You could just go Google those things.

But are you putting money behind it, though? It seems like a fun idea if someone's funding it.

So this one we actually funded ourselves. We created it as what you'd call a playground competition, and we don't have an exact, quote, "winner" on this one, because you could just search it and put in all the actual answers. But we have prizes, top-upvoted awards for best kernels, best themed tutorials, and discussions. So there are other ways, like I mentioned earlier, of trying to get people involved. And like you mentioned, we just launched a competition yesterday; we call it the Santa challenge. We do the Santa challenge every year. It's more of a fun one. The one this year is a little bit less machine learning and more of an optimization type of challenge, but it's always a good time for us.

As you mentioned, datasets. It's pretty cool for us; you're seeing a lot more people come to Kaggle saying, "Oh, I know you guys as a dataset platform, I've never heard of your competitions," which is really cool for us as we grow. It's both our team going out and finding data, and, as you see here, we have over 6,000 datasets. Let's see which ones have the most votes. OK, there's everything from credit card fraud detection, which is a pretty common one that a lot of people like to use, to a wide variety of clean data on movies. So the idea is that it not only provides you with clean data, it provides the community with the opportunity to see what other people are doing as well, which
gives you an area to play around with things you're interested in. I think I saw... yeah, here's one that's on European soccer, and there's one on Pokémon, and there's one on online ratings. So it gives people a chance to do some exploratory-type data analysis on their own, without any kind of... it's just more of
an introductory way to play around with stuff, because maybe you don't have the data to play around with, or you're saying, "Hey, I don't have the Bureau of Labor Statistics information for Bulgaria for 1938 in an easily organized fashion." Oh, but Kaggle does, if somebody else is just as passionate about it as I am and they uploaded it. There are a lot of cool opportunities there as well.

Yeah, I think that's powerful, because traditionally you don't have that amount of available datasets for you to actually test your knowledge, improve your skill set, just sharpen your talent in those areas, and also build off a community that's able to improve on what you did, maybe in ways you didn't think about.

Right, right. It's something we're starting to see a lot of companies do. As you mentioned, we don't host competitions that have public data, because then people can just go look it up and win the competition. But we have a lot of people saying, "We want to make this public so that people can go do things with it." A lot of companies truly do see the value of publicly available data for broader analysis, and we're getting a lot of that being brought in to our community as well.

And also, very briefly, on kernels. I think we mentioned this as well, but this is a big opportunity to share your own work. As I mentioned, the biggest value people see is sort of twofold: one, sharing, and two, not having to deal with downloading a wide variety of libraries and versions and languages and things like that. What's very common is what we call forking of kernels. Somebody might have a particular script or notebook that they want to show, but somebody else may want to use it for a different use case. An example may be: somebody has a McDonald's use case, and they did a Twitter sentiment analysis on McDonald's, and somebody else says,
this is great code that I want to use, and I want to apply it to United Airlines. They can then go and do that with United Airlines. They can copy and paste, or rather, as we call it, fork, and then basically just tweak the part of the code that they need to shift to fit that new data. And you can see here, some of these have hundreds of comments from people talking
about it. This top one is recent and exciting; it's been up for the past two hours. Let me get this loaded here. As you mentioned, Jonathan, it's very, very well documented. You see people speaking in more of a common language, and all different types of visualizations. You even see some funny animations sometimes that people will bring in there as well. So it gives people a chance to really learn and grow from whatever stage of development you're currently in.

Oh, awesome. Well, I think that's almost all the time we have. Are there any other questions from online?

There are some. OK, I guess I kind of answered it on your behalf, but one of the questions was: "It seems like Kaggle's primary purpose is to help businesses crowdsource solutions to a data science problem. Is that a big problem that businesses have now?" And my response was, at least from a customer perspective, because I work with customers every day: hiring ML engineers and data scientists is really hard, especially for industry experts. It's just not possible. We're probably taking all of them, if possible. So I don't know, what is your response to that?

I think you said it well. What we're seeing a lot is that people will usually approach ML in one of three ways. The first one is, "I hear about this, but why should I be involved with ML?" The second is, "I know I need to be involved with ML, but I don't know how to get started." And the third is, "I know I need to be involved with it, I know how to get started, now let's go use Kaggle for it." I interact with people on all three of those different sides. For those of us who are maybe in that developer community, it's easy to think: what
do you mean, people don't get this? But if you heard how many times I've had to explain to my own parents what Kaggle is, and they still don't know what it is, or machine learning, to say the least... It is growing so rapidly that we have no idea where the saturation point will even begin.

Well, Addison, thank you so much for your time. I guess for now we still have our jobs.

For now.

Can't go home early today. Episode 39, brought to you by a machine-learning-generated host.

Yeah, exactly. Anyway, thank you so much for tuning in, and thank you for chatting. Hit subscribe if you enjoyed this show. If you didn't, hit subscribe anyway. And we'll see you next time. Thank you.

Thank you.