The Most Important Data Science Technologies to Learn for 2017
Hello, hello, what is up everyone? Welcome to Help Me Data Geek number two, the second week. I've got a new setup here, so hopefully the audio is better. The office is nearly complete; I still need some acoustic stuff in here, but overall it's pretty close to being done.
What I want to talk about today is: which technology in the data realm should I learn? This is a fun topic. I get this question all the time; I probably have a dozen emails from aspiring data analysts, data scientists, and business people asking me what they should do and what they should learn. So I figured I would try to cover this topic today, and we'll have some fun exploring these technologies. If you have any questions during the broadcast, just post them in the chat on the right (whatever side that is) and I'll try to get to them at the end. If not, we can come back and talk about it later; you can email me at help@bensullins.com any time and we'll try to get your question answered soon after you send it over.
Cool, let's take a look. I actually created some slides. I'm a geek about slides; I love slides, but I like to make them not horrible. So let me hop over and kick up the slides... okay, I've got that now. You should now be seeing "Which data technology should I learn?"
The first thing to talk about is what we're going to learn today. We'll start with the background; I think there's some important context about this topic and about these technologies, so I'll talk briefly about that. Then we get into the actual technologies themselves.
We'll talk a little about the different technologies in the data realm, and then we'll get into the personas: the people who actually use these technologies every day. Then comes the marrying of those two: who should learn what. Hopefully this will help you answer the question of what you should learn, depending on who you are and what your goals are. Lastly, I'll point you at where you can find more info; we'll jump over and I'll show you some websites that are great to learn on.
Okay, cool. First, take a look at this. This is from indeed.com, which is a job website with a lot of career stuff on it, and these are the programming languages ranked by number of programming jobs. I think this is relevant because the very first one is a database language: SQL. It's been around forever, and the point here is that SQL is probably one of the most universal programming languages. Whether you're a developer or a data geek, SQL is there, and that's pretty awesome. A little way down the list is Python. Granted, Python does a lot of things, but it is really popular in the data science realm as well as the data engineering realm. So two of the top nine or ten languages here are data languages. That's incredible, and it shows the popularity of data technologies today.
The other piece I wanted to share: Glassdoor put out the 25 best jobs in America, and this is cool because the number one best job was data scientist. Now, I'm going to go off-script and generalize for a second here (I'll talk about this a little more later): data scientist, to me, is akin to data analyst as well. I'm not sure whether they're talking about a true data scientist, somebody who does formal statistical modeling and comes up with machine-learning APIs and those kinds of things, or just somebody who uses data to solve problems, which is a very general way of thinking about the process of data analysis, with data scientist being probably the most advanced role in that realm. So just think about that: the number one best job in America, and this includes all kinds of jobs. In fact, I think number two is like a CPA, a tax guy or something like that. That's just insane. Data is hot right now, and I hope what I share with you today helps you understand the journey you want to go on.
So let's talk about some technologies. I don't know if this one will be a surprise to people or not (to people who know me it certainly won't be), but: Excel. I can't say enough about Excel. It is probably the most powerful piece of software ever made. It helps us run the world, essentially. I still think that OPEC, the cartel that controls oil prices from the Middle East, probably sits around with a pivot table figuring out what the price of oil for the world should be. Excel is that kind of a thing. It comes up everywhere: the famous Harvard economists' study about GDP, which turned out to be wrong, came down to an Excel error. I'll blog about that in the future, I'm sure. Excel really is one of the most powerful things, so I think it's critical for anyone in any role in the tech world today, or really in business or anything. Most people probably know that, so it's not a big surprise. The biggest skeptics I've encountered are database people: hardcore database developers who know SQL. They think, "Excel can't handle too many rows, I can't write SQL against it, and SQL is the hammer I use for everything."
I would encourage you to take a look. I actually use Excel to automate the writing of SQL at times, and I also use it to do things like build simple data models and create database tables, so you can use Excel to save yourself time in other programming languages. Whether or not Excel is the hammer you use for everything (which would be tough if you're doing a lot of data work), it has many different purposes. That's actually the gift and the curse of Excel: it's such a generic product, because it serves such a wide audience, that it won't take you all the way to completion. It'll get you about 50 or 60 percent of the way, but a lot of it is going to be on you to finally complete the project. That's one of those things that people love and hate. I love it, and regardless of what your job or role is, I recommend you take a look and see how it can benefit you.
Okay, now on to the hot stuff, the fun tech that's going on in the data world. First I need to pause and have some coffee, brought to you by Stone Brewing Company. If anyone from Stone is watching and you want to send me a beer to drink on the show, I will happily do that. Anyway, back to regularly scheduled programming: the key technologies right now.
The first one I want to talk about is SQL. This is a query language that is universal to all databases. Now, a caveat or asterisk there: NoSQL databases (which, by the way, stands for "not only SQL", not "no SQL" as in the absence of SQL). Those are things like MongoDB, or HDFS, or some of the other things we call databases that aren't really; I exclude them from that list. If you're a database, you support SQL; if you don't, you're not a database. I guess that's my stance on it. So SQL works with all the databases, and each one has its own flavor. They all support the ANSI standard SQL (and there aren't many programming languages that have an ANSI standard), and on top of that MySQL has its own flavor, Oracle has PL/SQL, and SQL Server, the Microsoft thing, has T-SQL. (I say "the Microsoft thing" like it's just a thing; SQL Server is huge.) All of them, Postgres too, have their own version of SQL that extends beyond the ANSI standard, but at the base level they all support a lot of the common functionality: SELECT statements, GROUP BYs, and all those kinds of things. What that means is that if you know this one language, you can talk to nearly all databases that exist. That's great, because if you're a data geek like me, or you're in a data role, you kind of don't care which one it is. You can come into a company and say, "Okay, what kind of database do you have?" As long as you have the right tool to connect to that database, you can write and execute your queries, because you know the standard.
The next one is Python. I absolutely love Python. It is one of the few technologies that actually incorporates Zen into it: they have these principles, the Zen of Python. It's one that is just beautiful, easy to read, and incredibly simple to learn. And of course it has a big community, like a lot of these.
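As an aside that ties the last two technologies together, here is a minimal sketch of the common SQL functionality mentioned above (a SELECT with a WHERE filter and a GROUP BY), run from Python. It uses Python's built-in sqlite3 module only because it runs anywhere without a server; the `calls` table, its columns, and its rows are made up for illustration and do not come from the talk.

```python
import sqlite3

# Hypothetical call data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (agent TEXT, call_type TEXT, minutes REAL)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [
        ("ana", "sales", 4.0),
        ("ana", "service", 6.5),
        ("raj", "sales", 3.0),
        ("raj", "sales", 5.0),
    ],
)

# Basic SELECT ... WHERE ... GROUP BY ... ORDER BY, the kind of query
# that works with little or no change across most SQL databases.
query = """
    SELECT agent, COUNT(*) AS calls, SUM(minutes) AS total_minutes
    FROM calls
    WHERE call_type = 'sales'
    GROUP BY agent
    ORDER BY agent
"""
rows = conn.execute(query).fetchall()
for agent, n_calls, total_minutes in rows:
    print(agent, n_calls, total_minutes)
conn.close()
```

The query text itself is the portable part; pointing it at MySQL, Postgres, or SQL Server would mostly mean swapping the connection library, not rewriting the SQL.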
Whatever you're looking to do has been done before, so you don't need to reinvent the wheel; you can Google it and copy and paste from Stack Overflow or whatever. Python is another one that is huge in the data world right now. Python is of course more general than just data, but in the data realm, especially data science and data engineering, it's huge.
Tableau is another one, and this may be a bit controversial, or maybe not. If you follow my stuff, you know that I love Tableau and I teach and talk about it a lot; I'm hoping to speak at the Tableau Conference this year, all those kinds of things. This one is huge, but I'll generalize a little and say it's the BI and analytics tools in general, and Tableau is in my mind the best one out there. QlikView is another popular one that is also a leader. If you saw the recent BI Magic Quadrant, there were three leaders left in the top-right quadrant: Tableau was one, QlikView was another, and Microsoft was the other. I would say Microsoft is definitely playing catch-up to the other two, and Tableau, I think, is the true leader, because they're the ones who really revolutionized the whole BI and analytics world with their approach to self-service analytics, making it simple and easy for people to visually explore their data. I don't want to get into a sales pitch about Tableau; you guys have heard me do that enough.
Point being, for your BI and analytics tool I recommend Tableau. It's huge, and right now it's super important for people to learn.
The other one is R. I hate the name, just because any time you search for it you get all kinds of crap results. But R is essentially an open-source statistical modeling programming language, and there are variants and tools around it: RStudio, R Server, Shiny dashboards. There's a whole realm of stuff popping up around R. It's largely used by data scientists, but I would say it's finding other applications outside of that, so people who aren't classically trained in statistics or other applied mathematical principles are using R to understand data better and to make graphics and things like that. Really powerful tech.
All right, so those, in my opinion, are the top four technologies in data right now.
Now we'll switch gears, and I want to talk about who you are. Hopefully, if you're watching this, you're in one of these three roles. (I'll also mention a fourth role, but I don't want to highlight it as a data role.) The first one is the knowledge worker. The knowledge worker is a business person who is using data to make decisions and run the business. I say business, but I mean organization; I used to work for Mozilla, where we had a foundation and didn't refer to ourselves as a business. Whatever your organization or company is, there are people who use data to make decisions. Hopefully a lot of people; hopefully this persona applies to a really broad range of folks. I would contend that even C-level folks should be knowledge workers.
This is a big, big market, and these are the people who take the insights that were developed for them, or the dashboards or whatever, and apply them to actually make the difference. They're the last mile in the journey, as it were, from where data starts to where it actually has an impact.
Next is the analyst and the scientist. Data analysts and scientists are the people who take data, generally from wherever they can collect it (scraping it from the web, pulling it from a database, downloading CSV files, or whatever), and make sense of it. This is the real exploratory work. It's really fun work because you get to learn a lot, it's constantly evolving, and there's a huge opportunity to be creative about how you use data.
Then you have the engineer, and the engineer is the one who really makes this whole thing hum. Without them, the pieces don't fit together and the data doesn't flow. I was having a chat with a friend recently, and they were saying that something like 70 to 80 percent of a data scientist's job is collecting and organizing data so that they can then do analysis on it. I thought that was ridiculous; that's just not how I've structured my teams and organizations. It's insane to me that companies would hire somebody who is extremely hard to find and extremely valuable and have them do the heavy lifting of just moving data around. In theory, a data scientist should be like a chef coming into the restaurant: all the tools are laid out, prepped, and cleaned exactly how they like them, all the food is ready to go, and they just make these beautiful dishes, these beautiful creations. That's what the analyst and scientist role should be. Emeril doesn't come into his kitchen and chop tomatoes for the salad; somebody chops the tomatoes for Emeril. So that's my point: you should have somebody chop the tomatoes for your data analysts and data scientists first, and that would be the data engineer or the data engineering team.
Okay, on to the next one: who should learn what. On the left here I'm going to put our knowledge worker, our analyst and scientist, and our engineer, and across the top I'll put our programming languages. The first one is Excel. The knowledge worker obviously needs to know that; in fact, they're probably the most familiar with it, and they probably try to do everything in Excel. One of the most ironic things I've found throughout my career is this.
You spend all this time building dashboards and trying to make it easy for knowledge workers to find answers to their questions and get the job done, and still the most common question is, "Can I download it to Excel?" That's unfortunate, because the idea is to not have to do that. Often what people do then is try to join it up with other data, or mangle it together to fit the model they want, and make their own thing in Excel. And it's like, we can do that for you, or we can teach you other tools and ways of doing it. Anyway, Excel is obviously key.
SQL is another one. I have a fun story. Back in my first real data role, I was working at a call center in Phoenix in the late 90s, and my boss was a pure business guy whose role was to help understand customer service staffing. What we did was look at the schedules for our inbound customer service and sales calls and try to balance the staffing levels: how many people are on the phones at a given time. That means you have to predict how many calls you're going to get, which depends on things like marketing campaigns or whether there's an outage, and then think about people's situations, like so-and-so has vacation they need to take. We're talking about two thousand people in a call center, so lots of data, and it's really a numbers game trying to fit all these things together. He actually ran that for a number of call centers, and my job was to help work with the data. The funny part of this story is that he's a complete business person, not a tech person, not a developer, and he knew SQL. It blew me away. I thought, holy crap, here's a business person, in the late 90s, writing SQL code to figure out how to do his job. First I thought, this guy's freaking awesome. Then I also thought, man, if he is writing SQL, I need to be stepping it up, because I had thought SQL was kind of the end of the skills I needed at the time. So, knowledge workers: yes to SQL.
The other one is Tableau. Again, if you don't have Tableau you should go try it out, but if you have a different BI tool, whatever it may be, that's fine; use that one. Knowledge workers need to use this. This is the nature of self-service analytics, because SQL is going to be hard, especially for complex analysis, and Excel has its limitations, as great as it is. Tableau goes beyond that; it's kind of the best of both worlds. It's easy to use like Excel. You can code in it, but it's not really required; you can get a lot done without coding. And it can handle large amounts of data, connect to databases, connect to web data sources, and all that kind of thing. So Tableau is an absolute must for the knowledge worker.
Then for the analyst and scientist: of course Excel, and SQL too. Analysts and scientists are going to have to get down in there and query databases. I know a lot of people who are Tableau experts or QlikView experts and don't know SQL, and that blows my mind. I think it depends where you come from. Some people come from the knowledge worker side: they're a business person, they learn Tableau, and now they're a Tableau expert, but they aren't really a technologist. That's a term from the 90s we used to use. We thought of ourselves not as a developer, an IT guy, an administrator, or whatever; we were technologists, jacks-of-all-trades, so to speak. So if you're an analyst or scientist, SQL is a must. I don't care what your background is; if you're in that role now, learn SQL. It's not crazy hard to learn, so don't be intimidated by it. R is another one that I would say is required, or becoming required; if you're a data scientist, absolutely. I guess there are some differences of opinion between R and SPSS or some of the other ones, but a stats package is the requirement there, and I recommend R.
Then I'm going to put a dotted line around Python, because I think this is really powerful. Again, part of an analyst's or scientist's job is to claw and scratch the data together, and Python lets you do that in unique ways that none of these other tools can, so I recommend learning at least the basics. And then Tableau is a must as well. So there's a lot on your plate as an analyst or scientist. You are really the workhorse of this whole process, and it really revolves around you, so there's really nothing here you shouldn't become good with, or at least proficient with to some extent. That's not to say you won't have things that you lean towards, based on your experience or what you like.
The engineer, then: Excel, obviously; SQL, yep. And this is where the difference is: the engineer is really heavy in Python. The data engineer uses Python to move data from place to place and to manipulate data. A common framework is that we use Python to take data from wherever it comes from (an FTP site, an API, a database, wherever it lives outside our data warehouse or analytics warehouse) and pull it in, and then we use things like SQL to actually manipulate it inside our environment. Of course there are lots of other tools and other ways to do that. I really hate getting stuck in a data engineering toolset, because they all have their limitations, and when you get stuck and can't do something, it's incredibly infuriating. To rant about that for a second: if you know how to write the code, you can get done all the things you could get done with, say, Informatica or a tool like that, in probably about the same amount of time. It's really just about as hard to maintain, so it's a wash, but you have the ultimate flexibility. Some people gravitate towards tools because they're afraid to write code or afraid of the command line. Don't be. If you're a technologist, and especially if you're a data engineer, the command line is your friend, and writing code can be your friend too.
I put Tableau on here as well, with a dotted line around it. This one is good because engineers need to show their work, just like everyone else. It's not required, but I actually know a lot of data engineers who have benefited from knowing how to do stuff in Tableau, or at least the basics.
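The pattern described a moment ago for the data engineer, where Python pulls the data in and SQL reshapes it inside the warehouse, might look roughly like the sketch below. Everything named here is an assumption for illustration: the CSV text stands in for a file fetched from an FTP site or API, an in-memory SQLite database stands in for the warehouse, and the table names `staging_orders` and `region_sales` are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a file fetched from an FTP site or API (hypothetical data).
RAW_CSV = """order_id,region,amount
1,west,10.50
2,east,20.00
3,west,4.50
"""


def extract(raw_text):
    """Parse raw CSV text into (order_id, region, amount) tuples."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [(int(r["order_id"]), r["region"], float(r["amount"])) for r in reader]


def load(conn, rows):
    """Land the rows, untransformed, in a staging table."""
    conn.execute(
        "CREATE TABLE staging_orders (order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", rows)


def transform(conn):
    """Use SQL inside the 'warehouse' to build a summary table."""
    conn.execute(
        """
        CREATE TABLE region_sales AS
        SELECT region, SUM(amount) AS total_amount
        FROM staging_orders
        GROUP BY region
        """
    )
    return conn.execute(
        "SELECT region, total_amount FROM region_sales ORDER BY region"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # stands in for the analytics warehouse
load(conn, extract(RAW_CSV))
summary = transform(conn)
print(summary)
```

Keeping extract, load, and transform as separate small functions is one way to get the flexibility mentioned above: a new source only changes `extract`, while the SQL in `transform` stays the same.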
They can spit out some results, or test something like performance on the server, and use Tableau to visualize that data; it's pretty straightforward.
Okay, so hopefully that gives you a good picture of what types of skills you should learn, depending on what role you're in or want to be in. I'm not going to go into which one you should learn first. I would just recommend trying them all out, going after the one you have the most interest in, and going down that rabbit hole to have some success there. Play to your strengths at first, without worrying about your weaknesses, to get going.
All right, where to find more. Here I'm going to talk about a couple of Pluralsight courses. Obviously I'm an author on Pluralsight and I have a lot of stuff there. These first two are courses of mine that I recommend if you're going down this path. Data Analytics Hands On takes you from soup to nuts. It covers all of these and many other topics at the one-inch-deep level, and then points you to where you can go deeper if you want to get into data modeling, star schemas, ETL, all those kinds of things. Tableau Fundamentals, of course, is just how to get up and running with Tableau. By the way, we're doing some new Tableau stuff: Tableau just announced learning partnerships with us at Pluralsight, with Lynda, and with a couple of others, so there are lots more Tableau courses coming out on Pluralsight if you're interested in that. There's also an introduction to SQL, which is a great way to get going with SQL, and there's a beginning data visualization with R course. So we've got all these things covered in the Pluralsight courses. Oh, I forgot to add the Python one; there are tons of Python courses on Pluralsight as well.
On Code School, we've got two. Pluralsight costs money; you can do a free trial for 14 days, or email me and I can hook you up with a longer-term trial. On Code School, though, this is all free. There is a paid membership as well, but these two courses are totally free. You probably have to create an account, I think, but you don't have to pay for anything. They are Try SQL and Try R. The difference with Code School is that these are for absolute beginners, so if you're brand new to SQL or brand new to R, it is probably the best way to learn. You have a person talking and explaining the concepts very clearly, you have great graphics, and then you have coding in the browser. It's very interactive: a person talks, a diagram explains something, and then it's your turn with a little coding challenge in your browser. The production quality of their content is just beyond anyone else's; it really is the greatest stuff. In contrast, Pluralsight is, I would say, more advanced, or more professional. If you already know how to install tools and connect to a database and things like that, it's a great place to go much deeper in this space. Code School is much higher level, with much more beginner content, and it's a great way to dive into a new technology. They have a ton more stuff, but relevant to our talk here, there are these couple of courses.
Then I'm going to mention one other one as well, because it's not just about promoting stuff that I get paid for; it's about sharing my knowledge with you and helping you guys learn. DataCamp.com is really cool and has different tracks, like Python and R tracks, for becoming a data scientist.
Okay, that's all for the slides. Let me jump over now and check if there are any questions, and if not, we'll call it a day.
All right, it looks like we don't have any questions right now. If you do have anything you want to follow up on, or questions about this talk or this video blog, email me at help@bensullins.com, and I'll see you guys next week. Ciao.