Hi, welcome to this Data Science Dojo beginner tutorial on getting started with Python and R for data science. This beginner tutorial will take you through some common Python and R packages and libraries used for machine learning and data analysis, as well as go through a simple linear regression model. We'll also help you set up Python and R on your Windows, Mac, or Linux machine, run your code locally, and push your code to a GitHub repository. So let's get started with installing Python and R.

To install Python on a Windows machine, we first need to check if our machine is 64-bit or 32-bit, as this will determine the appropriate Python program to install. To do this, search for About your PC, and you'll see if your machine is 64-bit or 32-bit. In my case it's 64-bit. Next, in your web browser type python.org/downloads/windows and scroll down to the version of Python you wish to download. In my case I'll choose the latest version for the 64-bit executable installer. You can go with the default installation, or you can do a custom installation to include optional features such as pip, or you can specify your path directly under C so it's easier to locate your Python program later on. Then just click Install.

Once Python has installed on your computer, you'll need to add Python to your path to be able to run Python scripts in a directory or folder. Download Git for Windows to set your path and run the Python command. The commands using this program are basically the same as when using the terminal on Mac or Linux. Alternatively, for Windows, you can use the default command prompt by searching cmd. You can also set your local path by searching Environment Variables and setting your path there.

Here's an example of a Python script saved in my Documents, project 1 folder. Using a text editor of my choice, such as Notepad++, to write my Python code, I saved my file as a .py file. Then I open my terminal, which is in C:\Program Files\Git\git-cmd. I navigate to Documents, project 1, and I set my local Python path. We'll set this up permanently using a .bashrc file with the path to my Python program directly under C. Now I simply type py followed by the name of the file and extension. If using Python 2.7, just type python followed by the name of the file and extension. If we were to hit Enter to run this, it would produce the output of my code, which has predicted heights using a linear regression model.

The final part of this Python Windows setup is installing pip, to be able to easily install Python packages and libraries. Pip might not have come with your installation if you didn't customize your installation, or it might not be installed in an older version of Python. So to get pip, type in your web browser bootstrap.pypa.io/get-pip.py, right-click to save it in your Python program folder, and then run the command python get-pip.py. So my Python program's under C.

Moving on to installing R for Windows: simply type in your browser cran.r-project.org/bin/windows/base and select the 32-bit or 64-bit version. Once it is downloaded, press OK and click Next through all the steps. Once R has installed on your computer, you can simply open the program on your desktop and start typing R commands or code. I recommend you download RStudio, as it just makes the process of editing and debugging your code easier. Otherwise, you're welcome to use the R command line. To save an R file, click on File, then History, and this will save your code so you can run it later if you wish. To set your path or working directory, simply type setwd followed by the path to where you would like to store your R files locally. You might need to use double backslashes on Windows so that they are understood as separators in the path.
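The double backslash is needed because a single backslash starts an escape sequence inside a string, and Python string literals behave the same way. As a rough illustration only, here is a small Python sketch; the folder name is hypothetical and the helper same_windows_path is made up for this example.

```python
from pathlib import PureWindowsPath

# Hypothetical project folder, used only for illustration.
escaped = "C:\\Users\\me\\Documents\\project1"  # backslashes doubled
raw = r"C:\Users\me\Documents\project1"         # raw string, no doubling needed
forward = "C:/Users/me/Documents/project1"      # forward slashes instead


def same_windows_path(a: str, b: str) -> bool:
    """Return True if two strings name the same Windows path."""
    return PureWindowsPath(a) == PureWindowsPath(b)


print(escaped == raw)                   # both spell the same characters
print(same_windows_path(raw, forward))  # separators are normalized
```

R's setwd also accepts forward slashes on Windows, so that spelling is an alternative to doubling every backslash.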
Let's install Python on a Mac. Go to the Mac Terminal in Finder, Applications, Utilities. Now we're going to install the Xcode command line utilities, as this will help with the installation. So type xcode-select --install, click Install, and agree.

Now we're going to use Homebrew to install Python. So type /usr/bin/ruby, and we're going to use curl, and we're going to type the URL to Homebrew on GitHub. Press Return and enter your password if need be. Next, add the path. We will create a .bashrc file to permanently add the path. If you get an error message stating cannot write to path, try the sudo chown command accompanying this video. All commands can be copied and pasted, as they accompany this video.
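Whichever operating system you're on, once Python is installed and on your path, a quick sanity check can be sketched in Python itself. This is only a minimal sketch using the standard library; describe_environment is a name made up for this example, and note that it reports the bitness of the Python interpreter, which is not necessarily the bitness of the machine.

```python
import platform
import shutil
import struct
import sys


def describe_environment() -> dict:
    """Collect a few facts about the Python that is running this script."""
    return {
        "python_version": platform.python_version(),
        # Pointer size in bits: 32 or 64 for the interpreter itself.
        "interpreter_bits": struct.calcsize("P") * 8,
        "machine": platform.machine(),
        "executable": sys.executable,
        # Full path to pip if it can be found on the PATH, otherwise None.
        "pip_on_path": shutil.which("pip") or shutil.which("pip3"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If pip_on_path prints None, pip is either not installed or its folder has not been added to your path yet.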
Next, we'll install Python. So just type brew install python, or python3 if using Python 3. We'll also add this to our path, so we'll create another .bashrc file. Now, to check if pip is installed as part of your Python program, simply type which pip, and it'll show you the location where your pip is installed. If you want to check the version, just type pip -V and it'll show you which version of pip you've installed. As mentioned, pip is useful for easily installing Python packages and libraries.

Moving on to R: to install this on a Mac after installing Homebrew, simply type brew tap homebrew/science and then type brew install r. To open the R command line, simply type R and press Enter.

Now let's install Python and R on Linux. I'm using Ubuntu. Later versions of Ubuntu might already have Python installed, but I'll take you through the process anyway. So open your terminal. Now we're going to type sudo apt-get install python3.6, or python2.7. Now we're going to type sudo apt-get install python-setuptools. Lastly, install pip to easily install Python libraries and packages by typing sudo easy_install pip. To install R on Linux, simply type sudo apt-get -y install r-base. Now type uppercase R and press Enter to open the R command line.

Now that we've got the setup and installation part of this tutorial out of the way, we can move on to more fun stuff. Let's have a quick play with some data to get you familiar with some key data analysis and linear regression concepts, as well as basic scripting. For this, I'm going to go through an example of a simple linear regression in Python and R using simulated data on people's height in centimeters and their weight in kilograms. The model is based on a formula, which can be produced using Python and R functions, that gives a predicted or estimated y-value given a certain x-value, a certain constant, and a slope.
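Written out, the formula is simply predicted y = constant + slope times x. Here is a tiny Python sketch of that idea. The constant and slope values below are made up purely for illustration; they are not the values fitted in the video.

```python
def predict_height(weight_kg: float, constant: float, slope: float) -> float:
    """Predicted height (y) for a given weight (x): constant + slope * x."""
    return constant + slope * weight_kg


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
EXAMPLE_CONSTANT = 100.0  # predicted height when weight is 0
EXAMPLE_SLOPE = 0.5       # extra centimeters per extra kilogram

print(predict_height(70.0, EXAMPLE_CONSTANT, EXAMPLE_SLOPE))  # 135.0
print(predict_height(71.0, EXAMPLE_CONSTANT, EXAMPLE_SLOPE))  # 135.5
```

Going from 70 to 71 kilograms raises the prediction by exactly the slope, which is what "for every one unit increase in x" means.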
Here is what's called the regression line. I like to think of it as a line of predicted values. Along the x-axis, for a given x value, the line predicts the y value to fall about here in height. The actual values are slightly above and below the line, but the model is generalized enough to take into account where most cases would probably fall. The formula gives a constant value here, which we add to a given x value multiplied by a given coefficient, or slope. The constant means that when x is at 0, y is at this value, and the slope means that for every one unit increase in x, y increases by this number of units. So we can use this formula to plug in any new x value of a person's weight to predict their height, or y value. Of course, there are many other factors, not only weight, that could influence a person's height. Hence we're just looking at a very simple model to get started with.

To implement linear regression in Python, we first need to install a few commonly used packages. We'll open our terminal and install sklearn for modeling. If using Python 2.7, just type python -m pip install. Now we're going to pip install pandas for data importing. We'll also install matplotlib for plotting. The last package we need to install is scipy.

Next, go to your text editor and save a new Python file in Documents, project 1, or a folder of your choice. So I'll just call my file LM model and save it as a Python file. Also, don't forget to cd into this folder in the terminal so you can run your script later.

Now we're going to import these packages at the beginning of the script, so at the top of the file we'll type from sklearn import linear_model. So that's our linear regression tool. We're also going to import DataFrame from pandas. We also want to use pandas as pd. And we want to import matplotlib and use it as plt.
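Before running the script, it can help to confirm that the packages installed above can actually be found by Python. This is just a sketch using the standard library; missing_packages is a helper name made up for this example, and the list holds the import names rather than the pip names.

```python
from importlib.util import find_spec

# Import names of the packages installed above.
REQUIRED = ["sklearn", "pandas", "matplotlib", "scipy"]


def missing_packages(names):
    """Return the names from the list that Python cannot find."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not installed yet:", ", ".join(missing))
    else:
        print("All packages found.")
```

One thing to watch for: the import name sklearn corresponds to the package that pip knows as scikit-learn.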
Now we need to read in our data, which you can download as part of this tutorial and save in your current folder. We'll use the pandas read_table function for this. So we'll put our data in a variable, and we'll just call it input data. We'll use the read_table function, and we'll give the data file name and extension in our folder. It's comma separated, as it's a CSV file. We have headers and they start at line 0, and we'll give our x and y headers specific names. This automatically infers the data types for each column too.

Before applying a linear regression model, let's plot the data using matplotlib's plot function to see if the data naturally follows a linear pattern and the normal distribution, as linear regression is not appropriate or useful for datasets that don't follow this assumption. So we'll use a scatter plot, and we're just plotting weight versus height. So weight is on our x-axis and height is on our y-axis. We'll need to show this graph so it can render on our screen. Now save and run the script. As we can see, the data is linear and follows a normal distribution, making linear regression appropriate to use on these data.

Now we'll define our x predictor variable, weight, and our y outcome variable, height. So we'll use pd as pandas, and we'll use the DataFrame function. We'll use weight as our predictor, and we'll make height our outcome variable.

Now we'll fit a model to the data using the fit function and use this to predict height for a given weight. So we're using a linear regression model, and we'll fit the model to the data. We can now compare the first, say, six predicted values, using the predict function, with the actual height values to see if they're on par. So first we're going to get all the predicted values, and we're going to use our predictor variable to predict the outcome. We'll just print some subheadings to differentiate the list of predicted values from the actual, and we'll have a look at the first zero to six predictions. And we'll compare with the first zero to six actual values. All right, we'll save and run the script.

The first few predictions compared with the actual values show the model was not far off the mark, which is good. However, to properly assess a model, we can use measures such as R-squared, which is the percentage of explained variance. So we'll go back to our script, and we're going to use the score function to get the R-squared. And we want to print this, obviously.
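To see roughly what the fit and score steps are computing, here is a dependency-free sketch of ordinary least squares for one predictor, plus R-squared. It is a stand-in written for illustration, not the scikit-learn code typed in the video. The function names and the four data points are made up, and the numbers are not the tutorial's dataset.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares constant and slope for y = constant + slope * x."""
    x_mean, y_mean = fmean(xs), fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    constant = y_mean - slope * x_mean
    return constant, slope


def r_squared(xs, ys, constant, slope):
    """Share of the variance in ys explained by the fitted line."""
    y_mean = fmean(ys)
    ss_res = sum((y - (constant + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up weights (kg) and heights (cm), for illustration only.
weights = [50.0, 60.0, 70.0, 80.0]
heights = [155.0, 165.0, 172.0, 183.0]

constant, slope = fit_line(weights, heights)
predicted = [constant + slope * w for w in weights]

print("Predicted:", [round(p, 1) for p in predicted])
print("Actual:   ", heights)
print("R-squared:", round(r_squared(weights, heights, constant, slope), 3))
```

Printing the predicted and actual lists side by side mirrors the comparison of the first few values described above.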
We're just going to comment out the above lines, as we no longer want to view these. We'll save and run our script again. As we can see, a high R-squared shows the model explained most or nearly all of the variance, which is good. However, relying solely on R-squared is probably not good enough when assessing and measuring our model's predictions. Sometimes it can be misleading to look at the R-squared, but the course will go through other measures you can use.

To perform the same analysis in R, we'll first install a commonly used R package, ggplot2, which is used for effectively visualizing and analyzing data. I'll select a CRAN mirror that's close to me. We need to load ggplot2 whenever we want to use it. We'll read in our data using the read.table function, and we'll put our data in a variable. We use read.table, and we'll give it our file in our current working directory. It's comma separated, and we do have headers, and we'll just use the default header names x and y. This automatically infers data types too. We'll also attach our data frame so we can refer to column headers or variable names without having to refer to the name of our data each time, making this more convenient.

Now we'll plot the data to see its normal distribution, but we can also use ggplot2 to plot the regression line, or the line of best fit. So we'll plot our x and y, which is weight and height, and in the smooth function we'll specify a linear model. As we could see before, the actual heights are close to the predictions of the line.

Implementing a simple linear regression in R is quite easy using the lm function. Now, to see the first few predictions of height, we'll use the predict function. We first need to get all of the predictions, and we're just going to print the first few to have a quick look. So the first 0 to 6. And we'll compare with our actual values. As seen before, for the first few cases the predictions are pretty close. To print the R-squared, or percentage of explained variance, for assessing the model, we'll use summary. As seen before, it explains nearly all the variance, but it's a good idea to also look at errors or other measures for this. Finally, now that we're finished, we'll detach our data.

In the last part of this tutorial, we'll push our code to a GitHub repository so you can share your code publicly, or store it privately if you wish. You can create a GitHub account for free. You can also follow Data Science Dojo to clone or access a copy of the code provided as part of the course material. Once you have created an account, add a new repository, without initializing it, via the GitHub website. The instructions to push code to GitHub are on the website, but I'll take you through the process anyway.

First, open your terminal and cd into your current project directory, and you'll need to configure your user name and user email. Now, configure your user name. We'll initialize our project directory as a Git repository. Then we'll add all files in our project folder. We're not pushing it live yet; it's just selecting the files. Commit your files to track the first version with a message, should you wish to publish updates later on. So I'm just going to say, first go at implementing simple linear regression. As you can see, all the files in the project 1 folder are there. Now we're going to give the URL of our main repository, so go to the main page of your GitHub repo and copy the URL, and we're going to paste it into the terminal when adding a remote repo. Finally, we're going to push our code to the repo on GitHub's master branch. Now, if you have a look at your GitHub repo, you can see all your files are there. All the work we have done in this tutorial is here. Alternatively, after initializing your GitHub repo via the site, you can simply drag and drop your project folder onto the main page of your repo.

Now that you've gone through the basics, you should feel ready to dive into the course and gain a deeper and wider understanding of data science. You know how to set up Python and R on your machine, how to do basic scripting for reading and visualizing data, and how to apply a model and assess it, and now you can share your hacks and projects on GitHub. The data used in this tutorial, the coded examples, the commands, the URLs to programs, and so on are all accompanying this video.

My name is Rebecca Merritt. Feel free to reach out to me by commenting on this video. I'm more than happy to help you get ready before you start your course. Thanks for watching, and happy analyzing.
2018-03-04