16+ Hours of Video Instruction
R Programming: Fundamentals to Advanced is a tour through the most important parts of R, the statistical programming language, from the very basics to complex modeling. It covers reading data, programming basics, visualization, data munging, regression, classification, clustering, modern machine learning and more.
Data scientist, Columbia University adjunct professor, author and organizer of the New York Open Statistical Programming meetup, Jared P. Lander presents the 20 percent of R functionality needed to accomplish 80 percent of most statistics needs. This video is based on the material in R for Everyone and is a condensed version of the course Mr. Lander teaches at Columbia. You start with simply installing R and setting up a productive work environment. You then learn the basics of data and programming, using these skills to munge and prepare data for analysis. You then learn visualization, modeling and predicting, and close with generating reports and websites and building R packages.
Table of Contents
Lesson 1 Getting Started with R: R can only be used after installation, which fortunately is just as simple as installing any other program. In this lesson you learn about where to download R, how to decide on the best version, how to install it and you get familiar with its environment, using RStudio as a front end. We also take a look at the package system.
Lesson 2 The Basic Building Blocks in R: R is a flexible and robust programming language and using it requires understanding how it handles data. We learn about performing basic math in R, storing various types of data in variables—such as numeric, integer, character and time-based—and calling functions on the data.
Lesson 3 Advanced Data Structures in R: Like many other languages, R offers more complex storage mechanisms such as vectors, arrays, matrices and lists. We take a look at those, and the data.frame, a special storage type that strongly resembles a spreadsheet and is part of what makes working with data in R such a pleasure.
Lesson 4 Reading Data into R: Data is abundant in the world, so analyzing it is just a matter of getting the data into R. There are many ways of doing so, the most common being reading from a CSV or database. We cover these and also importing from other statistical tools, and scraping websites.
Lesson 5 Making Statistical Graphs: Visualizing data is a crucial part of data science both in the discovery phase and when reporting results. R has long been known for its capability to produce compelling plots, and Hadley Wickham’s ggplot2 package makes it even easier to produce better looking graphics. We cover histograms, boxplots, scatterplots, line charts and more.
Lesson 6 Basics of Programming: R has all the standard components of a programming language such as writing functions, if statements and loops, all with their own caveats and quirks. We start with the requisite “Hello, World!” function and learn about arguments to functions, the regular if statement and the vectorized version, and how to build loops and why they should be avoided.
Lesson 7 Data Munging: Data scientists often bemoan that 80% of their work is manipulating data. As such, R has many tools for this, which are, contrary to what Python users may say, easy to use. We see how R excels at group operations using apply, lapply and the plyr package. We also take a look at its facilities for joining, combining and rearranging data.
Lesson 8 Manipulating Strings: Text data is becoming more pervasive in the world, and fortunately, R provides ways for both combining text and ripping it apart, which we walk through. We also examine R’s extensive regular expression capabilities.
Lesson 9 Basic Statistics: Naturally, R has all the basics when it comes to statistics such as means, variance, correlation, t-tests and ANOVAs. We look at all the different ways those can be computed.
Lesson 10 Linear Models: The workhorse of statistics is regression and its extensions. This consists of linear models, generalized linear models–including logistic and Poisson regression–and survival models. We look at how to fit these models in R and how to evaluate them using measures such as mean squared error, deviance and AIC.
Lesson 11 Other Models: Beyond regression there are many other types of models that can be fit to data. Models covered include regularization with the elastic net, Bayesian shrinkage, nonlinear models such as nonlinear least squares, splines and generalized additive models, decision trees and random forests.
Lesson 12 Time Series: Special care must be taken with data where there is time-based correlation, otherwise known as autocorrelation. We look at some common methods for dealing with time series such as ARIMA, VAR and GARCH.
Lesson 13 Clustering: A focal point of modern machine learning is clustering, the partitioning of data into groups. We explore three popular methods: K-means, K-medoids and hierarchical clustering.
Lesson 14 Reports and Slideshows with knitr: Successfully delivering the results of an analysis can be just as important as the analysis itself, so it is important to communicate them in an effective way. This communication can take the form of a written report, a Web site of results, a slide show or a dashboard. In this lesson we focus on the first three, which are made remarkably easy using knitr, a package written by Yihui Xie.
Lesson 15 Package Building: Building packages is a great way to contribute back to the R community and doing so has never been easier thanks to Hadley Wickham’s devtools package. This lesson covers all the requirements for a package and how to go about authoring and distributing them.
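As a rough taste of the kind of code several of the lessons above work toward, here is a minimal base-R sketch. The data and every name in it (`sales`, `byRegion`, `labels`, `fit`) are invented for illustration and are not taken from the course. It touches on a data.frame (Lesson 3), the vectorized if (Lesson 6), a group operation (Lesson 7), combining strings (Lesson 8) and a simple linear model (Lesson 10).

```r
# A tiny, made-up data.frame (Lesson 3); the values are illustrative only.
sales <- data.frame(
    region = c("East", "West", "East", "West"),
    units  = c(10, 20, 30, 40),
    stringsAsFactors = FALSE
)

# Vectorized if (Lesson 6): ifelse() tests every element at once,
# so no explicit loop is needed.
sales$size <- ifelse(sales$units >= 25, "large", "small")

# Group operation (Lesson 7): mean units per region using aggregate().
byRegion <- aggregate(units ~ region, data = sales, FUN = mean)

# Combining strings (Lesson 8): paste() is vectorized as well.
labels <- paste(byRegion$region, byRegion$units, sep = ": ")

# Simple linear model (Lesson 10): regress y on x with lm().
# The toy data lie exactly on the line y = 1 + 2x.
x <- c(1, 2, 3, 4)
y <- c(3, 5, 7, 9)
fit <- lm(y ~ x)

print(sales)
print(labels)
print(coef(fit))
```

The sketch sticks to functions that ship with base R, so it should run in a fresh installation without extra packages. The course itself also covers add-on packages such as plyr and ggplot2.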
LiveLessons Video Training series publishes hundreds of hands-on, expert-led video tutorials covering a wide selection of technology topics designed to teach you the skills you need to succeed. This professional and personal technology video series features world-leading author instructors published by your trusted technology brands: Addison-Wesley, Cisco Press, IBM Press, Pearson IT Certification, Prentice Hall, Sams, and Que. Topics include: IT Certification, Programming, Web Development, Mobile Development, Home & Office Technologies, Business & Management, and more. View All LiveLessons http://www.informit.com/imprint/series_detail.aspx?ser=2185116
|Lesson 1: Getting Started with R|
|2. Learning objectives||00:00:00|
|1_1. Download and install R||00:00:00|
|1_2. Work in the R environment||00:00:00|
|1_3. Install and load packages||00:00:00|
|Lesson 2: The Basic Building Blocks in R|
|2_1. Use R as a calculator||00:00:00|
|2_2. Work with variables||00:00:00|
|2_3. Understand the different data types||00:00:00|
|2_4. Store data in vectors||00:00:00|
|2_5. Call functions||00:00:00|
|Lesson 3: Advanced Data Structures in R|
|3_1. Create and access information in data frames||00:00:00|
|3_2. Create and access information in lists||00:00:00|
|3_3. Create and access information in matrices||00:00:00|
|3_4. Create and access information in arrays||00:00:00|
|Lesson 4: Reading Data into R|
|4_1. Read a CSV into R||00:00:00|
|4_2. Understand that Excel is not easily readable into R||00:00:00|
|4_3. Read from databases||00:00:00|
|4_4. Read data files from other statistical tools||00:00:00|
|4_5. Load binary R files||00:00:00|
|4_6. Load data included with R||00:00:00|
|4_7. Scrape data from the web||00:00:00|
|Lesson 5: Making Statistical Graphs|
|5_1. Find the diamonds data||00:00:00|
|5_2. Make histograms with base graphics||00:00:00|
|5_3. Make scatterplots with base graphics||00:00:00|
|5_4. Make boxplots with base graphics||00:00:00|
|5_5. Get familiar with ggplot2||00:00:00|
|5_6. Plot histograms and densities with ggplot2||00:00:00|
|5_7. Make scatterplots with ggplot2||00:00:00|
|5_8. Make boxplots and violin plots with ggplot2||00:00:00|
|5_9. Make line plots||00:00:00|
|5_10. Create small multiples||00:00:00|
|5_11. Control colors and shapes||00:00:00|
|5_12. Add themes to graphs||00:00:00|
|Lesson 6: Basics of Programming|
|6_1. Write the classic “Hello, World!” example||00:00:00|
|6_2. Understand the basics of function arguments||00:00:00|
|6_3. Return a value from a function||00:00:00|
|6_4. Gain flexibility with do||00:00:00|
|6_5. Use if statements to control program flow||00:00:00|
|6_6. Stagger if statements with else||00:00:00|
|6_7. Check multiple statements with switch||00:00:00|
|6_8. Run checks on entire vectors||00:00:00|
|6_9. Check compound statements||00:00:00|
|6_10. Iterate with a for loop||00:00:00|
|6_11. Iterate with a while loop||00:00:00|
|6_12. Control loops with break and next||00:00:00|
|Lesson 7: Data Munging|
|7_1. Repeat an operation on a matrix using apply||00:00:00|
|7_2. Repeat an operation on a list||00:00:00|
|7_3. The mapply||00:00:00|
|7_4. The aggregate function||00:00:00|
|7_5. The plyr package||00:00:00|
|7_6. Combine datasets||00:00:00|
|7_7. Join datasets||00:00:00|
|7_8. Switch storage paradigms||00:00:00|
|Lesson 8: Manipulating Strings|
|8_1. Combine strings together||00:00:00|
|8_2. Extract text||00:00:00|
|Lesson 9: Basic Statistics|
|9_1. Draw numbers from probability distributions||00:00:00|
|9_2. Calculate averages, standard deviations and correlations||00:00:00|
|9_3. Compare samples with t-tests and analysis of variance||00:00:00|
|Lesson 10: Linear Models|
|10_1. Fit simple linear models||00:00:00|
|10_2. Explore the data||00:00:00|
|10_3. Fit multiple regression models||00:00:00|
|10_4. Fit logistic regression||00:00:00|
|10_5. Fit Poisson regression||00:00:00|
|10_6. Analyze survival data||00:00:00|
|10_7. Assess model quality with residuals||00:00:00|
|10_8. Compare models||00:00:00|
|10_9. Judge accuracy using cross-validation||00:00:00|
|10_10. Estimate uncertainty with the bootstrap||00:00:00|
|10_11. Choose variables using stepwise selection||00:00:00|
|Lesson 11: Other Models|
|11_1. Select variables and improve predictions with the elastic net||00:00:00|
|11_2. Decrease uncertainty with weakly informative priors||00:00:00|
|11_3. Fit nonlinear least squares||00:00:00|
|11_6. Fit decision trees to make a random forest||00:00:00|
|Lesson 12: Time Series|
|12_1. Understand ACF and PACF||00:00:00|
|12_2. Fit and assess ARIMA models||00:00:00|
|12_3. Use VAR for multivariate time series||00:00:00|
|12_4. Use GARCH for better volatility modeling||00:00:00|
|Lesson 13: Clustering|
|13_1. Partition data with K-means||00:00:00|
|13_2. Robustly cluster, even with categorical data, with PAM||00:00:00|
|13_3. Perform hierarchical clustering||00:00:00|
|Lesson 14: Reports and Slideshows with knitr|
|14_1. Understand the basics of LaTeX||00:00:00|
|14_2. Weave R code into LaTeX using knitr||00:00:00|
|14_3. Understand the basics of Markdown||00:00:00|
|14_4. Weave R code into Markdown using knitr||00:00:00|
|14_5. Use pandoc to convert from Markdown to HTML5 slideshow||00:00:00|
|Lesson 15: Package Building|
|15_1. Understand the folder structure and files in a package||00:00:00|
|15_2. Write and document functions||00:00:00|
|15_3. Check and build a package||00:00:00|
|15_4. Submit a package to CRAN||00:00:00|
|Summary of R Programming|
|106. Summary of R Programming LiveLesson||00:00:00|