The Data Science programming/analytics languages to know are R and Python. If you're in Operations Research or another analytics field that somewhat fits under the "Data Science" hat, you a) already know them really well, b) want to brush up on them, or c) probably should learn them now. Here I compile my thinking on how to learn R and Python from the Beginner to the Intermediate and Advanced levels, based on having tried some of these course materials.
Beginner (doing basic analysis)
R:
Computing for Data Analysis on Coursera and YouTube (weeks 1, 2, 3, 4), by Roger Peng from Johns Hopkins University
- Summary: It covers the basics of conditional and loop structures, R's syntax, debugging, Object Oriented Programming, and performing basic tasks with R, such as importing data, basic statistical analysis, plotting and regular expressions. See the syllabus for more.
- Time commitment: 11~36 hours total, including:
  - non-programmers: 4 weeks X [3 hours/week on videos + 2~6 hours/week on exercises]
  - programmers: [3 hours of notes reading + 8~16 hours on exercises]
- Advice for:
  - non-programmers: Listen to all the lectures (videos), make sure you understand all the details, and do all the exercises to hone your skills. Programming is all about practice, so doing the exercises is important. See below for "Advanced".
  - programmers: Don't bother with the videos; go straight to the lecture notes (link). Reading the notes is much faster than watching the videos. If you don't understand something, look up the video and watch it, or google the topic. Then do all the exercises. You don't need me to tell you that practice is king (um, and cash too).
The swirl package within R, by the Biostatistics team at Johns Hopkins University
- Summary: It aims to teach R and Statistics within the R environment itself, through a package called swirl. See the announcement here for more detailed info.
- Note: I haven't tried this, so I'm not sure how much time it takes or how good it is. However, I think it sounds pretty good and deserves a mention. I was never a fan of reading books to learn a programming language. Show me the code, or in this case let me write the code and get involved; that is much more, well, involving.
Python:
Google's Python course (link)
- Summary: It's straight to the meat, no-nonsense stuff, and covers all the important things. Suits my style. Enough said; see the course page for the syllabus.
- Time commitment: 8-10 hours
  - including reading notes and doing exercises
- Note: this is for experienced programmers. There are videos too, but don't bother. The notes on the course page cover the same material, and it always takes less time to read than to watch.
Intermediate (building analytical models)
R:
Data Analysis with R on Coursera and YouTube (plus class notes), by Jeff Leek from Johns Hopkins University
- Summary: It covers the full modelling cycle, from getting data, to structuring the analysis pipeline, exploring with graphs and statistical analysis, modelling (clustering, regression and trees), and model checking with simulation. It also talks about important statistical watch-outs like p-values, confidence intervals, multiple testing and bootstrapping. More on the syllabus here.
- Time commitment: 32~56 hours
  - including 8 weeks X [2~3 hours/week videos + 2~4 hours/week exercises]
Forecasting using R (link), by Rob Hyndman from Monash University in Australia and Revolution Analytics (the enterprise R solution)
- Summary: topics include "seasonality and trends, exponential smoothing, ARIMA modelling, dynamic regression and state space models, as well as forecast accuracy methods and forecast evaluation techniques such as cross-validation. Some recent developments in each of these areas will be explored" (quoted from course site). Read more there.
- Note: I haven't done this (just started), so I'm not sure about its time requirement or quality. I'm also not sure if they are planning to make the lectures available. Time will tell on these questions.
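For a rough taste of one topic on that list, exponential smoothing, here is a minimal sketch in plain Python. It is not taken from the course (which uses R); the function name, the `alpha` value and the toy demand numbers are all made up for illustration, and the level is simply initialised with the first observation.

```python
def simple_exponential_smoothing(series, alpha):
    """Return smoothed levels, where level_t = alpha * y_t + (1 - alpha) * level_{t-1}."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    levels = [series[0]]  # assumption: start the level at the first observation
    for y in series[1:]:
        levels.append(alpha * y + (1 - alpha) * levels[-1])
    return levels


# Hypothetical toy series, purely for illustration.
demand = [10.0, 12.0, 11.0, 13.0]
levels = simple_exponential_smoothing(demand, alpha=0.5)

# With simple exponential smoothing, the one-step-ahead forecast is the last level.
forecast = levels[-1]
print(levels, forecast)
```

A larger `alpha` makes the forecast react faster to recent observations; a smaller one smooths more heavily.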
Python / Octave:
Machine Learning on Coursera, by Andrew Ng from Stanford University --> My Favourite!
- Summary: The course actually teaches in the Octave language, but it can all be done in Python. I suppose you could do it twice, first in Octave and then in Python, if you've got the time. It would certainly solidify your understanding of the material, and Andrew Ng is sure that Octave is rather important in Machine Learning. It assumes some prior knowledge of linear algebra and probability, and refreshes you on some basics. "Topics include: (i) Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks). (ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning). (iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI)." (quoted from the course website)
- Time commitment: 50~90 hours
  - including 10 weeks X [2~3 hours/week videos + 3~6 hours/week exercises]
- Note: this course covers a subset of the statistical and modelling principles from the Data Analysis with R course above, but the overall level is more advanced. I enjoyed this course the most.
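To give a feel for what redoing this kind of material in Python can look like, here is a minimal sketch of fitting a straight line by batch gradient descent using only the standard library. It is a toy version, not the course's assignment code; the function name, learning rate, iteration count and data are all assumptions chosen for illustration.

```python
def gradient_descent(xs, ys, alpha=0.05, iterations=2000):
    """Fit y = theta0 + theta1 * x by batch gradient descent on squared error."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        # Prediction error for every training example under the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Gradients of (1 / 2m) * sum(error^2) with respect to each parameter.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters together using the same set of errors.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


# Toy data generated from y = 1 + 2x, so the fit should approach (1, 2).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)
```

In practice you would probably vectorise this with a numerical library, which is closer to how the Octave versions read, but the loop form makes each step explicit.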
Advanced (you get the drift from above)
Advanced = Experienced. This is true for programming, analytics, and learning any foreign language.
"Just do it" is how you get experienced.
There is no course on this stuff (i.e. being advanced), not without a PhD _plus_ years of field work.
My best suggestion is to use your curiosity. Find a problem. Dig into it.
Plus, work with other people who are really good.
Happy learning!