Category: education
-
Help for Academic Programs in Data Science
Brandon Rohrer (along with others) created an excellent resource for academic programs, Industry recommendations for academic data science programs. The resource is authored by a number of industry data scientists and university faculty. It is a collection of useful information for college data science programs. Here are some of the topics: What do industry data scientists do? What…
-
Berkeley Undergrad Data Science Course and Textbook
The University of California at Berkeley has started the Berkeley Data Science Education Program. The goal is to build a data science education program over the next several years by engaging faculty and students from across the campus. The introductory data science course targets freshmen and is taught from a very applicable and…
-
Learn Apache Spark this Summer with edX
edX has just announced a new series of Big Data courses. The series consists of 2 courses focused on Apache Spark. If you are not familiar with Spark, it is a very fast engine for large-scale data processing. It claims to perform up to 100 times faster than Hadoop. Here are the 2 courses: Introduction…
-
Process Mining Course via Coursera
Process mining is a bridge between data mining and business process modeling. It can be used to study event and log files to extract meaning. The Coursera course, Process Mining: Data science in Action, starts November 12, 2014.
-
Foundations of Data Analysis on edX
edX will be offering Foundations of Data Analysis via the University of Texas at Austin. The course starts November 4, 2014. Here is a list of topics: Tutorials on using R; Descriptive Statistics; Statistical Models (Regression); Inferential Stats.
-
What is Machine Learning
Machine Learning is a term that can mean different things to different people. Andrew Ng, cofounder of Coursera and Professor at Stanford, provides two definitions in his popular Machine Learning Course. The first definition comes from Arthur Samuel around 1959: "Field of study that gives computers the ability to learn without being explicitly programmed." The…
-
R Commands for Cleaning Data
This post is notes from the Coursera Data Analysis Course. Here are some R commands that might be helpful for cleaning data. String Replacement: sub() replaces the first occurrence; gsub() replaces all occurrences. Quantitative Variables in Ranges: cut(data$col, seq(0,100, by=10)) breaks the data up by the range it falls into, in this example: whether the…
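The commands named in this excerpt can be tried on small made-up vectors. The following is a minimal sketch, not the course's own example: the vectors `names_raw` and `scores` are hypothetical values chosen for illustration.

```r
# Hypothetical example values; none of these come from the course notes.
names_raw <- c("first_name_x", "last_name_y")

# sub() replaces only the first match in each string.
first_only <- sub("_", " ", names_raw)

# gsub() replaces every match in each string.
all_matches <- gsub("_", " ", names_raw)

# cut() assigns each value to the interval it falls into.
# With the default settings the intervals are open on the left,
# so a value of exactly 0 would become NA here.
scores <- c(5, 37, 82, 100)
score_ranges <- cut(scores, seq(0, 100, by = 10))

print(first_only)
print(all_matches)
print(table(score_ranges))
```

Passing include.lowest = TRUE to cut() is one way to keep the lowest break value inside the first interval.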
-
R Graph Commands for Data Analysis
This post is notes from the Coursera Data Analysis Course. Here are some basic R commands for creating some graphs. Exploratory Graphs: boxplot, barchart, hist, plot, density. Final Graphs for a report: Final graphs need to look a little nicer. They must also have informative labels and a title and possibly a legend. plot(data$column1, data$column2,…
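The difference between a quick exploratory graph and a report-ready one can be sketched with base R graphics. This is an illustration under assumptions, not the post's original code: the built-in mtcars data set stands in for `data`, and the colors, labels, and output file are arbitrary choices.

```r
# Assumption: the built-in mtcars data set stands in for the post's `data`.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)  # draw to a file so the sketch also runs non-interactively

# Exploratory graphs: quick looks with default styling.
boxplot(mpg ~ cyl, data = mtcars)
hist(mtcars$mpg)
plot(density(mtcars$mpg))

# Report-style graph: informative axis labels, a title, and a legend.
cyl_levels <- sort(unique(mtcars$cyl))
palette_cols <- c("black", "blue", "red")
point_cols <- palette_cols[match(mtcars$cyl, cyl_levels)]
plot(mtcars$wt, mtcars$mpg, col = point_cols, pch = 19,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel economy by weight")
legend("topright", legend = paste(cyl_levels, "cylinders"),
       col = palette_cols, pch = 19)

invisible(dev.off())  # close the device so the file is written
```

In an interactive session the pdf() and dev.off() calls can be dropped so the plots appear on screen.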
-
9 problems with Real World Regression
This list comes from the Coursera Data Analysis Course. Linear and Logistic Regression are some of the most common techniques applied in data analysis. Here is a list of possible problems with regression in the real world. Confounders: a variable that is correlated with both the outcome and other variables in the model. Complicated Interactions…
-
First Steps to Data Analysis in R
This post is notes from the Coursera Data Analysis Course. Here are some basic R commands that should be useful for obtaining data and looking at data in R. Ideally these commands are useful for steps 4, 5, and 6 of the 11 Steps to Data Analysis. Load the data and just look at it: download.file('http://location.com',…
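A "load it and look at it" pass might look like the sketch below. It is an assumption-laden illustration rather than the post's own command list: a small CSV written to a temporary file stands in for a downloaded file, so download.file() is skipped and the code runs offline, and the inspection functions shown are common base R choices, not necessarily the ones the post goes on to list.

```r
# Assumption: a tiny CSV written to a temp file stands in for a download.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,90", "2,75", "3,82"), csv_path)

# Load the data into a data frame.
data <- read.csv(csv_path)

# Just look at it: size, first rows, per-column summaries, and structure.
print(dim(data))
print(head(data))
print(summary(data))
str(data)
```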
-
Levels of Data Analysis
The list is ordered according to the level of difficulty. Descriptive: just describe the data; common for census type of data. Exploratory: find relationships that were not clear beforehand; useful for defining future studies; remember correlation does not imply causation. Inferential: use a small dataset to say something about a larger population; most common goal…
-
Videos for Learning R
All of the videos from the Computing for Data Analysis Coursera course are available on YouTube. If you are interested in learning R or just need a refresher on some of the topics, these videos could serve as a great resource. Week 1: installing R, data types, reading/writing files. Week 2: functions, apply, sapply, other…