-
10 Great R Packages
These slides are aimed at Kaggle competitions, but the R packages can be helpful to anyone using R for data analysis. The slides were created by Xavier Conort, a winner of multiple competitions. Slides: "10 R Packages to Win Kaggle Competitions" from DataRobot. Read more
-
Stanford Releases Large Network Datasets
Stanford University has just released a collection of large network datasets. By network data, I mean networks in the mathematical sense (think of a collection of nodes and edges). Here are just a few of the possible categories: citation networks, road networks, web graphs, social networks such as Twitter… Read more
-
An Organization for Open Data and Healthcare
Health Data Consortium is an advocacy group focused on helping the healthcare industry respond to the availability of health data. They are currently focused on innovation and the uses of open health data. Healthcare is currently undergoing some radical changes and data science is going to play a key role in the future of healthcare.… Read more
-
Analytics Handbook: Book 3 is Free
The team that brought you the Analytics Handbook has freely published the third and final book, titled THE DATA ANALYTICS HANDBOOK RESEARCHERS + ACADEMICS. This book focuses on data science in the research and academic communities. Like the previous 2 books in the series, it includes interviews with top experts in the field. Here are just… Read more
-
Huge List of Big Data and Machine Learning Technologies
Onur Akpolat has put together a curated list of awesome big data frameworks and resources. The list is very extensive and includes NoSQL databases, machine learning libraries, frameworks, filesystems, and more. On a similar note, Joseph Misiti has compiled a large list of resources specific to machine learning. The list is titled Awesome Machine Learning, and… Read more
-
Data Science Productivity Platform
Tristan Zajonc, cofounder of Sense Platform, recently gave a thought-provoking talk at Data Driven NYC. He spoke about the future of data science productivity. According to Tristan: "In the next 2 or 3 years, everybody doing data science should be using a data science productivity platform…a cloud-based data science platform." In addition to the productivity… Read more
-
Data Scientist vs Data Engineer
As the field of data science continues to grow and mature, it is nice to begin seeing some distinction in the roles of a data scientist. A new job title gaining popularity is the data engineer. In this post, I lay out some of the distinctions between the 2 roles. Data Scientist A data… Read more
-
Statistical Programming Languages Infographic
There is an abundance of statistical programming languages available, and the fine folks at DataCamp have started to compile some data about them. They then produced the infographic at the bottom of the post. To start with, SAS, R, and SPSS are the 3 languages being compared. Here are 3 bits of information… Read more
-
Caltech Machine Learning Course Now on edX
The widely popular Caltech course, Learning from Data, will be offered on edX this fall. The course starts September 25, 2014, and it will run for 10 weeks. Here is an abbreviated list of the course topics: Linear Models, Bias/Variance, Neural Networks, Cross Validation, and much more. edX offers a number of other Data Science… Read more
-
Wanna Be a Data Engineer? – Insight Data Engineering Can Help
Last week, I got the opportunity to spend some time with the team from Insight Data Engineering. They offer a free program that trains people to be data engineers. Then they help those people connect with a job at an impressive company. The program runs a few times a year and consists of 6 intense… Read more
-
Data Science Startup Ideas
Individualized Online Data Science Training – Create a series of training materials that can walk a student through different data science topics. The student should be able to go as slowly or as quickly as they like, and it would be even better if mentors could be crowdsourced for help with projects. The training should cost a… Read more
-
Data Science Stack Exchange
Stack Exchange, the company that brought Stack Overflow into the world, has recently released Data Science Stack Exchange. This is a site for asking and answering questions related to data science, big data, and machine learning. Oh yes, and it is free. Read more

