Blog

  • NYU Launches New Center For Data Science

    New York University has just launched some Data Science programs via the new Center for Data Science. … to establish the country’s leading data science training and research facilities at NYU. Part of the announcement is an M.S. in Data Science. Applications for the initial class, starting Fall 2013, are now being accepted. The Center… Read more

  • Quandl – A Search Engine for Datasets

    I just found this site a couple of days ago. Quandl is a new startup that offers a search engine for datasets. The site has a lot of data (over 2 million datasets). The data can be sorted, filtered, graphed, combined, and downloaded in many different formats (Excel, JSON, R, CSV, XML). Most… Read more

  • 12 Useful Tips for Machine Learning

    Pedro Domingos of the Department of Computer Science and Engineering at the University of Washington provides a very useful paper with tips for machine learning. The paper is titled A Few Useful Things to Know about Machine Learning [pdf]. Below are the 12 useful tips. (1) LEARNING = REPRESENTATION + EVALUATION + OPTIMIZATION (2) IT’S GENERALIZATION THAT… Read more

  • 10 R packages

    Yhat, a new predictive modeling startup, wrote a nice blog post titled “10 R Packages I wish I knew about earlier.” It is worth reading through the list. Special thanks to Mark Nickel for pointing out this link. Read more

  • Free Natural Language Processing Book

    Natural Language Processing for the Working Programmer. Beyond the title, no more explanation is needed. Read more

  • 11 Steps to Data Analysis

    Here is a list of steps to data analysis from the Data Analysis Coursera course: (1) Define the question; (2) Define the ideal dataset; (3) Define what data you can access; (4) Obtain the data; (5) Clean the data; (6) Exploratory data analysis; (7) Statistical prediction; (8) Interpret results; (9) Challenge results; (10) Write up results; (11) Create reproducible code for others to recreate. Update: A couple of comments… Read more

  • Nice GraphDB and NoSQL Talk

    This is a wonderful talk by Max DeMarzi (he has a very informative blog as well). If you are new to NoSQL or Graph Databases, I highly recommend this video. One comment stuck out for me: “You’re never gonna run out of nodes when you get to half a trillion…” That is a really big… Read more

  • Buffalo Bills to start advanced analytics department

    Even the NFL is getting into data analysis these days: Buffalo Bills to start advanced analytics department. Personal note: Like many American children, I grew up dreaming of playing professional football in the NFL. Also, like many American children, that dream did not come true. Maybe now I could try to make the NFL as… Read more

  • Levels of Data Analysis

    The list is ordered according to the level of difficulty. Descriptive: just describe the data; common for census-type data. Exploratory: find relationships that were not clear beforehand; useful for defining future studies; remember that correlation does not imply causation. Inferential: use a small dataset to say something about a larger population; most common goal… Read more

  • Videos for Learning R

    All of the videos from the Computing for Data Analysis Coursera course are available on YouTube. If you are interested in learning R or just need a refresher on some of the topics, these videos could serve as a great resource. Week 1: installing R, data types, reading/writing files. Week 2: functions, apply, sapply, other… Read more

  • Free Data Analysis Textbook

    Cosma Shalizi of the Statistics Department at Carnegie Mellon University is working on a textbook, Advanced Data Analysis from an Elementary Point of View. A copy of the textbook will remain freely available on the website. Since the textbook is still being written, comments are welcome. Read more

  • Data Analysis by Data Type

    Data analysis is performed in many different fields and on many different types of data. Most fields call it something different. The following list comes straight from Jeff Leek’s Data Analysis Coursera class. Name of data analysis by data type: Biostatistics for medical data; Data Science for data from web analytics; Machine learning for data… Read more


Data Science 101

One of the oldest blogs on data science, started in 2012.
