-
Introductory Machine Learning Textbook
After numerous semesters of teaching an introductory course on machine learning, Max Welling, Professor at the University of California, Irvine, decided to compile an introductory textbook titled A First Encounter with Machine Learning (PDF link). Read more
-
Startups working on Machine Learning as a Service (MLaaS)
BigML – A great interface. Just upload your data and it shows basic information for each column, such as a histogram and mean values. See the Gallery for some examples of the final models.
Wise.io – Just launched, but it looks to be a serious contender. It was started by a team from UC Berkeley.… Read more
-
A Couple Good Python Resources
In just the past month, a couple of great resources for learning Python have been created.
Getting started with Python: Tips, Tools and Resources – If you are new to Python, this is a great place to start. It contains a brief description and links to books, tutorials, and MOOCs.
Getting Started With Python for… Read more
-
Free Textbook and Toolkit: Natural Language Processing with Python
This is an online HTML version of the book Natural Language Processing with Python. The book is a companion to NLTK, a free, open source toolkit for Natural Language Processing (NLP), written in Python. Read more
-
10 Big Data Best Practices
10 Big Data Implementation Best Practices – This is a great article with a list of topics to remember when working on big data projects. Here is the list:
1. Gather business requirements before gathering data
2. Implementing big data is a business decision not IT
3. Use Agile and Iterative Approach to Implementation
4. Evaluate data requirements
5. Ease skills shortage… Read more
-
R Commands for Cleaning Data
These are notes from the Coursera Data Analysis course. Here are some R commands that may be helpful for cleaning data.
String Replacement: sub() replaces the first occurrence; gsub() replaces all occurrences.
Quantitative Variables in Ranges: cut(data$col, seq(0,100, by=10)) breaks the data up by the range it falls into, in this example: whether the… Read more
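As a quick illustration of the commands named in this excerpt, here is a minimal R sketch. The data frame, its values, and the example string are made up for the illustration (only the column name `col` comes from the post), and none of it is taken from the course notes.

```r
# Toy data: the values are invented for this example.
data <- data.frame(col = c(5, 15, 15, 95))

# sub() replaces only the first match; gsub() replaces every match.
first_only <- sub("_", " ", "first_second_third")
all_matches <- gsub("_", " ", "first_second_third")
print(first_only)   # "first second_third"
print(all_matches)  # "first second third"

# cut() assigns each value to an interval. By default the intervals are
# open on the left and closed on the right, e.g. (0,10], (10,20], ...
bins <- cut(data$col, seq(0, 100, by = 10))
print(table(bins))
```

Note that with these default settings a value of exactly 0 falls outside the first interval and becomes NA; passing `include.lowest = TRUE` to `cut()` keeps it.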
-
Strata Videos to Watch
Here are a few of the recent Strata videos I would recommend.
Video Games: The Biggest Big Data Challenge – Video Games generate Big Data
Big Data on Small Devices: Data Science goes Mobile – How data can help build better mobile apps
Distributed Environmental Data: On the Ground at the Data Sensing Lab –… Read more
-
Data Science for Social Good Summer Fellowship
The University of Chicago and Argonne National Laboratory are hosting the Data Science for Social Good Summer Fellowship 2013. The Fellowship program is open to students at all levels who are interested in working on real-world social problems. The program takes place in Chicago and the application deadline is April 1, 2013, so apply soon. Read more
-
Online Textbook Publishing Platform?
About a week ago I posted a link to a free data mining textbook. Hacker News got wind of the book as well, and I am guessing a flood of traffic hit the textbook’s site. The flood took the site completely down for a couple of days. It was a shame because the… Read more
-
Quandl Excel Add-In
A few weeks ago, I blogged about Quandl, a search engine for datasets. Well, they have just released an Excel add-in that lets you pull a dataset from Quandl straight into an Excel spreadsheet. It is very new, so Quandl would appreciate your comments and reports of any bugs you find. Read more
-
Big Data Journal: 5 articles to highlight
The inaugural issue of Big Data was published a few weeks ago. The journal is excellent. The articles are relevant, readable, and free. In the first issue, most of the articles were not highly technical (meaning there were not many equations or algorithms). I would like to highlight just 5 of the articles… Read more
-
Definition of Big Data
Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it. This definition is provided by Edd Dumbill, Editor-in-Chief of Big… Read more

