-
Cloudera Machine Learning Slides
A very nice slide deck from Jeff Hammerbacher of Cloudera. It goes over k-means clustering and some enhancements. Read more
-
Deep Learning – A Term To Know
Deep Learning is a new term that is starting to appear in the data science/machine learning news. What is Deep Learning? According to DeepLearning.net, the definition goes like this: Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its… Read more
-
42 Big Data Startups – Vote for the Top 10
Startup50’s list of 42 Big Data Startups. The voting is done, but the list contains plenty of startups working in the data science field. Read more
-
OpenStack at NSA
The following video goes well with the previous post about Open Source Alternatives to AWS. It says a lot for the quality of OpenStack, since one of the world’s most secretive organizations trusts it. OpenStack might be a good option for data teams needing to quickly build and deploy data products. Note: This post has nothing… Read more
-
Open Source Alternatives to AWS
Working with big data can often mean doing some cloud computing. If a public cloud like Amazon AWS is not an option, there are some open source alternatives. They all offer some level of compatibility with the AWS API for both EC2 (compute) and S3 (storage): Rackspace OpenStack, Apache CloudStack, Eucalyptus, and OpenNebula. Read more
-
Making Data tell a Story
I don’t think anybody does it better than Hans Rosling. In the following video he helps to explain population growth, child mortality, and fossil fuel usage based upon wealth. I love how he uses toy blocks and chips to help visualize his point. See the original post from the Guardian, Hans Rosling: the man who’s… Read more
-
A very nice visualization of the Central Limit Theorem
The blog post, Central Limit Theorem Visualized in D3, was posted last week. The post does two very nice things. First, it provides a nice visual of what the central limit theorem means. Second, it displays the wonderful power of the JavaScript library D3. Read more
-
Is Data Science Your Next Career? via IEEE
IEEE Spectrum’s Techwise Conversations just published an excellent podcast titled Is Data Science Your Next Career? The host of the podcast interviews Chris Wiggins of Columbia University. Note: If you don’t enjoy podcasts, the link contains the entire text for reading as well. Read more
-
Data Science Short Courses from Zipfian Academy
Zipfian Academy, the same company that is creating the 12-week intensive data science training course, will be offering a series of 6 short courses on data science. The courses will be 1.5 hours each and will be taught live in San Francisco. For those of you who cannot be in San Francisco, the courses… Read more
-
Tornado Path Visualization
Here is a data visualization of the paths of tornadoes in the US over the past 56 years. The brighter the blue, the more intense the tornado. This is also an excellent example of using open data. The raw data is available at data.gov. Tornado Tracks infographic by johnmnelson. Please continue to pray for the people… Read more
-
Probabilistic Programming and Bayesian Methods for Hackers Online Book
Probabilistic Programming and Bayesian Methods for Hackers is an open source online book. The book is developed with IPython, so it can be read in a variety of formats: web, PDF, or locally with IPython installed. Also, contributions are welcome via the GitHub repository for the book (or you can email the authors). This is… Read more

