-
GitHub Is Cool: They Like Data
Today, GitHub announced the release of archived public activity data called the GitHub public timeline. The dataset can be queried via the Google BigQuery tool. To make things even more awesome, GitHub is also hosting a Data Challenge. The challenge is to play around with the data and create the best visualization possible. You better start Read more
-
Large-Scale Text Processing with MapReduce: A Free Textbook
Data-Intensive Text Processing with MapReduce is a free online (PDF) textbook about processing large amounts of text. The 1st edition has been available for a couple of years, and a 2nd edition is in the works. Here is a quick overview of some of the topics: MapReduce, graph algorithms, and text processing. Happy Reading (and Read more
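Word count is the usual first example of the MapReduce model. The following is a toy, single-process Python simulation of the map, shuffle, and reduce phases. It assumes simple lowercase, whitespace tokenization, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. It is not Hadoop code and not a listing from the book.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc_id: int, text: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every token in one document.

    doc_id plays the role of the input key; this toy mapper does not use it.
    Tokenization (lowercase + whitespace split) is a simplifying assumption.
    """
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reducer: sum the partial counts emitted for one word."""
    return word, sum(counts)


def word_count(docs: List[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over a list of documents in a single process."""
    intermediate = (pair for i, doc in enumerate(docs) for pair in map_phase(i, doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(w, c) for w, c in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
    # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

In a real cluster the mappers and reducers run on different machines and the shuffle moves data over the network; keeping all three phases as plain functions here just makes the data flow easy to follow.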
-
Data Science For Trucking? Who Knew?
How To Manage One Million Vehicles With Big Data – Forbes. The above link goes to a great story about using data science. What makes the story great is the company. It is not a science company or a tech startup. It is a truck management company. Data science is truly reaching all industries. Read more
-
Data Without Borders is now DataKind
The nonprofit organization Data Without Borders has renamed itself DataKind. Here is the official announcement. DataKind is an organization that matches data from nonprofit and government organizations with data scientists. DataKind hosts weekend DataDives, and it is planning to build a DataCorps. See a previous post, Use Data Science to Help The World, to Read more
-
Big data: The next frontier for innovation, competition, and productivity | McKinsey & Company
Big Data: The next frontier for innovation, competition, and productivity | McKinsey Global Institute | Technology & Innovation | McKinsey & Company. This report by McKinsey & Company is frequently referenced, so I thought I should post a link to it. It includes the following quote about the lack of talent to fill Big Data Read more
-
Twitter, NoSQL and Data Analysis
This is a lengthy but very good slide deck on the what and why of the tools used at Twitter. Note: the slide deck is about two years old. NoSQL at Twitter (NoSQL EU 2010), by Kevin Weil. Read more
-
Hans Rosling and GapMinder
Hans Rosling, co-founder of the GapMinder Foundation, gives a good TED Talk about HIV in the world. He does an excellent job of using data to highlight the countries (not continents) that have the most serious problems. He also offers some reasons why HIV/AIDS is not dropping off as quickly in some rich countries. Here is a second Read more
-
Coursera is Expanding – New Courses Starting Today
Since recently announcing $16M in funding, Coursera has been making quite a bit of noise. Last fall, Stanford University decided to offer a couple of computer science classes online for free. The response was huge, and it led to the creation of Coursera. The courses are no longer limited to computer science, and Stanford is no longer Read more
-
It’s Big Data Week
It’s Big Data Week Read more
-
Machine Learning: Algorithms that Produce Clusters | Architects Zone
Machine Learning: Algorithms that Produce Clusters | Architects Zone. The above article provides a nice, brief overview of five clustering algorithms: K-Means, Hierarchical Clustering, Fuzzy C-Means, Multi-Gaussian with Expectation-Maximization, and Density-based Clustering. This goes well with a previous post about 6 Machine Learning Algorithms. Read more
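Of the five algorithms listed, K-Means is the simplest to sketch. Below is a minimal Python version of Lloyd's algorithm on 2-D points. It assumes deterministic seeding from the first k points (a simplification; libraries typically seed randomly or with k-means++), and the `kmeans` function is hypothetical, not code from the linked article.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm for K-Means on 2-D points (illustrative sketch).

    Assumption: centroids are seeded with the first k points so the result is
    deterministic; production code usually seeds randomly or with k-means++.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids: List[Point] = list(points[:k])
    labels = [-1] * len(points)  # -1 means "not yet assigned"
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # a centroid with no members is left where it is
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    centroids, labels = kmeans(pts, k=2)
    print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
    print(labels)     # [0, 0, 1, 1]
```

The two alternating steps (assign, then re-average) are what make K-Means cheap per iteration. The other algorithms in the list relax its assumptions: Fuzzy C-Means allows partial membership, and density-based methods do not require choosing k up front.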
-
Data Scientists: The New Rock Stars of the Tech World
Data Scientists: The New Rock Stars of the Tech World. Troy Sadkowsky of DataScientists.net conducted a very nice interview with Jake Porway of the NY Times R&D Lab and Data Without Borders. Here are some of the questions that were answered: Why are data scientists tech's rock stars? What does a data scientist do? How Read more
-
Books Can Tell More Than One Story
This is a very entertaining TED Talk about what books can tell us over time. Just watch the video. Read more

