A very nice slide deck from Jeff Hammerbacher of Cloudera. It covers k-means clustering and some enhancements to it.
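
For anyone who wants a reference point before opening the slides, here is a minimal sketch of plain k-means (Lloyd's algorithm) in pure Python. It is not taken from the deck and covers none of the enhancements. The function name `kmeans`, its parameters, and the choice to seed centroids by sampling input points are my own assumptions for illustration.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's-algorithm k-means on a list of equal-length tuples.

    Illustrative sketch only: the name, signature, and the random-sample
    initialisation are assumptions, not anything from the slides.
    """
    rng = random.Random(seed)
    # Initialise centroids by sampling k distinct input points.
    centroids = rng.sample(points, k)
    assignments = None

    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance; ties go to the lower index).
        new_assignments = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_assignments == assignments:
            break  # No point changed cluster, so the algorithm has converged.
        assignments = new_assignments

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # An empty cluster keeps its previous centroid.
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))

    return centroids, assignments


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, labels = kmeans(data, k=2)
    print(centers)
    print(labels)
```

On two well-separated blobs like the sample data, the two centroids should settle near the mean of each blob. Plain k-means is sensitive to the initial centroids and to the choice of `k`, which is the kind of weakness that enhancements typically target.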