7 Important Data Science Papers

It is back-to-school time, and here are some papers to keep you busy this school year. All the papers are free. This list is far from exhaustive, but these are some important papers in data science and big data.

Google Search

  • PageRank – This is the paper that explains the algorithm behind Google search.

Hadoop

  • MapReduce – This paper explains a programming model for processing large datasets. In particular, it is the programming model used in Hadoop.
  • Google File System – Part of Hadoop is HDFS, an open-source implementation of the distributed file system design described in this paper.
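
For a quick feel of the MapReduce model before reading the paper, here is a minimal single-machine sketch in Python. It is only an illustration of the map, shuffle, and reduce steps using word counting as the example; the function names are my own assumptions, and it is not how Hadoop itself is implemented or called.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in sequence (hypothetical helper, one machine only)."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data science"])
print(counts)
```

In a real cluster the map and reduce calls would run in parallel on many machines, with the framework handling the shuffle between them.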

NoSQL

These are 2 of the papers that started and drove the NoSQL debate. Each paper describes a different type of storage system intended to be massively scalable.

Machine Learning

Bonus Paper

  • Random Forests – One of the most popular machine learning techniques. It is heavily used in Kaggle competitions, even by the winners.

Are there any other papers you feel should be on the list?


Comments

25 responses to “7 Important Data Science Papers”

  1. Arni

    Maybe include a literature survey on neural networks? Not my field of specialty so I won’t recommend one, but I know it’s about to get red hot.

    1. Ryan Swanstrom

      That is true. Also, a paper on random forests would be nice as well. I may have to look for best papers on those topics.

      Thanks,
      Ryan

    1. Ryan Swanstrom

      Thank you very much for the link. That would be the correct random forest paper to add. I have added it to the list.

      Ryan

  2. richarddunks

    Reblogged this on Datapolitan and commented:
    Great list of resources, though I’d add E.F. Codd’s seminal “A Relational Model of Data for Large Shared Data Banks”: http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf

  3. Subhadeep Paul

    Another paper on MapReduce that helped me a lot in writing algorithms in the MapReduce framework is this one: http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2006_725.pdf
    And for handling large data sets, Breiman’s “Pasting Small Votes” paper:
    http://sci2s.ugr.es/keel/pdf/algorithm/articulo/1999-ML-Breiman-Pasting%20Small%20Votes%20for%20Classification%20in%20Large%20Databases%20and%20On-Line.pdf

  6. garyshort

    Reblogged this on Gary Short.

    1. Ryan Swanstrom

      thanks for reblogging

  8. charles

    great resource list. any papers on statistics as it changes to handle larger datasets?

    1. Ryan Swanstrom

      That is a great idea. I have not looked for any papers on that topic. If I find any, I will post it to the blog. Thanks for the comment.

      Ryan

  9. […] too long ago, I posted a list of 7 Data Science Papers. Since then, I have found a few more interesting and more recent developments in data […]

  10. Kartik

    Nice list !! Also, consider Amazon.com recommendations http://www.win.tue.nl/~laroyo/2L340/resources/Amazon-Recommendations.pdf

    1. Ryan Swanstrom

      Oh, that is a good one. I have not seen it before. Thanks for the recommendation on the recommendation paper.

      Ryan

    2. Sharad

      Thanks for the awesome list, Ryan!

      Thanks for sharing the Amazon recommendations PDF, Kartik! Unfortunately, the link doesn’t seem to work any longer. Can you re-post it please?

    3. alltumdata

      Thanks for sharing the list of papers, Ryan!

      Thanks for sharing the Amazon recommendations PDF, Kartik!
      Unfortunately, the link doesn’t seem to work anymore. Can you please re-post it?

  12. […] from Data Science 101, several papers that have been important in the evolution of big […]

  13. […] the past, the blog has included 7 Important Data Science Papers and 5 More Data Science Papers. Here is another list if you are looking for something to read over […]

  14. […] 7 Important Data Science Papers […]
