Ryan Swanstrom

What is a “Data Lake”?

Mar 12, 2014

—

I have been hearing the term “data lake” a lot lately. Being the curious person that I am, I went in search of a definition.

Currently, the company Pivotal is responsible for marketing the term. However, I believe the term was originally coined by Dan Woods of CITO Research back in 2011. Anyhow, here is a basic description of a data lake.

A data lake is an information system with the following two characteristics:

A parallel system able to store big data
A system able to perform computations on the data without moving the data
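
To make the second characteristic a bit more concrete, here is a minimal Python sketch of the "move the computation, not the data" idea. It is only a toy under stated assumptions: the `partitions` list stands in for data spread across storage nodes, threads in one process stand in for those nodes, and the names `local_sum_and_count` and `mean_over_lake` are hypothetical, not part of Hadoop or any other product.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for storage nodes: each inner list is the data
# held by one node. In a real system these would live on separate machines.
partitions = [
    [3, 5, 8],
    [13, 21],
    [34, 55, 89, 144],
]

def local_sum_and_count(rows):
    """Runs next to the data and reduces it to a small partial result."""
    return sum(rows), len(rows)

def mean_over_lake(parts):
    """Send the function to every partition and combine only the partials."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_sum_and_count, parts))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

if __name__ == "__main__":
    print(mean_over_lake(partitions))
```

The point of the sketch is that only one small `(sum, count)` pair per partition travels back to the caller, while the raw rows stay where they are stored.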

Currently, Hadoop is the most common technology used to implement a data lake, but it might not stay that way forever. Thus it is important to distinguish between Hadoop and a data lake: a data lake is a concept, and Hadoop is one technology for implementing that concept.

The following is a recent Strata Talk by Kaushik Das of Pivotal. He discusses how a data lake can be used to create the digital brain.

