“The Data Lake is an important concept that has many potential uses in high-performance computing.”
With the darling of Big Data, Hadoop, having its 10th birthday on January 28th, 2016, I’d like to talk about how a company of any size can take advantage of Big Data technologies and a “Data Lake” in particular…
Simply put, a Data Lake is a repository that stores your lowest level of data in its raw, unprocessed form. Examples of this kind of data include every click made on your website, every interaction with your mobile app, or every action of your product components.
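As a minimal sketch of that idea (the function name, paths, and event fields here are hypothetical, not from any particular product), raw events might simply be appended unmodified to date-partitioned files, deferring all schema and processing decisions until read time:

```python
import json
import os
from datetime import datetime, timezone

def ingest_raw_event(event: dict, lake_root: str = "lake/raw/clicks") -> str:
    """Append one raw event, untouched, to a date-partitioned file.

    No parsing, filtering, or schema is applied at write time -- the
    defining property of a Data Lake is that the lowest-level data is
    kept in its original form for later, as-yet-unknown analyses.
    """
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    partition = os.path.join(lake_root, day)
    os.makedirs(partition, exist_ok=True)
    path = os.path.join(partition, "events.jsonl")
    with open(path, "a") as f:
        f.write(json.dumps(event) + "\n")
    return path

# Usage: every click is stored as-is; interpretation happens at read time.
p = ingest_raw_event({"user": "u123", "page": "/pricing", "ts": "2016-01-28T12:00:00Z"})
```

The design choice being illustrated is "schema on read": because nothing is discarded or transformed at ingest, future analyses can ask questions nobody anticipated when the data was collected.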
Fast-forward to today and all of the major software vendors have a Data Lake solution, including Microsoft, IBM, Oracle, SAP, EMC and Hitachi. The concept recently grew into the supercomputing realm, with Los Alamos National Laboratory announcing an open-source project called MarFS, which provides a Data Lake capability for high-performance computing.
MarFS supports trillions of files and individual files in the zettabyte (one billion terabytes) range. As Gary Grider, High Performance Computing Division Leader at Los Alamos, puts it, “MarFS provides a much needed Data Lake that can contain years of data for long-running projects such as nuclear stockpile simulations.” He adds: “The Data Lake is an important concept that has many potential uses in high-performance computing.”
Here are my recommendations for achieving this…
Post by James Dixon