HADOOP & ITS TECHNOLOGY STACK
HADOOP is an open-source framework for reliable, scalable, distributed computing and data storage. It enables applications to work with thousands of nodes and petabytes of data, which makes it a powerful tool for both research and business operations. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
The HADOOP stack includes more than a dozen components, or subprojects, that are complex to deploy and manage. Installation, configuration, and production deployment at scale are all challenging.
The main components include:
· Hadoop -- A Java software framework to support data-intensive distributed applications.
· ZooKeeper -- A highly reliable distributed coordination system.
· MapReduce -- A flexible parallel data processing framework for large data sets (see the word-count sketch after this list).
· HDFS -- The Hadoop Distributed File System (see the file-read sketch after this list).
· Oozie -- A workflow scheduler for Hadoop (MapReduce) jobs.
· HBase -- A distributed key-value database built on top of HDFS.
· Hive -- A high-level language built on top of MapReduce for analyzing large data sets.
· Pig -- Enables the analysis of large data sets using Pig Latin, a high-level language that is compiled into MapReduce jobs for parallel data processing.

Image Source: http://www.capgemini.com/technology-blog/files/2011/12/hadoop.jpg
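To make the MapReduce model concrete, here is a minimal sketch of the classic word-count job written against Hadoop's Java API (org.apache.hadoop.mapreduce). The class names and the input/output paths taken from the command line are illustrative placeholders, not part of the stack itself.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Mapper: emits (word, 1) for every token in the input line.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reducer: sums the counts emitted for each word.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // combiner reuses the reducer
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Packaged into a jar, a job like this is typically launched with something like hadoop jar wordcount.jar WordCount <hdfs-input-dir> <hdfs-output-dir>; the output directory must not already exist.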
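HDFS can also be used directly as a file system from Java. The short sketch below assumes the cluster configuration (core-site.xml / hdfs-site.xml) is on the classpath; it opens a file by path and prints it line by line, and the path argument is again a placeholder.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsRead {
      public static void main(String[] args) throws Exception {
        // Picks up the default file system from the cluster config on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path(args[0]); // e.g. /user/demo/input.txt (placeholder)
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(fs.open(path)))) {
          String line;
          while ((line = reader.readLine()) != null) {
            System.out.println(line);
          }
        }
      }
    }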