Tuesday, January 8, 2013

HADOOP & ITS TECHNOLOGY STACK

Hadoop is an open-source framework for reliable, scalable, distributed computing and data storage. It enables applications to work with thousands of nodes and petabytes of data, which makes it a valuable tool for both research and business operations. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.

The Hadoop stack includes more than a dozen components, or subprojects, that are complex to deploy and manage. Installation, configuration, and production deployment at scale are all challenging.

The main components include:

·         Hadoop -- A Java software framework to support data-intensive distributed applications.
·         ZooKeeper -- A highly reliable distributed coordination service.
·         MapReduce -- A flexible framework for parallel processing of large data sets (see the word-count sketch after this list).
·         HDFS -- The Hadoop Distributed File System, which provides fault-tolerant storage across the cluster.
·         Oozie -- A workflow scheduler that chains Hadoop jobs together.
·         HBase -- A distributed, column-oriented key-value store built on top of HDFS.
·         Hive -- A SQL-like query layer (HiveQL) built on top of MapReduce for analyzing large data sets.
·         Pig -- Enables the analysis of large data sets using Pig Latin.
·         Pig Latin -- A high-level data-flow language that is compiled into MapReduce jobs for parallel processing.
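To make the mapper/reducer division of labor concrete, here is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API: the mapper emits a (word, 1) pair for every token, and the reducer sums the counts per word. Class names and input/output paths are illustrative, not taken from any particular deployment.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: receives all counts for one word and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a jar, it would be submitted to the cluster with something like "hadoop jar wordcount.jar WordCount /input /output". Tools like Hive and Pig ultimately generate jobs of this same shape, which is why they are described as layers on top of MapReduce.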

[Image: Hadoop technology stack. Source: http://www.capgemini.com/technology-blog/files/2011/12/hadoop.jpg]
