Tuesday, January 8, 2013

HADOOP & ITS TECHNOLOGY STACK

Hadoop is an open-source framework for reliable, scalable, distributed computing and data storage. It enables applications to work with thousands of nodes and petabytes of data, which makes it a valuable tool for both research and business operations. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.

The Hadoop stack includes more than a dozen components, or subprojects, that are complex to deploy and manage. Installation, configuration, and production deployment at scale are all challenging.

The main components include:

·         Hadoop -- A Java software framework to support data-intensive distributed applications.
·         ZooKeeper -- A highly reliable distributed coordination service.
·         MapReduce -- A flexible framework for parallel processing of large data sets (see the word-count sketch after this list).
·         HDFS -- The Hadoop Distributed File System, which provides fault-tolerant storage across the cluster.
·         Oozie -- A workflow scheduler that chains Hadoop jobs together.
·         HBase -- A distributed, column-oriented key-value store built on top of HDFS.
·         Hive -- A SQL-like query layer (HiveQL) built on top of MapReduce for analyzing large data sets.
·         Pig -- Enables the analysis of large data sets using Pig Latin.
·         Pig Latin -- A high-level data-flow language that is compiled into MapReduce jobs for parallel processing.
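To make the mapper/reducer division of labor concrete, here is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API: the mapper emits a (word, 1) pair for every token, and the reducer sums the counts per word. Class names and input/output paths are illustrative, not taken from any particular deployment.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: receives all counts for one word and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a jar, it would be submitted to the cluster with something like "hadoop jar wordcount.jar WordCount /input /output". Tools like Hive and Pig ultimately generate jobs of this same shape, which is why they are described as layers on top of MapReduce.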

[Image: Hadoop technology stack. Source: http://www.capgemini.com/technology-blog/files/2011/12/hadoop.jpg]
