BigData Course Summary

Introduction to Big Data and Hadoop
Hadoop ecosystem - Concepts
Hadoop Map-reduce concepts and features
Developing map-reduce applications
Pig concepts
Hive concepts
Oozie workflow concepts
HBASE Concepts
Real Life Use Cases

Introduction to Big Data and Hadoop

What is Big Data?

What are the challenges for processing big data?

What technologies support big data?

What is Hadoop?

Why Hadoop?

History of Hadoop

Use Cases of Hadoop

Hadoop Ecosystem

HDFS

Map Reduce

Statistics

Understanding the Cluster

Typical workflow

Writing files to HDFS

Reading files from HDFS

Rack Awareness

The 5 Hadoop daemons

Map Reduce

Before Map reduce

Map Reduce Overview

Word Count Problem

Word Count Flow and Solution

Map Reduce Flow

Algorithms for simple problems

Algorithms for complex problems

Developing the Map Reduce Application

Data Types

File Formats

Driver, Mapper and Reducer code explained

Configuring development environment - Eclipse

Writing Unit Tests

Running locally

Running on Cluster

Hands on exercises

How Map-Reduce Works

Anatomy of Map Reduce Job run

Job Submission

Job Initialization

Task Assignment

Job Completion

Job Scheduling

Job Failures

Shuffle and sort

Oozie Workflows

Hands on Exercises

Map Reduce Types and Formats

MapReduce Types

Input Formats - Input splits & records, text input, binary input, multiple inputs & database input

Output Formats - text output, binary output, multiple outputs, lazy output and database output

Hands on Exercises

Map Reduce Features

Counters

Sorting

Joins - Map Side and Reduce Side

Side Data Distribution

MapReduce Combiner

MapReduce Partitioner

MapReduce Distributed Cache

Hands on Exercises

Hive and PIG

Fundamentals

When to Use PIG and HIVE

Concepts

Hands on Exercises

HBASE

CAP Theorem

HBase Architecture and concepts

Programming and Hands on Exercises

Big Insights with Big Data

Tuesday, February 5, 2013
