BIG DATA

About Big Data

Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software.

Big Data Benefits

As the volume of data grows, so does its potential value for business. Big Data management solutions continue to evolve, allowing companies to turn raw data into relevant trends, predictions, and projections. Companies that use comprehensive Big Data analytics solutions gain insights that drive better-informed decision-making. Some of the benefits of Big Data analytics include:

  • Identifying the root causes of failures and issues in real time
  • Fully understanding the potential of data-driven marketing
  • Generating customer offers based on their buying habits
  • Improving customer engagement and increasing customer loyalty
  • Reevaluating risk portfolios quickly
  • Personalizing the customer experience
  • Adding value to online and offline customer interactions

BIG DATA & HADOOP

Module 1 - Introduction to Big Data and Hadoop

In this module, we will discuss Big Data: how it affects our social lives and why it plays an important role. We will see how Hadoop helps manage and process Big Data, look at the Hadoop ecosystem and its architecture, and learn how the Hadoop components HDFS and MapReduce store and process Big Data.

  • Understand what Big Data is
  • What is Hadoop
  • Hadoop Ecosystem Components
  • Introduction to HDFS
  • Hadoop Processing: MapReduce Framework
  • Hadoop Server Roles: NameNode, Secondary NameNode, and DataNode
  • Anatomy of File Write and Read
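
To give a feel for the "Anatomy of File Write" topic, here is a minimal Python sketch of how a file is cut into fixed-size blocks and how each block is assigned to several DataNodes. It is a simulation only and does not call any Hadoop API. The function name `plan_blocks`, the DataNode names, and the round-robin placement are assumptions made for illustration; real HDFS placement is decided by the NameNode and is rack-aware.

```python
def plan_blocks(file_size, block_size, datanodes, replication=3):
    """Return a hypothetical block plan for one file.

    Each entry records the block's offset, its length, and the DataNodes
    chosen to hold its replicas (simple round-robin, an assumption).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")

    plan = []
    offset = 0
    block_id = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        replicas = [
            datanodes[(block_id + r) % len(datanodes)]
            for r in range(replication)
        ]
        plan.append(
            {"block": block_id, "offset": offset, "length": length, "replicas": replicas}
        )
        offset += length
        block_id += 1
    return plan


if __name__ == "__main__":
    # Sizes are in arbitrary units to keep the example small.
    for block in plan_blocks(300, 128, ["dn1", "dn2", "dn3", "dn4"]):
        print(block)
```

In this sketch, the returned plan plays the role of the metadata that the NameNode keeps, while the block contents themselves would live on the DataNodes.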

Module 2 - Playing Around with a Hadoop Cluster

In this module, we will learn to set up a Hadoop cluster in different modes, such as standalone, pseudo-distributed, and fully distributed. We will also configure the important files and load and process data.

  • Hadoop Cluster Architecture
  • Hadoop Cluster Configuration Files
  • Hadoop Cluster Modes
  • See the concepts working
  • Writing into HDFS
  • Adding a DataNode
  • Removing a DataNode
  • Balancing

Module 3 - MapReduce Basics and Implementation

In this module, we will work with the MapReduce framework. We will see how MapReduce runs on data stored in HDFS, learn about input splits, input formats, and output formats, and walk through the overall MapReduce process and the stages it uses to process data.

  • MapReduce Concepts
  • Mapper
  • Reducer
  • Driver
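
The roles of the Mapper, Reducer, and Driver can be sketched with a word count written in plain Python. This is a single-process simulation of the MapReduce stages, not the Hadoop API. All function names (`make_splits`, `mapper`, `shuffle`, `reducer`, `driver`) and the sample input are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import chain


def make_splits(lines, split_size):
    """Divide the input records into fixed-size input splits."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Reduce phase: collapse the values for one key into a single count."""
    return key, sum(values)


def driver(lines, split_size=2):
    """Driver: wire the stages together, one map task per input split."""
    map_outputs = []
    for split in make_splits(lines, split_size):
        map_outputs.append(chain.from_iterable(mapper(line) for line in split))
    grouped = shuffle(chain.from_iterable(map_outputs))
    # Keys are processed in sorted order, one reduce call per key.
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    sample = [
        "big data needs hadoop",
        "hadoop stores big data",
        "hadoop processes data",
    ]
    print(driver(sample))
```

In a real Hadoop job the map tasks would run in parallel on the nodes that hold each split, and the shuffle would move data across the network. The sketch keeps everything in memory so that the order of the stages is easy to follow.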

Module 4 - Sqoop (Real-World Datasets and Analysis)

  • What is Sqoop?
  • Why Sqoop?
  • Importing and exporting data using Sqoop
  • Provisioning Hive Metastore
  • Populating HBase tables
  • Sqoop Connectors

Module 5 - Pig (Analytics Using Pig) and Pig Latin

In this module, we will learn about analytics with Pig. We will cover Pig Latin scripting, complex data types, different use cases for Pig, the execution environment, and operations and transformations.

  • Installing and Running Pig
  • Grunt
  • Pig's Data Model
  • Pig Latin
  • Developing & Testing Pig Latin Scripts

Module 6 - Hive and HiveQL

In this module, we will discuss Hive, a data warehouse package that analyzes structured data. We will cover Hive installation, loading data, and storing data in different tables.

  • Hive Architecture and Installation
  • Comparison with Traditional Database
  • HiveQL: Data Types, Operators and Functions
  • Hive Tables (Managed Tables and External Tables, Partitions and Buckets, Storage Formats)
  • Importing Data, Altering Tables, Dropping Tables
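
As a rough illustration of the "Partitions and Buckets" topic, the Python sketch below groups rows first by a partition column and then by a bucket number derived from another column. It is a conceptual model only and does not use Hive. The function name `layout`, the sample rows, and the use of a plain modulo on an integer column are assumptions for illustration.

```python
from collections import defaultdict


def layout(rows, partition_col, bucket_col, num_buckets):
    """Group rows into partitions, then into buckets within each partition.

    Partitions are labelled "column=value", mirroring the directory naming
    used for partitioned tables. Buckets are chosen with a modulo on an
    integer column (a simplification of hashing the bucketing column).
    """
    tree = defaultdict(lambda: defaultdict(list))
    for row in rows:
        partition = f"{partition_col}={row[partition_col]}"
        bucket = row[bucket_col] % num_buckets
        tree[partition][bucket].append(row)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {p: dict(buckets) for p, buckets in tree.items()}


if __name__ == "__main__":
    rows = [
        {"id": 1, "country": "US"},
        {"id": 2, "country": "IN"},
        {"id": 3, "country": "US"},
        {"id": 4, "country": "US"},
    ]
    for partition, buckets in layout(rows, "country", "id", 2).items():
        print(partition, buckets)
```

The point of the sketch is the two-level organization: a query that filters on the partition column only needs to read one partition, and bucketing spreads the rows of that partition across a fixed number of groups.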