Big Data Overview
=====
-----
* Course Id : BIGD-OVIW
* Duration : 16 Hours
Overview
-----
* This course is for those new to Big Data and data science
* It helps participants understand why the Big Data Era has come to be
* It is meant for those who want to become familiar with the terminology and the core concepts behind big data problems, applications, and systems.
* This course is useful for those who may need to learn how to process Big Data in their business or career.
* The course introduces one of the most common frameworks, Hadoop, that has made big data analysis easier and more accessible, along with current alternative solution approaches
Pre-Requisites
-----
No prior programming experience is needed. All attendees should be familiar with :
* Linux environment
* Application installation
* Working with a virtual machine to complete the hands-on assignments
Objectives
-----
All attendees will be able to :
* Describe the Big Data landscape with examples of real-world big data problems
* Identify the three key sources of Big Data: people, organizations, and sensors
* Explain the 6 V’s of Big Data (volume, velocity, variety, veracity, valence, and value)
* Explain how each V impacts data collection, monitoring, storage, analysis, and reporting
* Get value out of Big Data by using a well-defined process to structure the analysis
* Distinguish big data problems from problems that do not need big data techniques
* Generate data science questions from one or more big data problems
* Explain the architectural components and programming models used for scalable big data analysis
* Describe the core Hadoop stack components including YARN, HDFS and MapReduce
Course Structure
-----
* Our technical courses emphasize hands-on work (typically 80% hands-on, 20% theory)
* Students learn to apply the material to real-world problems
Materials Provided
-----
* PDF of slides and hands-on exercises
* Access to an instance with a pre-configured lab environment
Software Requirements
-----
Any of the following :
* Any current internet browser
* VNC client
* RDP client
Hardware Requirements
-----
* Processor: 1.2 GHz+
* RAM: 512 MB+
* Disk space: 1 GB+
* Network connection to the Internet with low latency (< 250 ms)
## Daywise Course Outline For Big Data Overview
-----
## Day 1
-----
* Unit 1 : Introduction to Big Data
* Unit 2 : Big Data – Why And Where
* Unit 3 : Big Data Characteristics – the 6 V’s
* Unit 4 : Working with Big Data
## Day 2
-----
* Unit 5 : Current Solution Landscape
* Unit 6 : Real-world case studies in depth
## Detailed Outline For Big Data Overview
-----
Unit 1 : Introduction to Big Data
-----
* What is Big Data?
* Big Data Customer Scenarios
* Limitations of Existing Data Analytics Architecture
* What is Hadoop?
* Key Characteristics of Hadoop
* Hadoop Core Components
Unit 2 : Big Data – Why And Where
------
* Sources of Big Data
* What launched the Big Data era?
* Applications: What makes big data valuable
* Example: Saving lives with Big Data
* Example: Using Big Data to Convert Records To Digital Format
* Where Does Big Data Come From?
* Machine-Generated Data: Sources and Advantages
* People-Generated Big Data: Unstructured
* Organization-Generated Data: Structured in Silos
* Integrating Diverse Data
Unit 3 : Big Data Characteristics – the 6 V’s
------
* Characteristics Of Big Data
* Volume
* Variety
* Velocity
* Veracity
* Valence
* Value
Unit 4 : Working with Big Data
------
* Data Science: Getting Value out of Big Data
* Building a Big Data Strategy
* Components of Data Science
* Steps in the Data Science Process
* Step 1: Acquiring Data
* Step 2: Exploring Data
* Step 3: Pre-Processing Data
* Step 4: Analyzing Data
* Step 5: Communicating Results
* Step 6: Turning Insights into Action
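
The six steps listed above can be pictured as a small pipeline. The following is only an illustrative sketch, not course material: the function names, the sample temperature readings, and the alert threshold of 25.0 are all made up for the example.

```python
from statistics import mean


def acquire():
    # Step 1: hypothetical sensor readings; None marks a missing value.
    return [21.5, None, 22.1, 35.0, 21.9]


def explore(readings):
    # Step 2: a quick look at size, gaps, and range before any cleaning.
    present = [r for r in readings if r is not None]
    return {
        "count": len(readings),
        "missing": len(readings) - len(present),
        "min": min(present),
        "max": max(present),
    }


def preprocess(readings):
    # Step 3: drop missing values so the analysis sees clean data only.
    return [r for r in readings if r is not None]


def analyze(clean):
    # Step 4: a deliberately simple analysis, the average reading.
    return mean(clean)


def communicate(average):
    # Step 5: turn the result into something a person can read.
    return f"Average reading: {average:.1f}"


def act(average, threshold=25.0):
    # Step 6: an assumed business rule that turns the insight into action.
    return "raise alert" if average > threshold else "no action"


if __name__ == "__main__":
    raw = acquire()
    print(explore(raw))
    average = analyze(preprocess(raw))
    print(communicate(average))
    print(act(average))
```

Each step is a separate function so that it can be replaced on its own, for example by swapping the in-memory list in `acquire` for a file or database read.
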
Unit 5 : Current Solution Landscape
---
* Hadoop Foundations
* Distributed File Systems
* Scalable Computing over the Internet
* Cloud Computing
* Programming Models for Big Data
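
As a taste of the programming models covered in this unit, the sketch below imitates the MapReduce model with a word count in plain Python. It is a single-process illustration under stated assumptions: it does not use Hadoop or any Hadoop API, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are chosen for this example only.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all values by key, which a real framework
    # does across machines between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines):
    # Run the three phases in order on an in-memory list of lines.
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

Because the map and reduce functions only see one line or one key at a time, a framework can run many copies of them in parallel on different machines, which is what makes the model scale.
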
Unit 6 : Real-world case studies in depth
---
