Big Data Hadoop Spark Developer Training Course

Learn the key concepts for working efficiently with Big Data.

20000+ trained

35+ Business Agility consultants

4.7 rating

Overview

The Big Data Hadoop Spark Developer Training course gives you knowledge of multiple aspects of working with Big Data. You will be introduced to the fundamentals of the Hadoop Ecosystem and the Spark Ecosystem, and become familiar with the key technologies in the Big Data space.

The course covers managing large amounts of data stored in a distributed file system. It will train you to execute Spark applications on a Hadoop cluster. The course is designed to give you hands-on learning experience so that you can analyze data faster and more efficiently in the real world.

This course will help you learn to integrate BI tools with the Big Data platform to do data analytics. You will learn how to do batch data processing using Spark SQL with Python.

Objectives of the course:

  • Gain an overall understanding of the key technologies in the Big Data space and the Spark Ecosystem
  • Understand the fundamentals of Hadoop and how to do data analysis with Hive
  • Learn how to integrate BI tools with the Big Data platform to do data analytics
  • Learn how to do batch data processing using Spark SQL with Python
  • Get a feel for some of the technologies in action through hands-on, real-time use cases

Key Features of the course

Globally accredited certification

Get a Big Data Hadoop Spark Developer PDF certificate and digital badge

Interactive Instructor-led Training

Training sessions that meet the exact needs of every individual

Accredited Course Content And Curriculum

Get access to study material, a white paper, a mock exam, and case studies prepared by Big Data industry experts

Case Studies Which Are Industry Driven

Includes discussions and exercises derived from real-life instances

Best-in-World Mentorship

Get trained and mentored by industry experts

Extensive Learning

Learn better with case studies, activities and quizzes

Curriculum

Topics Covered

  • Day 1 Module-1: Introduction to Big Data and Hadoop
    • Evolution of Distributed Systems
    • Big Data Overview
    • Big Data Use Cases
    • Hadoop Overview
    • HDFS Overview
    • MapReduce Overview
    • YARN Overview
  • Module-2: Introduction to Hive
    • Introduction to Hive
    • Overview of Hive2
    • Hive Architecture
    • Hive Components
    • Hive Metastore
    • Hive Data Types
    • Hive Data Models
  • Module-3: Data Analysis with Hive
    • Hive Managed Tables
    • External Tables
    • Partitioned Tables
    • Clustered Tables
    • SELECT, FILTERING, JOINS, GROUPING, AGGREGATION
    • Data Analysis with Hive
  • Day 2 Module-4: Introduction to Spark
    • Spark Overview
    • MR vs Spark
    • Spark Modes of Operation
    • Spark Fundamentals, Architecture, Components
    • Spark on YARN
    • Spark Context
    • Job server
    • Spark Programming with Python
    • PySpark shell
  • Module-5: Introduction to Spark Core
    • RDD: The foundation of Spark
    • Creating RDDs from different types of files
    • Creating RDDs from other RDDs
    • RDD operations, Actions and Transformations
    • Different Types of RDDs
    • Joins using RDD
    • RDD Persistence and RDD Partitioning
    • RDD Lineage and DAG
    • Broadcast variables and Accumulators
    • Connecting to Different Sources with Spark
    • Spark programming with PySpark
  • Day 3 Module-6: Introduction to Spark SQL
    • Spark SQL - Structured Data Processing
    • SQL Context
    • Data Frames in Detail
    • Creating Data Frames
    • Transformations and Actions on Data Frames
    • Various Spark SQL Operations
    • Working with different Data Sources
    • Developing Spark SQL (Data Frames) Applications with PySpark
    • Spark SQL integration with Hive
  • Module-7: Introduction to Spark Streaming
    • Spark Streaming - Real time Data Processing
    • Spark vs Storm
    • DStreams and Micro Batch
    • Windowing Concept
    • DStreams Actions and Transformations
    • Window Level Actions and Transformations
    • Structured Streaming APIs
    • DStreams vs Structured Streaming
    • Stream Processing with Structured Streaming using DataFrames
    • Developing Spark Streaming (DStreams) Applications with PySpark
  • Module-8: Big Data Lake integration with BI Tools
    • BI tools overview
    • Hadoop/Hive integration with BI tools
    • Spark integration with BI tools
    • Perform data analysis and visualize KPIs using BI tools

Prerequisite

  • Familiarity with Big Data concepts and the Hadoop Ecosystem.
  • Good object-oriented programming knowledge, preferably in Java.
  • Good knowledge of Unix commands.
  • Good knowledge of SQL.

Study material:

1. Course materials are aligned with the content covered in class and can be easily downloaded from the Big Data Community Platform.

2. A comprehensive resource that addresses your questions and includes a detailed reading list, accessible after course completion through the Learning Plan in the Big Data Community Platform.

Benefits Attendees Get:

  • The Big Data Hadoop Spark Developer Training certificate is included in the price of the training
  • This certification will provide you with proof of participation
  • You will receive a digital badge with the certificate

What does Xebia provide differently?

Step into the realm of learning for all-inclusive growth. Xebia is a pioneering IT consultancy and service provider that focuses on Enterprise Development, Agile Development, DevOps, and Outsourcing Services.

World-class Training

Xebia Academy offers an intensive learning program and industry-specific training courses. It’s a globally acclaimed APMG International Partner for Big Data & Data Science training and certification courses.

Boon To Career

Xebia offers excellent consultancy, innovative tools, and continuous career growth. We will train you to become a Big Data and Data Science expert.

Expert Advantage

Get trained by our in-house Data Science and Big Data experts, who have an average of 18 years of experience and extensive knowledge of data and AI.

Flexible Learning

Pick the right course: You can choose a public class at our training centre, or learn with your colleagues in a customized, in-company training program, facilitated on-site at your location, anywhere in the world.

Global Experience

18 years of professional training experience, trusted by over 100,000 professionals worldwide. Xebia Academy is the largest producer of Big Data and Data Science certifications globally.

Hands-on And Practical Learning Experience

Our trainers are hands-on practitioners and provide interactive training sessions which let students master required skills in real-world scenarios, giving them an edge in the industry.

Certification Process

  • 01

    Enroll for Big Data Hadoop Spark Developer Course

  • 02

    Attend the twenty-four hours of training

  • 03

    Get certified by Xebia Academy Global

Industry Connect

Who should attend this course?

  • Big Data Developers

  • Enterprise Data Warehouse Professionals

  • QA Professionals who want to familiarize themselves with Spark and technologies around it

What skills will you learn in the course?

The Fundamentals

You’ll learn about the key technologies in the Big Data space and the Spark Ecosystem.

Basic Concepts in Spark

You’ll learn about Spark fundamentals, Spark vs MR and Spark Core.

Hadoop and Hive

You’ll learn about Hadoop and how to do data analysis with Hive.

Practical Implementation

You’ll see some of the technologies in action through hands-on, real-time use cases.

Working with BI

You'll learn to integrate BI tools with the Big Data platform to do data analytics.

Why should you attend this course?

By the end of this course, you’ll acquire an understanding of:

  • Key technologies in the Big Data space and the Spark Ecosystem
  • Fundamentals of Hadoop and how to do data analysis with Hive
  • Spark fundamentals, Spark vs MR and Spark Core
  • How to integrate BI tools with Big Data platform to do data analytics
  • How to do stream processing using Spark Streaming with Python

FAQs

The Hardware & Network Requirements for this course include:

  • Desktop/Laptop with minimum 8 GB RAM (recommended 16 GB)
  • Open Internet connection (minimum 1 Mbps per user)

You need to have:

  • Windows / Linux OS
  • Oracle VirtualBox 6.0 and above
  • A pre-configured image with all required software for the labs, to be shared along with setup instructions before the training

This course is meant for Big Data Developers, Enterprise Data Warehouse Professionals, and QA Professionals who want to familiarize themselves with Spark and the technologies around it.

The Big Data Hadoop Spark Developer Training Course is twenty-four hours long.

There are no mandatory prerequisites for this course. However, familiarity with Big Data concepts and the Hadoop Ecosystem, object-oriented programming knowledge (preferably in Java), and knowledge of Unix commands and SQL are recommended.

To enroll for the course, you have to register at the Xebia Academy Global website. After registering for the Big Data Hadoop Spark Developer training, you will receive a confirmation email with practical information.

The study material provided by Xebia Academy Global is comprehensive, up-to-date, and extremely helpful in your training.
