Apache Spark and Scala Training Course

From the basics of Big Data to proficiency in Apache Spark using Scala.

Apache Spark is a widely used project in the Hadoop ecosystem: it can run on YARN to process massive data sets in a distributed manner. It has gained traction in recent years because it is open source and processes data in memory, which reduces the latency associated with disk I/O. Its easy integration with Hadoop, along with the ease of use and flexibility it promises, makes it a popular choice.

With the Apache Spark and Scala Training Course, you will gain an understanding of Big Data, its components and its frameworks. The Spark Online Course also introduces you to Hadoop cluster architecture and its modes. The course begins with an overview of the fundamentals of Apache Spark and the Scala programming language, and trains you to master their concepts and uses in less than a week.

The Spark Training course will help you master the concepts of the Apache Spark framework and Spark development, and provide you with comprehensive knowledge of the Scala programming language.

Objectives of the course:

  • Get an overall understanding of the fundamentals and internals of the Spark ecosystem
  • Get an understanding of Scala language fundamentals, constructs, features and libraries
  • Gain knowledge of Spark Core, Spark SQL, and Spark Streaming integration with Kafka
  • Get a feel for some of the technologies in action through hands-on, real-time use cases
  • Troubleshoot and fine-tune Spark components, and learn usage patterns and best practices

Key Features of the course

Globally accredited certification

Get an Apache Spark and Scala PDF certificate and digital badge

Interactive Instructor-led Training

Training sessions that meet the exact needs of every individual

Accredited Course Content And Curriculum

Get access to study material, a white paper, a mock exam and case studies prepared by industry experts

Case Studies Which Are Industry Driven

Includes discussions and exercises derived from real-life instances

Best-in-the-World Mentorship

Get trained and mentored by industry experts

Extensive Learning

Learn better with case studies, activities and quizzes


Topics Covered

  • Day 1 - Module-1: Introduction to Big Data and Spark
    • Evolution of Distributed Systems
    • Big Data Overview
    • Hadoop Overview and Recap
    • Spark Overview
    • MapReduce (MR) vs Spark
    • Spark Features
    • Benefits and Limitations
  • Module-2: Introduction to Scala
    • Scala Overview
    • Scala Characteristics
    • Environment Setup
    • Scala REPL
    • Basic Syntax
    • Data Types
    • Type Inference
    • Variables and Constants
    • Operators
    • Control Statements, Loops
    • Functions, Closures
    • Strings, Arrays
  • Day 2 - Module-3: Scala Programming
    • Collections API - Sets, Maps, Lists, Tuples
    • Objects with Scala
      • Classes, fields and methods
      • Singleton objects
      • Case classes
      • Companion Objects
      • Inheritance from Classes
      • Abstract Classes
      • Traits
    • Packages and Imports
    • Pattern Matching
    • Extractors
    • Exception Handling
    • Parallel and Concurrent Programming
  • Day 3 - Module-4: Spark Setup and Fundamentals
    • Spark Installation and Modes of Operation
    • Spark Fundamentals, Architecture, Components
    • Spark on YARN
    • Spark on Mesos
    • Spark Context
    • Spark Shell
    • Job Server
  • Module-5: Spark Core
    • RDD: The foundation of Spark
    • Creating RDDs from different types of files
    • Creating RDDs from other RDDs
    • RDD operations, Actions and Transformations
    • Different Types of RDDs
    • Joins using RDD
    • RDD Persistence and RDD Partitioning
    • RDD Lineage and DAG
    • Broadcast variables and Accumulators
    • Optimizations in Spark operations and shuffling
    • Connecting to Different Sources with Spark
  • Day 4 - Module-6: Introduction to Spark SQL
    • Spark SQL and SQL Context
    • Data Frames in Detail
    • Creating Data Frames
    • Transformations and Actions on Data Frames
    • Various Spark SQL Operations
    • Data Set API
    • Data Frame vs Data Set
    • Data Sources
    • Spark Schedulers
    • Developing Spark SQL Applications
  • Day 5 - Module-7: Introduction to Spark Streaming
    • Spark Streaming - Real-time Data Processing
    • Spark vs Storm
    • DStreams and Micro Batch
    • Windowing Concept
    • DStreams Actions and Transformations
    • Window Level Actions and Transformations
    • Structured Streaming Overview
    • Developing Spark Streaming Applications
  • Module-8: Introduction to Kafka
    • Overview of Kafka
    • Kafka Architecture
    • Kafka Setup and Configuration
    • Kafka Components
    • Stream Processing with Kafka and Spark Streaming


Recommended prerequisites:

  • Familiarity with Big Data concepts and the Hadoop ecosystem
  • Good object-oriented programming knowledge, preferably in Java
  • Good knowledge of Unix commands
  • Good knowledge of SQL

Study material:

1. Course materials that are aligned with the content covered in class and can be easily downloaded from the Big Data Community Platform.

2. A comprehensive guide that addresses common questions and includes a detailed reading list, accessible after course completion through the Learning Plan in the Big Data Community Platform.

Benefits attendees get:

  • The Apache Spark and Scala Training certificate is included in the price of the training.
  • This certification will provide you with proof of participation.
  • You will receive a digital badge with the certificate.

What does Xebia provide differently?

Step into the realm of learning for all-round growth. Xebia is a pioneering IT consultancy and service provider that focuses on Enterprise Development, Agile Development, DevOps, and Outsourcing Services.

World-class Training

Xebia Academy offers an intensive learning program and industry-specific training courses. It’s a globally acclaimed APMG International Partner for Big Data & Data Science training and certification courses.

Boon To Career

Xebia offers excellent consultancy, innovative tools, and continuous career growth. We will train you to become a Big Data and Data Science expert.

Expert Advantage

Get trained by our in-house Data Science and Big Data experts, who have an average of 18 years of experience and extensive knowledge of data and AI.

Flexible Learning

Pick the right course: You can choose a public class at our training centre, or learn with your colleagues in a customized, in-company training program, facilitated on-site at your location, anywhere in the world.

Global Experience

18 years of professional training experience, trusted by over 100,000 professionals worldwide. Xebia Academy is the largest producer of Big Data and Data Science certifications globally.

Hands-on And Practical Learning Experience

Our trainers are hands-on practitioners and provide interactive training sessions which let students master required skills in real-world scenarios, giving them an edge in the industry.

Certification Process

  • 01: Enroll for the Apache Spark and Scala Training Course

  • 02: Attend the five days of training

  • 03: Get certified by Xebia Academy Global

Who should attend this course?

  • Big Data Developers

  • Enterprise Data Warehouse Professionals

  • QA Professionals who want to familiarize themselves with Spark and technologies around it

What skills will you learn in the course?

The Fundamentals

You’ll learn the fundamentals and internals of the Spark ecosystem.

Basic Concepts in Scala

You’ll learn Scala language fundamentals, constructs, features and libraries.

Integration with Kafka

You’ll learn about Spark Core, Spark SQL and Spark Streaming integration with Kafka.

Practical Implementation

You’ll see some of the technologies in action through hands-on, real-time use cases.

Usage and Best Practices

You'll learn about Spark components, and understand their usage patterns and best practices.

Why should you attend this course?

By the end of this course, you’ll acquire an understanding of:

  • Fundamentals and internals of the Spark ecosystem
  • Scala language fundamentals, constructs, features and libraries
  • Spark Core, Spark SQL, and Spark Streaming integration with Kafka
  • The technologies in action, through hands-on, real-time use cases
  • Spark components, their usage patterns and best practices


The Hardware & Network Requirements for this course include:

  • Desktop/Laptop with minimum 8 GB RAM (16 GB recommended)
  • Open Internet connection (minimum 1 Mbps per user)

You need to have:

  • Windows / Linux OS
  • Oracle VirtualBox 6.0 and above
  • A pre-configured image with all required software, which will be shared along with setup instructions before the training, for the labs.

This course is meant for Big Data Developers, Enterprise Data Warehouse Professionals and QA Professionals who want to familiarize themselves with Spark and the technologies around it.

The Apache Spark and Scala Training Course runs for five days.

There are no mandatory prerequisites for this course. However, familiarity with Big Data concepts and the Hadoop ecosystem, object-oriented programming knowledge (preferably in Java), and knowledge of Unix commands and SQL are recommended.

To enroll in the course, register on the Xebia Academy Global website. After registering for the Apache Spark and Scala training, you will receive a confirmation email with practical information.

The study material provided by Xebia Academy Global is comprehensive, up-to-date, and extremely helpful in your training.
