IL - Apache Spark Programming with Azure Databricks
Instructor-Led Training
Intermediate
3 Days
Onsite or Virtual
Course Overview
In this course, you will explore Spark internals and the architecture of Azure Databricks. The course begins with a brief introduction to Scala. Using the Scala programming language, you will then be introduced to the core functionality and use cases of Azure Databricks, including Spark SQL, Spark Streaming, MLlib, and GraphFrames.
Objectives
  • Understand the Azure Databricks architecture
  • Understand Apache Spark internals
  • Manipulate data using the Spark APIs
  • Work with large data sets and query data with Spark SQL
  • Build structured streaming jobs
  • Implement machine learning pipelines with the MLlib API
  • Process data using the GraphFrames API
Pre-Requisites
  • Familiarity with cloud computing concepts
  • Familiarity with Azure
  • Familiarity with SQL
  • Background in programming

In this module, you will get an overview of Azure Databricks and Spark and learn where Azure Databricks fits in the Azure big data landscape. Key features of Azure Databricks, such as workspaces and notebooks, will be covered. You will also learn the basic architecture of Spark and core Spark internals, including the core APIs and job scheduling and execution. This module prepares developers and administrators for more advanced work in Azure Databricks, such as Python or Scala development.

This session will introduce students to the Scala programming language. We will look at basic Scala syntax, including variables, types, control flow, functions, scoping, type inference, imports, and object-oriented programming.
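The Scala constructs covered in this session can be sketched in a few lines; a minimal, self-contained example:

```scala
object ScalaBasics {
  // Immutable vs. mutable variables
  val greeting: String = "Hello"   // immutable
  var counter: Int = 0             // mutable

  // Type inference: the compiler infers Double here
  val pi = 3.14159

  // A function with an explicit return type
  def square(x: Int): Int = x * x

  // Control flow: if/else is an expression in Scala
  def describe(n: Int): String =
    if (n % 2 == 0) "even" else "odd"

  // A simple class demonstrating object-oriented features
  class Point(val x: Int, val y: Int) {
    def +(other: Point): Point = new Point(x + other.x, y + other.y)
  }
}
```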
In this session, students will learn the basics of Spark and Spark programming. We will cover the DataFrame and Dataset APIs, processing data with Spark SQL, and working with the functions API. Students will also look at core Spark programming concepts and techniques such as aggregation, column operations, joins and broadcasting, user-defined functions, caching, and performance analysis.
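A sketch of the DataFrame techniques listed above, assuming a Databricks notebook where `spark` (a SparkSession) is predefined; the file paths and column names are illustrative, not part of the course materials:

```scala
import org.apache.spark.sql.functions._

// Read a DataFrame from storage (path is illustrative)
val sales = spark.read.option("header", "true").csv("/mnt/data/sales.csv")

// Column operations and aggregation
val byRegion = sales
  .withColumn("amount", col("amount").cast("double"))
  .groupBy("region")
  .agg(sum("amount").alias("total"), avg("amount").alias("average"))

// Register a temp view and query it with Spark SQL
sales.createOrReplaceTempView("sales")
val topRegions = spark.sql(
  "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY total DESC")

// A user-defined function, and a broadcast hint for joining a small dimension table
val normalize = udf((s: String) => s.trim.toLowerCase)
val regions = spark.read.parquet("/mnt/data/regions.parquet")
val joined = sales.join(broadcast(regions), Seq("region"))
  .withColumn("region_clean", normalize(col("region")))

// Cache a frequently reused DataFrame for performance
byRegion.cache()
```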
In this session, students will learn about Spark internals. We will examine the Spark cluster architecture, covering topics such as job and task execution and scheduling, shuffling, and the Catalyst optimizer.
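One way to see these internals at work is to ask Spark for a query's plan; a small sketch, again assuming a notebook with `spark` available:

```scala
import org.apache.spark.sql.functions._

// A query whose groupBy forces a shuffle between stages
val df = spark.range(1000000).toDF("id")
  .withColumn("bucket", col("id") % 10)
  .groupBy("bucket").count()

// Print the Catalyst optimizer's logical and physical plans;
// the Exchange operator in the output marks the shuffle boundary
df.explain(true)

// The shuffle partition count drives how many tasks run in post-shuffle stages
spark.conf.set("spark.sql.shuffle.partitions", "8")
```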
In this session, students will be introduced to Spark Structured Streaming. Students will learn about data sources and sinks and how to work with the Structured Streaming APIs. Students will look at stream processing techniques such as windowing, aggregation functions, checkpointing, and watermarking, and their use in stream processing jobs. Finally, students will investigate fault tolerance in stream processing jobs.
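These pieces fit together in a single streaming job; a minimal sketch, assuming a notebook `spark` session and using the built-in rate source as a stand-in for a real data source:

```scala
import org.apache.spark.sql.functions._

// A streaming source (the rate source generates timestamped rows for demos)
val events = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "10")
  .load()

// Windowed aggregation, with a watermark to bound how late data may arrive
val counts = events
  .withWatermark("timestamp", "1 minute")
  .groupBy(window(col("timestamp"), "30 seconds"))
  .count()

// Write to a sink; the checkpoint location is what makes the job fault tolerant,
// letting a restarted query resume from its recorded progress
val query = counts.writeStream
  .outputMode("update")
  .format("console")
  .option("checkpointLocation", "/tmp/checkpoints/rate-demo")
  .start()
```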
In this session, students will learn how to use MLlib, Spark's machine learning library, to build machine learning pipelines. Students will be introduced to pipeline concepts such as Transformers and Estimators. They will learn to perform feature processing and how to evaluate and apply machine learning models.
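A typical pipeline chains feature-processing stages and an Estimator; a sketch assuming a notebook `spark` session and an existing DataFrame `data` with illustrative columns `category`, `amount`, and `label`:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}

// Feature processing: index a categorical column, then assemble a feature vector
val indexer = new StringIndexer()
  .setInputCol("category").setOutputCol("categoryIdx")
val assembler = new VectorAssembler()
  .setInputCols(Array("categoryIdx", "amount"))
  .setOutputCol("features")

// An Estimator that learns a model from the assembled features
val lr = new LogisticRegression()
  .setLabelCol("label").setFeaturesCol("features")

// Chain the stages into a single Pipeline
val pipeline = new Pipeline().setStages(Array(indexer, assembler, lr))

// fit() produces a PipelineModel (a Transformer) we can apply and evaluate
val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42)
val model = pipeline.fit(train)
val predictions = model.transform(test)
val auc = new BinaryClassificationEvaluator()
  .setLabelCol("label").evaluate(predictions)
```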
In this session, attendees will learn to leverage the GraphFrames API for graph processing. Topics will include transforming DataFrames into a graph and performing graph analyses including PageRank, shortest paths, connected components, and label propagation.
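A GraphFrame is built from a vertex DataFrame and an edge DataFrame; a sketch of the analyses above, assuming a notebook `spark` session with the GraphFrames library attached to the cluster (the vertex ids and names are illustrative):

```scala
import org.graphframes.GraphFrame

// Vertices need an "id" column; edges need "src" and "dst" columns
val vertices = spark.createDataFrame(Seq(
  ("a", "Alice"), ("b", "Bob"), ("c", "Carol")
)).toDF("id", "name")
val edges = spark.createDataFrame(Seq(
  ("a", "b"), ("b", "c"), ("c", "a")
)).toDF("src", "dst")

val graph = GraphFrame(vertices, edges)

// PageRank
val ranks = graph.pageRank.resetProbability(0.15).maxIter(10).run()

// Shortest paths to a set of landmark vertices
val paths = graph.shortestPaths.landmarks(Seq("a")).run()

// Connected components (requires a checkpoint directory)
spark.sparkContext.setCheckpointDir("/tmp/checkpoints")
val components = graph.connectedComponents.run()

// Label propagation for community detection
val communities = graph.labelPropagation.maxIter(5).run()
```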
Dedicated Training
Contact Us Today

Dedicated instructor-led training is designed for groups and is delivered by the experts at Opsgility. It is available anywhere in the world, either onsite at your location or via advanced virtual training software.

Benefits
  • Standard or Customized Curriculum
  • Globally Available for Delivery
  • Holistic Learning Plans Available
  • Industry Recognized Subject Matter Experts