Implementing Structured Streaming with Azure Databricks
Duration: 1 hour
Lab Overview
Spark Structured Streaming lets you use the DataFrame API to read and process an unbounded stream of data. This kind of processing is used in real-time scenarios to aggregate data over time intervals, or windows. You can use Spark to process streaming data from a wide range of sources, including Azure Event Hubs, Apache Kafka, and others. In this lab, you will run a Spark job that continually processes a real-time stream of data.
  • Understand how to create an Azure Databricks workspace and cluster by using the Azure portal
  • Learn to work with Spark Structured Streaming on Azure Databricks
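The overview mentions aggregating a stream over time windows. The plain-Python sketch below is not Spark code. It only illustrates the idea, assuming fixed-width (tumbling) windows aligned to the Unix epoch, which is Spark's default alignment. Events arrive in micro-batches, and running counts per (window, device) are carried between batches. In Spark itself, this is expressed with `pyspark.sql.functions.window` inside a `groupBy`. The names `WindowedCounter`, `process_batch`, and the device IDs are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Tumbling windows are aligned to the Unix epoch (UTC), as in Spark's default.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tumbling_window_start(ts: datetime, width: timedelta) -> datetime:
    """Return the start of the fixed-width window that contains ts."""
    whole_windows = (ts - EPOCH) // width  # timedelta // timedelta -> int
    return EPOCH + whole_windows * width


class WindowedCounter:
    """Hypothetical helper: running event counts per (window start, device)."""

    def __init__(self, width: timedelta):
        self.width = width
        self.state = Counter()  # aggregation state carried between micro-batches

    def process_batch(self, batch):
        """batch: iterable of (timestamp, device_id) pairs received since the last call."""
        for ts, device_id in batch:
            self.state[(tumbling_window_start(ts, self.width), device_id)] += 1
        # Return the whole result table, similar in spirit to Spark's "complete" output mode.
        return dict(self.state)


if __name__ == "__main__":
    counter = WindowedCounter(timedelta(minutes=1))
    t0 = datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc)
    counter.process_batch([(t0, "dev1"), (t0 + timedelta(seconds=10), "dev2")])
    result = counter.process_batch([(t0 + timedelta(seconds=30), "dev1"),
                                    (t0 + timedelta(seconds=70), "dev1")])
    for (start, device), n in sorted(result.items()):
        print(start.strftime("%H:%M"), device, n)
    # 12:00 dev1 2
    # 12:00 dev2 1
    # 12:01 dev1 1
```

The second batch updates the existing 12:00 count for `dev1` instead of starting over. This is the key difference from a one-off batch query: aggregation state persists as the unbounded stream continues.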

Accessing and Ending the Lab Environment

SkillMeUp Real Time Labs use a virtual machine for all lab exercises. This gives you access to all of the tools and software needed to complete the lab without installing anything on your local computer.

The virtual machine may take several minutes to fully provision while software is installed and supporting files are copied.

After you have completed all of the lab exercises, click the End Lab button to receive your certificate of completion.

Accessing Microsoft Azure

Launch a browser from the virtual machine and navigate to the URL below. Your Azure credentials are available by clicking the Cloud icon at the top of the Lab Player.

In this exercise, you will provision a Databricks workspace, an Azure storage account, and a Spark cluster.

Task 1: Provision a Databricks Workspace

In this exercise, you will process a stream of data that simulates status information generated by Internet of Things (IoT) devices. The data will be written to a blob storage container, where your Spark cluster can access it.
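The data flow described above, where devices write status files to a storage container and a streaming job picks up each new file, can be sketched in plain Python. The sketch uses a local folder as a stand-in for the blob container. It assumes a JSON-lines file format, and the file names, field names (`deviceId`, `status`, `time`, `batch`), and status values are all hypothetical, not the lab's actual schema. In Spark, the reading side would be a file stream source such as `spark.readStream.schema(...).json(path)`, which likewise processes each new file once.

```python
import json
import os
import random
from datetime import datetime, timezone

STATUSES = ["ok", "warning", "error"]  # hypothetical status values


def write_status_batch(folder, batch_id, device_count=3, rng=None):
    """Write one JSON-lines file of simulated device status records (hypothetical schema)."""
    rng = rng or random.Random(batch_id)
    path = os.path.join(folder, f"status-{batch_id:05d}.json")
    with open(path, "w", encoding="utf-8") as f:
        for i in range(device_count):
            record = {
                "deviceId": f"dev{i}",
                "status": rng.choice(STATUSES),
                "time": datetime.now(timezone.utc).isoformat(),
                "batch": batch_id,
            }
            f.write(json.dumps(record) + "\n")
    return path


class NewFileReader:
    """Hypothetical reader that returns records only from files it has not read before."""

    def __init__(self, folder):
        self.folder = folder
        self.seen = set()  # names of files already processed

    def poll(self):
        """Read every file that appeared since the last poll; return their records."""
        new_files = sorted(n for n in os.listdir(self.folder) if n not in self.seen)
        records = []
        for name in new_files:
            with open(os.path.join(self.folder, name), encoding="utf-8") as f:
                records.extend(json.loads(line) for line in f if line.strip())
            self.seen.add(name)
        return records
```

Polling twice without writing a new file returns an empty list, because each file is consumed once. A real writer should make each file appear atomically (for example, by writing to a temporary name and then moving it). Otherwise, a reader may see a half-written file.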