Lab: Analyzing Data with Azure Databricks

Overview
In this lab, you will provision a Databricks workspace, an Azure storage account, and a Spark cluster. You will then use the Spark cluster to explore data using Spark Resilient Distributed Datasets (RDDs) and Spark DataFrames.

Details
  • Estimated time required to complete: 1 hour, 15 minutes
  • You will have access to this environment for 7 hours
  • Learning Credits Required: 10
Who this lab is designed for
  • Data Professionals
  • Data Engineers
  • Data Architects
  • Data Scientists

Learning Objectives

  • Understand how to create an Azure Databricks workspace and cluster using the Azure Portal
  • Learn to work with Spark SQL on Azure Databricks
  • Learn to work with Spark Resilient Distributed Datasets
  • Learn to work with Spark DataFrames

Exercises

Exercise 1: Environment Setup

Accessing and ending the Lab Environment

SkillMeUp Real Time Labs use a virtual machine for all lab exercises. This gives you access to all of the tools and software needed to complete the lab without requiring you to install anything on your local computer.

The virtual machine may take several minutes to fully provision while software is installed and supporting files are copied.

After you have completed all of the lab exercises, be sure to click the End Lab button to receive your certificate of completion.

Accessing Microsoft Azure

Launch a browser from the virtual machine and navigate to the URL below. Your Azure Credentials are available by clicking the Cloud Icon at the top of the Lab Player.

https://portal.azure.com

Exercise 2: Deploying the Databricks Environment
In this exercise, you will provision a Databricks workspace, an Azure storage account, and a Spark cluster.
Exercise 3: Exploring Data with Spark Resilient Distributed Datasets (RDDs)
Now that you have provisioned a Spark cluster, you can use it to analyze data. In this exercise, you will use Spark Resilient Distributed Datasets (RDDs) to load and explore data. The RDD-based API is an original component of Spark and has largely been superseded by the newer DataFrame-based API; however, many production systems (and code examples on the Web) still use RDDs, so it’s worth starting your exploration of Spark there.
Exercise 4: Exploring Data Interactively with DataFrames
Spark 2.0 and later provide a schematized object for manipulating and querying data: the DataFrame. This offers a more intuitive and better-performing API for working with structured data. In addition to the native DataFrame API, Spark SQL enables you to use SQL semantics to create and query tables based on DataFrames.
