Analyzing Data with Azure Databricks
Lab
Beginner
1 h 15 m
2019-01-04
Lab Overview
In this lab, you will learn how to provision an Azure Databricks workspace, an Azure storage account, and a Spark cluster. You will then use the Spark cluster to explore data with Spark Resilient Distributed Datasets (RDDs) and Spark DataFrames.
Objectives
  • Understand creation of an Azure Databricks Workspace and cluster using the Azure Portal
  • Learn to work with Spark SQL on Azure Databricks
  • Learn to work with Spark Resilient Distributed Datasets
  • Learn to work with Spark DataFrames
Exercises

Accessing and ending the Lab Environment

SkillMeUp Real Time Labs use a virtual machine for all lab exercises. This allows you access to all of the tools and software needed to complete the lab without requiring you to install anything on your local computer.

The virtual machine may take several minutes to fully provision because software must be installed and supporting files copied.

After you have completed all of the lab exercises, click the End Lab button to get access to your certificate of completion.

Accessing Microsoft Azure

Launch a browser from the virtual machine and navigate to the URL below. To view your Azure credentials, click the cloud icon at the top of the Lab Player.

https://portal.azure.com

In this exercise, you will provision a Databricks workspace, an Azure storage account, and a Spark cluster.
Now that you have provisioned a Spark cluster, you can use it to analyze data. In this exercise, you will use Spark Resilient Distributed Datasets (RDDs) to load and explore data. The RDD-based API is the original Spark API and has largely been superseded by the newer DataFrame-based API. However, many production systems (and code examples on the web) still use RDDs, so it is worth starting your exploration of Spark there.
Spark 2.0 and later provides a schematized object for manipulating and querying data: the DataFrame. This provides a more intuitive and better-performing API for working with structured data. In addition to the native DataFrame API, Spark SQL enables you to use SQL semantics to create and query tables based on DataFrames.