Leveraging Azure Databricks for Machine Learning
1 h 0 m
Lab Overview
Spark includes an API named Spark MLlib (often referred to as Spark ML) that you can use to create machine learning solutions. Machine learning is a technique in which you train a predictive model on a large volume of data so that, when new data is submitted to the model, it can predict unknown values. The most common types of machine learning are supervised learning and unsupervised learning.

In a supervised learning scenario, you start with a large volume of data that includes both features (categorical and numeric values that describe characteristics of the entity you’re trying to predict something about) and labels (the values your model will predict). Training the model involves applying a statistical algorithm that fits the features to the labels. Because your initial data includes known label values, you can train the model on part of the data and test its accuracy against the known labels in the rest. This gives you confidence that the model will work accurately with new data for which the label values aren’t known. In unsupervised learning, there are no known label values, and the model is trained to group (or cluster) similar entities together based on their features.

In this lab, we’ll focus on supervised learning, and specifically on a type of machine learning called classification, in which you train a model to identify which category, or class, an entity belongs to. You will train a classifier that uses features of flights that are en route to an airport to predict whether they will arrive late or on time.
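The supervised workflow described above (fit a model to labeled training data, then check its predictions against the known labels of held-out test data) can be sketched in plain Python. This is an illustration of the concept only, not the Spark MLlib API: it uses a one-level decision tree (a "decision stump") chosen for brevity, and the feature names and values are made-up toy data.

```python
# Plain-Python illustration of supervised classification (not the Spark MLlib API).
# Each row is (features, label); features = [departure_delay_minutes, day_of_week],
# label = 1 for late, 0 for on time. All values are made-up toy data.
train_rows = [
    ([-5, 1], 0),
    ([0, 3], 0),
    ([4, 6], 0),
    ([10, 2], 0),
    ([12, 7], 0),
    ([22, 5], 1),
    ([35, 1], 1),
    ([60, 4], 1),
]
test_rows = [
    ([-2, 2], 0),
    ([8, 5], 0),
    ([30, 3], 1),
    ([18, 6], 0),  # a delayed departure that still arrived on time
]


def train_stump(rows):
    """Fit a one-level decision tree: choose the feature index and threshold
    that minimize training errors for the rule 'late if feature > threshold'."""
    best = None  # (errors, feature_index, threshold)
    for i in range(len(rows[0][0])):
        for t in sorted({x[i] for x, _ in rows}):
            errors = sum(1 for x, y in rows if int(x[i] > t) != y)
            if best is None or errors < best[0]:
                best = (errors, i, t)
    return best[1], best[2]


def predict(model, features):
    """Return 1 (late) if the chosen feature exceeds the threshold, else 0."""
    i, t = model
    return int(features[i] > t)


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the known label."""
    return sum(predict(model, x) == y for x, y in rows) / len(rows)


model = train_stump(train_rows)
print("model (feature index, threshold):", model)
print("training accuracy:", accuracy(model, train_rows))
print("test accuracy:", accuracy(model, test_rows))
```

Running it selects a 12-minute threshold on departure delay, which classifies every training row correctly but scores 0.75 on the test rows: the held-out flight that departed 18 minutes late yet arrived on time is misclassified. That gap is exactly what testing against known labels is meant to reveal. In the lab itself you would use one of Spark MLlib's classifiers instead (for example, `DecisionTreeClassifier` or `LogisticRegression` in `pyspark.ml.classification`).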

Related Learning Path(s):
Implementing Azure Databricks
  • Understand how to create an Azure Databricks workspace and cluster using the Azure portal
  • Learn to work with Spark MLlib on Azure Databricks

Accessing and Ending the Lab Environment

SkillMeUp Real Time Labs use a virtual machine for all lab exercises. This gives you access to all the tools and software needed to complete the lab without requiring you to install anything on your local computer.

The virtual machine may take several minutes to fully provision because software must be installed and supporting files copied.

After you have completed all of the lab exercises, click the End Lab button to access your certificate of completion.

Accessing Microsoft Azure

Launch a browser from the virtual machine and navigate to the URL below. Your Azure credentials are available by clicking the Cloud icon at the top of the Lab Player.


In this exercise, you will provision a Databricks workspace, an Azure storage account, and a Spark cluster.
In this exercise, you will use your choice of Python or Scala to prepare and explore flight data, and then train and test a classification model that predicts whether flights en route to an airport will arrive late or on time.
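The data-preparation and exploration steps can also be sketched in plain Python, again as an illustration of the idea rather than the Spark API. The field names (`DayOfWeek`, `DepDelay`, `ArrDelay`), the sample records, and the 15-minute lateness threshold are all assumptions made for this sketch; the lab's actual dataset and definition of "late" may differ.

```python
import random

# Hypothetical raw flight records; field names and values are illustrative only.
raw_flights = [
    {"DayOfWeek": 1, "DepDelay": -3, "ArrDelay": -8},
    {"DayOfWeek": 2, "DepDelay": 25, "ArrDelay": 31},
    {"DayOfWeek": 3, "DepDelay": 5, "ArrDelay": None},  # missing arrival delay
    {"DayOfWeek": 4, "DepDelay": 12, "ArrDelay": 16},
    {"DayOfWeek": 5, "DepDelay": 0, "ArrDelay": 2},
    {"DayOfWeek": 6, "DepDelay": 40, "ArrDelay": 38},
    {"DayOfWeek": 7, "DepDelay": 8, "ArrDelay": 4},
]

# Assumption: "late" means arriving more than 15 minutes behind schedule.
LATE_THRESHOLD_MINUTES = 15


def prepare(records):
    """Drop incomplete rows and derive a binary label (1 = late, 0 = on time).
    Features are [DayOfWeek, DepDelay]; ArrDelay is used only for the label."""
    prepared = []
    for r in records:
        if r["DepDelay"] is None or r["ArrDelay"] is None:
            continue
        features = [r["DayOfWeek"], r["DepDelay"]]
        label = 1 if r["ArrDelay"] > LATE_THRESHOLD_MINUTES else 0
        prepared.append((features, label))
    return prepared


def class_counts(rows):
    """Exploration: how many on-time (0) and late (1) examples are there?"""
    counts = {0: 0, 1: 0}
    for _, label in rows:
        counts[label] += 1
    return counts


def mean_feature_by_class(rows, index):
    """Exploration: average value of one feature for each label."""
    sums, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for features, label in rows:
        sums[label] += features[index]
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums if counts[label]}


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows with a fixed seed, then split off a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


data = prepare(raw_flights)
print("class counts:", class_counts(data))
print("mean departure delay by class:", mean_feature_by_class(data, 1))
train, test = train_test_split(data)
print("training rows:", len(train), "test rows:", len(test))
```

The arrival delay is deliberately left out of the features: for a flight that is still en route it isn't known yet, so a model that depended on it couldn't be used to make predictions. In the lab, the corresponding Spark DataFrame operations in PySpark include `dropna()` for removing incomplete rows and `randomSplit()` for the train/test split.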