Lab: Leveraging Azure Databricks for Machine Learning

Overview

Spark includes an API named Spark MLlib (often referred to as Spark ML), which you can use to create machine learning solutions. Machine learning is a technique in which you train a predictive model on a large volume of data so that, when new data is submitted to the model, it can predict unknown values. The most common types of machine learning are supervised learning and unsupervised learning.

In a supervised learning scenario, you start with a large volume of data that includes both features (categorical and numeric values that describe characteristics of the entity you’re trying to predict something about) and labels (the values your model will predict). Training the model involves applying a statistical algorithm that fits the features to the labels. Because your initial data includes known values for the labels, you can train the model on part of that data and then test its accuracy against the known label values in the rest. This gives you confidence that the model will work accurately with new data for which the label values aren’t known.

Unsupervised learning is a technique in which there are no known label values, and the model is trained to group (or cluster) similar entities together based on their features.
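The following minimal plain-Python sketch makes the train-then-test idea concrete. It does not use Spark MLlib. It rests on several assumptions: the single feature (departure delay in minutes), the labels (1 = late, 0 = on time), and all the numbers are hypothetical. The learning algorithm is a one-feature threshold rule (a decision stump), chosen for brevity rather than because the lab uses it.

```python
"""Hypothetical illustration of supervised learning (not Spark MLlib).

Each example pairs a feature (departure delay in minutes) with a known label
(1 = arrived late, 0 = on time). The toy numbers are invented for illustration.
"""


def train_stump(examples):
    """Fit a one-feature threshold classifier: predict 1 when feature >= threshold.

    Tries every observed feature value as a candidate threshold and keeps the
    first one with the fewest misclassified training examples.
    """
    best_threshold, best_errors = None, None
    for candidate, _ in examples:
        errors = sum((x >= candidate) != bool(y) for x, y in examples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


def predict(threshold, feature):
    """Return 1 (late) if the feature reaches the learned threshold, else 0."""
    return 1 if feature >= threshold else 0


def accuracy(threshold, examples):
    """Fraction of examples whose known label matches the prediction."""
    correct = sum(predict(threshold, x) == y for x, y in examples)
    return correct / len(examples)


# Hypothetical labeled data: (departure delay in minutes, late label).
train = [(0, 0), (3, 0), (5, 0), (12, 0), (20, 1), (25, 1), (40, 1), (2, 0)]
# Held-out labeled data, used only to test the trained model.
test = [(1, 0), (8, 0), (30, 1), (18, 1)]

threshold = train_stump(train)
print("learned threshold:", threshold)                  # learned threshold: 20
print("test accuracy:", accuracy(threshold, test))      # test accuracy: 0.75
```

The model fits the training data perfectly but misclassifies the held-out flight with an 18-minute delay. Testing against known labels is meant to reveal this kind of gap.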

In this lab, we’ll focus on supervised learning, and specifically on a type of machine learning called classification, in which you train a model to identify which category, or class, an entity belongs to. You will train a classifier that uses features of flights en route to an airport to predict whether they will be late or on time.


Details
  • Estimated time required to complete: 1 hour
  • You will have access to this environment for 2 hours
  • Learning Credits Required: 10

Who this lab is designed for
  • Data Professionals
  • Data Engineers
  • Data Scientists

Learning Objectives

  • Understand how to create an Azure Databricks workspace and cluster using the Azure portal
  • Learn to work with Spark MLlib on Azure Databricks

Exercises

Exercise 1: Environment Setup

Accessing and Ending the Lab Environment

SkillMeUp Real Time Labs use a virtual machine for all lab exercises. This gives you access to all the tools and software needed to complete the lab without installing anything on your local computer.

The virtual machine may take several minutes to fully provision, because software must be installed and supporting files copied.

After you have completed all of the lab exercises, click the End Lab button to access your certificate of completion.

Accessing Microsoft Azure

Launch a browser from the virtual machine and navigate to the URL below. Your Azure credentials are available by clicking the Cloud icon at the top of the Lab Player.

https://portal.azure.com

Exercise 2: Deploying the Databricks Environment

In this exercise, you will provision a Databricks workspace, an Azure storage account, and a Spark cluster.

Exercise 3: Creating and Testing a Machine Learning Model

In this exercise, you will use your choice of Python or Scala to prepare and explore flight data before training and testing a classification model. You will train a classifier that uses features of flights en route to an airport to predict whether they will be late or on time.
