Big Data Processing in Azure for Architects
Lecture
Paul Burpo
Intermediate
2 h 2 m
2017-04-17
Lecture Overview
This module will cover all aspects of big data storage and batch processing. We will start by making the case for big data in Azure. Then we will look at Azure service topics including Blob Storage, Azure Data Lake Store, Azure Data Lake Analytics, and HDInsight clusters running Hadoop, Hive, Interactive Hive (LLAP), and Spark. Storage topics will focus on choosing the right storage, configuring storage, and storage optimization. We will also cover big data scenarios including batch processing, interactive clusters, multi-cluster deployments, and on-demand clusters.
Objectives
  • Understand the advantages of running a big data solution in the cloud
  • Understand the pros and cons of Azure Data Lake Store vs Azure Blob Storage for big data storage
  • Know how to architect a big data storage solution in Azure
  • Know how to choose the right big data processing solution based on workload and the advantages of each
  • Understand the considerations for different big data scenarios in Azure (cluster on-demand vs persistent cluster)
Lecture Modules


In this lab, you will process and query data using Azure Data Lake. You will deploy Azure Data Lake Analytics and Azure Data Lake Store. You will then transfer a data file from your lab machine to the Azure Data Lake Store. You will process the data in the store and save the results back out to a file. You will then run a federated query against the curated data and join it to a table stored in an Azure SQL Database. Finally, you will deploy an HDInsight cluster to query the same data stored in the Data Lake Store using Hive.
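
As a rough illustration of the file-transfer step, the sketch below uploads a local file to Azure Data Lake Store with the azure-datalake-store Python package. The tenant ID, service principal credentials, store name, and file paths are placeholder assumptions, not values from the lab guide.

```python
from azure.datalake.store import core, lib, multithread

# Placeholder values -- replace with your own tenant, service principal, and store name.
TENANT_ID = "<your-tenant-id>"
CLIENT_ID = "<your-app-client-id>"
CLIENT_SECRET = "<your-app-client-secret>"
STORE_NAME = "<your-adls-store-name>"

# Authenticate with a service principal (an interactive login is also possible).
token = lib.auth(tenant_id=TENANT_ID, client_id=CLIENT_ID, client_secret=CLIENT_SECRET)

# Connect to the Data Lake Store account.
adls = core.AzureDLFileSystem(token, store_name=STORE_NAME)

# Upload a local data file into the store (multi-threaded for large files).
multithread.ADLUploader(
    adls,
    rpath="/data/raw/sample.csv",   # destination path in the store
    lpath="sample.csv",             # local file on the lab machine
    nthreads=4,
    overwrite=True,
)

# Quick sanity check that the file arrived.
print(adls.ls("/data/raw"))
```

The uploaded file can then be referenced by path in a U-SQL job or a Hive external table, which is how the rest of the lab consumes it.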

In this lab, you will build an HDInsight Spark cluster to support an interactive business intelligence workload. You will leverage Visual Studio 2017 to customize an ARM template to automate your deployment. You will import data and process it in Spark using Jupyter notebooks and then convert your data into a Hive table. You will then connect to your Spark cluster and query the Hive table to build a report with Power BI.
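
To give a flavor of the notebook step, here is a minimal PySpark sketch that reads a sample file, cleans it, and saves it as a Hive table that Power BI can query over the cluster's Spark endpoint. The file path, table name, and clean-up logic are illustrative assumptions rather than the lab's actual dataset.

```python
from pyspark.sql import SparkSession

# In an HDInsight Spark Jupyter notebook a SparkSession named `spark` already exists;
# the builder below is only needed when running this as a standalone script.
spark = (SparkSession.builder
         .appName("curate-sample-data")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical file path and dataset -- substitute the file you uploaded for the lab.
df = spark.read.csv("wasb:///example/data/sample.csv", header=True, inferSchema=True)

# Basic clean-up before exposing the data to BI tools.
curated = df.dropna()

# Persist the result as a Hive table so Power BI can query it through the cluster.
curated.write.mode("overwrite").saveAsTable("sample_curated")

# Confirm the table is queryable from Hive.
spark.sql("SELECT COUNT(*) AS row_count FROM sample_curated").show()
```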


Skill Me Up subscriptions include unlimited access to on-demand courses and live lab environments through our Real Time Labs feature for hands-on lab access.

Subscription Benefits
  • Access to Real Time Lab environments and lab guides
  • Course Completion Certificates when you pass assessments
  • MUCH MORE!