Copy Data to Azure Data Lake Store Gen2 Using DistCp
1 h 45 m
In this lab, you will create an Azure Data Lake Store Gen2 account. You will learn to lock down and manage access to the Data Lake Store, taking advantage of both role-based access control and the Data Lake Store's Azure AD integration. Finally, you will perform a bulk ingest using the Hadoop DistCp utility.
- Become familiar with the practical usage of Data Lake Store Gen2
- Learn the basics of AzCopy
- Learn the basics of DistCp on HDInsight
- Understand the performance characteristics of DistCp vs AzCopy
In this exercise, you will create an Azure Storage account with the hierarchical namespace enabled, also known as Azure Data Lake Gen2. You will then configure a file system for the account.
In this exercise, you will copy a large dataset from a publicly accessible storage account to your private storage account using AzCopy running from the Azure Cloud Shell.
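As a rough illustration of what that copy looks like, the sketch below assembles an `azcopy copy` command line in Python so it can be inspected before it is run in the Cloud Shell. The account names, container names, path, and SAS token are hypothetical placeholders, not values from this lab, and the sketch assumes the destination is addressed through the account's Blob endpoint and authorized with a SAS token.

```python
import shlex


def blob_url(account, container, path="", sas_token=""):
    """Build a Blob endpoint URL, optionally with a SAS token appended."""
    url = f"https://{account}.blob.core.windows.net/{container}"
    if path:
        url += "/" + path.strip("/")
    if sas_token:
        url += "?" + sas_token.lstrip("?")
    return url


def azcopy_copy_command(source_url, destination_url):
    """Return the argument list for a recursive AzCopy copy."""
    # --recursive copies every blob under the source path.
    return ["azcopy", "copy", source_url, destination_url, "--recursive"]


# Hypothetical placeholders: replace with the values given in the exercise.
source_url = blob_url("publicaccount", "public-container", "dataset")
destination_url = blob_url(
    "labaccount", "lab-filesystem", "ingest", sas_token="?sv=PLACEHOLDER"
)

command = azcopy_copy_command(source_url, destination_url)

# shlex.join quotes the URLs so the printed line is safe to paste into a shell.
print(shlex.join(command))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; in the Cloud Shell the printed line could be pasted as-is once the placeholders are replaced.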
You can use DistCp to copy data between a general-purpose v2 storage account and a general-purpose v2 storage account with the hierarchical namespace (Data Lake Gen2) enabled. In this exercise, you will step through the basic use of the DistCp command and learn how to configure it for optimal performance.
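The shape of a DistCp invocation can be sketched as follows. This Python snippet only builds the command line that would be run on the HDInsight cluster. It assumes the source is addressed with a `wasbs://` URI and the Data Lake Gen2 destination with an `abfss://` URI; the account, container, file system, and path names are hypothetical, and the mapper count of 100 is an arbitrary illustrative value rather than a recommendation from this lab.

```python
import shlex


def wasbs_uri(container, account, path=""):
    """URI for a blob container in a general-purpose v2 account."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def abfss_uri(filesystem, account, path=""):
    """URI for a file system in an account with the hierarchical namespace."""
    return f"abfss://{filesystem}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def distcp_command(source, destination, mappers=None):
    """Return the argument list for a Hadoop DistCp copy."""
    cmd = ["hadoop", "distcp"]
    if mappers is not None:
        # -m sets the maximum number of simultaneous map tasks (copies).
        cmd += ["-m", str(mappers)]
    cmd += [source, destination]
    return cmd


# Hypothetical placeholders: replace with the values given in the exercise.
source = wasbs_uri("source-container", "sourceaccount", "example/data")
destination = abfss_uri("lab-filesystem", "labaccount", "ingest")

command = distcp_command(source, destination, mappers=100)
print(shlex.join(command))
```

The `-m` option is the main knob this sketch exposes: DistCp runs as a MapReduce job, so the number of mappers bounds how many files are copied in parallel.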