
Hadoop, Part 2: ETL and MapReduce

with expert Kevin McCarty



Course at a glance

Included in these subscriptions:

  • Dev & IT Pro Video
  • Dev & IT Pro Power Pack
  • Power Pack Plus

Release date: 7/1/2016
Level: Intermediate
Runtime: 1h 40m
Closed captioning: Included
Transcript: Included
eBooks / courseware: N/A
Hands-on labs: N/A
Sample code: Included
Exams: Included





Course description

In this course, Hadoop expert Kevin McCarty takes a closer look at some of the major components underpinning Hadoop: services such as Mahout, Oozie, and ZooKeeper, and languages such as Pig and Hive. He will examine the Hadoop architecture and look at some of the ETL tools Hadoop provides for moving data between a Hadoop cluster and external servers. Finally, McCarty will demonstrate a simple application in Java and follow that up with a deep dive into MapReduce, including a look at automation using the Linux cron utility.

Prerequisites

This course assumes that students have some programming background and some familiarity with a Unix-based operating system. No specific experience with the Java programming language or Hadoop is required, although the more experience you bring to the course, the more you’ll get out of it. The course moves quickly through a broad range of topics. It also assumes that you are familiar with the version of Windows that you are running. For example, the course might say simply “Open PuTTY” without explaining how to do that. You should also be able to navigate the folder hierarchy using Windows Explorer.

Learning Paths

This course is part of the following LearnNowOnline SuccessPaths™:
Hadoop

Meet the expert

Kevin McCarty is a computer professional with over 30 years of experience in the industry as a programmer, project manager, database administrator, architect, and data scientist. He is a Microsoft Certified Trainer with over 25 individual certifications in programming and database technologies and serves as the chapter leader of the Boise SQL Server Users Group. A former Army officer and Eagle Scout, he holds a doctorate in Computer Science and has a lifelong love of learning.


Course outline



ETL and MapReduce

Big Data Sources And ETL (19:11)
  • Introduction (00:28)
  • Where Do You Find Big Data? (00:46)
  • Big Data Sources - Volume (01:02)
  • Big Data Sources - Variety (03:02)
  • Structured Data (00:43)
  • Semi-Structured (00:26)
  • Unstructured Data (00:24)
  • Problems with Big Data (00:32)
  • Data Integrity (02:21)
  • Data Completeness (00:47)
  • Data Format (01:22)
  • Data Timeliness (00:57)
  • How Do We Process Big Data? (01:08)
  • What Is ETL? - Extraction (00:43)
  • What Is ETL? - Transform (02:48)
  • What Is ETL? - Load (01:08)
  • Summary (00:26)
ETL Demonstration (15:15)
  • Introduction (00:30)
  • In This Exercise... (00:09)
  • Demo: Sqoop (04:43)
  • Demo: Working with Tables (04:56)
  • Demo: ETL (04:32)
  • Summary (00:23)
Understanding MapReduce (16:55)
  • Introduction (00:24)
  • What Is MapReduce? (00:51)
  • History of MapReduce (04:42)
  • MapReduce - Benefits (01:43)
  • MapReduce - Limitations (02:25)
  • Demo: MapReduce (04:33)
  • Demo: Create a Jar File (01:48)
  • Summary (00:25)
MapReduce Demonstration (09:54)
  • Introduction (00:28)
  • Demo: MapReduce Setup (04:08)
  • Demo: Word Count Program (04:47)
  • Summary (00:29)
Developing MapReduce (28:56)
  • Introduction (00:25)
  • Language Support (00:56)
  • How Streaming Works (01:02)
  • Creating a MapReduce Application (00:35)
  • MapReduce - Execution (01:52)
  • MapReduce - Main (01:05)
  • MapReduce - The Mapper (00:42)
  • MapReduce - The Reducer (01:26)
  • Demo: Create Java File (06:02)
  • Demo: MapReduce (05:06)
  • Demo: Map Method (03:11)
  • Demo: Reduce Function (05:57)
  • Summary (00:30)
Schedule MapReduce (10:05)
  • Introduction (00:29)
  • Ad-Hoc vs. Scheduling (01:44)
  • Cron Jobs (01:00)
  • Cron Tables (00:31)
  • Creating a Cron Job (01:08)
  • Example Cron Job Text (00:37)
  • Demo: Cron Scheduling (04:09)
  • Summary (00:23)