
Learn By Example: Hadoop, MapReduce for Big Data problems

Loony Corn


This course is a zoom-in, zoom-out, hands-on workout involving Hadoop, MapReduce and the art of thinking parallel.

Zoom-in, zoom-out: This course is both broad and deep. It covers the individual components of Hadoop in great detail, and also gives you a higher-level picture of how they interact with each other.

Hands-on workout involving Hadoop, MapReduce: This course will get you hands-on with Hadoop very early on. You'll learn how to set up your own cluster using both VMs and the cloud. All the major features of MapReduce are covered, including advanced topics like Total Sort and Secondary Sort.

The art of thinking parallel: MapReduce completely changed the way people think about processing Big Data. Breaking any problem down into parallelizable units is an art, and the examples in this course will train you to think parallel.
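To get a feel for "thinking parallel" before the lessons begin, here is a minimal sketch of the MapReduce idea in plain Java: the classic word count, split into the same map, shuffle and reduce phases a Hadoop job goes through. This is framework-free illustration code, not the Hadoop API used in the course; all class and method names here are hypothetical.

```java
import java.util.*;
import java.util.stream.*;

// A framework-free sketch of the MapReduce idea: word count in three phases.
public class WordCountSketch {

    // Map phase: each input line is turned into (word, 1) pairs
    // independently, so lines can be processed in parallel.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // Shuffle phase: group all values by key, as the framework would
    // do between the map and reduce stages.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    // Reduce phase: sum the counts for each word; each key can be
    // reduced independently, again in parallel.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> out = new HashMap<>();
        grouped.forEach((word, ones) ->
                out.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("to be or not to be", "to think parallel");
        List<Map.Entry<String, Integer>> pairs = lines.parallelStream()
                .flatMap(l -> map(l).stream())
                .collect(Collectors.toList());
        Map<String, Integer> counts = reduce(shuffle(pairs));
        System.out.println(counts.get("to")); // prints 3
        System.out.println(counts.get("be")); // prints 2
    }
}
```

The key insight is that the map and reduce steps never share state, which is what lets Hadoop spread them across a cluster; the course builds this up properly with real Mapper, Reducer and Job classes.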

Is this course for me?

Yep! Analysts who want to leverage the power of HDFS where traditional databases don't cut it anymore. Yep! Engineers who want to develop complex distributed computing applications that process lots of data. Yep! Data Scientists who want to add MapReduce to their bag of tricks for processing data.

What will I gain from this course?

  • Develop advanced MapReduce applications to process Big Data.

  • Master the art of thinking parallel: how to break up a task into Map/Reduce transformations.

  • Set up your own mini-Hadoop cluster, whether it's a single node, a physical cluster or in the cloud.

  • Use Hadoop and MapReduce to solve a wide variety of problems: from NLP to inverted indices to recommendations.

  • Understand HDFS, MapReduce and YARN, and how they interact with each other.

  • Understand the basics of performance tuning and managing your own cluster.

How do I prepare before taking this course? Is there a prerequisite skill set?

You'll need an IDE in which you can write Java code or open the shared source code; IntelliJ and Eclipse are both great options. You'll need some background in Object-Oriented Programming, preferably in Java: all the source code is in Java, and we dive right in without covering Objects, Classes, etc. A bit of exposure to Linux/Unix shells would be helpful, but it won't be a blocker.

  • Lesson 1

    Why is Big Data a Big Deal - The Big Data Paradigm

  • Lesson 2

    Serial vs Distributed Computing

  • Lesson 3

    What is Hadoop?

  • Lesson 4

    HDFS or the Hadoop Distributed File System

  • Lesson 5

    MapReduce Introduced

  • Lesson 6

    YARN or Yet Another Resource Negotiator

  • Lesson 7

    Installing Hadoop in Local environment - Hadoop Install Mode

  • Lesson 8

    Hadoop Standalone mode Install

  • Lesson 9

    Hadoop Pseudo-Distributed mode Install

  • Lesson 10

    The MapReduce Hello World - The basic philosophy underlying MapReduce

  • Lesson 11

    MapReduce - Visualized And Explained

  • Lesson 12

    MapReduce - Digging a little deeper at every step

  • Lesson 13

    Hello World in MapReduce

  • Lesson 14

    The Mapper

  • Lesson 15

    The Reducer

  • Lesson 16

    The Job

  • Lesson 17

    Run a MapReduce Job - Get comfortable with HDFS

  • Lesson 18

    Run your first MapReduce Job

  • Lesson 19

    Juicing your MapReduce - Combiners, Shuffle and Sort and The Streaming API - Use your Combiner

  • Lesson 20

    Not all Reducers are Combiners

  • Lesson 21

    How many mappers and reducers does your MapReduce have?

  • Lesson 22

    Parallelizing reduce using Shuffle And Sort

  • Lesson 23

    MapReduce is not limited to the Java language - Introducing the Streaming API

  • Lesson 24

    Python for MapReduce

  • Lesson 25

    HDFS and Yarn - HDFS - Protecting against data loss using replication

  • Lesson 26

    HDFS - Name nodes and why they're critical

  • Lesson 27

    HDFS - Checkpointing to backup name node information

  • Lesson 28

    Yarn - Basic components

  • Lesson 29

    Yarn - Submitting a job to Yarn

  • Lesson 30

    Yarn - Plug in scheduling policies

  • Lesson 31

    Yarn - Configure the scheduler

  • Lesson 32

    Setting up your MapReduce to accept command line arguments

  • Lesson 33

    The Tool, ToolRunner and GenericOptionsParser

  • Lesson 34

    Configuring properties of the Job object

  • Lesson 35

    Customizing the Partitioner, Sort Comparator, and Group Comparator

  • Lesson 36

    The heart of search engines - The Inverted Index

  • Lesson 37

    Generating the inverted index using MapReduce

  • Lesson 38

    Custom data types for keys - The Writable Interface

  • Lesson 39

    Represent a Bigram using a WritableComparable

  • Lesson 40

    MapReduce to count the Bigrams in input text

  • Lesson 41

    Test your MapReduce job using MRUnit

  • Lesson 42

    Introducing the File Input Format

  • Lesson 43

    Text And Sequence File Formats

  • Lesson 44

    Data partitioning using a custom partitioner

  • Lesson 45

    Make the custom partitioner real in code

  • Lesson 46

    Total Order Partitioning

  • Lesson 47

    Input Sampling, Distribution, Partitioning and configuring these

  • Lesson 48

    Secondary Sort

  • Lesson 49

    Introduction to Collaborative Filtering

  • Lesson 50

    Friend recommendations using chained MR jobs

  • Lesson 51

    Get common friends for every pair of users - the first MapReduce

  • Lesson 52

    Top 10 friend recommendation for every user - the second MapReduce

  • Lesson 53

    Hadoop as a Database - Structured data in Hadoop

  • Lesson 54

    Running an SQL Select with MapReduce

  • Lesson 55

    Running an SQL Group By with MapReduce

  • Lesson 56

    A MapReduce Join - The Map Side

  • Lesson 57

    A MapReduce Join - The Reduce Side

  • Lesson 58

    A MapReduce Join - Sorting and Partitioning

  • Lesson 59

    A MapReduce Join - Putting it all together

  • Lesson 60

    What is K-Means Clustering?

  • Lesson 61

    A MapReduce job for K-Means Clustering

  • Lesson 62

    K-Means Clustering - Measuring the distance between points

  • Lesson 63

    K-Means Clustering - Custom Writables for Input/Output

  • Lesson 64

    K-Means Clustering - Configuring the Job

  • Lesson 65

    K-Means Clustering - The Mapper and Reducer

  • Lesson 66

    K-Means Clustering : The Iterative MapReduce Job

  • Lesson 67

    Manually configuring a Hadoop cluster (Linux VMs)

  • Lesson 68

    Getting started with Amazon Web Services

  • Lesson 69

    Start a Hadoop Cluster with Cloudera Manager on AWS

  • Lesson 70

    Setup a Virtual Linux Instance (For Windows users)


Loony Corn

Loonycorn is us, Janani Ravi, Vitthal Srinivasan, Swetha Kolalapudi and Navdeep Singh. Between the four of us, we have studied at Stanford, IIM Ahmedabad, the IITs and have spent years (decades, actually) working in tech, in the Bay Area, New York, Singapore and Bangalore. Janani: 7 years at Google (New York, Singapore); Studied at Stanford; also worked at Flipkart and Microsoft. Vitthal: Also Google (Singapore) and studied at Stanford; Flipkart, Credit Suisse and INSEAD too. Swetha: Early Flipkart employee, IIM Ahmedabad and IIT Madras alum. Navdeep: longtime Flipkart employee too, and IIT Guwahati alum. We hope you will try our offerings, and think you'll like them.
