Big Data Programming

To stay competitive, a business needs to know as much as it can about people, the environment it operates in, and who and where its competitors are. The amount of data companies collect keeps growing, and there is an urgent need for a strategy to make sense of it all. Star Big Data Programming is a certification course that helps learners master the skills they need to establish a successful career as a data engineer. The program covers HDFS, MapReduce, HBase, Hive, Pig, YARN, Oozie, Flume, and Sqoop through real-time use cases from the retail, social media, aviation, tourism, and finance industries. It equips learners with in-depth knowledge of writing code with the MapReduce framework and managing large data sets with HBase.


  • Intermediate

Big Data Programming Course Objectives

In this course, you will learn about:

  • Big data and its business applications
  • Apache Hadoop and its big data ecosystem
  • Deploying Hadoop in a clustered environment
  • Interacting with NoSQL databases
  • Managing key Hadoop components (HDFS, YARN and Hive)
  • Spark - the next-generation computational framework
  • Installing and working with Hadoop
  • Hadoop-related technologies such as Avro, Flume, Sqoop, Pig, and Oozie
  • Advanced topics such as Hadoop security, Cloudera, IBM InfoSphere, and more

Course Outcome

After completing this course, you will be able to:

  • Understand the finer points of Big Data technology
  • Work with Big Data-related tools and platforms, and understand their architecture, to store, program, process, and manage data
  • Deploy Hadoop and its related technologies
  • Use the Hadoop ecosystem to manage your data
  • Apply machine learning concepts with Mahout

Table of Contents Outline

  • Introducing Data and Big Data
  • Identifying the Business Applications of Big Data
  • Big Data and Hadoop
  • HDFS - Storing Data in Hadoop
  • Introduction to MapReduce
  • YARN and MapReduce - Processing Data in Hadoop
  • Developing a First Application for MapReduce
  • Exploring the Working of a MapReduce Process
  • Avro
  • Parquet
  • Flume - Service for Streaming Event Data
  • Sqoop (MySQL to Hadoop)
  • Apache Pig
  • Hive – Data Warehouse
  • Oozie – Workflow Scheduler
  • Exploring Crunch - Joining and Data Integration
  • Exploring Spark and Scala
  • Exploring HBase - Big Data Store
  • ZooKeeper - Coordination Service for Distributed Applications
  • Exploring Storm
  • Machine Learning with Mahout
  • Interacting with NoSQL Databases
  • Hadoop and Security
  • Apache Drill and Google BigQuery
  • Exploring Cloudera
  • Exploring Hortonworks
  • HDInsight
  • IBM InfoSphere
  • Hadoop and AWS
  • Appendix - Exploring Pivotal HD Case Studies


  • Chapter 1. Setting up the required environment for Apache Hadoop installation
  • Chapter 2. Installing the Single-Node Hadoop configuration on the system
  • Chapter 3. Exploring the Web-Based User Interface of Hadoop Cluster
  • Chapter 4. Implementing a MapReduce Program for Word Count
  • Chapter 5. Implementing a Basic Pig Latin Script
  • Chapter 6. Implementing Basic Hive Query Language Operations
  • Chapter 7. Using Apache Flume to fetch open-source user tweets from Twitter