Accenture Hadoop Admin Interview Questions

Here are Hadoop Admin interview questions and answers for freshers as well as experienced candidates to get their dream job.

3) What are the common Input Formats in Hadoop?

Three widely used input formats are:

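  • Text Input (TextInputFormat): The default input format in Hadoop. Each line is a record; the key is the line's byte offset and the value is the line's text.
  • Key Value (KeyValueTextInputFormat): Used for plain text files in which each line is split into a key and a value at a separator (a tab by default).
  • Sequence (SequenceFileInputFormat): Used for reading sequence files, Hadoop's binary key-value file format.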
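
The difference between the two text-based formats is easiest to see in the records they produce. The following is a minimal sketch in plain Python that only imitates the behaviour described above; it does not call Hadoop, and the function names and sample data are made up for illustration.

```python
# Toy model of how two text-based input formats turn lines into (key, value)
# records. Plain Python for illustration only; the function names are hypothetical.

def text_input_records(data: bytes):
    """Imitate TextInputFormat: key = byte offset of the line, value = line text."""
    records = []
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        records.append((offset, raw_line.rstrip(b"\r\n").decode("utf-8")))
        offset += len(raw_line)  # the next line starts after this line's bytes
    return records


def key_value_records(data: bytes, separator: str = "\t"):
    """Imitate KeyValueTextInputFormat: split each line at the first separator."""
    records = []
    for line in data.decode("utf-8").splitlines():
        key, _, value = line.partition(separator)
        # With no separator, the whole line becomes the key and the value is empty.
        records.append((key, value))
    return records


sample = b"101\talice\n102\tbob\nno-separator\n"
print(text_input_records(sample))
print(key_value_records(sample))
```

Sequence files are binary, so they are left out of this sketch.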

4) Suppose there are several small CSV files in the /user/input directory in HDFS and you want to create a single Hive table from these files. The data in these files has the following fields: {registration_no, name, email, address}. What will be your approach to solve this, and how will you create a single Hive table for multiple smaller files without degrading the performance of the system?

    Grouping these small files together into a single sequence file, and storing the Hive table in SequenceFile format, can solve this problem.
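
One plausible way to carry this out is to load the CSV files into a temporary text table and then copy its rows into a table stored as a sequence file. The sketch below only assembles the HiveQL statements as strings. The table names (`temp_table`, `sample_sequencefile`) and the choice of STRING for every column are assumptions, not details given in the question.

```python
# Builds the HiveQL statements for the "temporary table, then SequenceFile table"
# approach. Table names and column types are illustrative assumptions.

def build_hive_statements(input_dir, staging_table, final_table, columns):
    column_ddl = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return [
        # 1. A plain-text staging table that matches the CSV layout.
        f"CREATE TABLE {staging_table} ({column_ddl}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE",
        # 2. Move the small CSV files from HDFS into the staging table.
        f"LOAD DATA INPATH '{input_dir}' INTO TABLE {staging_table}",
        # 3. The final table, stored in SequenceFile format.
        f"CREATE TABLE {final_table} ({column_ddl}) STORED AS SEQUENCEFILE",
        # 4. Copy the rows across; Hive writes them out as sequence files.
        f"INSERT OVERWRITE TABLE {final_table} SELECT * FROM {staging_table}",
    ]


columns = [
    ("registration_no", "STRING"),
    ("name", "STRING"),
    ("email", "STRING"),
    ("address", "STRING"),
]
for statement in build_hive_statements(
    "/user/input", "temp_table", "sample_sequencefile", columns
):
    print(statement + ";")
```

The printed statements could then be run from a Hive client of your choice.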

    1) Explain the Apache Pig architecture.

    The Apache Pig architecture is built around a Pig Latin interpreter that runs Pig Latin scripts to process and analyze massive datasets. Programmers use the Pig Latin language to examine huge datasets in the Hadoop environment. Apache Pig offers a rich set of operators for data operations such as join, filter, sort, load, and group. To perform a particular task, programmers write a Pig script in Pig Latin. Pig transforms these scripts into a series of MapReduce jobs, which reduces the programmers’ work. Pig Latin programs can be run through several mechanisms, such as the Grunt shell, UDFs, and embedded programs.

    Apache Pig architecture consists of the following major components:

  • Parser: The parser handles the Pig scripts and checks their syntax. Its output is a logical plan in the form of a DAG.
  • Optimizer: The optimizer receives the logical plan (DAG) and carries out logical optimizations such as projection and pushdown.
  • Compiler: The compiler converts the optimized logical plan into a series of MapReduce jobs.
  • Execution Engine: Finally, the MapReduce jobs are submitted to Hadoop in sorted order.
  • Execution Mode: Apache Pig runs in local mode or MapReduce mode. The choice of execution mode depends on where the data is stored and where you want to run the Pig script.
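
To make the optimizer step more concrete, here is a toy sketch of one kind of pushdown: moving a FILTER ahead of an ORDER so that fewer rows have to be sorted. The plan representation (a list of tuples) and the function name are hypothetical and far simpler than Pig's real logical plan.

```python
# Toy logical plan: a list of (operator, argument) steps in execution order.
# This is not Pig's internal representation, only an illustration of pushdown.

def push_filters_up(plan):
    """Swap any FILTER that directly follows an ORDER so the filter runs first."""
    optimized = list(plan)
    changed = True
    while changed:
        changed = False
        for i in range(1, len(optimized)):
            if optimized[i][0] == "FILTER" and optimized[i - 1][0] == "ORDER":
                # Filtering before sorting gives the same rows with less work.
                optimized[i - 1], optimized[i] = optimized[i], optimized[i - 1]
                changed = True
    return optimized


plan = [
    ("LOAD", "/user/input"),
    ("ORDER", "name"),
    ("FILTER", "registration_no > 100"),
    ("STORE", "output"),
]
for step in push_filters_up(plan):
    print(step)
```

In this sketch the FILTER step ends up directly after LOAD, which is the order a compiler stage would then turn into jobs.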

    Yarn stands for Yet Another Resource Negotiator. It is the resource management layer of Hadoop and was introduced in Hadoop 2.x. Yarn allows many data processing engines, such as graph processing, batch processing, interactive processing, and stream processing, to run on and process data stored in the Hadoop Distributed File System. Yarn also offers job scheduling. It extends the capability of Hadoop to other evolving technologies so that they can take advantage of HDFS and economical clusters. Apache Yarn acts as the data operating system for Hadoop 2.x. It consists of a master daemon known as the Resource Manager, a slave daemon called the Node Manager, and the Application Master.

  • Resource Manager: It is the master daemon and controls resource allocation in the cluster.
  • Node Manager: It is the slave daemon that runs on each Data Node and executes tasks there.
  • Application Master: It manages the job lifecycle and resource demands of a single application. It works with the Node Manager and monitors the execution of tasks.
  • Container: It is a bundle of resources, including RAM, CPU, network, HDD, etc., on a single node.
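
The relationship between these components can be sketched with a small in-memory model. This is a simplified illustration, not the YARN API: the class names mirror the component names above, the first-fit allocation rule is an assumption made for brevity, and real YARN schedulers are considerably more sophisticated.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """One worker node with the resources it can still hand out."""
    name: str
    free_memory_mb: int
    free_vcores: int
    containers: list = field(default_factory=list)


@dataclass
class ResourceManager:
    """Master that decides which node hosts each requested container."""
    nodes: list

    def allocate(self, app_id, memory_mb, vcores):
        # First-fit: use the first node with enough free memory and CPU.
        for node in self.nodes:
            if node.free_memory_mb >= memory_mb and node.free_vcores >= vcores:
                node.free_memory_mb -= memory_mb
                node.free_vcores -= vcores
                container = (app_id, node.name, memory_mb, vcores)
                node.containers.append(container)
                return container
        return None  # no node can satisfy the request right now


rm = ResourceManager([NodeManager("node1", 4096, 2), NodeManager("node2", 8192, 4)])
print(rm.allocate("app_1", 3072, 2))   # fits on node1
print(rm.allocate("app_1", 2048, 1))   # node1 is now too small, so node2 is used
print(rm.allocate("app_1", 16384, 1))  # no node is large enough
```

Here the calls to `allocate` play the role of an Application Master requesting containers for its tasks.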

    Apache Zookeeper is an open-source service that helps coordinate a large set of hosts. Management and coordination in a distributed environment are complex. Zookeeper automates this process and lets developers concentrate on building software features rather than worrying about the distributed nature of the application.

    Zookeeper helps to maintain configuration information, naming, and group services for distributed applications. It implements various protocols on the cluster so that applications do not have to implement them on their own. It provides a single coherent view of many machines.
