Allstate Hadoop Interview Questions

What is the significance of Sqoop’s eval tool?

The eval tool in Sqoop lets users run simple SQL queries against the source database server and preview the results in the console, without importing any data.
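For illustration, a typical eval invocation might look like this (the connection string, credentials, and table names below are hypothetical):

```shell
# Hypothetical database, user, and table -- replace with your own.
# The query runs on the database server; results print to the console.
sqoop eval \
  --connect jdbc:mysql://db.example.com/payroll \
  --username hr_user \
  --query "SELECT id, name FROM employees LIMIT 5"
```

Because nothing is written to HDFS, eval is often used to sanity-check a query before running a full sqoop import with it.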

Why are the data blocks in HDFS so huge?

The reason behind the large block size in HDFS is throughput: with large blocks, the time spent seeking is small relative to the time spent transferring data, so reads and writes approach the raw disk transfer rate. If the block size were kept small, a file would be split into a far larger number of blocks, forcing the NameNode to store much more metadata and increasing network and scheduling overhead.
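The metadata arithmetic behind this is easy to sketch in shell. Assuming a hypothetical 1 TB file, compare the number of block entries the NameNode must track at the default 128 MB block size versus a tiny 1 MB block size:

```shell
# Hypothetical 1 TB file, expressed in MB.
FILE_SIZE_MB=$((1024 * 1024))

# Number of NameNode block entries at each block size.
BLOCKS_AT_128MB=$((FILE_SIZE_MB / 128))  # default-sized blocks
BLOCKS_AT_1MB=$((FILE_SIZE_MB / 1))      # tiny blocks

echo "128 MB blocks: $BLOCKS_AT_128MB metadata entries"
echo "1 MB blocks:   $BLOCKS_AT_1MB metadata entries"
```

The same file costs the NameNode 128 times as many metadata entries with 1 MB blocks, which is why HDFS defaults to large blocks.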

What is Apache Oozie?

Apache Oozie is a scheduler that schedules jobs in Hadoop and combines them into a single logical unit of work. Oozie jobs largely fall into the following two categories:

  • Oozie Workflow: These jobs are a set of sequential actions that need to be executed.
  • Oozie Coordinator: These jobs are triggered when input data becomes available to them; until then, the coordinator waits.
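A workflow is described in an XML definition file. A minimal sketch, with illustrative action and property names, looks like this:

```xml
<!-- Sketch of a minimal Oozie workflow definition; names are illustrative. -->
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="first-action"/>
    <action name="first-action">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Action failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Each action names its success and failure transitions, which is how Oozie chains sequential actions into one logical job.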
Can you skip the bad records in Hadoop? How?

Yes. Hadoop provides an option to skip certain sets of input records while processing map inputs. Applications manage this feature through the SkipBadRecords class.

The SkipBadRecords class is commonly used when map tasks fail deterministically on certain input records, typically due to faults in the map function. With skipping enabled, Hadoop detects and skips the offending records instead of failing the whole job.
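Skip mode can also be turned on from the command line through the configuration property that SkipBadRecords sets; the jar and driver class below are hypothetical:

```shell
# Hypothetical jar and driver class. The -D property tolerates up to
# 10 unacknowledged map input records around a failing record.
hadoop jar my-job.jar com.example.MyDriver \
  -D mapreduce.map.skip.maxrecords=10 \
  /input /output
```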

What are the components of the architecture of Hive?

  • User Interface: The user submits a query through the UI, which calls the driver's execute interface and creates a session handle for the query. The query is then sent to the compiler to create an execution plan.
  • Metastore: It stores the metadata of tables and partitions and supplies it to the compiler during query compilation.
  • Compiler: It creates the execution plan, which is a DAG of stages in which each stage is a map or reduce job, a metadata operation, or an operation on HDFS.
  • Execution Engine: It bridges the gap between Hadoop and Hive and executes the plan produced by the compiler, communicating bidirectionally with the metastore to perform various tasks.
What are the various schedulers in YARN?

The following schedulers are available in YARN:

  • FIFO Scheduler: The first-in-first-out (FIFO) scheduler places all applications in a single queue and executes them in the order of submission. Because long-running applications can block short ones, the FIFO scheduler is the least efficient and least desirable option.
  • Capacity Scheduler: The capacity scheduler maintains a separate queue, which makes it possible for short jobs to start executing as soon as they are submitted. Unlike with the FIFO scheduler, long-running tasks finish later but cannot block the shorter ones.
  • Fair Scheduler: The fair scheduler, as the name suggests, balances resources dynamically among all running jobs, with no need to reserve a fixed capacity for any of them.
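The active scheduler is selected in yarn-site.xml. For example, switching to the Fair Scheduler can be sketched as follows (the class name is the one shipped with stock Hadoop):

```xml
<!-- yarn-site.xml: select the Fair Scheduler -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```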
Does Hive support multiline comments?

No. Hive does not support multiline comments. As of now, it supports only single-line comments, which begin with --.
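For example, this is the only comment style HiveQL accepts (the table name below is hypothetical):

```sql
-- This single-line style is the only comment syntax Hive supports.
SELECT name
FROM employees;  -- trailing comments on a line also work
```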

What are the commands to restart NameNode and all the daemons in Hadoop?

The following commands can be used to restart the NameNode and all the daemons:

  • The NameNode can be stopped with the ./sbin/hadoop-daemon.sh stop namenode command and started again with the ./sbin/hadoop-daemon.sh start namenode command.
  • All the daemons can be stopped with the ./sbin/stop-all.sh command and started again with the ./sbin/start-all.sh command.