Hortonworks HDPCD Exam Practice Test Instant Access

Question 1

Which process describes the lifecycle of a Mapper?

AThe JobTracker calls the TaskTracker's configure () method, then its map () method and finally its close () method.

BThe TaskTracker spawns a new Mapper to process all records in a single input split.

CThe TaskTracker spawns a new Mapper to process each key-value pair.

DThe JobTracker spawns a new Mapper to process all records in a single file.

Answer : B

For each map instance that runs, the TaskTracker creates a new instance of your mapper.

Note:

* The Mapper is responsible for processing Key/Value pairs obtained from the InputFormat. The mapper may perform a number of Extraction and Transformation functions on the Key/Value pair before ultimately outputting none, one or many Key/Value pairs of the same, or different Key/Value type.

* With the new Hadoop API, mappers extend the org.apache.hadoop.mapreduce.Mapper class. This class defines an 'Identity' map function by default - every input Key/Value pair obtained from the InputFormat is written out.

Examining the run() method, we can see the lifecycle of the mapper:

/**

* Expert users can override this method for more complete control over the

* execution of the Mapper.

* @param context

* @throws IOException

public void run(Context context) throws IOException, InterruptedException {

setup(context);

while (context.nextKeyValue()) {

map(context.getCurrentKey(), context.getCurrentValue(), context);

}

cleanup(context);

}

setup(Context) - Perform any setup for the mapper. The default implementation is a no-op method.

map(Key, Value, Context) - Perform a map operation in the given Key / Value pair. The default implementation calls Context.write(Key, Value)

cleanup(Context) - Perform any cleanup for the mapper. The default implementation is a no-op method.

Question 2

In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?

AIncrease the parameter that controls minimum split size in the job configuration.

BWrite a custom MapRunner that iterates over all key-value pairs in the entire file.

CSet the number of mappers equal to the number of input files you want to process.

DWrite a custom FileInputFormat and override the method isSplitable to always return false.

Answer : D

FileInputFormat is the base class for all file-based InputFormats. This provides a generic implementation of getSplits(JobContext). Subclasses of FileInputFormat can also override the isSplitable(JobContext, Path) method to ensure input-files are not split-up and are processed as a whole by Mappers.

Question 3

A client application creates an HDFS file named foo.txt with a replication factor of 3. Identify which best describes the file access rules in HDFS if the file has a single block that is stored on data nodes A, B and C?

AThe file will be marked as corrupted if data node B fails during the creation of the file.

BEach data node locks the local file to prohibit concurrent readers and writers of the file.

CEach data node stores a copy of the file in the local file system with the same name as the HDFS file.

DThe file can be accessed if at least one of the data nodes storing the file is available.

Answer : D

HDFS keeps three copies of a block on three different datanodes to protect against truedata corruption. HDFS also tries to distribute these three replicas on more than one rack to protect againstdata availabilityissues. The fact that HDFS actively monitors any failed datanode(s) and upon failure detection immediately schedules re-replication of blocks (if needed) implies that three copies of data on three different nodes is sufficient to avoid corrupted files.

Note:

HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file. An application can specify the number of replicas of a file. The replication factor can be specified at file creation time and can be changed later. Files in HDFS are write-once and have strictly one writer at any time. The NameNode makes all decisions regarding replication of blocks. HDFS uses rack-aware replica placement policy. In default configuration there are total 3 copies of a datablock on HDFS, 2 copies are stored on datanodes on same rack and 3rd copy on a different rack.

Question 4

Indentify which best defines a SequenceFile?

AA SequenceFile contains a binary encoding of an arbitrary number of homogeneous Writable objects

BA SequenceFile contains a binary encoding of an arbitrary number of heterogeneous Writable objects

CA SequenceFile contains a binary encoding of an arbitrary number of WritableComparable objects, in sorted order.

DA SequenceFile contains a binary encoding of an arbitrary number key-value pairs. Each key must be the same type. Each value must be the same type.

Answer : D

SequenceFile is a flat file consisting of binary key/value pairs.

There are 3 different SequenceFile formats:

Uncompressed key/value records.

Record compressed key/value records - only 'values' are compressed here.

Block compressed key/value records - both keys and values are collected in 'blocks' separately and compressed. The size of the 'block' is configurable.

Question 5

On a cluster running MapReduce v1 (MRv1), a TaskTracker heartbeats into the JobTracker on your cluster, and alerts the JobTracker it has an open map task slot.

What determines how the JobTracker assigns each map task to a TaskTracker?

AThe amount of RAM installed on the TaskTracker node.

BThe amount of free disk space on the TaskTracker node.

CThe number and speed of CPU cores on the TaskTracker node.

DThe average system load on the TaskTracker node over the past fifteen (15) minutes.

EThe location of the InsputSplit to be processed in relation to the location of the node.

Answer : E

The TaskTrackers send out heartbeat messages to the JobTracker, usually every few minutes, to reassure the JobTracker that it is still alive. These message also inform the JobTracker of the number of available slots, so the JobTracker can stay up to date with where in the cluster work can be delegated. When the JobTracker tries to find somewhere to schedule a task within the MapReduce operations, it first looks for an empty slot on the same server that hosts the DataNode containing the data, and if not, it looks for an empty slot on a machine in the same rack.

Question 6

All keys used for intermediate output from mappers must:

AImplement a splittable compression algorithm.

BBe a subclass of FileInputFormat.

CImplement WritableComparable.

DOverride isSplitable.

EImplement a comparator for speedy sorting.

Answer : C

The MapReduce framework operates exclusively on <key, value> pairs, that is, the framework views the input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job, conceivably of different types.

The key and value classes have to be serializable by the framework and hence need to implement the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework.

Question 7

What data does a Reducer reduce method process?

AAll the data in a single input file.

BAll data produced by a single mapper.

CAll data for a given key, regardless of which mapper(s) produced it.

DAll data for a given value, regardless of which mapper(s) produced it.

Answer : C

Reducing lets you aggregate values together. A reducer function receives an iterator of input values from an input list. It then combines these values together, returning a single output value.

All values with the same key are presented to a single reduce task.

Hortonworks Data Platform Certified Developer Exam Practice Test