Google Professional Data Engineer Exam Practice Test Instant Access

Question 1

If a dataset contains rows with individual people and columns for year of birth, country, and income, how many of the columns are continuous and how many are categorical?

A. 1 continuous and 2 categorical

B. 3 categorical

C. 3 continuous

D. 2 continuous and 1 categorical

Answer : D

The columns can be grouped into two types: categorical and continuous.

A column is called categorical if its value can only be one of the categories in a finite set. For example, the native country of a person (U.S., India, Japan, etc.) or the education level (high school, college, etc.) are categorical columns.

A column is called continuous if its value can be any numerical value in a continuous range. For example, the capital gain of a person (e.g. $14,084) is a continuous column.

Year of birth and income are continuous columns. Country is a categorical column.

You could use bucketization to turn year of birth and/or income into categorical features, but the raw columns are continuous.
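
As a rough illustration of the bucketization mentioned above, the sketch below maps a raw year of birth to a bucket index. The boundary values and the `bucketize` helper are hypothetical choices made for this example, not part of the exam material.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries for year of birth (assumed for illustration).
YEAR_BOUNDARIES = [1960, 1980, 2000]


def bucketize(value, boundaries):
    """Return the index of the bucket that `value` falls into.

    Bucket 0 covers values below boundaries[0], bucket i covers
    [boundaries[i-1], boundaries[i]), and the last bucket covers
    values at or above boundaries[-1].
    """
    return bisect_right(boundaries, value)


years = [1955, 1960, 1979, 1984, 2003]
buckets = [bucketize(y, YEAR_BOUNDARIES) for y in years]
print(buckets)  # [0, 1, 1, 2, 3]
```

After this step each year is one of four bucket ids, so the column can be treated as categorical, while the original column remains continuous.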

Question 2

You are building a new data pipeline to share data between two different types of applications: job generators and job runners. Your solution must scale to accommodate increases in usage and must accommodate the addition of new applications without negatively affecting the performance of existing ones. What should you do?

A. Create an API using App Engine to receive and send messages to the applications

B. Use a Cloud Pub/Sub topic to publish jobs, and use subscriptions to execute them

C. Create a table on Cloud SQL, and insert and delete rows with the job information

D. Create a table on Cloud Spanner, and insert and delete rows with the job information

Answer : B
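
The decoupling that a topic with subscriptions provides (option B) can be illustrated with a toy in-memory model. This is only a sketch: the `Topic` class and the subscription names are hypothetical, and it does not use the real Cloud Pub/Sub client library. It relies on one property of Pub/Sub, namely that each subscription receives its own copy of every message published after the subscription was created.

```python
from collections import deque


class Topic:
    """Toy topic: every subscription gets its own queue of messages."""

    def __init__(self):
        self._queues = {}

    def subscribe(self, name):
        # A new subscription starts empty and does not touch existing ones.
        self._queues[name] = deque()

    def publish(self, message):
        # Fan out: one copy of the message per subscription.
        for queue in self._queues.values():
            queue.append(message)

    def pull(self, name):
        queue = self._queues[name]
        return queue.popleft() if queue else None


topic = Topic()
topic.subscribe("job-runners")   # existing application (hypothetical name)
topic.publish("job-1")
topic.subscribe("job-auditors")  # new application added later
topic.publish("job-2")

runner_jobs = [topic.pull("job-runners"), topic.pull("job-runners")]
auditor_jobs = [topic.pull("job-auditors"), topic.pull("job-auditors")]
print(runner_jobs)   # ['job-1', 'job-2']
print(auditor_jobs)  # ['job-2', None]
```

The job generator only ever calls `publish`; adding the second subscription changes nothing for the first one, which is the property the question asks for.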

Question 3

You have data stored in BigQuery. The data in the BigQuery dataset must be highly available. You need to define a storage, backup, and recovery strategy of this data that minimizes cost. How should you configure the BigQuery table?

A. Set the BigQuery dataset to be regional. In the event of an emergency, use a point-in-time snapshot to recover the data.

B. Set the BigQuery dataset to be regional. Create a scheduled query to make copies of the data to tables suffixed with the time of the backup. In the event of an emergency, use the backup copy of the table.

C. Set the BigQuery dataset to be multi-regional. In the event of an emergency, use a point-in-time snapshot to recover the data.

D. Set the BigQuery dataset to be multi-regional. Create a scheduled query to make copies of the data to tables suffixed with the time of the backup. In the event of an emergency, use the backup copy of the table.

Answer : B

Question 4

You recently deployed several data processing jobs into your Cloud Composer 2 environment. You notice that some tasks are failing in Apache Airflow. On the monitoring dashboard, you see an increase in the total workers' memory usage, and there were worker pod evictions. You need to resolve these errors. What should you do?

Choose 2 answers

A. Increase the directed acyclic graph (DAG) file parsing interval.

B. Increase the memory available to the Airflow workers.

C. Increase the maximum number of workers and reduce worker concurrency.

D. Increase the memory available to the Airflow triggerer.

E. Increase the Cloud Composer 2 environment size from medium to large.

Answer : B, C

To resolve issues related to increased memory usage and worker pod evictions in your Cloud Composer 2 environment, the following steps are recommended:

Increase Memory Available to Airflow Workers:

By increasing the memory allocated to Airflow workers, you can handle more memory-intensive tasks, reducing the likelihood of pod evictions due to memory limits.

Increase Maximum Number of Workers and Reduce Worker Concurrency:

Increasing the number of workers allows the workload to be distributed across more pods, preventing any single pod from becoming overwhelmed.

Reducing worker concurrency limits the number of tasks that each worker can handle simultaneously, thereby lowering the memory consumption per worker.

Steps to Implement:

Increase Worker Memory:

Modify the configuration settings in Cloud Composer to allocate more memory to Airflow workers. This can be done through the environment configuration settings.

Adjust Worker and Concurrency Settings:

Increase the maximum number of workers in the Cloud Composer environment settings.

Reduce the concurrency setting for Airflow workers to ensure that each worker handles fewer tasks at a time, thus consuming less memory per worker.
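
The trade-off between worker memory, worker count, and concurrency can be checked with some back-of-the-envelope arithmetic. The sketch below uses a deliberately simple model (peak memory = concurrency × per-task memory + fixed overhead); all numbers and function names are assumptions for illustration, and real task memory usage varies.

```python
def peak_worker_memory_gb(worker_concurrency, task_memory_gb, overhead_gb=0.5):
    """Estimated peak memory of one worker when all task slots are busy."""
    return worker_concurrency * task_memory_gb + overhead_gb


def fits(worker_memory_gb, worker_concurrency, task_memory_gb):
    """True if the estimated peak stays within the worker's memory limit."""
    return peak_worker_memory_gb(worker_concurrency, task_memory_gb) <= worker_memory_gb


TASK_MEMORY_GB = 0.5  # hypothetical average memory per running task

# Baseline: 3 workers, 4 GB each, 12 concurrent tasks per worker.
baseline_ok = fits(4, 12, TASK_MEMORY_GB)  # 6.5 GB needed, 4 GB available

# Option B: give each worker more memory (4 GB to 8 GB).
more_memory_ok = fits(8, 12, TASK_MEMORY_GB)

# Option C: halve concurrency and double the worker count,
# so the total number of task slots stays at 36.
lower_concurrency_ok = fits(4, 6, TASK_MEMORY_GB)
total_slots_before = 3 * 12
total_slots_after = 6 * 6

print(baseline_ok, more_memory_ok, lower_concurrency_ok)  # False True True
```

Under these assumed numbers, either change brings the estimated peak below the per-worker limit, and option C does so without reducing total task throughput.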

Question 5

You want to use Google Stackdriver Logging to monitor Google BigQuery usage. You need an instant notification to be sent to your monitoring tool when new data is appended to a certain table using an insert job, but you do not want to receive notifications for other tables. What should you do?

A. Make a call to the Stackdriver API to list all logs, and apply an advanced filter.

B. In the Stackdriver logging admin interface, enable a log sink export to BigQuery.

C. In the Stackdriver logging admin interface, enable a log sink export to Google Cloud Pub/Sub, and subscribe to the topic from your monitoring tool.

D. Using the Stackdriver API, create a project sink with advanced log filter to export to Pub/Sub, and subscribe to the topic from your monitoring tool.

Answer : D

Question 6

You are planning to migrate your current on-premises Apache Hadoop deployment to the cloud. You need to ensure that the deployment is as fault-tolerant and cost-effective as possible for long-running batch jobs. You want to use a managed service. What should you do?

A. Deploy a Cloud Dataproc cluster. Use a standard persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://

B. Deploy a Cloud Dataproc cluster. Use an SSD persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://

C. Install Hadoop and Spark on a 10-node Compute Engine instance group with standard instances. Install the Cloud Storage connector, and store the data in Cloud Storage. Change references in scripts from hdfs:// to gs://

D. Install Hadoop and Spark on a 10-node Compute Engine instance group with preemptible instances. Store data in HDFS. Change references in scripts from hdfs:// to gs://

Answer : A
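
The "change references in scripts from hdfs:// to gs://" step can be sketched as a simple text rewrite. This is a minimal example under stated assumptions: the `to_gcs` helper and the bucket name `example-bucket` are hypothetical, and it assumes HDFS paths map one-to-one onto paths under the bucket root.

```python
import re

# Matches "hdfs://", an optional namenode host[:port], and the first "/" of the path.
_HDFS_PREFIX = re.compile(r"hdfs://[^/\s]*/")


def to_gcs(text, bucket):
    """Rewrite every hdfs:// URI in `text` to a gs:// URI in `bucket`."""
    return _HDFS_PREFIX.sub(lambda match: f"gs://{bucket}/", text)


script = 'df = spark.read.csv("hdfs://namenode:8020/data/in.csv")'
rewritten = to_gcs(script, "example-bucket")
print(rewritten)
# df = spark.read.csv("gs://example-bucket/data/in.csv")

print(to_gcs("hdfs:///logs/out", "example-bucket"))
# gs://example-bucket/logs/out
```

In practice the rewritten scripts should still be reviewed by hand, since paths built dynamically at runtime will not be caught by a text substitution.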

Question 7

Flowlogistic's management has determined that the current Apache Kafka servers cannot handle the data volume for their real-time inventory tracking system. You need to build a new system on Google Cloud Platform (GCP) that will feed the proprietary tracking software. The system must be able to ingest data from a variety of global sources, process and query in real-time, and store the data reliably. Which combination of GCP products should you choose?

A. Cloud Pub/Sub, Cloud Dataflow, and Cloud Storage

B. Cloud Pub/Sub, Cloud Dataflow, and Local SSD

C. Cloud Pub/Sub, Cloud SQL, and Cloud Storage

D. Cloud Load Balancing, Cloud Dataflow, and Cloud Storage

Answer : A
