Google Cloud Certified Professional Data Engineer Exam Practice Test

Page: 1 / 14
Total 375 questions
Question 1

Your car factory is pushing machine measurements as messages into a Pub/Sub topic in your Google Cloud project. A Dataflow streaming job that you wrote with the Apache Beam SDK reads these messages, sends an acknowledgment to Pub/Sub, applies some custom business logic in a DoFn instance, and writes the result to BigQuery. You want to ensure that if your business logic fails on a message, the message will be sent to a Pub/Sub topic that you want to monitor for alerting purposes. What should you do?



Answer : C

To ensure that messages failing to process in your Dataflow job are sent to a Pub/Sub topic for monitoring and alerting, the best approach is to use Pub/Sub's dead-letter topic feature. Here's why option C is the best choice:

Dead-Letter Topic:

Pub/Sub's dead-letter topic feature allows messages that fail to be processed successfully to be redirected to a specified topic. This ensures that these messages are not lost and can be reviewed for debugging and alerting purposes.

Monitoring and Alerting:

By specifying a new Pub/Sub topic as the dead-letter topic, you can use Cloud Monitoring to track metrics such as subscription/dead_letter_message_count, providing visibility into the number of failed messages.

This allows you to set up alerts based on these metrics to notify the appropriate teams when failures occur.

Steps to Implement:

Enable Dead-Letter Topic:

Configure your Pub/Sub pull subscription to enable dead lettering and specify the new Pub/Sub topic for dead-letter messages.

Set Up Monitoring:

Use Cloud Monitoring to monitor the subscription/dead_letter_message_count metric on your pull subscription.

Configure alerts based on this metric to notify the team of any processing failures.
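
The sketch below illustrates the first step with the google-cloud-pubsub Python client, attaching a dead-letter policy to an existing pull subscription. The project, topic, and subscription IDs are placeholders.

```python
# Minimal sketch using the google-cloud-pubsub client. The project,
# subscription, and topic IDs below are placeholders.
from google.cloud import pubsub_v1
from google.protobuf import field_mask_pb2

project_id = "my-project"
topic_id = "machine-measurements"             # existing topic the factory publishes to
subscription_id = "machine-measurements-sub"  # pull subscription read by Dataflow
dead_letter_topic_id = "failed-measurements"  # new topic to monitor for alerting

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()

topic_path = publisher.topic_path(project_id, topic_id)
subscription_path = subscriber.subscription_path(project_id, subscription_id)
dead_letter_topic_path = publisher.topic_path(project_id, dead_letter_topic_id)

# Describe the subscription as it should look after the update: same name and
# topic, plus a dead-letter policy pointing at the monitoring topic.
subscription = pubsub_v1.types.Subscription(
    name=subscription_path,
    topic=topic_path,
    dead_letter_policy=pubsub_v1.types.DeadLetterPolicy(
        dead_letter_topic=dead_letter_topic_path,
        max_delivery_attempts=5,  # redeliver up to 5 times before dead-lettering
    ),
)

# Only the dead_letter_policy field is updated.
subscriber.update_subscription(
    request={
        "subscription": subscription,
        "update_mask": field_mask_pb2.FieldMask(paths=["dead_letter_policy"]),
    }
)
```

Note that the Pub/Sub service agent also needs permission to publish to the dead-letter topic and to subscribe to the source subscription; once dead-lettered messages start arriving, an alerting policy on the subscription/dead_letter_message_count metric notifies the team.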


Pub/Sub Dead Letter Policy

Cloud Monitoring with Pub/Sub

Question 2

You are designing a data mesh on Google Cloud by using Dataplex to manage data in BigQuery and Cloud Storage. You want to simplify data asset permissions. You are creating a customer virtual lake with two user groups:

* Data engineers, who require full data lake access

* Analytic users, who require access to curated data

You need to assign access rights to these two groups. What should you do?



Answer : A

When designing a data mesh on Google Cloud using Dataplex to manage data in BigQuery and Cloud Storage, it is essential to simplify data asset permissions while ensuring that each user group has the appropriate access levels. Here's why option A is the best choice:

Data Engineer Group:

Data engineers require full access to the data lake to manage and operate data assets comprehensively. Granting the dataplex.dataOwner role to the data engineer group on the customer data lake ensures they have the necessary permissions to create, modify, and delete data assets within the lake.

Analytic User Group:

Analytic users need access to curated data but do not require full control over all data assets. Granting the dataplex.dataReader role to the analytic user group on the customer curated zone provides read-only access to the curated data, enabling them to analyze the data without the ability to modify or delete it.

Steps to Implement:

Grant Data Engineer Permissions:

Assign the dataplex.dataOwner role to the data engineer group on the customer data lake to ensure full access and management capabilities.

Grant Analytic User Permissions:

Assign the dataplex.dataReader role to the analytic user group on the customer curated zone to provide read-only access to curated data.
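
A hedged sketch of these grants with the google-cloud-dataplex Python client is shown below, assuming the client exposes the standard get_iam_policy/set_iam_policy methods. The resource names and group emails are placeholders; the same bindings can be applied in the console or with gcloud.

```python
# Hedged sketch using the google-cloud-dataplex client, assuming it exposes the
# standard get_iam_policy/set_iam_policy methods. Resource names and group
# emails are placeholders.
from google.cloud import dataplex_v1
from google.iam.v1 import iam_policy_pb2, policy_pb2

client = dataplex_v1.DataplexServiceClient()

lake = "projects/my-project/locations/us-central1/lakes/customer-lake"
curated_zone = f"{lake}/zones/customer-curated-zone"


def add_binding(resource: str, role: str, member: str) -> None:
    """Read-modify-write the IAM policy on a Dataplex lake or zone."""
    policy = client.get_iam_policy(
        request=iam_policy_pb2.GetIamPolicyRequest(resource=resource)
    )
    policy.bindings.append(policy_pb2.Binding(role=role, members=[member]))
    client.set_iam_policy(
        request=iam_policy_pb2.SetIamPolicyRequest(resource=resource, policy=policy)
    )


# Full lake access for the data engineer group.
add_binding(lake, "roles/dataplex.dataOwner", "group:data-engineers@example.com")

# Read-only access to the curated zone for the analytic user group.
add_binding(curated_zone, "roles/dataplex.dataReader", "group:analytic-users@example.com")
```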


Dataplex IAM Roles and Permissions

Managing Access in Dataplex

Question 3

You want to encrypt the customer data stored in BigQuery. You need to implement per-user crypto-deletion on data stored in your tables. You want to adopt native features in Google Cloud to avoid custom solutions. What should you do?



Answer : A

To implement per-user crypto-deletion and encrypt customer data stored in BigQuery using native Google Cloud features, the best approach is to use customer-managed encryption keys (CMEK) with Cloud Key Management Service (KMS). Here's why:

Customer-Managed Encryption Keys (CMEK):

CMEK allows you to manage your own encryption keys using Cloud KMS. These keys provide additional control over data access and encryption management.

Associating a CMEK with a BigQuery table ensures that data is encrypted with a key you manage.

Per-User Crypto-Deletion:

Per-user crypto-deletion can be achieved by encrypting each user's data with its own CMEK. Once that key is disabled or destroyed, the data encrypted with it can no longer be decrypted, effectively rendering that user's data unreadable while leaving other users' data intact.

Native Integration:

Using CMEK with BigQuery is a native feature, avoiding the need for custom encryption solutions. This simplifies the management and implementation of encryption and decryption processes.

Steps to Implement:

Create a CMEK in Cloud KMS:

Set up a new customer-managed encryption key in Cloud KMS.

Associate the CMEK with BigQuery Tables:

When creating a new table in BigQuery, specify the CMEK to be used for encryption.

This can be done through the BigQuery console, CLI, or API.
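
For illustration, the sketch below uses the google-cloud-bigquery Python client to create a table protected by a CMEK; the project, dataset, table, schema, and key names are placeholders.

```python
# Minimal sketch using the google-cloud-bigquery client. Project, dataset,
# table, schema, and key names are placeholders. The BigQuery service account
# must hold the Cloud KMS CryptoKey Encrypter/Decrypter role on the key.
from google.cloud import bigquery

client = bigquery.Client()

# A user-specific key created in Cloud KMS. Using one key per customer enables
# per-user crypto-deletion: destroying the key affects only that customer's data.
kms_key_name = (
    "projects/my-project/locations/us/keyRings/bq-keyring/cryptoKeys/customer-1234"
)

table = bigquery.Table(
    "my-project.customer_data.orders_customer_1234",
    schema=[
        bigquery.SchemaField("order_id", "STRING"),
        bigquery.SchemaField("order_total", "NUMERIC"),
    ],
)
table.encryption_configuration = bigquery.EncryptionConfiguration(
    kms_key_name=kms_key_name
)

client.create_table(table)  # the table's data is now encrypted with the CMEK
```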


BigQuery and CMEK

Cloud KMS Documentation

Encrypting Data in BigQuery

Question 4

You recently deployed several data processing jobs into your Cloud Composer 2 environment. You notice that some tasks are failing in Apache Airflow. On the monitoring dashboard, you see an increase in the total workers' memory usage, and there were worker pod evictions. You need to resolve these errors. What should you do?

Choose 2 answers



Answer : B, C

To resolve issues related to increased memory usage and worker pod evictions in your Cloud Composer 2 environment, the following steps are recommended:

Increase Memory Available to Airflow Workers:

By increasing the memory allocated to Airflow workers, you can handle more memory-intensive tasks, reducing the likelihood of pod evictions due to memory limits.

Increase Maximum Number of Workers and Reduce Worker Concurrency:

Increasing the number of workers allows the workload to be distributed across more pods, preventing any single pod from becoming overwhelmed.

Reducing worker concurrency limits the number of tasks that each worker can handle simultaneously, thereby lowering the memory consumption per worker.

Steps to Implement:

Increase Worker Memory:

Modify the configuration settings in Cloud Composer to allocate more memory to Airflow workers. This can be done through the environment configuration settings.

Adjust Worker and Concurrency Settings:

Increase the maximum number of workers in the Cloud Composer environment settings.

Reduce the concurrency setting for Airflow workers to ensure that each worker handles fewer tasks at a time, thus consuming less memory per worker.
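
A hedged sketch of such an update with the Cloud Composer API Python client (google-cloud-orchestration-airflow) is shown below; the environment name, resource sizes, and update-mask path are assumptions and may need adjusting for your environment.

```python
# Hedged sketch using the Cloud Composer API client
# (google-cloud-orchestration-airflow). The environment name, resource sizes,
# and update-mask path are assumptions.
from google.cloud.orchestration.airflow import service_v1
from google.protobuf import field_mask_pb2

client = service_v1.EnvironmentsClient()
env_name = "projects/my-project/locations/us-central1/environments/data-pipelines"

environment = service_v1.Environment(
    name=env_name,
    config=service_v1.EnvironmentConfig(
        workloads_config=service_v1.WorkloadsConfig(
            worker=service_v1.WorkloadsConfig.WorkerResource(
                cpu=2,
                memory_gb=8,   # more memory per Airflow worker
                storage_gb=10,
                min_count=2,
                max_count=6,   # allow more workers to share the load
            )
        )
    ),
)

operation = client.update_environment(
    name=env_name,
    environment=environment,
    update_mask=field_mask_pb2.FieldMask(paths=["config.workloads_config.worker"]),
)
operation.result()  # blocks until the environment update finishes

# Worker concurrency is lowered separately by overriding the
# [celery] worker_concurrency Airflow configuration option, for example through
# software_config.airflow_config_overrides in another environment update.
```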


Cloud Composer Worker Configuration

Scaling Airflow Workers

Question 5

You are architecting a data transformation solution for BigQuery. Your developers are proficient with SQL and want to use the ELT development technique. In addition, your developers need an intuitive coding environment and the ability to manage SQL as code. You need to identify a solution for your developers to build these pipelines. What should you do?



Answer : C

To architect a data transformation solution for BigQuery that aligns with the ELT development technique and provides an intuitive coding environment for SQL-proficient developers, Dataform is an optimal choice. Here's why:

ELT Development Technique:

ELT (Extract, Load, Transform) is a process where data is first extracted and loaded into a data warehouse, and then transformed using SQL queries. This is different from ETL, where data is transformed before being loaded into the data warehouse.

BigQuery supports ELT, allowing developers to write SQL transformations directly in the data warehouse.

Dataform:

Dataform is a development environment designed specifically for data transformations in BigQuery and other SQL-based warehouses.

It provides tools for managing SQL as code, including version control and collaborative development.

Dataform integrates well with existing development workflows and supports scheduling and managing SQL-based data pipelines.

Intuitive Coding Environment:

Dataform offers an intuitive and user-friendly interface for writing and managing SQL queries.

It includes features like SQLX, a SQL dialect that extends standard SQL with features for modularity and reusability, which simplifies the development of complex transformation logic.

Managing SQL as Code:

Dataform supports version control systems like Git, enabling developers to manage their SQL transformations as code.

This allows for better collaboration, code reviews, and version tracking.


Dataform Documentation

BigQuery Documentation

Managing ELT Pipelines with Dataform

Question 6

You have important legal hold documents in a Cloud Storage bucket. You need to ensure that these documents are not deleted or modified. What should you do?



Answer : A

To ensure that important legal hold documents in a Cloud Storage bucket are not deleted or modified, the most effective method is to set and lock a retention policy. Here's why this is the best choice:

Retention Policy:

A retention policy defines a retention period during which objects in the bucket cannot be deleted or modified. This ensures data immutability.

Once a retention policy is set and locked, it cannot be removed or reduced, providing strong protection against accidental or malicious deletions.

Locking the Retention Policy:

Locking a retention policy ensures that the retention period cannot be changed. This action is permanent and guarantees that the specified retention period will be enforced.

Steps to Implement:

Set the Retention Policy:

Define a retention period for the bucket to ensure that all objects are protected for the required duration.

Lock the Retention Policy:

Lock the retention policy to prevent any modifications, ensuring the immutability of the documents.
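
The sketch below shows both steps with the google-cloud-storage Python client; the bucket name and retention length are placeholders.

```python
# Minimal sketch using the google-cloud-storage client. The bucket name and
# retention length are placeholders.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("legal-hold-documents")

# Step 1: set a retention period (in seconds); here, 10 years.
bucket.retention_period = 10 * 365 * 24 * 60 * 60
bucket.patch()

# Step 2: lock the policy. This is irreversible: the retention period can no
# longer be removed or shortened, so objects cannot be deleted or overwritten
# until the retention period expires.
bucket.lock_retention_policy()
```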


Cloud Storage Retention Policy Documentation

How to Set a Retention Policy

Question 7

Your company operates in three domains: airlines, hotels, and ride-hailing services. Each domain has two teams: analytics and data science, which create data assets in BigQuery with the help of a central data platform team. However, as each domain is evolving rapidly, the central data platform team is becoming a bottleneck. This is causing delays in deriving insights from data, and resulting in stale data when pipelines are not kept up to date. You need to design a data mesh architecture by using Dataplex to eliminate the bottleneck. What should you do?



Answer : B

To design a data mesh architecture using Dataplex to eliminate bottlenecks caused by a central data platform team, consider the following:

Data Mesh Architecture:

Data mesh promotes a decentralized approach where domain teams manage their own data pipelines and assets, increasing agility and reducing bottlenecks.

Dataplex Lakes and Zones:

Lakes in Dataplex are logical containers for managing data at scale, and zones are subdivisions within lakes for organizing data based on domains, teams, or other criteria.

Domain and Team Management:

By creating a lake for each team and zones for each domain, each team can independently manage their data assets without relying on the central data platform team.

This setup aligns with the principles of data mesh, promoting ownership and reducing delays in data processing and insights.

Implementation Steps:

Create Lakes and Zones:

Create separate lakes in Dataplex for each team (analytics and data science).

Within each lake, create zones for the different domains (airlines, hotels, ride-hailing).

Attach BigQuery Datasets:

Attach the BigQuery datasets created by the respective teams as assets to their corresponding zones.

Decentralized Management:

Allow each domain to manage their own zone's data assets, providing them with the autonomy to update and maintain their pipelines without depending on the central team.
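
A simplified sketch of this layout for one team and one domain, using the google-cloud-dataplex Python client, is shown below; the project, location, IDs, and dataset name are placeholders, and the same pattern is repeated for the other teams and domains.

```python
# Simplified sketch for one team (analytics) and one domain (airlines), using
# the google-cloud-dataplex client. Project, location, IDs, and the dataset
# name are placeholders.
from google.cloud import dataplex_v1

client = dataplex_v1.DataplexServiceClient()
parent = "projects/my-project/locations/us-central1"

# One lake per team.
lake = client.create_lake(
    parent=parent,
    lake_id="analytics",
    lake=dataplex_v1.Lake(display_name="Analytics team"),
).result()

# One zone per domain inside the team's lake.
zone = client.create_zone(
    parent=lake.name,
    zone_id="airlines",
    zone=dataplex_v1.Zone(
        type_=dataplex_v1.Zone.Type.CURATED,
        resource_spec=dataplex_v1.Zone.ResourceSpec(
            location_type=dataplex_v1.Zone.ResourceSpec.LocationType.SINGLE_REGION
        ),
    ),
).result()

# Attach the team's existing BigQuery dataset to the zone as an asset.
client.create_asset(
    parent=zone.name,
    asset_id="airlines-analytics",
    asset=dataplex_v1.Asset(
        resource_spec=dataplex_v1.Asset.ResourceSpec(
            type_=dataplex_v1.Asset.ResourceSpec.Type.BIGQUERY_DATASET,
            name="projects/my-project/datasets/airlines_analytics",
        )
    ),
).result()
```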


Dataplex Documentation

BigQuery Documentation

Data Mesh Principles
