A new data engineering team, the group team, has been assigned to an ELT project. The team will need full privileges on the table sales to fully manage the project.
Which command can be used to grant full permissions on the table to the new data engineering team?
Answer : A
To grant full privileges on a table such as 'sales' to a group like 'team', the correct SQL command in Databricks is:
GRANT ALL PRIVILEGES ON TABLE sales TO team;
This command assigns every privilege that applies to the table, such as SELECT (reading data) and MODIFY (adding, updating, or deleting data), to the specified group. This is typically necessary when a team needs full control over a table to manage and manipulate it as part of a project or ongoing maintenance.
Reference: Databricks documentation on SQL permissions: SQL Permissions in Databricks
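The same grant can also be issued from a Python notebook cell. The sketch below is a hypothetical helper, grant_full_access, not a Databricks API. It assumes spark is an existing SparkSession whose user is allowed to grant on the table. It runs the statement above with the principal quoted in backticks (the usual way to write a principal name containing characters such as @ or -) and then returns the output of SHOW GRANTS ON TABLE so the result can be checked.

```python
def grant_full_access(spark, table_name, principal):
    """Grant all applicable privileges on one table to a principal, then return its grants.

    spark is an existing SparkSession; this helper and its name are hypothetical.
    """
    # Principal names are identifiers: wrap them in backticks, doubling any embedded backtick.
    quoted_principal = "`" + principal.replace("`", "``") + "`"
    spark.sql(f"GRANT ALL PRIVILEGES ON TABLE {table_name} TO {quoted_principal}")
    # SHOW GRANTS lists the privileges now held on the table, which is useful for verification.
    return spark.sql(f"SHOW GRANTS ON TABLE {table_name}")
```

For example, grant_full_access(spark, "sales", "team").show() would issue GRANT ALL PRIVILEGES ON TABLE sales TO `team` and display the table's current grants.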
Which query is performing a streaming hop from raw data to a Bronze table?
Options A) to D): code snippets not reproduced here; each option is summarized in the explanation below.
Answer : D
The query performing a streaming hop from raw data to a Bronze table reads the raw source with spark.readStream and writes it to a Bronze table with .writeStream, applying little or no transformation in between. Let's analyze the options:
Option A: Uses .writeStream, but performs a complete aggregation, which is characteristic of a roll-up into a summary table rather than a hop into a Bronze table.
Option B: Also uses .writeStream, but calculates an average. A raw-to-Bronze hop usually applies minimal transformation, so an aggregation like this does not fit.
Option C: Uses a batch .write with .mode('append'). This is not a streaming operation, so it cannot be a streaming hop into a Bronze table.
Option D: Uses spark.readStream.load() to ingest raw data as a stream and then writes it out with .writeStream. This is the typical pattern for streaming data into a Bronze table: raw data is captured incrementally as it arrives and only minimal transformation is applied. It matches the role of the Bronze layer in the medallion architecture, where raw data is ingested continuously and stored in a queryable table, typically in Delta Lake format.
Reference: Databricks documentation on Structured Streaming: Structured Streaming in Databricks
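The raw-to-Bronze pattern can be sketched in PySpark as follows. This is a minimal sketch, not the code from the exam option: the function name raw_to_bronze, the JSON source format, and the explicit schema argument are assumptions, spark is expected to be an existing SparkSession, and DataStreamWriter.toTable requires Spark 3.1 or later.

```python
def raw_to_bronze(spark, raw_path, schema, checkpoint_path, bronze_table):
    """Stream raw JSON files from raw_path into the Bronze table bronze_table.

    spark is an existing SparkSession; schema is a StructType or a DDL string.
    """
    # Read the raw files incrementally as a stream. File sources need an explicit
    # schema unless streaming schema inference has been enabled.
    raw_df = (
        spark.readStream
        .format("json")
        .schema(schema)
        .load(raw_path)
    )
    # Write the records unchanged: a raw-to-Bronze hop applies little or no transformation.
    # The checkpoint records progress so a restarted query resumes where it stopped.
    return (
        raw_df.writeStream
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")
        .toTable(bronze_table)
    )
```

The function returns the running StreamingQuery. On Databricks, Auto Loader (format("cloudFiles")) is a common alternative source for the same hop.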
Which file format is used for storing Delta Lake tables?
Answer : A
Delta Lake tables use the Parquet format as their underlying storage format. Delta Lake enhances Parquet by adding a transaction log that keeps track of all the operations performed on the table. This allows features like ACID transactions, scalable metadata handling, and schema enforcement, making it an ideal choice for big data processing and management in environments like Databricks.
Reference: Databricks documentation on Delta Lake: Delta Lake Overview
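To make "Parquet plus a transaction log" concrete, here is a minimal Python sketch that reads a Delta table's _delta_log directory directly. It assumes the commit layout of the open Delta protocol (one newline-delimited JSON file per table version, with add and remove actions naming Parquet data files) and deliberately ignores checkpoint files, so it is an illustration rather than a replacement for a Delta reader. The function name live_parquet_files is hypothetical.

```python
import json
import os
import re

# Commit files are named by their zero-padded 20-digit version, e.g. 00000000000000000000.json,
# so sorting the names lexically also sorts them by version.
COMMIT_FILE = re.compile(r"\d{20}\.json")


def live_parquet_files(table_path):
    """Return the Parquet data files in the current version of a Delta table.

    Simplified sketch: replays every JSON commit in <table_path>/_delta_log and
    ignores checkpoint files, so it assumes all commits are still present.
    """
    log_dir = os.path.join(table_path, "_delta_log")
    commits = sorted(name for name in os.listdir(log_dir) if COMMIT_FILE.fullmatch(name))
    live = set()
    for name in commits:
        with open(os.path.join(log_dir, name)) as fh:
            # Each non-empty line of a commit file is one JSON-encoded action.
            for line in fh:
                if not line.strip():
                    continue
                action = json.loads(line)
                if "add" in action:        # a Parquet file became part of the table
                    live.add(action["add"]["path"])
                elif "remove" in action:   # a Parquet file was logically deleted
                    live.discard(action["remove"]["path"])
                # Other actions (metaData, protocol, commitInfo) do not change the file set.
    return live
```

Files that a commit removes (for example through an overwrite or DELETE) stay on disk until VACUUM cleans them up, which is why the log, not a directory listing, defines what the table contains.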
Which of the following describes the type of workloads that are always compatible with Auto Loader?
A data engineer has created a new database using the following command:
CREATE DATABASE IF NOT EXISTS customer360;
In which of the following locations will the customer360 database be located?
Answer : B
dbfs:/user/hive/warehouse (the database directory itself is dbfs:/user/hive/warehouse/customer360.db)
The location of the customer360 database depends on the value of the spark.sql.warehouse.dir configuration property, which specifies the default root location for managed databases and tables. If the property has not been changed, its value is dbfs:/user/hive/warehouse. Because the CREATE DATABASE command has no LOCATION clause, the customer360 database is created under this root, in dbfs:/user/hive/warehouse/customer360.db, which is why Option B is the answer. Only if the property were set to a different value, such as dbfs:/user/hive/database, would the database be located elsewhere (in that example, dbfs:/user/hive/database/customer360.db).
Option A is not correct: dbfs:/user/hive/database/customer360 is not the default location for managed databases and tables, and even if the spark.sql.warehouse.dir property were set to dbfs:/user/hive/database, the database directory would be named customer360.db.
Option B is correct: dbfs:/user/hive/warehouse is the default root directory for managed databases and tables, and the database is stored beneath it in a directory named after the database with a .db suffix, dbfs:/user/hive/warehouse/customer360.db.
Option C is not correct, as dbfs:/user/hive/customer360 is not where a managed database is placed by default: it does not follow the directory structure derived from the spark.sql.warehouse.dir property.
[Databricks Data Engineer Professional Exam Guide]
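The location rule described above reduces to a small path computation. This is a hypothetical helper, not a Databricks API: it assumes the database is created without a LOCATION clause and that warehouse_dir holds the value of spark.sql.warehouse.dir.

```python
# Default value of spark.sql.warehouse.dir described above.
DEFAULT_WAREHOUSE_DIR = "dbfs:/user/hive/warehouse"


def managed_database_location(db_name, warehouse_dir=DEFAULT_WAREHOUSE_DIR):
    """Directory a managed database gets when CREATE DATABASE has no LOCATION clause."""
    # Each database gets a "<name>.db" subdirectory under the warehouse root;
    # strip a trailing slash so the root and the subdirectory join cleanly.
    return f"{warehouse_dir.rstrip('/')}/{db_name}.db"


print(managed_database_location("customer360"))
# prints dbfs:/user/hive/warehouse/customer360.db
print(managed_database_location("customer360", "dbfs:/user/hive/database"))
# prints dbfs:/user/hive/database/customer360.db
```

To check the actual location in a notebook instead of reasoning from defaults, DESCRIBE DATABASE EXTENDED customer360 reports the database's location.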
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.
The code block used by the data engineer (not reproduced here) contains a blank to be filled in.
If the data engineer only wants the query to process all of the available data in as many batches as required, which of the following lines of code should the data engineer use to fill in the blank?
A data engineer has a Python variable table_name that they would like to use in a SQL query. They want to construct a Python code block that will run the query using table_name.
They have the following incomplete code block:
____(f"SELECT customer_id, spend FROM {table_name}")
Which of the following can be used to fill in the blank to successfully complete the task?