Two queries are run on the customer_address table:
create or replace TABLE CUSTOMER_ADDRESS (
    CA_ADDRESS_SK NUMBER(38,0),
    CA_ADDRESS_ID VARCHAR(16),
    CA_STREET_NUMBER VARCHAR(10),
    CA_STREET_NAME VARCHAR(60),
    CA_STREET_TYPE VARCHAR(15),
    CA_SUITE_NUMBER VARCHAR(10),
    CA_CITY VARCHAR(60),
    CA_COUNTY VARCHAR(30),
    CA_STATE VARCHAR(2),
    CA_ZIP VARCHAR(10),
    CA_COUNTRY VARCHAR(20),
    CA_GMT_OFFSET NUMBER(5,2),
    CA_LOCATION_TYPE VARCHAR(20)
);
ALTER TABLE DEMO_DB.DEMO_SCH.CUSTOMER_ADDRESS ADD SEARCH OPTIMIZATION ON SUBSTRING(CA_ADDRESS_ID);
Which queries will benefit from the use of the search optimization service? (Select TWO).
Answer : A, B
The search optimization service in Snowflake is particularly effective when query predicates match exact substrings or the beginning of a string. The ALTER TABLE command adds search optimization specifically for substrings of the CA_ADDRESS_ID column, so the service builds an optimized search access path for queries that use substring matches on that column.
Option A benefits because it matches a substring at the start of CA_ADDRESS_ID, aligning with the optimization's ability to quickly locate records based on the leading segment of a string.
Option B also benefits: although it performs a full equality check on CA_ADDRESS_ID, that comparison can still use the substring-based search access path for efficient retrieval. Options C, D, and E involve patterns that do not start at the beginning of the string or that use negation, which are not accelerated by a search optimization configuration built for leading-substring matches. Reference: Snowflake's documentation on using search optimization for substring matching in SQL queries.
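For illustration only (these are hypothetical queries, not the actual answer options), predicates along the following lines are the kind that a SUBSTRING search optimization configuration on CA_ADDRESS_ID is intended to serve:

-- Leading-substring match on the optimized column.
SELECT * FROM CUSTOMER_ADDRESS WHERE CA_ADDRESS_ID LIKE 'AAAAAAAABAAA%';

-- Full-value equality check on the optimized column (per the explanation above, this can also benefit).
SELECT * FROM CUSTOMER_ADDRESS WHERE CA_ADDRESS_ID = 'AAAAAAAABAAAFPKA';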
A user has activated primary and secondary roles for a session.
What operation is the user prohibited from using as part of SQL actions in Snowflake using the secondary role?
Answer : B
In Snowflake, when a user activates secondary roles during a session, the privileges of those roles apply to most SQL actions, but not to object creation. A CREATE statement, which falls under DDL operations, is authorized by the primary role only and cannot be executed using privileges held solely by a secondary role. This limitation enforces role-based access control and ensures that schema modifications are managed carefully, using the primary role that has explicit permission to modify database structures. Reference: Snowflake's security and access control documentation specifying the limitations and capabilities of primary versus secondary roles in session management.
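A minimal sketch of this restriction (the table name is illustrative):

-- Activate all roles granted to the user as secondary roles for the session.
USE SECONDARY ROLES ALL;

-- Queries and DML can now be authorized by privileges from any active secondary role,
-- but object creation is authorized by the primary role only, so this statement fails
-- unless the current primary role holds the CREATE TABLE privilege on the schema:
CREATE TABLE DEMO_DB.DEMO_SCH.T_EXAMPLE (ID NUMBER);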
An Architect is designing a solution that will be used to process changed records in an orders table. Newly-inserted orders must be loaded into the f_orders fact table, which will aggregate all the orders by multiple dimensions (time, region, channel, etc.). Existing orders can be updated by the sales department within 30 days after the order creation. In case of an order update, the solution must perform two actions:
1. Update the order in the F_ORDERS fact table.
2. Load the changed order data into the special table ORDER_REPAIRS.
This table is used by the Accounting department once a month. If the order has been changed, the Accounting team needs to know the latest details and perform the necessary actions based on the data in the order_repairs table.
What data processing logic design will be the MOST performant?
Answer : B
The most performant design for processing changed records, considering the need to both update records in the f_orders fact table and load changes into the order_repairs table, is to use one stream and two tasks. The stream will monitor changes in the orders table, capturing both inserts and updates. The first task would apply these changes to the f_orders fact table, ensuring all dimensions are accurately represented. The second task would use the same stream to insert relevant changes into the order_repairs table, which is critical for the Accounting department's monthly review. This method ensures efficient processing by minimizing the overhead of managing multiple streams and synchronizing between them, while also allowing specific tasks to optimize for their target operations. Reference: Snowflake's documentation on streams and tasks for handling data changes efficiently.
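As a hedged sketch only (table, column, and warehouse names such as ORDER_ID, ORDER_AMOUNT, and TRANSFORM_WH are assumptions), the outline below shows one way to wire a single stream to a two-task chain. Because a stream's offset advances as soon as a DML statement consumes it, this sketch has the root task capture the changed rows once into a working table and a dependent task apply them to both F_ORDERS and ORDER_REPAIRS:

CREATE OR REPLACE STREAM ORDERS_STREAM ON TABLE ORDERS;

-- Root task: runs when the stream has data and materializes the changed rows once.
CREATE OR REPLACE TASK CAPTURE_ORDER_CHANGES
  WAREHOUSE = TRANSFORM_WH
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
AS
  CREATE OR REPLACE TRANSIENT TABLE ORDER_CHANGES AS
    SELECT * FROM ORDERS_STREAM;   -- this DML advances the stream offset

-- Dependent task: applies the captured changes to the fact table and the repairs table.
CREATE OR REPLACE TASK APPLY_ORDER_CHANGES
  WAREHOUSE = TRANSFORM_WH
  AFTER CAPTURE_ORDER_CHANGES
AS
BEGIN
  MERGE INTO F_ORDERS f
  USING (SELECT * FROM ORDER_CHANGES WHERE METADATA$ACTION = 'INSERT') c
    ON f.ORDER_ID = c.ORDER_ID
  WHEN MATCHED THEN UPDATE SET f.ORDER_AMOUNT = c.ORDER_AMOUNT
  WHEN NOT MATCHED THEN INSERT (ORDER_ID, ORDER_AMOUNT) VALUES (c.ORDER_ID, c.ORDER_AMOUNT);

  -- Only updated orders (new images of changed rows) go to ORDER_REPAIRS; CHANGED_AT is an assumed column.
  INSERT INTO ORDER_REPAIRS (ORDER_ID, ORDER_AMOUNT, CHANGED_AT)
    SELECT ORDER_ID, ORDER_AMOUNT, CURRENT_TIMESTAMP()
    FROM ORDER_CHANGES
    WHERE METADATA$ACTION = 'INSERT' AND METADATA$ISUPDATE;
END;

ALTER TASK APPLY_ORDER_CHANGES RESUME;
ALTER TASK CAPTURE_ORDER_CHANGES RESUME;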
An Architect needs to improve the performance of reports that pull data from multiple Snowflake tables, join, and then aggregate the data. Users access the reports using several dashboards. There are performance issues on Monday mornings between 9:00am-11:00am when many users check the sales reports.
The size of the group has increased from 4 to 8 users, and waiting times to refresh the dashboards have increased significantly. Currently this workload is served by a virtual warehouse with the following parameters:
AUTO_RESUME = TRUE
AUTO_SUSPEND = 60
SIZE = Medium
What is the MOST cost-effective way to increase the availability of the reports?
Answer : D
The most cost-effective way to increase the availability and performance of the reports during peak usage times, while keeping costs under control, is to use a multi-cluster warehouse in auto-scale mode. Option D suggests using a multi-cluster warehouse with 1 size Medium cluster and allowing it to auto-scale between 1 and 4 clusters based on demand. This setup ensures that additional computing resources are available when needed (e.g., during Monday morning peaks) and are scaled down to minimize costs when the demand decreases. This approach optimizes resource utilization and cost by adjusting the compute capacity dynamically, rather than maintaining a larger fixed size or multiple clusters continuously. Reference: Snowflake's official documentation on managing warehouses and using auto-scaling features.
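Assuming an edition that supports multi-cluster warehouses, a minimal sketch of this change might look as follows (the warehouse name REPORTING_WH is an assumption):

ALTER WAREHOUSE REPORTING_WH SET
  WAREHOUSE_SIZE = 'MEDIUM'      -- keep the existing size
  MIN_CLUSTER_COUNT = 1          -- single cluster when demand is low
  MAX_CLUSTER_COUNT = 4          -- scale out automatically during the Monday-morning peak
  SCALING_POLICY = 'STANDARD'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;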
A company is following the Data Mesh principles, including domain separation, and chose one Snowflake account for its data platform.
An Architect created two data domains to produce two data products. The Architect needs a third data domain that will use both of the data products to create an aggregate data product. The read access to the data products will be granted through a separate role.
Based on the Data Mesh principles, how should the third domain be configured to create the aggregate product if it has been granted the two read roles?
Answer : D
In the scenario described, where a third data domain needs access to two existing data products in a Snowflake account structured according to Data Mesh principles, the best approach is to utilize Snowflake's Data Exchange functionality. Option D is correct as it facilitates the sharing and governance of data across different domains efficiently and securely. Data Exchange allows domains to publish and subscribe to live data products, enabling real-time data collaboration and access management in a governed manner. This approach is in line with Data Mesh principles, which advocate for decentralized data ownership and architecture, enhancing agility and scalability across the organization. Reference:
Snowflake Documentation on Data Exchange
Articles on Data Mesh Principles in Data Management
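For illustration only, the provider side of publishing a data product could look roughly like the following (the share, database, schema, and secure view names are hypothetical; creating the listing and subscribing to it are then handled through the Data Exchange itself):

-- Each producing domain wraps its data product in a secure share
-- that can back a Data Exchange listing.
CREATE SHARE SALES_DOMAIN_SHARE;
GRANT USAGE ON DATABASE SALES_DOMAIN_DB TO SHARE SALES_DOMAIN_SHARE;
GRANT USAGE ON SCHEMA SALES_DOMAIN_DB.PRODUCT TO SHARE SALES_DOMAIN_SHARE;
GRANT SELECT ON VIEW SALES_DOMAIN_DB.PRODUCT.ORDERS_SUMMARY TO SHARE SALES_DOMAIN_SHARE;  -- a secure view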
A user is executing the following commands sequentially within a timeframe of 10 minutes from start to finish:
What would be the output of this query?
Answer : A
The query is executing a clone operation on an existing table t_sales with an offset to account for the retention time. The syntax used is correct for cloning a table in Snowflake, and the use of the at(offset => -60*30) clause is valid. This specifies that the clone should be based on the state of the table 30 minutes prior (60 seconds * 30). Assuming the table t_sales exists and has been modified within the last 30 minutes, and considering the data_retention_time_in_days is set to 1 day (which enables time travel queries for the past 24 hours), the table t_sales_clone would be successfully created based on the state of t_sales 30 minutes before the clone command was issued.
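A sketch of the clone statement described above, using the table names referenced in the explanation:

-- Clone T_SALES as it existed 30 minutes ago (60 seconds * 30), using Time Travel.
CREATE TABLE T_SALES_CLONE CLONE T_SALES AT(OFFSET => -60*30);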
An Architect is troubleshooting a query with poor performance using the QUERY_HISTORY function. The Architect observes that the COMPILATION_TIME is greater than the EXECUTION_TIME.
What is the reason for this?
Answer : B
The correct answer is B because the compilation time is the time it takes for the optimizer to create an optimal query plan for the efficient execution of the query. The compilation time depends on the complexity of the query, such as the number of tables, columns, joins, filters, aggregations, subqueries, etc. The more complex the query, the longer it takes to compile.
Option A is incorrect because the size of the dataset affects the execution time, not the compilation time. Snowflake parallelizes query execution across the virtual warehouse's compute resources, so a larger data volume may lengthen execution (and can be addressed by resizing the warehouse), but it does not make the query plan harder to compile.
Option C is incorrect because the query queue time is not part of the compilation time or the execution time. It is a separate metric that indicates how long the query waits for a warehouse slot before it starts running. The query queue time depends on the warehouse load, concurrency, and priority settings.
Option D is incorrect because the query remote IO time is not part of the compilation time or the execution time. It is a separate metric that indicates how long the query spends reading data from remote storage, such as S3 or Azure Blob Storage. The query remote IO time depends on the network latency, bandwidth, and caching efficiency.
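As an illustration, the compilation and execution times of recent queries can be compared side by side using the QUERY_HISTORY table function (the 10-row limit is arbitrary):

-- Times are reported in milliseconds.
SELECT QUERY_ID,
       TOTAL_ELAPSED_TIME,
       COMPILATION_TIME,
       EXECUTION_TIME,
       QUEUED_OVERLOAD_TIME
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
ORDER BY START_TIME DESC
LIMIT 10;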