Which of the following metrics are used to evaluate classification models?
Answer : D
Evaluation metrics are tied to the machine learning task. There are different metrics for classification and regression, and some, like precision-recall, are useful for multiple tasks. Classification and regression are examples of supervised learning, which constitutes a majority of machine learning applications. By using different metrics for performance evaluation, we can improve a model's overall predictive power before we roll it out for production on unseen data. Evaluating a machine learning model with accuracy alone, rather than with a range of evaluation metrics, can lead to problems when the model is deployed on unseen data and may end in poor predictions.
Classification metrics are evaluation measures used to assess the performance of a classification model. Common metrics include accuracy (proportion of correct predictions), precision (true positives over total predicted positives), recall (true positives over total actual positives), F1 score (harmonic mean of precision and recall), and area under the receiver operating characteristic curve (AUC-ROC).
Confusion Matrix
A confusion matrix is a performance measurement for machine learning classification problems where the output can be two or more classes. It is a table of the combinations of predicted and actual values.
It is extremely useful for measuring Recall, Precision, Accuracy, and the AUC-ROC curve.
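For illustration, here is a minimal sketch of building a confusion matrix with scikit-learn (the labels below are made up):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # actual classes
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]  # predicted classes

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
# [[4 1]
#  [1 4]]
```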
The four commonly used metrics for evaluating classifier performance are:
1. Accuracy: The proportion of correct predictions out of the total predictions.
2. Precision: The proportion of true positive predictions out of the total positive predictions (precision = true positives / (true positives + false positives)).
3. Recall (Sensitivity or True Positive Rate): The proportion of true positive predictions out of the total actual positive instances (recall = true positives / (true positives + false negatives)).
4. F1 Score: The harmonic mean of precision and recall, providing a balance between the two metrics (F1 score = 2 * ((precision * recall) / (precision + recall))).
These metrics help assess the classifier's effectiveness in correctly classifying instances of different classes.
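As a sanity check, the four metrics can be computed directly from raw confusion-matrix counts; a minimal sketch with made-up counts:

```python
tp, fp, fn, tn = 4, 2, 1, 3  # made-up confusion-matrix counts

accuracy = (tp + tn) / (tp + tn + fp + fn)            # 0.70
precision = tp / (tp + fp)                            # ~0.67
recall = tp / (tp + fn)                               # 0.80
f1 = 2 * (precision * recall) / (precision + recall)  # ~0.73

print(accuracy, precision, recall, f1)
```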
Understanding how well a machine learning model will perform on unseen data is the main purpose of working with these evaluation metrics. Metrics like accuracy, precision, and recall are good ways to evaluate classification models on balanced datasets, but if the data is imbalanced, methods like ROC/AUC do a better job of evaluating model performance.
The ROC curve isn't just a single number but a whole curve that provides nuanced detail about the behavior of the classifier. That also makes it hard to quickly compare many ROC curves to each other; the AUC condenses each curve into a single number for exactly that purpose.
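A minimal sketch, assuming scikit-learn, with made-up scores (the model's predicted probabilities for the positive class):

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]  # predicted P(positive)

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points on the full curve
print(roc_auc_score(y_true, y_score))              # the curve as one number, ~0.889
```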
Which of the following processes best covers all of the following characteristics?
* Collecting descriptive statistics like min, max, count and sum.
* Collecting data types, length and recurring patterns.
* Tagging data with keywords, descriptions or categories.
* Performing data quality assessment and assessing the risk of performing joins on the data.
* Discovering metadata and assessing its accuracy.
* Identifying distributions, key candidates, foreign-key candidates, functional dependencies, embedded value dependencies, and performing inter-table analysis.
Answer : C
Data processing and analysis cannot happen without data profiling---reviewing source data for content and quality. As data gets bigger and infrastructure moves to the cloud, data profiling is increasingly important.
What is data profiling?
Data profiling is the process of reviewing source data, understanding structure, content and interrelationships, and identifying potential for data projects.
Data profiling is a crucial part of:
* Data warehouse and business intelligence (DW/BI) projects---data profiling can uncover data quality issues in data sources, and what needs to be corrected in ETL.
* Data conversion and migration projects---data profiling can identify data quality issues, which you can handle in scripts and data integration tools copying data from source to target. It can also uncover new requirements for the target system.
* Source system data quality projects---data profiling can highlight data which suffers from serious or numerous quality issues, and the source of the issues (e.g. user inputs, errors in interfaces, data corruption).
Data profiling involves:
* Collecting descriptive statistics like min, max, count and sum.
* Collecting data types, length and recurring patterns.
* Tagging data with keywords, descriptions or categories.
* Performing data quality assessment and assessing the risk of performing joins on the data.
* Discovering metadata and assessing its accuracy.
* Identifying distributions, key candidates, foreign-key candidates, functional dependencies, embedded value dependencies, and performing inter-table analysis.
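For illustration, several of these profiling steps map directly onto pandas calls; a minimal sketch (the file and column names are hypothetical):

```python
import pandas as pd

df = pd.read_csv("source_data.csv")  # hypothetical source file

print(df.describe())       # descriptive statistics: count, min, max, mean, ...
print(df.dtypes)           # data type of each column
print(df.isnull().sum())   # missing values per column (basic quality assessment)
print(df["id"].is_unique)  # key-candidate check: are the values unique?
```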
Which command is used to install Jupyter Notebook?
Answer : A
Jupyter Notebook is a web-based interactive computational environment.
The command used to install Jupyter Notebook is `pip install jupyter`.
The command used to start Jupyter Notebook is `jupyter notebook`.
Consider a data frame df with 10 rows and index [ 'r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']. What does the expression g = df.groupby(df.index.str.len()) do?
Answer : D
The expression groups the rows of df by the length of their index labels: df.index.str.len() returns the label lengths [2, 2, 2, 4, 4, 4, 2, 2, 2, 5], and groupby uses those values as group keys, so g contains groups for lengths 2, 4, and 5.
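A quick sketch to verify the behavior (the column values are made up):

```python
import pandas as pd

df = pd.DataFrame({"A": range(10)},
                  index=["r1", "r2", "r3", "row4", "row5",
                         "row6", "r7", "r8", "r9", "row10"])

# Group keys are the index-label lengths: 2, 4 and 5.
g = df.groupby(df.index.str.len())
print(g.size())  # length 2 -> 6 rows, length 4 -> 3 rows, length 5 -> 1 row
```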
Consider a data frame df with columns ['A', 'B', 'C', 'D'] and rows ['r1', 'r2', 'r3']. What does the expression df[lambda x : x.index.str.endswith('3')] do?
Answer : D
It filters the data frame down to the row labelled r3: the callable is applied to df, and the resulting boolean mask selects only the index labels ending in '3'.
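A quick sketch to verify (the cell values are made up):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                  columns=["A", "B", "C", "D"],
                  index=["r1", "r2", "r3"])

# The callable receives df and returns a boolean mask over the index.
print(df[lambda x: x.index.str.endswith("3")])
#     A   B   C   D
# r3  9  10  11  12
```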
Which object records data manipulation language (DML) changes made to tables, including inserts, updates, and deletes, as well as metadata about each change, so that actions can be taken using the changed data in Data Science pipelines?
Answer : C
A stream object records data manipulation language (DML) changes made to tables, including inserts, updates, and deletes, as well as metadata about each change, so that actions can be taken using the changed data. This process is referred to as change data capture (CDC). An individual table stream tracks the changes made to rows in a source table. A table stream (also referred to as simply a "stream") makes a "change table" available of what changed, at the row level, between two transactional points of time in a table. This allows querying and consuming a sequence of change records in a transactional fashion.
Streams can be created to query change data on the following objects:
* Standard tables, including shared tables.
* Views, including secure views
* Directory tables
* Event tables
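For illustration, a stream can be created and consumed from Python through snowflake-connector-python; a minimal sketch in which the connection parameters and object names are all hypothetical:

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***",
    database="my_db", schema="my_schema",
)
cur = conn.cursor()

# Create a stream on a source table; it records row-level DML changes.
cur.execute("CREATE OR REPLACE STREAM my_stream ON TABLE my_table")

# ... inserts, updates, and deletes happen on my_table ...

# Consume the change records; metadata columns such as METADATA$ACTION
# indicate what kind of change each row represents.
cur.execute("SELECT * FROM my_stream")
for row in cur:
    print(row)

conn.close()
```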
A Data Scientist, as a data provider, needs to allow consumers to access all databases and database objects in a share by granting a single privilege on shared databases. Which of the following SnowSQL commands used by her while doing this task is incorrect?
Assuming:
A database named product_db exists with a schema named product_agg and a table named Item_agg.
The database, schema, and table will be shared with two accounts named xy12345 and yz23456.
1. USE ROLE accountadmin;
2. CREATE DIRECT SHARE product_s;
3. GRANT USAGE ON DATABASE product_db TO SHARE product_s;
4. GRANT USAGE ON SCHEMA product_db.product_agg TO SHARE product_s;
5. GRANT SELECT ON TABLE product_db.product_agg.Item_agg TO SHARE product_s;
6. SHOW GRANTS TO SHARE product_s;
7. ALTER SHARE product_s ADD ACCOUNTS=xy12345, yz23456;
8. SHOW GRANTS OF SHARE product_s;
Answer : C
CREATE SHARE product_s is the correct SnowSQL command to create the share object; CREATE DIRECT SHARE is not valid syntax. The rest of the commands are correct.
https://docs.snowflake.com/en/user-guide/data-sharing-provider#creating-a-share-using-sql