Amazon DAS-C01 AWS Certified Data Analytics - Specialty Exam Practice Test

Question 1

A company's data science team is designing a shared dataset repository on a Windows server. The data repository will store a large amount of training data that the data science team commonly uses in its machine learning models. The data scientists create a random number of new datasets each day.

The company needs a solution that provides persistent, scalable file storage and high levels of throughput and IOPS. The solution also must be highly available and must integrate with Active Directory for access control.

Which solution will meet these requirements with the LEAST development effort?



Answer : B


Question 2

A company is using an AWS Lambda function to run Amazon Athena queries against a cross-account AWS Glue Data Catalog. A query returns the following error:

HIVE METASTORE ERROR

The error message states that the response payload size exceeds the maximum allowed payload size. The queried table is already partitioned, and the data is stored in an Amazon S3 bucket in the Apache Hive partition format.

Which solution will resolve this error?



Answer : A


Question 3

A social media company is using business intelligence tools to analyze data for forecasting. The company is using Apache Kafka to ingest data. The company wants to build dynamic dashboards that include machine learning (ML) insights to forecast key business trends.

The dashboards must show recent batched data that is not more than 75 minutes old. Various teams at the company want to view the dashboards by using Amazon QuickSight with ML insights.

Which solution will meet these requirements?



Answer : C


Question 4

A company has a process that writes two datasets in CSV format to an Amazon S3 bucket every 6 hours. The company needs to join the datasets, convert the data to Apache Parquet, and store the data within another bucket for users to query using Amazon Athena. The data also needs to be loaded to Amazon Redshift for advanced analytics. The company needs a solution that is resilient to the failure of any individual job component and can be restarted in case of an error.

Which solution meets these requirements with the LEAST amount of operational overhead?



Answer : D

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load data for analytics. It can process datasets from various sources and formats, such as CSV and Parquet, and write them to different destinations, such as Amazon S3 and Amazon Redshift.

AWS Glue provides two types of jobs: Spark and Python Shell. Spark jobs run on Apache Spark, a distributed processing framework that supports a wide range of data processing tasks. Python Shell jobs run Python scripts on a managed serverless infrastructure. Spark jobs are more suitable for complex data transformations and joins than Python Shell jobs.

AWS Glue provides dynamic frames, which are an extension of Apache Spark data frames. Dynamic frames handle schema variations and errors in the data more easily than data frames. They also provide a set of transformations that can be applied to the data, such as join, filter, map, etc.

AWS Glue provides workflows, which are directed acyclic graphs (DAGs) that orchestrate multiple ETL jobs and crawlers. Workflows can handle dependencies, retries, error handling, and concurrency for ETL jobs and crawlers. They can also be triggered by schedules or events.
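
As a rough sketch of what that orchestration can look like, the following uses the AWS SDK for Python (boto3) to create a workflow and a scheduled trigger that starts the ETL job every 6 hours. The workflow, trigger, and job names are placeholders for illustration, not values from the question.

import boto3

glue = boto3.client("glue")

# Workflow that will own the scheduled job run (name is illustrative)
glue.create_workflow(Name="example-etl-workflow")

# Scheduled trigger inside the workflow that starts the ETL job every 6 hours
glue.create_trigger(
    Name="example-every-6-hours",
    WorkflowName="example-etl-workflow",
    Type="SCHEDULED",
    Schedule="cron(0 */6 * * ? *)",
    Actions=[{"JobName": "example-join-and-convert-job"}],
    StartOnCreation=True,
)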

By creating an AWS Glue job using PySpark that builds dynamic frames from the datasets in Amazon S3, transforms and joins the data, writes the result back to Amazon S3, and loads it into Amazon Redshift, the company can perform all of the required ETL tasks with a single job. By orchestrating that job with an AWS Glue workflow, the company can schedule and monitor the job execution with minimal operational overhead.
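
To make the shape of such a job concrete, here is a minimal PySpark sketch along those lines. The bucket paths, join key, Redshift connection name, and table names are assumptions for illustration, not values taken from the question.

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import Join
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read the two CSV datasets from S3 as dynamic frames (paths are placeholders)
orders = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-source-bucket/orders/"]},
    format="csv",
    format_options={"withHeader": True},
)
customers = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-source-bucket/customers/"]},
    format="csv",
    format_options={"withHeader": True},
)

# Join the datasets on an assumed shared key
joined = Join.apply(orders, customers, "customer_id", "customer_id")

# Write the joined data back to S3 as Parquet for querying with Athena
glueContext.write_dynamic_frame.from_options(
    frame=joined,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/joined/"},
    format="parquet",
)

# Load the same data into Amazon Redshift through a Glue connection
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=joined,
    catalog_connection="example-redshift-connection",
    connection_options={"dbtable": "analytics.joined_orders", "database": "dev"},
    redshift_tmp_dir="s3://example-temp-bucket/redshift-staging/",
)

job.commit()

The sketch only covers the read-join-write path; retries and restarts after a failure would come from the workflow and trigger configuration shown above rather than from the job script itself.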


Question 5

A manufacturing company is storing data from its operational systems in Amazon S3. The company's business analysts need to perform one-time queries of the data in Amazon S3 with Amazon Athena. The company needs to access the Athena service from the on-premises network by using a JDBC connection. The company has created a VPC. Security policies mandate that requests to AWS services cannot traverse the internet.

Which combination of steps should a data analytics specialist take to meet these requirements? (Select TWO.)



Question 6

A healthcare company ingests patient data from multiple data sources and stores it in an Amazon S3 staging bucket. An AWS Glue ETL job transforms the data, which is written to an S3-based data lake to be queried using Amazon Athena. The company wants to match patient records even when the records do not have a common unique identifier.

Which solution meets this requirement?



Answer : D


Question 7

A data analytics specialist has a 50 GB data file in .csv format and wants to perform a data transformation task. The data analytics specialist is using the Amazon Athena CREATE TABLE AS SELECT (CTAS) statement to perform the transformation. The resulting output will be used to query the data from Amazon Redshift Spectrum.

Which CTAS statement should the data analytics specialist use to provide the MOST efficient performance?



Answer : B
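
For context on what a CTAS-based transformation generally looks like (this is not a reproduction of the exam's answer options), a common pattern is to have CTAS write columnar, compressed output that Redshift Spectrum can scan efficiently. A minimal sketch using boto3, with placeholder bucket, database, and table names:

import boto3

athena = boto3.client("athena")

# Illustrative CTAS: writes the transformed output as compressed Parquet so
# that Redshift Spectrum (or Athena) can scan it efficiently. All names and
# locations below are placeholders, not the exam's actual answer option.
ctas = """
CREATE TABLE transformed_dataset
WITH (
    format = 'PARQUET',
    external_location = 's3://example-output-bucket/transformed/',
    parquet_compression = 'SNAPPY'
) AS
SELECT *
FROM raw_dataset
"""

athena.start_query_execution(
    QueryString=ctas,
    QueryExecutionContext={"Database": "example_db"},
    ResultConfiguration={"OutputLocation": "s3://example-query-results/"},
)

Depending on the answer options, partitioning or bucketing clauses (partitioned_by / bucketed_by) may also be part of the most efficient choice.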

