A music streaming company is building a pipeline to extract features. The company wants to store the features for offline model training and online inference. The company wants to track feature history and to give the company's data science teams access to the features.
Which solution will meet these requirements with the MOST operational efficiency?
Answer : A
Amazon SageMaker Feature Store is a fully managed, purpose-built repository for storing, updating, and sharing machine learning features. It supports both online and offline stores for features, allowing real-time access for online inference and batch access for offline model training. It also tracks feature history, making it easier for data scientists to work with and access relevant feature sets.
This solution provides the necessary storage and access capabilities with high operational efficiency by managing feature history and enabling controlled access through IAM roles, making it a comprehensive choice for the company's requirements.
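As a rough illustration of how such a feature group could be provisioned, the sketch below uses the boto3 CreateFeatureGroup API with both an online and an offline store enabled; the feature names, S3 URI, and IAM role ARN are placeholders, not details from the question.

```python
# Hedged sketch: create a SageMaker Feature Store feature group with both
# an online store (real-time inference) and an offline store (training).
# All names, the S3 URI, and the role ARN below are placeholder assumptions.
import boto3

sm = boto3.client("sagemaker")

sm.create_feature_group(
    FeatureGroupName="track-audio-features",
    RecordIdentifierFeatureName="track_id",
    EventTimeFeatureName="event_time",
    FeatureDefinitions=[
        {"FeatureName": "track_id", "FeatureType": "String"},
        {"FeatureName": "event_time", "FeatureType": "Fractional"},
        {"FeatureName": "tempo", "FeatureType": "Fractional"},
        {"FeatureName": "energy", "FeatureType": "Fractional"},
    ],
    OnlineStoreConfig={"EnableOnlineStore": True},   # low-latency lookups for inference
    OfflineStoreConfig={                              # historical records for training
        "S3StorageConfig": {"S3Uri": "s3://example-feature-store/offline"}
    },
    RoleArn="arn:aws:iam::123456789012:role/FeatureStoreAccessRole",
)
```

The offline store writes every ingested record to S3 for building training datasets, while the online store serves low-latency GetRecord lookups during inference, and access to both is governed by IAM.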
A manufacturing company stores production volume data in a PostgreSQL database.
The company needs an end-to-end solution that will give business analysts the ability to prepare data for processing and to predict future production volume based on the previous year's production volume. The solution must not require the company to have coding knowledge.
Which solution will meet these requirements with the LEAST effort?
Answer : B
AWS Glue DataBrew provides a no-code data preparation interface that enables business analysts to clean and transform data from various sources, including PostgreSQL databases, without needing programming skills. Amazon SageMaker Canvas offers a no-code interface for machine learning model training and predictions, allowing users to predict future production volume without coding expertise.
This solution meets the requirements efficiently by providing end-to-end data preparation and prediction modeling without requiring coding.
A data scientist wants to improve the fit of a machine learning (ML) model that predicts house prices. The data scientist makes a first attempt to fit the model, but the fitted model has poor accuracy on both the training dataset and the test dataset.
Which steps must the data scientist take to improve model accuracy? (Select THREE.)
Answer : B, C, E
When a model shows poor accuracy on both the training and test datasets, it often indicates underfitting. To improve the model's accuracy, the data scientist can:
Decrease regularization: Excessive regularization can lead to underfitting by constraining the model too much. Reducing it allows the model to capture more complexity.
Increase the number of training examples: Adding more data can help the model learn better and generalize well, especially if the dataset was previously insufficient.
Increase the number of model features: Adding relevant features can help the model capture more predictive information, thus potentially improving accuracy.
Options A, D, and F would either reduce model complexity or limit the information available to the model, which is counterproductive when the model is underfitting. A minimal sketch of the remedies appears below.
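As a rough, framework-agnostic illustration of these remedies (not the exam's actual dataset or model), the scikit-learn sketch below shows how weakening regularization and adding features can lift the score of an underfit regressor; all data is synthetic.

```python
# Illustrative only: weaker regularization plus extra features to fix underfitting.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(500, 3500, size=(500, 1))                      # e.g. square footage
y = 50_000 + 120 * X[:, 0] + 0.02 * X[:, 0] ** 2 + rng.normal(0, 20_000, 500)

# Underfit baseline: heavy regularization, single raw feature.
underfit = Ridge(alpha=1e6)
print(cross_val_score(underfit, X, y, cv=5).mean())

# Remedies from the answer: weaker regularization plus added (polynomial) features.
improved = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1.0))
print(cross_val_score(improved, X, y, cv=5).mean())
```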
A company distributes an online multiple-choice survey to several thousand people. Respondents to the survey can select multiple options for each question.
A machine learning (ML) engineer needs to comprehensively represent every response from all respondents in a dataset. The ML engineer will use the dataset to train a logistic regression model.
Which solution will meet these requirements?
Answer : A
In cases where survey questions allow multiple choices per question, one-hot encoding is an effective way to represent responses as binary features. Each possible option for each question is transformed into a separate binary column (1 if selected, 0 if not), providing a comprehensive and machine-readable format that logistic regression models can interpret effectively.
This approach ensures that each respondent's selections are accurately captured in a format suitable for training, offering a straightforward representation for multi-choice responses.
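A minimal sketch of this encoding, using scikit-learn's MultiLabelBinarizer; the option names and target values below are invented for illustration.

```python
# Multi-hot (one binary column per option) encoding of multi-select survey answers.
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.linear_model import LogisticRegression

responses = [
    ["rock", "jazz"],               # respondent 1's selections for one question
    ["pop"],
    ["rock", "pop", "classical"],
]
labels = [1, 0, 1]                  # hypothetical training target

mlb = MultiLabelBinarizer()
X = mlb.fit_transform(responses)    # one binary column per possible option
print(pd.DataFrame(X, columns=mlb.classes_))

model = LogisticRegression().fit(X, labels)
```

Each question's options become independent binary columns, so a respondent who selects several options simply has several 1s in that question's group of columns.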
A machine learning (ML) specialist is building a credit score model for a financial institution. The ML specialist has collected data for the previous 3 years of transactions and third-party metadata that is related to the transactions.
After the ML specialist builds the initial model, the ML specialist discovers that the model has low accuracy for both the training data and the test data. The ML specialist needs to improve the accuracy of the model.
Which solutions will meet this requirement? (Select TWO.)
Answer : A, C
For a model with low accuracy on both training and testing datasets, the following two strategies are effective:
Increase the number of passes and perform hyperparameter tuning: This approach allows the model to better learn from the existing data and improve performance through optimized hyperparameters.
Add domain-specific features and use more complex models: Adding relevant features that capture additional information from domain knowledge and using more complex model architectures can help the model capture patterns better, potentially improving accuracy.
Options B, D, and E would reduce either feature complexity or training data volume, which is unlikely to improve performance when accuracy is low on both the training and test sets. A rough tuning sketch appears below.
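The sketch below illustrates the first remedy with scikit-learn's GradientBoostingClassifier and GridSearchCV on synthetic data; the question's actual data and algorithm are not specified, so this is illustrative only.

```python
# Illustrative only: more passes (boosting rounds) plus hyperparameter tuning.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

param_grid = {
    "n_estimators": [100, 300, 600],   # more passes over the data
    "max_depth": [2, 4, 6],            # allow a more complex model
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid, cv=3, scoring="roc_auc",
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```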
An online delivery company wants to choose the fastest courier for each delivery at the moment an order is placed. The company wants to implement this feature for existing users and new users of its application. Data scientists have trained separate models with XGBoost for this purpose, and the models are stored in Amazon S3. There is one model for each city where the company operates.
The engineers are hosting these models on Amazon EC2 to respond to the web client requests, with one instance for each model, but the instances have only 5% utilization in CPU and memory. The operations engineers want to avoid managing unnecessary resources.
Which solution will enable the company to achieve its goal with the LEAST operational overhead?
Answer : B
The best solution for this scenario is a multi-model endpoint in Amazon SageMaker, which hosts multiple models behind a single endpoint and loads them dynamically at runtime. This removes the operational overhead of managing one EC2 instance and model server per city, while providing the scalability, security, and performance of SageMaker hosting. Because the endpoint is shared, utilization improves and hosting costs drop: the company pays for the endpoint's instances and invocations rather than for many nearly idle single-model hosts, and only the models that are actually requested are kept loaded in memory.
To use a multi-model endpoint, the company prepares a Docker container based on the open-source Multi Model Server, a framework-agnostic library that can load and serve multiple models from Amazon S3. The company then creates a multi-model endpoint that points to the S3 prefix containing all of the models and invokes it from the web client at runtime, passing a TargetModel parameter that selects the model for the city of each request.
This approach also lets the company add or remove model artifacts in S3 without redeploying the endpoint, and serve different versions of the same model for different cities if needed. A minimal invocation sketch follows the references.
References:
Use Docker containers to build models
Host multiple models in one container behind one endpoint
Multi-model endpoints using Scikit Learn
Multi-model endpoints using XGBoost
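A hedged sketch of the per-city invocation described above; the endpoint name, model artifact keys, and feature payload are placeholders rather than values from the scenario.

```python
# Hedged sketch: invoke a SageMaker multi-model endpoint, selecting the
# per-city XGBoost model via TargetModel. Names below are assumptions.
import boto3

runtime = boto3.client("sagemaker-runtime")

def predict_fastest_courier(city, csv_features):
    """Route the request to the per-city model artifact stored under the endpoint's S3 prefix."""
    response = runtime.invoke_endpoint(
        EndpointName="courier-mme",        # assumed multi-model endpoint name
        TargetModel=f"{city}.tar.gz",      # model archive key relative to the S3 prefix
        ContentType="text/csv",
        Body=csv_features,
    )
    return response["Body"].read()

print(predict_fastest_courier("madrid", "3.2,1,0,45.0"))
```

Each TargetModel value names a model archive under the S3 prefix the endpoint was created with; SageMaker downloads and loads it into memory on first use and keeps it cached for subsequent requests.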
An engraving company wants to automate its quality control process for plaques. The company performs the process before mailing each customized plaque to a customer. The company has created an Amazon S3 bucket that contains images of defects that should cause a plaque to be rejected. Low-confidence predictions must be sent to an internal team of reviewers who are using Amazon Augmented AI (Amazon A2I).
Which solution will meet these requirements?
Answer : B
Amazon Rekognition is a service that provides computer vision capabilities for image and video analysis, such as object, scene, and activity detection, face and text recognition, and custom label detection. It can automate the quality control process for plaques by comparing images of the plaques against the defect images in the Amazon S3 bucket and returning a confidence score for each detected defect.
Amazon A2I enables human review of machine learning predictions, such as low-confidence predictions from Amazon Rekognition. With the private workforce option, the engraving company can route flagged plaques to its own internal team of reviewers for manual inspection.
Together, these services automate the quality control process, send low-confidence predictions to the internal review team, and use Amazon A2I for manual review, meeting all of the requirements. A short sketch of this flow follows the references.
References:
Amazon Rekognition documentation
Amazon Rekognition Custom Labels documentation
Amazon A2I Private Workforce documentation
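A minimal sketch of that flow; all ARNs, bucket names, object keys, and the confidence threshold are placeholder assumptions.

```python
# Hedged sketch: score a plaque image with Rekognition Custom Labels and
# escalate low-confidence results to an Amazon A2I human review loop.
# Every ARN, bucket, key, and threshold below is a placeholder assumption.
import json
import uuid
import boto3

rekognition = boto3.client("rekognition")
a2i = boto3.client("sagemaker-a2i-runtime")

CONFIDENCE_THRESHOLD = 80.0

result = rekognition.detect_custom_labels(
    ProjectVersionArn="arn:aws:rekognition:us-east-1:123456789012:project/plaque-defects/version/1",
    Image={"S3Object": {"Bucket": "plaque-images", "Name": "plaque-1234.jpg"}},
    MinConfidence=0,
)

labels = result["CustomLabels"]
if not labels or max(label["Confidence"] for label in labels) < CONFIDENCE_THRESHOLD:
    # Low-confidence prediction: send to the internal private workforce via A2I.
    a2i.start_human_loop(
        HumanLoopName=f"plaque-review-{uuid.uuid4()}",
        FlowDefinitionArn="arn:aws:sagemaker:us-east-1:123456789012:flow-definition/plaque-review",
        HumanLoopInput={
            "InputContent": json.dumps({"image": "s3://plaque-images/plaque-1234.jpg"})
        },
    )
```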