
Amazon Web Services MLA-C01 Dumps

Total 241 questions

AWS Certified Machine Learning Engineer - Associate Questions and Answers

Question 1

A company has significantly increased the amount of data that is stored as .csv files in an Amazon S3 bucket. Data transformation scripts and queries are now taking much longer than they used to take.

An ML engineer must implement a solution to optimize the data for query performance.

Which solution will meet this requirement with the LEAST operational overhead?

Options:

A.

Configure an AWS Lambda function to split the .csv files into smaller objects in the S3 bucket.

B.

Configure an AWS Glue job to drop columns that have string type values and to save the results to the S3 bucket.

C.

Configure an AWS Glue extract, transform, and load (ETL) job to convert the .csv files to Apache Parquet format.

D.

Configure an Amazon EMR cluster to process the data that is in the S3 bucket.

Question 2

An ML engineer must choose the appropriate Amazon SageMaker algorithm to solve specific AI problems.

Select the correct SageMaker built-in algorithm from the following list for each use case. Each algorithm should be selected one time.

• Random Cut Forest (RCF) algorithm

• Semantic segmentation algorithm

• Sequence-to-Sequence (seq2seq) algorithm


Options:

Question 3

A company's ML engineer has deployed an ML model for sentiment analysis to an Amazon SageMaker AI endpoint. The ML engineer needs to explain to company stakeholders how the model makes predictions.

Which solution will provide an explanation for the model's predictions?

Options:

A.

Use SageMaker Model Monitor on the deployed model.

B.

Use SageMaker Clarify on the deployed model.

C.

Show the distribution of inferences from A/B testing in Amazon CloudWatch.

D.

Add a shadow endpoint. Analyze prediction differences on samples.

Question 4

A bank needs to use Amazon SageMaker AI to create an ML model to determine which customers qualify for a new product. The bank must use algorithms that SageMaker AI directly supports. The model must be explainable to the bank's regulators.

Which modeling approach will meet these requirements?

Options:

A.

Train the model by using the Object2Vec algorithm.

B.

Train the model by using the linear learner algorithm.

C.

Train a neural network.

D.

Train the model by using the k-means algorithm.

Question 5

An ML engineer wants to deploy a workflow that processes streaming IoT sensor data and periodically retrains ML models. The most recent model versions must be deployed to production.

Which service will meet these requirements?

Options:

A.

Amazon SageMaker Pipelines

B.

Amazon Managed Workflows for Apache Airflow (MWAA)

C.

AWS Lambda

D.

Apache Spark

Question 6

An ML engineer needs to create data ingestion pipelines and ML model deployment pipelines on AWS. All the raw data is stored in Amazon S3 buckets.

Which solution will meet these requirements?

Options:

A.

Use Amazon Data Firehose to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

B.

Use AWS Glue to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

C.

Use Amazon Redshift ML to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

D.

Use Amazon Athena to create the data ingestion pipelines. Use an Amazon SageMaker notebook to create the model deployment pipelines.

Question 7

A company uses a hybrid cloud environment. A model that is deployed on premises uses data in Amazon S3 to provide customers with a live conversational engine.

The model is using sensitive data. An ML engineer needs to implement a solution to identify and remove the sensitive data.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Deploy the model on Amazon SageMaker. Create a set of AWS Lambda functions to identify and remove the sensitive data.

B.

Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster that uses AWS Fargate. Create an AWS Batch job to identify and remove the sensitive data.

C.

Use Amazon Macie to identify the sensitive data. Create a set of AWS Lambda functions to remove the sensitive data.

D.

Use Amazon Comprehend to identify the sensitive data. Launch Amazon EC2 instances to remove the sensitive data.

Question 8

Case Study

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a central model registry, model deployment, and model monitoring.

The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.

The company needs to run an on-demand workflow to monitor bias drift for models that are deployed to real-time endpoints from the application.

Which action will meet this requirement?

Options:

A.

Configure the application to invoke an AWS Lambda function that runs a SageMaker Clarify job.

B.

Invoke an AWS Lambda function to pull the sagemaker-model-monitor-analyzer built-in SageMaker image.

C.

Use AWS Glue Data Quality to monitor bias.

D.

Use SageMaker notebooks to compare the bias.

Question 9

A company wants to deploy an Amazon SageMaker AI model that can queue requests. The model needs to handle payloads of up to 1 GB that take up to 1 hour to process. The model must return an inference for each request. The model also must scale down when no requests are available to process.

Which inference option will meet these requirements?

Options:

A.

Asynchronous inference

B.

Batch transform

C.

Serverless inference

D.

Real-time inference
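For reference, an asynchronous inference endpoint (option A) is configured through an AsyncInferenceConfig on the endpoint configuration, which queues requests, accepts payloads up to 1 GB, and allows scaling down to zero instances when the queue is empty. A minimal sketch of the request parameters only; the resource names, S3 path, and instance type are placeholders, and the actual boto3 call is omitted:

```python
# Hypothetical parameters for sagemaker:CreateEndpointConfig with
# asynchronous inference. All names and paths are illustrative.
def build_async_endpoint_config(config_name, model_name, output_s3_uri):
    """Return endpoint-config parameters that enable asynchronous
    inference: requests are queued and results written to S3."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
        }],
        "AsyncInferenceConfig": {
            # Inference results for queued requests land here.
            "OutputConfig": {"S3OutputPath": output_s3_uri},
        },
    }

params = build_async_endpoint_config(
    "large-payload-async-config", "large-payload-model",
    "s3://example-bucket/async-results/")
print(params["AsyncInferenceConfig"]["OutputConfig"]["S3OutputPath"])
```

A scaling policy with a minimum capacity of zero completes the scale-down-when-idle requirement.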

Question 10

Case Study

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a central model registry, model deployment, and model monitoring.

The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.

The company needs to use the central model registry to manage different versions of models in the application.

Which action will meet this requirement with the LEAST operational overhead?

Options:

A.

Create a separate Amazon Elastic Container Registry (Amazon ECR) repository for each model.

B.

Use Amazon Elastic Container Registry (Amazon ECR) and unique tags for each model version.

C.

Use the SageMaker Model Registry and model groups to catalog the models.

D.

Use the SageMaker Model Registry and unique tags for each model version.

Question 11

An ML engineer is deploying a generative AI model-based customer support agent that uses Amazon SageMaker AI for inference. The customer support agent must respond to customer questions about topics such as shipping policies, refund processes, and account management. The generative AI model generates one token at a time.

Customers report dissatisfaction with how long the customer support agent takes to generate lengthy responses to questions. The ML engineer must apply an inference optimization technique to improve the performance of the customer support agent.

Which solution will meet this requirement?

Options:

A.

Compilation

B.

Speculative decoding

C.

Quantization

D.

Fast model loading

Question 12

An ML engineer needs to deploy ML models to get inferences from large datasets in an asynchronous manner. The ML engineer also needs to implement scheduled monitoring of data quality for the models and must receive alerts when changes in data quality occur.

Which solution will meet these requirements?

Options:

A.

Deploy the models by using scheduled AWS Glue jobs. Use Amazon CloudWatch alarms to monitor the data quality and send alerts.

B.

Deploy the models by using scheduled AWS Batch jobs. Use AWS CloudTrail to monitor the data quality and send alerts.

C.

Deploy the models by using Amazon ECS on AWS Fargate. Use Amazon EventBridge to monitor the data quality and send alerts.

D.

Deploy the models by using Amazon SageMaker AI batch transform. Use SageMaker Model Monitor to monitor the data quality and send alerts.

Question 13

A company wants to use large language models (LLMs) supported by Amazon Bedrock to develop a chat interface for internal technical documentation.

The documentation consists of dozens of text files totaling several megabytes and is updated frequently.

Which solution will meet these requirements MOST cost-effectively?

Options:

A.

Train a new LLM in Amazon Bedrock using the documentation.

B.

Use Amazon Bedrock guardrails to integrate documentation.

C.

Fine-tune an LLM in Amazon Bedrock with the documentation.

D.

Upload the documentation to an Amazon Bedrock knowledge base and use it as context during inference.

Question 14

A company collects customer data every day. The company stores the data as compressed files in an Amazon S3 bucket that is partitioned by date. Every month, analysts download the data, process the data to check the data quality, and then upload the data to Amazon QuickSight dashboards.

An ML engineer needs to implement a solution to automatically check the data quality before the data is sent to QuickSight.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Run an AWS Glue crawler every month to update the AWS Glue Data Catalog. Use AWS Glue Data Quality rules to check the data quality.

B.

Use an AWS Glue trigger to run an AWS Glue crawler every month to update the AWS Glue Data Catalog. Create an AWS Glue job that loads the data into a PySpark DataFrame. Configure the job to apply custom functions and to evaluate the data quality.

C.

Run Python scripts on an AWS Lambda function every month to evaluate data quality. Configure the S3 bucket to invoke the Lambda function when objects are added to the S3 bucket.

D.

Configure the S3 bucket to send event notifications to an Amazon Simple Queue Service (Amazon SQS) queue when objects are uploaded. Use Amazon CloudWatch insights every month for the SQS queue to evaluate the data quality.

Question 15

A company is planning to use Amazon SageMaker to make classification ratings that are based on images. The company has 6 TB of training data that is stored on an Amazon FSx for NetApp ONTAP system virtual machine (SVM). The SVM is in the same VPC as SageMaker.

An ML engineer must make the training data accessible for ML models that are in the SageMaker environment.

Which solution will meet these requirements?

Options:

A.

Mount the FSx for ONTAP file system as a volume to the SageMaker Instance.

B.

Create an Amazon S3 bucket. Use Mountpoint for Amazon S3 to link the S3 bucket to the FSx for ONTAP file system.

C.

Create a catalog connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

D.

Create a direct connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Question 16

A streaming media company uses a churn risk model to assess the churn risk of its premium tier customers. Each month, the company runs an aggregation job on individual customers’ streaming data and uploads the user engagement features to an Amazon S3 bucket. The company manually re-trains the churn risk model with the user engagement data.

The current process requires manual intervention and is time-consuming. The company needs a solution that automatically re-trains the churn prediction model with the most recent data.

Which solution will meet these requirements with the SHORTEST delay?

Options:

A.

Set up an Amazon EventBridge rule to run an Amazon Elastic Container Service (Amazon ECS) task hourly for model re-training. Configure the ECS task to use the most recent data from the S3 bucket.

B.

Configure the S3 bucket to invoke an AWS Lambda function that re-trains the model.

C.

Create a pipeline in Amazon SageMaker Pipelines for re-training. Configure an Amazon EventBridge rule to monitor S3 PutObject creation events and invoke the pipeline.

D.

Create a pipeline in Amazon SageMaker Pipelines for re-training. Configure a pipeline schedule to re-train the model.
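The event-driven trigger in option C can be sketched as an EventBridge event pattern that matches S3 object-creation events for the feature bucket and starts the retraining pipeline. The bucket name is a placeholder, and the sketch assumes EventBridge notifications are enabled on the bucket:

```python
import json

def s3_object_created_pattern(bucket_name):
    """EventBridge event pattern matching "Object Created" events
    from the given S3 bucket; a rule with this pattern can target
    a SageMaker pipeline execution for immediate retraining."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": [bucket_name]}},
    }

pattern = s3_object_created_pattern("engagement-features-bucket")
print(json.dumps(pattern))
```

Because the rule fires on each upload rather than on a fixed schedule, this gives a shorter delay than the hourly or scheduled alternatives.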

Question 17

A company uses an ML model to recommend videos to users. The model is deployed on Amazon SageMaker AI. The model performed well initially after deployment, but the model's performance has degraded over time.

Which solution can the company use to identify model drift in the future?

Options:

A.

Create a monitoring job in SageMaker Model Monitor. Then create a baseline from the training dataset.

B.

Create a baseline from the training dataset. Then create a monitoring job in SageMaker Model Monitor.

C.

Create a baseline by using a built-in rule in SageMaker Clarify. Monitor the drift in Amazon CloudWatch.

D.

Retrain the model on new data. Compare the retrained model's performance to the original model's performance.

Question 18

A hospital is using an ML model to validate x-ray results. The hospital runs a nightly batch inference job. The hospital needs to produce a daily report about model data quality and model performance.

Which solution will meet these requirements?

Options:

A.

Schedule a monitoring job in Amazon SageMaker Model Monitor. Generate the monitoring results for the model and data.

B.

Create an Amazon CloudWatch dashboard that includes the metrics for processing steps in the nightly batch inference job. Compare the baseline resource metrics. Share the dashboard link.

C.

Use AWS Glue DataBrew to create a custom recipe job that uses the Numerical Statistics data quality check for the model file. Generate the results.

D.

Create a SageMaker AI pipeline that includes a QualityCheck step to run monitoring jobs. Generate the monitoring results for the model and the data.

Question 19

A company wants to use large language models (LLMs) that are supported by Amazon Bedrock to develop a chat interface for the company's internal technical documentation. The company stores the documentation as dozens of text files that are several megabytes in total size. The company updates the text files often.

Which solution will meet these requirements MOST cost-effectively?

Options:

A.

Create a new LLM on Amazon Bedrock. Train the new LLM on the original dataset and the company documentation. Make the new model available in Bedrock for calls from the chat interface.

B.

Integrate the company documentation with Amazon Bedrock guardrails. Invoke the guardrails for all Amazon Bedrock calls from the chat interface.

C.

Use all the text files to fine-tune a model in Amazon Bedrock. Use the fine-tuned model to process user prompts.

D.

Upload all the text files to an Amazon Bedrock knowledge base. Use the knowledge base to provide context when the chat interface makes calls to Amazon Bedrock.

Question 20

A company has deployed a model to predict the churn rate for its games by using Amazon SageMaker Studio. After the model is deployed, the company must monitor the model performance for data drift and inspect the report.

Select and order the correct steps from the following list to configure model monitoring. Select each step one time. (Select and order THREE.)

• Check the analysis results on the SageMaker Studio console.

• Create a Shapley Additive Explanations (SHAP) baseline for the model by using Amazon SageMaker Clarify.

• Schedule an hourly model explainability monitor.

Options:

Question 21

A company is developing an ML model to forecast future values based on time series data. The dataset includes historical measurements collected at regular intervals and categorical features. The model needs to predict future values based on past patterns and trends.

Which algorithm and hyperparameters should the company use to develop the model?

Options:

A.

Use the Amazon SageMaker AI XGBoost algorithm. Set the scale_pos_weight hyperparameter to adjust for class imbalance.

B.

Use k-means clustering with k to specify the number of clusters.

C.

Use the Amazon SageMaker AI DeepAR algorithm with matching context length and prediction length hyperparameters.

D.

Use the Amazon SageMaker AI Random Cut Forest (RCF) algorithm with contamination to set the expected proportion of anomalies.
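DeepAR (option C) trains on JSON Lines records that pair a start timestamp with a target time series and optional categorical features, which matches the dataset described. A minimal sketch of one training record; the values are purely illustrative:

```python
import json

def deepar_record(start, target, cat=None):
    """One JSON Lines record in the DeepAR training input format:
    start timestamp, target series, and optional categorical
    feature indices."""
    record = {"start": start, "target": target}
    if cat is not None:
        record["cat"] = cat
    return json.dumps(record)

# One series of historical measurements with one categorical feature.
line = deepar_record("2024-01-01 00:00:00", [112.0, 118.5, 121.3], cat=[3])
print(line)
```

The context_length hyperparameter controls how much history the model conditions on, and prediction_length controls how far ahead it forecasts; keeping them matched is a common starting point.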

Question 22

An ML engineer needs to deploy a trained model based on a genetic algorithm. Predictions can take several minutes, and requests can include up to 100 MB of data.

Which deployment solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Deploy on EC2 Auto Scaling behind an ALB.

B.

Deploy to a SageMaker AI real-time endpoint.

C.

Deploy to a SageMaker AI Asynchronous Inference endpoint.

D.

Deploy to Amazon ECS on EC2.

Question 23

A company stores time-series data about user clicks in an Amazon S3 bucket. The raw data consists of millions of rows of user activity every day. ML engineers access the data to develop their ML models.

The ML engineers need to generate daily reports and analyze click trends over the past 3 days by using Amazon Athena. The company must retain the data for 30 days before archiving the data.

Which solution will provide the HIGHEST performance for data retrieval?

Options:

A.

Keep all the time-series data without partitioning in the S3 bucket. Manually move data that is older than 30 days to separate S3 buckets.

B.

Create AWS Lambda functions to copy the time-series data into separate S3 buckets. Apply S3 Lifecycle policies to archive data that is older than 30 days to S3 Glacier Flexible Retrieval.

C.

Organize the time-series data into partitions by date prefix in the S3 bucket. Apply S3 Lifecycle policies to archive partitions that are older than 30 days to S3 Glacier Flexible Retrieval.

D.

Put each day's time-series data into its own S3 bucket. Use S3 Lifecycle policies to archive S3 buckets that hold data that is older than 30 days to S3 Glacier Flexible Retrieval.
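The date-prefix partitioning in option C can be sketched as follows; the bucket name and the Hive-style year=/month=/day= layout are assumptions, chosen because Athena can prune such partitions when a query filters on date:

```python
from datetime import date, timedelta

def daily_prefix(bucket, day):
    """Date-based S3 key prefix so that a 3-day Athena query scans
    only three partitions instead of the whole dataset."""
    return (f"s3://{bucket}/clicks/"
            f"year={day.year}/month={day.month:02d}/day={day.day:02d}/")

today = date(2024, 6, 3)
last_three_days = [daily_prefix("clickstream-bucket", today - timedelta(days=n))
                   for n in range(3)]
print(last_three_days[0])
```

The same prefixes give the lifecycle policy a natural unit to archive once a partition passes the 30-day retention window.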

Question 24

An ML engineer is preparing a dataset that contains medical records to train an ML model to predict the likelihood of patients developing diseases.

The dataset contains columns for patient ID, age, medical conditions, test results, and a "Disease" target column.

How should the ML engineer configure the data to train the model?

Options:

A.

Remove the patient ID column.

B.

Remove the age column.

C.

Remove the medical conditions and test results columns.

D.

Remove the "Disease" target column.
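The rationale behind option A: an identifier such as patient ID carries no predictive signal and invites the model to memorize individual records. A minimal sketch of dropping it before training; the column names and values are illustrative:

```python
def drop_columns(rows, columns_to_drop):
    """Return the dataset without the given columns; identifier
    columns are removed while predictive features and the target
    column are kept."""
    return [{k: v for k, v in row.items() if k not in columns_to_drop}
            for row in rows]

records = [
    {"patient_id": "P001", "age": 54, "test_result": 7.1, "Disease": 1},
    {"patient_id": "P002", "age": 37, "test_result": 5.4, "Disease": 0},
]
training_rows = drop_columns(records, {"patient_id"})
print(training_rows[0])
```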

Question 25

A company uses Amazon SageMaker for its ML workloads. The company's ML engineer receives a 50 MB Apache Parquet data file to build a fraud detection model. The file includes several correlated columns that are not required.

What should the ML engineer do to drop the unnecessary columns in the file with the LEAST effort?

Options:

A.

Download the file to a local workstation. Perform one-hot encoding by using a custom Python script.

B.

Create an Apache Spark job that uses a custom processing script on Amazon EMR.

C.

Create a SageMaker processing job by calling the SageMaker Python SDK.

D.

Create a data flow in SageMaker Data Wrangler. Configure a transform step.

Question 26

A company is gathering audio, video, and text data in various languages. The company needs to use a large language model (LLM) to summarize the gathered data that is in Spanish.

Which solution will meet these requirements in the LEAST amount of time?

Options:

A.

Train and deploy a model in Amazon SageMaker to convert the data into English text. Train and deploy an LLM in SageMaker to summarize the text.

B.

Use Amazon Transcribe and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Jurassic model to summarize the text.

C.

Use Amazon Rekognition and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Anthropic Claude model to summarize the text.

D.

Use Amazon Comprehend and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Stable Diffusion model to summarize the text.

Question 27

A company uses Amazon SageMaker Studio to develop an ML model. The company has a single SageMaker Studio domain. An ML engineer needs to implement a solution that provides an automated alert when SageMaker AI compute costs reach a specific threshold.

Which solution will meet these requirements?

Options:

A.

Add resource tagging by editing the SageMaker AI user profile in the SageMaker AI domain. Configure AWS Cost Explorer to send an alert when the threshold is reached.

B.

Add resource tagging by editing the SageMaker AI user profile in the SageMaker AI domain. Configure AWS Budgets to send an alert when the threshold is reached.

C.

Add resource tagging by editing each user's IAM profile. Configure AWS Cost Explorer to send an alert when the threshold is reached.

D.

Add resource tagging by editing each user's IAM profile. Configure AWS Budgets to send an alert when the threshold is reached.

Question 28

A company uses an Amazon SageMaker AI model for real-time inference with auto scaling enabled. During peak usage, new instances launch before existing instances are fully ready, causing inefficiencies and delays.

Which solution will optimize the scaling process without affecting response times?

Options:

A.

Change to a multi-model endpoint configuration.

B.

Integrate Amazon API Gateway and AWS Lambda to manage invocations.

C.

Decrease the scale-in cooldown period and increase the maximum instance count.

D.

Increase the cooldown period after scale-out activities.

Question 29

An ML engineer needs to deploy ML models to get inferences from large datasets in an asynchronous manner. The ML engineer also needs to implement scheduled monitoring of the data quality of the models. The ML engineer must receive alerts when changes in data quality occur.

Which solution will meet these requirements?

Options:

A.

Deploy the models by using scheduled AWS Glue jobs. Use Amazon CloudWatch alarms to monitor the data quality and to send alerts.

B.

Deploy the models by using scheduled AWS Batch jobs. Use AWS CloudTrail to monitor the data quality and to send alerts.

C.

Deploy the models by using Amazon Elastic Container Service (Amazon ECS) on AWS Fargate. Use Amazon EventBridge to monitor the data quality and to send alerts.

D.

Deploy the models by using Amazon SageMaker batch transform. Use SageMaker Model Monitor to monitor the data quality and to send alerts.

Question 30

A company has an application that uses different APIs to generate embeddings for input text. The company needs to implement a solution to automatically rotate the API tokens every 3 months.

Which solution will meet this requirement?

Options:

A.

Store the tokens in AWS Secrets Manager. Create an AWS Lambda function to perform the rotation.

B.

Store the tokens in AWS Systems Manager Parameter Store. Create an AWS Lambda function to perform the rotation.

C.

Store the tokens in AWS Key Management Service (AWS KMS). Use an AWS managed key to perform the rotation.

D.

Store the tokens in AWS Key Management Service (AWS KMS). Use an AWS owned key to perform the rotation.
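The Secrets Manager rotation in option A can be sketched as the parameters passed to the RotateSecret API; the secret name and Lambda ARN are placeholders, and the boto3 call itself is omitted. A RotationRules value of 90 days covers the 3-month cadence:

```python
def rotation_request(secret_id, rotation_lambda_arn):
    """Parameters for secretsmanager:RotateSecret: the Lambda
    function that performs the rotation and a 90-day (roughly
    3-month) automatic rotation schedule."""
    return {
        "SecretId": secret_id,
        "RotationLambdaARN": rotation_lambda_arn,
        "RotationRules": {"AutomaticallyAfterDays": 90},
    }

req = rotation_request(
    "embeddings/api-token",
    "arn:aws:lambda:us-east-1:123456789012:function:rotate-token")
print(req["RotationRules"]["AutomaticallyAfterDays"])
```

KMS (options C and D) rotates encryption keys, not arbitrary API tokens, which is why Secrets Manager with a rotation Lambda is the fit here.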

Question 31

An ML engineer trained an ML model on Amazon SageMaker to detect automobile accidents from closed-circuit TV footage. The ML engineer used SageMaker Data Wrangler to create a training dataset of images of accidents and non-accidents.

The model performed well during training and validation. However, the model is underperforming in production because of variations in the quality of the images from various cameras.

Which solution will improve the model's accuracy in the LEAST amount of time?

Options:

A.

Collect more images from all the cameras. Use Data Wrangler to prepare a new training dataset.

B.

Recreate the training dataset by using the Data Wrangler corrupt image transform. Specify the impulse noise option.

C.

Recreate the training dataset by using the Data Wrangler enhance image contrast transform. Specify the Gamma contrast option.

D.

Recreate the training dataset by using the Data Wrangler resize image transform. Crop all images to the same size.

Question 32

An ML engineer decides to use Amazon SageMaker AI automated model tuning (AMT) for hyperparameter optimization (HPO). The ML engineer requires a tuning strategy that uses regression to slowly and sequentially select the next set of hyperparameters based on previous runs. The strategy must work across small hyperparameter ranges.

Which solution will meet these requirements?

Options:

A.

Grid search

B.

Random search

C.

Bayesian optimization

D.

Hyperband

Question 33

An ML engineer needs to use AWS services to identify and extract meaningful unique keywords from documents.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Use the Natural Language Toolkit (NLTK) library on Amazon EC2 instances for text pre-processing. Use the Latent Dirichlet Allocation (LDA) algorithm to identify and extract relevant keywords.

B.

Use Amazon SageMaker and the BlazingText algorithm. Apply custom pre-processing steps for stemming and removal of stop words. Calculate term frequency-inverse document frequency (TF-IDF) scores to identify and extract relevant keywords.

C.

Store the documents in an Amazon S3 bucket. Create AWS Lambda functions to process the documents and to run Python scripts for stemming and removal of stop words. Use bigram and trigram techniques to identify and extract relevant keywords.

D.

Use Amazon Comprehend custom entity recognition and key phrase extraction to identify and extract relevant keywords.
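For comparison with the managed option D, the TF-IDF scoring named in option B can be sketched in a few lines of plain Python (a toy corpus, no stemming or stop-word removal), which illustrates the custom pre-processing overhead that Comprehend avoids:

```python
import math
from collections import Counter

def tf_idf(documents):
    """Score each term per document: term frequency times inverse
    document frequency, so terms frequent in one document but rare
    across the corpus score highest."""
    tokenized = [doc.lower().split() for doc in documents]
    doc_freq = Counter()                 # in how many documents each term appears
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    n = len(documents)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append({term: (count / len(tokens)) * math.log(n / doc_freq[term])
                       for term, count in tf.items()})
    return scores

docs = ["sagemaker trains models", "sagemaker deploys endpoints",
        "models need data"]
scores = tf_idf(docs)
# "sagemaker" appears in two of three documents, so it scores lower
# in document 0 than "trains", which appears in only one.
print(scores[0]["trains"] > scores[0]["sagemaker"])
```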

Question 34

A company is running ML models on premises by using custom Python scripts and proprietary datasets. The company is using PyTorch. The model building requires unique domain knowledge. The company needs to move the models to AWS.

Which solution will meet these requirements with the LEAST effort?

Options:

A.

Use SageMaker built-in algorithms to train the proprietary datasets.

B.

Use SageMaker script mode and premade images for ML frameworks.

C.

Build a container on AWS that includes custom packages and a choice of ML frameworks.

D.

Purchase similar production models through AWS Marketplace.

Question 35

An ML engineer is setting up a continuous integration and continuous delivery (CI/CD) pipeline for an ML workflow in Amazon SageMaker AI. The pipeline needs to automate model re-training, testing, and deployment whenever new data is uploaded to an Amazon S3 bucket. New data files are approximately 10 GB in size. The ML engineer wants to track model versions for auditing.

Which solution will meet these requirements?

Options:

A.

Use AWS CodePipeline, Amazon S3, and AWS CodeBuild to retrain and deploy the model automatically and to track model versions.

B.

Use SageMaker Pipelines with the SageMaker Model Registry to orchestrate model training and version tracking.

C.

Create an AWS Lambda function to re-train and deploy the model. Use Amazon EventBridge to invoke the Lambda function. Reference the Lambda logs to track model versions.

D.

Use SageMaker AI notebook instances to manually re-train and deploy the model when needed. Reference AWS CloudTrail logs to track model versions.

Question 36

A financial company receives a high volume of real-time market data streams from an external provider. The streams consist of thousands of JSON records every second.

The company needs to implement a scalable solution on AWS to identify anomalous data points.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Ingest real-time data into Amazon Kinesis data streams. Use the built-in RANDOM_CUT_FOREST function in Amazon Managed Service for Apache Flink to process the data streams and to detect data anomalies.

B.

Ingest real-time data into Amazon Kinesis data streams. Deploy an Amazon SageMaker AI endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

C.

Ingest real-time data into Apache Kafka on Amazon EC2 instances. Deploy an Amazon SageMaker AI endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

D.

Send real-time data to an Amazon Simple Queue Service (Amazon SQS) FIFO queue. Create an AWS Lambda function to consume the queue messages. Program the Lambda function to start an AWS Glue extract, transform, and load (ETL) job for batch processing and anomaly detection.

Question 37

A company has an ML model that generates text descriptions based on images that customers upload to the company's website. The images can be up to 50 MB in total size.

An ML engineer decides to store the images in an Amazon S3 bucket. The ML engineer must implement a processing solution that can scale to accommodate changes in demand.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Create an Amazon SageMaker batch transform job to process all the images in the S3 bucket.

B.

Create an Amazon SageMaker Asynchronous Inference endpoint and a scaling policy. Run a script to make an inference request for each image.

C.

Create an Amazon Elastic Kubernetes Service (Amazon EKS) cluster that uses Karpenter for auto scaling. Host the model on the EKS cluster. Run a script to make an inference request for each image.

D.

Create an AWS Batch job that uses an Amazon Elastic Container Service (Amazon ECS) cluster. Specify a list of images to process for each AWS Batch job.

Question 38

A company has a binary classification model in production. An ML engineer needs to develop a new version of the model.

The new model version must maximize correct predictions of positive labels and negative labels. The ML engineer must use a metric to recalibrate the model to meet these requirements.

Which metric should the ML engineer use for the model recalibration?

Options:

A.

Accuracy

B.

Precision

C.

Recall

D.

Specificity
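The four candidate metrics differ in which cells of the confusion matrix they reward; accuracy is the only one that counts correct predictions on both the positive and the negative class. A quick sketch with illustrative counts:

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary classifier.
    Accuracy rewards correct labels on BOTH classes; precision and
    recall focus on positives, specificity on negatives."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

m = classification_metrics(tp=40, fp=10, tn=35, fn=15)
print(m["accuracy"])  # (40 + 35) / 100 = 0.75
```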

Question 39

A company is planning to create several ML prediction models. The training data is stored in Amazon S3. The entire dataset is more than 5 TB in size and consists of CSV, JSON, Apache Parquet, and simple text files.

The data must be processed in several consecutive steps. The steps include complex manipulations that can take hours to finish running. Some of the processing involves natural language processing (NLP) transformations. The entire process must be automated.

Which solution will meet these requirements?

Options:

A.

Process data at each step by using Amazon SageMaker Data Wrangler. Automate the process by using Data Wrangler jobs.

B.

Use Amazon SageMaker notebooks for each data processing step. Automate the process by using Amazon EventBridge.

C.

Process data at each step by using AWS Lambda functions. Automate the process by using AWS Step Functions and Amazon EventBridge.

D.

Use Amazon SageMaker Pipelines to create a pipeline of data processing steps. Automate the pipeline by using Amazon EventBridge.

Question 40

A company plans to use Amazon SageMaker AI to build image classification models. The company has 6 TB of training data stored on Amazon FSx for NetApp ONTAP. The file system is in the same VPC as SageMaker AI.

An ML engineer must make the training data accessible to SageMaker AI training jobs.

Which solution will meet these requirements?

Options:

A.

Mount the FSx for ONTAP file system as a volume to the SageMaker AI instance.

B.

Create an Amazon S3 bucket and use Mountpoint for Amazon S3 to link the bucket to FSx for ONTAP.

C.

Create a catalog connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

D.

Create a direct connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Question 41

A company launches a feature that predicts home prices. An ML engineer trained a regression model using the SageMaker AI XGBoost algorithm. The model performs well on training data but underperforms on real-world validation data.

Which solution will improve the validation score with the LEAST implementation effort?

Options:

A.

Create a larger training dataset with more real-world data and retrain.

B.

Increase the num_round hyperparameter.

C.

Change the eval_metric from RMSE to Error.

D.

Increase the lambda hyperparameter.
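For context on option D: in the SageMaker AI XGBoost algorithm, `lambda` is the L2 regularization term on leaf weights, and raising it penalizes complex trees that fit training noise. The hyperparameter values below are illustrative, not recommendations.

```python
# Hypothetical hyperparameters for the SageMaker AI XGBoost algorithm.
# Raising "lambda" (L2 regularization, default 1) shrinks leaf weights,
# which can narrow the gap between training and validation scores.
hyperparameters = {
    "objective": "reg:squarederror",  # regression on home prices
    "eval_metric": "rmse",
    "num_round": "100",
    "lambda": "10",  # illustrative value, raised from the default of 1
}
```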

Question 42

A company runs an Amazon SageMaker AI domain in a public subnet of a newly created VPC. The network is configured properly, and ML engineers can access the SageMaker AI domain.

Recently, the company discovered suspicious traffic to the domain from a specific IP address. The company needs to block traffic from the specific IP address.

Which update to the network configuration will meet this requirement?

Options:

A.

Create a security group inbound rule to deny traffic from the specific IP address. Assign the security group to the domain.

B.

Create a network ACL inbound rule to deny traffic from the specific IP address. Assign the rule to the default network ACL for the subnet where the domain is located.

C.

Create a shadow variant for the domain. Configure SageMaker Inference Recommender to send traffic from the specific IP address to the shadow endpoint.

D.

Create a VPC route table to deny inbound traffic from the specific IP address. Assign the route table to the domain.

Question 43

A company needs to combine data from multiple sources. The company must use Amazon Redshift Serverless to query an AWS Glue Data Catalog database and underlying data that is stored in an Amazon S3 bucket.

Select and order the correct steps from the following list to meet these requirements. Select each step one time or not at all. (Select and order three.)

• Attach the IAM role to the Redshift cluster.

• Attach the IAM role to the Redshift namespace.

• Create an external database in Amazon Redshift to point to the Data Catalog schema.

• Create an external schema in Amazon Redshift to point to the Data Catalog database.

• Create an IAM role for Amazon Redshift to use to access only the S3 bucket that contains underlying data.

• Create an IAM role for Amazon Redshift to use to access the Data Catalog and the S3 bucket that contains underlying data.

Options:
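Once an IAM role with Data Catalog and S3 access is attached to the Redshift Serverless namespace, the external schema step can be sketched with SQL run through the Redshift Data API. All names and ARNs below are placeholders.

```python
# Hypothetical SQL creating an external schema in Amazon Redshift that points
# to an AWS Glue Data Catalog database. The IAM role ARN, schema name, and
# database name are placeholders.
create_external_schema = """
CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum_schema
FROM DATA CATALOG
DATABASE 'glue_catalog_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftDataCatalogRole'
"""

# With credentials configured, the statement could be submitted like this:
# boto3.client("redshift-data").execute_statement(
#     WorkgroupName="ml-workgroup", Database="dev",
#     Sql=create_external_schema)
```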

Question 44

An ML engineer needs to use an Amazon EMR cluster to process large volumes of data in batches. Any data loss is unacceptable.

Which instance purchasing option will meet these requirements MOST cost-effectively?

Options:

A.

Run the primary node, core nodes, and task nodes on On-Demand Instances.

B.

Run the primary node, core nodes, and task nodes on Spot Instances.

C.

Run the primary node on an On-Demand Instance. Run the core nodes and task nodes on Spot Instances.

D.

Run the primary node and core nodes on On-Demand Instances. Run the task nodes on Spot Instances.

Question 45

A company is developing an application that reads animal descriptions from user prompts and generates images based on the information in the prompts. The application reads a message from an Amazon Simple Queue Service (Amazon SQS) queue. Then the application uses Amazon Titan Image Generator on Amazon Bedrock to generate an image based on the information in the message. Finally, the application removes the message from the SQS queue.

Which IAM permissions should the company assign to the application's IAM role? (Select TWO.)

Options:

A.

Allow the bedrock:InvokeModel action for the Amazon Titan Image Generator resource.

B.

Allow the bedrock:Get* action for the Amazon Titan Image Generator resource.

C.

Allow the sqs:ReceiveMessage action and the sqs:DeleteMessage action for the SQS queue resource.

D.

Allow the sqs:GetQueueAttributes action and the sqs:DeleteMessage action for the SQS queue resource.

E.

Allow the sagemaker:PutRecord* action for the Amazon Titan Image Generator resource.
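A least-privilege policy for this workflow pairs `bedrock:InvokeModel` with the SQS receive/delete actions. The sketch below is a hypothetical policy document; the region, account ID, model ID, and queue name are placeholders.

```python
# Hypothetical IAM policy for the application's role: invoke the Titan
# Image Generator model and consume/delete messages from the prompt queue.
# All ARNs are placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-image-generator-v1",
        },
        {
            "Effect": "Allow",
            "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
            "Resource": "arn:aws:sqs:us-east-1:123456789012:image-prompts",
        },
    ],
}
```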

Question 46

An ML engineer is configuring auto scaling for an inference component of a model that runs behind an Amazon SageMaker AI endpoint. The ML engineer configures SageMaker AI auto scaling with a target tracking scaling policy set to 100 invocations per model per minute. The SageMaker AI endpoint scales appropriately during normal business hours. However, the ML engineer notices that at the start of each business day, there are zero instances available to handle requests, which causes delays in processing.

The ML engineer must ensure that the SageMaker AI endpoint can handle incoming requests at the start of each business day.

Which solution will meet this requirement?

Options:

A.

Reduce the SageMaker AI auto scaling cooldown period to the minimum supported value. Add an auto scaling lifecycle hook to scale the SageMaker AI instances.

B.

Change the target metric to CPU utilization.

C.

Modify the scaling policy target value to one.

D.

Apply a step scaling policy that scales based on an Amazon CloudWatch alarm. Apply a second CloudWatch alarm and scaling policy to scale the minimum number of instances from zero to one at the start of each business day.

Question 47

An advertising company uses AWS Lake Formation to manage a data lake. The data lake contains structured data and unstructured data. The company's ML engineers are assigned to specific advertisement campaigns.

The ML engineers must interact with the data through Amazon Athena and by browsing the data directly in an Amazon S3 bucket. The ML engineers must have access to only the resources that are specific to their assigned advertisement campaigns.

Which solution will meet these requirements in the MOST operationally efficient way?

Options:

A.

Configure IAM policies on an AWS Glue Data Catalog to restrict access to Athena based on the ML engineers' campaigns.

B.

Store users and campaign information in an Amazon DynamoDB table. Configure DynamoDB Streams to invoke an AWS Lambda function to update S3 bucket policies.

C.

Use Lake Formation to authorize AWS Glue to access the S3 bucket. Configure Lake Formation tags to map ML engineers to their campaigns.

D.

Configure S3 bucket policies to restrict access to the S3 bucket based on the ML engineers' campaigns.

Question 48

An ML engineer needs to use an ML model to predict the price of apartments in a specific location.

Which metric should the ML engineer use to evaluate the model’s performance?

Options:

A.

Accuracy

B.

Area Under the ROC Curve (AUC)

C.

F1 score

D.

Mean absolute error (MAE)
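Price prediction is a regression problem, so an error-based metric such as MAE applies rather than classification metrics like accuracy or AUC. A toy computation with illustrative values:

```python
# Mean absolute error computed by hand for a toy set of apartment-price
# predictions (values are illustrative).
actual = [250_000, 310_000, 480_000]
predicted = [260_000, 300_000, 500_000]

# MAE = average of the absolute differences.
mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
# (10_000 + 10_000 + 20_000) / 3 = 13_333.33...
```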

Question 49

An ML engineer is setting up an Amazon SageMaker AI pipeline for an ML model. The pipeline must automatically initiate a re-training job if any data drift is detected.

How should the ML engineer set up the pipeline to meet this requirement?

Options:

A.

Use an AWS Glue crawler and an AWS Glue extract, transform, and load (ETL) job to detect data drift. Use AWS Glue triggers to automate the re-training job.

B.

Use Amazon Managed Service for Apache Flink to detect data drift. Use an AWS Lambda function to automate the re-training job.

C.

Use SageMaker Model Monitor to detect data drift. Use an AWS Lambda function to automate the re-training job.

D.

Use Amazon Quick Suite (previously known as Amazon QuickSight) anomaly detection to detect data drift. Use an AWS Step Functions workflow to automate the re-training job.
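The Model Monitor approach in option C is commonly wired up through a Lambda function that reacts to monitoring results. The handler below is a hypothetical sketch; the event shape, status string, and pipeline name are assumptions for illustration.

```python
# Hypothetical Lambda handler invoked (e.g., via EventBridge) when a
# SageMaker Model Monitor execution finishes. If the run reports drift
# violations, it would start a re-training pipeline.
def lambda_handler(event, context):
    """React to a completed monitoring run; names below are placeholders."""
    status = event.get("detail", {}).get("MonitoringExecutionStatus")
    if status == "CompletedWithViolations":
        # With credentials configured, the re-training would start here:
        # boto3.client("sagemaker").start_pipeline_execution(
        #     PipelineName="retraining-pipeline")
        return {"action": "retraining-started"}
    return {"action": "none"}
```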

Question 50

A construction company is using Amazon SageMaker AI to train specialized custom object detection models to identify road damage. The company uses images from multiple cameras. The images are stored as JPEG objects in an Amazon S3 bucket.

The images need to be pre-processed by using computationally intensive computer vision techniques before the images can be used in the training job. The company needs to optimize data loading and pre-processing in the training job. The solution cannot affect model performance or increase compute or storage resources.

Which solution will meet these requirements?

Options:

A.

Use SageMaker AI file mode to load and process the images in batches.

B.

Reduce the batch size of the model and increase the number of pre-processing threads.

C.

Reduce the quality of the training images in the S3 bucket.

D.

Convert the images into RecordIO format and use the lazy loading pattern.

Question 51

A company has a Retrieval Augmented Generation (RAG) application that uses a vector database to store embeddings of documents. The company must migrate the application to AWS and must implement a solution that provides semantic search of text files. The company has already migrated the text repository to an Amazon S3 bucket.

Which solution will meet these requirements?

Options:

A.

Use an AWS Batch job to process the files and generate embeddings. Use AWS Glue to store the embeddings. Use SQL queries to perform the semantic searches.

B.

Use a custom Amazon SageMaker notebook to run a custom script to generate embeddings. Use SageMaker Feature Store to store the embeddings. Use SQL queries to perform the semantic searches.

C.

Use the Amazon Kendra S3 connector to ingest the documents from the S3 bucket into Amazon Kendra. Query Amazon Kendra to perform the semantic searches.

D.

Use an Amazon Textract asynchronous job to ingest the documents from the S3 bucket. Query Amazon Textract to perform the semantic searches.

Question 52

A company needs to create a central catalog for all the company's ML models. The models are in AWS accounts where the company developed the models initially. The models are hosted in Amazon Elastic Container Registry (Amazon ECR) repositories.

Which solution will meet these requirements?

Options:

A.

Configure ECR cross-account replication for each existing ECR repository. Ensure that each model is visible in each AWS account.

B.

Create a new AWS account with a new ECR repository as the central catalog. Configure ECR cross-account replication between the initial ECR repositories and the central catalog.

C.

Use the Amazon SageMaker Model Registry to create a model group for models hosted in Amazon ECR. Create a new AWS account. In the new account, use the SageMaker Model Registry as the central catalog. Attach a cross-account resource policy to each model group in the initial AWS accounts.

D.

Use an AWS Glue Data Catalog to store the models. Run an AWS Glue crawler to migrate the models from the ECR repositories to the Data Catalog. Configure cross-account access to the Data Catalog.

Question 53

An ML engineer has developed a binary classification model outside of Amazon SageMaker. The ML engineer needs to make the model accessible to a SageMaker Canvas user for additional tuning.

The model artifacts are stored in an Amazon S3 bucket. The ML engineer and the Canvas user are part of the same SageMaker domain.

Which combination of requirements must be met so that the ML engineer can share the model with the Canvas user? (Choose two.)

Options:

A.

The ML engineer and the Canvas user must be in separate SageMaker domains.

B.

The Canvas user must have permissions to access the S3 bucket where the model artifacts are stored.

C.

The model must be registered in the SageMaker Model Registry.

D.

The ML engineer must host the model on AWS Marketplace.

E.

The ML engineer must deploy the model to a SageMaker endpoint.

Question 54

A company uses Amazon SageMaker AI to support ML workflows such as model training and deployment.

Select the correct registry from the following list to meet the requirements for each use case with the LEAST operational overhead. Each registry should be selected one or more times. (Select FOUR.)

• Amazon Elastic Container Registry (Amazon ECR)

• SageMaker Model Registry

Options:

Question 55

An ML engineer is tuning an image classification model that performs poorly on one of two classes. The poorly performing class represents an extremely small fraction of the training dataset.

Which solution will improve the model’s performance?

Options:

A.

Optimize for accuracy. Use image augmentation on the less common images.

B.

Optimize for F1 score. Use image augmentation on the less common images.

C.

Optimize for accuracy. Use SMOTE to generate synthetic images.

D.

Optimize for F1 score. Use SMOTE to generate synthetic images.
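Why F1 matters here: with an extremely imbalanced dataset, accuracy can stay high while the rare class is mostly misclassified, whereas F1 balances precision and recall on that class. A toy computation with illustrative counts:

```python
# F1 score computed by hand for a toy imbalanced example. The counts are
# illustrative: of 10 rare-class samples, the model finds 8 (2 missed),
# with 4 false alarms.
tp, fp, fn = 8, 4, 2

precision = tp / (tp + fp)  # 8/12
recall = tp / (tp + fn)     # 8/10
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
```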

Question 56

A company wants to use Amazon SageMaker AI to host an ML model that runs on CPU for real-time predictions. The model has intermittent traffic during business hours and periods of no traffic after business hours.

Which hosting option will serve inference requests in the MOST cost-effective manner?

Options:

A.

Deploy the model to a real-time endpoint with scheduled auto scaling.

B.

Deploy the model to a SageMaker AI Serverless Inference endpoint with provisioned concurrency during business hours.

C.

Deploy the model to an asynchronous inference endpoint with auto scaling to zero.

D.

Deploy the model to a real-time endpoint and activate it only during business hours using AWS Lambda.
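For reference, a serverless endpoint as in option B is configured through an endpoint config whose variant carries a `ServerlessConfig` block instead of instance settings. The sketch below is hypothetical; the names and capacity values are placeholders.

```python
# Hypothetical endpoint configuration for SageMaker AI Serverless Inference.
# Capacity scales down between requests; ProvisionedConcurrency keeps warm
# capacity available (e.g., raised during business hours). Names are
# placeholders.
endpoint_config = {
    "EndpointConfigName": "cpu-model-serverless",
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "ModelName": "cpu-model",
            "ServerlessConfig": {
                "MemorySizeInMB": 4096,
                "MaxConcurrency": 20,
                "ProvisionedConcurrency": 5,  # illustrative warm capacity
            },
        }
    ],
}
# boto3.client("sagemaker").create_endpoint_config(**endpoint_config)
```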

Question 57

A company wants to predict the success of advertising campaigns by considering the color scheme of each advertisement. An ML engineer is preparing data for a neural network model. The dataset includes color information as categorical data.

Which technique for feature engineering should the ML engineer use for the model?

Options:

A.

Apply label encoding to the color categories. Automatically assign each color a unique integer.

B.

Implement padding to ensure that all color feature vectors have the same length.

C.

Perform dimensionality reduction on the color categories.

D.

One-hot encode the color categories to transform the color scheme feature into a binary matrix.
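One-hot encoding, as described in option D, maps each category to a binary indicator vector so that no ordinal relationship is implied between colors. The transformation is shown by hand below (no library assumed):

```python
# One-hot encoding of a categorical color feature, done by hand so the
# binary-matrix transformation is explicit.
colors = ["red", "blue", "green", "blue"]
categories = sorted(set(colors))  # ['blue', 'green', 'red']

# Each sample becomes a row with a single 1 in its category's column.
one_hot = [[1 if color == cat else 0 for cat in categories] for color in colors]
# "red"  -> [0, 0, 1]
# "blue" -> [1, 0, 0]
```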

Question 58

An ML engineer needs to choose the most appropriate data format for various data uses. Different teams will access the data for analytics, ML, and reporting purposes.

Select the correct data format from the following list to meet the requirements for each use case. Select each data format one time. (Select FOUR.)

Options:

Question 59

A company has trained an ML model that is packaged in a container. The company will integrate the model with an existing Python web application. The company needs to host the model on AWS by using Kubernetes.

The company does not want to manage the control plane and must provision the resources in a repeatable manner. The infrastructure must be provisioned by using Python.

Which solution will meet these requirements?

Options:

A.

Use AWS CloudFormation to provision Amazon EC2 instances in multiple Availability Zones. Set up a Kubernetes cluster. Host the model container on the Kubernetes cluster.

B.

Use the AWS CLI to provision an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. Store the image in an Amazon Elastic Container Registry (Amazon ECR) repository. Host the model container on the EKS cluster.

C.

Use the AWS Cloud Development Kit (AWS CDK) to provision an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. Store the image in an Amazon Elastic Container Registry (Amazon ECR) repository. Host the model container on the EKS cluster.

D.

Use AWS CloudFormation to provision an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. Store the image in an Amazon Elastic Container Registry (Amazon ECR) repository. Host the model container on the EKS cluster.

Question 60

A company is using an AWS Lambda function to monitor the metrics from an ML model. An ML engineer needs to implement a solution to send an email message when the metrics breach a threshold.

Which solution will meet this requirement?

Options:

A.

Log the metrics from the Lambda function to AWS CloudTrail. Configure a CloudTrail trail to send the email message.

B.

Log the metrics from the Lambda function to Amazon CloudFront. Configure an Amazon CloudWatch alarm to send the email message.

C.

Log the metrics from the Lambda function to Amazon CloudWatch. Configure a CloudWatch alarm to send the email message.

D.

Log the metrics from the Lambda function to Amazon CloudWatch. Configure an Amazon CloudFront rule to send the email message.
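The CloudWatch pattern in option C typically sends the alarm action to an SNS topic that has an email subscription. Below is a hypothetical alarm definition; the namespace, metric name, threshold, and SNS topic ARN are placeholders.

```python
# Hypothetical CloudWatch alarm that fires when a custom model metric,
# published by the Lambda function, breaches a threshold. The alarm action
# targets an SNS topic with an email subscription. ARNs and names are
# placeholders.
alarm_params = {
    "AlarmName": "model-error-rate-high",
    "Namespace": "Custom/MLModel",  # namespace the Lambda function writes to
    "MetricName": "ErrorRate",
    "Statistic": "Average",
    "Period": 300,
    "EvaluationPeriods": 1,
    "Threshold": 0.05,
    "ComparisonOperator": "GreaterThanThreshold",
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:model-alerts"],
}
# boto3.client("cloudwatch").put_metric_alarm(**alarm_params)
```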

Question 61

A company is creating an application that will recommend products for customers to purchase. The application will make API calls to Amazon Q Business. The company must ensure that responses from Amazon Q Business do not include the name of the company's main competitor.

Which solution will meet this requirement?

Options:

A.

Configure the competitor's name as a blocked phrase in Amazon Q Business.

B.

Configure an Amazon Q Business retriever to exclude the competitor's name.

C.

Configure an Amazon Kendra retriever for Amazon Q Business to build indexes that exclude the competitor's name.

D.

Configure document attribute boosting in Amazon Q Business to deprioritize the competitor's name.

Question 62

A company is using an Amazon Redshift database as its single data source. Some of the data is sensitive.

A data scientist needs to use some of the sensitive data from the database. An ML engineer must give the data scientist access to the data without transforming the source data and without storing anonymized data in the database.

Which solution will meet these requirements with the LEAST implementation effort?

Options:

A.

Configure dynamic data masking policies to control how sensitive data is shared with the data scientist at query time.

B.

Create a materialized view with masking logic on top of the database. Grant the necessary read permissions to the data scientist.

C.

Unload the Amazon Redshift data to Amazon S3. Use Amazon Athena to create schema-on-read with masking logic. Share the view with the data scientist.

D.

Unload the Amazon Redshift data to Amazon S3. Create an AWS Glue job to anonymize the data. Share the dataset with the data scientist.
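Dynamic data masking as in option A is applied at query time, so the stored data is never transformed or duplicated. The SQL below is a hypothetical sketch of a masking policy; the column, expression, and role names are placeholders.

```python
# Hypothetical Amazon Redshift dynamic data masking DDL: a policy that
# obscures all but the last four characters of a sensitive column, attached
# for the data scientist's role. All names are placeholders.
masking_sql = """
CREATE MASKING POLICY mask_ssn
WITH (ssn VARCHAR(11))
USING ('XXX-XX-' || SUBSTRING(ssn, 8, 4));

ATTACH MASKING POLICY mask_ssn
ON customers(ssn)
TO ROLE data_scientist_role;
"""

# The statements could be run through the Redshift Data API:
# boto3.client("redshift-data").execute_statement(
#     WorkgroupName="analytics", Database="dev", Sql=masking_sql)
```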

Question 63

A company is building an Amazon SageMaker AI pipeline for an ML model. The pipeline uses distributed processing and training.

An ML engineer needs to encrypt network communication between instances that run distributed jobs. The ML engineer configures the distributed jobs to run in a private VPC.

What should the ML engineer do to meet the encryption requirement?

Options:

A.

Enable network isolation.

B.

Configure traffic encryption by using security groups.

C.

Enable inter-container traffic encryption.

D.

Enable VPC flow logs.
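Inter-container traffic encryption (option C) is a single boolean on the job request in the SageMaker API. The fragment below is a hypothetical, heavily abbreviated training-job configuration; job and network resource names are placeholders, and most required fields are omitted.

```python
# Hypothetical (abbreviated) CreateTrainingJob request fragment showing the
# setting that encrypts traffic between instances in a distributed job,
# alongside the private VPC configuration. Resource IDs are placeholders.
training_job_config = {
    "TrainingJobName": "distributed-training",
    "EnableInterContainerTrafficEncryption": True,  # encrypts inter-node traffic
    "VpcConfig": {
        "Subnets": ["subnet-0123456789abcdef0"],
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
    },
    # ... AlgorithmSpecification, ResourceConfig, RoleArn, etc. omitted ...
}
# boto3.client("sagemaker").create_training_job(**training_job_config)
```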

Question 64

An ML engineer is using Amazon Quick Suite (previously known as Amazon QuickSight) anomaly detection to detect very high or very low machine operating temperatures compared to normal. The ML engineer sets the Severity parameter to Low and above. The ML engineer sets the Direction parameter to All.

What effect will the ML engineer observe in the anomaly detection results if the ML engineer changes the Direction parameter to Lower than expected?

Options:

A.

Increased anomaly identification frequency and increased recall

B.

Decreased anomaly identification frequency and decreased recall

C.

Increased anomaly identification frequency and decreased recall

D.

Decreased anomaly identification frequency and increased recall

Question 65

A company is building an Amazon SageMaker AI pipeline for an ML model. The pipeline uses distributed processing and distributed training.

An ML engineer needs to encrypt network communication between instances that run distributed jobs. The ML engineer configures the distributed jobs to run in a private VPC.

What should the ML engineer do to meet the encryption requirement?

Options:

A.

Enable network isolation.

B.

Configure traffic encryption by using security groups.

C.

Enable inter-container traffic encryption.

D.

Enable VPC flow logs.

Question 66

A company wants to migrate ML models from an on-premises environment to Amazon SageMaker AI. The models are based on the PyTorch algorithm. The company needs to reuse its existing custom scripts as much as possible.

Which SageMaker AI feature should the company use?

Options:

A.

SageMaker AI built-in algorithms

B.

SageMaker Canvas

C.

SageMaker JumpStart

D.

SageMaker AI script mode
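Script mode (option D) lets an existing training script run unchanged as the entry point of a framework estimator. The sketch below only constructs the arguments; the script path, versions, instance type, and role ARN are placeholders, and actually running it would require the `sagemaker` SDK and AWS credentials.

```python
# Hypothetical SageMaker AI script-mode setup: the existing custom PyTorch
# script is reused as-is as the entry point. All values are placeholders.
estimator_args = {
    "entry_point": "train.py",   # existing on-premises script, unchanged
    "source_dir": "src/",        # directory with the script and its helpers
    "framework_version": "2.1",
    "py_version": "py310",
    "instance_type": "ml.g5.xlarge",
    "instance_count": 1,
    "role": "arn:aws:iam::123456789012:role/SageMakerExecutionRole",
}

# With the SDK installed and credentials configured:
# from sagemaker.pytorch import PyTorch
# PyTorch(**estimator_args).fit({"training": "s3://bucket/train/"})
```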

Question 67

An ML engineer is setting up an Amazon SageMaker AI pipeline for an ML model. The pipeline must automatically initiate a re-training job if any data drift is detected.

How should the ML engineer set up the pipeline to meet this requirement?

Options:

A.

Use an AWS Glue crawler and an AWS Glue extract, transform, and load (ETL) job to detect data drift. Use AWS Glue triggers to automate the re-training job.

B.

Use Amazon Managed Service for Apache Flink to detect data drift. Use an AWS Lambda function to automate the re-training job.

C.

Use SageMaker Model Monitor to detect data drift. Use an AWS Lambda function to automate the re-training job.

D.

Use Amazon Quick Suite (previously known as Amazon QuickSight) anomaly detection to detect data drift. Use an AWS Step Functions workflow to automate the re-training job.

Question 68

A company uses the Amazon SageMaker AI Object2Vec algorithm to train an ML model. The model performs well on training data but underperforms after deployment. The company wants to avoid overfitting the model and maintain the model's ability to generalize.

Which solution will meet these requirements?

Options:

A.

Decrease the early_stopping_patience hyperparameter.

B.

Increase the mini_batch_size hyperparameter.

C.

Decrease the dropout rate.

D.

Increase the number of epochs.

Question 69

A company wants to improve its customer retention ML model. The current model has 85% accuracy and a new model shows 87% accuracy in testing. The company wants to validate the new model’s performance in production.

Which solution will meet these requirements?

Options:

A.

Deploy the new model for 4 weeks across all production traffic. Monitor performance metrics and validate improvements.

B.

Run A/B testing on both models for 4 weeks. Route 20% of traffic to the new model. Monitor customer retention rates across both variants.

C.

Run both models in parallel for 4 weeks. Analyze offline predictions weekly by using historical customer data analysis.

D.

Implement alternating deployments for 4 weeks between the current model and the new model. Track performance metrics for comparison.

Question 70

An ML engineer is evaluating several ML models and must choose one model to use in production. The cost of false negative predictions by the models is much higher than the cost of false positive predictions.

Which metric finding should the ML engineer prioritize the MOST when choosing the model?

Options:

A.

Low precision

B.

High precision

C.

Low recall

D.

High recall

Question 71

A company regularly receives new training data from the vendor of an ML model. The vendor delivers cleaned and prepared data to the company's Amazon S3 bucket every 3-4 days.

The company has an Amazon SageMaker pipeline to retrain the model. An ML engineer needs to implement a solution to run the pipeline when new data is uploaded to the S3 bucket.

Which solution will meet these requirements with the LEAST operational effort?

Options:

A.

Create an S3 Lifecycle rule to transfer the data to the SageMaker training instance and to initiate training.

B.

Create an AWS Lambda function that scans the S3 bucket. Program the Lambda function to initiate the pipeline when new data is uploaded.

C.

Create an Amazon EventBridge rule that has an event pattern that matches the S3 upload. Configure the pipeline as the target of the rule.

D.

Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate the pipeline when new data is uploaded.
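The EventBridge approach in option C matches S3 object-created events (with EventBridge notifications enabled on the bucket) and targets the pipeline directly, with no custom polling code. The sketch below builds the rule inputs only; the bucket name and ARNs are placeholders.

```python
# Hypothetical EventBridge event pattern that matches uploads to the
# training-data bucket, plus a target that starts the SageMaker pipeline.
# Bucket name and ARNs are placeholders.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": ["training-data-bucket"]}},
}

target = {
    "Id": "retrain-pipeline",
    "Arn": "arn:aws:sagemaker:us-east-1:123456789012:pipeline/retrain",
    "RoleArn": "arn:aws:iam::123456789012:role/EventBridgeInvokePipeline",
}

# With credentials configured, the rule would be created like this:
# events = boto3.client("events")
# events.put_rule(Name="on-new-data", EventPattern=json.dumps(event_pattern))
# events.put_targets(Rule="on-new-data", Targets=[target])
```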

Question 72

A company is creating an ML model to identify defects in a product. The company has gathered a dataset and has stored the dataset in TIFF format in Amazon S3. The dataset contains 200 images in which the most common defects are visible. The dataset also contains 1,800 images in which there is no defect visible.

An ML engineer trains the model and notices poor performance in some classes. The ML engineer identifies a class imbalance problem in the dataset.

What should the ML engineer do to solve this problem?

Options:

A.

Use a few hundred images and Amazon Rekognition Custom Labels to train a new model.

B.

Undersample the 200 images in which the most common defects are visible.

C.

Oversample the 200 images in which the most common defects are visible.

D.

Use all 2,000 images and Amazon Rekognition Custom Labels to train a new model.
