Question 1

[Modeling]
A data scientist is trying to improve the accuracy of a neural network classification model. The data
scientist wants to run a large hyperparameter tuning job in Amazon SageMaker.
However, previous smaller tuning jobs on the same model often ran for several weeks. The ML
specialist wants to reduce the computation time required to run the tuning job.
Which actions will MOST reduce the computation time for the hyperparameter tuning job? (Select
TWO.)

Accepted Answer

A, C

Question 2

[Modeling]
A financial services company wants to automate its loan approval process by building a machine
learning (ML) model. Each loan data point contains credit history from a third-party data source and
demographic information about the customer. Each loan approval prediction must come with a
report that contains an explanation for why the customer was approved for a loan or was denied for
a loan. The company will use Amazon SageMaker to build the model.
Which solution will meet these requirements with the LEAST development effort?

Accepted Answer

C

Question 3

[Modeling]
An insurance company is creating an application to automate car insurance claims. A machine
learning (ML) specialist used an Amazon SageMaker Object Detection - TensorFlow built-in algorithm
to train a model to detect scratches and dents in images of cars. After the model was trained, the ML
specialist noticed that the model performed better on the training dataset than on the testing
dataset.
Which approach should the ML specialist use to improve the performance of the model on the
testing data?

Accepted Answer

D

Question 4

[Modeling]
An ecommerce company has used Amazon SageMaker to deploy a factorization machines (FM)
model to suggest products for customers. The company's data science team has developed two new
models by using the TensorFlow and PyTorch deep learning frameworks. The company needs to use
A/B testing to evaluate the new models against the deployed model.
The required A/B testing setup is as follows:
• Send 70% of traffic to the FM model, 15% of traffic to the TensorFlow model, and 15% of traffic to
the PyTorch model.
• For customers who are from Europe, send all traffic to the TensorFlow model.
Which architecture can the company use to implement the required A/B testing setup?

Accepted Answer

D
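
The traffic rules in this question can be sketched in plain Python. This is only an illustration of the routing logic, not the architecture behind the accepted answer (the answer options are not reproduced here). The variant names and the choose_variant helper are hypothetical. On SageMaker, weighted production variants on a single endpoint plus the TargetVariant parameter of InvokeEndpoint are one mechanism that can express this kind of split.

```python
import random

# Weights taken from the question; the variant names are hypothetical labels.
VARIANT_WEIGHTS = {"fm": 0.70, "tensorflow": 0.15, "pytorch": 0.15}


def choose_variant(customer_region, rng=random):
    """Return the name of the variant that should serve this request."""
    if customer_region == "Europe":
        # Override: European customers bypass the weighted split entirely.
        return "tensorflow"
    names = list(VARIANT_WEIGHTS)
    weights = [VARIANT_WEIGHTS[name] for name in names]
    # Weighted random draw for everyone else.
    return rng.choices(names, weights=weights, k=1)[0]
```

A caller would pass the chosen name as the target variant when invoking the endpoint, or omit it and let the endpoint apply its configured weights for non-European traffic.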

Question 5

[Machine Learning Implementation and Operations]
A bank's Machine Learning team is developing an approach for credit card fraud detection. The
company has a large dataset of historical data labeled as fraudulent. The goal is to build a model to
take the information from new transactions and predict whether each transaction is fraudulent or
not.
Which built-in Amazon SageMaker machine learning algorithm should be used for modeling this
problem?

Accepted Answer

B

Question 6

[Data Engineering]
A technology startup is using complex deep neural networks and GPU compute to recommend the
company’s products to its existing customers based upon each customer’s habits and interactions.
The solution currently pulls each dataset from an Amazon S3 bucket before loading the data into a
TensorFlow model pulled from the company’s Git repository that runs locally. This job then runs for
several hours while continually outputting its progress to the same S3 bucket. The job can be paused,
restarted, and continued at any time in the event of a failure, and is run from a central queue.
Senior managers are concerned about the complexity of the solution’s resource management and
the costs involved in repeating the process regularly. They ask for the workload to be automated so it
runs once a week, starting Monday and completing by the close of business Friday.
Which architecture should be used to scale the solution at the lowest cost?

Accepted Answer

A

Question 7

[Data Engineering]
A data science team is working with a tabular dataset that the team stores in Amazon S3. The team
wants to experiment with different feature transformations such as categorical feature encoding.
Then the team wants to visualize the resulting distribution of the dataset. After the team finds an
appropriate set of feature transformations, the team wants to automate the workflow for feature
transformations.
Which solution will meet these requirements with the MOST operational efficiency?

Accepted Answer

A

Question 8

[Modeling]
A machine learning (ML) engineer is integrating a production model with a customer metadata
repository for real-time inference. The repository is hosted in Amazon SageMaker Feature Store. The
engineer wants to retrieve only the latest version of the customer metadata record for a single
customer at a time.
Which solution will meet these requirements?

Accepted Answer

D
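
As a rough sketch of a single-customer lookup, the Feature Store runtime exposes a GetRecord call that returns the latest record for one identifier from the online store. The helper below assumes a client object with that call (for example, one created with boto3.client("sagemaker-featurestore-runtime")). The latest_record helper and the feature group name in the usage comment are hypothetical, and the answer options are not reproduced here, so this is not presented as the wording of option D.

```python
def latest_record(client, feature_group_name, customer_id):
    """Fetch the latest record for one customer as a {feature: value} dict."""
    response = client.get_record(
        FeatureGroupName=feature_group_name,
        RecordIdentifierValueAsString=str(customer_id),
    )
    # Treat a missing "Record" key as "no record found" and return an empty dict.
    return {
        feature["FeatureName"]: feature["ValueAsString"]
        for feature in response.get("Record", [])
    }


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   runtime = boto3.client("sagemaker-featurestore-runtime")
#   latest_record(runtime, "customer-metadata", "12345")
```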

Question 9

[Data Engineering]
A company is launching a new product and needs to build a mechanism to monitor comments about
the company and its new product on social media. The company needs to be able to evaluate the
sentiment expressed in social media posts, and
visualize trends and configure alarms based on various thresholds.
The company needs to implement this solution quickly, and wants to minimize the infrastructure and
data science resources needed to evaluate the messages. The company already has a solution in
place to collect posts and store them within an Amazon S3 bucket.
What services should the data science team use to deliver this solution?

Accepted Answer

D

Question 10

[Data Engineering]
During mini-batch training of a neural network for a classification problem, a Data Scientist notices
that training accuracy oscillates.
What is the MOST likely cause of this issue?

Accepted Answer

D
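
A learning rate that is too high is a commonly cited cause of oscillation during training. The answer options are not reproduced here, so the toy example below only illustrates that effect and is not tied to option D. It runs plain gradient descent on f(x) = x**2; the gradient_descent helper and the two learning rates are assumptions chosen for illustration.

```python
def gradient_descent(learning_rate, start=1.0, steps=6):
    """Minimise f(x) = x**2 with fixed-step gradient descent; return all iterates."""
    xs = [start]
    for _ in range(steps):
        grad = 2 * xs[-1]  # derivative of x**2
        xs.append(xs[-1] - learning_rate * grad)
    return xs


small = gradient_descent(0.1)  # each step multiplies x by about 0.8: steady approach to 0
large = gradient_descent(0.9)  # each step multiplies x by about -0.8: overshoots, sign flips
```

With the larger rate every update jumps past the minimum, so the iterate alternates between positive and negative values instead of approaching zero from one side.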

Question 11

[Data Engineering]
A manufacturing company uses machine learning (ML) models to detect quality issues. The models
use images that are taken of the company's product at the end of each production step. The company
has thousands of machines at the production site that generate one image per second on average.
The company ran a successful pilot with a single manufacturing machine. For the pilot, ML specialists
used an industrial PC that ran AWS IoT Greengrass with a long-running AWS Lambda function that
uploaded the images to Amazon S3. The uploaded images invoked a Lambda function that was
written in Python to perform inference by using an Amazon SageMaker endpoint that ran a custom
model. The inference results were forwarded back to a web service that was hosted at the
production site to prevent faulty products from being shipped.
The company scaled the solution out to all manufacturing machines by installing similarly configured
industrial PCs on each production machine. However, latency for predictions increased beyond
acceptable limits. Analysis shows that the internet connection is at its capacity limit.
How can the company resolve this issue MOST cost-effectively?

Accepted Answer

D

Question 12

[Machine Learning Implementation and Operations]
A bank's Machine Learning team is developing an approach for credit card fraud detection. The
company has a large dataset of historical data labeled as fraudulent. The goal is to build a model to
take the information from new transactions and predict whether each transaction is fraudulent or
not.
Which built-in Amazon SageMaker machine learning algorithm should be used for modeling this
problem?

Accepted Answer

B

Question 13

[Data Engineering]
A Data Scientist is building a linear regression model and will use resulting p-values to evaluate the
statistical significance of each coefficient. Upon inspection of the dataset, the Data Scientist discovers
that most of the features are normally distributed. The plot of one feature in the dataset is shown in
the graphic.
What transformation should the Data Scientist apply to satisfy the statistical assumptions of the
linear regression model?

Accepted Answer

B

Question 14

[Data Engineering]
A machine learning (ML) specialist at a retail company must build a system to forecast the daily sales
for one of the company's stores. The company provided the ML specialist with sales data for this
store from the past 10 years. The historical dataset includes the total amount of sales on each day for
the store. Approximately 10% of the days in the historical dataset are missing sales data.
The ML specialist builds a forecasting model based on the historical dataset. The specialist discovers
that the model does not meet the performance standards that the company requires.
Which action will MOST likely improve the performance for the forecasting model?

Accepted Answer

D
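
One common way to handle gaps in a daily series is to fill them by linear interpolation between the nearest known days. The sketch below shows that technique in plain Python. It assumes missing days are represented as None, the interpolate_missing helper is hypothetical, and since the answer options are not reproduced here it is not presented as the wording of option D.

```python
def interpolate_missing(values):
    """Fill None entries by linear interpolation between known neighbours.

    Gaps before the first or after the last known value are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        # Constant per-day increment across the gap between two known days.
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


daily_sales = [100, None, None, 130, None, 150]  # made-up numbers for illustration
print(interpolate_missing(daily_sales))
```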

Question 15

[Modeling]
A Machine Learning Specialist is designing a system for improving sales for a company. The objective
is to use the large amount of information the company has on users' behavior and product
preferences to predict which products users would like based on the users' similarity to other users.
What should the Specialist do to meet this objective?

Accepted Answer

B
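
Recommending products from the preferences of similar users is the idea behind user-based collaborative filtering. The sketch below is a minimal illustration of that idea using cosine similarity over rating dictionaries. The ratings data, user names, and both helper functions are made up for the example and do not describe any particular SageMaker algorithm.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two {item: rating} dicts."""
    dot = sum(a[item] * b[item] for item in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(target, ratings):
    """Suggest items the most similar user rated that the target has not seen."""
    others = [user for user in ratings if user != target]
    nearest = max(others, key=lambda u: cosine_similarity(ratings[target], ratings[u]))
    unseen = {i: r for i, r in ratings[nearest].items() if i not in ratings[target]}
    # Highest-rated unseen items first.
    return sorted(unseen, key=unseen.get, reverse=True)


# Hypothetical toy data.
ratings = {
    "alice": {"laptop": 5, "mouse": 4},
    "bob": {"laptop": 5, "mouse": 5, "keyboard": 4},
    "carol": {"blender": 5, "mouse": 1, "toaster": 4},
}
print(recommend("alice", ratings))
```

Here bob's ratings are closest to alice's, so the item bob rated that alice has not seen is the one suggested.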

Free Amazon MLS-C01 Actual Exam Questions