Free Amazon MLS-C01 Actual Exam Questions

The questions for this exam were last updated on January 9, 2026

Dumps Box (DumpsBox) offers up-to-date practice exam questions for the MLS-C01 certification exam, developed and validated by Amazon AWS subject-matter experts who are certified in Amazon MLS-C01. These practice questions are updated regularly: we watch for changes in the MLS-C01 syllabus, and when there is an update, our team quickly adjusts the questions. This commitment to providing the best-quality exam prep material to certification aspirants is what makes DumpsBox.com the best certification exam prep website. On top of that, our strong yet strictly moderated community feedback keeps the content clean and current. Each question has a community discussion that adds perspective and points to helpful resources for better exam preparation. This also steers students away from outdated practice questions and illicit exam dumps, which can have adverse effects on a career. Browse through our Amazon MLS-C01 exam questions and pass your exam on the first try.

Question No. 1
[Modeling]
A data scientist is trying to improve the accuracy of a neural network classification model. The data
scientist wants to run a large hyperparameter tuning job in Amazon SageMaker.
However, previous smaller tuning jobs on the same model often ran for several weeks. The ML
specialist wants to reduce the computation time required to run the tuning job.
Which actions will MOST reduce the computation time for the hyperparameter tuning job? (Select
TWO.)
Top comments
Adeel U.
2026-02-09

Maybe A and E, since Hyperband prunes early and more parallel jobs speed up runtime.

0
Adeel U.
2026-02-02

Maybe A and C, since Hyperband cuts early and fewer total jobs means less time.

0
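The "prunes early" idea raised in the comments above can be illustrated with a toy successive-halving loop, which is the building block that Hyperband repeats across several brackets. This is only a sketch in plain Python, not SageMaker code: `toy_score`, the candidate learning rates, and the budget units are all hypothetical stand-ins for real training runs.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Train all configs briefly, keep the best 1/eta, and repeat with a larger budget."""
    survivors = list(configs)
    budget = min_budget
    total_cost = 0  # budget units spent across all runs
    while len(survivors) > 1:
        scored = [(evaluate(c, budget), c) for c in survivors]
        total_cost += budget * len(survivors)
        scored.sort(key=lambda pair: pair[0], reverse=True)
        keep = max(1, len(survivors) // eta)
        survivors = [c for _, c in scored[:keep]]  # weak configs are stopped early
        budget *= eta
    return survivors[0], total_cost


def toy_score(learning_rate, budget):
    """Hypothetical validation score: best near 0.1, slightly better with more budget."""
    return -abs(learning_rate - 0.1) - 0.01 / budget


candidates = [0.001, 0.01, 0.1, 0.5, 1.0, 0.05, 0.2, 0.3]
best, cost = successive_halving(candidates, toy_score)
print(best, cost)
```

In this toy setup the loop spends 24 budget units, whereas training all 8 candidates at the final budget of 4 would cost 32. The runs inside each round are independent of one another, so they could also be executed in parallel.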
Question No. 2
[Modeling]
A financial services company wants to automate its loan approval process by building a machine
learning (ML) model. Each loan data point contains credit history from a third-party data source and
demographic information about the customer. Each loan approval prediction must come with a
report that contains an explanation for why the customer was approved for a loan or was denied for
a loan. The company will use Amazon SageMaker to build the model.
Which solution will meet these requirements with the LEAST development effort?
Top comments
Sam B.
2026-02-14

C. SageMaker Clarify is designed exactly for explanation reports with minimal setup, unlike Lambda or CloudWatch which need more manual work. Even if attaching isn’t automatic, it’s still the simplest overall.

0
Sam B.
2026-02-02

I’m thinking B could be tricky since Lambda doesn’t automatically create the reports for each prediction without extra work. So probably not the least effort. C sounds simpler overall. C

0
Question No. 3
[Modeling]
An insurance company is creating an application to automate car insurance claims. A machine
learning (ML) specialist used an Amazon SageMaker Object Detection - TensorFlow built-in algorithm
to train a model to detect scratches and dents in images of cars. After the model was trained, the ML
specialist noticed that the model performed better on the training dataset than on the testing
dataset.
Which approach should the ML specialist use to improve the performance of the model on the
testing data?
Top comments
Mason I.
2026-02-21

A imo, increasing momentum can help the model escape local minima and improve generalization a bit, though it’s less direct than regularization tweaks. Since the issue is better training performance but worse testing, overfitting is likely, so regularization methods are usually best. But if you don’t want to mess with regularization right away, tweaking momentum might smooth out training and improve test results slightly without risking underfitting. It’s not as strong a fix as D, but still worth considering alongside other hyperparameters.

0
Carlos N.
2026-02-14

Good point on overfitting—D makes sense since L2 adds regularization.

0
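To make the regularization argument in the comments above concrete, here is a minimal sketch of how an L2 penalty shrinks a weight. It uses a one-weight linear model with a closed-form solution, which is far simpler than the object detection network in the question; the data and the penalty values are made up for illustration.

```python
def ridge_weight(xs, ys, l2):
    """Closed-form minimizer of sum((y - w*x)**2) + l2 * w**2 for a single weight w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + l2)


xs = [1, 2, 3]
ys = [2, 4, 6]  # exactly y = 2x

unregularized = ridge_weight(xs, ys, l2=0)   # fits the training data perfectly
regularized = ridge_weight(xs, ys, l2=14)    # penalty pulls the weight toward zero
print(unregularized, regularized)
```

A larger penalty trades some training fit for smaller weights, which is the mechanism the commenters rely on when they suggest regularization for a model that does better on training data than on test data.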
Question No. 4
[Modeling]
An ecommerce company has used Amazon SageMaker to deploy a factorization machines (FM)
model to suggest products for customers. The company's data science team has developed two new
models by using the TensorFlow and PyTorch deep learning frameworks. The company needs to use
A/B testing to evaluate the new models against the deployed model.
The required A/B testing setup is as follows:
• Send 70% of traffic to the FM model, 15% of traffic to the TensorFlow model, and 15% of traffic to
the PyTorch model.
• For customers who are from Europe, send all traffic to the TensorFlow model.
Which architecture can the company use to implement the required A/B testing setup?
Top comments
James U.
2026-02-09

The TargetVariant header doesn’t handle geo-based routing by itself, so D wouldn’t fully cover the Europe-only TensorFlow traffic. ALB with listener rules seems better to manage that condition, making A more fitting.

0
Osama P.
2026-01-27

It’s A, ALB supports path and host-based routing, so geo-based rules fit better there.

0
Question No. 5
[Machine Learning Implementation and Operations]
A company operates an amusement park. The company wants to collect, monitor, and store real-time
traffic data at several park entrances by using strategically placed cameras. The company's
security team must be able to immediately access the data for viewing. Stored data must be indexed
and must be accessible to the company's data science team.
Which solution will meet these requirements MOST cost-effectively?
Top comments
Andrew V.
2026-01-20

A imo. Kinesis Video Streams is designed for real-time video ingestion and storage with indexing, which fits the use case well. B’s HLS streaming is good for viewing, but it might not cover the indexing or easy storage access needed for data science teams. Rekognition integration isn’t mandatory for just viewing, so A’s solution to combine ingestion, indexing, and storage in one service seems cleaner and likely cheaper than adding extra components. D is out since Firehose can’t do HLS streaming, and C splits ingestion and storage awkwardly. So A looks like the most cost-effective all-in-one se

0
Andrew V.
2026-01-16

Option B, since Data Firehose doesn’t support HLS streaming, so D is out.

0
Question No. 6
[Data Engineering]
A technology startup is using complex deep neural networks and GPU compute to recommend the
company’s products to its existing customers based upon each customer’s habits and interactions.
The solution currently pulls each dataset from an Amazon S3 bucket before loading the data into a
TensorFlow model pulled from the company’s Git repository that runs locally. This job then runs for
several hours while continually outputting its progress to the same S3 bucket. The job can be paused,
restarted, and continued at any time in the event of a failure, and is run from a central queue.
Senior managers are concerned about the complexity of the solution’s resource management and
the costs involved in repeating the process regularly. They ask for the workload to be automated so it
runs once a week, starting Monday and completing by the close of business Friday.
Which architecture should be used to scale the solution at the lowest cost?
Top comments
Shah C.
2026-02-22

A/D? AWS Batch (A) is designed specifically for managing batch jobs with automatic retries and spot handling, which suits the weekly long job better than ECS (D). ECS is more for ongoing services than batch processing.

0
Shah C.
2026-02-01

It’s A because AWS Batch is built for batch processing and handles spot interruptions smoothly, which fits the long-running, checkpointed job better than ECS or Fargate options.

0
Question No. 7
[Data Engineering]
A data science team is working with a tabular dataset that the team stores in Amazon S3. The team
wants to experiment with different feature transformations such as categorical feature encoding.
Then the team wants to visualize the resulting distribution of the dataset. After the team finds an
appropriate set of feature transformations, the team wants to automate the workflow for feature
transformations.
Which solution will meet these requirements with the MOST operational efficiency?
Top comments
Omar K.
2026-02-22

A – Data Wrangler covers all steps smoothly without extra moving parts.

0
Carlos N.
2026-02-11

A – integrated workflow in SageMaker is simpler and more efficient overall.

0
Question No. 8
[Modeling]
A machine learning (ML) engineer is integrating a production model with a customer metadata
repository for real-time inference. The repository is hosted in Amazon SageMaker Feature Store. The
engineer wants to retrieve only the latest version of the customer metadata record for a single
customer at a time.
Which solution will meet these requirements?
Top comments
Karan N.
2026-02-09

D. This API is made to pull the latest record by ID, so it’s more efficient and direct than batch or Athena queries for a single customer’s latest data.

0
Omar J.
2026-01-29

D imo since GetRecord returns the latest record directly without needing extra queries.

0
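The single-record lookup described in the comments above might look roughly like the sketch below. It assumes the boto3 `sagemaker-featurestore-runtime` client and its `get_record` operation; the feature group name `customer-metadata`, the feature names, and the `FakeFeatureStoreRuntime` stub are hypothetical and exist only so that the example runs without AWS access.

```python
def fetch_latest_record(client, feature_group_name, customer_id):
    """Return the latest online-store record for one customer as a plain dict."""
    response = client.get_record(
        FeatureGroupName=feature_group_name,
        RecordIdentifierValueAsString=str(customer_id),
    )
    # "Record" is absent when no record exists for the identifier.
    return {f["FeatureName"]: f["ValueAsString"] for f in response.get("Record", [])}


class FakeFeatureStoreRuntime:
    """Hypothetical offline stand-in that mimics the shape of a GetRecord response."""

    def get_record(self, FeatureGroupName, RecordIdentifierValueAsString):
        return {
            "Record": [
                {"FeatureName": "customer_id", "ValueAsString": RecordIdentifierValueAsString},
                {"FeatureName": "segment", "ValueAsString": "gold"},
            ]
        }


# With real credentials the client would instead be created as:
#   import boto3
#   client = boto3.client("sagemaker-featurestore-runtime")
record = fetch_latest_record(FakeFeatureStoreRuntime(), "customer-metadata", 42)
print(record)
```

Passing the client in as an argument keeps the lookup easy to test offline and leaves credential and region handling to the caller.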
Question No. 9
[Data Engineering]
A company is launching a new product and needs to build a mechanism to monitor comments about
the company and its new product on social media.
The company needs to be able to evaluate the sentiment expressed in social media posts, and
visualize trends and configure alarms based on various thresholds.
The company needs to implement this solution quickly, and wants to minimize the infrastructure and
data science resources needed to evaluate the messages. The company already has a solution in
place to collect posts and store them within an Amazon S3 bucket.
What services should the data science team use to deliver this solution?
Top comments
Luke H.
2026-02-15

Makes sense to use the built-in sentiment analysis API without building a model, so C.

0
Marco R.
2026-02-01

Good point on C for minimal setup, but I think D might edge out because it uses CloudWatch metrics directly for alarms instead of a separate SNS step, which simplifies monitoring and alerting. D

0
Question No. 10
[Data Engineering]
During mini-batch training of a neural network for a classification problem, a Data Scientist notices
that training accuracy oscillates. What is the MOST likely cause of this issue?
Top comments
Ahmed F.
2026-02-14

D. The oscillation in training accuracy is classic when the learning rate is set too high, causing the model to overshoot the minima during optimization. Even without exact numbers, this is the most common cause compared to batch size or shuffling. Big batches usually stabilize training rather than cause oscillations, and an imbalanced dataset would affect accuracy differently, not typically causing back-and-forth swings in training metrics.

0
Ahmed F.
2026-02-02

B. Disabling shuffling means batches stay similar every epoch, which can cause accuracy to jump instead of smooth out. It’s a common but subtle issue that fits the oscillation clue well.

0
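The overshooting argument in the comments above can be seen on a one-dimensional toy problem. The sketch runs plain gradient descent on f(w) = w², whose minimum is at w = 0; it is not a neural network, and the two learning rates are chosen only to contrast a smooth path with one that jumps across the minimum on every step.

```python
def gradient_descent_path(lr, start=1.0, steps=6):
    """Follow w <- w - lr * f'(w) for f(w) = w**2 and record every iterate."""
    w = start
    path = [w]
    for _ in range(steps):
        grad = 2.0 * w  # derivative of w**2
        w = w - lr * grad
        path.append(w)
    return path


def sign_flips(path):
    """Count how often consecutive iterates land on opposite sides of the minimum."""
    return sum(1 for a, b in zip(path, path[1:]) if a * b < 0)


small_lr_path = gradient_descent_path(lr=0.1)  # approaches 0 from one side
large_lr_path = gradient_descent_path(lr=0.9)  # overshoots 0 on every step
print(sign_flips(small_lr_path), sign_flips(large_lr_path))
```

With the small step the iterate never crosses the minimum, while with the large step it crosses on all six updates, which is the back-and-forth behavior the first comment attributes to a learning rate that is too high.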
Question No. 11
[Data Engineering]
A manufacturing company uses machine learning (ML) models to detect quality issues. The models
use images that are taken of the company's product at the end of each production step. The company
has thousands of machines at the production site that generate one image per second on average.
The company ran a successful pilot with a single manufacturing machine. For the pilot, ML specialists
used an industrial PC that ran AWS IoT Greengrass with a long-running AWS Lambda function that
uploaded the images to Amazon S3. The uploaded images invoked a Lambda function that was
written in Python to perform inference by using an Amazon SageMaker endpoint that ran a custom
model. The inference results were forwarded back to a web service that was hosted at the
production site to prevent faulty products from being shipped.
The company scaled the solution out to all manufacturing machines by installing similarly configured
industrial PCs on each production machine. However, latency for predictions increased beyond
acceptable limits. Analysis shows that the internet connection is at its capacity limit.
How can the company resolve this issue MOST cost-effectively?
Top comments
Shoaib K.
2026-02-10

D/B? D reduces internet traffic, but if local PCs struggle with the model, compressing images (B) might ease bandwidth without heavy hardware upgrade. B feels like a simpler fix before full edge deployment.

0
James S.
2026-01-23

Probably D, moving inference to edge cuts internet load and latency issues completely.

0
Question No. 12
[Machine Learning Implementation and Operations]
A bank's Machine Learning team is developing an approach for credit card fraud detection. The
company has a large dataset of historical data labeled as fraudulent. The goal is to build a model to
take the information from new transactions and predict whether each transaction is fraudulent or
not.
Which built-in Amazon SageMaker machine learning algorithm should be used for modeling this
problem?
Top comments
Marco R.
2026-02-09

B/D? I get why D’s tempting since RCF is good for anomaly detection, but here they specifically mention a large labeled dataset, which suggests supervised learning. That usually points to XGBoost (B) since it’s great for classification with labeled data. If it was purely unsupervised or no labels, RCF would make more sense. Also, K-means (C) is for clustering without labels, so that’s out. Seq2seq (A) is more for sequence prediction, not really fraud detection. So I’d go with B based on the supervised nature of the task.

0
Ash A.
2026-01-31

D imo, Random Cut Forest is actually designed for anomaly detection, which is pretty relevant for spotting unusual transactions like fraud. Unlike XGBoost (B), which is a supervised classifier, RCF works unsupervised and can detect outliers without needing balanced labels. So if the fraud cases are rare or the data is skewed, RCF might catch the subtle weird patterns better than a standard classifier. That said, B is still good if you trust the labels and want a solid predictive model, but RCF fits the anomaly angle more naturally here.

0
Question No. 13
[Data Engineering]
A Data Scientist is building a linear regression model and will use resulting p-values to evaluate the
statistical significance of each coefficient. Upon inspection of the dataset, the Data Scientist discovers
that most of the features are normally distributed. The plot of one feature in the dataset is shown in
the graphic.
What transformation should the Data Scientist apply to satisfy the statistical assumptions of the
linear regression model?
Top comments
Rayan Y.
2026-02-20

B imo, logs tame right skew better than polynomial or sinusoidal here.

0
Rayan Y.
2026-01-30

B. Log transformation tends to reduce right skew and helps meet linear regression assumptions better than polynomial or sinusoidal here. Exponential would just stretch the skew more.

0
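The claim in the comments above that a log transformation reduces right skew can be checked numerically. The sample below is made up (the graphic from the question is not available here), and `skewness` is a small helper written for this sketch that computes the standard moment-based coefficient.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


raw = [1, 2, 4, 8, 16, 32, 64]        # hypothetical right-skewed feature
logged = [math.log(v) for v in raw]   # evenly spaced after the transform

print(skewness(raw), skewness(logged))
```

The raw values have clearly positive skewness because of the long right tail, while the logged values are symmetric around their mean, so their skewness is zero up to floating-point error.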
Question No. 14
[Data Engineering]
A machine learning (ML) specialist at a retail company must build a system to forecast the daily sales
for one of the company's stores. The company provided the ML specialist with sales data for this
store from the past 10 years. The historical dataset includes the total amount of sales on each day for
the store. Approximately 10% of the days in the historical dataset are missing sales data.
The ML specialist builds a forecasting model based on the historical dataset. The specialist discovers
that the model does not meet the performance standards that the company requires.
Which action will MOST likely improve the performance for the forecasting model?
Top comments
Peter H.
2026-02-16

D vs C? Filling missing data with linear interpolation (D) ensures the model sees consistent input, which is usually better than just switching to weekly frequency (C) that might hide important daily patterns.

0
Peter H.
2026-02-15

C. Changing to weekly frequency might reduce noise and improve model stability.

0
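For readers weighing the interpolation option discussed in the comments above, this is a minimal sketch of linear interpolation over a daily series. Missing days are represented as `None`, the sales figures are invented, and gaps at the very start or end of the series are left unfilled because they have no neighbor on one side.

```python
def interpolate_missing(series):
    """Fill None gaps that lie between two known values by linear interpolation."""
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        delta = filled[right] - filled[left]
        for i in range(left + 1, right):
            # Walk a straight line from the left known value to the right one.
            filled[i] = filled[left] + delta * (i - left) / span
    return filled


daily_sales = [100, None, None, 130, None, 150]  # hypothetical store totals
print(interpolate_missing(daily_sales))
```

The result keeps the daily granularity of the series, which is the property the first commenter contrasts with aggregating to weekly totals.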
Question No. 15
[Modeling]
A Machine Learning Specialist is designing a system for improving sales for a company. The objective
is to use the large amount of information the company has on users' behavior and product
preferences to predict which products users would like based on the users' similarity to other users.
What should the Specialist do to meet this objective?
Top comments
Shah C.
2026-02-13

B, since the question highlights using user similarity to predict preferences, collaborative filtering fits best. Content-based (A) focuses more on item features, so it’s less relevant here.

0
Vikas T.
2026-02-01

It’s B for me because the focus is on leveraging similarities between users to make recommendations. Content-based filtering (A) mainly uses product features instead of user-user relationships, so it doesn’t fit as well here. Model-based filtering (C) isn’t a widely recognized term in this context, and combinative (D) seems like a mix without clear definition. Collaborative filtering directly captures user similarity to suggest products others with similar tastes liked, which matches the question’s goal perfectly.

0
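The user-similarity idea behind collaborative filtering, as described in the comments above, can be sketched in a few lines. This is a toy user-based variant with cosine similarity; the users, items, and ratings are hypothetical, and production recommenders typically use far larger, sparser data and more robust similarity handling.

```python
import math


def cosine(u, v):
    """Cosine similarity between two users' rating dicts (unrated items count as 0)."""
    shared = set(u) & set(v)
    dot = sum(u[item] * v[item] for item in shared)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(target, ratings, top_n=1):
    """Score items the target has not rated by similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


ratings = {
    "ana": {"book": 5, "lamp": 4},
    "ben": {"book": 5, "lamp": 4, "desk": 5},  # tastes overlap with ana
    "cy": {"pen": 5, "mug": 4},                # no overlap with ana
}
print(recommend("ana", ratings))
```

Here `desk` is suggested to `ana` because `ben`, the most similar user, rated it highly. No product attributes are used at any point, which is what separates this approach from content-based filtering.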