Free AWS AIP-C01 Actual Exam Questions - Question 9 Discussion

Question No. 9
Scenario: An ML engineer uses Amazon SageMaker Data Wrangler to explore a numerical feature
(image brightness) before applying normalization, because the feature's scale affects model convergence.
Question: Which action should the engineer take to best understand the range and distribution of the
brightness feature values before transformation?
Options:
Mason Q.
2026-02-06

Maybe B could work too since AWS Glue DataBrew can create box plots, which also show distribution and outliers. Exporting to S3 adds a step though, so it’s less direct than D.
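To make the box-plot point concrete, here is a minimal plain-Python sketch of what a box plot summarizes: the five-number summary plus the common 1.5 * IQR outlier fences. It is an illustration only, not DataBrew or Data Wrangler code, and the brightness values are hypothetical, on an assumed 0-255 scale.

```python
import statistics

def box_plot_summary(values):
    """Return the five-number summary and 1.5 * IQR outlier fences that a box plot draws."""
    data = sorted(values)
    # statistics.quantiles with n=4 returns the three cut points Q1, median, Q3
    # (using its default 'exclusive' interpolation method).
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Points beyond the fences are the ones a box plot marks individually as outliers.
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1],
        "lower_fence": lower_fence, "upper_fence": upper_fence, "outliers": outliers,
    }

# Hypothetical brightness values: mostly mid-tone images plus one very bright one.
brightness = [96, 98, 99, 100, 100, 101, 102, 103, 104, 105, 240]
print(box_plot_summary(brightness))
```

Here Q1 is 99, the median is 101 and Q3 is 104, so the upper fence sits at 111.5 and the value 240 is flagged as an outlier.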

Mason Q.
2026-02-06

Option D: histograms are perfect for seeing the distribution of continuous data and spotting outliers fast.
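A histogram buckets values into equal-width bins and counts how many fall in each. The sketch below does that in plain Python as an illustration only; it is not Data Wrangler's implementation, and the bin count and brightness values are hypothetical.

```python
def histogram(values, bins=10):
    """Count values into equal-width bins spanning [min, max]; the last bin includes max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to width 1.0 if all values are identical
    counts = [0] * bins
    for v in values:
        # Clamp the index so the maximum value lands in the last bin instead of past it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges

# Hypothetical brightness values on an assumed 0-255 scale.
brightness = [96, 98, 99, 100, 100, 101, 102, 103, 104, 105, 240]
counts, edges = histogram(brightness, bins=5)

print(f"range: {min(brightness)} to {max(brightness)}")
for count, left, right in zip(counts, edges, edges[1:]):
    # One '#' per value gives a crude text bar for each bin.
    print(f"{left:6.1f}-{right:6.1f} | {'#' * count}")
```

The run of empty bins between the bulk of the data and the single value at 240 is the visual cue for an outlier, and the first and last bin edges give the range.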

Ravi S.
2026-02-02

It’s D because using a histogram in Data Wrangler directly shows the distribution and any outliers without needing to move data or use unrelated services like Comprehend or Clarify.

Farhan J.
2026-01-27

Makes sense to go with D here. Histograms in Data Wrangler are super straightforward for checking range and spotting outliers quickly. No need to switch tools or export data just for a simple visualization. Options A and C don’t really fit since sentiment analysis and bias detection aren’t relevant for understanding numeric distribution. B is just too complicated for what the question asks. So yeah, sticking with D feels right given the context.

Zain K.
2026-01-23

B tbh seems like overkill since exporting to S3 and using Glue DataBrew adds unnecessary steps. D lets you check brightness distribution right inside SageMaker without extra tools.

Irfan R.
2026-01-23

D. Just to add, exporting to S3 and using Glue DataBrew (B) seems like extra hassle when Data Wrangler already provides a quick way to visualize distributions with histograms. Since the question is about understanding range and distribution before normalization, the built-in histogram in Data Wrangler is the most straightforward and efficient choice. Plus, A and C are clearly off-topic here: they don't focus on numeric distribution, so they aren't worth considering.
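Why the range matters before normalization: min-max scaling, (x - min) / (max - min), is one common choice. The question doesn't say which normalization is used, so that choice is an assumption here. This minimal sketch, with hypothetical brightness values, shows how a single extreme value squeezes the typical values into a narrow band:

```python
def min_max_scale(values):
    """Scale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

# Hypothetical brightness values; the second list adds one extreme outlier.
typical = [90, 100, 110]
with_outlier = [90, 100, 110, 250]

print(min_max_scale(typical))       # [0.0, 0.5, 1.0]
print(min_max_scale(with_outlier))  # [0.0, 0.0625, 0.125, 1.0]
```

With the outlier present, the three typical values land between 0.0 and 0.125 instead of spreading across [0, 1], which is the kind of problem a histogram reveals before you pick a transform.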

Chris L.
2026-01-21

D makes the most sense since histograms directly reveal distribution and outliers.

Chris L.
2026-01-20

Makes sense to skip A and C since sentiment analysis and bias detection don’t directly show value distribution. D fits better than B because Data Wrangler is built for quick visual checks inside SageMaker. D

Arjun B.
2026-01-18

D imo, async endpoints are built exactly for long-running inferences like this. The payload size fits within their limits, and running inside a private VPC is supported without issues. It’s cost-effective too compared to always-on real-time endpoints.

Sohail A.
2026-01-14

Anyone else think option D fits best since the inference time is so long? But how does the async endpoint handle such large payloads in a private VPC without internet? Is there a size limit or any additional config needed?
