Free AWS AIP-C01 Actual Exam Questions - Question 9 Discussion
(image brightness) before applying normalization, as it affects model convergence.
Question: Which action should the engineer take to best understand the range and distribution of the
brightness feature values before transformation?
Options:
Maybe B could work too since AWS Glue DataBrew can create box plots, which also show distribution and outliers. Exporting to S3 adds a step though, so it’s less direct than D.
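For anyone wondering what a box plot actually reports about a feature like brightness, here is a minimal standalone sketch of the numbers behind one. It is not DataBrew's or Data Wrangler's API, just plain Python. The function name `box_plot_stats` and the sample values are made up for illustration, and the 1.5 * IQR fence is the common box plot convention, assumed here.

```python
import statistics

def box_plot_stats(values):
    """Quartiles plus the 1.5 * IQR fences that a typical box plot draws.

    Hypothetical helper for illustration; not part of any AWS SDK.
    """
    data = sorted(values)
    # method="inclusive" treats the data as the whole population and
    # interpolates linearly between neighbouring points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }

# Made-up brightness values with one unusually bright image.
sample = [100, 110, 120, 130, 140, 250]
for name, value in box_plot_stats(sample).items():
    print(f"{name}: {value}")
```

A box plot gives you range and outliers, but it hides the shape of the distribution (for example, two separate peaks), which is part of why a histogram answers the question more directly.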
Option D, histograms are perfect for continuous data distribution and spotting outliers fast.
It’s D because using a histogram in Data Wrangler directly shows the distribution and any outliers without needing to move data or use unrelated services like Comprehend or Clarify.
Makes sense to go with D here. Histograms in Data Wrangler are super straightforward for checking range and spotting outliers quickly. No need to switch tools or export data just for a simple visualization. Options A and C don’t really fit since sentiment analysis and bias detection aren’t relevant for understanding numeric distribution. B is just too complicated for what the question asks. So yeah, sticking with D feels right given the context.
B tbh seems like overkill since exporting to S3 and using Glue DataBrew adds unnecessary steps. D lets you check brightness distribution right inside SageMaker without extra tools.
D. Just to add, exporting to S3 and using Glue DataBrew (B) seems like extra hassle when Data Wrangler already provides a quick way to visualize distributions with histograms. Since the question is about understanding range and distribution before normalization, the built-in histogram in Data Wrangler is the most straightforward and efficient choice. Plus, A and C are clearly off-topic here—they don’t focus on numeric distribution, so it’s not worth considering them.
D makes the most sense since histograms directly reveal distribution and outliers.
Makes sense to skip A and C since sentiment analysis and bias detection don’t directly show value distribution. D fits better than B because Data Wrangler is built for quick visual checks inside SageMaker. Going with D.
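To make the histogram argument concrete, here is a rough sketch of what a histogram computes for a numeric feature. Data Wrangler does this through its UI, so this is only a plain-Python stand-in, not its API. The names `brightness_histogram` and `print_histogram`, the 0 to 255 brightness range, and the sample values are all assumptions for illustration.

```python
def brightness_histogram(values, bins=8, lo=0.0, hi=255.0):
    """Count how many values fall into each of `bins` equal-width buckets.

    Hypothetical helper; assumes brightness lies in [lo, hi].
    Returns (counts, observed_min, observed_max).
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if v < lo or v > hi:
            continue  # outside the assumed range, not counted in any bucket
        # The top edge (v == hi) belongs to the last bucket.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts, min(values), max(values)

def print_histogram(counts, bins, lo=0.0, hi=255.0, bar_width=40):
    """Print a text bar chart, one line per bucket."""
    width = (hi - lo) / bins
    peak = max(max(counts), 1)
    for i, count in enumerate(counts):
        left = lo + i * width
        right = left + width
        bar = "#" * round(bar_width * count / peak)
        print(f"{left:6.1f} - {right:6.1f} | {bar} {count}")

# Made-up brightness values: mostly mid-range, a few very bright images.
sample = [90, 95, 100, 105, 110, 112, 118, 120, 125, 130, 250, 255]
counts, observed_min, observed_max = brightness_histogram(sample, bins=8)
print(f"range: {observed_min} to {observed_max}")
print_histogram(counts, bins=8)
```

The printed range gives the min and max to think about before normalization, and the bars show where the values cluster and whether a long tail or isolated bucket hints at outliers.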
D imo, async endpoints are built exactly for long-running inferences like this. The payload size fits within their limits, and running inside a private VPC is supported without issues. It’s cost-effective too compared to always-on real-time endpoints.
Anyone else think option D fits best since the inference time is so long? But how does the async endpoint handle such large payloads in a private VPC without internet? Is there a size limit or any additional config needed?