Free AWS MLA-C01 Actual Exam Questions - Question 12 Discussion

Question No. 12

HOTSPOT A company stores historical data in .csv files in Amazon S3. Only some of the rows and columns in the .csv files are populated. The columns are not labeled. An ML engineer needs to prepare and store the data so that the company can use the data to train ML models. Select and order the correct steps from the following list to perform this task. Each step should be selected one time or not at all. (Select and order three.) • Create an Amazon SageMaker batch transform job for data cleaning and feature engineering. • Store the resulting data back in Amazon S3. • Use Amazon Athena to infer the schemas and available columns. • Use AWS Glue crawlers to infer the schemas and available columns. • Use AWS Glue DataBrew for data cleaning and feature engineering.

US
SC
Shoaib C.
2026-02-17

I’m thinking AWS Glue crawlers (D) should come first to detect any schema info despite no headers, since Athena usually relies on some metadata or headers to infer columns. After that, DataBrew (E) seems like the best move to clean and engineer features visually without coding. Finally, storing the cleaned data back in S3 (B) wraps it up. The batch transform job feels more like model inference than prepping raw unlabeled data. This order makes sense for a smooth pipeline from raw to ready-for-ML data.

0
SC
Shoaib C.
2026-02-13

I’d go with using AWS Glue crawlers (D) first, even without headers, since they can still detect schema patterns. Then use DataBrew (E) to clean and engineer features because it’s designed for that kind of task on raw data. Finally, store the cleaned dataset back in S3 (B) so it’s ready for model training. Batch transform feels more like a later step after you have clean, labeled data, not for initial prep. Athena is more for querying directly, less about preparing or cleaning the data itself.

0
SC
Shoaib C.
2026-02-12

I think starting with D makes sense to get a schema overview since Glue crawlers can scan the data structure even without headers, then E for actual cleaning and prepping features because DataBrew is user-friendly for this, and finally B to save the cleaned dataset back to S3. Batch transform seems more for applying models than prepping raw CSVs. Athena might be less ideal here since it’s more about querying than cleaning or transforming data.

0
SC
Shoaib C.
2026-02-11

I’d pick E first since DataBrew is made for cleaning unlabeled data, then D to infer schema after cleaning, and finally B to store cleaned data. Batch transform feels less suited here since cleaning is the focus.

0
JS
James S.
2026-01-20

D, E, B seems right—use crawler for schema, DataBrew cleans, then save.

0
JS
James S.
2026-01-17

D, E, B makes sense since Glue crawlers do schema, DataBrew cleans, then save to S3.

0
JS
James S.
2026-01-16

I’d pick using Glue crawlers first since they handle schema inference better for unlabeled data (D). Then Glue DataBrew for cleaning and feature engineering (E), and finally store it back in S3 (B).

0
SM
Sohail M.
2026-01-15

Do we know if the columns are consistently in the same order across the CSV files? That might affect how well Athena or Glue can infer the schema without labels.

0