Free AWS MLA-C01 Actual Exam Questions - Question 4 Discussion
An ML engineer has a 50 MB Apache Parquet data file to build a fraud detection model. The file includes several correlated
columns that are not required.
What should the ML engineer do to drop the unnecessary columns in the file with the LEAST effort?
Option C seems simpler here: it avoids extra setup like Spark or local downloads, and you can quickly write a small script that drops the columns in a SageMaker Processing job.
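For anyone wondering what the "small script" in option C might look like, here is a rough sketch, not an official AWS sample. The function names and the column names in the usage note are made up for illustration. It also assumes pandas and a Parquet engine such as pyarrow are installed in the processing container.

```python
from pathlib import Path


def columns_to_keep(all_columns, unwanted):
    """Return all_columns minus the unwanted ones, preserving the original order."""
    unwanted = set(unwanted)
    return [name for name in all_columns if name not in unwanted]


def drop_parquet_columns(src, dst, unwanted):
    """Read a Parquet file, drop the unwanted columns, and write the result to dst."""
    # Imported here so the helper above can be used without pandas installed.
    # pandas needs a Parquet engine (pyarrow or fastparquet) for these calls.
    import pandas as pd

    df = pd.read_parquet(src)
    kept = columns_to_keep(df.columns, unwanted)

    # Create the output directory if it does not exist yet.
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    df[kept].to_parquet(dst, index=False)
    return kept
```

Inside a processing job you would call something like `drop_parquet_columns("/opt/ml/processing/input/data.parquet", "/opt/ml/processing/output/data.parquet", ["col_a", "col_b"])`. The `/opt/ml/processing/...` paths are the conventional container locations, and the file and column names are placeholders. A 50 MB file fits comfortably in memory, so plain pandas is enough and Spark is not needed.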
C, since it avoids setting up Spark or local downloads and is simpler than EMR.
C/D? Data Wrangler (D) is easiest if it's already set up; otherwise, a SageMaker Processing job (C) is simpler than Spark or a local download.
D – Data Wrangler’s UI makes column dropping way faster than scripting stuff.
Downloading locally (A) is definitely more work for just dropping columns, so can rule that out. Between C and D, Data Wrangler (D) should be easiest if available, but if not set up yet, the SDK processing job (C) is still straightforward.
D. Using SageMaker Data Wrangler is pretty straightforward for dropping columns without needing to write or run any code, which is a big time saver. Setting up a processing job with the SDK (C) might require some coding and environment setup, so it’s not necessarily the least effort if you just want to drop columns quickly. Since the file is only 50 MB, a Data Wrangler data flow can handle it easily and visually, making it less error-prone than custom scripts or Spark jobs.
D sounds best here. Data Wrangler is designed for quick data prep, and dropping columns is super easy with it, with less setup than EMR or custom scripts.