Free AWS MLA-C01 Actual Exam Questions - Question 6 Discussion
stored in Amazon S3 and is complex in structure. The ML engineer must use a file format that
minimizes processing time for the data.
Which file format will meet these requirements?
Parquet files are definitely optimized for complex data and fast processing, so D still makes sense here. CSV and JSON just don’t handle complexity or speed as well.
A vs D? Parquet usually beats CSV for complex data, but codec support might mess things up.
D. Parquet is built for complex data and columnar storage, which usually means less I/O and better performance. Even if compression support varies, Parquet’s structure generally speeds up processing compared to CSV or JSON formats. The other options like CSV or JSON are either row-based or not optimized for complex structures, so they’d usually be slower to handle in SageMaker Canvas. From a purely performance and complexity standpoint, Parquet makes the most sense here.
D, Parquet is designed for efficient columnar storage and faster reads, which helps with complex data and minimizes processing time in SageMaker Canvas compared to JSON or CSV formats.
D/B? Parquet (D) is built for complex data and speed, but JSONL (B) can handle nested objects better if Parquet support is spotty with Canvas. Still, Parquet’s compression and columnar format usually win here.
D/C? Parquet (D) is great for speed and complex data, but if compression support in Canvas is limited, gzipped JSON (C) might still work better. Depends on how Canvas handles compression and nested structures.
D imo, Parquet’s columnar format really speeds up processing compared to JSON or CSV.
Option D still seems best because Parquet’s columnar layout really cuts down on how much data gets read and processed, which speeds things up. CSV and JSON just can’t match that efficiency, especially with complex data structures. Plus, Parquet stores nested data with a typed schema, so it doesn’t have to be re-parsed from text the way plain JSON does. Even if there are some codec or version specifics, Parquet is widely supported and designed for exactly this kind of use case where performance matters. The other formats just don’t minimize processing time as effectively for complex S3 data in SageMaker Canvas.
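To make the "less data gets read" point concrete, here is a rough sketch using only the Python standard library. It is not Parquet and not how Canvas reads files; it is a toy comparison of a row-based layout (CSV) against a made-up columnar layout in which each column is stored as its own chunk. The record fields (`id`, `label`, `notes`) and all function names are hypothetical, chosen only for illustration.

```python
import csv
import io

FIELDS = ["id", "label", "notes"]

# Hypothetical sample records; the field names and values are made up.
rows = [{"id": i, "label": i % 2, "notes": "x" * 50} for i in range(1000)]


def row_layout(records):
    """Serialize records one row at a time, the way CSV does."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def column_layout(records):
    """Toy stand-in for a columnar file: one contiguous chunk per column."""
    return {name: "\n".join(str(r[name]) for r in records) for name in FIELDS}


def read_label_from_rows(text):
    """Pulling one field out of CSV still means parsing every row in full."""
    reader = csv.DictReader(io.StringIO(text))
    labels = [int(r["label"]) for r in reader]
    return labels, len(text)  # characters scanned: the whole file


def read_label_from_columns(chunks):
    """Only the 'label' chunk is touched; the other columns are skipped."""
    chunk = chunks["label"]
    labels = [int(v) for v in chunk.split("\n")]
    return labels, len(chunk)  # characters scanned: one column only


row_text = row_layout(rows)
col_chunks = column_layout(rows)

labels_r, scanned_r = read_label_from_rows(row_text)
labels_c, scanned_c = read_label_from_columns(col_chunks)

print("same values:", labels_r == labels_c)
print("characters scanned (row layout):   ", scanned_r)
print("characters scanned (column layout):", scanned_c)
```

Both readers return the same values, but the column reader scans only a small fraction of the characters. Real Parquet adds typed encodings, compression, and row-group statistics on top of this idea, so treat the numbers here as illustrative only.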
CSV and JSON formats aren’t great for complex structures or speed since they’re row-based and can get bulky. Parquet’s columnar format definitely helps with faster processing. But does Canvas handle any Parquet compression or just Snappy?
D is the best choice here. Apache Parquet is columnar and optimized for complex data, so it speeds up processing compared to CSV or JSON formats.