Free Amazon MLS-C01 Actual Exam Questions - Question 6 Discussion
A technology startup is using complex deep neural networks and GPU compute to recommend the
company’s products to its existing customers based upon each customer’s habits and interactions.
The solution currently pulls each dataset from an Amazon S3 bucket before loading the data into a
TensorFlow model pulled from the company’s Git repository that runs locally. This job then runs for
several hours while continually outputting its progress to the same S3 bucket. The job can be paused,
restarted, and continued at any time in the event of a failure, and is run from a central queue.
Senior managers are concerned about the complexity of the solution’s resource management and
the costs involved in repeating the process regularly. They ask for the workload to be automated so it
runs once a week, starting Monday and completing by the close of business Friday.
Which architecture should be used to scale the solution at the lowest cost?
A/D? AWS Batch (A) is designed specifically for managing batch jobs with automatic retries and spot handling, which suits the weekly long job better than ECS (D). ECS is more for ongoing services than batch processing.
It’s A because AWS Batch is built for batch processing and handles spot interruptions smoothly, which fits the long-running, checkpointed job better than ECS or Fargate options.
Option A fits best since AWS Batch handles retries and spot interruptions for long jobs.
A/D? Both use Spot and can handle GPU workloads, but Batch (A) is designed for batch jobs with retry logic, while ECS (D) focuses more on container orchestration. For a long, interruptible job, Batch still edges it out.
A imo. Since the job can be paused and restarted, using AWS Batch with GPU Spot Instances fits perfectly for cost saving and scaling. Batch handles Spot interruptions gracefully, so you don’t waste money on idle time. Fargate (C) doesn’t support GPUs yet, so that’s out. EC2 with Instance Scheduler (B) might keep instances running longer than needed, raising costs. ECS on Spot (D) is good, but it is less tailored than Batch to batch jobs and to resuming after interruptions. Overall, Batch feels like the cleanest, most cost-effective approach for this weekly, GPU-heavy workload.
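To make the Batch-on-Spot setup described above more concrete, here is a rough sketch of the request payloads it might use. The field names follow the AWS Batch API as far as I know, and the dicts could be passed to boto3's `create_compute_environment` and `register_job_definition`. Everything else is an assumption for illustration: the helper names, the `p3.2xlarge` instance type, the vCPU and memory sizes, the retry count, and every ARN, subnet, or image value the caller supplies. No AWS call is made here.

```python
def spot_gpu_compute_environment(name, subnets, security_groups, instance_role):
    """Build a managed Spot compute environment payload (hypothetical helper)."""
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "computeResources": {
            "type": "SPOT",  # use Spot capacity instead of On-Demand
            "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
            "minvCpus": 0,  # scale to zero between weekly runs
            "maxvCpus": 64,  # assumed ceiling, tune to the workload
            "instanceTypes": ["p3.2xlarge"],  # assumed GPU instance type
            "subnets": subnets,
            "securityGroupIds": security_groups,
            "instanceRole": instance_role,
        },
    }


def training_job_definition(name, image, attempts=5):
    """Build a GPU container job definition payload (hypothetical helper)."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": {
            "image": image,  # e.g. a containerized TensorFlow training image
            "resourceRequirements": [
                {"type": "GPU", "value": "1"},
                {"type": "VCPU", "value": "4"},
                {"type": "MEMORY", "value": "16384"},  # MiB
            ],
        },
        # Retry the job if an attempt fails, for example after a Spot reclaim.
        "retryStrategy": {"attempts": attempts},
    }
```

Setting `minvCpus` to 0 is what keeps the environment from billing for idle instances once the weekly job finishes, and the retry strategy only helps if the job resumes from its saved progress on each new attempt.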
This question doesn’t specify whether the workload requires persistent storage or how the job handles interruptions on Spot Instances. Since the job can be paused and resumed, does that mean it’s designed to handle Spot Instance termination gracefully? Also, is the TensorFlow model containerized already, or would it need to be adapted for containers, as in options A and C? Understanding whether the startup prefers managed services or more control over the environment would help too, since EC2 (B) offers more control but might cost more than AWS Batch jobs on Spot Instances.
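On the pause-and-resume point raised above, a minimal sketch of the checkpoint pattern that makes a job safe to interrupt might look like this. It is only an illustration under stated assumptions: the checkpoint is a local JSON file rather than the S3 bucket from the question, one loop iteration stands in for a unit of training work, and `interrupt_at` is a hypothetical parameter that simulates a Spot reclaim.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the last saved step, or 0 when no checkpoint exists yet."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["step"]


def save_checkpoint(path, step):
    """Write to a temp file, then rename, so a crash never leaves a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step}, f)
    os.replace(tmp, path)


def run(path, total_steps, interrupt_at=None):
    """Resume from the checkpoint; optionally stop early to simulate an interruption."""
    step = load_checkpoint(path)
    while step < total_steps:
        if interrupt_at is not None and step == interrupt_at:
            return step  # simulated interruption: progress so far is already saved
        step += 1  # stand-in for one unit of training work
        save_checkpoint(path, step)
    return step


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    run(ckpt, 10, interrupt_at=4)  # first attempt is cut short
    print(run(ckpt, 10))  # second attempt picks up from the saved step
```

If the real job already checkpoints to S3 in this way, a retried attempt on a fresh Spot Instance loses at most the work done since the last save.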