Free NVIDIA NCA-AIIO Actual Exam Questions - Question 7 Discussion
After deployment, the company notices a significant delay in processing transactions, which impacts
their operations. Upon investigation, it’s discovered that the AI model is being heavily used during
peak business hours, leading to resource contention on the GPUs. What is the best approach to
address this issue?
D. If they have multiple GPUs or cloud instances, spreading the workload is the cleanest way to reduce delays caused by resource contention. Other options either slow processing or don’t really solve the core issue.
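To put rough numbers on why D helps, here is a minimal sketch of a toy queue simulation. It assumes every transaction takes the same GPU time and that each request goes to whichever GPU frees up first. The function name `simulate` and all the numbers are made up for illustration; nothing here comes from the exam or from an NVIDIA tool.

```python
import heapq

def simulate(arrivals, service_time, num_gpus):
    """Mean latency (queue wait + service) when each request is sent to
    the GPU that becomes free earliest. Hypothetical toy model."""
    # Min-heap holding the time at which each GPU is next free.
    free_at = [0.0] * num_gpus
    heapq.heapify(free_at)
    total_latency = 0.0
    for t in arrivals:
        gpu_free = heapq.heappop(free_at)
        start = max(t, gpu_free)        # wait if the GPU is still busy
        finish = start + service_time
        heapq.heappush(free_at, finish)
        total_latency += finish - t
    return total_latency / len(arrivals)

if __name__ == "__main__":
    burst = [0.0, 0.0, 0.0, 0.0]        # four requests arriving at once
    print(simulate(burst, 1.0, 1))      # 2.5 (requests queue on one GPU)
    print(simulate(burst, 1.0, 4))      # 1.0 (no queueing with four GPUs)
```

With one GPU the burst is served back to back, so later requests wait; with four GPUs each request starts immediately, which is the contention fix D describes.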
Makes sense to avoid A since CPUs are slower; that won’t solve the bottleneck. I’d say C isn’t ideal either because increasing batch size can add latency per transaction, which they want to avoid. So it’s really about handling the GPU resource problem. D is the only option that directly deals with spreading the load, assuming multiple GPUs or instances are available. So I agree with picking D here.
Switching to CPUs (A) would likely slow things down even more, not fix delays. D is better since distributing GPU load tackles the resource bottleneck directly.
D imo, because spreading the load is the only way to fix contention directly.
It’s D again, because spreading the load across GPUs actually tackles the resource contention head-on instead of just tweaking processing details. C might improve throughput but could cause more delay per transaction.
Makes sense that D is a solid option, but I’m also thinking about C. Increasing batch size can improve GPU efficiency by processing more data at once, which might reduce overall processing time. The risk is higher latency per transaction, but if they tune batch size carefully, it could help smooth out peaks without needing extra hardware. Definitely doesn’t seem like switching to CPUs (A) or disabling monitoring (B) would solve the root problem. So, besides spreading workload with D, tweaking batch size with C might be worth testing too.
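The throughput-versus-latency trade-off for C can be sketched with some back-of-the-envelope arithmetic. This assumes a simple cost model where a batch costs a fixed overhead plus a per-item cost, and where the server waits for a batch to fill before running it. The function `batch_tradeoff` and its parameters are hypothetical, chosen only to illustrate the point.

```python
def batch_tradeoff(batch_size, arrival_rate, overhead_ms, per_item_ms):
    """Return (throughput in items/s, worst-case latency in ms) under an
    assumed linear cost model. Illustrative only, not a benchmark."""
    # Time the GPU spends on one batch.
    compute_ms = overhead_ms + per_item_ms * batch_size
    # Items completed per second while the GPU is busy.
    throughput = batch_size * 1000 / compute_ms
    # The first request in a batch waits for the other items to arrive.
    fill_wait_ms = (batch_size - 1) * 1000 / arrival_rate
    worst_latency_ms = fill_wait_ms + compute_ms
    return throughput, worst_latency_ms

if __name__ == "__main__":
    for b in (1, 8, 32):
        tput, lat = batch_tradeoff(b, arrival_rate=100,
                                   overhead_ms=8, per_item_ms=2)
        print(f"batch={b:>2}  throughput={tput:.0f}/s  worst latency={lat:.0f} ms")
```

Under these assumed numbers, larger batches raise throughput but also raise the worst-case latency per transaction, which is why C alone may not fix delays that are felt per transaction.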
A or C? CPUs would likely be slower, so A seems off. Increasing batch size (C) might help throughput but could also increase per-transaction latency, making delays worse during peak hours.
D sounds like the best fix here—spreading the load across multiple GPUs should reduce the delay without cutting performance.