Free NVIDIA NCP-AIO Actual Exam Questions - Question 10 Discussion

Question No. 10
Your Kubernetes cluster is running a mixture of AI training and inference workloads. You want to
ensure that inference services have higher priority over training jobs during peak resource usage
times.
How would you configure Kubernetes to prioritize inference workloads?
Ryan U.
2026-02-19

D/C? While D covers priority and quotas well, C could help inference scale automatically during peaks, which might be useful alongside priority settings. Just relying on replicas or namespaces doesn’t guarantee prioritization.

Ryan U.
2026-02-18

D, since PriorityClasses help ensure inference pods get scheduled before training ones.

Ryan U.
2026-02-17

Option D, since PriorityClasses explicitly prioritize pods during scheduling conflicts.

Ryan U.
2026-02-16

Probably D makes the most sense here. Using PriorityClasses lets you explicitly control which pods get scheduled first when resources are tight, and ResourceQuotas can help prevent training jobs from hogging everything. The other options don’t actually enforce priority during contention. Increasing replicas (A) won’t guarantee resource priority under load, and HPA (C) helps scale but doesn’t prevent resource starvation. Separating namespaces (B) alone won’t ensure inference workloads get priority either. So D is the only option that’s built for controlling workload priority at the Kubernetes s

Ryan U.
2026-02-13

It’s D because PriorityClasses directly control pod scheduling priority, not just scaling or namespaces. The other options don’t actually enforce priority during resource contention.

Ryan U.
2026-01-30

D imo, it’s the only one that explicitly controls priority and resource limits together.

Ryan U.
2026-01-30

It’s D for sure. PriorityClasses let you define exactly which workloads get preference when resources run low, and ResourceQuotas keep things from getting out of control. The other options don’t truly guarantee inference jobs get the resources they need during contention. Scaling or namespaces can help organize or boost capacity but won’t enforce priority like D does. So if you want inference to always come first under pressure, D is the only way to go.
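
For anyone who wants to see roughly what D looks like in practice, here is a minimal sketch in Python that builds the two kinds of manifests as plain dicts and dumps them as JSON (which `kubectl apply -f` accepts as well as YAML). The names `inference-high`, `training-low`, `training-gpu-quota`, the `training` namespace, the priority values, and the GPU cap of 8 are all made up for illustration; only the manifest field names come from the Kubernetes API.

```python
import json


def priority_class(name: str, value: int, description: str) -> dict:
    """Build a PriorityClass manifest (scheduling.k8s.io/v1).

    A higher `value` means higher scheduling priority.
    """
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,
        "globalDefault": False,
        "description": description,
    }


def gpu_quota(namespace: str, max_gpus: int) -> dict:
    """Build a ResourceQuota that caps total GPU requests in one namespace.

    Assumes GPUs are exposed as the extended resource `nvidia.com/gpu`.
    """
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "training-gpu-quota", "namespace": namespace},
        "spec": {"hard": {"requests.nvidia.com/gpu": str(max_gpus)}},
    }


# Hypothetical names and numbers, chosen only for this example.
manifests = [
    priority_class("inference-high", 100000, "Latency-sensitive inference"),
    priority_class("training-low", 1000, "Batch training jobs"),
    gpu_quota("training", 8),
]

if __name__ == "__main__":
    # Wrap in a List so the output can be applied in one go.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Pods opt in by setting `spec.priorityClassName` to one of the class names, so inference Deployments would reference `inference-high` and training Jobs `training-low` in this sketch.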

Irfan R.
2026-01-28

B/C? Increasing replicas (A) doesn't guarantee priority under resource crunch, just spreads resources thinner. HPA (C) helps inference scale based on load, so it adapts better at peak times. While PriorityClasses and ResourceQuotas (D) manage priority, they can cause delays with pod eviction, impacting latency-sensitive inference. Separating namespaces (B) might help isolate resources and apply limits more cleanly, but alone it’s not enough. Combining C and B could be a good practical approach for smoother scaling and resource isolation without sudden preemption.
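
For reference, the HPA scaling rule in the Kubernetes docs is roughly desired = ceil(current × currentMetric / targetMetric). A quick sketch of that arithmetic follows; the function and parameter names are my own, and the real controller also applies a tolerance band and stabilization behavior that are left out here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Approximate the HPA replica calculation, clamped to [min, max].

    Simplified: ignores the controller's tolerance and stabilization window.
    """
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 4 replicas at 90% average utilization with a 60% target.
    print(desired_replicas(4, 90, 60))
```

This only computes how many replicas the HPA asks for. Whether the extra pods actually get placed on a full cluster is a separate scheduling question.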

Peter E.
2026-01-23

What if inference replicas can't scale quickly enough? Isn't D still better since it ensures inference gets resources first, not just relies on scaling or namespaces?

Peter E.
2026-01-22

D, since you want clear priority and preemption, not just scaling or separation.

Arjun R.
2026-01-19

D. PriorityClasses allow actual preemption, which is better than just scaling or isolating namespaces. ResourceQuotas also help prevent training jobs from hogging resources when inference needs them more.
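
To make the preemption idea concrete, here is a toy single-node model in Python. It is not the real kube-scheduler algorithm (which also considers PodDisruptionBudgets, graceful termination, node selection, and more); the `Pod` class, the `schedule` function, and all the numbers are hypothetical and only illustrate "evict lower-priority pods until the higher-priority one fits".

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    priority: int  # higher number = more important
    gpus: int      # GPUs requested


def schedule(node_gpus: int, running: list[Pod], incoming: Pod) -> tuple[list[Pod], list[Pod]]:
    """Try to place `incoming` on a node; return (running_after, evicted).

    Toy rule: if the pod does not fit, evict strictly lower-priority pods,
    lowest priority first, until it fits. If it still cannot fit, nothing
    is evicted and the pod stays pending (it is not added to running).
    """
    free = node_gpus - sum(p.gpus for p in running)
    if incoming.gpus <= free:
        return running + [incoming], []

    victims = sorted(
        (p for p in running if p.priority < incoming.priority),
        key=lambda p: p.priority,
    )
    evicted: list[Pod] = []
    for victim in victims:
        if incoming.gpus <= free:
            break
        evicted.append(victim)
        free += victim.gpus

    if incoming.gpus > free:
        return list(running), []  # pending, no evictions

    evicted_ids = {id(v) for v in evicted}
    kept = [p for p in running if id(p) not in evicted_ids]
    return kept + [incoming], evicted


if __name__ == "__main__":
    node = [Pod("train-a", 1000, 4), Pod("train-b", 1000, 4)]
    after, evicted = schedule(8, node, Pod("infer", 100000, 2))
    print([p.name for p in after], [p.name for p in evicted])
```

In this model a training pod arriving on a node full of inference pods just stays pending, since there is nothing of lower priority for it to evict.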

Omar G.
2026-01-18

Guessing D, since PriorityClasses let you preempt lower-priority jobs effectively.

Omar G.
2026-01-18

D, because PriorityClasses allow real preemption rather than just resource limits.

Omar G.
2026-01-18

Option D makes the most sense because it directly handles workload priority instead of just scaling or separating namespaces. You can create custom PriorityClasses for inference so they preempt training jobs when resources get tight. ResourceQuotas help enforce limits so training jobs don’t hog everything. Options A and C only focus on scaling, which won’t guarantee priority. B might isolate resources but won’t ensure inference workloads get priority when contention happens. So, using both PriorityClasses and ResourceQuotas gives you a clear and enforceable way to prioritize inference jobs dur

Noah A.
2026-01-16

D imo, it’s the only option that directly controls workload priority, not just scaling or namespaces.

Noah A.
2026-01-15

D imo, setting PriorityClasses and ResourceQuotas is the straightforward way to prioritize inference workloads properly.
