Free Google Professional-Machine-Learning-Engineer Actual Exam Questions - Question 14 Discussion

Question No. 14
You recently developed a deep learning model using Keras, and now you are experimenting with
different training strategies. First, you trained the model using a single GPU, but the training process
was too slow. Next, you distributed the training across 4 GPUs using tf.distribute.MirroredStrategy
(with no other changes), but you did not observe a decrease in training time. What should you do?

Vikas J.
2026-02-21

D imo, increasing batch size usually helps utilize multiple GPUs better.

Vikas J.
2026-02-20

I’m thinking option A might be worth reconsidering. If the dataset isn’t properly sharded across the GPUs, each GPU could be doing the same work, so no speedup happens. Distributing the dataset explicitly makes sure each GPU processes a different slice. Just switching to MirroredStrategy alone doesn’t automatically handle how data is fed. Could that explain why training time stayed the same? What if they did increase batch size but didn’t shard the data? Would that still cause no improvement?

Zain N.
2026-02-18

A. The question says they used MirroredStrategy but doesn't mention distributing the dataset explicitly. Without using experimental_distribute_dataset to shard the data across GPUs, each GPU might be redundantly processing the full dataset, causing no speedup. So, distributing the dataset properly is key before worrying about batch size or TPUs.

Farhan U.
2026-02-17

Option D, increasing the batch size, makes a lot of sense here. Smaller batches per GPU can cause overhead to dominate, so bigger batches help GPUs stay busy and speed up training.

Shah Y.
2026-02-12

A. Using tf.distribute.Strategy.experimental_distribute_dataset ensures the data is properly split across GPUs; without this, the GPUs might be waiting on the same data, killing speed gains.

Shah Y.
2026-02-12

A/D? Proper dataset distribution matters, but bigger batch sizes usually help more with multi-GPU speedup.

Andre N.
2026-02-10

D imo, increasing batch size is key to better GPU utilization here.

Andre N.
2026-02-10

A/B? If the dataset isn't properly distributed, the GPUs might just be waiting around. But also, a custom training loop (B) gives more control over syncing and might help optimize performance beyond defaults.

Andre N.
2026-02-09

It’s D for me. Even if you’re using MirroredStrategy, if the batch size stays the same as single-GPU training, each GPU ends up doing less work and the overhead of syncing can kill any speedup. Increasing batch size lets you feed all GPUs properly and really speed things up. A is important too, but assuming you’ve already got the dataset distribution right, the batch size is the main bottleneck here.
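
To put numbers on this, here is a minimal sketch of the batch arithmetic. It assumes, as the TensorFlow distributed-training guide describes for Keras `Model.fit` under `MirroredStrategy`, that the dataset's batch size is treated as the global batch and split evenly across replicas. The dataset size, the batch sizes, and the helper name `batch_plan` are all made up for illustration.

```python
import math


def batch_plan(num_examples, global_batch_size, num_replicas):
    """Return (per_replica_batch, steps_per_epoch) for synchronous data parallelism.

    Hypothetical helper: it only does the arithmetic, it does not call TensorFlow.
    """
    # Each replica gets an equal share of every global batch.
    per_replica_batch = global_batch_size // num_replicas
    # One training step consumes one global batch.
    steps_per_epoch = math.ceil(num_examples / global_batch_size)
    return per_replica_batch, steps_per_epoch


NUM_EXAMPLES = 51_200  # hypothetical dataset size

single_gpu = batch_plan(NUM_EXAMPLES, 64, 1)           # original single-GPU run
four_gpu_same = batch_plan(NUM_EXAMPLES, 64, 4)        # 4 GPUs, batch size unchanged
four_gpu_scaled = batch_plan(NUM_EXAMPLES, 64 * 4, 4)  # 4 GPUs, batch size scaled by 4

print("1 GPU, batch 64:   ", single_gpu)
print("4 GPUs, batch 64:  ", four_gpu_same)
print("4 GPUs, batch 256: ", four_gpu_scaled)
```

With the batch size unchanged, the number of steps per epoch stays the same while each GPU processes only a quarter of the examples per step and gradients still have to be synchronized every step. Scaling the global batch by the number of GPUs keeps each GPU as busy as in the single-GPU run and cuts the step count by four.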

Rayan D.
2026-01-28

A/D? Distributing the dataset properly (A) is crucial so each GPU gets its own data chunk. Once that’s set, bumping batch size (D) can improve GPU utilization and speed up training further.

Marco R.
2026-01-28

A, without distributing the dataset, each GPU does the same work, so no speedup.

Kevin Z.
2026-01-25

A, the dataset might not be split correctly, so GPUs aren't utilized fully.

Ahmed T.
2026-01-22

Not distributing the dataset properly could mean all GPUs are doing the same work, so no speedup. But increasing batch size (D) might also be necessary to keep GPUs busy enough. Could it be that both A and D matter here?

Adeel E.
2026-01-20

Maybe D here. Increasing batch size often helps better utilize multiple GPUs since small batches can bottleneck parallelism, so training won’t speed up without a bigger batch.

Adeel E.
2026-01-19

Option A makes sense since distributing the dataset ensures each GPU gets unique data batches.

David V.
2026-01-19

B imo, creating a custom training loop lets you control exactly how data and gradients flow across GPUs. Just switching strategy without adjusting training code might not speed things up.

Peter E.
2026-01-17

It’s A for me. Without explicitly distributing the dataset, the MirroredStrategy won't automatically split the data across GPUs, so each GPU could still be processing the full dataset redundantly. That would explain why training time didn't improve. Increasing batch size (D) helps but won’t fix the root problem if the data isn’t properly distributed. Setting up experimental_distribute_dataset ensures each GPU gets its own slice, which should speed things up without needing to change your batch size or switch hardware.
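
One caveat worth checking against the TensorFlow distributed-training guide: as far as I know, when a `tf.data.Dataset` is passed to Keras `Model.fit` inside a `MirroredStrategy` scope, each global batch is split across the replicas automatically, so the GPUs should not be seeing duplicate data, and `experimental_distribute_dataset` is mainly needed when writing a custom training loop. A pure-Python sketch of that kind of splitting follows; it is a simulation with a hypothetical helper, not TensorFlow's actual implementation.

```python
def split_global_batch(global_batch, num_replicas):
    """Split one global batch into contiguous, non-overlapping per-replica slices.

    Hypothetical helper for illustration; assumes the batch size is
    divisible by the number of replicas.
    """
    per_replica = len(global_batch) // num_replicas
    return [
        global_batch[i * per_replica:(i + 1) * per_replica]
        for i in range(num_replicas)
    ]


# A toy global batch of 8 example IDs, split across 4 simulated GPUs.
global_batch = list(range(8))
shards = split_global_batch(global_batch, 4)

for replica_id, shard in enumerate(shards):
    print(f"replica {replica_id}: {shard}")
```

Every example lands on exactly one replica, so if the global batch is left at its single-GPU size, each replica simply receives a smaller slice per step rather than a redundant copy.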

Irfan C.
2026-01-15

A/D? If the dataset isn’t properly distributed, the GPUs won’t be utilized fully. Also, increasing batch size (D) often helps make multi-GPU training more efficient. Either could fix the issue.

Irfan C.
2026-01-15

Sounds like the batch size might be too small, so D.
