Free Google Professional-Machine-Learning-Engineer Actual Exam Questions - Question 5 Discussion
should you adjust your model to ensure that it converges?
A/D? Increasing batch size (A) can reduce gradient noise, which smooths out loss oscillations, while decreasing learning rate (D) helps prevent overshooting. Both can help, but larger batches give steadier gradients.
If the loss is bouncing around, it usually points to the learning rate being too high rather than a batch size issue. Lowering the learning rate (D) lets the model take smaller, more controlled steps, which helps it settle down. Increasing batch size (A) can reduce noise but might not fix oscillations caused by a step size that is too large. So maybe focusing on tuning the learning rate makes more sense here? Could the oscillations also come from extremely noisy gradients rather than just a poorly scaled learning rate?
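To make the step-size argument concrete, here is a minimal sketch in plain Python. It assumes a toy one-dimensional loss f(w) = 0.5 * curvature * w**2 (not anything from the exam question), and the names `gradient_descent` and `sign_flips` are made up for illustration. On this toy loss each update multiplies w by (1 - lr * curvature), so the iterate hops across the minimum once lr exceeds 1 / curvature and blows up once lr exceeds 2 / curvature.

```python
def gradient_descent(lr, steps=20, w0=1.0, curvature=1.0):
    """Plain gradient descent on the toy loss f(w) = 0.5 * curvature * w**2."""
    w = w0
    path = [w]
    for _ in range(steps):
        grad = curvature * w   # derivative of the toy quadratic loss
        w = w - lr * grad      # standard gradient descent update
        path.append(w)
    return path


def sign_flips(path):
    """Count how often consecutive iterates land on opposite sides of the minimum at w = 0."""
    return sum(1 for a, b in zip(path, path[1:]) if a * b < 0)


steady = gradient_descent(lr=0.5)        # approaches the minimum from one side
overshooting = gradient_descent(lr=1.9)  # jumps across the minimum on every step
diverging = gradient_descent(lr=2.1)     # every jump lands further away than the last

for name, path in [("lr=0.5", steady), ("lr=1.9", overshooting), ("lr=2.1", diverging)]:
    print(name, "sign flips:", sign_flips(path), "final |w|:", abs(path[-1]))
```

A real network is not a one-dimensional quadratic, so treat this only as intuition for why a smaller learning rate (D) calms the bouncing.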
Adeel T.: A imo, increasing batch size smooths gradients, helping reduce noise and oscillations.
D, oscillation usually means the step size is too big, so lower the learning rate.
A/D? Bigger batch sizes tend to give more stable gradient estimates, which can reduce loss oscillations. But if the learning rate is already high, no matter the batch size, the model can overshoot and oscillate. So either increasing batch size (A) or decreasing learning rate (D) could help depending on what's causing the oscillations. Since the question doesn’t specify current values, I’d say both are valid adjustments to consider to improve convergence.
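The "bigger batches give more stable gradient estimates" part can be checked with a rough simulation. This sketch assumes each per-example gradient is a true gradient plus independent Gaussian noise, which is a simplification; `minibatch_gradient_std` and all of its parameters are hypothetical names for illustration. Under that assumption the spread of the mini-batch average should shrink roughly like sigma / sqrt(batch_size).

```python
import random
import statistics


def minibatch_gradient_std(batch_size, trials=500, true_grad=1.0, sigma=2.0, seed=0):
    """Estimate how much a mini-batch gradient average varies from batch to batch."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    estimates = []
    for _ in range(trials):
        # Assumed noise model: per-example gradient = true gradient + Gaussian noise.
        batch = [true_grad + rng.gauss(0.0, sigma) for _ in range(batch_size)]
        estimates.append(sum(batch) / batch_size)
    return statistics.pstdev(estimates)


std_small_batch = minibatch_gradient_std(batch_size=16)
std_large_batch = minibatch_gradient_std(batch_size=256)

print("std with batch size 16: ", std_small_batch)
print("std with batch size 256:", std_large_batch)
```

Going from 16 to 256 examples per batch is a 16x increase, so the noise should drop by about a factor of 4. That supports (A) when the oscillation comes from noisy gradients, but it says nothing about a step size that is too large.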
A/D? Bigger batches usually give smoother gradient estimates, which can reduce oscillations. But if the learning rate is too high, lowering it (D) is still a good move to help convergence.
It’s D. Oscillations usually mean the learning rate’s too high, causing the model to overshoot minima. Lowering it helps the loss settle down more smoothly.
A/D? Increasing batch size can reduce noise causing oscillations, but lowering learning rate directly smooths updates. Both help, but lowering learning rate feels more straightforward to stop the loss bouncing around.
D imo, lowering the learning rate makes the weight updates less drastic, which usually calms down those oscillations better than just changing batch size.
B imo, smaller batches introduce more noise but can sometimes help escape sharp oscillation patterns by adding randomness. Plus, decreasing batch size is often easier than tuning the learning rate.
A imo, bigger batches give more stable gradients, which can reduce those annoying swings.
A vs D? Increasing batch size (A) can reduce variance in gradient estimates, which might help stabilize loss oscillations. D also sounds right, but bigger batch sizes usually mean smoother updates.
It’s D. A high learning rate often causes the loss to oscillate, so lowering it helps the model settle into minima more smoothly without jumping around too much.
This happens when the learning rate is too high, causing the loss to jump around rather than settle down. So, D makes sense - decreasing the learning rate should help convergence. Increasing batch size (A) might smooth gradients but won't fix oscillation caused by a large step size.
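The last point, that a bigger batch will not rescue a step size that is too large, can be sketched by combining the two effects. This again assumes the toy loss f(w) = 0.5 * w**2 and models the mini-batch gradient as the true gradient plus Gaussian noise with standard deviation sigma / sqrt(batch_size); `final_distance` and its parameters are hypothetical and only for illustration.

```python
import random


def final_distance(lr, batch_size, steps=100, sigma=1.0, w0=1.0, seed=0):
    """Run noisy SGD on the toy loss f(w) = 0.5 * w**2 and return |w| at the end."""
    rng = random.Random(seed)
    w = w0
    for _ in range(steps):
        # Assumed noise model: averaging batch_size examples scales the noise by 1/sqrt(batch_size).
        noise = rng.gauss(0.0, sigma / batch_size ** 0.5)
        grad_estimate = w + noise  # true gradient of the toy loss is w
        w -= lr * grad_estimate
    return abs(w)


# Nearly noise-free gradients, but the step size is above the stability limit of 2.
huge_batch_high_lr = final_distance(lr=2.1, batch_size=10_000)
# Much noisier gradients, but a small step size.
small_batch_low_lr = final_distance(lr=0.1, batch_size=16)

print("batch 10000, lr 2.1 -> final |w|:", huge_batch_high_lr)
print("batch 16,    lr 0.1 -> final |w|:", small_batch_low_lr)
```

In this toy setup the huge batch with the high learning rate ends far from the minimum, while the small batch with the low learning rate stays close to it, which matches the reasoning for D.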