Free Google Professional Data Engineer Actual Exam Questions - Question 8 Discussion

Question No. 8
Your company is migrating their 30-node Apache Hadoop cluster to the cloud. They want to re-use
Hadoop jobs they have already created and minimize the management of the cluster as much as
possible. They also want to be able to persist data beyond the life of the cluster. What should you do?
Select one option, then reveal solution.
US
RI
Rayan I.
2026-02-17

What about C? Setting up Hadoop on Compute Engine with persistent disks keeps the environment exactly the same, so no job changes needed. But it might mean more cluster management compared to Dataproc options.

0
OP
Osama P.
2026-02-16

It’s D. Using Cloud Storage with Dataproc means data sticks around even if the cluster’s deleted, plus it cuts down on management since you’re not handling disks directly like in B or C.

0
OP
Osama P.
2026-02-12

B imo. Using Dataproc with persistent disks basically keeps the familiar HDFS environment, so your existing jobs won’t break due to storage changes. It also reduces cluster management since Dataproc handles most of it. D could work if your jobs don’t depend on HDFS specifics, but persistent disks give you a safer bet for compatibility and data persistence beyond cluster life. Running Hadoop on Compute Engine (C or E) means more overhead managing the cluster yourself, which goes against the goal of minimizing management.

0
OP
Osama P.
2026-01-21

Maybe D, since Cloud Dataproc with Cloud Storage lets you keep your data separate from the cluster, so you don’t lose anything when the cluster shuts down. Plus, it supports reusing existing Hadoop jobs easily.

0
OP
Osama P.
2026-01-19

B imo, since using persistent disks for HDFS on a Dataproc cluster means you can keep the data stored even when the cluster is gone. It’s still managed by Dataproc, so less overhead than running your own Hadoop cluster on Compute Engine. Plus, you can reuse existing Hadoop jobs pretty easily without rewriting them for Dataflow or anything else. D is good, but relying solely on Cloud Storage might cause some minor compatibility issues with certain Hadoop jobs expecting HDFS semantics. Persistent disks feel like a safer middle ground here.

0
MF
Mason F.
2026-01-16

It’s D because Cloud Dataproc handles most cluster management automatically, and using the Cloud Storage connector means data stays safe beyond cluster shutdowns. Other options either require more manual management or don’t persist data well.

0
BA
Bilal A.
2026-01-15

It’s D since Cloud Storage persistence is key, and A misses Hadoop reuse.

0