Free Google Professional Data Engineer Actual Exam Questions
Dumps Box (DumpsBox) offers up-to-date practice exam questions for the Professional-Data-Engineer certification exam, developed and validated by Google subject domain experts certified in Google Professional Data Engineer. These practice questions are updated regularly: we keep an eye on any recent changes to the Professional-Data-Engineer syllabus, and when there is an update our team quickly adjusts the questions. This commitment to providing the best quality exam prep material to certification aspirants is what makes DumpsBox.com the best certification exam prep website. On top of that, our strong yet strictly moderated community feedback keeps the content clean and current. Each question has a community discussion that adds perspective and points to helpful resources for better exam preparation. This also saves students from outdated practice questions or illicit exam dumps that can have adverse effects on a career. Browse through our Google Professional Data Engineer exam questions and pass your exam on the first try.
directed acyclic graph (DAG) relies on a third-party service. You want to be notified when the task
does not succeed. What should you do?
D vs A? Submitting a job feels like more than just viewing since it triggers an action. Viewers usually don’t have that kind of permission. Listing jobs is more like reading info already there, which fits a viewer’s role better. So I’d say D makes more sense from a permissions standpoint.
D seems right because listing jobs doesn’t modify anything, fitting a viewer’s read-only role. The others require more control, so they’re likely off-limits for viewers.
conduct data transformations at scale, but your pipelines are taking over twelve hours to run. To
expedite development and pipeline run time, you want to use a serverless tool and SQL syntax. You
have already moved your raw data into Cloud Storage. How should you build the pipeline on Google
Cloud while meeting speed and processing requirements?
D, new subscription avoids compatibility issues and keeps data safe.
A/C? Draining the old pipeline (A) ensures no data loss by finishing all work, but creating a new pipeline on the same subscription (C) also avoids losing messages since Pub/Sub keeps them until acknowledged.
data from BigQuery. The reference data is small enough to fit in memory on a single worker. The
pipeline should write enriched results to BigQuery for analysis. Which job type and transforms
should this pipeline use?
Not B, autoscaling usually manages pods, not the underlying specialized nodes with GPUs and local SSDs, so it might not guarantee the exact hardware specs needed for each node.
It’s C. Using Cloud Build with Terraform lets you manage both infrastructure and deployment as code, which fits the need to have GPUs, local SSDs, and specific bandwidth guaranteed on the nodes. This approach also ensures you always launch containers with the latest configurations since Cloud Build can trigger builds and deployments automatically. Options A and B don’t clearly handle the infrastructure setup for GPUs and local SSDs as well, while D is off because Dataflow isn’t designed for managing GKE clusters or container deployment. So C covers everything more cleanly.
existing initialization action. Company security policies require that Cloud Dataproc nodes do not
have access to the Internet so public initialization actions cannot fetch resources. What should you
do?
A/B? I’m thinking B adds unnecessary storage and update costs for something that can be done on the fly with a view, so A feels more cost-effective unless the app just can’t use views at all.
Adding a FullName column like in B feels unnecessary since it duplicates data and increases storage costs. A view in A keeps data normalized and avoids extra storage charges.
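The storage argument in the two comments above can be sketched as follows. This is a minimal illustration only: it uses SQLite from the Python standard library as a stand-in for BigQuery, and the table name `users`, the view name `users_with_full_name`, and the column names are hypothetical, since the question text is not shown here. The point is that a view computes the concatenated name at query time instead of storing a second copy of the data.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a BigQuery dataset (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical base table holding the normalized name columns.
conn.execute("CREATE TABLE users (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("Ada", "Lovelace"), ("Alan", "Turing")],
)

# The view derives full_name on the fly; nothing extra is stored.
conn.execute(
    """
    CREATE VIEW users_with_full_name AS
    SELECT first_name,
           last_name,
           first_name || ' ' || last_name AS full_name
    FROM users
    """
)

full_names = [
    row[0]
    for row in conn.execute(
        "SELECT full_name FROM users_with_full_name ORDER BY last_name"
    )
]
print(full_names)
```

Whether a view is acceptable still depends on the caveat raised in the first comment: the consuming application has to be able to query views.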
time. This is then loaded into BigQuery. Analysts in your company want to query the tracking data in
BigQuery to analyze geospatial trends in the lifecycle of a package. The table was originally created
with ingest-date partitioning. Over time, the query processing time has increased. You need to
implement a change that would improve query performance in BigQuery. What should you do?
C. This makes sense because giving viewer access on the shared dataset plus editor rights only on each analyst’s own dataset keeps their workspaces private and the shared data read-only.
Option C sounds right; individual datasets keep their tables private and shared one is read-only.
the same dataset. You need to keep the costs of data sharing low and ensure that the data is current.
Which solution should you choose?
Option B seems off because Bigtable doesn’t support native JDBC drivers, so that contradicts the question requirements. Option C is similar but also includes Bigtable, which complicates things with JDBC. Between A and D, A’s Cloud Spanner offers global scaling natively, which fits the long-term goal better. But for cost optimization early on, D’s zonal Cloud SQL is cheaper and easier to manage. Since the question prioritizes cost first and global presence after funding, D matches the phased approach without adding complexity that JDBC might not handle well.
It’s A because Cloud Spanner natively supports JDBC and scales globally after funding, unlike Cloud SQL which may require more complex migration or doesn't scale as smoothly across regions.
A. Seek with a timestamp is the only straightforward way to rewind and reprocess messages from exactly two days ago. E also works since Snapshots let you restore a consistent state from that time.
Option A makes sense since Seek rewinds messages by timestamp; E captures state to replay.
What about C? Setting up Hadoop on Compute Engine with persistent disks keeps the environment exactly the same, so no job changes needed. But it might mean more cluster management compared to Dataproc options.
It’s D. Using Cloud Storage with Dataproc means data sticks around even if the cluster’s deleted, plus it cuts down on management since you’re not handling disks directly like in B or C.
An organization maintains a Google BigQuery dataset that contains tables with user-level data. They want to expose aggregates of this data to other Google Cloud projects, while still controlling access to the user-level data. Additionally, they need to minimize their overall storage cost and ensure the analysis cost for other projects is assigned to those projects. What should they do?
It’s A, Pig simplifies coding and optimizes MapReduce without extra cluster costs.
Maybe A. Pig scripts are generally simpler and can optimize MapReduce jobs without extra hardware or cluster changes, so it might improve speed without cost hikes.
serve the trading application, you need to access only the most recent stock prices that are streaming
in. How should you design your row key and tables to ensure that you can access the data with the
most simple query?
This one feels like B for me too. Creating a small set of charts with filters makes sense since you can avoid the explosion of visuals in A or C and skip the heavy lifting of building a custom app in D. Plus, filters let users drill down on geography or date range without needing new visuals each month. As long as the data source supports quick queries, this should keep load times manageable and reports fresh without extra maintenance.
Not C, spreadsheets won’t scale well with 50,000 installations and frequent updates. B sounds better since filters keep it dynamic without overwhelming the system.
need to define a storage, backup, and recovery strategy of this data that minimizes cost. How should
you configure the BigQuery table?
Maybe A, since linear regression is simple and uses minimal resources, making it ideal for a single limited VM. Neural nets would likely be overkill for just predicting prices here.
It’s A. Since it’s a regression problem and we want something lightweight, linear regression fits best. Neural networks (C and D) are too heavy, and logistic (B) is for classification, not prices.
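To make the "lightweight" argument concrete, here is a minimal sketch of one-feature linear regression fitted with the closed-form least-squares solution, using only the Python standard library. The feature (`sizes`), the target (`prices`), and the numbers themselves are invented for illustration and do not come from the question.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: price = 2 * size + 50, with no noise.
sizes = [50.0, 80.0, 110.0, 140.0]
prices = [150.0, 210.0, 270.0, 330.0]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 100.0 + intercept  # predicted price for size 100
print(slope, intercept, predicted)
```

Training here is a single pass over the data with a handful of arithmetic operations, which is why the commenters consider it a good fit for a single resource-limited VM.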
need to be identified. You want to cleanse the data in near-real time before running it through AI
models. What should you do?
C. Keeping data in Avro format on Cloud Storage works well since it’s accessible by both Spark and BigQuery without needing extra infrastructure like HDFS. It’s simpler than managing Dataproc storage.
Option C makes sense because storing data as Avro in Cloud Storage lets both BigQuery and Spark access it without depending on a specific cluster. It’s more flexible than tying data to HDFS on Dataproc.
properties option. The format for the option is: file_prefix:property=_____.
B. Format-preserving encryption keeps the email structure so joining still works, unlike masking options that break the join key. This fits the need to hide PII but preserve joinability.
B. Using format-preserving encryption keeps the email format intact so joining works, unlike masking which breaks join keys. Just need to handle key security carefully.
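The joinability point can be illustrated with a small sketch. Note that this is not format-preserving encryption: it swaps in a keyed hash (HMAC-SHA256) from the Python standard library, which is deterministic like FPE but does not keep the email format and cannot be reversed. It only shows why a deterministic, keyed transformation preserves join keys while masking does not. The key, the `mask` helper, and the sample records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for the demo; a real key would live in a key management service.
SECRET_KEY = b"demo-key-not-for-production"


def pseudonymize(email):
    """Deterministic keyed token: the same email always yields the same token."""
    return hmac.new(SECRET_KEY, email.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def mask(email):
    """Simple masking: keep the first character and the domain."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain


# Two hypothetical tables keyed by email.
orders = [("alice@example.com", "order-1"), ("alex@example.com", "order-2")]
tickets = [("alice@example.com", "ticket-9")]

# Join on the token instead of the raw email.
order_by_token = {pseudonymize(email): order for email, order in orders}
joined = [
    (order_by_token[pseudonymize(email)], ticket)
    for email, ticket in tickets
    if pseudonymize(email) in order_by_token
]

# Masking maps two different customers to the same value, so it cannot serve as a join key.
masked_collision = mask("alice@example.com") == mask("alex@example.com")
print(joined, masked_collision)
```

As the comment says, the scheme is only as safe as the key: anyone holding it can recompute tokens for known email addresses.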
names and addresses. You need to share the customer data with your data analytics and consumer
support teams securely. The data analytics team needs to access the data of all the customers, but
must not be able to access the sensitive data. The consumer support team needs access to all data
columns, but must not be able to access customers that no longer have active contracts. You
enforced these requirements by using an authorized dataset and policy tags. After implementing
these steps, the data analytics team reports that they still have access to the sensitive columns.
You need to ensure that the data analytics team does not have access to restricted data. What
should you do?
Choose 2 answers
D. Exporting logs with an aggregated sink to one project makes it simpler to control access strictly for audit personnel and ensures compliance uniformly across all projects.
Makes sense to go with D since aggregated export sinks collect logs from all projects in one place, making it easier to control access and meet retention rules. D.
D sounds simplest and least time-consuming compared to scripting or Dataflow.
D imo, using the RDB backup and gsutil is straightforward and aligns with common Redis migration practices. It avoids the complexity of writing custom scripts or jobs like in B or C.