Free Google Professional Data Engineer Actual Exam Questions - Question 9 Discussion
was previously. You manage the daily batch MapReduce analytics jobs in Apache Hadoop. However,
the recent increase in data has meant the batch jobs are falling behind. You were asked to
recommend ways the development team could increase the responsiveness of the analytics without
increasing costs. What should you recommend they do?
It’s A. Pig simplifies coding and optimizes MapReduce jobs without extra cluster costs.
Maybe A. Pig scripts are generally simpler and can optimize MapReduce jobs without extra hardware or cluster changes, so it might improve speed without cost hikes.
It’s B, since Spark speeds up processing a lot without needing more hardware.
It’s B because Spark handles large-scale data faster by keeping data in memory, which cuts down processing time without needing more servers or cluster expansion.
B, since Spark can run on the same cluster and usually outperforms MapReduce.
It’s B; Spark reduces I/O overhead without adding hardware costs.
It’s B because Spark’s in-memory processing is much faster for iterative tasks than classic MapReduce, which writes intermediate results to disk between jobs. Spark can also run on the existing cluster, so there are no extra hardware costs. Option A doesn’t help much: Pig is a scripting layer that compiles down to MapReduce jobs, so the execution engine stays the same. C means more hardware, which breaks the no-cost-increase rule. D makes no sense, since shrinking the cluster usually slows things down whether or not the jobs are rewritten in Hive. So B hits the sweet spot between speed and cost.
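To make the disk-versus-memory point concrete, here is a toy cost model in plain Python. It is not Spark or Hadoop code: the function names `mapreduce_style` and `in_memory_style` and the idea of counting "disk passes" are assumptions made up for illustration, and a real cluster has many other costs (shuffles, scheduling, memory pressure) that this sketch ignores.

```python
# Toy cost model, NOT a Spark or Hadoop API.
# It counts simulated disk passes for an iterative job run two ways.

def mapreduce_style(data, step, iterations):
    """Chain of separate jobs: every iteration reads its input from
    'disk' and writes its output back to 'disk'."""
    stored = list(data)           # stands in for a dataset on disk
    disk_passes = 0
    for _ in range(iterations):
        current = list(stored)    # read the previous job's output
        disk_passes += 1
        current = [step(x) for x in current]
        stored = current          # write this job's output
        disk_passes += 1
    return stored, disk_passes


def in_memory_style(data, step, iterations):
    """Single job: read once, keep intermediate results in memory,
    write once at the end."""
    current = list(data)          # one initial read
    disk_passes = 1
    for _ in range(iterations):
        current = [step(x) for x in current]   # stays in memory
    disk_passes += 1              # one final write
    return current, disk_passes


if __name__ == "__main__":
    double = lambda x: x * 2
    mr_result, mr_passes = mapreduce_style([1, 2, 3], double, 5)
    mem_result, mem_passes = in_memory_style([1, 2, 3], double, 5)
    print("same result:", mr_result == mem_result)
    print("MapReduce-style disk passes:", mr_passes)
    print("in-memory-style disk passes:", mem_passes)
```

In this model the chained-job version touches disk twice per iteration, while the in-memory version touches it twice in total, so the gap widens as the number of iterations grows. That is the intuition behind B, under the assumption that the working set fits in the cluster's existing memory.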
B. Spark handles big data faster than MapReduce without costing more.