Free Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Actual Exam Questions - Question 3 Discussion
44 of 55. A data engineer is working on a real-time analytics pipeline using Spark Structured Streaming. They want the system to process incoming data in micro-batches at a fixed interval of 5 seconds. Which code snippet fulfills this requirement?
A. Spark’s processingTime trigger is designed for exactly this: fixed-interval micro-batches, so A makes the most sense. B’s continuous trigger selects a different execution mode (continuous processing), not micro-batches at a fixed interval.
A/D? D lacks a trigger, so it won’t guarantee 5-second intervals. A explicitly sets micro-batch timing with processingTime, which fits the 5-second fixed interval best.
Option B’s continuous trigger isn’t what the question asks for, since that selects the low-latency continuous processing mode, not fixed micro-batches. Option C runs the query just once, not repeatedly every 5 seconds, so it doesn’t fit either. D just starts streaming immediately without a fixed trigger interval, so it won’t process data in consistent 5-second chunks. That really leaves A as the only one that schedules micro-batches every 5 seconds using processingTime.
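For reference, the four trigger styles discussed in this thread could be sketched in PySpark roughly like this. The original answer snippets aren't reproduced here, so the mode names, the start_query helper, the console sink, the rate source, and the 1-second checkpoint interval for continuous mode are all illustrative assumptions, not the exam's actual code.

```python
# Keyword arguments for DataStreamWriter.trigger(), keyed by hypothetical mode names.
TRIGGER_KWARGS = {
    "fixed_interval": {"processingTime": "5 seconds"},  # micro-batch every 5 s
    "continuous": {"continuous": "1 second"},  # continuous mode; value is an assumed checkpoint interval
    "once": {"once": True},  # process one batch, then stop
    "default": {},  # no trigger() call at all
}


def start_query(df, mode):
    """Start a console-sink streaming query on df using the chosen trigger style."""
    writer = df.writeStream.format("console")
    kwargs = TRIGGER_KWARGS[mode]
    if kwargs:  # the default case never calls trigger()
        writer = writer.trigger(**kwargs)
    return writer.start()


def demo():
    """Hypothetical driver; needs pyspark installed and is not called automatically."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("trigger-demo").getOrCreate()
    # The built-in rate source generates test rows, so no external input is needed.
    stream = spark.readStream.format("rate").option("rowsPerSecond", 1).load()
    query = start_query(stream, "fixed_interval")
    query.awaitTermination(20)  # wait up to 20 s so a few batches can print
    query.stop()
    spark.stop()
```

Calling demo() on a machine with Spark available should print a new console batch roughly every 5 seconds; swapping "fixed_interval" for one of the other keys shows how the other trigger styles behave.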
A/D? D just starts streaming without any trigger interval, so it won't enforce the 5-second micro-batch timing. A explicitly uses processingTime, which is designed for micro-batch intervals, making it more suitable. B is continuous mode, which processes records as they arrive rather than in fixed-interval micro-batches, so it’s out. C runs only once and stops, so it doesn’t fit the requirement for ongoing real-time processing. So between A and D, A is definitely the better fit for fixed 5-second micro-batches.
Makes sense to rule out B and C since continuous is not micro-batch and once=True obviously only runs once. D just starts without setting the trigger interval, so it won’t guarantee the 5-second batch. A fits because trigger(processingTime="5 seconds") explicitly sets that fixed micro-batch interval. So I’d go with A for this one.
It’s A because B uses continuous mode, which isn't micro-batch style. C only runs once, and D has no fixed interval, so A is the clear choice here.
A/D? D runs micro-batches with the default trigger, which starts each batch as soon as the previous one finishes rather than on a fixed 5-second schedule. A explicitly sets the 5-second interval, so it fits better for fixed micro-batch processing here.
A/D? D doesn’t specify any trigger, so it defaults to micro-batch mode, but with no fixed interval: each batch starts as soon as the previous one finishes, not every 5 seconds. So that leaves A, which explicitly sets processingTime to 5 seconds, fitting the fixed interval micro-batch requirement. B is out because continuous mode is streaming without batches, and C is a one-time batch, not an ongoing stream. So A seems to be the only one that matches the fixed micro-batch timing exactly.
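To illustrate the difference between leaving the trigger unset and using a 5-second processingTime trigger, here is a small pure-Python model of when micro-batches start. It is a simplified sketch of the behavior described in the Structured Streaming guide, not Spark's actual scheduler; the function name and the batch durations are made up, and it assumes new data is always available.

```python
def batch_start_times(durations, interval=0):
    """Return the start time (in seconds) of each micro-batch.

    durations: how long each batch takes to process, in seconds.
    interval:  0 models the default trigger (start as soon as possible);
               a positive value models trigger(processingTime=...).
    """
    starts = []
    now = 0
    for duration in durations:
        starts.append(now)
        finish = now + duration
        if interval > 0:
            # Wait for the next interval boundary after this batch started,
            # unless the batch overran it, in which case start right away.
            next_boundary = (now // interval + 1) * interval
            now = max(finish, next_boundary)
        else:
            # Default trigger: the next batch begins when this one ends.
            now = finish
    return starts


print(batch_start_times([2, 7, 1, 3], interval=5))  # [0, 5, 12, 15]
print(batch_start_times([2, 7, 1, 3]))              # [0, 2, 9, 10]
```

With the 5-second interval, short batches wait for the next boundary, while the 7-second batch overruns and the following batch starts immediately. Without an interval, batch timing depends entirely on how long each batch takes.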
Makes sense, continuous trigger (B) isn't for micro-batches. So definitely A.
I think the answer is A. Using trigger(processingTime="5 seconds") sets the micro-batch interval to 5 seconds, which fits the requirement for fixed interval processing. The others use a continuous trigger, a once trigger, or no trigger at all.