
Free Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Actual Exam Questions - Question 6 Discussion

Question No. 6

A data engineer writes the following code to join two DataFrames df1 and df2:
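from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df1 = spark.read.csv("sales_data.csv", header=True)  # ~10 GB; header=True so product_id is a named column
df2 = spark.read.csv("product_data.csv", header=True)  # ~8 MB
result = df1.join(df2, df1.product_id == df2.product_id)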

Which join strategy will Spark use?


Bilal A.

2026-02-16

A, IMO: no AQE means a static plan, so no broadcast despite df2's small size.
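Whether a broadcast needs AQE is the crux of this thread. Below is a toy Python model, not Spark's source (fits and plan_join are hypothetical names), of the documented behavior for an inner equi-join without hints: at planning time Spark compares its size estimate for a join side against spark.sql.autoBroadcastJoinThreshold, and when AQE is enabled (spark.sql.adaptive.enabled, true by default since Spark 3.2) it can repeat that check on runtime statistics and convert a planned sort-merge join into a broadcast hash join. The model simplifies by using one threshold for both checks; real Spark also has spark.sql.adaptive.autoBroadcastJoinThreshold, which falls back to the static setting when unset.

```python
MB = 1024 * 1024
DEFAULT_THRESHOLD = 10 * MB  # default of spark.sql.autoBroadcastJoinThreshold


def fits(size_bytes: int, threshold: int) -> bool:
    """Size check; a negative threshold (e.g. -1) disables auto-broadcast."""
    return threshold >= 0 and 0 <= size_bytes <= threshold


def plan_join(static_small: int, runtime_small: int,
              aqe_enabled: bool, threshold: int = DEFAULT_THRESHOLD) -> str:
    """Toy model of strategy selection for an inner equi-join with no hints.

    static_small:  plan-time size estimate of the smaller side, in bytes
    runtime_small: size of that side observed at runtime (used only by AQE)
    """
    if fits(static_small, threshold):
        # Decided during static planning; AQE plays no part here.
        return "BroadcastHashJoin"
    if aqe_enabled and fits(runtime_small, threshold):
        # AQE re-optimizes after that side's stage finishes and sees its real size.
        return "BroadcastHashJoin (converted by AQE)"
    return "SortMergeJoin"


# df2 estimated at ~8 MB: broadcast is chosen with or without AQE.
print(plan_join(8 * MB, 8 * MB, aqe_enabled=False))   # BroadcastHashJoin
# Estimate too large (e.g. a filter the planner cannot size well),
# but only 2 MB survive at runtime: only AQE can switch to broadcast.
print(plan_join(50 * MB, 2 * MB, aqe_enabled=False))  # SortMergeJoin
print(plan_join(50 * MB, 2 * MB, aqe_enabled=True))   # BroadcastHashJoin (converted by AQE)
```

In an actual Spark session, result.explain() prints the physical plan, where the join node shows up as BroadcastHashJoin or SortMergeJoin.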

Bilal A.

2026-02-15

B. Since df2 is just 8 MB, it's well under the broadcast threshold (spark.sql.autoBroadcastJoinThreshold, 10 MB by default), so Spark should automatically pick a broadcast join here without needing hints or AQE. The size of df1 matters less than the fact that the small side table is eligible for broadcast.
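The threshold arithmetic, as a short sketch: the default of spark.sql.autoBroadcastJoinThreshold is 10 MB (10485760 bytes), and Spark auto-broadcasts a side whose estimated size is at or below it. The question gives df2 only as "~8 MB", so the byte counts below are assumptions covering both decimal and binary megabytes.

```python
# Default value of spark.sql.autoBroadcastJoinThreshold: 10 MB, expressed in bytes.
threshold_bytes = 10 * 1024 * 1024
print(threshold_bytes)  # 10485760

# "~8 MB" is ambiguous; check both readings (assumed sizes, not given exactly in the question).
readings = {"8 MB (10**6 bytes each)": 8 * 10**6, "8 MiB (2**20 bytes each)": 8 * 2**20}
for label, df2_bytes in readings.items():
    headroom_mib = (threshold_bytes - df2_bytes) / 2**20
    print(f"{label}: {df2_bytes} bytes, under threshold: {df2_bytes <= threshold_bytes}, "
          f"headroom: {headroom_mib:.2f} MiB")
```

Both readings clear the default with roughly 2 MB to spare. The comparison is made against Spark's size estimate for that side of the plan, not a measured in-memory size, so the on-disk figure is only a proxy.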

Bilal A.

2026-02-15

D, because without explicit hints, Spark won’t necessarily broadcast even if the table is small.
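On hints: a hint is not required for a broadcast, but when one is present it takes precedence over the size check. A toy model of that precedence (choose_with_hint is a hypothetical helper; inner equi-join only; it ignores Spark's hard cap on broadcast size). In PySpark the hints themselves are written as df1.join(broadcast(df2), ...) using pyspark.sql.functions.broadcast, or df2.hint("merge") to request a sort-merge join.

```python
from typing import Optional

MB = 1024 * 1024
THRESHOLD = 10 * MB  # default spark.sql.autoBroadcastJoinThreshold


def choose_with_hint(small_side_bytes: int, hint: Optional[str] = None,
                     threshold: int = THRESHOLD) -> str:
    """Toy model: an explicit hint wins over the automatic size rule."""
    if hint == "broadcast":
        return "BroadcastHashJoin"  # forced, even above the threshold
    if hint == "merge":
        return "SortMergeJoin"      # forced, even below the threshold
    # No hint: fall back to the automatic size check.
    if threshold >= 0 and small_side_bytes <= threshold:
        return "BroadcastHashJoin"
    return "SortMergeJoin"


print(choose_with_hint(8 * MB))                       # BroadcastHashJoin (no hint needed)
print(choose_with_hint(8 * MB, hint="merge"))         # SortMergeJoin
print(choose_with_hint(200 * MB, hint="broadcast"))   # BroadcastHashJoin
```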

Bilal A.

2026-02-05

Probably A. Since the question doesn't confirm that AQE is enabled or that any broadcast hints were used, Spark might default to a shuffle join because it sticks to a static plan without adaptive optimization. Even though df2 is small, without explicit hints or AQE, Spark won't necessarily broadcast it. The default behavior can vary depending on configs, but with missing info, the safest bet is a shuffle join here.
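On "the default behavior can vary depending on configs": for this join the deciding setting is mainly the broadcast threshold. A minimal sketch, with a plain dict standing in for the session config and a hypothetical effective_strategy helper, showing the same ~8 MB df2 flipping to a sort-merge join when the threshold is lowered below its size or set to -1 (which disables auto-broadcast). In a real session the override would be spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1).

```python
MB = 1024 * 1024
DEFAULTS = {"spark.sql.autoBroadcastJoinThreshold": 10 * MB}


def effective_strategy(small_side_bytes: int, overrides: dict) -> str:
    """Toy model: layer session overrides on the defaults, then apply the size rule."""
    conf = {**DEFAULTS, **overrides}
    threshold = int(conf["spark.sql.autoBroadcastJoinThreshold"])
    if threshold >= 0 and small_side_bytes <= threshold:
        return "BroadcastHashJoin"
    return "SortMergeJoin"


df2_bytes = 8 * MB  # assumed size of the ~8 MB table
for overrides in ({}, {"spark.sql.autoBroadcastJoinThreshold": 5 * MB},
                  {"spark.sql.autoBroadcastJoinThreshold": -1}):
    print(overrides or "defaults", "->", effective_strategy(df2_bytes, overrides))
```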

Rayan G.

2026-02-02

It’s B because df2 is only about 8 MB, which is well below the default 10 MB broadcast threshold, so Spark should pick a broadcast join automatically here.

Ash K.

2026-01-17

It’s A because the question states AQE isn’t enabled, so Spark sticks to a static plan and won’t automatically broadcast even if df2 is small.

Ash K.

2026-01-15

B makes sense, since df2 is tiny and likely meets the broadcast criteria even without AQE.

Ash K.

2026-01-15

B, tbh. The key here is that df2 is only about 8 MB, which is pretty small and under Spark's default broadcast threshold (10 MB). So even if AQE isn't explicitly enabled, Spark should pick a broadcast join automatically to avoid shuffling the large df1. That makes option B the most plausible. Options A, C, and D all lean on shuffle joins but don't account for df2's small size.
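The point about avoiding a shuffle of the large df1 can be put in rough numbers. A back-of-the-envelope sketch, assuming a hypothetical cluster of 20 executors and treating the sizes in the question as exact: a sort-merge join repartitions both inputs by product_id, so up to roughly the full 10 GB + 8 MB crosses the network, while a broadcast hash join sends one copy of the 8 MB table to each executor and leaves df1 in place. It ignores compression, serialization overhead, the initial collect of df2 to the driver, and rows that happen to stay local during a shuffle.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def shuffle_bytes(large: int, small: int) -> int:
    """Sort-merge join: both sides are exchanged by join key (upper bound)."""
    return large + small


def broadcast_bytes(small: int, num_executors: int) -> int:
    """Broadcast hash join: roughly one copy of the small side per executor."""
    return small * num_executors


df1_bytes, df2_bytes = 10 * GB, 8 * MB
executors = 20  # hypothetical cluster size

smj = shuffle_bytes(df1_bytes, df2_bytes)
bhj = broadcast_bytes(df2_bytes, executors)
print(f"sort-merge join moves ~{smj / MB:.0f} MB")  # ~10248 MB
print(f"broadcast join moves ~{bhj / MB:.0f} MB")   # ~160 MB
```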

Omar V.

2026-01-15

Is AQE (Adaptive Query Execution) enabled or disabled in this scenario? The question mentions AQE but doesn’t clarify its status, which affects the join strategy choice.