Free Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Actual Exam Questions - Question 12 Discussion

Question No. 12
What is the difference between df.cache() and df.persist() in Spark DataFrame?
Tom A.
2026-02-20

A vs D? A is off since cache() can’t set a storage level; it just uses the default. D fits better because cache() defaults to MEMORY_AND_DISK and persist() lets you specify other levels. That flexibility is the key difference here. B and C mix up which method lets you choose storage levels, so not those either.

Omar A.
2026-02-05

I’m thinking A can be ruled out since cache() doesn’t let you change the storage level; it just uses MEMORY_AND_DISK by default. Also, B is off because persist() definitely lets you pick any level, not just DISK_ONLY. Does D cover all the bases?

Omar A.
2026-02-02

D, because cache() always uses MEMORY_AND_DISK and persist() lets you pick storage levels.

Omar A.
2026-01-31

Maybe D, since cache() sticks to MEMORY_AND_DISK by default and persist() is for custom storage levels.

John T.
2026-01-30

D imo, because cache() always uses the default MEMORY_AND_DISK level, while persist() is flexible with storage levels. C is off since it swaps their behaviors, which doesn't match my experience.

John T.
2026-01-30

D/C? cache() defaults to MEMORY_AND_DISK, while persist() lets you pick storage levels, so D fits better. Option C mixes that up by swapping what each method does with storage levels.

Irfan C.
2026-01-22

A/D? cache() does default to MEMORY_AND_DISK, but persist() lets you choose other levels. Option A is off since cache() can’t set different levels, while D gets the customization part right.

Shah Z.
2026-01-21

This one’s definitely D. cache() is just persist() with MEMORY_AND_DISK by default, while persist() lets you pick storage levels like MEMORY_ONLY or DISK_ONLY.

Shah Z.
2026-01-17

It’s D here as well. cache() is basically a quick way to persist with MEMORY_AND_DISK by default, which is a balanced choice: partitions that don’t fit in memory spill to disk instead of being recomputed. persist(), on the other hand, lets you pick exactly how and where to store the data, like MEMORY_ONLY or DISK_ONLY, which is useful for tuning to your workload. So cache() is like a preset option, while persist() gives you customization flexibility.

Shah Z.
2026-01-16

D, because cache() is basically persist() with MEMORY_AND_DISK as default.

Shah Z.
2026-01-16

It’s D for sure. cache() is just shorthand for persist() with MEMORY_AND_DISK, while persist() gives you full control over the other storage levels. Makes sense to keep them separate like this.

Shah Z.
2026-01-15

It’s D because cache() uses MEMORY_AND_DISK by default, while persist() lets you choose different storage levels as needed. I’ve seen similar questions before and this matches what I remember.
