Free Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Actual Exam Questions - Question 9 Discussion
upstream team on a nightly basis. The data is stored in a directory structure with a base path of
"/path/events/data". The upstream team drops daily data into the underlying subdirectories
following the convention year/month/day.
A few examples of the directory structure are:

Which of the following code snippets will read all the data within the directory structure?
C, imo. The wildcard should catch all the year folders without depending on the Spark version.
Option C makes sense too, since a wildcard like /path/events/data/* should match all first-level directories (the year folders), and Spark will then read the files inside those folders. It doesn’t rely on recursiveFileLookup, so it works regardless of the Spark version. A and D just read the base path without recursion, so they probably miss the nested files. B is fine if the Spark version supports recursiveFileLookup, but since that information is missing, C feels like the safer bet.
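Much of the B-versus-C disagreement hinges on what the wildcard actually matches. A minimal sketch, using Python's standard glob module on a hypothetical local tree (the dates and the file name part-00000.parquet are made up), shows that * matches a single path component. So base/* returns only the year folders, while matching the files themselves takes one * per level. Spark expands globs with Hadoop's path globbing rather than Python's, and whether Spark then reads the files beneath the matched folders depends on its own file listing, which this sketch does not model.

```python
import glob
import os
import tempfile

# Hypothetical sample dates and file name; the real upstream layout is not shown in the question.
SAMPLE_DAYS = [("2023", "12", "31"), ("2024", "01", "01"), ("2024", "01", "02")]


def build_sample_layout(base: str) -> None:
    """Create a small year/month/day tree with one empty file per day."""
    for year, month, day in SAMPLE_DAYS:
        day_dir = os.path.join(base, year, month, day)
        os.makedirs(day_dir, exist_ok=True)
        open(os.path.join(day_dir, "part-00000.parquet"), "w").close()


def wildcard_matches(base: str, pattern: str) -> list:
    """Return the entries matched by base/<pattern>, relative to base and sorted."""
    return sorted(os.path.relpath(p, base) for p in glob.glob(os.path.join(base, pattern)))


with tempfile.TemporaryDirectory() as base:
    build_sample_layout(base)
    # "*" matches exactly one path component: here, the year folders only.
    one_level = wildcard_matches(base, "*")
    one_level_all_dirs = all(os.path.isdir(os.path.join(base, p)) for p in one_level)
    # Matching the files directly with a pattern needs one "*" per level.
    file_level = wildcard_matches(base, os.path.join("*", "*", "*", "*"))

print(one_level)           # ['2023', '2024']
print(one_level_all_dirs)  # True
print(len(file_level))     # 3
```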
Maybe C works: the wildcard should match all first-level directories, which grabs every year folder, and Spark can then read inside those. B depends on the Spark version, which isn’t given.
Actually, I’d rule out A and D because they don’t specify recursive reading, so they’d likely only load files directly under /path/events/data and miss the nested folders. C uses a wildcard, but it only expands one level, so it won’t cover the full year/month/day structure. B is the only one that explicitly tells Spark to look recursively through all subdirectories, which fits the question’s scenario, assuming the Spark version supports it (recursiveFileLookup was added in Spark 3.0).
I think option B is best here, since it explicitly tells Spark to look into all nested folders recursively, which fits the year/month/day structure. Option C with the wildcard might only cover one level of subfolders and miss deeper ones, so it’s less reliable. A and D just read the top-level directory and won’t pick up files inside those nested folders. So B makes the most sense for covering everything without missing data.
B. This option explicitly enables recursive lookup, which is needed to read deeply nested folders like year/month/day. The others either don't recurse or only target the first level.
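The distinction the B supporters draw, between listing only the base directory and descending into every subfolder, can be illustrated with a local-filesystem analogy. In PySpark, option B presumably corresponds to something like spark.read.option("recursiveFileLookup", "true").parquet("/path/events/data") (the actual answer options are not shown here), and Spark's documentation notes that this option also disables partition inference. The Python below does not call Spark at all. It uses os.listdir and os.walk on a hypothetical tree (made-up dates and file name), so it shows only the shallow-versus-recursive idea, not how Spark's own file index lists directories.

```python
import os
import tempfile

# Hypothetical sample layout following the question's year/month/day convention.
SAMPLE_DAYS = [("2023", "12", "31"), ("2024", "01", "01"), ("2024", "01", "02")]


def build_sample_layout(base: str) -> None:
    """Create a small year/month/day tree with one empty file per day."""
    for year, month, day in SAMPLE_DAYS:
        day_dir = os.path.join(base, year, month, day)
        os.makedirs(day_dir, exist_ok=True)
        open(os.path.join(day_dir, "part-00000.parquet"), "w").close()


def top_level_files(base: str) -> list:
    """Files sitting directly under base; subdirectories are not entered."""
    return sorted(name for name in os.listdir(base) if os.path.isfile(os.path.join(base, name)))


def recursive_files(base: str) -> list:
    """Every file at any depth under base, found by walking all subdirectories."""
    found = []
    for root, _dirs, files in os.walk(base):
        for name in files:
            found.append(os.path.relpath(os.path.join(root, name), base))
    return sorted(found)


with tempfile.TemporaryDirectory() as base:
    build_sample_layout(base)
    shallow = top_level_files(base)  # only year folders live here, so no files
    deep = recursive_files(base)     # one file per sample day

print(shallow)    # []
print(len(deep))  # 3
```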
C, imo. The wildcard * should read all immediate subfolders under the base path without needing recursiveFileLookup. It’s simpler, as long as the nesting isn’t too deep or complicated.
B. This option explicitly sets recursiveFileLookup, so it goes through all nested folders, unlike C, which stops one level deep, and A/D, which only read the base directory.
B/C? B is the only one that explicitly enables recursive reading, so it makes sense for nested year/month/day folders. C uses a wildcard, but only one level deep, so it might miss deeper subfolders. A and D are basically the same and won’t go into subdirectories without recursiveFileLookup set to true. So B seems like the cleanest fit here for catching all the nested Parquet files.
B looks right, since recursiveFileLookup lets Spark read all nested folders. The others won’t go deeper than the base folder.