Free Databricks Machine Learning Associate Actual Exam Questions - Question 9 Discussion

Question No. 9
A data scientist has defined a Pandas UDF function predict to parallelize the inference process for a
single-node model:
They have written the following incomplete code block to use predict to score each record of Spark
DataFrame spark_df:
Machine Learning Associate practice exam questions
Which of the following lines of code can be used to complete the code block to successfully complete
the task?
Select one option, then reveal solution.
US
FK
Fahad K.
2026-02-21

Guessing B makes the most sense since mapInPandas applies a function that processes batches as Pandas DataFrames, which fits the idea of parallelizing inference on a Spark DataFrame.

0
OC
Osama C.
2026-02-13

A makes sense since unpacking columns fits Pandas UDF input style better than passing the whole DataFrame.

0
OC
Osama C.
2026-02-13

A Using *spark_df.columns to unpack columns seems right since the UDF likely expects separate column inputs, not an iterator or the whole DataFrame at once.

0
DD
David D.
2026-02-12

A imo, unpacking columns matches how Pandas UDFs usually get input data in Spark.

0
SN
Sami N.
2026-02-09

D imo, mapInPandas expects a function, not the result of calling one. So passing predict(spark_df.columns) (D) is wrong. B is cleaner since it directly uses the function as needed.

0
SN
Sami N.
2026-02-09

It’s B. mapInPandas is made for applying functions like predict that handle batches as Pandas DataFrames, so it fits the expected input/output pattern better than passing columns directly.

0
SN
Sami N.
2026-01-30

It’s A for me. Since predict is a Pandas UDF designed to work on columns, calling it with *spark_df.columns unpacks all the columns as inputs, which fits how vectorized UDFs usually get applied. The other options either try using mapInPandas incorrectly or pass the whole DataFrame in a way that won’t match the expected function signature. So calling predict directly with the columns makes sense here.

0
SN
Sami N.
2026-01-28

Option B looks best because mapInPandas expects a function that takes an iterator of Pandas DataFrames, which matches the usual signature for this kind of Pandas UDF. The other choices either call predict directly or pass columns instead of an iterator, which won’t work for this parallelized inference setup. Plus, mapInPandas handles the distribution over the Spark DataFrame partitions automatically, so it fits perfectly here.

0
AV
Amit V.
2026-01-16

This question feels kinda confusing, especially since the function is a Pandas UDF. I think it’s about using mapInPandas correctly. I’m guessing B fits best here, anyone else see it that way?

0