Free Databricks Machine Learning Associate Actual Exam Questions - Question 13 Discussion
Pandas Function API. They have developed the train_model function, and they want to apply it to each group of the DataFrame df.
They have written the following incomplete code block:

Which of the following pieces of code can be used to fill in the above blank to complete the task?
Option B works too since mapInPandas applies a function to each partition or group and also returns a DataFrame, so it can handle the train_model function on grouped data.
It’s A: applyInPandas is built for grouped DataFrames and fits perfectly here.
It’s definitely not C or D since those are functions, not methods to apply a function. E sounds made-up or outdated. Between A and B, I’d pick A because applyInPandas specifically works on grouped DataFrames and lets you apply the function per group while returning a DataFrame. mapInPandas is more for mapping over partitions, not necessarily groups. So A fits the task better for group-specific model training with Pandas API.
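To make the per-group versus per-partition distinction concrete, here is a plain-Python emulation. It needs neither Spark nor pandas, so it only mimics the semantics and is not how Spark implements either API. The helper names (emulate_apply_in_pandas, emulate_map_in_pandas, per_group_summary, per_batch_summary), the device_id column, and the partition layout are all hypothetical. The point it shows: a per-group function sees every row of one group and nothing else, while a per-partition function can receive rows from several groups mixed together, so on its own it cannot guarantee one model per group.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]  # stand-in for one row; a list of rows stands in for a pandas DataFrame


def emulate_apply_in_pandas(rows: List[Row], key: str,
                            func: Callable[[List[Row]], List[Row]]) -> List[Row]:
    """Mimics df.groupby(key).applyInPandas(func, ...): func runs once per group
    and receives every row of that group, and only that group."""
    groups: Dict[object, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    out: List[Row] = []
    for group_rows in groups.values():  # first-seen order here; Spark does not promise an order
        out.extend(func(group_rows))
    return out


def emulate_map_in_pandas(partitions: List[List[Row]],
                          func: Callable[[Iterator[List[Row]]], Iterable[List[Row]]]) -> List[Row]:
    """Mimics df.mapInPandas(func, ...): func runs once per partition over an iterator
    of batches. Here each partition is one batch; Spark may split it into several."""
    out: List[Row] = []
    for part in partitions:
        for batch_out in func(iter([part])):
            out.extend(batch_out)
    return out


def per_group_summary(group_rows: List[Row]) -> List[Row]:
    # Hypothetical placeholder for train_model: report which keys this call saw.
    keys = sorted({str(r["device_id"]) for r in group_rows})
    return [{"keys_seen": keys, "n_rows": len(group_rows)}]


def per_batch_summary(batches: Iterator[List[Row]]) -> Iterator[List[Row]]:
    # Same placeholder logic, but shaped for the iterator-in, iterator-out style.
    for batch in batches:
        keys = sorted({str(r["device_id"]) for r in batch})
        yield [{"keys_seen": keys, "n_rows": len(batch)}]


rows = [
    {"device_id": "a", "x": 1.0},
    {"device_id": "b", "x": 2.0},
    {"device_id": "a", "x": 3.0},
]
# Hypothetical physical layout: the first partition happens to hold rows from two groups.
partitions = [rows[:2], rows[2:]]

apply_result = emulate_apply_in_pandas(rows, "device_id", per_group_summary)
map_result = emulate_map_in_pandas(partitions, per_batch_summary)
print(apply_result)  # each output row saw exactly one key
print(map_result)    # the first output row saw keys "a" and "b" together
```

In real PySpark the first pattern corresponds to calling applyInPandas on the result of df.groupby(...). To my knowledge GroupedData does not offer mapInPandas at all (it is a DataFrame method), which is another reason B would not fit if the blank follows a groupby call.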
Maybe A here. applyInPandas is made for grouped DataFrames and lets you run a function like train_model on each group, returning a DataFrame, which fits the question’s need better than mapInPandas.
A imo, because applyInPandas is designed specifically for applying a function to each group in a grouped DataFrame and expects the function to return a DataFrame, which fits the use case here. mapInPandas is more for row-wise map operations rather than group-level training. Also, train_model sounds like the function being applied, not the method to call on the grouped object, so D doesn’t make sense in the blank. The other options don’t quite fit Pandas API naming conventions either.
Could someone clarify if train_model returns a DataFrame though? I remember applyInPandas and mapInPandas expect functions to output a DataFrame, but they differ slightly on how they handle input/output. Just wanna be sure which one fits best here before deciding.
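On the return-type question: as far as I know from the PySpark docs, both APIs need the user function to produce pandas DataFrames, but the shapes differ. GroupedData.applyInPandas(func, schema) calls func with one pandas DataFrame per group and expects one pandas DataFrame back; DataFrame.mapInPandas(func, schema) calls func with an iterator of pandas DataFrames and expects an iterator of pandas DataFrames back. The sketch below shows both shapes and checks the per-group one locally with plain pandas. The body of train_model here is a hypothetical placeholder (the real one from the question is not shown), as are train_on_batches, the column names, and the schema strings in the comments.

```python
from typing import Iterator

import pandas as pd


# Shape expected by GroupedData.applyInPandas: one pandas DataFrame in (all rows
# of a single group), one pandas DataFrame out whose columns match `schema`.
def train_model(pdf: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical placeholder "training": just summarize the group.
    return pd.DataFrame({
        "device_id": [pdf["device_id"].iloc[0]],
        "n_rows": [len(pdf)],
        "mean_x": [float(pdf["x"].mean())],
    })


# Shape expected by DataFrame.mapInPandas: an iterator of pandas DataFrames in
# (batches from one partition), an iterator of pandas DataFrames out.
def train_on_batches(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    for batch in batches:
        yield pd.DataFrame({"n_rows": [len(batch)]})


# In Spark these would be wired up roughly like this (not executed here):
#   df.groupby("device_id").applyInPandas(
#       train_model, schema="device_id string, n_rows long, mean_x double")
#   df.mapInPandas(train_on_batches, schema="n_rows long")

# Local check without Spark: call the per-group function on one group's rows.
group_a = pd.DataFrame({"device_id": ["a", "a"], "x": [1.0, 3.0]})
out = train_model(group_a)
print(type(out).__name__, out.to_dict("records"))
```

Calling the function directly on a small pandas DataFrame like this is a cheap way to confirm that it returns a DataFrame whose columns match the schema you plan to pass, before running it on a cluster.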