Free Databricks-Generative-AI-Engineer-Associate Actual Exam Questions - Question 6 Discussion
following business objective: answer employee HR questions using HR PDF documentation.
Which set of high level tasks should the Generative AI Engineer's system perform?
I’m ruling out B because summarizing all the HR docs first seems risky: important details might get lost or generalized. Between A and D, D’s approach of chunking the docs makes more sense for large PDFs, since only the relevant chunks go into the prompt and the LLM’s context window isn’t overloaded. A’s method of averaging embeddings per doc could miss specific details buried in different sections, which HR questions often need. Does anyone think A’s simpler embedding comparison could actually perform better for simpler, more straightforward queries?
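To make the averaging-vs-chunking point concrete, here is a minimal sketch. It assumes A means one averaged embedding per document and D means chunk-level retrieval, as the comments in this thread describe them. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and the HR sentences are made up for illustration.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: word counts as a sparse vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def average(vectors):
    # Element-wise mean of several vectors (the "one embedding per doc" idea).
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({w: c / len(vectors) for w, c in total.items()})

# Hypothetical chunks from a single HR document.
chunks = [
    "employees accrue fifteen vacation days per year",
    "expense reports must be submitted within thirty days",
    "the dress code is business casual on weekdays",
]
question = "how many vacation days do employees accrue"

q = embed(question)
chunk_scores = [cosine(q, embed(c)) for c in chunks]
doc_score = cosine(q, average([embed(c) for c in chunks]))
best = max(range(len(chunks)), key=chunk_scores.__getitem__)

print("best chunk:", chunks[best])
print("best chunk score:", round(chunk_scores[best], 3))
print("averaged doc score:", round(doc_score, 3))
```

In this toy setup the vacation chunk scores higher against the question than the averaged document vector does, because averaging mixes in the unrelated expense and dress-code text. A real embedding model would give different numbers, so treat this only as an illustration of the dilution effect.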
C imo. Option D is solid, but C adds a historical layer by factoring in previous questions, which could improve relevance and personalization for employee queries. This might be especially useful if the HR questions tend to repeat or follow common patterns. Also, leveraging ALS (alternating least squares) for embeddings could capture subtle relationships that chunk-based methods might miss. It feels like a more data-driven approach overall, though it would be more complex to set up than D’s straightforward chunking and retrieval. Still, I wouldn’t rule out C if the goal includes learning from past interaction trends.
Option D. Chunking the docs avoids the information loss that A’s averaging method causes.
B seems less practical since just summarizing docs might lose important details for specific queries.
Maybe D here. A feels off since averaging embeddings can lose detail. D’s chunking and vector store matches typical doc retrieval setups for specific answers. C sounds unnecessarily complex for this task.
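For anyone who wants to see the "chunking and vector store" flow end to end, here is a rough sketch of the steps as I understand them: split the extracted PDF text into overlapping chunks, embed and store them, retrieve the closest chunks for a question, and put them in the prompt. Everything here is hypothetical: `TinyVectorStore`, `chunk_text`, and `build_prompt` are illustrative names, the embedding is a toy word-count vector, and the final LLM call is left out.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy embedding (assumption): lowercase word counts, punctuation dropped.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk_text(text, size=8, overlap=2):
    # Split into word windows of `size`, each sharing `overlap` words
    # with the previous window so sentences cut at a boundary still match.
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

class TinyVectorStore:
    # In-memory stand-in for a real vector store.
    def __init__(self):
        self.items = []  # list of (chunk_text, vector)

    def add(self, chunk):
        self.items.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question, context_chunks):
    # Retrieved chunks go into the prompt; the LLM call itself is omitted.
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this HR context:\n{context}\n\nQuestion: {question}"

# Hypothetical text already extracted from an HR PDF.
hr_text = (
    "Employees accrue fifteen vacation days per year. "
    "Expense reports must be submitted within thirty days. "
    "The dress code is business casual on weekdays."
)

store = TinyVectorStore()
for chunk in chunk_text(hr_text):
    store.add(chunk)

question = "How many vacation days do employees get?"
top_chunks = store.search(question, k=2)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

In a real setup the toy pieces would be swapped for a PDF text extractor, an embedding model, and a managed vector index, but the order of tasks stays the same.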