Question 1

A Generative AI Engineer is developing a RAG system for their company to perform internal
document Q&A for structured HR policies, but the answers returned are frequently incomplete and
unstructured. It seems that the retriever is not returning all relevant context. The Generative AI
Engineer has experimented with different embedding and response-generating LLMs, but that did not
improve results.
Which TWO options could be used to improve the response quality?
Choose 2 answers

Accepted Answer

A, B

Explanation: The core problem is that the retriever is not supplying sufficient context to the Large Language Model (LLM), resulting in incomplete answers. The documents are structured (HR policies), which is a key detail. (B) Increase the document chunk size: A larger chunk size ensures that more related information and context are contained within a single retrieved block. This directly addresses the issue of "incomplete context" by providing the generation model with a more comprehensive passage to formulate its answer. (A) Add the section header as a prefix to chunks: For structured documents like HR policies, section headers provide vital context. Prefixing chunks with their corresponding headers (e.g., "Leave Policies: Annual Leave Accrual") creates more semantically rich embeddings. This helps the retriever more accurately identify and fetch the most relevant sections of the document, even if the user's query doesn't use the exact phrasing from the policy text.
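Both fixes, prefixing chunks with their section header and using a larger chunk size, can be sketched together. This is a minimal illustration, assuming the HR documents have already been parsed into hypothetical (header, body) pairs. The function name `chunk_with_headers` and the sample policy text are illustrative, not part of any Databricks or LangChain API. `chunk_size` counts characters of body text; raising it yields fewer, more complete chunks.

```python
from typing import List, Tuple


def chunk_with_headers(sections: List[Tuple[str, str]], chunk_size: int = 1000) -> List[str]:
    """Split each (header, body) section into word-bounded chunks of at most
    chunk_size characters of body text, prefixing every chunk with its header."""
    chunks: List[str] = []
    for header, body in sections:
        current = ""
        for word in body.split():
            candidate = f"{current} {word}".strip()
            if len(candidate) > chunk_size and current:
                # Current chunk is full: emit it with the section header as context.
                chunks.append(f"{header}: {current}")
                current = word
            else:
                current = candidate
        if current:
            chunks.append(f"{header}: {current}")
    return chunks


sections = [  # hypothetical policy text, for illustration only
    ("Leave Policies: Annual Leave Accrual",
     "Full-time employees accrue annual leave monthly. Unused leave carries over up to a cap."),
]
for chunk in chunk_with_headers(sections, chunk_size=50):
    print(chunk)
```

Each emitted chunk carries its header, so the embedding of a short passage still reflects which policy section it belongs to.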

Question 2

A Generative AI Engineer is using the code below to test setting up a vector store: https://kxbjsyuhceggsyvxdkof.supabase.co/storage/v1/object/public/file-images/Databricks-Generative-AI-Engineer-Associate/page_24_img_1.jpg Assuming they intend to use Databricks managed embeddings with the default embedding model, what should be the next logical function call?

Accepted Answer

B

Explanation: After initializing the VectorSearchClient, the next logical step in setting up a new vector store is to create an index. The create_delta_sync_index() function creates a Vector Search index that automatically syncs with a source Delta table. This is a common and robust pattern for production use cases where the source data may change over time. The call specifies the endpoint name, the index name, the source Delta table, the primary key, and the sync pipeline type. Because Databricks computes the embeddings (managed embeddings), the call names the text column to embed (embedding_source_column) and the embedding model serving endpoint (embedding_model_endpoint_name), rather than a column of precomputed embeddings.
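A minimal sketch of that call with managed embeddings follows. It is a wrapper that takes any client object, so it can be checked offline with the small recording stand-in. The keyword arguments are those of `VectorSearchClient.create_delta_sync_index` in the `databricks-vector-search` package. The endpoint, index, and table names, the `id`/`text` column names, and the embedding endpoint `databricks-gte-large-en` are assumptions. Check which embedding endpoint your workspace treats as the default.

```python
from typing import Any


def create_managed_embedding_index(
    client: Any,
    endpoint_name: str,
    index_name: str,
    source_table_name: str,
    primary_key: str = "id",
    text_column: str = "text",
    # Assumption: substitute your workspace's default embedding endpoint.
    embedding_model_endpoint_name: str = "databricks-gte-large-en",
):
    """Create a Delta Sync index; Databricks computes embeddings from text_column."""
    return client.create_delta_sync_index(
        endpoint_name=endpoint_name,
        index_name=index_name,
        source_table_name=source_table_name,
        pipeline_type="TRIGGERED",  # sync on demand; "CONTINUOUS" keeps syncing
        primary_key=primary_key,
        embedding_source_column=text_column,
        embedding_model_endpoint_name=embedding_model_endpoint_name,
    )


class RecordingClient:
    """Offline stand-in that records the arguments instead of calling Databricks."""

    def create_delta_sync_index(self, **kwargs):
        self.last_call = kwargs
        return kwargs["index_name"]


# On Databricks (all names hypothetical):
# from databricks.vector_search.client import VectorSearchClient
# vsc = VectorSearchClient()
# create_managed_embedding_index(vsc, "vector_search_test", "main.hr.docs_index", "main.hr.docs")
```

No `embedding_vector_column` or `embedding_dimension` is passed. Those arguments apply only when you supply your own precomputed embeddings.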

Question 3

A Generative AI Engineer is testing a simple prompt template in LangChain using the code below, but is getting an error. https://kxbjsyuhceggsyvxdkof.supabase.co/storage/v1/object/public/file-images/Databricks-Generative-AI-Engineer-Associate/page_13_img_1.jpg
Assuming the API key was properly defined, what change does the Generative AI Engineer need to make to fix their chain?
A) https://kxbjsyuhceggsyvxdkof.supabase.co/storage/v1/object/public/file-images/Databricks-Generative-AI-Engineer-Associate/page_13_img_2.jpg
B) https://kxbjsyuhceggsyvxdkof.supabase.co/storage/v1/object/public/file-images/Databricks-Generative-AI-Engineer-Associate/page_13_img_3.jpg
C) https://kxbjsyuhceggsyvxdkof.supabase.co/storage/v1/object/public/file-images/Databricks-Generative-AI-Engineer-Associate/page_14_img_1.jpg
D) https://kxbjsyuhceggsyvxdkof.supabase.co/storage/v1/object/public/file-images/Databricks-Generative-AI-Engineer-Associate/page_14_img_2.jpg

Accepted Answer

C

Explanation: The chain.run() method is a legacy convenience function in LangChain designed for chains with a single input variable. The modern, standard interface for executing any Runnable object in LangChain, including LLMChain, is the invoke() method. This method expects a dictionary whose keys match the input_variables specified in the PromptTemplate (in this case, "product"). Using chain.invoke({"product": "colorful socks"}) follows the current LangChain Expression Language (LCEL) interface, which ensures compatibility and predictable behavior. The original code fails because run() is deprecated in current LangChain releases or is being used in a context where the more explicit invoke() is required.

Question 4

A Generative AI Engineer is tasked with developing an application that is based on an open source
large language model (LLM). They need a foundation LLM with a large context window.
Which model fits this need?

Accepted Answer

D

Explanation: The primary requirement is a foundation LLM with a large context window. Among the options provided, DBRX has the largest context window by a significant margin. DBRX is a state-of-the-art, open-source, mixture-of-experts (MoE) model from Databricks that was trained with a context length of 32,768 (32k) tokens. This extensive context window makes it highly suitable for applications that need to process and generate text based on long documents, detailed histories, or complex instructions, directly addressing the engineer's need.

Question 5

A Generative AI Engineer is tasked with improving the RAG quality by addressing its inflammatory outputs. Which action would be most effective in mitigating the problem of offensive text outputs?

Accepted Answer

D

Explanation: The most effective method to mitigate inflammatory outputs in a Retrieval-Augmented Generation (RAG) system is to address the root cause: the content within the knowledge base. The RAG process retrieves information from this upstream data to provide context for the language model's response. If the source documents contain offensive or inflammatory material, the system is likely to retrieve and incorporate it into its output. Proactively curating the data, including manual review and cleaning, ensures that the foundational information used by the RAG system is free from undesirable content. This "Garbage In, Garbage Out" principle is fundamental to building safe and reliable AI systems.
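Upstream curation can start with a simple pass that separates documents into a clean set and a set flagged for manual review before anything is indexed. This is a minimal sketch. `BLOCKLIST` holds placeholder terms, and the function name is hypothetical. A real pipeline would more likely use a curated term list or a moderation model, followed by human review of the flagged set.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Placeholder terms; replace with a curated list or a moderation model.
BLOCKLIST: Set[str] = {"slur_example", "hateful_term"}


def split_for_review(
    docs: Iterable[Dict[str, str]], blocklist: Set[str] = BLOCKLIST
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Return (clean, flagged); flagged documents go to manual review, not the index."""
    clean: List[Dict[str, str]] = []
    flagged: List[Dict[str, str]] = []
    for doc in docs:
        words = set(re.findall(r"[a-z_']+", doc["text"].lower()))
        (flagged if words & blocklist else clean).append(doc)
    return clean, flagged
```

Only the `clean` list would feed the chunking and embedding steps. The `flagged` list is routed to reviewers who can fix or drop each document.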

Question 6

A Generative AI Engineer has been asked to design an LLM-based application that accomplishes the
following business objective: answer employee HR questions using HR PDF documentation.
Which set of high level tasks should the Generative AI Engineer's system perform?

Accepted Answer

D

Explanation: This option describes the standard and most effective architecture for this use case, known as Retrieval-Augmented Generation (RAG). Splitting large documents into smaller, semantically meaningful chunks preserves specific details that would be lost by summarizing or averaging. These chunks are then embedded and stored in a vector database for efficient similarity search. When a user asks a question, the system retrieves only the most relevant chunks, providing focused and accurate context to the LLM. This allows the LLM to generate a precise, grounded answer based on the source material.
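The retrieve-then-generate flow can be illustrated end to end in miniature. This sketch skips PDF parsing and chunking by starting from a few hypothetical chunks. It replaces the embedding model and vector database with a toy bag-of-words vector and an in-memory list, so only the shape of the pipeline carries over to a real system.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_index(chunks: List[str]) -> List[Tuple[str, Counter]]:
    """In-memory stand-in for a vector store: (chunk, vector) pairs."""
    return [(c, embed(c)) for c in chunks]


def retrieve(index: List[Tuple[str, Counter]], question: str, k: int = 2) -> List[str]:
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context: List[str]) -> str:
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the HR policy excerpts below.\n{ctx}\n\nQuestion: {question}"


chunks = [  # hypothetical HR excerpts
    "Employees accrue 20 days of annual leave per year.",
    "The dental plan covers two cleanings per year.",
    "Remote work requires manager approval.",
]
index = build_index(chunks)
question = "How many days of annual leave do employees get?"
print(build_prompt(question, retrieve(index, question, k=1)))
```

The LLM then receives only the retrieved excerpt, which keeps its answer grounded in the source documents.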

Question 7

A Generative AI Engineer is creating an LLM-powered application that will need access to up-to-date
news articles and stock prices.
The design requires the use of stock prices which are stored in Delta tables and finding the latest
relevant news articles by searching the internet.
How should the Generative AI Engineer architect their LLM system?

Accepted Answer

B

Explanation: The most effective and standard design for a system with multiple distinct capabilities (here, querying stock prices stored in Delta tables and searching the internet for news) is an agent-based architecture. This approach involves defining each capability as a "tool" and providing the LLM with a system prompt that describes these tools. The agent then uses its reasoning capabilities to understand the user's query, select the appropriate tool or sequence of tools, and execute them to generate a comprehensive answer. This design is flexible and scalable, and it uses the core strengths of modern LLMs in planning and tool use, which makes it superior to rigid, hard-coded logic.
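Describing tools to the LLM in a system prompt might look like this. It is a sketch, not a specific agent framework's API. The `Tool` dataclass, both tool names, their stub bodies, and the JSON reply convention are all hypothetical. A real stock tool would query the Delta table, and a real news tool would call a web-search service.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    name: str
    description: str
    func: Callable[[str], str]


def get_stock_price(ticker: str) -> str:
    # Hypothetical stub; a real tool would query the Delta table of stock prices.
    return f"{ticker}: price lookup not implemented in this sketch"


def search_news(query: str) -> str:
    # Hypothetical stub; a real tool would call a web-search API.
    return f"news search for '{query}' not implemented in this sketch"


TOOLS = [
    Tool("get_stock_price", "Look up the latest stock price for a ticker symbol.", get_stock_price),
    Tool("search_news", "Search the internet for the latest news articles on a topic.", search_news),
]


def build_system_prompt(tools: List[Tool]) -> str:
    """List each tool's name and description so the LLM can choose among them."""
    lines = ["You are a financial assistant. You may call these tools:"]
    lines += [f"- {t.name}: {t.description}" for t in tools]
    lines.append('To call a tool, reply with JSON: {"tool": "<name>", "input": "<text>"}.')
    return "\n".join(lines)


print(build_system_prompt(TOOLS))
```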

Question 8

A Generative AI Engineer would like an LLM to generate formatted JSON from emails. This will
require parsing and extracting the following information: order ID, date, and sender email. Here’s a
sample email:

They will need to write a prompt that will extract the relevant information in JSON format with the
highest level of output accuracy.
Which prompt will do that?

Accepted Answer

B

Explanation: This prompt uses a technique known as "few-shot prompting." By providing a concrete example of the desired JSON output, the prompt gives the LLM a clear template to follow. This significantly improves the model's ability to match the exact structure, key names (e.g., sender_email vs. sender), and value formats required. This method is superior to simply describing the task (zero-shot prompting) because it reduces ambiguity and leads to more consistent and accurate structured data extraction, which is the primary goal of the question.
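The pattern of embedding one worked JSON example in the prompt and validating the model's reply against the same keys can be sketched as follows. The example values in `FEW_SHOT_EXAMPLE` and both function names are hypothetical. The model call itself is omitted, so `parse_extraction` is fed a reply string directly.

```python
import json

# Hypothetical example output; its keys define the required schema.
FEW_SHOT_EXAMPLE = {
    "order_id": "12345",
    "date": "2024-05-01",
    "sender_email": "jane.doe@example.com",
}
REQUIRED_KEYS = set(FEW_SHOT_EXAMPLE)


def build_extraction_prompt(email_text: str) -> str:
    """Instruction plus one concrete JSON example, then the email to process."""
    return (
        "Extract the order ID, date, and sender email from the email below. "
        "Return only JSON with exactly these keys, formatted like this example:\n"
        f"{json.dumps(FEW_SHOT_EXAMPLE)}\n\n"
        f"Email:\n{email_text}"
    )


def parse_extraction(raw: str) -> dict:
    """Parse the model's reply and check that every required key is present."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

Validating against the example's keys gives a cheap check that the model actually followed the template.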

Question 9

A Generative AI Engineer is tasked with developing a RAG application that will help a small internal group of experts at their company answer specific questions, augmented by an internal knowledge base. They want the best possible quality in the answers, and neither latency nor throughput is a huge concern given that the user group is small and they’re willing to wait for the best answer. The topics are sensitive in nature and the data is highly confidential, so, due to regulatory requirements, none of the information is allowed to be transmitted to third parties. Which model meets all the Generative AI Engineer’s needs in this situation?

Accepted Answer

DBRX Instruct

Explanation: DBRX Instruct is an open-weights, state-of-the-art large language model released by Databricks under the Databricks Open Model License. Because the weights can be downloaded and run entirely within the organization’s own Databricks environment, no prompts or retrieved documents leave the organization, which satisfies the strict regulatory requirement against third-party data transfer. Compared with smaller open models (e.g., Mistral 7B, Dolly 2.0), it delivers markedly higher accuracy, which matters most when latency and cost are secondary. GPT-4 or other hosted third-party APIs could yield top quality but would violate the no-external-transmission rule. Therefore, DBRX Instruct uniquely meets all constraints: maximum answer quality, private deployment, and acceptable performance for a small user base.

Question 10

A Generative AI Engineer is developing an LLM application that users can use to generate
personalized birthday poems based on their names.
Which technique would be most effective in safeguarding the application, given the potential for
malicious user inputs?

Accepted Answer

A

Explanation: The most effective and standard technique for safeguarding an LLM application is to implement input guardrails. A safety filter acts as such a guardrail by analyzing user input for malicious or harmful content before it reaches the LLM. If such content is detected, the system should refuse to process the request and provide a safe, generic response, such as stating it is unable to assist. This approach directly mitigates risks like prompt injection and the generation of inappropriate content, forming a critical component of a Responsible AI framework.
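An input guardrail for the birthday-poem app could look like this. It is a minimal sketch. The regex patterns, the 50-character name limit, the refusal text, and the `llm` callable are hypothetical. Production systems typically rely on a dedicated safety or prompt-injection classifier rather than hand-written patterns.

```python
import re
from typing import Callable

REFUSAL = "Sorry, I am unable to assist with that request."
# Hypothetical patterns; a dedicated safety model would be more robust.
SUSPICIOUS_PATTERNS = [
    r"ignore (all|any|previous) instructions",
    r"system prompt",
    r"<script",
]
MAX_NAME_LENGTH = 50


def is_safe_name(user_input: str) -> bool:
    """Reject empty, overly long, or injection-like input before it reaches the LLM."""
    text = user_input.strip()
    if not text or len(text) > MAX_NAME_LENGTH:
        return False
    return not any(re.search(p, text, re.IGNORECASE) for p in SUSPICIOUS_PATTERNS)


def generate_poem(name: str, llm: Callable[[str], str]) -> str:
    if not is_safe_name(name):
        return REFUSAL  # safe, generic response; the LLM is never called
    return llm(f"Write a short, friendly birthday poem for someone named {name}.")
```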

Question 11

A Generative AI Engineer has successfully ingested unstructured documents and chunked them by
document sections. They would like to store the chunks in a Vector Search index. The current format
of the dataframe has two columns: (i) original document file name (ii) an array of text chunks for
each document.
What is the most performant way to store this dataframe?

Accepted Answer

B

Explanation: Databricks Vector Search indexes data from a source Delta table. For the indexing process to be effective and performant, each row in this table must represent a single, discrete unit of text (a chunk) that will be embedded. The original dataframe structure, with an array of chunks per document, is unsuitable. The most performant and correct approach is to "flatten" or "explode" this array, creating a new row for each individual chunk. Furthermore, a Vector Search source table has a strict requirement for a primary key column to uniquely identify each row (i.e., each chunk).
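Flattening to one chunk per row with a unique key can be sketched in pandas. The column names `file_name`, `chunks`, `chunk_text`, and `id`, and the `<file>_<n>` key format, are assumptions. On Databricks the same reshaping is usually done in PySpark (for example with `posexplode`) before writing the result to a Delta table. A Delta Sync index also expects Change Data Feed to be enabled on that source table.

```python
import pandas as pd


def flatten_chunks(df: pd.DataFrame) -> pd.DataFrame:
    """Explode an array-of-chunks column into one row per chunk with a unique id."""
    exploded = (
        df.explode("chunks", ignore_index=True)
        .rename(columns={"chunks": "chunk_text"})
        .dropna(subset=["chunk_text"])  # documents with empty arrays yield NaN rows
        .reset_index(drop=True)
    )
    # Position of each chunk within its document, used to build the primary key.
    chunk_no = exploded.groupby("file_name").cumcount()
    exploded = exploded.assign(id=exploded["file_name"] + "_" + chunk_no.astype(str))
    return exploded[["id", "file_name", "chunk_text"]]


docs = pd.DataFrame({  # hypothetical input in the question's two-column shape
    "file_name": ["a.pdf", "b.pdf"],
    "chunks": [["first section", "second section"], ["only section"]],
})
print(flatten_chunks(docs))
```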

Question 12

A Generative AI Engineer is helping a cinema extend its website's chat bot to be able to respond to
questions about specific showtimes for movies currently playing at their local theater. They already
have the location of the user provided by location services to their agent, and a Delta table which is
continually updated with the latest showtime information by location. They want to implement this
new capability in their RAG application.
Which option will do this with the least effort and in the most performant way?

Accepted Answer

A

Explanation: Databricks Feature Serving is the optimal solution for this use case. It is specifically designed to provide low-latency, real-time access to features (like movie showtimes) stored in Delta tables for applications like chatbots. By defining a FeatureSpec and creating a serving endpoint, the system automatically syncs the data to a high-performance online store. This endpoint can then be easily integrated as a tool within the RAG agent's logic. This approach is the most performant due to the optimized online store lookup and requires the least effort as it leverages a managed, purpose-built service within the Databricks platform, avoiding complex custom implementations or external infrastructure.
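The agent's showtime tool could call the Feature Serving endpoint over REST. This sketch assumes the standard serving-endpoint invocation route (`/serving-endpoints/<name>/invocations`) and a `dataframe_records` payload keyed by the feature table's primary key. The host, endpoint name, token, and the `location_id` key column are hypothetical placeholders. Only the standard library is used, and the request is built separately from sending it so it can be inspected offline.

```python
import json
import urllib.request


def build_showtime_request(host: str, endpoint_name: str, token: str,
                           location_id: str) -> urllib.request.Request:
    """Build a lookup request for the showtime features of one location."""
    payload = {"dataframe_records": [{"location_id": location_id}]}  # key column is assumed
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def lookup_showtimes(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())
```

Wrapping `lookup_showtimes` as an agent tool lets the LLM fetch fresh showtimes for the user's location on demand.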

Question 13

A Generative AI Engineer is creating an LLM-powered application that will need access to up-to-date
news articles and stock prices.
The design requires the use of stock prices which are stored in Delta tables and finding the latest
relevant news articles by searching the internet.
How should the Generative AI Engineer architect their LLM system?

Accepted Answer

D

Explanation: The scenario requires the LLM system to perform actions on external, real-time data sources: querying a structured database (Delta tables) and searching the unstructured web. An agent-based architecture is the most suitable design pattern for this. An LLM agent uses a reasoning engine to determine which "tools" to use to fulfill a request. In this case, the engineer would create a tool for executing SQL queries against the Delta tables and another tool for performing web searches. The agent would then orchestrate these tools, retrieve the necessary real-time data, and provide it as context to the LLM to generate a final, informed response.
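The orchestration step, where the agent executes the tool the LLM picked, can be sketched as a dispatcher over a tool registry. The JSON call format and both tool names are assumptions. The tool bodies are stubs, with a hard-coded price standing in for a SQL query against the Delta table and a placeholder string standing in for a web-search API.

```python
import json
from typing import Callable, Dict


def query_stock_prices(ticker: str) -> str:
    # Hypothetical stand-in for a SQL query against the Delta table of prices.
    prices = {"ACME": 123.45}
    return f"{ticker} latest close: {prices.get(ticker, 'unknown')}"


def web_search(query: str) -> str:
    # Hypothetical stand-in for a web-search API call.
    return f"Top headline for '{query}': (search results would go here)"


TOOLS: Dict[str, Callable[[str], str]] = {
    "query_stock_prices": query_stock_prices,
    "web_search": web_search,
}


def run_tool_call(model_output: str) -> str:
    """Execute a tool call the LLM emitted as JSON: {"tool": ..., "input": ...}."""
    call = json.loads(model_output)
    tool = TOOLS.get(call["tool"])
    if tool is None:
        return f"Unknown tool: {call['tool']}"
    return tool(call["input"])
```

In a full loop, each tool result would be appended to the conversation so the LLM can answer or request another tool.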

Question 14

A Generative AI Engineer is developing a chatbot designed to assist users with insurance-related
queries. The chatbot is built on a large language model (LLM) and is conversational. However, to
maintain the chatbot’s focus and to comply with company policy, it must not provide responses to
questions about politics. Instead, when presented with political inquiries, the chatbot should
respond with a standard message:
“Sorry, I cannot answer that. I am a chatbot that can only answer questions around insurance.”
Which framework type should be implemented to solve this?

Accepted Answer

A

Explanation: A safety guardrail is a mechanism designed to control the behavior and output of a large language model (LLM) to ensure it is appropriate, on-topic, and not harmful. In this scenario, the goal is to prevent the chatbot from discussing a specific undesirable topic (politics). Implementing a system to detect questions about politics and provide a canned response is a form of content moderation and topic control, which is a primary function of a safety guardrail. This ensures the model operates within its intended, safe conversational boundaries.
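A topic guardrail that returns the required canned message can be sketched as follows. The keyword list and function names are hypothetical, and keyword matching is the simplest possible detector. Real guardrail frameworks more commonly use a topic classifier or an LLM-based check, but the control flow is the same.

```python
import re
from typing import Callable

CANNED_RESPONSE = ("Sorry, I cannot answer that. I am a chatbot that can only answer "
                   "questions around insurance.")
# Hypothetical keyword list; a classifier would catch paraphrases this misses.
POLITICAL_TERMS = {"election", "senator", "president", "congress",
                   "parliament", "vote", "democrat", "republican"}


def is_political(message: str) -> bool:
    words = set(re.findall(r"[a-z]+", message.lower()))
    return bool(words & POLITICAL_TERMS)


def guarded_chat(message: str, llm: Callable[[str], str]) -> str:
    """Return the canned message for political questions; otherwise call the LLM."""
    if is_political(message):
        return CANNED_RESPONSE
    return llm(message)
```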

Question 15

A Generative AI Engineer is setting up a Databricks Vector Search that will look up news articles by
topic within 10 days of the date specified. An example query might be "Tell me about monster truck
news around January 5th 1992". They want to do this with the least amount of effort.
How can they set up their Vector Search index to support this use case?

Accepted Answer

B

Explanation: Databricks Vector Search is designed to perform hybrid searches that combine semantic similarity search on vector embeddings with precise filtering on metadata. By including the article date and topic as metadata columns in the source Delta Table for the index, the engineer can use the filters parameter in the query. This allows the system to first narrow down the search space to articles within the specified date range and/or topic, and then perform the vector search on that pre-filtered subset. This is the most efficient and direct method, requiring the least effort as it leverages a built-in feature.
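The date-window filter can be sketched in Python. It assumes the index has an `article_date` metadata column stored in ISO `YYYY-MM-DD` form. The column names, the function names, and the `"column >=": value` dictionary filter style are assumptions based on the Databricks Vector Search filter syntax for standard endpoints. `similarity_search` is the index method that accepts `query_text`, `columns`, `filters`, and `num_results`.

```python
from datetime import date, timedelta


def build_date_filter(center: date, days: int = 10) -> dict:
    """Metadata filter covering `days` before and after the requested date."""
    start = center - timedelta(days=days)
    end = center + timedelta(days=days)
    return {"article_date >=": start.isoformat(), "article_date <=": end.isoformat()}


def search_news(index, topic_query: str, center: date, k: int = 5):
    """Semantic search on the topic, restricted to the date window via metadata."""
    return index.similarity_search(
        query_text=topic_query,
        columns=["id", "title", "article_date"],  # hypothetical column names
        filters=build_date_filter(center),
        num_results=k,
    )


print(build_date_filter(date(1992, 1, 5)))
```

For the example query, the window runs from 1991-12-26 to 1992-01-15, and "monster truck news" becomes the semantic part of the search.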

Free Databricks-Generative-AI-Engineer-Associate Actual Exam Questions