Databricks Real Dumps Practice Exam Questions by Dumpswarp

Databricks Certified Generative AI Engineer Associate Questions and Answers

Question 1

A team uses Mosaic AI Vector Search to retrieve documents for their Retrieval-Augmented Generation (RAG) pipeline. The search query returns five relevant documents, and the first three are added to the prompt as context. Performance evaluation with Agent Evaluation shows that some lower-ranked retrieved documents have higher context relevancy scores than higher-ranked documents. Which option should the team consider to optimize this workflow?

Options:

Use a reranker to order the documents based on the relevance scores.

Modify the prompt to instruct the LLM to order the documents based on the relevance scores.

Use a different embedding model for computing document embeddings.

Increase the number of documents added to the prompt to improve context relevance.

Question 2

A Generative Al Engineer has built an LLM-based system that will automatically translate user text between two languages. They now want to benchmark multiple LLM's on this task and pick the best one. They have an evaluation set with known high quality translation examples. They want to evaluate each LLM using the evaluation set with a performant metric.

Which metric should they choose for this evaluation?

Options:

ROUGE metric

BLEU metric

NDCG metric

RECALL metric

Question 3

A Generative AI Engineer is using LangGraph to define multiple tools in a single agentic application. They want to enable the main orchestrator LLM to decide on its own which tools are most appropriate to call for a given prompt. To do this, they must determine the general flow of the code. Which sequence will do this?

Options:

1. Define or import the tools 2. Add tools and LLM to the agent 3. Create the ReAct agent

1. Define or import the tools 2. Define the agent 3. Initialize the agent with ReAct, the LLM, and the tools

1. Define the tools 2. Load each tool into a separate agent 3. Instruct the LLM to use ReAct to call the appropriate agent

1. Define the tools inside the agents 2. Load the agents into the LLM 3. Instruct the LLM to use COT reasoning to determine the appropriate agent

Question 4

A Generative AI Engineer is developing a patient-facing healthcare-focused chatbot. If the patient’s question is not a medical emergency, the chatbot should solicit more information from the patient to pass to the doctor’s office and suggest a few relevant pre-approved medical articles for reading. If the patient’s question is urgent, direct the patient to calling their local emergency services.

Given the following user input:

“I have been experiencing severe headaches and dizziness for the past two days.”

Which response is most appropriate for the chatbot to generate?

Options:

Here are a few relevant articles for your browsing. Let me know if you have questions after reading them.

Please call your local emergency services.

Headaches can be tough. Hope you feel better soon!

Please provide your age, recent activities, and any other symptoms you have noticed along with your headaches and dizziness.

Question 5

A Generative Al Engineer is building a production-ready LLM system which replies directly to customers. The solution makes use of the Foundation Model API via provisioned throughput. They are concerned that the LLM could potentially respond in a toxic or otherwise unsafe way. They also wish to perform this with the least amount of effort.

Which approach will do this?

Options:

Host Llama Guard on Foundation Model API and use it to detect unsafe responses

Add some LLM calls to their chain to detect unsafe content before returning text

Add a regex expression on inputs and outputs to detect unsafe responses.

Ask users to report unsafe responses

Answer:

Explanation:

The task is to prevent toxic or unsafe responses in an LLM system using the Foundation Model API with minimal effort. Let’s assess the options.

Option A: Host Llama Guard on Foundation Model API and use it to detect unsafe responses

Llama Guard is a safety-focused model designed to detect toxic or unsafe content. Hosting it via the Foundation Model API (a Databricks service) integrates seamlessly with the existing system, requiring minimal setup (just deployment and a check step), and leverages provisioned throughput for performance.

Databricks Reference:"Foundation Model API supports hosting safety models like Llama Guard to filter outputs efficiently"("Foundation Model API Documentation," 2023).

Option B: Add some LLM calls to their chain to detect unsafe content before returning text

Using additional LLM calls (e.g., prompting an LLM to classify toxicity) increases latency, complexity, and effort (crafting prompts, chaining logic), and lacks the specificity of a dedicated safety model.

Databricks Reference:"Ad-hoc LLM checks are less efficient than purpose-built safety solutions"("Building LLM Applications with Databricks").

Option C: Add a regex expression on inputs and outputs to detect unsafe responses

Regex can catch simple patterns (e.g., profanity) but fails for nuanced toxicity (e.g., sarcasm, context-dependent harm), requiring significant manual effort to maintain and update rules.

Databricks Reference:"Regex-based filtering is limited for complex safety needs"("Generative AI Cookbook").

Option D: Ask users to report unsafe responses

User reporting is reactive, not preventive, and places burden on users rather than the system. It doesn’t limit unsafe outputs proactively and requires additional effort for feedback handling.

Databricks Reference:"Proactive guardrails are preferred over user-driven monitoring"("Databricks Generative AI Engineer Guide").

Conclusion: Option A (Llama Guard on Foundation Model API) is the least-effort, most effective approach, leveraging Databricks’ infrastructure for seamless safety integration.

Question 6

All of the following are Python APIs used to query Databricks foundation models. When running in an interactive notebook, which of the following libraries does not automatically use the current session credentials?

Options:

OpenAI client

REST API via requests library

MLflow Deployments SDK

Databricks Python SDK

Question 7

An AI developer team wants to fine-tune an open-weight model to have exceptional performance on a code generation use case. They are trying to choose the best model to start with. They want to minimize model hosting costs and are using Hugging Face model cards and spaces to explore models. Which TWO model attributes and metrics should the team focus on to make their selection?

Options:

Big Code Models Leaderboard

Number of model parameters

MTEB Leaderboard

Chatbot Arena Leaderboard

Number of model downloads last month

Question 8

Which indicator should be considered to evaluate the safety of the LLM outputs when qualitatively assessing LLM responses for a translation use case?

Options:

The ability to generate responses in code

The similarity to the previous language

The latency of the response and the length of text generated

The accuracy and relevance of the responses

Question 9

After changing the response generating LLM in a RAG pipeline from GPT-4 to a model with a shorter context length that the company self-hosts, the Generative AI Engineer is getting the following error:

What TWO solutions should the Generative AI Engineer implement without changing the response generating model? (Choose two.)

Options:

Use a smaller embedding model to generate

Reduce the maximum output tokens of the new model

Decrease the chunk size of embedded documents

Reduce the number of records retrieved from the vector database

Retrain the response generating model using ALiBi

Question 10

A Generative Al Engineer is building an LLM-based application that has an

important transcription (speech-to-text) task. Speed is essential for the success of the application

Which open Generative Al models should be used?

Options:

L!ama-2-70b-chat-hf

MPT-30B-lnstruct

DBRX

whisper-large-v3 (1.6B)

Answer:

Explanation:

The task requires an open generative AI model for a transcription (speech-to-text) task where speed is essential. Let’s assess the options based on their suitability for transcription and performance characteristics, referencing Databricks’ approach to model selection.

Option A: Llama-2-70b-chat-hf

Llama-2 is a text-based LLM optimized for chat and text generation, not speech-to-text. It lacks transcription capabilities.

Databricks Reference:"Llama models are designed for natural language generation, not audio processing"("Databricks Model Catalog").

Option B: MPT-30B-Instruct

MPT-30B is another text-based LLM focused on instruction-following and text generation, not transcription. It’s irrelevant for speech-to-text tasks.

Databricks Reference: No specific mention, but MPT is categorized under text LLMs in Databricks’ ecosystem, not audio models.

Option C: DBRX

DBRX, developed by Databricks, is a powerful text-based LLM for general-purpose generation. It doesn’t natively support speech-to-text and isn’t optimized for transcription.

Databricks Reference:"DBRX excels at text generation and reasoning tasks"("Introducing DBRX," 2023)—no mention of audio capabilities.

Option D: whisper-large-v3 (1.6B)

Whisper, developed by OpenAI, is an open-source model specifically designed for speech-to-text transcription. The “large-v3” variant (1.6 billion parameters) balances accuracy and efficiency, with optimizations for speed via quantization or deployment on GPUs—key for the application’s requirements.

Databricks Reference:"For audio transcription, models like Whisper are recommended for their speed and accuracy"("Generative AI Cookbook," 2023). Databricks supports Whisper integration in its MLflow or Lakehouse workflows.

Conclusion: OnlyD. whisper-large-v3is a speech-to-text model, making it the sole suitable choice. Its design prioritizes transcription, and its efficiency (e.g., via optimized inference) meets the speed requirement, aligning with Databricks’ model deployment best practices.

Question 11

A Generative Al Engineer is creating an LLM system that will retrieve news articles from the year 1918 and related to a user's query and summarize them. The engineer has noticed that the summaries are generated well but often also include an explanation of how the summary was generated, which is undesirable.

Which change could the Generative Al Engineer perform to mitigate this issue?

Options:

Split the LLM output by newline characters to truncate away the summarization explanation.

Tune the chunk size of news articles or experiment with different embedding models.

Revisit their document ingestion logic, ensuring that the news articles are being ingested properly.

Provide few shot examples of desired output format to the system and/or user prompt.

Question 12

A Generative AI Engineer is deploying a customer-facing, fine-tuned LLM on their public website. Given the large investment the company put into fine-tuning this model, and the proprietary nature of the tuning data, they are concerned about model inversion attacks. Which of the following Databricks AI Security Framework (DASF) risk mitigation strategies are most relevant to this use case?

Options:

Implement AI guardrails to allow users to configure and enforce compliance

Leverage Databricks access control lists (ACLs) to configure permissions for accessing models

Use secure model features with Databricks Feature Store

Apply attribute-based access controls (ABAC) to limit unauthorized access

Question 13

A Generative AI Engineer is building an LLM to generate article summaries in the form of a type of poem, such as a haiku, given the article content. However, the initial output from the LLM does not match the desired tone or style.

Which approach will NOT improve the LLM’s response to achieve the desired response?

Options:

Provide the LLM with a prompt that explicitly instructs it to generate text in the desired tone and style

Use a neutralizer to normalize the tone and style of the underlying documents

Include few-shot examples in the prompt to the LLM

Fine-tune the LLM on a dataset of desired tone and style

Question 14

A Generative Al Engineer is using an LLM to classify species of edible mushrooms based on text descriptions of certain features. The model is returning accurate responses in testing and the Generative Al Engineer is confident they have the correct list of possible labels, but the output frequently contains additional reasoning in the answer when the Generative Al Engineer only wants to return the label with no additional text.

Which action should they take to elicit the desired behavior from this LLM?

Options:

Use few snot prompting to instruct the model on expected output format

Use zero shot prompting to instruct the model on expected output format

Use zero shot chain-of-thought prompting to prevent a verbose output format

Use a system prompt to instruct the model to be succinct in its answer

Answer:

Explanation:

The LLM classifies mushroom species accurately but includes unwanted reasoning text, and the engineer wants only the label. Let’s assess how to control output format effectively.

Option A: Use few shot prompting to instruct the model on expected output format

Few-shot prompting provides examples (e.g., input: description, output: label). It can work but requires crafting multiple examples, which is effort-intensive and less direct than a clear instruction.

Databricks Reference:"Few-shot prompting guides LLMs via examples, effective for format control but requires careful design"("Generative AI Cookbook").

Option B: Use zero shot prompting to instruct the model on expected output format

Zero-shot prompting relies on a single instruction (e.g., “Return only the label”) without examples. It’s simpler than few-shot but may not consistently enforce succinctness if the LLM’s default behavior is verbose.

Databricks Reference:"Zero-shot prompting can specify output but may lack precision without examples"("Building LLM Applications with Databricks").

Option C: Use zero shot chain-of-thought prompting to prevent a verbose output format

Chain-of-Thought (CoT) encourages step-by-step reasoning, which increases verbosity—opposite to the desired outcome. This contradicts the goal of label-only output.

Databricks Reference:"CoT prompting enhances reasoning but often results in detailed responses"("Databricks Generative AI Engineer Guide").

Option D: Use a system prompt to instruct the model to be succinct in its answer

A system prompt (e.g., “Respond with only the species label, no additional text”) sets a global instruction for the LLM’s behavior. It’s direct, reusable, and effective for controlling output style across queries.

Databricks Reference:"System prompts define LLM behavior consistently, ideal for enforcing concise outputs"("Generative AI Cookbook," 2023).

Conclusion: Option D is the most effective and straightforward action, using a system prompt to enforce succinct, label-only responses, aligning with Databricks’ best practices for output control.

Question 15

A Generative AI Engineer received the following business requirements for an external chatbot.

The chatbot needs to know what types of questions the user asks and routes to appropriate models to answer the questions. For example, the user might ask about upcoming event details. Another user might ask about purchasing tickets for a particular event.

What is an ideal workflow for such a chatbot?

Options:

The chatbot should only look at previous event information

There should be two different chatbots handling different types of user queries.

The chatbot should be implemented as a multi-step LLM workflow. First, identify the type of question asked, then route the question to the appropriate model. If it’s an upcoming event question, send the query to a text-to-SQL model. If it’s about ticket purchasing, the customer should be redirected to a payment platform.

The chatbot should only process payments

Question 16

A Generative Al Engineer interfaces with an LLM with prompt/response behavior that has been trained on customer calls inquiring about product availability. The LLM is designed to output “In Stock” if the product is available or only the term “Out of Stock” if not.

Which prompt will work to allow the engineer to respond to call classification labels correctly?

Options:

Respond with “In Stock” if the customer asks for a product.

You will be given a customer call transcript where the customer asks about product availability. The outputs are either “In Stock” or “Out of Stock”. Format the output in JSON, for example: {“call_id”: “123”, “label”: “In Stock”}.

Respond with “Out of Stock” if the customer asks for a product.

You will be given a customer call transcript where the customer inquires about product availability. Respond with “In Stock” if the product is available or “Out of Stock” if not.

Question 17

Which TWO chain components are required for building a basic LLM-enabled chat application that includes conversational capabilities, knowledge retrieval, and contextual memory?

Options:

(Q)

Vector Stores

Conversation Buffer Memory

External tools

Chat loaders

React Components

Answer:

B, C

Explanation:

Building a basic LLM-enabled chat application with conversational capabilities, knowledge retrieval, and contextual memory requires specific components that work together to process queries, maintain context, and retrieve relevant information. Databricks’ Generative AI Engineer documentation outlines key components for such systems, particularly in the context of frameworks like LangChain or Databricks’ MosaicML integrations. Let’s evaluate the required components:

Understanding the Requirements:

Conversational capabilities: The app must generate natural, coherent responses.

Knowledge retrieval: It must access external or domain-specific knowledge.

Contextual memory: It must remember prior interactions in the conversation.

Databricks Reference:"A typical LLM chat application includes a memory component to track conversation history and a retrieval mechanism to incorporate external knowledge"("Databricks Generative AI Cookbook," 2023).

Evaluating the Options:

A. (Q): This appears incomplete or unclear (possibly a typo). Without further context, it’s not a valid component.

B. Vector Stores: These store embeddings of documents or knowledge bases, enabling semantic search and retrieval of relevant information for the LLM. This is critical for knowledge retrieval in a chat application.

Databricks Reference:"Vector stores, such as those integrated with Databricks’ Lakehouse, enable efficient retrieval of contextual data for LLMs"("Building LLM Applications with Databricks").

C. Conversation Buffer Memory: This component stores the conversation history, allowing the LLM to maintain context across multiple turns. It’s essential for contextual memory.

Databricks Reference:"Conversation Buffer Memory tracks prior user inputs and LLM outputs, ensuring context-aware responses"("Generative AI Engineer Guide").

D. External tools: These (e.g., APIs or calculators) enhance functionality but aren’t required for abasicchat app with the specified capabilities.

E. Chat loaders: These might refer to data loaders for chat logs, but they’re not a core chain component for conversational functionality or memory.

F. React Components: These relate to front-end UI development, not the LLM chain’s backend functionality.

Selecting the Two Required Components:

Forknowledge retrieval, Vector Stores (B) are necessary to fetch relevant external data, a cornerstone of Databricks’ RAG-based chat systems.

Forcontextual memory, Conversation Buffer Memory (C) is required to maintain conversation history, ensuring coherent and context-aware responses.

While an LLM itself is implied as the core generator, the question asks for chain components beyond the model, making B and C the minimal yet sufficient pair for a basic application.

Conclusion: The two required chain components areB. Vector StoresandC. Conversation Buffer Memory, as they directly address knowledge retrieval and contextual memory, respectively, aligning with Databricks’ documented best practices for LLM-enabled chat applications.

Question 18

A Generative Al Engineer is developing a RAG application and would like to experiment with different embedding models to improve the application performance.

Which strategy for picking an embedding model should they choose?

Options:

Pick an embedding model trained on related domain knowledge

Pick the most recent and most performant open LLM released at the time

pick the embedding model ranked highest on the Massive Text Embedding Benchmark (MTEB) leaderboard hosted by HuggingFace

Pick an embedding model with multilingual support to support potential multilingual user questions

Answer:

Explanation:

The task involves improving a Retrieval-Augmented Generation (RAG) application’s performance by experimenting with embedding models. The choice of embedding model impacts retrieval accuracy, which is critical for RAG systems. Let’s evaluate the options based on Databricks Generative AI Engineer best practices.

Option A: Pick an embedding model trained on related domain knowledge

Embedding models trained on domain-specific data (e.g., industry-specific corpora) produce vectors that better capture the semantics of the application’s context, improving retrieval relevance. For RAG, this is a key strategy to enhance performance.

Databricks Reference:"For optimal retrieval in RAG systems, select embedding models aligned with the domain of your data"("Building LLM Applications with Databricks," 2023).

Option B: Pick the most recent and most performant open LLM released at the time

LLMs are not embedding models; they generate text, not embeddings for retrieval. While recent LLMs may be performant for generation, this doesn’t address the embedding step in RAG. This option misunderstands the component being selected.

Databricks Reference: Embedding models and LLMs are distinct in RAG workflows:"Embedding models convert text to vectors, while LLMs generate responses"("Generative AI Cookbook").

Option C: Pick the embedding model ranked highest on the Massive Text Embedding Benchmark (MTEB) leaderboard hosted by HuggingFace

The MTEB leaderboard ranks models across general tasks, but high overall performance doesn’t guarantee suitability for a specific domain. A top-ranked model might excel in generic contexts but underperform on the engineer’s unique data.

Databricks Reference: General performance is less critical than domain fit:"Benchmark rankings provide a starting point, but domain-specific evaluation is recommended"("Databricks Generative AI Engineer Guide").

Option D: Pick an embedding model with multilingual support to support potential multilingual user questions

Multilingual support is useful only if the application explicitly requires it. Without evidence of multilingual needs, this adds complexity without guaranteed performance gains for the current use case.

Databricks Reference:"Choose features like multilingual support based on application requirements"("Building LLM-Powered Applications").

Conclusion: Option A is the best strategy because it prioritizes domain relevance, directly improving retrieval accuracy in a RAG system—aligning with Databricks’ emphasis on tailoring models to specific use cases.

Question 19

A Generative AI Engineer has been reviewing issues with their company's LLM-based question-answering assistant and has determined that a technique called prompt chaining could help alleviate some performance concerns. However, to suggest this to their team, they have to clearly explain how it works and how it can benefit their question-answering assistant. Which explanation do they communicate to the team?

Options:

It allows you to break down complex tasks into multiple independent subtasks. This enables the assistant to generate more comprehensive and accurate responses.

It allows you to reduce the latency of your applications. By having multiple chains participating in the response as a chain, you increase the rate at which the response is generated.

It allows you to decrease the effort involved in crafting a prompt. Chains make it possible to reuse prompt text across multiple different use cases.

It reduces the average cost of a typical request. Chains make more efficient use of the tokens produced to generate higher quality responses with fewer tokens.

Question 20

A Generative Al Engineer is ready to deploy an LLM application written using Foundation Model APIs. They want to follow security best practices for production scenarios

Which authentication method should they choose?

Options:

Use an access token belonging to service principals

Use a frequently rotated access token belonging to either a workspace user or a service principal

Use OAuth machine-to-machine authentication

Use an access token belonging to any workspace user

Answer:

Explanation:

The task is to deploy an LLM application using Foundation Model APIs in a production environment while adhering to security best practices. Authentication is critical for securing access to Databricks resources, such as the Foundation Model API. Let’s evaluate the options based on Databricks’ security guidelines for production scenarios.

Option A: Use an access token belonging to service principals

Service principals are non-human identities designed for automated workflows and applications in Databricks. Using an access token tied to a service principal ensures that the authentication is scoped to the application, follows least-privilege principles (via role-based access control), and avoids reliance on individual user credentials. This is a security best practice for production deployments.

Databricks Reference:"For production applications, use service principals with access tokens to authenticate securely, avoiding user-specific credentials"("Databricks Security Best Practices," 2023). Additionally, the "Foundation Model API Documentation" states:"Service principal tokens are recommended for programmatic access to Foundation Model APIs."

Option B: Use a frequently rotated access token belonging to either a workspace user or a service principal

Frequent rotation enhances security by limiting token exposure, but tying the token to a workspace user introduces risks (e.g., user account changes, broader permissions). Including both user and service principal options dilutes the focus on application-specific security, making this less ideal than a service-principal-only approach. It also adds operational overhead without clear benefits over Option A.

Databricks Reference:"While token rotation is a good practice, service principals are preferred over user accounts for application authentication"("Managing Tokens in Databricks," 2023).

Option C: Use OAuth machine-to-machine authentication

OAuth M2M (e.g., client credentials flow) is a secure method for application-to-service communication, often using service principals under the hood. However, Databricks’ Foundation Model API primarily supports personal access tokens (PATs) or service principal tokens over full OAuth flows for simplicity in production setups. OAuth M2M adds complexity (e.g., managing refresh tokens) without a clear advantage in this context.

Databricks Reference:"OAuth is supported in Databricks, but service principal tokens are simpler and sufficient for most API-based workloads"("Databricks Authentication Guide," 2023).

Option D: Use an access token belonging to any workspace user

Using a user’s access token ties the application to an individual’s identity, violating security best practices. It risks exposure if the user leaves, changes roles, or has overly broad permissions, and it’s not scalable or auditable for production.

Databricks Reference:"Avoid using personal user tokens for production applications due to security and governance concerns"("Databricks Security Best Practices," 2023).

Conclusion: Option A is the best choice, as it uses a service principal’s access token, aligning with Databricks’ security best practices for production LLM applications. It ensures secure, application-specific authentication with minimal complexity, as explicitly recommended for Foundation Model API deployments.

Question 21

A Generative AI Engineer is testing a simple prompt template in LangChain using the code below, but is getting an error:

Python

from langchain.chains import LLMChain

from langchain_community.llms import OpenAI

from langchain_core.prompts import PromptTemplate

prompt_template = "Tell me a {adjective} joke"

prompt = PromptTemplate(input_variables=["adjective"], template=prompt_template)

# ... (Error-prone section)

Assuming the API key was properly defined, what change does the Generative AI Engineer need to make to fix their chain?

Options:

(Incorrect structure)

prompt_template = "Tell me a {adjective} joke"

prompt = PromptTemplate(input_variables=["adjective"], template=prompt_template)

llm = OpenAI()

llm_chain = LLMChain(prompt=prompt, llm=llm)

llm_chain.generate([{"adjective": "funny"}])

(Incorrect structure)

Load More Databricks-Generative-AI-Engineer-Associate Questions

Pre-Summer Sale Discount Flat 70% Offer - Ends in 0d 00h 00m 00s - Coupon code: 70diswrap

Dumpswrap Top Menu

breadcrumb

Databricks Databricks-Generative-AI-Engineer-Associate Dumps

Databricks-Generative-AI-Engineer-Associate Free PDF Questions

Databricks Certified Generative AI Engineer Associate Questions and Answers

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Databricks-Generative-AI-Engineer-Associate Free PDF Answers

Dumpswrap Footer Menu

DumpsWrap All Rights Reserved