Question 1

You have a Fabric workspace named Workspace1 that contains a notebook named Notebook1.
In Workspace1, you create a new notebook named Notebook2.
You need to ensure that you can attach Notebook2 to the same Apache Spark session as Notebook1.
What should you do?

Accepted Answer

A

Explanation: The high concurrency feature in Microsoft Fabric is specifically designed to allow multiple notebooks to share a single Apache Spark session. When a session is started with high concurrency enabled, subsequent notebooks from the same user can attach to this existing session instead of creating a new one. This significantly reduces the startup time for the second notebook (Notebook2) and optimizes the use of cluster resources, directly fulfilling the requirement to attach Notebook2 to the same session as Notebook1.

Question 2

You need to recommend a solution for handling old files. The solution must meet the technical requirements. What should you include in the recommendation?

Accepted Answer

B

Explanation: The VACUUM command is specifically designed to clean up and remove old, unreferenced data files from the underlying storage of a Delta Lake table. When data in a Delta Lake table is updated or deleted, the old data files are not immediately removed to support features like time travel. The VACUUM command purges these stale files that are older than the specified retention threshold, which directly addresses the requirement of handling old files in a safe, transactionally-aware manner. This is a crucial table maintenance task for managing storage costs and cleaning up data.
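
The selection rule described above can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not Delta Lake's implementation: the file names, timestamps, and the `files_to_vacuum` helper are hypothetical, and the 7-day window mirrors Delta Lake's default retention threshold.

```python
from datetime import datetime, timedelta


def files_to_vacuum(all_files, referenced, now, retention=timedelta(days=7)):
    """Return files that the current table version no longer references
    and whose last-modified time is older than the retention window."""
    cutoff = now - retention
    return sorted(
        name
        for name, modified in all_files.items()
        if name not in referenced and modified < cutoff
    )


# Hypothetical storage listing: file name -> last-modified time.
now = datetime(2024, 6, 30)
all_files = {
    "part-0001.parquet": datetime(2024, 6, 1),   # stale and outside retention
    "part-0002.parquet": datetime(2024, 6, 28),  # stale but still inside retention
    "part-0003.parquet": datetime(2024, 6, 29),  # current data file
}
referenced = {"part-0003.parquet"}

print(files_to_vacuum(all_files, referenced, now))
```

Only the file that is both unreferenced and older than the retention window is selected, which is why time travel to recent versions keeps working after the cleanup.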

Question 3

You have a Fabric F32 capacity that contains a workspace. The workspace contains a warehouse
named DW1 that is modelled by using MD5 hash surrogate keys.
DW1 contains a single fact table that has grown from 200 million rows to 500 million rows during the
past year.
You have Microsoft Power BI reports that are based on Direct Lake. The reports show year-over-year
values.
Users report that the performance of some of the reports has degraded over time and some visuals
show errors.
You need to resolve the performance issues. The solution must meet the following requirements:
• Provide the best query performance.
• Minimize operational costs.
What should you do?

Accepted Answer

C

Explanation: The performance degradation is directly linked to the significant growth of the fact table to 500 million rows. V-Order is a write-time optimization specific to the Microsoft Fabric storage engine that reorders data within Parquet files. This optimization improves data compression and creates a more efficient data layout, which significantly accelerates read operations by compute engines like Power BI (using Direct Lake) and the SQL endpoint. Applying V-Order to the large fact table will directly address the performance degradation caused by increased data volume, especially for scans and filters common in year-over-year analysis. This is a built-in feature that improves resource utilization, thus meeting the requirement to minimize operational costs.

Question 4

You need to implement the solution for the book reviews. What should you do?

Accepted Answer

B

Explanation: To implement a solution for book reviews, assuming the data resides in an external storage location like Azure Data Lake Storage (ADLS) Gen2, a shortcut is the most efficient method. In Microsoft Fabric, a OneLake shortcut acts as a symbolic link to the external data source. This approach allows you to access and analyze the book review data directly within your lakehouse without ingesting or duplicating it. This avoids data movement, reduces storage costs, and ensures that analyses are always performed on the most current data from the source system.

Question 5

You have a Fabric warehouse named DW1. DW1 contains a table that stores sales data and is used by
multiple sales representatives.
You plan to implement row-level security (RLS).
You need to ensure that the sales representatives can see only their respective data.
Which warehouse object do you require to implement RLS?

Accepted Answer

D

Explanation: Row-level security (RLS) in a Microsoft Fabric warehouse is implemented by creating a security policy that applies a filter predicate to a table. The predicate logic, which determines whether a user can access a specific row, must be encapsulated in an inline table-valued function (TVF). The function evaluates each row against the current user's context (for example, USER_NAME()) and returns a result that the security policy uses to show or hide the row. Therefore, a function is the T-SQL object required to define the filtering logic for RLS.
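
The relationship between the predicate function and the security policy can be sketched in plain Python. This is a toy model only, not warehouse code: `security_predicate`, `apply_filter_policy`, the `SalesRep` column, and the sample rows are all hypothetical stand-ins.

```python
def security_predicate(row_sales_rep, current_user):
    """Stand-in for the inline table-valued function: True means the row is visible."""
    return row_sales_rep == current_user


def apply_filter_policy(rows, current_user):
    """Stand-in for the security policy: keeps only rows the predicate accepts."""
    return [r for r in rows if security_predicate(r["SalesRep"], current_user)]


# Hypothetical sales table.
sales = [
    {"OrderID": 1, "SalesRep": "ana@contoso.com", "Amount": 120},
    {"OrderID": 2, "SalesRep": "ben@contoso.com", "Amount": 75},
    {"OrderID": 3, "SalesRep": "ana@contoso.com", "Amount": 40},
]

visible = apply_filter_policy(sales, "ana@contoso.com")
print([r["OrderID"] for r in visible])
```

The policy itself holds no logic; it only binds the predicate to the table, which is why the function is the object the question asks for.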

Question 6

Your company has a sales department that uses two Fabric workspaces named Workspace1 and
Workspace2.
The company decides to implement a domain strategy to organize the workspaces.
You need to ensure that a user can perform the following tasks:
• Create a new domain for the sales department.
• Create two subdomains: one for the east region and one for the west region.
• Assign Workspace1 to the east region subdomain.
• Assign Workspace2 to the west region subdomain.
The solution must follow the principle of least privilege.
Which role should you assign to the user?

Accepted Answer

D

Explanation: The user is required to perform several tasks, the first of which is to "Create a new domain for the sales department." According to Microsoft Fabric documentation, the creation of a new, top-level domain is a tenant-level operation. This action can only be performed by users assigned the Fabric admin, Power BI admin, or Global administrator role. While a domain admin can create subdomains and a domain contributor can assign workspaces, neither role has the necessary permissions to create the initial domain. Therefore, to accomplish all the specified tasks, the user must be assigned the Fabric admin role, which is the minimum role that satisfies all requirements.

Question 7

You have two Fabric workspaces named Workspace1 and Workspace2.
You have a Fabric deployment pipeline named deployPipeline1 that deploys items from Workspace1
to Workspace2. DeployPipeline1 contains all the items in Workspace1.
You recently modified the items in Workspace1.
The workspaces currently contain the items shown in the following table.

Items in Workspace1 that have the same name as items in Workspace2 are currently paired.
You need to ensure that the items in Workspace1 overwrite the corresponding items in Workspace2.
The solution must minimize effort.
What should you do?

Accepted Answer

D

Explanation: The fundamental purpose of a Microsoft Fabric deployment pipeline is to manage the lifecycle of content by promoting it between different stages (workspaces). When you deploy content from a source workspace to a target workspace that already contains items, the pipeline's default behavior is to overwrite the existing items in the target with the versions from the source. Since the items are already paired by name, simply running the deployment pipeline will automatically update the items in Workspace2 with the modified versions from Workspace1. This is the most direct method and requires the minimum effort.

Question 8

You have a Fabric workspace that contains a lakehouse and a notebook named Notebook1.
Notebook1 reads data into a DataFrame from a table named Table1 and applies transformation logic.
The data from the DataFrame is then written to a new Delta table named Table2 by using a merge
operation.
You need to consolidate the underlying Parquet files in Table1.
Which command should you run?

Accepted Answer

C

Explanation: The OPTIMIZE command is specifically designed to improve the performance of queries on a Delta table by compacting small files into fewer, larger ones. This process, known as file compaction, directly addresses the requirement to "consolidate the underlying Parquet files." Consolidating files reduces the metadata overhead and improves I/O efficiency, leading to faster read operations. The command can be run on a table to rewrite the data files for better layout and size.

Question 9

You have a Fabric capacity that contains a workspace named Workspace1. Workspace1 contains a
lakehouse named Lakehouse1, a data pipeline, a notebook, and several Microsoft Power BI reports.
A user named User1 wants to use SQL to analyze the data in Lakehouse1.
You need to configure access for User1. The solution must meet the following requirements:
• Provide User1 with read access to the table data in Lakehouse1.
• Prevent User1 from using Apache Spark to query the underlying files in Lakehouse1.
• Prevent User1 from accessing other items in Workspace1.
What should you do?

Accepted Answer

A

Explanation: The most effective and precise way to meet all requirements is to use item-level sharing for the Lakehouse. Sharing Lakehouse1 directly with User1 and granting the Read all SQL endpoint data permission achieves the specific goals. This permission allows User1 to connect to the SQL analytics endpoint and run SELECT queries on all tables, fulfilling the primary requirement. Crucially, this method does not grant permissions to the Lakehouse explorer or the underlying files for Apache Spark access. Furthermore, because this is an item-specific permission, it does not grant any access to other items within Workspace1, such as notebooks or pipelines, thus adhering to the principle of least privilege.

Question 10

HOTSPOT You plan to process the following three datasets by using Fabric:
• Dataset1: This dataset will be added to Fabric and will have a unique primary key between the source and the destination. The unique primary key will be an integer and will start from 1 and have an increment of 1.
• Dataset2: This dataset contains semi-structured data that uses bulk data transfer. The dataset must be handled in one process between the source and the destination. The data transformation process will include the use of custom visuals to understand and work with the dataset in development mode.
• Dataset3: This dataset is in a lakehouse. The data will be bulk loaded. The data transformation process will include row-based windowing functions during the loading process.
You need to identify which type of item to use for the datasets. The solution must minimize development effort and use built-in functionality, when possible. What should you identify for each dataset? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. https://kxbjsyuhceggsyvxdkof.supabase.co/storage/v1/object/public/file-images/DP-700/page_113_img_1.jpg

Accepted Answer

DATASET1: A DATAFLOW GEN2 DATAFLOW

DATASET2: A NOTEBOOK

DATASET3: A T-SQL STATEMENT

Explanation:
Dataset1: A Dataflow Gen2 dataflow. Dataflow Gen2 (which uses the Power Query engine) provides a built-in, low-code transformation called "Add Index Column". This feature directly satisfies the requirement to create a unique integer primary key starting from 1 with an increment of 1, minimizing development effort.
Dataset2: A notebook. Notebooks are the primary Fabric item for handling semi-structured data at scale by using Spark. The notebook environment is designed for iterative development and lets developers build code-based custom visualizations (with libraries such as Plotly, Seaborn, or Matplotlib) to explore and understand the data during the transformation process.
Dataset3: A T-SQL statement. The data resides in a lakehouse, whose tables can be queried with T-SQL through the SQL analytics endpoint. T-SQL natively supports row-based window functions such as ROW_NUMBER(), RANK(), LAG(), and LEAD(), so the transformation can be applied during the bulk load with a statement such as CREATE TABLE AS SELECT (CTAS) or INSERT...SELECT. Because the SQL analytics endpoint is read-only for table data, the statement that writes the result typically runs in a warehouse.
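
As an illustration of the Dataset3 pattern, the following sketch applies window functions inside a single CREATE TABLE AS SELECT statement. It uses Python's built-in SQLite (version 3.25 or later for window functions) as a stand-in for a Fabric T-SQL engine, so dialect details may differ; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical staging data.
    CREATE TABLE staging_sales (customer TEXT, order_date TEXT, amount INTEGER);
    INSERT INTO staging_sales VALUES
        ('A', '2024-01-01', 100),
        ('A', '2024-01-05', 150),
        ('B', '2024-01-02', 200);

    -- Load the target in one statement, computing window columns on the way in.
    CREATE TABLE sales_loaded AS
    SELECT
        customer,
        order_date,
        amount,
        ROW_NUMBER() OVER (PARTITION BY customer ORDER BY order_date) AS order_seq,
        LAG(amount)  OVER (PARTITION BY customer ORDER BY order_date) AS prev_amount
    FROM staging_sales;
""")

rows = conn.execute(
    "SELECT customer, order_seq, prev_amount "
    "FROM sales_loaded ORDER BY customer, order_seq"
).fetchall()
print(rows)
```

Each row receives its sequence number and the previous amount within its customer partition at load time, so no separate transformation pass is needed.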

Question 11

HOTSPOT You have a Fabric warehouse named DW1 that contains four staging tables named ProductCategory, ProductSubcategory, Product, and SalesOrder. ProductCategory, ProductSubcategory, and Product are used often in analytical queries. You need to implement a star schema for DW1. The solution must minimize development effort. Which design approach should you use? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. https://kxbjsyuhceggsyvxdkof.supabase.co/storage/v1/object/public/file-images/DP-700/page_104_img_2.jpg

Accepted Answer

PRODUCTCATEGORY, PRODUCTSUBCATEGORY AND PRODUCT MUST BE: DENORMALIZED INTO A SINGLE PRODUCT DIMENSION TABLE

THE JOINING KEY MUST BE: THE UNIQUE SYSTEM GENERATED IDENTIFIER

Explanation: In a star schema, the central fact table (derived from SalesOrder) is surrounded by dimension tables. The tables ProductCategory, ProductSubcategory, and Product represent a hierarchy that describes a single business entity: the product. To create a star schema, these normalized tables are denormalized into a single Product dimension table. This simplifies queries and improves performance, which aligns with the goal of supporting analytical queries. The new Product dimension table should have a surrogate key (a unique, system-generated identifier such as ProductKey) as its primary key. This key is meaningless to the business but ensures stability and efficient joins. The surrogate key is then added to the fact table as a foreign key to link sales orders to the product details.
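
The denormalization step can be sketched as follows. SQLite (3.25 or later) stands in for the warehouse engine, and the column names and sample rows are assumptions made for illustration; only the three table names and ProductKey come from the question and explanation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical columns for the three normalized staging tables.
    CREATE TABLE ProductCategory (CategoryID INTEGER, CategoryName TEXT);
    CREATE TABLE ProductSubcategory (SubcategoryID INTEGER, CategoryID INTEGER, SubcategoryName TEXT);
    CREATE TABLE Product (ProductID INTEGER, SubcategoryID INTEGER, ProductName TEXT);

    INSERT INTO ProductCategory VALUES (1, 'Bikes');
    INSERT INTO ProductSubcategory VALUES (10, 1, 'Road Bikes'), (11, 1, 'Mountain Bikes');
    INSERT INTO Product VALUES (100, 10, 'Road-150'), (101, 11, 'Mountain-200');

    -- Flatten the hierarchy into one dimension and generate a surrogate key.
    CREATE TABLE DimProduct AS
    SELECT
        ROW_NUMBER() OVER (ORDER BY p.ProductID) AS ProductKey,
        p.ProductID,
        p.ProductName,
        s.SubcategoryName,
        c.CategoryName
    FROM Product AS p
    JOIN ProductSubcategory AS s ON s.SubcategoryID = p.SubcategoryID
    JOIN ProductCategory AS c ON c.CategoryID = s.CategoryID;
""")

dim_rows = conn.execute("SELECT * FROM DimProduct ORDER BY ProductKey").fetchall()
print(dim_rows)
```

Analytical queries then join the fact table to DimProduct on ProductKey alone, instead of walking three tables for every category-level aggregate.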

Question 12

HOTSPOT You have a Fabric workspace named Workspace1 that contains the items shown in the following table. https://kxbjsyuhceggsyvxdkof.supabase.co/storage/v1/object/public/file-images/DP-700/page_42_img_1.jpg
For Model1, the Keep your Direct Lake data up to date option is disabled. You need to configure the execution of the items to meet the following requirements:
• Notebook1 must execute every weekday at 8:00 AM.
• Notebook2 must execute when a file is saved to an Azure Blob Storage container.
• Model1 must refresh when Notebook1 has executed successfully.
How should you orchestrate each item? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. https://kxbjsyuhceggsyvxdkof.supabase.co/storage/v1/object/public/file-images/DP-700/page_44_img_1.jpg

Accepted Answer

NOTEBOOK1: ADD NOTEBOOK1 TO PIPELINE1.

NOTEBOOK2: FROM REAL-TIME HUB, CONFIGURE THE EXECUTION OF NOTEBOOK2.

PIPELINE1: CONFIGURE THE EXECUTION OF PIPELINE1 BY USING A SCHEDULE.

MODEL1: ADD MODEL1 TO PIPELINE1.

Explanation: Notebook1, Model1, and Pipeline1: The requirements for Notebook1 (run on a schedule) and Model1 (refresh after Notebook1 succeeds) create a dependency, and a data pipeline is the Fabric item designed to orchestrate a sequence of dependent activities. Add Notebook1 to Pipeline1 as a Notebook activity. Add Model1 as a Semantic model refresh activity and connect it to the On success output of the Notebook activity, so that the refresh runs only after Notebook1 succeeds. Because the Keep your Direct Lake data up to date option is disabled, Model1 is not refreshed automatically and needs this explicit step. Then configure Pipeline1 with a schedule (weekdays at 8:00 AM) to run the entire workflow.
Notebook2: The requirement for Notebook2 is event-driven (run when a file is saved). Azure Blob Storage events are surfaced in Real-Time hub, where you can set an alert (backed by Activator) on the event and configure it to run Notebook2.

Question 13

HOTSPOT You are building a data loading pattern for Fabric notebook workloads. You have the following code segment: https://kxbjsyuhceggsyvxdkof.supabase.co/storage/v1/object/public/file-images/DP-700/page_75_img_1.jpg For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE: Each correct selection is worth one point. https://kxbjsyuhceggsyvxdkof.supabase.co/storage/v1/object/public/file-images/DP-700/page_76_img_1.jpg

Accepted Answer

THE TARGET TABLE WILL ALWAYS BE OVERWRITTEN: NO

THE MERGE OPERATION WILL ALWAYS RUN: NO

THE LOADING PATTERN SUPPORTS BOTH FULL AND INCREMENTAL LOADING REQUIREMENTS: YES

Explanation: The code branches on whether target_table already exists. It first attempts DeltaTable.forName(spark, target_table). If the table does not exist, the call raises an exception and the except block runs, creating the table by using mode("overwrite") (a full load). The function then returns, so the merge operation is never reached. If the table does exist, the first try block succeeds, the except block (and its overwrite logic) is skipped, and the code proceeds to the second try block, where it executes a merge operation (an incremental load) to upsert data. Therefore, neither the overwrite nor the merge "always" runs; they are mutually exclusive. The pattern handles both the initial full load (table creation) and subsequent incremental loads (merge).
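
The control flow described above can be reproduced with a small in-memory model. The exhibit's code is an image and is not reproduced here, so this is only a stand-in: the `catalog` dictionary, `TableNotFound`, `get_table`, and `load` are hypothetical names, and a dictionary update plays the role of the Delta merge.

```python
class TableNotFound(Exception):
    """Raised by the toy catalog when a table does not exist."""


catalog = {}  # table name -> {key: value}; stands in for the lakehouse


def get_table(name):
    """Look a table up by name, failing if it has not been created yet."""
    if name not in catalog:
        raise TableNotFound(name)
    return catalog[name]


def load(target_table, incoming):
    try:
        target = get_table(target_table)
    except TableNotFound:
        # First run: create the table from the incoming data (full load).
        catalog[target_table] = dict(incoming)
        return "full"
    # Later runs: upsert incoming rows into the existing table (incremental load).
    target.update(incoming)
    return "incremental"


first = load("sales", {1: "a", 2: "b"})
second = load("sales", {2: "B", 3: "c"})
print(first, second, catalog["sales"])
```

The early return in the except branch is what makes the two paths mutually exclusive: a given call either creates the table or merges into it, never both.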

Question 14

HOTSPOT You are building a data orchestration pattern by using a Fabric data pipeline named Dynamic Data Copy as shown in the exhibit. (Click the Exhibit tab.) https://kxbjsyuhceggsyvxdkof.supabase.co/storage/v1/object/public/file-images/DP-700/page_109_img_1.jpg Dynamic Data Copy does NOT use parametrization. You need to configure the ForEach activity to receive the list of tables to be copied. How should you complete the pipeline expression? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. https://kxbjsyuhceggsyvxdkof.supabase.co/storage/v1/object/public/file-images/DP-700/page_109_img_2.jpg

Accepted Answer

BOX 1: LOOKUP SCHEMA AND TABLE

BOX 2: OUTPUT.VALUE

Explanation: To configure the ForEach activity (named "Extraction Loop"), its Items property must be set to an array from the output of the preceding activity, which is named Lookup Schema and Table. The dynamic content expression to access another activity's output is @activity('ActivityName'). The Lookup activity returns its results (the list of tables) as an array in the value property of its output object. Therefore, the correct expression to pass the array of tables to the ForEach loop is @activity('Lookup Schema and Table').output.value.
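
To make the shape of the Lookup output concrete, the sketch below models it as a Python dictionary and navigates it the same way the expression does. The `SchemaName` and `TableName` columns and the sample rows are assumptions; the exhibit's actual query is not visible here.

```python
# Hypothetical output of a Lookup activity that returns multiple rows.
lookup_output = {
    "count": 2,
    "value": [
        {"SchemaName": "dbo", "TableName": "Customer"},
        {"SchemaName": "dbo", "TableName": "SalesOrder"},
    ],
}
activity_outputs = {"Lookup Schema and Table": {"output": lookup_output}}

# Python equivalent of @activity('Lookup Schema and Table').output.value
items = activity_outputs["Lookup Schema and Table"]["output"]["value"]

# Each loop iteration sees one array element, as item() does inside a ForEach.
copied = [f'{item["SchemaName"]}.{item["TableName"]}' for item in items]
print(copied)
```

The ForEach activity needs the array itself, which is why the expression ends in .output.value rather than stopping at .output.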

Question 15

HOTSPOT You have a Fabric workspace that contains a warehouse named Warehouse1. Warehouse1 contains a table named DimCustomers. DimCustomers contains the following columns:
• CustomerName
• CustomerID
• BirthDate
• Email
You need to configure security to meet the following requirements:
• BirthDate in DimCustomers must be masked and display 1900-01-01.
• Email in DimCustomers must be masked and display only the first leading character and the last five characters.
How should you complete the statement? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. https://kxbjsyuhceggsyvxdkof.supabase.co/storage/v1/object/public/file-images/DP-700/page_111_img_1.jpg

Accepted Answer

BOX 1: 'DEFAULT()'

BOX 2: 'PARTIAL(1, "@", 5)'

Explanation: BirthDate: The default() masking function applies a predefined mask based on the column's data type. For date and datetime data types, default() masks the value as 1900-01-01, which matches the requirement.
Email: The partial(prefix, padding, suffix) function masks string data by exposing a specified number of characters at the beginning (prefix) and end (suffix) and replacing the middle section with a padding string. The requirement is to show the "first leading character," so prefix = 1, and the "last five characters," so suffix = 5. The option 'partial(1, "@", 5)' implements this logic, using "@" as the padding string. The email() function is incorrect because it has a fixed format (the first letter and a constant .com suffix) that does not match the "last five characters" requirement. The random() function is incorrect because it applies only to numeric data types.
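
The effect of the two masks can be emulated in plain Python. This is a simplified sketch, not the warehouse implementation: `default_mask` and `partial_mask` are hypothetical helpers, the sample values are invented, and values too short to hold both the prefix and the suffix are rejected here rather than handled.

```python
from datetime import date


def default_mask(value):
    """Emulates default() for date values, which are shown as 1900-01-01."""
    if isinstance(value, date):
        return date(1900, 1, 1)
    raise TypeError("this sketch only covers date columns")


def partial_mask(value, prefix, padding, suffix):
    """Emulates partial(prefix, padding, suffix) for strings long enough to
    hold both the exposed prefix and the exposed suffix."""
    if len(value) < prefix + suffix:
        raise ValueError("value too short for this simplified sketch")
    return value[:prefix] + padding + value[len(value) - suffix:]


masked_birth_date = default_mask(date(1987, 4, 12))
masked_email = partial_mask("alice@contoso.com", 1, "@", 5)
print(masked_birth_date, masked_email)
```

With a prefix of 1 and a suffix of 5, everything between the first character and the last five is collapsed into the single padding string.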

Free Microsoft Data Engineering DP-700 Actual Exam Questions