Question 1

A company hosts its applications on Amazon EC2 instances. The company must use SSL/TLS
connections that encrypt data in transit to communicate securely with AWS infrastructure that is
managed by a customer.
A data engineer needs to implement a solution to simplify the generation, distribution, and rotation
of digital certificates. The solution must automatically renew and deploy SSL/TLS certificates.
Which solution will meet these requirements with the LEAST operational overhead?

Accepted Answer

B

Explanation: AWS Certificate Manager (ACM) is the AWS service for provisioning, managing, and deploying public and private SSL/TLS certificates. It is designed to automate and simplify the entire certificate lifecycle, including generation, distribution, and, most importantly, automatic renewal. By integrating ACM with services such as Elastic Load Balancing (which fronts the EC2 instances), the company can encrypt data in transit with minimal operational overhead, because ACM handles the complex and error-prone tasks of certificate management automatically.

Question 2

A data engineer needs to build an enterprise data catalog based on the company's Amazon S3
buckets and Amazon RDS databases. The data catalog must include storage format metadata for the
data in the catalog.
Which solution will meet these requirements with the LEAST effort?

Accepted Answer

B

Explanation: An AWS Glue crawler is the purpose-built service for automatically discovering and cataloging metadata from various data sources, including Amazon S3 and Amazon RDS. Crawlers use a prioritized list of classifiers (both built-in and custom) to automatically determine the data's schema, including its storage format (e.g., Parquet, JSON, CSV). This process populates the AWS Glue Data Catalog with the required metadata tables. This automated approach is the most direct and efficient method, fulfilling the "LEAST effort" requirement by eliminating the need for manual scripting or metadata entry.
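
As a rough illustration of how little configuration such a crawler needs, the sketch below builds the request parameters that could be passed to boto3's glue create_crawler call. The crawler name, role ARN, database name, S3 path, and JDBC connection name are all hypothetical placeholders, and the actual API call is left commented out so the snippet runs without AWS credentials.

```python
import json


def build_crawler_request(name, role_arn, database, s3_path, jdbc_connection, jdbc_path):
    """Return keyword arguments for a Glue crawler that covers S3 and a JDBC (RDS) source."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {
            "S3Targets": [{"Path": s3_path}],
            "JdbcTargets": [{"ConnectionName": jdbc_connection, "Path": jdbc_path}],
        },
    }


# Every value below is a hypothetical placeholder, not taken from the question.
request = build_crawler_request(
    name="enterprise-catalog-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="enterprise_catalog",
    s3_path="s3://example-bucket/data/",
    jdbc_connection="example-rds-connection",
    jdbc_path="exampledb/%",
)
print(json.dumps(request, indent=2))

# With credentials configured, the request could be submitted like this:
# import boto3
# boto3.client("glue").create_crawler(**request)
```

The crawler's classifiers, not this request, determine the storage format that ends up in the catalog; the request only tells the crawler where to look.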

Question 3

A company has an application that uses a microservice architecture. The company hosts the
application on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster.
The company wants to set up a robust monitoring system for the application. The company needs to
analyze the logs from the EKS cluster and the application. The company needs to correlate the
cluster's logs with the application's traces to identify points of failure in the whole application
request flow.
Which combination of steps will meet these requirements with the LEAST development effort?
(Select TWO.)

Accepted Answer

A, D

Explanation: This scenario requires a two-part solution: a mechanism for data collection (logs and traces) and a backend for data correlation and analysis. Option A provides the most efficient collection strategy. Fluent Bit is a lightweight, high-performance log processor and forwarder, which makes it well suited to collecting logs from containerized environments such as Amazon EKS with minimal resource overhead. OpenTelemetry is the open-source industry standard for instrumenting applications to generate and collect telemetry data, including traces, with minimal code changes, which satisfies the "least development effort" constraint. Option D provides the backend for analysis. Amazon OpenSearch Service is a managed service designed for log analytics, monitoring, and full-text search. Its built-in Trace Analytics capability can ingest, visualize, and correlate trace data from OpenTelemetry with the corresponding log data, which directly addresses the core requirement.

Question 4

A company stores customer records in Amazon S3. The company must not delete or modify the
customer record data for 7 years after each record is created. The root user also must not have the
ability to delete or modify the data.
A data engineer wants to use S3 Object Lock to secure the data.
Which solution will meet these requirements?

Accepted Answer

B

Explanation: The core requirement is to make data immutable for 7 years, even for the root user. S3 Object Lock in Compliance mode is specifically designed for this purpose. In Compliance mode, a protected object version cannot be overwritten or deleted by any user, including the root user in the AWS account, until the retention period expires. Setting a default retention period of 7 years on the bucket ensures that all new customer records automatically inherit this strict, unchangeable WORM (Write-Once, Read-Many) protection, fulfilling all stated requirements.
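
The default-retention rule described above can be sketched as the configuration document that boto3's s3 put_object_lock_configuration call accepts. The helper name and the bucket name are hypothetical, and the API call is commented out so the snippet runs offline.

```python
def build_object_lock_config(years):
    """Return an Object Lock configuration with a default Compliance-mode retention."""
    if years <= 0:
        raise ValueError("retention period must be positive")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": years}},
    }


config = build_object_lock_config(7)
print(config)

# With credentials configured, the rule could be applied like this:
# import boto3
# boto3.client("s3").put_object_lock_configuration(
#     Bucket="example-customer-records",  # hypothetical bucket name
#     ObjectLockConfiguration=config,
# )
```

Using "GOVERNANCE" instead of "COMPLIANCE" for the mode would allow users with special permissions to bypass the lock, which is why it would not satisfy the root-user requirement.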

Question 5

A company uses AWS Glue Apache Spark jobs to handle extract, transform, and load (ETL) workloads.
The company has enabled logging and monitoring for all AWS Glue jobs. One of the AWS Glue jobs
begins to fail. A data engineer investigates the error and wants to examine metrics for all individual
stages within the job. How can the data engineer access the stage metrics?

Accepted Answer

A

Explanation: The Apache Spark UI is the primary interface for monitoring and debugging the execution of Spark applications. When an AWS Glue job runs on the Spark engine, AWS Glue provides access to the Spark UI. This UI offers a granular, real-time, and post-run view of the job's execution, including a detailed breakdown of jobs, stages, and tasks. A data engineer can use the Spark UI to examine metrics for each individual stage, such as duration, data shuffled, input/output records, and execution details, which is essential for diagnosing failures or performance bottlenecks at a specific point in the ETL process.

Question 6

A data engineer needs to create an Amazon Athena table based on a subset of data from an existing
Athena table named cities_world. The cities_world table contains cities that are located around the
world. The data engineer must create a new table named cities_us to contain only the cities from
cities_world that are located in the US.
Which SQL statement should the data engineer use to meet this requirement?

Accepted Answer

A

Explanation: The most precise and correct method to create a new Amazon Athena table from a query result is by using a CREATE TABLE AS SELECT (CTAS) statement. The statement in Option A correctly uses the CREATE TABLE tablename AS syntax, followed by a SELECT query that filters the cities_world table for records where the country column is 'US'. This single operation creates the new table cities_us and populates it with the specified subset of data, storing the new data files in Amazon S3.
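
The answer options are not reproduced here, so the statement below is only an assumed shape for such a CTAS query: the table names and the country column come from the question and explanation, while SELECT * and the sample rows are assumptions. To keep the sketch runnable without AWS, it uses Python's built-in sqlite3 module as a stand-in for Athena, since SQLite also supports CREATE TABLE ... AS SELECT.

```python
import sqlite3

# Assumed CTAS statement; SELECT * is a guess about which columns are copied.
CTAS_SQL = """
CREATE TABLE cities_us AS
SELECT *
FROM cities_world
WHERE country = 'US'
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities_world (city TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO cities_world VALUES (?, ?)",
    [("Seattle", "US"), ("Paris", "FR"), ("Austin", "US"), ("Tokyo", "JP")],
)

# One statement both creates cities_us and fills it with the filtered rows.
conn.execute(CTAS_SQL)

us_cities = [row[0] for row in conn.execute("SELECT city FROM cities_us ORDER BY city")]
print(us_cities)  # ['Austin', 'Seattle']
```

In Athena the same statement would additionally write the result files to Amazon S3, which SQLite does not model.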

Question 7

A company has a data processing pipeline that includes several dozen steps. The data processing
pipeline needs to send alerts in real time when a step fails or succeeds. The data processing pipeline
uses a combination of Amazon S3 buckets, AWS Lambda functions, and AWS Step Functions state
machines.
A data engineer needs to create a solution to monitor the entire pipeline.
Which solution will meet these requirements?

Accepted Answer

D

Explanation: The most direct and efficient solution is to use Amazon EventBridge. AWS Step Functions natively emits events to EventBridge whenever an execution's status changes (e.g., SUCCEEDED, FAILED, RUNNING). An EventBridge rule can be configured to specifically filter for these status change events from the target state machine. This rule can then invoke a target, such as an Amazon SNS topic, to send notifications in real time. This approach provides a decoupled, event-driven mechanism for monitoring the entire pipeline's orchestration layer without modifying the pipeline's core logic.
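
A minimal sketch of the rule's event pattern follows. The state machine ARN is a hypothetical placeholder, and the small matches helper is only a local approximation of EventBridge matching (exact values and nested objects), included so the pattern can be tried against sample events without AWS.

```python
import json


def build_execution_status_pattern(state_machine_arn, statuses):
    """Return an EventBridge event pattern for Step Functions execution status changes."""
    return {
        "source": ["aws.states"],
        "detail-type": ["Step Functions Execution Status Change"],
        "detail": {
            "stateMachineArn": [state_machine_arn],
            "status": list(statuses),
        },
    }


def matches(pattern, event):
    """Local approximation: every pattern key must match by exact value or nested object."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Hypothetical ARN used only for illustration.
ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:example-pipeline"
pattern = build_execution_status_pattern(ARN, ["SUCCEEDED", "FAILED"])
print(json.dumps(pattern, indent=2))


def sample_event(status):
    """Build a trimmed-down sample event with only the fields the pattern inspects."""
    return {
        "source": "aws.states",
        "detail-type": "Step Functions Execution Status Change",
        "detail": {"stateMachineArn": ARN, "status": status},
    }


failed_event = sample_event("FAILED")
running_event = sample_event("RUNNING")
print(matches(pattern, failed_event), matches(pattern, running_event))
```

Restricting the status list to SUCCEEDED and FAILED keeps RUNNING events from triggering notifications.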

Question 8

A marketing company uses Amazon S3 to store marketing data. The company uses versioning in some
buckets. The company runs several jobs to read and load
data into the buckets.
To help cost-optimize its storage, the company wants to gather information about incomplete
multipart uploads and outdated versions that are present in the S3 buckets.
Which solution will meet these requirements with the LEAST operational effort?

Accepted Answer

C

Explanation: Amazon S3 Storage Lens automatically collects bucket-level metrics, including “Previous version bytes/objects” and “Incomplete multipart upload bytes/objects.” The managed dashboard surfaces these data without scripting or report generation, satisfying both requirements with minimal operational effort. (AWS, “S3 Storage Lens – Metrics and dimensions,” Table: Incomplete MPU bytes; Previous version bytes).

Question 9

A company maintains a data warehouse in an on-premises Oracle database. The company wants to
build a data lake on AWS. The company wants to load data warehouse tables into Amazon S3 and
synchronize the tables with incremental data that arrives from the data warehouse every day.
Each table has a column that contains monotonically increasing values. The size of each table is less
than 50 GB. The data warehouse tables are refreshed every night between 1 AM and 2 AM. A
business intelligence team queries the tables between 10 AM and 8 PM every day.
Which solution will meet these requirements in the MOST operationally efficient way?

Accepted Answer

B

Explanation: AWS Glue can connect to the on-premises Oracle database through a JDBC connection. When job bookmarks are enabled and a monotonically increasing column (e.g., an ID or timestamp) is specified as the bookmark key, Glue skips rows it has already processed and copies only the new rows on each run. A single nightly Glue job scheduled after the 1–2 AM warehouse refresh writes the daily increments to Amazon S3 as new objects alongside the existing data. This approach needs only one managed service and no replication instances, keeps operational overhead minimal, and finishes well before the BI team's 10 AM–8 PM query window.
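
The idea behind a bookmark on a monotonically increasing column can be illustrated with a few lines of plain Python. This is a conceptual sketch only, not how AWS Glue implements bookmarks internally; the function name, the id column, and the sample rows are all made up for the example.

```python
def incremental_rows(rows, bookmark_key, last_seen):
    """Return rows whose bookmark_key exceeds last_seen, plus the new high-water mark."""
    new_rows = [row for row in rows if last_seen is None or row[bookmark_key] > last_seen]
    if new_rows:
        last_seen = max(row[bookmark_key] for row in new_rows)
    return new_rows, last_seen


# Night 1: the source table holds three rows and no bookmark exists yet.
table = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}, {"id": 3, "value": "c"}]
first_batch, bookmark = incremental_rows(table, "id", None)

# Night 2: two rows were added; only those should be copied.
table += [{"id": 4, "value": "d"}, {"id": 5, "value": "e"}]
second_batch, bookmark = incremental_rows(table, "id", bookmark)

print(len(first_batch), [row["id"] for row in second_batch], bookmark)  # 3 [4, 5] 5
```

The approach relies on the column never decreasing: a row inserted later with a smaller value than the stored high-water mark would be skipped.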

Question 10

A financial company recently added more features to its mobile app. The new features required the
company to create a new topic in an existing Amazon Managed Streaming for Apache Kafka (Amazon
MSK) cluster.
A few days after the company added the new topic, Amazon CloudWatch raised an alarm on the
RootDiskUsed metric for the MSK cluster.
How should the company address the CloudWatch alarm?

Accepted Answer

A

Explanation: The scenario describes a disk space issue (RootDiskUsed alarm) that occurred after adding a new topic, which implies an increase in data volume. The most direct and standard solution for insufficient storage capacity in Amazon MSK is to increase the size of the EBS storage volumes attached to the brokers. Option A proposes exactly this, along with the best practice of enabling automatic storage expansion. This feature uses CloudWatch alarms to monitor disk usage and proactively increases storage, preventing future alarms and potential service disruptions caused by full disks. This directly addresses the root cause—increased data storage requirements from the new topic.

Question 11

A telecommunications company collects network usage data throughout each day at a rate of several
thousand data points each second. The company runs an application to process the usage data in real
time. The company aggregates and stores the data in an Amazon Aurora DB instance.
Sudden drops in network usage usually indicate a network outage. The company must be able to
identify sudden drops in network usage so the company can take immediate remedial actions.
Which solution will meet this requirement with the LEAST latency?

Accepted Answer

B

Explanation: The core requirement is to analyze a high-velocity data stream (thousands of points per second) in real time to detect anomalies with the least latency. The combination of Amazon Kinesis Data Streams and Amazon Managed Service for Apache Flink is purpose-built for this use case. This architecture enables continuous, stateful processing of streaming data. An Apache Flink application can use time-based windows (e.g., sliding windows) to compute aggregates and compare them in real time, allowing for the immediate detection of sudden drops with sub-second latency, which is significantly faster than any polling-based approach.
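
The windowed comparison can be sketched in plain Python. This is not Flink code: it uses a count-based window over an in-memory list as a stand-in for a time-based sliding window over a stream, and the function name, window size, threshold ratio, and sample values are all assumptions chosen for illustration.

```python
from collections import deque


def detect_drops(samples, window_size, drop_ratio):
    """Yield (index, value, baseline) when value < drop_ratio * mean of the preceding window."""
    window = deque(maxlen=window_size)
    for index, value in enumerate(samples):
        if len(window) == window_size:
            baseline = sum(window) / window_size
            if value < baseline * drop_ratio:
                yield index, value, baseline
        # Append after the check so a sample is never compared against itself.
        window.append(value)


# Made-up usage readings with a sudden dip at positions 6 and 7.
usage = [100, 102, 98, 101, 99, 100, 40, 38, 97]
drops = list(detect_drops(usage, window_size=5, drop_ratio=0.5))
drop_indexes = [index for index, _, _ in drops]
print(drop_indexes)  # [6, 7]
```

Because each sample is evaluated as it arrives, detection latency is bounded by processing time rather than by a polling interval, which is the property the explanation attributes to the streaming design.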

Question 12

A company wants to combine data from multiple software as a service (SaaS) applications for
analysis.
A data engineering team needs to use Amazon QuickSight to perform the analysis and build
dashboards. A data engineer needs to extract the data from the SaaS applications and make the data
available for QuickSight queries.
Which solution will meet these requirements in the MOST operationally efficient way?

Accepted Answer

C

Explanation: The most operationally efficient solution is to use Amazon AppFlow, a fully managed integration service designed specifically for transferring data between SaaS applications and AWS services. AppFlow provides pre-built connectors that handle authentication, API interactions, and data transfer, eliminating the need for custom code development and maintenance required by Lambda-based solutions. By scheduling flows in AppFlow to populate an Amazon S3 bucket, the data engineering team can automate the extraction process with minimal operational overhead. AWS Glue can then catalog this data for easy querying by Amazon QuickSight.

Question 13

A data engineer must orchestrate a data pipeline that consists of one AWS Lambda function and one
AWS Glue job. The solution must integrate with AWS services.
Which solution will meet these requirements with the LEAST management overhead?

Accepted Answer

A

Explanation: AWS Step Functions is a serverless orchestration service that allows developers to coordinate multiple AWS services into flexible workflows. It provides direct service integrations for both AWS Lambda (lambda:invoke) and AWS Glue (glue:startJobRun), so a state machine can call each service from a Task state. Because the service is serverless, there is no underlying infrastructure such as servers or clusters to provision, patch, or manage, which directly fulfills the requirement for the "LEAST management overhead." The visual workflow also simplifies development, monitoring, and debugging of the multi-step pipeline.
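
A two-step state machine of this kind might look like the following Amazon States Language definition, built here as a Python dictionary. The function name, job name, state names, and the Lambda-then-Glue ordering are assumptions, since the question does not specify them; the .sync suffix on the Glue resource makes the state wait for the job run to finish.

```python
import json


def build_pipeline_definition(function_name, glue_job_name):
    """Return an Amazon States Language definition: invoke a Lambda function, then run a Glue job."""
    return {
        "Comment": "Run a Lambda function, then a Glue job",
        "StartAt": "InvokeLambda",
        "States": {
            "InvokeLambda": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": function_name},
                "Next": "RunGlueJob",
            },
            "RunGlueJob": {
                "Type": "Task",
                # ".sync" waits for the Glue job run to complete before the state ends.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "End": True,
            },
        },
    }


# Hypothetical resource names used only for illustration.
definition = build_pipeline_definition("example-prepare-function", "example-etl-job")
definition_json = json.dumps(definition, indent=2)
print(definition_json)
```

The resulting JSON string is the form a state machine definition takes when it is created through the console, the API, or infrastructure-as-code templates.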

Question 14

A company uses a variety of AWS and third-party data stores. The company wants to consolidate all
the data into a central data warehouse to perform analytics. Users need fast response times for
analytics queries.
The company uses Amazon QuickSight in direct query mode to visualize the data. Users normally run
queries during a few hours each day with unpredictable spikes.
Which solution will meet these requirements with the LEAST operational overhead?

Accepted Answer

A

Explanation: The solution requires a central data warehouse with fast query performance for analytics, low operational overhead, and the ability to handle unpredictable, spiky workloads. Amazon Redshift Serverless is the ideal choice as it is a purpose-built data warehouse designed for high-performance analytics. Its serverless architecture automatically provisions and scales compute resources to match workload demands, including unpredictable spikes. This auto-scaling capability, combined with not having to manage clusters, directly addresses the requirement for the least operational overhead while ensuring fast query responses for Amazon QuickSight in direct query mode.

Question 15

A retail company is using an Amazon Redshift cluster to support real-time inventory management.
The company has deployed an ML model on a real-time endpoint in Amazon SageMaker.
The company wants to make real-time inventory recommendations. The company also wants to
make predictions about future inventory needs.
Which solutions will meet these requirements? (Select TWO.)

Accepted Answer

A, B

Explanation: Amazon Redshift ML enables users to make predictions using SQL commands directly within their data warehouse. This addresses both of the company's requirements. For real-time recommendations, users can write a SQL query that invokes the existing remote SageMaker endpoint. Redshift ML handles the communication, allowing real-time predictions on data as it resides in Redshift (Option B). For predicting future inventory needs, Redshift ML can be used to create, train, and deploy a new forecasting model using data in Redshift. This model can then be used to generate batch predictions or recommendations on a schedule, fulfilling the second requirement (Option A).
