Question 1

A company is using ML to predict the presence of a specific weed in a farmer's field. The company is
using the Amazon SageMaker linear learner built-in algorithm with a value of multiclass_classifier for
the predictor_type hyperparameter.
What should the company do to MINIMIZE false positives?

Accepted Answer

C

Explanation: The primary goal is to minimize false positives, which are instances incorrectly predicted as positive (e.g., predicting a weed is present when it is not). This is equivalent to maximizing the precision metric, defined as True Positives / (True Positives + False Positives). To increase the precision score, the model must reduce the number of false positives. The Amazon SageMaker Linear Learner algorithm provides hyperparameters that tune the model's classification threshold for a specific metric. One of them is target_precision: when binary_classifier_model_selection_criteria is set to recall_at_target_precision, the algorithm holds precision at the target value while maximizing recall. Increasing the value of this precision-oriented hyperparameter therefore steers training toward a model that makes fewer false positive errors.
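The relationship between the decision threshold, false positives, and precision can be illustrated with a minimal sketch. The labels and scores below are made-up values, and the helper names are hypothetical; this is not the Linear Learner implementation, only the arithmetic behind the metric.

```python
def confusion_counts(labels, scores, threshold):
    """Count true positives and false positives at a given decision threshold."""
    tp = fp = 0
    for label, score in zip(labels, scores):
        if score >= threshold:  # the model predicts "weed present"
            if label == 1:
                tp += 1
            else:
                fp += 1
    return tp, fp


def precision(tp, fp):
    """Precision = TP / (TP + FP); returns 0.0 when nothing is predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical ground truth (1 = weed present) and model scores.
labels = [1, 0, 1, 0, 0, 1, 0, 1]
scores = [0.92, 0.61, 0.80, 0.45, 0.72, 0.55, 0.30, 0.88]

for threshold in (0.5, 0.75):
    tp, fp = confusion_counts(labels, scores, threshold)
    print(f"threshold={threshold} TP={tp} FP={fp} precision={precision(tp, fp):.2f}")
```

On this toy data, raising the threshold from 0.5 to 0.75 removes both false positives at the cost of one true positive, which is the trade-off a precision target controls.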

Question 2

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3. The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data. The ML engineer needs to use an Amazon SageMaker built-in algorithm to train the model. Which algorithm should the ML engineer use to meet this requirement?

Accepted Answer

A

Explanation: The problem describes a supervised classification task (fraud detection) using tabular data with significant feature interdependencies and class imbalance. The LightGBM algorithm, a gradient boosting framework using tree-based models, is the most appropriate choice. Tree-based ensembles are highly effective at capturing complex, non-linear interactions between features, which is a key requirement ("many of the features have interdependencies"). The Amazon SageMaker implementation of LightGBM also provides specific hyperparameters, such as is_unbalance or scale_pos_weight, to directly address the class imbalance problem, making it superior to other options for this specific scenario.
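As a rough illustration of the class-imbalance hyperparameter, the sketch below derives a starting value for scale_pos_weight from the label counts (negatives divided by positives, a common heuristic). The label distribution and the hyperparameter dictionary are assumptions for illustration; in LightGBM, is_unbalance and scale_pos_weight are alternatives and are not meant to be set together.

```python
def scale_pos_weight(labels):
    """Ratio of negative to positive examples, a common starting value."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive examples in the training labels")
    return negatives / positives


# Hypothetical fraud dataset: 2% of transactions are fraudulent.
labels = [0] * 980 + [1] * 20

# Illustrative hyperparameters; only one imbalance option is set.
hyperparameters = {
    "objective": "binary",
    "scale_pos_weight": scale_pos_weight(labels),
}
print(hyperparameters)
```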

Question 3

An ML engineer needs to use an ML model to predict the price of apartments in a specific location. Which metric should the ML engineer use to evaluate the model's performance?

Accepted Answer

D

Explanation: The task of predicting apartment prices is a regression problem, as the target variable (price) is a continuous numerical value. Mean Absolute Error (MAE) is a standard metric used to evaluate the performance of regression models. MAE calculates the average of the absolute differences between the predicted values and the actual values. This provides a straightforward, interpretable measure of the model's average prediction error in the same units as the target variable (e.g., dollars), making it highly suitable for this scenario.
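The MAE calculation described above can be written out in a few lines. The prices are invented sample values, used only to show that the result is expressed in the same unit as the target.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical apartment prices in dollars.
actual = [250_000, 310_000, 180_000, 420_000]
predicted = [240_000, 325_000, 190_000, 400_000]

print(mean_absolute_error(actual, predicted))  # average error in dollars
```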

Question 4

A company uses Amazon SageMaker for its ML workloads. The company's ML engineer receives a 50
MB Apache Parquet data file to build a fraud detection model. The file includes several correlated
columns that are not required.
What should the ML engineer do to drop the unnecessary columns in the file with the LEAST effort?

Accepted Answer

D

Explanation: Amazon SageMaker Data Wrangler is specifically designed to simplify and accelerate data preparation for machine learning using a visual, low-code interface. For the task of dropping unnecessary columns from a 50 MB file, Data Wrangler offers the most direct and lowest-effort solution. An ML engineer can import the dataset into a Data Wrangler flow and apply the built-in "Drop column" transform through a point-and-click user interface without writing any code. This visual approach is significantly faster and requires less technical overhead than scripting a solution or configuring a separate processing cluster.

Question 5

A company has an application that uses different APIs to generate embeddings for input text. The
company needs to implement a solution to automatically rotate the API tokens every 3 months.
Which solution will meet this requirement?

Accepted Answer

A

Explanation: AWS Secrets Manager is the purpose-built AWS service for managing the lifecycle of secrets, including API tokens. It provides a native framework for automatic rotation. For secrets that AWS cannot rotate natively, such as third-party API tokens, Secrets Manager integrates directly with AWS Lambda. You can configure Secrets Manager to invoke a specific Lambda function on a schedule (e.g., every 90 days) to handle the custom logic required to connect to the third-party API, generate a new token, and update the secret value in Secrets Manager.
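A rotation Lambda function is invoked once for each rotation step, with the step name, the secret ID, and a client request token in the event. The sketch below shows only the dispatch skeleton and the rotation settings for a 90-day schedule. The step_handlers argument is a hypothetical hook added here so the example runs without AWS access; the third-party token logic is deliberately left out.

```python
ROTATION_STEPS = ("createSecret", "setSecret", "testSecret", "finishSecret")


def rotation_settings(secret_id, rotation_lambda_arn, days=90):
    """Keyword arguments for the Secrets Manager RotateSecret operation."""
    return {
        "SecretId": secret_id,
        "RotationLambdaARN": rotation_lambda_arn,
        "RotationRules": {"AutomaticallyAfterDays": days},
    }


def lambda_handler(event, context, step_handlers=None):
    """Dispatch a single rotation step to its handler."""
    step_handlers = step_handlers or {}
    step = event["Step"]
    if step not in ROTATION_STEPS:
        raise ValueError(f"unknown rotation step: {step}")
    handler = step_handlers.get(step)
    if handler is None:
        raise NotImplementedError(f"no handler registered for {step}")
    # Each handler receives the secret ID and the token that names the new version.
    return handler(event["SecretId"], event["ClientRequestToken"])
```

In a real function, the createSecret handler would call the third-party API for a new token and store it as the pending version, and finishSecret would promote that version to current.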

Question 6

An ML engineer needs to use data with Amazon SageMaker Canvas to train an ML model. The data is
stored in Amazon S3 and is complex in structure. The ML engineer must use a file format that
minimizes processing time for the data.
Which file format will meet these requirements?

Accepted Answer

D

Explanation: Apache Parquet is a columnar storage file format optimized for performance in large-scale data processing. For complex data structures, Parquet's columnar layout significantly minimizes processing time by allowing Amazon SageMaker to read only the specific columns needed for analysis, rather than scanning entire rows. This drastically reduces I/O operations and accelerates data retrieval. Its support for efficient, splittable compression further enhances performance in distributed environments like those used by AWS services.

Question 7

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a central model registry, model deployment, and model monitoring. The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3. The company needs to run an on-demand workflow to monitor bias drift for models that are deployed to real-time endpoints from the application. Which action will meet this requirement?

Accepted Answer

A

Explanation: The requirement is to create an on-demand workflow to monitor bias drift for a deployed model. Amazon SageMaker Clarify is the specific service designed to detect bias in models and data, both before training and after deployment (bias drift). A SageMaker Clarify job can be run as a processing job. To create an "on-demand workflow" that can be triggered by a web application, an AWS Lambda function is the ideal serverless compute service. The Lambda function can be invoked via an API call from the application, and it will, in turn, initiate the SageMaker Clarify processing job to perform the bias drift analysis. This architecture directly and precisely meets all stated requirements.

Question 8

An advertising company uses AWS Lake Formation to manage a data lake. The data lake contains
structured data and unstructured data. The company's ML engineers are assigned to specific
advertisement campaigns.
The ML engineers must interact with the data through Amazon Athena and by browsing the data
directly in an Amazon S3 bucket. The ML engineers must have access to only the resources that are
specific to their assigned advertisement campaigns.
Which solution will meet these requirements in the MOST operationally efficient way?

Accepted Answer

C

Explanation: The most operationally efficient solution is to use AWS Lake Formation's native capabilities for fine-grained access control. Lake Formation Tag-Based Access Control (LF-TBAC) is designed for this exact scenario. By assigning LF-Tags to Data Catalog resources (which represent the data in Amazon S3) based on campaign, and then granting permissions on those tags to the corresponding ML engineers' roles, the company can manage access centrally. This single policy framework is enforced for users accessing data through both Amazon Athena and directly in S3 (via Lake Formation's credential vending), meeting all requirements in a unified and scalable manner.
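The idea behind tag-based access control can be shown with a toy model. This is a simplified illustration, not the Lake Formation API: the resource names, principals, and the campaign tag key are all hypothetical. A grant is a tag expression, and a resource matches when every key in the expression is present on the resource with one of the allowed values.

```python
def tags_match(resource_tags, tag_expression):
    """True if every key in the expression is on the resource with an allowed value."""
    return all(
        resource_tags.get(key) in allowed_values
        for key, allowed_values in tag_expression.items()
    )


# Hypothetical catalog resources and the tags attached to them.
resources = {
    "spring_sale.clicks": {"campaign": "spring_sale"},
    "spring_sale.creatives": {"campaign": "spring_sale"},
    "winter_promo.clicks": {"campaign": "winter_promo"},
}

# Hypothetical grants: one tag expression per principal.
grants = {
    "ml-engineer-alice": {"campaign": ["spring_sale"]},
    "ml-engineer-bob": {"campaign": ["winter_promo"]},
}


def accessible_resources(principal):
    """List the resources whose tags satisfy the principal's grant."""
    expression = grants.get(principal)
    if not expression:  # no grant means no access
        return []
    return sorted(name for name, tags in resources.items() if tags_match(tags, expression))


print(accessible_resources("ml-engineer-alice"))
```

New tables tagged with an existing campaign value become visible to the right engineers without any new grants, which is where the operational efficiency comes from.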

Question 9

HOTSPOT
An ML engineer is working on an ML model to predict the prices of similarly sized homes. The model will base predictions on several features. The ML engineer will use the following feature engineering techniques to estimate the prices of the homes:
• Feature splitting
• Logarithmic transformation
• One-hot encoding
• Standardized distribution
Select the correct feature engineering techniques for the following list of features. Each feature engineering technique should be selected one time or not at all. (Select three.)

Accepted Answer

CITY (NAME): ONE-HOT ENCODING
TYPE_YEAR (TYPE OF HOME AND YEAR THE HOME WAS BUILT): FEATURE SPLITTING
SIZE OF THE BUILDING (SQUARE FEET OR SQUARE METERS): LOGARITHMIC TRANSFORMATION

Explanation: The selection of each technique aligns with standard machine learning data preprocessing practices. City (name) is a nominal categorical feature. One-hot encoding is the appropriate method to convert these non-ordinal categories into a numerical format that an ML model can process, creating a separate binary feature for each city. Type_year is a composite feature containing two distinct pieces of information: the home type and its construction year. Feature splitting is necessary to separate these into two independent, more useful features (type_of_home and year_built). Size of the building is a continuous numerical feature. In real estate, features like size and price are often right-skewed. A logarithmic transformation is used to handle this skewness, making the distribution more symmetric and helping to meet the assumptions of certain models like linear regression.
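The three selected techniques can be sketched on a single record. The city list, the sample home, and the "type_year" layout (home type and year joined by an underscore) are assumptions made for illustration; the question does not specify the actual encoding.

```python
import math


def one_hot(value, categories):
    """One binary indicator per known category."""
    return [1 if value == category else 0 for category in categories]


def split_type_year(type_year, sep="_"):
    """Split a composite 'type_year' value on its last separator."""
    home_type, year = type_year.rsplit(sep, 1)
    return home_type, int(year)


def log_transform(size):
    """log(1 + x) compresses the long right tail of a skewed feature."""
    return math.log1p(size)


# Hypothetical categories and record.
cities = ["Austin", "Boston", "Denver"]
home = {"city": "Boston", "type_year": "condo_1998", "size_sqft": 1450}

type_of_home, year_built = split_type_year(home["type_year"])
features = {
    "city_one_hot": one_hot(home["city"], cities),
    "type_of_home": type_of_home,
    "year_built": year_built,
    "log_size": log_transform(home["size_sqft"]),
}
print(features)
```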

Question 10

HOTSPOT
An ML engineer is building a generative AI application on Amazon Bedrock by using large language
models (LLMs).
Select the correct generative AI term from the following list for each description. Each term should
be selected one time or not at all. (Select three.)
• Embedding
• Retrieval Augmented Generation (RAG)
• Temperature
• Token

Accepted Answer

1. EMBEDDING
2. RETRIEVAL AUGMENTED GENERATION (RAG)
3. TEMPERATURE

Explanation:
• Embedding – converts words or passages into high-dimensional vectors that capture semantic meaning, letting the model compare related concepts numerically.
• Retrieval Augmented Generation (RAG) – a pattern that first retrieves relevant external documents and then supplies them as additional context to the LLM, enriching its generated answer with up-to-date or domain-specific knowledge.
• Temperature – a decoding parameter that scales the logits before the softmax is applied during generation; higher values flatten the distribution and increase randomness/creativity, while lower values sharpen it and make output more deterministic.
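The effect of temperature can be seen in a small softmax sketch. The logits are arbitrary example values, not output from any Bedrock model.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum to avoid overflow in exp()
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical next-token logits for three candidate tokens.
logits = [2.0, 1.0, 0.1]

for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token, and higher temperatures spread it more evenly across the candidates.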

Question 11

HOTSPOT
An ML engineer needs to use Amazon SageMaker Feature Store to create and manage features to train a model.
Select and order the steps from the following list to create and use the features in Feature Store. Each step should be selected one time. (Select and order three.)
• Access the store to build datasets for training.
• Create a feature group.
• Ingest the records.

Accepted Answer

1.  CREATE A FEATURE GROUP.
2.  INGEST THE RECORDS.
3.  ACCESS THE STORE TO BUILD DATASETS FOR TRAINING.

Explanation: The workflow for using Amazon SageMaker Feature Store follows a logical sequence. First, a feature group must be created; this acts as a container and defines the schema for the features. Second, data records are populated into this feature group through an ingestion process, typically using the PutRecord API. Finally, once the feature group contains data, it can be accessed to retrieve features and assemble a dataset for model training or to serve features for real-time inference. This sequence ensures the data structure exists before data is added, and data is present before it is used.
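For the ingestion step, the PutRecord API takes a record as a list of feature name and string value pairs. The sketch below only builds that request payload so it runs without AWS access; the feature group name and feature names are hypothetical, and the feature group is assumed to already exist with a record identifier and an event time feature.

```python
def to_feature_store_record(row):
    """Convert a plain dict into the list-of-pairs shape that PutRecord expects."""
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in row.items()
    ]


# Hypothetical row with a record identifier and an event time feature.
row = {
    "customer_id": "c-1001",
    "event_time": "2024-01-15T08:30:00Z",
    "avg_basket_value": 42.5,
}

put_record_request = {
    "FeatureGroupName": "customers-demo",  # hypothetical feature group
    "Record": to_feature_store_record(row),
}
print(put_record_request)
```

The resulting dictionary could be passed as keyword arguments to the put_record call of a Feature Store runtime client.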

Question 12

HOTSPOT
A company stores historical data in .csv files in Amazon S3. Only some of the rows and columns in the .csv files are populated. The columns are not labeled. An ML engineer needs to prepare and store the data so that the company can use the data to train ML models.
Select and order the correct steps from the following list to perform this task. Each step should be selected one time or not at all. (Select and order three.)
• Create an Amazon SageMaker batch transform job for data cleaning and feature engineering.
• Store the resulting data back in Amazon S3.
• Use Amazon Athena to infer the schemas and available columns.
• Use AWS Glue crawlers to infer the schemas and available columns.
• Use AWS Glue DataBrew for data cleaning and feature engineering.

Accepted Answer

1.  USE AWS GLUE CRAWLERS TO INFER THE SCHEMAS AND AVAILABLE COLUMNS.
2.  USE AWS GLUE DATABREW FOR DATA CLEANING AND FEATURE ENGINEERING.
3.  STORE THE RESULTING DATA BACK IN AMAZON S3.

Explanation: The most logical and efficient workflow begins with understanding the data's structure. AWS Glue crawlers are designed specifically to scan data stores like Amazon S3, automatically infer schemas from unstructured or semi-structured data (like the described .csv files), and populate the AWS Glue Data Catalog. Once the schema is defined, AWS Glue DataBrew provides a visual interface to clean, normalize, and perform feature engineering on the dataset. It is the ideal tool for handling sparse, unlabeled data without writing extensive code. Finally, the prepared, clean dataset must be stored. Amazon S3 is the standard and most integrated storage service for machine learning workflows on AWS, making the processed data readily available for model training with services like Amazon SageMaker.

Question 13

HOTSPOT
A company wants to host an ML model on Amazon SageMaker. An ML engineer is configuring a continuous integration and continuous delivery (CI/CD) pipeline in AWS CodePipeline to deploy the model. The pipeline must run automatically when new training data for the model is uploaded to an Amazon S3 bucket.
Select and order the pipeline's correct steps from the following list. Each step should be selected one time or not at all. (Select and order three.)
• An S3 event notification invokes the pipeline when new data is uploaded.
• S3 Lifecycle rule invokes the pipeline when new data is uploaded.
• SageMaker retrains the model by using the data in the S3 bucket.
• The pipeline deploys the model to a SageMaker endpoint.
• The pipeline deploys the model to SageMaker Model Registry.

Accepted Answer

1.  AN S3 EVENT NOTIFICATION INVOKES THE PIPELINE WHEN NEW DATA IS UPLOADED.
2.  SAGEMAKER RETRAINS THE MODEL BY USING THE DATA IN THE S3 BUCKET.
3.  THE PIPELINE DEPLOYS THE MODEL TO A SAGEMAKER ENDPOINT.

Explanation: This sequence represents a standard MLOps continuous training and deployment (CI/CD) workflow. The process is initiated by a trigger when new data arrives. An Amazon S3 event notification is the correct mechanism to detect the new data upload and invoke a downstream process like an AWS CodePipeline, typically via Amazon EventBridge. Once triggered, the pipeline's first logical action is to use the new data to retrain the model with a SageMaker training job. After successful retraining and evaluation (an implicit step), the final action is to deploy the updated model artifact to a SageMaker endpoint to make it available for real-time inference, thus completing the automated deployment cycle.

Question 14

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a central model registry, model deployment, and model monitoring. The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3. The company needs to use the central model registry to manage different versions of models in the application. Which action will meet this requirement with the LEAST operational overhead?

Accepted Answer

C

Explanation: The Amazon SageMaker Model Registry is the purpose-built service for centrally cataloging, versioning, and managing machine learning models for deployment. The core organizational construct within the Model Registry is the "model package group" (or model group). A model group is used to collect and manage different versions of the same model. By registering each new model iteration as a new version within a specific model group, the company can track lineage, compare performance metrics, and manage the approval status for each version, fulfilling the requirement with the least operational overhead as it is a managed, integrated feature.

Question 15

A company regularly receives new training data from the vendor of an ML model. The vendor
delivers cleaned and prepared data to the company's Amazon S3 bucket every 3-4 days.
The company has an Amazon SageMaker pipeline to retrain the model. An ML engineer needs to
implement a solution to run the pipeline when new data is uploaded to the S3 bucket.
Which solution will meet these requirements with the LEAST operational effort?

Accepted Answer

C

Explanation: The most efficient solution with the least operational effort is to use Amazon EventBridge. Amazon S3 natively emits events for actions like object creation. Once EventBridge notifications are enabled on the bucket, an EventBridge rule can be configured to match the Object Created events (the s3:ObjectCreated:* family) from the designated S3 bucket. Amazon SageMaker Pipelines can be set as a direct target for an EventBridge rule. This creates a direct, serverless, event-driven integration that requires no custom code or intermediate services, minimizing both setup and ongoing operational management.
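An EventBridge rule of this kind is driven by an event pattern. The sketch below builds a pattern that matches S3 Object Created events for one bucket, optionally narrowed to a key prefix. The bucket name and prefix are hypothetical, and the code only constructs the JSON; creating the rule and attaching the pipeline as its target are not shown.

```python
import json


def s3_object_created_pattern(bucket_name, key_prefix=None):
    """Event pattern matching S3 'Object Created' events for a single bucket."""
    detail = {"bucket": {"name": [bucket_name]}}
    if key_prefix:
        # Optional filter so only uploads under one prefix start the pipeline.
        detail["object"] = {"key": [{"prefix": key_prefix}]}
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": detail,
    }


# Hypothetical bucket and prefix for the vendor's deliveries.
pattern = s3_object_created_pattern("example-training-data", "incoming/")
print(json.dumps(pattern, indent=2))
```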

Free AWS MLA-C01 Actual Exam Questions