Free Amazon MLS-C01 Actual Exam Questions - Question 12 Discussion
A bank's Machine Learning team is developing an approach for credit card fraud detection. The
company has a large dataset of historical data labeled as fraudulent. The goal is to build a model to
take the information from new transactions and predict whether each transaction is fraudulent or
not.
Which built-in Amazon SageMaker machine learning algorithm should be used for modeling this
problem?
A. Seq2seq
B. XGBoost
C. K-means
D. Random Cut Forest (RCF)
B/D? I get why D’s tempting since RCF is good at anomaly detection, but the question specifically mentions a large labeled dataset, which suggests supervised learning. That usually points to XGBoost (B), since it’s great for classification with labeled data. If the data were unlabeled, RCF would make more sense. Also, K-means (C) is for clustering without labels, so that’s out. Seq2seq (A) is for sequence prediction, not really fraud detection. So I’d go with B based on the supervised nature of the task.
D imo. Random Cut Forest is actually designed for anomaly detection, which is pretty relevant for spotting unusual transactions like fraud. Unlike XGBoost (B), which is a supervised classifier, RCF works unsupervised and can detect outliers without needing labels, balanced or otherwise. So if the fraud cases are rare or the data is skewed, RCF might catch the subtle weird patterns better than a standard classifier. That said, B is still good if you trust the labels and want a solid predictive model, but RCF fits the anomaly angle more naturally here.
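On the skewed-data point: a supervised booster can also compensate for rare fraud cases by weighting the positive class. Below is a minimal sketch, assuming the labels are 0/1. The keys `objective`, `eval_metric` and `scale_pos_weight` are XGBoost hyperparameter names; the helper `fraud_hyperparameters` and the sample label counts are made up for illustration and are not from the question.

```python
from collections import Counter


def fraud_hyperparameters(labels):
    """Build XGBoost-style hyperparameters for a 0/1 fraud label column.

    Hypothetical helper for illustration; only the dictionary keys are
    real XGBoost hyperparameter names.
    """
    counts = Counter(labels)
    positives, negatives = counts[1], counts[0]
    if positives == 0:
        raise ValueError("no fraud examples, so a supervised classifier cannot be trained")
    return {
        "objective": "binary:logistic",  # binary classification: fraud vs. not fraud
        "eval_metric": "auc",            # plain accuracy is misleading on skewed classes
        # Common heuristic: weight positives by (negative count) / (positive count)
        "scale_pos_weight": negatives / positives,
    }


# Assumed toy data: 990 legitimate transactions and 10 fraudulent ones
labels = [0] * 990 + [1] * 10
params = fraud_hyperparameters(labels)
print(params["scale_pos_weight"])  # 99.0
```

So rare fraud labels on their own don't rule out B; the imbalance can be handled inside the supervised setup.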
Option B also stands out because it handles tabular data well and is widely used for fraud detection.
This isn’t about clustering or anomaly detection, so C and D are out since K-means is unsupervised and RCF is mainly for anomaly detection without labels. Seq2seq (A) is more for sequence prediction like translation, not classification. With labeled data, B (XGBoost) makes the most sense as it’s a strong supervised classifier and widely used for fraud detection.
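The elimination logic used in this thread can be written down as a small decision rule. This is only a sketch of the reasoning in the comments, not an official AWS selection guide; the function name `pick_algorithm` and the `goal` values are hypothetical.

```python
def pick_algorithm(has_labels: bool, goal: str) -> str:
    """Map a problem description to one of the four answer options.

    `goal` is one of "classify", "anomaly", "cluster" or "sequence"
    (labels assumed here for illustration).
    """
    if goal == "sequence":
        return "Seq2seq"            # A: sequence-to-sequence tasks such as translation
    if has_labels and goal == "classify":
        return "XGBoost"            # B: supervised classification on labeled data
    if not has_labels and goal == "cluster":
        return "K-means"            # C: unsupervised clustering
    if not has_labels and goal == "anomaly":
        return "Random Cut Forest"  # D: unsupervised anomaly detection
    raise ValueError(f"no option matches has_labels={has_labels}, goal={goal!r}")


# The question: a labeled history, predict fraudulent vs. not for each new transaction
print(pick_algorithm(has_labels=True, goal="classify"))  # XGBoost
```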
B, because labeled data calls for a supervised method, and XGBoost is solid for classification.
B, since the data is labeled and XGBoost is great for classification tasks.
B/C? Just wondering whether they mention if the data is labeled or not, since K-means is unsupervised and XGBoost needs labels. The question says labeled, but I’m not sure if that affects the best choice.