Free Amazon MLS-C01 Actual Exam Questions - Question 13 Discussion
A Data Scientist is building a linear regression model and will use resulting p-values to evaluate the
statistical significance of each coefficient. Upon inspection of the dataset, the Data Scientist discovers
that most of the features are normally distributed. The plot of one feature in the dataset is shown in
the graphic.
What transformation should the Data Scientist apply to satisfy the statistical assumptions of the
linear regression model?
B, imo. A log transform tames right skew better than a polynomial or sinusoidal one here.
B. Log transformation tends to reduce right skew and helps meet linear regression assumptions better than polynomial or sinusoidal here. Exponential would just stretch the skew more.
It’s B; a log transformation usually fixes right skew better than the other options here.
A vs B? The exponential transformation would actually increase skewness if the data is already right-skewed, so that seems counterproductive. Since most features are normally distributed except this one, applying a log transform (B) is a common way to fix right-skewness and get closer to normality. Polynomial transformations usually help with non-linear relationships but don’t necessarily fix distribution shape. Sinusoidal doesn’t really fit here either. So B makes more sense to meet the normality assumption for linear regression residuals.
A vs B? The plot looks heavily right-skewed, so exponential (A) would make things worse by stretching that tail more. Log transformation (B) usually helps pull in the long right tail and make the distribution more symmetric, which fits with normality assumptions for linear regression. Polynomial (C) and sinusoidal (D) don’t really address skewness directly, so they seem less relevant here. I’d go with B for making the feature distribution more normal-like before modeling.
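For anyone who wants to check the skew argument numerically, here is a minimal sketch using only the Python standard library. The data is synthetic: a log-normal sample stands in for the right-skewed feature (an assumption, since the actual plot is not reproduced in this thread), and the `skewness` helper is written here purely for illustration.

```python
import math
import random


def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5 (about 0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed feature: log-normal draws with a fixed seed.
rng = random.Random(42)
feature = [rng.lognormvariate(0.0, 0.5) for _ in range(5000)]

skew_raw = skewness(feature)                         # clearly positive
skew_log = skewness([math.log(v) for v in feature])  # close to zero
skew_exp = skewness([math.exp(v) for v in feature])  # larger than the raw skew

print(f"raw: {skew_raw:.2f}  log: {skew_log:.2f}  exp: {skew_exp:.2f}")
```

On data like this the log brings skewness close to zero while the exponential stretches the right tail further, which matches the reasoning for B over A.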
B, because a log transform reduces skewness better than a polynomial one here.
Option B makes the most sense here. If the feature is skewed (which is what it sounds like from the question), a log transformation often helps to normalize it and stabilize variance, which better meets linear regression assumptions. Polynomial transformations mainly add complexity and capture non-linearity but don’t fix skewness or normality of residuals. The exponential and sinusoidal options don’t really fit typical approaches for this problem.
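A rough sketch of what this can look like in a fit, under a strong assumption that is not given in the question: the target is taken to be linear in the log of the skewed feature, with Gaussian noise. The data, the coefficients (2.0 and 3.0), and the `ols` and `r_squared` helpers are all made up for illustration.

```python
import math
import random


def ols(xs, ys):
    """Simple one-feature least squares; returns slope, intercept, residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return slope, intercept, residuals


def r_squared(ys, residuals):
    """Share of the variance in ys explained by the fit."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum(r * r for r in residuals)
    return 1.0 - ss_res / ss_tot


# Synthetic data (assumption): y depends linearly on log(x), x is right-skewed.
rng = random.Random(0)
x = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
y = [2.0 + 3.0 * math.log(v) + rng.gauss(0.0, 1.0) for v in x]

slope_raw, _, res_raw = ols(x, y)
slope_log, _, res_log = ols([math.log(v) for v in x], y)

r2_raw = r_squared(y, res_raw)
r2_log = r_squared(y, res_log)
print(f"R^2 on raw feature: {r2_raw:.2f}, on log feature: {r2_log:.2f}")
```

With this setup the fit on the log-transformed feature recovers the slope and explains much more of the variance than the fit on the raw feature. That only holds because the relationship was built to be linear in log(x); with real data you would check the residuals rather than assume it.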
B/C? I’m wondering if just applying a polynomial transformation is enough or if the log transformation is better to handle skewness in the feature distribution. Also, does the question specify if the feature is skewed left or right? That might affect which transformation fits better. The plot detail would help clarify this.