Free Databricks Certified Data Analyst Associate Actual Exam Questions - Question 5 Discussion
meaningfully different?
Option E makes the most sense here, since extreme outliers pull the mean much further than they pull the median, creating a noticeable gap between the two. A, B, C, and D don't fit: missing values are not what drives the mean and median apart, and for boolean or categorical variables the mean and median aren't really applicable in the first place. So outliers are the key factor here.
Makes sense that outliers push the mean away from the median, so E stands out. Without extreme values, the mean and median tend to stay close, so A can be dismissed. E
Not B. Missing values don't affect the relationship between mean and median, since they're usually just excluded or imputed before either statistic is computed. Extreme outliers (E) are the main reason these two stats diverge.
Good point on E; outliers almost always affect the mean more than the median. E
Yeah, E definitely feels right, since extreme outliers can really drag the mean away from the median. I'd also rule out A, because if there are no outliers the mean and median usually align closely. For C and D, the mean and median aren't really meaningful measures, because those data types aren't numeric in the usual sense. So E is the only case where you'd get a meaningful difference, because outliers skew the mean but barely move the median.
E/D? The extreme outliers in E would definitely pull the mean away from the median, making them differ a lot. D is interesting because with categorical data, the median isn’t really defined the same way as for numeric data, so conceptually they wouldn’t be comparable or meaningful. So while E shows a real numeric difference, D highlights a case where median might not even be applicable, which could count as a meaningful difference in interpretation. The other options are less likely to cause a meaningful difference between mean and median.
A/E? A has no outliers so mean and median should be close, but E with extreme outliers definitely makes mean and median differ more. So E still seems the best for meaningful difference.
I’d go with E. Extreme outliers can skew the mean a lot, but the median stays more stable, so they’d differ meaningfully there. The others don’t really affect mean vs median like that.
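A quick way to check the reasoning for E is to compute both statistics on a small sample with and without one extreme value. This is only a sketch: the numbers below are made up for illustration and are not from the exam question, and `gap` is a hypothetical helper name.

```python
from statistics import mean, median

# Hypothetical sample values, made up for illustration only.
no_outliers = [10, 12, 11, 13, 12, 11, 12]
# Same values plus a single extreme outlier.
with_outlier = no_outliers + [1000]


def gap(values):
    """Return the absolute difference between the mean and the median."""
    return abs(mean(values) - median(values))


print("no outliers:  mean", mean(no_outliers), "median", median(no_outliers))
print("with outlier: mean", mean(with_outlier), "median", median(with_outlier))
print("gap without outlier:", gap(no_outliers))
print("gap with outlier:   ", gap(with_outlier))
```

In this sample the median stays at 12 in both cases, while the single extreme value moves the mean from under 12 to over 135, which is the kind of meaningful difference E describes.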