Analytics

Simpson, Eh?

Continuing my series on errors in analysis, I’m going to dig into a tricky issue of correlation analysis. To picture the problem, let’s say I work at a university. It’s called Mock U. Like a lot of universities, Mock U is trying to foster a diverse workforce. In particular, we want to ensure that our faculty positions are filled by more women than men to close an existing gender gap. Let’s look at the jobs offered to applicants by gender in two colleges:

               Men                      Women
               Hired  Applied  %        Hired  Applied  %
  Engineering  2      25       8%       3      30       10%
  Business     5      6        83%      4      4        100%

Are we meeting our objectives? In Engineering, 25 men applied and 2 were hired (8%). Compare that to the 3 women hired from a pool of 30 applicants (10%). The engineering college favors women in hiring. In business, 5 men were hired from among…
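For readers who want to poke at the numbers themselves, here is a minimal Python sketch of the table above (the data structure and variable names are my own, not from the post). It computes the per-college hire rates and then the pooled rates across both colleges, which is where the paradox the title hints at shows up:

    # Hiring data from the table above as (hired, applied) pairs.
    # Structure and names are illustrative, not from the original post.
    data = {
        "Engineering": {"men": (2, 25), "women": (3, 30)},
        "Business":    {"men": (5, 6),  "women": (4, 4)},
    }

    # Per-college hire rates: women lead in both colleges.
    for college, groups in data.items():
        for gender, (hired, applied) in groups.items():
            print(f"{college:11s} {gender:5s}: {hired}/{applied} = {hired/applied:.0%}")

    # Pooled across colleges, the comparison can flip (Simpson's paradox).
    for gender in ("men", "women"):
        hired = sum(groups[gender][0] for groups in data.values())
        applied = sum(groups[gender][1] for groups in data.values())
        print(f"Overall {gender:5s}: {hired}/{applied} = {hired/applied:.0%}")

Running it shows women ahead within each college, yet men ahead overall (7/31 ≈ 23% vs 7/34 ≈ 21%), because the two colleges differ so sharply in applicant volume and hire rate.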

Analytics

It’s Accurate, But Is It Useful?

In my last post, I dug a little bit into the difference between accuracy and predictive value, and how those get confused when using “accuracy” in conversation. When people ask “how accurate is it?” they aren’t usually asking in the data scientist sense. They aren’t even asking the question I answered last time, “When the light turns red, what are the odds it’s a real problem?” They’re asking: “How useful is this thing in directing my attention where it needs to be?” Today I’ll give a tour of a good tool for answering that and explain how we use it to tune results to clients’ specific needs. In short: we’ll bridge the gap between technical accuracy and usefulness.

The ROC Curve

Let’s revisit that student risk model. The usefulness of the model depends on how you can respond to it. We need a way to discuss model performance that…
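The excerpt cuts off here, but as a rough sketch of what an ROC curve captures: given each student’s true outcome and the model’s risk score, you sweep a decision threshold and record the true-positive and false-positive rates at each setting. A minimal, self-contained Python sketch (the labels and scores below are invented for illustration, not from the actual student risk model):

    # ROC sketch: labels/scores are invented for illustration only.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]   # 1 = student actually at risk
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]

    def roc_points(y_true, scores):
        """Sweep the alert threshold over each score and return
        (false-positive rate, true-positive rate) pairs."""
        pos = sum(y_true)
        neg = len(y_true) - pos
        points = []
        for thresh in sorted(set(scores), reverse=True):
            tp = sum(1 for y, s in zip(y_true, scores) if s >= thresh and y == 1)
            fp = sum(1 for y, s in zip(y_true, scores) if s >= thresh and y == 0)
            points.append((fp / neg, tp / pos))
        return points

    for fpr, tpr in roc_points(y_true, scores):
        print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")

Each row is one possible alert threshold; on this reading, tuning results to a client’s needs amounts to picking the point on the curve that matches how many flagged students they can actually act on.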