Cloud Computing

Developing a Bot Army

This may be a little different territory for this blog since it’s not directly related to data analytics. But, it still inhabits the same universe and I thought it was interesting enough to share. Over the last few months I had an opportunity to work on a project that took a unique approach to data acquisition. The data that we were interested in came from various sources. However, it was not always available from feeds or dumps or even from APIs. A good portion of the data was only accessible by visiting a website, submitting a search form and then drilling down through the results. Since we were going after a lot of data, this would have to be automated in some fashion. Basically what we needed to set up was a scalable screen scraping operation. Luckily, there is a fair amount of technology available to accomplish just this. The…

Analytics

It’s Accurate, But Is It Useful?

In my last post, I dug a little bit into the difference between accuracy and predictive value, and how those get confused when using “accuracy” in conversation. When people ask “how accurate is it?” they aren’t usually asking in the data scientist sense. They aren’t even asking the question I answered last time, “When the light turns red, what are the odds it’s a real problem?” They’re asking: “How useful is this thing in directing my attention where it needs to be?” Today I’ll give a tour of a a good tool for answering that and explain how we use it to tune results to clients’ specific needs. In short: we’ll bridge the gap between technical accuracy and usefulness. The ROC Curve Let’s revisit that student risk model. The usefulness of the model depends on how you can respond to it. We need a way to discuss model performance that…