Databinding with d3

Visualizing data is a big part of BlueCanary, and d3.js plays a big role in building those visualizations. D3 provides a powerful toolset with tons of features and built-in capabilities. Chief among those are:

  • DOM Manipulation
  • Data Fetching, e.g. Ajax
  • Data parsing, e.g. CSV, JSON, XML
  • Data Manipulation (Grouping, transforming, filtering, etc.)
  • And of course, SVG

There is definitely some overlap with jQuery’s features, but d3 takes a slightly different approach.

For example, jQuery more or less assumes you are starting with a DOM and wish to do something interesting with it. D3, on the other hand, assumes that you have data and wish to iterate over it and display it in some interesting manner.

To illustrate what I mean, first start with an array of values, e.g.:
var data = [1, 2, 3, 4];

To display these values with jQuery you could write something like this:

data.forEach(function(d) {
  $('body').append('<div>' + d + '</div>');
});

Nothing fancy. The resulting HTML looks like this:

<div>1</div>
<div>2</div>
<div>3</div>
<div>4</div>

In contrast, to display these values with d3 you could write something like this:

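d3.select('body')
  .selectAll('div')   // select the divs (none exist yet)
  .data(data)         // bind the array to that selection
  .enter()            // one placeholder per value that has no div
  .append('div')      // create the missing divs
  .text(function(d,i) { return d; });  // d is the value, i its index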

The output is exactly the same as the jQuery snippet. But the first thing to notice is that there is no obvious or familiar looping construct. Since d3 assumes you are starting with an array of data and obviously need to loop over it, it does away with some of the ceremony of setting up a loop. It introduces its own syntax. So, in d3 the sequence

.selectAll()
.data()
.enter();

is its primary looping mechanism. There is quite a bit going on in those three lines but they form the basis for all of d3’s data binding. Another key element is the callback you see in this line:

.text(function(d,i) { return d; });

The callback – function(d,i) – mirrors the callback in the forEach method where you get access to each data element as the dataset is being iterated over. The callback is passed to methods that you use to set properties and attributes of container elements and shapes. This means that whatever property you are setting you have access to the current data element itself. That’s essentially how d3 achieves data binding.
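To make the two ideas above a little more concrete, here is a minimal sketch in plain JavaScript. It is a toy model, not d3's actual implementation: plain objects stand in for DOM elements, and the names enterData, appendNodes and setText are hypothetical helpers invented for this illustration. It assumes the default behavior where data values are matched to existing elements by index, so any value without an element at its index needs a new one.

```javascript
// Toy model only: plain objects stand in for DOM elements; none of this is d3's code.

// Models .selectAll().data().enter() with index-based matching:
// every data value that has no existing element at its index is "entering".
function enterData(existingElements, data) {
  const entering = [];
  data.forEach(function (d, i) {
    if (i >= existingElements.length) {
      entering.push({ datum: d, index: i });
    }
  });
  return entering;
}

// Models .append(tag): create one node per entering value and keep the value on it.
function appendNodes(tagName, entering) {
  return entering.map(function (e) {
    return { tag: tagName, datum: e.datum, index: e.index, text: '' };
  });
}

// Models a setter such as .text(): accepts a constant or a callback of (d, i).
function setText(nodes, value) {
  nodes.forEach(function (node) {
    const result = typeof value === 'function' ? value(node.datum, node.index) : value;
    node.text = String(result);
  });
  return nodes; // returning the nodes lets further calls be chained
}

// The page starts with no divs, so all four values are entering.
const data = [1, 2, 3, 4];
const divs = setText(
  appendNodes('div', enterData([], data)),
  function (d, i) { return d; }
);

// Render the toy nodes as markup to compare with the HTML shown earlier.
const html = divs
  .map(function (n) { return '<' + n.tag + '>' + n.text + '</' + n.tag + '>'; })
  .join('\n');
```

In this sketch the "loop" lives inside the helpers, which is roughly why the d3 version needs no explicit forEach: the caller only supplies the data and a callback.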

To look at a more typical d3 example:

var data = [1, 2, 3, 4];
d3.select('body')
  .append('svg')
  .selectAll('rect')
  .data(data)
  .enter()
  .append('rect')
  .attr('height', function (d) { return d * 10; })
  .attr('width', function (d) { return d * 20; })
  .attr('x', function (d, i) { return d * 100; });

The SVG output is this:

<svg>
  <rect height="10" width="20" x="100"></rect>
  <rect height="20" width="40" x="200"></rect>
  <rect height="30" width="60" x="300"></rect>
  <rect height="40" width="80" x="400"></rect>
</svg>

And looks like this:

[Figure: boxes]

Again, nothing fancy. The rectangles, though, are easily configured according to the data elements in the array. The above pattern is pervasive throughout d3 visualizations and illustrates the rudiments of d3’s approach to data binding. In a nutshell:

1) Looping over data with minimal ceremony
2) Providing access to data elements within the loop via callback

The example above also shows another prevalent pattern in d3 visualizations – method chaining. It may take a little getting used to since it behaves slightly differently than method chaining in jQuery. But all together you can create elegant code that can do a lot of interesting things with data.
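One difference that paragraph may have in mind: in jQuery, append() returns the original set of elements, whereas in d3, append() returns a new selection containing the appended elements, while setters such as attr() return the same selection they were called on. The sketch below is a toy model of that behavior, not d3 itself; ToySelection is a hypothetical class written only for this illustration.

```javascript
// Toy model, not d3 itself: just enough to show what each chained call returns.
class ToySelection {
  constructor(name) {
    this.name = name;
    this.attrs = {};
  }

  // Like d3's append: returns a NEW selection for the appended element,
  // so the calls that follow operate on the child, not the parent.
  append(childName) {
    return new ToySelection(childName);
  }

  // Like d3's attr when setting a value: returns the SAME selection.
  attr(key, value) {
    this.attrs[key] = value;
    return this;
  }
}

const body = new ToySelection('body');

// Each append() moves the chain one level down; each attr() stays put.
const result = body
  .append('svg')
  .attr('width', 300)   // set on the svg
  .append('rect')
  .attr('height', 10);  // set on the rect

// result now refers to the rect, not to body or the svg.
```

This is why the position of a call in a d3 chain matters: an attr() placed after append('rect') configures the rectangle, not the svg that contains it.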

Is Religion Negatively Correlated With Intelligence?

A Pitfall of Measuring Correlation

I saw this article in The Australian titled “So, Who Are the Smartest Scientists?”, reporting on a paper from the Interdisciplinary Journal of Research on Religion.

Naturally, my first thought was “Data scientists are the smartest!” But then my second thought was “wait, how would you even measure that?” The article just says “IQ,” but then it goes on to say that scientists in physical sciences are less religious than ones in social sciences. One of the paper’s authors is quoted as explaining “This is predicted by their high IQ.”

This raised more questions:

Does the paper claim that smarter people are less religious?

Well, here’s the first sentence of the conclusion:

“There is sound evidence of a negative correlation between intelligence and religiosity and between intelligence and political extremism.”

So, yeah. It pretty clearly makes that claim.

What about method? Does the paper have reason to claim that?

I’m not going to analyze that here. The researchers were most likely working in good faith and their data probably support their conclusion for their sample. I recognize that this is the most important question and I’m “yada yada”ing it, but that’s not what I want to talk about.

Are there other explanations?

This is the interesting bit. I’m going to depart from the paper at this point. The data below are made up to illustrate a pitfall of data analysis. I’m going to ask you to make some assumptions that may not be true. Just bear with me.

Let’s assume that religiosity and intelligence are independent and that they are in no way correlated. Let’s further assume that both religiosity and intelligence are useful or are correlated with useful traits in studying sciences. Finally, let’s assume that the “elite institutions” in the study have done an effective job of selecting people with an abundance of useful traits.

To illustrate the point, I’ve invented some data using a Box-Muller transform. This is not the authors’ data; I made it up. There is no correlation. Here’s the whole population of 100 pseudo-humans:

[Figure: Random Scatter Plot]
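For anyone who wants to generate a similar made-up population, here is a minimal sketch of a Box-Muller transform in JavaScript. It is not the code used for the plots in this post. The function names, the seed value and the small seeded generator (mulberry32, used only so the output is repeatable) are my own assumptions; Math.random would serve just as well.

```javascript
// Small seeded generator (mulberry32) so runs are repeatable.
// ASSUMPTION: any uniform source on [0, 1) would do, including Math.random.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Box-Muller: turns two uniform draws into two independent standard normal draws.
function boxMuller(rand) {
  const u1 = 1 - rand(); // shifted to (0, 1] so Math.log never receives 0
  const u2 = rand();
  const radius = Math.sqrt(-2 * Math.log(u1));
  const angle = 2 * Math.PI * u2;
  return [radius * Math.cos(angle), radius * Math.sin(angle)];
}

// Build n pseudo-humans whose two traits are independent by construction.
function makePopulation(n, seed) {
  const rand = mulberry32(seed);
  const people = [];
  for (let i = 0; i < n; i++) {
    const pair = boxMuller(rand);
    people.push({ intelligence: pair[0], religiosity: pair[1] });
  }
  return people;
}

// 100 pseudo-humans, matching the population size used in this post.
const population = makePopulation(100, 42);
```

Because the two values in each pair come out of the transform independent of each other, using one for each trait gives a population with no built-in relationship between the two.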

But remember, our assumption is that intelligence and religiosity are useful (or tend to occur alongside useful traits), and these are elite schools. They only hire the best of the best, so here are the 11 pseudo-humans who end up working at elite schools. They’re in red.

[Figure: Elite Selections from the random scatter]

What happens when we look at them? Here’s a plot of just the red dots with a linear fit. We find a negative correlation between intelligence and religiosity!

[Figure: Just the Elites... What the what?]

These data were randomly generated from normal distributions on both axes, and they were uncorrelated by design. The apparent negative correlation was created by the sampling plan. What you’re seeing is the relative rarity of people who are outliers on both distributions compared with outliers on just one. If you measure any two traits that each make participation in your sample group more likely, you’ll tend to see a negative correlation between them within that group, even when none exists in the wider population.
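The effect is easy to reproduce. The sketch below is a rough simulation, not the analysis behind the plots above, and it rests on assumptions of my own: "elite" is modeled as the top 11% by combined score (intelligence plus religiosity), and the population is much larger than 100 so the estimates are stable. The function names are hypothetical.

```javascript
// Two independent standard normal values via a Box-Muller transform.
function normalPair(rand) {
  const u1 = 1 - rand(); // in (0, 1], avoids Math.log(0)
  const u2 = rand();
  const radius = Math.sqrt(-2 * Math.log(u1));
  return [radius * Math.cos(2 * Math.PI * u2), radius * Math.sin(2 * Math.PI * u2)];
}

// Pearson correlation coefficient of two equal-length arrays.
function pearson(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce(function (s, v) { return s + v; }, 0) / n;
  const meanY = ys.reduce(function (s, v) { return s + v; }, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Generate n people with independent traits, then keep only the top fraction.
function simulateSelection(n, keepFraction, rand) {
  rand = rand || Math.random;
  const people = [];
  for (let i = 0; i < n; i++) {
    const pair = normalPair(rand);
    people.push({ intelligence: pair[0], religiosity: pair[1] });
  }
  // ASSUMPTION: the institutions select on the sum of the two traits.
  const ranked = people.slice().sort(function (a, b) {
    return (b.intelligence + b.religiosity) - (a.intelligence + a.religiosity);
  });
  const elite = ranked.slice(0, Math.round(n * keepFraction));
  const corr = function (group) {
    return pearson(
      group.map(function (p) { return p.intelligence; }),
      group.map(function (p) { return p.religiosity; })
    );
  };
  return { everyone: corr(people), elite: corr(elite), eliteCount: elite.length };
}

const outcome = simulateSelection(20000, 0.11);
console.log('correlation, everyone:', outcome.everyone.toFixed(3));
console.log('correlation, elite only:', outcome.elite.toFixed(3));
```

Under these assumptions the correlation across everyone should hover near zero, while the correlation among the selected group should come out clearly negative. Other selection rules will change the size of the effect, so treat the numbers as illustrative only.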

So… are religion and intelligence negatively correlated?

I wouldn’t say that based on this paper. I’m willing to go along with the conclusion that their data really do show a negative correlation for this small sample, but even there I would first look to alternative explanations, of which I’ve given only one.

What’s the Lesson Here?

I’ll give you a couple:

1) Be very suspicious of simple relationships between data, especially when reported in non-science press.

2) Consult with experts before you decide to up-end your business because you see an unexpected (but crystal clear!) correlation between two factors. It could be nothing more than sampling bias.