Lies, damned lies, and huge Excel reports.

I was in a meeting the other day trying to figure out a particularly thorny issue. We had a group of smart, experienced, opinionated people in the room -- my kind of meeting.

About halfway through the meeting, one of the participants produced a huge, multi-page Excel print-out. Many of the pages had colored graphs, three or four to a page. A second set of pages contained an assortment of tables correlating variables against one another. An accompanying narrative outlined that there was a "significant" relationship between some of the variables because the correlation coefficient was over .5.

"As you can clearly see from this graph..."

We all kind of sifted through the print-outs and gave them the old college try while the presenter tried to narrate. After a few minutes, it was clear that none of us, including the presenter, understood what the graphs meant. There was no description of the units or explanation of how they were derived. And so what happened was everyone started to use the graphs to explain their own point of view: "What I think they mean is..."

It was humorous, really, and luckily we all noticed it and started laughing and threw the spreadsheets aside. At the same time, the experience was a good reminder of how easy it is to manipulate -- and be manipulated by -- numbers. 

A few tactical takeaways:

  • Label graphs clearly for your audience, not for yourself. Provide notes if the graphs aren't clear -- but if the graphs aren't clear, rethink whether to use them at all.
  • Be careful of comparing correlations of a huge number of interconnected variables. In many systems (datasets), the variables are correlated with each other -- that might be why you are studying them in the first place. So comparing a correlation of .5 to a correlation of .6 and calling the latter "better" is more than a tad sloppy and isn't the whole story by a long shot. Variables interact. To study the interaction of a number of variables, look to regression analysis instead.
  • "Significance" means something very specific in statistics. It is not the same as "strongly correlated." When two things are correlated, it means they vary in relation to one another. That's it. A correlation does not answer any questions about the causes of the relationship. When a relationship is "significant," it means that a relationship this strong would be very unlikely to show up by chance alone if the variables were actually unrelated. Two variables could have a low correlation with high significance, or a high correlation with low significance. Think of it this way: You might have a great night on the town with someone you hardly know; in the same way you might have a really lousy day with your closest friend. How well the date went is not the same as how close your relationship is. (IMPORTANT NOTE: Significance will increase with the sample size, so with large datasets you can find that all the relationships are significant mathematically even if they have no practical significance at all!)
  • And most of all, this hopefully (?) goes without saying, but don't base decisions on a report that no one can understand!
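
To make the gap between "strongly correlated" and "significant" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the analysis from the meeting: the function name approx_p_value is made up for this example, the inputs are invented, and it uses the Fisher z-transformation with a normal approximation rather than an exact test, so treat the p-values as rough.

```python
import math
from statistics import NormalDist


def approx_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for a Pearson correlation r from n pairs.

    Assumption: uses the Fisher z-transformation with a normal
    approximation, which is rough for small n. Requires n > 3 and -1 < r < 1.
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    # Under "no relationship", atanh(r) * sqrt(n - 3) is roughly standard normal.
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# A weak correlation in a very large sample...
weak_but_large = approx_p_value(r=0.05, n=10_000)
# ...versus a fairly strong correlation in a tiny sample.
strong_but_small = approx_p_value(r=0.6, n=8)

print(f"r=0.05, n=10,000 -> p is about {weak_but_large:.2g}")
print(f"r=0.60, n=8      -> p is about {strong_but_small:.2g}")
```

With these made-up inputs, the weak correlation comes out far below the conventional 0.05 cutoff while the much stronger one does not. That is the "low correlation with high significance" case from the list above, and it is driven almost entirely by sample size.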

Two reviews of The Signal and the Noise.

Over the last month I've been asked repeatedly about Nate Silver's The Signal and the Noise – have I read it? What did I think? What did I think while I was reading it? I'm guessing that anyone who spends even a small amount of time working with and talking about data has been similarly swamped with questions.

The coolest new thing. 

I've not read the book yet – it's on the holiday list – but I did find two reviews of the book on Andrew Gelman's excellent Statistical Modeling, Causal Inference, and Social Science blog. The reviews are by statisticians, and offer differing opinions besides, and so the entire post makes for good reading for anyone interested in what seems to be one of the trendier cultural phenomena of 2012.


Cool - and sobering - visual on homicide data.

The always fantastic, highly recommended Data Blog at the Guardian just published this engrossing Tableau-based interactive visualization on worldwide homicide rates. Not the cheeriest of weekend topics, but worth a quick look. (I tried to embed the code here but Squarespace wasn't having it, and trouble-shooting javascript doesn't seem like a good Saturday morning activity!)

For example, there has been a lot of talk in Chicago this year about the alarming increase in homicides -- but Chicago doesn't even make this list! New York is the only U.S. city in the top 60 or so in the visualization. Makes one count one's blessings...

In any case, I thought it was worth sharing. One of the many things I like about Tableau is that shared visualizations like this come with the data embedded -- in other words, you can download the data yourself to inspect it. 

Click to interact.


May 9th Webinar: Deeper Segmentation Techniques for Fundraising!

If you looked at the exclamation point in the subject heading and said, “Huh? That doesn’t look exciting at all,” you can just stop reading now.

But for those of you who get excited by the idea of fundraising segmentation (I know you’re out there!), I wanted to let you know I’m hosting a free webinar next week to explore practical fundraising segmentation techniques. I’m going to try some new visualization techniques that may or may not work, so that in and of itself will provide some excitement above and beyond the subject matter!

This webinar is a follow-up to my presentation at the Nonprofit Technology Conference last month, but will be a complement to it — attendance at that presentation is not a prerequisite. So for those of you who did not attend, I promise you’ll still get something out of the presentation.

It takes place on May 9 at 1:00 Central, and you can register here. Hope to see you there!

Getting Started with Analytics: Some Reading

Since returning home from last week’s 2011 NTEN Nonprofit Technology Conference, I’ve been asked about a half-dozen times for reading suggestions for fundraisers looking to learn more about statistics, and in particular, segmentation. 

I have a couple of suggestions to get you started, but I want to say that the best way to start to learn segmentation is to export some data from your database — say, the results of your most recent initiative — open it in Excel, and just look at what you see. Sort the list by donation size. How many large gifts are there? How many small gifts? Do you notice clumping around certain numbers? Look at the addresses of the donors — are more from certain places than others? These are basic questions, but they are the first step towards viewing your donors as individuals rather than as one anonymous whole. I’ll write more on that in coming weeks, but the message is: Don’t be afraid to play with your data! You won’t break anything, I promise.
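
If you would rather poke at the data with a script than in Excel, the same first look can be sketched in a few lines of Python. Everything here is hypothetical: the gift amounts and cities are made up, and the $1,000 cutoff for a "large" gift is only an assumption you should swap for whatever makes sense for your organization.

```python
from collections import Counter

# Hypothetical export: (gift amount in dollars, donor city). Made-up data.
gifts = [
    (25, "Chicago"), (50, "Chicago"), (25, "Evanston"), (1000, "Chicago"),
    (100, "Oak Park"), (25, "Chicago"), (50, "Evanston"), (2500, "Chicago"),
    (100, "Chicago"), (25, "Oak Park"),
]

LARGE_GIFT = 1000  # assumed cutoff; pick one that fits your organization

# Sort the list by donation size, largest first.
by_size = sorted(gifts, key=lambda gift: gift[0], reverse=True)

# How many large gifts are there? How many small gifts?
large = sum(1 for amount, _ in gifts if amount >= LARGE_GIFT)
small = len(gifts) - large

# Do gifts clump around certain amounts?
amount_counts = Counter(amount for amount, _ in gifts)

# Are more gifts coming from certain places than others?
city_counts = Counter(city for _, city in gifts)

print("Largest gifts:", by_size[:3])
print("Large vs. small:", large, "vs.", small)
print("Most common amounts:", amount_counts.most_common(3))
print("Gifts by city:", city_counts.most_common())
```

None of this is more sophisticated than sorting and counting, which is the point: the first pass is about seeing what is there, not about modeling anything.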

Now, as for the recommendations, I always start with two books. The first is the reassuringly titled Statistics Without Tears by Derek Rowntree. You’ll like this book immediately just for its size: unlike most statistics texts, you can carry it with one hand. It looks at you non-threateningly, as a small puppy might. It is a classic book, first published years ago, and there’s something comforting about the type and the graphs. It reminds me of cookies and tea at Grandma’s. More important than the appearance, though, is the content. You may sweat a bit in places, but there will be no crying, and you’ll come out the other side knowing a bit more about the things you know you should know but don’t (what is a median, and why does it matter; what does the standard deviation measure, and why shouldn’t you be afraid of the word “deviation”).

The second is the much more recent but excellent Fundraising Analytics: Using Data to Guide Strategy by Joshua Birkholz. Unlike Rowntree’s book, this book was written after the secret consortium of business publishers decreed that all business books must contain a colon in their title. (Have you noticed this? The same rule applies to movie sequels.) But more importantly, this is a very recent and much-needed addition to the vast number of fundraising books on the market, most of which lack any real specificity when it comes to collecting data and understanding it, and a few of which are patently banal. Birkholz walks through a number of basic and more advanced analytics issues, including a treatment of RFM analysis and an introduction to regression. It won’t make you a statistics hero, but it will go a long way towards improving your knowledge, particularly if you read it with an eye not only towards specific techniques, but towards how he approaches data and analysis more generally. Highly recommended.

Neither book is a cheap ticket, but both are worth it, and should get you started. Happy reading!