jeffshuck.com

Back-of-the-envelope math, COVID-19, and you.

I’ve posted this simple analysis elsewhere but may as well post this here too. The intent of this post is to try to concisely parse out the actual data about COVID-19 and help us understand what is concerning and what isn't. I'm dramatically over-simplifying a number of factors to convey the basic concepts.

Because I'm getting asked, because I'm a data geek, because I'm a parent, and because I care about people, I've been looking at the raw data. The Washington Post and the New York Times, among other places, have regularly updated datasets at the links below. With a bit of orientation, you can look at the data and see what is concerning people.

Right now (evening of March 12), there are roughly 128K confirmed cases worldwide with 4,722 reported deaths. (Note that these numbers are likely already out of date because the data is updated regularly.) Wow, a 3.7% death rate? That sounds bad.

"But," you hear, "that is a misrepresentation, because the 'experts' think the death rate is really only 1% or less. Most people will get better." Yes, good. Okay, well -- for that to be true, it must mean there are many more actual cases than are reported (which makes sense, because many people will simply think they have a cold or the flu or can't be bothered to get treated or are stubborn like me). So to simplify things greatly, if the death rate is really 1%, then that means the actual cases are more like 472,200 -- 4,722 x 100.

Comparing that to the confirmed cases, 472K/128K is about 3.7, which means that for every 1 reported case, there are another 2.7 or so that go unreported. Okay then. Well, that sounds reasonable.
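For fellow data geeks, here is a minimal Python sketch of the arithmetic so far. The case and death counts are the ones quoted above, the 1% death rate is the assumption we are testing rather than a measured value, and the variable names are just mine.

```python
# Figures quoted in the post (evening of March 12).
confirmed_cases = 128_000
reported_deaths = 4_722

# Assumption under test: the true death rate is about 1%.
assumed_death_rate = 0.01

# Death rate you get by dividing reported deaths by confirmed cases.
naive_death_rate = reported_deaths / confirmed_cases

# If 1% of actual cases die, this many actual cases explain the deaths.
implied_actual_cases = reported_deaths / assumed_death_rate

# Actual cases per reported case, and the unreported share of that.
actual_per_reported = implied_actual_cases / confirmed_cases
unreported_per_reported = actual_per_reported - 1

print(f"Naive death rate: {naive_death_rate:.1%}")
print(f"Implied actual cases: {implied_actual_cases:,.0f}")
print(f"Unreported cases per reported case: {unreported_per_reported:.1f}")
```

Change `assumed_death_rate` to 0.005 or 0.02 and the implied number of actual cases swings a lot, which is a good reminder of how rough this estimate is.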

So, now let's have a look at Washington state in the good ol' USA. 31 deaths. Using our 1%, that means we must have about 3,100 actual cases. If 1 in every 3.7 cases gets reported (call it 1/4), we'd expect roughly 775 reported cases or thereabouts. But Washington has only 442 reported cases. What gives?

Well, this is the problem with back-of-the-envelope math. It doesn't work because you also have to factor in incubation time, growth rate, average community size, and so forth. Many of those factors are currently unknown.

And that's what is creating the alarm -- for Washington to have 31 deaths from 442 cases, it either means the death rate is way higher than what anyone wants to believe, or more likely, it means that way, way more people have the virus than we currently know about. And many of those people are meeting other people and waiting in line to buy toilet paper and writing long posts like this on Facebook. 
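The Washington comparison can be sketched the same way. This is the same rough model, not an epidemiological one: the 1% death rate and the "about 1/4 of cases get reported" fraction are the assumptions from above, and the variable names are illustrative.

```python
# Washington state figures quoted in the post.
wa_deaths = 31
wa_reported_cases = 442

# Assumptions carried over from the worldwide estimate.
assumed_death_rate = 0.01   # assumed true death rate
reported_fraction = 1 / 4   # assumed share of actual cases that get reported

# Actual cases needed to explain the deaths at a 1% death rate.
wa_implied_actual = wa_deaths / assumed_death_rate

# Reported cases we would expect if Washington matched the worldwide ratio.
wa_expected_reported = wa_implied_actual * reported_fraction

# Explanation 1: take the reported cases at face value.
wa_naive_death_rate = wa_deaths / wa_reported_cases

# Explanation 2: keep the 1% death rate and ask how many
# actual cases stand behind each reported one.
wa_actual_per_reported = wa_implied_actual / wa_reported_cases

print(f"Implied actual cases: {wa_implied_actual:,.0f}")
print(f"Expected reported cases: {wa_expected_reported:,.0f}")
print(f"Death rate if reports were complete: {wa_naive_death_rate:.1%}")
print(f"Actual cases per reported case at 1%: {wa_actual_per_reported:.1f}")
```

Either the death rate is around 7%, or there are roughly seven actual cases behind every reported one in Washington instead of the 3.7 we estimated worldwide. Those are the two readings described above.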

That brings us to one more aspect of this that is pretty easy to explain. Even a group of several thousand cases in a state of millions of people dramatically increases the probability that many people will get the virus. That’s not alarmism — it’s statistics. Here’s a simple non-virus example. 

How many people would you say need to be in a room to make it 99% likely that two people share the same birthday? You’d probably answer 365 people, right? Or maybe 366 people? 

Actually, the answer is just 57 people. It is fairly easy to prove statistically that if 57 people are in the same room, it is 99% likely that two of them will have the same birthday, and with 75 people the chance is better than 99.9%. What’s more, if only 23 people are in the same room, there’s a 50% chance that two of them will share the same birthday!

How can this be? It’s true because while the number of people in the room is small, the number of possible PAIRS of people who might share a birthday is quite large: 23 people already form 253 different pairs. (If you’re still doubting me, search “birthday paradox” and you can quickly find the proof online.)
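If you would rather check the birthday numbers than take my word for it, here is a short Python sketch. It uses the standard simplifying assumptions of a 365-day year and birthdays spread evenly across it; the function names are my own.

```python
from math import comb


def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Chance that at least two of `people` share a birthday.

    Assumes every day of a `days`-day year is equally likely.
    """
    if people > days:
        return 1.0  # more people than days: a match is guaranteed
    p_all_distinct = 1.0
    for k in range(people):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def smallest_group_for(target: float, days: int = 365) -> int:
    """Smallest group size whose match probability reaches `target`."""
    people = 1
    while shared_birthday_probability(people, days) < target:
        people += 1
    return people


for n in (23, 57, 75):
    print(n, "people:", comb(n, 2), "pairs,",
          f"{shared_birthday_probability(n):.2%} chance of a match")

print("Group size for 50%:", smallest_group_for(0.50))
print("Group size for 99%:", smallest_group_for(0.99))
```

Under those assumptions, the 50% mark is first crossed at 23 people and the 99% mark at 57, while 75 people puts you above 99.9%. The pair counts printed alongside show why: the number of pairs grows much faster than the number of people.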

So, applying that back to COVID-19, it may not seem like 400 or even 4,000 cases is a lot. But the number of possible combinations of other people those cases may have infected is much larger. 

That's why there's a concern that the numbers could grow dramatically, as they did in Italy.

My point is to say, there is absolutely a risk of being alarmist. But also, there's a risk of doing nothing. From a purely data perspective, the current evidence suggests that the number of cases in the US is going to increase considerably. Let’s stay safe, hopeful, practical, and peaceful. Let's all wash hands. And maybe hunker down for a bit.

Most of all, I hope you look at the actual data yourself and draw your own conclusions. Find it at: https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6 and at: https://www.washingtonpost.com/world/2020/01/22/mapping-spread-new-coronavirus/?arc404=true and at: https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html.

——

ADDENDUM.

For those interested, I highly recommend a quick look at Figure 1 in the JAMA analysis. In a nutshell, it shows the relationship between confirmed cases and actual cases. It is another call for taking action, now. Consider this: China locked down an entire province of 60 million people after 600 confirmed cases. They were lucky to have done so, because as we discussed above, there were far more actual cases by that point. As of this morning, the US has approximately three times that many confirmed cases... we are not moving fast enough. Data and graphic here: https://jamanetwork.com/journals/jama/fullarticle/2762130