Friday, April 17, 2015

HAI and aliens: The Drake equation in epidemiology

In 1961, Frank Drake introduced the following equation for the number N of civilizations in our galaxy with which radio communication might be possible:
N = R x Fp x Ne x Fl x Fi x Fc x L
As described by the SETI Institute,
  • R is the average rate of star formation in the galaxy, 
  • Fp is the fraction of those stars that have planets, 
  • Ne is the average number of planets that can potentially support life per star that has planets, 
  • Fl is the fraction of planets that could support life that actually develop life at some point, 
  • Fi is the fraction of planets with life that actually go on to develop intelligent life (civilizations), 
  • Fc is the fraction of civilizations that develop a technology that releases detectable signs of their existence into space, and 
  • L is the length of time for which such civilizations release detectable signals into space. 
Drake's purpose in writing this equation was to facilitate discussion at a meeting. Its importance lies not in the numerical prediction of communicative civilizations in the galaxy (note there are 7 factors in the equation, and errors in each term will combine to make any calculation wildly uncertain) but rather in the framing of issues related to the search for alien life. That said, the equation tells a story. Assuming that these are the relevant factors, then if any of the terms is zero, N is zero and we are likely to be alone. If none of them is zero, then even if they are exceedingly small, there is a chance that there is life somewhere in the galaxy. Moreover, it's unlikely that any of these terms is zero, given the huge size of the galaxy. In epidemiological terms, then, the equation helps to frame our thinking about the potential prevalence of life in the Milky Way galaxy.
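To see why seven multiplied factors make the result so uncertain, here is a minimal sketch. It assumes, purely for illustration, that each factor is known only to within some multiplicative error; the function name and the error sizes are hypothetical and do not come from Drake or the SETI Institute.

```python
def worst_case_spread(per_factor_error, n_factors):
    """Worst-case multiplicative error of a product of n_factors terms,
    if each term may be off by a factor of per_factor_error (assumed)."""
    return per_factor_error ** n_factors

# Illustrative assumption: each of the 7 Drake factors is uncertain by 10x.
spread = worst_case_spread(10, 7)
print(f"N could be off by a factor of up to {spread:,}")
```

Even a modest factor-of-2 uncertainty per term compounds to a factor of 128 in the product.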

Given that NASA opined recently that we're likely to have strong indications of life beyond Earth within a decade, it made me wonder about Drake-like equations in medicine and epidemiology. As a toy example, suppose that we write the number of patients contracting hospital-acquired infections (HAIs) yearly in the US as the product of several factors, say
N = Nhospital visits x Pcontact x Pdevelop disease x Pdisease reported
  • Nhospital visits is the number of patients visiting hospitals annually,
  • Pcontact is the probability that a patient comes into contact with infectious material (e.g., via environmental contamination or an infectious patient or healthcare worker),
  • Pdevelop disease is the probability of developing disease if infected, and 
  • Pdisease reported is the probability that an infection is recognized and reported.
According to the CDC, there are 35.1M hospital discharges annually in the US, so Nhospital visits ~ 35M. Now suppose that Pcontact and Pdevelop disease are both low, say 1%, and that we have excellent surveillance so that Pdisease reported ~ 1. If that were true, we would expect to see 3,500 HAIs per year. We should be so lucky. A more realistic guess might be a 10% chance of coming into contact with infectious material, Pcontact ~ 0.1, and a higher probability of developing disease if infected, say 50%, so that Pdevelop disease ~ 0.5. In that case we get N = 1.75M HAIs annually, which is close to the CDC estimate of 1.7M.
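The arithmetic is easy to check with a few lines of code. This is only a sketch of the toy model: the function and variable names are mine, and the probabilities are the same guesses used in the text.

```python
def expected_hai(n_hospital_visits, p_contact, p_develop_disease, p_disease_reported):
    """Toy estimate of yearly reported HAIs: the product of the four factors."""
    return n_hospital_visits * p_contact * p_develop_disease * p_disease_reported

# ~35M hospital discharges per year, rounded from the CDC figure of 35.1M.
N_HOSPITAL_VISITS = 35_000_000

# Optimistic guess: 1% contact, 1% disease given infection, perfect reporting.
optimistic = expected_hai(N_HOSPITAL_VISITS, 0.01, 0.01, 1.0)

# More realistic guess: 10% contact, 50% disease given infection.
realistic = expected_hai(N_HOSPITAL_VISITS, 0.1, 0.5, 1.0)

print(f"optimistic: {optimistic:,.0f} HAIs per year")
print(f"realistic:  {realistic:,.0f} HAIs per year")
```

The two results round to 3,500 and 1.75M, matching the numbers above.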

How could this be decreased? The number of hospital visits, Nhospital visits, is unlikely to decrease drastically, so that's not really a control variable. Perhaps we could develop interventions to decrease Pcontact and Pdevelop disease. Obviously there is already tremendous focus on reducing Pcontact through handwashing, alcohol-based hand rubs, contact precautions, better environmental cleaning, etc. If Pcontact could be reduced by a factor of 10, from 0.1 to 0.01 (seemingly a tall order), N would drop to 175K. That may not be possible, but suppose we could achieve a factor of 2 improvement so that Pcontact ~ 0.05. If we could combine that with a similar decrease in Pdevelop disease, from 0.5 to 0.25, by, say, better use of antimicrobials, then N would fall from 1.75M to about 438K. Thus, combination strategies could have great impact.
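The intervention scenarios can be compared side by side in a short script. Again, this is a sketch under the toy model's assumptions (35M visits, perfect reporting); the scenario labels and names are hypothetical, and the "2x lower contact" row on its own is an extra I added for comparison.

```python
N_HOSPITAL_VISITS = 35_000_000  # ~35M discharges per year (rounded)
P_DISEASE_REPORTED = 1.0        # assumes perfect surveillance

# Each scenario is (p_contact, p_develop_disease); values are guesses.
scenarios = {
    "baseline": (0.1, 0.5),
    "10x lower contact": (0.01, 0.5),
    "2x lower contact": (0.05, 0.5),
    "2x lower contact and disease": (0.05, 0.25),
}

def expected_hai(p_contact, p_develop_disease):
    """Toy estimate of yearly reported HAIs for one scenario."""
    return N_HOSPITAL_VISITS * p_contact * p_develop_disease * P_DISEASE_REPORTED

results = {name: expected_hai(*probs) for name, probs in scenarios.items()}

for name, n_hai in results.items():
    print(f"{name:30s} {n_hai:>12,.0f}")
```

Because the factors multiply, two factor-of-2 improvements cut N by a factor of 4, which is why the combined strategy lands near 438K without needing a heroic improvement in any single term.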

This is simply a back-of-the-envelope calculation: the equation above is but an approximation and the estimates are completely arbitrary. Moreover, parameters will vary from facility to facility and even between patient populations (imagine how Pdevelop disease is likely to vary between transplant and general surgery patients). That said, this toy model illustrates a simple point: breaking a problem down into smaller pieces can be helpful in thinking about it.

While this way of thinking is not alien (pun intended) to biostatistics and epidemiology, and clearly has limitations, I think it's helpful for framing issues in one's mind. In addition to clearly laying out assumptions in whatever is being contemplated (in this case, HAI), toy model approaches can suggest what may be needed in order to get a better answer.

(image source: Wikipedia)

Saturday, April 4, 2015

The information tsunami: Riding versus drowning

A few things have come across my Twitter feed in recent weeks that relate to cognition and the Internet. The first is an article by Thomas Erren et al on 10 elements of lifelong learning according to Richard Hamming. I'm a big fan of Hamming's ideas and research philosophy, and the authors do a nice job of updating some important points from his book. I recommend reading the paper; to whet your appetite, the first two rules they describe are
Rule 1. Cultivate Lifelong Learning as a “Style of Thinking” That Concentrates on Fundamental Principles Rather Than on Facts
Rule 2. Structure Your Learning to Ride the Information Tsunami Rather Than Drown in It
These strike a chord when thought about in the context of the recent study by Barr et al, which suggested that smartphones, as entrée to vast stores of information, can supplant critical thinking by making it easy for people to offload thinking to technology. Hamming, Erren et al, and Barr et al collectively remind us of the dangers of merely looking for facts on the Internet as opposed to concentrating on forming a coherent body of knowledge out of those facts.

Eric Topol captures this perfectly in a recent tweet:
The future of medicine is not about looking things up on the Internet; it's being able to generate one's own real world data+super-analytics
I couldn't agree more. Data and "super-analytics" need to be aimed at generating and conveying a coherent picture of health so that consumers of those analytics -- whether researchers, physicians, or non-specialists -- are informed and educated.

As the sea of facts gets larger and more accessible, becoming a tsunami, we face the risk of drowning, as Erren et al suggest, or becoming cognitively lazy, as Barr et al suggest. How do humans synthesize knowledge from data, especially when the data may be messy, variable, or uncertain? In the case of public health, the issue is hugely important, because people will synthesize their own knowledge based on facts they find compelling (e.g., the University of Google, "My science is named Evan"). How do we leverage the information superhighway to produce insight and decisions grounded in the relevant facts?

(image source: Wikipedia)