Testing Without Theory

From time to time I run across a finding in the medical literature along the lines of “Coffee Causes Bladder Cancer.” Or was it “Coffee Prevents Bladder Cancer”? Oops, maybe it wasn’t coffee at all. Maybe it was broccoli. Or cashew nuts.

I rarely report these results at my blog, with the possible exception of vitamin studies, and I regret even reporting on those.

Many of these studies have the same basic problem: They involve testing without theory. Give me one group of people who drink coffee, another group that abstains and, say, several hundred health and demographic variables and I can almost guarantee you that coffee drinking (or not drinking) will correlate with something. It will probably correlate with 4 or 5 things.

The literature on spurious correlation has a number of entertaining examples of this. In one study (described here), the prices of a select list of NYSE stocks rose 87 percent of the time when the temperature reading fell at a weather station on Adak Island, Alaska. The authors note that with 3,315 stocks, chance alone ensured that some were sure to be correlated with temperature measurements.

As for statistical significance, remember what a 5% significance level means. It means that even if no real relationship exists, random chance alone will produce a result this strong about 5% of the time. If you have thousands of researchers mining thousands of data sets, they are almost guaranteed to find many spurious relationships and, unfortunately, they will get them published in peer reviewed journals as scholarly papers. The results will then appear in daily newspapers (what editor can resist a finding that coffee causes or prevents any malady?), and the public will be sorely misled.
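The arithmetic is easy to see in a minimal simulation. The sketch below uses pure Python and entirely invented data: a “coffee” variable that is pure noise is tested against 500 equally random “health” variables at the 5% level, and the spurious “links” are counted.

```python
import math
import random

random.seed(42)
n_subjects = 200
n_variables = 500

# Invented data: "coffee consumption" is pure noise, unrelated to anything.
coffee = [random.gauss(0, 1) for _ in range(n_subjects)]

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

# For large samples, |r| > 1.96/sqrt(n) is "significant" at the 5% level.
threshold = 1.96 / math.sqrt(n_subjects)

false_positives = 0
for _ in range(n_variables):
    health_var = [random.gauss(0, 1) for _ in range(n_subjects)]  # also noise
    if abs(pearson_r(coffee, health_var)) > threshold:
        false_positives += 1

# Roughly 5% of the tests (about 25 of 500) come out "significant,"
# even though every relationship is spurious by construction.
print(false_positives)
```

Multiply that by thousands of researchers, each mining thousands of variables, and the spurious “findings” pile up accordingly.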

The New Math 

What brings all this to mind is a Wall Street Journal article about two studies published in 2010 that examined whether the oral bisphosphonates commonly prescribed for osteoporosis increase the risk of esophageal and gastric cancer. They came to opposite conclusions despite using the same database.

Which conclusion was correct? Who can say? As a researcher from the National Institute of Statistical Sciences put it, “There is enough wrong with both papers that we can’t be sure.”

The Journal article is focused on the difference between “randomly controlled clinical trials” and “observational studies,” which analyze previously gathered data. The author, Gautam Naik, regards the former as the “gold standard” for testing and apparently considers the latter technique suspect. Since observational studies are easier to do and less expensive, there are more of them. In fact, over the past decade there were 263,557 such studies reported in 11,600 peer reviewed journals, worldwide. Naik explains the problem as follows:

[O]bservational studies in general can be replicated only 20% of the time, versus 80% for large, well-designed randomly controlled trials, says Dr. Ioannidis. Dr. Young, meanwhile, pegs the replication rate for observational data at an even lower 5% to 10%.

Whatever the figure, it suggests that a lot of unreliable studies are getting published. Those papers can often trigger pointless follow-on research and affect real-world practices.

But hold on. Randomly controlled trials can be replicated only 80% of the time? So doctors who rely on the “gold standard” in treating their patients will be wrong one out of every five times?

In reality, they might be wrong more often than that. Groups from pharmaceutical companies and biotech venture capital firms have reported difficulty reproducing “foundational” research from academic labs. All of these groups have an interest in assessing the quality of academic reports before they invest millions of dollars in trying to translate seemingly promising research into something physicians can use to benefit patients. According to Bruce Booth of Atlas Venture, “the unspoken rule is that at least 50% of the studies published even in top tier academic journals…can’t be repeated with the same conclusions by an industrial lab. In particular, key animal models often don’t reproduce.”

Support for Booth’s assertion comes from C. Glenn Begley of Amgen and Lee Ellis, an M.D. Anderson Cancer Center researcher. They published a March 2012 paper in Nature calling for higher standards in preclinical cancer research. Of 53 papers chosen as “landmark” studies, only 6 had results that were reproducible by Amgen researchers. The authors note that some of the irreproducible papers had spawned entire fields of literature, with hundreds of papers expanding on elements of the original observation. Worse, some even triggered a “series of clinical trials.” In 2011, researchers at Bayer published broadly similar findings.

Since my background is economics, let me say for the record that almost none of these studies would ever be accepted for publication in an economics journal. The reason? They almost all involve testing without theory. Economics journals usually don’t publish results showing random “links” between variables, even when the relationship is statistically significant. Instead, authors are usually required to have a defensible theory about why a relationship might be expected to exist, and to derive testable implications from that theory. If the theory survives one test, it will generally be subjected to more tests. If it fails several empirical tests, it will generally be discarded.

Here, for example, is a simple theory of cancer (which may be right or wrong). Cancer susceptibility begins with genes. If you have a parent or grandparent who experienced a certain type of cancer, you are more likely to get the same cancer. But maybe your risk of a specific cancer is also heightened if you have a family history of some other type of cancer. Environment and your behavior with respect to that environment also matters. More education and more income enhance your ability to avoid cancer risks. So the more educated you are and the higher your income, the lower your risk.

There. That’s a theory with some plausibility. I believe it’s probably consistent with a lot of evidence. Now let’s take up the question of coffee drinking. It’s not enough to find a difference in cancer incidence between the drinkers and the nondrinkers. My theory requires me to also adjust for family history, education, income, etc.

Anyone who has ever done regression analysis knows that adding or dropping a variable can cause the coefficient on some other variable to change signs (to go from positive to negative, for example) or to go from “significant” to “not significant,” or vice versa.
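A toy example makes the point. The data below are entirely invented, and “smoking” stands in for any confounder: smoking drives risk, and coffee merely travels with smoking, so a naive regression finds a sizable coffee “effect” that essentially vanishes once the confounder is added.

```python
import random

random.seed(7)
n = 5000

def ols_slope(x, y):
    """Simple OLS slope of y on x."""
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

# Invented data: smoking raises risk; coffee has NO true effect,
# but coffee drinking is correlated with smoking.
smoking = [random.gauss(0, 1) for _ in range(n)]
coffee = [0.7 * s + random.gauss(0, 1) for s in smoking]
risk = [1.0 * s + random.gauss(0, 1) for s in smoking]

# Naive regression of risk on coffee alone: a sizable "effect" appears.
naive = ols_slope(coffee, risk)

# Adjusted coefficient (Frisch-Waugh): partial smoking out of coffee,
# then regress risk on the residual. The "effect" essentially vanishes.
coffee_resid = [c - ols_slope(smoking, coffee) * s
                for c, s in zip(coffee, smoking)]
adjusted = ols_slope(coffee_resid, risk)

# naive is clearly positive; adjusted is near zero
print(round(naive, 2), round(adjusted, 2))
```

Which of those two coefficients gets reported depends entirely on whether the researcher thought to adjust for smoking, and a theory is what tells the researcher which adjustments matter.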

Even when you are testing with a plausible theory you can find spurious correlations. But testing without theory almost guarantees it.

Comments (13)


  1. Vicki says:

    Love the video.

  2. brian says:

    It does seem like there have been more observational studies in recent years.

  3. Devon Herrick says:

    A couple years ago I asked a drug researcher about the statistical techniques she used to assess the effectiveness of her target compounds. She said that if the relationship is so weak that you have to use statistics to measure the effect, then the relationship isn’t strong enough to be an effective drug.

  4. Chris says:

    I’m always surprised and annoyed at how few supposed scientists seem to actually understand science. You see this a lot in climate science too.

    I think a lot of it comes from natural human bias. If your job is to do research, and you get a grant or funding to conduct some research, most people would recognize that there is a bias to come up with a result that might please the source of your funding if they have an interest in the research. POM Wonderful funding research into the health benefits of their fruit drinks for instance. Likewise, people might recognize a potential for bias for your own personal views, such as a global warming alarmist conducting research to justify his own existing conclusions.

    There is another form of bias though, a bias of importance. No one wants to say “I spent the last 2 years of my life doing a study and discovered nothing of relevance or importance.” So there is intense internal pressure to make some sort of important conclusion for your research, you need to justify the expense of your employment in some way to ensure future job security. This is as insidious as other forms of bias, but not always as recognized. Almost every scientist will have a natural bias to showcase their field, research, or expertise, as either more important or more relevant than it may actually be.

  5. Charlie Bond says:

    Thank you for a useful reminder that we are surrounded by randomness, and that correlation is either the product of astute observation or the figment of someone’s imagination.

    Sadly, as a country, we have been subjected to considerable randomness in health care policy. Indeed, I give a keynote entitled The Hap Hassard History of Health Care in America. (Hap Hassard was my first boss who was the lawyer who created Blue Shield and helped beat Earl Warren’s attempt at a single payer system in California when Warren was governor (a one-vote margin). Hap later went to Washington to help AMA defeat national health insurance when Truman pushed the idea after WWII. He and a guy from Whitaker & Baxter are thought to have created the slogan “Socialized Medicine.”)

    As an economist, John, I trust you would agree that the pricing of health care can certainly be said to be nothing but random. And therein lies the bulk of our problem with health care economics.

    So thank you, John, for the reminder that correlation is not necessarily corroboration and that if you are trying to sew things together it is not wise to rely on what seems, but what seams.

    Charlie Bond

  6. Don McCanne says:

    This reminds me of when I was a medical student at UCSF half a century ago, and two of our most distinguished professors debated whether or not smoking caused lung cancer. The professor arguing against a causal relationship pointed out that a then recent journal article noted a correlation between cigar smoking and rectal carcinoma. He commented that “that’s one helluva way to smoke a cigar!”

    That said, I do have a problem with the concept that studies published in economics journals somehow meet a higher standard than those published in medical journals. It is usually not the articles published in the medical journals that are deficient, but rather the fault is more often with the unwarranted conclusions that readers extrapolate from the articles’ findings.

    In contrast, articles published in economics journals may be testing theory, but the theories very often reflect ideological biases. Testing theories of consumer-directed health care may result in “desired” outcomes, but what is desired by an economist who wishes to reduce health care utilization regardless of the value of the health care delivered may be quite different from the results desired by an economist who wants to see that all of us receive the beneficial health care services that we should have.

    Medical science and economic science may share a common term, but they don’t share a common science.

  7. Jennie Fiedler says:

    I believe Chris makes a very valid point. If you go looking for a particular result, at some point you’re bound to get it. As a layperson, I like to apply the KISS principle when it comes to keeping myself healthy. Most things in moderation, exercise, good nutrition (which involves supplementation and a minimum of processed food), and as chemical-free an environment as I can manage. Meaning if I can’t read all the ingredients on something I’m going to clean my home and clothing with, or put on my body or use on my garden or yard, I won’t use it. As far as coffee goes, two cups in the morning and I’m done. I have consumed much more than that in the past, and of course the point John makes in his blog isn’t actually about whether or not coffee causes cancer, but a little common sense goes a long way. I surmised years ago that the “medical industrial complex” isn’t looking for drugs that actually cure cancer because a cure wouldn’t be profitable. The FDA routinely pulls experimental treatment protocols, even when they show promise and patients are responding to them. I’m beginning to believe that in the case of cancer, saving lives isn’t the goal, just extending them for a while to make lots of money is. Needless to say, I wouldn’t put a lot of faith in medicine if I ever did contract a catastrophic illness like cancer.

  8. Paul H. says:

    Excellent post.

  9. William Stuart says:

    As I read John’s typically astute analysis, I’m reminded of one of my college banking economics texts, written by economist and baseball writer Lawrence Ritter. He found a statistically significant correlation between some economic variable (I believe it was fluctuation in some measure of the nation’s money supply) and the Washington Senators’ team batting average over a period of about 40 or 50 years. A true and accurate correlation, but no theory that it could possibly prove or support.

  10. Alieta Eck says:

    This is so funny. I dutifully clip out the positive articles about coffee, because I LOVE coffee. But your point is very well taken. The take-away message is to do everything in moderation, keep your bowels moving well, and enjoy life!

    Alieta Eck

  11. Kenneth A. Fisher, M.D. says:

    John, you are right on; our literature is replete with this nonsense.

  12. Frank Timmins says:

    As long as the subject is statistical studies, bias and agendas, is there any better example than the presumption of significant cancer risks of “second hand smoke”? With deference to the health dangers of “first hand smoke” and the impact of allergy problems for some resulting from “second hand smoke”, is there a better example of agenda-driven spurious correlation than the second hand smoke/cancer connection?

  13. George Sack says:

    This reminds me of some of the efforts in epidemiology, where there is no reasonable underlying mechanistic idea and “interpretations” are developed after the expensive studies.