It’s not every day that a heart diet controversy is truly settled. But earlier this month a report in the British Medical Journal solved the cold case of a nearly 50-year-old heart study with a hidden, and crucial, finding. Better still, a landmark study published this week sharpens the point.
In 1966 the Sydney Diet Heart Study, named after the city in which it originated, randomly assigned 448 middle-aged men with a prior heart attack either to eat as they chose or to eat a diet low in saturated fats and cholesterol but rich in omega-6 polyunsaturated fats (primarily safflower oil). For seven years the researchers tracked heart attacks and deaths in both groups. The results shocked everyone: despite a sharp drop in cholesterol, 6% more men died in the diet group, suggesting that 1 out of 18 died because of the diet. But the 1978 report offered no information about the causes of the deaths, and experts have often discounted the study as a potential fluke, arguing that men in the diet group may have died from unrelated, non-cardiac issues.
There is heated controversy over what constitutes a heart-friendly diet. The American Heart Association recommends increasing intake of omega-6 fatty acids, while some researchers have suggested that doing so may increase heart attacks. Moreover, the widely accepted science impugning saturated fats, and supporting polyunsaturated fats, is based largely on observational studies in which researchers observe diet and other habits among large groups. These studies often raise tantalizing questions, but they offer dubious answers. One can never tell if such findings represent the chicken or the egg: Does eating more fish lead to heart health, or do heart-healthy people eat more fish? This kind of confounding explains why study results about the dangers or benefits of certain foods come and go with the wind, cycling like fashion trends.
Less trumpeted, however, is a small core of reliable answers from randomized trials of diet. The best-known example may be the Lyon Diet Heart Study, which showed dramatic effects of a Mediterranean diet (think Lyon, France). The Mediterranean diet, studied in hundreds of heart patients over nearly a decade, emphasizes fruits and vegetables, grains, white meat over red meat, and olive oil. Compared to an AHA diet focused on lowering cholesterol, the Mediterranean diet prevented deaths for 1 in 30 heart patients, and heart attacks for 1 in 18. That makes it three times more powerful than a statin drug for heart patients. Moreover, in an unprecedented effort, researchers from Spain this week published a paper in the New England Journal of Medicine showing that the same diet works equally well in men and women attempting to prevent a first heart attack or stroke.
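The “1 in 30” and “1 in 18” figures are numbers needed to treat (NNT): the reciprocal of the absolute risk difference between the two groups. A minimal sketch of that arithmetic, using only the figures quoted above (the function names are ours, purely illustrative):

```python
def nnt(absolute_risk_difference: float) -> float:
    """Number needed to treat (or harm): 1 / absolute risk difference."""
    if absolute_risk_difference == 0:
        return float("inf")  # no difference between groups: nobody is helped
    return 1.0 / abs(absolute_risk_difference)


def absolute_difference_from_nnt(n: float) -> float:
    """Invert the relationship: an NNT of n means a 1/n absolute risk difference."""
    return 1.0 / n


# The Lyon figures, expressed as absolute percentage-point differences.
for label, n in [("deaths", 30), ("heart attacks", 18)]:
    print(f"{label}: NNT {n} -> {absolute_difference_from_nnt(n):.1%} absolute difference")
```

So “1 in 30” means about 3.3 fewer deaths per 100 patients treated, and “1 in 18” about 5.6 fewer heart attacks per 100.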
Unfortunately, years-long, rigorous diet trials like the Lyon and Spanish studies are rare. One excellent example, however, is the Sydney study.
Christopher Ramsden, a National Institutes of Health investigator and a researcher at the University of North Carolina, has been compiling data from diet trials for years. To determine once and for all the effect of omega-6 fatty acid diets, Ramsden reached out to Boonseng Leelarthaepin, a young research assistant at the study’s inception and now the only surviving researcher from the Sydney group. Improbably, Leelarthaepin managed to locate the 9-track tapes containing the study data, and after converting the obsolete format into readable information, Ramsden’s group extracted the causes of death for each subject in the study.
The results are both remarkable and instructive: blood tests during the study showed that cholesterol and triglyceride levels dropped substantially in the diet group, precisely the intended effect. But the final outcome was a 6% increase in fatal heart problems, accounting completely for the difference in survival. The Sydney diet, which relied exclusively on increasing omega-6 fatty acids, increased coronary and cardiac deaths.
What do the findings mean for a heart healthy diet? First, diets rich in both omega-3 and omega-6 fatty acids—together—have, paradoxically, reduced heart attacks in randomized trials, and the Mediterranean diet is the most powerful, dependable example. With the study published this week we now have formidable proof that this works not just for those with heart problems but also for those at lowest risk, patients who have never had a heart attack or stroke.
The success of combining the two polyunsaturated fats, however, raises the question of whether omega-3 fatty acids may be the beneficial ingredient and omega-6 the dangerous one. Sadly, supplementing a diet with omega-3 fish oil pills has failed to provide beneficial effects on heart health in dozens of studies (though the trials were short, averaging about two years). Thus, until larger, longer trials can separate out the critical effects of each element, it appears that polyunsaturated fats should be taken in the form of food, not pills, and consumed together, not alone. Most importantly, exclusively increasing omega-6 fatty acids while lowering saturated fats and cholesterol is likely to be dangerous.
It took nearly a half century to get them, but these are answers about eating for heart health that are likely to last even longer.
Results of the largest and arguably most important trial ever of thrombolytics (clot-busting drugs) for acute stroke were published last week in The Lancet, and the study’s conclusions are breathtaking. Not because of the study results, which are unsurprising, but because the authors’ conclusions suggest that they have gone stark, raving mad.
The International Stroke Trial 3 (‘IST-3’) was a remarkable achievement. The study enrolled 3100 patients, nearly four times as many as any previous stroke thrombolysis trial, who were randomly assigned either to treatment with intravenous thrombolytic drugs or to treatment without the drugs. But unlike earlier studies, IST-3 used a ‘pragmatic’ design: it was unblinded, used no placebos, and included the elderly, the non-elderly, and those with strokes of all severities. In other words, it enrolled common stroke patients having common strokes, a real-world test of the drugs.
Thrombolytics have remained controversial for acute stroke partly because nine of the eleven major trials to date have demonstrated either no benefit or else harm. Supporters argued, however, that early treatment (0 to 3 hours from symptom onset) in the famous NINDS trial made the study unique (other trials went up to 4 or 6 hours), and justified recommendations for use in the early time period. The 3-hour cutoff faded, however, when a 2008 trial used the drug successfully between 3 and 4.5 hours, and again when a respected review group argued that the data suggest similar effects up to 9 hours. It is now apparent to all that NINDS was not unique, and the stroke world has been waiting with bated breath for a large, high quality effort to retest the fundamental question: do thrombolytics decrease death and disability in acute stroke?
In IST-3 the drug was given between 0 and 6 hours, and the data generated two clear findings. First, the drugs failed to reduce death or dependence at six months. In the thrombolytic group 36.57% were alive and independent, while in the control group the number was 35.13%, a difference of about 1.4 percentage points. The difference would have had to be roughly 5 percentage points or more to be considered anything other than a wash.
And second, there was no discernible relationship between timing of administration and drug effect. The drug looked good in the first three hours, harmful for the next 90 minutes, and then good again for the next 90. This is a biologically nonsensical (i.e., random) distribution, suggesting that time differences are not a likely mediator of drug effect.
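One way to see why a 1.4-point difference in the primary outcome counts as a wash is to put a confidence interval around it. The sketch below uses the simple Wald interval for a difference of two proportions. The arm sizes are an assumption (the post gives only the 3100 total, so we split it evenly), and the trial’s own analysis was adjusted and will differ:

```python
import math


def risk_difference_ci(p_treat, n_treat, p_ctrl, n_ctrl, z=1.96):
    """Wald 95% confidence interval for a difference of two independent proportions."""
    diff = p_treat - p_ctrl
    se = math.sqrt(p_treat * (1 - p_treat) / n_treat + p_ctrl * (1 - p_ctrl) / n_ctrl)
    return diff, diff - z * se, diff + z * se


# Proportions alive and independent at six months, as quoted in the post.
# ASSUMPTION: arm sizes are not given here, so the 3100 patients are split evenly.
diff, low, high = risk_difference_ci(0.3657, 1550, 0.3513, 1550)
print(f"difference {diff:.2%}, 95% CI {low:.2%} to {high:.2%}")
```

Under those assumptions the interval runs from roughly -2 to +5 percentage points, comfortably including zero.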
Thus, in one fell swoop, every important argument in favor of thrombolytics for acute stroke was dashed. Worse still, the trial was unblinded, and as part of the protocol patients were enrolled only if both they and their doctor considered the drug to be “promising, but unproven.” This is a distinct, and marked, advantage for the drug group. Patients and doctors tend to be more hopeful when a “promising” drug is given. Doctors and staff may treat more aggressively, or more attentively, and patients are inspired to work harder toward recovery. Unblinded trials are known to exaggerate the apparent effect of an intervention relative to control groups. Given the results, it is thus quite possible that the unblinded design of IST-3 has hidden significant harms of thrombolytics.
Whence, then, the authors’ claim of benefit? The authors describe a “secondary exploration” of their data using ordinal analysis. This uncommon method examined whether thrombolytics may have ‘shifted’ some patients toward better categories of outcome, despite not shifting them toward being alive and independent. Lo and behold, it appeared to be so (though only with the help of an unexplained statistical “adjustment”). Of course, in any group of exploratory analyses some will appear favorable by random chance. That is why every trial has only one primary outcome: if you keep flipping the coin and moving the goal posts, eventually you’ll hit . . . something. Most damningly, in a moment of clarity, the authors themselves have described ordinal analysis (in a separate paper about IST-3) as “not appropriate for the primary analysis of outcome.” And yet they write their conclusions as if the illusory “shift” were the primary outcome.
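The coin-flipping point can be made concrete. If a drug truly does nothing, each extra analysis at the usual 5% significance level is another chance of a spurious “win.” A minimal sketch, assuming the analyses are independent (real secondary and subgroup analyses are correlated, so this is only a rough guide):

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_tests independent comparisons of a
    truly ineffective treatment comes out 'significant' at level alpha."""
    return 1 - (1 - alpha) ** num_tests


for k in (1, 5, 10, 20):
    print(k, round(chance_of_false_positive(k), 2))
```

With ten independent looks at the data, the chance of at least one false “benefit” is already about 40%.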
With advances in scientific literacy, it has been years since I have seen a top journal allow authors to proclaim a conclusion in direct conflict with their own primary study results. And yet the authors blithely conclude that thrombolytics “improved functional outcome.” Worse, an accompanying editorial trumpets that “the role of stroke and emergency physicians is now not to identify patients who will be given rt-PA, but to identify the few who will not.” Welcome to Wonderland.
These statements feel not just forced, but frankly delusional. Has neuro gone psycho? The results of IST-3 indicate, at best, a profound disappointment (even the hallucinated benefit would be tinier than any previously claimed) and at worst the beginning of the end for thrombolytics in stroke. In either case, reality may be tough to handle, but it is not a matter of debate, or interpretation, or perception. The primary outcome failed. We have a phrase for that: no benefit.
Last week The Lancet published a meta-analysis of 27 statin trials, an attempt to determine whether patients with no history of heart problems benefit from the drugs—true story. The topic is controversial, and no less than six conflicting meta-analyses have been performed—also a true story. But last week’s study claims to show, once and for all, that for these very low risk patients, statins save lives—true story.
Actual true story: the conclusions of this study are neither novel nor valid.
The Lancet meta-analysis, authored by the Cholesterol Treatment Trialists group, examines individual patient data from 27 statin studies. Their findings disagree with an analysis published in 2010 in the Archives of Internal Medicine, and with analyses from two equally respected publications, the Therapeutics Letter and the Cochrane Collaboration.* Despite this history of dueling data, the authors of last week’s meta-analysis, in a remarkable break from scientific decorum, conclude their report with a directive for the writers of statin guidelines: the drugs should be broadly recommended based on the new analysis.
As an editorialist points out, if implemented, the CTT group recommendations in the United States would lead to 64 million people, more than half of the population over the age of 35, being started on statin therapy—true story.
Where is the magic, you ask, in this latest effort? What is different? In some ways, nothing. Indeed, just a year and a half earlier The Lancet published a meta-analysis of 26 of the same 27 studies, with the same results, by the same authors (true story, and an odd choice on the part of the journal). So the findings aren’t new. They are, however, at odds with other meta-analyses. Why? It is the way they calculated their numbers. This meta-analysis, like the earlier one from the same group, reports outcomes per unit of cholesterol reduction. The unit they use is a “1 mmol/L reduction in low density lipoprotein (LDL)”; in common U.S. terms, that is a drop in LDL of roughly 40 points (mg/dL).
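The “40-point” translation comes from the molar mass of cholesterol (about 386.65 g/mol), which makes 1 mmol/L equal to roughly 38.7 mg/dL. A small conversion helper, with a name of our own choosing:

```python
# Cholesterol has a molar mass of about 386.65 g/mol, so 1 mmol/L is
# 386.65 mg/L, i.e. about 38.67 mg/dL.
CHOLESTEROL_MG_DL_PER_MMOL_L = 38.67


def ldl_mmol_to_mg_dl(mmol_per_l: float) -> float:
    """Convert an LDL cholesterol concentration (or change) from mmol/L to mg/dL."""
    return mmol_per_l * CHOLESTEROL_MG_DL_PER_MMOL_L


print(round(ldl_mmol_to_mg_dl(1.0)))  # the meta-analysis unit, about 39 mg/dL
```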
That’s the magic: each of the benefits reported in the paper refers to patients with a 40-point cholesterol drop. Voilà. One can immediately see why these numbers would look different from those in reviews that asked a more basic question: did people who took statins die less often than people taking a placebo? (The only important question.) Instead, they shifted the data so that their numbers corresponded precisely to patients whose cholesterol responded perfectly.
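The post does not spell out the paper’s exact formula, so the following is an assumption: per-unit effects of this kind are commonly expressed by dividing a trial’s log relative risk by the average LDL difference it achieved. Under that log-linear rescaling, and with invented trial numbers, a modest observed effect grows on paper:

```python
import math


def rr_per_unit(observed_rr: float, ldl_difference_mmol: float) -> float:
    """Rescale a trial's relative risk to a 'per 1 mmol/L LDL reduction' basis,
    ASSUMING the log relative risk scales linearly with the LDL reduction."""
    return math.exp(math.log(observed_rr) / ldl_difference_mmol)


# HYPOTHETICAL trial: statin vs placebo, observed RR 0.95 for death,
# with an average between-group LDL difference of only 0.6 mmol/L.
observed = 0.95
rescaled = rr_per_unit(observed, 0.6)
print(f"observed RR {observed:.2f}; reported per 1 mmol/L: {rescaled:.2f}")
```

Whenever the achieved LDL difference is below 1 mmol/L, the rescaled figure looks stronger than what the trial’s patients, taken as randomized, actually experienced.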
Patients whose cholesterol drops 40 points are different from others, and not just because their bodies had an ideal response to the drug. They may also be taking the drug more regularly and be more motivated. Or they may be exercising more, eating right, and generally more health conscious than other patients. So it should be no surprise that this analysis comes up with different numbers from a simple comparison of statins versus placebo pills. Ultimately, then, this new information tells us little or nothing about the benefits someone might expect if they take a statin. Instead it tells us the average benefits among those who had a 40-point drop in LDL.
But LDL drop cannot be predicted. Some won’t drop at all, some will drop just a bit, and some may drop more. Therefore the numbers here tell an interesting story about certain patients who took statins, but they have no relevance to patients and doctors considering statins. And yet the latter group is the target of the study’s conclusions.
True story: in prior meta-analyses that found no mortality benefit, the investigators simply looked at studies of patients without heart disease and compared mortality between the statin groups and the placebo groups. No machinations, no acrobatics, no per-unit-cholesterol. They took a Joe Friday approach (just the facts, ma’am), and found no mortality benefit.
Perhaps never has a statistical deception been so cleverly buried, in plain sight. The study answers this question: how much did the people who responded well to the drug benefit? This is, by definition, a circular and retrospective question: revisiting old data and re-tailoring the question to arrive at a conclusion. And to be fair they may have answered an interesting, and in some ways contributory, question. However the authors’ conclusions imply that they answered a different, much bigger question. And that is not a true story.
Guideline writers, doctors, patients, journalists, and policy makers will all have to pay close attention to avoid the trap of deceptive data dressed up as a true story.
*The Cochrane Collaboration analysis reports an overall mortality benefit with statins (RR = 0.86); however, its summary suggests that statins should be used for primary prevention “with caution.” In particular, on p. 12, after a discussion of the biases in many of the trials that led to their numerical finding, the authors clearly state that using statins for patients with anything less than a 2% per year risk of coronary events “is not supported by existing evidence.” This cutoff encompasses virtually all people who would be considered candidates for primary prevention.
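A relative risk such as 0.86 says little about absolute benefit until it is applied to a baseline risk. The sketch below assumes, for illustration only, that the relative risk stays constant across risk levels; the baseline risks are hypothetical, not taken from the Cochrane review:

```python
def absolute_effect(baseline_risk: float, relative_risk: float):
    """Translate a relative risk into an absolute risk reduction (ARR) and NNT
    for a group with the given untreated baseline risk."""
    arr = baseline_risk * (1 - relative_risk)
    return arr, 1 / arr


# RR 0.86 from the Cochrane analysis; the baseline risks are HYPOTHETICAL.
for baseline in (0.10, 0.05, 0.02):
    arr, n = absolute_effect(baseline, 0.86)
    print(f"baseline {baseline:.0%}: ARR {arr:.2%}, NNT about {n:.0f}")
```

The lower the baseline risk, the larger the number of people who must take the drug for one to benefit, which is why the risk cutoff matters.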
At Albany Medical College, the master of physical examination was Dr. Beebe. The son and grandson of country doctors, Richard Beebe was an internist who made house calls. He earned his degree in the 1920s at Johns Hopkins in the wake of giants like Osler, Cushing, and Halsted, and for six decades he roamed the hallways of my alma mater with a doctor’s bag in hand. I once saw a history and physical he had written. Penned in a small, looping cursive that slanted elegantly, the document was four full pages, a masterpiece.
Beebe worked at AMC between 1928 and 1998, a period in which medicine was transformed. With antibiotics, modern surgical technique, and advancing technologies, everything changed. Everything, that is, but the H&P.
It is therefore with some wistfulness and much respect that we at The NNT Group now aim to re-imagine this final vestige of pre-modern medicine. In a nod to Dr. Beebe, and to contemporary proponents of the doctor’s touch like Abraham Verghese, The Group has begun a project to help usher this skill set into the age of evidence.
Why does the H&P need an update? Because every element of the H&P is a medical test. Each potential finding theoretically helps us determine whether pathology is present. Tonsillar exudates? Pharyngitis is present. Leg not swollen or painful? Thrombosis is not. Shaking chills? Infection is possible. These elements, each one a medical test with a result, accrue and affect our thinking until ultimately we cross a threshold for action.
But in the age of evidence-based thinking, our understanding of how to use medical tests has evolved. We now understand that the utility and power of a test rest on two pillars: accuracy and context. The need for accuracy is obvious, but why does context matter? Because no test is perfect.
Do not, for instance, test a man for pregnancy. It will only disappoint. While pregnancy tests may be intrinsically quite accurate, they are imperfect. Testing a man will give us either an answer we knew (he’s not pregnant), or an answer we know to be wrong (he’s pregnant). Both are unhelpful. Indeed, testing a woman for pregnancy can also be meaningless when it is done in the wrong context: immediately following an act of conception, for instance, the result will likely be wrong. But test a woman after a missed menses and a pregnancy test is powerful. Context trumps accuracy.
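Bayes’ theorem shows why context trumps accuracy: the same test, applied at different pretest probabilities, yields very different meanings for a positive result. A minimal sketch with a hypothetical test that is 99% sensitive and 99% specific (illustrative numbers, not those of any real pregnancy test):

```python
def positive_predictive_value(sensitivity, specificity, pretest_probability):
    """Chance that a positive result is a true positive (Bayes' theorem)."""
    true_pos = sensitivity * pretest_probability
    false_pos = (1 - specificity) * (1 - pretest_probability)
    return true_pos / (true_pos + false_pos)


# HYPOTHETICAL test: 99% sensitive and 99% specific.
for context, prior in [("man", 0.0), ("low suspicion", 0.01), ("missed menses", 0.5)]:
    print(context, round(positive_predictive_value(0.99, 0.99, prior), 2))
```

With these made-up numbers, a positive result means nothing for a man (0%), is a coin flip at a 1% prior, and is near-certain at 50%.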
Of course, for a test to be helpful it must be accurate. But while accuracy is necessary, it is not sufficient. Any decrement in accuracy, and any step away from context, increases the chance of an incorrect or unhelpful result.
These are the realities of medical testing. Thus in order to increase the utility of an H&P, we must maximize accuracy and context for every element. Oddly, we never have. On medical rounds each morning physicians around the world put a stethoscope to the lungs and the hearts of their patients—without knowing how accurate such a maneuver is, or how relevant the findings will be.
In the spirit of progress, The NNT Group is proud to present a new section of the website designed for the Beebe in all of us: The Diagnostics Section. This area contains Likelihood Ratios for a variety of medical tests, most of them elements of the H&P. Likelihood Ratios are a combination of sensitivity and specificity, i.e. accuracy. And if you click on an LR on one of these pages you will find that you cannot use that LR without selecting a pretest probability, i.e. context.
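For a finding that is either present or absent, the two likelihood ratios fold sensitivity and specificity together as shown below. This is a sketch with a hypothetical finding; the site’s numbers come from the published reviews, not from this code:

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (LR+, LR-) for a dichotomous finding."""
    # LR+: how much a positive finding multiplies the odds of disease.
    lr_positive = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    # LR-: how much a negative finding multiplies (shrinks) the odds of disease.
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# HYPOTHETICAL finding: 80% sensitive, 90% specific.
lr_pos, lr_neg = likelihood_ratios(0.80, 0.90)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

A finding that is 80% sensitive and 90% specific works out to an LR+ of 8 and an LR- of about 0.22.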
Pretest probability is a test’s context. The pretest probability of a man being pregnant is zero, and the test is therefore meaningless. But what about the chance that a person with chest pain has an aortic dissection? During the H&P of such a patient we will have a pre-test probability, a best-guess judgment about the likelihood of dissection (is it 1%? 10%? 50%?), and we may ask “do you have ‘tearing/ripping’ pain?” If the answer is ‘yes,’ then we click on the LR number, drag the slider to our pre-test probability, and voilà: the likelihood of dissection is apparent.
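The slider’s arithmetic is presumably the standard odds form of Bayes’ theorem: convert the pretest probability to odds, multiply by the LR, and convert back. We have not seen the site’s code, so this is an independent sketch; the LR of 5 is hypothetical, not the published value for tearing pain:

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Update a pretest probability (0 <= p < 1) with a likelihood ratio:
    posttest odds = pretest odds * LR."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# HYPOTHETICAL LR of 5 for a positive finding, at three pretest guesses.
for pretest in (0.01, 0.10, 0.50):
    print(f"{pretest:.0%} -> {posttest_probability(pretest, 5):.0%}")
```

With an LR of 5, a 1% pretest guess rises only to about 5%, while a 10% guess rises to about 36%: the same finding, very different conclusions.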
Play with it. See how you like it. Let us know. There are a few conditions posted and many more to come.
Dr. Beebe believed deeply in, and was a true practitioner of, the Art of medicine. To me he was medical royalty, a bridge to history. A few weeks after I departed medical school Richard Beebe also departed, at the doctorly age of 98. We stand on the shoulders of giants like Beebe in respecting the Art of this science, as we endeavor to elevate the science of this Art.
(Cross-posted to The SMART-EM Blog)
Keen observers might just notice a new section of our website: diagnostics! After we started churning out our reviews of NNTs for therapies, we realized that we needed a better resource for diagnostics as well. And what better to use than the JAMA Rational Clinical Exam series? If you’re not familiar with these articles, they summarize the likelihood ratios of signs, symptoms, laboratory values, and imaging findings for a particular disease process.
We teamed up with a pair of hard-working, rockstar emergency physicians, Shahriar Zehtabchi and Rodrigo Kong, who have reviewed the Rational Clinical Exam series and summarized them. With a little web magic, we think our new Likelihood Ratio Reviews are the most interactive and easy-to-use tools yet! For findings with useful likelihood ratios (>2 for positives, <0.5 for negatives), you click on a number and get an interactive slider where you can estimate your patient's pre-test probability and see how the likelihood ratio for that particular finding increases or decreases the chance your patient has that particular disease. "Now," you say, "Graham, this is all fine and good, but hold up -- what, uh, is, a, uh... likelihood ratio?" Glad you asked. We've got a quick few paragraphs of explanation as well as some deeper, nitty-gritty evidence-based medicine info on our Diagnostics and Likelihood Ratios, Explained page. But simply, they combine sensitivity and specificity into one number, and do a better job than either alone at helping you decide if a particular finding or test helps you evaluate a patient for a disease — and if it does, how much it helps you.
It’s a lot of talking, and sometimes it’s easiest to just show you. We’ve got five likelihood ratio reviews up and running, with a bunch more almost ready for publication.
Like the idea? Hate it? Not working in your browser? Please send us an email, comment here, or send me a tweet at @grahamwalker!
In the last few months there have been tectonic shifts in the world of cancer screening. Peer-reviewed journals, lay press outlets, and prominent medical groups have all published new material with far-reaching implications. This post is about mammography and the import of these recent events, including an unfortunate review in the New England Journal of Medicine. In the coming weeks I will post about PSA testing.
For three decades public opinion and public policy on mammograms have been perfectly synchronized, and perfectly wrong. What started with a seemingly minor error in study interpretation snowballed into a prickly political movement with lobbyists, deep pockets, and a blind faith in early detection. Unfortunately, the original error, one I first described in Hippocrates’ Shadow, led to years of misunderstanding and misinterpretation. But recent study reports have fueled a new interest in the truth about mammography.
Dr. Laura Esserman published the most powerful and important of these articles in the Journal of the American Medical Association in 2009.[i] Dr. Esserman and her colleagues explore U.S. public health data, and point out hard truths. For starters they note that cancer diagnoses increased, and never came back down, after mammography screening was introduced. This is not minor. An effective screening test shouldn’t increase cancer diagnoses. It should allow doctors to find the same cancers, but earlier. There may be a bump when screening begins, but the numbers should soon return to their baseline. Mammograms, however, permanently increased these numbers, which means they identify ‘cancers’ that never would have been found. That’s not early detection, that’s extra detection.
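A deliberately crude model helps show why a permanent rise points to extra detection rather than early detection. It assumes, purely for illustration, a steady rate of cancers that would eventually surface clinically, a fixed lead time, and annual screening that catches every detectable tumor; all numbers and names are hypothetical:

```python
def yearly_diagnoses(years, screening_start, lead_time, baseline, overdiagnosed):
    """Toy model of cancer diagnoses per year before and after screening starts.

    Every real cancer would surface clinically at a steady rate (`baseline` per
    year). Screening finds each one `lead_time` years early, so the first
    screening year also sweeps up the next `lead_time` years' worth of cancers.
    `overdiagnosed` extra lesions per year are found only because of screening.
    """
    counts = []
    for year in range(years):
        if year < screening_start:
            counts.append(baseline)
        elif year == screening_start:
            counts.append(baseline * (lead_time + 1) + overdiagnosed)
        else:
            counts.append(baseline + overdiagnosed)
    return counts


# HYPOTHETICAL: 100 cancers a year, screening begins in year 3 with a 2-year lead time.
print(yearly_diagnoses(8, 3, 2, 100, 0))   # pure early detection
print(yearly_diagnoses(8, 3, 2, 100, 30))  # early detection plus overdiagnosis
```

In the first run diagnoses spike once and fall back to 100; in the second they settle permanently at 130, the pattern Esserman describes.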
This would be fine if the extra detection came with a life-saving benefit, i.e. if the dangerous cancers were being found early enough to make a life-saving difference. But Esserman points out that deaths never dropped (as they should have) after the introduction of mammography.
Less than a year after the Esserman paper, a study published in the New England Journal of Medicine reported breast cancer mortality statistics from Norway before, during, and after the introduction of mammography.[ii] Norway’s experience is notable because they used mammography in selected regions only. Breast cancer deaths dropped where mammography was phased in. Encouraging. But deaths also dropped where mammography wasn’t available, and the drop was of equal magnitude. This strongly suggests that breast cancer treatments, and not mammography screening, were responsible for the decrease.
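The Norwegian comparison works like a difference-in-differences: the unscreened regions show what would have happened anyway (for example, from better treatment), and only the excess drop in the screened regions can be credited to screening. A sketch with hypothetical rates, not the study’s figures, assuming both kinds of region would otherwise have followed parallel trends:

```python
def difference_in_differences(screened_before, screened_after,
                              unscreened_before, unscreened_after):
    """Change in the screened regions minus the change in the unscreened regions.
    A negative result means screened regions improved more than expected."""
    screened_change = screened_after - screened_before
    unscreened_change = unscreened_after - unscreened_before
    return screened_change - unscreened_change


# HYPOTHETICAL breast cancer death rates per 100,000 women: equal drops.
effect = difference_in_differences(40.0, 32.0, 40.0, 32.0)
print(f"change attributable to screening: {effect:+.1f} per 100,000")
```

When both regions drop by the same amount, the estimate attributable to screening is zero.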
A few months later, in March 2011, Dr. Peter Gøtzsche published a study of mammography. Gøtzsche, best known for authoring one of the first major reviews to find no mortality benefit with mammography, has now published dozens of follow-up papers including three this year. His March publication re-examined the mammogram data from yet another angle and again concluded no benefit.[iii] Gøtzsche is an expert in biostatistics and study methodology and his papers tend to be squeaky clean, which is often infuriating to his many detractors and challengers. Despite the often very emotional responses to Gøtzsche’s work,[iv],[v] one finding is clear: in randomized trials there is no evidence of an overall mortality benefit to screening mammograms.[vi]
The most recent addition, and arguably most crippling to the popular view of mammography, is an Archives of Internal Medicine study published online in October (as of this posting it is not yet published in print).[vii] It is a rather blunt report demonstrating that the lives of women whose breast cancers are detected by mammography are rarely, if ever, saved by the mammogram. In other words, the time difference between detection by mammogram and detection clinically (as a lump) is virtually never the difference between life and death. This dismantles the fundamental claim of mammography and discredits the I-had-a-mammogram-and-it-saved-my-life argument, an anecdotal and therefore inarguable refrain that is ubiquitous in popular discussions of mammography.
These recent studies have gained surprising traction in the popular press, culminating in a New York Times piece in November that began,
“After decades in which cancer screening was promoted as an unmitigated good, as the best — perhaps only — way for people to protect themselves from the ravages of a frightening disease, a pronounced shift is under way.”[viii]
But change is hard and there is always a fist-shaking codger. In September the New England Journal of Medicine took on that role by publishing a review outlining the wonderful benefits of screening mammography.[ix]
I feel sheepish pointing out the Journal’s lazy predilection for reviews that flout evidence and ennoble false expertise. But I also feel obliged to point out how an educated author, writing for an educated editorial board, at an occasionally wonderful journal, went off the rails. It is a cautionary tale.
Dr. Ellen Warner, the author of the review, is an established expert on breast cancer screening and breast imaging. She begins her September 15th review article with a dubious claim: that the small improvement in breast cancer deaths seen over the past twenty years is due in equal parts to advances in treatment and increases in mammography. Her reference is a widely publicized NEJM article from 2006,[x] which drew its conclusions using complex computer models. Computer models, of course, need inputs in order to produce outputs. So what were the inputs that led a computer to believe that mammography saves lives? They were obviously not trial data, as Dr. Gøtzsche has shown. Nor were they public health data, as Dr. Esserman has shown.
I emailed Dr. Don Berry, the author of the original paper (the paper itself was vague on this point). He was gracious and prompt, and explained to me that the computers were fed data regarding the stage and type of breast tumors diagnosed by mammography, and compared these data to the stage and type of tumors found by other means. The computers then projected the benefits of mammography based on these differences.
This is weird. Predicting the behavior of cancer cells is famously shaky, a great failing of most early detection programs. Prostate-specific antigen (‘PSA’) testing, chest x-ray screening for lung cancer, and ultrasound screening for ovarian cancer, for instance, have all failed. In each case the problem was reliance on prognoses drawn from tumor data. If looking at cancer, either on imaging or under a microscope, could bring certainty, then all of these tests would save lives. But predicting the behavior of complex cells inside of an infinitely complex human body is profoundly difficult. Tumor characteristics are therefore about as reliable as a fortune cookie. To take these fortune cookies and use them in another arena entirely, i.e. to determine whether mammograms save lives, is just kooky.
In addition, the biases in this type of data are too many to count. Even if we believed the predictive ability of stage and size data, there are many reasons that mammogram-detected tumors differ from others. For one, they are disproportionately the less aggressive subtypes, the ones more likely to be ‘cancer’ that would never spread or cause a problem. Conversely, tumors found between mammograms tend to be aggressive and rapidly growing (that’s why they surfaced in between mammograms rather than during one). Which points out another bias: mammogram-detected tumors are typically slow-growing, which is why a mammogram detected them, and also why they’re more often a treatable subtype. These are just a few reasons why mammogram-detected tumors enjoy a distinct and powerful false advantage in any comparison with other tumors.
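The slow-growing advantage, often called length bias, falls out of simple arithmetic. A periodic screen only catches a tumor if a screen happens to fall inside the window when the tumor is detectable but not yet causing symptoms, and slow tumors have longer windows. A sketch with hypothetical window lengths and a 50/50 mix of tumor types, assuming tumor onsets are spread randomly relative to the screening schedule:

```python
def share_slow_among_screen_detected(n_slow, sojourn_slow, n_fast, sojourn_fast, interval):
    """Length-bias sketch: a tumor is caught by a periodic screen only if a screen
    falls inside its preclinical window, which (with randomly timed onsets)
    happens with probability min(1, window / screening interval)."""
    caught_slow = n_slow * min(1.0, sojourn_slow / interval)
    caught_fast = n_fast * min(1.0, sojourn_fast / interval)
    return caught_slow / (caught_slow + caught_fast)


# HYPOTHETICAL: equal numbers of slow (4-year window) and fast (0.5-year window)
# tumors, screened every 2 years.
print(share_slow_among_screen_detected(500, 4.0, 500, 0.5, 2.0))
```

Although only half of all tumors are slow-growing, they make up 80% of the screen-detected ones in this toy example.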
Despite all of this, soon after the original 2006 Berry paper, Don Berry’s study group began working with the United States Preventive Services Task Force.[xi] Thus tumor stage data became the bedrock for the USPSTF mammography guidelines. And in a wonderful example of closing the loop, those guidelines are cited as supporting evidence for Dr. Warner’s NEJM review. In other words, in 2006 Dr. Berry’s group fed biased tumor data to a computer, the NEJM published the results, the USPSTF then used these results for its recommendations, and the NEJM then published a review by Dr. Warner citing both of these prior papers as proof. This is a game of evidence telephone—and the very first call was a prank.
Her reliance on Berry’s tumor data is just one of Dr. Warner’s contortions. She also repeatedly asserts that mammography reduces mortality. With the right qualifier she might be able to get away with this, but she fails to make the distinction between breast-cancer mortality and overall mortality. Most authors, keenly aware of this controversy, are careful to distinguish.
What controversy? I’ve blogged and also published an article for the Journal of the National Cancer Institute on this topic, and briefly it goes like this: despite half a million women enrolled, there is no overall mortality benefit in mammogram trials, as Dr. Gøtzsche has shown. This result was shocking when it was first published in 2000 because early trials had claimed that mammograms reduce breast cancer mortality. But mammograms never reduced overall mortality. Overall, or all-cause, mortality is the only valid way to measure the effects of any medical intervention.
Why? As noted, mammograms lead to more diagnoses. Not just earlier diagnoses, more diagnoses. And while many are benign, virtually all are treated. A fast-talking surgeon I knew once said, “when in doubt, cut it out.” And that’s just what he did. Thus women are more likely to have surgery and more likely to have radiation and chemotherapy if they have mammograms. These treatments occasionally cause fatal complications. But if the only outcome that we talk about from trials is ‘breast cancer mortality’ then fatal complications from unnecessary treatments are never counted.
The same is true for any medical treatment. In the 1980s, promising therapies for heart patients were withdrawn after large reviews showed that all-cause mortality was higher.[xii] [xiii] [xiv] In one case, cholesterol-lowering drugs reduced deaths from heart attacks but increased deaths overall.[xv] The pills were abandoned. (These were the fibrate drugs, incidentally, a drug class now making a comeback; medical memories run short.) Thus it is essential to count all deaths that occur in any trial of a medical treatment, so that side effects and unanticipated consequences are all captured. In some cases these side effects negate the beneficial effect of the treatment.
The early confusion about how to count mortality outcomes led to a mistaken conclusion: that mammograms saved lives overall. But now we know better. In the massive Cochrane review published in 2000 by Gøtzsche (72 pages long, more than 450 references), breast cancer mortality was 15% lower in the mammography group. But all-cause mortality was the same in the two groups. In other words, mammograms didn’t save lives. Whatever life-saving benefit might have been attributable to mammography was offset by fatal consequences, such as complications from treatment.[xvi]
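To make the arithmetic concrete, here is a minimal Python sketch using entirely hypothetical numbers (they are not Gøtzsche’s data, and the `Arm` class and helper names are illustrative assumptions). It shows how a 15% relative drop in breast cancer deaths can sit alongside identical all-cause mortality when other deaths, such as treatment complications, rise by the same amount.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """One trial arm. All counts used below are hypothetical, for illustration only."""
    n: int
    disease_deaths: int  # deaths attributed to the screened-for cancer
    other_deaths: int    # every other death, including treatment complications

    def disease_mortality(self) -> float:
        return self.disease_deaths / self.n

    def all_cause_mortality(self) -> float:
        return (self.disease_deaths + self.other_deaths) / self.n


def relative_reduction(control: float, treated: float) -> float:
    """Relative reduction in a rate, as a fraction of the control rate."""
    return (control - treated) / control


# Hypothetical arms: screening trims cancer deaths by 45 but adds 45 other deaths.
control = Arm(n=100_000, disease_deaths=300, other_deaths=4_700)
screened = Arm(n=100_000, disease_deaths=255, other_deaths=4_745)

disease_rr = relative_reduction(control.disease_mortality(), screened.disease_mortality())
all_cause_rr = relative_reduction(control.all_cause_mortality(), screened.all_cause_mortality())

print(f"Breast cancer mortality reduction: {disease_rr:.0%}")  # 15%
print(f"All-cause mortality reduction: {all_cause_rr:.0%}")    # 0%
```

Reporting only the first number would suggest a benefit; the second shows that, in this made-up example, nobody’s life was extended.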
Thus the only way for Dr. Warner to retain even a soupçon of truth in her claims that mammography reduces mortality would have been to specifically state that she meant breast cancer mortality, and not overall mortality. She didn’t, making her statements incontrovertibly wrong.
Finally, Dr. Warner makes an argument that I have not heard before. She curtly declares that randomized trials of mammography screening are “irrelevant” because of changes in the disease and its treatment. She is, of course, entitled to this opinion, and she may even be right. However, this is an opinion that instantly renders most medical treatments irrelevant, since very few are based on recent trials. In fact, some mammography trials have just closed or are still ongoing, which makes this a bizarre proposal that reeks of an excuse to ignore their findings. Worse still, if Dr. Warner truly believed that no relevant trial data were available, then she could not logically claim a proven mortality benefit for mammography, something she does repeatedly. One cannot eat the mammography cake and have it too.
Breast cancer is a disease of overwhelming importance. Resources should be devoted to extinguishing it. Reviews like Dr. Warner’s are a travesty precisely because they ensure that the focus of research and innovation will be misplaced. Mammography is a failed intervention; if breast cancer screening is to be fruitful, we must look elsewhere. Moreover, money squandered on mammography shamefully deflects resources from the exciting progress in breast cancer treatment.
I am frustrated. The NEJM review was more than an opportunity to correct the record on breast cancer screening. It was a chance to rejoin science and society at a time when policy and data are increasingly distant, and increasingly in need of each other. Yes, there is controversy. Yes, some will be angry. But that is our own fault. It is time to confront our errors and correct our future, and it should be science, scientists, and peer-reviewed journals that lead the way.
[xii] Echt DS, Liebson PR, Mitchell LB, et al. Mortality and morbidity in patients receiving encainide, flecainide, or placebo. The Cardiac Arrhythmia Suppression Trial. N Engl J Med. 1991;324(12):781–788.
[xiii] The Cardiac Arrhythmia Suppression Trial II Investigators. Effect of the antiarrhythmic agent moricizine on survival after myocardial infarction. N Engl J Med. 1992;327(4):227–233.
[xiv] Lechat P, Packer M, Chalon S, et al. Clinical effects of beta-adrenergic blockade in chronic heart failure: a meta-analysis of double-blind, placebo-controlled, randomized trials. Circulation. 1998;98(12):1184–1191.
Cancer screening studies: Does sepsis count? How about being stabbed?
Imagine you have the grim task of tallying deaths. You’ve been asked to count the number of people who died due to lung cancer in a study in which subjects were randomly assigned either to yearly screening with ‘CAT’ scans or to yearly checkups with no CAT scans. The study aimed to determine whether CAT scans reduce deaths due to lung cancer. You pore through the data and quickly become vexed.
Your first problem: a number of people in the study had CAT scans that seemed to show cancer, and then died because of complications of the surgery to remove it. You’re not sure if these are deaths due to surgery or deaths due to lung cancer. In a few, the tumor was benign. A handful of others died of severe infections after intensive chemotherapy. Were these cancer deaths? There is also a group of people who died suddenly from a blood clot to the lung, which is more common in cancer patients. But people who don’t have cancer die suddenly from blood clots too. And then you come across a strange one: in one city the cancer treatment center was in a bad part of town, and two people were mugged and killed leaving the center. These can’t be cancer deaths, but… if they hadn’t been diagnosed with cancer, they wouldn’t have been there.
This seems silly. After all, you don’t want to make CAT scans look falsely bad because of the part of town the treatment center happens to be in. But, on the other hand, you wouldn’t want to ignore the deaths that occur when CAT scans lead to unnecessary surgeries, because that would make them look falsely good. The problem is that neither kind of death would show up in the study results if the only thing reported was ‘deaths due to lung cancer.’ And if you were deciding whether to have a CAT scan to screen for lung cancer, you would want to know about every outcome that might befall you as a result, whether you were helped or harmed. And if one of these is more likely than the other, you would want to know that too.
So how should deaths be counted in order to be fair and true in determining whether CAT scans reduce deaths? The answer is that we count all deaths. Period. We don’t pretend that we know how to divide them up, and we don’t ignore bad parts of town, or deaths due to unnecessary surgery, or deaths from chemotherapy. We acknowledge that classifying deaths is tricky business and we count them all, even people who die when their parachute doesn’t open because they went skydiving after a CAT scan diagnosed them with cancer.
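The counting problem can be sketched in a few lines of Python. The death records below are hypothetical, built from the scenarios described above rather than from any trial, and `narrow_rule`, `broad_rule`, and the helper functions are illustrative names. Two adjudication rules that reasonable people could each defend give different ‘lung cancer death’ tallies, while the all-cause tally requires no judgment at all.

```python
# Hypothetical death records from one trial arm; the causes mirror the
# scenarios described above and are illustrative only.
deaths = [
    "lung cancer progression",
    "lung cancer progression",
    "surgical complication",
    "sepsis after chemotherapy",
    "pulmonary embolism",
    "homicide near treatment center",
]

# Two hypothetical rules for what counts as a "lung cancer death".
narrow_rule = {"lung cancer progression"}
broad_rule = {
    "lung cancer progression",
    "surgical complication",
    "sepsis after chemotherapy",
    "pulmonary embolism",
}


def cause_specific_count(records, rule):
    """Count only the deaths whose cause the chosen rule attributes to the disease."""
    return sum(1 for cause in records if cause in rule)


def all_cause_count(records):
    """Count every death, whatever its cause."""
    return len(records)


print(cause_specific_count(deaths, narrow_rule))  # 2
print(cause_specific_count(deaths, broad_rule))   # 5
print(all_cause_count(deaths))                    # 6
```

Whichever rule an adjudicator picks, `all_cause_count` returns the same number, which is exactly why it cannot be nudged in either direction.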
The bottom line is that screening tools like mammography and CAT scans lead to changes in behavior and downstream health consequences, and we want to know about all of them. They all matter if they can lead to death, whatever the path. And if the two groups were truly balanced and equal at the beginning of the study (i.e. the trial was truly randomized), then any major differences between groups at the end are very likely due to the CAT scans. Any other way of counting deaths in a cancer screening trial allows biases to creep in, biases that can mislead in any number of directions.
In fact, the results of screening studies have been misleading us for decades, a fact increasingly apparent to public health researchers. ‘Death due to breast cancer’ was the outcome used in early mammogram studies, and it was a mistake. The only outcome that should have been used was ‘all-cause mortality’, i.e. the overall number of deaths in each group. Now we’re paying the price, as decades-long tracking studies from around the world are all coming to the same conclusion: mammograms have had little to no effect on breast cancer death rates (Norway, Denmark, Europe, the US). The frustrating reality is that all-cause mortality, the only unbiased outcome, was right in front of our noses all along, and it was telling us the answer (no mammogram study ever showed a decrease in all-cause mortality). We didn’t need this three-decade, multi-trillion-dollar failed experiment. Mammograms never did save lives, and if we had read the results correctly, we would have known it.
Needless to say, at TheNNT.com we do not accept disease-specific mortality as an outcome for any kind of study. We consider these outcomes fraught with problems, misleading, and not patient-centered. Patients don’t want to avoid a certain type of death—they just don’t want to die.
This is a long way of getting to the point: in our recent post of the NNTs from the National Lung Screening Trial, the NNT to save a life was 217. This comes from the all-cause mortality numbers in the two groups (7.02% in the CT screening group and 7.48% in the x-ray group), an absolute difference of 0.46 percentage points; 1 divided by 0.0046 is roughly 217. So for those who were confused by the difference between our number and the study itself, which reported a benefit based on ‘lung cancer mortality’, this is how we came to our slightly different number. We believe that this is a much more valid and more patient-centered approach to reporting data of all kinds.
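The NNT arithmetic is simple enough to sketch. The Python helpers below are a minimal illustration, not TheNNT.com’s actual tooling, and the function names are hypothetical. The NNT is the reciprocal of the absolute risk reduction (ARR), here computed from the NLST all-cause mortality figures quoted above.

```python
def absolute_risk_reduction(control_risk: float, treated_risk: float) -> float:
    """Difference in event rates (as proportions) between control and treated groups."""
    return control_risk - treated_risk


def nnt(control_risk: float, treated_risk: float) -> float:
    """Number needed to treat: the reciprocal of the absolute risk reduction."""
    arr = absolute_risk_reduction(control_risk, treated_risk)
    if arr <= 0:
        raise ValueError("NNT is undefined when there is no risk reduction")
    return 1 / arr


# All-cause mortality in the NLST, as quoted above.
xray, ct = 0.0748, 0.0702

print(f"ARR = {absolute_risk_reduction(xray, ct):.2%}")  # ARR = 0.46%
print(f"NNT = {nnt(xray, ct):.1f}")                      # NNT = 217.4
```

Rounding 217.4 to the nearest whole person gives the 217 reported in the post.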
NNT Reader Mail!
A curious reader asked us about our review of Heparin for Acute Coronary Syndrome. How did we reach a different conclusion from the Cochrane Collaboration? This happens every now and then: we occasionally disagree with the conclusions of a systematic review, even when we are reporting its summary statistics or results. Strange? Well, we treat systematic reviews the same way physicians and scientists treat any study. In other words, we examine the study, scrutinize the results, and sometimes reach a different conclusion than the original authors did. Conclusions in most studies are interpretive, not factual; they are a way of translating the results. When our reading leads us to a different conclusion, and particularly when our conclusion is based on a more patient-oriented view, we report our conclusion and then explain the difference in the narrative section.
For the heparin review, the discrepancy between our answer and the answer in the Cochrane review lies in the varying time periods during which the outcomes were measured in key trials, and how they were combined in the Cochrane review.
The RISC trial was published in 1990. Roughly 800 high-risk men with “unstable coronary artery disease (CAD),” which included non-Q wave myocardial infarction (MI), were randomized to four treatment arms: aspirin (ASA) + placebo, ASA + heparin, placebo + heparin, and placebo + placebo. Heparin was administered for 5 days. The primary outcome was death or MI at 3 months, but outcomes were also assessed at 5 days and 1 month. Interestingly, patients were removed from the study after enrollment if their in-hospital ‘bicycle ergometry’ stress test was negative. This means that more people received heparin than were represented in the final study publication (a maneuver that would not be tolerated in a study published today), and that the remaining subjects were much more likely to have true angina and thus to be unusually amenable to anticoagulant therapy.
Figure 2 (left) from the study beautifully illustrates the effect that heparin had. In the first week there was a tiny difference in ‘death or MI’ (really just heart attacks, since deaths were never different) between the ASA+heparin and ASA+placebo groups, but this difference, which was never statistically significant, vanished by 1 and 3 months. Incidentally, the real difference to notice in Figure 3 is between the two groups that got aspirin (the two lower lines) and the two that did not (the upper lines). That difference is clinically and statistically significant.
Figure legend: AP = ASA/placebo; AH = ASA/heparin.
The FRISC trial was published in 1996. Again, high-risk patients with “unstable CAD” were enrolled, including non-Q wave MIs. There were 1,500 subjects, making it by far the largest trial contributing to the Cochrane review analysis and the main driver of any reported ‘benefit’. Dalteparin was the heparin agent used. All patients received ASA and were randomized to 2 treatment arms: dalteparin vs placebo for 35 to 45 days. The primary outcome was ‘death or MI’, but, surprisingly, the authors chose to measure it at 6 days, even though they had nearly complete follow-up for these measures through 150 days.
Again, there was no mortality difference between groups. Patients in the dalteparin group experienced fewer MIs in the first week, and it was this early outcome that was used in the Cochrane meta-analysis. But by 150 days, or roughly 5 months, there was no longer a difference in the rate of MIs between those receiving heparin and those receiving placebo (p = 0.30). At this longer follow-up, mortality was again essentially identical between groups, at 5.5% vs 5.4% (FRISC, Table 6).
In conclusion, first, there is certainly no mortality benefit associated with heparin use in ACS. This is remarkable considering how unusually high-risk the subjects in these studies were, many of whom would today be classified as NSTEMI or STEMI. Second, while there does seem to be a small difference in the incidence of MI in patients given heparin vs. placebo in the first week, this benefit starts to vanish by 10 days and is not present at 3 to 5 months.
As a final note, in both trials patients were randomized only after hospital admission (in the RISC study, up to 72 hours after admission; in the FRISC study, once on the wards), so these trials would not support using this therapy in the emergency department (ED) management of ACS even if they had shown a benefit.
Harms associated with heparin are well documented (though these studies didn’t track them closely) and the resources consumed by heparins of all kinds are immeasurable. Therefore we thought it was particularly important to focus on whether there are sustained and measurable benefits to heparin. We don’t see any. Do you? Let us know in the comments below!
Welcome to the new NNT blog! Expect evidence-based commentary on recent stories in the news, deeper explanations of our reviews, and our perspective on evidence and evidence-based medicine.
If you have suggestions, requests, or questions about a particular NNT review, please send us a message and we’ll try to address it as soon as possible.