Peer group benchmarks are not appropriate for health care quality report cards
See related article on page 1041.
Publicly reported data on hospital performance are now more widely available than ever before. Risk-adjusted cardiac or cardiac surgery outcome reports are published regularly by health agencies in California,1 New York,2 New Jersey,3 Pennsylvania,4 and Ontario.5 In addition, risk-adjusted cardiac or cardiac surgery outcomes are reported as components of more comprehensive hospital report cards developed by proprietary vendors such as HealthGrades6 and Solucient.7 In each of these report cards, risk-adjusted mortality or morbidity at individual hospitals is implicitly compared with a normative standard, and the statistical significance of any difference is evaluated. This comparison is made in either of 2 ways: 1) by direct standardization, using a standard population to estimate how many events a hospital would have experienced if its case mix were similar to that of the standard population but its quality of care were unchanged; and 2) by indirect standardization, using a standard population to estimate the expected number of events at each hospital if its quality of care were similar to that in the standard population but its case mix were unchanged. The latter approach has become more widely used because it is more robust in the setting of multiple risk factors and hundreds or thousands of risk strata based on unique combinations of different risk factors.
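The distinction between the 2 approaches can be made concrete with a minimal sketch. The 2 risk strata and all patient and death counts below are invented for illustration; they are not drawn from any of the report cards cited above, and real programs use many more strata or a regression model.

```python
# Hypothetical two-stratum example; every count here is invented.
# Each stratum maps to (patients, deaths).
standard = {"low_risk": (8000, 240), "high_risk": (2000, 300)}
hospital = {"low_risk": (100, 4), "high_risk": (100, 18)}


def rate(patients, deaths):
    """Crude event rate within one stratum."""
    return deaths / patients


def direct_standardized_rate(hospital, standard):
    """Direct standardization: apply the hospital's stratum-specific
    rates to the case mix of the standard population."""
    total_patients = sum(n for n, _ in standard.values())
    events = sum(standard[s][0] * rate(*hospital[s]) for s in standard)
    return events / total_patients


def indirect_expected_events(hospital, standard):
    """Indirect standardization: apply the standard population's
    stratum-specific rates to the hospital's own case mix."""
    return sum(hospital[s][0] * rate(*standard[s]) for s in hospital)


observed = sum(d for _, d in hospital.values())
expected = indirect_expected_events(hospital, standard)
oe_ratio = observed / expected  # observed-to-expected ratio
direct_rate = direct_standardized_rate(hospital, standard)

print(f"observed={observed}, expected={expected:.1f}, O/E={oe_ratio:.2f}")
print(f"directly standardized rate={direct_rate:.3f}")
```

In this toy case the hospital has more deaths than expected under either approach. The sketch omits the significance testing that actual report cards apply to such differences.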
With indirect standardization, one evaluates whether a hospital's observed number of adverse events is significantly greater than (or less than) what would be expected based on the characteristics of its patients, given the normative quality of care in the standard population. This evaluation is done either by estimating ratios or differences between observed and expected outcomes, using a patient-level model that omits hospital effects,8 or by estimating hospital effects within a hierarchical or mixed-level model.9 In either case, the choice of a standard population may be controversial. In a paper published in this issue of the Journal, Austin et al10 consider different “benchmarks” for performance profiling and use empirical data from the Ontario Myocardial Infarction Database to explore whether hospital report cards should “adjust for relatively immutable contextual characteristics in addition to patient characteristics.”
Consistent with previous studies (including at least 2 based on more detailed clinical data), Austin et al found that 30-day mortality after an acute myocardial infarction was lowest at teaching hospitals11, 12 and highest at small-volume nonteaching hospitals,13, 14 after adjusting for age, sex, 4 measures of cardiac severity, and 4 important comorbidities. The proportion of patients whose most responsible physician was a board-certified cardiologist had a marginally significant effect in the expected direction.15 Adjusting for these hospital characteristics reduced the number of low-mortality outlier hospitals from 4 to 1, and the number of high-mortality outlier hospitals from 3 to 0. This finding is not surprising; it is intuitively obvious that comparing hospitals with peer institutions will lead to a smaller group of outliers, on average, than comparing the same hospitals to all institutions in a state or province. Every teacher and professor deals with this problem when he or she decides whether to “curve” grades based on class (peer group) averages. If there is any independent association between hospital characteristics and outcomes, as multiple studies have demonstrated, then hospital-specific outcome rates are closer, on average, to their peer group mean than to the overall mean. It must be so.
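One way to see why "it must be so," at least in a squared-distance sense, is the standard decomposition of the total sum of squares. The notation is an assumption introduced here, not taken from Austin et al: $y_{gi}$ is the risk-adjusted outcome rate of hospital $i$ in peer group $g$, $n_g$ is the number of hospitals in that group, $\bar{y}_g$ is the peer group mean, and $\bar{y}$ is the unweighted mean across all hospitals.

```latex
\sum_{g}\sum_{i=1}^{n_g} \left(y_{gi} - \bar{y}\right)^2
  = \sum_{g}\sum_{i=1}^{n_g} \left(y_{gi} - \bar{y}_g\right)^2
  + \sum_{g} n_g \left(\bar{y}_g - \bar{y}\right)^2
  \;\ge\; \sum_{g}\sum_{i=1}^{n_g} \left(y_{gi} - \bar{y}_g\right)^2
```

Equality holds only when every peer group mean equals the overall mean, that is, when hospital characteristics have no association with outcomes. This identity concerns average squared deviations only; it does not by itself determine how many hospitals cross a particular statistical threshold for outlier status.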
The authors conclude from this analysis that “the choice of benchmark will significantly impact the conclusions of report cards,” and that “different stakeholders will have differing opinions as to the most appropriate benchmark.” Indeed, prior surveys of hospital administrators16 and collections of response letters from hospitals that are involved in public reporting programs17 confirm that many hospital leaders prefer to be evaluated against peer institutions rather than against the entire universe of competing hospitals. In my opinion, this argument is specious, for at least 3 reasons.
First, let us consider what is being compared. Austin et al evaluated the risk of death after a myocardial infarction, after adjusting for age, sex, 4 measures of cardiac severity, and 4 important comorbidities. Based on their Figure 2, the log odds difference of 0.57 between teaching hospitals and small-volume nonteaching hospitals translates into actual 30-day mortality of about 4.5% and 7.7% at these 2 sets of hospitals, respectively. This is a very important difference from the public health perspective. To borrow the widely used metric of “number needed to treat” from clinical epidemiology, only about 31 patients with acute myocardial infarction would need to be diverted from small-volume nonteaching hospitals to teaching hospitals to prevent 1 death in the initial 30-day period after treatment. This is not a trivial difference in an unimportant outcome that should be covered up through risk adjustment; instead, the poorer performance of small- and medium-volume nonteaching hospitals (and perhaps other “peer groups”) should be exposed for all to appreciate.
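The arithmetic behind these figures can be checked with a short sketch. It takes the 2 mortality rates quoted above as inputs and treats the "number needed to treat" as the reciprocal of the absolute risk difference, which is the usual definition; the variable names are illustrative only.

```python
import math


def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1 - p))


# 30-day mortality rates quoted in the text.
p_teaching = 0.045
p_small_nonteaching = 0.077

# Difference on the log odds scale.
log_odds_diff = logit(p_small_nonteaching) - logit(p_teaching)

# Absolute risk difference and its reciprocal.
absolute_risk_diff = p_small_nonteaching - p_teaching
nnt = 1 / absolute_risk_diff

print(round(log_odds_diff, 2))  # 0.57
print(round(nnt))               # 31
```

The 2 quoted mortality rates are therefore consistent with both the 0.57 log odds difference and the figure of about 31 patients.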
Second, if one supports peer-group benchmarking, then how should one define the relevant peer group? Who should make this decision, and on what basis? Report card sponsors that try to define peer groups may head down the proverbial “slippery slope,” pulled between hospitals arguing for ever narrower peer groups and consumer advocates arguing for broader peer groups. At the extreme, hospital leaders may argue that their hospital is unique (perhaps because it is the only small suburban nonteaching hospital with staff cardiologists in county X) and therefore cannot be compared with any other hospital. Taken to this logical extreme, peer group comparisons become uninformative and useless. I would argue that the only defensible peer group for a public reporting program is geographically defined to include hospitals that compete (or potentially compete) in the same market.
Finally, as Austin et al describe and their Figure 4 illustrates, the mortality distributions at different types of hospitals overlap substantially. The same point has recently been made in critiquing the Leapfrog Group's use of fixed volume thresholds for evaluating “evidence-based hospital referral” practices.18, 19 It is clearly possible for small- and medium-volume nonteaching hospitals to perform as well as or better than the average teaching hospital. Given that small- and medium-volume nonteaching hospitals are clearly capable of excellent performance, why shouldn't we hold every hospital fully accountable? Why should rural and small-volume hospitals be held to a lower standard, especially since many of these small hospitals compete directly against larger institutions in the market for hospital services? Why should teaching hospitals be held to a higher standard that would obscure the true benefit that the average teaching hospital offers to its patients and its community? Shouldn't hospitals that are performing below the community norm recognize that fact, even if they are performing at the same level as their peer group? Should public policy be driven by a perceived need to protect the fragile egos of administrators and physicians at low-performing hospitals? These are philosophical questions that cannot be answered using the empirical data presented by Austin et al.
In 2004, we as health care providers live in an era of accountability and continuous evaluation. Those of us who practice medicine are subject to the professional requirements of licensure, certification, and recertification. Surely no one would suggest that physicians should be licensed, certified, or recertified based on examination scores that are benchmarked to “peer group” colleagues who are of similar age, graduated from similar training programs, or work in similar practice settings. Why should hospitals be treated any differently?
References
1. Available at: http://www.oshpd.ca.gov/HQAD/HIRC/hospital/Outcomes/CABG/index.htm and http://www.oshpd.ca.gov/HQAD/HIRC/hospital/Outcomes/HeartAttacks/index.htm. Accessed May 10, 2004
2. Adult cardiac surgery in New York State 1998–2000; Percutaneous coronary interventions (angioplasty) in New York State, 1999–2001 report; Pediatric congenital cardiac surgery in New York State 1997–1999. Available at: http://www.health.state.ny.us/nysdoh/heart/heart_disease.htm. Accessed May 10, 2004
3. Cardiac surgery in 2000 in New Jersey: a consumer report. Available at: http://www.state.nj.us/health/hcsa/cabmenu.htm. Accessed May 10, 2004
4. Pennsylvania's guide to coronary artery bypass graft (CABG) surgery 2002. Available at: http://www.phc4.org/reports/cabg/02/default.htm. Accessed May 10, 2004
5. Enhanced Feedback for Effective Cardiac Treatment (EFFECT). Available at: http://www.ccort.ca/EFFECT.asp. Accessed May 10, 2004
6. Available at: http://www.healthgrades.com. Accessed May 10, 2004
7. 2003 Solucient 100 top hospitals: cardiovascular benchmarks for success. Available at: http://www.100tophospitals.com/media/cardio03_facts.asp. Accessed May 10, 2004
8. Specification issues in measurement of quality of medical care using risk-adjusted outcomes. J Econ Soc Meas. 2000;26:267–281
9. Statistical methods for profiling providers of medical care (issues and applications). J Am Stat Assoc. 1997;92:803–814
10. Austin PC, Alter DA, Anderson GM, et al. The impact of the choice of benchmark on the conclusions of hospital report cards. Am Heart J. 2004;148:1041–1046
11. Severity-adjusted mortality and length of stay in teaching and nonteaching hospitals (results of a regional study). JAMA. 1997;278:485–490
12. Relationship of hospital teaching status with quality of care and mortality for Medicare patients with acute MI. JAMA. 2000;284:1256–1262
13. The association between hospital volume and survival after acute myocardial infarction in elderly patients. N Engl J Med. 1999;340:1640–1648
14. Relationship between annual volume of patients treated by admitting physician and mortality after acute myocardial infarction. JAMA. 2001;285:3116–3122
15. Outcome of acute myocardial infarction according to the specialty of the admitting physician. N Engl J Med. 1996;335:1880–1887
16. The California Hospital Outcomes Project (how useful is California's report card for quality improvement?). Joint Comm J Qual Improve. 1998;24:31–39
17. Annual report of the California Hospital Outcomes Project. Volume one: study overview and results summary. Sacramento, Calif: Office of Statewide Health Planning and Development; 1993
18. Hospital coronary artery bypass graft surgery volume and patient mortality, 1998–2000. Ann Surg. 2004;239:110–117
19. The volume-outcome relationship (from Luft to Leapfrog). Ann Thorac Surg. 2003;75:1048–1058
PII: S0002-8703(04)00406-5
doi:10.1016/j.ahj.2004.06.012
© 2004 Elsevier Inc. All rights reserved.
Refers to article:
- Impact of the choice of benchmark on the conclusions of hospital report cards
