Grading quality of evidence and strength of recommendations in clinical practice guidelines
Journal compilation © 2009 Blackwell Munksgaard
Part 1 of 3. An overview of the GRADE approach and grading quality of evidence about interventions
The GRADE (Grades of Recommendation, Assessment, Development, and Evaluation) approach provides guidance to grading the quality of underlying evidence and the strength of recommendations in health care. The GRADE system’s conceptual underpinnings allow for a detailed stepwise process that defines what role the quality of the available evidence plays in the development of health care recommendations. The merit of GRADE is not that it eliminates judgments or disagreements about evidence and recommendations, but rather that it makes them transparent. This first article in a three-part series describes the GRADE framework in relation to grading the quality of evidence about interventions based on examples from the field of allergy and asthma. In the GRADE system, the quality of evidence reflects the extent to which a guideline panel’s confidence in an estimate of the effect is adequate to support a particular recommendation. The system classifies quality of evidence as high, moderate, low, or very low according to factors that include the study methodology, consistency and precision of the results, and directness of the evidence.

1 Department of Epidemiology, Italian National Cancer Institute Regina Elena, Rome, Italy; 2 Department of Medicine, Jagiellonian University School of Medicine, Krakow, Poland; 3 Department of Medicine and Social and Preventive Medicine, School of Medicine and Biomedical Sciences, State University of New York at Buffalo, Buffalo, NY, USA; 4 Iberoamerican Cochrane Center, Servicio de Epidemiología Clínica y Salud Pública, Hospital de Sant Pau, Barcelona, Spain; 5 Centro de Investigación Biomédica en Red de Epidemiología y Salud Pública (CIBERESP), Spain; 6 Allergy/Immunology Section, Respiratory Institute, Cleveland Clinic, Cleveland, OH, USA; 7 Department of Medicine, McMaster University, Hamilton, ON, Canada; 8 Department of Internal Medicine and Psychiatry, Duke University and Durham VA Medical Center, Durham, NC, USA; 9 Centre for Reviews and Dissemination, University of York, York, UK; 10 HTA-Zentrum, Universität Bremen, Bremen, Germany; 11 School of Population Health, Faculty of Medical and Health Sciences, University of Auckland, New Zealand; 12 Service des Maladies Respiratoires, Hôpital Arnaud de Villeneuve, Montpellier, France; 13 Inserm UMR 780, France; 14 Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, ON, Canada
Key words: clinical practice guidelines; evidence based medicine; grading.
Holger J. Schünemann MD PhD, Department of Clinical Epidemiology & Biostatistics, McMaster University Health Sciences Centre, Room 2C10B, 1200 Main Street West, Hamilton, ON L8N 3Z5, Canada
Accepted for publication 16 November 2008
Allergy 2009: 64: 669–677

When offered a diagnostic procedure or a treatment option, patients ask themselves and the clinicians taking care of them about the benefits and downsides of that choice. They ask: What will I gain? Will I feel better (reduced symptoms or morbidity and improved quality of life)? Will I live longer (reduced mortality)? Patients also ask: What will I lose? Is it safe and will I dislike some aspects related to the intervention (adverse events, burden – extra time and effort)? How much will it cost me? Thus, the decision to choose among the options depends on the balance between their desirable and undesirable consequences. This balance weighs not only what patients will gain or lose but also how much they will gain or lose (one can estimate it based on the evidence from current research), and how important are the gains and losses for them (patients’ values and preferences for the different outcomes and interventions).

The role of a clinician is not only to order a diagnostic test or prescribe a treatment, but also to advise patients – sometimes to decide for them – which of the available tests or treatments is likely to be most beneficial and [...]

As we cannot predict the future, we always have to make these decisions under uncertainty about the outcomes for a particular patient. Optimal decision-making requires informing these decisions with the best [...] experience of the effect of similar management of similar patients). Clinical practice guidelines can help clinicians and patients make these decisions but their application is not always easy as every patient is different.

Clinical practice guidelines offer recommendations for diagnostic procedures or treatment options for typical patients. They are ‘systematically developed statements to assist practitioner and patient decisions about appropriate health care for specific clinical circumstances’ (1). The purpose of guidelines is ‘to make explicit recommendations with a definite intent to influence what clinicians do’ (2). Clinical decisions – and also the related recommendations and their strength – depend on both the research evidence and the values and preferences of patients. For clinicians and patients to be confident that following these recommendations will do more good than harm, guidelines need to be evidence-based, transparent, and explicit about whose values and preferences were taken into account and how they influenced the final recommendations. Systematic approach, transparency, and explicitness also facilitate implementation, adaptation to local circumstances, and [...]

Table 1. Merits of the GRADE system for grading quality of evidence and strength of recommendations in comparison to other systems
1. Clear separation between quality of evidence and strength of recommendations*
2. Explicit and comprehensive criteria for downgrading or upgrading quality of evidence
3. Explicit consideration of the relative importance of various outcomes to patients
4. Explicit acknowledgement of values and preferences assumed when making recommendations
5. Transparent process of moving from evidence to recommendations
6. Explicit advice to make recommendations about the most appropriate course of action, even when very little evidence is available
7. Grading the strength only for recommendations about the diagnostic or therapeutic course of action, but not about prognosis or etiology
8. Clear and pragmatic interpretation of ‘strong’ and ‘weak’ recommendations
9. Balance between simplicity and methodological comprehensiveness
*In the context of clinical practice guidelines: Quality of evidence, the extent to which our confidence in an estimate of the treatment effect is adequate to support a particular recommendation; Strength of recommendation, the extent to which we can, across the range of patients for whom the recommendations are intended, be confident that desirable consequences of an intervention outweigh undesirable consequences (or vice versa, that undesirable consequences outweigh desirable ones – in this case one would recommend against this intervention).

Guideline panels develop recommendations on the basis of the balance between the desirable and the undesirable consequences of the diagnostic or therapeutic options in question. They will recommend the option that results in greater net benefit and recommend against the option that results in greater net loss. The strength of their recommendation will depend on the extent to which they can be confident that desirable effects outweigh undesirable effects, or vice versa. A systematic approach to grading the strength of recommendations can minimize bias and aid interpretation (4, 5). The Grades of Recommendation, Assessment, Development, and Evaluation (GRADE) working group has conducted a review of existing grading systems and developed a system for grading the quality of evidence and strength of recommendations that addresses shortcomings of prior systems (4, 6–9). The resulting GRADE system has a number of advantages over other grading systems (Table 1). These advantages are reflected in the increasing number of professional societies and organizations endorsing or using the GRADE system – examples include the American College of Chest Physicians (ACCP) (10), the American Thoracic Society (ATS) (7), the British Medical Journal (11), the Cochrane Collaboration (12), the Endocrine Society (13), the European Respiratory Society (ERS), Infectious Disease Society of America (IDSA), Surviving Sepsis Campaign (14), UpToDate® (15), and the World Health Organization (WHO) (16) among others (a comprehensive list of endorsing organizations is available at http://www.gradeworkinggroup.org). Most recently, the Allergic Rhinitis and its Impact on Asthma
(ARIA) guideline panel decided to use the GRADE system for the 2009 revision of the guidelines and to follow the GRADE approach in the future (17).

We suggest conceptualizing GRADE as a system of grading quality of evidence and also as a systematic and transparent approach to the process of developing recommendations for clinical practice including indicating the strength of these recommendations (Table 2).

In this series of three articles, we will present the GRADE approach to transparent development of evidence-based recommendations. In this article, we will start with a brief overview of GRADE approach and we will discuss grading the quality of available evidence supporting the recommendations about therapeutic interventions. In a second article, we will present the approach to grading the quality of available evidence about diagnostic strategies. In a third article, we will present the GRADE approach to formulating the recommendations, deciding on their strength, and suggested interpretation [...]

As many guidelines are adopting GRADE, including the 2009 revision of ARIA, it is important that allergists understand the underlying concepts. Therefore, this series is intended for clinicians especially interested in allergy, who want to be able to fully interpret the recommendations in guidelines developed following the GRADE approach. Whenever possible, we will use examples specific to the field of allergy and asthma.

For clinicians interested in more in depth review of the GRADE approach and system, we recommend a series of articles recently published in the British Medical Journal (18–22) or an even more detailed series for guideline developers that will be published in the Journal of Clinical Epidemiology, and the earlier papers (4, 7).

Table 2. An overview of steps followed during the development of an evidence-based [...]
Define the scope of the guidelines
Prioritize the problems (60)
Ask precise clinical questions
Decide on the relative importance of outcomes
Identify the existing evidence for every clinical question
Grade the quality of existing evidence for each outcome separately
Determine the overall quality of available evidence across outcomes
Decide on the balance between desirable and undesirable consequences
Formulate the recommendation reflecting its strength

Following the GRADE approach, one begins with formulating appropriate clinical questions that the recommendations would answer. As guidelines include recommendations about the most appropriate course of action, they should answer clinical management questions about diagnosis or treatment of disease, but not about prognosis or etiology. A clinical management question should have four components: patient population, intervention (diagnostic or therapeutic), alternative intervention (comparison), and the outcomes of interest (23). For instance, consider the following: in patients with persistent allergic rhinitis (patient population) should oral H1-antihistamines (intervention) vs no oral H1-antihistamines (alternative intervention) be used to improve quality of life, reduce symptoms, and minimize the [...]

There are potential problems arising at the stage of asking a clinical question. One is the failure to consider all relevant alternatives. This may be particularly important in international guidelines where treatment options vary for patients in many diverse jurisdictions. Two other closely related mistakes in formulating questions are the failure to include all patient-important outcomes, e.g. disregarding quality of life or adverse effects, and placing excessive emphasis on surrogate outcomes with questionable importance to patients such as pulmonary function or nasal airway resistance rather than objectively measured quality of life or symptoms.

Decide on the relative importance of outcomes

The GRADE approach asks guideline developers to make explicit judgments about the importance of each outcome for making a recommendation. GRADE demands that those making recommendations classify each of the outcomes of interest as either critical for making a recommendation, important but not critical, or not important (24). Because experts, clinicians, and patients differ in their preferences and how they value particular outcomes (25), input from those affected by the recommendation (i.e. patients, their families, or members of the public) should be sought if possible. For example, outcomes such as mortality, quality of life, or exacerbations of asthma might be considered critical, nasal symptoms judged by a physician or use of rescue medications – important but not critical, and peak expiratory flow or nasal eosinophilia – not important, but perhaps informative, for making a recommendation.

Identify the existing evidence for every clinical question

Every clinical question should then be answered based on a systematic review of the relevant evidence (26). Guideline developers can either conduct the systematic review themselves or identify an existing high quality systematic review. This systematic review will serve to create a summary of available evidence that a guideline panel can use to inform judgments about the balance of desirable and undesirable effects in order to develop a
recommendation. GRADE suggests that this summary of evidence is prepared in a structured format of evidence profile – a table including detailed judgments about quality of evidence and estimates of the effect for each outcome.

Table 3. Quality of evidence and the explanation of the categories
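
As a rough illustration of the question components, the importance classes, and the evidence profile described above, the structure could be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the class and field names (Importance, ProfileRow, ClinicalQuestion) are hypothetical and are not part of GRADE or of any GRADE software, and the importance assigned to each outcome simply mirrors the illustrative classification given in the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Importance(Enum):
    """The three importance classes a panel assigns to each outcome."""
    CRITICAL = "critical"
    IMPORTANT = "important but not critical"
    NOT_IMPORTANT = "not important"


@dataclass
class ProfileRow:
    """One row of a hypothetical evidence profile (one outcome per row)."""
    outcome: str
    importance: Importance
    quality: Optional[str] = None  # "high", "moderate", "low" or "very low"; None until graded
    effect: Optional[str] = None   # effect estimate from the systematic review, once available


@dataclass
class ClinicalQuestion:
    """The four components of a clinical management question."""
    population: str
    intervention: str
    comparison: str
    outcomes: List[ProfileRow] = field(default_factory=list)

    def critical_outcomes(self) -> List[str]:
        """Names of the outcomes classified as critical."""
        return [row.outcome for row in self.outcomes
                if row.importance is Importance.CRITICAL]


# Example question taken from the text; importance ratings are illustrative only.
question = ClinicalQuestion(
    population="patients with persistent allergic rhinitis",
    intervention="oral H1-antihistamines",
    comparison="no oral H1-antihistamines",
    outcomes=[
        ProfileRow("quality of life", Importance.CRITICAL),
        ProfileRow("nasal symptoms judged by a physician", Importance.IMPORTANT),
        ProfileRow("nasal eosinophilia", Importance.NOT_IMPORTANT),
    ],
)

print(question.critical_outcomes())
```

The quality and effect fields are left empty on purpose: in the process outlined in Table 2 they would be filled in only after the systematic review has been identified and each outcome has been graded separately.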
The recommendation and its strength depend not only on
the best estimates of the expected benefits and downsides,
but also on the confidence in these estimates. If we know
the best estimates of the magnitude of the effects, but we
have no confidence in these estimates (i.e. we do not
‘believe’ in them), it is very difficult to determine the
balance of desirable and undesirable consequences.
One of the factors that influence our confidence in the
estimates of treatment effects is the quality of supporting
evidence – the higher it is, the more confidence we have in
these estimates. Formal grading of the quality of evidence
and its explicit consideration are essential to the process
of developing recommendations. The examples of errors
arising from disregarding the quality of supporting
evidence are abundant in the modern history of medicine.
Consider the treatment of patients with myocardial infarction. For about a decade, experts made recommendations ignoring the high quality evidence about the benefit from thrombolysis or the lack of benefit, and possibly even harm, from routine administration of antiarrhythmic agents in the early postmyocardial infarction period (27). They based their judgments on pathophysiological considerations, such as reduction in the frequency of arrhythmia that failed to recognize higher quality evidence focusing on patient important outcomes including mortality. In the field of allergy and asthma, there are less dramatic examples. As an illustration of misleading conclusions from relying on lower quality evidence, one might consider a systematic review of observational studies assessing the effect of inhaled or oral corticosteroids on height in children with asthma concluding that the use of inhaled beclomethasone dipropionate was not associated with diminished stature (28). However, a recent systematic review restricted to randomized trials found a statistically significant decrease in linear growth velocity in children with mild to moderate asthma treated with moderate doses of beclomethasone (29). A formal system of grading the quality of evidence provides a strategy to clarify how reliable is the [...] decreasing the risk of repeating the types of mistakes [...]

In the context of clinical practice guidelines, the GRADE system defines quality of evidence as the extent to which our confidence in an estimate of the treatment effect is adequate to support a particular recommendation. The GRADE system specifies four grades of evidence: high, moderate, low, and very low quality (Table 3). Acknowledging that the quality of evidence is in fact a continuum and any categorization involves arbitrariness and the possibility of oversimplification, we think that clarity, transparency, and intuitive understanding of the four categories outweigh these limitations.

Note to Table 3: *The examples are not comprehensive. See text for criteria to downgrade or upgrade the quality of evidence.

Table 4. Factors influencing the quality of evidence
Study design (experimental vs observational)
Limitations in study design and/or execution
Inconsistency of results
Indirectness of evidence
[...]
Factors that can increase the quality of evidence
[...]
All plausible confounding may be working to reduce the demonstrated effect or increase the effect if no effect was observed
[...]

Study design. Earlier systems of grading the quality of evidence relied almost exclusively on overall study design (e.g. randomized trials vs observational studies). In the GRADE system, study design remains a critical, but not a sole factor in judging the quality of evidence (Table 4). [...] options, randomized trials provide, in general, far stronger evidence than observational studies, yet rigorous observational studies provide far stronger evidence than uncontrolled case series. Therefore, in the GRADE system, a body of evidence obtained from randomized trials is initially rated as high quality, and that obtained
from observational studies – as low quality. For example, a systematic review of the effect of using feather bedding in the control of asthma symptoms identified no randomized trial addressing this clinical question (30). The only available evidence indicating that more frequent wheezing is associated with nonfeather pillows comes from two case–control studies that found a 20% rise in the population prevalence odds of wheezing from 1978 to 1991, and identified an increase (from 44% to 67%) in the use of nonfeather pillows as the only domestic indoor exposure that appeared to explain this (31). The initial rating for this evidence would be low quality.

In the GRADE system, ‘expert opinion’ is not a category of quality of evidence, but an interpretation of existing evidence. Therefore, expert opinion is nearly always necessary to integrate and contextualize evidence, either from a clinical or from a methodological viewpoint.

A well-designed and executed randomized trial or observational study provides different quality evidence than the one that was poorly conducted. Therefore, relying on study design alone has apparent limitations. GRADE provides additional quality criteria that serve to overcome this shortcoming. We have identified five factors that can reduce the quality of evidence for each study design and three that can increase it (Table 4).

Limitations in study design and/or execution (risk of bias). Quality of evidence initially rated based on study design decreases when studies suffer from major methodological limitations that can bias their estimates of the treatment effect. These limitations include lack of allocation concealment, lack of blinding – particularly if outcomes are subjective and their assessment is highly susceptible to bias, lack of accounting for a large proportion of patients who started the study (large loss to follow-up or outcome not measured in a large proportion of patients), failure to adhere to the intention-to-treat principle during the analysis, stopping early for benefit, or selectively reporting outcomes that show an apparent treatment effect and failing to report other outcomes that show no evident effect (32–36). For example, the evidence for the effect of sublingual immunotherapy in children with allergic rhinitis on the development of asthma comes from a single randomized trial with no description of randomization, concealment of allocation, type of analysis, no blinding, and 21% of children lost to follow-up (37). These very serious limitations would warrant downgrading the quality of evidence by two levels (i.e. from high to low). In another example, a systematic review showed that the family therapy for children with asthma improved outcomes such as daytime wheeze and the number of functionally impaired days. However, allocation was clearly not concealed in one of the two included trials and unclear whether it was concealed in the second trial (38). This limitation might warrant downgrading the quality of [...]

Inconsistency of results. Widely differing estimates of the treatment effect across individual studies (variability or heterogeneity of results) suggest true differences in underlying treatment effects (39). Authors of a systematic review should try to identify plausible explanations for inconsistent results but, if they do not succeed, the quality of evidence decreases. Variability in individual study results may arise from clinical differences in populations (e.g. drugs may have larger relative effects in sicker patients), interventions (e.g. larger effects or larger side-effects with higher drug doses), and outcome measures (e.g. differences in the definition of ‘response to treatment’), or from methodological limitations such as problems with randomization, early termination of trials, or publication bias (40, 41). For example, a systematic review of subcutaneous allergen-specific immunotherapy in adults with allergic rhinitis found inconsistency in the effect of treatment on nasal symptoms that was suggested both by visual examination of forest plot and statistical tests. Despite the effort to find a reason for this heterogeneity, authors of the systematic review were not able to explain it (42). In another example, a systematic review showed that ketotifen reduced the use of bronchodilators in children with mild to moderate asthma. However, there was significant heterogeneity among the results of individual trials (I² = 76.1%) (43). Subgroup and sensitivity analyses explained this heterogeneity – the effect was stronger in school children than in infants or preschool children (differences in populations) and it disappeared in trials with adequate blinding (differences [...]

Indirectness of evidence. GRADE distinguishes two types of indirectness – indirect comparisons and differences in populations, interventions, and outcomes of interest between the studies (existing evidence) and the scope of the recommendation (clinical question).

Indirect comparison arises when, for instance, the recommendation addresses the choice between two active drugs: A or B, but the available studies compared A vs placebo and B vs placebo. Such trials allow indirect comparison of the magnitude of the effect of A vs B. Such an indirect comparison provides lower quality evidence than a head-to-head comparison of A vs B would provide. This type of indirectness is common when choosing between the drugs within the same class (e.g. long acting β-agonists, oral or topical H1-antihistamines, allergen extracts for immunotherapy, etc.). As an illustration, one [...] patients with severe allergic rhinitis. Systematic review of the studies in seasonal allergic rhinitis showed a consistent small to large effect of subcutaneous allergen-specific immunotherapy (SCIT) compared with placebo on symptoms of allergic rhinitis, ocular symptoms, and quality of life (42). Another systematic review showed that sublingual allergen-specific immunotherapy (SLIT) is also effective in reducing symptoms and medication
Table 5. Sources of likely indirectness of evidence
Clinical question | Source of indirectness
Early administration of systemic corticosteroids in the emergency department to treat acute exacerbations [...] | Both oral and intravenous routes are effective but there is no direct comparison of these two routes of [...]
Oral H1-antihistamines for improving quality of life in adults with asthma and concomitant allergic rhinitis | In the only study that measured quality of life, 60% of patients had a past history of asthma but no symptoms [...]
Ketotifen for long-term control of symptoms and wheeze [...] | Inhaled corticosteroids, the mainstay of therapy of asthma nowadays, were allowed as an additional intervention in 60% of trials assessing ketotifen. There was not enough information to assess the effect of ketotifen as an add-on therapy in children with [...]
Anti-leukotrienes plus inhaled glucocorticosteroids vs inhaled glucocorticosteroids alone to prevent asthma exacerbations and nighttime symptoms in patients with [...] | Trials that measured asthma exacerbations and nighttime symptoms did not include patients with allergic rhinitis
Avoidance of pet allergens in nonallergic infants or preschool children to prevent development of allergy | Available studies used multifaceted interventions directed at multiple potential risk factors in addition [...]
Oral decongestant as a rescue medication in patients [...] | Available studies used oral decongestants administered regularly, but none investigated their use as a rescue medication for quick alleviation of the symptoms
Intranasal glucocorticosteroids vs oral H1-antihistamines in children with seasonal allergic rhinitis | In the available study, parents were rating the symptoms and quality of life of their teenage children, instead [...]
requirements in these patients (44); however, the magnitude of the benefit achieved with SLIT compared with that of SCIT is not clear, because they have been compared directly in only very few studies (45, 46).

Evidence supporting the recommendation is also indirect when it comes from studies in which population, intervention, alternative intervention, or outcomes of interest were different from those that the recommendation [...]

Imprecision of results. When studies include relatively few patients and few events occur, estimates of the effect usually have wide confidence intervals that include both important benefits or no important effects (or even important harm). With such indeterminate results, one can judge the quality of the evidence lower than one otherwise would, because of resulting uncertainty in the effect. For instance, observational studies examining the impact of exclusive breast feeding on the development of allergic rhinitis in high-risk infants showed a relative risk of 0.87 (95% CI: 0.48–1.58) that rules out neither important benefit nor important harm (47).

Publication bias. The quality of evidence will be reduced when investigators fail to report (publish) studies they have undertaken – typically those that showed no effect. Unfortunately, one must often guess about the likelihood of publication bias. The risk of publication bias is higher when only few small studies are available (48–50). For example, a systematic review of topical treatments for seasonal allergic conjunctivitis showed that patients using topical sodium cromoglicate were more likely to perceive benefit than those using placebo. However, only small trials reported clinically and statistically significant benefits of active treatment, while a larger trial showed a much smaller and a statistically not significant effect (51). These findings suggest that smaller studies demonstrating smaller effects might not have been published.

Evidence supporting a particular recommendation can suffer from more than one of the above limitations, and the more serious they are, the lower the quality of the evidence is. For example, randomized trials of high efficiency particulate air filters in patients with perennial allergic rhinitis suffered very serious limitations in design (warranting downgrading by 2 levels) and the results were [...]

The GRADE system offers three criteria that, when fulfilled, can increase the quality of evidence. They are infrequently applicable, but are the most common reason for upgrading the quality of evidence from well-performed observational studies that without these additional merits would provide only low quality evidence.

Large magnitude of effect. On rare occasions, when studies yield large or very large estimates of the magnitude of the effect, one may be more confident about the results. Based on modeling studies that provide estimates of the magnitude of effect that is very unlikely to be explained by bias (53, 54), the GRADE system defines a large effect as a relative risk (RR) of >2.0 or <0.5 (based
on consistent evidence from at least two studies, with no plausible confounders) and a very large effect as a RR of >5.0 or <0.2 (based on direct evidence with no major threats to validity). For example, the extremely large and consistent effect of epinephrine injections in anaphylactic shock leaves us convinced of the benefits of the intervention [...]

All plausible confounding would reduce the demonstrated effect or increase it if no effect was observed. On rare occasions, all plausible biases may be working to underestimate the true treatment effect. For instance, if only sicker patients receive an experimental intervention, yet they still fare better than patients not receiving it do, it is likely that the actual effect may be larger than the data suggest. There are few examples so far in the literature and we were not able to identify one in the field of asthma and allergy. However, consider a systematic review of observational studies that included 38 million patients, which demonstrated higher death rates in private for-profit vs private not-for-profit hospitals (55). Biases related to different disease severity in patients in the two hospital types, and the spill-over effect from well-insured patients would both lead to estimates in favor of for-profit hospitals (56). One might therefore consider the evidence from these observational studies higher than low quality. Because the plausible biases would all diminish the demonstrated intervention effect, one might consider the evidence from these observational studies as moderate rather than low quality. A parallel situation exists when observational studies have failed to demonstrate an association but all plausible biases would have increased an intervention effect. This situation will usually arise in the exploration of apparent harmful effects. For example, because the hypoglycemic drug phenformin causes lactic acidosis, the related agent metformin is under suspicion for the same toxicity. Nevertheless, very large observational studies have failed to demonstrate an association (57). Given the likelihood that clinicians would be more alert to lactic acidosis in the presence of the agent and over-report its occurrence, one might consider this moderate or even high quality evidence refuting a causal relationship between typical therapeutic doses of metformin [...]

Dose-response gradient. The presence of a dose-response [...] illnesses in early childhood when exposed to second-hand smoke from parental smoking. Moreover, the greater the exposure, the higher was the risk. While grading the quality of available evidence supporting the recommendation to reduce second-hand smoke in children, one might consider these results as indirect evidence of benefit from reducing the second-hand tobacco smoke exposure and initially rate it as low quality evidence from observational studies that is downgraded to very low because of indirectness (evidence of increased risk with increased exposure rather than of benefit with reduced exposure). The observed dose-response gradient would justify upgrading the quality of evidence back to low.

Determine the overall quality of evidence across outcomes

Each recommendation depends on the evidence about outcomes identified when asking clinical questions and regarded as important to patients. Following the GRADE process, those making recommendations first grade the quality of available evidence supporting each outcome separately. Subsequently, they specify the overall quality of evidence across these multiple outcomes, because guidelines provide a single grade of quality of evidence for each recommendation. For any recommendation, when the quality of evidence differs across outcomes, the GRADE system demands that the lowest grade of quality of available evidence for any of the outcomes deemed critical determines the overall quality of evidence supporting this recommendation. For example, based on a systematic review of monoclonal anti-IgE for chronic asthma in adults and children (58), one might grade the quality of evidence about asthma symptoms, exacerbations, and quality of life as high, but the quality of evidence about adverse effects as moderate. Consequently, the overall quality of evidence supporting the recommendation about the use of this treatment would be moderate.

The GRADE approach provides a comprehensive, explicit, and transparent methodology for grading the quality of evidence and strength of recommendations
about the management of patients. GRADE classifies the
gradient may also increase oneÕs confidence in the findings
quality of available evidence as high, moderate, low or
and thereby increase the quality of evidence. Most
very low. Although judgments are required at every step
evidence for dose-response gradient in the treatment of
of guideline development, a systematic and explicit
allergic diseases comes from well-performed randomized
approach to grading the quality of evidence facilitates
trials that do not require upgrading. However, consider
scrutiny and transparency of these judgments.
the following example: there are no studies of interven-
In the next article in this series, we will discuss the
tions aimed at reduction of second-hand tobacco smoke
GRADE approach to making recommendations about
exposure that examined development of asthma or
diagnostic methods in more detail and we will highlight
wheeze in children. On the other hand, observational
the differences in grading the quality of available evidence
studies found an increased risk of developing wheeze
between therapeutic and diagnostic interventions.
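
As an illustration only, the two grading rules worked through in the text (moving a body of evidence down or up the four-level scale, and taking the lowest grade among critical outcomes as the overall grade) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not part of the GRADE system itself: the names Quality, adjust and overall_quality are hypothetical, the one-level-per-factor shift is a simplification, and treating adverse effects as a critical outcome in the anti-IgE example is our assumption. In practice each step remains a judgment of the guideline panel.

```python
from enum import IntEnum
from typing import Dict, Tuple


class Quality(IntEnum):
    """The four GRADE quality-of-evidence levels, ordered lowest to highest."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


def adjust(start: Quality, downgrades: int = 0, upgrades: int = 0) -> Quality:
    """Shift a starting grade by whole levels (hypothetical helper).

    Simplification: the net shift is applied first and the result is then
    clamped to the four-level scale.
    """
    level = int(start) - downgrades + upgrades
    return Quality(max(int(Quality.VERY_LOW), min(int(Quality.HIGH), level)))


def overall_quality(outcomes: Dict[str, Tuple[Quality, bool]]) -> Quality:
    """Return the lowest grade among the outcomes flagged as critical.

    `outcomes` maps an outcome name to (grade, is_critical).
    """
    critical = [grade for grade, is_critical in outcomes.values() if is_critical]
    if not critical:
        raise ValueError("at least one outcome must be deemed critical")
    return min(critical)


# Second-hand smoke example: observational evidence starts at low, is
# downgraded one level for indirectness, then upgraded one level for the
# dose-response gradient.
second_hand_smoke = adjust(Quality.LOW, downgrades=1, upgrades=1)

# Anti-IgE example: grades per outcome as given in the text; marking every
# outcome as critical is an assumption made for this sketch.
anti_ige = {
    "asthma symptoms": (Quality.HIGH, True),
    "exacerbations": (Quality.HIGH, True),
    "quality of life": (Quality.HIGH, True),
    "adverse effects": (Quality.MODERATE, True),
}

print(second_hand_smoke.name)           # LOW
print(overall_quality(anti_ige).name)   # MODERATE
```

Representing the levels as an ordered enumeration keeps the "lowest grade wins" rule a plain minimum, and the explicit critical flag makes visible which outcomes the panel allowed to drive the overall grade.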
© 2009 The Authors. Journal compilation © 2009 Blackwell Munksgaard. Allergy 2009: 64: 669–677
Glasziou P, Jaeschke R, Vist GE et al.
11]; Available at: http://resources.
Higgins JPT, Deeks JJ, Glasziou P et al.
Vist GE, Bellamy R, Stockman L et al.
tation. Health Res Policy Syst 2006;4:25.
atrial fibrillation: observational study.
Bion J, Parker MM, Jaeschke R et al.
severe sepsis and septic shock: 2008. Crit
15. UpToDate. Editorial Policy. 2006 [cited
6. Atkins D, Eccles M, Flottorp S, Guyatt
16. World Health Organization. Guidelines
29. Sharek PJ, Bergman DA, Ducharme F.
mendations. BMJ 2008;336:1049–1051.
Jaeschke R, Helfand M, Liberati A et al.
What is ‘‘quality of evidence’’ and why
R, Falck-Ytter Y, Alonso-Coello P et al.
dence in clinical guidelines: report from
seasonal allergic conjunctivitis: system-
atic review and meta-analysis of efficacy
41. Fletcher J. What is heterogeneity and is
53. Bross ID. Pertinency of an extraneous
44. Wilson DR, Torres LI, Durham SR.
45. Mungan D, Misirligil Z, Gurbuz L.
mite-sensitive patients with rhinitis and
asthma – a placebo controlled study.
Canadian Institutes of Health Research.
systematic review and meta-analysis.
ysis of prospective studies. Acta Paediatr
chronic asthma in adults and children.
and dissemination of clinical research. J
Advanced topics in systematic reviews.
Priority setting. Health Res Policy Syst2006;4:14.