Every fortnight CREBP hosts a journal club on either a clinical or a methodological topic. The outcomes of the group discussions are summarised below:
Pros and cons of Hormone Replacement Therapy (HRT) in women – facilitated by Associate Professor Jane Smith
Main menopause symptoms
- Irregular, light, heavy or prolonged periods
- Hot flushes/sweats
- Disturbed sleep
- Dry vagina and uncomfortable sex
- Loss of libido
- More frequent and urgent need to pass urine
- Joint and muscle pain
- Headaches, migraine
- Crawling sensation on the skin
- Tiredness, irritability
- Memory loss, difficulty concentrating
HRT relieves menopausal symptoms – it can be taken orally, transdermally, as an implant, or vaginally. Vaginal and transdermal preparations are the safest options.
Women with severe symptoms (approximately 20%) can benefit from use of HRT.
In the clinical discussion we mainly considered two papers on the harms and benefits of HRT, but also referred to a Cochrane review.
The original Women's Health Initiative (WHI) RCT enrolled 16,000 postmenopausal women who were started on oestrogen and progestin at an average age of 63 years. They were followed up for about 5 years, at which time the number of breast cancers reached the threshold to halt the trial. The hormones used were conjugated equine estrogens and medroxyprogesterone.
It found that use of HRT in this cohort increased the relative risk of stroke, heart disease and breast cancer by 25-40% after 5 years of use. It also reduced the risk of colorectal cancer and hip fracture by 34-36%. The risk of pulmonary embolism and VTE was more than doubled from the start of use.(1)
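To put relative risks like these in context, here is a minimal sketch of how a relative risk translates into an absolute risk difference. The baseline event rate below is entirely hypothetical (chosen only for illustration); the WHI results are summarised here in relative terms only.

```python
# Convert a relative risk into an absolute risk difference.
# The baseline rate is HYPOTHETICAL, for illustration only.

def absolute_effect(baseline_risk, relative_risk):
    """Return (risk under treatment, absolute risk difference)."""
    treated_risk = baseline_risk * relative_risk
    return treated_risk, treated_risk - baseline_risk

# Assume a 2% five-year baseline risk and a 26% relative increase
treated, diff = absolute_effect(0.02, 1.26)
print(f"treated risk = {treated:.4f}, absolute increase = {diff:.4f}")
# -> treated risk = 0.0252, absolute increase = 0.0052
```

On this assumed baseline, a 26% relative increase corresponds to roughly 5 extra cases per 1,000 women over five years, which is why the same trial can sound alarming in relative terms and modest in absolute terms.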
The second paper was an RCT of 1,000 women randomised to start HRT from the age of menopause (an average age of 50 years) and followed up for 10 years. Its findings were very different: a reduced risk of death, heart failure and myocardial infarction, without any apparent increase in the risk of cancer, venous thromboembolism or stroke. The hormones used were 17β-oestradiol and norethisterone.(2)
The marked difference between these results has created debate about their veracity, and a Cochrane review on the long-term safety of HRT did not include the second study because its control group received no placebo. The Cochrane review found harms, similar in nature to the WHI results, starting from 1 year of HRT use.
HRT works to relieve symptoms and reduces fracture risk, but it is not without harms, which increase with duration of use. Any long-term continuation therefore needs to be carefully reviewed on an individual basis, with short-term use the safest option if and when HRT is required.
- Rossouw JE, Anderson GL, Prentice RL, LaCroix AZ, Kooperberg C, Stefanick ML, et al. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results From the Women’s Health Initiative randomized controlled trial. JAMA. 2002;288(3):321-33. Epub 2002/07/19.
- Schierbeck LL, Rejnmark L, Tofteng CL, Stilgren L, Eiken P, Mosekilde L, et al. Effect of hormone replacement therapy on cardiovascular events in recently postmenopausal women: randomised trial. BMJ. 2012;345:e6409. Epub 2012/10/11.
The impact of ordering investigations and getting normal test results on patients’ well-being and health-seeking behaviour – facilitated by Associate Professor Jane Smith
Background to the Issues
In primary and secondary care, tests are often ordered for patients without any clear diagnosis or specific physical findings, as a triage tool “just in case” something serious is going on. There is a presumption that ruling out a diagnosis by getting normal results back is a good thing.
But most patients with vague symptoms are unlikely to have a serious illness.
Clinicians commonly order diagnostic tests for patients with vague symptoms such as tiredness; one study showed that the majority of “tired patients” had tests done, but the results were abnormal in only 3% of them.(1)
This suggests that testing patients without specific diagnostic symptoms or signs is unlikely to show abnormal results or give a diagnosis.
But do normal results at least reassure patient and doctor alike that nothing serious is going on and that the patient is healthy?
The Paper: Reassurance after diagnostic testing with a low pretest probability of serious disease
This is a systematic review and meta-analysis of 14 RCTs published about the impact of normal diagnostic test results on patients’ illness worry, anxiety, ongoing symptoms, and health seeking behaviour.
The time spans analysed were less than 3 months (short-term emotional relief) and more than 3 months (long-term cognitive relief).
Inclusion criteria for participation were patients with a low risk of disease.(2)
Investigations included endoscopy and/or H. pylori testing (for dyspepsia), ECG, blood tests or continuous event monitoring (for chest pain or palpitations respectively), and imaging (for back pain or headaches).
Contrary to our clinical behaviour and beliefs, the results suggest that normal test results provide NO reassurance to patients; in fact, some studies suggested that anxiety increased.
Regarding health-seeking behaviour after investigations, the only changes found were that if 16 patients with dyspepsia were endoscoped, or 26 patients with low back pain had X-rays, there would be one less visit to the doctor. Spending $4,000 to $16,000, plus irradiation, to save $40-$100 is a false economy.
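The false-economy arithmetic above can be sketched as follows. The per-test unit costs here are assumptions for illustration; only the number-needed-to-test figures (16 endoscopies or 26 X-rays per visit avoided) come from the review.

```python
# Cost of avoiding one doctor visit through low-yield testing.
# Unit costs are ASSUMED for illustration; only the tests-per-visit-avoided
# figures come from the review discussed above.

def cost_per_visit_avoided(tests_needed, cost_per_test):
    """Total spend on testing required to avoid a single follow-up visit."""
    return tests_needed * cost_per_test

endoscopy_spend = cost_per_visit_avoided(16, 1000)  # 16 endoscopies at an assumed $1,000 each
xray_spend = cost_per_visit_avoided(26, 250)        # 26 X-rays at an assumed $250 each
visit_saved = 100                                   # upper estimate of one avoided GP visit

print(endoscopy_spend, xray_spend, visit_saved)  # -> 16000 6500 100
```

Even with the cheapest assumed unit costs, the spend per avoided visit dwarfs the value of the visit itself, which is the review's point.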
Conclusions: Doing less achieves more.
- Gialamas A, Beilby JJ, Pratt NL, Henning R, Marley JE, Roddick JF. Investigating tiredness in Australian general practice. Do pathology tests help in diagnosis? Aust Fam Physician. 2003;32(8):663-6. Epub 2003/09/17.
- Rolfe A, Burton C. Reassurance after diagnostic testing with a low pretest probability of serious disease: systematic review and meta-analysis. JAMA Intern Med. 2013;173(6):407-16. Epub 2013/02/27.
Chronic Back Pain – facilitated by Dr James Dickinson
The two journal articles discussed were:
- Antibiotic treatment in patients with chronic low back pain and vertebral bone edema (Modic type 1 changes): a double-blind randomized clinical controlled trial of efficacy Reference: Hanne B. Albert; Joan S. Sorensen; Berit S. Christensen; Claus Manniche. Eur Spine J (2013) 22:697-707
- Does nuclear tissue infected with bacteria following disc herniations lead to Modic changes in the adjacent vertebrae? Reference: Hanne B. Albert; Peter Lambert; Jess Rollason; Joan S. Sorensen; Tony Worthington; Mogens B. Pedersen; Hanne S. Norgaard; Ann Vernallis; Federik Busch; Claus Manniche; Tom Elliot. Eur Spine J (2013) 22:690-696
The journal club reviewed the trial, which has excited controversy because it was heralded in the press with stories of a ground-breaking new cure for back pain and suggestions that the author should be in the running for a Nobel Prize. Others, not surprisingly, were skeptical (McCartney, BMJ blog).
The components of the trial were:
P Chronic back pain of 6 months’ duration, after previous disc herniation, with adjacent bone edema (Modic type 1 changes)
I Antibiotic amoxicillin/clavulanate three times daily for 10 weeks (main analysis; groups were divided into high and low dose: half received amoxicillin 500 mg/clavulanate 125 mg, while half received double that dose)
C Placebo, in identical capsules, same dose
O Disease-specific disability and lumbar pain
The inclusion criteria were strict. There was consecutive enrolment of eligible patients.
Modic changes: high reliability.
Randomisation was performed by a central lab and was blinded to clinicians and patients. We observed that the numbers randomised were uneven (90 active and 72 placebo) with no explanation, yet the two subgroups for high and low dose were exactly even. We wondered whether the uneven split was due to some design feature, perhaps the dose-response component of the study.
We noted that the dose-response component was added late in the design, at the request of the Danish research council, but was then not analysed. We found it disturbing that the Research Council should ask for such a change, especially if it interfered with the power of the study. However, we thought the researchers should have performed the full analysis, which would not have taken much more effort or space than what they did.
The allocation to groups was balanced, except that the active group had slightly worse disease, as measured by a lower proportion with only type 1 changes, and “non-significant” differences for other measures.
Assessment of patients was by self-report questionnaires. These were obtained at the clinic visits, and patients were not allowed to leave until they were complete, giving a 100% completion rate at the baseline visit, 91% at 100 days, and 90% at the end of one year. X-ray and MRI changes were noted, but the main outcome was the improvement in patient-important pain outcomes, including days away from work. These included the Roland Morris Disease-specific Questionnaire and the Lumbar Pain Rating Scale. To our group they seemed appropriate.
Outcomes: There was a high completion rate (90%). It was odd that two patients were dropped because they were >65 years old. Patients reported gradual improvement over 6-8 weeks, then further improvement in the period after the drugs had ceased, which is biologically plausible.
There were more GI side effects in the antibiotic group, as expected.
There was a trend towards greater effect for double dose antibiotics.
The conclusions are appropriately cautious: they do not encourage widespread use, only among those who fit the criteria.
While the results of the trial are impressive, there was some concern about the biological mechanism and plausibility of the approach. The second paper, describing the bacteriological research behind the theory does have convincing evidence that bacteria do inhabit the discs.
Overall we decided that the trial is convincing. We would like to be sure about what happened in the randomisation: JD was deputed to write to the author.
PG felt that the result needs replication, and will write to the UK ???? to suggest this is an important topic for contracted research.
Various studies show that disc material is infected with Propionibacterium acnes.
An uncontrolled study showed improvement with amoxicillin.
Bacteria were found in about 50% of discs.
Influence of Circadian Time of Hypertension Treatment on Cardiovascular Risk: Results of the MAPEC Study – facilitated by Associate Professor Jane Smith
Reference: Ramón C. Hermida, Diana E. Ayala, Artemio Mojón, and José R. Fernández
Chronobiology International, 27(8): 1629–1651, (2010)
This paper reported on 2,156 hypertensive men and women with a mean age of 55 ± 13 years.
The intervention was the time of day that antihypertensive medication was taken.
Half the subjects were randomly allocated to taking their antihypertensive tablets at night and half in the morning. The groups were otherwise matched, and only about 20 from each group were lost to follow-up.
Drugs used included ACE inhibitors, ARBs, calcium channel blockers, beta blockers and thiazides.
Follow-up was for up to 8 years.
Summary: Taking blood pressure treatment at night led to lower BP, better control, and a significantly lower incidence of myocardial infarction, angina, coronary revascularisation, strokes and TIAs. Grouping all these together, the relative risk was approximately halved by changing the timing of tablets to night time.
Take-home message: taking blood pressure treatments at night halves the relative risk of having a significant vascular event.
Comparisons of RCT and preference-based interventions – facilitated by Associate Professor Justin Keogh.
Methodological Question: What research questions might be best addressed by RCTs, and which by preference designs, in which at least some participants get a choice of the intervention they are allocated to?
Reference: Janevic, M. R., Janz, N. K., Dodge, J. A., Lin, X., Pan, W., Sinco, B. R., & Clark, N. M. (2003). The role of choice in health education intervention trials: a review and case study. Social Science and Medicine, 56(7), 1581-1594.
Abstract: Although the randomized, controlled trial (RCT) is considered the gold standard in research for determining the efficacy of health education interventions, such trials may be vulnerable to “preference effects”; that is, differential outcomes depending on whether an individual is randomized to his or her preferred treatment. In this study, we review theoretical and empirical literature regarding designs that account for such effects in medical research, and consider the appropriateness of these designs to health education research. To illustrate the application of a preference design to health education research, we present analyses using process data from a mixed RCT/preference trial comparing two formats (Group or Self-Directed) of the “Women take PRIDE” heart disease management program. Results indicate that being able to choose one’s program format did not significantly affect the decision to participate in the study. However, women who chose the Group format were over 4 times as likely to attend at least one class and were twice as likely to attend a greater number of classes than those who were randomized to the Group format. Several predictors of format preference were also identified, with important implications for targeting disease-management education to this population.
There is a general belief that lifestyle factors like physical activity and a good diet are important for health and in reducing the symptoms and/or rates of many chronic conditions. However, many people are still insufficiently active and/or consume an inadequate diet. In addition, many participants in trials involving physical activity or dietary interventions do not complete the intervention. The question then becomes: is the RCT always the best design for an intervention study, and if not, how can the preference design be used in some research and/or practice contexts to improve these outcomes? For example, a GP might encourage a patient to be more physically active and wish to determine what form of activity the patient would most like to do, so as to maximize the potential that the patient will adhere to this activity in the medium and long term.
- Initial discussions centred on the relative advantages and disadvantages of the RCT and preference design. These focused on the better internal validity of the RCT and the better external validity of the preference design. This discussion then moved into comparing the various types of preference designs and what loss of internal validity by using a preference design would be acceptable.
- Based on these relative strengths and weaknesses, it was felt that these two approaches serve quite different needs. The RCT assesses the potential benefit and risk of an intervention in tightly controlled circumstances; with the aim being to demonstrate an effect. When sufficient research has been conducted to clearly demonstrate an effect, the preference design is then better suited to determine the real-world uptake of an intervention and the determinants of this uptake. Some discussion again centred on how much research was enough and how closely the samples used in these studies related to the patients with whom you work in practice.
- Overall, it was felt that the preference design is perhaps under-utilised in many research fields and when used appropriately can better demonstrate the real-world effects of interventions, as may be of interest to many clinicians.
The impact of enhancing students’ social and emotional learning: A meta-analysis of school-based interventions – facilitated by Senior Research Fellow Rae Thomas.
Methodological Question: How could meta-analyses convey meaningful results regarding effective interventions in child mental health to clinicians?
Reference: Durlak, J. A., Weissberg, R. P., Dymnicki, A. B., Taylor, R. D., & Schellinger, K. B. (2011). The impact of enhancing students’ social and emotional learning: A meta-analysis of school-based interventions. Child Development, 82, 405-432.
Abstract: This article presents findings from a meta-analysis of 213 school-based, universal social and emotional learning (SEL) programs involving 270,034 kindergarten through high school students. Compared to controls, SEL participants demonstrated significantly improved social and emotional skills, attitudes, behavior, and academic performance that reflected an 11-percentile-point gain in achievement. School teaching staff successfully conducted SEL programs. The use of 4 recommended practices for developing skills and the presence of implementation problems moderated program outcomes. The findings add to the growing empirical evidence regarding the positive impact of SEL programs. Policy makers, educators, and the public can contribute to healthy development of children by supporting the incorporation of evidence-based SEL programming into standard educational practice.
A lack of evidence-based treatments in child mental health
- In 2002 it was reported that 90% of publicly funded child mental health services didn’t use EBTs (Hoagwood & Olin, 2002)
- Research in 2012 reported that clinicians could identify fewer than 2/5 of expert-nominated EBTs from a list of 15 (Allen et al., 2012)
- Research in 2013 reported that 58% of clinicians could define an EBT using broad criteria and indicated that they used them, but 24% identified broad frameworks (e.g., behaviour management) instead
- And of the 58% who reported using actual EBTs, 88% said they modified the EBTs (Thomas et al., 2013)
How have previous meta-analyses considered intervention effectiveness?
- Some recent meta-analyses have considered other variables, including:
- Individual interventions (Geeraert et al., 2004)
- Settings (e.g., clinic vs home; Selph et al., 2013)
- Components of interventions (Kaminski et al., 2008)
- Components and Implementation of interventions (Durlak et al., 2011)
- Because of intervention heterogeneity, meta-analyses in these formats often provide little information for clinicians about which intervention is most effective for which population. Moreover, the cost to implement an intervention may be prohibitive for some organisations. In this paper, the effects of social and emotional learning programs were moderated by how the program was implemented.
- A meta-regression of studies to consider the unique contributors to effective interventions may provide more useful information to clinicians (i.e., quality of studies, components of effective interventions). However, if possible, clinicians should be given both options (i.e., an effective intervention and components of effective interventions).
A clinical dilemma – the logistics of applying ‘guidelines’ in practice – facilitated by Associate Professor Treasure McGuire.
NIOSH. NIOSH list of antineoplastic and other hazardous drugs in healthcare settings 2012. U.S. Department of Health and Human Services, Public Health Service, Centers for Disease Control and Prevention, National Institute for Occupational Safety and Health, DHHS (NIOSH) Publication No. 2012−150. Available at: http://www.cdc.gov/niosh/docs/2012-150/pdfs/2012-150.pdf
The paper presents the current (2012) update of the National Institute for Occupational Safety and Health (NIOSH) ‘List of Antineoplastics and Other Hazardous Drugs’, designed to assist health professionals and institutions in managing the occupationally safe handling of these medications, together with a commentary on the updated publication. The ‘NIOSH List’ was first published in September 2004 (http://www.cdc.gov/niosh/docs/2004-165/) and updated in 2010. The 2012 update adds 26 medicines to the 2010 list. The review process for the addition of the new listings is described in the Federal Register: http://www.cdc.gov/niosh/docket/archive/pdfs/NIOSH-190/0190-080211-frn.pdf
Occupational handling of therapeutic medicines in a healthcare setting carries a potential risk of adverse effects. Risks associated with occupational exposure to cytotoxics are well established. However, other medicines, e.g. monoclonal antibodies, antivirals, hormones and some immunosuppressants, including those used for the treatment of cancer and other conditions, may also have damaging effects on the body. When these medicines are prepared and administered, workplace practices that minimise the risk of potentially harmful occupational exposure must be used. These include the use of biological safety cabinets, closed system transfer devices and personal protective equipment. Several national and international health professional organisations have published guidelines on handling hazardous medicines, but to use these guidelines appropriately and effectively, healthcare workers need to be aware of which medicines pose a hazard, and the degree of hazard involved.
The recently published 2012 NIOSH List has led to concern being raised by staff within Mater Health Services (a tertiary hospital treating adult, paediatric and maternity patients) about the potential risk of occupational exposure to ‘hazardous’ medicines. There is not only debate about which medicines should be considered hazardous, but also an absence of clear guidance regarding their handling and administration. In addition, there is difficulty distinguishing between toxicity caused by systemic administration for therapeutic purposes, and exposure from occupational handling and manipulation. This has led to a wide range of non-standardised, non-evidence-based and costly strategies being employed by staff when handling new medication, based on their individual interpretation of ‘potentially hazardous’ medication. This is particularly evident in Mater’s public and private paediatric hospitals, where staff may be required to manipulate dose forms for administration to children. Recommendations for the preparation and handling of hazardous medicines should be based on the available evidence of occupational toxicity and the inherent pharmacology of the medicines.
Discussions of the group
We initially discussed the process by which medicines are included on the NIOSH List; they are included if the medicine exhibits one or more of the following characteristics in humans or animals:
- Teratogenicity or other developmental toxicity
- Reproductive toxicity
- Organ toxicity at low doses
- Structure and toxicity that mimic existing medicines determined hazardous by the above criteria
However, NIOSH recommends that each organisation create its own list of medicines considered potentially hazardous. Also, there are no recommendations regarding how individual medicines rated as hazardous should be handled. The group agreed that this approach places both institutions and staff in an almost impossible position.
The facilitator provided the group with the approach Mater Health Services has taken to develop an occupational risk assessment tool to assess the risk potential of medication handled by staff.
Method: Medicines included for risk assessment were identified using the NIOSH list and expanded to include newly marketed therapeutic classes, e.g. monoclonal antibodies, and medicines identified by hospital staff as potentially hazardous. A search of primary literature was conducted using bibliographic databases, with additional studies identified through snowballing search techniques. Search terms included medicines of interest and the hazardous concepts – teratogenicity, carcinogenicity, genotoxicity, mutagenicity and tissue irritation. Citations retrieved were assessed by a three-member panel and assigned potential relevance, in terms of quantum and quality of evidence, and study type: in-vitro, in-vivo, human (treatment exposure) and human (occupational exposure). These studies contributed to the overall occupational hazard risk category, using a rating scale developed by the team.
Results: The occupational hazard risk tool incorporated risk categories that quantified the level of available evidence, using a 7-point scale ranging from “strong evidence of occupational risk” to “strong evidence of no occupational risk”. The tool was successfully applied to identify and stratify individual medicines and/or therapeutic classes of interest. Based on this stratification, recommendations for risk-appropriate handling were created: low occupational risk (no additional handling precautions); insufficient evidence to determine risk (defined handling precautions); high occupational risk (follow personal protective equipment (PPE) guidelines). This resulted in all medicines of interest receiving a “handling” recommendation based on current evidence.
The group felt that such a tool could potentially reduce clinician anxiety over the occupational handling of medicines by more closely linking evidence with handling recommendations. However, they recommended that each institution make explicit its mission and perspective in conducting such an evidence-based strategy, to ensure that staff occupational safety, rather than, for example, institutional convenience, is the driver of activity.
References of interest:
Traynor K. 2012 NIOSH hazardous-drugs update contains surprises. Am J Health-Sys Pharm 2012;69:1446-51.
The challenges of determining noninferiority margins: a case study of noninferiority randomized controlled trials of novel oral anticoagulants – facilitated by Professor Charles Leduc.
The challenges of determining noninferiority margins: a case study of noninferiority randomized controlled trials of novel oral anticoagulants. Grace Wangge MD MSc, Kit C.B. Roes PhD, Anthonius de Boer MD PhD, Arno W. Hoes MD PhD, Mirjam J. Knol PhD. CMAJ 2013; 185(3):222 – 227
The paper presents a way to produce a synoptic determination of common noninferiority (NI) margins for a set of noninferiority trials. It further allows an evaluation of the impact of the published noninferiority margins on the published conclusions. The discussion uses a review of NI trials of direct thrombin inhibitors and direct inhibitors of factor Xa claimed to be as effective as enoxaparin for the prevention of venous thromboembolism in patients undergoing elective hip or knee replacement surgery.
The paper triggers two different processes: first, revisiting the appraisal rubrics of noninferiority trials; and second, discussing the process of presenting usable information from a review of noninferiority trials. The appraisal requires judgment on the following: respect of noninferiority trial assumptions, namely assay sensitivity and constancy of effect; the method used to define the noninferiority (NI) margin or threshold (the focus of the paper); and the assessment of potential harm. The assumptions are briefly discussed, and we agree that the topic of the review supports a rare set of conditions that allow the authors to conclude prudently that the assumptions were probably respected. There is no presentation of the potential harms from the test interventions.
The steps needed to generate the numerical data necessary for the analysis of noninferiority trials are as follows. First, estimate the effect of the active control (C) compared with placebo; for a conservative estimate of the effect size, the upper bound of the 95% confidence interval (CI) of the pooled effect size is used rather than the point estimate. This is referred to as M1. This is followed by defining M2, an estimate of how much of M1 should be preserved. It corresponds to the largest clinically acceptable difference in terms of decreased efficacy (degree of inferiority) of the test drug (T) compared with the active control (C). The “loosest” estimate is 50% of M1. This becomes the noninferiority margin (NI). Noninferiority margin selection is a clinical decision, much like the expected effect size when calculating sample size in superiority trials; it approaches the same concept of a clinically relevant difference between two interventions. We were reminded that the selection of the noninferiority margin must take into account other clinically relevant considerations, such as adverse outcomes. For example, a wider margin (tolerating a new intervention that preserves only 50% of the original effect size) will be tolerated for a new intervention that produces less frequent or less severe adverse outcomes.
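The M1/M2 steps above can be sketched with made-up numbers. Everything numeric below is illustrative only; the reviewed trials' actual pooled effects and margins differ.

```python
# Deriving a noninferiority margin from a conservative estimate of the
# active control's effect vs placebo. All numbers are ILLUSTRATIVE.

def ni_margin(m1, fraction_of_m1=0.5):
    """M2, the NI margin: here the 'loosest' choice of 50% of M1 (per the text)."""
    return fraction_of_m1 * m1

# Suppose the pooled control-vs-placebo effect is a risk difference of 0.04
# (4 fewer events per 100), with the CI bound closest to no effect at 0.02.
m1 = 0.02           # conservative (CI-bound) estimate of the control's effect
m2 = ni_margin(m1)  # NI margin

# The test drug is declared noninferior if the CI of its efficacy deficit
# relative to the active control stays within m2.
print(m2)  # -> 0.01
```

The sketch makes the clinical judgment explicit: changing `fraction_of_m1` (50%, 67%, etc.) directly changes the margin, and with it the trial's conclusion.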
For this report the authors independently defined M1 by pooling results from the original studies. The advantage of this process is that the noninferiority margins (NI) are developed independently of the reviewed noninferiority studies. By presenting the studies’ data in a forest plot, the position of the NI threshold can be shown with respect to each study’s delta (the difference between control and test intervention, C - T) and its confidence interval. The paper presents forest plots of both risk difference (RD) and relative risk difference (RR), but the tables were relegated to Appendix 4, not presented in the paper but available from the CMAJ website. The detailed method used to identify the studies is presented in Appendix 1 (CMAJ website). The authors report that they used only published papers and did not attempt to retrieve unpublished reports, and they rightfully identify publication bias as an important source of error in determining the NI margin. The effort of defining M1, the original pooled effect size and its confidence interval, must follow proper systematic review/meta-analysis practice. This alternative is to be preferred to the aggregation of published noninferiority margins.
A synoptic study of noninferiority trials should favour using the relative risk difference (RR) instead of the risk difference (RD) in order to control the effect of baseline risk on the absolute difference (RD). Using RD requires demonstrating that the baseline risk has not differed significantly between the T and the C studies.
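A minimal sketch of why the relative risk is stable across baseline risks while the risk difference is not. The event rates are made up for illustration.

```python
# Same relative effect, different baseline risks: RR is constant, RD is not.
# Event rates are MADE UP for illustration only.

def rr_and_rd(risk_test, risk_control):
    """Return (relative risk, risk difference) for two event rates."""
    return risk_test / risk_control, risk_test - risk_control

low_baseline = rr_and_rd(0.05, 0.10)   # low-risk population
high_baseline = rr_and_rd(0.20, 0.40)  # high-risk population
print(low_baseline)   # -> (0.5, -0.05)
print(high_baseline)  # -> (0.5, -0.2)
```

Both populations show the same RR of 0.5, but the absolute difference is four times larger in the high-risk population, which is why pooling on RD requires showing that baseline risks did not differ between the T and C studies.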
Discussions of the group
We agreed with the authors’ conclusion that: “… substantial variation in the noninferiority margin existed between the trials, suggesting that the different clinical judgments and perceptions of the investigators played a role.” We support the view that a systematic review of noninferiority trials should independently estimate the noninferiority margin(s) and produce a graphical representation of the trials’ outcomes showing both the trials’ published NI margins and the independent estimates. In a synoptic study, the forest plot should allow the reader to select a relevant M2, whether it is 50%, 67% or a different proportion of M1; it should be directly obtainable from the presented graph.
The flood of noninferiority trials raises the possibility that we will face a dearth of placebo-controlled efficacy trials. It is possible to include a placebo arm in almost all comparative studies, much like the protocols used in the evaluation of cancer treatments. Participants who cannot tolerate the control intervention can become part of a de facto placebo group. Comparing the control treatment plus the new treatment against the control treatment alone can show a potential independent increment in treatment efficacy. We were reminded that hitching new treatments to progressively less efficacious control treatments in sequential noninferiority trials leads ultimately to studies showing no effect over placebo; this is the slippage-to-placebo bias, or “biocreep”. We remarked that we may one day confirm noninferiority of a new treatment to a control treatment that was never directly shown to be better than placebo.
References of interest:
D’Agostino RB S, Massaro JM, Sullivan LM. Non-inferiority trials: design concepts and issues — the encounters of academic consultants in statistics. Stat Med 2003;22:169-86.
U.S. Department of Health and Human Service, Food and Drug Administration, Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation and Research (CBER), Office of Biostatistics and Office of New Drugs. Guidance for Industry Non-Inferiority Clinical Trials: Draft Guidance. March 2010. 66p. (Robert Temple lead author)
Comparisons of established risk prediction models for cardiovascular disease: systematic review – facilitated by Senior Research Fellow Georga Cooke
Siontis, George, et al. “Comparisons of established risk prediction models for cardiovascular disease: systematic review.” BMJ: 344 (2012).
OBJECTIVE: To evaluate the evidence on comparisons of established cardiovascular risk prediction models and to collect comparative information on their relative prognostic performance.
DESIGN: Systematic review of comparative predictive model studies.
DATA SOURCES: Medline and screening of citations and references.
STUDY SELECTION: Studies examining the relative prognostic performance of at least two major risk models for cardiovascular disease in general populations.
DATA EXTRACTION: Information on study design, assessed risk models, and outcomes. We examined the relative performance of the models (discrimination, calibration, and reclassification) and the potential for outcome selection and optimism biases favouring newly introduced models and models developed by the authors.
RESULTS: 20 articles including 56 pairwise comparisons of eight models (two variants of the Framingham risk score, the assessing cardiovascular risk using Scottish Intercollegiate Guidelines Network guidelines to assign preventative treatment (ASSIGN) score, systematic coronary risk evaluation (SCORE) score, Prospective Cardiovascular Münster (PROCAM) score, QRESEARCH cardiovascular risk (QRISK1 and QRISK2) algorithms, Reynolds risk score) were eligible. Only 10 of 56 comparisons exceeded a 5% relative difference based on the area under the receiver operating characteristic curve. Use of other discrimination, calibration, and reclassification statistics was less consistent. In 32 comparisons, an outcome was used that had been used in the original development of only one of the compared models, and in 25 of these comparisons (78%) the outcome-congruent model had a better area under the receiver operating characteristic curve. Moreover, authors always reported better area under the receiver operating characteristic curves for models that they themselves developed (in five articles on newly introduced models and in three articles on subsequent evaluations).
CONCLUSIONS: Several risk prediction models for cardiovascular disease are available and their head to head comparisons would benefit from standardised reporting and formal, consistent statistical comparisons. Outcome selection and optimism biases apparently affect this literature.
Discussions of the group
- This was a challenging paper to read for many of our group.
- This paper assumed a deep understanding of statistical techniques used in prognostic models. Our resident experts gave us a brief tutorial on discrimination, calibration and reclassification.
- Optimism bias was a new concept to many – we debated how this was different to publication bias.
- We discussed how we communicate risk, natural history and prognosis with patients.
- Overall, this study would not change what we do in clinical practice; rather, it supports our current practice. We look forward to future publications on aligning a methodological approach to systematic reviews of risk prediction studies.
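As a companion to the mini-tutorial, here is a minimal sketch of the discrimination statistic used throughout the paper: the area under the ROC curve, computed directly as a concordance probability. The predicted risks below are made up for illustration:

```python
def auc(scores_with_event, scores_without_event):
    """AUC as concordance: the probability that a randomly chosen
    person who had the event received a higher predicted risk than
    a randomly chosen person who did not (ties count half)."""
    pairs = 0
    concordant = 0.0
    for s_event in scores_with_event:
        for s_none in scores_without_event:
            pairs += 1
            if s_event > s_none:
                concordant += 1.0
            elif s_event == s_none:
                concordant += 0.5
    return concordant / pairs

# Hypothetical predicted risks from one model:
print(auc([0.8, 0.6], [0.4, 0.7, 0.2]))  # 5 of 6 pairs concordant, about 0.833
```

Calibration (agreement between predicted and observed risk) and reclassification (how often a second model moves people across treatment thresholds) are separate questions that a single AUC does not answer, which is why the paper examines all three.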
Follow up reading
For those wanting to cement the mini-tutorial we had on discrimination, calibration and reclassification, these articles may be of interest:
Cook, Nancy R. “Statistical evaluation of prognostic versus diagnostic models: beyond the ROC curve.” Clinical Chemistry 54.1 (2008): 17-23.
Echouffo-Tcheugui, Justin B., and Andre P. Kengne. “Risk models to predict chronic kidney disease and its progression: A systematic review.” PLoS medicine 9.11 (2012): e1001344.
Long-term Follow-up of the TIPS early detection in psychosis study: Effects on 10-year outcome – facilitated by Dr Andrew Amos
Hegelstad WtV, Larsen TK, Auestad B, Evensen J, Haahr U, et al. Long-term Follow-up of the TIPS early detection in psychosis study: Effects on 10-year outcome. American Journal of Psychiatry 2012; 169(4):374-380.
Hegelstad and colleagues (2012) report the 10-year follow-up results of an experiment which compared psychotic symptoms, remission/recovery rates, and functional indicators between two geographic locations. One area was subjected to a public health intervention intended to reduce the duration of untreated psychosis (DUP) – the early detection area (ED), while the other location left normal detection procedures in place (no-ED). It was suggested that there were no relevant differences in the treatment received or populations covered at each location. The public health intervention included advertising; outreach to schools and primary care doctors to educate the public about warning signs of psychotic illness; and early detection teams who would rapidly attend people identified as possibly having a psychotic illness.
The original study was constructed in response to an established association between longer DUP and worse outcomes in psychotic illness. The quasi-experimental design was intended to test the hypothesis that a longer DUP causes a worse outcome. An alternative hypothesis is that an insidious onset of psychotic illness is associated both with a longer DUP due to the absence of prominent hallucinations/delusions, and with a worse outcome, as negative symptoms are less responsive to treatment. DUP was 16 weeks in the no-ED area, and 5 weeks in the ED area.
Hegelstad and colleagues’ report of data appears to engage in the processes of selective reporting bias and spin as defined by Vera-Badillo et al (2013; journal club article 13/02/13).
- At 10-year follow-up, none of the clinical and functional measures of the original study differs significantly (and one measure is worse for the treatment group). Despite this, the authors claim in the abstract to have demonstrated that earlier detection of psychosis has reduced deficits and improved function. They base this conclusion on a completely new measure called “recovery”, which was not considered at baseline, 1-year, or 5-year follow-up, and which appears to be a proxy for vocational outcome, itself not significantly different at five-year follow-up.
- They do not examine the original hypothesis, that longer DUP causes a worse outcome, despite reporting a logistic regression which demonstrates no association between DUP and outcomes.
- They do not consider the alternative hypothesis, which emerged spontaneously before it was presented, that the intense effort to identify patients with psychosis might also sample selectively and recruit a less affected group of people.
- They do not report hospitalisation, despite this measure being significantly worse for ED patients at five-year follow-up (suggesting early detection actually caused a worse outcome)
- They discuss only the putatively better outcome of recovery in the ED group, without discussing the relatively worse outcome in independent living for the ED group
- They suggest that the non-significant results may be due to the selective attrition of relatively higher-functioning patients from the ED group, but do not discuss the fact that there was a selective attrition of patients with longer DUP from the ED group, and not from the no-ED group. If longer DUP is a causal mechanism, this would improve the measured outcomes of the ED patients.
Discussion included references to other literature particularly prone to the use of spin and bias. The support of pharmaceutical companies in the disclosure statement was noted, and potential advantages for pharmaceutical companies were explored. An analogy was suggested between criticisms of over-prescription of stimulant medication in ADHD and potential over-prescription of antipsychotic medication or other interventions for people with psychotic illness, psychotic experiences, or general distress at the same time as psychotic symptoms.
It was noted that schizophrenia has a prevalence of around 1% in most epidemiological studies, while subclinical psychotic experiences are believed to occur in up to 8-10% of the population (with higher rates in younger people). It was speculated that the intense effort to identify patients with psychotic illness might lower the threshold for entry into care, and that once antipsychotic medication was in place, it might be maintained for years as prophylaxis despite no definitive psychotic illness ever emerging.
It was also discussed that while the Vera-Badillo paper examined the proportion of spin/bias in more than 150 RCTs, Hegelstad et al (2012) report the only methodologically adequate study in this area. Spin/bias was therefore suggested to be especially damaging, as there is no counterweight to provide context.
Finally, the extension of the spin/bias analysis to the more general early psychosis literature was floated. The possibility of widespread spin/bias was suggested by a quote from a prominent early intervention advocate:
- “In psychiatric as well as other reform processes, logic and scientific evidence are necessary but insufficient. Rhetoric, marketing, effective networking, altruistic promotion of a vital public health issue, economic arguments, and a confluence of common interests have fuelled the momentum and are vital for real reform to take root.” – McGorry (2005; p.s2)
Bias in reporting of end points of efficacy and toxicity in randomized, clinical trials for women with breast cancer – facilitated by Associate Professor Charles Leduc
Vera-Badillo FE, Shapiro R, Ocana A, Amir E, Tannock IF. Bias in reporting of end points of efficacy and toxicity in randomized, clinical trials for women with breast cancer. Annals of Oncology. 2013. (advance access 9 January 2013, doi:10.1093/annonc/mds636)
Boutron I, Dutton S, Ravaud P, Altman DG. Reporting and Interpretation of Randomized Controlled Trials With Statistically Nonsignificant Results for Primary Outcomes. JAMA.2010;303(20):2058-2064. doi:10.1001/jama.2010.651.
Summary of the study
Many clinicians and reviewers make their minds up on an intervention after only reading the abstract. Can this lead to a misunderstanding of the outcomes of an RCT? This paper is a pragmatic systematic review aiming to quantify “Bias” and “Spin” in the reporting of results in the abstracts of breast cancer therapy RCTs.
They performed a pragmatic literature search: Medline only; English only; ended in August 2011; no small studies (< 200 participants); and focussed on trials that could “change clinical practice”. The scales used were not validated but the process is clearly presented. Of note is the short time between the last period of review (August 2011) and the online publication date (January 2013), about 16 months.
The paper essentially shows that non-significant results (no benefit for the experimental arm) for primary endpoints are under-reported in the abstract and replaced by significant results observed in secondary endpoints. Unfavourable results tend to be suppressed from the abstract results and/or conclusion sections. Interestingly, where the outcome is significant and favourable, the importance of severe toxicity is under-reported. A different approach used by Boutron showed that the under-reporting and spin also occurred in the main paper. The main finding can be summarised thus: spin and bias were used to suggest efficacy in 59% of the trials that had no significant difference in their primary endpoints.
Also of note: neither industry funding nor journal impact factor was associated with biased reporting of toxicity.
The group also found the “Hierarchy scale for reporting of adverse events” very interesting and practical.
Trying to understand why reporting bias and spin were so frequent, it was noted that potentially unrealistic primary outcomes for trials may compel authors to spin non-significant results. We also noted that registration of the trials does not appear to bind authors to reporting registered primary endpoints.
We also acknowledge that current instructions to authors writing abstracts indicate that they must “highlight what is interesting” which is often interpreted as reporting positive and significant results rather than relevant non-significant primary endpoint results. Guidelines for reporting results of trials do not include criteria related to consistency between the stated primary endpoints, reported results in the main paper, and reported results in the abstract. It might be worthwhile to include such a step to avoid the biases documented in the referenced papers.
The Boutron et al paper reported on the “severity of spin”; given the prevalence of spin confirmed in this paper, we may want to find out whether the presence of spin is associated with more citations and, if it is, whether there is a dose-effect relationship between spin severity and citation frequency.
Again, Caveat lector!
SPIRIT 2013 Statement: Defining standard protocol items for clinical trials – facilitated by Associate Professor Elaine Beller
Chan A-W, Tetzlaff JM, Altman DG, Laupacis A, Gøtzsche PC, Krleža-Jerić K, Hróbjartsson A, Mann H, Dickersin K, Berlin J, Doré C, Parulekar W, Summerskill W, Groves T, Schulz K, Sox H, Rockhold FW, Rennie D, Moher D. SPIRIT 2013 Statement: Defining standard protocol items for clinical trials. Ann Intern Med 2013; Online first version. http://annals.org/article.aspx?articleid=1556168
Chan A-W, Tetzlaff JM, Gøtzsche PC, Altman DG, Mann H, Berlin J, Dickersin K, Hróbjartsson A, Schulz KF, Parulekar WR, Krleža-Jerić K, Laupacis A, Moher D. SPIRIT 2013 Explanation and Elaboration: Guidance for protocols of clinical trials. BMJ 2013;346:e7586.
The SPIRIT guidelines give a checklist of 33 items that should be included in clinical trial protocols. We wanted to look at the checklist and discuss particular items. We chose three items where we felt these were poorly done in recent grant applications we have reviewed:
1) Item 6, background and rationale
2) Item 12, outcomes
3) Item 19, data management
Discussion of the group:
Background and rationale
The background section of a grant application or protocol is frequently not a systematic or complete representation of the existing evidence. It is often a biased reporting of the literature.
Inclusion of the search strategy was suggested as an improvement.
The rationale is often poorly described. The incremental gain for doing this trial should be outlined (e.g. new extension of the population, different variant of the intervention).
Outcomes
Outcomes should be linked explicitly to the objectives/hypothesis.
We thought of it in the following framework: an outcome as a concept (e.g. diabetic control), the measurement chosen to represent that outcome (e.g. HbA1c), and the way of using that measurement to compare groups (e.g. difference in final value of HbA1c between the groups, adjusted for baseline HbA1c).
We agree with the SPIRIT authors that the clinical relevance of the chosen efficacy and harm outcomes should be given. That is, we should justify our choice of outcomes for this particular setting and stage of knowledge. If using a surrogate (like HbA1c), we should provide justification that it is a good surrogate for clinically-relevant outcomes.
Data management
This is often weak in grant applications, and a well-specified data management plan could also justify the personnel requested. Investigators under-estimate the time needed and technical requirements for good data management. We thought the example in the SPIRIT E&E paper was a good one.
We will use the SPIRIT checklist and explanatory paper when we write grant applications and protocols, and in our teaching (e.g. workshops).
Methods to increase response to postal and electronic questionnaires – facilitated by Associate Professor Jane Smith
Edwards PJ, Roberts I, Clarke MJ, DiGuiseppi C, Wentz R, Kwan I, Cooper R, Felix LM, Pratap S. Methods to increase response to postal and electronic questionnaires. Cochrane Database Syst Rev. 2009(3):MR000008. Epub 2009/07/10.
Our Interest in this:
Surveys are a commonly used method in GP research. The systematic review title speaks for itself, and was chosen to find out what helps and what hinders.
This Cochrane review is 474 pages long and compares the impact of many different factors, including the type of communication, the wording of the email subject line, incentives, pictures, signatures, gender, question types, and even threats.
Many surveys are now delivered electronically, so information about postal questionnaires may be less relevant.
We discovered a mix of “easy to guess” as well as less obvious things that change response rates.
Incentives work, especially when attached to the questionnaire rather than given after a response is received; the value of the reward matters less.
Monetary incentives work better than non-monetary ones for paper-based surveys, whereas non-monetary rewards (including vouchers with a monetary value) work for electronic surveys.
Shorter questionnaires have higher response rates; even one page compared with two makes a difference.
Photos of investigators attached to emails improve responses. It is not clear whether the photos have to be particularly attractive, but there is a suggestion that looking good may be advantageous.
Interesting questions are more likely to be answered.
Closed questions get higher response rates, but with repeated surveys the difference in response rates between open and closed questions diminishes.
The gender of the researcher signing the invitation may make a difference (females had better responses), but we wondered whether this would depend on the topics asked about.
“Veiled threats” improve survey returns, although this was based on only one study, of university hall residents.
Not including the word “survey” in the email subject line also improves the response rate.
Following up and reminding recipients works to a varying extent, phone calls appear less effective than mail or email.
There are many ways we can improve the likelihood of getting our surveys completed and returned.
These include rewarding recipients with incentives, making surveys shorter and questions more interesting, and sending reminders.
Despite its length the review is interesting to read.
Meta-analysis of fall-risk tools in hospitalized adults – facilitated by Joyce Kee-Hsin Chen, RN, Supervisor, Dept. of Nursing, Taipei Medical University-WanFang Medical Center, TW
Abstract of article
The aim of the study was to identify which fall-risk tool is most accurate for assessing adults in the hospital setting.
Falls can have physical, emotional, social, and financial consequences. Risk assessment affords the first opportunity in prevention.
To standardize the use of a fall-risk tool across the Baylor Health Care System, nurse executives undertook a meta-analysis of published research on fall-risk assessment tools used with adult inpatients.
Both random-effects and fixed-effects models showed that Morse Fall Scale had significantly higher sensitivity than St Thomas’s Risk Assessment Tool (STRATIFY). Specificity of Morse Fall Scale was significantly lower than that of STRATIFY with the fixed-effects model, but the random-effects model showed the opposite. Morse Fall Scale had a significantly higher Youden index than STRATIFY with the fixed-effects model (P = .001), but the result from random-effects model indicated no significant difference (P = .117). The sensitivity, specificity, and Youden index fell within the 95% confidence intervals.
Meta-analysis is a useful methodology for evaluating current evidence when variation exists in the literature.
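The Youden index compared across tools in the abstract is simple arithmetic. A minimal sketch, with sensitivity/specificity values invented for illustration (not the pooled estimates from the paper):

```python
def youden_index(sensitivity, specificity):
    """J = sensitivity + specificity - 1; 0 means the tool is no
    better than chance, 1 means a perfect screening tool."""
    return sensitivity + specificity - 1

# Illustrative values only:
print(round(youden_index(0.72, 0.68), 2))  # 0.4
print(round(youden_index(0.65, 0.55), 2))  # 0.2
```

Because J collapses sensitivity and specificity into one number, two tools with the same J can still trade off missed fallers against false alarms very differently, which matters when choosing a tool for a specific ward.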
Rapid critical appraisal of a systematic review
Step 1: Orientation: What question did the review ask?
|Population/ Problem||Hospitalized adults (age ≥18 years)|
|Intervention||Fall-risk tools (MFS, STRATIFY)|
|Outcome(s)||Accurate screening (sensitivity, specificity and Youden index)|
|Type of question: Prognosis question. Best study design: randomised controlled trial, prospective cohort study (outcomes are compared for matched groups with and without exposure or risk factor), case-control study|
Step 2: How well was the review done?
|Did the search find all the relevant evidence?||
|Were the studies critically appraised?||
|…and were only the sufficiently valid studies included?||
|Did the authors “total up” the studies with summary tables and plots?||
|…and were the results similar between studies – Heterogeneity?||
Discussion of the group:
- Searching: (1) PubMed and MEDLINE are largely the same database (duplication); (2) supplementary searches (reference lists, citations, trial registries) should be considered; (3) the search was completed by Oct 2008 but the review was not published until Nov 2010 (a 2-year lag); (4) the search was limited to the acute hospital setting.
- More detailed information should be added, such as the definition of a fall, the research setting (e.g. acute inpatient hospitals, community, hospices…), length of follow-up, percentage of fallers, consequences of falls (injury?), and the quality of the research.
- The quality of each study included in this meta-analysis was not reported in the article. It should describe how the quality of each study was assessed, using predetermined quality criteria appropriate to the type of clinical question.
- Ideally, the initial instrument development study should not be included in a meta-analysis of instrument validation studies, as it may over-estimate performance.
- The best design for a prognostic question is a prospective cohort study; non-prospective studies should be excluded.
- Length of follow-up should be long enough to detect the outcome of interest; “time” is a very important variable in this meta-analysis.
- The Youden index of the MFS is about 0.4 and that of STRATIFY about 0.2 (pretty low); both instruments may be of little use in the clinical setting.
- The authors did not provide a statistical analysis of homogeneity, but heterogeneity among the studies is apparent (different settings, age groups, wide confidence intervals…); subgroup analysis is needed.
- Recommendations for a further project: (1) it would probably be difficult to find good-quality paediatric assessment tools for a systematic review/meta-analysis; also consider: (2) a hospital-wide falls prevention program vs a risk assessment tool; (3) what is the definition of a “fall” in the paediatric population? (4) which variables differ between adult and paediatric populations? (5) high-risk paediatric populations (gait disturbance, cerebral palsy…) and the consequences of falls (head injury, fracture…).
Training family physicians in shared decision-making to reduce the overuse of antibiotics in acute respiratory infections: a cluster randomized trial – facilitated by Professor Chris Del Mar
Légaré F, Labrecque M, Cauchon M, Castel J, Turcotte S, Grimshaw J. Training family physicians in shared decision-making to reduce the overuse of antibiotics in acute respiratory infections: a cluster randomized trial. Can Med Assoc J 2012.
Professor Chris Del Mar presented this cluster RCT – relevant to the CREMARA – together with the ‘Shared Decision Making Support Tools’ (appendix to the paper).
Associate Professor Elaine Beller presented the Study Protocol (BMC Fam Pract 2011) and pilot Protocol (BMC Fam Pract 2007)
1 Randomisation after baseline data were collected is suboptimal. Although loss from each arm was similar, some baseline characteristics were not balanced (eg Table 3 and Table 4).
An added refinement might be to stratify by prescribing rates (detected in the Baseline phase).
2 The patient recruitment was low: an average of 3 patients/physician for the whole season (and some physicians recruited none!). Presumably this was because each patient had to be recruited by the RA in the waiting room. But does this mean that the number of patients influenced might have been higher – might other patients presenting with ARIs also have had reduced prescribing rates? That is, to what extent is the intervention generalisable (or do we have to intervene with every patient in the waiting room to get this effect)? See 3 below for a possible way of addressing this.
3 Outcomes: ‘intention to use antibiotics’ is clearly a sub-optimal primary outcome because it is so soft. In Australia (and UK) it would be possible to measure actual ABs dispensed. In the meantime, it would be good to know whether there are any measures of harder outcomes, including ABs prescribed. I will write to France Légaré to ask if they have access to these data. (This would also address issue 2 above).
4 What was the intervention? As with all complex interventions, the effective components are sometimes hard to tease out. In this case, was it the ‘epidemiological’ education that did the trick, or the introduction to ‘shared decision-making’?
5 More minor things:
a. More clusters would be better (and easier in Australia where practices appear to be smaller)
b. What’s the commercial influence? Is it a fee-for-service, capitation, or blended payment system? (In Australia, GPs might prescribe ABs because they think it makes commercial sense – “It’s what the patients come for, why wouldn’t I give it to them?”)
c. Was there contamination (despite the controls not having access to the training), because the GPs were academic doctors who may all have known about the trial, its intent, and its hoped-for outcome?
Sitting time and all-cause mortality risk in 222 497 Australian adults – facilitated by Professor Chris Del Mar
Van der Ploeg HP, Chey T, Korda RJ, Banks E, Bauman A. Sitting time and all-cause mortality risk in 222 497 Australian adults. Arch Intern Med 2012;172:494-500.
BACKGROUND: Prolonged sitting is considered detrimental to health, but evidence regarding the independent relationship of total sitting time with all-cause mortality is limited. This study aimed to determine the independent relationship of sitting time with all-cause mortality.
METHODS: We linked prospective questionnaire data from 222 497 individuals 45 years or older from the 45 and Up Study to mortality data from the New South Wales Registry of Births, Deaths, and Marriages (Australia) from February 1, 2006, through December 31, 2010. Cox proportional hazards models examined all-cause mortality in relation to sitting time, adjusting for potential confounders that included sex, age, education, urban/rural residence, physical activity, body mass index, smoking status, self-rated health, and disability.
RESULTS: During 621 695 person-years of follow-up (mean follow-up, 2.8 years), 5405 deaths were registered. All-cause mortality hazard ratios were 1.02 (95% CI, 0.95-1.09), 1.15 (1.06-1.25), and 1.40 (1.27-1.55) for 4 to less than 8, 8 to less than 11, and 11 or more h/d of sitting, respectively, compared with less than 4 h/d, adjusting for physical activity and other confounders. The population-attributable fraction for sitting was 6.9%. The association between sitting and all-cause mortality appeared consistent across the sexes, age groups, body mass index categories, and physical activity levels and across healthy participants compared with participants with preexisting cardiovascular disease or diabetes mellitus.
CONCLUSIONS: Prolonged sitting is a risk factor for all-cause mortality, independent of physical activity. Public health programs should focus on reducing sitting time in addition to increasing physical activity levels.
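The 6.9% population-attributable fraction follows from the category hazard ratios and the prevalence of each sitting category. A sketch of the standard multi-category formula; the hazard ratios below come from the abstract, but the prevalences are invented for illustration:

```python
def population_attributable_fraction(prevalences, hazard_ratios):
    """PAF for a multi-category exposure:
    PAF = sum(p_i*(HR_i - 1)) / (1 + sum(p_i*(HR_i - 1))),
    where p_i is the prevalence of exposure category i and HR_i its
    hazard ratio relative to the reference category (<4 h/d here)."""
    excess = sum(p * (hr - 1.0) for p, hr in zip(prevalences, hazard_ratios))
    return excess / (1.0 + excess)

# HRs for 4-<8, 8-<11 and >=11 h/d sitting (from the abstract);
# prevalences are hypothetical:
print(population_attributable_fraction([0.40, 0.20, 0.05], [1.02, 1.15, 1.40]))
```

The formula makes plain why a modest hazard ratio can still carry a non-trivial PAF: a common exposure (lots of people sitting 8+ hours) multiplies a small excess risk across a large slice of the population.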
Discussion of the group:
- Very interesting hypothesis-generating study.
- The large proportion of people claiming <4 hours sitting/day stretches credibility.
- What was the alternative to sitting?
- Perhaps the known and unknown confounders could explain the observed effect, the size of which was modest.
- Probably too soon to recommend less sitting until better data are available.