Why are preclinical studies mostly positive, whereas the major clinical phase III trials are neutral or negative? Our analysis clearly revealed that neutral results are underreported, that preclinical cardioprotection studies with a prospective, power analysis-based design are scarce, and that studies without such a prospective design show a publication bias toward positive results.
We realize that a power analysis is an established instrument to avoid a type II error in a specific, individual study, i.e., to prospectively determine a sample size with the aim of not missing an expected or reasonable effect size with a given probability. Here, we used a retrospective power analysis to quantitatively characterize a research field with a large number of data sets: assuming the effect size that was actually observed, we asked whether a study was sufficiently powered, i.e., had an adequate sample size, if it were repeated. In our study, the neutral results can serve as a positive control for such use, since they all had a power of less than 0.8, whether they were prospectively planned or not. However, the low power is no surprise, since non-significant p-values always correspond to low retrospective power [13].
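As a minimal sketch of such a calculation, assuming a two-group comparison of infarct size and purely hypothetical numbers (neither the effect size nor the sample size below is taken from the analyzed data sets), retrospective power can be computed, e.g., with statsmodels:

```python
# Minimal sketch: retrospective power for a two-sample t test,
# using HYPOTHETICAL numbers, not data from the analyzed studies.
from statsmodels.stats.power import TTestIndPower

observed_effect_size = 0.9  # Cohen's d actually observed (assumed)
n_per_group = 8             # sample size actually used (assumed)
alpha = 0.05

power = TTestIndPower().power(effect_size=observed_effect_size,
                              nobs1=n_per_group,
                              ratio=1.0,
                              alpha=alpha,
                              alternative='two-sided')
# Probability that a repeat study (same n, true effect equal to the
# observed effect) would again reach p < 0.05:
print(f"retrospective power: {power:.2f}")
```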
Thus, our use of a retrospective power analysis did not aim to assess whether or not a given hypothesis in an individual study is indeed correct, but to estimate the chance that positive results in a research field would be repeated, assuming the sample size that was used and the effect size that was observed; this is, in our view, one essential feature of robustness [3]. Of course, there are other factors, apart from a too small sample size, which undermine the robustness of preclinical cardioprotection studies, most notably the lack of a priori definition of exclusion and inclusion criteria, the lack of proper randomization, and the lack of blinding of the investigators [1, 2]. Unlike power, these other factors cannot be identified retrospectively unless they are explicitly specified in the study.
We are aware that the approach of calculating retrospective power can be criticized, since, as mentioned above, non-significant p-values always correspond to low retrospective power [13]. Nevertheless, we think that the approach is justified for our intention in the present study. We could not determine the positive predictive value, because the baseline probability for a positive result was unknown for the analyzed studies.
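For illustration only, the dependence of the positive predictive value (PPV) on the unknown baseline (pre-study) probability follows directly from Bayes' rule; the prior probabilities in the sketch below are arbitrary:

```python
# Sketch: PPV of a significant result as a function of the baseline
# (pre-study) probability; all numbers are arbitrary illustrations.
def ppv(prior: float, power: float, alpha: float = 0.05) -> float:
    """Probability that a significant finding is a true positive."""
    return (power * prior) / (power * prior + alpha * (1.0 - prior))

# With identical power, the PPV swings widely with the unknown prior:
for prior in (0.1, 0.5):
    print(f"prior = {prior:.1f} -> PPV = {ppv(prior, power=0.8):.2f}")
# prior = 0.1 -> PPV = 0.64 ; prior = 0.5 -> PPV = 0.94
```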
Clinical trials are designed prospectively and must be pre-registered, e.g., on clinicaltrials.gov or a similar registry, and will not be published in a rigorous, high-impact journal unless they are pre-registered, so the final results can always be compared to the original study design, and neutral and negative data cannot be hidden. There are attempts to establish pre-registration also for preclinical studies [19], but pre-registration has unfortunately not yet been widely accepted and used by the cardioprotection community, and its use is not a prerequisite for publication in a reputable journal, a requirement that strongly promoted the use of pre-registration for clinical trials.
What can be done to improve the robustness of preclinical cardioprotection studies and their potential translation to clinical practice? For truly exploratory studies, a positive publication bias will certainly remain. It is unrealistic to expect scientists who have generated preliminary exploratory data which are neutral to pursue these studies and report these data; they will prefer to move on to something more exciting. However, when there is the aim of translation, and most publications in the field in fact start their introduction with an emphasis on the mortality and morbidity from ischemic heart disease, neutral data should indeed be reported. The mere 13% of data sets with neutral results in our analysis therefore most probably represent only the tip of the iceberg and contradict all reason and experience; this notion is supported by the few translationally most important pig studies with a prospective design, of which 36% were neutral. We realize that for innovative exploratory studies with truly novel findings, an a priori effect size cannot be estimated. However, whenever a priori information on the intervention under study exists and an effect size can be quantified or assumed for a power analysis, we recommend a prospective power analysis and, if translation to patient benefit is aimed for, a power of 0.9 at a significance level of α = 0.05; we recognize that a power of 0.9 requires a larger effect size and/or a larger sample size. Even in the absence of prior data, when translation to patient benefit is aimed for, one could define a "clinically relevant" effect size, e.g., an infarct size reduction by 25% of the area at risk or by 5% of left ventricular mass, as a surrogate and then still use a prospective power analysis-based study design (see the sketch below). At the very least, the authors of all studies that did not use a prospective power analysis-based study design but aim for translation could be requested to present the exact p-values and corresponding confidence intervals of their significant study results.
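A minimal sketch of such a prospective sample size calculation, assuming the hypothetical surrogate effect size named above and an assumed standard deviation of 20% of the area at risk (a value chosen purely for illustration, not derived from the analyzed studies):

```python
# Sketch: prospective sample size for power 0.9 at alpha = 0.05,
# assuming an infarct size reduction of 25% of the area at risk and
# an ASSUMED between-animal SD of 20% (illustrative values only).
import math
from statsmodels.stats.power import TTestIndPower

effect_size = 25 / 20  # Cohen's d = expected difference / assumed SD

n_per_group = TTestIndPower().solve_power(effect_size=effect_size,
                                          power=0.9,
                                          alpha=0.05,
                                          ratio=1.0,
                                          alternative='two-sided')
print(f"animals per group: {math.ceil(n_per_group)}")  # ~15 for d = 1.25
```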
For journal editors, we propose not only to explicitly encourage the publication of neutral data but also not to place the burden of explaining discrepant results from prior studies on the authors of the neutral study, in particular when the neutral study has a power analysis-based prospective design. Strictly speaking, only the rejection of the null hypothesis needs a reasonable explanation. When a neutral study does not confirm a prior positive study, the journal editor might instead solicit a comment from the authors of the prior positive study. Basic Research in Cardiology has just done that, and the results were indeed enlightening [14, 16]. Likewise, requesting an identical repetition of a prior positive study is futile, as that would require not only the use of animals of the same breed, sex and age [15,16,17], the same anesthesia, surgical approach and study protocol, including route of administration, timing and dosing of the cardioprotective intervention, but also keeping such minute conditions constant as the time of the year [16, 21], the time of the day [4, 5], and the composition of diet and tap water [15], all of which impact the study results and their robustness but are impossible to replicate exactly. Of course, all these variables are also relevant for clinical cardioprotection trials and may differ from one trial to another. Therefore, it is an important consideration whether or not a cardioprotective intervention of interest relies on robust preclinical data before embarking on a clinical trial.
In conclusion, better reporting of positive and neutral data will further improve the rigor and robustness of preclinical cardioprotection studies and thus facilitate their translation to patient benefit [1, 2, 8, 15].