Living in the grey area: a case for data sharing in observational epidemiology
Brian D Johnston
Correspondence to Dr Brian D Johnston, Department of Pediatrics, Harborview Medical Center, University of Washington, 325 Ninth Ave, Box 359774, Seattle, WA 98104, USA; ipeditor{at}

In 2001, Injury Prevention journal published a paper by Macpherson and colleagues looking at the effect of a bicycle helmet ordinance on bicycle ridership in East York, Ontario, Canada.1 The questions addressed by the authors are important. While helmet laws may be enacted to promote helmet use and reduce bicycle-related injury, they could easily have unintended and undesirable consequences. Critics point out that helmet laws may send a message that cycling is inherently dangerous and could dissuade riders. In addition, there may be others who will choose not to ride if wearing a helmet is perceived as uncomfortable, cumbersome or costly. These potential effects on ridership must be acknowledged, measured and—if needed—mitigated. Active transportation through cycling is a promising strategy for building physical activity into daily life, directly addressing the public health burden of overweight and obesity, and reducing carbon emissions.2 Thus, any benefits from a law intended to prevent injury must be weighed against its effects on competing public health priorities and the interests of private citizens. Macpherson's paper measured ridership in one area before and after enactment of a helmet law and suggested that, in fact, the law had very little impact on the number of child cyclists. These data were also the focus of her 2003 PhD thesis.3

In 2003, Robinson published a research letter in Injury Prevention critical of Macpherson's findings.4 She argued that the study was sited in a jurisdiction where helmet laws were not enforced, that the total number of observations was small and that substantial year-to-year variability in ridership might be driven by variation in sites chosen for observation, date and time of day observed, and average weather conditions. Macpherson and colleagues, in a reply, attempted to address these concerns and called for more study. In a related development, the journal, in 2006, published a correction to tabular data in the 2001 paper.5 The authors asserted that using the corrected data did not change the direction or interpretation of their findings. The corrected table 1 was ostensibly available online as a supplementary file.

Fast forward then to 2010: since September of that year I have been engaged in a series of email exchanges with Dr Michael Kary, who expressed renewed concerns about the Macpherson paper and other work published by her group. Kary's arguments have been posted online and can be reviewed there by interested readers.6 As editor, I felt my primary obligation was to ensure that the integrity of the scientific record was preserved, at least insofar as it involves papers published in our journal. To that end, the most relevant arguments seemed to me to involve:

  • inability to find the ‘corrected table 1’ that we indicated was available online;

  • selective and sometimes contradictory descriptions of the effect of weather as a confounding factor in these studies; and

  • discrepancies between the cyclist counts and ridership rates reported in our journal and those reported in other publications, ostensibly from the same data set.

I was dismayed to note that, in fact, there was no ‘corrected table 1’ to be found in the online supplementary materials. We had published a notice of correction and pointed readers to a file that for many years did not exist. I do not know that it ever existed online, as it seems hard to imagine the file would simply disappear once posted. Happily, despite a major change in the manuscript handling system in use and the coming and going of several editorial assistants, the staff in London were able to find an original paper copy of the corrected table and have uploaded this to the site. I can only offer apologies for our error here.

Kary's concerns about the effects of weather, like Robinson's previous critique of the study procedures, merit careful consideration. I believe that a number of aspects of this paper could be criticised in terms of methodology and interpretation of results. Indeed, most observational epidemiology suffers from deficits in design and uncontrolled confounding. We trust authors to note these limitations in their text, referees to flag these during peer review and readers to raise these issues in post-publication feedback. And we encourage investigators to anticipate and respond openly to legitimate critiques, building on this discussion to improve the design and credibility of subsequent work. But the data, as reported, should be unimpeachable and subject to review, re-analysis and reinterpretation as needed.

Which is why Kary's third point is most concerning. We would like to believe that the authors have a single dataset with a reproducible method for analysing and reporting results. Basic data, like the number of subjects counted in a year, should not change from publication to related publication. My impression is that the overall conclusions of the papers are not altered regardless of which data are considered definitive. But the inconsistency without explanation diminishes the credibility of the results and diverts attention from the central research question.

I contacted the authors and pressed them on this point. They replied that ‘In Canada, the ethics guidelines around data suggest (or require) that data be destroyed within 5 years. Although we did publish a follow-up paper using a similar database, the data and the SAS code used in this article (and the follow-up article) no longer exist’.7 Without access to the original data file, there is no way for anyone to definitively resolve these contradictions. I have thus closed our review of this case.

As an editor, I firmly believe that the scientific process is best served by efforts to retain, manage and share research data. With carefully archived data, results can be verified and new analyses conducted with existing information. Had such a process been encouraged, or even allowed, 10 years ago, many of the concerns aired about Macpherson's paper could have been resolved quickly and collaboratively. But it is only recently that investigators, funders and regulatory agencies have started to view research data as a public investment that must be carefully managed. Archiving and reuse of these data represent the best return on investment for funders and, one can argue, maximise the societal benefit returned for the aggregate personal risk and inconvenience assumed by research participants. In this sense, data management is economically sound and ethically imperative.

Of course, there are enormous hurdles to clear in achieving this vision. Journals need to develop policies for data archiving and data access in conjunction with published work. We need methods for linking to data files and for indexing, citing and crediting data authors for the product they have created. Funders requiring data archiving and accessibility will need to build financial support for this process into their awards. Investigators will need to consider data management from the inception of their study. Plans for structuring and documenting their database will be impacted by new standards and expectations. Informed consent, data security and subsequent protection of personal and confidential information will also be affected. We should expect scrutiny where health information is involved, but one can argue that centralised, standard data management is likely to be more secure than today's haphazard patchwork of individual investigators managing their own data under the varied guidance of local research ethics boards.

It will be no small task to move research culture to a point where data management and data sharing become the norm. But there is help. The Organisation for Economic Co-operation and Development (OECD) has released principles and guidelines for managing access to research data.8 The US National Science Foundation now requires grantees to have a data management plan.9 And, in the UK, the Medical Research Council calls for research data to be ‘managed and curated’ effectively, including setting minimum retention periods for some types of research information.10 As these requirements come to the fore, organisations and services for data management have also been developed (see, for example, Using a data management service encourages standardisation and improves access to other researchers. It also addresses issues of back up and security that individual investigators may find daunting.

As strategies and expectations for data management and accessibility are clarified, it is my hope that Injury Prevention, along with other BMJ Group journals, will lead the public health field in setting standards for data sharing in conjunction with publication. In the meantime, we are happy to work with authors who have archived data or analysis code to share. Observational epidemiology, in particular, could use a nudge in this direction. It is almost impossible to imagine ‘replicating’ most observational studies: they are highly contextual, time intensive and subject to innumerable influences beyond the control of investigators. But once a data set is assembled, the analysis itself should be replicable. Making data available for colleagues to explore, reformat and reinterpret takes us out of the grey areas of individual endeavour and into the bright light of a broadly shared process. This is clearly the way forward for our field.



  • Competing interests None.

  • Provenance and peer review Commissioned; internally peer reviewed.
