PT  - JOURNAL ARTICLE
AU  - Yifan Zhang
AU  - Erin E Holsinger
AU  - Lea Prince
AU  - Jonathan A Rodden
AU  - Sonja A Swanson
AU  - Matthew M Miller
AU  - Garen J Wintemute
AU  - David M Studdert
TI  - Assembly of the LongSHOT cohort: public record linkage on a grand scale
AID - 10.1136/injuryprev-2019-043385
DP  - 2020 Apr 01
TA  - Injury Prevention
PG  - 153--158
VI  - 26
IP  - 2
4099 - http://injuryprevention.bmj.com/content/26/2/153.short
4100 - http://injuryprevention.bmj.com/content/26/2/153.full
SO  - Inj Prev 2020 Apr 01; 26
AB  - Background: Virtually all existing evidence linking access to firearms to elevated risks of mortality and morbidity comes from ecological and case–control studies. To improve understanding of the health risks and benefits of firearm ownership, we launched a cohort study: the Longitudinal Study of Handgun Ownership and Transfer (LongSHOT).
      Methods: Using probabilistic matching techniques we linked three sources of individual-level, state-wide data in California: official voter registration records, an archive of lawful handgun transactions and all-cause mortality data. There were nearly 28.8 million unique voter registrants, 5.5 million handgun transfers and 3.1 million deaths during the study period (18 October 2004 to 31 December 2016). The linkage relied on several identifying variables (first, middle and last names; date of birth; sex; residential address) that were available in all three data sets, deploying them in a series of bespoke algorithms.
      Results: Assembly of the LongSHOT cohort commenced in January 2016 and was completed in March 2019. Approximately three-quarters of matches identified were exact matches on all link variables. The cohort consists of 28.8 million adult residents of California followed for up to 12.2 years. A total of 1.2 million cohort members purchased at least one handgun during the study period, and 1.6 million died.
      Conclusions: Three steps taken early may be particularly useful in enhancing the efficiency of large-scale data linkage: thorough data cleaning; assessment of the suitability of off-the-shelf data linkage packages relative to bespoke coding; and careful consideration of the minimum sample size and matching precision needed to support rigorous investigation of the study questions.
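The abstract does not describe the bespoke linkage algorithms themselves. As a rough illustration only, the general idea of trying an exact match on all link variables first and falling back to a similarity score can be sketched as below. Everything in the sketch is an assumption made for illustration and is not taken from the study: the `Person` record, the field weights, the 0.85 threshold, and the use of `difflib.SequenceMatcher` as the string comparator.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Person:
    """Hypothetical record holding the link variables named in the abstract."""
    first: str
    middle: str
    last: str
    dob: str      # date of birth as an ISO string, e.g. "1980-05-17"
    sex: str
    address: str


FIELDS = ("first", "middle", "last", "dob", "sex", "address")

# Illustrative weights and threshold; these are assumptions, not study values.
WEIGHTS = {"first": 2.0, "middle": 1.0, "last": 3.0,
           "dob": 4.0, "sex": 0.5, "address": 2.5}
THRESHOLD = 0.85


def normalize(value: str) -> str:
    """Basic cleaning: lower-case and collapse runs of whitespace."""
    return " ".join(value.lower().split())


def is_exact(a: Person, b: Person) -> bool:
    """True when every link variable agrees after cleaning."""
    return all(normalize(getattr(a, f)) == normalize(getattr(b, f))
               for f in FIELDS)


def match_score(a: Person, b: Person) -> float:
    """Weighted mean of per-field string similarity, in the range [0, 1]."""
    total = sum(WEIGHTS.values())
    weighted = sum(
        WEIGHTS[f] * SequenceMatcher(
            None, normalize(getattr(a, f)), normalize(getattr(b, f))
        ).ratio()
        for f in FIELDS
    )
    return weighted / total


def link(record: Person, candidates: list[Person]):
    """Return (best candidate or None, match type) for one record."""
    # Pass 1: exact agreement on all link variables.
    for candidate in candidates:
        if is_exact(record, candidate):
            return candidate, "exact"
    # Pass 2: highest-scoring candidate, accepted only above the threshold.
    best = max(candidates, key=lambda c: match_score(record, c), default=None)
    if best is not None and match_score(record, best) >= THRESHOLD:
        return best, "probabilistic"
    return None, "unmatched"
```

In this sketch a call such as `link(buyer, voters)` returns the matched voter record together with a label (`"exact"`, `"probabilistic"` or `"unmatched"`). A production linkage at the scale described would also need blocking to avoid comparing every pair of records, which is omitted here.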