As we begin a new decade of research in trauma and orthopaedics, we should aim to make the most of the best available data. The last decade saw a huge increase in the volume of routinely recorded healthcare data. These datasets, particularly clinical registries and large administrative databases, can be valuable sources of information but need to be understood, analysed, interpreted, and reported carefully. We have previously highlighted the importance of understanding why a dataset was established, as well as the quality of the data in order to guide the interpretation of research findings.1,2 In this editorial, we aim to revisit both the importance of such data sources and the critical methodological principles that should be followed when drawing inferences from large datasets.
We recognise that big data offers the potential to answer many questions, particularly in relation to rare events and rare diseases, that cannot be answered using traditional methods.3,4 It also offers an opportunity to track practice over time and examine healthcare delivery across large healthcare systems.5-12 There is also huge potential in linking big datasets to address questions that cannot be examined in any other way.5,13
We have previously highlighted the dangers of misclassification bias, lumping, reliance on proxy outcomes, and overlooking both measured and unmeasured confounders.1 We have also both celebrated and warned against the power of such large numbers; while alluring, they must be interpreted using sound clinical understanding. There is a risk that the size of a dataset may expand at the expense of data quality,14,15 which needs to be carefully understood before inferences are drawn.
We should embrace the opportunities provided by large datasets, both to guide practice and generate hypotheses. However, although inferences drawn from registry data and administrative databases will increasingly contribute to debates, they cannot replace other study designs, particularly prospective cohort studies and randomised controlled trials. The appended framework for the reporting of registry and big data studies lays out the minimum information that should be presented, both to help readers interpret study findings appropriately and to improve the reproducibility of these important studies. Transparent reporting is at least as important in this arena as it is in others, and will be mandated.
Over the past few years, we have raised our expectations around study reporting and supported the use of well-established guidelines, such as the Consolidated Standards of Reporting Trials (CONSORT), the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), and the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statements. We have previously suggested using the REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) statement for ‘big data’ studies. The information and guidelines recommended by Perry et al in 2014 were excellent and set a new standard that should be followed when reporting big data studies.1 We suggested at the time that these should be used as an adjunct to the STROBE statement.
We now propose an expanded version that seeks to guide authors and to reassure readers. This document will further support methodological transparency and allow us to fully exploit the huge opportunities made available by large datasets. We also encourage authors to publish protocols for big data studies in our sister journal Bone & Joint Open, to reassure readers that any findings were not simply the result of statistical oddities from data mining, but were considered analyses based on a priori hypotheses. We do not believe that there is a conflict between our expanded recommendations and the RECORD statement, but welcome the views of our authors, readers, reviewers and other colleagues who work with big data or rely on such studies to inform their clinical practice.
SEARCHeD:
| Section/Topic | Item No. | Checklist item |
|---|---|---|
| Title and abstract | | |
| | 1a | Identification as a healthcare registry study in the title or abstract |
| | 1b | Structured summary of study design, methods, results, and conclusions |
| | 1c | Data source including name of databases and geographic location |
| | 1d | Data processing undertaken including linkage and cleaning |
| Introduction | | |
| Background and objectives | 2a | Scientific background and rationale for study |
| | 2b | Specific objectives (if exploratory) and/or hypotheses |
| Methods | | |
| Study design | 3a | Description of study design including data sources used, geographic location, and data linkage |
| | 3b | Description of the routine healthcare data utilised, dataset completeness, and internal quality assurance (QA) of the registry |
| | 3c | Reference to study registration document or protocol if available. Approval number and date must be included |
| Participants | 4a | A clear statement of the inclusion criteria for participants included in the study |
| | 4b | Population level selection criteria including filtering based on data quality, availability, and linkage |
| | 4c | Data source and/or queries used including codes, time frames for recruitment, exposure, and outcomes |
| | 4d | Settings and locations where the data were collected |
| Variables | 5 | Extent of missing co-variable data, handling of incomplete data, and flow diagram for dataset |
| | 6a | Completely defined co-variables, demographic variables, justification for selection including potential confounders and missing potentially relevant data |
| | 6b | If using matched or comparison cohort series (e.g. propensity matching), selection and matching criteria |
| Outcomes | 7 | How outcomes were determined. Justification of outcome measures, including choice of follow-up duration |
| Statistical methods | 8a | Precisely define access to source datasets – is this an extract? |
| | 8b | Methods for data processing and handling of missing data. Flow chart for data cleaning |
| | 8c | Methods for data linkage if appropriate, e.g. single identifier or other method of linkage. Describe any QA steps for linkage |
| Results | | |
| Participant flow | 9 | Patients available described by text and flow diagram (required) |
| Matching | 10a | Patient numbers in each cohort based on matching criteria, or other criteria (if undertaken) |
| | 10b | A table showing baseline demographic and clinical characteristics for each group, and QA for matching (if undertaken) |
| Numbers analysed | 11 | For each group, number of participants (denominator) included in each analysis and what proportion of the potential registry population was included |
| Outcomes and estimation | 12a | Effect estimates (e.g. odds ratios) along with precision estimates (e.g. 95% CI) for each analysis |
| | 12b | Make clear which confounders were adjusted for and which were not. Provide data to support the choice of statistical model, e.g. explicitly test the proportional hazards assumption before reporting data from Cox regression models |
| Sensitivity analysis | 13 | Where sensitivity analyses have been undertaken, they should be reported completely |
| Discussion | | |
| Generalisability | 14 | Generalisability (external validity, applicability) of the findings to individual and population settings |
| Limitations | 15a | The implications of using routinely collected data that were not collected for this research question should be thoroughly discussed and explored. Findings should be set against pre-existing research, and the use of registry data as opposed to other methods should be justified |
| | 15b | Study limitations, addressing sources of potential bias, imprecision, and, if relevant, multiplicity of analyses |
| Biases | 16 | Specific consideration should be given to misclassification bias, unmeasured confounders, and changing eligibility criteria over time |
| Other information | | |
| Registration | 17 | Registration number and name of study registry or source dataset |
| Protocol | 18 | Where the full protocol can be accessed, if available. Who gave approval for the analysis and when, along with the application reference number |
| Funding | 19 | Sources of funding and other support |
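As an illustration of item 12a, the sketch below shows one common way to report an effect estimate alongside its precision: an unadjusted odds ratio from a 2×2 table with a Wald (Woolf) 95% confidence interval. It is a minimal example under stated assumptions rather than a recommended analysis. The function name `odds_ratio_with_ci` and the cell counts are hypothetical and are not drawn from any registry or from the studies cited here.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald (Woolf) confidence interval.

    a: exposed, outcome present     b: exposed, outcome absent
    c: unexposed, outcome present   d: unexposed, outcome absent
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        # The Wald interval is undefined when any cell is zero.
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_, lo, hi = odds_ratio_with_ci(a=30, b=970, c=20, d=980)
print(f"OR {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

This estimate is unadjusted. Item 12b would still require authors to state which confounders were and were not adjusted for, and to justify the statistical model used for any adjusted analysis.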
References
1. Perry DC, Parsons N, Costa ML. ‘Big data’ reporting guidelines: how to answer big questions, yet avoid big problems. Bone Joint J. 2014;96-B(12):1575–1577.
2. Perry DC, Parsons N, Costa ML. Surgeon level data: understanding the plot. Bone Joint J. 2013;95-B(9):1156–1157.
3. Metcalfe D, Peterson N, Wilkinson JM, Perry DC. Temporal trends and survivorship of total hip arthroplasty in very young patients: a study using the National Joint Registry data set. Bone Joint J. 2018;100-B(10):1320–1329.
4. Broadhurst C, Rhodes AML, Harper P, Perry DC, Clarke NMP, Aarvold A. What is the incidence of late detection of developmental dysplasia of the hip in England? A 26-year national study of children diagnosed after the age of one. Bone Joint J. 2019;101-B(3):281–287.
5. Metcalfe D, Zogg CK, Judge A, et al. Pay for performance and hip fracture outcomes: an interrupted time series and difference-in-differences analysis in England and Scotland. Bone Joint J. 2019;101-B(8):1015–1023.
6. Middleton R, Wilson HA, Alvand A, et al. Outcome-based commissioning of knee arthroplasty in the NHS: system error in a national monitoring programme and the unintended consequences on achieving the Best Practice Tariff. Bone Joint J. 2018;100-B(12):1572–1578.
7. Kristensen TB, Dybvik E, Furnes O, Engesæter LB, Gjertsen JE. More reoperations for periprosthetic fracture after cemented hemiarthroplasty with polished taper-slip stems than after anatomical and straight stems in the treatment of hip fractures: a study from the Norwegian Hip Fracture Register 2005 to 2016. Bone Joint J. 2018;100-B(12):1565–1571.
8. Larsen P, Rathleff MS, Østgaard SE, Johansen MB, Elsøe R. Patellar fractures are associated with an increased risk of total knee arthroplasty: a matched cohort study of 6096 patellar fractures with a mean follow-up of 14.3 years. Bone Joint J. 2018;100-B(11):1477–1481.
9. Jameson SS, Asaad A, Diament M, et al. Antibiotic-loaded bone cement is associated with a lower risk of revision following primary cemented total knee arthroplasty: an analysis of 731,214 cases using National Joint Registry data. Bone Joint J. 2019;101-B(11):1331–1347.
10. Lamb JN, Matharu GS, Redmond A, Judge A, West RM, Pandit HG. Patient and implant survival following intraoperative periprosthetic femoral fractures during primary total hip arthroplasty: an analysis from the National Joint Registry for England, Wales, Northern Ireland and the Isle of Man. Bone Joint J. 2019;101-B(10):1199–1208.
11. Abram SGF, Judge A, Beard DJ, Carr AJ, Price AJ. Long-term rates of knee arthroplasty in a cohort of 834 393 patients with a history of arthroscopic partial meniscectomy. Bone Joint J. 2019;101-B(9):1071–1080.
12. Moeini S, Rasmussen JV, Salomonsson B, et al. Reverse shoulder arthroplasty has a higher risk of revision due to infection than anatomical shoulder arthroplasty: 17 730 primary shoulder arthroplasties from the Nordic Arthroplasty Register Association. Bone Joint J. 2019;101-B(6):702–707.
13. Sabah SA, Moon JC, Jenkins-Jones S, et al. The risk of cardiac failure following metal-on-metal hip arthroplasty. Bone Joint J. 2018;100-B(1):20–27. Erratum in: Bone Joint J. 2018;100-B(9):1260.
14. Masters J, Metcalfe D, Parsons NR, Achten J, Griffin XL, Costa ML; WHiTE Collaborative Investigators. Interpreting and reporting fracture classification and operation type in hip fracture: implications for research studies and routine national audits. Bone Joint J. 2019;101-B(10):1292–1299.
15. Cundall-Curry DJ, Lawrence JE, Fountain DM, Gooding CR. Data errors in the National Hip Fracture Database: a local validation study. Bone Joint J. 2016;98-B(10):1406–1409.
Open access statement
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial No Derivatives (CC BY-NC-ND 4.0) licence, which permits the copying and redistribution of the work only, and provided the original author and source are credited. See https://creativecommons.org/licenses/by-nc-nd/4.0/.