Population of interest and setting
Guy’s and St Thomas’ NHS Foundation Trust (GSTT) is a multi-site inner-city healthcare institution within the UK National Health Service (NHS), providing general and emergency services predominantly to the South London boroughs of Lambeth and Southwark. The acute-admitting site (St Thomas’ Hospital) has an emergency department and a large critical care service. A second hospital site (Guy’s Hospital) provides elective surgery, haemato-oncology, renal transplantation and other specialist services. There are also several community sites providing dialysis, rehabilitation and long-term care. Only COVID-19 cases admitted through the emergency department (ED) between 13th March 2020 and 27th May 2021 were included in this study. Patients who died or were discharged within the first 24 h were considered most likely to have reached the study endpoint independent of any steroid effect and were excluded from the primary analysis.
SARS-CoV-2 laboratory testing
GSTT has an on-site laboratory providing SARS-CoV-2 testing to all patients and healthcare workers (HCW). The policies and technologies employed for SARS-CoV-2 testing changed over time based on national and local screening guidance and improvements in diagnostics. Our laboratory began testing on 13th March 2020 with initial capacity for around 150 tests per day, before increasing to around 500 tests per day in late April during the first wave, and up to 1000 tests per day during the second wave.
Assays used for the detection of SARS-CoV-2 RNA included PCR testing using AusDiagnostics and the Hologic Aptima SARS-CoV-2 Assay. Testing commenced during the first wave on 13th March 2020 and was limited to cases requiring admission or inpatients who had symptoms of fever or cough, as per national recommendation; guidance suggested that cases who did not require admission should not be tested. Cases without laboratory confirmation of SARS-CoV-2 infection were not included.
Cases were identified by the first positive SARS-CoV-2 RNA test. Severe cases were defined by hypoxia on admission to hospital. Cases were taken to be hypoxic if, on admission, they had an oxygen saturation of < 94%, were recorded as requiring supplemental oxygen, or had a recorded fraction of inspired oxygen greater than 0.21.
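The hypoxia definition above can be expressed as a simple rule. The following is a minimal sketch only, not the study's extraction code; the function name and argument names are hypothetical, and missing measurements are assumed not to count towards hypoxia.

```python
from typing import Optional


def is_hypoxic(
    spo2_percent: Optional[float],
    on_supplemental_oxygen: bool,
    fio2: Optional[float],
) -> bool:
    """Return True if any of the three admission criteria is met.

    Assumption: a missing (None) saturation or FiO2 value is treated as
    not meeting that criterion.
    """
    low_saturation = spo2_percent is not None and spo2_percent < 94
    raised_fio2 = fio2 is not None and fio2 > 0.21
    return bool(low_saturation or on_supplemental_oxygen or raised_fio2)
```

For example, a case with a saturation of 97% on room air (FiO2 0.21) and no recorded oxygen requirement would not be classed as hypoxic under this rule.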
Determination of SARS-CoV-2 lineage
Whole genome sequencing of residual samples from SARS-CoV-2 cases was performed using GridION (Oxford Nanopore Technologies), using version 3 of the ARTIC protocol and bioinformatics pipeline. Samples were selected for sequencing if the corrected CT value was 33 or below, or if the Hologic Aptima assay result was above 1000 relative light units (RLU). During the first wave, sequencing occurred between 1st and 31st March; sequencing restarted in November 2020 and is ongoing. Lineage determination was performed using updated versions of pangolin 2.0. Samples were regarded as successfully sequenced if over 50% of the genome was recovered and if pangolin assigned a lineage with at least 50% confidence.
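The selection and quality thresholds described above can be sketched as two small checks. This is an illustration under stated assumptions, not the laboratory's pipeline: the function and parameter names are hypothetical, genome recovery and lineage confidence are assumed to be expressed as fractions between 0 and 1, and a sample is assumed to have a result from only one of the two assays.

```python
from typing import Optional


def eligible_for_sequencing(
    corrected_ct: Optional[float] = None,
    aptima_rlu: Optional[float] = None,
) -> bool:
    """Selection rule: corrected CT of 33 or below, or Aptima above 1000 RLU."""
    ct_ok = corrected_ct is not None and corrected_ct <= 33
    rlu_ok = aptima_rlu is not None and aptima_rlu > 1000
    return bool(ct_ok or rlu_ok)


def successfully_sequenced(
    genome_fraction_recovered: float,
    lineage_confidence: float,
) -> bool:
    """Success rule: over 50% of genome recovered and at least 50% confidence."""
    return genome_fraction_recovered > 0.5 and lineage_confidence >= 0.5
```

Note the asymmetry taken literally from the text: genome recovery must be strictly over 50%, whereas lineage confidence of exactly 50% is accepted.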
Data sources, extraction and integration
Clinical, laboratory and demographic data for all cases with a laboratory-reported SARS-CoV-2 PCR RNA test on nose and throat swabs or lower respiratory tract specimens were extracted from hospital electronic patient record (EPR) data sources using records closest to the test date (DXC Technology’s i.CM EPR, Philips IntelliVue Clinical Information Portfolio (ICIP) Critical Care, DXC Technology’s MedChart, e-Noting and Citrix Remote PACS - Sectra). Data were linked to the Index of Multiple Deprivation (IMD), with 1 denoting the least deprived areas and 5 the most deprived. Age and sex were extracted from the EPR. Self-reported ethnicity of cases was grouped as White, BAME (Black, Asian and Minority Ethnic) or Unknown according to the 18 ONS categories of White (British, Irish, Gypsy and White-Other), Black (African, Caribbean, and Black-Other), Asian (Bangladeshi, Chinese, Indian, Pakistani, and Asian-Other), and Mixed/Other.
Comorbidities, medication history, and medicine data were extracted from the EPR and e-Noting using structured queries with corresponding dictionaries. Comorbidities were extracted from any of the databases covering the pathway of the cases from arrival in accident and emergency through inpatient general ward and critical care unit, where applicable, to hospital discharge or death. If a comorbidity was not recorded, we assumed that it was not present. Cases were characterised as having or not having a past medical history of hypertension, cardiovascular disease (stroke, transient ischaemic attack, atrial fibrillation, congestive heart failure, ischaemic heart disease, peripheral artery disease or atherosclerotic disease), diabetes mellitus, chronic kidney disease, chronic respiratory disease (chronic obstructive pulmonary disease, asthma, bronchiectasis or pulmonary fibrosis) and neoplastic disease (solid tumours, haematological neoplasias or metastatic disease). Additionally, checks on free text data were performed by a cardiovascular clinician to ensure the information was accurate.
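The ethnicity grouping described above amounts to a lookup from recorded category to analysis group. The sketch below is illustrative only: the category strings are hypothetical labels (the actual EPR coding is not given here), and placing Mixed/Other in the BAME group and any unrecognised or missing value in Unknown are assumptions.

```python
from typing import Optional

# Hypothetical category labels based on the groups named in the text.
WHITE = {"British", "Irish", "Gypsy", "White-Other"}
BAME = {
    "African", "Caribbean", "Black-Other",
    "Bangladeshi", "Chinese", "Indian", "Pakistani", "Asian-Other",
    "Mixed/Other",  # assumption: Mixed/Other is grouped with BAME
}


def ethnicity_group(recorded_category: Optional[str]) -> str:
    """Map a recorded ethnicity category to White, BAME or Unknown."""
    if recorded_category in WHITE:
        return "White"
    if recorded_category in BAME:
        return "BAME"
    # Missing or unrecognised values fall into Unknown (assumption).
    return "Unknown"
```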
Steroid treatment was measured as the number of prescription-days with dexamethasone, hydrocortisone, prednisolone or methylprednisolone. Duration of steroid treatment was calculated as cumulative days during the first hospital admission, from the first positive SARS-CoV-2 PCR test through to discharge or death during that admission. Length of steroid use was analysed in a multivariable model comparing steroid use ≤ 3 days with steroid use > 3 days. The cut-off for steroid treatment days was chosen according to the interquartile range of steroid-days (3 to 10 days) in the RECOVERY trial. A sensitivity analysis was conducted with continuous steroid days as the input variable in the Cox proportional hazards model.
The outcome was all-cause in-hospital mortality (WHO-COVID-19 Outcomes Scale 8), with patients still hospitalised at the end of the study period considered censored.
Descriptive statistics were summarised as mean and standard deviation (SD) for normally distributed continuous variables and as median and interquartile range (IQR) for non-normally distributed continuous variables. Counts and percentages were used for categorical variables. For comparisons between cohorts with different lengths of steroid use (< 3 days vs. ≥ 3 days), the Kruskal-Wallis test was used for continuous variables and the Chi-squared test for categorical variables. The significance level was set at p < 0.05.
Cox proportional hazards models were used for time-to-event survival analysis, with time measured from hospital admission and events being the defined outcomes. Adjusted hazard ratios for the primary and secondary outcomes from the Cox proportional hazards models are presented. The adjustment variables used in the model were selected via literature review and clinical expert input (Additional file Table A). Age, sex, Body Mass Index (BMI) > 30 kg/m², hypertension, cardiovascular disease, diabetes, respiratory disease, chronic kidney disease, sequenced SARS-CoV-2 variant and medications including steroids and tocilizumab/sarilumab were used as pre-defined covariates for adjustment in multivariable models. As the distribution of steroid days is right-skewed (steroid days ≥ 0), the continuous steroid days were transformed before modelling as log(steroid days + 1). Missing values of variant, BMI and ethnicity were assigned to a new category, and cases with missing IMD values were discarded. There were no missing values in the other adjustment variables.
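The steroid exposure measure described above could be derived along the following lines. This is a minimal sketch rather than the study's code: the record layout (drug name, prescription date) and all names are hypothetical, and a prescription-day is assumed to be a calendar day with at least one qualifying steroid prescription, so two steroids given on the same day count once.

```python
from datetime import date
from typing import Iterable, Tuple

STEROIDS = {"dexamethasone", "hydrocortisone", "prednisolone", "methylprednisolone"}


def steroid_days(
    prescriptions: Iterable[Tuple[str, date]],
    first_positive: date,
    end_of_admission: date,
) -> int:
    """Count distinct steroid prescription-days between the first positive
    test and discharge or death (both inclusive, by assumption)."""
    days = {
        day
        for drug, day in prescriptions
        if drug.lower() in STEROIDS and first_positive <= day <= end_of_admission
    }
    return len(days)


def steroid_group(n_days: int) -> str:
    """Dichotomise at the 3-day cut-off used in the multivariable model."""
    return "<= 3 days" if n_days <= 3 else "> 3 days"
```

Counting distinct calendar days is one reasonable reading of "cumulative days"; if overlapping prescriptions were instead summed, the set would be replaced by a running total.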
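The covariate preparation described above (log transform of steroid days, a separate category for missing variant, BMI and ethnicity, and exclusion of cases with missing IMD) might look like the sketch below. Field names, the "Missing" label and the dictionary-based record layout are assumptions made for illustration; the model fitting itself is not shown.

```python
import math
from typing import Dict, Iterable, List, Optional

MISSING = "Missing"  # hypothetical label for the new category
FIELDS_WITH_MISSING_CATEGORY = ("variant", "bmi_over_30", "ethnicity")


def log_steroid_days(days: float) -> float:
    """Transform steroid days as log(steroid days + 1)."""
    if days < 0:
        raise ValueError("steroid days must be >= 0")
    return math.log1p(days)


def prepare_case(case: Dict[str, object]) -> Optional[Dict[str, object]]:
    """Return a model-ready copy of one case, or None if IMD is missing."""
    if case.get("imd") is None:
        return None  # cases with missing IMD are discarded
    prepared = dict(case)
    for field in FIELDS_WITH_MISSING_CATEGORY:
        if prepared.get(field) is None:
            prepared[field] = MISSING
    prepared["log_steroid_days"] = log_steroid_days(prepared["steroid_days"])
    return prepared


def prepare_cohort(cases: Iterable[Dict[str, object]]) -> List[Dict[str, object]]:
    """Apply prepare_case to every case and drop the discarded ones."""
    return [p for p in map(prepare_case, cases) if p is not None]
```

Adding one before taking the log keeps cases with zero steroid days in the model, since log(0 + 1) = 0.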
Data management was performed using SQL databases, with analysis carried out on the secure King’s Health Partners (KHP) Rosalind high-performance computer infrastructure running Jupyter Notebook 6.0.3, R 3.6.3 and Python 3.7.6.