The Global Initiative on Sharing All Influenza Data (GISAID) [1] hosts the largest SARS-CoV-2 viral sequence database to date, with more than 14 million samples. Despite the large number of sequences deposited, the utility of most samples for data analysis is limited due to poorly annotated clinical information. Nevertheless, we collated a dataset in which samples from patients annotated with favourable outcomes (such as mild or asymptomatic disease) served as controls, and samples from patients annotated with unfavourable outcomes (dead, critical) served as cases. We utilised the RONIN platform to run our machine learning tool, VariantSpark [2], and performed an association study on 3412 cases and 7109 controls with the aim of detecting mutations in SARS-CoV-2 that correlate with patient outcome. Our approach identified mutations previously known to impact viral transmission rates and disease severity, such as D614G and V1176F, associated with the Brazil and South Africa variants of concern. We also found mutations in the nsp14 protein, and novel mutations in the spike region, associated with worse patient outcomes. Using our epistasis tool BitEpi [3], we also detected putative higher-order epistatic interactions representing novel interacting loci with a potential impact on disease severity. We modelled the consequences of our candidate mutations on protein conformation using AlphaFold [4], providing structural context for our results. Taken together, we present a data-driven approach to rapidly identify mutations and mutation combinations of interest, including protein modelling of relevant mutations, which can aid variant tracking and surveillance efforts. Future work involves clustering approaches to capitalise on the full repository of GISAID sequences, with the aim of identifying more pertinent mutations affecting patient outcome.
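As an illustration of the case/control split described above, the following is a minimal sketch of how free-text outcome annotations might be mapped to groups. It is not the pipeline used in this work: the keyword lists contain only the outcome examples named in the text, and the function names `label_outcome` and `count_labels` are hypothetical. Samples whose annotation matches neither group, or both, are excluded.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical keyword lists, limited to the outcome examples named in the text.
# Real patient-status annotations are free text and will need a richer vocabulary.
CONTROL_TERMS = ("mild", "asymptomatic")
CASE_TERMS = ("dead", "critical")


def label_outcome(status: Optional[str]) -> Optional[str]:
    """Return "case", "control", or None when the annotation is unusable."""
    if not status:
        return None
    text = status.strip().lower()
    is_case = any(term in text for term in CASE_TERMS)
    is_control = any(term in text for term in CONTROL_TERMS)
    # Exclude annotations that match both groups or neither group.
    if is_case == is_control:
        return None
    return "case" if is_case else "control"


def count_labels(statuses: Iterable[Optional[str]]) -> Counter:
    """Tally samples per group; the None key counts excluded samples."""
    return Counter(label_outcome(s) for s in statuses)


# Toy annotations for illustration only (not real GISAID records).
example = ["Mild", "Asymptomatic", "Dead", "critical", "unknown", "", None]
counts = count_labels(example)
```

Excluding ambiguous annotations rather than guessing keeps the two groups clean at the cost of sample size, which matches the observation that most deposited samples are not usable for this kind of analysis.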