The UK Biobank – a veritable fountain of discovery


With a robust collection of 500,000 individuals, the UK Biobank will succeed in its mission to study the determinants of disease through genetics and proteomics combined with extensive electronic health records and ongoing assessments.

What the UK Biobank is and its current status

The UK Biobank (abbreviated UKB or UKBB) began in 2006 as a long-term study to investigate the ‘respective contributions of genetic predisposition and environmental exposure (including nutrition, lifestyle, medications etc.) to the development of disease’ per Wikipedia. Over the course of four years (2006 to 2010), 22 collection centers across the United Kingdom recruited participants born between 1937 and 1970 (ages 40 to 69), reflecting diverse regions, ethnicities, and socio-economic status.

Each participant went through an extensive interview covering lifestyle and family history, four tests of cognitive function, and an assessment of hearing and vision. In addition, standard measurements of BMI, blood pressure, forced expiratory volume (FEV) and forced vital capacity (FVC) (standard measures of lung function) were collected along with blood, saliva and urine samples. Importantly, consent was obtained to access all medical and health-related records, along with permission to re-contact participants years or even decades later for additional assessments, including taking additional biological samples.

In a recent presentation at the Festival of Genomics, the Deputy CEO of the UK Biobank, Dr. Mark Effingham, shared the following (edited) chart, showing the incident cases of disease as of 2020 (10 years after sample collection) and the expected numbers in 2027.

Adapted from a slide presented by Dr. Mark Effingham, Deputy CEO of the UK Biobank, at the Feb 2022 Festival of Genomics presented by the Journal of Precision Medicine.

As you can see from the table, these many thousands of individuals hold game-changing potential for detailed study of the interplay between genetics (predisposition) and environmental factors in health and disease.

Dr. Effingham also presented the worldwide use of the UK Biobank data, along with the explosion of high-impact publications this unique dataset has created.

Adapted from a slide presented by Dr. Mark Effingham, Deputy CEO of the UK Biobank, at the Feb 2022 Festival of Genomics presented by the Journal of Precision Medicine.

For additional perspective on this relatively short history, early leaders of the project state in this YouTube video, which celebrates a 20-year investment milestone, that the MRC spent 65M GBP in 2005 and 96M GBP the year after.

Several milestones are illustrated below. Affymetrix designed a custom 847K genotyping chip for all 500K samples from participants; the pharmaceutical company Regeneron organized five additional partners to help fund the exome sequencing of all 500K samples; and whole genome sequencing is currently underway, with 40% of the samples already finished.

In late 2020 Olink was selected to provide Explore 1536 proteomics data for 53,000 UK Biobank participants, later expanded to Explore 3072 in mid-2021. Metabolomic data (249 metabolic biomarkers) is provided by the Finnish NMR service provider Nightingale Health.

Selected milestones for the UK Biobank

Already the UK Biobank has produced a wealth of interesting and insightful results in pursuit of its mission of finding the determinants of disease. It is truly at the intersection of genetics, environment, and disease that new healthcare breakthroughs will be made.

With 4,500 publications (and counting), I thought it would be fun to dig into a few highlighted publications from the UK Biobank ‘news’ webpage.

One published result: social interaction and isolation

One of the first papers, from 2018, used the 847K Affymetrix array data to look for genetic associations between the genotypes of 452K individuals and both loneliness and regular participation in social activities. Published in Nature Communications under the title “Elucidating the genetic basis of social interaction and isolation”, it found 15 genomic loci for loneliness, “and demonstrate a likely causal association between adiposity and increased susceptibility to loneliness and depressive symptoms”.

In addition, 6 loci were identified for regularly attending a sports club or gym, 13 loci for regular attendance at a pub or other social club, and 18 loci for religious group involvement. Who knew that religious group involvement had a genetic component!

A second result: cognitive decline and a simple eye test

A common ophthalmologic test measures the thickness of the retina using optical coherence tomography (OCT). In 2018 it was discovered that thinning of the retina is associated with cognitive decline, and could be used as a method for early detection of dementia. The study was published in JAMA Neurology in 2018 as “Association of Retinal Nerve Fiber Layer Thinning With Current and Future Cognitive Decline: A Study Using Optical Coherence Tomography”. Identifying at-risk individuals with a simple test, and then intervening with medications and lifestyle changes, could affect mental decline before it occurs.

A third result: breast cancer risk and confirming the importance of the IGF-pathway in breast cancer

Some of the greatest power of the UK Biobank sample set and genomic data comes with a method called Mendelian Randomization. This approach uses genetic variant data, circulating protein biomarker data, and phenotype data to determine causality of a given condition. Here’s a great 2 minute YouTube video to explain it, by one of the pioneers of the technique, Dr. George Davey Smith of University of Bristol.

A brief 2 minute video primer on Mendelian Randomization with a key developer of the statistical method, combining genomics with circulating biomarker information and disease phenotype to determine causal proteins in disease.
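For readers who prefer code to video, here is a minimal sketch of the simplest Mendelian Randomization estimator, the single-variant Wald ratio: the variant's effect on the outcome divided by its effect on the exposure. The function name and the numbers are hypothetical and chosen only for illustration; real analyses combine many variants and check assumptions that this sketch ignores.

```python
def wald_ratio(beta_gx, beta_gy, se_gy):
    """Single-variant Mendelian randomization estimate (Wald ratio).

    beta_gx: effect of the variant on the exposure (e.g. a biomarker level)
    beta_gy: effect of the same variant on the outcome (e.g. log-odds of disease)
    se_gy:   standard error of beta_gy
    """
    if beta_gx == 0:
        raise ValueError("the variant must be associated with the exposure")
    estimate = beta_gy / beta_gx
    # First-order approximation: ignores the uncertainty in beta_gx.
    se = se_gy / abs(beta_gx)
    return estimate, se


# Hypothetical summary statistics, not taken from any study.
estimate, se = wald_ratio(beta_gx=0.5, beta_gy=0.02, se_gy=0.005)
low, high = estimate - 1.96 * se, estimate + 1.96 * se
print(f"effect per unit of exposure: {estimate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

The key idea is that genotypes are assigned at conception, so a variant that shifts the biomarker acts a bit like a natural randomized trial of that biomarker.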

Here a hormone called Insulin-like Growth Factor-1 (IGF-1) was identified as a causal protein for breast cancer; for every additional 5 nmol/L of IGF-1 in circulation, the risk of breast cancer rises by a factor of 1.05 (roughly a 5% increase). 206,263 samples from women in the UK Biobank were used in this study, which provided the strongest evidence to date for the causal role of the IGF pathway in breast cancer.
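To get a feel for what that number means, here is a small sketch that reads the 1.05 figure as a risk ratio per 5 nmol/L and assumes the relationship is log-linear, so that ratios multiply across steps. That assumption and the function name are mine, not the paper's.

```python
def relative_risk(delta_nmol_per_l, ratio_per_step=1.05, step=5.0):
    """Risk ratio for a given IGF-1 difference.

    Assumes a log-linear dose-response: each additional `step` nmol/L
    multiplies the risk by `ratio_per_step`.
    """
    return ratio_per_step ** (delta_nmol_per_l / step)


for delta in (5, 10, 20):
    print(f"+{delta} nmol/L -> risk ratio {relative_risk(delta):.3f}")
```

Under this reading, a woman whose IGF-1 is 10 nmol/L higher would carry a risk ratio of 1.05 squared, or about 1.10, compared with baseline.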

The 2020 publication in Annals of Oncology is called “Insulin-like growth factor-1, insulin-like growth factor-binding protein-3, and breast cancer risk: observational and Mendelian randomization analyses with ∼430 000 women”.

A fourth result: TV watching and mortality

Following 490,000 individuals across 7 years, researchers found a clear association between the amount of television watched and death. Published in 2021 in Mayo Clinic Proceedings, “Understanding How Much TV is Too Much: A Nonlinear Analysis of the Association Between Television Viewing Time and Adverse Health Outcomes” demonstrated that the lowest risk was at less than 2 hours of television watching per day.

Above 2 hours per day, the study reported an 8.6% rise in cardiovascular disease mortality over the seven-year observation period.

From the abstract: “Substituting TV time with sleeping, walking, or moderate or vigorous physical activity was associated with reduced risk for all outcomes when baseline levels of substitute activities were low.”

Take-home lesson: stay away from extended television watching!

A fifth result: 454,787 whole-exome sequencing analysis identifying 594 genes connected to 3,994 traits

This is a remarkably powerful paper, and the first to use the whole-exome sequencing project organized by Regeneron Pharmaceuticals. From the 454.7K whole-exome dataset, 12M coding variants were identified. Of these 12M variants, 1M were predicted loss-of-function mutations, and another 1.8M were deleterious missense mutations.

The 594 genes were associated with the 3,994 traits at a significance threshold of p < 2×10⁻¹¹. The main findings were key genetic associations with liver disease, eye disease, and several types of cancer. In addition, key risk-reducing genes were identified for hypertension (SLC9A3R2), diabetes (MAP3K15, FAM234A) and asthma (SLC27A3).

Lastly, six genes were associated with specific brain imaging phenotypes. This work was published in 2021 in the journal Nature, titled “Exome sequencing and analysis of 454,787 UK Biobank participants”.

Upcoming work – the Pharma Proteomics Project

As mentioned previously, Olink is currently measuring over 3,000 protein markers in 53,000 UK Biobank samples, in collaboration with 13 pharmaceutical companies. The aim is to combine the WES and WGS datasets, the circulating biomarker data, and outcome phenotypes to determine the causes of many diseases. There will be a flood of discovery from combining genomic and circulating proteome data with the power of deeply phenotyped samples. To learn more, see the following 2021 talk by Dr. Christopher Whelan of Biogen, which sums up the Pharma Proteomics Project, its aims, and its current progress.

Dr. Christopher Whelan of Biogen Pharmaceuticals presenting the Pharma Proteomics Project at a 2021 UK Biobank conference in the Multi-omics section.

If you prefer to read about this presentation rather than watching a video, I wrote this summary up for the Olink blog here.  

Lastly, an expanded version of the UK Biobank called Our Future Health is already underway, described by Dr. Fiona Watt (Executive Chair of the Medical Research Council) in this video celebrating 20 years of the UKB as “The UK Biobank Times Ten”. So there is definitely a lot more to come. These are exciting times!


About Dale Yuzuki

A sales and marketing professional in the life sciences research-tools area, Dale currently is employed by Olink as their Americas Field Marketing Director. https://olink.com For additional biographical information, please see my LinkedIn profile here: http://www.linkedin.com/in/daleyuzuki and also find me on Twitter @DaleYuzuki.
